Systematically detecting VPN Services
Published: April 9, 2024
Last Modified: May 8, 2024
VPN Detection Exit Node Enumeration NordVPN ExpressVPN Machine Learning

Systematic VPN Detection

This blog post will introduce various methods of systematically detecting VPN IP addresses from providers such as NordVPN or ExpressVPN.


Example Data

At the time of writing (8th May 2024), the following VPN IP Addresses for NordVPN and IPVanish were obtained through Exit Node Enumeration:

Too long; Didn't read

VPN detection is the process of determining whether an arbitrary IP address belongs to a VPN service and, if so, to which VPN provider it belongs (such as IPVanish or HideMyAss). It is important to understand that VPN detection cannot reveal the true IP address of the VPN user.

There are many good reasons why websites or apps want to detect users with a VPN enabled:

  • VPN users bypass geo-restrictions on content by connecting to a VPN server in a different country
  • VPN users committing cybercrime cannot be easily identified: the VPN protects the bad guys as well!
  • A VPN service gives a single entity access to hundreds of different IP addresses in different countries, which makes it possible to bypass rate limits. Therefore, VPN usage is also associated with bot attacks and automated threats

Introduction

A VPN, short for Virtual Private Network, is a network application that routes the end user's Internet traffic over a server owned by the VPN service provider. End users mostly use VPNs for the following reasons:

  • Security - Encrypting traffic on the network layer (as opposed to the application layer, as is the case with HTTPS / TLS)
  • Anonymity - Gaining anonymity when browsing the web (Since the user's traffic is routed through the VPN server, the real user's IP address is not exposed to the websites visited.)
  • Changing IP Geolocation - Changing the geographic IP location of the user by connecting to a VPN server in a different country

These features can be very attractive to VPN users, since they allow them to evade geographic restrictions imposed by websites, to circumvent IP bans or to remain anonymous when browsing the web.

For example, BBC iPlayer is intended to be used only by users in the UK. Therefore, the BBC blocks any user who is not located in the UK. By using a VPN server located in the UK, it is possible to bypass this geo-restriction and use BBC iPlayer from anywhere in the world.

Since BBC likely knows that their geo-restriction is bypassed by some cunning VPN users, they have a justified reason to disallow VPN users from accessing their streaming service. This explains why there is a legitimate interest in knowing whether an Internet user is hiding behind a VPN server or not.

Another reason why some websites don't allow VPN traffic: Streaming sites such as Netflix have different pricing schemes in different countries across the globe. Furthermore, the media content on Netflix is also tailored to a certain geographic region. Therefore, companies such as Netflix have an incentive to disallow users from using a VPN to bypass their restrictions.

VPN Detection Model

The detection model defines the exact detection goal and the detection environment. It is important to define the VPN detection model accurately in order to later conclude the effectiveness of the proposed detection methods.

For this blog article, the goal is to detect the largest commercial VPN providers such as the examples

In this research project, the goal is not to detect every kind of VPN connection. That would not be useful either, since many VPN setups are not used for the purpose of bypassing some kind of restriction.

For example: If an organization is using a VPN in its company network, it is not the goal of this blog article to detect VPN traffic originating from such a company network, since such VPN setups serve a different purpose.

Furthermore, what exactly does detection mean?

For the sake of this research, detection refers to gaining the knowledge that a certain public IP address belonged to a VPN service at a certain point in time. It is important to understand that detected IP addresses become stale after some time (maybe a week?) and that VPN detection is a process that needs to be constantly repeated.

Last but not least, detection does not mean that the true client IP address can be revealed. This is generally not possible, except for some rare cases where there is a leakage of the true IP address via WebRTC (JavaScript) or if users forget to enable the VPN.

VPN Detection Methods

There are many different VPN detection methods. They all have their advantages and disadvantages. Unfortunately, there is no perfect VPN detection method. VPN detection is a complicated and error-prone endeavour and requires significant skill to implement in a scalable fashion.

In order to compare the various VPN detection methods, each of them needs to be evaluated against a common set of criteria. Hence, the following evaluation criteria are suggested:

Proposed Evaluation Criteria for VPN Detection Methods

Scalability

How well does the VPN detection method scale? Is it only useful for detecting a specific VPN provider? Or is the detection method provider-agnostic and can be used to detect any kind of VPN service?

Cost

How expensive is this VPN detection method to implement? How much time is required to implement it?

Staleness

How often does the VPN detection method need to be repeated? How long does it take until the data becomes stale?
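
To make the staleness criterion concrete, here is a minimal sketch of how stale observations could be filtered out. It assumes observations are stored as `[ip, unix_timestamp]` pairs per region (the layout written by the enumeration script later in this post) and a one-week window. The window is an assumption taken from the rough guess above, not a measured value, and `fresh_ips` is a hypothetical helper name.

```python
import time

MAX_AGE_SECONDS = 7 * 24 * 3600  # assumed staleness window (one week)

def fresh_ips(data, now=None, max_age=MAX_AGE_SECONDS):
    """Return the set of IPs that were observed within the last max_age seconds."""
    now = time.time() if now is None else now
    fresh = set()
    for observations in data.values():
        for ip, seen_at in observations:
            if now - seen_at <= max_age:
                fresh.add(ip)
    return fresh

# Documentation-range IP addresses, not real exit nodes.
example = {
    "Germany": [["192.0.2.10", 1_000_000.0], ["192.0.2.11", 1_600_000.0]],
    "Norway": [["198.51.100.7", 1_650_000.0]],
}
print(sorted(fresh_ips(example, now=1_700_000.0)))
```

Only the two addresses seen within the window are kept; the first German observation is older than one week relative to `now` and is dropped.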

Accuracy

How many false positives are expected for this VPN detection method? The more false positives the detection method produces, the worse it is. False positives are extremely bad and need to be avoided at all costs.

False negatives don't cause nearly as many problems as false positives, since it is always better to not block a malicious user than to block an innocent user.
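
As a rough illustration of how accuracy could be measured, the sketch below computes the false positive and false negative rates of a detection result against a labelled ground truth. The function name and the idea of having a labelled set of IP addresses are assumptions for the sake of the example.

```python
def detection_rates(predicted_vpn, actual_vpn, all_ips):
    """Compute false positive and false negative rates from sets of IP addresses."""
    tp = len(predicted_vpn & actual_vpn)             # correctly flagged VPN IPs
    fp = len(predicted_vpn - actual_vpn)             # innocent IPs flagged as VPN
    fn = len(actual_vpn - predicted_vpn)             # VPN IPs that were missed
    tn = len(all_ips - predicted_vpn - actual_vpn)   # innocent IPs left alone
    return {
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
    }

all_ips = {"a", "b", "c", "d", "e"}  # placeholders standing in for IP addresses
rates = detection_rates(predicted_vpn={"a", "c"}, actual_vpn={"a", "b"}, all_ips=all_ips)
print(rates)
```

Given the reasoning above, a detection method would be tuned to keep `false_positive_rate` as close to zero as possible, even if that raises `false_negative_rate`.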

Requires Traffic Visibility

This criterion specifies whether the detection method is limited to live traffic that the detector can observe, or whether it can also detect VPN IP addresses without seeing any of their traffic.

Simple Example: A VPN detection method that requires JavaScript execution works only on live traffic.

Method 1 - VPN Exit Node Enumeration

The idea behind VPN Exit Node Enumeration is very simple: A paid plan from each VPN provider is purchased. Then, the VPN client application of each provider is installed, a connection to each region of each provider is made and the VPN exit node IP address is stored. Most VPN providers allow their users to connect to all their available regions, but some providers might throttle the number of reconnections within a given time span.

Put differently, this detection method works by enumerating all available regions, connecting to each region and storing the VPN IP address by using tools such as ipapi.is to get the public IP address.

The hardest part of VPN exit node enumeration is automating the rotation of the VPN region. Most commercial providers such as NordVPN or ExpressVPN do not make it easy to rotate the region automatically. Nevertheless, automation is certainly possible for every provider, it just takes a bit longer.

The biggest advantage of this VPN detection method is the certainty that each obtained IP address actually belongs to the VPN provider. False positives are virtually impossible.

Furthermore, this detection method is the only one that makes it possible to reliably determine which VPN provider an IP address belongs to, since it is known at all times which service is being used. Note, however, that some VPN providers might share exit nodes.

A downside of this detection method is the uncertainty whether the VPN application gives you access to all available regions. In order to be sure, one would need to create several different paid subscriptions from different identities and regions and verify if different subscriptions lead to different VPN regions available.

The following Python 3 script implements VPN Exit Node Enumeration for NordVPN and ExpressVPN. The logic of the script is very simple. First, a connection to a certain region is made, the IP address is obtained and then a connection to the next region is made. The script can be periodically invoked via a cron job.

import subprocess
import os
import json
import time
import sys

"""
Cronjob for NordVPN:

0 1,5,9,13,17,21 * * * /usr/bin/python3 /root/vpn/vpn_enumeration.py nordvpn

Cronjob for ExpressVPN:

0 0,4,8,12,16,20 * * * /usr/bin/python3 /root/vpn/vpn_enumeration.py expressvpn
"""

apiKey = '' # get the API key from https://ipapi.is/

# nordvpn countries
nordvpnCountries = ["Albania","Germany","Netherlands","Algeria","Ghana","New_Zealand","Andorra","Greece","Nigeria","Argentina","Greenland","North_Macedonia","Armenia","Guam","Norway","Australia","Guatemala","Pakistan","Austria","Honduras","Panama","Azerbaijan","Hong_Kong","Papua_New_Guinea","Bahamas","Hungary","Paraguay","Bangladesh","Iceland","Peru","Belgium","India","Philippines","Belize","Indonesia","Poland","Bermuda","Ireland","Portugal","Bhutan","Isle_Of_Man","Puerto_Rico","Bolivia","Israel","Romania","Bosnia_And_Herzegovina","Italy","Serbia","Brazil","Jamaica","Singapore","Brunei_Darussalam","Japan","Slovakia","Bulgaria","Jersey","Slovenia","Cambodia","Kazakhstan","South_Africa","Canada","Kenya","South_Korea","Cayman_Islands","Lao_People'S_Democratic_Republic", "Spain", "Chile","Latvia","Sri_Lanka","Colombia","Lebanon","Sweden","Costa_Rica","Liechtenstein","Switzerland","Croatia","Lithuania","Taiwan","Cyprus","Luxembourg","Thailand","Czech_Republic","Malaysia","Trinidad_And_Tobago","Denmark","Malta","Turkey","Dominican_Republic","Mexico","Ukraine","Ecuador","Moldova","United_Arab_Emirates","Egypt","Monaco","United_Kingdom","El_Salvador","Mongolia","United_States","Estonia","Montenegro","Uruguay","Finland","Morocco","Uzbekistan","France","Myanmar","Venezuela","Georgia","Nepal","Vietnam"]

# expressvpn regions
expressvpnShort = ['ausy', 'aume', 'auwo', 'aubr', 'aupe', 'auad', 'defr1', 'denu', 'defr3', 'usny', 'uswd', 'usnj1', 'usnj3', 'usat', 'usmi', 'usmi2', 'usda', 'usch', 'uslp', 'usal', 'tr', 'nlam', 'nlro', 'nlth', 'ukdo', 'ukel', 'uklo', 'ukmi', 'ch', 'pl', 'at', 'cz', 'esma', 'frst', 'frma', 'frpa2', 'itco', 'itna', 'hr', 'dk', 'se2', 'se']
expressvpnRegions = ["Argentina","Bahamas","Bermuda","Bolivia","Brazil","Canada","Canada - Montreal","Canada - Toronto","Canada - Toronto - 2","Canada - Vancouver","Cayman Islands","Chile","Colombia","Costa Rica","Cuba","Dominican Republic","Ecuador","Guatemala","Honduras","Jamaica","Mexico","Panama","Peru","Puerto Rico","Trinidad and Tobago","United States","USA - Albuquerque","USA - Atlanta","USA - Chicago","USA - Dallas","USA - Dallas - 2","USA - Denver","USA - Lincoln Park","USA - Los Angeles - 1","USA - Los Angeles - 2","USA - Los Angeles - 3","USA - Los Angeles - 5","USA - Miami","USA - Miami - 2","USA - New Jersey - 1","USA - New Jersey - 2","USA - New Jersey - 3","USA - New York","USA - Phoenix","USA - Salt Lake City","USA - San Francisco","USA - Santa Monica","USA - Seattle","USA - Tampa - 1","USA - Washington DC","Uruguay","Venezuela","Albania","Andorra","Armenia","Austria","Belarus","Belgium","Bosnia & Herzegovina","Bulgaria","Croatia","Cyprus","Czech Republic","Denmark","Estonia","Finland","France","France - Alsace","France - Marseille","France - Paris - 1","France - Paris - 2","France - Strasbourg","Georgia","Germany","Germany - Frankfurt - 1","Germany - Frankfurt - 3","Greece","Hungary","Iceland","Ireland","Isle of Man","Italy","Italy - Milan","Jersey","Latvia","Liechtenstein","Lithuania","Luxembourg","Malta","Moldova","Monaco","Montenegro","Netherlands","Netherlands - Amsterdam","Netherlands - Rotterdam","Netherlands - The Hague","Norway","Poland","Portugal","Romania","Serbia","Slovakia","Slovenia","Spain","Spain - Barcelona","Spain - Barcelona - 2","Spain - Madrid","Sweden","Switzerland","Turkey","Ukraine","United Kingdom","UK - Docklands","UK - East London","UK - London","UK - Wembley","Australia","Australia - Adelaide","Australia - Brisbane","Australia - Melbourne","Australia - Perth","Australia - Sydney","Australia - Sydney - 2","Australia - Woolloomooloo","Bangladesh","Bhutan","Brunei Darussalam","Cambodia","Guam","Hong Kong","India","Indonesia","Japan","Japan - Shibuya","Japan - Tokyo","Japan - Tokyo - 2","Japan - Yokohama","Kazakhstan","Laos","Macau","Malaysia","Mongolia","Myanmar","Nepal","New Zealand","Pakistan","Philippines","Singapore","South Korea","Sri Lanka","Taiwan","Thailand","Uzbekistan","Vietnam","Algeria","Egypt","Ghana","Israel","Kenya","Lebanon","Morocco","South Africa"]

if len(sys.argv) < 2:
  print('Usage: /root/vpn/vpn_enumeration.py nordvpn|expressvpn')
  exit()

whichvpn = sys.argv[1]
if whichvpn != 'nordvpn' and whichvpn != 'expressvpn':
  whichvpn = 'nordvpn'

fileName = ''
binary = ''
sleepTime = 2
errorTimeout = 61
connectionFailureTimeout = 91
stopAfter = int(sys.argv[2]) if len(sys.argv) > 2 else None

if whichvpn == 'nordvpn':
  allRegions = nordvpnCountries
  binary = '/usr/bin/nordvpn'
  fileName = '/root/vpn/data/nord.json'
  sleepTime = 7
elif whichvpn == 'expressvpn':
  allRegions = expressvpnRegions + expressvpnShort
  binary = '/usr/bin/expressvpn'
  fileName = '/root/vpn/data/express.json'

def log(msg):
  print(f'[{whichvpn}] - {time.asctime()} - {msg}')
  with open(f'/root/vpn/data/{whichvpn}.log', 'a') as logfile:
    logfile.write(msg + '\n')

data = dict()

# load results from previous runs so that new observations are appended
try:
  with open(fileName, 'r') as fd:
    data = json.load(fd)
  log(f'Loaded {len(data)} results from file {fileName}')
except Exception as err:
  print(err)
  log(f'Could not open {fileName}, starting from scratch...')

num = 0
for region in allRegions:
  try:
    connectCommand = f'{binary} connect "{region}"'
    log(connectCommand)

    connectionAttempt = subprocess.check_output(connectCommand, shell=True).decode()
    if 'The VPN connection has failed.' in connectionAttempt:
      log('connection failed, waiting for a bit...')
      time.sleep(connectionFailureTimeout)
      continue

    time.sleep(sleepTime)
    output = None

    for tries in range(5):
      try:
        output = subprocess.check_output(f'curl --silent "https://api.ipapi.is?&key={apiKey}"', shell=True)
        if output:
          break
      except Exception as terr:
        print('timeout error', terr)
        time.sleep(4)

    if output is None:
      log(f'no internet connection on region {region}')
      os.system(f'{binary} disconnect')
      continue
    
    ip = json.loads(output)['ip']
    log(f'Harvested IP {ip} for region {region}')
    if region in data:
      data[region].append([ip, time.time()])
    else:
      data[region] = [[ip, time.time()], ]

    os.system(f'{binary} disconnect')

    with open(fileName, 'w') as fd:
      json.dump(data, fd, indent=2, sort_keys=True)

    time.sleep(1)
    num += 1
    if stopAfter is not None:
      if num >= stopAfter:
        os.system(f'{binary} disconnect')
        break
  except Exception as err:
    log(str(err))
    os.system(f'{binary} disconnect')
    time.sleep(errorTimeout)
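
Once the script has run for several providers, the harvested data can be turned into a lookup table. The following is a minimal sketch that assumes the JSON layout written by the script above (`{region: [[ip, timestamp], ...]}`) and takes the already loaded dictionaries as input. The function name `build_lookup` is hypothetical.

```python
def build_lookup(data_by_provider):
    """Map each harvested IP to the set of providers it was observed for."""
    lookup = {}
    for provider, data in data_by_provider.items():
        for observations in data.values():
            for ip, _seen_at in observations:
                lookup.setdefault(ip, set()).add(provider)
    return lookup

# Documentation-range IP addresses, not real exit nodes.
nord = {"Germany": [["192.0.2.1", 1715155200.0]], "Norway": [["192.0.2.2", 1715155300.0]]}
express = {"Germany": [["192.0.2.1", 1715158800.0]]}

lookup = build_lookup({"nordvpn": nord, "expressvpn": express})
print(lookup.get("192.0.2.1"))    # observed for both providers
print(lookup.get("203.0.113.9"))  # None: never observed
```

Storing a set of providers per IP keeps the lookup correct in the case where two providers share an exit node. In practice, the dictionaries would be loaded with `json.load` from files such as `/root/vpn/data/nord.json`.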

Verdict for VPN Exit Node Enumeration

VPN Exit Node Enumeration is a simple and effective detection method and the various evaluation criteria rank as follows:

Category | Verdict
Scalability | The method has to be repeated for each VPN provider and doesn't scale well. Furthermore, connecting and reconnecting to different regions takes some time and might be rate limited.
Cost | Subscription plans of commercial VPN providers are rather cheap.
Staleness | Data gathered this way becomes stale very quickly, therefore Exit Node Enumeration is a continuous process.
Accuracy | VPN IP addresses gathered this way are very accurate.
Needs Traffic Visibility | There is no requirement for traffic visibility. Meaning: This method does not require any real world traffic to work.

The VPN Exit Node Enumeration method is a slow and time-consuming way to detect VPN IP addresses. Furthermore, it doesn't scale well. However, its accuracy and the fact that it can detect VPN exit nodes without access to real world traffic make it a very attractive VPN detection method.

Method 2 - Deep Packet Inspection

The idea behind this detection method is that VPN traffic looks fundamentally different from non-VPN traffic in terms of the network metadata found in IP or TCP/UDP headers.

For example, an article about "Proxy / VPN Detection" from ipleak.com claims that MTU / MSS can reveal the use of certain VPN protocols:

"When your device connects to the server directly, the MTU is set to a standard value (for example, 1500 for Ethernet or 1480 for PPTP).
When connecting through a Proxy/VPN via protocols like PPTP, L2TP (± Ipsec), or IPsec IKE, original packets are placed inside other packets (encapsulated), resulting in increased packet size. To prevent excessive packet fragmentation and maintain good data transfer speed, the operating system lowers the MTU setting at the network interface (e. g., to 1400 for IPsec)." [1]

The following Stack Overflow question also suggests that the MTU, and especially the MSS in TCP headers, leaks OpenVPN usage:

"OpenVPN is a VPN software suite that is more specifically detected because it works in a different way than other VPN software.
OpenVPN decreases MSS instead of interface MTU. By decreasing MSS like this, it is possible to further detect the encryption being used because a certain combination of protocol, block size (bs), MAC and compression will generate specific MSS sizes." [2]
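
The relationship between MSS and MTU described in these quotes can be sketched in a few lines. For TCP without header options, the MSS equals the MTU minus 40 bytes over IPv4 (20 bytes IP header plus 20 bytes TCP header) and minus 60 bytes over IPv6. The helper names below are hypothetical, and comparing against the Ethernet MTU of 1500 is a simplifying assumption.

```python
IPV4_TCP_OVERHEAD = 40  # 20 byte IPv4 header + 20 byte TCP header
IPV6_TCP_OVERHEAD = 60  # 40 byte IPv6 header + 20 byte TCP header
ETHERNET_MTU = 1500

def implied_mtu(mss, ipv6=False):
    """Derive the path MTU implied by the MSS option of a TCP SYN packet."""
    return mss + (IPV6_TCP_OVERHEAD if ipv6 else IPV4_TCP_OVERHEAD)

def looks_encapsulated(mss, ipv6=False):
    """Weak hint only: True if the implied MTU is below the Ethernet default."""
    return implied_mtu(mss, ipv6) < ETHERNET_MTU

print(implied_mtu(1460), looks_encapsulated(1460))  # plain Ethernet
print(implied_mtu(1360), looks_encapsulated(1360))  # lowered MTU
```

A lowered MTU is not proof of VPN usage. PPPoE connections, for example, use an MTU of 1492, so this check would have to be combined with other signals.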

In order to find a reliable signal using deep packet inspection, one needs to conduct the following experiment:

  1. Generate some reproducible traffic and record it using tcpdump on a server
  2. Recreate the same traffic, but this time route the packets over a VPN, and record it again using tcpdump on the same server
  3. Create a diff of the two tcpdump captures. Remove all the diffs that are due to protocol randomness (initial sequence numbers and acknowledgment numbers, for example). If a difference remains that cannot be explained by normal IP/TCP/UDP protocol behavior, it might be a potential signal for discriminating VPN traffic from non-VPN traffic.
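
The diff in step 3 could be sketched as follows, assuming both captures have already been parsed into one dictionary of header fields per packet. The field names (`seq`, `mss`, `ttl` and so on) and the list of fields treated as random are assumptions for illustration. Parsing the pcap files themselves is left out.

```python
# Fields that differ between any two connections and therefore carry no signal
# (assumed names, adjust to whatever the pcap parser produces).
RANDOM_FIELDS = {"seq", "ack", "sport", "ip_id", "checksum"}

def field_values(packets):
    """Collect the set of observed values for each header field."""
    values = {}
    for packet in packets:
        for field, value in packet.items():
            values.setdefault(field, set()).add(value)
    return values

def differing_fields(direct, via_vpn, ignore=RANDOM_FIELDS):
    """Return the fields whose observed values differ between the two captures."""
    a, b = field_values(direct), field_values(via_vpn)
    diff = {}
    for field in sorted((a.keys() | b.keys()) - set(ignore)):
        if a.get(field, set()) != b.get(field, set()):
            diff[field] = (a.get(field, set()), b.get(field, set()))
    return diff

direct = [{"seq": 1001, "mss": 1460, "ttl": 57, "window": 64240}]
via_vpn = [{"seq": 9321, "mss": 1360, "ttl": 57, "window": 64240}]
print(differing_fields(direct, via_vpn))
```

In this made-up example only `mss` survives the filter, which is the kind of candidate signal the experiment is looking for.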

Deep Packet Inspection might be an interesting method, but there are two obvious downsides:

  • This method only works on live traffic
  • If certain MSS values are indicative of OpenVPN usage, there is nothing preventing VPN services from changing those revealing MSS values to something else. In other words: Detection via MSS/MTU is not reliable!

Verdict for Deep Packet Inspection Method

Category | Verdict
Scalability | Deep packet inspection can usually be implemented quite efficiently. However, only VPNs in live traffic can be detected.
Cost | If done with hundreds of millions of connections, cost can become a factor. The true cost lies in obtaining a sizeable chunk of the Internet's traffic.
Staleness | Data becomes stale after the connection ends (immediately).
Accuracy | Probably very low, since VPN servers can change and adapt the IP/UDP/TCP header values that are correlated with VPN protocols.
Needs Traffic Visibility | Yes

Method 3 - Flow Based Detection using Machine Learning

In recent years, a whole line of research has emerged that focuses on network flow statistics to detect VPN traffic. The basic idea is to take flow statistics from TCP / UDP streams and train a neural network with flow data from VPN traffic and non-VPN flows so it can distinguish VPN traffic from non-VPN traffic.

A flow is essentially an ordered list of packet metadata that arrives on a network interface, such as the one depicted below. The flow example below shows the direction of each packet (+ or -), the relative packet arrival time starting with 0.0, the packet size and the TCP flags of the packet.

tcp 110.15.12.112:11414 -> 151.80.174.139:443
  + 0.0 64 SYN 57
  - 0.07 60 SYN ACK 64
  + 19.12 52 ACK 57
  + 19.54 598 ACK PUSH 57
  - 19.56 52 ACK 64
  - 19.83 2852 ACK 64
  - 19.85 1348 ACK PUSH 64
  - 21.14 405 ACK PUSH 64
  + 39.87 52 ACK 57
  + 40.08 52 ACK 57
  + 41.04 145 ACK PUSH 57
  + 41.32 682 ACK PUSH 57
  - 41.47 342 ACK PUSH 64
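
A flow in the text format shown above could be parsed into structured records along these lines. This is a sketch that assumes the column layout of the example. The meaning of the last column is not stated in this post, so it is kept as an uninterpreted integer. The names `Packet` and `parse_flow` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    direction: str    # "+" or "-"
    time: float       # relative arrival time, starting at 0.0
    size: int         # packet size
    flags: tuple      # TCP flags, e.g. ("SYN", "ACK")
    last_column: int  # meaning not stated in the post, kept uninterpreted

def parse_flow(text):
    """Parse the flow header line and the packet lines that follow it."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    header, packet_lines = lines[0], lines[1:]
    packets = []
    for line in packet_lines:
        parts = line.split()
        packets.append(Packet(
            direction=parts[0],
            time=float(parts[1]),
            size=int(parts[2]),
            flags=tuple(parts[3:-1]),
            last_column=int(parts[-1]),
        ))
    return header, packets

sample = """
tcp 110.15.12.112:11414 -> 151.80.174.139:443
  + 0.0 64 SYN 57
  - 0.07 60 SYN ACK 64
  + 19.12 52 ACK 57
  + 19.54 598 ACK PUSH 57
"""
header, packets = parse_flow(sample)
print(header)
print(packets[1])
```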

Recent publications that focus on network flow statistics to detect VPN traffic are:

What is the basic idea behind this line of research?

The first step is to extract statistical features from VPN traffic and non-VPN traffic:

[Figure: Time-related features extracted from network flows in order to train a neural network to distinguish VPN traffic from non-VPN traffic [3]]

The next step is to collect a large corpus of those network flow captures from VPN and non-VPN traffic and to train a neural network with labelled data. The resulting trained neural network will be able to detect VPN traffic with a high accuracy and recall.
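
To illustrate what time-related features can look like, the sketch below derives a few simple statistics from the packet arrival times of a flow. The chosen features (flow duration and inter-arrival time statistics) and their names are assumptions for illustration and not necessarily the exact feature set used in [3].

```python
from statistics import mean, pstdev

def time_features(arrival_times):
    """Compute simple time-related features from sorted packet arrival times."""
    if len(arrival_times) < 2:
        raise ValueError("need at least two packets")
    # inter-arrival times: gaps between consecutive packets
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    return {
        "duration": arrival_times[-1] - arrival_times[0],
        "iat_mean": mean(gaps),
        "iat_min": min(gaps),
        "iat_max": max(gaps),
        "iat_std": pstdev(gaps),
    }

# First four arrival times of the flow example above
features = time_features([0.0, 0.07, 19.12, 19.54])
print(features)
```

One such feature vector per flow, together with a VPN / non-VPN label, would form a single training sample for the classifier.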

What is the essence of using time related features from network flows to detect VPN connections?

A VPN server is an artificial hop between client and server. If client C wants to communicate with server S, the direct communication C <==> S will typically be faster than the same communication over a VPN server C <==> VPN <==> S.

The presence of this artificial hop over which network packets are routed can be systematically captured by training a neural network with enough labelled flow data.

Verdict for Flow Based Detection using Machine Learning

Category | Verdict
Scalability | Only VPNs in live traffic can be detected.
Cost | Obtaining correctly labelled data representative of the Internet's traffic is very costly. ML training is fast, but running the model on live traffic is costly again.
Staleness | Data becomes stale after the connection ends (immediately).
Accuracy | According to recent research, accuracy is higher than 95% [1]
Needs Traffic Visibility | Yes

Flow Based Detection requires an expensive training phase. Neural network models only work reliably if the training data was collected from a wide diversity of servers and contains enough VPN traffic to cover the majority of VPN protocols in use.

Furthermore, after training, this detection method can only detect VPN usage in live traffic. There might also be false positives.

Method 4 - JavaScript based Detection Methods

There are many interesting things you can do with JavaScript. For example, if the browser timezone differs from the timezone of the client's IP address, it might be indicative of VPN usage. Another idea would be to scan the internal network with JavaScript for common VPN ports. WebRTC leaks might work for VPNs that are TCP based, but most VPN protocols use UDP, so this method is not applicable in all cases.
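
The timezone comparison could be evaluated on the server side roughly as follows. The sketch assumes the client reports the value of JavaScript's `new Date().getTimezoneOffset()` and that the UTC offset of the IP address comes from an IP geolocation lookup. The function name is hypothetical and both inputs are passed in directly here.

```python
def timezone_mismatch(browser_offset_minutes, ip_utc_offset_minutes):
    """Return True if the browser's UTC offset differs from the IP's UTC offset.

    getTimezoneOffset() returns UTC minus local time in minutes, so a browser
    in UTC+2 reports -120. The sign is flipped before comparing.
    """
    browser_utc_offset = -browser_offset_minutes
    return browser_utc_offset != ip_utc_offset_minutes

# Browser reports UTC+2, IP geolocates to a UTC-4 region
print(timezone_mismatch(-120, -240))
# Browser and IP both in UTC+2
print(timezone_mismatch(-120, 120))
```

A mismatch is only a hint. Travellers or misconfigured system clocks produce the same result, so this signal would have to be combined with others.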

A JavaScript library that attempts to locate the user is LocateJS. A VPN connection can be detected if the different methods of estimating the location lead to different results.

All JavaScript based detection methods suffer from the same problem: They only work on live traffic and do not scale well.

Overall Conclusion

Without having access to a sizable chunk of the Internet's traffic, the only VPN detection method that has high accuracy and leads to usable results is the VPN Exit Node Enumeration method. This method comes with the advantage that false positives are virtually impossible and that VPN exit node enumeration can be automated.