Published: April 9, 2024
Last Modified: May 8, 2024
VPN Detection
Exit Node Enumeration
NordVPN
ExpressVPN
Machine Learning
Systematic VPN Detection
This blog post will introduce various methods of systematically detecting VPN IP addresses from providers such as NordVPN or ExpressVPN.
Example Data
At the time of writing (8th May 2024), the following VPN IP addresses for NordVPN and IPVanish were obtained through Exit Node Enumeration:
Too long; Didn't read
VPN detection is the process of finding out whether an arbitrary IP address belongs to a VPN service and, if so, to which VPN provider it belongs (such as IPVanish or HideMyAss). It is important to understand that VPN detection cannot reveal the true IP address of the VPN user.
There are many good reasons why websites or apps want to detect users with a VPN enabled:
- VPN users bypass geo-restricted content by connecting to a VPN server in a different country
- VPN users committing cybercrime cannot be easily identified - the VPN is protecting the bad guys as well!
- A VPN service gives a single entity access to hundreds of different IP addresses from different countries, which makes it easy to bypass rate limits. Therefore, VPN usage is also associated with bot attacks and automated threats
Introduction
A VPN, short for Virtual Private Network, is a network application that routes the end user's Internet traffic over a server owned by the VPN service provider. End users mostly use VPNs for the following reasons:
- Security - Encrypting traffic at the network layer (as opposed to the application layer, as is the case with HTTPS / TLS)
- Anonymity - Since the user's traffic is routed through the VPN server, the user's real IP address is not exposed to the websites visited
- Changing IP Geolocation - Changing the geographic IP location of the user by connecting to a VPN server in a different country
These features can be very attractive to users, since they allow them to evade geographic restrictions and IP bans imposed by websites, or to remain anonymous when browsing the web.
For example, BBC iPlayer is only intended for users in the UK. Therefore, the BBC blocks users who are not located in the UK. By connecting to a VPN server located in the UK, it is possible to bypass this geo-restriction and use BBC iPlayer from anywhere in the world.
Since the BBC likely knows that some cunning VPN users bypass its geo-restriction, it has a justified reason to block VPN users from accessing its streaming service. This explains why there is a legitimate interest in knowing whether an Internet user is hiding behind a VPN server or not.
Another reason why some websites don't allow VPN traffic: streaming sites such as Netflix have different pricing schemes in different countries across the globe. Furthermore, the media content on Netflix is tailored to specific geographic regions. Therefore, companies such as Netflix have an incentive to prevent users from using a VPN to bypass their restrictions.
VPN Detection Model
The detection model defines the exact detection goal and the detection environment. It is important to define the VPN detection model accurately in order to later assess the effectiveness of the proposed detection methods.
For this blog article, the goal is to detect the largest commercial VPN providers, such as NordVPN, ExpressVPN, or IPVanish.
In this research project, it is not a goal to detect every kind of VPN connection. This would not be useful either, since many VPN setups are not used to bypass any kind of restriction.
For example, if an organization uses a VPN in its company network, it is not the goal of this blog article to detect VPN traffic originating from such a network, since such VPN setups serve a different purpose.
Furthermore, what exactly does detection mean?
For the sake of this research, detection refers to gaining the knowledge that a certain public IP
address
belonged to a VPN service at a certain point in time. It is important to understand that detected IP
addresses become stale after some time (maybe a week?) and that VPN detection is a process that needs to
be constantly repeated.
Last but not least, detection does not mean that the true client IP address can be revealed. This is generally not possible, except in rare cases where the true IP address leaks via WebRTC (JavaScript) or where users forget to enable the VPN.
VPN Detection Methods
There are many different VPN detection methods, each with its own advantages and disadvantages. Unfortunately, there is no perfect VPN detection method. VPN detection is a complicated and error-prone endeavour and requires significant skill to implement in a scalable fashion.
In order to compare the various VPN detection methods, each method needs to be evaluated against a common set of criteria. Hence, the following evaluation criteria are suggested:
Proposed Evaluation Criteria for VPN Detection Methods
Scalability
How well does the VPN detection method scale? Is it only useful for detecting a specific VPN provider? Or is the detection method provider-agnostic, so that it can be used to detect any kind of VPN service?
Cost
How expensive is this VPN detection method to implement? How much time is required to implement it?
Staleness
How often does the VPN detection method need to be repeated? How long does it take until the data
becomes
stale?
Accuracy
How many false positives are expected for this VPN detection method? The more false positives a detection method produces, the worse it is. False positives are extremely bad and need to be avoided at all costs. False negatives don't cause nearly as many problems as false positives, since it is always better to let a malicious user through than to block an innocent user.
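To make the Accuracy criterion measurable, a detection method could be scored against a labelled set of IP addresses. The following is a minimal sketch, assuming a ground-truth set of known VPN IPs and known non-VPN IPs is available; the function name and arguments are hypothetical, and detected IPs that appear in neither labelled set are simply ignored.

```python
from typing import Dict, Set


def evaluate(detected: Set[str], known_vpn: Set[str], known_non_vpn: Set[str]) -> Dict[str, float]:
    """Score a VPN detection method against labelled ground truth."""
    true_positives = len(detected & known_vpn)
    false_positives = len(detected & known_non_vpn)
    flagged = true_positives + false_positives
    return {
        # share of flagged (labelled) IPs that really are VPN exit nodes
        "precision": true_positives / flagged if flagged else 0.0,
        # share of known VPN IPs that the method found
        "recall": true_positives / len(known_vpn) if known_vpn else 0.0,
        # share of innocent IPs that were wrongly flagged (should be close to 0)
        "false_positive_rate": false_positives / len(known_non_vpn) if known_non_vpn else 0.0,
    }
```

Given that false positives are far more harmful than false negatives, `false_positive_rate` deserves more weight than `recall` when comparing methods.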
Requires Traffic Visibility
This criterion specifies whether the detection method is limited to live traffic under the detection method's control, or whether it can also detect VPN usage in traffic that it has no control over.
Simple example: A VPN detection method that requires JavaScript execution only works on live traffic.
Method 1 - VPN Exit Node Enumeration
The idea behind VPN Exit Node Enumeration is very simple: a paid plan is purchased from each VPN provider. Then, the VPN client application of each provider is installed, a connection to each region of each provider is made, and the VPN exit node IP address is stored. Most VPN providers allow their users to connect to all of their available regions, but some providers might throttle the number of reconnections per time span.
Put differently, this detection method works by enumerating all available regions, connecting to each region and storing the VPN IP address, using a service such as ipapi.is to look up the public IP address.
The hardest part of VPN exit node enumeration is automating the rotation of VPN regions. Most commercial providers such as NordVPN or ExpressVPN do not make it easy to rotate the region automatically. Nevertheless, automation is certainly possible for every provider; it just takes a bit longer.
The biggest advantage of this VPN detection method is the certainty that each obtained IP address actually belongs to the VPN provider. False positives are virtually impossible.
Furthermore, this detection method is the only one that reliably reveals which VPN provider an IP address belongs to, since it is known at all times which service is being used. Note, however, that some VPN providers might share exit nodes.
A downside of this detection method is the uncertainty about whether the VPN application gives access to all available regions. To be sure, one would need to create several paid subscriptions under different identities and from different regions, and verify whether different subscriptions lead to different sets of available VPN regions.
The following Python 3 script implements VPN Exit Node Enumeration for NordVPN and ExpressVPN. The logic of the script is very simple: first, a connection to a certain region is made, then the IP address is obtained, and then a connection to the next region is made. The script can be invoked periodically via a cronjob.
```python
"""
VPN Exit Node Enumeration for NordVPN and ExpressVPN.

Cronjob for NordVPN:
0 1,5,9,13,17,21 * * * /usr/bin/python3 /root/vpn/vpn_enumeration.py nordvpn
Cronjob for ExpressVPN:
0 0,4,8,12,16,20 * * * /usr/bin/python3 /root/vpn/vpn_enumeration.py expressvpn
"""
import json
import subprocess
import sys
import time

apiKey = ''  # get the API key from https://ipapi.is/

# nordvpn countries
nordvpnCountries = ["Albania","Germany","Netherlands","Algeria","Ghana","New_Zealand","Andorra","Greece","Nigeria","Argentina","Greenland","North_Macedonia","Armenia","Guam","Norway","Australia","Guatemala","Pakistan","Austria","Honduras","Panama","Azerbaijan","Hong_Kong","Papua_New_Guinea","Bahamas","Hungary","Paraguay","Bangladesh","Iceland","Peru","Belgium","India","Philippines","Belize","Indonesia","Poland","Bermuda","Ireland","Portugal","Bhutan","Isle_Of_Man","Puerto_Rico","Bolivia","Israel","Romania","Bosnia_And_Herzegovina","Italy","Serbia","Brazil","Jamaica","Singapore","Brunei_Darussalam","Japan","Slovakia","Bulgaria","Jersey","Slovenia","Cambodia","Kazakhstan","South_Africa","Canada","Kenya","South_Korea","Cayman_Islands","Lao_People'S_Democratic_Republic", "Spain", "Chile","Latvia","Sri_Lanka","Colombia","Lebanon","Sweden","Costa_Rica","Liechtenstein","Switzerland","Croatia","Lithuania","Taiwan","Cyprus","Luxembourg","Thailand","Czech_Republic","Malaysia","Trinidad_And_Tobago","Denmark","Malta","Turkey","Dominican_Republic","Mexico","Ukraine","Ecuador","Moldova","United_Arab_Emirates","Egypt","Monaco","United_Kingdom","El_Salvador","Mongolia","United_States","Estonia","Montenegro","Uruguay","Finland","Morocco","Uzbekistan","France","Myanmar","Venezuela","Georgia","Nepal","Vietnam"]
# expressvpn regions
expressvpnShort = ['ausy', 'aume', 'auwo', 'aubr', 'aupe', 'auad', 'defr1', 'denu', 'defr3', 'usny', 'uswd', 'usnj1', 'usnj3', 'usat', 'usmi', 'usmi2', 'usda', 'usch', 'uslp', 'usal', 'tr', 'nlam', 'nlro', 'nlth', 'ukdo', 'ukel', 'uklo', 'ukmi', 'ch', 'pl', 'at', 'cz', 'esma', 'frst', 'frma', 'frpa2', 'itco', 'itna', 'hr', 'dk', 'se2', 'se']
expressvpnRegions = ["Argentina","Bahamas","Bermuda","Bolivia","Brazil","Canada","Canada - Montreal","Canada - Toronto","Canada - Toronto - 2","Canada - Vancouver","Cayman Islands","Chile","Colombia","Costa Rica","Cuba","Dominican Republic","Ecuador","Guatemala","Honduras","Jamaica","Mexico","Panama","Peru","Puerto Rico","Trinidad and Tobago","United States","USA - Albuquerque","USA - Atlanta","USA - Chicago","USA - Dallas","USA - Dallas - 2","USA - Denver","USA - Lincoln Park","USA - Los Angeles - 1","USA - Los Angeles - 2","USA - Los Angeles - 3","USA - Los Angeles - 5","USA - Miami","USA - Miami - 2","USA - New Jersey - 1","USA - New Jersey - 2","USA - New Jersey - 3","USA - New York","USA - Phoenix","USA - Salt Lake City","USA - San Francisco","USA - Santa Monica","USA - Seattle","USA - Tampa - 1","USA - Washington DC","Uruguay","Venezuela","Albania","Andorra","Armenia","Austria","Belarus","Belgium","Bosnia & Herzegovina","Bulgaria","Croatia","Cyprus","Czech Republic","Denmark","Estonia","Finland","France","France - Alsace","France - Marseille","France - Paris - 1","France - Paris - 2","France - Strasbourg","Georgia","Germany","Germany - Frankfurt - 1","Germany - Frankfurt - 3","Greece","Hungary","Iceland","Ireland","Isle of Man","Italy","Italy - Milan","Jersey","Latvia","Liechtenstein","Lithuania","Luxembourg","Malta","Moldova","Monaco","Montenegro","Netherlands","Netherlands - Amsterdam","Netherlands - Rotterdam","Netherlands - The Hague","Norway","Poland","Portugal","Romania","Serbia","Slovakia","Slovenia","Spain","Spain - Barcelona","Spain - Barcelona - 2","Spain - Madrid","Sweden","Switzerland","Turkey","Ukraine","United Kingdom","UK - Docklands","UK - East London","UK - London","UK - Wembley","Australia","Australia - Adelaide","Australia - Brisbane","Australia - Melbourne","Australia - Perth","Australia - Sydney","Australia - Sydney - 2","Australia - Woolloomooloo","Bangladesh","Bhutan","Brunei Darussalam","Cambodia","Guam","Hong Kong","India","Indonesia","Japan","Japan - Shibuya","Japan - Tokyo","Japan - Tokyo - 2","Japan - Yokohama","Kazakhstan","Laos","Macau","Malaysia","Mongolia","Myanmar","Nepal","New Zealand","Pakistan","Philippines","Singapore","South Korea","Sri Lanka","Taiwan","Thailand","Uzbekistan","Vietnam","Algeria","Egypt","Ghana","Israel","Kenya","Lebanon","Morocco","South Africa"]

if len(sys.argv) < 2:
    print('Usage: /root/vpn/vpn_enumeration.py nordvpn|expressvpn [stopAfter]')
    sys.exit(1)

whichvpn = sys.argv[1]
if whichvpn not in ('nordvpn', 'expressvpn'):
    whichvpn = 'nordvpn'  # fall back to NordVPN for unknown arguments

sleepTime = 2                  # seconds to wait after connecting
errorTimeout = 61              # seconds to wait after an unexpected error
connectionFailureTimeout = 91  # seconds to wait after a failed connection
# optional second argument: stop after this many successfully harvested regions
stopAfter = int(sys.argv[2]) if len(sys.argv) > 2 else None

if whichvpn == 'nordvpn':
    allRegions = nordvpnCountries
    binary = '/usr/bin/nordvpn'
    fileName = '/root/vpn/data/nord.json'
    sleepTime = 7
else:
    allRegions = expressvpnRegions + expressvpnShort
    binary = '/usr/bin/expressvpn'
    fileName = '/root/vpn/data/express.json'


def log(msg):
    print(f'[{whichvpn}] - {time.asctime()} - {msg}')
    with open(f'/root/vpn/data/{whichvpn}.log', 'a') as logfile:
        logfile.write(msg + '\n')


def disconnect():
    subprocess.run([binary, 'disconnect'])


# Resume from previously harvested results: {region: [[ip, timestamp], ...]}
data = dict()
try:
    with open(fileName, 'r') as fd:
        data = json.load(fd)
    log(f'Loaded {len(data)} results from file {fileName}')
except Exception as err:
    print(err)
    log(f'Could not open {fileName}, starting from scratch...')

num = 0
for region in allRegions:
    try:
        # 1. Connect to the next region (list arguments avoid shell quoting issues)
        log(f'{binary} connect "{region}"')
        connectionAttempt = subprocess.check_output([binary, 'connect', region]).decode()
        if 'The VPN connection has failed.' in connectionAttempt:
            log('connection failed, waiting for a bit...')
            time.sleep(connectionFailureTimeout)
            continue
        time.sleep(sleepTime)

        # 2. Look up the public (exit node) IP address, with up to 5 attempts
        output = None
        for _ in range(5):
            try:
                output = subprocess.check_output(
                    ['curl', '--silent', f'https://api.ipapi.is?&key={apiKey}'])
                if output:
                    break
            except Exception as terr:
                print('timeout error', terr)
            time.sleep(4)
        if not output:
            log(f'no internet connection on region {region}')
            disconnect()
            continue

        # 3. Store the exit node IP address together with a timestamp
        ip = json.loads(output)['ip']
        log(f'Harvested IP {ip} for region {region}')
        data.setdefault(region, []).append([ip, time.time()])

        # 4. Disconnect and persist the results after every region
        disconnect()
        with open(fileName, 'w') as fd:
            json.dump(data, fd, indent=2, sort_keys=True)
        time.sleep(1)

        num += 1
        if stopAfter is not None and num >= stopAfter:
            break
    except Exception as err:
        log(str(err))
        disconnect()
        time.sleep(errorTimeout)
```
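Once the enumeration has run for a while, the harvested JSON files (each mapping a region to a list of `[ip, timestamp]` pairs) can be turned into a lookup table. The following is a minimal sketch under two assumptions: observations older than one week count as stale (echoing the "maybe a week?" estimate from the detection model, not a measured value), and the function names are hypothetical. Because every observation is recorded per provider, the index also shows when providers share an exit node.

```python
import json
import time
from typing import Dict, List, Optional, Set

# Assumption: exit node observations older than one week are treated as stale.
MAX_AGE_SECONDS = 7 * 24 * 3600


def load_harvest(path: str) -> Dict[str, List[list]]:
    """Read one JSON file written by the enumeration script."""
    with open(path) as fd:
        return json.load(fd)


def build_index(harvests: Dict[str, Dict[str, List[list]]],
                max_age: float = MAX_AGE_SECONDS,
                now: Optional[float] = None) -> Dict[str, Set[str]]:
    """Map every recently seen exit node IP to the providers it was seen for."""
    now = time.time() if now is None else now
    index: Dict[str, Set[str]] = {}
    for provider, regions in harvests.items():
        for observations in regions.values():
            for ip, seen_at in observations:
                if now - seen_at <= max_age:  # drop stale observations
                    index.setdefault(ip, set()).add(provider)
    return index
```

For example, `build_index({"nordvpn": load_harvest("/root/vpn/data/nord.json"), "expressvpn": load_harvest("/root/vpn/data/express.json")})` yields a dictionary in which a membership test such as `ip in index` answers whether an address was recently seen as an exit node, and `index[ip]` names the provider(s).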
Verdict for VPN Exit Node Enumeration
VPN Exit Node Enumeration is a simple and effective detection method. Against the evaluation criteria, it ranks as follows:
| Category | Verdict |
|---|---|
| Scalability | The method has to be repeated for each VPN provider and doesn't scale well. Furthermore, connecting and reconnecting to different regions takes some time and might be rate limited. |
| Cost | Subscription plans for commercial VPN providers are rather inexpensive. |
| Staleness | Data gathered this way becomes stale very quickly; therefore, Exit Node Enumeration is a continuous process. |
| Accuracy | VPN IP addresses gathered this way are very accurate. |
| Needs Traffic Visibility | No. This method does not require any real-world traffic to work. |
The VPN Exit Node Enumeration method is a slow and time-consuming way to detect VPN IP addresses, and it doesn't scale well. However, its detection accuracy and the fact that it detects VPN exit nodes without access to real-world traffic make it a very attractive VPN detection method.
Method 2 - Deep Packet Inspection
The idea behind this detection method is that VPN traffic fundamentally differs from non-VPN traffic in the network metadata found in IP or TCP/UDP headers.
For example, an article about "Proxy / VPN Detection" from ipleak.com claims that the MTU / MSS can reveal the use of certain VPN protocols:
When your device connects to the server directly, the MTU is set to a standard value (for example, 1500 for Ethernet or 1480 for PPTP). When connecting through a Proxy/VPN via protocols like PPTP, L2TP (± Ipsec), or IPsec IKE, original packets are placed inside other packets (encapsulated), resulting in increased packet size. To prevent excessive packet fragmentation and maintain good data transfer speed, the operating system lowers the MTU setting at the network interface (e. g., to 1400 for IPsec). [1]
The following Stack Overflow question also suggests that the MTU, and especially the MSS in TCP headers, leaks OpenVPN usage:
OpenVPN is a VPN software suite that is more specifically detected because it works in a different way than other VPN software. OpenVPN decreases MSS instead of interface MTU. By decreasing MSS like this, it is possible to further detect the encryption being used because a certain combination of protocol, block size (bs), MAC and compression will generate specific MSS sizes. [2]
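The relation between the two values is simple arithmetic: over IPv4, the MSS advertised in a SYN packet is normally the MTU minus 40 bytes (20-byte IPv4 header plus 20-byte TCP header), so an Ethernet MTU of 1500 yields an MSS of 1460; over IPv6 the overhead is 60 bytes. The following minimal sketch classifies an observed MSS value. The function names and the table of "ordinary" link MTUs are assumptions. It also shows why the signal is weak: DSL lines using PPPoE have an MTU of 1492, and therefore an MSS of 1452, without any VPN involved.

```python
IPV4_OVERHEAD = 20 + 20  # IPv4 header + TCP header without options
IPV6_OVERHEAD = 40 + 20  # IPv6 header + TCP header without options

# Assumption: MTUs of ordinary access links that do not imply a tunnel.
ORDINARY_MTUS = {1500: "Ethernet", 1492: "PPPoE"}


def mss_for_mtu(mtu: int, ipv6: bool = False) -> int:
    """MSS a host normally advertises for a given interface MTU."""
    return mtu - (IPV6_OVERHEAD if ipv6 else IPV4_OVERHEAD)


def classify_mss(mss: int, ipv6: bool = False) -> str:
    """Rough, non-authoritative interpretation of a SYN's MSS value."""
    for mtu, link in ORDINARY_MTUS.items():
        if mss == mss_for_mtu(mtu, ipv6):
            return f"ordinary {link} link (MTU {mtu})"
    implied_mtu = mss + (IPV6_OVERHEAD if ipv6 else IPV4_OVERHEAD)
    return f"implied MTU {implied_mtu}: reduced, possibly a tunnel"
```

An interface lowered to 1400 for IPsec, as in the quoted example, shows up as `classify_mss(1360)`, but any other link with a reduced MTU produces the same value.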
In order to find a reliable signal using deep packet inspection, one needs to conduct the following experiment:
- Generate some reproducible traffic and record it using tcpdump on a server
- Recreate the same traffic, but this time route the packets over a VPN, and record it using tcpdump on a server
- Create a diff of the two tcpdump captures. Remove all differences that are due to protocol randomness (for example, randomized sequence and acknowledgment numbers). If a difference remains that cannot be explained by normal IP/TCP/UDP protocol behavior, it might be a signal that discriminates VPN traffic from non-VPN traffic.
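The diff step of this experiment can be sketched in Python. This is a minimal sketch assuming both captures have already been exported to one dictionary of header fields per packet (for example with tshark; the export is not shown) and that the packets of the two runs line up one to one. The field names and the set of "random" fields to ignore are assumptions.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Assumption: fields that differ between two runs of the same traffic for
# reasons unrelated to the VPN (randomized or per-connection values).
RANDOM_FIELDS = frozenset({"seq", "ack", "ip_id", "src_port", "checksum", "tsval", "tsecr"})

Packet = Dict[str, Any]


def header_diff(baseline: List[Packet], vpn: List[Packet],
                ignore: Iterable[str] = RANDOM_FIELDS) -> List[Tuple[int, str, Any, Any]]:
    """Compare two captures packet by packet and return the remaining
    differences as (packet index, field, baseline value, VPN value)."""
    ignored = set(ignore)
    differences = []
    for i, (a, b) in enumerate(zip(baseline, vpn)):
        for field in sorted((a.keys() | b.keys()) - ignored):
            if a.get(field) != b.get(field):
                differences.append((i, field, a.get(field), b.get(field)))
    return differences
```

Any field that keeps showing up in the output across many repetitions of the experiment is a candidate signal; a single run proves little.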
Deep Packet Inspection might be an interesting method, but it has two obvious downsides:
- This method only works on live traffic
- If certain MSS values are indicative of OpenVPN usage, nothing prevents VPN services from changing those revealing MSS values to something else. In other words: detection via MSS/MTU is not reliable!
Verdict for Deep Packet Inspection Method
| Category | Verdict |
|---|---|
| Scalability | Deep packet inspection can usually be implemented quite efficiently. However, only VPNs in live traffic can be detected. |
| Cost | With hundreds of millions of connections, cost can become a factor. The true cost lies in obtaining visibility into a sizeable chunk of the Internet's traffic. |
| Staleness | Data becomes stale as soon as the connection ends. |
| Accuracy | Probably very low, since VPN servers can change and adapt the IP/UDP/TCP header values that are correlated with VPN protocols. |
| Needs Traffic Visibility | Yes |
Method 3 - Flow Based Detection using Machine Learning
In recent years, a whole line of research has emerged that focuses on network flow statistics to detect VPN traffic. The basic idea is to take flow statistics from TCP / UDP streams and train a neural network with flow data from VPN and non-VPN traffic, so that it can distinguish VPN traffic from non-VPN traffic.
A flow is essentially an ordered list of metadata about the packets of one connection arriving on a network interface, such as the one depicted below. The flow example illustrates, for each packet, its direction (+ or -), its relative arrival time starting at 0.0, the packet size, and the TCP flags of the packet.
```text
tcp 110.15.12.112:11414 -> 151.80.174.139:443
+ 0.0 64 SYN 57
- 0.07 60 SYN ACK 64
+ 19.12 52 ACK 57
+ 19.54 598 ACK PUSH 57
- 19.56 52 ACK 64
- 19.83 2852 ACK 64
- 19.85 1348 ACK PUSH 64
- 21.14 405 ACK PUSH 64
+ 39.87 52 ACK 57
+ 40.08 52 ACK 57
+ 41.04 145 ACK PUSH 57
+ 41.32 682 ACK PUSH 57
- 41.47 342 ACK PUSH 64
```
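To give an idea of what such statistics look like, the following minimal sketch parses the flow format shown above and computes a few simple per-flow features. It assumes whitespace-separated columns, ignores the final column (its meaning is not described above), and the feature names are hypothetical; published approaches use far richer feature sets.

```python
from statistics import mean
from typing import Dict, List, Optional


def parse_flow(lines: List[str]) -> List[dict]:
    """Parse lines such as '- 0.07 60 SYN ACK 64' into packet records."""
    packets = []
    for line in lines:
        parts = line.split()
        if len(parts) < 5 or parts[0] not in ("+", "-"):
            continue  # skip the 'tcp src -> dst' header and malformed lines
        packets.append({
            "dir": parts[0],          # + outgoing, - incoming
            "time": float(parts[1]),  # relative arrival time
            "size": int(parts[2]),    # packet size
            "flags": parts[3:-1],     # TCP flags; the last column is ignored
        })
    return packets


def flow_features(packets: List[dict]) -> Dict[str, Optional[float]]:
    """Derive a few simple statistical features from one flow."""
    times = [p["time"] for p in packets]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    syn = next((p["time"] for p in packets if p["flags"] == ["SYN"]), None)
    syn_ack = next((p["time"] for p in packets if p["flags"] == ["SYN", "ACK"]), None)
    return {
        "packets_out": sum(p["dir"] == "+" for p in packets),
        "packets_in": sum(p["dir"] == "-" for p in packets),
        "bytes_out": sum(p["size"] for p in packets if p["dir"] == "+"),
        "bytes_in": sum(p["size"] for p in packets if p["dir"] == "-"),
        "mean_gap": mean(gaps) if gaps else 0.0,
        "max_gap": max(gaps, default=0.0),
        # time between SYN and SYN ACK as seen at the capture point
        "handshake_time": syn_ack - syn if syn is not None and syn_ack is not None else None,
    }
```

Applied to the example flow, this yields 7 outgoing and 6 incoming packets, and a gap of 0.07 between SYN and SYN ACK.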
Recent publications that focus on network flow statistics to detect VPN traffic are:
What is the basic idea behind this line of research?
The first step is to extract statistical features from VPN
traffic and non-VPN traffic:
The next step is to collect a large corpus of such network flow captures from VPN and non-VPN traffic and to train a neural network with the labelled data. The resulting trained neural network can then detect VPN traffic with high accuracy and recall.
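The training step can be illustrated with a deliberately small stand-in. The sketch below is not a neural network: it swaps in plain logistic regression trained with stochastic gradient descent, which is enough to show how labelled feature vectors (label 1 = VPN flow, 0 = non-VPN flow) become a classifier. It assumes the per-flow features (for example, packet counts or inter-arrival times) have already been scaled to comparable ranges; function names and hyperparameters are arbitrary choices.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Estimated probability that feature vector x belongs to a VPN flow."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logistic(samples: List[List[float]], labels: List[int],
                   epochs: int = 500, lr: float = 0.1, seed: int = 0) -> Tuple[List[float], float]:
    """Fit logistic regression weights with plain stochastic gradient descent."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.01, 0.01) for _ in samples[0]]
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = predict_proba(w, b, x) - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * error * xi for wi, xi in zip(w, x)]
            b -= lr * error
    return w, b
```

After `w, b = train_logistic(X, y)`, a flow with feature vector `x` would be flagged when `predict_proba(w, b, x) > 0.5`. A real system would need the large, representative labelled corpus described above; the model choice matters less than the data.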
What is the essence of using time-related features from network flows to detect VPN connections?
A VPN server is an artificial hop between client and server. If client C wants to communicate with server S, the direct communication C <==> S will usually be faster than the same communication routed over a VPN server, C <==> VPN <==> S.
The presence of this artificial hop, over which network packets are routed, can be captured systematically by training a neural network with enough labelled flow data.
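As a rough latency decomposition (a simplification that ignores queuing and asymmetric routes), let V be the VPN server and d_V the processing delay it adds for decryption and re-encryption. The extra round-trip time caused by the detour is then:

$$
\Delta = \underbrace{\mathrm{RTT}(C, V) + \mathrm{RTT}(V, S) + d_V}_{\text{via VPN}} \; - \; \underbrace{\mathrm{RTT}(C, S)}_{\text{direct}}
$$

Since Internet routes do not always follow the shortest path, RTT(C, V) + RTT(V, S) is usually, but not always, larger than RTT(C, S). Δ is therefore a statistical signal for a trained model rather than a hard rule.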
Verdict for Flow Based Detection using Machine Learning
| Category | Verdict |
|---|---|
| Scalability | Only VPNs in live traffic can be detected. |
| Cost | Obtaining correctly labelled data that is representative of the Internet's traffic is very costly. ML training is fast, but running the model on live traffic is costly again. |
| Staleness | Data becomes stale as soon as the connection ends. |
| Accuracy | According to recent research, accuracy is higher than 95% [1] |
| Needs Traffic Visibility | Yes |
Flow Based Detection requires an expensive training phase. Neural network models only work reliably if the training data was collected from a wide diversity of servers and contains enough traffic from the majority of VPN protocols in use. Furthermore, after training, this detection method can only detect VPN usage in live traffic, and there might be false positives.
Method 4 - JavaScript based Detection Methods
There are many interesting things you can do with JavaScript. For example, if the browser's timezone differs from the timezone of the client's IP address, this might indicate VPN usage. Another idea would be to scan the internal network with JavaScript for common VPN ports. WebRTC leaks might work for VPNs that only tunnel TCP traffic, but most VPN protocols tunnel UDP traffic as well, so this method is not applicable in all cases.
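The timezone comparison can be sketched on the server side as follows. It assumes the page sends the value of JavaScript's `new Date().getTimezoneOffset()` to the server, and that the UTC offset (in minutes) of the IP address's timezone is known from an IP geolocation lookup (not shown). `getTimezoneOffset()` returns UTC minus local time, so a browser at UTC+2 reports -120. The function names are hypothetical. Comparing offsets rather than timezone names avoids flagging neighbours such as Berlin and Paris, which share an offset.

```python
def browser_utc_offset(get_timezone_offset: int) -> int:
    """Convert JavaScript's getTimezoneOffset() (UTC minus local time, in
    minutes) into the usual UTC offset in minutes (local time minus UTC)."""
    return -get_timezone_offset


def timezone_mismatch(get_timezone_offset: int, ip_utc_offset: int, tolerance: int = 0) -> bool:
    """True if the browser clock's offset differs from the offset expected
    for the IP address's geolocated timezone by more than `tolerance` minutes."""
    return abs(browser_utc_offset(get_timezone_offset) - ip_utc_offset) > tolerance
```

A mismatch is only a hint: travellers and misconfigured system clocks produce the same signal.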
LocateJS is a JavaScript library that attempts to locate the user. A VPN connection can be detected if the different methods it uses to estimate the location lead to different results.
JavaScript based Detection Methods all suffer from the same problem: they only work for live traffic and do not scale well.
Overall Conclusion
Without access to a sizeable chunk of the Internet's traffic, the only VPN detection method that achieves high accuracy and produces usable results is the VPN Exit Node Enumeration method. This method comes with the advantages that there are virtually no false positives and that VPN exit node enumeration can be automated.