Published: July 8, 2024
Last Modified: July 17, 2024

An Honest Comparison of ipapi.is to Its Competitors

This blog post systematically compares ipapi.is to its primary competitors, namely:

  • ipinfo.io - Probably the most serious company in the IP API niche
  • ipdata.co - Smaller, but has a strong focus on IT security
  • ip-api.com - Provides fewer data fields in the API output, but appears to be quite accurate at first sight

The comparison will be based on the following two main criteria:

  • Data Accuracy - How accurate is the API data?
  • Pricing - How much does the API cost?

The data accuracy is determined by comparing the API data from each API service provider to a manually determined ground truth. Because the ground truth was established by manual inspection, it may contain errors. All the data is published, so readers can make up their own mind whether the approach in this blog article was fair or not. The pricing data is obtained from the pricing sections of the API services' websites.

There are other possible comparison criteria such as API speed or API features, but those are less important. For example, the difference in speed between the services is not a big factor (usually in the range of single-digit milliseconds).

API features are also less important, as the main features are covered by all services. The main features of each API service are geolocation, organization data, ASN data, hosting detection, VPN detection, and proxy detection.

Data accuracy is by far the most important comparison criterion, as it determines the reliability of the data provided by the services. Data accuracy is so important because of the devastating consequences that come from false positives.

False positives in abuser detection, for example, can lead to legitimate users being blocked from a service. False negatives are less of a problem here (albeit still undesirable), as they only mean that an abuser is not blocked. The same applies to hosting detection, Tor detection, proxy detection and VPN detection. A low false positive rate matters more than a low false negative rate, but ideally both rates are low.

In other words, an API service with perfect data quality has 100% sensitivity and 100% specificity.
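
To make the two rates concrete, here is a minimal Python sketch that computes sensitivity and specificity from paired boolean labels. The function name and the sample labels are made up for illustration; they are not taken from the published data sets.

```python
def sensitivity_specificity(predictions, truths):
    """Return (sensitivity, specificity) for paired boolean labels."""
    pairs = list(zip(predictions, truths))
    tp = sum(1 for p, t in pairs if p and t)          # abuser flagged
    fn = sum(1 for p, t in pairs if not p and t)      # abuser missed
    tn = sum(1 for p, t in pairs if not p and not t)  # legitimate user passed
    fp = sum(1 for p, t in pairs if p and not t)      # legitimate user flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up example: 4 abusive IPs followed by 6 legitimate ones.
truths = [True] * 4 + [False] * 6
predictions = [True, True, True, False,
               False, False, False, False, False, True]

sens, spec = sensitivity_specificity(predictions, truths)
print(f"sensitivity={sens:.2%} specificity={spec:.2%}")
```

In this example one legitimate user is flagged (a false positive), which lowers specificity; one abuser is missed (a false negative), which lowers sensitivity.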

Methodology

ipapi.is obviously has an incentive to make its product look as good as possible in this blog article, so an effort is made to be open about how ipapi.is is compared to the other API services. For that reason, the comparison methodology is discussed in detail, and the intermediate datasets and results are published so that readers can replicate the results.

The comparison will be based on three different IP address data sets:

  • Generic Dataset - 100 randomly selected IP addresses from real-world traffic
  • Geolocation Dataset - 31 accurately geolocated IP addresses, originating from website visitors who allowed access to the JavaScript Geolocation API and from whom high-accuracy position data could be obtained
  • VPN Dataset - 50 IP addresses belonging to five different commercial VPN providers

Generic Dataset

This data set was created by collecting 100 pseudo-random IPv4 addresses from real-world traffic. Generating IP addresses in a truly random fashion is not a good idea, since the likelihood of picking uninteresting IP ranges is very high. After all, a substantial part of the IP address space consists of homogeneous blocks assigned to US government organizations. Furthermore, IP API services are usually queried with IPs from real-world traffic, so the IP samples should be picked from real-world traffic as well.

The data points listed in the table below are used in the comparison to determine the accuracy. Each data point refers to a field in the API output format. The equivalent field in each competitor's API response is used for the comparison. The API output format is not necessarily the same among the services, so the fields need to be mapped.

Data Point | Description | Relevance
is_datacenter | Whether the IP address belongs to a hosting provider or not. | Used to block spam traffic or other malicious traffic.
company.name | Whether the services correctly detect the company / organization name of the IP address. As ground truth, a WHOIS lookup for the IP address is used to derive the correct organization that has administrative ownership over it. | It is very important to correctly detect the company that owns an IP address / network, since a lot of other data fields depend on the correctness of this information.
company.type | The company.type field will also be compared. This field is used to determine whether the IP address belongs to one of the following types of organizations: hosting, education, government, banking, isp, or business. | The company type is important to correctly classify traffic.
asn.name | Determines whether the services correctly detect the ASN organization name to which the IP address belongs. Here again, the ground truth is obtained by making a WHOIS lookup for the IP address. | The ASN name is important to be correct for the same reasons as the company.name.
asn.type | Furthermore, the asn.type field will also be compared. It has the same types as the company.type field: hosting, education, government, banking, isp, or business. | The ASN type is very important to correctly classify traffic.
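
The field mapping mentioned above could be sketched as follows. This is only an illustration: the provider name `other_provider`, its response layout and the paths in `FIELD_MAP` are hypothetical and do not describe the real output format of any competitor.

```python
def get_path(record, dotted_path, default=None):
    """Follow a dotted path such as 'company.name' through nested dicts."""
    current = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Canonical data point -> path inside each provider's JSON response.
# The "other_provider" paths are invented for this example.
FIELD_MAP = {
    "ipapi.is": {
        "is_datacenter": "is_datacenter",
        "company.name": "company.name",
        "company.type": "company.type",
    },
    "other_provider": {
        "is_datacenter": "hosting",
        "company.name": "org.name",
        "company.type": "org.kind",
    },
}


def normalize(provider, response):
    """Translate a provider response into the canonical data points."""
    return {
        point: get_path(response, path)
        for point, path in FIELD_MAP[provider].items()
    }


# Hypothetical response from the invented provider.
sample = {"hosting": True, "org": {"name": "Example Hosting Ltd", "kind": "hosting"}}
normalized = normalize("other_provider", sample)
print(normalized)
```

Once every response is normalized this way, the same comparison code can be applied to all services regardless of how each one names its fields.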

Geolocation Dataset

Since it is impossible to know the geolocation ground truth for random IP addresses, the real geolocation from users visiting the ipapi.is website is collected by using the JavaScript Geolocation API. For obvious privacy reasons, the raw dataset obtained by this collection process cannot be fully published. In this case, transparency of the comparison methodology has to be sacrificed in favour of user privacy.

In order to determine the accuracy among the API services for the geolocation dataset, the following data fields will be used:

Data Point | Description | Relevance
location.latitude and location.longitude | The coordinates of the IP address will be compared to the coordinates of the real user's location. As discussed above, real geolocation data points from website visitors are used as the ground truth data set. | Geolocation accuracy is probably one of the most important data points that an IP API product can have.
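
One common way to turn two coordinate pairs into a deviation in kilometers is the haversine great-circle distance. The article does not state which distance formula was used, so the sketch below is an assumption; the function names are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def deviation_summary(pairs):
    """pairs: list of ((true_lat, true_lon), (api_lat, api_lon)).

    Returns (sum of deviations, average deviation), both in km.
    """
    distances = [haversine_km(*truth, *api) for truth, api in pairs]
    total = sum(distances)
    return total, total / len(distances)


# Made-up example: the true position is in Berlin, the API answers Munich.
berlin, munich = (52.52, 13.405), (48.1351, 11.582)
print(round(haversine_km(*berlin, *munich), 1), "km")
```

Summing these distances over all samples and dividing by the number of samples yields a sum deviation and an average deviation per service.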

VPN Dataset

Additionally, 10 different exit node IP addresses are collected from each of five different commercial VPN services such as NordVPN or ExpressVPN. This gives a sample of 50 IP addresses that belong to VPN providers.

The following API field is used to determine the accuracy for this data set:

Data Point | Description | Relevance
is_vpn | Whether the IP address belongs to a VPN service or not. The ground truth data set is obtained by finding 50 VPN exit nodes from 5 different commercial VPN providers. | Many cyber criminals use VPN IP addresses to hide their malicious activities. Therefore, this field is very important.

Data Sets

The data sets used for the comparison will be published in this section. The IP address samples that were picked for the three datasets are as follows:

Name | Description | IP Addresses
Generic Dataset | 100 randomly selected IP addresses from real traffic. The IPs originate from nginx access log files. Only IPs were taken where the User-Agent appears to be a legit Chrome browser. It is certainly possible that User-Agents are spoofed. But it doesn't hurt to have some malicious IPs in the data set, because it's real traffic as well. | genericIPs.txt
VPN Dataset | 50 randomly selected IP exit nodes from 5 commercial VPN providers | vpn-dataset.json
Geolocation Dataset | 31 geolocated IP addresses from ipapi.is website visitors using the JavaScript Geolocation API. The accuracy of the coordinates is reduced in order to protect the user's privacy. | geoloc-data-reduced.json
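
The selection of the generic IP addresses from nginx access logs could look roughly like the sketch below. It assumes nginx's default "combined" log format and uses a simple substring heuristic for Chrome User-Agents; the actual script and filter used for this article are not published, so the function names and the heuristic are assumptions.

```python
import random
import re

# nginx "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \d+ "[^"]*" "(?P<ua>[^"]*)"'
)


def looks_like_chrome(user_agent):
    """Heuristic: has a Chrome token, but no Edge or Opera token."""
    return ("Chrome/" in user_agent
            and "Edg/" not in user_agent
            and "OPR/" not in user_agent)


def sample_ips(log_lines, k=100, seed=0):
    """Pick up to k distinct IPv4 addresses that sent a Chrome User-Agent."""
    ips = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue
        ip, ua = match["ip"], match["ua"]
        if ":" in ip:  # skip IPv6; the data set contains IPv4 only
            continue
        if looks_like_chrome(ua):
            ips.add(ip)
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return rng.sample(sorted(ips), min(k, len(ips)))
```

A call such as `sample_ips(open("access.log"), k=100)` would then return the sample; the file name is a placeholder. As noted in the table, a spoofed User-Agent passes such a filter just like a real browser does.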

Generic Data Set

The raw data and ground truth data sets are as follows:

Service | Description | API Lookups
ipapi.is | The API results for the generic IP addresses for the service ipapi.is | ipapi-is-generic.json
ipinfo.io | The API results for the generic IP addresses for the service ipinfo.io | ipinfo-io-generic.json
ipdata.co | The API results for the generic IP addresses for the service ipdata.co | ipdata-co-generic.json
ip-api.com | The API results for the generic IP addresses for the service ip-api.com | ip-api-com-generic.json
Ground Truth | This is the ground truth data set for the generic IP addresses. This data set was created by manually inspecting IP addresses and determining the correct data for each relevant API output field. The ground truth data set allows multiple different values where there is more than one correct answer. For example, both "WIND TRE S.P.A." and "WIND Telecomunicazioni S.p.A" are correct answers for the company.name field for the IP 151.77.220.15. | ground-truth.json

Geolocation Data Set

When querying the geolocated addresses from the API services, the following data sets were obtained:

Service | Description | API Lookups
ipapi.is | The API results querying the geolocated IP addresses on ipapi.is | ipapi-is-location.json
ipinfo.io | The API results querying the geolocated IP addresses on ipinfo.io | ipinfo-io-location.json
ipdata.co | The API results querying the geolocated IP addresses on ipdata.co | ipdata-co-location.json
ip-api.com | The API results querying the geolocated IP addresses on ip-api.com | ip-api-com-location.json

VPN Data Set

When querying the VPN addresses from the API services, the following data sets were obtained:

Service | Description | API Lookups
ipapi.is | The API results querying the VPN IP addresses on ipapi.is | ipapi-is-vpn.json
ipinfo.io | The API results querying the VPN IP addresses on ipinfo.io | ipinfo-io-vpn.json
ipdata.co | The API results querying the VPN IP addresses on ipdata.co | ipdata-co-vpn-manual.json
ip-api.com | The API results querying the VPN IP addresses on ip-api.com | ip-api-com-vpn.json

Accuracy Results

The accuracy results for the three data sets are presented in this section. They are calculated by comparing the API results to the ground truth data sets and are shown in the tables below. All of the data was queried on 8th July 2024 and 9th July 2024.

Generic Data Set

Accuracy Rank | Service | Ratio | Accuracy
1st | ipinfo.io | 676 / 700 | 96.57%
2nd | ipapi.is | 671 / 700 | 95.86%
3rd | ip-api.com | 259 / 300 | 86.33%
4th | ipdata.co | 492 / 700 | 70.29%

The data accuracy results are quite clear. ipinfo.io is the most accurate service, followed closely by ipapi.is. ip-api.com is the third most accurate service, and ipdata.co is the least accurate service.
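
The percentages in the table follow directly from the ratios, as this short Python check shows. The counts are copied from the table; the variable and function names are made up for illustration. The denominators presumably reflect the number of compared fields per IP (700 would match 100 IPs times seven fields, 300 would match three fields), but the article does not state this explicitly.

```python
# (correct answers, total comparisons) per service, copied from the table.
generic_results = {
    "ipinfo.io": (676, 700),
    "ipapi.is": (671, 700),
    "ip-api.com": (259, 300),
    "ipdata.co": (492, 700),
}


def accuracy_percent(correct, total):
    """Share of correct answers as a percentage with two decimals."""
    return round(100 * correct / total, 2)


# Sort services from most to least accurate.
ranking = sorted(
    generic_results,
    key=lambda service: accuracy_percent(*generic_results[service]),
    reverse=True,
)
for service in ranking:
    print(service, accuracy_percent(*generic_results[service]))
```

Note that ip-api.com is scored on fewer comparisons than the other services, so its percentage is not based on the same set of fields.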

VPN Data Set

Accuracy Rank | Service | Ratio | Accuracy
1st | ipinfo.io | 48 / 50 | 96.00%
1st | ip-api.com | 48 / 50 | 96.00%
3rd | ipapi.is | 38 / 50 | 76.00%
4th | ipdata.co | 34 / 50 | 68.00%

ipinfo.io is the most accurate service with regard to VPN detection. ip-api.com has the same accuracy, but cannot tell which VPN service the IP address belongs to; it simply returns a boolean. ipapi.is follows as the third most accurate service, and ipdata.co is the least accurate service.

Geolocation Data Set

Accuracy Rank | Service | Sum Deviation (km) | Average Deviation (km)
1st | ipinfo.io | 4567.66 | 147.34
2nd | ip-api.com | 5618.66 | 181.25
3rd | ipdata.co | 5620.74 | 181.31
4th | ipapi.is | 6971.71 | 224.89

The data accuracy results are quite clear. ipinfo.io is the most accurate service, achieving the lowest average deviation in kilometers. ip-api.com follows as the second most accurate service, with ipdata.co closely behind. ipapi.is is the least accurate service among those compared.

However, it is important to understand that the average deviation in kilometers is quite high for all API services. This means that IP geolocation in general is not a very accurate method for determining the real location of an IP address.

Discussion

Generic Dataset

The sample size was N=100, which is quite small. The reason why such a small sample size was chosen: The ground truth needs to be determined manually, which is a very time consuming process.

Because all the raw datasets are published above, you may re-compute the accuracy for yourself. The ground truth dataset for the generic IP addresses, which determines what counts as a correct answer, is quite lenient. For example, for the IP 141.53.67.241, the ground truth dataset allows several variations for asn.org, asn.type and company.name:

"141.53.67.241": {
  "is_tor": false,
  "is_vpn": false,
  "is_datacenter": false,
  "asn.org": [
    "Verein zur Foerderung eines Deutschen Forschungsnetzes e.V.",
    "Verein ZUR Foerderung Eines Deutschen Forschungsnetzes E.V"
  ],
  "asn.type": [
    "education",
    "business"
  ],
  "company.name": [
    "Universitaet Greifswald",
    "Ernst-Moritz-Arndt-Universitaet Greifswald"
  ],
  "company.type": "education"
}
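
A lenient comparison against such a ground truth entry could be sketched as follows. The article does not publish the scoring script, so this is an assumption: the helper names are made up, the API record is assumed to be already flattened to the same dotted field names, and values are compared by exact string match. The `api_record` is invented for illustration.

```python
# Ground truth entry copied from the excerpt above.
GROUND_TRUTH = {
    "141.53.67.241": {
        "is_tor": False,
        "is_vpn": False,
        "is_datacenter": False,
        "asn.org": [
            "Verein zur Foerderung eines Deutschen Forschungsnetzes e.V.",
            "Verein ZUR Foerderung Eines Deutschen Forschungsnetzes E.V",
        ],
        "asn.type": ["education", "business"],
        "company.name": [
            "Universitaet Greifswald",
            "Ernst-Moritz-Arndt-Universitaet Greifswald",
        ],
        "company.type": "education",
    }
}


def is_correct(api_value, accepted):
    """A list in the ground truth means any of its values is accepted."""
    if isinstance(accepted, list):
        return api_value in accepted
    return api_value == accepted


def score(api_record, truth_record):
    """Return (correct fields, compared fields) for one IP address."""
    correct = sum(
        1 for field, accepted in truth_record.items()
        if is_correct(api_record.get(field), accepted)
    )
    return correct, len(truth_record)


# Invented API answer: everything matches except the ASN organization.
api_record = {
    "is_tor": False, "is_vpn": False, "is_datacenter": False,
    "asn.org": "DFN", "asn.type": "education",
    "company.name": "Universitaet Greifswald", "company.type": "education",
}
print(score(api_record, GROUND_TRUTH["141.53.67.241"]))
```

Summing these per-IP scores over all addresses gives a ratio of correct answers to compared fields for a service.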

VPN Dataset

The sample size was N=50, which is quite small. This needs to be improved in the future. The reason why the sample size was so small: Each VPN IP address was found by using the respective VPN client application and switching the regions. This process takes time.

Regarding the VPN dataset, it must be noted that ipinfo.io also provides the name of the detected VPN provider in its API output. This is considerably more useful than, for example, the plain boolean output of ip-api.com.

Geolocation Dataset

The sample size was N=31, which is very small. The geolocation data was collected by prompting website visitors with the JavaScript Geolocation API. The data needed to be cleaned manually, since malicious data is occasionally submitted.

Conclusion

There are many conclusions that can be drawn from this comparison. The most important one is that the data clearly shows that ipinfo.io is the best service in terms of data quality, especially regarding VPN detection and geolocation accuracy.

However, on the generic dataset, ipinfo.io is only marginally more accurate than ipapi.is (96.57% vs. 95.86% accuracy), while its price per API lookup is more than 100 times higher than that of ipapi.is.

The following table compares the pricing of the business subscriptions of ipapi.is and ipinfo.io:

Service | Price | API Lookups | API Lookups per $1
ipapi.is | $200.00 / month | 2M API requests per day | 300,000 lookups
ipinfo.io | $416.00 / month | 500k requests per month | 1,202 lookups
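
The "API Lookups per $1" column can be reproduced with the short calculation below. It assumes a 30-day month to convert the daily ipapi.is quota into a monthly one; that assumption and the names used are not from the pricing pages.

```python
DAYS_PER_MONTH = 30  # assumption used to convert a daily quota


def lookups_per_dollar(price_per_month, lookups_per_month):
    """Number of API lookups one dollar buys at the given plan."""
    return lookups_per_month / price_per_month


ipapi_is = lookups_per_dollar(200.00, 2_000_000 * DAYS_PER_MONTH)
ipinfo_io = lookups_per_dollar(416.00, 500_000)

print(f"ipapi.is:  {ipapi_is:,.0f} lookups per $1")
print(f"ipinfo.io: {ipinfo_io:,.0f} lookups per $1")
print(f"ratio:     {ipapi_is / ipinfo_io:.1f}x")
```

With these inputs the ratio comes out at roughly 250, which is consistent with the statement that ipinfo.io is more than 100 times more expensive per lookup.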

ipapi.is will improve its VPN detection quality and geolocation accuracy. However, it must be noted that the IP geolocation accuracy achievable in practice is generally not very high, even with the best provider, ipinfo.io.

Outlook

This experiment will be repeated in the near future with a larger sample size. Furthermore, there will be a dedicated dataset that focuses solely on hosting detection accuracy. This is important, because hosting detection plays a big role in IT security decisions such as traffic filtering and bot detection.