An Honest Comparison of ipapi.is to its Competitors
This blog post systematically compares ipapi.is to its primary competitors, namely:
- ipinfo.io - Probably the most serious company in the IP API niche
- ipdata.co - Smaller, but has a strong focus on IT security
- ip-api.com - Provides fewer data fields in the API output, but appears to be quite accurate at first sight
The comparison will be based on the following two main criteria:
- Data Accuracy - How accurate is the API data?
- Pricing - How much does the API cost?
Data accuracy is determined by comparing the data returned by each API service to a manually determined ground truth. The ground truth was established by manual inspection, so it may contain errors. Because all the data is published, readers can decide for themselves whether the approach in this blog article was fair. The pricing data is taken from the pricing section of each API service's website.
There are other possible comparison criteria such as API speed or API features, but those are less important. For example, the difference in speed between the services is not a big factor (usually in the range of single-digit milliseconds).
API features are also less important, as the main features are covered by all services. The main features of each API service are geolocation, organization data, ASN data, hosting detection, VPN detection, and proxy detection.
Data accuracy is by far the most important comparison criterion, as it determines the reliability of the data provided by the services. Data accuracy is so important because of the devastating consequences that come from false positives.
False positives, for example in abuser detection, can lead to blocking legitimate users from accessing a service. False negatives are less of a problem in this case (albeit still undesirable), as they only mean that an abuser is not blocked. The same applies to hosting detection, TOR detection, proxy detection and VPN detection. It is better to have a low false positive rate than a low false negative rate, but ideally both rates are kept low.
In other words, an API service with perfect data quality has 100% sensitivity and 100% specificity.
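To make the two terms concrete, here is a minimal sketch of how sensitivity and specificity could be computed for a boolean field such as is_vpn or is_datacenter. The function name and the sample labels are made up for illustration and are not taken from the published datasets.

```python
def sensitivity_specificity(predictions, truths):
    """Return (sensitivity, specificity) for paired boolean labels."""
    pairs = list(zip(predictions, truths))
    tp = sum(1 for p, t in pairs if p and t)          # correctly flagged
    tn = sum(1 for p, t in pairs if not p and not t)  # correctly not flagged
    fp = sum(1 for p, t in pairs if p and not t)      # legitimate user flagged
    fn = sum(1 for p, t in pairs if not p and t)      # abuser missed
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up labels for illustration only
truths = [True, True, True, False, False, False, False, False]
predictions = [True, True, False, False, False, False, False, True]
sens, spec = sensitivity_specificity(predictions, truths)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

A low specificity corresponds to many false positives, which is the costly case described above.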
Methodology
ipapi.is obviously has an incentive to make its product look as good as possible in this blog article, but an effort is made to be open about how ipapi.is is compared to the other API services. For that reason, the comparison methodology is discussed in detail and the intermediate datasets and results are published, so that the reader can replicate the results.
The comparison will be based on three different IP address data sets:
- Generic Dataset - 100 randomly selected IP Addresses from Real World Traffic
- Geolocation Dataset - 31 accurately geolocated IP Addresses, originating from website visitors that allowed access to the JavaScript Geolocation API and from which high accuracy position data could be obtained
- VPN Dataset - 50 IP addresses belonging to five different commercial VPN providers
Generic Dataset
This data set was created by collecting 100 pseudo-random IPv4 addresses from real world traffic. It is not a good idea to generate IP addresses in a truly random fashion, since the likelihood of picking uninteresting IP ranges is very high. After all, a substantial part of the IP address space is assigned to homogeneous US governmental address space. Furthermore, IP API services are usually queried with IPs from real world traffic, so the IP samples should be picked from real world traffic as well.
The data points listed in the table below are considered in the comparison to determine the accuracy. Each data point references a field in the ipapi.is API output format. The equivalent field in the competitor's API response is used for the comparison. The API output format is not necessarily the same among the services, so the fields need to be mapped.
Data Point | Description | Relevance |
---|---|---|
is_datacenter | Whether the IP address belongs to a hosting provider or not. | Used to block spam traffic or other malicious traffic. |
company.name | Whether the services correctly detect the company / organization name of the IP address. As ground truth, a WHOIS lookup for the IP address is used to derive the correct organization that has administrative ownership over it. | It is very important to correctly detect the company that owns an IP address / network, since a lot of other data fields depend on the correctness of this information. |
company.type | The company.type field will also be compared. This field is used to determine whether the IP address belongs to one of the following types of organizations: hosting, education, government, banking, isp, or business | The company type is important to correctly classify traffic. |
asn.name | Determines whether the services correctly detect the ASN organization name to which the IP address belongs. Here again, the ground truth is obtained by making a WHOIS lookup for the IP address. | The ASN name is important to be correct for the same reasons as the company.name |
asn.type | Furthermore, the asn.type field will also be compared. It has the same types as the company.type field: hosting, education, government, banking, isp, or business | The ASN type is very important to correctly classify traffic. |
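The field mapping mentioned before the table could be sketched as follows. The response layout and the paths in `EXAMPLE_FIELD_MAP` are hypothetical placeholders, not the real field names of any of the compared services; only the left-hand keys come from the table above.

```python
def get_path(record, dotted_path, default=None):
    """Walk a nested dict along a dotted path such as 'company.name'."""
    current = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def normalize(record, field_map):
    """Project a provider response onto the common field names."""
    return {field: get_path(record, path) for field, path in field_map.items()}


# Hypothetical mapping: common field name -> path in a made-up provider response
EXAMPLE_FIELD_MAP = {
    "is_datacenter": "flags.hosting",
    "company.name": "org.label",
    "asn.name": "network.as_name",
    "asn.type": "network.as_type",
}

# Made-up response used only to exercise the mapping
example_response = {
    "flags": {"hosting": True},
    "org": {"label": "Example Hosting Ltd"},
    "network": {"as_name": "EXAMPLE-AS"},
}

normalized = normalize(example_response, EXAMPLE_FIELD_MAP)
print(normalized)
```

Fields that a provider does not return come back as `None`, which makes it easy to see which data points a service cannot be scored on.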
Geolocation Dataset
Since it is impossible to know the geolocation ground truth for random IP addresses, the real geolocation from users visiting the ipapi.is website is collected by using the JavaScript Geolocation API. For obvious privacy reasons, the raw dataset obtained by this collection process cannot be fully published. In this case, transparency of the comparison methodology has to be sacrificed in favour of user privacy.
In order to determine the accuracy among the API services for the geolocation dataset, the following data fields will be used:
Data Point | Description | Relevance |
---|---|---|
location.latitude and location.longitude | The coordinates of the IP address will be compared to the coordinates of the real user's location. As discussed above, real geolocation data points from website visitors are used as the ground truth data set. | Geolocation accuracy is probably one of the most important data points that an IP API product can have. |
VPN Dataset
Additionally, 10 different exit node IP addresses from each of five different commercial VPN services, such as NordVPN or ExpressVPN, will be collected. This gives a sample of 50 IP addresses that belong to VPN providers.
The following API field is used to determine the accuracy for this data set:
Data Point | Description | Relevance |
---|---|---|
is_vpn | Whether the IP address belongs to a VPN service or not. The ground truth data set is obtained by finding 50 VPN exit nodes from 5 different commercial VPN providers. | Many cyber criminals use VPN IP addresses to hide their malicious activities. Therefore, this field is very important. |
Data Sets
The data sets used for the comparison will be published in this section. The IP address samples that were picked for the three datasets are as follows:
Name | Description | IP Addresses |
---|---|---|
Generic Dataset | 100 randomly selected IP addresses from real traffic. The IPs originate from nginx access log files. Only IPs were taken where the User-Agent appears to be a legit Chrome browser. It is certainly possible that User-Agents are spoofed. But it doesn't hurt to have some malicious IPs in the data set, because it's real traffic as well. | genericIPs.txt |
VPN Dataset | 50 randomly selected IP exit nodes from 5 commercial VPN providers | vpn-dataset.json |
Geolocation Dataset | 31 geolocated IP addresses from ipapi.is website visitors using the JavaScript Geolocation API. The accuracy of the coordinates is reduced in order to protect the user's privacy. | geoloc-data-reduced.json |
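As a rough illustration of how the generic sample could be drawn from nginx access logs, the sketch below filters log lines by a Chrome-like User-Agent and then samples unique IPv4 addresses. It assumes nginx's default "combined" log format; the regular expression, the User-Agent heuristic and the function names are assumptions made for this example, not the exact procedure used for genericIPs.txt.

```python
import random
import re

# Matches nginx's default "combined" log format (assumption; adjust for a custom log_format)
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def looks_like_chrome(user_agent):
    """Crude heuristic: a Chrome UA that is not Edge, Opera or headless Chrome."""
    excluded = ("Edg/", "OPR/", "HeadlessChrome")
    return "Chrome/" in user_agent and not any(tag in user_agent for tag in excluded)


def sample_ips(log_lines, sample_size, seed=None):
    """Collect unique IPv4 addresses with a Chrome-like UA and sample from them."""
    ips = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue
        ip, user_agent = match["ip"], match["ua"]
        if ":" in ip:  # skip IPv6, the dataset only contains IPv4 addresses
            continue
        if looks_like_chrome(user_agent):
            ips.add(ip)
    rng = random.Random(seed)
    return rng.sample(sorted(ips), min(sample_size, len(ips)))
```

For example, `sample_ips(open("access.log"), 100)` would return up to 100 addresses, where `access.log` is a placeholder path. As noted in the table, a User-Agent can be spoofed, so this filter is only a heuristic.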
Generic Data Set
The raw data and ground truth data sets are as follows:
Service | Description | API Lookups |
---|---|---|
ipapi.is | The API results for the generic IP addresses for the service ipapi.is | ipapi-is-generic.json |
ipinfo.io | The API results for the generic IP addresses for the service ipinfo.io | ipinfo-io-generic.json |
ipdata.co | The API results for the generic IP addresses for the service ipdata.co | ipdata-co-generic.json |
ip-api.com | The API results for the generic IP addresses for the service ip-api.com | ip-api-com-generic.json |
Ground Truth | This is the ground truth data set for the generic IP addresses. This data set was created by manually inspecting IP addresses and determining the correct data for each relevant API output field. The ground truth data set allows multiple different values where there is more than one correct answer. For example, both "WIND TRE S.P.A." and "WIND Telecomunicazioni S.p.A" are correct answers for the company.name field for the IP 151.77.220.15. | ground-truth.json |
Geolocation Data Set
When querying the geolocated addresses from the API services, the following data sets were obtained:
Service | Description | API Lookups |
---|---|---|
ipapi.is | The API results querying the geolocated IP addresses on ipapi.is | ipapi-is-location.json |
ipinfo.io | The API results querying the geolocated IP addresses on ipinfo.io | ipinfo-io-location.json |
ipdata.co | The API results querying the geolocated IP addresses on ipdata.co | ipdata-co-location.json |
ip-api.com | The API results querying the geolocated IP addresses on ip-api.com | ip-api-com-location.json |
VPN Data Set
When querying the VPN addresses from the API services, the following data sets were obtained:
Service | Description | API Lookups |
---|---|---|
ipapi.is | The API results querying the VPN IP addresses on ipapi.is | ipapi-is-vpn.json |
ipinfo.io | The API results querying the VPN IP addresses on ipinfo.io | ipinfo-io-vpn.json |
ipdata.co | The API results querying the VPN IP addresses on ipdata.co | ipdata-co-vpn-manual.json |
ip-api.com | The API results querying the VPN IP addresses on ip-api.com | ip-api-com-vpn.json |
Accuracy Results
The accuracy results for the three data sets are presented in the tables in this section. They are calculated by comparing the API results to the ground truth data sets. All of the data was queried on 8th July 2024 and 9th July 2024.
Generic Data Set
Accuracy Rank | Service | Ratio | Accuracy |
---|---|---|---|
1st | ipinfo.io | 676 / 700 | 96.57% |
2nd | ipapi.is | 671 / 700 | 95.86% |
3rd | ip-api.com | 259 / 300 | 86.33% |
4th | ipdata.co | 492 / 700 | 70.29% |
The data accuracy results are quite clear. ipinfo.io is the most accurate service, followed closely by ipapi.is. ip-api.com is the third most accurate service, and ipdata.co is the least accurate service.
VPN Data Set
Accuracy Rank | Service | Ratio | Accuracy |
---|---|---|---|
1st | ipinfo.io | 48 / 50 | 96.00% |
1st | ip-api.com | 48 / 50 | 96.00% |
3rd | ipapi.is | 38 / 50 | 76.00% |
4th | ipdata.co | 34 / 50 | 68.00% |
ipinfo.io is the most accurate service with regard to VPN detection. ip-api.com has the same accuracy, but cannot tell which VPN service the IP address belongs to; it simply returns a boolean. ipapi.is follows as the third most accurate service, and ipdata.co is the least accurate service.
Geolocation Data Set
Accuracy Rank | Service | Sum Deviation (km) | Average Deviation (km) |
---|---|---|---|
1st | ipinfo.io | 4567.66 | 147.34 |
2nd | ip-api.com | 5618.66 | 181.25 |
3rd | ipdata.co | 5620.74 | 181.31 |
4th | ipapi.is | 6971.71 | 224.89 |
The data accuracy results are quite clear. ipinfo.io is the most accurate service, achieving the lowest average deviation in kilometers. ip-api.com follows as the second most accurate service, with ipdata.co closely behind. ipapi.is is the least accurate service among those compared.
However, it is important to understand that the average deviation in kilometers is quite high for all API services. This means that IP geolocation in general is not the most accurate method to determine the real location of an IP address.
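The deviation figures in the table could be computed along the following lines. This is a sketch that assumes the great-circle (haversine) distance between the ground truth coordinates and the coordinates returned by an API; the article does not state which distance formula was actually used, and the function names are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def deviation_stats(pairs):
    """pairs: list of ((true_lat, true_lon), (api_lat, api_lon)).

    Returns (sum of deviations, average deviation), both in kilometers.
    """
    distances = [haversine_km(*truth, *reported) for truth, reported in pairs]
    total = sum(distances)
    return total, total / len(distances)
```

With N=31, dividing the summed deviation by 31 reproduces the averages in the table, for example 4567.66 / 31 ≈ 147.34 for ipinfo.io.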
Discussion
Generic Dataset
The sample size was N=100, which is quite small. Such a small sample size was chosen because the ground truth needs to be determined manually, which is a very time-consuming process.
Because all the raw datasets are published above, you may re-compute the accuracy for yourself. The ground truth dataset for the generic IP addresses, which determines what is the correct answer, is quite lenient. For example, for the IP 141.53.67.241, the ground truth dataset allows many variations for the asn.org, asn.type and company.name:
    "141.53.67.241": {
        "is_tor": false,
        "is_vpn": false,
        "is_datacenter": false,
        "asn.org": [
            "Verein zur Foerderung eines Deutschen Forschungsnetzes e.V.",
            "Verein ZUR Foerderung Eines Deutschen Forschungsnetzes E.V"
        ],
        "asn.type": [
            "education",
            "business"
        ],
        "company.name": [
            "Universitaet Greifswald",
            "Ernst-Moritz-Arndt-Universitaet Greifswald"
        ],
        "company.type": "education"
    }
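For readers who want to re-compute the accuracy, a lenient comparison against such a ground truth entry might look like the sketch below. It assumes that the API results have already been flattened to the same dotted field names, that string comparison is case-insensitive, and that a missing field counts as wrong; these are assumptions of this example, not a description of the exact scoring script. The API values in the example are made up.

```python
def field_matches(api_value, truth_value):
    """True if api_value equals the truth or is one of the accepted variants."""
    accepted = truth_value if isinstance(truth_value, list) else [truth_value]
    if isinstance(api_value, str):
        # Assumption: compare strings case-insensitively and ignore outer whitespace
        api_value = api_value.strip().lower()
        accepted = [v.strip().lower() if isinstance(v, str) else v for v in accepted]
    return api_value in accepted


def score(api_results, ground_truth, fields):
    """Return (correct, total) over all IPs and the given fields."""
    correct = total = 0
    for ip, truth in ground_truth.items():
        result = api_results.get(ip, {})
        for field in fields:
            if field not in truth:
                continue
            total += 1
            # Assumption: a field missing from the API result counts as wrong
            if field in result and field_matches(result[field], truth[field]):
                correct += 1
    return correct, total


ground_truth = {
    "141.53.67.241": {
        "is_vpn": False,
        "asn.type": ["education", "business"],
        "company.type": "education",
    }
}
# Made-up API result for illustration only
api_results = {
    "141.53.67.241": {"is_vpn": False, "asn.type": "Business", "company.type": "isp"}
}
correct, total = score(api_results, ground_truth, ["is_vpn", "asn.type", "company.type"])
print(f"{correct} / {total} = {correct / total:.2%}")
```

The ratio and percentage printed at the end have the same shape as the Ratio and Accuracy columns in the results tables.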
VPN Dataset
The sample size was N=50, which is quite small. This needs to be improved in the future. The sample size was so small because each VPN IP address was found by using the respective VPN client application and switching regions, which takes time.
Regarding the VPN dataset, it must be noted that ipinfo.io also provides the name of the detected VPN provider in its API output. This is considerably more powerful than, for example, the boolean output of ip-api.com.
Geolocation Dataset
The sample size was N=31, which is very small. The geolocation data was collected by prompting website visitors with the JavaScript Geolocation API. The data needed to be cleaned manually, since some sensors occasionally submit malicious data.
Conclusion
There are many conclusions that can be drawn from this comparison. The most important one is that, according to the data, ipinfo.io is the best service in terms of data quality, especially regarding VPN detection and geolocation accuracy.
However, on the generic dataset, ipinfo.io is only marginally better in terms of data accuracy than ipapi.is (96.57% vs 95.86% accuracy), while the price per lookup of ipinfo.io is more than 100 times higher than that of ipapi.is (roughly 250 times, going by the lookups per $1 in the table below).
The following table compares the pricing of the business subscriptions of ipapi.is and ipinfo.io:
Service | Price | API Lookups | API Lookups per $1 |
---|---|---|---|
ipapi.is | $200.00 / month | 2M API requests per day | 300,000 lookups |
ipinfo.io | $416.00 / month | 500k requests per month | 1,202 lookups |
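The last column of the table can be reproduced with a few lines of arithmetic. The sketch below assumes a 30-day month to convert the daily ipapi.is quota into a monthly one; the function name is illustrative.

```python
DAYS_PER_MONTH = 30  # assumption: converts a daily quota into a monthly one


def lookups_per_dollar(price_per_month, lookups_per_month):
    """Number of API lookups that one dollar buys."""
    return lookups_per_month / price_per_month


ipapi_is = lookups_per_dollar(200.00, 2_000_000 * DAYS_PER_MONTH)
ipinfo_io = lookups_per_dollar(416.00, 500_000)
ratio = ipapi_is / ipinfo_io

print(f"ipapi.is:  {ipapi_is:,.0f} lookups per $1")
print(f"ipinfo.io: {ipinfo_io:,.0f} lookups per $1")
print(f"ratio:     {ratio:.1f}x")
```

Under this assumption, one dollar buys about 250 times more lookups at ipapi.is than at ipinfo.io.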
ipapi.is will make improvements regarding the VPN detection quality and the geolocation accuracy. However, it must be noted that the IP geolocation accuracy that can be achieved in practice is not very high in general, even with the best provider, ipinfo.io.
Outlook
This experiment will be repeated in the near future with a larger sample size. Furthermore, there will be a dedicated dataset that focuses solely on hosting detection accuracy. This is important, because hosting detection plays a big role in making IT security decisions such as traffic filtering and bot detection.