Published:
July 8, 2024
Last Modified:
July 17, 2024
Service Comparison
Data Quality
ipapi.is
ipinfo.io
ipdata.co
ip-api.com
An honest Comparison of ipapi.is to its Competitors
This blog post systematically compares ipapi.is to its primary competitors, namely:
- ipinfo.io - Probably the most serious company in the IP API niche
- ipdata.co - Smaller, but has a strong focus on IT security
- ip-api.com - Provides fewer data fields in the API output, but appears to be quite accurate at first sight
The comparison will be based on the following two main criteria:
- Data Accuracy - How accurate is the API data?
- Pricing - How much does the API cost?
Data accuracy is determined by comparing the API data from each service provider to a manually determined ground truth. The ground truth is established by manual inspection, so it may contain errors. Because all the data is published, readers can decide for themselves whether the approach in this article was fair. The pricing data is taken from the pricing section of each API service's website.
There are other possible comparison criteria, such as API speed or API features, but those are less important. For example, the difference in speed between the services is not a big factor (usually in the range of single-digit milliseconds). API features are also less important, as the main features are covered by all services: geolocation, organization data, ASN data, hosting detection, VPN detection, and proxy detection.
Data accuracy is by far the most important comparison criterion, as it determines the reliability of the data provided by the services. It matters so much because of the severe consequences of false positives. A false positive in abuser detection, for example, can block legitimate users from accessing a service. False negatives are less of a problem in this case (albeit still undesirable), as they only mean that an abuser is not blocked. The same applies to hosting detection, TOR detection, proxy detection and VPN detection. A low false positive rate matters more than a low false negative rate, but of course it is best to keep both rates low. In other words, an API service with perfect data quality has 100% sensitivity and 100% specificity.
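The two rates mentioned above can be made concrete with a short sketch. The counts used here are hypothetical and only illustrate the definitions; they are not taken from any dataset in this article.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of actual positives that were flagged (true positive rate)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of actual negatives that were left alone (true negative rate)."""
    return tn / (tn + fp)


# Hypothetical counts for an abuser-detection field, for illustration only.
tp, fn, tn, fp = 45, 5, 940, 10

print(f"sensitivity:         {sensitivity(tp, fn):.3f}")
print(f"specificity:         {specificity(tn, fp):.3f}")
# The false positive rate is the complement of specificity.
print(f"false positive rate: {1 - specificity(tn, fp):.3f}")
```

A service that never blocks a legitimate user has a specificity of 1.0, and one that never misses an abuser has a sensitivity of 1.0.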
Methodology
ipapi.is obviously has an incentive to make its product look as good as possible in this article, but an effort is made to be open about how ipapi.is is compared to the other API services. For that reason, the comparison methodology is discussed in detail and the intermediate datasets and results are published, so that readers can replicate the results.
The comparison will be based on three different IP address data sets:
- Generic Dataset - 100 randomly selected IP addresses from real-world traffic
- Geolocation Dataset - 31 accurately geolocated IP addresses, originating from website visitors who allowed access to the JavaScript Geolocation API and from whom high-accuracy position data could be obtained
- VPN Dataset - 50 IP addresses belonging to five different commercial VPN providers
Generic Dataset
This data set was created by collecting 100 pseudo-random IPv4 addresses from real-world traffic. Generating IP addresses in a truly random fashion is not a good idea, since the likelihood of picking uninteresting IP ranges is very high. After all, a substantial part of the IP address space is assigned to homogeneous US governmental address space. Furthermore, IP API services are usually queried with IPs from real-world traffic, so the IP samples should be picked from real-world traffic as well.
The data points listed in the table below are used in the comparison to determine accuracy. Each data point refers to a field in the API output format. The equivalent field in each competitor's API response is used for the comparison. The API output format is not necessarily the same among the services, so the fields need to be mapped.
Data Point | Description | Relevance
is_datacenter | Whether the IP address belongs to a hosting provider or not. | Used to block spam traffic or other malicious traffic.
company.name | Whether the services correctly detect the company / organization name of the IP address. As ground truth, a WHOIS lookup for the IP address is used to derive the correct organization that has administrative ownership over it. | It is very important to correctly detect the company that owns an IP address / network, since a lot of other data fields depend on the correctness of this information.
company.type | The company.type field will also be compared. This field is used to determine whether the IP address belongs to one of the following types of organizations: hosting, education, government, banking, isp, or business. | The company type is important to correctly classify traffic.
asn.name | Determines whether the services correctly detect the ASN organization name to which the IP address belongs. Here again, the ground truth is obtained by making a WHOIS lookup for the IP address. | The ASN name is important to be correct for the same reasons as the company.name.
asn.type | Furthermore, the asn.type field will also be compared. It has the same types as the company.type field: hosting, education, government, banking, isp, or business. | The ASN type is very important to correctly classify traffic.
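The field mapping mentioned above could be sketched as follows. This is a minimal illustration, not the script used for this article: the `other-service` entry and its field paths are hypothetical, and real paths would have to be taken from each provider's API documentation.

```python
from typing import Any

# Data point -> dotted path in each service's JSON response.
# The ipapi.is paths follow the data points in the table above.
# "other-service" and its paths are hypothetical placeholders.
FIELD_MAP = {
    "ipapi.is": {
        "is_datacenter": "is_datacenter",
        "company.name": "company.name",
        "asn.type": "asn.type",
    },
    "other-service": {
        "is_datacenter": "privacy.hosting",
        "company.name": "org.name",
        "asn.type": "network.kind",
    },
}


def get_path(response: dict, path: str) -> Any:
    """Follow a dotted path such as 'company.name' through nested dicts."""
    node: Any = response
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None  # the service does not provide this field
        node = node[key]
    return node


def normalize(service: str, response: dict) -> dict:
    """Translate one raw API response into the common data points."""
    return {
        point: get_path(response, path)
        for point, path in FIELD_MAP[service].items()
    }
```

Once every response is normalized this way, the same comparison code can be run against all services, and missing fields show up as `None`.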
Geolocation Dataset
Since it is impossible to know the geolocation ground truth for random IP addresses, the real geolocation of users visiting the ipapi.is website is collected using the JavaScript Geolocation API. For obvious privacy reasons, the raw dataset obtained by this collection process cannot be fully published. In this case, transparency of the comparison methodology has to be sacrificed in favour of user privacy.
In order to determine the accuracy of the API services on the geolocation dataset, the following data fields are used:
Data Point | Description | Relevance
location.latitude and location.longitude | The coordinates of the IP address are compared to the coordinates of the real user's location. As discussed above, real geolocation data points from website visitors serve as the ground truth data set. | Geolocation accuracy is probably one of the most important data points that an IP API product can have.
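One common way to turn two coordinate pairs into a deviation in kilometers is the haversine formula. The sketch below assumes that formula and a spherical Earth; the article does not state which distance function was actually used, and the function names are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against tiny floating point overshoots.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def deviation_stats(pairs):
    """Return (sum, average) deviation in km.

    pairs: list of ((true_lat, true_lon), (api_lat, api_lon)) tuples.
    """
    distances = [haversine_km(*truth, *api) for truth, api in pairs]
    total = sum(distances)
    return total, total / len(distances)
```

Summing the per-IP distances and dividing by the number of IPs gives the kind of sum and average deviation figures that a geolocation comparison reports.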
VPN Dataset
Additionally, 10 different exit node IP addresses from each of five different commercial VPN services, such as NordVPN or ExpressVPN, are collected. This gives a sample of 50 IP addresses that belong to VPN providers.
The following API field is used to determine the accuracy for this data set:
Data Point | Description | Relevance
is_vpn | Whether the IP address belongs to a VPN service or not. The ground truth data set is obtained by finding 50 VPN exit nodes from 5 different commercial VPN providers. | Many cyber criminals use VPN IP addresses to hide their malicious activities. Therefore, this field is very important.
Data Sets
The data sets used for the comparison are published in this section. The IP address samples that were picked for the three datasets are as follows:
Name | Description | IP Addresses
Generic Dataset | 100 randomly selected IP addresses from real traffic. The IPs originate from nginx access log files. Only IPs whose User-Agent appears to be a legitimate Chrome browser were taken. It is certainly possible that User-Agents are spoofed, but it doesn't hurt to have some malicious IPs in the data set, because that is real traffic as well. | genericIPs.txt
VPN Dataset | 50 randomly selected IP exit nodes from 5 commercial VPN providers | vpn-dataset.json
Geolocation Dataset | 31 geolocated IP addresses from ipapi.is website visitors using the JavaScript Geolocation API. The accuracy of the coordinates is reduced in order to protect the user's privacy. | geoloc-data-reduced.json
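The selection step described for the generic dataset can be sketched as follows. This is a minimal sketch, assuming the logs use nginx's default "combined" format; the function names, the Chrome heuristic and the fixed seed are illustrative assumptions, not the script used for this article.

```python
import random
import re

# Default nginx "combined" layout (an assumption about the log format):
# ip - user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)
IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def looks_like_chrome(user_agent: str) -> bool:
    """Rough heuristic: Chrome token present, some Chromium forks excluded."""
    return "Chrome/" in user_agent and not any(
        token in user_agent for token in ("Edg/", "OPR/")
    )


def sample_chrome_ips(lines, k=100, seed=0):
    """Collect unique IPv4 addresses with a Chrome-like User-Agent, then sample k."""
    ips = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and IPV4.match(match["ip"]) and looks_like_chrome(match["ua"]):
            ips.add(match["ip"])
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return rng.sample(sorted(ips), min(k, len(ips)))
```

Deduplicating before sampling means that a single very active client cannot dominate the sample.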
Generic Data Set
The raw data and ground truth data sets are as follows:
Service | Description | API Lookups
ipapi.is | The API results for the generic IP addresses for the service ipapi.is | ipapi-is-generic.json
ipinfo.io | The API results for the generic IP addresses for the service ipinfo.io | ipinfo-io-generic.json
ipdata.co | The API results for the generic IP addresses for the service ipdata.co | ipdata-co-generic.json
ip-api.com | The API results for the generic IP addresses for the service ip-api.com | ip-api-com-generic.json
Ground Truth | This is the ground truth data set for the generic IP addresses. This data set was created by manually inspecting IP addresses and determining the correct data for each relevant API output field. The ground truth data set allows multiple different values where there is more than one correct answer. For example, both "WIND TRE S.P.A." and "WIND Telecomunicazioni S.p.A" are correct answers for the company.name field for the IP 151.77.220.15. | ground-truth.json
Geolocation Data Set
When querying the geolocated addresses from the API services, the following data sets were obtained:
VPN Data Set
When querying the VPN addresses from the API services, the following data sets were obtained:
Accuracy Results
The accuracy results are presented in this section. They are calculated by comparing the API results to the ground truth data sets and are shown in the tables below. All of the data was queried on 8 July 2024 and 9 July 2024.
Generic Data Set
Accuracy Rank | Service | Ratio | Accuracy
1st | ipinfo.io | 676 / 700 | 96.57%
2nd | ipapi.is | 671 / 700 | 95.86%
3rd | ip-api.com | 259 / 300 | 86.33%
4th | ipdata.co | 492 / 700 | 70.29%
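The results on the generic data set are quite clear. ipinfo.io is the most accurate service, followed closely by ipapi.is. ip-api.com is the third most accurate service, and ipdata.co is the least accurate. Note that ip-api.com was scored on 300 rather than 700 data points (see the Ratio column), which is consistent with its smaller set of output fields.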
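The accuracy column is simply the ratio of correct to total data points, expressed as a percentage. A short sketch that recomputes it from the ratios reported in the table, with the denominators taken as-is:

```python
# (correct, total) pairs copied from the Ratio column of the table.
results = {
    "ipinfo.io": (676, 700),
    "ipapi.is": (671, 700),
    "ip-api.com": (259, 300),
    "ipdata.co": (492, 700),
}


def accuracy_percent(correct: int, total: int) -> float:
    """Accuracy as a percentage, rounded to two decimals."""
    return round(100 * correct / total, 2)


# Rank the services from most to least accurate.
ranking = sorted(
    results, key=lambda name: accuracy_percent(*results[name]), reverse=True
)
for rank, name in enumerate(ranking, start=1):
    correct, total = results[name]
    print(f"{rank}. {name}: {correct} / {total} = {accuracy_percent(correct, total):.2f}%")
```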
VPN Data Set
Accuracy Rank | Service | Ratio | Accuracy
1st | ipinfo.io | 48 / 50 | 96.00%
1st | ip-api.com | 48 / 50 | 96.00%
3rd | ipapi.is | 38 / 50 | 76.00%
4th | ipdata.co | 34 / 50 | 68.00%
ipinfo.io and ip-api.com are tied as the most accurate services with regard to VPN detection. However, ip-api.com cannot tell which VPN service the IP address belongs to; it simply returns a boolean. ipapi.is follows as the third most accurate service, and ipdata.co is the least accurate.
Geolocation Data Set
Accuracy Rank | Service | Sum Deviation (km) | Average Deviation (km)
1st | ipinfo.io | 4567.66 | 147.34
2nd | ip-api.com | 5618.66 | 181.25
3rd | ipdata.co | 5620.74 | 181.31
4th | ipapi.is | 6971.71 | 224.89
The results are quite clear. ipinfo.io is the most accurate service, achieving the lowest average deviation in kilometers. ip-api.com follows as the second most accurate service, with ipdata.co closely behind. ipapi.is is the least accurate of the services compared.
However, it is important to understand that the average deviation is quite high for all API services, ranging from roughly 147 km to 225 km. This means that IP geolocation in general is not a very accurate method for determining the real location of an IP address.
Discussion
Generic Dataset
The sample size was N=100, which is quite small. The reason for such a small sample size is that the ground truth needs to be determined manually, which is a very time-consuming process.
Because all the raw datasets are published above, you may re-compute the accuracy yourself. The ground truth dataset for the generic IP addresses, which determines what counts as a correct answer, is quite lenient. For example, for the IP 141.53.67.241, the ground truth dataset allows several variations for asn.org, asn.type and company.name:
"141.53.67.241": {
  "is_tor": false,
  "is_vpn": false,
  "is_datacenter": false,
  "asn.org": [
    "Verein zur Foerderung eines Deutschen Forschungsnetzes e.V.",
    "Verein ZUR Foerderung Eines Deutschen Forschungsnetzes E.V"
  ],
  "asn.type": [
    "education",
    "business"
  ],
  "company.name": [
    "Universitaet Greifswald",
    "Ernst-Moritz-Arndt-Universitaet Greifswald"
  ],
  "company.type": "education"
}
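Scoring against such a lenient ground truth entry could look like the following sketch. It assumes exact string matching against the accepted values, which the article does not confirm, and the function names are illustrative.

```python
# A subset of the ground truth entry shown above.
ground_truth_entry = {
    "is_datacenter": False,
    "asn.type": ["education", "business"],
    "company.name": [
        "Universitaet Greifswald",
        "Ernst-Moritz-Arndt-Universitaet Greifswald",
    ],
    "company.type": "education",
}


def is_correct(answer, accepted) -> bool:
    """A ground truth value is either one value or a list of accepted values."""
    if isinstance(accepted, list):
        return answer in accepted
    return answer == accepted


def score(api_fields: dict, truth: dict) -> tuple:
    """Return (correct, total) over all fields present in the ground truth."""
    correct = sum(
        is_correct(api_fields.get(field), accepted)
        for field, accepted in truth.items()
    )
    return correct, len(truth)
```

Summing the `(correct, total)` pairs over all IPs yields ratios of the same form as the Ratio column in the accuracy tables.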
VPN Dataset
The sample size was N=50, which is quite small and needs to be improved in the future. The sample is this small because each VPN IP address was found by using the respective VPN client application and switching regions, which takes time.
Regarding the VPN dataset, it must be noted that ipinfo.io also provides the name of the detected VPN provider in its API output. This is considerably more powerful than, for example, the boolean output of ip-api.com.
Geolocation Dataset
The sample size was N=31, which is very small. The geolocation data was collected by prompting website visitors with the JavaScript Geolocation API. The data needed to be cleaned manually, because some sensors occasionally submit malicious data.
Conclusion
Several conclusions can be drawn from this comparison. The most important one is that the data clearly shows ipinfo.io to be the best service in terms of data quality, especially regarding VPN detection and geolocation accuracy.
However, on the generic dataset, ipinfo.io is only marginally better in terms of data accuracy than ipapi.is (96.57% vs 95.86% accuracy), while the per-lookup price of ipinfo.io is more than 100 times higher than that of ipapi.is.
The following table compares the pricing of the business subscriptions of ipapi.is and ipinfo.io:
Service | Price | API Lookups | API Lookups per $1
ipapi.is | $200.00 / month | 2M API requests per day | 300,000 lookups
ipinfo.io | $416.00 / month | 500k requests per month | 1,202 lookups
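The lookups-per-dollar column can be reproduced with a few lines of arithmetic. The sketch assumes a 30-day month when converting the daily ipapi.is quota into a monthly one; the prices and quotas are taken from the table.

```python
DAYS_PER_MONTH = 30  # assumption used to convert a daily quota to a monthly one

plans = {
    "ipapi.is": {"price_usd": 200.0, "lookups_per_month": 2_000_000 * DAYS_PER_MONTH},
    "ipinfo.io": {"price_usd": 416.0, "lookups_per_month": 500_000},
}


def lookups_per_dollar(plan: dict) -> float:
    """Monthly lookup quota divided by the monthly price in USD."""
    return plan["lookups_per_month"] / plan["price_usd"]


for name, plan in plans.items():
    print(f"{name}: {lookups_per_dollar(plan):,.0f} lookups per $1")

# How many times more lookups per dollar ipapi.is offers than ipinfo.io.
ratio = lookups_per_dollar(plans["ipapi.is"]) / lookups_per_dollar(plans["ipinfo.io"])
print(f"ratio: {ratio:.1f}x")
```

Under the 30-day assumption the ratio comes out at just under 250, which is in line with the "more than 100 times" statement.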
ipapi.is will make improvements regarding VPN detection quality and geolocation accuracy. However, it must be noted that the IP geolocation accuracy that can be achieved in practice is not very high in general, even with the best provider, ipinfo.io.
Outlook
This experiment will be repeated in the near future with a larger sample size. Furthermore, there will be a dedicated dataset that focuses solely on hosting detection accuracy. This is important, because hosting detection plays a big role in making IT security decisions such as traffic filtering and bot detection.