Published: August 31, 2024
Last Modified: September 1, 2024
IP to Company Accuracy
This blog post investigates the accuracy of the company / organization data provided by ipapi.is, compared to other popular IP APIs such as ipinfo.io and ipdata.co.
Company data is one of the most important data points in the API, since many other API fields are derived from it and many data pipelines depend on its accuracy. It therefore makes sense to periodically investigate the accuracy of company data and compare it to other IP APIs.
But why is company data accuracy considered so important?
For example, in order to correctly classify whether an IP address belongs to a hosting provider, the API needs to know the correct and up-to-date company name and domain. The same applies to VPN and TOR detection. Correct company data is also partially used in IP geolocation, since company names often provide hints about the location of an organization.
An example would be:
Introduction
ipapi.is provides company data in the company field of its API response. For example, when looking up the IP address 104.27.153.120, the API response includes the company field with the following output:
"company": {
"name": "Cloudflare, Inc.",
"abuser_score": "0.0047 (Low)",
"domain": "cloudflare.com",
"type": "hosting",
"network": "104.16.0.0 - 104.31.255.255",
"whois": "https://api.ipapi.is/?whois=104.16.0.0"
}
As the example above shows, ipapi.is provides the company name, abuser score, domain, type, network, and a WHOIS link for the IP address. But what do the various data points mean?
- company.name: The name of the organization that owns the IP address, based on WHOIS data.
- company.abuser_score: A score that indicates the risk of the IP address being abusive. Learn more about the abuser score here.
- company.domain: The domain name of the organization that owns the IP address, based on WHOIS data.
- company.type: The type of the organization. This can be hosting, isp, education, government, banking, or the generic business type.
- company.network: The network that the IP address belongs to and that was allocated / assigned to the organization.
- company.whois: A link to the raw WHOIS information for the IP address.
When we speak of IP to Company data accuracy, the above data points are considered. Learn more about the
company object in the documentation.
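As an illustration, the following minimal Python sketch reads the company object from the example response above into a typed structure. It assumes the response has already been fetched and decoded. The names Company, parse_company and to_cidrs are hypothetical, and the parsing of abuser_score and network is based only on the formats visible in the single example above, not on a documented specification.

```python
import ipaddress
import json
from dataclasses import dataclass

# The company object from the 104.27.153.120 example above, wrapped in
# braces so that it forms a complete JSON document.
EXAMPLE_RESPONSE = """
{
  "company": {
    "name": "Cloudflare, Inc.",
    "abuser_score": "0.0047 (Low)",
    "domain": "cloudflare.com",
    "type": "hosting",
    "network": "104.16.0.0 - 104.31.255.255",
    "whois": "https://api.ipapi.is/?whois=104.16.0.0"
  }
}
"""


@dataclass
class Company:
    name: str
    abuser_score: float  # numeric part of e.g. "0.0047 (Low)"
    abuser_label: str    # label in parentheses, e.g. "Low"
    domain: str
    type: str            # hosting, isp, education, government, banking or business
    network: list[str]   # network range converted to CIDR blocks
    whois: str           # URL of the raw WHOIS record


def to_cidrs(network: str) -> list[str]:
    # Assumption: the network is either "first - last" (as in the example)
    # or already written in CIDR notation.
    if " - " in network:
        first, last = (ipaddress.ip_address(p) for p in network.split(" - "))
        return [str(n) for n in ipaddress.summarize_address_range(first, last)]
    return [str(ipaddress.ip_network(network, strict=False))]


def parse_company(response: dict) -> Company:
    c = response["company"]
    # Assumption: abuser_score always looks like "<number> (<label>)".
    score, _, label = c["abuser_score"].partition(" (")
    return Company(
        name=c["name"],
        abuser_score=float(score),
        abuser_label=label.rstrip(")"),
        domain=c["domain"],
        type=c["type"],
        network=to_cidrs(c["network"]),
        whois=c["whois"],
    )


company = parse_company(json.loads(EXAMPLE_RESPONSE))
print(company.name, company.type, company.network)
# Cloudflare, Inc. hosting ['104.16.0.0/12']
```

Converting the range to CIDR blocks is just one convenient representation; the range 104.16.0.0 - 104.31.255.255 happens to be exactly one /12 block.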
Methodology
To compare the accuracy of the company data provided by ipapi.is, we use the following method:
- Select a random sample of IP addresses from real-world traffic. It is crucial to use real-world traffic and not random IP addresses, since large chunks of the Internet are actually unassigned or assigned to placeholder organizations. Selecting at random would inevitably lead to uninteresting IP addresses in terms of company data. The sample size is 300 IP addresses.
- For each IP address, retrieve the company data via API lookup from ipapi.is, ipinfo.io and ipdata.co.
- Determine the ground truth company data by manually investigating each of the 300 IP addresses in the sample. It is perfectly possible for an IP address to have several correct variations of its company name or type.
- Compare the company data from each API to the ground truth data and determine the accuracy of each API.
The accuracy of the company data is determined by the following formula:
Accuracy = (True Positives + True Negatives) / (True Positives + True Negatives + False Positives + False Negatives)
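The formula can be written as a small Python function. The post does not spell out how the four outcome classes map onto the per-IP comparison; the sketch below assumes (hypothetically) that an answer matching the ground truth counts as a true positive and any other answer as a false positive. Under that assumption, accuracy reduces to the number of correct answers divided by the sample size.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("accuracy is undefined for an empty sample")
    return (tp + tn) / total


def accuracy_from_counts(correct: int, sample_size: int) -> float:
    # Assumed mapping: correct answers are true positives, everything else
    # is a false positive; there are no negatives in this setting.
    return accuracy(tp=correct, tn=0, fp=sample_size - correct, fn=0)


print(f"{accuracy_from_counts(291, 300):.2%}")  # 97.00%
```

With the counts reported in the results tables, this reproduces the published percentages, for example 286 / 300 gives 95.33%.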
All datasets are published in full, so that anyone can reproduce and verify the results.
Since the ground truth company data is based on manual investigation, the results can only be as accurate as that investigation. This is why publishing the ground truth data set is crucial.
Only the company.name and company.type fields are considered when calculating the accuracy, because the other fields are too noisy and because these two fields are by far the most important ones.
IP Address Sample
The IP address sample was created by selecting 300 unique IP addresses from real-world traffic. The addresses are spread over a wide range of countries and ASNs. The IP addresses that constitute the sample can be downloaded as a text file here.
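A downloaded copy of the sample can be sanity-checked with a short Python sketch. It assumes the text file contains one IP address per line; the post does not describe the file layout, and both load_ip_sample and the file name in the usage comment are hypothetical.

```python
import ipaddress
from typing import Iterable


def load_ip_sample(lines: Iterable[str]) -> list[str]:
    """Parse one IP address per line, skipping blank lines.

    Raises ValueError for malformed or duplicate addresses, since the
    sample is described as consisting of unique addresses.
    """
    seen: set[str] = set()
    sample: list[str] = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        ip = str(ipaddress.ip_address(text))  # ValueError if malformed
        if ip in seen:
            raise ValueError(f"duplicate address: {ip}")
        seen.add(ip)
        sample.append(ip)
    return sample


print(load_ip_sample(["104.27.153.120\n", "\n", "185.108.7.79\n"]))
# ['104.27.153.120', '185.108.7.79']

# Hypothetical usage with the downloaded file ("ip_sample.txt" is a placeholder):
# with open("ip_sample.txt", encoding="utf-8") as f:
#     ips = load_ip_sample(f)
# print(len(ips))  # expected to be 300 for the published sample
```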
API Responses for the IP Address Sample
The API responses for all 300 IP addresses in the sample can be downloaded here. The data was collected on August 31st, 2024.
Ground Truth Data
The ground truth data set is published as a JSON file that can be downloaded here. For each of the 300 IP addresses in the sample, it contains the company name and type that are considered correct. It was created by manually investigating each of these IP addresses.
The structure of the ground truth data is as follows:
"102.141.172.197": {
"name": [
"level 7 wireless (pty) ltd",
"level-7-internet",
"level 7 internet"
],
"type": [
"isp"
]
}
Each value is an array; if it contains more than one entry, there are multiple correct variations of the company name or type for that IP address. For the example above, "level 7 wireless (pty) ltd", "level-7-internet" and "level 7 internet" are all correct variations of the company name for the IP address 102.141.172.197.
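The comparison against the ground truth can be sketched as follows. The post does not state exactly how API values are matched against the accepted variations; this Python sketch assumes a case-insensitive exact match after trimming whitespace, which the lowercase ground truth entries suggest but do not confirm. It also assumes, hypothetically, that a missing value counts as incorrect. GROUND_TRUTH holds only the example entry above, and is_correct is a hypothetical helper.

```python
import json
from typing import Optional

# The example ground truth entry from above.
GROUND_TRUTH = json.loads("""
{
  "102.141.172.197": {
    "name": ["level 7 wireless (pty) ltd", "level-7-internet", "level 7 internet"],
    "type": ["isp"]
  }
}
""")


def normalize(value: str) -> str:
    # Assumption: matching is case-insensitive and ignores surrounding
    # whitespace; no other normalization is applied.
    return value.strip().lower()


def is_correct(ip: str, field: str, api_value: Optional[str]) -> bool:
    """True if api_value matches one of the accepted variations for ip/field."""
    if api_value is None:  # assumption: a missing value counts as incorrect
        return False
    accepted = {normalize(v) for v in GROUND_TRUTH[ip][field]}
    return normalize(api_value) in accepted


print(is_correct("102.141.172.197", "name", "Level 7 Wireless (Pty) Ltd"))  # True
print(is_correct("102.141.172.197", "type", "business"))  # False
```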
Results
The results for the accuracy of the company.name field are as follows:
Accuracy Rank | Service | Correct | Accuracy
1st | ipapi.is | 291 / 300 | 97.00%
2nd | ipinfo.io | 286 / 300 | 95.33%
3rd | ipdata.co | 259 / 300 | 86.33%
The results for the accuracy of the company.type field are as follows:
Accuracy Rank | Service | Correct | Accuracy
1st | ipapi.is | 295 / 300 | 98.33%
2nd | ipinfo.io | 277 / 300 | 92.33%
3rd | ipdata.co | 27 / 300 | 9.00%
The results show that ipapi.is provides the most accurate company data for this IP address sample.
Discussion
The sample size of N = 300 is rather small, because it is very time-consuming to manually investigate each IP address in the sample to derive the ground truth data.
The sample could be much larger if the ground truth data could be determined programmatically. One idea would be to consider the company.name correct if it appears, in original or normalized form, in the WHOIS record for the IP address. However, it is unclear which forms of normalization should be allowed, and a string that merely appears somewhere in a WHOIS record is not necessarily the company name.
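To make the programmatic idea concrete, here is a naive Python sketch. The normalization rule (lowercase, then collapse runs of non-alphanumeric characters into a single space) is only one possible choice and is our assumption. The second call shows the weakness: a postal address taken from a descr: field of the WHOIS record is accepted just as readily as the real company name.

```python
import re


def normalize(text: str) -> str:
    # One possible normalization (an assumption): lowercase, then collapse
    # every run of non-alphanumeric characters into a single space.
    # Note that this also drops non-ASCII letters, another limitation.
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def name_in_whois(name: str, whois_text: str) -> bool:
    """Naive check: does the normalized name occur as a whole-token phrase
    in the normalized WHOIS text?"""
    needle = normalize(name)
    haystack = f" {normalize(whois_text)} "
    return bool(needle) and f" {needle} " in haystack


excerpt = "descr: Sormovskaja, 210\norg-name: Yug-Telecom Ltd.\n"
print(name_in_whois("Yug-Telecom Ltd.", excerpt))  # True
print(name_in_whois("Sormovskaja, 210", excerpt))  # True, although it is an address
print(name_in_whois("Cloudflare, Inc.", excerpt))  # False
```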
On this sample, competitors such as ipinfo.io and ipdata.co are less accurate than ipapi.is with regard to both the company.name and company.type fields.
ipdata.co has extremely poor accuracy for the company.type field. Initially, it was assumed that ipdata.co exclusively provides the value business for the company.type field, but the API data shows that other values are also used. For example, ipdata.co provides the value edu for the IP address 82.13.46.116, which is itself incorrect.
"82.13.46.116": {
"company": {
"name": "Mansfield",
"domain": "mansfield.edu",
"network": "82.13.46.0/23",
"type": "edu"
}
}
Furthermore, even though ipinfo.io is by far the largest of these companies and has the most employees, it makes some very blatant mistakes with regard to company data.
ipinfo.io doesn't provide any company data at all for the
following IP addresses of the sample:
This can be verified by looking at the API responses for these IP addresses: the company field is missing for all of them.
There is no obvious explanation for the missing company data in these API responses. Possibly ipinfo.io has an issue in its WHOIS data pipeline.
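Checking for missing company data is easy to automate. The following Python sketch assumes that a service's API responses are stored as one JSON object mapping each IP address to its response, mirroring the per-IP snippets quoted in this post; the actual layout of the published responses file may differ. ips_without_company is a hypothetical helper, and 192.0.2.1 is a documentation-range address used as a stand-in.

```python
import json
from typing import Any


def ips_without_company(responses: dict[str, dict[str, Any]]) -> list[str]:
    """Return the IPs whose response lacks a company field (or has an empty one)."""
    return [ip for ip, resp in responses.items() if not resp.get("company")]


sample = json.loads(
    '{"185.108.7.79": {"company": {"name": "Sormovskaja, 210"}},'
    ' "192.0.2.1": {"ip": "192.0.2.1"}}'
)
print(ips_without_company(sample))  # ['192.0.2.1']
```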
Another example where ipinfo.io doesn't provide correct
company data is the IP address 185.108.7.79.
"185.108.7.79": {
"company": {
"name": "Sormovskaja, 210",
"domain": "yugtelecom.su",
"type": "hosting"
}
}
The company name "Sormovskaja, 210" is obviously incorrect, since it is part of a postal address parsed from the descr: field; the correct company name is "Yug-Telecom Ltd.". It is unclear why ipinfo.io made this mistake, since the raw WHOIS data for the IP address is very straightforward and shows the correct company name in the org-name: Yug-Telecom Ltd. field:
inetnum: 185.108.7.0 - 185.108.7.255
netname: YUG-TELE
descr: Sormovskaja, 210
descr: 350088 Krasnodar Russia
country: RU
org: ORG-YL21-RIPE
admin-c: AC32081-RIPE
tech-c: AC32081-RIPE
status: ASSIGNED PA
mnt-by: UNIVERSITY-MNT
created: 2016-03-11T16:16:24Z
last-modified: 2022-09-15T16:31:42Z
source: RIPE
organisation: ORG-YL21-RIPE
org-name: Yug-Telecom Ltd.
country: RU
org-type: other
address: 350088, Krasnodar, Sormovskaja, 210
abuse-c: AC32081-RIPE
mnt-ref: lidertelecom-mnt
mnt-ref: UNIVERSITY-MNT
mnt-by: lidertelecom-mnt
created: 2015-03-11T07:29:24Z
last-modified: 2022-12-01T17:30:31Z
source: RIPE # Filtered
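A simple heuristic that avoids this particular mistake is to read the company name from the org-name attribute rather than from descr. The Python sketch below parses RPSL-style key: value attributes from an excerpt of the record above. It is a simplified illustration (it skips continuation lines and comments) and not a description of how ipapi.is or ipinfo.io actually parse WHOIS data; parse_rpsl and company_name are hypothetical names.

```python
from collections import defaultdict
from typing import Optional

# Excerpt of the WHOIS record above.
RECORD = """\
inetnum: 185.108.7.0 - 185.108.7.255
netname: YUG-TELE
descr: Sormovskaja, 210
descr: 350088 Krasnodar Russia
org: ORG-YL21-RIPE

organisation: ORG-YL21-RIPE
org-name: Yug-Telecom Ltd.
"""


def parse_rpsl(text: str) -> dict[str, list[str]]:
    """Map each attribute name to all of its values (keys such as descr repeat)."""
    attrs: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        # Skip blank lines, continuation lines (leading space, tab or "+")
        # and comment lines ("%" or "#").
        if not line or line[0] in " \t+%#":
            continue
        key, sep, value = line.partition(":")
        if sep and key and " " not in key:
            attrs[key.lower()].append(value.strip())
    return dict(attrs)


def company_name(attrs: dict[str, list[str]]) -> Optional[str]:
    # Prefer org-name; descr is only a last resort because it often holds
    # free text such as a postal address.
    for key in ("org-name", "descr"):
        if attrs.get(key):
            return attrs[key][0]
    return None


print(company_name(parse_rpsl(RECORD)))  # Yug-Telecom Ltd.
```

The fallback to descr is deliberately kept to show its risk: for a record without org-name, this heuristic would return "Sormovskaja, 210", which is exactly the mistake discussed above.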
But to be fair, ipinfo.io is still a very accurate IP
API. All services make mistakes, including ipapi.is.
Conclusion
Of all the services tested, ipapi.is is the most accurate with regard to company data.
The correctness of company data is likely the most important aspect of any IP API, since the company name and company type are used to derive many other API fields, and many data pipelines depend on accurate company data. Providing correct company data is not trivial, since WHOIS registries are hugely diverse and company data needs to be constantly updated and parsed correctly.