IP to Company Accuracy
Published: August 31, 2024
Last Modified: September 1, 2024

This blog post investigates the accuracy of the company / organization data provided by ipapi.is compared to other popular IP APIs such as ipinfo.io and ipdata.co.

Company data is one of the most important data points in the API, since many other API fields are derived from it and several data pipelines depend on its accuracy. It therefore makes sense to periodically investigate the accuracy of company data and compare it to other IP APIs.

But why is company data accuracy considered so important?

For example, to correctly classify whether an IP address belongs to a hosting provider, the API needs to know the correct and up-to-date company name and domain. The same applies to VPN and Tor detection. Correct company data is also partially used in IP geolocation, since company names often provide hints about the location of an organization.

Introduction

ipapi.is provides company data in the company field of its API response. For example, when looking up the IP address 104.27.153.120, the API response includes the company field with the following output:

"company": {
  "name": "Cloudflare, Inc.",
  "abuser_score": "0.0047 (Low)",
  "domain": "cloudflare.com",
  "type": "hosting",
  "network": "104.16.0.0 - 104.31.255.255",
  "whois": "https://api.ipapi.is/?whois=104.16.0.0"
}

As the example above shows, ipapi.is provides the company name, abuser score, domain, type, network, and WHOIS information for the IP address. But what do the various data points mean?

  • company.name - The name of the organization that owns the IP address based on WHOIS data.
  • company.abuser_score - A score that indicates the risk of the IP address being abusive. Learn more about the abuser score here.
  • company.domain - The domain name of the organization that owns the IP address based on WHOIS data.
  • company.type - The type of the organization. This can be hosting, isp, education, government, banking or the generic business type.
  • company.network - The network that the IP address belongs to and that was allocated / assigned to the organization.
  • company.whois - The raw WHOIS information for the IP address.

When we speak of IP to Company data accuracy, the above data points are considered. Learn more about the company object in the documentation.
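
To make the field descriptions above more concrete, here is a minimal Python sketch that reads the example company object. The helper names parse_abuser_score and network_to_cidrs are hypothetical and not part of any ipapi.is client; the sketch only assumes that abuser_score and network keep the string formats shown in the example response.

```python
import ipaddress
import re
from typing import List, Optional, Tuple

# The `company` object copied from the example lookup of 104.27.153.120.
SAMPLE_COMPANY = {
    "name": "Cloudflare, Inc.",
    "abuser_score": "0.0047 (Low)",
    "domain": "cloudflare.com",
    "type": "hosting",
    "network": "104.16.0.0 - 104.31.255.255",
    "whois": "https://api.ipapi.is/?whois=104.16.0.0",
}


def parse_abuser_score(raw: str) -> Tuple[Optional[float], Optional[str]]:
    """Split a string such as '0.0047 (Low)' into a number and a label.

    Assumption: the score is always '<number> (<label>)'; anything else
    yields (None, None) instead of raising.
    """
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*\(([^)]+)\)\s*", raw)
    if match is None:
        return None, None
    return float(match.group(1)), match.group(2)


def network_to_cidrs(raw: str) -> List[str]:
    """Convert a 'first - last' address range into CIDR blocks."""
    first, last = (ipaddress.ip_address(part.strip()) for part in raw.split("-"))
    return [str(net) for net in ipaddress.summarize_address_range(first, last)]


score, label = parse_abuser_score(SAMPLE_COMPANY["abuser_score"])
cidrs = network_to_cidrs(SAMPLE_COMPANY["network"])
print(SAMPLE_COMPANY["name"], SAMPLE_COMPANY["type"], score, label, cidrs)
```

Converting the range to CIDR notation makes it easy to check that the queried IP address actually falls inside the reported network.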

Methodology

To compare the accuracy of the company data provided by ipapi.is, we will use the following method:

  1. Select a random sample of IP addresses from real-world traffic. It is crucial to use real-world traffic and not random IP addresses, since large chunks of the Internet are unassigned or assigned to placeholder organizations. Selecting addresses at random would inevitably lead to IP addresses that are uninteresting in terms of company data. The sample size will be 300 IP addresses.
  2. For each IP address, retrieve the company data via API lookup from ipapi.is, ipinfo.io and ipdata.co.
  3. Determine the ground truth company data by manually investigating each of the 300 IP addresses in our sample. It is perfectly possible for an IP address to have several correct variations of company names or types.
  4. Compare the company data from each API to the ground truth data and determine the accuracy of each API.

The accuracy of the company data is determined by the following formula:

Accuracy = (True Positives + True Negatives) / (True Positives + True Negatives + False Positives + False Negatives)
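
As a rough illustration, the formula can be written as a small Python function. The helper accuracy_from_correct reflects one possible reading of the setup and is an assumption: every lookup is judged either correct or incorrect against the ground truth, so the formula reduces to correct answers divided by total lookups.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("accuracy is undefined without observations")
    return (tp + tn) / total


def accuracy_from_correct(correct: int, total: int) -> float:
    """Assumed simplification: each lookup is simply right or wrong.

    Correct lookups are counted as true positives and incorrect ones as
    false positives. The split between FP and FN does not change the result.
    """
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return accuracy(tp=correct, tn=0, fp=total - correct, fn=0)


# Example: 291 correct lookups out of a sample of 300.
print(f"{accuracy_from_correct(291, 300):.2%}")
```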

All datasets will be published in full, so that the results can be reproduced and verified by anyone. Obviously, since the ground truth company data is based on manual investigation, the results can only be as accurate as our manual investigation. Therefore, it is crucial that the ground truth data set is published.

Only the company.name and company.type fields are considered when calculating the accuracy. The other fields are too noisy to compare reliably, and these two fields are by far the most important ones.

IP Address Sample

The IP address sample was created by selecting 300 unique IP addresses from real-world traffic.

The IP addresses are spread over a wide range of countries and ASNs, and they are all unique. The IP addresses that constitute the sample can be downloaded as a text file here.

API Responses for the IP Address Sample

The API responses for all 300 IP addresses in the sample can be downloaded here. The data was collected on August 31st, 2024.

Ground Truth Data

The ground truth data set is published as a JSON file that can be downloaded here. For each of the 300 IP addresses in the sample, it contains the company names and types that are considered correct. It was created by manually investigating each IP address.

The structure of the ground truth data is as follows:

"102.141.172.197": {
  "name": [
    "level 7 wireless (pty) ltd",
    "level-7-internet",
    "level 7 internet"
  ],
  "type": [
    "isp"
  ]
}

If the value is an array, it means that there are multiple correct variations of the company name or type for an IP address. In the example above, "level 7 wireless (pty) ltd", "level-7-internet" and "level 7 internet" are all correct variations of the company name for the IP address 102.141.172.197.
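
The following Python sketch shows how such a ground truth entry could be compared against an API response. It is not the script used for this post. The case-insensitive exact match is an assumption, since the exact matching rule is not spelled out here, and the API response in the example is made up purely for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple, Union

# Ground truth entry copied from the example above.
GROUND_TRUTH = {
    "102.141.172.197": {
        "name": ["level 7 wireless (pty) ltd", "level-7-internet", "level 7 internet"],
        "type": ["isp"],
    }
}


def is_correct(api_value: Optional[str],
               accepted: Union[str, Iterable[str], None]) -> bool:
    """True if api_value equals one accepted variant (assumed rule:
    case-insensitive exact match after trimming whitespace)."""
    if not api_value or accepted is None:
        return False
    variants = [accepted] if isinstance(accepted, str) else list(accepted)
    return api_value.strip().lower() in {v.strip().lower() for v in variants}


def score_field(responses: Dict[str, dict], ground_truth: Dict[str, dict],
                field: str) -> Tuple[int, int]:
    """Return (number of correct lookups, sample size) for one company field."""
    correct = 0
    for ip, truth in ground_truth.items():
        company = (responses.get(ip) or {}).get("company") or {}
        if is_correct(company.get(field), truth.get(field)):
            correct += 1
    return correct, len(ground_truth)


# Hypothetical API response, invented for this example only.
responses = {"102.141.172.197": {"company": {"name": "Level 7 Internet", "type": "isp"}}}
print(score_field(responses, GROUND_TRUTH, "name"))
print(score_field(responses, GROUND_TRUTH, "type"))
```

A missing company object or a missing field counts as incorrect in this sketch.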

Results

The results for the accuracy of the company.name field are as follows:

Accuracy Rank   Service     Correct     Accuracy
1st             ipapi.is    291 / 300   97.00%
2nd             ipinfo.io   286 / 300   95.33%
3rd             ipdata.co   259 / 300   86.33%

The results for the accuracy of the company.type field are as follows:

Accuracy Rank   Service     Correct     Accuracy
1st             ipapi.is    295 / 300   98.33%
2nd             ipinfo.io   277 / 300   92.33%
3rd             ipdata.co   27 / 300    9.00%

The results show that ipapi.is provides the most accurate company data for the IP address sample.

Discussion

The sample size of N = 300 is rather small. The reason is that it is very time-consuming to manually investigate each IP address in the sample to derive the ground truth data.

The sample size could be much larger if the ground truth data could be determined programmatically. One idea would be to consider the company.name correct if it appears in its original or a normalized form in the WHOIS record for the IP address. However, this approach raises open questions, such as which forms of normalization should be allowed.
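
A minimal Python sketch of this idea is shown below. The normalization rules (lowercasing, stripping punctuation, dropping a short list of legal suffixes) are assumptions chosen for illustration, not an agreed standard. The WHOIS excerpt is taken from the RIPE record for 185.108.7.79 that is discussed later in this post.

```python
import re

# Assumed list of legal suffixes to ignore; a real list would be longer.
LEGAL_SUFFIXES = {"ltd", "inc", "llc", "gmbh", "pty", "corp", "co"}

# Excerpt of the RIPE WHOIS record for 185.108.7.79.
WHOIS_EXCERPT = """
netname:        YUG-TELE
descr:          Sormovskaja, 210
descr:          350088 Krasnodar Russia
org-name:       Yug-Telecom Ltd.
"""


def normalize(text: str) -> str:
    """Lowercase, keep only alphanumeric tokens, drop legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def name_in_whois(name: str, whois_text: str) -> bool:
    """True if the normalized name occurs as a token sequence in the record."""
    needle = normalize(name)
    if not needle:
        return False
    return f" {needle} " in f" {normalize(whois_text)} "


print(name_in_whois("Yug-Telecom Ltd.", WHOIS_EXCERPT))
print(name_in_whois("Sormovskaja, 210", WHOIS_EXCERPT))
```

Both calls return True, which illustrates one of the problems: a postal address from a descr: line also "appears in the WHOIS record", so matching against the whole record would accept an incorrect company name. Restricting the match to specific WHOIS fields might help, but that requires per-registry knowledge.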

The results show that, on this sample, competitors to ipapi.is such as ipinfo.io and ipdata.co are less accurate than ipapi.is with regard to the company.name and company.type fields.

ipdata.co has extremely poor accuracy for the company.type field. Initially, it was assumed that they exclusively provide the value business for the company.type field, but the API data shows that other values besides business are also used. For example, ipdata.co returns the value edu for the IP address 82.13.46.116, which is itself incorrect:

"82.13.46.116": {
  "company": {
    "name": "Mansfield",
    "domain": "mansfield.edu",
    "network": "82.13.46.0/23",
    "type": "edu"
  }
}

Furthermore, even though ipinfo.io is by far the largest company with the most employees, it makes some blatant mistakes with regard to company data.

ipinfo.io doesn't provide any company data at all for the following IP addresses of the sample:

This can be verified by looking at the API responses for the IP addresses. The company field is missing for all of the IP addresses above. There is no good explanation for the lack of company data in the API responses above. Probably ipinfo.io simply has some issues with their WHOIS data pipeline.
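
A check for missing company objects could be sketched in Python as follows. It assumes that the published responses map each IP address to the API response, as in the snippets shown in this post. The function name ips_without_company is hypothetical, and the entry for 192.0.2.1 (a documentation address) is made up for illustration.

```python
from typing import Dict, List, Optional


def ips_without_company(responses: Dict[str, Optional[dict]]) -> List[str]:
    """Return the IPs whose response has no usable `company` object."""
    return sorted(
        ip for ip, body in responses.items()
        if not (body or {}).get("company")
    )


responses = {
    # Shape taken from the snippets in this post.
    "185.108.7.79": {"company": {"name": "Sormovskaja, 210", "type": "hosting"}},
    # Made-up entry without a company object, for illustration only.
    "192.0.2.1": {"country": "ZZ"},
}
print(ips_without_company(responses))
```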

Another example where ipinfo.io doesn't provide correct company data is the IP address 185.108.7.79.

"185.108.7.79": {
  "company": {
    "name": "Sormovskaja, 210",
    "domain": "yugtelecom.su",
    "type": "hosting"
  }
}

The company name "Sormovskaja, 210" is obviously incorrect, since it is a postal address parsed from the descr: field. The correct company name is "Yug-Telecom Ltd.". It is unclear why ipinfo.io made this mistake, since the raw WHOIS data for the IP address is very straightforward and shows the correct company name in the org-name: field:

inetnum:        185.108.7.0 - 185.108.7.255
netname:        YUG-TELE
descr:          Sormovskaja, 210
descr:          350088 Krasnodar Russia
country:        RU
org:            ORG-YL21-RIPE
admin-c:        AC32081-RIPE
tech-c:         AC32081-RIPE
status:         ASSIGNED PA
mnt-by:         UNIVERSITY-MNT
created:        2016-03-11T16:16:24Z
last-modified:  2022-09-15T16:31:42Z
source:         RIPE

organisation:   ORG-YL21-RIPE
org-name:       Yug-Telecom Ltd.
country:        RU
org-type:       other
address:        350088, Krasnodar, Sormovskaja, 210
abuse-c:        AC32081-RIPE
mnt-ref:        lidertelecom-mnt
mnt-ref:        UNIVERSITY-MNT
mnt-by:         lidertelecom-mnt
created:        2015-03-11T07:29:24Z
last-modified:  2022-12-01T17:30:31Z
source:         RIPE # Filtered
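
To illustrate why the org-name: field is the better source here, the following Python sketch parses a RIPE-style record into objects and prefers org-name: over descr:. It is a simplified illustration under the assumption of plain "key: value" lines separated by blank lines. It is not how ipapi.is or ipinfo.io actually parse WHOIS data, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Shortened copy of the WHOIS record shown above.
WHOIS = """
inetnum:        185.108.7.0 - 185.108.7.255
netname:        YUG-TELE
descr:          Sormovskaja, 210
descr:          350088 Krasnodar Russia
org:            ORG-YL21-RIPE

organisation:   ORG-YL21-RIPE
org-name:       Yug-Telecom Ltd.
source:         RIPE # Filtered
"""


def parse_objects(text: str) -> List[Dict[str, List[str]]]:
    """Split a record into objects (blank-line separated) of key -> values."""
    objects, current = [], defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            if current:
                objects.append(dict(current))
                current = defaultdict(list)
            continue
        key, sep, value = line.partition(":")
        if not sep or line.startswith(("%", "#")):
            continue  # skip comments and lines without a key
        # Drop trailing comments such as "# Filtered".
        current[key.strip().lower()].append(value.split("#", 1)[0].strip())
    if current:
        objects.append(dict(current))
    return objects


def company_name(text: str) -> Optional[str]:
    """Prefer org-name; fall back to the first descr line only if needed."""
    objects = parse_objects(text)
    for field in ("org-name", "descr"):
        for obj in objects:
            if obj.get(field):
                return obj[field][0]
    return None


print(company_name(WHOIS))
```

With the organisation object present, the sketch returns the org-name: value. Only when that object is missing does it fall back to descr:, which is where the postal address comes from.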

But to be fair, ipinfo.io is still a very accurate IP API. All services make mistakes, including ipapi.is.

Conclusion

Of all the services tested, ipapi.is is the most accurate with regard to company data.

The correctness of company data is likely the most important aspect of any IP API, since the company name and company type are used to derive many other API fields, and many data pipelines depend on them being accurate. It is not trivial to provide correct company data, since there is a huge diversity in WHOIS registries and company data needs to be constantly updated and parsed correctly.