Published: November 19, 2023
Last Modified: November 25, 2023

Enrich IP Data with Reverse DNS Lookups

ipapi.is constantly aims to improve the quality of the IP address data it provides. Reverse DNS records of IP addresses can contain valuable information about both the geolocation and the connection type of an IP address (DSL, xDSL, DHCP, ADSL and so on).

This blog article therefore investigates whether it is feasible to perform a reverse DNS lookup on all 3.7 billion non-bogon IPv4 addresses.

Introduction

Reverse DNS (rDNS) is the process of resolving an IP address back to a hostname, the opposite of a regular (forward) DNS lookup. The hostname is stored in a PTR record, which is typically managed by the organization that holds the IP address block, such as an ISP or a hosting provider.
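
Technically, a reverse lookup is an ordinary DNS query for a PTR record. For IPv4, the queried name is formed by reversing the four octets of the address and appending the special in-addr.arpa domain; dig -x builds this name automatically. The following minimal Node.js sketch shows how that name is constructed (the function name reverseLookupName is made up for illustration):

```javascript
const net = require('net');

// Build the DNS name queried in a reverse (PTR) lookup of an IPv4 address:
// the four octets in reverse order, followed by the in-addr.arpa domain.
function reverseLookupName(ipv4) {
  if (!net.isIPv4(ipv4)) {
    throw new Error(`Not an IPv4 address: ${ipv4}`);
  }
  return `${ipv4.split('.').reverse().join('.')}.in-addr.arpa`;
}

console.log(reverseLookupName('100.34.2.29')); // 29.2.34.100.in-addr.arpa
```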

Common Uses of Reverse DNS:

Network Troubleshooting: Identify and troubleshoot issues on a network by associating domain names with specific IP addresses.
Security: Verify the identity of sending servers, reducing spam and phishing attempts.
Logging and Analytics: Track and analyze network activity by utilizing reverse DNS information.
Access Control: Some systems use reverse DNS as part of access control mechanisms.

This blog article examines whether reverse DNS data can be used to enrich the data provided by the IP API.

The scientific paper "IP Geolocation through Reverse DNS" is an excellent read on how reverse DNS data can be used to improve IP geolocation accuracy.

Example Reverse DNS Lookups

Running the command dig +short -x 100.34.2.29 in a terminal returns the following reverse DNS record:

pool-100-34-2-29.phlapa.fios.verizon.net.

In the above reverse DNS output, the label .phlapa. likely stands for Philadelphia, Pennsylvania (PA), USA. Looking up the IP address 100.34.2.29 with the IP API confirms this assumption: https://api.ipapi.is/?q=100.34.2.29

Another example is the reverse DNS lookup dig +short -x 63.153.137.40, which returns:

63-153-137-40.sxfl.qwest.net.

Again, the subdomain part .sxfl. is an abbreviation of the city Sioux Falls, South Dakota, USA. This is confirmed when looking up the IP 63.153.137.40 with the API: https://api.ipapi.is/?q=63.153.137.40
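
The basic idea behind both examples can be sketched in a few lines: split the hostname into its labels and check each label against a table of known location codes. This is only an illustration under simplifying assumptions; the table LOCATION_CODES and the function geoHintFromHostname are hypothetical, and a usable system needs far larger code tables and ISP-specific rules to avoid false matches.

```javascript
// Hypothetical table of location codes found in ISP hostnames. A real table would
// need many more entries and per-ISP validation to avoid false matches.
const LOCATION_CODES = new Map([
  ['phlapa', 'Philadelphia, Pennsylvania, US'],
  ['sxfl', 'Sioux Falls, South Dakota, US'],
]);

// Return the first hostname label that is a known location code, or null.
function geoHintFromHostname(hostname) {
  const labels = hostname.toLowerCase().replace(/\.$/, '').split('.');
  const code = labels.find((label) => LOCATION_CODES.has(label));
  return code === undefined ? null : { code, location: LOCATION_CODES.get(code) };
}

console.log(geoHintFromHostname('pool-100-34-2-29.phlapa.fios.verizon.net.'));
console.log(geoHintFromHostname('63-153-137-40.sxfl.qwest.net.'));
```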

Reverse DNS Coverage

The above two examples demonstrate that Reverse DNS Lookups can provide additional value for IP Geolocation. The important questions however are:

What percentage of all IPv4 addresses have a corresponding Reverse DNS record?
Out of that number of usable Reverse DNS records, how many DNS records translate into usable geolocation hints?

Both questions are answered by the excellent research paper "IP Geolocation through Reverse DNS".

As can be seen in the figure, out of 3.7 billion usable IPv4 addresses, 1.25 billion (about 34%) return a valid reverse DNS hostname. Measured against all 3.7 billion usable addresses, 4.4% (about 160 million) have a hostname that includes an exact city match and 7.4% (about 270 million) have a hostname that contains an airport code.

Put differently, at most around 12% of all usable IPv4 addresses have a reverse DNS hostname with a useful geolocation hint.
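
A quick calculation shows which base the quoted percentages refer to. The sketch below applies 4.4% and 7.4% both to all usable IPv4 addresses and to the addresses with a hostname; countMillions is a hypothetical helper, and the inputs are the figures quoted from the paper above.

```javascript
// Figures quoted from the paper "IP Geolocation through Reverse DNS"
const USABLE_IPV4 = 3.7e9;    // usable (non-bogon) IPv4 addresses
const WITH_HOSTNAME = 1.25e9; // addresses that return a valid reverse DNS hostname

// Absolute count (in millions) that a percentage corresponds to for a given base
const countMillions = (percent, base) => ((percent / 100) * base) / 1e6;

for (const [label, percent] of [['city match', 4.4], ['airport code', 7.4]]) {
  console.log(
    `${label}: ${countMillions(percent, USABLE_IPV4).toFixed(1)}M of all usable IPv4, ` +
      `${countMillions(percent, WITH_HOSTNAME).toFixed(1)}M of hostnames`
  );
}
```

Only the base of 3.7 billion addresses reproduces the quoted counts of roughly 160 million and 270 million; relative to the 1.25 billion hostnames, the same percentages would amount to only about 55 million and 93 million.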

It can be concluded that reverse DNS hostnames can help to enrich an existing IP Geolocation database, but the coverage is not sufficient to create a complete database by itself.

Deriving the IP Connection Type from Reverse DNS Data

To collect sample data, reverse DNS records were obtained for 10,000 random IPv4 addresses. The results are available in the following JSON file: rDNS.json

Some of these reverse DNS hostnames contain the connection type of the IP address. A few examples:

ADSL - "194.245.147.83": "194-245-147-83.adsl.nrw.net."
DSL - "201.248.36.36": "201-248-36-36.dyn.dsl.cantv.net."
BROADBAND - "95.25.2.183": "95-25-2-183.broadband.corbina.ru."
CATV - "188.156.235.137": "BC9CEB89.catv.pool.telekom.hu."

Therefore, it seems that the usage type (connection type) of IP addresses can sometimes be extracted from corresponding reverse DNS records.
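
Such keyword matching can be sketched as follows. This is a minimal illustration, not the method used by ipapi.is: the keyword table CONNECTION_KEYWORDS and the function connectionTypeFromHostname are hypothetical, and a production version would need many more ISP-specific patterns plus checks against false positives.

```javascript
// Hypothetical keyword table: hostname token -> connection type.
const CONNECTION_KEYWORDS = new Map([
  ['adsl', 'ADSL'],
  ['xdsl', 'xDSL'],
  ['dsl', 'DSL'],
  ['broadband', 'BROADBAND'],
  ['catv', 'CATV'],
  ['dhcp', 'DHCP'],
]);

// Split the hostname on dots and hyphens and return the connection type of the
// first token (left to right) that exactly matches a keyword, or null.
function connectionTypeFromHostname(hostname) {
  const tokens = hostname.toLowerCase().split(/[.-]/);
  const match = tokens.find((token) => CONNECTION_KEYWORDS.has(token));
  return match === undefined ? null : CONNECTION_KEYWORDS.get(match);
}

for (const hostname of [
  '194-245-147-83.adsl.nrw.net.',
  '201-248-36-36.dyn.dsl.cantv.net.',
  '95-25-2-183.broadband.corbina.ru.',
  'BC9CEB89.catv.pool.telekom.hu.',
]) {
  console.log(hostname, '->', connectionTypeFromHostname(hostname));
}
```

Matching whole tokens instead of substrings keeps "dsl" from matching inside unrelated words, at the cost of missing hostnames that glue the keyword to other characters.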

However, usable connection type hints are available for no more than about 10% of all IPv4 addresses. Again, reverse DNS records can help to enrich the connection type data of IP addresses, but reverse DNS data alone is not sufficient to build an IP connection type database.

Cost of Performing Reverse DNS Lookups on the Whole IPv4 Address Space

As the graphic above illustrates, there are 3.7 billion usable IPv4 addresses, and a reverse DNS lookup has to be performed for each of them. How long does it take to look up all 3.7 billion IPv4 addresses?

Let's assume that one server can perform 100 reverse DNS lookups per second. With 20 rented small VPS servers, this results in a query rate of 2,000 lookups per second, or 2,000 × 86,400 = 172.8 million reverse DNS lookups per day. To query all 3.7 billion IPv4 addresses, the 20 servers would need to run at full speed for about 22 days (3.7 billion / 172.8 million ≈ 21.4 days), which seems to be a realistic time frame. Assuming a small VPS costs roughly $10 for that period, querying the whole IPv4 address space would cost about 20 × $10 = $200.

The whole process would probably need to be repeated about every two months, but some optimizations are possible. According to the paper "IP Geolocation through Reverse DNS", only 1.25 billion IPv4 addresses return a valid reverse DNS hostname. After the initial full scan, it should therefore be relatively safe to re-query only those 1.25 billion IP addresses for some time. This takes only about 7 days instead of the 22 days required for all 3.7 billion IP addresses.
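
The back-of-the-envelope estimate can be reproduced in a few lines. All inputs (20 servers, 100 lookups per second per server, about $10 per server) are the assumptions stated above, not measurements, and scanEstimate is a hypothetical helper.

```javascript
// Inputs are the assumptions from the estimate above, not measured values.
const SERVERS = 20;
const LOOKUPS_PER_SERVER_PER_SECOND = 100;
const COST_PER_SERVER_USD = 10; // assumed rental price of one small VPS for the run
const SECONDS_PER_DAY = 86400;

// Days needed to reverse-resolve `addressCount` IPs with the whole server fleet
function scanEstimate(addressCount) {
  const lookupsPerDay = SERVERS * LOOKUPS_PER_SERVER_PER_SECOND * SECONDS_PER_DAY;
  return { lookupsPerDay, days: addressCount / lookupsPerDay };
}

const fullScan = scanEstimate(3.7e9); // all usable IPv4 addresses
const rescan = scanEstimate(1.25e9); // only addresses that had a hostname
console.log(`lookups per day: ${fullScan.lookupsPerDay}`); // lookups per day: 172800000
console.log(`full scan: ${fullScan.days.toFixed(1)} days`); // full scan: 21.4 days
console.log(`re-scan: ${rescan.days.toFixed(1)} days`); // re-scan: 7.2 days
console.log(`server cost: $${SERVERS * COST_PER_SERVER_USD}`); // server cost: $200
```

Rounded up, the full scan takes the 22 days stated above and the re-scan about 7 days.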

The next experiment should test whether a cheap VPS can actually sustain 100 reverse DNS lookups per second, since this number seems quite large. The following problems can arise:

Will DNS servers block after some time?
What is an acceptable query speed without getting blocked?
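
Both questions depend on how many lookups are in flight at the same time: as a rule of thumb, throughput is roughly the number of concurrent lookups divided by the average lookup latency. One common way to keep the query speed under control is a small worker pool that caps concurrency. The sketch below illustrates that technique; it is not the code used in the experiment, and mapWithConcurrency is a hypothetical helper name.

```javascript
// Apply the async function `task` to every item while keeping at most `limit`
// calls in flight. Results are returned in the same order as `items`.
async function mapWithConcurrency(items, limit, task) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  const results = new Array(items.length);
  let next = 0; // index of the next item to process (shared by all workers)

  // A worker keeps taking the next unprocessed item until the list is exhausted.
  // JavaScript runs this code on a single thread, so `next++` needs no locking.
  async function worker() {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index], index);
    }
  }

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Usage sketch (sends real DNS queries, so it is left commented out here).
// The task catches its own errors, so one failed lookup does not abort the run:
// const { Resolver } = require('dns').promises;
// const resolver = new Resolver();
// resolver.setServers(['8.8.8.8']);
// const hostnames = await mapWithConcurrency(ips, 100, (ip) => resolver.reverse(ip).catch(() => null));
```

Compared with fixed-size batches, a pool starts a new lookup as soon as any previous one finishes, so a few slow or timed-out lookups do not stall the whole batch.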

Demo Run on a Single Server: Reverse DNS for 1 Million IPv4 Addresses

To answer the two questions above, a demo run was conducted on a single server:

Will DNS servers block the lookups? In this demo, Google's public DNS resolver 8.8.8.8 was used.
What lookup speed can be achieved with one server?

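A simple Node.js script was written that performs reverse DNS lookups on 1 million random, non-bogon IPv4 addresses. It sends the lookups in batches of 1,000 concurrent requests and saves the results of each batch to a JSON file. The script looks as follows: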

const fs = require('fs');
const dns = require('dns');
const path = require('path');

dns.setServers([
  '8.8.8.8',
]);

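// Convert a dotted-quad IPv4 string to an unsigned 32-bit integer. Multiplication is
// used instead of `<<`, because JavaScript's bitwise operators work on signed 32-bit
// integers and would turn addresses >= 128.0.0.0 into negative numbers.
const ipToInt = (ip) => ip.split('.').reduce((acc, val) => acc * 256 + parseInt(val, 10), 0);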

// Returns true if the IP lies in a private, reserved or otherwise non-routable (bogon) range
const isBogon = (ip) => {
  const bogonRanges = [
    { start: '0.0.0.0', end: '0.255.255.255' },
    { start: '10.0.0.0', end: '10.255.255.255' },
    { start: '100.64.0.0', end: '100.127.255.255' },
    { start: '127.0.0.0', end: '127.255.255.255' },
    { start: '169.254.0.0', end: '169.254.255.255' },
    { start: '172.16.0.0', end: '172.31.255.255' },
    { start: '192.0.0.0', end: '192.0.0.255' },
    { start: '192.0.2.0', end: '192.0.2.255' },
    { start: '192.88.99.0', end: '192.88.99.255' },
    { start: '192.168.0.0', end: '192.168.255.255' },
    { start: '198.18.0.0', end: '198.19.255.255' },
    { start: '198.51.100.0', end: '198.51.100.255' },
    { start: '203.0.113.0', end: '203.0.113.255' },
    { start: '224.0.0.0', end: '255.255.255.255' }
  ];

  return bogonRanges.some(range => {
    const start = ipToInt(range.start);
    const end = ipToInt(range.end);
    const ipValue = ipToInt(ip);
    return ipValue >= start && ipValue <= end;
  });
};

// Generate N random IPv4 addresses; bogon addresses are rejected and redrawn
function getRandomIPv4Array(N) {
  const randomIPv4Array = [];

  while (randomIPv4Array.length < N) {
    let randomIPv4;

    do {
      randomIPv4 = `${Math.floor(Math.random() * 256)}.${Math.floor(Math.random() * 256)}.${Math.floor(Math.random() * 256)}.${Math.floor(Math.random() * 256)}`;
    } while (isBogon(randomIPv4));

    randomIPv4Array.push(randomIPv4);
  }

  return randomIPv4Array;
}

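// Reverse-resolve one IP address via the configured DNS server. The promise never
// rejects: rdns is the array of hostnames on success, or -1 if the lookup failed
// (for example because no PTR record exists).
function reverseDnsLookup(ip, verbose = false) {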
  return new Promise((resolve, reject) => {
    dns.reverse(ip, (err, hostnames) => {
      let rdns = null;
      if (err) {
        if (verbose) {
          console.error(`Error for ${ip}: ${err.message}`);
        }
        rdns = -1;
        resolve({ ip, rdns });
      } else {
        if (verbose) {
          console.log(`Reverse DNS lookup for ${ip}: ${hostnames.join(', ')}`);
        }
        rdns = hostnames;
        resolve({ ip, rdns });
      }
    });
  });
}

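// Look up 1,000 batches of `batchSize` random IPs (1 million in total by default).
// All lookups of one batch run concurrently; each batch is saved to its own JSON file.
async function main(batchSize = 1000) {
  const started = Date.now();
  // writeFileSync() below fails if the output directory does not exist yet
  fs.mkdirSync(path.join(__dirname, 'reverseDnsData'), { recursive: true });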
  let totalRequested = 0;
  let totalResolved = 0;

  for (let i = 0; i < 1000; i++) {
    const ipAddresses = getRandomIPv4Array(batchSize);
    const promises = ipAddresses.map(ip => reverseDnsLookup(ip));
    const results = await Promise.all(promises);
    totalRequested += results.length;
    totalResolved += results.filter((res) => Array.isArray(res.rdns)).length;
    const filePath = path.join(__dirname, `./reverseDnsData/${i}.json`);
    let data = {};
    for (const res of results) {
      if (Array.isArray(res.rdns)) {
        if (res.rdns.length === 1) {
          data[res.ip] = res.rdns[0];
        } else {
          data[res.ip] = res.rdns;
        }
      } else {
        data[res.ip] = res.rdns;
      }
    }
    fs.writeFileSync(filePath, JSON.stringify(data, null, 2));

    const resolvedRate = (totalResolved / totalRequested).toFixed(3);
    const msPerIp = (Date.now() - started) / totalRequested;
    const lookupsPerSecond = Math.floor(1000 / msPerIp);
    console.log(`[${i}] totalRequested=${totalRequested}, lookupsPerSecond=${lookupsPerSecond}, resolvedRate=${resolvedRate}`);
  }
}

main().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});

After running the above script, the following metrics were obtained:

Total Requested IPv4: 1 Million
Lookups Per Second: 112 rDNS/s
Percentage of valid Reverse DNS records: 34.9%

This means that a lookup rate of about 110 IPs per second per server is feasible, slightly more than the 100 lookups per second assumed in the cost estimate. Also, the share of IP addresses with a valid reverse DNS entry is about 35%, which is a bit higher than the 33.7% reported in the paper "IP Geolocation through Reverse DNS" (2021). A plausible explanation is that the adoption of reverse DNS entries slowly increases over the years.

Furthermore, during the whole lookup process, no blocking or restrictions from Google's DNS server 8.8.8.8 were encountered.

Conclusion

This article has shown that reverse DNS records can be used for two different purposes:

To derive IP Geolocation intelligence and to improve IP geolocation accuracy
To extract the connection type of IP addresses, such as DSL, ADSL or CATV

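As the demo run with 1 million IPv4 addresses has shown, a single small server can sustain more than 100 reverse DNS lookups per second, so the whole IPv4 address space can be queried within a few weeks with around 20 servers at a low cost. These costs are justified by the benefits for IP address data enrichment.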