Published: November 19, 2023
Last Modified: November 25, 2023
Reverse DNS
Data Enrichment
IP Geolocation from Reverse DNS
IP Connection Type from Reverse DNS
Enrich IP Data with Reverse DNS Lookups
ipapi.is aims to constantly improve the quality of the IP address data it provides. Reverse DNS records of IP addresses contain potentially valuable information about the geolocation and the connection type (DSL, xDSL, DHCP, ADSL and so on) of IP addresses. For these reasons, this blog article investigates whether it is feasible to perform a reverse DNS lookup on all 3.7 billion non-bogon IPv4 addresses.
Introduction
Reverse DNS (rDNS) is the process of resolving an IP address back to a domain name, the opposite of a regular (forward) DNS lookup. For IPv4 addresses, these hostnames are stored as PTR records under the special in-addr.arpa domain.
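For IPv4, a reverse lookup queries a PTR record whose name is the address with its four octets in reverse order, followed by in-addr.arpa. The following minimal sketch derives that query name; the helper name `toPtrName` is hypothetical and not part of any library (Node's built-in `dns.reverse()` accepts the plain IP address and handles this step itself).

```javascript
// Hypothetical helper: build the PTR query name for an IPv4 address,
// e.g. 100.34.2.29 -> 29.2.34.100.in-addr.arpa
function toPtrName(ip) {
  const octets = ip.split('.');
  // Basic validation: exactly four decimal octets in the range 0-255.
  const valid = octets.length === 4 &&
    octets.every((o) => /^\d{1,3}$/.test(o) && Number(o) <= 255);
  if (!valid) {
    throw new Error(`Not a valid IPv4 address: ${ip}`);
  }
  return `${octets.reverse().join('.')}.in-addr.arpa`;
}

console.log(toPtrName('100.34.2.29')); // 29.2.34.100.in-addr.arpa
```

The resulting name can also be queried directly: dig +short PTR 29.2.34.100.in-addr.arpa is equivalent to dig +short -x 100.34.2.29.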
Common Uses of Reverse DNS:
- Network Troubleshooting: Identify and troubleshoot issues on a network by associating
domain names with specific IP addresses.
- Security: Verify the identity of sending mail servers, reducing spam and phishing attempts.
- Logging and Analytics: Track and analyze network activity by utilizing reverse DNS
information.
- Access Control: Some systems use reverse DNS as part of access control mechanisms.
This blog article tries to answer the question of whether reverse DNS data can be used to enrich the IP API. Prior work suggests that it can: the scientific paper "IP Geolocation through Reverse DNS" is an excellent read on how reverse DNS data can be used to improve IP geolocation accuracy.
Example Reverse DNS Lookups
When issuing the command dig +short -x 100.34.2.29 in a terminal, the following reverse DNS record is obtained:
pool-100-34-2-29.phlapa.fios.verizon.net.
In this output, the label .phlapa. could indicate that the IP address is located in Philadelphia, Pennsylvania, USA (likely "phla" for Philadelphia and "pa" for Pennsylvania). Looking up the IP address 100.34.2.29 with the IP API confirms this assumption: https://api.ipapi.is/?q=100.34.2.29
Another example is the reverse DNS lookup dig +short -x 63.153.137.40, which returns:
63-153-137-40.sxfl.qwest.net.
Again, the label .sxfl. appears to be an abbreviation of the city Sioux Falls, South Dakota, USA. This is confirmed when looking up the IP address 63.153.137.40 with the API: https://api.ipapi.is/?q=63.153.137.40
Reverse DNS Coverage
The above two examples demonstrate that reverse DNS lookups can provide additional value for IP geolocation. The important questions, however, are:
- What percentage of all IPv4 addresses have a corresponding reverse DNS record?
- Of those reverse DNS records, how many contain usable geolocation hints?
These questions were answered by the excellent research paper "IP Geolocation through Reverse DNS":
As can be seen in the figure, 1.25 billion of the 3.7 billion usable IPv4 addresses return a valid reverse DNS hostname. Of those hostnames, 160 million (4.4% of all usable IPv4 addresses) include an exact city match and 270 million (7.4%) contain an airport code.
Put differently, around 12% of all usable IPv4 addresses, or up to about a third of the addresses that have a reverse DNS hostname, carry a useful geolocation hint.
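The rounded coverage figures quoted above can be checked with a few lines of arithmetic. The sketch assumes that the city-match and airport-code sets do not overlap, so the combined share is an upper bound.

```javascript
// Rounded figures quoted from "IP Geolocation through Reverse DNS".
const usableIPv4 = 3.7e9;   // non-bogon IPv4 addresses
const withRdns = 1.25e9;    // addresses that return a valid PTR hostname
const cityMatch = 160e6;    // hostnames containing an exact city name
const airportCode = 270e6;  // hostnames containing an airport code

const pct = (part, whole) => ((100 * part) / whole).toFixed(1);

console.log(`rDNS coverage: ${pct(withRdns, usableIPv4)}% of usable IPv4`);
// -> rDNS coverage: 33.8% of usable IPv4

// Assumption: no overlap between the two hint types, so this is an upper bound.
const hinted = cityMatch + airportCode;
console.log(`hints: ${pct(hinted, usableIPv4)}% of usable IPv4, ${pct(hinted, withRdns)}% of rDNS hostnames`);
// -> hints: 11.6% of usable IPv4, 34.4% of rDNS hostnames
```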
It can be concluded that reverse DNS hostnames can help to enrich an existing IP Geolocation database, but
the coverage is not sufficient to create a complete database by itself.
Deriving the IP Connection Type from Reverse DNS Data
To collect sample data, reverse DNS records were obtained for 10,000 random IPv4 addresses. This resulted in the following JSON file: rDNS.json
Some reverse DNS hostnames contain the connection type of the IP address. Examples:
- ADSL: "194.245.147.83": "194-245-147-83.adsl.nrw.net."
- DSL: "201.248.36.36": "201-248-36-36.dyn.dsl.cantv.net."
- BROADBAND: "95.25.2.183": "95-25-2-183.broadband.corbina.ru."
- CATV: "188.156.235.137": "BC9CEB89.catv.pool.telekom.hu."
Therefore, the usage type (connection type) of an IP address can sometimes be extracted from its reverse DNS record.
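A simple way to exploit such patterns is to match hostname labels against a list of connection-type keywords. The keyword rules and the returned type names below are illustrative assumptions derived from the four examples above, not a complete or authoritative taxonomy.

```javascript
// Illustrative keyword rules (assumptions). Order matters: the first match wins.
// Each keyword must be delimited by '.', '-' or the start/end of the hostname.
const CONNECTION_TYPE_RULES = [
  { type: 'ADSL', pattern: /(^|[.-])adsl([.-]|$)/ },
  { type: 'DSL', pattern: /(^|[.-])x?dsl([.-]|$)/ },
  { type: 'BROADBAND', pattern: /(^|[.-])broadband([.-]|$)/ },
  { type: 'CATV', pattern: /(^|[.-])catv([.-]|$)/ },
];

// Return the first matching connection type for a reverse DNS hostname, or null.
function connectionTypeFromHostname(hostname) {
  const name = hostname.toLowerCase();
  const rule = CONNECTION_TYPE_RULES.find(({ pattern }) => pattern.test(name));
  return rule ? rule.type : null;
}

const samples = {
  '194.245.147.83': '194-245-147-83.adsl.nrw.net.',
  '201.248.36.36': '201-248-36-36.dyn.dsl.cantv.net.',
  '95.25.2.183': '95-25-2-183.broadband.corbina.ru.',
  '188.156.235.137': 'BC9CEB89.catv.pool.telekom.hu.',
};
for (const [ip, host] of Object.entries(samples)) {
  console.log(ip, connectionTypeFromHostname(host)); // e.g. "194.245.147.83 ADSL"
}
```

Requiring a delimiter around each keyword avoids matching substrings of unrelated labels, at the cost of missing keywords that are glued to other text.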
However, the coverage of usable connection types is no higher than 10% of all IPv4 addresses. Again, reverse DNS records can help to enrich the connection type of IP address data, but reverse DNS data alone is not sufficient to build such an IP connection type database entirely.
Cost of Performing Reverse DNS Lookups on the Whole IPv4 Address Space
As the graphic above illustrates, there are 3.7 billion usable IPv4 addresses, and a reverse DNS lookup needs to be performed for each of them. How long does it take to look up all 3.7 billion IPv4 addresses?
Let's assume one server is capable of making 100 reverse DNS lookups per second. If 20 small VPS servers are rented, this results in a query rate of 2,000 lookups per second, which amounts to about 172.8 million reverse DNS lookups per day (2,000 × 86,400 seconds). To query all 3.7 billion IPv4 addresses, the 20 servers would need to query at full speed for about 22 days (3.7 billion / 172.8 million ≈ 21.4 days). This seems to be a realistic time frame. Assuming roughly $10 per server, the cost of querying the whole IPv4 address space would be about 20 × $10 = $200.
The whole process would probably need to be repeated every two months or so. Some optimizations are likely possible: as the paper "IP Geolocation through Reverse DNS" reports, only 1.25 billion IPv4 addresses return a valid reverse DNS hostname. So after the initial lookup, it is relatively safe to query only those 1.25 billion IP addresses for some time. This takes only about 7 days, compared to 22 days for all 3.7 billion IP addresses.
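The back-of-the-envelope estimate above can be written as a small function, which makes it easy to vary the inputs. The rate of 100 lookups per second per server and the price of $10 per server are the assumptions stated in the text; the function also assumes the scan finishes within a single billing period.

```javascript
const SECONDS_PER_DAY = 24 * 60 * 60;

// Estimate duration and cost of a reverse DNS scan.
// Assumes each server sustains `lookupsPerServer` lookups per second and
// costs `costPerServer` dollars for the whole duration of the scan.
function estimateScan({ addresses, servers, lookupsPerServer, costPerServer }) {
  const lookupsPerSecond = servers * lookupsPerServer;
  const lookupsPerDay = lookupsPerSecond * SECONDS_PER_DAY;
  return {
    lookupsPerDay,
    days: addresses / lookupsPerDay,
    costUsd: servers * costPerServer,
  };
}

const base = { servers: 20, lookupsPerServer: 100, costPerServer: 10 };
const full = estimateScan({ ...base, addresses: 3.7e9 });    // whole usable IPv4 space
const rescan = estimateScan({ ...base, addresses: 1.25e9 }); // only addresses with rDNS
console.log(full.lookupsPerDay, full.days.toFixed(1), full.costUsd); // 172800000 21.4 200
console.log(rescan.days.toFixed(1)); // 7.2
```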
The next experiment should test whether a cheap VPS can actually query DNS records at a rate of 100 lookups per second, which seems quite high. The following problems can arise:
- Will DNS servers block the client after some time?
- What is an acceptable query speed without getting blocked?
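Demo Run on a Single Server: Reverse DNS for 1 Million IPv4 Addresses
To answer the above two questions, a demo run was conducted on a single server:
- Blocking: all lookups were sent to Google's public DNS resolver 8.8.8.8, to see whether it would block or restrict the server.
- Speed: the run measured which lookup rate a single server can achieve.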
A simple Node.js script was written that performs reverse DNS lookups on 1 million random, non-bogon IPv4 addresses, in 1,000 batches of 1,000 concurrent lookups each. The script looks as follows:
```javascript
const fs = require('fs');
const dns = require('dns');
const path = require('path');

// All dns.reverse() calls below are sent to Google's public resolver.
dns.setServers(['8.8.8.8']);

const NUM_BATCHES = 1000; // 1000 batches x 1000 IPs = 1 million lookups
const OUTPUT_DIR = path.join(__dirname, 'reverseDnsData');

// Convert a dotted-quad IPv4 string to an unsigned 32-bit integer.
// Multiplying (instead of using <<) avoids JavaScript's signed 32-bit overflow.
const ipToInt = (ip) => ip.split('.').reduce((acc, octet) => acc * 256 + parseInt(octet, 10), 0);

// Bogon (reserved / non-routable) IPv4 ranges, converted to integers once.
const BOGON_RANGES = [
  ['0.0.0.0', '0.255.255.255'],
  ['10.0.0.0', '10.255.255.255'],
  ['100.64.0.0', '100.127.255.255'],
  ['127.0.0.0', '127.255.255.255'],
  ['169.254.0.0', '169.254.255.255'],
  ['172.16.0.0', '172.31.255.255'],
  ['192.0.0.0', '192.0.0.255'],
  ['192.0.2.0', '192.0.2.255'],
  ['192.88.99.0', '192.88.99.255'],
  ['192.168.0.0', '192.168.255.255'],
  ['198.18.0.0', '198.19.255.255'],
  ['198.51.100.0', '198.51.100.255'],
  ['203.0.113.0', '203.0.113.255'],
  ['224.0.0.0', '255.255.255.255'], // multicast, reserved and broadcast
].map(([start, end]) => ({ start: ipToInt(start), end: ipToInt(end) }));

const isBogon = (ip) => {
  const value = ipToInt(ip);
  return BOGON_RANGES.some(({ start, end }) => value >= start && value <= end);
};

const randomOctet = () => Math.floor(Math.random() * 256);

// Generate N random, non-bogon IPv4 addresses (duplicates are possible).
function getRandomIPv4Array(N) {
  const randomIPv4Array = [];
  while (randomIPv4Array.length < N) {
    const ip = `${randomOctet()}.${randomOctet()}.${randomOctet()}.${randomOctet()}`;
    if (!isBogon(ip)) {
      randomIPv4Array.push(ip);
    }
  }
  return randomIPv4Array;
}

// Resolve the PTR records of one IP. Never rejects: failed lookups yield rdns = -1.
function reverseDnsLookup(ip, verbose = false) {
  return new Promise((resolve) => {
    dns.reverse(ip, (err, hostnames) => {
      if (err) {
        if (verbose) console.error(`Error for ${ip}: ${err.message}`);
        resolve({ ip, rdns: -1 });
      } else {
        if (verbose) console.log(`Reverse DNS lookup for ${ip}: ${hostnames.join(', ')}`);
        resolve({ ip, rdns: hostnames });
      }
    });
  });
}

async function main(batchSize = 1000) {
  // writeFileSync fails if the output directory does not exist yet.
  fs.mkdirSync(OUTPUT_DIR, { recursive: true });
  const started = Date.now();
  let totalRequested = 0;
  let totalResolved = 0;

  for (let i = 0; i < NUM_BATCHES; i++) {
    // Look up one batch concurrently and wait until every lookup has finished.
    const ipAddresses = getRandomIPv4Array(batchSize);
    const results = await Promise.all(ipAddresses.map((ip) => reverseDnsLookup(ip)));
    totalRequested += results.length;
    totalResolved += results.filter((res) => Array.isArray(res.rdns)).length;

    // Store a single hostname as a string, several hostnames as an array, failures as -1.
    const data = {};
    for (const { ip, rdns } of results) {
      data[ip] = Array.isArray(rdns) && rdns.length === 1 ? rdns[0] : rdns;
    }
    fs.writeFileSync(path.join(OUTPUT_DIR, `${i}.json`), JSON.stringify(data, null, 2));

    const resolvedRate = (totalResolved / totalRequested).toFixed(3);
    const msPerIp = (Date.now() - started) / totalRequested;
    const lookupsPerSecond = Math.floor(1000 / msPerIp);
    console.log(`[${i}] totalRequested=${totalRequested}, lookupsPerSecond=${lookupsPerSecond}, resolvedRate=${resolvedRate}`);
  }
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```
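The script waits for each batch of 1,000 lookups to finish with Promise.all before starting the next one, so a single slow or timed-out lookup holds up the whole batch. A common alternative is a worker pool that keeps a fixed number of lookups in flight at all times. The following minimal sketch shows such a pool; the helper names `mapWithConcurrency` and `reverseOrMinusOne` are hypothetical, the 2-second query timeout is an arbitrary assumption, and this is not the code that was used for the demo run.

```javascript
const { Resolver } = require('dns').promises;

// Hypothetical helper: run `worker` on every item with at most `limit`
// calls pending at any time. Results keep the order of `items`.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until none remain.
  // Safe without locks because JavaScript runs this code on a single thread.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, runner);
  await Promise.all(runners);
  return results;
}

// Example worker: a dedicated resolver with a per-query timeout (assumed value).
const resolver = new Resolver({ timeout: 2000 });
resolver.setServers(['8.8.8.8']);

// Resolve one IP; return -1 on failure, mirroring the script's output format.
async function reverseOrMinusOne(ip) {
  try {
    return await resolver.reverse(ip);
  } catch {
    return -1;
  }
}

// Usage (not executed here): keep 100 lookups in flight instead of fixed batches.
// const results = await mapWithConcurrency(ipAddresses, 100, reverseOrMinusOne);
```

The `limit` parameter also gives a single knob for throttling the query rate if a resolver starts to block or rate-limit the server.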
After running the above script, the following metrics were obtained:
- Total requested IPv4 addresses: 1 million
- Lookups per second: 112 rDNS/s
- Percentage of valid reverse DNS records: 34.9%
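The resolved rate can also be recomputed afterwards from the JSON files written by the script, which map each IP address either to a hostname, to an array of hostnames, or to -1 for a failed lookup. A minimal sketch, assuming the files sit in the `reverseDnsData` directory used by the script (the function name `summarizeReverseDnsData` is hypothetical):

```javascript
const fs = require('fs');
const path = require('path');

// Count resolved vs. failed lookups across all JSON files in `dir`.
// A value of -1 marks a failed lookup; anything else is a hostname or an array of hostnames.
function summarizeReverseDnsData(dir) {
  let total = 0;
  let resolved = 0;
  for (const file of fs.readdirSync(dir)) {
    if (!file.endsWith('.json')) continue;
    const data = JSON.parse(fs.readFileSync(path.join(dir, file), 'utf8'));
    for (const value of Object.values(data)) {
      total += 1;
      if (value !== -1) resolved += 1;
    }
  }
  return { total, resolved, resolvedRate: total === 0 ? 0 : resolved / total };
}

// Usage, from the directory that contains the script's output:
// console.log(summarizeReverseDnsData(path.join(__dirname, 'reverseDnsData')));
```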
This means that a lookup rate of about 110 IP addresses per second per server is feasible. Also, the share of IP addresses with a valid reverse DNS entry seems to be about 35%, which is slightly higher than the 33.7% reported by the paper "IP Geolocation through Reverse DNS (2021)". This is plausible, since the adoption of reverse DNS entries is likely to increase slowly over the years.
Furthermore, no blocking or restrictions from Google's DNS server 8.8.8.8 were encountered during the whole lookup process.
Conclusion
This article has shown that reverse DNS records can be used for two different purposes:
- To derive IP geolocation intelligence and to improve IP geolocation accuracy
- To extract the connection type of IP addresses, such as DSL, ADSL or CATV
As the demo run with 1 million IPv4 addresses has shown, a single small server can sustain about 110 lookups per second without being blocked, so querying the whole IPv4 address space takes about three weeks with 20 servers and costs roughly $200. These costs are justified by the benefits for IP address data enrichment.