How to build a resilient DNS service
DNS provides an easy, human-readable way to name any resource connected to the internet. You can think of DNS as a phonebook that stores the IP addresses of various domains. Almost every communication on a network starts with a DNS lookup of the destination IP address, which makes DNS one of the most important parts of any network communication.
In the world of distributed systems, network issues often baffle us, whether through an unexpected spike in traffic or an increased failure rate in a dependent service. Because DNS is a crucial part of any distributed system, a DNS service should build a safety net around itself to avoid large-scale lookup failures. Before understanding how to debug issues with DNS, let’s learn a bit about the readily available tools that help with DNS resolution issues.
- tcpdump: A command-line packet analyzer that can capture the DNS traffic reaching a particular server. It is very helpful for analyzing what kind of traffic hits your servers during a given period.
- dig: Another command-line tool, used to query a DNS server and inspect its response, so you can check whether a given lookup succeeds or fails.
- iptables: A command-line tool for managing the rules the Linux kernel uses to filter and route network packets. You can create custom rules to redirect packets and, if needed, analyze them to better understand the traffic to your service.
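To make the pass/fail idea behind dig concrete, here is a minimal Python sketch of the same kind of check: it builds a DNS query for an A record and reads the response code from the reply header. The function names (`build_query`, `parse_header`, `query`) are made up for illustration, and the sketch only handles the fixed 12-byte header, not the answer records. It is not how dig itself is implemented.

```python
import socket
import struct

# Common DNS response codes (RCODE) from the low 4 bits of the header flags.
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a DNS query packet (qtype 1 = A record, class IN)."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def parse_header(response: bytes) -> dict:
    """Decode the 12-byte DNS header of a response."""
    query_id, flags, _qd, answers, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return {
        "id": query_id,
        "is_response": bool(flags & 0x8000),
        "rcode": RCODES.get(flags & 0x000F, "OTHER"),
        "answers": answers,
    }


def query(server: str, name: str, timeout: float = 2.0) -> dict:
    """Send the query over UDP port 53 and return the parsed header."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        data, _ = sock.recvfrom(512)
    return parse_header(data)
```

Calling `query("<resolver-ip>", "example.com")` against a resolver you control returns a dictionary whose `rcode` tells you whether the lookup passed (`NOERROR`) or failed (for example `SERVFAIL` or `NXDOMAIN`). A timeout raises `socket.timeout`, which is itself a useful failure signal.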
Apart from widely popular tools like the ones above, there are a few others that can increase your chances of identifying traffic issues and make your DNS service more reliable. But before getting into those, let’s take a quick look at the layers of hosts that make up a DNS service, so that we can use the right tool on each layer.
A typical DNS service consists of three layers of hosts: a caching layer (which resolves DNS queries recursively), an edge host layer (which runs a DNS authority daemon that responds to queries from the caching layer and directs them to the corresponding zones) and an authority host (which serves as the DNS master and handles CRUD operations on records).
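The three layers can be pictured with a toy in-memory model. This is only an illustrative sketch: the class names, the fixed TTL, and the `sync_from` step (standing in for however records propagate from the authority host to the edge hosts) are all assumptions, not part of any real DNS software.

```python
import time
from dataclasses import dataclass, field


@dataclass
class AuthorityHost:
    """DNS master: the place where records are created, updated and deleted."""
    records: dict = field(default_factory=dict)

    def upsert(self, name: str, ip: str) -> None:
        self.records[name] = ip

    def delete(self, name: str) -> None:
        self.records.pop(name, None)


@dataclass
class EdgeHost:
    """Edge layer: answers queries from the caching layer out of its zone copy."""
    zone: dict = field(default_factory=dict)

    def sync_from(self, authority: AuthorityHost) -> None:
        # Hypothetical propagation step; copies the master's records.
        self.zone = dict(authority.records)

    def answer(self, name: str):
        return self.zone.get(name)


class CachingResolver:
    """Caching layer: serves from cache, otherwise asks the edge host."""

    def __init__(self, edge: EdgeHost, ttl: float = 30.0, clock=time.monotonic):
        self.edge = edge
        self.ttl = ttl
        self.clock = clock
        self.cache = {}  # name -> (ip, expiry time)

    def resolve(self, name: str):
        now = self.clock()
        hit = self.cache.get(name)
        if hit is not None and hit[1] > now:
            return hit[0]  # cache hit, still fresh
        ip = self.edge.answer(name)
        if ip is not None:
            self.cache[name] = (ip, now + self.ttl)
        return ip
```

In this model, a record changed on the authority host reaches clients only after the edge host has synced and the cached entry has expired, which is the kind of layered behavior worth keeping in mind when debugging stale answers.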
By moving each layer to modern infrastructure, you can make your DNS service more reliable in the face of unexpected traffic.
- Unbound: Unbound is a DNS resolver that also validates and caches responses, which makes it a fit for the caching layer. Proper logging is very important when debugging any issue, and Unbound provides a large set of statistics about DNS traffic. If your DNS infrastructure uses Unbound, getting metrics about the traffic becomes a lot more convenient. Unbound can also list the requests your DNS server is receiving, which lets you investigate the traffic pattern at any given time.
- NSD: Name Server Daemon (NSD) is an authoritative-only name server used for the edge hosts. It is suited to top-level domain implementations and can serve anything from small to high traffic volumes.
- PowerDNS: PowerDNS provides a versatile nameserver called PowerDNS Authoritative Server (along with PowerDNS Recursor and dnsdist) that can be used for the authority host layer.
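As an example of putting Unbound's statistics to work, the sketch below parses the `key=value` lines printed by `unbound-control stats_noreset` and derives a cache hit ratio from the `total.num.cachehits` and `total.num.cachemiss` counters. The helper names are made up for illustration, and the sketch assumes `unbound-control` is installed and remote control is enabled on the host where you run it.

```python
import subprocess


def parse_unbound_stats(text: str) -> dict:
    """Turn 'key=value' lines into a dict of floats, skipping anything else."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue  # not a key=value line
        try:
            stats[key] = float(value)
        except ValueError:
            continue  # non-numeric value, ignore it
    return stats


def cache_hit_ratio(stats: dict) -> float:
    """Fraction of queries answered from cache (0.0 if there was no traffic)."""
    hits = stats.get("total.num.cachehits", 0.0)
    misses = stats.get("total.num.cachemiss", 0.0)
    total = hits + misses
    return hits / total if total else 0.0


def read_unbound_stats() -> dict:
    """Run unbound-control and parse its output (needs a running Unbound)."""
    result = subprocess.run(
        ["unbound-control", "stats_noreset"],
        capture_output=True, text=True, check=True,
    )
    return parse_unbound_stats(result.stdout)
```

A sudden drop in the hit ratio, or a jump in `total.num.queries` between two samples, is the sort of signal that can point to an unexpected traffic pattern before it turns into lookup failures.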
With modern DNS infrastructure, the resiliency and availability of DNS services can be improved, and each layer provides rich metrics that help debug failures in the DNS service.
To handle a critical service like DNS, it’s important to make the shift towards a newer system. Any such shift is painful, but in the long run, solid infrastructure and detailed metrics on the health of the service are what make it possible to serve large traffic loads successfully. Hence, an effort should be made to move these services to modern infrastructure.
(The contents of this blog are my personal opinions and/or drawn from my own reading of a number of articles, and are in no way influenced by my employer.)