It’s common knowledge all around the web and on LANs everywhere that “if DNS ain’t happy, ain’t nobody happy.” It’s pretty simple, really. Hard-coding apps to use fixed ip.addrs is bad, since those change. So everything from your users’ web browsers to mission-critical line-of-business applications must rely on a healthy and performant DNS system. The problem is, many organizations’ DNS infrastructure is anything but healthy, and that leads to poor performance for everything. Let’s review a couple of DNS concepts first.
GeoDNS is a feature in BIND and other DNS services that lets DNS servers give different answers based on the source ip.addr of the request. It works like this. A zone is set up with multiple entries for resources, like www.example.com. That website’s content is replicated on servers all around the world, and probably leverages global CDNs that also have content replicated on servers all around the world. When a query hits DNS for www.example.com or images.example.com, the DNS server that is authoritative for the example.com zone looks at the source ip.addr of the DNS query, determines where that query came from, and provides a response that includes ip.addrs that are “local” to the source of the query. It’s not a perfect system, since companies often reallocate ip ranges without updating the geolocation databases, but overall it works very well. Here’s a practical example.
Say you are in France, and look up www.example.com, which has an endpoint in Austria and another in the U.S. The example.com DNS uses GeoDNS. If your DNS query hits the example.com servers from the NAT ip.addr of your office in France, they will give you the ip.addr for the Austrian server. But if the French office’s DNS servers are configured to forward to the U.S. headquarters, then what the example.com DNS servers see as a query for www comes from the NAT ip.addr of the U.S. datacenter, so they respond with the U.S. server. Instead of a quick 30 millisecond response time to your HTTP requests, you have to cross the Atlantic and contend with cat videos, and probably see more like a 150 millisecond response time. That’s bad.
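The France example can be sketched in a few lines of Python. This is only a toy model of the decision a GeoDNS-enabled authoritative server makes, not how BIND or any other product implements it. The network ranges, region names, and endpoint ip.addrs below are all made-up placeholders from the documentation address blocks; a real server would consult a GeoIP database instead of a hand-written table.

```python
import ipaddress

# Hypothetical table mapping query source networks to regions.
# A real GeoDNS server would use a GeoIP database for this step.
REGION_BY_NETWORK = {
    ipaddress.ip_network("192.0.2.0/24"): "EU",     # stands in for the French office NAT range
    ipaddress.ip_network("198.51.100.0/24"): "US",  # stands in for the U.S. datacenter NAT range
}

# Hypothetical endpoints for www.example.com, one per region.
ENDPOINT_BY_REGION = {
    "EU": "203.0.113.10",  # the Austrian server
    "US": "203.0.113.20",  # the U.S. server
}

DEFAULT_REGION = "US"  # assumed fallback when the source is unknown


def region_for(source_ip):
    """Return the region for the address the query appears to come from."""
    addr = ipaddress.ip_address(source_ip)
    for network, region in REGION_BY_NETWORK.items():
        if addr in network:
            return region
    return DEFAULT_REGION


def geodns_answer(source_ip):
    """Pick the endpoint that is 'local' to the source of the query."""
    return ENDPOINT_BY_REGION[region_for(source_ip)]


# Query sent directly from the French office: the Austrian endpoint comes back.
print(geodns_answer("192.0.2.25"))
# Same user, but the query was forwarded through U.S. headquarters.
print(geodns_answer("198.51.100.7"))
```

The point of the sketch is that the authoritative server only ever sees the ip.addr of whichever resolver actually sent the query, so forwarding changes the answer even though the user has not moved.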
For your network to properly take advantage of GeoDNS, you must ensure that for external zones, your users are using a DNS server that is close to them, and that DNS server should be able to make its own queries for external zones by going to root, without having to forward across your WAN to some other DNS servers in another part of the world. That not only saves time and WAN bandwidth, it ensures that if there are local resources, you get the ip.addrs that are local to you.
The anchor of DNS is a set of 13 root servers, named A through M.root-servers.net, that host the root zone. These servers are maintained by 12 different organizations and are distributed throughout the world. They don’t resolve every record for every zone in DNS, but they do point the way to the authoritative servers for each and every domain on the Internet: the root servers refer a query to the servers for the relevant top-level domain, which in turn refer it to the domain’s own authoritative servers. When your DNS server can “go to root,” then, instead of forwarding to your ISP or to your headquarters, it starts with a direct query to one of the root servers, follows those referrals to find the authoritative servers for a zone, and then queries those authoritative servers to resolve the A, CNAME, MX, and other records for that zone. It does take a little longer to resolve a domain for the very first time this way, but your DNS server can cache the identity of the zone’s authoritative servers, and the TTL on their records typically lasts for hours or days. As long as at least the NS records for a domain are already in cache, you should see resolution for any other records complete in well under 50 milliseconds. Records already cached in your DNS server’s memory should resolve in under 25 milliseconds. Since DNS has to have a starting point, DNS servers use a file called the “root hints” file that lists the 13 root servers and their IPv4 and IPv6 ip.addrs, so that the DNS service knows where to begin.
Whether you’re looking at your remote offices, or your web-filtering provider, or even how you resolved this website, review your DNS infrastructure today to ensure you aren’t committing one of these seven deadly sins:
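To see why the first lookup costs more than later ones, here is a toy simulation of going to root. Nothing in it touches the network: the server names (`root`, `tld-com`, `ns-example`) and the ip.addrs are invented for illustration, and TTL expiry is deliberately left out to keep the sketch short.

```python
# Toy delegation tree. Each "server" either refers the query onward or answers it.
SERVERS = {
    "root": {"referrals": {"com.": "tld-com"}},
    "tld-com": {"referrals": {"example.com.": "ns-example"}},
    "ns-example": {
        "answers": {
            "www.example.com.": "203.0.113.10",
            "images.example.com.": "203.0.113.11",
        }
    },
}


def in_zone(name, zone):
    """True if name is the zone itself or sits underneath it."""
    return name == zone or name.endswith("." + zone)


def best_cached_server(name, ns_cache):
    """Start from the most specific cached delegation, else from root."""
    best_zone, server = "", "root"
    for zone, cached_server in ns_cache.items():
        if in_zone(name, zone) and len(zone) > len(best_zone):
            best_zone, server = zone, cached_server
    return server


def resolve(name, ns_cache):
    """Follow referrals until a server answers. Returns (answer, queries_sent)."""
    server = best_cached_server(name, ns_cache)
    queries = 0
    while True:
        queries += 1
        data = SERVERS[server]
        if name in data.get("answers", {}):
            return data["answers"][name], queries
        for zone, next_server in data.get("referrals", {}).items():
            if in_zone(name, zone):
                ns_cache[zone] = next_server  # remember who is authoritative
                server = next_server
                break
        else:
            raise LookupError("no answer or referral for " + name)


cache = {}
print(resolve("www.example.com.", cache))     # cold cache: root, then TLD, then the zone
print(resolve("images.example.com.", cache))  # NS already cached: straight to the zone
```

In this model the cold lookup takes three queries and the second name in the same zone takes one, which is the behavior the paragraph above describes.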
1. No local DNS
I see this almost every single week at one customer or another. They have offices all over the world, with users that complain about slow performance, and when we go to troubleshoot the network, we find that there’s no local DNS server at the office. Even if the best you can do is a caching DNS server on the router in the small field office, that’s better than nothing and will help the overall performance for everyone. DNS resolution should not take more than 25 milliseconds; 50 tops. If every time a user must connect to a printer, a domain controller, a file server, or open a web page, they have to wait hundreds of milliseconds for the DNS response to come back from the remote office, then the user is going to notice that things are “slow.” Make sure that you have local DNS servers in any office that is large enough to have a piece of equipment capable of running DNS.
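If you want a rough number for a given office, a small timing helper is enough to get started. This is a minimal sketch, assuming that timing a lookup through the operating system’s resolver is a fair proxy for what users experience. The 25 and 50 millisecond thresholds come from the text above, and the function names are my own.

```python
import socket
import time

GOOD_MS = 25.0  # the target from the text
MAX_MS = 50.0   # "50 tops"


def time_lookup_ms(hostname, resolver=socket.getaddrinfo):
    """Time one lookup in milliseconds.

    The resolver argument exists so a stub can be swapped in for testing;
    by default the lookup goes through the operating system's resolver.
    """
    start = time.perf_counter()
    resolver(hostname, None)
    return (time.perf_counter() - start) * 1000.0


def classify(elapsed_ms):
    """Grade a lookup time against the thresholds above."""
    if elapsed_ms <= GOOD_MS:
        return "good"
    if elapsed_ms <= MAX_MS:
        return "acceptable"
    return "slow"
```

Calling `classify(time_lookup_ms("www.example.com"))` from a workstation in the office gives a first impression. Keep in mind that a local cache on the workstation can make repeat lookups look faster than the path to the DNS server really is, so test with names that have not been resolved recently.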
2. Configuring forwarding to the main office
This is a common problem amongst larger companies, especially those with global scope. They configure DNS servers in each of their regions, but then configure those DNS servers to forward to DNS servers in the headquarters or primary datacenter. As more and more SaaS providers and major websites deploy distributed endpoints and use global CDNs to provide better performance for users, this becomes a bigger and bigger mistake. Bad egress decisions can make this even worse; see sin 6 below for more on that.
3. Using your ISP’s DNS
Okay, so you’ve stopped forwarding your office DNS servers to the headquarters DNS servers on the other side of the ocean. That’s good. Instead, you’re forwarding them to your ISP, because that’s better. Only very often it is not. The ip.addrs your ISP gives you may very well NOT be in the same region as you are, or they themselves may be configured to forward to another region. The only time I’d feel comfortable using an ISP’s DNS servers is for home, and I don’t even do that at home! If you are absolutely sure that your ISP’s DNS servers are local to you, and they don’t forward to any remote servers, then this is okay, but monitor your response times, and consider just letting your local DNS servers go to root themselves.
4. Using public DNS
So instead of using your ISP’s DNS, you decide to use one of the public DNS services provided by Google, or OpenDNS, FreeDNS, your antimalware vendor, or one of the Tier 1 ISPs like Level3 or Hurricane Electric. Unfortunately, the same problem can arise here. Often, those publicly listed ip.addrs are not in the same region as you, so you again wind up resolving names to ip.addrs that are not local to you. Again, I’d recommend you let your local DNS servers go to root themselves rather than dealing with forwarding.
5. Not updating root hints
You may have read this up to now, and are feeling pretty good about things because you don’t forward to remote servers, and you do allow your local servers to go to root. That’s great! But when was the last time you updated your root hints file? Anyone? Anyone? Bueller? As of this writing, the named.root file maintained by InterNIC was last updated on 2016-10-20. They don’t update it often, but when they do, it’s critical that all DNS admins using root hints update the local root hints file on all their DNS servers with the latest information. If you haven’t done that in the past few months, go to InterNIC and check out the updated file.
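Comparing your local copy against the published file is easy to script. The sketch below assumes the usual named.root layout, where each address line reads name, TTL, record type, ip.addr, and a semicolon starts a comment. The two sample files are placeholders built from documentation address ranges, not real root server addresses, and the function names are hypothetical.

```python
def parse_root_hints(text):
    """Return {server_name: set of ip.addrs} from named.root-style text."""
    servers = {}
    for line in text.splitlines():
        line = line.split(";", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        # Address lines look like: NAME TTL TYPE ADDRESS
        if len(fields) == 4 and fields[2].upper() in ("A", "AAAA"):
            servers.setdefault(fields[0].upper(), set()).add(fields[3].lower())
    return servers


def changed_servers(local_text, current_text):
    """List the servers whose address sets differ between the two files."""
    local = parse_root_hints(local_text)
    current = parse_root_hints(current_text)
    names = set(local) | set(current)
    return sorted(n for n in names if local.get(n) != current.get(n))


# Placeholder data only: these are documentation ip.addrs, not real root servers.
LOCAL_HINTS = """
; local copy
.                    3600000  NS    A.ROOT-SERVERS.NET.
A.ROOT-SERVERS.NET.  3600000  A     192.0.2.1
A.ROOT-SERVERS.NET.  3600000  AAAA  2001:db8::1
.                    3600000  NS    B.ROOT-SERVERS.NET.
B.ROOT-SERVERS.NET.  3600000  A     192.0.2.2
"""

PUBLISHED_HINTS = """
; freshly downloaded copy
.                    3600000  NS    A.ROOT-SERVERS.NET.
A.ROOT-SERVERS.NET.  3600000  A     192.0.2.1
A.ROOT-SERVERS.NET.  3600000  AAAA  2001:db8::1
.                    3600000  NS    B.ROOT-SERVERS.NET.
B.ROOT-SERVERS.NET.  3600000  A     192.0.2.22
B.ROOT-SERVERS.NET.  3600000  AAAA  2001:db8::2
"""

print(changed_servers(LOCAL_HINTS, PUBLISHED_HINTS))
```

An empty list means the local file matches the published one. Anything else is a cue to roll the new file out to all of your DNS servers.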
6. Resolution out one path, egress out another
Here’s a huge (dare I say yuuuge) problem I see with customers all the time. They have local Internet egress but they have DNS forwarding set up to remote servers, or they have dedicated paths to SaaS or PaaS resources in one location, but not another. This overrides everything above about local DNS: you need to ensure your DNS resolution goes out the same path as your Internet traffic. If you are splitting that path so that some Internet egress is local (direct or through a proxy) but other Internet egress is out a dedicated circuit to an external provider, you may have to configure conditional forwarding to ensure that DNS resolution and routing run out the same egress for remote resources. Otherwise you might find that you resolve a local endpoint, but route your traffic halfway across the world before it can get out of your network, only to have it make the trip back to the local resource, and then back again.
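The logic behind conditional forwarding is just a longest-suffix match on the name being looked up. Here is a minimal sketch of that decision, assuming one hypothetical zone (`saas.example.`) whose traffic leaves over a dedicated circuit and one placeholder resolver ip.addr reachable over that same circuit. Real DNS servers express this in their own configuration syntax rather than in code.

```python
# Hypothetical conditional forwarders: zone -> resolvers reached via the same
# egress path that traffic for that zone will take.
CONDITIONAL_FORWARDERS = {
    "saas.example.": ["198.51.100.53"],
}


def forwarders_for(name):
    """Return the forwarders for the most specific matching zone.

    None means no conditional forwarder applies, so the local server
    resolves the name itself by going to root.
    """
    fqdn = name.lower().rstrip(".") + "."
    best = None
    for zone in CONDITIONAL_FORWARDERS:
        if fqdn == zone or fqdn.endswith("." + zone):
            if best is None or len(zone) > len(best):
                best = zone
    return CONDITIONAL_FORWARDERS[best] if best else None


print(forwarders_for("app.saas.example"))  # resolved via the dedicated path
print(forwarders_for("www.example.com"))   # resolved locally, egress is local
```

The rule of thumb the sketch encodes is that the table of conditional forwarders should mirror your routing table: if traffic for a zone leaves through a particular egress, resolution for that zone should too.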
7. Remote proxies
There are several proxy solutions on the market that provide proxies “in the cloud” or in the service provider’s datacenters. If you are using one of these, you must make certain that the service provider is not only offering you proxy nodes local to you, but also that they are not making the same DNS forwarding mistakes described above. Time and again I have seen a customer use a cloud proxy to access a SaaS service in their region and get really bad performance. When we work with the SaaS provider to diagnose the connections from the proxy, we determine that the proxy itself is connecting to endpoints on the other side of the planet instead of to locally hosted resources, and it comes down to the proxy nodes’ DNS forwarding to upstream servers in another region.
Practically every single connection made from one host to another starts with a DNS query. More and more service providers and CDNs are moving to a GeoDNS approach to help distribute resources globally and provide customers with the best possible response. DNS admins have to do their part to ensure that clients get fast, and appropriate, responses to their queries. Take the time to review your DNS infrastructure to be sure you aren’t committing one of the seven deadly sins.