I ran into a couple of network troubleshooting issues last week and thought I should write them down before they get lost in the dustbin of history. Both of these issues are related to Internet performance and ISA firewalls. As you know, the ISA firewall is often the “sin eater” for network problems, as inexperienced network and firewall administrators attribute Internet performance problems to the ISA firewall without investigating issues with their own network infrastructure. I learned long ago that 99.98% of the time it’s not an ISA firewall issue; it’s a networking issue that involves some other component of the network infrastructure.
So now to my problem. I found access to Web sites getting slower and slower over time. What I mean by slow is that it took several seconds, or even minutes, for a Web site to come up after clicking a link or clicking a button on my Internet Explorer links bar. One big clue that this was a name resolution issue was that the browser would freeze for several seconds or longer while waiting for the site to come up. What I specifically mean by “freeze” is that the links bar button would stick in a “pushed in” configuration. This is a very common problem among Web proxy clients configured to use the autoconfiguration script, so it immediately clued me in that this was a DNS problem.
But where was the DNS problem? There are lots of DNS problems! The first thing I did was run several nslookups on my client machines and, more importantly, on my ISA firewalls. The nslookups worked, but I had to rerun them several times to get a name resolution to complete, as the first few attempts timed out.
I then did a NetMon trace on both the internal and external interfaces of the ISA firewall. I found way too many DNS queries being done and long delays in DNS responses. I mean tens of thousands of DNS queries (I traced over the course of about an hour and used a 250MB buffer and a capture filter).
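If you want to repeat that check without rerunning nslookup by hand, here is a minimal Python sketch that resolves the same name several times and records whether each attempt succeeded and how long it took. It is only an illustration, not what I ran at the time: the function name time_lookups and the host name in the usage note are placeholders, and it uses the operating system resolver (which may cache answers) rather than querying a specific DNS server the way nslookup can.

```python
import socket
import time

def time_lookups(hostname, attempts=5, resolver=socket.getaddrinfo):
    """Resolve hostname several times; return one (ok, seconds) pair per attempt."""
    results = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            resolver(hostname, None)
            ok = True
        except OSError:  # socket.gaierror is a subclass of OSError
            ok = False
        results.append((ok, time.perf_counter() - start))
    return results
```

Calling time_lookups("www.example.com") on a client and again on the firewall gives a rough picture of whether the first few attempts fail or stall before later ones succeed.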
OK, so there are too many DNS queries. Now what? Why are there so many DNS queries from hosts on a network that is primarily Firewall and Web proxy clients? As you know, Firewall and Web proxy clients don’t perform name resolution themselves; they let the ISA firewall do it for them. So the next step was to identify the hosts that were making the DNS name resolution requests.
The NetMon trace and the log files on the ISA firewall only showed the IP address of the DNS server on my network making the DNS queries, which makes sense, since all hosts on my network segments use a specific DNS server to resolve Internet host names. That is to say, the internal hosts on my networks are never configured to use an external DNS server; they always use a dedicated DNS resolver on my internal network.
At this point I had to answer the question “How do I figure out which host(s) are sending the DNS queries to my internal DNS resolver?” That was an easy one to answer: just do a NetMon trace on the DNS resolver. So that’s what I did.
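Once you have a capture taken on the resolver, finding the noisy hosts comes down to counting queries per source address. The sketch below shows that tallying step in Python. It assumes the DNS queries have already been exported to CSV text with columns named src_ip, qtype and qname; that layout, the function name top_dns_clients, and the sample addresses are all made up for illustration and are not a Network Monitor export format.

```python
import csv
import io
from collections import Counter

def top_dns_clients(csv_text, limit=5):
    """Count DNS queries per (source IP, record type), busiest first."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[(row["src_ip"], row["qtype"])] += 1
    return counts.most_common(limit)

# Hypothetical sample data standing in for an exported capture.
sample = """src_ip,qtype,qname
10.0.0.21,MX,bogus-domain.example
10.0.0.21,MX,another-bogus.example
10.0.0.22,MX,bogus-domain.example
10.0.0.50,A,www.example.com
10.0.0.21,MX,third-bogus.example
"""

for (src, qtype), count in top_dns_clients(sample):
    print(src, qtype, count)
```

Grouping by record type as well as source address matters here, because a host sending an unusual number of queries for one record type stands out much faster than it would in a raw total.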
The result was very interesting. It showed that two of my inbound SMTP spam whacking relays (inbound mail goes through four spam whacking/AV relays before hitting my Exchange Server — I don’t like putting Antispam/AV software on my Exchange Server because of the performance hit) were making thousands of requests for MX records. Now I asked myself the question “why would these relays make thousands of MX record requests”?
The answer to that question was that the SMTP relays were trying to resolve the MX records for the domain names of spammers, most of which are bogus, in order to send NDRs. Since the Christmas season saw a big spike in spam coming into my network, there was a correspondingly large spike in the number of MX requests for the NDRs. This led to thousands of pending DNS requests on my DNS resolver and bogged the DNS server down, which led to delays in name resolution for all other outbound requests to the Internet.
I “fixed” the problem (I can’t say I solved it) by removing the DNS server addresses from the NICs of the SMTP spam whacking relays. This stopped the DNS query traffic for the MX records, but it also prevents the relays from updating their AV databases. The next problem I’ll need to solve is how to handle these NDRs more effectively, if that is possible at all. Otherwise, I’ll have to schedule a netsh script to enable the DNS settings for a set period of time and then disable them. A definite kludge, but it’s all I can think of at this time.
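For what it’s worth, here is a rough Python sketch of what that scheduled kludge might look like. It is not something I have deployed. It assumes the positional forms of the netsh commands “interface ip set dns” and “interface ip delete dns”, and the interface name, DNS server address, and the function names dns_command and update_window are placeholders you would replace with your own.

```python
import subprocess
import time

def dns_command(interface, dns_server, enable):
    """Build the netsh argument list that sets or clears the NIC's DNS server."""
    if enable:
        return ["netsh", "interface", "ip", "set", "dns",
                interface, "static", dns_server]
    return ["netsh", "interface", "ip", "delete", "dns", interface, "all"]

def update_window(interface, dns_server, seconds,
                  run=subprocess.run, sleep=time.sleep):
    """Enable DNS on the NIC, wait for the update window, then remove it again."""
    run(dns_command(interface, dns_server, True), check=True)
    try:
        sleep(seconds)
    finally:
        # Always remove the DNS setting, even if the wait is interrupted.
        run(dns_command(interface, dns_server, False), check=True)

# Example (placeholder values), e.g. launched by a scheduled task:
# update_window("Local Area Connection", "10.0.0.2", 15 * 60)
```

The try/finally is there so the relay does not end up with DNS left enabled if the script is interrupted partway through the window.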
MORAL OF THE STORY:
Most admins would have blamed the ISA firewall for slowing down their Internet connections. They would have been wrong, as is most often the case. Through the thoughtful use of ISA firewall logs and Network Monitor, I was able to determine that this was a DNS issue, find where the DNS problem was located, and come up with a “fix” for the performance issue.