As someone who has spent most of his life working in IT, I have seen my share of network problems. Some have been simple to diagnose and correct, while others have been really tough to figure out. Whenever I encounter one of the tough ones, I like to begin the network troubleshooting process by doing a few simple things to gather information and narrow down the scope of the problem. This can make it a lot easier to figure out what is going on.
Because the goal is to simplify things as much as possible, I try to avoid focusing on the network as a whole and focus instead on two hosts that are having difficulty communicating with one another. From there, I take a methodical approach to figuring out what is going on.
Step 1: Check the network configuration
I like to begin the network troubleshooting process by verifying what I think I know about the hosts involved. One way of accomplishing this is to run the IPCONFIG command on both hosts to make sure that they are each using an IP address that falls within the expected range. Although this is a really basic thing to do, running the IPCONFIG command has occasionally revealed the source of my problems. Not all that long ago, for instance, I found that a system was not receiving an IP address because the DHCP scope had been depleted.
Running the IPCONFIG command by itself reveals the IP address, subnet mask, and default gateway that have been assigned to each network adapter. If these values seem to be correct, then I recommend taking things one step further and running the IPCONFIG /ALL command. This will reveal each network adapter’s DNS server assignment, along with additional details such as whether DHCP is enabled. It is important to verify that the systems are using the expected DNS server.
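If you would rather script the "expected range" check than eyeball it, here is a minimal sketch in Python. It assumes you have already read the address from the IPCONFIG output and know the subnet the host should be on; the function names and the addresses in the usage example are hypothetical. The second helper flags a 169.254.x.x address, which is what Windows assigns itself when it cannot obtain a DHCP lease (as can happen when a scope is depleted).

```python
import ipaddress

def address_in_expected_range(address: str, expected_network: str) -> bool:
    """Return True if address falls inside expected_network (CIDR notation)."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(expected_network)

def looks_like_apipa(address: str) -> bool:
    """Return True for a 169.254.0.0/16 self-assigned (APIPA) address."""
    return ipaddress.ip_address(address) in ipaddress.ip_network("169.254.0.0/16")

# Hypothetical values: the host is expected to sit on 192.168.10.0/24.
print(address_in_expected_range("192.168.10.25", "192.168.10.0/24"))  # True
print(address_in_expected_range("10.0.0.5", "192.168.10.0/24"))       # False
print(looks_like_apipa("169.254.12.7"))                               # True
```

An address that fails the first check but passes the second is a strong hint to look at the DHCP server before anything else.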
Step 2: Test name resolution
Once you have verified the IP address configuration for both the source and destination hosts, I recommend verifying that name resolution is working correctly. There are all kinds of different tools available for testing DNS name resolution, but the easiest thing to do is to simply enter the NSLOOKUP command, followed by the other host’s fully qualified domain name.
The thing that I like about the NSLOOKUP command is that it shows you which DNS server is being used, and it tells you whether or not that DNS server is authoritative for the specified host.
Once you receive a result from NSLOOKUP, check to make sure that the result is what you expected. The DNS server’s IP address should match that of the DNS server that the host’s network adapter is configured to use. Similarly, the address that the name resolves to should match the IP address that has been assigned to the remote host (or to a service running on the remote host).
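The comparison described above can also be sketched in Python. One caveat: this sketch uses the operating system's resolver through socket.getaddrinfo, which may consult the hosts file or a local cache, so it is not a substitute for NSLOOKUP's direct query to a DNS server. The function names are my own, and the resolver parameter exists only so that the logic can be exercised without a live network.

```python
import socket

def resolve_addresses(hostname, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses that hostname resolves to."""
    results = resolver(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the address string is the first element of sockaddr.
    return sorted({entry[4][0] for entry in results})

def resolves_as_expected(hostname, expected_address, resolver=socket.getaddrinfo):
    """Return True if expected_address is among the addresses for hostname."""
    try:
        return expected_address in resolve_addresses(hostname, resolver)
    except socket.gaierror:
        # The name could not be resolved at all.
        return False
```

For example, resolves_as_expected("server01.example.com", "192.168.20.15") (both values hypothetical) would return True only if the name resolves and one of the returned addresses matches the one you expect the remote host to have.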
Step 3: Verify the network path
If your checks have thus far been successful, and have yielded the expected results, then the next step in the process is to verify the network path to the remote host. The easiest way to do this is to enter the Tracert command, followed by the remote host’s fully qualified domain name. The Tracert command will show you the route that packets are taking en route to the remote host.
Don’t worry too much if some of the hops are reported as “Request timed out,” as this usually doesn’t signal a problem (it typically just means that a device along the path is configured not to respond to ICMP messages). The important thing is to make sure that Tracert does not tell you that the destination is unreachable. Windows reports this as “Destination host unreachable,” while the traceroute tool on Unix-like systems denotes it with the !H indicator. A destination host unreachable message indicates that there is either no route to the destination or that an IP address cannot be resolved to a Layer 2 (MAC) address.
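When a trace is long, it can help to reduce the output to the two things that matter here: how many hops timed out, and whether anything was reported as unreachable. The following Python sketch does that with simple text matching. It assumes English-language Tracert output, and the sample text is illustrative rather than captured from a real network.

```python
def summarize_trace(output: str) -> dict:
    """Summarize Tracert text: timed-out hops, unreachable reports, completion."""
    lines = [line.strip().lower() for line in output.splitlines()]
    return {
        # Hops that did not answer; usually harmless on their own.
        "timed_out_hops": sum("request timed out" in line for line in lines),
        # Any "unreachable" report deserves a closer look.
        "unreachable": any("unreachable" in line for line in lines),
        "completed": any(line.startswith("trace complete") for line in lines),
    }

# Illustrative output with hypothetical names and addresses.
sample = """
Tracing route to server01.example.com [192.168.20.15]
over a maximum of 30 hops:

  1    <1 ms    <1 ms    <1 ms  192.168.10.1
  2     *        *        *     Request timed out.
  3     4 ms     3 ms     4 ms  192.168.20.15

Trace complete.
"""
print(summarize_trace(sample))
```

A result with one or two timed-out hops and no unreachable report matches the "don't worry too much" case described above.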
Step 4: Test the remote host’s responsiveness
The next step in the troubleshooting process is to test whether or not you can communicate with the remote host. At one time this simply meant pinging the remote host. Unfortunately, hosts are now often configured not to respond to ping requests, so this might not be a viable test.
That being the case, you will need some sort of test to see if you can get the host to respond to you. After all, a response verifies that connectivity exists between the two hosts and that the remote host is still online.
The type of responsiveness test that you can use varies widely depending on the remote host’s configuration. If I can’t use the ping command, I have occasionally verified a remote host’s responsiveness by establishing a remote PowerShell session.
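One lightweight alternative to ping is to check whether the remote host accepts a TCP connection on a port that you know it should be listening on. The Python sketch below is one way to do that. The function name is my own, and which port to test is an assumption that depends entirely on the host: a Web server would typically listen on 80 or 443, while a host that accepts remote PowerShell sessions over WinRM listens on 5985 by default.

```python
import socket

def tcp_port_responds(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, tcp_port_responds("server01.example.com", 5985) (a hypothetical host name) returning True tells you that the host is online and reachable on that port, even if it ignores ping. A False result is less conclusive, since a firewall along the way could be dropping the connection.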
Step 5: Test the remote service
If you have confirmed that the local and remote hosts are configured properly, and that name resolution and basic connectivity are working properly in both directions, then the problem most likely exists at a higher level of the network stack. If the destination host is a Web server, for example, then it is possible that even though basic communications tests have been successful, a system service has stopped or a permissions problem exists. As such, you will need to test whatever service the remote host is providing.
As you do, one thing to keep in mind is that sometimes a service can be adversely impacted by a lower level dependency. For example, I once experienced some serious communications problems on an Exchange Server. After an exhaustive troubleshooting effort, I eventually traced the problem to the system’s clock, which was set to an incorrect time.
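For the Web server case, a service-level test can be as simple as requesting a page and looking at the status code. The Python sketch below does that and, as a nod to the clock problem just described, also compares the server's HTTP Date header with the local clock. The function names and the returned dictionary layout are my own choices, and the skew figure is only approximate because the Date header has one-second resolution and network delay is ignored.

```python
import urllib.error
import urllib.request
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def clock_skew_seconds(date_header: str, local_now: datetime) -> float:
    """Return local_now minus the time in an HTTP Date header, in seconds."""
    server_time = parsedate_to_datetime(date_header)
    if server_time.tzinfo is None:
        # Treat a header without an explicit zone as UTC.
        server_time = server_time.replace(tzinfo=timezone.utc)
    return (local_now - server_time).total_seconds()

def check_web_service(url: str, timeout: float = 5.0) -> dict:
    """Request url and report reachability, HTTP status, and apparent skew."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
            date_header = response.headers.get("Date")
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 403 or 500.
        status = error.code
        date_header = error.headers.get("Date")
    except OSError:
        # No HTTP response at all: refused, timed out, or name not resolved.
        return {"reachable": False, "status": None, "skew_seconds": None}

    skew = None
    if date_header:
        try:
            skew = clock_skew_seconds(date_header, datetime.now(timezone.utc))
        except (TypeError, ValueError):
            skew = None  # The header could not be parsed.
    return {"reachable": True, "status": status, "skew_seconds": skew}
```

A 403 or 500 status with basic connectivity intact points toward the permissions or stopped-service problems mentioned above, while a skew of several minutes suggests checking the clocks on both systems, since time-sensitive protocols such as Kerberos can fail when clocks drift too far apart.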
Network troubleshooting: More art than science
Unfortunately, there is no magical solution for network troubleshooting (although there are some really great third-party diagnostic tools available). Any time that I am faced with a network problem, my approach is to initially ignore as much of the network’s complexity as I can and focus on checking the basics. Even if these steps do not reveal the cause of the problem, they can help you to use the process of elimination to narrow down the possible causes.