
What to do when your IT configuration goes south

If you work in IT long enough, you will occasionally run into situations in which something that you are setting up just doesn’t work right. No matter how experienced you are or how many certifications you hold, it is going to happen. It’s part of working in IT. The real question, of course, is how to get your IT configuration working again when the system already appears to be configured correctly.

Look for low-level IT configuration problems

Perhaps the biggest lesson I have learned over the years is that if something seems to be configured correctly, and you’ve checked and rechecked the configuration but it still isn’t working, your IT configuration probably is correct. In situations like this, I recommend beginning the troubleshooting process by looking for lower-level problems that could potentially contribute to the issue that you are currently experiencing. Let me give you an example.

Last week, I was having an issue deploying a management agent to a particular virtual machine. The process had worked flawlessly for some other virtual machines the day before, but for whatever reason I just could not get the agent to deploy to this particular virtual machine. For a while, I was baffled. The problematic virtual machine was configured in exactly the same way as the virtual machines that had worked properly the day before. To add insult to injury, the error message was completely unhelpful. It basically told me that the process had failed, but it didn’t give me any indication as to why. To make a long story short, I eventually determined that the problem could not possibly be related to the agent, the management server, or even the VM configuration, because all of those things had been proven to be good. As such, I started looking for lower-level problems, and eventually discovered that my problem was caused by an IP address conflict.
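As a rough illustration of this kind of low-level check, here is a minimal Python sketch that flags IP addresses claimed by more than one MAC address. It assumes you have already gathered IP/MAC pairs from somewhere, such as an ARP table or a DHCP lease export. The function name and the sample addresses are hypothetical and are not taken from the incident described above.

```python
from collections import defaultdict


def find_ip_conflicts(entries):
    """Return {ip: sorted MACs} for every IP claimed by more than one MAC."""
    macs_by_ip = defaultdict(set)
    for ip, mac in entries:
        # Normalize case so the same adapter is not counted twice.
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Hypothetical sample data, for illustration only.
observed = [
    ("10.0.0.15", "00:15:5D:01:0A:01"),
    ("10.0.0.16", "00:15:5D:01:0A:02"),
    ("10.0.0.15", "00:15:5D:01:0A:03"),
]
conflicts = find_ip_conflicts(observed)
print(conflicts)
```

Any address that shows up in the result is worth investigating before you spend more time on the application itself.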

Review log files

Another bit of advice for situations in which your IT configuration just doesn’t work quite the way that it should is to take some time to review the log files. This troubleshooting step sounds obvious, but I have to confess that I am guilty of ignoring the log files until the troubleshooting efforts really get serious. I often find myself doing a fruitless web search on the actual error message, rather than going straight to the logs. The thing is, however, that log files very often contain information that is not listed in the error message itself. Sure, logs can sometimes be unhelpful and may do little more than parrot the error message that you’ve already seen, but more often than not the log files will contain additional information that may help you to resolve the issue at hand.
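Because the useful detail often sits just before the error entry, it can help to pull matching lines together with their neighbors. The following Python sketch shows one way to do that. The function name, the log format, and the sample entries are all made up for illustration.

```python
def lines_with_context(lines, keyword, before=2, after=2):
    """Return (line_number, text) pairs for lines containing keyword
    (case-insensitive), plus the requested number of surrounding lines."""
    keep = set()
    needle = keyword.lower()
    for i, line in enumerate(lines):
        if needle in line.lower():
            start = max(0, i - before)
            stop = min(len(lines), i + after + 1)
            keep.update(range(start, stop))
    return [(i + 1, lines[i]) for i in sorted(keep)]


# Hypothetical log excerpt, for illustration only.
sample_log = [
    "10:01:02 INFO  Starting agent deployment",
    "10:01:03 INFO  Contacting target host",
    "10:01:09 WARN  Duplicate address detected on interface",
    "10:01:10 ERROR Deployment failed",
    "10:01:11 INFO  Cleaning up",
]
for number, text in lines_with_context(sample_log, "error", before=1, after=0):
    print(number, text)
```

In this made-up excerpt, the warning on the line before the error is the entry that explains the failure, and it is exactly the sort of detail that an error dialog tends to leave out.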

Look for other sources of information

Something else that I like to do when my IT configuration just isn’t working is to look for other sources of information. Log files are great, but they don’t always give you a complete picture of what’s really going on. Many applications span multiple servers. What I have found is that sometimes the key piece of information that helps you to solve a problem isn’t actually found on the same server where the error is occurring. It could be that some dependency service on another machine is actually to blame for the problem. As such, if you don’t get anywhere with checking the server’s log files, then check related log files that may exist elsewhere.

This brings up another point. As helpful as the Windows Event Viewer may be, it is not the be-all and end-all when it comes to providing you with diagnostic information. Very often, applications will include supplementary logs in the form of text files. These logs may give you information that does not show up in the Windows Event Viewer.
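When an application spans several servers, lining up the text logs from each machine on a single timeline can make a failing dependency easier to spot. Here is a minimal Python sketch of that idea. It assumes that every log line begins with an ISO 8601 timestamp followed by a space, and that the clocks on the servers are reasonably in sync. The server names and log entries are hypothetical.

```python
import heapq
from datetime import datetime


def merge_logs(logs_by_server):
    """Merge per-server log lines into one list of
    (timestamp, server, message) tuples ordered by timestamp."""

    def parsed(server, lines):
        for line in lines:
            # Assumed format: "<ISO 8601 timestamp> <message>"
            stamp, _, message = line.partition(" ")
            yield datetime.fromisoformat(stamp), server, message

    # Sort each server's entries, then merge the sorted streams.
    streams = [sorted(parsed(server, lines)) for server, lines in logs_by_server.items()]
    return list(heapq.merge(*streams))


# Hypothetical servers and entries, for illustration only.
logs = {
    "app01": [
        "2024-05-01T10:01:03 Contacting database",
        "2024-05-01T10:01:10 Request failed",
    ],
    "db01": [
        "2024-05-01T10:01:08 Service stopped unexpectedly",
    ],
}
timeline = merge_logs(logs)
for stamp, server, message in timeline:
    print(stamp.isoformat(), server, message)
```

In this invented example, the entry from db01 lands just before the failure on app01, which points toward the dependency rather than the server that reported the error.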

Validate what you think you know

Another helpful troubleshooting step is to validate everything that you think you know about the system. In the case of a server that refuses to run an application correctly, this might involve things like making sure that domain names can be resolved, or making sure that all of the required system services have started.
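Checks like these are easy to script. The Python sketch below uses only the standard socket module to test whether a name resolves and whether a TCP port accepts connections. The function names are my own, and the host names and port numbers that you pass in will depend on your environment.

```python
import socket


def can_resolve(hostname):
    """Return True if the name resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        return False


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution failures.
        return False


print("localhost resolves:", can_resolve("localhost"))
```

A successful TCP connection only proves that something is listening on the port. It does not prove that the service behind it is healthy, so treat a True result as one assumption validated rather than as a clean bill of health.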

In the case of a script, I like to add lines of code in strategic places that will output the contents of the variables that I am using. That way, I can make sure that the variables do indeed contain the anticipated values.
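Here is a small Python sketch of that approach, using the standard logging module so that the extra output can be switched off later by raising the log level. The function, the logger name, and the values are hypothetical.

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
log = logging.getLogger("deploy_script")  # hypothetical script name


def build_target(hostname, domain, port):
    """Build a host:port string, logging each intermediate value."""
    # %r shows stray whitespace and reveals whether a value is a string or a number.
    log.debug("hostname=%r domain=%r port=%r", hostname, domain, port)
    fqdn = f"{hostname.strip()}.{domain.strip()}"
    log.debug("fqdn=%r", fqdn)
    target = f"{fqdn}:{int(port)}"
    log.debug("target=%r", target)
    return target


print(build_target("vm01 ", "example.test", "8443"))
```

Logging the values with %r makes problems such as the trailing space in "vm01 " visible, which a plain print of the value would hide.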

Use a lab environment

If you have tried all of the basic troubleshooting techniques, and are still having difficulties resolving your issue, try setting up a lab environment. Build a lab environment from scratch, and see if the issue occurs in the lab environment. If the issue does occur in the lab environment, then the issue is probably related to your configuration, a missing dependency, or perhaps a security setting. If the issue does not occur within the lab, then there is probably something external to the application that exists in the production environment but not in the lab environment.
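If the issue shows up in only one of the two environments, a list of the settings that differ between them is a reasonable place to start looking. The following Python sketch compares two sets of settings. It assumes that you can export each environment's settings as simple key/value pairs. The setting names and values shown are invented for illustration.

```python
ABSENT = "<absent>"  # marker for a setting that exists in only one environment


def diff_settings(production, lab):
    """Return {key: (production_value, lab_value)} for every setting that differs."""
    differences = {}
    for key in sorted(production.keys() | lab.keys()):
        prod_value = production.get(key, ABSENT)
        lab_value = lab.get(key, ABSENT)
        if prod_value != lab_value:
            differences[key] = (prod_value, lab_value)
    return differences


# Hypothetical settings, for illustration only.
production = {"tls_version": "1.2", "proxy": "proxy.example.test:3128", "agent_port": 8443}
lab = {"tls_version": "1.2", "agent_port": 8443}

for key, (prod_value, lab_value) in diff_settings(production, lab).items():
    print(f"{key}: production={prod_value!r} lab={lab_value!r}")
```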

Try reducing the complexity

Another thing that you can do to try to resolve the IT configuration problem is to reduce complexity wherever possible. For example, you might temporarily disable security features such as application firewalls. Given the security implications of this sort of trial and error, however, it is best to limit security-related troubleshooting efforts to a sandbox environment whenever possible.
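The process of disabling one thing at a time can be sketched in a few lines of Python. In the sketch below, works_with stands in for whatever test you would actually run in your sandbox, such as retrying a deployment. It is a placeholder supplied by the caller, and the component names are hypothetical.

```python
def find_culprits(components, works_with):
    """Disable one component at a time and return the components whose
    removal makes the check pass.

    works_with is a caller-supplied function that receives the set of
    enabled components and returns True if the system works that way.
    """
    enabled = set(components)
    return [c for c in components if works_with(enabled - {c})]


# Hypothetical stand-in for a real sandbox test: the system fails
# whenever the application firewall is enabled.
def sandbox_check(enabled_components):
    return "app_firewall" not in enabled_components


suspects = find_culprits(["antivirus", "app_firewall", "proxy"], sandbox_check)
print(suspects)
```

Testing one component at a time keeps the number of trials small, but it will not catch a problem that only appears when two components interact.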

Get help from a total novice

OK, this one may be the biggest Hail Mary, last-ditch effort of them all. In fact, it sounds completely crazy, but I have seen it work. If things aren’t working out for you, and you have tried everything you can think of, try asking someone who doesn’t know the first thing about the technology. A total novice may occasionally pick up on something really simple that a seasoned pro might have overlooked because they are looking for a more complex problem.

Way back in the early 1990s, I briefly worked as a developer. A close friend was still in college, and was having some trouble with his homework. He called and asked if I could help him to debug some code that he had written, because he was completely stuck and the assignment was due the next day.

The program wasn’t anything overly complex. It only contained about 150 lines of code, and it was written in a programming language that, at the time, I used every day. Three hours into the debugging session, however, neither of us had found the problem. I managed to figure out which block of code was causing the issue, but I had yet to figure out why the problem was happening.

About that time, my wife walked into the room. She had never written a line of code in her life. I showed her the four or five lines of code where I believed the problem to be. I don’t know how she did it, but within about 15 seconds she found the problem for which I had been searching for the last three hours. That wasn’t exactly my proudest moment, but it just goes to show that sometimes help can come from an unexpected source.

Get creative

The nature of working in IT demands that IT professionals be good at troubleshooting problems. Sometimes, however, the tried-and-true, best-practice IT configuration troubleshooting efforts simply are not enough. When this happens, you have to be a little bit more creative in looking for ways to resolve the problem. Remember, if it seems like you’re doing everything right, then you probably are. Most likely, the problem is caused by something simple that you might have missed, or by an external factor.

Photo credit: Pixabay