Troubleshooting is merely the methodical application of common sense and technical knowledge to the inevitable problems that crop up in a fallen world. If common sense can be codified (and perhaps it can with AI) then it starts with answers to simple questions like: Why? How? What? In this article I'll try to distill the issues, tools and procedures of troubleshooting Windows XP/2003 boot problems into a small amount of easily digestible information that you as a system administrator can write on the back of a note card or store in your PDA for easy access when the proverbial poop hits the fan. Let's begin with the Why question.
Why do startup problems happen?
Windows may fail to start for a variety of reasons. Generally speaking, here they are in order of decreasing likelihood:
- Hardware failure
- Bad driver
- Corrupt file or volume
- System misconfiguration
- Virus infection
Let me elaborate. A common reason systems fail to start is that some element of the system's hardware has failed. This could range from the simple (someone kicked the power cord out of its socket) to the obvious (smoke emitting from the machine) to the mysterious (something transient that happens only when the moon is full or during sunspot minimum). Next most common is when you update the driver for some piece of hardware (or the BIOS for that matter) and the system won't boot afterwards. After that come those mysterious messages we'll talk about shortly, which usually indicate that some key operating system file has become corrupt or gone missing. Misconfiguration is another possible source of boot problems, but this is somewhat rare, as in most cases you'll still be able to boot but one or more services may fail to start or your applications may not function as expected. Finally, virus infection can cause a system to fail to boot, but I've listed this in last place because I'm assuming you've got an antivirus solution in place and you're keeping the antivirus signature files updated, right?
Now that we know why Windows may fail to start properly, let's ask the logical next question: How can we know which of these underlying causes is the one that might be preventing Windows from successfully booting?
How to diagnose startup problems
Here is where we need to apply our brains and use a bit of common sense to determine what the cause of startup failure might be. Think of the list above as a list of disease-causing viruses, and now you have to play doctor and figure out which virus the patient (your sick computer) actually might have. If you skip this step and try blasting the patient with every possible remedy in your doctor's bag, two things may happen:
- One of the remedies you try may actually make the patient worse and indeed could prove fatal.
- You'll waste a lot of time and the recovery of your patient will be delayed, and your boss may get upset with you as a result since her business is losing money due to downtime.
So careful diagnosis is a step you should always take time for and never avoid, and just like in the medical profession such diagnosis usually begins with your senses. For example, do you smell something burning? Better unplug your system immediately and wait for things to cool off, then open the case and inspect the damage. Do you hear your CPU fan making a slow grinding sound? Power down your system and replace the fan before your processor burns out and needs replacing. Is your video display flickering? Maybe try reseating the video card after checking if the video cable is seated properly.
OK, let's assume it's not such a simple and obvious problem. Instead, say you get a black screen with one of the following dreaded messages when you try to boot your system:
- "NTLDR is missing"
- "A disk read error occurred"
- "Invalid partition table"
- "Error loading operating system"
- "Could not read from selected boot disk"
- "Windows could not start because the following file is missing or corrupt"
Or you might get a blue screen (called a STOP screen) with some obscure message on it. Or if you're lucky you might make it all the way through the Windows splash screen to the logon box and then suddenly get a dialog box saying "One or more services failed to start". Or your mouse pointer might freeze and your system hang either before or immediately after logon. How can you match these symptoms to the underlying condition that might be causing them? First let's look at the possible causes behind the "black screen" messages, which can occur after the BIOS POST routine finishes but before the Windows splash screen appears:
- Master boot record is corrupt due to hard disk errors or virus infection.
- Boot sector is corrupt due to hard disk errors or virus infection.
- Boot.ini file is corrupt, missing, or needs updating.
- Boot volume is corrupt or the referenced system file is missing.
In addition to these error messages, a variety of other startup problems can occur including:
- Blue screens. These are typically caused by hardware failure or driver problems but can also be due to virus infection.
- Hung system. These are typically caused by buggy drivers or by registry corruption but can also be due to virus infection.
- Dialog box saying "One or more services failed to start". This is typically caused by misconfiguration or registry corruption but can also be caused by application incompatibility of some form.
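The symptom-to-cause matching described above can be kept as a small lookup table. The following is only a minimal sketch in Python: the names `SYMPTOM_CAUSES` and `likely_causes` are hypothetical, and the entries simply restate the three symptoms listed above, with the typical causes first and the less common ones last.

```python
# Hypothetical lookup table restating the symptom list above.
# Each entry: (typical causes, less common causes).
SYMPTOM_CAUSES = {
    "blue screen": (
        ["hardware failure", "driver problem"],
        ["virus infection"],
    ),
    "hung system": (
        ["buggy driver", "registry corruption"],
        ["virus infection"],
    ),
    "one or more services failed to start": (
        ["misconfiguration", "registry corruption"],
        ["application incompatibility"],
    ),
}


def likely_causes(symptom):
    """Return candidate causes for a symptom, typical ones first.

    Returns an empty list for symptoms the table does not cover.
    """
    typical, occasional = SYMPTOM_CAUSES.get(symptom.strip().lower(), ([], []))
    return typical + occasional


if __name__ == "__main__":
    for cause in likely_causes("Blue screen"):
        print(cause)
```

A table like this is no substitute for diagnosis; it only records which suspects to question first.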
So what should you do to resolve such problems?
How to resolve startup problems
Just as a doctor has a mysterious black bag full of medical instruments (at least in old movies on TV), the system administrator has a set of tools provided by Microsoft for resolving startup problems like the ones described above. In a nutshell, here's a quick inventory of the main tools:
- Last known good. Restores the HKLM\System\CurrentControlSet portion of the registry to its version during the last successful logon to the system.
- Safe mode. Starts Windows with a minimal set of drivers and creates a record of which drivers load in %windir%\Ntbtlog.txt.
- System Restore. A Windows XP-only feature that restores the system to a previously saved configuration.
- Recovery Console. Boots to a command line that allows you to run various commands; see this article by Johannes Helmig for more info.
- Automated System Recovery (ASR). Restores the boot volume from backup; see this article by Johannes Helmig for more info.
- Repair. Run Windows Setup from your product CD and select the option to try to repair your installation.
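Since safe mode records which drivers load in %windir%\Ntbtlog.txt, that log can be scanned for drivers that failed to load. The sketch below is a rough illustration in Python, not an official tool: the function name `split_boot_log` and the sample driver paths are made up for the example, and it assumes the log marks each entry with a "Loaded driver" or "Did not load driver" prefix.

```python
LOADED_PREFIX = "Loaded driver "
NOT_LOADED_PREFIX = "Did not load driver "


def split_boot_log(lines):
    """Split boot log lines into (loaded, not_loaded) lists of driver paths.

    Lines that match neither prefix (headers, blank lines) are ignored.
    """
    loaded, not_loaded = [], []
    for raw in lines:
        line = raw.strip()
        if line.startswith(NOT_LOADED_PREFIX):
            not_loaded.append(line[len(NOT_LOADED_PREFIX):])
        elif line.startswith(LOADED_PREFIX):
            loaded.append(line[len(LOADED_PREFIX):])
    return loaded, not_loaded


# Hypothetical sample lines, for illustration only.
SAMPLE_LOG = [
    "Loaded driver \\WINDOWS\\system32\\ntoskrnl.exe",
    "Loaded driver \\WINDOWS\\system32\\hal.dll",
    "Did not load driver \\SystemRoot\\System32\\Drivers\\example.sys",
    "",
]

if __name__ == "__main__":
    loaded, not_loaded = split_boot_log(SAMPLE_LOG)
    print(len(loaded), "drivers loaded")
    for path in not_loaded:
        print("not loaded:", path)
```

To run this against a real log, read the file into a list of lines first. The file's text encoding can vary, so treat whatever encoding you pass to `open()` as something to verify on your own system.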
Which tool should you use to address each of the symptoms we described earlier? Assuming there is no obvious hardware problem (no funny smell) and you've already asked yourself the Golden Question ("What was the last thing I did to this system?") then here's a quick outline that maps the type of knife (may be several in order of severity) to the kind of surgery (underlying problem or visible symptom) you need to perform on your system:
Problem or symptom: Tool(s) to use
- Corrupt master boot record: Recovery Console (fixmbr)
- Corrupt boot sector: Recovery Console (fixboot)
- Corrupt or missing boot.ini: Recovery Console (bootcfg /rebuild)
- Corrupt system file: Recovery Console (chkdsk)
- Corrupt boot volume: Recovery Console (chkdsk)
- Blue screen: See this resource first
- Hung system: Last known good
- "One or more services failed to start": Don't log on! Reboot and select last known good, log on, undo the last configuration steps you performed.
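The Recovery Console entries in the outline above fit the note-card idea from the introduction well, since each problem maps to a single command. Here is a minimal sketch in Python, with the hypothetical names `RECOVERY_CONSOLE_FIXES` and `suggest_command`; it covers only the four problems whose commands are named explicitly above and looks them up, without running anything.

```python
# Hypothetical note card: problem -> Recovery Console command,
# limited to the pairs named explicitly in the outline above.
RECOVERY_CONSOLE_FIXES = {
    "corrupt master boot record": "fixmbr",
    "corrupt boot sector": "fixboot",
    "corrupt or missing boot.ini": "bootcfg /rebuild",
    "corrupt system file": "chkdsk",
}


def suggest_command(problem):
    """Return the Recovery Console command for a problem, or None if unknown."""
    return RECOVERY_CONSOLE_FIXES.get(problem.strip().lower())


if __name__ == "__main__":
    for problem in RECOVERY_CONSOLE_FIXES:
        print(problem, "->", suggest_command(problem))
```

Returning None for an unknown problem is deliberate: when the symptom isn't on the card, the right move is more diagnosis, not a guessed command.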
A good physician understands diseases, can recognize their symptoms, and knows how best to treat them. Fortunately if you toast your system while trying to repair a startup problem, you can always buy another. But by using a combination of common sense and an understanding of the causes of, symptoms of, and tools for treating startup problems, you can usually bring your systems back to perfect health and earn your pay as a system administrator!