Dealing with a Device Driver Disaster
A Buggy Network Card
One day last summer I was enjoying a leisurely day on my boat when I received a call from a rather panicked friend. A network card had failed in his server, so he replaced it, but the new card didn’t work either. Assuming that the replacement was bad, he tried another network card, but it was no use. I told my friend that it sounded as though his server’s system board might have a bad PCI slot and that he should try the card in a different slot. Five minutes later, my phone rang again and I knew that I was going to have to go help my friend.
An hour later I arrived at my friend’s office and he showed me the server that was misbehaving. I toyed around with it for a little while and realized that Windows was detecting the card and loading the driver, but the card just wasn’t working. I decided that perhaps the device driver was buggy and that I should check for a newer version. I asked my friend for the make and model of the network card, went to the manufacturer’s Web site, and started downloading the driver. As I waited for my download to complete, the cause of the problem hit me: Windows had misidentified the network card. When the card was installed, Windows correctly determined that the new device was a network card, but it identified the wrong manufacturer (and model) and consequently loaded an incorrect driver. As soon as I replaced the default driver with the one that I had downloaded, the card began to work.
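One way to catch this kind of misidentification is to compare the vendor and device codes in the card's PCI hardware ID against the IDs that the loaded driver claims to support. The following is only a minimal sketch in Python. The function names are hypothetical, and the ID values are illustrative; they were not taken from my friend's server.

```python
import re

# Matches the vendor and device fields of a PCI hardware ID string,
# for example PCI\VEN_10EC&DEV_8139&SUBSYS_813910EC&REV_10.
_PCI_ID = re.compile(r"VEN_([0-9A-Fa-f]{4})&DEV_([0-9A-Fa-f]{4})")


def parse_pci_id(hardware_id):
    """Return (vendor, device) as uppercase hex strings, or None if not found."""
    match = _PCI_ID.search(hardware_id)
    if match is None:
        return None
    return match.group(1).upper(), match.group(2).upper()


def driver_matches(hardware_id, supported_ids):
    """True if the card's (vendor, device) pair is in the driver's supported set.

    supported_ids is a hypothetical set of (vendor, device) pairs, such as
    one you might copy by hand from the driver's documentation.
    """
    parsed = parse_pci_id(hardware_id)
    return parsed is not None and parsed in supported_ids


# Illustrative values only.
card = r"PCI\VEN_10EC&DEV_8139&SUBSYS_813910EC&REV_10"
loaded_driver_ids = {("8086", "1229")}      # driver that Windows picked
downloaded_driver_ids = {("10EC", "8139")}  # driver from the manufacturer

print(driver_matches(card, loaded_driver_ids))
print(driver_matches(card, downloaded_driver_ids))
```

If the first check fails and the second succeeds, the driver that Windows chose was written for a different card, which is exactly the situation described above.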
Since that time, I have run into several other situations in which Windows had identified a hardware component incorrectly. Apparently, it isn’t all that uncommon for Windows to make mistakes when setting up new hardware. In the case of my friend’s system, the problem wasn’t exactly harmless (the server was unavailable to the users for a couple of hours), but it wasn’t devastating either. The problem could have been a lot worse.
In my friend’s case, the incorrect driver simply caused that particular device to not work. I have seen situations, though, in which an invalid or corrupt device driver resulted in a blue screen of death. This is especially common with video drivers.
How do you handle this?
The scary part is that most administrators are under a huge amount of pressure to keep their servers up to date with the latest patches and drivers. With such frequent updating, it stands to reason that sooner or later an administrator will download a buggy, corrupt, or simply incorrect device driver. That being the case, let’s pretend for a moment that you install an updated device driver onto one of your servers, and the server produces the infamous blue screen of death. What do you do about it?
The appropriate course of action depends on which operating system is running on your server. If you are running Windows NT 4.0, then all I can say is “good luck.” The only real tools that Microsoft gives you to deal with a situation like this in Windows NT 4.0 are the Last Known Good Configuration option and the VGA Mode option.
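The Last Known Good Configuration option still exists in Windows Server 2003. The idea is that after the operating system loads successfully, Windows takes a snapshot of the system’s configuration (including the device driver information). If a later change keeps the system from booting, you can tell Windows to start from that snapshot instead of the current configuration.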
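To make the snapshot idea a little more concrete: Windows keeps numbered control sets in the registry, and the values under HKEY_LOCAL_MACHINE\SYSTEM\Select record which numbered set plays which role. The Python sketch below only models that bookkeeping. The function names are hypothetical, the sample numbers are made up, and treating 0 as "no control set" is my assumption based on how the Failed value usually looks on a healthy system.

```python
# Roles recorded as values under HKEY_LOCAL_MACHINE\SYSTEM\Select.
ROLES = ("Current", "Default", "Failed", "LastKnownGood")


def control_set_name(number):
    """Format a Select value as a key name, e.g. 1 -> 'ControlSet001'.

    Assumption: 0 means that no control set is assigned to the role.
    """
    if number == 0:
        return None
    return "ControlSet{:03d}".format(number)


def describe_select(select_values):
    """Map each role that is present to the control set it points at."""
    return {
        role: control_set_name(select_values[role])
        for role in ROLES
        if role in select_values
    }


def rollback_target(select_values):
    """Control set that a Last Known Good boot would start from."""
    return control_set_name(select_values["LastKnownGood"])


# Made-up sample values for illustration.
sample = {"Current": 1, "Default": 1, "Failed": 0, "LastKnownGood": 2}
print(describe_select(sample))
print(rollback_target(sample))
```

On a live Windows system the same four values could be read with Python's winreg module rather than typed in by hand.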
The Last Known Good Configuration option was a good idea, but it almost never works. I can only think of one instance in the last ten years in which the Last Known Good Configuration option has worked for me in a real-life situation. There are a couple of reasons why this option doesn’t work so well.
For starters, the Last Known Good Configuration option only applies if the update that you performed requires a reboot. In that case, you reboot the system after the update, the server crashes, and you reboot again and select the Last Known Good Configuration option. Today though, the majority of device driver updates don’t require a reboot, so you never even get a chance to use the Last Known Good Configuration option if the driver update goes belly up.
Another reason why the Last Known Good Configuration option tends to be ineffective is that the option’s job is to help a system boot. Therefore, if a problem does not surface until after you log into Windows, the Last Known Good Configuration option is useless because Windows assumes that you are running a good configuration. An example of a problem that doesn’t surface until after login is an updated video driver. Windows uses the default resolution until a user logs on. At that time, Windows uses the video driver to set the video to the specified resolution. This is usually where the problems occur if a bad video driver is being used.
Of course Microsoft has a solution to this little dilemma too. The other recovery option that is available in Windows NT 4.0 and that is still available today is VGA mode. The idea behind VGA mode is that video problems present a bit of a catch-22. The video problems prevent you from being able to log all the way in, but you can’t fix the problem because you can’t log all the way in. VGA mode gets around this problem by forcing the video card to stay in VGA resolution. Since all modern video cards support VGA, Windows is able to load and you are able to correct the buggy driver.
In Windows NT 4.0, I have only been able to get past a driver-related blue screen of death using the Last Known Good Configuration or VGA mode about half of the time. The other times, I have had to use a product called ERD Commander from Winternals (http://www.winternals.com/Products/ERDCommander/). ERD Commander can be used for a lot of different tasks, but in a situation like this you can use it to reconfigure or disable a device driver from outside of the operating system. Windows doesn’t even have to be functioning because ERD Commander boots from its own self-contained mini operating system. In case you are wondering, this utility is also useful for making repairs to newer operating systems.
I could probably write an entire article on Windows NT-related driver repair techniques because it was so difficult to get past some driver-related problems in Windows NT. I would rather focus on Windows Server 2003 though. I only mentioned these techniques because they can be used on both Windows NT and Windows Server 2003, and because there are still some Windows NT Server deployments out there.
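For readers who wonder what disabling a driver from outside of the operating system amounts to, the general idea is that each driver has a Start value under its Services key in the registry, and a value of 4 tells Windows not to load it. The Python sketch below models that change on an in-memory dictionary. It is not ERD Commander's actual implementation, and the function name and the driver names are hypothetical.

```python
# Meaning of the Start value stored under a driver's Services key.
START_TYPES = {
    0: "boot",
    1: "system",
    2: "automatic",
    3: "manual",
    4: "disabled",
}
DISABLED = 4


def disable_driver(services, name):
    """Return a copy of the services mapping with one driver set to disabled.

    services is a hypothetical in-memory stand-in for the Services key of an
    offline SYSTEM hive: {driver_name: {"Start": int, ...}}.
    """
    if name not in services:
        raise KeyError("no such driver: " + name)
    updated = {key: dict(value) for key, value in services.items()}
    updated[name]["Start"] = DISABLED
    return updated


# Made-up driver names for illustration.
offline_services = {"badvideo": {"Start": 1}, "goodnic": {"Start": 3}}
repaired = disable_driver(offline_services, "badvideo")
print(START_TYPES[repaired["badvideo"]["Start"]])
```

Because the change is made to a copy, the original mapping stays intact, which mirrors the good habit of backing up a hive before editing it.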
If a Windows Server 2003 machine crashes due to a driver problem, the repair technique is much easier than the techniques that I have already talked about. One of the greatest recovery tools available is Safe Mode. When you boot Windows into Safe Mode, it runs in VGA mode with a minimal set of drivers and services running. You can’t really do a lot with Windows when it’s running in Safe Mode, but that really isn’t the point. The point is that Safe Mode allows you to boot Windows so that you can fix the problem.
OK, so Safe Mode allows you to boot Windows so that you can fix the problem, but how do you actually go about fixing the problem once you log into Windows? Well, you have a couple of options. Both of these options are available through the Device Manager (another nice tool that wasn’t available in Windows NT). You can access the Device Manager by selecting the System option in the Control Panel. When the System Properties sheet appears, go to the Hardware tab and click the Device Manager button. You will now see a list of the various devices in your system. Right-click the device that is malfunctioning and select the Properties command from the resulting shortcut menu. When you do, you will see the device’s properties sheet. Now, select the Driver tab, shown in Figure A.
Figure A: The Driver tab of a device’s properties sheet in the Device Manager allows you to correct device driver problems
As you can see in the figure, you have some options on this screen. If you know that the driver is incorrect or that you have a bad version, and you have a replacement available, then you can use the Update Driver button to load the replacement driver, or you can uninstall the existing driver and then load the replacement. If, on the other hand, the current driver version is causing the problem but the previous version worked fine, you can click the Roll Back Driver button to revert to the previous version.
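The choice between these buttons can be summarized as a small decision rule. The Python sketch below is just one way to write it down. The function name is hypothetical, and checking for a previous working version before checking for a replacement is my own ordering, chosen because rolling back requires no download.

```python
def choose_driver_fix(previous_version_worked, replacement_available):
    """Suggest which Driver tab button to try first.

    Returns the button label, or None when neither condition applies and
    you need to locate a suitable driver before doing anything else.
    """
    if previous_version_worked:
        # The old driver is still on the system, so reverting is quickest.
        return "Roll Back Driver"
    if replacement_available:
        # Wrong or bad driver, and a known replacement is on hand.
        return "Update Driver"
    return None


print(choose_driver_fix(previous_version_worked=True, replacement_available=False))
print(choose_driver_fix(previous_version_worked=False, replacement_available=True))
```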
In this article, I have talked about a number of different recovery options that can be used to recover Windows from a device driver failure. Most of these techniques involve booting Windows into a special mode (Safe Mode, VGA Mode, Last Known Good Configuration). You can access any of these modes by pressing the F8 key during the earliest phase of the Windows boot process. Doing so will cause Windows to display a boot menu that allows you to select these various options.