Troubleshooting Logon Problems
Logging into a computer is such a routine part of the day that it is easy to not even think about the login process. Even so, things can and occasionally do go wrong when users log into Windows. In this article, I will talk about some of the things that can cause logon failures, and show you how to get around those problems.
Before I Begin
Before I get started, I just want to quickly mention that in order to provide as much useful information as possible, I am going to avoid talking about the most obvious causes of logon failures. This article assumes that before you begin the troubleshooting process, you have checked to make sure that the user is entering the correct password, the user's password has not expired, and that there are no basic communications problems between the workstation and the domain controller.
The System Clock
It may seem odd, but a workstation's clock can actually be the cause of a logon failure. If the clock is more than five minutes different from the time on your domain controllers, then the logon will fail.
In case you are wondering, the reason for this has to do with the Kerberos authentication protocol. At the beginning of the authentication process, the user enters their username and password. The workstation then sends a Kerberos Authentication Server Request to a the Key Distribution Server. This Kerberos Authentication Server Request contains several different pieces of information, including:
- The user’s identification
- The name of the service that the user is requesting (in this case it’s the Ticket Getting Service)
- An authenticator that is encrypted with the user’s master key. The user’s master key is derived by encrypting the user’s password using a one way function.
When the Key Distribution Server receives the request, it looks up the user’s Active Directory account. It then calculates the user’s master key and uses it to decrypt the authenticator (also known as pre authentication data).
When the user’s workstation created the authenticator, it placed a time stamp within the encrypted file. Once the Key Distribution Server decrypts this file, it compares the time stamp to the current time on its own clock. If the time stamp and the current time are within five minutes of each other, then the Kerberos Authentication Server Request is assumed to be valid, and the authentication process continues. If the time stamp and the current time are more than five minutes apart, then Kerberos assumes that the request is a replay of a previously captured packet, and therefore denies the logon request. When this happens, the following message is displayed:
The system cannot log you on due to the following error: There is a time difference between the client and server. Please try again or consult your system administrator.
The solution to the problem is simple; just set the workstation’s clock to match the domain controller’s clock.
Global Catalog Server Failures
Another major cause of logon problems is a global catalog server failure. A global catalog server is a domain controller that has been configured to act as a global catalog server. Global catalog servers contain a searchable representation of every object in every domain of the entire forest.
When the forest is initially created, the first domain controller that you bring online is automatically configured to act as a global catalog server. The problem is that this server can become a single point of failure, because Windows does not automatically designate any other domain controllers to act as global catalog servers. If the global catalog server fails, then only domain administrators will be able to log into the Active Directory.
Given the global catalog server’s importance, you should work to prevent global catalog server failures. Fortunately, you can designate any or all of your domain controllers to act as global catalog servers. Keep in mind though that you should only configure all of your domain controllers to act as global catalog servers if your forest consists of a single domain. Having multiple global catalog servers is a good idea even for forests with multiple domains, but figuring out which domain controllers should act as global catalog servers is something of an art form. You can find Microsoft’s recommendations here.
If your global catalog server has already failed, and nobody can log in, then the best thing that you can do is work to return the global catalog server to a functional state. There is a way of allowing users to log in even though the global catalog server is down, but there are security risks associated with doing so.
If the Active Directory is running in native mode, then the global catalog server is responsible for checking user’s universal group memberships. If you choose to allow users to logon during the failure, then universal group memberships will not be checked. If you have assigned explicit denials to members of certain universal groups, then those denials will not be in effect until the global catalog server is brought back online.
If you decide that you must allow users to log on, then you will have to edit the registry on each of your domain controllers. Keep in mind that editing the registry is dangerous, and that making a mistake can destroy Windows. I therefore recommend making a full system backup before continuing.
With that said, open the Registry Editor and navigate through the registry tree to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa. Now, create a new DWORD value named IgnoreGCFailures, and set the value to 1. You will have to restart the domain controller after making this change.
DNS Server Failure
If you suddenly find that none of your users can log into the network, and your domain controllers and global catalog servers seem to be functional, then a DNS server failure might have occurred. The Active Directory is completely dependent on the DNS services.
The DNS server contains host records for each computer on your network. The computers on your network use these host records to resolve computer names to IP addresses. If a DNS server failure occurs, then host name resolution will also fail, eventually impacting the logon process.
There are two things that you need to know about DNS failures in regard to troubleshooting logon problems. First, the logon failures may not happen immediately. The Windows operating system maintains a DNS cache, which includes the results of previous DNS queries. This cache prevents workstations from flooding DNS servers with name resolution requests for the same objects over and over.
In many cases, workstations will have cached the IP addresses of domain controllers and global catalog servers. Even so, items in the DNS cache do eventually expire and will need to be refreshed. You will most likely start noticing logon problems when cached host records begin to expire.
The other thing that you need to know about DNS server failures is that often times there are plenty of other symptoms besides logon failures. Unless machines on your network are configured to use a secondary DNS server in the event that the primary DNS server fails, the entire Active Directory environment will eventually come to a grinding halt. Although there are exceptions, generally speaking, the absence of a DNS server on an Active Directory network basically amounts to a total communications breakdown.
Although I have discussed some of the major causes of logon failures on Active Directory networks, an important part of the troubleshooting process is to look at how widespread the problem is. For example, if only a single host on a large network is having logon problems, then you can probably rule out DNS or global catalog failures. If a DNS or a global catalog failure were to blame, then the problem would most likely be much more wide spread. If the problem is isolated to a single machine, then the problem is most likely related to the machine’s configuration, connectivity, or to the user’s account.