In the always-on technology world we live in, time is everything. You cannot go back and change time, but time skew plays a big role and, if not kept in check, can cause you a big headache. In your Exchange Server environment, you might have a domain controller set up to get its time from an external source, with all the other machines on the network pointing to that domain controller for their time. This is nothing new: time plays a very important part in your domain. If you have a completely virtualized environment, the hypervisors themselves are another key area where time must be set correctly. You may be thinking, “I let it pull from the BIOS, so what’s the problem?” The problem is that if the BIOS battery starts degrading or fails, the system clock on that hypervisor drifts, and that wreaks havoc on the virtual machines running on it.
It’s about time
Yes, it does cause havoc if your Exchange Server’s time is out of whack. Even though your Exchange Servers pull their time from the domain controller, I have seen instances where the clock is five minutes or more ahead or behind, or where the host itself defaults to UTC. If your time zone is not UTC, you will be off by the full difference between your time zone and UTC. So how does this affect your users and your environment?
In an Exchange 2013, Exchange 2016, or Exchange 2019 environment, if the time is out, clients may experience the following issues:
- Seeing certificate pop-ups even though you have a valid certificate. This happens when the time is ahead.
- Unable to log in to Outlook. You can enter your password as many times as you like and it just does not log you in. If you run a Fiddler trace on this, you will see an HTTP 302 response and then the OAuth error.
- Event logs on the domain controller are out of sequence. You may see entries for 9 a.m., then a jump to 12 p.m., then entries back at 10 a.m. once the issue is resolved.
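The out-of-sequence pattern in the last symptom can be checked mechanically. Below is a minimal sketch, assuming you already have the log timestamps in the order the events were written (for example, sorted by record number rather than by date). The function name and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def find_backward_jumps(timestamps, tolerance=timedelta(0)):
    """Return (index, previous, current) for every entry whose timestamp is
    earlier than the entry written just before it by more than `tolerance`."""
    jumps = []
    for i in range(1, len(timestamps)):
        prev, curr = timestamps[i - 1], timestamps[i]
        if prev - curr > tolerance:
            jumps.append((i, prev, curr))
    return jumps


# Hypothetical timestamps in write order, mirroring 9 a.m. -> 12 p.m. -> 10 a.m.
logged = [
    datetime(2024, 5, 6, 9, 0),
    datetime(2024, 5, 6, 9, 15),
    datetime(2024, 5, 6, 12, 0),  # clock has jumped ahead
    datetime(2024, 5, 6, 10, 0),  # clock corrected, so time goes backwards
    datetime(2024, 5, 6, 10, 5),
]

for index, prev, curr in find_backward_jumps(logged):
    print(f"Entry {index}: {curr:%H:%M} logged after {prev:%H:%M}")
```

Forward jumps such as 9:15 to 12:00 are deliberately not flagged, because a gap can simply mean a quiet period. A backward jump is the less ambiguous sign that the clock was wrong and then corrected.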
These are some of the symptoms you may notice when the time is ahead or behind, and they have a strange ripple effect. If your hypervisor platform, such as VMware, moves virtual machines between hosts to balance load, the machines on the host with the bad clock affect your users, but once they are moved elsewhere the problem suddenly disappears. That makes troubleshooting harder if you do not know what you are looking for. A few hours later, when other machines get moved around, the symptoms start appearing again, and you end up in a vicious circle.
Troubleshooting Exchange Server time issues
Since I mentioned Fiddler earlier, start there: if you are not sure where to begin, start with the machine of a user who is having problems. Download Fiddler, start it, and enable HTTPS decryption so you can inspect the encrypted traffic and see what is happening. When you launch Outlook with Fiddler running, you will see the HTTP 302 response followed by the OAuth error.
Once you start seeing those, head over to your servers. Start with the servers behind your load balancer, or the front-end servers that are accessed from outside. Log in, open Event Viewer, and look at the Application log. Filter by Critical, Error, and Warning, and go through the errors. If you do not see anything, move on to the next server.
Once you have found an affected server, you should see Event ID 1035 reporting the time skew. The event contains the following text:
- Unexpected Auth Blob Check For Clock Skew
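Event ID 1035 is not logged only for clock skew (failed logins use it too), so matching on the ID alone is not enough. Here is a minimal sketch of the filtering step, assuming the Application log entries have already been collected as records with an event ID and message text. The `EventRecord` type, the `clock_skew_events` function, and the sample messages are hypothetical placeholders, not real Exchange log output.

```python
from dataclasses import dataclass

# Text of the clock-skew variant of Event ID 1035, as quoted above.
CLOCK_SKEW_TEXT = "Unexpected Auth Blob Check For Clock Skew"


@dataclass
class EventRecord:
    """Hypothetical minimal view of an Application log entry."""
    event_id: int
    message: str


def clock_skew_events(records):
    """Return only the Event ID 1035 entries whose message mentions the clock-skew check."""
    needle = CLOCK_SKEW_TEXT.lower()
    return [r for r in records if r.event_id == 1035 and needle in r.message.lower()]


# Placeholder messages for illustration only; they are not real Exchange log text.
sample = [
    EventRecord(1035, "placeholder: failed login"),
    EventRecord(1035, "placeholder: Unexpected Auth Blob Check For Clock Skew"),
    EventRecord(2000, "placeholder: unrelated event"),
]

for record in clock_skew_events(sample):
    print(record.event_id, record.message)
```

The match is case-insensitive so that minor differences in capitalization do not cause a miss.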
Now that you have found your first error, the other servers on the affected host will most likely have it too. Your next step is to go through each host on your hypervisor platform and point it at an NTP server or a domain controller for its time, which should resolve the issues mentioned above. You may also need to reboot the affected servers, because services that have already run into trouble with the skew do not always recover on their own, and a reboot is the easiest way to clear them.
Time skew can also show up elsewhere: if you are running in coexistence with Exchange 2010, you may find that the Exchange 2010 Hub Transport servers report they cannot authenticate to the Exchange 2016 servers because of the time skew. It is just something else to look out for.
Next, consider putting a PowerShell script in place that checks the time on each server and reports any server that is ahead or behind. The quicker you catch it, the less impact it will have on end users. You could also configure your monitoring tools to watch for that specific event and log a ticket, or create a scheduled task that runs a PowerShell script and sends an email when Event ID 1035 appears. Be aware that Event ID 1035 is logged for more than one condition: you will see it for failed logins as well as for time skew.
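The comparison such a script performs can be sketched in a few lines. This is a Python illustration of the logic only, not the author's PowerShell script: it assumes you have already collected each server's current time (for example with `w32tm /stripchart`) along with a reference time from a domain controller. The server names, readings, and function name are hypothetical, and the five-minute threshold mirrors the default Kerberos clock tolerance in Active Directory.

```python
from datetime import datetime, timedelta, timezone

# Five minutes mirrors the default Kerberos clock tolerance; adjust as needed.
MAX_SKEW = timedelta(minutes=5)


def skew_report(reference, server_times, max_skew=MAX_SKEW):
    """Return {server: skew} for every server whose clock differs from the
    reference by max_skew or more. Positive skew means the server is ahead."""
    report = {}
    for server, server_time in server_times.items():
        skew = server_time - reference
        if abs(skew) >= max_skew:
            report[server] = skew
    return report


# Hypothetical readings; in practice the reference comes from a domain controller.
reference = datetime(2024, 5, 6, 9, 0, tzinfo=timezone.utc)
readings = {
    "EX01": reference + timedelta(seconds=20),  # within tolerance
    "EX02": reference + timedelta(minutes=7),   # ahead
    "EX03": reference - timedelta(minutes=6),   # behind
}

for server, skew in sorted(skew_report(reference, readings).items()):
    direction = "ahead" if skew > timedelta(0) else "behind"
    print(f"{server}: {abs(skew)} {direction}")
```

Using timezone-aware datetimes keeps the comparison honest when a host has defaulted to UTC while the others run in local time. A scheduled task could send the resulting report by email instead of printing it.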
Email services carry demanding SLAs today, so you need to ensure that you can keep to your agreement. That is always easier said than done, especially when hunting down this issue in your environment takes a couple of hours or more out of your day.