Troubleshooting Kerberos in a SharePoint environment (Part 1)

Introduction

If you missed my article, Kerberos in a SharePoint environment, which explains the Kerberos configuration and logon process, please read it first for a better understanding of the base configuration and of what happens when a user accesses the website.

After I wrote the previous article, some readers asked me how to troubleshoot the different error messages they were getting. It can be difficult to pinpoint exactly what an error means, and going through the whole configuration again will not always reveal the problem. You may end up spending a lot of time searching the internet before you find the correct answer.

This is not a guide to all Kerberos-related errors. Instead, I will set up a test environment and deliberately introduce configuration problems to show which error messages each one produces. Sometimes the error messages in the server event logs are obvious; at other times you need to investigate the event logs on several servers and even use a network packet sniffer.

The setup

The demo lab has the following computers:

DC1     Domain Controller (KDC)
SQL1    SQL Server 2008
WSS1    Windows SharePoint Services 3.0 SP1 (+ Infrastructure Update)
PC1     Windows Vista


Figure 1

Service Principal Names (SPNs) and delegation are configured as shown in the following table.


Figure 2

Where is the toolbox?

To troubleshoot errors we need a set of tools. This article series uses only some of them, but for your convenience here are my suggestions:

  • Windows event logs on clients and servers
  • IIS log files on web frontends, SQL servers and Domain Controllers
  • SharePoint log files
  • Command line tools
    setspn (from the Windows Server Resource Kit; Windows Server 2008 has this by default)
    ldifde
    KList (from the Windows Server Resource Kit; Windows Server 2008 has this by default)
  • GUI tools
    KerbTray (from the Windows 2000 Server Resource Kit; works on all Windows versions)
    ADSIEdit
    Network Monitor
    WireShark network packet analyzer

Some useful commands while testing are the ones that clear cached data:

  • DNS cache: ipconfig /flushdns
  • NetBIOS name cache: nbtstat -R
  • Kerberos tickets: klist purge
    (We can also view the Kerberos tickets of the interactively logged-on user with KerbTray)
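
If you repeat the same test many times, it can be convenient to run the three clearing commands together. The following is only a small sketch in Python; the function name clear_caches and the dry_run switch are my own inventions for illustration, and the commands themselves only work on a Windows machine where the tools are present.

```python
import subprocess

# Cache-clearing commands from the list above, keyed by what they clear.
CLEAR_COMMANDS = {
    "dns": ["ipconfig", "/flushdns"],
    "netbios": ["nbtstat", "-R"],
    "kerberos": ["klist", "purge"],
}


def clear_caches(names=("dns", "netbios", "kerberos"), dry_run=True):
    """Return the command lines for the requested caches; run them unless dry_run."""
    lines = []
    for name in names:
        argv = CLEAR_COMMANDS[name]
        lines.append(" ".join(argv))
        if not dry_run:
            # Only meaningful on a Windows machine where these tools exist.
            subprocess.run(argv, check=True)
    return lines
```

With the default dry_run=True the function only returns the command lines, so you can check what would be run before calling it with dry_run=False.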

When analyzing the Kerberos logon procedure, it is very handy to have the following table of the actions.


Figure 3

The problems to investigate

Some problems show up more often than others on customer servers, so this article covers the following:

  • Date and time
  • Application pool accounts
  • SPN configuration

In this part of the article series we will look into what can be seen in the Windows event log files and a network protocol analyzer for every problem we create.

Date and time

Date and time are an essential part of Kerberos authentication because the tickets issued by the Key Distribution Center (KDC) are valid only for a limited period of time. If the clocks on clients and servers are not in sync, validation of the tickets fails; this is a deliberate part of the security design. It is therefore very important to check that all clients and servers have the correct time and time zone settings. In this exercise we will take a look at date and time problems.
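
To make the idea of a tolerance concrete, here is a minimal sketch in Python. It assumes the default Kerberos policy of a five-minute maximum clock difference; your domain policy may use another value. The function name and the example times are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumption: the default Kerberos tolerance of five minutes is in effect.
MAX_SKEW = timedelta(minutes=5)


def within_skew(local_time, reference_time, max_skew=MAX_SKEW):
    """True if the two clocks differ by no more than max_skew."""
    return abs(local_time - reference_time) <= max_skew


dc_time = datetime(2009, 1, 15, 12, 0, 0)  # made-up reference time on the DC
print(within_skew(dc_time + timedelta(minutes=3), dc_time))  # True
print(within_skew(dc_time + timedelta(hours=24), dc_time))   # False
```

A server that is 24 hours off, as in the test that follows, is far outside the tolerance, so ticket validation cannot succeed until the clock is corrected.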

Time difference on the SharePoint server

I configure the SharePoint server WSS1 to have a 24-hour time difference, and the following warning appears in the Windows System event log.

Warning, W32Time, Event ID: 52, Category: None
The time service has set the time with offset -86391 seconds

Usually the servers synchronize the time automatically and the errors go away. That was the case in my test here – so no administrator interaction was necessary.

Sometimes domain controllers have time synchronization problems. I test this by changing the time on the domain controller, and Kerberos reports it with LSASRV event ID 40960 in the System event log.

Warning, LSASRV, Event ID: 40960, Category: SPNEGO (Negotiator)
The security System detected an authentication error for the server MSSQLSvc/sql1.domain.local:1433. The failure code from authentication protocol Kerberos was “The time at the Primary Domain Controller is different than the time at the Backup Domain Controller or member server by too large an amount. (0xc0000133)”.

Date and time errors are easy to detect and correct: adjust the time, or open the required ports in the firewalls if the time synchronization packets are being dropped. In virtual environments, time synchronization can cause larger problems, because the virtual hardware clock of a virtual machine can differ from the clocks on other virtual server hosts.

Application pool accounts

The IIS websites for the web applications are configured automatically by SharePoint, and when creating a web application you choose or add an application pool. The web application runs in this pool under the pool's configured identity (user account).

Changing the application pool account manually

The websites run in IIS application pools, and these are not meant to be configured manually. If an administrator changes the identity of an application pool to the wrong account, the website can become unavailable. Adjusting the identity can still be necessary if the account changes.

I will try changing the application pool account to domain\spwrongacct for our website http://intranet.domain.local.


Figure 4

This will cause these errors in the Windows System event log on the SharePoint server:

Warning, W3SVC, Event ID: 1012, Category: None
The identity of application pool, ‘SharePoint – intranet.domain.local – 80’ is invalid.  If it remains invalid when the first request for the application pool is processed, the application pool will be disabled.  The data field contains the error number.

Warning, W3SVC, Event ID: 1057, Category: None
The identity of application pool ‘SharePoint – intranet.domain.local – 80’ is invalid, so the World Wide Web Publishing Service cannot create a worker process to serve the application pool.  Therefore, the application pool has been disabled.

Error, W3SVC, Event ID: 1059
A failure was encountered while launching the process serving application pool ‘SharePoint – intranet.domain.local – 80’. The application pool has been disabled.

The error shown on the client computer accessing the website is: Service Unavailable

To correct the error, change the account back to the one configured in SharePoint and start the application pool again from the IIS management console. If you need to change the user or password in the SharePoint configuration, please follow the steps in the corresponding Microsoft Knowledge Base article.

Service Principal Name (SPN) configuration

Correct SPN configuration is also very important for Kerberos authentication to work. First I will summarize how SPNs are used between the client and the server.

  1. The user types a URL in Internet Explorer (e.g. http://intranet.domain.local)
  2. The client browser constructs the SPN, which contains the service type and the name of the host
    (SPN: HTTP/intranet.domain.local – Service type: HTTP, Name: intranet.domain.local)
  3. The client sends a request to the KDC to get a ticket for this SPN
  4. The KDC encrypts the ticket using the secret key of the account the SPN is registered on (domain\spcontentpoolacct) and sends the ticket to the client
  5. The client authenticates with the SharePoint web frontend by sending the ticket
  6. The SharePoint server decrypts the ticket with the key of the application pool account (its identity) and checks the content
  7. The user is authenticated, or an error message is sent to the client browser/event log
  8. If the user fails the Kerberos authentication, NTLM authentication is attempted
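
Step 2 can be sketched in a few lines of Python. This is only an illustration of the naming rule, not what the browser actually executes: the function name spn_for_url is hypothetical, and the sketch ignores port numbers and DNS aliases.

```python
from urllib.parse import urlsplit


def spn_for_url(url):
    """Build the HTTP/<host> SPN a browser would request for this URL."""
    host = urlsplit(url).hostname
    if not host:
        raise ValueError(f"no host name in URL: {url!r}")
    # Web sites use the HTTP service class for both http:// and https://.
    return f"HTTP/{host}"


print(spn_for_url("http://intranet.domain.local"))  # HTTP/intranet.domain.local
```

The result is the name you should expect to find registered with setspn on the application pool account.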

Missing SPN for the web application

I will see what happens when the client cannot get a ticket from the KDC by removing the SPN mapping from the account.

Remove the SPN from the account: SETSPN -D HTTP/intranet.domain.local domain\spwrongpoolacct

Then we access the website from PC1, http://intranet.domain.local, and we get the website's default page. But how did we authenticate?

If we check the Windows event log on the client, we do not see any entries. In the Windows Security event log on the SharePoint server, however, we see the following:

Audit Success, Event ID: 4624, Category: Logon

Logon process: NtLmSsp
Authentication Package: NTLM

So the Kerberos logon failed and authentication continued with NTLM, because the two are negotiated in that order. To investigate why this happened, we can enable more Kerberos logging on the client and server, or use a packet sniffer. Most of the time I use a sniffer called Wireshark, and I start by installing and running it on the client. Capturing the process above gives this output:


Figure 5

Because the SPN is missing, Active Directory sends a KDC_ERR_S_PRINCIPAL_UNKNOWN error. This message says that Active Directory cannot find a matching SPN for this website.

Configuring the wrong account in Active Directory for the SPN

If the decryption key does not match in step 6, the ticket was encrypted with the key of another account and the configuration has an error somewhere. Let us register the SPN on the wrong account and see the result.

Remove the SPN from the correct account: SETSPN -D HTTP/intranet.domain.local domain\spcontentpoolacct
Add the SPN to the wrong account: SETSPN -A HTTP/intranet.domain.local domain\spwrongpoolacct

If we analyze the packets from the SharePoint server, we see this communication when we run iisreset /noforce and then access the web application.


Figure 6

The SharePoint server gets the Kerberos information from the KDC and uses it to decrypt the ticket. If it does not match, it generates an error that is sent to the client.

In the Windows System event log of the client we see this error:

Error, Event ID: 4, Category: None
The Kerberos client received a KRB_AP_ERR_MODIFIED error from the server wss1$. The target name used was HTTP/intranet.domain.local. This indicates that the target server failed to decrypt the ticket provided by the client. This can occur when the target server principal name (SPN) is registered on an account other than the account the target service is using. Please ensure that the target SPN is registered on, and only registered on, the account used by the server. This error can also happen when the target service is using a different password for the target service account than what the Kerberos Key Distribution Center (KDC) has for the target service account. Please ensure that the service on the server and the KDC are both updated to use the current password. If the server name is not fully qualified, and the target domain (DOMAIN.LOCAL) is different from the client domain (DOMAIN.LOCAL), check if there are identically named server accounts in these two domains, or use the fully-qualified name to identify the server.

When the web front-end tries to decrypt the service ticket, the key does not match: the KDC encrypted the ticket with the key of the account that holds the SPN (domain\spwrongpoolacct), while the web front-end decrypts it with the key of the application pool account (domain\spcontentpoolacct). Kerberos uses symmetric keys derived from the account passwords, so both sides must use the same account. The error KRB_AP_ERR_MODIFIED is sent to the client and appears in its Windows System event log.
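
The key mismatch can be shown with a toy model in Python. To be clear, this is not real Kerberos cryptography: I use a SHA-256 hash and an HMAC seal from the standard library only to illustrate that the KDC works with the key of whichever account holds the SPN, while the service checks with the key of the account the application pool runs as. All function names and passwords are made up.

```python
import hashlib
import hmac


def account_key(password):
    """Toy stand-in for the long-term key Kerberos derives from a password."""
    return hashlib.sha256(password.encode("utf-8")).digest()


def kdc_issue_ticket(payload, spn_account_password):
    """KDC side: seal the ticket with the key of the account that owns the SPN."""
    key = account_key(spn_account_password)
    seal = hmac.new(key, payload, hashlib.sha256).digest()
    return payload, seal


def service_accept_ticket(ticket, app_pool_password):
    """Service side: check the seal with the application pool account's key."""
    payload, seal = ticket
    key = account_key(app_pool_password)
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return "OK" if hmac.compare_digest(seal, expected) else "KRB_AP_ERR_MODIFIED"


# SPN registered on one account, application pool running as another.
ticket = kdc_issue_ticket(b"user@DOMAIN.LOCAL", "wrong-pool-password")
print(service_accept_ticket(ticket, "content-pool-password"))  # KRB_AP_ERR_MODIFIED

# SPN and application pool on the same account.
ticket = kdc_issue_ticket(b"user@DOMAIN.LOCAL", "content-pool-password")
print(service_accept_ticket(ticket, "content-pool-password"))  # OK
```

The same model also shows why an out-of-date password on the service side gives the same error: a different password means a different key.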

To restore the environment, register the SPN on the domain\spcontentpoolacct account again:

Remove the SPN from the wrong account: SETSPN -D HTTP/intranet.domain.local domain\spwrongpoolacct
Add the SPN to the correct account: SETSPN -A HTTP/intranet.domain.local domain\spcontentpoolacct
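
Moving an SPN always follows the same delete-then-add pattern, so the two command lines can be generated. The helper below is a hypothetical convenience of mine that only builds the text of the commands; it does not run setspn.

```python
def move_spn_commands(spn, from_account, to_account):
    """Return the setspn command lines that move an SPN between accounts."""
    return [
        f"SETSPN -D {spn} {from_account}",  # delete from the old account first
        f"SETSPN -A {spn} {to_account}",    # then add to the new account
    ]


for line in move_spn_commands(
    "HTTP/intranet.domain.local",
    r"domain\spwrongpoolacct",
    r"domain\spcontentpoolacct",
):
    print(line)
```

Deleting before adding avoids a moment where the same SPN is registered on two accounts at once.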

Note: The KRB_AP_ERR_MODIFIED error can also be caused by other misconfigurations.
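
The error codes seen in this part can be collected in a small lookup that suggests where to look first. This is a sketch based only on the tests above; the dictionary and function names are my own, and as the note says, the same code can have other causes.

```python
# Error indicators from this article, mapped to the first thing to check.
FIRST_CHECKS = {
    "0xc0000133": "Clock difference between this machine and the domain controller",
    "KDC_ERR_S_PRINCIPAL_UNKNOWN": "SPN is not registered on any account",
    "KRB_AP_ERR_MODIFIED": (
        "SPN is registered on an account other than the application pool "
        "identity, or the service account password is out of date"
    ),
}


def first_check(log_text):
    """Return the suggested first check for every known indicator in log_text."""
    lowered = log_text.lower()
    return [hint for code, hint in FIRST_CHECKS.items() if code.lower() in lowered]


event = "The Kerberos client received a KRB_AP_ERR_MODIFIED error from the server wss1$."
print(first_check(event))
```

Paste the text of an event log entry or a packet capture summary into first_check to get the matching hints; an empty list means none of the three indicators was found.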

Conclusion

We have now set up a test environment, identified some tools, and generated error messages for date/time, application pool account and SPN configuration problems. These should help us recognize the same problems when they occur in a production environment.

In the following parts of this series I will cover typical problems such as

  • Duplicate Service Principal Names
  • DNS Configuration mismatch
  • Delegation, when is it used and how to check it
  • Shared Service Provider (SSP), is it Kerberized?
  • More investigation with the network packet analyzer
