The Mystery of the Failing POP3 Access with ISA 2000
By Stefaan Pouseele
Last Update: 16/05/2005
Recently I investigated what at first sight seemed to be a simple POP3 access problem. Users were complaining that their access to an external POP3 server was frequently failing. After gathering some more information, it became clear that I had stumbled on a very strange access problem that needed thorough analysis. Moreover, it turned out that this access problem was not specific to POP3 but was in fact a potential problem for any TCP-based access handled by the Firewall client. If you want to know why this can happen and how to solve the problem, read on.
The network setup is a common back-to-back firewall configuration with a Checkpoint firewall as the outer and an ISA 2000 server as the inner firewall. The segment between both firewalls is used as a DMZ, where a POP3 server is located. This POP3 server is accessed by both internal and external clients. The internal clients are all configured as Web Proxy and Firewall clients because all outbound access must be authenticated. The ISA server is fully patched and runs ISA 2000 SP2 on Windows 2000 SP4 with all the latest critical updates. The POP3 server runs CommuniGate Pro on Windows 2003. The following figure summarizes the above scenario:
About 200 internal mail clients access the POP3 server in the DMZ every 5 minutes. They use a mix of different versions of Outlook, Outlook Express and Eudora, so we could quickly rule out a problem with any particular mail client. The reported Windows Sockets error code was Connection timed out (WSAETIMEDOUT 10060). The DMZ also contains a number of other servers, mostly Web servers. No problems were reported in accessing those servers.
To facilitate the investigation, we made sure we had a Windows XP SP2 host at our disposal on which we had local administrative rights. We closed all running programs and ran the following simple command file:
--- Begin ---
netsh diag connect iphost FQDN 110
--- End ---
This simple command file just establishes a TCP connection to the POP3 server on port 110. When the connection succeeds, it is closed normally. The success or failure of the connection is reported in the command window. Very rapidly we saw a number of connection failures, which proved that the mail clients were indeed not the culprits.
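For readers who want a repeatable probe loop outside netsh, here is a minimal Python sketch of the same test, assuming Python 3.9 or later on the client. It is not the tool used in this investigation; `probe` and `run_probes` are hypothetical helper names. Each probe completes a TCP handshake with the server and closes the socket again, and any failure is counted.

```python
import socket
import time


def probe(host: str, port: int = 110, timeout: float = 10.0) -> bool:
    """Complete a TCP handshake with host:port, then close the socket.

    Returns True on success and False on a timeout, refusal or any other
    socket error (all of which derive from OSError).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_probes(host: str, port: int = 110, count: int = 10,
               interval: float = 1.0, timeout: float = 10.0) -> list[bool]:
    """Repeat probe() count times, printing one result line per attempt."""
    results = []
    for attempt in range(1, count + 1):
        ok = probe(host, port, timeout)
        print(f"{attempt}: {'connected' if ok else 'FAILED'} {host}:{port}")
        results.append(ok)
        if attempt < count:
            time.sleep(interval)
    return results
```

For example, `run_probes("pop3.example.com", 110, count=100)` would report each attempt in turn; `pop3.example.com` is a placeholder for the real POP3 server FQDN.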
Next, we started a Network Monitor trace on the POP3 server itself with the parameters Buffer Size = 8 MByte, Frame Size = 128 Bytes and Capture Filter = from ISA External interface to POP3 server. We ran our simple command file again and let the Network Monitor trace run for about 15 minutes. Again we saw a number of connection failures. When analyzing the trace, we indeed found a number of connection requests from the ISA External interface with no response at all from the POP3 server. Further analysis revealed that the source port numbers used by the ISA External interface did not follow the normal incremental pattern of a typical Windows host. Moreover, we saw that ISA 2000 very rapidly reused previously used source ports. So we suspected that the failing POP3 connections were those for which the socket endpoint was still in the TIME_WAIT state on the POP3 server.
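The two symptoms in the trace, non-incremental source ports and fast reuse, can be checked mechanically once the SYN packets from the ISA External interface are exported as (timestamp, source port) pairs. The sketch below is a hypothetical Python helper, not a Network Monitor feature. It assumes the Windows default TIME_WAIT of 240 seconds (TcpTimedWaitDelay) and measures the reuse gap from the earlier SYN rather than from the connection close, so it errs on the side of reporting fewer conflicts.

```python
from dataclasses import dataclass

# Windows 2000/2003 default TIME_WAIT (TcpTimedWaitDelay = 240 seconds).
DEFAULT_TIME_WAIT_S = 240.0


@dataclass
class Syn:
    time_s: float   # capture timestamp of the SYN, in seconds
    src_port: int   # source port used by the ISA External interface


def find_fast_reuse(syns, time_wait_s=DEFAULT_TIME_WAIT_S):
    """List (port, earlier_time, later_time) for each SYN whose source port
    was already seen less than time_wait_s seconds before.

    Measuring from the earlier SYN (not from the close) under-reports, so
    every hit is a real collision candidate.
    """
    last_seen = {}
    hits = []
    for syn in sorted(syns, key=lambda s: s.time_s):
        prev = last_seen.get(syn.src_port)
        if prev is not None and syn.time_s - prev < time_wait_s:
            hits.append((syn.src_port, prev, syn.time_s))
        last_seen[syn.src_port] = syn.time_s
    return hits


def increasing_ratio(syns):
    """Fraction of consecutive SYNs whose source port went up.

    An incrementing allocator stays close to 1.0 (apart from wrap-around);
    a much lower value points to ports being handed out out of order.
    """
    ports = [s.src_port for s in sorted(syns, key=lambda s: s.time_s)]
    if len(ports) < 2:
        return 1.0
    ups = sum(1 for a, b in zip(ports, ports[1:]) if b > a)
    return ups / (len(ports) - 1)
```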
Note: for more information about the TCP/IP protocol and the meaning of the TIME_WAIT state, check out the following resources:
- TCP/IP Fundamentals for Microsoft Windows
- Microsoft Windows Server 2000 TCP/IP Implementation Details
- Microsoft Windows Server 2003 TCP/IP Implementation Details
- Microsoft Windows Server 2003 TCP/IP Protocols and Services Technical Reference Book
To be really sure we were on the right track, we installed Ethereal and Jim's excellent WinsockTool on the client. We then started Ethereal to analyze the Firewall Client Control Channel and used WinsockTool to manually make connections to the POP3 server. When a connection request timed out, we executed the command netstat -an | find "IP-Address:" on the POP3 server to get a listing of all the connection entries from the ISA External interface, and then stopped the Ethereal capture. By analyzing the Firewall Client Control Channel, we could easily determine which source port the ISA server had negotiated on its external interface for that connection. We then looked up that source port number in the output of the netstat command on the POP3 server. Every time we got a connection failure, the socket endpoint on the POP3 server was still in the TIME_WAIT state, which means it could not yet be reused at that time. This conclusively proved that the fast reuse of source port numbers by the ISA External interface was indeed the cause of the POP3 connection failures.
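The cross-check between the negotiated source port and the netstat listing can also be scripted. The following Python sketch parses `netstat -an` output captured on the POP3 server, keeps only TCP rows whose foreign address belongs to the ISA External interface, and reports whether a given source port is still in TIME_WAIT. It assumes the usual four-column Windows layout (Proto, Local Address, Foreign Address, State); the function names and the IP addresses in the sample are hypothetical.

```python
def connections_from(netstat_text, remote_ip):
    """Map foreign port -> list of TCP states for rows from remote_ip.

    Expects Windows 'netstat -an' rows: Proto, Local Address,
    Foreign Address, State.
    """
    states = {}
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) != 4 or fields[0].upper() != "TCP":
            continue  # skip headers, blank lines and UDP rows
        ip, _, port = fields[2].rpartition(":")
        if ip == remote_ip and port.isdigit():
            states.setdefault(int(port), []).append(fields[3])
    return states


def in_time_wait(netstat_text, remote_ip, src_port):
    """True if the connection from remote_ip:src_port is still in TIME_WAIT."""
    return "TIME_WAIT" in connections_from(netstat_text, remote_ip).get(src_port, [])


# Hypothetical capture: POP3 server 192.168.10.5, ISA External 192.168.10.1.
SAMPLE = """
Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    192.168.10.5:110       192.168.10.1:1100      TIME_WAIT
  TCP    192.168.10.5:110       192.168.10.1:1101      ESTABLISHED
  TCP    192.168.10.5:110       192.168.10.9:1100      TIME_WAIT
  UDP    0.0.0.0:161            *:*
"""
```

With this sample, `in_time_wait(SAMPLE, "192.168.10.1", 1100)` is true, which is exactly the situation in which a new SYN from that port would go unanswered.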
To further narrow down the problem, we disabled the Firewall client, made sure the client was also configured as a SecureNAT client and adapted the Firewall policy accordingly. We then started a Network Monitor trace on the POP3 server itself and reran our simple command file on the client. This time not a single failure was reported. Moreover, the Network Monitor trace showed that the source port numbers used by the ISA External interface did follow the normal incremental pattern of a typical Windows host, and that no fast reuse of source port numbers by the ISA External interface was happening.
As a consequence, the above problem is not bound to POP3 access only. It is in fact a potential problem for any TCP-based access handled by the Firewall client. It just crops up most often with POP3 access because of the polling mechanism implemented in the mail clients and the fact that they all connect to the same destination host.
Now that we knew the exact cause of the problem, we filed a report with Microsoft PSS and started looking for a workaround. We investigated three possible workarounds:
On the POP3 server, we changed the registry value TcpTimedWaitDelay (HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters) to its minimum of 30 seconds. Although this seemed to mitigate the problem, it didn't solve it completely. Moreover, we don't consider this a good solution because you don't have administrative access to and control over the POP3 server in every scenario.
The second workaround was to distribute, by some means, a custom client LAT file named Locallat.txt and place it in the Microsoft Firewall Client folder on each client computer. The custom client LAT file should contain the IP address of the POP3 server so that any TCP request to that destination would not be handled by the Firewall client. Of course, all internal hosts should then also be configured as SecureNAT clients, and the Firewall policy should allow anonymous access to the POP3 server. All internal clients were already configured as SecureNAT clients because the internal network was in fact a routed internal network. However, we didn't implement this workaround because distributing a custom client LAT file was too cumbersome for a temporary solution.
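To illustrate the LAT workaround, the sketch below builds a single-host entry and checks whether a destination would be treated as local. It assumes that each line of Locallat.txt holds a start and an end IPv4 address separated by whitespace, as in the ISA 2000 client LAT files; verify that format against your own Firewall client installation before distributing anything. The helper names and the POP3 server address 192.168.10.5 are hypothetical.

```python
import ipaddress


def lat_line_for_host(ip):
    """Build one LAT entry covering a single host (start == end).

    Assumed format: '<start IPv4> <end IPv4>' on one line.
    """
    addr = ipaddress.IPv4Address(ip)
    return f"{addr} {addr}"


def parse_lat(text):
    """Return a list of (start, end) IPv4Address pairs from LAT file text."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        start, end = (ipaddress.IPv4Address(f) for f in fields)
        ranges.append((start, end))
    return ranges


def treated_as_local(dest, lat_text):
    """True if dest falls inside a LAT range, so the Firewall client would
    not handle the connection; in this setup it then leaves as SecureNAT."""
    d = ipaddress.IPv4Address(dest)
    return any(start <= d <= end for start, end in parse_lat(lat_text))


# Hypothetical POP3 server address in the DMZ.
locallat = lat_line_for_host("192.168.10.5") + "\n"
print(locallat, end="")  # prints: 192.168.10.5 192.168.10.5
```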
The last workaround was to change the Firewall client configuration on the ISA server itself. Because all clients were already SecureNAT clients and the overwhelming majority of the mail clients were Outlook, we could easily disable Outlook in the Firewall client configuration on the ISA server. To do that, go to the Firewall Client Properties in the ISA MMC and select the Application Settings tab. There, select the application Outlook and change the key Disable to the value 1. After a manual refresh of the Firewall client, or at most after 6 hours (the default refresh period for the Firewall client), all clients will have the updated configuration. Of course, the Firewall policy should be changed to allow anonymous access to the POP3 server. This was the workaround we implemented.
After a couple of days we got the following answer from Microsoft PSS:
The reason the firewall does not respond is because the external POP3 server does not respond to the Firewall's sync packet. If a trace is taken leading up to this problem, you will find that the external port pair had been used only 2 minutes previously. So the port pair is sitting in a timed wait state on the POP3 server. There are registry settings on the ISA server to slow the port reuse down.
The following registry keys control the port pool:
- MinTickBeforePortReuse DWORD, default 1200, value is the number of milliseconds before reuse. As you can see, this is much less than the usual TIME_WAIT interval.
- MaxSocketsInAllPools DWORD, default 6000, total number of sockets that may be kept in the pool.
- MaxTimeInFreePool DWORD, default 60000, value is the number of milliseconds that a socket can remain in the pool and not be reused. Sockets that remain in the pool longer are released.
More information on the registry key:
- MaxTimeInFreePool should be greater than MinTickBeforePortReuse. But if setting MaxSocketsInAllPools works for you, don't try to modify the other values.
- The optimization is required only in cases where there is a high rate of connections per second, and low volume of traffic. It was added for a specific benchmark scenario that tests the number of connections per second.
- For the optimization to work, the following must hold:
Let R be the number of connections per second
MinTickBeforePortReuse / 1000 * R <= MaxSocketsInAllPools
If the number of sockets is smaller, there is a possibility that none of the sockets in the pool has aged enough and a new socket will be required.
With high values of R, the number of sockets will be too high and will consume all the non-paged pool.
The suggested resolution was:
- Fix #1: if MaxTimeInFreePool is set to 0, then MinTickBeforePortReuse is disabled.
- Fix #2: another way is to set MinTickBeforePortReuse to 4 minutes, which is 240000 milliseconds.
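The PSS sizing rule, MinTickBeforePortReuse / 1000 * R <= MaxSocketsInAllPools, can be turned into a quick check. A socket has to sit in the pool for MinTickBeforePortReuse / 1000 seconds before it may be reused, so at R new connections per second about that many seconds' worth of connections must be parked in the pool at once. The Python helpers below are hypothetical and simply evaluate the inequality in integer milliseconds to avoid rounding.

```python
def pool_condition_holds(min_tick_ms, rate_per_s, max_sockets):
    """MinTickBeforePortReuse / 1000 * R <= MaxSocketsInAllPools,
    rearranged as min_tick_ms * R <= max_sockets * 1000."""
    return min_tick_ms * rate_per_s <= max_sockets * 1000


def max_rate_per_s(min_tick_ms, max_sockets):
    """Largest R (connections per second) for which the condition holds."""
    return max_sockets * 1000 / min_tick_ms


def sockets_needed(min_tick_ms, rate_per_s):
    """Sockets that must be parked in the pool: MinTick / 1000 * R."""
    return min_tick_ms * rate_per_s / 1000
```

With the defaults (1200 ms, 6000 sockets) the condition holds up to 5000 connections per second; with MinTickBeforePortReuse = 240000 it holds only up to 25 connections per second, beyond which new sockets will be required. In this network, 200 clients polling every 5 minutes average about 0.67 connections per second, which at 240000 ms corresponds to roughly 160 pooled sockets.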
To implement the suggested resolution, we first needed to undo the implemented workaround. We then tried Fix #1 by adding the specific registry value MaxTimeInFreePool = 0 and rebooting the server. After running our simple command file on the client with a Network Monitor trace on the POP3 server, we still saw connection timeouts. Moreover, in the Network Monitor trace we still observed the unusual source port patterns. Therefore this resolution didn't solve the problem at all. So we moved on to Fix #2 by deleting the MaxTimeInFreePool registry value, adding MinTickBeforePortReuse = 240000 and rebooting the server. After running our simple command file on the client again with a Network Monitor trace on the POP3 server, not a single failure was reported. Moreover, the Network Monitor trace showed that the source port numbers used by the ISA External interface did follow the normal incremental pattern of a typical Windows host, and that no fast reuse of source port numbers by the ISA External interface was happening.
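Why a 1200 ms reuse interval collides with a 240-second TIME_WAIT, while 240000 ms does not, can be illustrated with a toy Python model of a recycled-socket pool built only from the PSS descriptions of the registry keys. It is not ISA's implementation: it hands out the oldest released port once it has aged at least MinTickBeforePortReuse, caps the pool at MaxSocketsInAllPools, deliberately ignores MaxTimeInFreePool, and uses an incrementing counter as a stand-in for the operating system's ephemeral port allocator. The traffic pattern (one 1-second connection every 3 seconds) is an assumption chosen for illustration.

```python
from collections import deque


class PortPool:
    """Toy recycled-socket pool (hypothetical model, not ISA 2000 code).

    min_tick_ms ~ MinTickBeforePortReuse, max_sockets ~ MaxSocketsInAllPools.
    MaxTimeInFreePool is deliberately not modelled.
    """

    def __init__(self, min_tick_ms=1200, max_sockets=6000, first_port=1025):
        self.min_tick_ms = min_tick_ms
        self.max_sockets = max_sockets
        self.next_port = first_port  # stand-in for the OS ephemeral allocator
        self.free = deque()          # (release_time_ms, port), oldest first

    def acquire(self, now_ms):
        # Reuse the oldest released port only if it has aged long enough.
        if self.free and now_ms - self.free[0][0] >= self.min_tick_ms:
            return self.free.popleft()[1]
        # Otherwise allocate a fresh port, giving the incremental pattern.
        port = self.next_port
        self.next_port += 1
        return port

    def release(self, port, now_ms):
        # Park the port for later reuse unless the pool is full.
        if len(self.free) < self.max_sockets:
            self.free.append((now_ms, port))


def simulate(min_tick_ms, n_conns=200, interval_ms=3000, hold_ms=1000):
    """One connection every interval_ms, each held for hold_ms (< interval_ms).

    Returns a list of (start_time_ms, source_port).
    """
    pool = PortPool(min_tick_ms=min_tick_ms)
    uses = []
    for i in range(n_conns):
        start = i * interval_ms
        port = pool.acquire(start)
        pool.release(port, start + hold_ms)
        uses.append((start, port))
    return uses


def shortest_reuse_gap(uses):
    """Smallest time in ms between two uses of the same port, or None."""
    last_use = {}
    best = None
    for t, port in uses:
        if port in last_use:
            gap = t - last_use[port]
            best = gap if best is None else min(best, gap)
        last_use[port] = t
    return best


for tick in (1200, 240000):
    uses = simulate(tick)
    ports = {p for _, p in uses}
    print(f"MinTickBeforePortReuse={tick}: {len(ports)} distinct port(s), "
          f"shortest reuse gap {shortest_reuse_gap(uses)} ms")
# MinTickBeforePortReuse=1200: 1 distinct port(s), shortest reuse gap 3000 ms
# MinTickBeforePortReuse=240000: 81 distinct port(s), shortest reuse gap 243000 ms
```

Under these assumptions, the default setting reuses the same port every 3 seconds, well inside the POP3 server's TIME_WAIT, while MinTickBeforePortReuse = 240000 spreads the connections over 81 ports and never reuses one sooner than 243 seconds after its previous use, which is longer than the 240-second default TIME_WAIT.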
To summarize the resolution that worked for us:
- create or change MinTickBeforePortReuse DWORD and give it a value of 240000.
- delete MaxSocketsInAllPools if it exists.
- delete MaxTimeInFreePool if it exists.
As a final note, according to the info we got, no other registry keys should be added unless you want to do benchmarking.
In this article we investigated what at first sight seemed to be a simple POP3 access problem. However, it turned out that this access problem was not specific to POP3 but was in fact a potential problem for any TCP-based access handled by the Firewall client. It just crops up most often with POP3 access because of the polling mechanism implemented in the mail clients and the fact that they all connect to the same destination host. In consultation with Tom Shinder, I think we might have solved the age-old POP3 issue that has plagued ISA 2000 since it was in beta.
I hope you enjoyed this article and found something in it that you can apply to your own network. If you have any questions on anything I discussed in this article, head on over to http://forums.isaserver.org/ultimatebb.cgi?ubb=get_topic;f=7;t=002017 and post a message. I’ll be informed of your post and will answer your questions ASAP. Thanks! – Stefaan.