Enterprise Web Proxy Performance Optimization Tips for Forefront Threat Management Gateway (TMG) 2010
The Forefront Threat Management Gateway (TMG) 2010 firewall is an excellent security solution for pretty much all verticals, and for organizations large and small. A single standalone TMG firewall can capably serve a small office in a unified role, serving as the edge firewall, forward proxy, reverse proxy, and VPN server. For mid-sized businesses, a standalone array of at least two nodes can provide these same services, with some basic redundancy built in to ensure availability. In these deployment scenarios, Forefront TMG 2010 can be implemented using its defaults, without much modification, and can easily handle these workloads as long as basic capacity planning is exercised.
For larger organizations, typically these roles are performed by separate devices, with firewall and VPN services often handled by dedicated, non-Windows security appliances. In these scenarios, Forefront TMG 2010 is commonly deployed as a forward proxy server, handling Internet requests on behalf of many thousands, even tens or hundreds of thousands of users. Forefront TMG 2010 Enterprise Edition supports the creation of arrays for redundancy and scalability, and additional nodes can be added to meet the capacity demands for even the largest enterprise deployments. However, in my experience, the default configuration does not lend itself well to providing optimum service for very large forward proxy server deployments, even with large multi-node arrays. Scaling out with additional hardware or virtual machines may reduce utilization constraints on a per-node basis, but performance can still suffer due to slow response times.
It goes without saying that the TMG firewall should have ample computing resources, specifically CPU, memory, and networking. You’d be surprised at how often this simple requirement is overlooked, though! When preparing the servers, it is strongly recommended that you deploy Forefront TMG 2010 on dedicated physical servers or hardware appliances. Although virtual deployments are supported, for large-scale deployments they don’t offer the best performance. When supporting a large number of users, the overhead of the hypervisor, along with the fact that all network communication relies on the virtual host’s CPU, can introduce serious bottlenecks that severely impede performance. You can use the Forefront TMG 2010 Capacity Planning Tool to estimate how much you will need in terms of hardware resources, but my advice to you is to be generous, especially with CPU and RAM. In addition, when planning the size of your array, always use the N+1 model. For example, if your capacity planning efforts determine that you need 4 nodes running 16 cores and 32 GB RAM each, plan to deploy 5 of those nodes. The idea here is simple: if a node fails, you still want the remaining nodes to have more than enough capacity to handle production traffic. If you don’t use the N+1 model, a single node failure leaves the remaining nodes without enough capacity to handle the load, and they begin to run out of resources. This is a downward spiral that will result in a complete loss of Internet connectivity, as the domino effect of node failures will cascade until all nodes are completely overwhelmed. Trust me…overplan for capacity and plan for node outages. You’ll be glad you did.
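To make the N+1 arithmetic concrete, here is a minimal Python sketch. It is not part of the Capacity Planning Tool or any TMG API. The function names, the idea of expressing load as a single number (for example, concurrent users), and the sample figures are all assumptions chosen purely for illustration.

```python
import math


def nodes_required(peak_load, per_node_capacity, spare_nodes=1):
    """Return the node count to deploy: the base count plus spares (N+1 by default).

    peak_load and per_node_capacity are hypothetical figures in the same unit,
    e.g. concurrent users taken from your own capacity planning results.
    """
    base_nodes = math.ceil(peak_load / per_node_capacity)
    return base_nodes + spare_nodes


def utilization_after_failures(peak_load, per_node_capacity, deployed_nodes, failed_nodes=1):
    """Return the fraction of surviving capacity consumed after node failures.

    A value above 1.0 means the surviving nodes are overloaded.
    """
    surviving_nodes = deployed_nodes - failed_nodes
    if surviving_nodes <= 0:
        raise ValueError("no surviving nodes left to carry the load")
    return peak_load / (surviving_nodes * per_node_capacity)


if __name__ == "__main__":
    # Illustrative numbers only: 36,000 users, 10,000 users per node.
    load, capacity = 36_000, 10_000
    deploy = nodes_required(load, capacity)  # 4 base nodes + 1 spare
    print("Nodes to deploy:", deploy)
    print("Utilization with N+1 after one failure:",
          utilization_after_failures(load, capacity, deploy))
    print("Utilization without the spare after one failure:",
          utilization_after_failures(load, capacity, deploy - 1))
```

With these sample figures, the array with a spare node stays under full utilization after losing one node, while the array sized without a spare ends up oversubscribed. That oversubscription is the starting point of the cascade described above.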
Operating System Configuration
Service hardening and attack surface reduction should always be performed before putting any Forefront TMG 2010 firewall into production service. Not only does this improve the security posture for the solution, it has the added benefit of reducing resource utilization. After installing and configuring Forefront TMG, it is a good idea to review the list of running services and stop and disable any and all services that aren’t required. The best tool for this task is the Security Configuration Wizard (SCW), which is included with Windows Server 2008 R2. You can find guidance on using the SCW here.
If you’re authenticating outbound web proxy requests (a good idea, by the way!), be advised that using the default Integrated authentication option for large-scale deployments places a heavy burden on the underlying authentication infrastructure. Ensure that domain controllers are well-connected and have plenty of capacity to handle requests from the TMG firewall. Also, by default, Integrated authentication uses NTLM, which has some serious shortcomings. Specifically, using NTLM results in the TMG firewall establishing a secure channel to a single domain controller in your environment, and every authentication request is funneled through that one domain controller. To eliminate this potential bottleneck, ensure that your clients are configured as Web Proxy clients and configure TMG to support Kerberos authentication. Details for configuring Forefront TMG 2010 to support Kerberos authentication in load balanced scenarios can be found here. An added benefit to configuring web proxy clients is TCP connection reuse. This reduces the number of TCP connections established by clients and allows the TMG firewall to handle many more concurrent connections per node. I demonstrated TCP connection reuse here.
The choice of logging method can have a big impact on the throughput and performance of the TMG firewall. By default, Forefront TMG uses native SQL logging. All requests are logged to a local instance of SQL Express running on the TMG firewall. Using SQL Express (as opposed to MSDE used by previous versions of ISA Server) and some additional enhancements to the underlying logging infrastructure have made local SQL logging much better. However, for large-scale enterprise deployments, it is not the best choice. SQL consumes a significant amount of memory, which reduces the overall scalability of the solution. Also, logging to SQL consumes much more CPU than other options, and again those are resources that could be better used to service web proxy requests. For TMG web proxy deployments that support many users, it is recommended to use text file logging. This will consume the least amount of system resources compared to other options.
Advanced Web Protection
Forefront TMG includes advanced web protection services such as integrated URL filtering, virus and malicious software scanning, HTTPS inspection, and behavioral and signature-based network Intrusion Detection and Prevention Systems (IDS/IPS). While these features can be leveraged to provide enhanced protection for web proxy clients, they are expensive in terms of resource consumption on the firewall. They can also negatively impact both latency and throughput, which can lead to a poor user experience if not properly planned. My advice here is to carefully consider your requirements to determine if all of these services are indeed necessary for your implementation. If they are not, consider disabling them to improve performance and reduce resource utilization.
Firewall Policy Configuration
An often overlooked area that can have a big impact on the performance of the TMG firewall is firewall policy configuration. If firewall policy is misconfigured, the efficiency of the firewall can be hampered dramatically. The most common example of this is configuring a deny all rule that applies to a Domain Name Set for all protocols. When this happens, the TMG firewall service is forced to perform a reverse name resolution lookup for every single packet handled by the firewall. As you can imagine, performance goes downhill quickly in this scenario. There are additional configuration issues that can have an adverse effect on performance as well. For detailed information regarding Forefront TMG 2010 firewall policy best practices, click here.
For large-scale enterprise deployments, load balancing is essential. While the integrated Network Load Balancing (NLB) feature of TMG Enterprise edition is sufficient for many small and mid-sized organizations, it often provides a less than ideal experience in very large organizations supporting many thousands, and especially tens or hundreds of thousands of users. NLB has a hard-coded throughput limit of 500 Mbps. NLB also uses layer two broadcasts for heartbeat traffic, which in very busy environments can impede throughput and performance. This problem becomes worse as you add nodes to the array, which is why NLB arrays are limited in size to eight nodes. For large-scale enterprise deployments, the use of a dedicated, external hardware load balancer is recommended. An external load balancer allows for more intelligent and granular load balancing than NLB, and doesn’t suffer from the throughput limitations or negative scalability that NLB imposes. In addition, hardware load balancers often include additional network optimizations that can be leveraged to further improve performance and reduce latency.
The Forefront TMG 2010 firewall serves as an excellent edge firewall, forward and reverse proxy, and VPN server for organizations of all sizes. At the high end, for large-scale enterprise deployments, TMG is often deployed as a dedicated forward web proxy. In this scenario, the TMG firewall can serve capably, but scalability, throughput, and performance can all be improved by following these simple guidelines and best practices.