Measuring System Performance on a Forefront Threat Management Gateway (TMG) 2010 Firewall (Part 1)

If you would like to read the next part in this article series please go to Measuring System Performance on a Forefront Threat Management Gateway (TMG) 2010 Firewall (Part 2).

Introduction

Troubleshooting performance issues with the Forefront TMG firewall can be challenging for new and veteran administrators alike. Further complicating matters, TMG relies heavily on supporting infrastructure services such as Active Directory and DNS, and it will often appear to be the culprit when in reality one of those supporting services is at fault. When faced with a TMG firewall that is performing poorly, it can be difficult to know where to begin your troubleshooting efforts and which tools are best suited for the job. In part one of this two-part series on measuring performance on the TMG firewall I will outline a methodology for systematically collecting data from and evaluating the performance of the four main computing subsystems: CPU, memory, network, and disk. In part two of this series I’ll follow up with prescriptive guidance on gathering and assessing specific TMG-related performance data.

Before You Begin

In my experience, when faced with TMG firewall performance issues many administrators assume the worst and begin arbitrarily collecting data from everywhere, often using tools that provide extremely detailed information. At some point these tools and information might be required, but to begin our troubleshooting efforts we need to take a step back and look at the basics. As I covered in a previous article about optimizing performance on the TMG firewall, it is best to perform a general assessment of the overall configuration of the server and of TMG. Often this sanity check will reveal issues that contribute to the poor performance of your TMG firewall.

Methodology

When faced with degraded performance, it is vital that we take a logical and methodical approach to collecting and evaluating data. The first tool I recommend using is the venerable Windows Performance Monitor (perfmon). It is easy to be overwhelmed by this tool, but with a little guidance and direction I will demonstrate that by looking at just a few objects and counters we can gain valuable information about the health of our TMG firewall. We’ll use this information to determine if any underlying subsystems are resource constrained and if they are, those will need to be addressed before proceeding with additional troubleshooting. Once we’ve determined that there are no resource constraints we can begin digging deeper to gather more detailed and specific information from TMG itself.
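For repeatable baselines, the same counters can also be captured from the command line rather than the Performance Monitor GUI. The following is a minimal Python sketch, not part of the procedure described here, that assembles a typeperf command for the counters discussed in the rest of this article. The sampling interval, sample count, output file name, and the choice of the _Total and * instances are assumptions made for illustration.

```python
import subprocess

# Counter paths for the objects discussed in this article. The instance
# names (_Total, *) are illustrative assumptions; adjust them to your system.
COUNTERS = [
    r"\Processor(_Total)\% Processor Time",
    r"\System\Processor Queue Length",
    r"\Memory\Available MBytes",
    r"\Memory\% Committed Bytes In Use",
    r"\Memory\Pages Input/sec",
    r"\Memory\Committed Bytes",
    r"\Network Interface(*)\Bytes Total/sec",
    r"\Network Interface(*)\Current Bandwidth",
    r"\Network Interface(*)\Output Queue Length",
    r"\PhysicalDisk(_Total)\% Idle Time",
    r"\PhysicalDisk(_Total)\Avg. Disk sec/Read",
    r"\PhysicalDisk(_Total)\Avg. Disk sec/Write",
    r"\PhysicalDisk(_Total)\Avg. Disk Queue Length",
]


def build_typeperf_command(interval_s=15, samples=240, output="tmg_baseline.csv"):
    """Return the typeperf argument list for one collection run."""
    return [
        "typeperf", *COUNTERS,
        "-si", str(interval_s),  # seconds between samples
        "-sc", str(samples),     # number of samples to collect
        "-f", "CSV",             # output file format
        "-o", output,            # output file name (hypothetical)
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it on the firewall.
    print(subprocess.list2cmdline(build_typeperf_command()))
```

Printing the command instead of executing it is deliberate: the script can be run on any workstation, and the resulting line pasted into a command prompt on the TMG firewall.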

CPU

We’ll begin our troubleshooting by taking a look at the current CPU utilization of our TMG firewall. To begin, launch the Windows Performance Monitor by clicking on the Start button, then clicking Run. Enter perfmon.exe and click OK. After the Performance Monitor appears, click the Performance Monitor node in the navigation tree on the left.


Figure 1

By default, the % Processor Time counter of the Processor object is displayed. This is an excellent starting point, as CPU is the first resource we want to measure. Here we want to see sustained CPU utilization under 90% (80% on multi-processor systems). A TMG firewall that runs continuously at or above these levels is cause for concern and should be investigated further.

In addition to processor utilization, an excellent indicator of a potential CPU bottleneck is the Processor Queue Length counter of the System object. You can add this counter by right-clicking anywhere in the Performance Monitor window and selecting the option to Add Counters from the context menu, or by clicking the green plus sign at the top of the window. Scroll down and expand the System object, then select the Processor Queue Length counter and click Add.


Figure 2

Sustained values greater than two per CPU indicate a potential bottleneck. As processor utilization approaches 100%, the length of the processor queue grows sharply. Continuously high values for this counter should be addressed by adding capacity or reducing demand.
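The two CPU rules of thumb above can be expressed as a short check. This is a minimal Python sketch under stated assumptions: the function name and parameters are hypothetical, and the inputs are taken to be averages over a sustained sampling window rather than single readings.

```python
def cpu_findings(avg_cpu_pct, avg_queue_length, cpu_count):
    """Apply the CPU thresholds from this section to averaged samples."""
    findings = []

    # 90% on a single-processor system, 80% on multi-processor systems.
    utilization_limit = 80.0 if cpu_count > 1 else 90.0
    if avg_cpu_pct >= utilization_limit:
        findings.append("high processor utilization")

    # Processor Queue Length is a system-wide value, so normalize per CPU.
    if avg_queue_length / cpu_count > 2.0:
        findings.append("long processor queue")

    return findings


if __name__ == "__main__":
    # Hypothetical averages from a four-CPU firewall.
    print(cpu_findings(avg_cpu_pct=85.0, avg_queue_length=3.0, cpu_count=4))
```

Because Processor Queue Length is reported once for the whole system, dividing by the number of CPUs is what makes the "two per CPU" threshold comparable across servers of different sizes.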

Memory

Having enough free system RAM is vital to the performance of the TMG firewall. When looking at memory statistics, the first measurement to take is the available memory on the system. Observe the Available MBytes counter of the Memory object. The more memory we have available the better, but anything less than 5% of physical memory (RAM) is cause for concern. Excessive memory utilization can be caused by myriad factors, so further investigation may be required. Note: There are also Available KBytes and Available Bytes counters, but if you have to get this granular in order to determine how much available memory you have, you probably need more memory!

Another important memory utilization indicator is the % Committed Bytes In Use counter of the Memory object. Again, lower is always better, but values in excess of 80% indicate a possible shortage of memory.

Excessive paging is another strong indicator of high memory pressure. Any time the system runs low on available memory, the paging file stored on a disk drive will be used. Accessing memory stored in the paging file on disk is orders of magnitude slower than retrieving it directly from RAM, so the performance penalty will be steep and noticeable. To monitor paging activity I suggest observing the Pages Input/sec counter of the Memory object. The threshold for this counter will vary with the performance of the disk subsystem, but a good rule of thumb is that sustained values greater than 20 are indicative of excessive paging.
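The three memory indicators described so far (available memory, committed bytes in use, and page input rate) can be checked together. The following Python sketch is illustrative only; the function name and parameters are hypothetical, and the inputs are assumed to be sustained averages.

```python
def memory_pressure_findings(available_mbytes, installed_ram_mbytes,
                             committed_in_use_pct, pages_input_per_sec):
    """Apply the memory rules of thumb from this section."""
    findings = []

    # Available MBytes expressed as a percentage of installed RAM.
    available_pct = 100.0 * available_mbytes / installed_ram_mbytes
    if available_pct < 5.0:
        findings.append("low available memory")

    if committed_in_use_pct > 80.0:
        findings.append("high committed bytes in use")

    # The paging threshold varies with disk performance; 20 is a rule of thumb.
    if pages_input_per_sec > 20.0:
        findings.append("excessive paging")

    return findings


if __name__ == "__main__":
    # Hypothetical readings from a firewall with 4 GB (4096 MB) of RAM.
    print(memory_pressure_findings(150, 4096, 85.0, 35.0))
```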

In addition, by observing the Committed Bytes counter of the Memory object and dividing that by the amount of installed RAM on the system, we calculate a memory contention index that can be useful in predicting the likelihood that excessive paging will occur. For example, on a system with 4GB RAM and committed bytes at 2GB, there is more than enough physical memory to hold all of the committed bytes in RAM and so the demands on the page file should be relatively low. However, if committed bytes is equal to or exceeds the amount of RAM, excessive paging is likely to occur. If the ratio of committed bytes to RAM exceeds 1.5:1, this is a good indication that there is a shortage of physical memory (RAM) that will have to be addressed.
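As a worked example of the memory contention index, the following Python sketch divides committed bytes by installed RAM and classifies the result using the thresholds given above. The function names and the wording of the classifications are hypothetical.

```python
GIB = 1024 ** 3  # bytes per gibibyte


def memory_contention_index(committed_bytes, installed_ram_bytes):
    """Ratio of the Committed Bytes counter to installed physical memory."""
    return committed_bytes / installed_ram_bytes


def classify_contention(index):
    """Map the index to the guidance in this section."""
    if index > 1.5:
        return "shortage of physical memory"
    if index >= 1.0:
        return "excessive paging likely"
    return "page file demand should be low"


if __name__ == "__main__":
    # The example from the text: 4 GB of RAM with 2 GB committed.
    index = memory_contention_index(2 * GIB, 4 * GIB)
    print(index, classify_contention(index))
```

For the 4GB system with 2GB committed, the index is 0.5, comfortably below the 1:1 point at which paging becomes likely.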

Network

The Forefront TMG firewall is fundamentally a network security device, and as such, the performance of the firewall can be severely impacted if network throughput is impeded. Network utilization can be calculated by taking the Bytes Total/sec counter of the Network Interface object, multiplying it by eight to convert bytes to bits, and dividing the result by the Current Bandwidth counter, which is reported in bits per second. Results that exceed 75% indicate excessive utilization and should be addressed by increasing bandwidth or by distributing the network load across additional network interfaces or additional TMG firewalls.

As mentioned earlier, excessive queuing is a strong indicator of a bottleneck for any subsystem. Monitor the Output Queue Length counter of the Network Interface object and watch for sustained values greater than two. Continuously high values observed here are strong indications of a network bottleneck.
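The network checks can be sketched in a few lines of Python. One detail matters in the arithmetic: Bytes Total/sec is measured in bytes, while Current Bandwidth is reported in bits per second, so the byte rate is multiplied by eight before dividing. The function names and sample values below are hypothetical.

```python
def network_utilization_pct(bytes_total_per_sec, current_bandwidth_bps):
    """Percent utilization of one interface.

    bytes_total_per_sec is in bytes; current_bandwidth_bps is in bits per
    second, so the byte rate is converted to bits before dividing.
    """
    return 100.0 * (bytes_total_per_sec * 8) / current_bandwidth_bps


def network_findings(bytes_total_per_sec, current_bandwidth_bps, output_queue_length):
    """Apply the network thresholds from this section to sustained averages."""
    findings = []
    if network_utilization_pct(bytes_total_per_sec, current_bandwidth_bps) > 75.0:
        findings.append("excessive network utilization")
    if output_queue_length > 2.0:
        findings.append("long output queue")
    return findings


if __name__ == "__main__":
    # Hypothetical: 100,000,000 bytes/sec on a 1 Gbps interface.
    print(network_utilization_pct(100_000_000, 1_000_000_000))
    print(network_findings(100_000_000, 1_000_000_000, output_queue_length=0.0))
```

These checks apply per interface, so on a multi-homed firewall each network interface instance should be evaluated separately.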

Disk

Forefront TMG firewalls are typically either CPU or memory bound. However, there are times when disk performance can cause serious problems. Poor disk performance can affect logging and reporting, and it can degrade the user experience by slowing malware scanning and the retrieval of cached content from disk. To assess the health of the disk subsystem, start by observing the % Idle Time counter of the PhysicalDisk object. Values below 20% indicate that the disk subsystem is saturated.

Poor disk performance can also be identified by looking at the Avg. Disk sec/Read and Avg. Disk sec/Write counters of the PhysicalDisk object. These counters are reported in seconds, so values for either counter that exceed 25ms (0.025) indicate slow disk performance. In addition, if the Avg. Disk Queue Length counter of the PhysicalDisk object reveals sustained values greater than two per spindle, this is another good indication of a disk bottleneck.
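The disk thresholds can be combined in the same way. This Python sketch is illustrative only: the function name and parameters are hypothetical, the latency inputs are assumed to be in seconds (so 25ms is written as 0.025), and the spindle count is something the administrator has to supply, since it is not available as a counter.

```python
def disk_findings(idle_pct, avg_sec_per_read, avg_sec_per_write,
                  avg_queue_length, spindles=1):
    """Apply the disk thresholds from this section to sustained averages."""
    findings = []

    if idle_pct < 20.0:
        findings.append("disk saturated (low idle time)")

    # Latency inputs are in seconds; 0.025 s is the 25 ms threshold.
    if avg_sec_per_read > 0.025:
        findings.append("slow reads")
    if avg_sec_per_write > 0.025:
        findings.append("slow writes")

    # Normalize the queue length by the number of physical spindles.
    if avg_queue_length / spindles > 2.0:
        findings.append("long disk queue")

    return findings


if __name__ == "__main__":
    # Hypothetical readings from a two-spindle volume.
    print(disk_findings(10.0, 0.030, 0.005, 6.0, spindles=2))
```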

Resolving Performance Issues

So what do we do if we’ve identified a subsystem that shows signs of excessive utilization or insufficient capacity? We have two choices: increase capacity or reduce demand. The subsystem in which the capacity constraint lies will dictate how we need to address it. It may require adding capacity by upgrading existing hardware (scale up) or adding additional TMG firewalls (scale out) to meet demand. We can reduce demand in a number of ways, such as making the TMG firewall more efficient by optimizing firewall policy, or configuring clients as web proxy clients as opposed to SecureNAT clients.

Summary

As you can see, by taking a methodical approach to gathering information about the performance of our TMG firewall we can quickly and easily identify any resource constraints in any of the four basic computing subsystems. By observing CPU utilization, memory consumption, network utilization, and disk performance, and by carefully monitoring work queues where available, we gain a basic understanding of how our TMG firewall is operating. If any one of these subsystems is oversubscribed and showing signs of high utilization, we can address those concerns by adding capacity or reducing demand. As I demonstrated here, using the Windows Performance Monitor and collecting basic information from just 13 counters across five objects (Processor, System, Memory, Network Interface, and PhysicalDisk), we can determine how best to proceed with our troubleshooting efforts. After we’ve completed our assessment of the CPU, memory, network, and disk utilization and resolved any outstanding issues, if performance issues persist we can proceed with more in-depth troubleshooting of TMG. Be sure to read my next article where I’ll demonstrate how and what to look for when gathering detailed performance data from TMG.
