Planning for High Availability and Scalability in your TMG Deployment


There was a time when businesses could tolerate a degree of downtime in their Internet access. In fact, it was expected. Today, however, organizations depend on continuous, uninterrupted connectivity to resources that reside on computers outside the local network. That dependence will only increase in the years to come as more companies combine cloud computing with their own internal networks.

Along with availability concerns, organizations are struggling with issues related to scalability. In today’s economic climate, mergers and acquisitions are commonplace. Growth is good, but it presents challenges in many areas, including IT. Securing and managing access to the Internet for an expanding user base requires solutions that can easily adapt to accommodate the extra load.

This means you need to plan for high availability and scalability from the ground up when you design a TMG deployment. In this article, we will discuss the points you should consider in developing a highly available, scalable TMG deployment.

Designing for high availability

What do we mean by “high availability”? Availability refers to the “uptime” of a particular service or resource and is generally expressed as the percentage of time that it is available during a given timeframe. For example, 90 percent availability means the resource is available for use 90 percent of the time during the month, year or other measurement period.

Within the industry, availability is often referred to in terms of “nines.” 90 percent availability is called “one nine,” 99 percent is “two nines,” and so forth. Six nines (99.9999 percent uptime) is something of a holy grail in networking; this magic number represents a configuration with only about 31 seconds of downtime in an entire year. For a discussion of the feasibility of chasing the dream of five or more nines, see this article.

Here is a table showing the maximum downtime per year for each of the “nines.”

“Nines” designation    Uptime percentage    Max. downtime per year
One nine               90%                  36 days and 12 hours
Two nines              99%                  87 hours and 36 minutes
Three nines            99.9%                8 hours and 46 minutes
Four nines             99.99%               52 minutes and 33 seconds
Five nines             99.999%              5 minutes and 15 seconds
Six nines              99.9999%             31.5 seconds
Seven nines            99.99999%            3.15 seconds

Table 1
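The downtime figures in the table come from multiplying the length of a year by the fraction of time the service is allowed to be unavailable. The following is a minimal sketch of that arithmetic, assuming a 365-day year; the function names are illustrative and not part of any TMG tooling.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a non-leap year


def max_downtime_seconds(nines: int) -> float:
    """Maximum yearly downtime, in seconds, for an availability of `nines` nines.

    One nine is 90 percent uptime (10 percent downtime), two nines is
    99 percent uptime (1 percent downtime), and so on.
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    return SECONDS_PER_YEAR / 10 ** nines


def format_duration(seconds: float) -> str:
    """Break a number of seconds into days, hours, minutes and seconds."""
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{int(days)}d {int(hours)}h {int(minutes)}m {secs:.2f}s"


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} nine(s): {format_duration(max_downtime_seconds(n))}")
```

Running the script prints one line per row of the table, which makes it easy to check a service-level target against the downtime it actually permits.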

High availability is critical to many businesses today because their operations are global and require access 24 hours per day to avoid losing business, productivity and money. Designing for high availability includes building in redundancy and automation, so that when a failure occurs, service is restored automatically with no human action required. This requires detection of the failure and automatic reconfiguration of the system to restore the service.

Designing for scalability

Scalability is important in any situation where the workload might increase. That can happen through growth in the number of users (either internally, as additional employees use the systems, or externally, as customers or other outsiders access them) or through increased productivity from the same number of users. Scalability can also refer to the ability to handle new and additional functions as needed.

Scalable systems are able to handle the increased workload or new functions with minimal disruption to the existing users. Scalable systems can also scale down when necessary, as well as up. There are two basic ways to scale:

  • Vertical scaling: adding more resources to a system (for example, more memory or additional or faster processors).
  • Horizontal scaling: adding more systems in a distributed computing model such as an array or cluster.

Vertical scalability is generally more limited but can be more cost effective. Virtualization complicates the definitions because by adding more system resources to a single system (vertical scaling), you gain the ability to create additional virtual systems that can work together in a distributed fashion (horizontal scaling).

Designing effectively for scalability requires an accurate assessment of potential future growth or decline.
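As a rough illustration of that assessment, the sketch below projects a user base forward and estimates how many servers horizontal scaling would require. It assumes a constant annual growth rate and a fixed number of users per server; both are hypothetical planning inputs that you would need to measure for your own environment, and neither comes from TMG documentation.

```python
import math


def servers_needed(current_users: int,
                   annual_growth_rate: float,
                   years: int,
                   users_per_server: int) -> int:
    """Estimate the servers required after `years` of compound growth.

    annual_growth_rate is a fraction: 0.2 means 20 percent growth per year,
    and a negative value models a shrinking user base.
    users_per_server is a hypothetical capacity figure for one server.
    """
    if users_per_server <= 0:
        raise ValueError("users_per_server must be positive")
    projected_users = current_users * (1 + annual_growth_rate) ** years
    # Round up: a partially loaded server is still a whole server.
    return max(1, math.ceil(projected_users / users_per_server))


if __name__ == "__main__":
    # 1,000 users today, 20 percent yearly growth, 3-year horizon,
    # and an assumed 500 users per server.
    print(servers_needed(1000, 0.2, 3, 500))
```

Running the same calculation with pessimistic and optimistic growth rates gives a range to plan against rather than a single number.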

High availability and scalability considerations for TMG

TMG provides you with the ability to easily implement horizontal scaling by deploying one or more arrays of TMG servers. An array can consist of a large number of TMG servers that are all managed through one centralized management interface and work together. The traffic load can be spread across all of the TMG servers in the array, either using Microsoft’s Network Load Balancing (built into Windows Server and integrated into TMG) or using the third party load balancing solution of your choice.

Array types

The number of TMG servers that you can have depends on whether you have deployed TMG (Enterprise edition) as a standalone array or as an Enterprise Management Server (EMS)-managed array. With a standalone deployment, you can have up to fifty TMG servers in the array, and one of the members functions as the array manager. This works when the members of your TMG array are deployed in the same location, and it works best with a moderate amount of traffic.

There is a better option if your TMG servers are deployed in multiple locations, or in a single location where the traffic load is heavy. In those cases, the EMS-managed array is the best choice. This type of deployment can consist of as many as 200 arrays that hold up to 50 TMG servers each, for a total of 10,000 TMG servers.
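The guidance above can be expressed as a small decision helper. This is only a sketch of the rule of thumb described in the text, using the limits quoted here (50 servers per array, 200 arrays per EMS deployment); the function name and its inputs are illustrative, and a real design with several sites would likely want at least one array per location.

```python
import math

MAX_SERVERS_PER_ARRAY = 50   # limit quoted in the text
MAX_ARRAYS_PER_EMS = 200     # limit quoted in the text


def recommend_array_type(server_count: int,
                         multiple_locations: bool,
                         heavy_traffic: bool) -> tuple[str, int]:
    """Return (deployment type, minimum number of arrays) for a planned size."""
    if server_count < 1:
        raise ValueError("server_count must be at least 1")
    if server_count > MAX_SERVERS_PER_ARRAY * MAX_ARRAYS_PER_EMS:
        raise ValueError("exceeds the 10,000-server limit of an EMS deployment")

    needs_ems = (multiple_locations
                 or heavy_traffic
                 or server_count > MAX_SERVERS_PER_ARRAY)
    if needs_ems:
        # Minimum arrays needed to hold every server at 50 per array.
        return ("EMS-managed", math.ceil(server_count / MAX_SERVERS_PER_ARRAY))
    return ("standalone", 1)


if __name__ == "__main__":
    print(recommend_array_type(10, multiple_locations=False, heavy_traffic=False))
    print(recommend_array_type(120, multiple_locations=False, heavy_traffic=False))
```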

In a TMG array, the server configuration settings for each of the array members are stored, along with the array configuration settings, in one centralized location (the array manager in a standalone deployment or the Enterprise Management Server in an EMS-managed array). Richard Hicks wrote a great article about standalone arrays, how they differ from EMS-managed arrays, and what you can do in the case of the array manager’s unavailability. Check it out here.

TMG arrays enable high availability through redundancy within an array as well as scalability through the ability to add additional TMG servers to an array or additional arrays to an EMS-managed array deployment.
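To see why redundancy raises availability, consider a simplified model in which array members fail independently and any one surviving member can carry the traffic. Both are assumptions made for illustration: real failures are often correlated (shared power, shared network, a shared array manager or EMS), so treat the result as an upper bound rather than a prediction.

```python
def array_availability(node_availability: float, nodes: int) -> float:
    """Availability of `nodes` redundant members under the independence assumption.

    The array is down only when every member is down at the same time,
    so its unavailability is the product of the members' unavailabilities.
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return 1.0 - (1.0 - node_availability) ** nodes


if __name__ == "__main__":
    # Two members that are each available 99 percent of the time.
    print(f"{array_availability(0.99, 2):.4%}")
```

Under this model, two "two nines" servers together reach roughly "four nines," which is the arithmetic behind adding members to an array instead of trying to perfect a single server.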

Load balancing in an array

When you use Microsoft’s Network Load Balancing (NLB) in Windows Server 2008, you can balance the traffic load across up to eight members of an array, with no need to purchase additional hardware. Because NLB is integrated into TMG, you can use the TMG management console to manage and monitor NLB across the array. An added bonus is that the firewall rules and settings will be configured automatically.

NLB integration has been available since ISA Server 2006 and it is a tremendous convenience for TMG admins because instead of having to configure the NLB settings on every node, you’re able to do it via the Enterprise Management Server or on the array manager. Then the EMS or array manager passes the settings on to each of the members of the array.

If you want to disable NLB integration on a TMG array, you can use the “NLB Clear” feature. This feature was not included in ISA Server and was introduced in the first version of TMG; however, you can download it from the Microsoft Download Center to remove NLB settings from an ISA Server 2006 EE array member. The feature lets you choose whether to discard the NLB configuration settings and reset them. If you choose to do this, the following actions will be performed:

  • The virtual IPs will be removed from the NICs
  • The NICs will be unbound from the NLB protocol
  • All settings pertaining to NLB will be removed from the TMG server’s registry

The “NLB Clear” function does this for all of the members of the array. You can also remove the settings for each array member individually, either via a troubleshooting task or with the NLBClear.exe utility. You have to run the task on the local machine; you won’t be able to run it from a remote management console. NLBClear.exe is located in the installation directory. You will need to stop the Firewall service in order to run it, which means you first need to run net stop fwsrv.
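The per-member procedure can be laid out as an ordered list of commands. The sketch below only builds that list and does not execute anything. The installation directory is a placeholder you must supply, NLBClear.exe is shown without arguments because the text does not describe any, and restarting the Firewall service afterward is an assumption, not a step stated above.

```python
from pathlib import PureWindowsPath


def nlb_clear_commands(install_dir: str) -> list[list[str]]:
    """Return the commands to run locally on one array member, in order.

    install_dir is the TMG installation directory on that member
    (a placeholder here; the text does not give the path).
    """
    nlb_clear = PureWindowsPath(install_dir) / "NLBClear.exe"
    return [
        ["net", "stop", "fwsrv"],    # stop the Firewall service first (from the text)
        [str(nlb_clear)],            # run the utility; any arguments are not covered here
        ["net", "start", "fwsrv"],   # assumption: bring the Firewall service back up
    ]


if __name__ == "__main__":
    # "C:\\TMG" is a made-up path for illustration only.
    for command in nlb_clear_commands("C:\\TMG"):
        print(" ".join(command))
```

If you choose to run these with a tool such as Python's subprocess module, do so on the array member itself, since the task cannot be run from a remote management console.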


High availability and scalability are two important concepts that need to go into your design decisions when you are planning a deployment of TMG, and both can be accomplished by selecting the appropriate type of TMG array. In this article, we’ve briefly discussed how to design for high availability and for scalability, and the best deployment choices for different environments and traffic loads.
