Using Windows Performance Monitor to Baseline a Terminal Server (Part 2)

If you missed the first article in this series, please read Using Windows Performance Monitor to Baseline a Terminal Server (Part 1).

Windows Performance Monitor (PerfMon) is an excellent tool for diagnosing performance issues on Windows servers.  In part one of this article, we reviewed how to use PerfMon to create a baseline and gather data. In part two, we will review the data gathered and look at how to interpret the counters.

Viewing Performance Monitor Results

Now that the information has been collected, use System Monitor in Windows Performance Monitor to view it. By default, System Monitor displays live performance data. To view the saved information, select an alternate data source and then select the saved log.

  1. Open PerfMon and select the System Monitor screen from the left pane, and then click on the View Log Data link in the toolbar (the database symbol icon highlighted in figure 1). You can also simply press CTRL+L.


Figure 1

  2. On the Source tab, add the log file(s) that contain the performance data from your job. Once you select the file(s), clicking Time Range (figure 2) shows the window of time during which data was collected.


Figure 2

Clicking and dragging either end of the slide bar on the Time Range will either shorten or widen the time frame in which you will view the data. Once the data source has been selected, click OK.

  3. To view the data, add counters to the System Monitor screen. In the Add Counters screen (figure 3), only the counters that were selected during the setup of the job are available for viewing. Add whichever counters and instances you wish to view.


Figure 3

  4. Once you have selected the counters to view, you will see them in the System Monitor screen. You can alter the color of each counter on the Data tab of System Monitor properties to make viewing easier, and you can also click on the light bulb in the toolbar to highlight a particular counter in the graph.
    When you view the data, take note of some of the numbers at the bottom of the graph, such as Last, Average, Duration, etc. Those numbers correspond to the following information:

Last:

The last measured metric. This number is mostly irrelevant when viewing historical data as it will always display the last metric captured. When viewing live performance data, it constantly updates with the value of the last sampling interval.

Average:

This is the average value of the counter measured over the sampling period.

Minimum:

This is the minimum value of the counter measured over the sampling period.

Maximum:

This is the maximum value of the counter measured over the sampling period.

Duration:

This is the total span of time covered by the data shown in the graph, or how long information was captured.
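The same five figures can be reproduced outside PerfMon, which is handy when archiving baselines. The sketch below is illustrative only: it assumes the log has already been exported to CSV (for example with the relog utility), that the first column holds timestamps in the `MM/DD/YYYY HH:MM:SS.mmm` layout, and that missing samples appear as blank cells. The server name `TS01`, the sample values, and the function name `summarize_counter` are made up for the example.

```python
import csv
import io
from datetime import datetime

# Assumed timestamp layout of a PerfMon CSV export; adjust for your locale.
TIMESTAMP_FORMAT = "%m/%d/%Y %H:%M:%S.%f"


def summarize_counter(csv_text, counter_suffix):
    """Return Last/Average/Minimum/Maximum/Duration for one counter column."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Pick the first column whose name ends with the requested counter path.
    column = next(i for i, name in enumerate(header)
                  if name.lower().endswith(counter_suffix.lower()))
    times, values = [], []
    for row in reader:
        if not row:
            continue  # ignore empty lines
        cell = row[column].strip()
        if not cell:
            continue  # assumed: a blank cell marks a missing sample
        times.append(datetime.strptime(row[0], TIMESTAMP_FORMAT))
        values.append(float(cell))
    return {
        "last": values[-1],
        "average": sum(values) / len(values),
        "minimum": min(values),
        "maximum": max(values),
        "duration_seconds": (times[-1] - times[0]).total_seconds(),
    }


# Hypothetical four-sample export with one missing reading.
SAMPLE = '''"(PDH-CSV 4.0) (Eastern Standard Time)(300)","\\\\TS01\\Memory\\Pages/sec"
"03/02/2009 09:00:00.000","2.0"
"03/02/2009 09:00:15.000","6.0"
"03/02/2009 09:00:30.000"," "
"03/02/2009 09:00:45.000","4.0"
'''

stats = summarize_counter(SAMPLE, "\\Memory\\Pages/sec")
print(stats)
```

Here Duration is taken as the time between the first and last valid samples, which may differ slightly from what the graph reports.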

Interpreting the Results

It is a good idea to review the baseline data after it is taken; however, don’t expect to make immediate use of this information. Take a look to get a feel for the metrics that were recorded, especially if you are unfamiliar with what values to expect for certain counters. Unless performance issues were noticed during the capture, this information should be a good snapshot of what a properly running server looks like.

The information captured really comes in handy when, down the road, a performance problem is observed. It may not always be apparent where a performance problem lies by just looking at the current performance counters. For instance, the counter “Processes” under the object “System” may have an average value of “233”. To determine if this number could be part of the problem, compare the current metrics to the baseline data to see what this value was when performance was perceived as “normal”.

Important:
Make sure that you are viewing relevant baseline data when comparing it to the real-time counters. Remember, the Source tab in the System Monitor Properties window will indicate when the capture was taken. Be sure you are getting statistics from the same time window. See step 2 above for more information.

To make comparing data easier, multiple Performance Monitor windows can be open at one time, each viewing different information. To easily compare the baseline data to real-time performance metrics, open two PerfMon windows, and set one to view current activity and the other to view the logged data. Then from the Window menu, select Tile Horizontally to view the two windows side-by-side.
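The comparison can also be done numerically once the averages from both windows are written down. The following is a rough sketch, not a PerfMon feature: the baseline figures, the 25% tolerance, and the function name `compare_to_baseline` are assumptions chosen for illustration; only the current Processes value of 233 comes from the example above.

```python
def compare_to_baseline(baseline, current, tolerance_pct=25.0):
    """Return counters whose current average differs from the baseline
    average by more than tolerance_pct, with the percentage change."""
    flagged = {}
    for counter, base_value in baseline.items():
        if counter not in current or base_value == 0:
            continue  # nothing to compare, or percentage change undefined
        change_pct = (current[counter] - base_value) / base_value * 100.0
        if abs(change_pct) > tolerance_pct:
            flagged[counter] = round(change_pct, 1)
    return flagged


# Hypothetical averages read from the baseline log and from live data.
baseline = {r"System\Processes": 180.0, r"Memory\Available MBytes": 2048.0}
current = {r"System\Processes": 233.0, r"Memory\Available MBytes": 1900.0}

flagged = compare_to_baseline(baseline, current)
print(flagged)
```

With these made-up numbers, only the process count stands out; the drop in available memory stays inside the tolerance. A sensible tolerance depends on how much the counter normally varies on your servers.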

Below is a detailed list of the recommended baseline counters and what each one measures, along with some recommendations on interpreting the results.

Memory

| Counter | What It Means | How to Interpret Results |
| --- | --- | --- |
| Pages/Sec | The rate at which pages are read from or written to disk to resolve hard page faults, which largely reflects how often the page file is being accessed. | Ideally this stays at or close to zero, although Windows almost always pages some information out to disk through normal operations. Consistently high readings on this counter usually mean the system does not have enough RAM. |
| Available MBytes | The amount of physical RAM, in megabytes, available to the system for processes. | Low numbers indicate RAM is getting tight. |
| Committed Bytes | The amount of committed virtual memory, in bytes. Committed memory has space reserved for it in the page file on disk. | When a process is launched, the kernel automatically reserves a certain amount of space in the page file. The more RAM that is consumed on the system, the more page file space is reserved. This is simply reserved space and does not necessarily mean that processes are being swapped out. |
| Page Faults/Sec | How often the system needs to retrieve data from outside of the process's working set. | This number includes both soft and hard page faults. Soft page faults occur when the information is found elsewhere in physical RAM. Hard page faults occur when the information has to be retrieved from disk, typically from the page file. |

 

Network Interface

| Counter | What It Means | How to Interpret Results |
| --- | --- | --- |
| Bytes Total/Sec | Total number of bytes transferred in or out of the NIC, per second. | The values here typically shouldn’t exceed 60% – 70% of the total bandwidth of the interface. Note that the counter is in bytes, while interface speeds are usually quoted in bits. |
| Packets/Sec | Total number of packets transferred in or out of the NIC, per second. | This number is mainly useful for tracking trends from one baseline to the next. |
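Because Bytes Total/Sec is measured in bytes and link speeds are quoted in bits, the 60% – 70% guideline requires a small conversion. A minimal sketch of that arithmetic follows; the function name and the sample reading are illustrative, and the link speed is assumed to be known (for instance from the Current Bandwidth counter of the same object, which reports bits per second).

```python
def nic_utilization_pct(bytes_total_per_sec, link_speed_bits_per_sec):
    """Convert a Bytes Total/Sec reading into percent of link capacity."""
    return bytes_total_per_sec * 8 * 100.0 / link_speed_bits_per_sec


# Hypothetical reading: 87.5 MB/s on a 1 Gbit/s interface.
utilization = nic_utilization_pct(87_500_000, 1_000_000_000)
over_guideline = utilization > 70.0  # upper end of the 60% - 70% guideline

print(f"{utilization:.1f}% of link capacity, over guideline: {over_guideline}")
```

In this example the interface sits right at the top of the guideline range, so it would be worth watching in later baselines.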

 

Paging File

| Counter | What It Means | How to Interpret Results |
| --- | --- | --- |
| % Usage | How much of the disk-based page file is being used. | High numbers are a sure sign of too little RAM in the system, but keep in mind that Windows almost always swaps some information out to the page file. |

 

Physical Disk

| Counter | What It Means | How to Interpret Results |
| --- | --- | --- |
| % Disk Time | The percentage of time the disk is busy either reading or writing data. | High numbers could indicate too much paging or a disk subsystem bottleneck. |
| Avg Disk Bytes/Transfer | Average number of bytes transferred to or from the disk per read/write operation. | The actual data transfer to and from the physical platters on most hard drives is limited to 80MB – 95MB per second (depending on manufacturer and model), regardless of the interface. The thresholds here depend on the controller attached to the drives and the array configuration. A typical RAID-5 array of 4 disks has a maximum throughput of 320MB – 380MB per second, so in this case the transfer speed is limited to the interface speed, such as U320 SCSI. However, a pair of mirrored disks only has a maximum throughput of 80MB – 95MB per second. |
| Avg Disk Queue Length | Average number of queued read/write operations waiting for disk access; a direct indication of disk congestion. | The average here should be less than the total number of spindles in the array. Anything more means the OS is waiting for the disk subsystem. If viewing data from a SAN, ignore this counter and concentrate on the latency counters below (Avg Disk Sec/Read and Avg Disk Sec/Write). |
| Avg Disk Sec/Transfer | Average duration in seconds for data transfers to or from the disk. | Averages should be around 20ms (0.020 as displayed, since the counter is in seconds), with spikes no higher than 50ms. |
| Avg Disk Sec/Read | Average time in seconds to read data from the disk. | Averages should be around 20ms, with spikes no higher than 50ms. |
| Avg Disk Sec/Write | Average time in seconds to write data to the disk. | Averages should be around 20ms, with spikes no higher than 50ms. |
| Disk Transfers/Sec | The number of read/write operations on the disk, per second. | This number is mainly useful for tracking trends from one baseline to the next. |
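The disk latency counters are reported in seconds while the guidelines are given in milliseconds, so it is easy to misread them. The sketch below applies the guidelines from this section (about 20ms average, spikes up to 50ms, queue length below the spindle count) to a set of readings. The function name, the treatment of 20ms as a hard limit, and the sample readings are assumptions for illustration.

```python
def check_disk(avg_sec_per_transfer, max_sec_per_transfer,
               avg_queue_length, spindles):
    """Return a list of warnings based on the latency and queue guidelines."""
    warnings = []
    avg_ms = avg_sec_per_transfer * 1000.0  # counter is in seconds
    max_ms = max_sec_per_transfer * 1000.0
    if avg_ms > 20.0:
        warnings.append(f"average latency {avg_ms:.1f} ms exceeds 20 ms")
    if max_ms > 50.0:
        warnings.append(f"latency spike of {max_ms:.1f} ms exceeds 50 ms")
    if avg_queue_length >= spindles:
        warnings.append(
            f"queue length {avg_queue_length} is not below {spindles} spindles")
    return warnings


# Hypothetical readings for a 4-disk array: Average and Maximum of
# Avg Disk Sec/Transfer, plus the Average of Avg Disk Queue Length.
disk_warnings = check_disk(0.012, 0.065, 2.5, 4)
for warning in disk_warnings:
    print(warning)
```

For SAN-backed volumes, the queue length check would be dropped, as noted in the table, leaving only the two latency checks.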

 

Processor

| Counter | What It Means | How to Interpret Results |
| --- | --- | --- |
| % Processor Time | Total processor utilization on the system, across all processors, both logical and physical. | This number should not exceed 90%. Windows calculates this number by taking the total utilization on the box and dividing it by the total number of logical CPUs detected in the system (including hyper-threaded processors). |
| % Privileged Time | Total time spent processing kernel-mode processes (OS-related processes), per processor. | If processor utilization is high and % Privileged Time is also high, it may be an indication of a Windows service having issues. |
| % User Time | Total time spent processing user-mode processes (application-related processes), per processor. | On Terminal Servers, it is typical to see a greater percentage of time on user-mode processes than kernel-mode processes. |
| Interrupts/Sec | The number of times per second the processor receives hardware interrupts for service requests from peripherals (NIC, disk, mouse, etc.). Interrupts cause the processor to temporarily suspend thread processing to service the request. | This number will be high in environments with high disk utilization or networking demands. However, values significantly over 1000 should be investigated and could indicate a hardware problem. Compare this value to the System Calls/Sec counter in the System object. If Interrupts/Sec is higher, it usually indicates a hardware problem. |

 

System

| Counter | What It Means | How to Interpret Results |
| --- | --- | --- |
| Context Switches/Sec | Context switches occur when a running thread relinquishes control of the processor, is preempted by a higher priority thread, or changes from kernel mode to user mode and vice versa. | This number varies based on the number and speed of processors in the server. It is not unusual to see average numbers between 3000 and 5000 context switches/sec per CPU on a Terminal Server. However, numbers well above that range will hurt perceived performance. |
| Processes | The total number of processes running on the system. | This number is mainly useful for tracking trends from one baseline to the next. |
| Processor Queue Length | The number of threads in the processor queue waiting for processor time. | Numbers consistently over 3 or 4 indicate not enough processors or simply too high of a load. |
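The Processor and System guidelines above can be combined into one quick check of a set of averaged readings. This is only a sketch: the dictionary keys, the function name, and the sample values are hypothetical, and treating 5000 context switches/sec per CPU and a queue length of 4 as hard limits is an assumption based on the ranges quoted in the tables.

```python
def check_cpu_health(sample, logical_cpus):
    """Return findings based on the Processor and System guidelines."""
    findings = []
    if sample["processor_time_pct"] > 90.0:
        findings.append("% Processor Time is above 90%")
    if sample["processor_queue_length"] > 4:
        findings.append(
            f"processor queue length {sample['processor_queue_length']} "
            "is above 4")
    # Context switches are judged per CPU, so normalize the system-wide rate.
    switches_per_cpu = sample["context_switches_per_sec"] / logical_cpus
    if switches_per_cpu > 5000:
        findings.append(
            f"{switches_per_cpu:.0f} context switches/sec per CPU is above 5000")
    if sample["interrupts_per_sec"] > sample["system_calls_per_sec"]:
        findings.append("Interrupts/Sec exceeds System Calls/Sec")
    return findings


# Hypothetical averages from a baseline log on an 8-CPU Terminal Server.
sample = {
    "processor_time_pct": 72.0,
    "processor_queue_length": 6,
    "context_switches_per_sec": 36000,
    "interrupts_per_sec": 900,
    "system_calls_per_sec": 15000,
}

findings = check_cpu_health(sample, logical_cpus=8)
for finding in findings:
    print(finding)
```

With these made-up readings, only the processor queue length is flagged; 36000 context switches/sec works out to 4500 per CPU, which is inside the range described as typical.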

 

Terminal Services

| Counter | What It Means | How to Interpret Results |
| --- | --- | --- |
| Active Sessions | The total number of sessions that are active, excluding disconnected sessions. | This number is useful for correlating with other metrics recorded in the baseline, such as Processes, % Processor Time, etc. |
| Total Sessions | The total number of sessions, including disconnected sessions. | This number is useful for correlating with other metrics recorded in the baseline, such as Processes, % Processor Time, etc. |

 

Terminal Services Session

| Counter | What It Means | How to Interpret Results |
| --- | --- | --- |
| % Processor Time | The amount of processor time allocated to all processes running in a particular session. | This number is useful for correlating with other metrics recorded in the baseline. |
| Page Faults/Sec | The rate at which processes in the session cannot find needed information in their working set (includes both hard and soft page faults). | This number is useful for correlating with other metrics recorded in the baseline. |

Some Final Notes

Windows Performance Monitor is a great diagnostic tool; however, like every tool, it is only as good as the knowledge of the person using it. Take the time to develop baselines of the various Terminal Servers deployed in your environment. Be sure to perform separate baselines of the various hardware platforms used and Terminal Server “application silos” so all of your varying server profiles are covered.

It is also a good idea to repeat the baseline analysis every 3-6 months to build a trend. Even if user load never increases, outside influences on the system such as installed service packs and patches can cause changes in performance. With a baseline taken every 3-6 months, you can identify performance trends and possibly head off performance problems before they occur. When you can anticipate performance issues and correct them before your users feel the impact, that’s when you truly achieve Terminal Server nirvana.
