Monitoring Exchange 2010 with OpsMgr 2007 R2 (Part 1)

If you would like to read the other parts in this article series please go to:

Monitoring Exchange 2010 with OpsMgr 2007 R2 (Part 2)

Monitoring Exchange 2010 with OpsMgr 2007 R2 (Part 3)

Monitoring Exchange 2010 with OpsMgr 2007 R2 (Part 4)

Monitoring Exchange 2010 with OpsMgr 2007 R2 (Part 5)

Introduction

The Management Pack (MP) for Exchange Server 2007 had a troubled history: its first version was ported from the one designed for Microsoft Operations Manager (MOM) 2005, and the native MP was released very late.

For Exchange 2010, Microsoft decided to build a completely new MP from the ground up. It has nothing to do with the Exchange 2007 Management Pack, so even if you are acquainted with the previous version, deploying and configuring the latest one is a new experience.

The Microsoft Exchange Server 2010 Management Pack includes a complete health model, extensive protocol synthetic transaction coverage, and a full complement of diagnostics-based alerts and service-oriented reporting.

One of the most exciting features in this new MP is that alerts are now processed by a new component called the Correlation Engine that is supposed to suppress duplicate alerting whenever possible.

Another change you’ll notice is that very little tuning is required, no matter the size of the Exchange Organization, since most of the diagnostic information used in the Exchange 2010 Management Pack is specifically engineered for monitoring and it will scale with your environment.

Here are some of the key enhancements of this MP over its predecessors:

Correlation Engine: The goal of the Correlation Engine is to significantly reduce the number of alerts that do not require action by the administrator who monitors the Exchange environment through the System Center Operations Manager Console.
New set of reports specific to Exchange 2010: mail flow statistics reporting and service-oriented reporting.
Exchange aware availability modeling.
Full protocol synthetic transaction coverage.
Improved design that simplifies deployment and configuration.
Improved topology discovery: unlike its predecessor, the discovery of Exchange 2010 server roles is enabled by default.

The Correlation Engine

One of the new features of the Exchange 2010 MP, and probably the most important one, is the Correlation Engine. Its goal is to provide higher-quality, more efficient alerting to the Exchange administrator who monitors the System Center Operations Manager Console. It does this by maintaining the health model in memory and processing state change events to determine when to raise an alert. The Correlation Engine has the intelligence to raise only the alerts that require action, which significantly reduces the number of alerts displayed in the Operations Console.

The Correlation Engine is a stand-alone Windows service, installed with the Exchange 2010 MP, which runs on the Root Management Server (RMS).

Figure 1: Architecture of the Correlation Engine

Figure 1 shows the Correlation Engine on the RMS processing several monitors that changed to a critical state because of a problem. The Correlation Engine analyzes the impact of these monitors on overall health and then decides whether to raise an alert, which then becomes visible in the Operations Manager Console.

The Exchange Server 2010 Management Pack also introduces alert classification. Alerts are classified into one of three categories:

Key Health Indicator (KHI): KHIs are issues that affect the health of the service. Most alerts fall into this category (for example, “A mailbox database is dismounted.”)
Non-Service Impacting (NSI): NSI monitors detect problems that may affect some users, but not every user of the system. A good example of an NSI situation is two users with the same proxy address – mail to this address will be returned as non-deliverable, but the overall transport system is not otherwise impaired.
Forensic: Forensic monitors are used to record information that may be relevant while troubleshooting an issue, but isn’t necessarily indicative of an imminent or existing system failure. “CPU activity >90% for 5 minutes” is an example of a forensic issue – there may be a process inappropriately consuming CPU cycles, or the server may have been rebooted and is catching up on normal system activity. These monitors are visible in the Alert Context field of the alert properties and in Health Explorer. Alerts are not raised for Forensic monitors. State is not updated when a single forensic monitor alert is raised. However, state may be updated based on the aggregation of current forensic monitor alerts for each component.
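
As a quick reference, the three categories above can be summarized in a small lookup. This is only an illustrative sketch: the names AlertClass and triage are hypothetical and are not part of the Management Pack or of any Operations Manager API.

```python
from enum import Enum


class AlertClass(Enum):
    """Hypothetical labels for the three categories described above."""
    KHI = "Key Health Indicator"
    NSI = "Non-Service Impacting"
    FORENSIC = "Forensic"


def triage(alert_class: AlertClass) -> str:
    """Summarize how each category surfaces in the console, per the text above."""
    if alert_class is AlertClass.KHI:
        return "alert: the health of the service is affected"
    if alert_class is AlertClass.NSI:
        return "alert: some users are affected, the service is otherwise healthy"
    # Forensic monitors never raise alerts of their own.
    return "no alert: recorded in Alert Context and Health Explorer"


for alert_class in AlertClass:
    print(f"{alert_class.value}: {triage(alert_class)}")
```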

The actions taken by the Correlation Engine are determined by several factors:

Monitor state change events: Monitors, which watch for specific diagnostics from Exchange such as event log messages, performance counter thresholds, and PowerShell task output events, register state change events. Monitors are configured not to alert automatically on state change events. This allows the Correlation Engine to determine the best alert to raise.
Health Model: The class hierarchy includes class relationships that define component dependencies throughout the system. By defining these component dependencies in the object representation of the product, the Exchange Server 2010 Management Pack is able to better understand the health of the Exchange organization.
Timing: The Correlation Engine works in 90-second intervals. When state change events for multiple monitors come in at the same time, it waits to see whether anything else potentially related to the failure is detected so that it can make the most effective determination of the root cause.

The Correlation Engine process uses the following algorithm:

1. It connects to the Operations Manager SDK service to download the Health Model hierarchy and instance state (on service startup or as needed, if errors require it).
2. Next, it queries Operations Manager for the latest state change events related to entities in the Exchange Management Pack.
3. If new NSI state changes are detected, then it raises alerts for them.
4. KHI monitors are then evaluated, and “chains” of red KHI monitors are isolated. These “chains” indicate issues where a dependency has failed and is impacting dependent processes. Recognizing these relationships is the key step (the Health Model contains all these relationships). Monitors in maintenance mode are simply skipped when evaluating the health model.
5. Alerts are raised for the root cause monitor in the KHI chain. If the “chain” of KHIs includes both error and warning monitors, then the alert is raised as an error, regardless of the class of the root cause monitor. The KHI chain, including any forensic monitors, is included in the Alert Context field available in the properties of the final alert.
6. It then waits 90 seconds, and then starts over at step 2.
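
To make the correlation pass more concrete, here is a minimal Python sketch of a single evaluation cycle. It is not the actual Correlation Engine code: the Monitor and Alert classes, the correlate function, and the sample monitor names are all hypothetical, and the health model is reduced to a simple "depends on" list per monitor. Forensic monitors are left out of the alert context to keep the example short.

```python
from dataclasses import dataclass, field
from enum import Enum


class Category(Enum):
    KHI = "KHI"
    NSI = "NSI"
    FORENSIC = "Forensic"


class State(Enum):
    HEALTHY = 0
    WARNING = 1
    ERROR = 2


@dataclass
class Monitor:  # hypothetical stand-in for a monitor in the health model
    name: str
    category: Category
    state: State = State.HEALTHY
    depends_on: list = field(default_factory=list)  # names of monitors it depends on
    in_maintenance: bool = False


@dataclass
class Alert:  # hypothetical stand-in for a raised alert
    monitor: str
    severity: str
    context: list  # the chain of monitors behind the alert


def correlate(monitors):
    """Run one correlation pass and return the alerts it would raise."""
    # Monitors in maintenance mode are skipped entirely.
    unhealthy = {name: m for name, m in monitors.items()
                 if m.state is not State.HEALTHY and not m.in_maintenance}
    # NSI state changes raise alerts directly.
    alerts = [Alert(m.name, m.state.name, [m.name])
              for m in unhealthy.values() if m.category is Category.NSI]
    # Isolate chains of unhealthy KHI monitors linked by dependencies.
    khi = {name: m for name, m in unhealthy.items() if m.category is Category.KHI}
    dependents = {}
    for name, m in khi.items():
        for dep in m.depends_on:
            if dep in khi:
                dependents.setdefault(dep, []).append(name)
    # A root cause is an unhealthy KHI monitor with no unhealthy KHI dependency.
    roots = [name for name, m in khi.items()
             if not any(dep in khi for dep in m.depends_on)]
    for root in roots:
        chain, stack = [], [root]
        while stack:
            current = stack.pop()
            if current not in chain:
                chain.append(current)
                stack.extend(dependents.get(current, []))
        # Any error in the chain makes the alert an error, whatever the root's state.
        worst = max(khi[name].state.value for name in chain)
        alerts.append(Alert(root, State(worst).name, chain))
    return alerts


# Sample data with made-up monitor names.
monitors = {m.name: m for m in [
    Monitor("Database", Category.KHI, State.WARNING),
    Monitor("MailboxAccess", Category.KHI, State.ERROR, ["Database"]),
    Monitor("WebMail", Category.KHI, State.WARNING, ["MailboxAccess"]),
    Monitor("DuplicateProxy", Category.NSI, State.WARNING),
    Monitor("CpuLoad", Category.FORENSIC, State.WARNING),
    Monitor("Transport", Category.KHI, State.ERROR, in_maintenance=True),
]}
alerts = correlate(monitors)
for alert in alerts:
    print(alert)
```

In this sample, three unhealthy KHI monitors collapse into a single alert on the root cause ("Database"), the NSI monitor gets its own alert, and the forensic monitor and the monitor in maintenance mode raise nothing. The real service would repeat such a pass every 90 seconds against state change events retrieved from Operations Manager.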

If the Correlation Engine service is stopped, the Exchange Server 2010 Management Pack will not raise Exchange alerts that correspond to the health of the environment. A general alert is raised to notify that the Correlation Engine is not running.

Although the Correlation Engine affects the way monitor state changes raise alerts, the following items are not different due to its presence:

Overrides still work as expected; you can change certain values or disable monitors just as you do today. Don’t use overrides to disable rules, because that will affect the Correlation Engine. Just disable the monitors associated with the service monitoring you want to override.
Monitors/objects in maintenance mode are skipped by the Correlation Engine.
Per-monitor alert rules were added to the Exchange Server 2010 Management Pack. Per-monitor alert rules allow monitoring personnel to enter company-specific notes for a given alert into the Company Knowledge field, even when the alert rules aren’t used to raise alerts for their corresponding monitors.
Other management packs are not affected by the presence of the Correlation Engine.

Summary

This concludes part 1 of this 6-part article. In this part we covered the new features of the Exchange Server 2010 MP, with special focus on the new Correlation Engine that can significantly reduce the number of alerts that don’t require administrative actions. In the next part we’ll start detailing the installation procedures of the Exchange Server 2010 Management Pack.
