Monitoring Exchange 2013 with SCOM 2012 (Part 1)

Introduction

When I discuss which systems are mission critical for a given organization, email often comes up as one of the most crucial services to maintain. That is no surprise: although it is not usually categorized as a line-of-business or core application, people rely on email at some level to perform their regular job tasks. With this in mind, proactively maintaining a healthy messaging infrastructure should be at the top of the list for every systems administrator (and I’m sure it is: I remember a survey from a couple of years ago in which over a third of CIOs and IT managers said a week without e-mail would be more traumatic than events such as a minor car accident, moving to a new home, or getting married or divorced).

If you’re running Microsoft Exchange Server, System Center 2012 Operations Manager (SCOM 2012) can provide the monitoring and alerting needed to keep those messaging servers running smoothly. The latest Microsoft Exchange Server 2013 Management Pack (MP) includes a complete health model and a full complement of diagnostics-based alerts. This MP is also much simpler than its predecessors and more user focused, with a simplified dashboard that makes it easier to quickly determine what users are experiencing.

As with previous versions of the Exchange MP, Microsoft invested its internal experience of running Exchange servers into the development of this management pack. This time, the Exchange team is also sharing the knowledge acquired through managing the Office 365 and Exchange Online environments, with their extreme demands on availability and performance.

Exchange Server 2013 introduced a new feature called Managed Availability, which comes with monitoring built in. This latest management pack leverages the capability of Exchange to detect and automatically recover from performance and availability issues, thus reducing alert noise and administrative overhead.

In terms of interoperability, this Management Pack does not upgrade the Exchange 2010 Management Pack; it is a completely new MP. It is possible to run these Management Packs side-by-side as you upgrade your Exchange environment from 2010 to 2013.

What’s New

One thing you’ll notice with the Microsoft Exchange 2013 Management Pack is its simplicity (some might say it is too simple). It contains 3 views, fewer than 20 classes and about 75 monitors that cover Exchange component health (such as Hub Transport health), customer touch point health (such as “is OWA working”), clustered scenarios, as well as dependency monitoring (“is Active Directory healthy”). Monitoring covers primarily availability and performance scenarios.

This simplicity in the management pack also means a lower impact on the Operations Manager environment and better scalability.

The following are some of the new features in the Exchange 2013 Management Pack:

  • Simplified dashboard: the dashboard of the Exchange 2013 Management Pack has been simplified and refined into the following three categories
    • Active Alerts - Provides a list of all outstanding alerts in your organization.
    • Organization Health - Provides an overview of the overall service health in your organization.
    • Server Health - Provides an overview of the health of individual servers in your organization.
  • User focused monitoring: Exchange 2013 introduces a monitoring and recovery infrastructure called Managed Availability that focuses on the user experience. All Exchange 2013 components have built-in monitors that detect problems and attempt to recover the service availability. Any issues that can't be recovered automatically are escalated to the Exchange 2013 Management Pack as an alert.
  • There are no performance counters collected by this Management Pack: monitoring still covers performance issues that might arise, by leveraging Exchange Managed Availability.
  • There are no reports in this Management Pack: you can still use some of the built-in Operations Manager reports (such as the Health and Availability reports) to track organization availability, or define SLAs against the Organization.
  • The Correlation Engine, introduced with the Exchange 2010 MP, is no longer used: although it seemed a good idea with the Exchange 2010 MP, it no longer makes sense now that Exchange 2013 has Managed Availability. The correlation logic, as well as the self-healing capabilities, moved to the Exchange servers. Each monitored Exchange server is responsible for monitoring its own health, and simply reports this via the Operations Manager agent. There is a little bit of roll-up going on, from Exchange server health to Organization health.

Using Health Explorer (Figure 1) you can dive into the monitoring capabilities of the Exchange Management Pack, exposing all the available monitors. As depicted in Figure 2 and Figure 3, each monitor has a link to online knowledge.


Figure 1: Exchange 2013 Health Explorer


Figure 2: Health Explorer for Exchange 2013


Figure 3: Online knowledge article

If instead of using Health Explorer you open the management pack using MP Viewer (Figure 4), you’ll have a hard time trying to find the 75 monitors included. That’s because most of them are just not there: the MP leverages the information provided by Exchange 2013 Managed Availability.

It might be fun to also open the Exchange 2010 MP (Figure 5) and compare the two. This will give you a clear picture of how much simpler the latest Exchange management pack is.


Figure 4: MP Viewer with Exchange 2013 MP


Figure 5: MP Viewer with Exchange 2010 MP

Exchange Server 2013 Managed Availability

Managed availability is defined as a set of internal processes made up of probes, monitors, and responders that incorporate monitoring across all server roles and all protocols. With managed availability, internal monitoring and recovery-oriented features are tightly integrated to help prevent failures, proactively restore services, and initiate server failovers automatically or alert administrators to take action. Managed availability moves away from monitoring individual, separate slices of the system to monitoring the end-to-end user experience, and protecting that experience through recovery-oriented computing.

Managed availability includes three main asynchronous components that are constantly doing work: probes, monitors and responders.

  • Probes: These are sets of data collectors that measure various components. There are three distinct types of probes:
    • Synthetic transactions that measure synthetic end-to-end user operations.
    • Checks that measure actual customer traffic.
    • Notifications that allow Exchange to take immediate action. A good example of this is the notification that is triggered when a certificate expires.
  • Monitors: The data collected by probes are passed on to monitors that analyze the data for specific conditions and depending on those conditions determine if the particular component is healthy or unhealthy.
    The monitoring is done at different layers to deal with dependencies. Because there is no correlation engine in Exchange 2013, dependencies are told apart by unique error codes that correspond to different probes, and by probes that do not touch those dependencies at all.
  • Responders: If a monitor determines that a component is unhealthy, it will trigger a responder. If the problem is recoverable, the responder attempts to recover the component using the built-in logic. There are several responders available for each component, but the one that’s relevant for the Exchange 2013 Management Pack is the Escalate Responder. When the Escalate Responder is triggered, it generates an event that the Exchange 2013 Management Pack recognizes and turns into an alert containing the information administrators need to address the problem. These are the types of responders available:
    • Restart Responder: Terminates and restarts service
    • Reset AppPool Responder: Cycles IIS application pool
    • Failover Responder: Takes an Exchange 2013 Mailbox server out of service
    • Bugcheck Responder: Initiates a bugcheck of the server
    • Offline Responder: Takes a protocol on a machine out of service
    • Escalate Responder: Escalates an issue
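
To see which probes, monitors and responders make up a given health set on one of your own servers, something along these lines should work from the Exchange Management Shell. Treat it as a minimal sketch: OWA.Protocol is only an example health set name and E2K13-MBX1 is the sample server used later in this article, so substitute your own values.

```powershell
# Assumptions: run from the Exchange Management Shell on Exchange 2013.
# E2K13-MBX1 is a sample server name and OWA.Protocol an example health set.
$server    = 'E2K13-MBX1'
$healthSet = 'OWA.Protocol'

# List every probe, monitor and responder registered for the health set.
Get-MonitoringItemIdentity -Identity $healthSet -Server $server |
    Sort-Object ItemType, Name |
    Format-Table ItemType, Name, TargetResource -AutoSize

# Count how many items of each type (Probe, Monitor, Responder) exist.
Get-MonitoringItemIdentity -Identity $healthSet -Server $server |
    Group-Object ItemType |
    Select-Object Name, Count
```

The second command is just a quick way to get a feel for how much monitoring logic sits behind a single health set.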

The specific sets of probes, monitors and responders within Exchange 2013 Managed Availability are referred to as health sets. Health sets are further grouped into functional units called Health Groups. There are four Health Groups and they are used for reporting within the SCOM Management Portal (Figure 1):

  • Customer Touch Points – components with direct real-time, customer interactions (e.g., OWA).
  • Service Components – components without direct, real-time, customer interaction (e.g., OAB generation).
  • Server Components – physical resources of a server (e.g., disk, memory).
  • Key Dependencies – server’s ability to call out to dependencies (e.g., Active Directory).


Figure 6: Exchange 2013 Managed Availability
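
If you want a rough per-group summary for one server, the raw monitor data can be grouped in the shell. This is only a sketch under a few assumptions: it expects the Get-ServerHealth output to expose a HealthGroupName property (worth confirming with Get-Member on your build), the group names returned by Exchange may not match the display names listed above exactly, and E2K13-MBX1 is a sample server name.

```powershell
# Assumptions: Exchange Management Shell; E2K13-MBX1 is a sample server name.
# HealthGroupName is expected on the output objects; confirm it first with:
#   Get-ServerHealth -Identity $server | Get-Member -MemberType Property
$server = 'E2K13-MBX1'

Get-ServerHealth -Identity $server |
    Group-Object HealthGroupName |
    Sort-Object Name |
    ForEach-Object {
        # One summary row per health group.
        [pscustomobject]@{
            HealthGroup = $_.Name
            Monitors    = $_.Count
            NotHealthy  = @($_.Group | Where-Object { $_.AlertValue -ne 'Healthy' }).Count
        }
    } |
    Format-Table -AutoSize
```

This is not how the Management Pack itself rolls health up; it is simply a convenient way to see the same grouping from the server side.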

Managed availability runs on every Exchange 2013 server and it is implemented in the form of two processes:

  • Exchange Health Manager Service (MSExchangeHMHost.exe) - A controller process that is used to manage worker processes. It is used to build, execute, and start and stop the worker process, as needed. It is also used to recover the worker process in case that process crashes, to prevent the worker process from being a single point of failure.
  • Exchange Health Manager Worker process (MSExchangeHMWorker.exe) - A worker process that is responsible for performing the runtime tasks.
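
A quick way to confirm that both pieces are running on a given server is sketched below. It assumes you run it locally on an Exchange 2013 server and that the Health Manager service is registered under the short name MSExchangeHM.

```powershell
# Assumption: run locally on an Exchange 2013 server, where the Health Manager
# service is expected to be registered as MSExchangeHM.
Get-Service -Name MSExchangeHM |
    Format-List Name, DisplayName, Status

# Both the controller (host) and the worker process should be listed.
# SilentlyContinue suppresses the error if one of them is not running.
Get-Process -Name MSExchangeHMHost, MSExchangeHMWorker -ErrorAction SilentlyContinue |
    Format-Table Name, Id, StartTime -AutoSize
```

If only the host process shows up, the worker has probably just been recycled; the controller is responsible for starting it again.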

System Center Operations Manager is used as a portal to see health information related to the Exchange environment. The alerts within the System Center Operations Manager (SCOM) portal indicate unhealthy states as reported by the Managed Availability components in Exchange 2013.

Each server executes its own probes, monitors itself, takes action to self-recover, and escalates to SCOM via the Operations Manager agent when needed. This is a different approach from previous versions of Exchange and the corresponding management pack, where all events were escalated to a central SCOM server that would have to decide, based on the correlation engine, whether to raise an event or not. Pushing everything to one central place didn’t work, which is why each individual Exchange server now acts as an island.

Notification and alerting to Operations Manager is handled via events, so the Management Pack has a set of simple event monitors that trigger based on these events. Events are logged to the Microsoft > Exchange > ManagedAvailability > Monitoring event log via the Escalate Responder.


Figure 7: Exchange 2013 Managed Availability Event Log
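
The same events can be read from PowerShell, which is handy when checking what the Escalate Responder has logged recently. The sketch below assumes the Event Viewer path above corresponds to the channel name Microsoft-Exchange-ManagedAvailability/Monitoring; the first command lists the matching channels so you can verify that on your own server.

```powershell
# Assumption: run locally on an Exchange 2013 server with rights to read event logs.
# List the Managed Availability channels to confirm the exact log name.
Get-WinEvent -ListLog 'Microsoft-Exchange-ManagedAvailability/*' |
    Format-Table LogName, RecordCount -AutoSize

# Show the 20 most recent events from the Monitoring channel.
$logName = 'Microsoft-Exchange-ManagedAvailability/Monitoring'
Get-WinEvent -LogName $logName -MaxEvents 20 |
    Format-Table TimeCreated, Id, LevelDisplayName, Message -AutoSize -Wrap
```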

To view health, you use the Get-ServerHealth and Get-HealthReport cmdlets. Get-ServerHealth is used to retrieve the raw health data, while Get-HealthReport operates on the raw health data and provides a current snapshot of the health.

Get-HealthReport -Identity E2K13-MBX1


Figure 8: Get-HealthReport

Get-ServerHealth -Identity E2K13-MBX1


Figure 9: Get-ServerHealth
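
On a busy server both cmdlets return a lot of rows, most of them healthy. A small filter like the following narrows the output to what needs attention. It is a sketch rather than a definitive procedure: E2K13-MBX1 is the sample server from the commands above and OWA.Protocol is only an example health set to drill into.

```powershell
# Assumptions: Exchange Management Shell; sample server and health set names.
$server = 'E2K13-MBX1'

# Snapshot view: only health sets that are not currently healthy.
Get-HealthReport -Identity $server |
    Where-Object { $_.AlertValue -ne 'Healthy' } |
    Format-Table HealthSet, AlertValue, LastTransitionTime, MonitorCount -AutoSize

# Raw view: the individual monitors behind one health set that are not healthy.
Get-ServerHealth -Identity $server -HealthSet 'OWA.Protocol' |
    Where-Object { $_.AlertValue -ne 'Healthy' } |
    Format-Table Name, TargetResource, AlertValue -AutoSize
```

Starting from the snapshot and then drilling into a single health set mirrors the relationship between the two cmdlets described above.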

Summary

And so we conclude part 1 of this 5-part article about configuring the Exchange 2013 Management Pack for System Center 2012 Operations Manager. We covered the new features of the MP, and also the native monitoring functionality of Exchange Server – Managed Availability – which decentralizes monitoring and healing actions.

In the next part we’ll cover the installation of the Exchange Server 2013 Management Pack.
