We in the Exchange product group get this question from time to time. The first thing we ask in response is always, “What was the customer impact?” In some cases, there is customer impact; these may indicate bugs that we are motivated to fix. However, in most cases there was no customer impact: a service restarted, but no one noticed. We have learned while operating the world’s largest Exchange deployment that it is fantastic when something is fixed before customers even notice. This is so desirable that we are willing to have a few extra service restarts as long as no customers are impacted.
You can see this same philosophy at work in our approach to database failovers since Exchange 2007. The mantra we have come to repeat is, “Stuff breaks, but the user experience doesn’t!” User experience is our number one priority at all times. Individual service uptime on a server is a less important goal, as long as the user experience remains satisfactory.
However, there are cases where Managed Availability cannot fix the problem. In cases like these, Exchange provides a huge amount of information about what the problem might be. Hundreds of things are checked and tested every minute. Usually, Get-HealthReport and Get-ServerHealth will be sufficient to find the problem, but this blog post will walk you through getting the full details from an automatic recovery action to the results of all the probes by:
- Finding the Managed Availability Recovery Actions that have been executed for a given service.
- Determining the Monitor that triggered the Responder.
- Retrieving the Probes that the Monitor uses.
- Viewing any error messages from the Probes.
Read more at source: http://blogs.technet.com/b/exchange/archive/2013/06/13/what-did-managed-availability-just-do-to-this-service.aspx