Hosted Exchange: What you need to know about ongoing support & operation
Over the past few weeks, I have been writing on the topic "key questions to ask Exchange hosting providers", to help customers choose the right provider for their Exchange hosting services. This post continues that series. If you haven't read my previous posts on this topic, here they are:
- Top questions to ask before choosing your Microsoft Exchange Hosting Provider
- Why an Infrastructure Check is Important Before Choosing a Hosting Provider
After carefully reviewing the features of Hosted Exchange and the benefits of your preferred hoster, the next step is to read through the SLA (Service Level Agreement), which is what ultimately commits the provider to reliable operation and a defined support structure.
In the IT services business, the two most critical components of an SLA are uptime & high availability. Although most service providers offer uptime from 99% upwards, the differences in permitted downtime, measured in hours or minutes, are huge. For example, on highly available systems/infrastructure, the following uptime levels translate to:
- 99.9% ≡ Less than 43.8 minutes per month downtime
- 99.99% ≡ Less than 4.38 minutes per month downtime
- 99.999% ≡ Less than 0.44 minutes per month downtime
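The downtime figures in the list above follow from simple arithmetic. Here is a minimal sketch in Python, assuming an average month of 365.25 × 24 × 60 / 12 = 43,830 minutes (my assumption; it is the month length that reproduces those figures). The function name is only illustrative.

```python
# Average month length in minutes, based on a 365.25-day year.
# This is an assumption chosen because it reproduces the figures listed above.
MINUTES_PER_MONTH = 365.25 * 24 * 60 / 12  # 43,830 minutes


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the maximum downtime per month permitted by an uptime target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return MINUTES_PER_MONTH * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% -> {minutes:.2f} minutes per month")
```

Run with 99% as the target and the same formula gives roughly 438 minutes, or over seven hours per month, which is why the number of nines in the SLA matters so much.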
As the figures above show, availability is expressed as a percentage, and the value is basically the share of a given period (typically a year) for which the service is up; the list converts it to permitted downtime per month. MS Exchange servers in a hosting environment that offer genuine continuous availability come with a higher price tag, because the hoster has to deploy a "state of the art" infrastructure that eliminates single points of failure. That same infrastructure also helps the operation mode & support structure, because application upgrades, patches, hardware refreshes & other maintenance can be carried out without shutting down services & critical systems.
However, an SLA is useless if it carries no penalties for incidents in which the hoster fails to meet the guaranteed service level. In addition, if you deal with private-label Exchange hosting resellers, it is even more important to scrutinize the SLA.
What action should you take now?
Get a copy of the SLA from each of your shortlisted hosters and, last but not least, define the compliance levels & the penalty process. The following are some key areas for customers to negotiate with hosters in the SLA:
- Demand a monthly report on the various compliance levels you agreed upon with your designated hoster.
- Review the uptime/downtime track record every month with the help of those reports.
- Review the report on the performance thresholds you set in the agreement (e.g. slow connections, network latency etc.).
- Verify that policies & procedures are actually in practice (e.g. incident, change & problem management, and root cause analysis in the event of a major breakdown).
In short, this should be a comprehensive, real-time status report available online. Then establish a mechanism under which service credits are applied to your account whenever the service provider falls out of compliance with the terms of the SLA.
Is that enough? I guess not. How do you keep your SLA in a healthy state? What are the underlying key success factors for a successful SLA? Let's look into a couple of areas:
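To make the credit mechanism concrete, here is a rough sketch of a monthly compliance check. Everything in it is an assumption for illustration: the `MonthlyReport` structure, the function names, the 43,830-minute average month and the flat credit percentage are hypothetical, and the real values must come from your own SLA.

```python
from dataclasses import dataclass

# Average month in minutes (365.25-day year); an assumption for this sketch.
MINUTES_PER_MONTH = 365.25 * 24 * 60 / 12


@dataclass
class MonthlyReport:
    """Hypothetical summary of the outages a hoster reported for one month."""
    month: str
    downtime_minutes: float


def achieved_availability(report: MonthlyReport) -> float:
    """Return the uptime actually delivered in the month, as a percentage."""
    return 100 * (1 - report.downtime_minutes / MINUTES_PER_MONTH)


def credit_due(report: MonthlyReport, target_percent: float,
               credit_percent: float) -> float:
    """Return the credit owed (as % of the monthly fee) if the target was missed.

    credit_percent is a placeholder; the actual penalty is whatever the SLA states.
    """
    if achieved_availability(report) < target_percent:
        return credit_percent
    return 0.0
```

For example, a month with 60 minutes of downtime delivers about 99.86% availability, so against a 99.9% target `credit_due` returns the agreed credit, while a month with 10 minutes of downtime returns 0.0.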
Operation: What are the better strategies?
While there are many strategies, I believe having a strong & reliable disaster recovery plan is second to none. So decide on a better backup plan! I suggest going for two types of backup: full & differential. Limit full backups to 1-2 times per week, as they take longer. Differential backups run on each day in between: each one backs up the data that has changed since the last full backup.
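One consequence of this scheme is that a restore never needs more than two backups: the last full backup plus the most recent differential taken after it. The sketch below illustrates that selection logic only; the `Backup` tuple and the function name are hypothetical and have nothing to do with any real Exchange or backup tool API.

```python
from datetime import date
from typing import List, Tuple

# (day the backup was taken, "full" or "differential"); a hypothetical record.
Backup = Tuple[date, str]


def backups_needed_for_restore(backups: List[Backup],
                               restore_day: date) -> List[Backup]:
    """Pick the backups required to restore the data as of restore_day."""
    usable = sorted(b for b in backups if b[0] <= restore_day)
    fulls = [b for b in usable if b[1] == "full"]
    if not fulls:
        raise ValueError("no full backup on or before the restore day")
    last_full = fulls[-1]
    later_diffs = [b for b in usable
                   if b[1] == "differential" and b[0] > last_full[0]]
    # A differential contains everything changed since the last full backup,
    # so only the most recent one is needed (none if the list is empty).
    return [last_full] + later_diffs[-1:]
```

With a full backup on day 7 and differentials on days 8 to 13, a restore to day 11 needs the day 7 full backup and the day 11 differential only.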
Now with Exchange 2007 hosting, you can take advantage of double protection using Cluster Continuous Replication (CCR). CCR is a high availability feature introduced in Exchange Server 2007 that uses built-in asynchronous log shipping & replay technology. The beauty of CCR is that it requires neither special hardware nor shared storage. As a win-win, it eliminates the single point of failure and helps reduce both the frequency of full backups and the volume of data to be backed up. Isn't that great? Yes, because it also shortens the recovery time you can commit to in the SLA after a first failure. If you are interested in knowing more, read Henrik's tutorials on CCR here. Below is a CCR diagram (source: Microsoft)
Implement & be consistent on "Best Practice" restorations
This takes "extra mile" effort, because attempting a restore for the first time in the middle of an emergency is not only risky but also far from best practice. So how can we fix this?
- Have an isolated setup where the service provider should be able to restore the data on a weekly basis.
- Utilize the passive node in Cluster Continuous Replication where it can replay the logs in real time. This ensures the Exchange database consistency.
How efficient are the service providers in deploying software upgrades & security updates?
I believe they must have the following:
- Service providers should have a centralized systems management tool (e.g. System Center Configuration Manager) in order to roll out up-to-date security patches & software upgrades efficiently.
- Hosters should build a lab setup where they can test the stability & reliability of the packages they intend to push into the production environment.
- To keep the software environment consistent, hosters must track & document the packages they push to the production setup.
Knowing more about the Support Model is key before finalizing on the Hoster & SLA.
Hosters can expect many questions from customers. From my experience, this is the hard part, but you have to face it:
- Good hosters provide extended support hours that take into account customers based in different time zones/geographies.
- Hosters, be ready to have staff on call 24x7.
- Get ready with web-based support (e.g. live chat, remote help desk agent, etc.).
- Answering machines & pager bleeps disappoint customers, so avoid them as much as possible.
- If hosters subcontract the support, make sure it is well managed with an SLA (between the hoster & the subcontracting organization).
- The location of your support team is key, especially when hosters need to provide support across the globe: consider language, culture, time zone etc.
Hosters, do you have Exchange Subject Matter Experts (SMEs) in your support team?
For small-scale Exchange hosters, this is hard to answer with a yes. For large-scale Exchange hosting providers, it is possible. How can a hoster assure customers of the credibility of the Exchange specialists in its organization? In my opinion, there are many factors complementing each other. Subject Matter Experts in MS Exchange are a different breed: they work closely with the service provider's Integration Engineering & Solution Architect teams. If customers escalate issues of top-level severity, the SMEs should be able to resolve them quickly. Their credibility rises further if they are certified in messaging and hold other industry/community recognitions such as Exchange Ranger, MVP, Microsoft Certified Solution Architect in messaging etc.
To summarize, build a better SLA that covers all the key terms mentioned, the support structure & operation mode in particular.