Monitoring the QoS when moving Apps to the Cloud
The GSX Blog
It's no secret that many businesses are already moving their productivity applications to the cloud, or are at least considering it. In Microsoft's latest earnings release, for instance, the company reported a whopping 114% rise in business cloud revenues in the second quarter of its fiscal year alone. That growth is being driven by Office 365, Azure and Dynamics CRM Online adoption, and represents an annualized revenue run rate of $5.5bn.
The benefits of moving to the cloud are pretty clear - lower infrastructure costs, ease of management, predictable subscription-based pricing, and so on. However, businesses need to understand their SLAs to make sure that critical application workloads are highly available and that they are getting what they are paying for. Application downtime can be extremely costly, and eventually detrimental to businesses, and cloud services do go down - it's just a fact of life. One of the main obstacles is the lack of visibility, especially when outages occur.
Most cloud service providers offer SLAs that guarantee at least 99.9% availability, which still allows almost nine hours of application downtime per year. However, that figure can be misleading, and businesses need to understand what can cause downtime from multiple, interdependent points of failure in the cloud service delivery chain - and what is really covered by a three-nines SLA. In reality, that 99.9% guarantee covers only the availability of the cloud service provider's infrastructure, but there are many other factors that can make critical business applications unavailable to end users.
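As a quick check on the arithmetic, the downtime that an availability percentage permits can be worked out directly. The sketch below assumes a 365-day year and an SLA measured over the whole year; real SLAs are often measured per month and may exclude scheduled maintenance, so treat the function name and the measurement period as illustrative assumptions.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, assuming a non-leap year


def allowed_downtime_hours(availability_pct, period_hours=HOURS_PER_YEAR):
    """Hours of downtime permitted by an availability percentage over a period."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be between 0 and 100")
    return period_hours * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Compare a few common SLA levels.
    for pct in (99.0, 99.9, 99.99):
        print(f"{pct}% availability allows {allowed_downtime_hours(pct):.2f} hours of downtime per year")
```

For 99.9% this comes to roughly 8.76 hours per year, which is where the "about nine hours" figure comes from.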
When choosing an SLA that's right for the business, IT administrators need to consider the various points of failure in the service delivery chain that can cause application downtime. These include the internet backbone; ISP outages; the internet connection itself; Active Directory Federation Services (AD FS) and single sign-on configurations; firewalls, DNS and proxy servers; and on-premises LAN configurations. When you add up the risks from all of these points in the cloud connection, actual application downtime can be much greater than the 0.1% that a 99.9% SLA allows, even though the cloud service provider is (on paper) only on the hook for its own part of the chain when it comes to breaching the SLA.
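To see how quickly the risks compound, here is a rough sketch of end-to-end availability across a delivery chain. It assumes that every link is required for the application to work and that the links fail independently of one another, which is a simplification. The availability figures for everything other than the 99.9% cloud service are made-up placeholders, not measurements.

```python
from math import prod

HOURS_PER_YEAR = 365 * 24  # assuming a non-leap year


def composite_availability_pct(component_pcts):
    """End-to-end availability of a serial chain of independent components."""
    return 100.0 * prod(pct / 100.0 for pct in component_pcts)


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    chain = {
        "cloud service (SLA)": 99.9,
        "ISP / internet connection": 99.5,
        "AD FS / single sign-on": 99.8,
        "firewall, DNS, proxy": 99.9,
        "on-premises LAN": 99.9,
    }
    end_to_end = composite_availability_pct(chain.values())
    downtime_hours = HOURS_PER_YEAR * (1 - end_to_end / 100.0)
    print(f"End-to-end availability: {end_to_end:.2f}%")
    print(f"Expected downtime: {downtime_hours:.1f} hours per year")
```

Because the availabilities multiply, the end-to-end figure is always lower than the weakest link, so a chain of components that each look reliable on their own can still leave end users well short of three nines.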
To mitigate the risks of failure at any of these points in the cloud service delivery chain, businesses and cloud service providers need in-depth, proactive monitoring tools in place. These tools should not only provide visibility into the system, but also make it possible to react to outages before there is any real impact on the end user experience, and to verify that SLAs are, in fact, being met. In addition, they need to work across hybrid on-premises and cloud infrastructure.
Third-party monitoring and performance analysis tools like GSX Monitor & Analyzer give businesses and cloud service providers alike the ability to avoid service disruptions before they happen. Take the significant outage Microsoft Office 365 customers experienced last summer. In two days, Microsoft Lync went down, and then Microsoft Exchange Online crashed for almost nine business hours. If GSX Monitor had been in place, IT administrators would have been alerted to performance issues with Office 365 before any significant service disruptions occurred, and with GSX Analyzer, they could have diagnosed the root cause of the performance disruption in order to proactively remediate the issues before the overall business was impacted.
Running end-user scenarios at regular intervals is a way to test the availability of Exchange Online from an end-user perspective and to help prevent downtime of the messaging application.
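The general idea of scenario-based testing can be sketched in a few lines. This is not how GSX Monitor is implemented; it is a minimal illustration in which `scenario` is a hypothetical callable that performs one end-user action (for example, opening a mailbox) and raises an exception on failure. All names here are assumptions made for the example.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProbeResult:
    ok: bool          # True if the scenario completed without raising
    seconds: float    # how long the scenario took
    error: str = ""   # error message when the scenario failed


def run_probe(scenario: Callable[[], None]) -> ProbeResult:
    """Run one end-user scenario and record its outcome and duration."""
    start = time.perf_counter()
    try:
        scenario()
        return ProbeResult(True, time.perf_counter() - start)
    except Exception as exc:  # any failure counts as the service being unavailable
        return ProbeResult(False, time.perf_counter() - start, str(exc))


def measured_availability_pct(results: List[ProbeResult]) -> float:
    """Share of probes that succeeded, as a percentage."""
    if not results:
        raise ValueError("no probe results")
    return 100.0 * sum(r.ok for r in results) / len(results)


if __name__ == "__main__":
    def placeholder_scenario() -> None:
        # Stand-in for a real end-user action; replace with an actual check.
        time.sleep(0.01)

    results = [run_probe(placeholder_scenario) for _ in range(5)]
    print(f"Measured availability: {measured_availability_pct(results):.1f}%")
```

In practice the probe would run on a schedule from the locations where users actually sit, and both failures and slow responses would raise an alert, since degraded performance often precedes an outage.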
Overall, it is clearly becoming more cost effective and efficient for businesses to move their applications to the cloud. However, SLAs are much more complicated than they might seem at first glance. Whether applications are running in the cloud, on-premises or in a hybrid environment, it is critical for businesses and cloud service providers to define a set of ground rules that must be met to guarantee and improve application performance and availability - measured, most importantly, from the end user's perspective.