Let’s face it. Microsoft has a kickass global WAN. Think about the incredible infrastructure it supports, for services that include:
- Bing, which is used by 1 out of 5 people
- Azure, with an estimated 1 billion identities by this point
- Dynamics 365, which likely serves close to half a million users
- Office 365, in use by 85 million active users
- OneDrive, which likely boasts close to 1 billion users (inclusive of free and paid)
- Xbox, which serves more than 50 million active users
To do all of this seamlessly without making you miss a beat (I mean, after all, you’d hate to lose your place when playing Mass Effect. Oh wait, we’re talking about work here?), Microsoft needs to be constantly running, constantly churning, delivering trillions of requests on a regular basis. Serving the enterprise lives at its core, but the gamers among us (not me, I promise) rely pretty heavily on that same infrastructure every day.
That’s why Microsoft’s global wide-area network plays such an important part in sustaining its services and systems. If the numbers above seem daunting (and many are simply estimates based on 2015 figures), here is what Microsoft does confirm: its infrastructure runs in 38 regions around the world, across hundreds of data centers, to ensure near-perfect availability, high capacity, and the flexibility to respond to traffic spikes.
Microsoft’s three guiding principles
How does Microsoft do what it does so well? It guides itself on these principles:
- Proximity: Low latency is key. The closer the infrastructure sits to its users, the less latency they experience.
- Failure recovery: It’s a busy job being a Microsoft sysadmin. They have to keep enough capacity and resiliency on hand that the system as a whole keeps running when a component fails. Redundancy is key.
- Manage traffic: Using software-defined networking (SDN), Microsoft proactively manages network traffic at scale.
Proximity
Have a look at the Microsoft network map above. Not only are the data centers close to the users, Microsoft also uses innovative software to optimize network routing, building and deploying network paths that let data travel at nearly the speed of light. The result: reduced latency and the fast user experience customers have come to expect.
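To put some rough numbers on why proximity matters, here is a minimal Python sketch of propagation delay over fiber. It assumes light in glass fiber travels at about two-thirds of its vacuum speed, and the distances are illustrative values chosen for this example, not Microsoft figures. Real-world latency also includes routing, queuing, and processing time, which this ignores.

```python
# Rough one-way propagation delay over optical fiber.
# Assumption: light in glass travels at about two-thirds of its vacuum speed.
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum, km per second
FIBER_VELOCITY_FACTOR = 2 / 3      # assumed typical value for glass fiber


def one_way_delay_ms(distance_km: float) -> float:
    """Return the propagation delay in milliseconds for a fiber run."""
    if distance_km < 0:
        raise ValueError("distance_km must not be negative")
    speed_in_fiber = SPEED_OF_LIGHT_KM_S * FIBER_VELOCITY_FACTOR
    return distance_km / speed_in_fiber * 1000


# Illustrative distances: a metro hop, a regional hop, an ocean crossing.
for km in (100, 1_000, 6_600):
    print(f"{km:>6} km -> {one_way_delay_ms(km):.1f} ms one way")
```

Even under these generous assumptions, an ocean-length run costs tens of milliseconds each way, while a user 100 km from a data center pays well under a millisecond. No software can buy that time back, which is why the data centers need to be close.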
Traffic enters the network through nodes that Microsoft has strategically placed. More than 2,500 unique Internet partners are part of this node network, which spans 130 locations.
Azure traffic stays within the Microsoft network. Many customers use Azure ExpressRoute to create private network connections to a variety of Microsoft services. These ExpressRoute connections bypass the Internet for more reliability, faster speeds, and less latency. There are 37 ExpressRoute sites, in nearly every Azure region.
Resiliency
It’s important for Microsoft to handle failures gracefully, so that end users don’t even notice they’ve happened. Doing so requires both software and hardware. Microsoft uses private fiber (dark fiber) for metro, terrestrial, and submarine paths (even that alone sounds awesome). In three years, Microsoft has grown its long-haul WAN capacity by 700% and can support 1.6 Pbps of inter-datacenter bandwidth.
Microsoft’s wiring under the sea is focused on the Pacific and Atlantic oceans. Most recently, in partnership with Facebook, it invested in the MAREA cable, a 6,600 km submarine cable between Virginia Beach, Virginia, USA, and Bilbao, Spain. MAREA is the highest-capacity subsea cable to cross the Atlantic Ocean and has eight fiber pairs, supporting 160 Tbps.
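For a sense of scale, the quoted MAREA figures can be broken down per fiber pair. This quick Python sketch assumes the 160 Tbps is spread evenly across the eight pairs, which the figures above do not actually state.

```python
# Figures quoted above for the MAREA cable.
TOTAL_CAPACITY_TBPS = 160
FIBER_PAIRS = 8


def capacity_per_pair(total_tbps: float, pairs: int) -> float:
    """Split a cable's total capacity evenly across its fiber pairs.

    Assumption: every pair carries the same share of the total.
    """
    if pairs <= 0:
        raise ValueError("pairs must be positive")
    return total_tbps / pairs


per_pair = capacity_per_pair(TOTAL_CAPACITY_TBPS, FIBER_PAIRS)
print(f"{per_pair:.0f} Tbps per fiber pair")
```

Under that even-split assumption, each pair carries 20 Tbps.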
But even with this awesome innovation, Microsoft has to be mindful of the risks of cables ripping under the sea. Just a few months ago, a ship shut down the Internet because its anchor ripped the darn wire apart. This makes it imperative to leverage more than one working path under the sea so that there is redundancy no matter what.
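The case for redundant paths is easy to see with a little arithmetic. The Python sketch below assumes each path fails independently and uses a made-up availability of 99% per path. Neither assumption comes from Microsoft, and real cables sharing a route can fail together, so treat the output as an upper bound.

```python
def combined_availability(path_availabilities: list[float]) -> float:
    """Probability that at least one path is up.

    Assumption: paths fail independently of one another.
    """
    all_down = 1.0
    for availability in path_availabilities:
        if not 0.0 <= availability <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        # The chance that every path so far is down at the same time.
        all_down *= 1.0 - availability
    return 1.0 - all_down


# Hypothetical figure: each path is up 99% of the time.
for count in (1, 2, 3):
    result = combined_availability([0.99] * count)
    print(f"{count} path(s): {result:.6f}")
```

Each extra independent path multiplies the chance of a total outage by the failure rate of that path, so going from one 99% path to two cuts the expected downtime by a factor of 100.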
Automated software operations
We’re pretty smart as a human species, but we can’t do it all alone. This is why Microsoft Research has worked on a wide range of SDN technologies to manage routing and centralize control. The switches and routers are standard, but Microsoft runs proprietary software on them to handle the enormous traffic on the network.
SWAN, Microsoft’s software-driven WAN, enables centralized management and control of the network infrastructure to improve reliability and efficiency. SWAN controls the network and automatically reconfigures the data plane to match traffic demand.
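To give a feel for what "matching traffic demand" from a central controller can look like, here is a small Python sketch. It is not SWAN's actual algorithm. It swaps in a textbook technique, max-min fair sharing on a single link, and the service names and numbers are hypothetical.

```python
def max_min_fair(capacity: float, demands: dict[str, float]) -> dict[str, float]:
    """Share one link's capacity among flows using max-min fairness.

    Flows that ask for less than an equal share get everything they ask for;
    whatever they leave unused is split evenly among the larger flows.
    """
    allocation = {name: 0.0 for name in demands}
    remaining = capacity
    # Visit flows from the smallest demand to the largest.
    pending = sorted(demands, key=demands.get)
    while pending:
        fair_share = remaining / len(pending)
        name = pending[0]
        if demands[name] <= fair_share:
            # Small flow: satisfy it fully and return the leftover to the pool.
            allocation[name] = demands[name]
            remaining -= demands[name]
            pending.pop(0)
        else:
            # Every remaining flow wants more than the fair share: split evenly.
            for other in pending:
                allocation[other] = fair_share
            break
    return allocation


# Hypothetical demands in Gbps on a hypothetical 100 Gbps link.
demands = {"backup": 80.0, "search": 20.0, "mail": 50.0}
shares = max_min_fair(100.0, demands)
for name, share in shares.items():
    print(f"{name}: {share:.0f} Gbps")
```

With these numbers, search gets its full 20 Gbps, while mail and backup each get 40 Gbps. A controller with a global view can rerun a calculation like this whenever demand shifts, which is the appeal of centralizing control instead of leaving every router to decide on its own.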
A solid commitment
Microsoft’s got a long road ahead of it as more and more software is released and adopted by more and more users and enterprises alike. They’re doing a darn good job, and they’re committed to building the fastest and most reliable global network in the public cloud. For that, we owe them a pat on the back.
Photo credit: Microsoft, Shutterstock