Implementing high availability for Microsoft SQL Server running on Linux

It was a watershed moment for me several years ago when I heard that Microsoft SQL Server would soon be able to run on Linux. By that point, we already had PowerShell for Linux and bash for Linux, but being able to deploy SQL Server on Linux was something else entirely. Was Microsoft losing interest in its Windows Server platform? Or were they launching a new initiative in the war to conquer the open source world through a strategy of “embrace and extend”? Today the dust has settled somewhat and SQL Server on Linux is poised to finally come of age. To help us grasp the significance of what’s happening I asked an expert to share his thoughts on the subject of implementing high availability for Microsoft SQL Server running on Linux. Edwin Sarmiento is the managing director of 15C, a consulting and training company that specializes in designing, implementing, and supporting SQL Server infrastructures. He is a 12-year former Microsoft data platform MVP and Microsoft Certified Master specializing in high availability (HA), disaster recovery (DR), and system infrastructures running on the Microsoft server technology stack. What follows is a technical Q&A session that I recently had with Edwin.

microsoft SQL Server

MITCH: Edwin, can you start for us by briefly summarizing the high availability features and capabilities of Microsoft SQL Server when it’s running on the Windows Server platform?

EDWIN: I’d like to start by emphasizing that the features and capabilities don’t matter if you don’t know what your specific goals are. It’s just like saying, “I’d like to get the 2019 BMW X5 because it has self-parking capabilities.” The real questions go beyond features and capabilities: “How often do you drive to and from work?” “Do you need a powerful vehicle to pull a trailer?” “What’s your budget?” And oftentimes, the most obvious questions are missed: “Do you even have a driver’s license?”

In my experience working with consulting clients all around the world, the majority of them seem to be focusing on features and capabilities without asking the real questions. It is important to start the conversation with recovery objectives (RPO/RTO) and service level agreements (SLAs) before even starting to look at the technologies that can help them achieve those goals.

Now that I’ve gotten that out of the way, the most popular high availability features and capabilities of Microsoft SQL Server on the Windows Server platform fall under the Always On category: SQL Server failover clustered instances (FCI) and availability groups (AG). Both solutions implement the concept of a hot standby where a redundant system is available to immediately assume the responsibilities of the primary system in the event of a catastrophic failure. While the mechanisms used to achieve high availability between the two are different, they share a common platform that makes this possible: the Windows Server Failover Cluster (WSFC). It is the job of the WSFC to act as a platform that provides high availability, does health detection, and performs automatic failover should there be issues that can lead to possible outages.

In my experience working with consulting clients all around the world, the majority of them seem to be focusing on features and capabilities without asking the real questions. It is important to start the conversation with recovery objectives (RPO/RTO) and service level agreements (SLAs) before even starting to look at the technologies that can help them achieve those goals.

Before introducing AGs in SQL Server 2012, database mirroring was another feature designed to achieve high availability in SQL Server. It is still available in the product but Microsoft has marked it for deprecation starting with SQL Server 2012.

MITCH: Microsoft ported its SQL Server product to the Linux platform back in 2017 but it didn’t include built-in support for clustered hosts. Why is that?

high availability
Shutterstock

EDWIN: If you think about it, this practice of not including built-in support for clustered hosts wasn’t just introduced in SQL Server 2017. In fact, Microsoft has been doing it ever since SQL Server 6.5. Instead of building their own clustering technology, the SQL Server team relied on a tried-and-tested platform to provide high availability capabilities. And that’s brilliant from a business standpoint. This has saved them a lot of engineering resources and money because they didn’t have to reinvent the wheel. They just used something that works. In fact, this was how they evolved database mirroring into AG — SQL Server takes care of the data replication process while WSFC takes care of high availability. We just don’t see it this way because both the WSFC and SQL Server come from the same company: Microsoft. But the reality is that the Windows Clustering team and the SQL Server team are two separate teams that are in two separate buildings at the Microsoft campus in Redmond. When you open a case with Microsoft involving SQL Server Always On, the SQL Server support engineers will assess the issue and resolve it if it happens to be a problem with SQL Server. But if it is a problem with the WSFC, they escalate it and let the Windows Clustering support team handle it.

With SQL Server on Linux, Microsoft doesn’t own the operating system. It doesn’t make business sense to make their own operating system with built-in clustering technology. It would take years to do that and would affect their time-to-market. Even SQL Server on Linux wasn’t built like that. Microsoft didn’t rewrite the entire SQL Server codebase just to make it work on Linux. They took what already works from Project Drawbridge incubated inside Microsoft Research, modified it, and made it commercially available. In the fast-changing world of technology, what makes sense is to not reinvent the wheel but to find something that works and build on top of it. Pacemaker for Linux was already available as an open-source cluster resource manager. All that Microsoft had to do was to write resource agents for use with SQL Server Always On so that it can work with Pacemaker.

MITCH: OK so when Microsoft released SQL Server for Linux, one initially had to use an open-source high availability resource manager like Pacemaker if you wanted to implement HA for SQL Server on Linux. What limitations did this impose compared to running SQL Server on a Windows Server failover cluster?

EDWIN: The biggest blocker was adoption. Almost all SQL Server database administrators (DBA) are Windows folks because SQL Server only ran on Windows Servers before SQL Server 2017. Porting SQL Server to Linux meant convincing SQL Server DBAs to learn a new operating system that they didn’t have to in the past. And that’s just the operating system. Imagine learning an operating system and a cluster resource manager like Pacemaker just so you can deploy and manage SQL Server. Two versions later, the majority of the SQL Server DBAs are still hesitant to deploy SQL Server on Linux.

From a technical standpoint, SQL Server on WSFC is more tightly coupled compared to SQL Server on Pacemaker. And that’s just because of the long history of SQL Server running on Windows Server. For example, you can initiate a manual failover of an AG from within SQL Server running on a WSFC. On Linux, this has to be done via Pacemaker. What about Active Directory integration? While you can join a Linux server to an Active Directory domain, you cannot leverage Active Directory domain accounts as service accounts for endpoint authentication – you need to use certificates. This becomes a management nightmare because you have to implement some form of certificate lifecycle management, something that SQL Server DBAs need not worry about when using Active Directory accounts for endpoint authentication. These are just some of the limitations imposed by running SQL Server on Pacemaker. And they have a huge impact.

MITCH: I understand that there are now third-party solutions available that offer better HA capabilities for SQL Server running on Linux. Can you elaborate on this a bit?

EDWIN: I mentioned that the biggest blocker is adoption. And if adoption is slow, growth is going to be slow. We need to have opportunities that eliminate these blockers to allow for adoption. This is where DH2i comes in. DH2i provides an alternative to delivering SQL Server high availability on Linux without the complexity of Pacemaker. It even allows you to combine different operating systems – whether Windows or Linux — something that you cannot do on either WSFC or Pacemaker. And because they have a management tool that runs on Windows, SQL Server DBAs don’t have to worry about learning how to manage Linux just so they can deploy and manage SQL Server databases. This can help grow adoption of SQL Server on Linux while also providing high availability. From a business standpoint, faster adoption means faster deployments. And faster deployments mean faster time-to-market.

MITCH: Any other thoughts you’d like to share with our readers regarding how to implement HA for SQL Server on Linux?

EDWIN: First, embrace change. It’s unimaginable that Microsoft would even consider doing anything on Linux in the Steve Ballmer era. But it’s now a reality. SQL Server on Linux is just the beginning because this opened opportunities for running SQL Server on Docker containers and Kubernetes. How you embrace the changes in the technology landscape will determine whether you’ll be successful in your career within the next 10 years. And whether you embrace change or not, it is still going to happen.

Next, as a SQL Server DBA or somebody who manages SQL Server, you have to learn to ask yourself “What if?”

  • What if my company decides to deploy SQL Server high availability on Linux tomorrow, how confident am I to get the job done?
  • What if I was the lead engineer for deploying new projects on SQL Server on containers instead of just responding to tickets every day?
  • What if my company started outsourcing new projects on these new platforms because nobody on our team is capable enough to work on them, how sure am I that I’ll still have a job tomorrow?
  • What if I’m missing great opportunities just because my SQL Server skills are no longer relevant?

Here’s the truth. Making sure that your SQL Server skills are relevant and up-to-date is the secret to succeeding in this industry. It’s what guarantees you the confidence to get the job done. And should you decide to apply for a new role or move on to a much better opportunity, you know you have what it takes. It’s basically the answer to the question “Do you even have a driver’s license?” I’ve been helping SQL Server professionals for almost a decade now on how to be agile and technically adaptable so their skills stay relevant. Because you want to be able to take advantage of opportunities for career growth.

It’s unimaginable that Microsoft would even consider doing anything on Linux in the Steve Ballmer era. But it’s now a reality. SQL Server on Linux is just the beginning because this opened opportunities for running SQL Server on Docker containers and Kubernetes.

Finally, take action. Take charge of constantly improving yourself. Because the only way to showcase your value is when you have the confidence and the technical expertise to either implement Pacemaker on Linux or DH2i’s DxEnterprise to provide high availability to SQL Server. Nobody else can do that for you.

MITCH: Thank you, Edwin, for taking the time to share your expertise in this area with us.

EDWIN: You’re very welcome!

About The Author

Leave a Comment

Your email address will not be published. Required fields are marked *

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Scroll to Top