How to ruin your High Availability solution by using Virtualization
It happens very often these days: people contact me because they want help designing their High Availability solution for SQL Server. A lot of them are already interested in deploying AlwaysOn Availability Groups. And this demand will grow further, because Availability Groups will be part of the Standard Edition of SQL Server 2016.
In the first step of every possible customer engagement, I always ask if they want to deploy their High Availability solution on bare metal or in a virtualized environment. And of course a lot of people tell me that everything is virtualized, because it helps them to save money and share hardware resources across multiple virtual machines.
So far, so good. Nothing is wrong with that approach. I really like it. But after this clarification I get really nasty, and I ask my one-million-dollar question:
“On how many physical ESX hosts are you planning to run your High Availability solution?”
The answer to this question is very often quite simple:
Good-Bye High Availability
Now let’s do a brief recap.
“You want me to make your SQL Server highly available by deploying AlwaysOn Availability Groups, and then you run all the various replicas on ONE physical host? So your solution would look like the following picture.”
“Are you still sure that we are talking about High Availability here? What happens if your physical ESX host is down? In that case your WHOLE “highly available” SQL Server is also down!”
“Hmm, yes. We can follow you – somehow. Understood. But our management wants to have a High Availability solution in place for our SQL Server installation. Therefore we want to deploy Availability Groups. And our physical ESX host is always up and running. Don’t worry about that. We need you to help us with our *SQL Server infrastructure*. We are not talking about the virtualization layer!”
Ok, at that point I’m out of this game, because the whole proposed solution doesn’t make any sense. If you deploy a highly available SQL Server solution, you have to make sure that the underlying layers are highly available as well, including your virtualization layer! If you are using virtualization you need AT LEAST 2 physical hosts, otherwise we are not talking about high availability. Let’s have a look at the following picture.
Now every replica runs on a separate ESX host. When one ESX host is down, it doesn’t matter, because you still have the other host, which runs the other replica of your Availability Group.
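To make the difference between the two layouts concrete, here is a minimal sketch in Python. It only models the placement of replicas on hosts; the VM names, host names, and the `surviving_replicas` helper are made up for illustration and are not part of any SQL Server or VMware tooling.

```python
def surviving_replicas(placement, failed_host):
    """Return the replicas that are still running after failed_host goes down."""
    return sorted(replica for replica, host in placement.items()
                  if host != failed_host)


# Hypothetical placements: replica VM name -> physical ESX host name.
single_host = {"sql-replica-1": "esx-01", "sql-replica-2": "esx-01"}
two_hosts = {"sql-replica-1": "esx-01", "sql-replica-2": "esx-02"}

# All replicas on one host: losing that host takes out every replica.
print(surviving_replicas(single_host, "esx-01"))  # []

# One replica per host: losing one host still leaves a replica running.
print(surviving_replicas(two_hosts, "esx-01"))    # ['sql-replica-2']
```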
But even with 2 physical ESX hosts, your ESX admin can live-migrate all of your replicas onto the same host without you ever noticing it.
So you also have to educate your ESX admins so that this specific scenario doesn’t occur. Oh, and trust me: I have already seen this many times when I have performed a SQL Server Health Check.
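A check like the following could be part of such a Health Check. It is only a sketch: it assumes you have already obtained the current replica-to-host mapping from your virtualization team or inventory, and the `shared_hosts` function as well as all VM and host names are hypothetical.

```python
from collections import defaultdict


def shared_hosts(placement):
    """Return every host that currently runs more than one replica."""
    replicas_by_host = defaultdict(list)
    for replica, host in placement.items():
        replicas_by_host[host].append(replica)
    return {host: sorted(replicas)
            for host, replicas in replicas_by_host.items()
            if len(replicas) > 1}


# Hypothetical mapping: replica VM name -> physical ESX host name.
as_designed = {"sql-replica-1": "esx-01", "sql-replica-2": "esx-02"}
after_migration = {"sql-replica-1": "esx-02", "sql-replica-2": "esx-02"}

print(shared_hosts(as_designed))      # {}
print(shared_hosts(after_migration))  # {'esx-02': ['sql-replica-1', 'sql-replica-2']}
```

An empty result means every replica sits on its own host. Anything else is a host whose failure takes down more than one replica at once.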
If you are thinking about or implementing a High Availability solution for your SQL Server installation, please do me a favour and also think about all the other layers, especially your virtualization layer. Running all your replicas on the same physical ESX host is not really a High Availability option. It looks nice, but it doesn’t give you real high availability. As soon as your ESX host is down, your Availability Group is down. Please bear that in mind the next time you design your High Availability solution.
Thanks for your time,