I’m working on a blog post that tracks advances in high availability from the 1950s until today. It will be way too long for most to read, but I’ll eventually finish writing it since it amuses me. In the meantime I came across references that triggered a desire to write about a narrower topic. And as a reminder, this is my personal blog and these are my views, not necessarily those of my current or previous employers.
If you have some time, take a look at John Devito’s tutorial on creating a Windows Cluster. John doesn’t even talk about obtaining and maintaining the requisite hardware, but it still takes until Part 4 to get Microsoft SQL Server failover clustering working. Or take Brent Ozar’s article on setting up SQL Server 2016 AlwaysOn Basic Availability Groups. Brent also recommends that you download his checklist for setting up the SQL Server you will be using as your secondary, take some precautions so it will be compatible with the primary, and apply the necessary Windows patches. John and Brent make this easier by taking what seems like an infinite set of choices and turning them into a recipe. But it’s still not a recipe you can simply whip up for dinner. These are but two of many write-ups that demonstrate how difficult it is to create a high-availability solution with the tools provided for Microsoft SQL Server.
Putting a high-availability solution in place for any database engine is difficult and complex. Oracle is in a class of its own, in both capabilities and complexity of implementation. For open source databases there are many options, and they come with differing trade-offs and levels of implementation complexity depending on the characteristics you are looking for. Want to implement a highly available PostgreSQL database? Here’s a cookbook for you. Or maybe a packaged consulting offering from EnterpriseDB would help you break through the complexity. There are myriad solutions for MySQL (a 2010 book listed 50 recipes). There is a more recent book by members of Oracle’s MySQL team covering some of them. MariaDB and Percona would both love to help you with consulting to set up your high-availability MySQL solution.
With all this complexity, you can imagine my pleasant surprise when, a couple of years ago, I discovered the Amazon RDS Multi-AZ capability. Setting up this high-availability solution takes a single step, either at database instance creation time or later by modifying the instance: you select Yes from a drop-down (in this case, for Amazon RDS for SQL Server).
Of course the implementation of Multi-AZ may be complex, but all of that is hidden from the DBA and other IT staff. The real work is done by the infrastructure and software that Amazon has created.
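For anyone who would rather script that step than click through the console, here is a minimal sketch in Python. It assumes the boto3 library’s RDS client, whose `modify_db_instance` call accepts `DBInstanceIdentifier`, `MultiAZ`, and `ApplyImmediately` parameters. The helper name `enable_multi_az`, the `RecordingClient` stand-in, and the instance identifier `my-db-instance` are hypothetical, made up for illustration. Treat it as a sketch under those assumptions rather than a recommended procedure.

```python
def enable_multi_az(rds_client, instance_id, apply_immediately=False):
    """Ask RDS to convert an existing instance to a Multi-AZ deployment.

    rds_client is assumed to behave like boto3.client("rds").
    With apply_immediately=False the change is requested for the
    instance's next maintenance window instead of right away.
    """
    return rds_client.modify_db_instance(
        DBInstanceIdentifier=instance_id,
        MultiAZ=True,  # the scripted equivalent of choosing Yes in the drop-down
        ApplyImmediately=apply_immediately,
    )


class RecordingClient:
    """Hypothetical stand-in for the real client so the sketch runs without AWS access.

    It only records the arguments it receives; the response it returns is
    a simplified placeholder, not what RDS actually sends back.
    """

    def __init__(self):
        self.calls = []

    def modify_db_instance(self, **kwargs):
        self.calls.append(kwargs)
        return {
            "DBInstance": {
                "DBInstanceIdentifier": kwargs["DBInstanceIdentifier"],
                "MultiAZ": kwargs["MultiAZ"],
            }
        }


if __name__ == "__main__":
    client = RecordingClient()
    enable_multi_az(client, "my-db-instance")
    print(client.calls[0])
```

Against a real account you would pass `boto3.client("rds")` in place of the stand-in. Either way the point is the same: the whole high-availability setup collapses to one flag.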
I get the most excited about RDS Multi-AZ when I think back over the years to all the application databases that should have been highly available, but weren’t because of the complexity and cost involved. Like when I tried to reserve a tee time and the system was down. Or placed an order on a small specialty store’s website and saw the telltale error message indicating the website couldn’t talk to the database. Or rushed to change my company benefit elections before open enrollment ended, only to realize the database was down and no one would be around to do anything about it until Monday. Or looked up a book in my library’s online card catalog and realized I was going to have to search the stacks manually instead.
Sure, RDS Multi-AZ dramatically brings down the cost and complexity of keeping obviously mission-critical databases running. But what excites me even more is that it makes it easy for any database to be highly available.
Stay tuned if you want to know why the transistor was the biggest improvement ever in computer system availability, how a number of attempts to improve availability turned out to be so complex that they actually reduced availability, the big breakthrough of checkpoint restart, and how ACID saved the world. It will take me a while to wrap that up, but hopefully it will be worth the wait.