16TB Cloud Databases (Part 1)

I could claim the purpose of this blog post is to talk about Amazon RDS increasing the storage per database to 16TB, and to some extent it is.  It’s also an opportunity to talk about the challenges of a hyperscale environment.  Not just the world of AWS, but for Microsoft Azure, Google Cloud, and others as well.  I’ll start with the news, and since there is so much ground to cover I’ll break this into multiple parts.

As part of the (pre-) AWS re:Invent 2017 announcements, Amazon RDS increased the maximum database instance storage size from 6TB to 16TB for PostgreSQL, MySQL, MariaDB, and Oracle.  RDS for Microsoft SQL Server had launched 16TB database instances back in August, but with the usual RDS SQL Server restriction that storage was not scalable.  That is, you had to pick 16TB at instance create time.  You couldn’t take a 4TB database instance and scale its storage up to 16TB.  Instead you would need to dump and load, or use the Native Backup/Restore functionality, to move databases to a new instance.  If the overall storage increase for RDS was lost in the noise of all the re:Invent announcements, the fact that you can now scale RDS SQL Server database instance storage was truly buried.  The increase to 16TB databases benefits a small number of databases for a (relatively speaking) small number of customers, while the scalability of SQL Server database instance storage benefits nearly all current and future RDS SQL Server customers.

While RDS instances have been storage limited, Amazon Aurora MySQL has offered 64TB for years (and Aurora PostgreSQL also launched with 64TB support).  That is because Aurora was all about re-inventing database storage for the cloud, so it addressed the problems I’m going to talk about in its base architecture.  Non-Aurora RDS databases, Google’s Cloud SQL, Azure Database for MySQL (or PostgreSQL), and even Azure SQL Database (which, despite multiple name changes over the years, traces its lineage to the CloudDB effort that originated over a decade ago in the SQL Server group) have instead lived with the decades-old file- and volume-oriented storage architectures of on-premises databases.

Ignoring Aurora, cloud relational database storage sizes have always been significantly limited compared to their on-premises counterparts.  I’ll dive into more detail on that in Part 2, but let’s come up to speed on some history first.

Both Amazon RDS and Microsoft’s Azure SQL Database (then called SQL Azure) publicly debuted in 2009, but had considerably different origins.  Amazon RDS started life as a project by Amazon’s internal DBA/DBE community to capture their learnings and create an internal service that made it easy for Amazon teams to stand up and run highly available databases.  The effort was moved to the fledgling AWS organization, and re-targeted to helping external customers benefit from Amazon’s learnings on running large, highly available databases.  Since MySQL had become the most popular database engine (by unit volume), it was chosen as the first engine supported by the new Amazon Relational Database Service.  RDS initially had a database instance storage size limit of 1TB.  Now I’m not particularly familiar with MySQL usage in 2009, but based on MySQL’s history and capabilities in version 5.1 (the first supported by RDS), I imagine that 1TB covered 99.99% of MySQL usage.  RDS didn’t try to change the application model; indeed, the idea was that the application had no idea it was running against a managed database instance in the cloud.  It targeted lowering costs while increasing the robustness (reliability of backups, reliability of patching, democratization of high availability, etc.) of databases by automating what AWS likes to call the “undifferentiated heavy lifting” aspects of the DBA’s job.

As I mentioned, Azure SQL started life as a project called CloudDB (or Cloud DB).  The SQL Server team, or more precisely remnants of the previous WinFS team, wanted to understand how to operate a database in the cloud.  Keep in mind that Microsoft, outside of MSN, had almost no experience in operations.  They brought to the table the learnings and innovations from SQL Server and WinFS, and decided to take a forward-looking approach.  Dave Campbell and I had spent a lot of effort since the late 90s talking to customers about their web-scale application architectures, and were strong believers that application systems were being partitioned into separate services/microservices with separate databases.  And then those databases were being sharded for additional scalability.  So while in DW/Analytics multi-TB (or, in the Big Data era, PB) databases would be common, most OLTP databases would be measured in GB.  Dave took that belief into the CloudDB effort.  On the technology front, WinFS had shown it was much easier to build on top of SQL Server than to make deep internal changes.  Object-relational mapping (ORM) layers were becoming popular at the time, and Microsoft had done the Entity Framework as an ORM for SQL Server.  Another “research” project in the SQL Server team had been exploring how to charge by units of work rather than traditional licensing.  Putting this all together, the CloudDB effort didn’t go down the path of creating an environment for running existing SQL Server databases in the cloud.  It went down the path of creating a cloud-native database offering for a new generation of database applications.  Unfortunately customers weren’t ready for that, and proved resistant to some of the design decisions (e.g., Entity Framework was the only API offered initially) that Microsoft made.

That background is a little off topic, but hopefully useful.  The piece that is right on topic is Azure SQL storage.  The philosophy was that apps would use lots of modest-sized databases or shards (and would understand sharding), that charging by the unit of work enabled multi-tenancy as a way to reduce costs, that routing would be built above a largely unchanged SQL Server engine, and that generic SQL (with its potential for cross-shard requests) would not be supported.  So Azure SQL launched with a maximum database size of 50GB.  This limit would prove a substantial pain point for customers, and a few years later was increased to 150GB.  When I asked friends why the limit was still only 150GB they responded with “Backup.  It’s a backup problem.”  And therein lies the topic that will drive the discussion in Part 2.
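The sharding model behind that philosophy can be sketched in a few lines.  To be clear, this is a hypothetical illustration, not Azure SQL’s actual routing layer: the shard count, database naming, and hash function are all my assumptions, chosen only to show how a key-routed fleet of small databases avoids ever needing one giant one.

```python
import hashlib

# Hypothetical illustration of key-based shard routing: each sharding key
# (e.g., a customer ID) maps deterministically to one modest-sized database.
SHARD_COUNT = 8
SHARDS = [f"customers_shard_{i}" for i in range(SHARD_COUNT)]

def shard_for(key: str) -> str:
    """Route a sharding key to the database that holds all of its rows."""
    # A stable hash (not Python's built-in hash(), which varies per process)
    # keeps the key -> shard mapping consistent across app servers.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % SHARD_COUNT]

# All of one customer's data lives in a single small database, so no query
# ever has to cross shards -- the property that makes a per-database size
# cap (like the original 50GB) tolerable for a well-sharded application.
print(shard_for("customer-42"))
```

The design choice worth noting is that the routing lives entirely above the database engine, which is exactly why the engine underneath could remain largely unchanged SQL Server.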

I’ll close out by saying that relatively low storage limits are not unique to 2009, or to Amazon RDS and Azure SQL.  Google Cloud SQL Generation 1 (aka their original MySQL offering) was limited to 500GB databases.  The second generation, released this year for MySQL and in preview for PostgreSQL, allows 10TB (depending on machine type).  Azure SQL Database has struggled to increase storage size, but now maxes out at 4TB (depending on tier).  Azure Database for MySQL and PostgreSQL are limited to 1TB in preview, though Microsoft mentions they will support more at GA.  RDS has increased its storage size in increments: to 3TB in 2013, and to 6TB in 2015.  It is now 16TB or 64TB depending on engine.  Why?  Part 2 is going to be fun.






