Endings and Beginnings (Part 1 – AWS)

Last week’s announcement of Amazon Aurora Multi-Master being generally available marked a kind of ending for me. It also served as a reminder that I haven’t written anything about my new venture, Gaia Platform. So nearly two years after I tried (and once again failed) at retirement, let me wrap up my Amazon Web Services (AWS) adventure and tell you about my new one.

The lure of working on databases for a new computing era, The Cloud, is what drew me out of semi-retirement and to AWS. I was running Amazon Relational Database Service (RDS), parts of Amazon Aurora, the Database Migration Service (DMS), Performance Insights, and a few things that aren’t externally visible (e.g., the DBAs for AWS’ control plane databases, and an operations team for a bunch of services in AWS under the CIA’s Commercial Cloud Services (C2S) program). The Aurora part is complicated: I had the control plane and product management reporting to me, and the Aurora PostgreSQL project fully reported to me, but my peer Anurag Gupta owned Aurora MySQL and the Aurora storage system and is the father of Aurora. I get embarrassed when assigned credit that rightfully belongs to Anurag.

There were a lot of challenges in this new role for me and I relished them, even when I struggled. For example, I’d always pushed my way into the business side of the products I worked on but never had actual responsibility for the business. At AWS I owned the relational database business. While confidentiality considerations keep me from talking about actual sizes, it was one of the largest AWS businesses and the fastest growing of those larger businesses. In the weeks before I stepped down we passed one of the (even) more household-name services in revenue. I have no real idea of the current business size, but based on some very conservative projections it must be an unbelievably big business today. What amazes me when I look back on the experience is not that they trusted me with engineering and operations for RDS (I had the track record to suggest I could succeed at that), but that they trusted me with such major business responsibility. That turned out to be an incredible career highlight for me, and I thank Andy Jassy, Charlie Bell, and Raju Gulabani for giving me that opportunity. Particularly Raju, because I know the only reason it happened is that he committed to having my back.

After three years I announced I was going back into retirement. For those who don’t know, I live in Colorado and commuted every week (early Monday morning out, late Friday return) to Seattle. That I sustained it for three years is only a bit of a mystery to me; that my wife survived three years of it is amazing! But neither of us could sustain it longer, and we didn’t want to move to Seattle. Plus we had some family things to take care of. There is more to this story, and we almost found a way where I would keep working for Amazon part-time. But I realized I’d never contribute to Amazon in a way I found satisfying as a part-timer. So after a few months I pulled the plug on a staged retirement and did a cold-turkey retirement. Or so I thought. Once again, a little credit here: I couldn’t have worked from Colorado without Raju having my back (i.e., in 2014/2015 Amazon literally did not allow people to do work when in Colorado, so Raju had to cover for me if there was an operational issue that needed VP involvement over a weekend), and he was the one who proposed a staged retirement.

So why was last week’s launch of Aurora Multi-Master a good end point of the AWS story for me? My one major regret from my days at Microsoft had been that we never shipped a “single system image”, multi-master, SQL Server clustering solution. When we did the original planning for building our own database business (out of the ashes of Sybase SQL Server) we’d put clustering in our three-version plan. Yukon (SQL Server 2005) was supposed to include a single-system image clustering solution. By single system image I mean that an application can talk to and update a database on any node in the cluster, completely unaware that the database is distributed over multiple nodes. In other words, it looks just like you are talking to a single system. That’s what we’d done at DEC with Rdb (conceptually copied by Oracle to become RAC). Others had done variants as well, but after a burst of energy in this space from the ’80s to the mid-’90s, vendors (except for Oracle) lost interest. The SQL Server team made a number of stabs at it, but they always faltered in the face of either higher-priority work or technical challenges. Doing single-system image is hard. So sharding, or dropping some of the transparency (Spanner is an example), or going to NoSQL models that had far fewer transparency demands, became alternate answers. I’ve been away from Microsoft for almost 9 years, and the SQL Server team for over 15, and SQL Server (or Azure SQL) still doesn’t have a single-system image clustering solution. But with AWS, that was once again the vision for Aurora.

I didn’t get to be the one that built Aurora Multi-Master, and I’m fine with that. When driving back from an Andy Jassy OP1 offsite in the summer of 2015, Anurag and I talked about single-system image clustering and how desperately we both wanted to see it done. No matter how we rejiggered the organization structure over time, we would make this happen. Anurag got to drive it, and although he too left AWS before Aurora Multi-Master went GA, it’s done now. Oh, they have plenty more to do to complete the vision (e.g., multi-region multi-master), but the solution is out now. Take your credit card and go give it a try. From my standpoint there is always a ton more to do in meeting customer database needs in the cloud. But in terms of a feeling of completeness and ability to move on, having Amazon Aurora Multi-Master available lets me focus on what other interesting problems there are out there. I’ll talk about that in part 2.

4 Responses to Endings and Beginnings (Part 1 – AWS)

  1. Joseph Williams says:

    Good part I story, Hal. Looking forward to part II.

  2. joe yong says:

    Nice! Glad to know you had a hand in its inception Hal, even if it was the idea and goal but not actually building it. I used to wonder if we didn’t deliver in SQL Server because enough people didn’t want to do it so feet were dragged and it was killed at the first opportunity. Yes, there are hard, hard technical problems but they are solvable and we had people with the right expertise and experience. Having worked on an OPS project (!@#*$&%!@&!) in a previous life I always believed we could do the world a favor and implement it right. Then RAC (10g onwards, 9i was still a bit of a PITA) came along and what a world of difference. Especially when you implement TAF for SELECT failover.

    Now I’m curious what fun head scratchers you’ve gotten into now; hurry up with Part 2. 🙂

    • halberenson says:

      It was a combination of things, from the hardware vendors coming through with higher end systems taking the pressure off on scalability to WinFS becoming a priority to the initial cloud focus being very low-end multi-tenant focused to disagreements on what approach to take to all the issues with running well in the chaotic hardware environments that characterized the PC-based server world to a failed big project to….
