Microsoft Trustworthy Computing (TwC)

Today marks the 10th anniversary of Bill Gates’ Trustworthy Computing email to all of Microsoft. Most consider this a transitional event for Microsoft, in particular the point at which Security assumed its proper position as the most important of the many “ities” that products must address. Prior to this memo, or more specifically a decision by the Windows team to halt development of a new version while it took about a year to secure Windows XP (with Service Pack 2), Security had been just another one of the “ities” that products addressed. Yes, development teams took it seriously, but not seriously enough. When I joined Microsoft in 1994 I found its attention to Security lacking compared to what we’d had at DEC (for quite a number of years). After the start of the Trustworthy Computing (TwC) initiative, no sizable commercial product organization on the planet took it more seriously than Microsoft.

TwC has led to a number of great changes in how Microsoft builds products and responds to threats, but in many ways I think the biggest change was in making it OK to break backward compatibility to address serious security issues. This was the biggest struggle the Windows team faced in creating XP SP2. It was also the biggest challenge we faced in SQL Server. I know I’ve written about the blank SA password problem before, but this is an appropriate time to re-tell the story.

Sybase had originally shipped SQL Server with the SA, or System Administrator, login password defaulting to blank (meaning no password at all). Microsoft, in its porting role, had retained that default. Over time many organizations had failed to create passwords for the SA account, and many millions of lines of scripts and application code depended on the SA account not having a password. I really wanted to change that situation, but as we worked on SQL Server 7.0 there were many bigger fish to fry and we didn’t need to introduce yet another compatibility problem. For SQL Server 2000 I owned the central Program Management team and thus the Security PM reported to me, as well as the Setup team, so we took a serious look at the problem. What we decided to do was to retain the ability to have a blank SA password, but change it from being the default to an option that the person installing SQL Server (be it a new installation or an upgrade) would have to jump through hoops to select. Then in a later version we’d look at eliminating the SA login entirely. And so SQL Server 2000’s Setup implemented this change. Meanwhile we also wanted the management tools to issue a warning every time someone connected to a server using SA/nopassword, but they were far behind schedule and didn’t implement this functionality. In the pre-TwC world this was an acceptable tradeoff, whereas in the post-TwC world it would not have been.

In truth we could have been even more draconian in our attempts to eliminate SA/nopassword. For example, we could have forced you to actually run a separate setup. Or disabled features when you didn’t have a password. Or created a mapping mechanism in which a blank password actually meant “grab the password from this other location”, so that code would not need to be changed yet a password would indeed exist. But at the time addressing this problem, particularly as it was bad practice by users rather than an actual bug in the system, wasn’t paramount.

SQL Server also had two different authentication modes, Windows Authentication and Mixed-Mode (i.e., Standard plus Windows) Authentication. The entire SA/nopassword scheme was part of the legacy Standard Authentication. Ever since Microsoft first ported SQL Server to Windows NT the philosophy had been to not enhance Standard Authentication as a way to encourage movement to the more secure Windows Authentication. And so SQL Server 2000 didn’t have any of the modern protections such as requiring complex passwords for Standard Authentication. Over time we realized that Standard Authentication wouldn’t die because of scenarios where Windows Authentication was not practical (e.g., access from a Unix system), and so for Yukon (SQL Server 2005) a proposal was made to enhance Standard Authentication. My guess is that this feature would have been cut had it not been for the TwC initiative. Once Microsoft’s overall priorities shifted towards favoring security work, the bad situation with Standard Authentication could not be tolerated, and thus SQL Server 2005 brought it into the modern era. Note that there were third-party tools that added good password management to SQL Server, and we did investigate licensing and shipping such a tool with SQL Server 2000. However, the philosophy around wanting to deprecate Standard Authentication kept this effort from bearing fruit. Instead I co-opted an MCS consultant to write and make available a series of stored procedures that analyzed SQL Servers for bad practices, like SA/nopassword. These were the precursor to the Best Practices Analyzer (BPA) that was released a few years later.

SQL Slammer was the SQL Server team’s personalized wake-up call on the importance of the new security efforts. For SQL Server 2000 we’d introduced a new networking model to eliminate the complexities of configuring SQL Server’s network libraries. This was part of a prime directive that had guided us for both SQL Server 7.0 and SQL Server 2000 of focusing on ease of use. Unfortunately a small coding error in the new networking support was exploited to create SQL Slammer and effectively take down the Internet. The bug itself is something that hopefully would not have occurred in the post-TwC memo environment due to the tools and practices that were introduced with the Security Development Lifecycle (SDL). But the more profound effect was for Microsoft to backtrack on ease of use and disable network access to SQL Server by default. This is most noticeable in the Desktop (now Express) Edition, which for a number of reasons (most notably that an Enterprise could have thousands of copies running without a clue that they were there) was the vector that really let SQL Slammer get out of control. I’d been the “godfather” of the new networking approach (and of MSDE 2000), so SQL Slammer weighed very heavily on me. In one of the more poignant events in my career, the developer who’d written the buggy code contacted me and apologized for it.

The truth was that the bug exploited by SQL Slammer had in fact been patched a few months earlier. But this was prior to Microsoft Update, and patch application was a manual process. Few customers had installed the patch, and even those who had installed it missed the “embedded” MSDE installations, which it turned out were nearly unpatchable. For the SQL Server team, finding a way to automate patch distribution and installation became a priority, and with TwC pushing all Microsoft products to address this problem it led to the creation of Microsoft Update as an extension to the existing Windows Update.

The MSDE patchability problem was itself a side effect of our being forced to prematurely adopt Windows Installer. We were under tremendous pressure to move to Windows Installer for SQL Server 2000, but realized we couldn’t make the move in time. However, MSDE needed a new setup anyway and so we placated the powers that be by using Windows Installer to create it. Unfortunately its style of allowing one to embed an installation within another became the patching problem for SQL Slammer (as you really needed the top-level Setup, usually a third-party application, to distribute the patch). And it actually made the SA/nopassword situation worse. Early versions of Windows Installer had no way to keep parameters from ending up in plain text in a log file, and so MSDE’s setup couldn’t allow a password for the SA account to be specified when it was invoked. It would perform a silent install without a password, and then it was up to the top-level setup (or a user) to change the password to something sensible. When the SQL Server team brought this issue up with the Windows Installer team, resolution was pushed off to a future release. This would never happen in the TwC era, but then I doubt a team operating under the SDL would ever have allowed their design to have this problem in the first place.

The security problems that started surfacing in the late 90s had other early impacts on SQL Server. For example, changes in Outlook to keep malware from spreading via email broke our SQLMail feature. While that led Kurt Delbene (then Outlook GM) and me to have a couple of heated arguments (that did result in a workaround), it was a good early example of starting to favor security over backward compatibility.

Over the last 10-12 years security has gone from being just one of many important characteristics one must address to a characteristic that has a fairly high minimum bar that must be met if you expect customers to use your product. As I’ve written several times, things are far from perfect, and today’s minimum bar has become unacceptably low. But you have to crawl before you walk, and walk before you run. We were crawling before TwC; now the industry is walking. Hopefully soon we’ll graduate to running.

2 Responses to Microsoft Trustworthy Computing (TwC)

cpsltwr says:

January 15, 2012 at 12:26 pm

Don’t you think this (absolutely necessary) effort, as much as antitrust or other circumstances, was a major factor in the “lost five years” you mention in the other post? Not only did it slow down development and require developers to spend more time fixing existing problems instead of adding new features, it seems to have basically killed off the OLE/DCOM/ActiveX architecture that was a big part of their technical strategy at the time.

It’s sort of like if, at a time when you seem to be moving up in the world, the creditors of a huge debt you owe finally track you down and you have to spend five “lost” years paying it off – the real source of the problem was not in those five years, but at the time you incurred the debt in the first place.
- halberenson says:
  
  January 15, 2012 at 2:53 pm
  
  It was indeed a big factor in the lost five years, in multiple ways. Obviously it directly caused a ~12 month delay while the SDL was put in place and SP2 was worked on. It also added to the chaotic development process for Longhorn, which teams had already started working on and then had to shelve for a while. And when development of Longhorn resumed it was somewhat haphazard, as different teams resumed work somewhat independently of one another (depending on how much security work they had to do). Finally, it was responsible for the overzealous approach to UAC in Vista, causing it to be one of the major knocks against that release. The decision to focus on .NET, which addressed the problems of the native mode ActiveX, was partially a reaction to the security problems of native code. And the focus on .NET was a major cause of the failure of Longhorn. It wasn’t ready to be at the center of an OS, and by the time Microsoft gave in to that fact they had to throw away the Longhorn work and start over with native code. This destroyed many major initiatives, from WinFS, which couldn’t be reimplemented in native code in time for Vista and was eventually cancelled, to WPF, which was originally supposed to be THE graphics system for Windows but was instead released as the graphics system just for .NET.
