OLE DB and SQL Server: History, End-Game, and some Microsoft “dirt”

Last month the Microsoft SQL Server team effectively sounded the death knell for Microsoft’s OLE DB.  I say “effectively” because while SQL Server isn’t the only implementor of OLE DB, it is (or rather was) Microsoft’s flagship for this data access technology.  Since I was both a godfather of the OLE DB strategy and responsible for the SQL Server implementations that have now been deprecated I thought now would be a good time to reveal the overall strategy and why it never succeeded as envisioned.

Before we time travel into OLE DB’s origins let’s survey the current state of data access in SQL Server.  The first person who contacted me after Microsoft announced SQL Server’s deprecation of OLE DB basically said “there goes Microsoft, changing data access strategies again”.   Well, Microsoft does indeed have a history of replacing its data access APIs all too frequently.  But the truth is that OLE DB has been with us for 15 years and, although deprecated, will be supported for another 7.  22 years is a heck of a long lifespan for a technology, particularly one that was only partially successful.  And the truth is that OLE DB is well past its prime, with many other data access technologies (both older and newer) in far greater use.

The reason Microsoft has seemingly changed horses so often on the data access front is the rapid evolution that the market has demanded.  Initially SQL Server used the DB-Library API that Sybase had invented (and then deprecated around the time Microsoft and Sybase ended their relationship).  Microsoft had come up with ODBC as a way for Excel to import data from various data sources, but the primary client database actually in use at the time was the JET database inside Microsoft Access and Visual Basic.  A programming model called DAO was provided to access JET.  DAO could access SQL Server and databases supporting ODBC, but only through JET’s distributed query processor (RJET/QJET), which made that access slow.  For SQL Server 6.0 Microsoft created a native ODBC driver for SQL Server to replace DB-Library, and for Visual Basic 4 the Developer Division introduced RDO as an object model that lived directly on top of ODBC and thus didn’t have to go through RJET/QJET.  RDO/ODBC quickly became the native and preferred data access story for applications written with Microsoft technology.  When OLE DB came along we introduced ADO as the object model directly on top of OLE DB.  With the introduction of .NET we needed an object model that was optimized for the .NET world (which could have been just a minor evolution of ADO) but, more importantly, one that was specifically tuned for the Internet.  This latter requirement was one of the factors that led the ADO.NET team to create their own data provider model, one provider of which could be a connector to OLE DB data sources.  But for optimal performance they chose to implement a data provider that natively spoke SQL Server’s TDS network protocol.  Later programming advances, such as LINQ and the Entity Framework, also use the ADO.NET native SQL Server data provider.  During the development of SQL Server 2000 it became apparent that SQL Server was at a huge disadvantage when our customers chose to build applications using Java with either IBM’s WebSphere or BEA’s WebLogic, because we didn’t have a JDBC driver.  I initiated an effort to add a Microsoft-supported JDBC driver to SQL Server’s bag of tricks.  More recently a PHP driver, which actually layers on top of ODBC, was added to SQL Server’s supported data access methods.  So for nearly a decade now the primary ways to write applications that access SQL Server have NOT involved OLE DB!  No wonder the SQL Server team feels comfortable deprecating it.
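
To make that concrete, here is a minimal sketch of what the non-OLE DB path looks like from an application today, going through ODBC (shown from Python via pyodbc).  The driver name is the current generic Microsoft ODBC driver, and the server, database, and table names are placeholders I made up for illustration; the JDBC and ADO.NET SqlClient paths look much the same from the application's point of view.

```python
# Minimal sketch: an application reaching SQL Server over ODBC (no OLE DB involved).
# "myserver", "SalesDb", and "Customers" are hypothetical names used only for illustration.
import pyodbc

conn_str = (
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=myserver;"
    "Database=SalesDb;"
    "Trusted_Connection=yes;"
)

conn = pyodbc.connect(conn_str)
cursor = conn.cursor()
cursor.execute("SELECT TOP 5 CustomerID, Name FROM dbo.Customers")
for row in cursor.fetchall():
    print(row.CustomerID, row.Name)
conn.close()
```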

With that background out of the way let’s time travel back and look at the real OLE DB strategy and why it never achieved its goals.  When I joined Microsoft in April of 1994 there was already an OLE DB effort underway.  In fact the very first meeting I remember attending was an evening meeting of a design team working on the OLE DB spec.  It was an evening meeting because all the participants also had “day jobs” that their management pressured them to work on.  The key driver of OLE DB at the time was the Cairo Object File System (OFS).  Soon thereafter we’d press the reset button and assign a pair of newly hired Partner Architects to the OLE DB effort as their day jobs.  OFS, though still a factor for a while, soon departed the scene.  With OLE DB we were trying to accomplish two things.  One was a vision of Universal Data Access that went beyond relational databases; the other was the idea of a componentized DBMS.  OLE DB was to partially succeed at the first, but fail horribly at the second.

It is hard to remember that back in the early 90s, when things like OFS and OLE DB were conceived, relational databases were still in their youth and not very widely accepted.  Most corporate data was still stuck in hierarchical (IMS) and network (CODASYL) databases or flat files (VSAM, RMS).  Vast amounts of data were stored on the desktop, usually in tools like Microsoft Excel.  The most popular data store for applications on the desktop was Btrieve, an ISAM-type offering.  Microsoft also realized that email, then still in its infancy, would turn out to be the largest information store of all.  Microsoft Exchange was envisioned as a mail system on top of OFS, but ultimately implemented its own (again distinctly non-relational) store.  And many people thought that Object Databases were the wave of the future, though ultimately they never achieved much success outside the CAD/CAM/CAE world.  So it seemed clear that Microsoft needed a data access strategy that would work across the relational, object, and legacy data worlds.

One proposal was to extend ODBC to handle the new requirements; however, this approach was ultimately rejected.  Since this was before my time I don’t understand exactly what happened, but what I recall being told was that they tested the extensions out with other companies involved with ODBC and found significant resistance to them.  Deciding that if the industry wasn’t going to accept the idea of extending ODBC they might as well go for a more optimal solution, Microsoft went down the path that led to OLE DB.

Beyond better legacy data access, the fact that Microsoft was working on non-relational stores makes it kind of obvious why we thought we needed OLE DB.  But we can take it to another level: we thought that even in the relational world we would evolve to add more object and navigation capabilities.  And we would implement this by creating a componentized DBMS that let an application use various capabilities depending on its needs.  There would be a Storage Engine, a Query Processor, and one or more Navigation Engines.  In its most primitive form the Navigation Engine would implement SQL’s Cursors, but in an extended form it would be an in-memory database that projected data as objects that you could access via pointer chasing (à la Object Databases).  An application could go against data in the database directly with ISAM-style access against the Storage Engine, or it could use the Query Processor to access data in the Storage Engine or other stores, or it could use a Navigation Engine (or other as-yet-unconceived components) for additional capabilities.  It was this strategy that really drove Microsoft down the OLE DB path, and this strategy that never came to fruition.
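
Purely as an illustration of that envisioned layering, here is a sketch of how an application might have composed just the components it needed.  None of these interfaces ever shipped in this form; every name below is invented.

```python
# Illustrative sketch only: invented interfaces showing the envisioned layering of a
# componentized DBMS. No such Microsoft API ever existed in this form.
from abc import ABC, abstractmethod

class StorageEngine(ABC):
    """ISAM-style, record-at-a-time access to stored tables."""
    @abstractmethod
    def scan(self, table): ...
    @abstractmethod
    def insert(self, table, row): ...

class QueryProcessor(ABC):
    """Set-oriented SQL access over the Storage Engine or other stores."""
    @abstractmethod
    def execute(self, sql): ...

class NavigationEngine(ABC):
    """Cursors, or in an extended form, in-memory objects navigated by pointer chasing."""
    @abstractmethod
    def open_cursor(self, rowset): ...

# A bulk loader might talk to the Storage Engine directly...
def bulk_load(se: StorageEngine, rows):
    for r in rows:
        se.insert("Customers", r)

# ...while a reporting application composes with the Query Processor instead.
def report(qp: QueryProcessor):
    return list(qp.execute("SELECT Name FROM Customers"))
```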

By 1994 Microsoft had realized it wanted to be a serious contender in the database arena but had not finalized its strategy for doing so.  In preparation for this the company had negotiated a split with Sybase, giving us rights to an older version of their source code, joint ownership of things like the TDS protocol, and freedom (for both parties) to pursue our respective strategies.  While the SQL Server team had launched an effort to build the first independently developed version (SQL95, aka SQL Server 6.0) of the product, there was tremendous debate going on around how to proceed in the long term.  The organization behind the JET database engine in Access (also known as JET-RED) had embarked on an effort to create a new JET-compatible server database known as JET-BLUE (FYI, it is JET-BLUE that is used in Microsoft Exchange and not JET-RED; most people just say JET and don’t realize they are different).  However, there was no query processor being built to work with JET-BLUE and no customer for it other than Microsoft Exchange.  The Exchange team, faced with delays in OFS, had opted to build their own interim store using JET-BLUE for the low-level database capabilities.  That “interim” store is still in use today.  The discussion throughout 1994 was whether to take JET-BLUE and build a full new RDBMS around it, or to start with SQL Server and basically gut it, replacing its insides while maintaining a highly compatible exterior.  There was a lot of back and forth, but ultimately we decided that if we were going to succeed in the Enterprise then evolving the SQL Server product was the more likely route to success (because we had a large customer base and it was popular with Enterprise ISVs).  This didn’t sit well with the SQL Server team, because they realized we were forcing them down a risky re-write path when they preferred a more straightforward evolution of their code base.  And it really didn’t sit well with the JET-BLUE team, whose leader (one of Microsoft’s earliest employees) made one last appeal to Bill Gates before the strategy was finalized.  As everyone now realizes, the strategy to go with SQL Server was chosen, and it did succeed.  But it ultimately doomed the vision of a componentized DBMS.

Work started in 1994 on the design of a Query Processor (QP) for the new componentized DBMS, and after we made the decision that SQL Server would be our future product focus we moved the QP team into the SQL Server organization.  But it wasn’t until we shipped SQL Server 6.5, a minor update to SQL95, in the spring of 1996 that work on re-architecting and re-writing SQL Server got fully underway.  The internal architecture was to follow the componentized DBMS idea and use OLE DB to connect its components.  While it largely did this, the choice to gut and rebuild an existing product introduced some realities that hadn’t really been anticipated in the original plan.

There were two factors that the componentized DBMS idea hadn’t fully taken into account.  The first was rapid innovation in Query Processor technology that made the then state-of-the-industry split of responsibilities between the Storage Engine and Query Processor obsolete.  The second was that it didn’t account for all the aspects of a traditional Relational Engine that didn’t fall into the Query Processor or Navigation Engine categories.  For example, OLE DB said nothing about management interfaces across the components.  Two other factors would also come into play.  The first was that we couldn’t rewrite all of SQL Server in a single release, and so we’d have to maintain some legacy code that violated the new architecture.  The second was that to maintain compatibility with older versions of SQL Server, and to exceed their performance, we’d have to violate the architecture.  Although we thought both of these latter factors would be temporary, they ultimately contributed greatly to the abandonment of the componentized DBMS idea.

As we re-built SQL Server (Sphinx, later known as SQL Server 7.0) we did use OLE DB internally.  The Query Processor and Storage Engine do talk to one another using it.  In one regard this architecture proved out nicely in that we were able to add Heterogeneous Query directly into the product.  There is a lot of unique connection and metadata management work, but once you use OLE DB to materialize a Rowset from an external data source the Query Processor can just party on the data without regard to whether it came from the Storage Engine or an external source.  But that is about the only part of using OLE DB internally that I can point to as a success.  For all the things that OLE DB doesn’t cover we had to use a lot of private interfaces to talk between the Storage and Relational engines.  And then there is that rapidly evolving Query Processor issue.  It turned out we could never allow access directly to the Storage Engine (SE) because the QP ended up (for performance reasons) taking over responsibilities that had previously been in the SE.  For example, the maintenance of Referential Integrity.  Or, for a later example, materialization of the Inserted and Deleted tables used by Triggers.  We’d debated doing this in SQL Server 7.0, but for expediency went with the traditional solution, in which these virtual tables are created by having the Storage Engine scan backwards through the Log file.  However, as you get to higher performance systems the Log becomes your key bottleneck, so removing these backward scans is an important performance boost.  So now the Query Processor could look to see if an update would cause a Trigger to fire and build into the Query Plan the creation of the inserted and deleted tables along the way.  But this also meant that you could never allow an application to directly update a table through the Storage Engine if a Trigger existed.  As we put all our energy into building SQL Server 7 and then SQL Server 2000, these various realities pushed the componentized DBMS idea further and further away from ever becoming a reality.
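
That Heterogeneous Query work is the one piece of the internal OLE DB plumbing an application can still see directly: SQL Server’s distributed queries reach external sources through OLE DB providers.  As a sketch, assuming a linked server named REMOTE_SRV has already been configured over an OLE DB provider and using made-up table names, the Query Processor treats the remote rowset just like local data:

```python
# Sketch: a distributed (heterogeneous) query joining a local table with a rowset
# materialized from an external source through an OLE DB provider. Assumes a linked
# server named REMOTE_SRV was configured beforehand; all object names are placeholders.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=myserver;Database=SalesDb;Trusted_Connection=yes;"
)

distributed_query = """
SELECT o.OrderID, c.Name
FROM   dbo.Orders AS o                          -- local table
JOIN   REMOTE_SRV.CrmDb.dbo.Customers AS c      -- rowset from the linked OLE DB source
       ON c.CustomerID = o.CustomerID;
"""

for row in conn.execute(distributed_query):
    print(row.OrderID, row.Name)
conn.close()
```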

It wasn’t just SQL Server’s difficulty with implementation of the concept that caused the death of the componentized DBMS plan, it was its success in the market and the acceptance of relational databases throughout the industry.  Basically it became more important to have a great RDBMS, and extend it as needed, than to componentize.  Without spending any more time on this point, let’s just leave it with the idea that this key force behind OLE DB was effectively dead.

So what of the Universal Data Access part of the OLE DB strategy?  Well, that was more successful but it had three flaws, plus it had run its course.  It ran its course because the success of relational databases meant that the need to access other database sources diminished fairly rapidly.  One flaw is related to the componentized database flaw, which is that the non-relational database solutions that OLE DB envisioned never really took off.  Another is that it was too tied to the COM world, and thus anyone who had to target platforms other than Windows couldn’t fully embrace it.  And the final one is that interop vendors basically followed the same strategy with OLE DB that they had followed with ODBC.  They put an OLE DB interface on a lightweight Query Processor and then used a proprietary interface between their Query Processor and drivers for different data sources.  It was their QP and Drivers that turned every data source into a relational data source, thus eliminating any differentiation between OLE DB and ODBC for accessing that data.  OLE DB’s unique advantages in this space were thus never fully exploited.

OLE DB has had other issues during its life, of course.  It is a rather complicated architecture, partially because it was intended to do so much and partially because it was designed by a committee and then had to be rapidly reworked.  The architect who did a lot of that rework admitted to me years later that it was the one piece of work in his career he was embarrassed about.  Also, a lot of OLE DB components, which we shipped as part of a package called MDAC, were caught up in multiple controversies, such as a lockdown of who could update things shipped in Windows.  We wasted a lot of time and effort trying to figure out how and when updates to OLE DB could ship, how to maintain compatibility between versions, etc.  But I think these tactical issues account for far less of OLE DB’s limited success than the failure of our original strategic imperatives to take hold.  Without those, OLE DB became a solution looking for a problem.

 


4 Responses to OLE DB and SQL Server: History, End-Game, and some Microsoft “dirt”

  1. Wow. Thanks for posting this. I wrote an OLE DB SQL Server client (in C++) for our Java product back in the day and it was pretty painful to try to understand how to use OLE DB correctly.

    It’s always great to read the behind the scenes stuff that went on.

  2. Dave Campbell says:

    Hal, I’m enjoying reminiscing while reading through your blog. One of the first coding tasks I took on when I moved over to SQL Server was to write an OLE-DB interface to the Sybase storage engine. Jeff Coffler and I added a hack to wire up the new QP to this thing with a “set qjet on” statement. As part of this I wrote a little interpreter for OLE-DB row accessors which would evaluate nullability and column offsets statically and would, for example, optimize and block transfer contiguous fixed columns. It was blazingly fast. Quite a bit faster than the one Gordon Smith wrote for the production implementation as I recall. Lots of fun…

  3. Raghu Ram says:

    Great posting – gives a good historical perspective to a decision that was made and rightly so…

  4. Ph X says:

    Thank you for the article! You mentioned that OLEDB was not used much for object DBs, but how about modern NoSql? Could they benefit from OLEDB if the driver was available?
