Don’t they claim Linux is secure?

I’ve spent so many years hearing Linux fans claim it is totally secure that I just had to post this one.  Duqu, the most sophisticated and mysterious Trojan since Stuxnet (and perhaps related to it), compromised Linux servers to create its command-and-control infrastructure.  “Many of the servers that had been hacked to become part of Duqu’s infrastructure were running Linux, namely CentOS 5.2, 5.4 or 5.5, a community version very similar to Red Hat Enterprise Linux.”  Now obviously Windows was compromised by Duqu as well, so I’m not trying to claim Windows is more secure than Linux.  I’m just reiterating that ALL operating systems are vulnerable and that claiming otherwise is irresponsible (and one of the all-time great security myths).  Other recent examples include the targeting of Mac OS by fake anti-malware attacks, the massive growth in malware targeting Android, and even a researcher demonstrating that you can download malware into printers!  The difference is that after years of attacks everyone in the Windows ecosystem recognizes the threat and most are actively working to confront it, while the Linux, Apple, Android, etc. ecosystems still largely have their heads buried in the sand.


Stop SOPA and PIPA

One of the biggest threats to the Internet is censorship legislation currently in the U.S. Congress known as SOPA in the House and PIPA (“Protect IP Act”) in the Senate. SOPA was dealt a huge blow a couple of weeks ago as protests against it mounted. However, now the action has moved to the U.S. Senate and it is time to let Senators know that PIPA is unacceptable as well. You can find out more about this at the Electronic Frontier Foundation or Fight for the Future

 


Can Windows 8/Windows Phone 8 ship in June 2012?

While most observers expect to see Windows 8 and Windows Phone 8 no earlier than next fall, there have been various indicators that we could see one or both much earlier. Indeed, there is increasing evidence that we might see them in early summer, specifically in June. The most credible indicators come from Nokia executives who seem to have developed what police call “diarrhea of the mouth”, aka inability to keep their mouths shut. Most recently Nokia execs have claimed both Windows Phone 8 and Windows 8 are coming in this earlier timeframe. Can this indeed be the case?

It’s possible.

The way most observers come up with their fall dates is by looking backwards at the Windows 7 and Windows Phone 7/7.5 development cycles. Those both suggest fall deliveries, but they don’t take into account potential changes in how Microsoft and OEMs execute the “end game”. Historically for both Windows and Windows Phone there is a multi-month testing period between the final software build (known as RTM for historical reasons) and General Availability. Historically this allowed for the manufacturing and distribution of floppies and later CDs and DVDs. It also allowed an OEM to add their own final drivers, do final testing, switch over their manufacturing processes, and push systems through their distribution network to retailers. Well, what if Microsoft and its OEMs could do more of this process in parallel? You could shrink the gap between Microsoft’s RTM and system availability from a few months down to days or weeks!

The primary reason OEMs take so long after RTM to bring products to retail is that historically Microsoft has made too many last-minute changes for the OEMs to complete their own work in parallel. But Microsoft already demonstrated with Windows 7 and Windows Phone 7/7.5 that it has the discipline to avoid this. With the right set of commitments, and OEM trust that Microsoft will live up to them, an OEM could finalize drivers, testing, etc. against pre-RTM builds and be ready to go when Microsoft delivers the RTM bits. I believe Microsoft and OEMs, particularly those with the closest relationships such as Nokia and Dell, are going down this path.

Another factor is specifically around tablets. Since tablets have a more limited set of potential (at least initial) configurations to worry about, it is likely that Microsoft would be able to make a stronger set of assurances to OEMs around them. In other words, they could give OEMs something like the Windows Phone chassis definition and focus testing and stability on that. This would allow the time from RTM to GA for tablets to be shorter than it is for desktops/laptops. Historically Microsoft would have held launch until all form factors were ready. But I suspect that they are so desperate to launch their full-court press on the iPad that they would now support GA of tablets a couple of months earlier than on other configurations.

Last, Microsoft has been working hard on its update processes for multiple releases now. It is entirely conceivable that they would allow OEMs to ship a pre-RTM version of Windows 8 or Windows Phone 8 on their devices but require a mandatory update by the consumer prior to the device becoming fully functional. This would let OEMs fill the channel with devices prior to RTM and then release the devices for sale the moment Microsoft has the RTM and updates ready to go.

The bottom line here is that historically, for Microsoft to have products available for the holiday shopping season, they would have needed to hit RTM in June for an October-ish General Availability. Now it may be possible for an early June RTM to result in late June General Availability! One practical advantage of this would be hitting the back-to-school shopping season. And having college students start to show up at school in the fall with Windows tablets rather than iPads would surely be a big win.


Windows Phone 8 and Windows 8: Cousins or Siblings?

For quite some time now there have been rumors about Windows Phone 8 (WP8) being based on the Windows NT kernel (WinNTk).  More recently a blogger called MS Nerd made the case that this is not true.  Well, I’m going to add fuel to the fire and make the case that Windows Phone 8 can and should be based on the Windows NT kernel.

The first thing that we need to get out of the way is the difference between saying WP8 will be based on Windows 8 (Win8) and saying it will be based on WinNTk.  Saying WP8 is based on Win8 implies that it would bring along the Win8 user experience and legacy support.  Microsoft has indicated that it wants the phone experience to stay optimized for phones, so the Win8 user experience is not going to make its way there anytime soon.  But this has nothing to do with whether or not Windows Phone continues to use the Windows CE kernel or makes the switch to WinNTk.  The kernel is focused on things like process structure, memory management, scheduling, device drivers, etc.  It has little to do with the user experience.  So when rumors circulate about WP8, Win8, and WinNTk it is important to keep in mind that the likely scenario is WP8 on WinNTk.

So why should Microsoft switch kernels?  There are both technical and practical reasons for the switch, and in the long term doing so has one awesome benefit that I’ll save for the end.  One important point is that when you build the same thing twice it is almost impossible to have both versions be completed at approximately the same time, be truly compatible with one another, and be of equivalent quality.  Second, the cost of staffing to do the same thing twice, over and over, is outrageous, even for a company the size of Microsoft.  And third, the closer you get to wanting two things to be almost the same, the less justification there is for having two different things!

The history of Windows CE (and Windows Mobile) is that teams around Microsoft build things for the Windows NT-based operating systems and then either they or the Windows CE team itself takes that code and ports it (or hacks it into a subset) for the CE environment.  This is done with small numbers of people, and the results are often left to languish a number of versions behind the mainstream product.  Take the Common Language Runtime (CLR) as an example.  The .NET Compact Framework (CF) was spun off from the original CLR a long time ago.  It eventually was left in maintenance mode with just a small team in Microsoft’s India Development Center (IDC) handling bug fixing and minor tweaks.  Issues, such as a garbage collection algorithm that was tuned for very small memory footprints and caused applications with large memory footprints to stall, were never going to be addressed.  However, once Windows Phone 7 came along Microsoft found itself having to recreate many of the optimizations it had already done for the full CLR in the Compact Framework.  It had to add resources, and more senior resources at that, to the team.  And that wasn’t to get new functionality, it was just to make CF perform closer to what the full CLR was already capable of.  More on this story in a moment.

The Windows CE kernel itself has also been lacking many modern OS features such as SMP.  While the latest version of Windows CE added SMP support, those who have been involved in operating system development for quite some time know that it takes multiple versions to really get this tuned up and running well.  I’ve seen this on operating systems from TOPS-10 to VMS to Unix to Windows NT to Linux and a few others.  Also, no doubt the Windows CE SMP support was optimized for dual-core situations, but we know that multi-core (already the norm on x86) is coming to the ARM world too.  WinNTk already has the most well-tuned SMP support of any operating system kernel, and the resources and time required to bring Windows CE up to snuff (and keep it there) are high.

The same is true in many other areas.  Think about all the security work that has gone into WinNTk.  Windows CE is designed for embedded systems that allow for a tightly controlled environment, and although it has added security features in recent releases, a lot of work will be required to continue tracking the evolving requirements of general purpose devices such as phones.  Sure it can be done, but at what cost in resources to Microsoft?

You see Windows Phone without full drive encryption, a problem for enterprises, even though Windows has BitLocker.  Is the full-drive encryption you want on your phone sufficiently different from the full-drive encryption you want on your tablet to justify two different implementations?  I don’t think so.  You see Windows Phone without support for document encryption/decryption, even though Windows has RMS.  You might want the management benefits of being able to join a domain.  Sure these are possible to add to a Windows CE-based Windows Phone, but that takes resources and time.  If Windows Phone were based on WinNTk this would take a lot fewer resources and a lot less time.

I bring up these resource issues both because it’s been obvious over the years that resources allocated to CE have been far lower than those allocated to Windows, and because Microsoft has made dramatic personnel cuts in traditional businesses since 2008 and continues to look for ways to cut costs.  Funding Windows CE development to support specific low-memory embedded scenarios makes sense.  Increasing the funding for it to chase Windows in the general purpose OS market does not.

Now let’s go back to the .NET Compact Framework.  When the Windows Phone 7 effort began we realized that CF was going to require a lot of work to meet its requirements.  Garbage collection was going to need significant rework, and performance work was needed across the board.  At the same time Microsoft Research had a prototype of the full CLR ported to ARM, and we were urged to use that for Windows Phone 7 rather than increase the investment in CF.  Our analysis (and that of the CLR’s architect) was that there wasn’t enough time to meet Windows Phone 7’s aggressive schedule with a full CLR port (i.e., productization of MSR’s work plus a port to Windows CE), and so we added resources to the CF team and went to work on Windows Phone 7’s requirements.  Windows Phone 7.5 continued with improvements in CF garbage collection, improvements that would not have been necessary with the full CLR since it has had them for years.  At the same time Microsoft’s layoffs were taking hold, and Developer Division was hit hard.  So now you have a much smaller organization devoting a larger percentage of its resources to doing little but duplicating work it had done years earlier.  It makes no sense.  Porting the full CLR to Windows CE wouldn’t maximize the resource savings because you’d still need a porting team.  However, with a Windows Phone 8 based on WinNTk you would get the full CLR for free.  The .NET CF investment could shrink to cover maintenance.  Or take the new WinRT introduced in Windows 8.  Does Microsoft really want to invest in porting WinRT to Windows CE, and maintaining that port, when switching to WinNTk would allow WP8 to have WinRT at little or no cost?

Now let me go into two things missing from Windows Phone that will hopefully be addressed in WP8.  The first is dual-core support.  Yes, Windows CE added dual-core (SMP) support, but apparently not before Windows Phone 7 took a snapshot.  So why didn’t Microsoft just upgrade to the newer CE kernel for its Mango release?  I can think of three reasons.  One is that the dual-core support in CE just wasn’t up to snuff, and so the Windows Phone team decided they weren’t going to introduce something that was non-competitive with iOS or Android.  Another is that the kernel upgrade was too disruptive (which is why major kernel changes usually cause things at Microsoft to be labeled “.0” releases) relative to the benefits that would have accrued.  But I believe the most likely reason is this: what was the point of putting a lot of effort into supporting a newer Windows CE kernel when you were already putting that effort into a WinNTk port?

Perhaps the biggest overall complaint about Windows Phone, starting with Windows Phone 7 and continuing through NoDo and Mango, is the lack of 3rd-party native mode application support.  Originally some of this was schedule (for the first Windows Phone 7 release) and some of it was philosophical (the phone could be made more reliable by having only managed applications).  But some of it, and I think the continuing reason, is that no one wants to embed Windows CE’s idiosyncrasies in new apps.  Again some of that is reliability (e.g., scenarios where a native mode application’s shared memory is not freed when the process is killed) and some is a desire to make apps portable across the three screens (PC, Phone, TV) and a cloud world.  Moving to WinNTk eliminates the CE idiosyncrasy problem and makes native mode applications (and particularly WinRT-based ones) more of a no-brainer.

I think that lays out most of the case for why Windows Phone should move away from Windows CE to the Windows NT kernel.  So let me address a key objection to the move: “size”.  Windows CE has a reputation for being small and modular, while Windows has a reputation for being big and bloated.  But Windows itself is about 6 years into an effort to completely restructure and clean up the code base, its resource requirements have actually shrunk over the last two releases (Win7 and Win8), and Windows 8 is very much focused on the low-power/constrained-resources environments that characterize both tablet and phone devices.  Indeed I am running the Windows 8 developer preview on a tablet that is less powerful and has no more memory than the current generation of smartphones, and even at this early stage it is as responsive as those devices.  In some regards, such as boot time, it is actually faster!  So could a WinNTk-based Windows Phone 8 run well on a 1GHz CPU with 512MB of RAM?  Of course it could.  What about 800MHz?  Almost certainly.  256MB?  Well, now I start to wonder.  It is pretty clear that Microsoft is focusing on enabling low-cost smartphones from Nokia and others as a key part of its strategy.  The question is, what do a low-cost smartphone’s specs look like in the fall of 2012?  And how big a target is RAM for cost reduction?  And how much savings can you really get using Windows CE over WinNTk?

The two biggest cost areas on a smartphone bill of materials are the logic complex (processor, GPU, radio) and the screen.  Memory is the third largest, but it is composed of both RAM and flash memory, so those represent two different discussions from a kernel decision standpoint.  Mechanical aspects (packaging, buttons) are fourth, and cameras are fifth.  Moore’s Law allows for two things to occur over time: either increased performance at constant cost or constant performance at lower cost.  High-end smartphones absorb the electronics following the increased-performance-at-constant-cost track.  Low-end smartphones can either go for reduced specs, purely follow the constant-performance/lower-cost track, or do a mixture of both.  The most likely path is to follow the constant-performance/lower-cost track on pure electronics and go for reduced specs in areas that don’t fully benefit from Moore’s Law.  So, for example, stick to the 1GHz processor and 512MB of RAM that are characteristic of first-generation Windows Phone 7 devices, because these will be dirt cheap in late 2012, but use cheaper materials in packaging and really focus on reducing display costs.  The latter might occur from lower-quality displays, different resolutions, or both.  A different resolution is the change that would require the most work in Windows Phone, and it is completely orthogonal to the Windows CE vs. WinNTk decision.  I would also expect cameras to either be eliminated or be very low-cost/low-resolution, and some sensors (e.g., the gyroscope) to be avoided.  But the main point here is that spec changes that would make adoption of WinNTk undesirable are very unlikely.

Ok, so I think that is about it.  Oh, there is one more thing!  You might know about the Motorola Atrix.  It’s an Android smartphone that you can plug into a PC-like notebook body so that it becomes an Android PC.  It isn’t very interesting because of course no one really wants an Android PC (e.g., there are no apps for it), and so it has had little success.  Someone else made an Android smartphone that plugs into a tablet body and turns your phone into a tablet.  But without real support from Google this seems to have gone nowhere.  Well, at least as early as 2006 Microsoft envisioned that you would carry around a phone that morphed into a full PC when attached to PC peripherals (large screen, keyboard, mouse).  It just didn’t have the ducks lined up to pull it off (unless you wanted a dock that converted your Windows Mobile phone to a Windows CE PC, which a third party did indeed create, and had about as much luck with as the Atrix).  The move to WinNTk is a prerequisite for ever bringing this vision to reality.  If you have the common kernel then you could have high-end smartphones that shipped with both the Windows 8+ and Windows Phone 8+ user experiences and swapped between them based on what peripherals were available.  Maybe this is a “Windows 9” thing, or maybe it is a “Windows 10” thing, but the point is that going to a common kernel clears the path for a lot of innovative possibilities.  Sticking with two different kernels will always make these scenarios difficult or impractical.

So I think I’ve laid out the case for Windows Phone 8 to be based on the Windows NT kernel.  There is lots of upside and very little downside for Microsoft.  Will it happen?  I really don’t know.  There are definitely a lot of Microsoft people, both current and former, who want it to happen.  I could point out that a chunk of the WinNTk team moved over to the Windows Phone organization a couple of years ago, and while that proves nothing it certainly shows that the expertise to pull off the kernel switch is in place.  If this is going to happen we shouldn’t have long to wait to find out.  For Windows Phone 8 to release in 2012 Microsoft would need to hold a developers conference for it (e.g., Mix12) this coming spring.  So we are likely within 4 or 5 months of knowing the answer.


More proof that Toyota needs Microsoft-style Program Managers

Every few years I rent a Toyota Prius when I’m on one of my trips, and the most recent trip was one of them.  The Prius is a marvel of engineering, showing what is really possible when you choose a hybrid powerplant as the overall design center.  Sure there are other compromises, like at best modest performance, but the Prius is comfortable, quiet, and roomy for a car that gets 45+ MPG.  However, the Prius also demonstrates an annoying failure on Toyota’s part.

Apparently in Japan men must not open the passenger door for their aging mother (or their wife or girlfriend), because if they did then Toyota’s keyless entry system would include sensors in the passenger-side door handle as well as the driver-side one.  So if you walk up to the driver’s side door and grab the handle it unlocks, and if you touch the right spot on the handle it locks.  But if you walk up to the passenger door and grab the handle nothing happens, and you have to fumble around for the key fob.  Ditto to lock.  This is extremely frustrating, since you either get used to keyless entry or you get used to pulling out the key fob and pressing buttons.  Going back and forth between the two modes is a pain.

There are other problems with the design too.  If you touch the lock button on the driver door handle before the passenger closes the door then the car just howls at you until the door is finally closed.  This drives the (slow-moving) passenger crazy because it is as if the car is trying to rush them.  Is that any way to treat your elders?  Other keyless entry systems I’ve experienced handle this the other way, simply holding off on the usual locked beep until all doors have been closed.

What a Program Manager would do is write out all the scenarios and make sure that they’d solved the problem for the key ones.  So either Toyota doesn’t feel that the courtesy of opening the door for a loved one is a key scenario, or they didn’t have a Program Manager think through the scenarios.  I bet someone in marketing said “we need a keyless entry system” and the “lock engineer” just threw something together.  At least that’s how their implementation comes across.

 

 


“For all of us who have lives, there’s Windows” – David Gewirtz

I couldn’t resist pointing out this article about dropping Linux in favor of Windows.  In the mid-2000s I made a concerted effort to learn and use Linux.  I even went as far as running it on my laptop, and taking that laptop to some Microsoft executive presentations (done with Open Office I might add) to make a point.  The laptop was one that the OEM (Dell) had refused to update to support Windows XP, but Linux ran just fine on it.  However, my experience was much as David Gewirtz reports is still true today for Linux servers.  You get a Linux distribution, you craft the system you want from the pieces-parts, and then you either never change a thing or you put in an enormous effort to make any change (even a security update) and still keep it running.  My conclusion on the client side was that Linux was not something I’d ever give my mother, cousins, etc. to run.  And while as a “hacker” I loved the idea that I could customize the server to my heart’s content, it always seemed to me that the total cost of ownership was much higher than Windows.  After all, it was many years ago that hardware and software became cheap (dirt cheap, actually) while labor costs skyrocketed.  So a solution that lowers software costs further, at greatly increased labor costs, doesn’t make economic sense.

The one place where Linux makes enormous sense to me is in embedded systems.  That’s because you can apply some expertise to customization to get your product running exactly as you desire, and then pump out copies in volume.  In other words, it’s leveraged just like any high-volume software business.

So here we are 6-7 years later and I would have expected Linux to have matured into something much less labor intensive for an IT shop to use on servers.  Based on Gewirtz’s experience, if anything it seems like it’s gotten worse.


What the heck is “Program Management”?

When I first joined Microsoft back in 1994 I, like many other outsiders, was shocked by its split of software engineering into three (rather than two) disciplines.  Splitting Development from Test was pretty common in the industry.  But subdividing Development into Program Management (PM or PgM) and Software Design Engineers (SDE) was not.  Sure, other companies had Program Managers who handled scheduling and coordination kinds of things, but that isn’t what Microsoft had done.  At Microsoft there are process-oriented Program Managers and technical Program Managers (and quite often people who do both).  The process-oriented Program Managers are like the classic ones found in other companies.  But the technical Program Managers are the ones who drive what features will be in a product, what they will look like, and write the functional specs for the features.  Then a Software Design Engineer does the internal design and writes the code.  Initially I hated this split because I was brought up in a culture where an engineer was held responsible for the entire job of understanding the customer requirement and taking it all the way through to working code.  If you read the book Showstopper!, about the creation of Windows NT, you’ll see that engineers who came to Microsoft from DEC generally didn’t believe in the PgM/SDE split.

I held both PgM and SDE (as well as Architect and General Management) roles at Microsoft and over time I came to appreciate the split.  For one thing it lets people pay a lot more attention to the customer experience than is possible when the same person worries about the external presentation and behavior as well as the internal functioning.  For another, for the majority of people it lets them focus on areas of strength rather than having to be a jack of all trades.  Now for the 10-20% of engineers who are equally good at PgM and SDE responsibilities it can be a little frustrating, but otherwise it seems like a good idea.

The truth is that I really appreciate the Microsoft style of Program Management when I see good or bad design choices in non-software products.  Take a couple of bad examples from BMW.  Someone at BMW decided to put the interior door lock/unlock button in the center of the console instead of on the front door armrests.  Not only that, instead of it being a rocker switch that you push one way to open and the other to close, it is a single push button that decides what to do based on context.  Think of what this means (besides the fact that no one other than a BMW owner can find the switch).  First of all it means that (assuming you don’t have the remote on you) to open one door and unlock the rest you basically have to sit down in the car!  Second, it often means that you get into the car and press the button to unlock the other doors and instead it locks them!  This complete lack of consideration for how a door lock/unlock switch is used suggests that BMW let an engineer who thought the context-sensitive door switch was a cool technical trick make the decisions.  They should have had a program manager who understands how drivers and their passengers use the switch making those decisions.  And putting it in the center of the console?  I can’t really figure that one out at all.  Cost savings?  Security feature (i.e., someone can’t reach in the window to unlock the doors, they have to get half their body in the car)?  I don’t know.  But I can tell you it is darned inconvenient.
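
To see why that context-sensitive button surprises people, here is a tiny toy sketch (in Java, purely illustrative; nothing here reflects BMW’s actual design or code) contrasting a single toggle whose effect depends on hidden state with explicit lock/unlock controls.

```java
/** Illustrative toy model only -- nothing here is BMW's actual design. */
public class DoorLockSketch {

    static class Car {
        boolean otherDoorsLocked = true;   // state the driver often can't see

        // Single context-sensitive button: its effect depends on hidden state.
        void pressCenterConsoleButton() {
            otherDoorsLocked = !otherDoorsLocked;
        }

        // Rocker-switch style: each control says exactly what it does.
        void unlockOtherDoors() { otherDoorsLocked = false; }
        void lockOtherDoors()   { otherDoorsLocked = true;  }
    }

    public static void main(String[] args) {
        Car car = new Car();

        // The other doors happen to be unlocked already (say, from the remote),
        // but the driver doesn't know that and presses the button to "unlock" them...
        car.otherDoorsLocked = false;
        car.pressCenterConsoleButton();
        System.out.println("Other doors locked? " + car.otherDoorsLocked);  // true -- the surprise

        // The explicit control can't surprise anyone.
        car.unlockOtherDoors();
        System.out.println("Other doors locked? " + car.otherDoorsLocked);  // false
    }
}
```

The toggle is only predictable if the user already knows the current lock state, which is exactly the information a driver sliding into the seat doesn’t have.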

Of course another example is where BMW got it right and Toyota got it wrong.  Both our X5 and RAV4 offer a switch to lock the window controls.  In the case of the X5 it locks the passenger and rear window controls but allows the driver to open or close all windows in the car.  In the RAV4 it locks the passenger and rear window controls and prevents the driver from opening or closing any window except his own.  Now Toyota’s approach makes no sense at all.  You lock the window controls so that children or pets can’t operate them.  But you still want the driver to operate them.  The driver can, of course, unlock the controls to operate them.  But this opens a window for abuse.  A perfect real-life example: I unlock the RAV4’s controls to open the passenger window a bit, and the dog steps on a rear-seat window control, opening it all the way (to where he could jump out).  He could just as easily have closed it on his neck.  I don’t know if this was a Program Manager making a bad choice, or the lack of Program Management leaving the decision in the hands of the engineer (who is busy worrying about things like whether the switch can withstand 10,000 pushes, whether he can reduce the cost by half a penny, etc.).

Of course the best automotive advertisement for having a separate (and good) program management department might be the BMW iDrive.  It was a great idea that had a horrible design.  Over the years BMW has made several revisions to improve it, but it is still a largely despised aspect of BMWs.  For those unfamiliar with this abomination, the fundamental design flaw was in not adopting a clear and common navigation technique.  Sometimes, but infrequently, you can go back.  More often you have to jump to the beginning and navigate your way back through a menu structure.  Contrast this with Audi’s equivalent, which adopted the web’s “back” convention, allowing you to always go back when you’ve navigated to the wrong screen.  Audi got it right, BMW got it wrong.  If you have good Program Management you get Audi’s implementation.  If not, you get BMW’s.

Lest anyone think I’m picking on BMW too much, it should be known that I’m a huge BMW fan.  When it comes to driving I think they “get it” better than any other car manufacturer (some specialty situations aside).  When it comes to non-driving related design decisions they are hit or miss.

When you look at things like the Microsoft Office “Ribbon”, thank or curse Program Management.  When you look at the new Start Page in Windows 8, thank or curse Program Management.  When you like or curse some new T-SQL language syntax in Microsoft SQL Server….    You get the picture.

This is why, at most Microsoft events, you find the Program Managers are the ones doing the presentations.  Or perhaps why even though Terry Myerson runs Windows Phone Engineering it is his Director of Program Management, Joe Belfiore, who has become its public face.  The SDEs make the magic happen, but the Program Managers first write a spec for the trick.

 

 


OLE DB and SQL Server: History, End-Game, and some Microsoft “dirt”

Last month the Microsoft SQL Server team effectively sounded the death knell for Microsoft’s OLE DB.  I say “effectively” because while SQL Server isn’t the only implementor of OLE DB, it is (or rather was) Microsoft’s flagship for this data access technology.  Since I was both a godfather of the OLE DB strategy and responsible for the SQL Server implementations that have now been deprecated I thought now would be a good time to reveal the overall strategy and why it never succeeded as envisioned.

Before we time travel into OLE DB’s origins let’s survey the current state of data access in SQL Server.  The first person who contacted me after Microsoft announced SQL Server’s deprecation of OLE DB basically said “there goes Microsoft, changing data access strategies again”.   Well, Microsoft does indeed have a history of replacing its data access APIs all too frequently.  But the truth is that OLE DB has been with us for 15 years and, although deprecated, will be supported for another 7.  22 years is a heck of a long lifespan for a technology, particularly one that was only partially successful.  And the truth is that OLE DB is well past its prime, with many other data access technologies (both older and newer) in far greater use.

The reason Microsoft has seemingly changed horses so often on the data access front is the rapid evolution that the market has demanded.  Initially SQL Server used the DB-Library API that Sybase had invented (and then deprecated around the time Microsoft and Sybase ended their relationship).  Microsoft had come up with ODBC as a way for Excel to import data from various data sources, but the primary client database actually in use at the time was the JET database inside Microsoft Access and Visual Basic.  A programming model called DAO was provided to access JET.  DAO could access SQL Server and databases supporting ODBC, but only through JET’s distributed query processor (RJET/QJET), thus making that access slow.  For SQL Server 6.0 Microsoft created a native ODBC driver for SQL Server to replace DB-Library, and for Visual Basic 4 the Developer Division introduced RDO as an object model that lived directly on top of ODBC and thus didn’t have to go through RJET/QJET.  RDO/ODBC quickly became the native and preferred data access story for applications written with Microsoft technology.  When OLE DB came along we introduced ADO as the object model directly on top of OLE DB.  With the introduction of .NET we needed an object model that was both optimized for the .NET world (which could have been just a minor evolution of ADO) and, more importantly, specifically tuned for the Internet.  This latter requirement was one of the factors that led the ADO.NET team to create their own data provider model, one provider of which could be a connector to OLE DB data sources.  But for optimal performance they chose to implement a data provider that natively spoke SQL Server’s TDS network protocol.  Later programming advances, such as LINQ and the Entity Framework, also use the ADO.NET native SQL Server data provider.  During the development of SQL Server 2000 it became apparent that SQL Server was at a huge disadvantage when our customers chose to build applications using Java with either IBM’s WebSphere or BEA’s WebLogic because we didn’t have a JDBC driver, so I initiated an effort to add a Microsoft-supported JDBC driver to SQL Server’s bag of tricks.  More recently a PHP driver, which actually layers on top of ODBC, was added to SQL Server’s supported data access methods.  So for nearly a decade now the primary ways to write applications that access SQL Server have NOT involved OLE DB!  No wonder the SQL Server team feels comfortable deprecating it.
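
As a small concrete illustration of how the mainstream access paths bypass OLE DB entirely, here is a minimal sketch of querying SQL Server through the Microsoft JDBC driver.  The server name, database, and credentials are placeholders, and the driver JAR is assumed to be on the classpath; treat it as a sketch rather than production code.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class JdbcSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection string for the Microsoft JDBC driver;
        // host, database, user, and password are illustrative only.
        String url = "jdbc:sqlserver://myserver:1433;databaseName=mydb;user=appuser;password=secret";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT name FROM sys.databases")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
        // The driver talks TDS directly to SQL Server -- no OLE DB (or ODBC) layer in the path.
    }
}
```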

With that background out of the way, let’s time travel back and look at the real OLE DB strategy and why it never achieved its goals.  When I joined Microsoft in April of 1994 there was already an OLE DB effort underway.  In fact the very first meeting I remember attending was an evening meeting of a design team working on the OLE DB spec.  It was an evening meeting because all the participants also had “day jobs” that their management pressured them to work on.  The key driver of OLE DB at the time was the Cairo Object File System (OFS).  Soon thereafter we’d press the reset button and assign a pair of newly hired Partner Architects to the OLE DB effort as their day jobs.  OFS, though still a factor for a while, soon departed the scene.  With OLE DB we were trying to accomplish two things.  One was a vision of Universal Data Access that went beyond relational databases; the other was the idea of a componentized DBMS.  OLE DB was to partially succeed at the first, but fail horribly at the second.

It is hard to remember that back in the early 90s, when things like OFS and OLE DB were conceived, relational databases were still in their youth and not very widely accepted.  Most corporate data was still stuck in hierarchical (IMS) and network (Codasyl) databases or flat files (VSAM, RMS).  Vast amounts of data were stored on the desktop, usually in tools like Microsoft Excel.  The most popular data store for applications on the desktop was Btrieve, an ISAM-type offering.  Microsoft also realized that email, then still in its infancy, would turn out to be the largest information store of all.  Microsoft Exchange was envisioned as a mail system on top of OFS, but ultimately implemented its own (again distinctly non-relational) store.  And many people thought that object databases were the wave of the future, though ultimately they never achieved much success outside the CAD/CAM/CAE world.  So it seemed clear that Microsoft needed a data access strategy that would work across the relational, object, and legacy data worlds.

One proposal was to extend ODBC to handle the new requirements; however, this approach was ultimately rejected.  Since this was before my time I don’t know exactly what happened, but what I recall being told was that they tested the extensions out with other companies involved with ODBC and found significant resistance to them.  Deciding that if the industry wasn’t going to accept the idea of extending ODBC they might as well go for a more optimal solution, Microsoft went down the path that led to OLE DB.

Beyond better legacy data access, the fact that Microsoft was working on non-relational stores makes it kind of obvious why we thought we needed OLE DB.  But we took it to another level: we thought that even in the relational world we would evolve to add more object and navigation capabilities.  And we would implement this by creating a componentized DBMS that let an application use various capabilities depending on its needs.  There would be a Storage Engine, a Query Processor, and one or more Navigation Engines.  In the most primitive form the Navigation Engine would implement SQL’s cursors, but in an extended form it would be an in-memory database that projected data as objects that you could access via pointer chasing (a la object databases).  An application could go against data in the database directly with ISAM-style access against the Storage Engine, or it could use the Query Processor to access data in the Storage Engine or other stores, or it could use a Navigation Engine (or other yet-to-be-conceived components) for additional capabilities.  It was this strategy that really drove Microsoft down the OLE DB path, and this strategy that never came to fruition.
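
To make the layering idea concrete, here is a hypothetical sketch of the componentized DBMS.  The interface names and the toy in-memory implementations are mine, not the real OLE DB interfaces; the point is simply that an application would compose only the layers it needed.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch only -- these are NOT the real OLE DB interfaces.
 *  The names just illustrate how an application would compose the layers it needs. */
interface StorageEngine {
    Iterator<String> scan(String table);            // raw ISAM-style access to a table's rows
}

interface QueryProcessor {
    Iterator<String> execute(String queryText);     // set-oriented, declarative access
}

interface NavigationEngine {
    Object fetchAsObject(String table, String key); // cursor/object-style "pointer chasing"
}

public class ComponentizedDbmsSketch {
    public static void main(String[] args) {
        // A toy in-memory storage engine standing in for the real one.
        StorageEngine se = table -> List.of("row 1: widget", "row 2: gadget").iterator();

        // A toy query processor layered purely on the storage engine's rowsets
        // (it "compiles" every query into a full scan of one table).
        QueryProcessor qp = query -> se.scan("products");

        // The application picks its layer: direct ISAM-style access...
        se.scan("products").forEachRemaining(r -> System.out.println("SE: " + r));
        // ...or declarative access through the query layer over the same data.
        qp.execute("SELECT * FROM products").forEachRemaining(r -> System.out.println("QP: " + r));
        // A navigation engine (in-memory, object-database style) would layer on in the same way.
    }
}
```

In the actual strategy, OLE DB rowsets were meant to be the common currency flowing between these layers.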

By 1994 Microsoft had realized it wanted to be a serious contender in the database arena but had not finalized its strategy for doing so.  In preparation for this the company had negotiated a split with Sybase giving us rights to an older version of their source code, joint ownership of things like the TDS protocol, and freedom (for both parties) to pursue our respective strategies.  While the SQL Server team had launched an effort to build the first independently developed version (SQL95, aka SQL Server 6.0) of the product, there was tremendous debate going on around how to proceed in the long term.  The organization behind the JET database engine in Access (also known as JET-RED) had embarked on an effort to create a new JET-compatible server database known as JET-BLUE (FYI, it is JET-BLUE that is used in Microsoft Exchange and not JET-RED; most people just say JET and don’t realize they are different).  However, there was no query processor being built to work with JET-BLUE and no customer for it other than Microsoft Exchange.  The Exchange team, faced with delays in OFS, had opted to build their own interim store using JET-BLUE for the low-level database capabilities.  This “interim” store is still in use today.  The discussion throughout 1994 was: do we take JET-BLUE and build a full new RDBMS around it, or do we start with SQL Server and basically gut it, replacing its insides while maintaining a highly compatible exterior?  There was a lot of back and forth, but ultimately we decided that if we were going to succeed in the Enterprise, evolving the SQL Server product was the more likely route to success (because we had a large customer base and it was popular with Enterprise ISVs).  This didn’t sit well with the SQL Server team, because they realized we were forcing them down a risky re-write path while they preferred a more straightforward evolution of their code base.  And it really didn’t sit well with the JET-BLUE team, whose leader (one of Microsoft’s earliest employees) made one last appeal to Bill Gates before the strategy was finalized.  As everyone now realizes, the strategy to go with SQL Server was chosen, and it did succeed.  But it ultimately doomed the vision of a componentized DBMS.

Work started in 1994 on design for a Query Processor (QP) for the new componentized DBMS, and after we made the decision that SQL Server would be our future product focus we moved the QP team to the SQL Server organization.  But it wasn’t until we shipped SQL Server 6.5, a minor update to SQL95, in the spring of 1996 that work on re-architecting and re-writing SQL Server got fully underway.  The internal architecture was to follow the componentized DBMS idea and use OLE DB to connect its components.  While it largely did this, the choice to gut and build on an existing product introduced some realities that hadn’t really been anticipated in the original plan.

There were two factors that the componentized DBMS idea hadn’t fully taken into account.  The first was rapid innovation in Query Processor technology that made the then state-of-the-industry split of responsibilities between the Storage Engine and Query Processor obsolete.  The second was that it didn’t account for all the aspects of a traditional Relational Engine that didn’t fall into the Query Processor or Navigation Engine categories.  For example, OLE DB said nothing about management interfaces across the components.  Two other factors would also come into play.  The first was that we couldn’t rewrite all of SQL Server in a single release, and so we’d have to maintain some legacy code that violated the new architecture.  The second was that to maintain compatibility with older versions of SQL Server, and to exceed their performance, we’d have to violate the architecture.  Although we thought both of these latter two factors would be temporary, they ultimately contributed greatly to the abandonment of the componentized DBMS idea.

As we re-built SQL Server (Sphinx, later known as SQL Server 7.0) we did use OLE DB internally.  The Query Processor and Storage Engine do talk to one another using it.  In one regard this architecture proved out nicely in that we were able to add heterogeneous query directly into the product.  There is a lot of unique connection and metadata management work, but once you use OLE DB to materialize a Rowset from an external data source the Query Processor can just party on the data without regard to whether it came from the Storage Engine or an external source.  But that is about the only part of using OLE DB internally that I can point to as a success.  For all the things that OLE DB doesn’t cover we had to use a lot of private interfaces to talk between the Storage and Relational engines.  And then there is that rapidly evolving Query Processor thing.  It turned out we could never allow access directly to the Storage Engine (SE) because the QP ended up (for performance reasons) taking over responsibilities that had previously been in the SE.  For example, the maintenance of referential integrity.  Or, for a later example, materialization of the inserted and deleted tables used by triggers.  We’d debated doing this in SQL Server 7.0, but for expediency went with the traditional solution, in which these virtual tables are created by having the Storage Engine scan backwards through the log file.  However, as you get to higher performance systems the log becomes your key bottleneck, and so removing these backward scans is an important performance boost.  So now the Query Processor could look to see if an update would cause a trigger to fire and build the creation of the inserted and deleted tables into the query plan along the way.  But this also meant that you could never allow an application to directly update a table through the Storage Engine if a trigger existed.  As we put all our energy into building SQL Server 7 and then SQL Server 2000 these various realities pushed the componentized DBMS idea further and further away from ever becoming a reality.
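
The heterogeneous query point is easy to illustrate.  Here is a toy sketch (again in Java, purely illustrative; this is not OLE DB’s actual IRowset interface) of a query layer that consumes rows from a local table and an external source through one common rowset abstraction, never caring which is which.

```java
import java.util.Iterator;
import java.util.List;

/** Toy illustration only -- this is not the real OLE DB IRowset interface. */
interface RowSource {
    Iterator<String> rows();
}

public class HeterogeneousQuerySketch {

    // A "storage engine" table materialized as a row source.
    static RowSource localTable() {
        return () -> List.of("local row 1", "local row 2").iterator();
    }

    // An "external" source (think: another DBMS or a flat file) exposed
    // through exactly the same abstraction.
    static RowSource externalSource() {
        return () -> List.of("remote row A").iterator();
    }

    // The query layer neither knows nor cares where each rowset came from.
    static void unionAll(RowSource... sources) {
        for (RowSource source : sources) {
            source.rows().forEachRemaining(System.out::println);
        }
    }

    public static void main(String[] args) {
        unionAll(localTable(), externalSource());
    }
}
```

That source-agnostic property is essentially what the OLE DB Rowset bought the Query Processor, and it is the one internal use of OLE DB I can point to as a clear win.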

It wasn’t just SQL Server’s difficulty in implementing the concept that caused the death of the componentized DBMS plan, it was also SQL Server’s success in the market and the acceptance of relational databases throughout the industry.  Basically it became more important to have a great RDBMS, and extend it as needed, than to componentize.  Without spending any more time on this point, let’s just leave it with the idea that this key force behind OLE DB was effectively dead.

So what of the Universal Data Access part of the OLE DB strategy?  Well, that was more successful but also had three flaws, plus it had run its course.  It ran its course because the success of relational databases meant that the need to access other database sources diminished fairly rapidly.  One flaw is related to the componentized database flaw, which is that the non-relational database solutions that OLE DB envisioned never really took off.  Another is that it was too tied to the COM world, and thus anyone who had to target platforms other than Windows couldn’t fully embrace it.  And the final one is that interop vendors basically followed the same strategy with OLE DB that they followed with ODBC.  They put an OLE DB interface on a lightweight Query Processor and then used a proprietary interface between their Query Processor and drivers for different data sources.  It was their QP and drivers that turned every data source into a relational data source, thus eliminating any differentiation between OLE DB and ODBC for accessing that data.  OLE DB’s unique advantages in this space were thus never fully exploited.

OLE DB has had other issues during its life, of course.  It is a rather complicated architecture, partially because it was intended to do so much and partially because it was designed by a committee and then had to be rapidly reworked.  The architect who did a lot of that rework admitted to me years later that it was the one piece of work in his career he was embarrassed about.  Also, a lot of OLE DB components, which we shipped as part of a package called MDAC, were caught up in multiple controversies, such as a lockdown on who could update things shipped in Windows.  We wasted a lot of time and effort trying to figure out how and when updates to OLE DB could ship, how to maintain compatibility between versions, etc.  But I think these tactical issues account for far less of OLE DB’s limited success than the failure of our original strategic imperatives to take hold.  Without those, OLE DB became a solution looking for a problem.

 


Good summary of Windows 8 security

Jason Garms, the Group Program Manager at Microsoft responsible for Windows 8’s security features, has written an overview of Windows 8’s added malware protection.  If you are on the techie side then it’s a great read, but otherwise your eyes will probably glaze over.  So I’ll do a little bit of a summary for those who are curious, but if this is a topic of deep interest then I highly recommend reading Jason’s blog entry.

First let’s get the part that might make your eyes glaze over out of the way.  Malware authors are often trying to exploit a vulnerability (i.e., a flaw) to install their malware on your system.  There are things (known as mitigations) you can do in software that make it very difficult to exploit any vulnerabilities they may find.  Microsoft started introducing these techniques in Windows XP SP2 and has been expanding them in each release since.  This is a key reason why, for example, Windows 7 is so much less subject to malware than Windows XP.  And Windows 8 contains yet another set of major mitigation improvements.

The second big change is the expansion of the built-in Windows Defender into a more complete anti-malware solution.  Jason revealed that when Windows 7 shipped the telemetry Microsoft was seeing indicated that close to 100% of systems had up-to-date anti-malware, but that a year later at least 27% did not.  This is likely because many people do not pay for subscriptions to the anti-malware software pre-installed by computer manufacturers once the trial subscription runs out.  Windows Defender addresses this problem.

A really exciting development is the inclusion of Application Reputation in Windows 8 itself.  This feature first appeared in Internet Explorer 9’s SmartScreen and has now been extended to any file that is downloaded from the Internet (via other browsers, for example) and then run.  If the file has a known good reputation then Windows lets it run.  If it does not have an established reputation then Windows warns you that it is risky to run.  You will now see fewer warnings than in the past (Microsoft estimates that typical users will see only two warnings a year), and you should take those warnings very seriously.
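
For the curious, the decision flow boils down to something like the following sketch.  This is purely illustrative; the names are made up and this is not the actual SmartScreen implementation or API.

```java
/** Illustrative decision flow only; the names here are made up, not the SmartScreen API. */
public class ReputationCheckSketch {

    enum Verdict { RUN_SILENTLY, WARN_USER }

    // Hypothetical stand-in for the cloud reputation lookup the real feature performs.
    static final java.util.Set<String> KNOWN_GOOD = java.util.Set.of("abc123");

    static boolean hasEstablishedGoodReputation(String fileHash) {
        return KNOWN_GOOD.contains(fileHash);
    }

    static Verdict checkDownloadedFile(String fileHash, boolean downloadedFromInternet) {
        if (!downloadedFromInternet) return Verdict.RUN_SILENTLY;   // not an Internet download
        return hasEstablishedGoodReputation(fileHash)
                ? Verdict.RUN_SILENTLY                               // known good: no prompt
                : Verdict.WARN_USER;                                 // unknown: rare but serious warning
    }

    public static void main(String[] args) {
        System.out.println(checkDownloadedFile("abc123", true));     // RUN_SILENTLY
        System.out.println(checkDownloadedFile("zzz999", true));     // WARN_USER
    }
}
```

The design point is that warnings become rare enough that the ones you do see are worth taking seriously.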

The last set of changes Jason talks about are changes to how Windows boots that protect against newer types of malware called bootkits and rootkits.  One of the areas that malware authors have begun targeting is installing their malware so that it runs before any anti-malware software is started, somewhere between when you press the power button and when you log on to Windows.  If malware can take control during this period then it can hide from, or disable, anti-malware software.  Microsoft has secured this path, particularly when you are using a new PC that includes the latest firmware implementing “Secure Boot”.  I can’t tell you how many conversations I’ve been in with security experts where the summary has been “we can’t really tell if a computer is healthy because the boot path is vulnerable”.  With Windows 8 (and modern computers) that will no longer be true.

There’s the summary of Windows 8’s malware-protection improvements.  For more details please see Jason’s blog posting.


Thinking about User Interface paradigms and an eventual “Humanoid UI”

(Let me apologize before you start for the length of this blog entry.  If it were a magazine article I’d spend hours more trying to edit it to perhaps half its current length.  But this is a blog, and the thing about blogs is that they are usually stream of consciousness rather than highly thought through and edited.  And when I stream my thoughts, well…)

One of my recent postings brought up a reply that essentially says “touch and gestures is old thinking, I want a speech-based user interface”.  Ah, wouldn’t we all?  Generalized speech recognition is one of the “Holy Grails” of Computer Science.  I can still remember one of my friends returning from Carnegie Mellon University for a summer break in the mid-1970s and going on about how generalized speech recognition (he’d been working on Hearsay-II) was right around the corner.  35-ish years later and we are still not quite there.  I still pick on him about it.  A couple of years ago I teased Microsoft’s Chief Research Officer (and former CMU professor), Rick Rashid, about this.  Rick correctly pointed out that we have come a long way and that speech recognition is now entering widespread, if more targeted, use.  So I’m going to talk about the evolution of computer user interfaces, where we seem to be with speech, and why speech may never become the primary mode of computer interaction.

When it comes to direct human interaction with computers, the way it all started was by taking existing tools and figuring out how to wire them up to the computer.  We had typewriters, so by hooking a typewriter to the computer you could input commands and data to it and the computer could print its output.  We had oscilloscopes, so by hooking one up to the computer we could output more graphical information.  We had to create a language in which you talked to the computer, and those command line (aka command shell) languages became the primary means of interacting with computers in the 1960s, 70s, and 80s.  Even today Linux, Windows, Mac OS, etc. all have command line languages, and they are often used to perform more esoteric operations on the systems.  The nice thing about command line languages is that they are dense and precise.  The bad thing is that they are unnatural (requiring wizard-level experts who have trained on and utilized them for years).

These three attributes, density (how much information can be conveyed in a small space), precision (how unambiguous the conveyed information is), and naturalness (how close the interaction is to the way humans think and work), can be used to evaluate any style of computer interaction.  The ideal would be for interactions to be very dense, very precise, and very natural.  The reality is that these three attributes work against one another, and so all interaction styles are a compromise.

As far back as the 1960s researchers were looking for a more natural style of computer interaction than command lines.  And obviously science fiction writers were there too.  For example, in the original Star Trek we see interactive graphic displays, tablet-style computers, sensor-based computers (e.g., the Tricorder), and computers with full speech recognition.  Who can forget Teri Garr’s amazement at seeing a speech-controlled typewriter in 1968’s “Assignment: Earth” episode?  Yet these were all truly science fiction at the time.  Interestingly, Star Trek never showed use of a computer mouse, and in the Star Trek movie “The Voyage Home” when Scotty sees one he has no idea what it is.  I find that interesting because the computer mouse was invented in 1963, although most people would never see one until the 1990s.

The command line world wasn’t static and continued to evolve.  As video terminals began to replace typewriter-style terminals (or “teletypes”) they evolved from being little more than glass teletypes to being capable of displaying forms for data input and crude graphics for output.  Some more human-oriented command languages, such as the Digital Command Language (DCL), appeared.  Some command line processors (most notably that of DEC’s TOPS-20) added auto-completion and in-line help, making command lines much easier for non-experts to use.  Of all these, only forms altered the basic density, precision, and naturalness equation, by allowing task workers (e.g., order entry clerks) to make use of computers.  After all, filling out forms is something that humans have been doing for at least a couple of centuries.

In the 1960s and 1970s Stanford Research Institute’s ARC and Xerox’s PARC continued to work on better ways to interact with computers and produced what we now know as the Graphical User Interface (GUI),  based on Windows, Icons, Menus, and Pointers (WIMP).  While WIMP is far less dense than command line based systems, it maintains their precision.  Density is still important however, which is why keyboard shortcuts were added to Microsoft Windows.  But most importantly, WIMP is far more natural to use than command lines due to the desktop paradigm and visual clues it provides.  It was GUI/WIMP that allowed computers to fully transition from the realm of computer specialists to “a computer on every desk and in every home”.

Work continued on how to make computers even more natural to use.  One of the first big attempts was pen computing and handwriting recognition, which had its roots in the 1940s (or as far back as 1888 if you want to stretch things).  There was a big push to bring this style to the mainstream in the late 1980s and early 1990s, but it failed.  High costs, poor handwriting recognition, and other factors kept pen computing from catching on.  It wasn’t dense or precise enough.  This style enjoyed a bit of a renaissance in the late 1990s with the introduction of the Palm Pilot, which eschewed general handwriting recognition in favor of a stylized pen input technique known as Graffiti.  The Palm Pilot was also a limited-function device, which allowed it to be well tuned for pen use.  This led to further use of a pen (aka stylus) in many PDAs and smartphones.  However, the more general purpose the platform (e.g., smartphones or a PC), the more tedious (lacking density) pen use became.  In other words, the use of a pen as just another pointer in a WIMP system was just not very interesting.

This finally brings us to the user interface paradigm that will dominate this decade, Touch and Gestures (Touch). Touchscreens have been around for many years, going back at least to the 1970s, but they generally had limited applicability (e.g., the check-in kiosk at the airport). When Apple introduced the iPhone, dropping WIMP and bypassing Pen Computing in favor of a Touch-based UI, it really did change the world. To be fair, Microsoft introduced Touch and Gestures at about the same time, but in a very limited production product known as Surface, so Apple gets the real credit. Touch trades away density and precision to achieve a massive leap in how natural it is for a human to interact with the computer. The tradeoff works really well for content consumption, but not for content creation. So WIMP, which is a great content creation paradigm, is likely to live on despite the rise of Touch. The place most users probably notice Touch's precision problems is when a series of links on a web page are stacked on top of one another. Your finger can't quite touch the right one (there it is, lack of precision). If you are lucky you can use a gesture to expand and then position the page so you can touch the right link (requiring more operations, which is less dense than WIMP would allow), but sometimes even this doesn't work. Now expand this to something like trying to draw a schematic, or a blueprint, and you can see the problems with Touch and why WIMP will continue to survive. For another example, consider how much easier it is to book a complex travel itinerary (tons of navigation and data input) on your PC versus doing the same on your iPad. It is one of the few activities where I feel compelled to put down my iPad and move to my PC. Writing this blog is another. Touch is great for quick, high-level navigation to content you want to view. It is painful for performing precise and/or detailed input.
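
To put some rough numbers on the stacked-links problem, here is a back-of-the-envelope sketch. The figures are assumptions of mine (a fingertip contact patch of roughly 9 mm and a 160 dpi phone display), not anything measured for this post, but they show why tightly stacked text links are hard to hit with a finger.

```python
# Back-of-the-envelope sketch of the stacked-links problem. The constants below
# are assumptions for illustration, not measurements from this post.

FINGER_CONTACT_MM = 9.0      # assumed width of a fingertip's contact patch
SCREEN_DPI = 160             # assumed phone display density (dots per inch)
MM_PER_INCH = 25.4

def px_to_mm(px: float, dpi: float = SCREEN_DPI) -> float:
    """Convert a pixel dimension to millimetres for a given display density."""
    return px / dpi * MM_PER_INCH

def links_covered(link_height_px: float) -> float:
    """Roughly how many stacked links a single finger press can overlap."""
    return FINGER_CONTACT_MM / px_to_mm(link_height_px)

if __name__ == "__main__":
    for height_px in (16, 24, 44):   # tightly stacked text links vs. a touch-sized target
        print(f"{height_px:3d}px links -> finger overlaps ~{links_covered(height_px):.1f} of them")
```

With 16-pixel-high links, a single press plausibly overlaps three or four of them, which is exactly the "can't quite touch the right one" experience; at a touch-sized 44 pixels the problem mostly goes away, but at the cost of fitting far less on the screen (there's the density tradeoff again).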

Speech-based user interface research dates back to the 1950s, but took off in the 1970s. You can really split this into Speech output and Speech recognition. As I pointed out earlier, the big joke here is that generalized speech recognition is always right around the corner. And has been for almost 40 years. Speech synthesis output, on the other hand, has been commercially successful since the introduction of DECtalk in 1984. DECtalk was a huge hit, and 27 years later you can still hear "Perfect Paul" (or "Carlos" as he was known to WBCN listeners, who included so many DECies that most of us forgot the official name), DECtalk's default voice, from time to time. But what about Speech recognition?

If you own a Windows XP, Windows Vista, or Windows 7 PC, then you have built-in speech recognition. Ditto for the last few versions of Office. How many of you know that? How many of you have tried it? How many use it on a regular basis? I'd love it if Microsoft would publish the usage statistics, but I already know they would indicate insignificant usage. My father used to call me up and say, "hey, I saw a demo of this thing called Dragon that would let me write letters by just talking into the computer." He did this more than once, and each time I told him he already had that capability in Microsoft Word, but to my knowledge he never actually tried it. I did meet a lawyer who threw away her tape recorder and began using Dragon NaturallySpeaking for dictation, but I think she was a special case. Frankly, in all the years I've heard about speech recognition she is the only layperson (or non-physically challenged person) I've met who uses it on such a general and regular basis. More on her situation later. Meanwhile, my own attempts to use this feature demonstrated its weakness. It works great until you have to correct something; then its use becomes extremely tedious (lack of precision and density), and complex changes require the use of a pointing device (or better put, you go back to WIMP).

It's not just that you can dictate into Microsoft Word and other applications; you can also control your Microsoft Windows machine with speech. However, I can't see many people doing this, for two reasons. One is speech's lack of both density and precision. The other is that layering speech on top of a WIMP system makes everything about speech's lack of density and precision worse. File->Save As->… is just too tedious a command structure to navigate with speech. But the most important indictment of speech as the primary form of computer interaction is that it is far less natural than people assume.

Think about how annoying it is for someone to take a cell phone call in a restaurant. Or why do you suppose most U.S. airlines have decided not to install microcells on their planes so you can use your cell phone in flight (and even those with in-flight WiFi are blocking Skype and other VoIP services)? And how proper is it for you to whip out your cell phone and take a call in the middle of a meeting? Or think about how hard it is to understand someone in a crowded bar, at a rock concert, in an amusement park, or on a manufacturing floor. Now imagine talking to your computer in those same circumstances. Your co-workers, fellow diners, or seatmates will want to clobber you if you sit around talking to your computer. And you will want to slit your own throat after a few experiences trying to get your computer to understand you in a noisy environment. Speech is a highly flawed communications medium that is made acceptable, in human-to-human interaction, by a set of compensating mechanisms that don't exist in human-to-computer interaction.

I recently read about a study showing that in a human-to-human conversation, comprehension rises dramatically when you can see the face of the person you are talking to. Our brains use lip-reading as a way to autocorrect what we are hearing. Maybe our computers will eventually do that using their cameras, but today they are missing this critical cue. In a human-to-human interaction, body language is also a concurrent secondary communication channel alongside speech. Computers don't currently see this body language, nor could they merge it with the audio stream if they did. In human-to-human communications, the lack of visual cues is what makes an audio conference so much less effective than a video conference, a video conference so much less effective than an immersive experience like Cisco's TelePresence system, and TelePresence somewhat less effective than an in-person meeting. And when you are sitting in a meeting and need to say something to another participant, you don't speak to them; you slip them a note (or email, instant message, or text them even though they are sitting next to you).

I use speech recognition on a regular basis in a few limited cases. One that I marvel at is United Airlines' voice response system. It is almost flawless. In this regard it proves something we've long known: you can do generalized speech recognition (that is, where the system hasn't been trained to recognize an individual's voice) on a restricted vocabulary, or you can do individualized recognition on a broader vocabulary. For example, getting dictation to work requires that you spend 15 or more minutes training the software to recognize your voice. I imagine that specialized dictation (à la medical or legal) takes longer. United has a limited vocabulary, and so its system works rather well. My other current usage is Windows Phone 7's Bing search. I try to use speech recognition with it all the time, and it works maybe 70% of the time. There are two problems. The first is that if there is too much noise (e.g., other conversation) around me, then it can't pick up what I'm saying. The bigger one is that if I say a proper noun, it will often not come close to the word I'm trying to search on. Imagine all the weird autocorrect behaviors you've seen, on steroids. Autocorrect is a great way to think about speech recognition, because after the software converts raw sound into words that sound similar, it uses dictionary lookups and grammatical analysis to guess at what the right words are. I suggest a visit to http://damnyouautocorrect.com/ for a humorous (and, warning, sometimes offensive) look at just how far off course these techniques can take you.
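
Here is a toy sketch of why the restricted vocabulary matters. Real recognizers score acoustic hypotheses against dictionaries and language models; in this sketch Python's difflib string similarity stands in for that scoring, and the word lists and mangled "utterances" are all made up for illustration.

```python
# Toy illustration only: difflib similarity stands in for a recognizer's
# dictionary/language-model scoring. Vocabularies and inputs are invented.
from difflib import get_close_matches

AIRLINE_COMMANDS = ["arrivals", "departures", "flight status", "reservations", "agent"]
OPEN_DICTIONARY  = ["connect", "kinetic", "cabinet", "connects"]  # imagine a huge word list

def best_guess(heard: str, vocabulary: list[str]) -> str:
    """Pick the vocabulary entry closest to what was (mis)heard."""
    matches = get_close_matches(heard, vocabulary, n=1, cutoff=0.0)
    return matches[0] if matches else heard

if __name__ == "__main__":
    # With only five commands, even a badly garbled utterance snaps to the right one.
    print(best_guess("flite stadus", AIRLINE_COMMANDS))   # -> "flight status"
    # With an open dictionary, an out-of-vocabulary proper noun gets pulled
    # toward some common word instead of what was actually said.
    print(best_guess("kinect", OPEN_DICTIONARY))
```

With a handful of commands, even a badly garbled phrase lands on the right one; with an open dictionary, a proper noun that isn't in the word list gets "corrected" into something common, which is exactly the autocorrect-on-steroids behavior I see with Bing search.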

Let's get to the bottom line. Speech has horrible precision and poor density, and social factors make it natural in only certain situations.

So what is the future of speech? First of all, I think point uses of it will continue to grow dramatically: things like United Airlines' voice response system, or the lawyer I mentioned. She used to dictate into a tape recorder and then pay a transcription service to transcribe the tape. She would then go back over the transcript and make corrections. The reason a switch to Dragon NaturallySpeaking worked for her is that the correction process took her no more time than fixing the errors the transcription service introduced. And it was a lot cheaper to have Dragon do the initial transcription than to pay a service. So certainly there are niches where speech recognition will continue to make inroads.

The bigger future for speech is not as a standalone user interface technology but as part of a full human-to-humanoid style of interaction. I can say "play" or touch a play button to play a video. The system can merge sensory inputs, just as humans do, to figure out what is really being communicated. I can use a keyboard and/or pointer when greater precision is required, just as humans grab whiteboards and other tools when they can't communicate with words and gestures alone. And I can direct output to any display or speaker (the same one you use as your TV, your phone, a dedicated monitor, the display panel on your oven, the speakers on your TV or audio components, etc.). This is the totality of a Natural User Interface (NUI). Speech doesn't become truly successful as a user interface paradigm of its own. It shines as part of the NUI that will dominate the next decade.
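
Here is a minimal sketch of the kind of multimodal arbitration that paragraph imagines: take speech when the recognizer is confident, prefer the more precise channel when it is available, and fall back to a clarifying prompt otherwise. The types, thresholds, and field names are all hypothetical, not any real NUI API.

```python
# Hypothetical multimodal arbitration sketch; not a real API.
from dataclasses import dataclass
from typing import Optional

@dataclass
class SpeechInput:
    text: str
    confidence: float      # 0.0 - 1.0, as a recognizer might report

@dataclass
class TouchInput:
    target: str            # the on-screen element that was touched

def resolve_intent(speech: Optional[SpeechInput], touch: Optional[TouchInput],
                   confidence_floor: float = 0.8) -> str:
    """Fuse the channels: precise input wins, confident speech is next, else ask again."""
    if touch is not None:
        return f"activate:{touch.target}"            # touch is unambiguous, use it
    if speech is not None and speech.confidence >= confidence_floor:
        return f"command:{speech.text}"              # confident speech stands on its own
    return "clarify"                                 # low confidence: fall back / re-prompt

if __name__ == "__main__":
    print(resolve_intent(SpeechInput("play", 0.93), None))                        # command:play
    print(resolve_intent(SpeechInput("play", 0.41), None))                        # clarify
    print(resolve_intent(SpeechInput("play", 0.41), TouchInput("play_button")))   # activate:play_button
```

The particular threshold isn't the point; the point is that each channel compensates for another's weakness, which is what makes the combination more natural than speech alone.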

I really think it will take another 8-10 years for a complete multi-sensor NUI (née Humanoid UI) to become standard fare, but Microsoft has certainly kicked off the move with the introduction of Kinect. It's primitive, but it's the best prototype of the future of computing that most of us can get our hands on. Soon we'll be seeing it on PCs, Tablets, and Phones. And a decade from now we'll all be wondering how we ever lived without it.

Posted in Computer and Internet | Tagged , , , , , , | 2 Comments