Ad-blocker Wars

About a year ago I wrote my Adblockers are the new AntiVirus piece. In the intervening period the war between ad blocking and websites that depend on advertising has gone exponential. Many sites put up a warning asking you to unblock ads on their site; others block access entirely. And now Google, the tech company almost entirely based on serving ads, is using its control of the dominant web browser, Chrome, to limit ad blockers. Make no mistake, I am OK with the concept of advertising on the web. It is a great way to democratize access to content, whereas paywalls (however appropriate in many situations) limit information flow. But as I wrote in the earlier piece, as long as advertising remains a huge channel for distributing malicious content I will be blocking it. Because I refuse to whitelist them, there are several websites that I can no longer access, but that is a small price to pay for better security and privacy. On the positive side for publishers, there is one site I found valuable enough to pay for access to rather than allow ads. But just one so far, and it was a very small charge.

While I use all three major browsers to some extent, Firefox remains my primary browser. That’s partially because it offers the most options for incorporating ad blocking and other filtering. It even has a built-in content blocker, though you must configure it yourself to apply to general browsing. One of my favorite Firefox features is that it allows you to specify a DNS server to use independent of what your system is set to use. So my family notebook computers have Firefox set to use Quad9’s malware-filtering DNS no matter what network they attach to, without having to manually change network settings each time (on our home network the router itself is set to use Quad9). I could use the same mechanism to point to an ad-blocking DNS.
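
If you want to see where that setting lives, it is Firefox’s DNS-over-HTTPS (Trusted Recursive Resolver) support. Here is a minimal sketch as user.js entries, assuming current pref names and Quad9’s published DoH endpoint (verify both against Mozilla’s and Quad9’s documentation before relying on them):

```
// Sketch: route Firefox's DNS lookups to Quad9 over HTTPS, regardless of
// the operating system's DNS settings.
user_pref("network.trr.mode", 3);  // 3 = DoH only; 2 allows fallback to system DNS
user_pref("network.trr.uri", "https://dns.quad9.net/dns-query");
```

The same two preferences can be flipped in about:config, and newer Firefox builds expose an equivalent option in the Network Settings dialog.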

Of course ad-blocking extensions for browsers are insufficient, and with Google limiting their capabilities in Chrome, they are becoming the wrong point in the technology stack to block ads. There is also the problem of non-browser applications that bypass the extensions, as I talked about in last year’s entry. Fortunately there are other options. Ad-blocking DNS may be the easiest free alternative, with AdGuard DNS currently the leading option. Some routers also offer built-in ad blockers, though they may be part of a paid service. For example, the eero Plus service for eero routers supports ad blocking. That feature has been available for years, but is still labeled as being in beta, so caveat emptor. For those who really like to hack, you can flash new firmware such as Tomato or DD-WRT onto your router, or build your own Pi-hole. I keep getting tempted to add a Pi-hole to my network, but it is down a long list of things I may never get time to do. More consumer-friendly hardware solutions, such as the little-known eBlocker, are also available. I suspect that as this category grows, mainstream vendors will increasingly include ad-blocking options on new routers, which is great because my experience with whole-home devices that sit beside the router is decidedly poor.

There are also paid system-wide solutions. I’ve mentioned AdGuard for Windows before, but still haven’t given it a serious try. There is also a version for the Mac. I did pay for AdGuard for iOS Pro, which can perform ad blocking across an iOS device rather than just in Safari. Don’t confuse this with the free AdGuard for iOS, which is a Safari extension. Not that the latter isn’t a good ad blocker too.

And then there is Microsoft (and Apple, but I don’t follow macOS developments). It is unclear how Microsoft’s adoption of Chromium as the basis for Edge will be impacted by Google’s latest change to Chrome. Will Microsoft follow Google’s lead, or continue to support a fully featured interface for ad-blocking extensions? Microsoft abdicated its leadership role in this space when it failed to carry the Tracking Protection List feature forward from Internet Explorer into Edge. It could return to leadership by adding new features to the Chromium-based Edge (emulating Firefox), add new features to Windows that work across all browsers and applications, continue to leave this to others, or adopt Google’s privacy- and security-unfriendly behavior. While it is disappointing, I suspect Microsoft will take the middle road and leave this to others.

What you should notice is that the one option that would save ad-supported websites, a move by the advertising industry to truly protect security and privacy, is absent. Maybe there is some work going on there, but so far it hasn’t made it to the mainstream. As I said a year ago, they are running out of time to save themselves. The escalating ad-blocking war tells us it is just about too late.

Prime 1-Day Delivery Really is Different

At last week’s earnings call Amazon announced it was moving Amazon Prime from its historical 2-day shipping to 1-day shipping. Inevitably there were articles saying how Walmart or Target or whoever already had this. Or even better than Amazon, had same-day delivery for some common products. All because they delivered from their large network of stores. I’m going to call BS on that, because "delivering from their stores" turns out to be more a symptom of a problem than a means of solving it.

Go back to Amazon’s origins as an on-line bookseller and Jeff Bezos’ recognizing that he could offer access to a vastly larger number of books (basically all those in print) than you would ever find in your local bookstore. Far more even than in the giant bookstores being built by chains like Barnes & Noble and Borders. This observation holds true more than ever in today’s retail environment. A retail outlet, even one as large as a Walmart Supercenter, only stocks a tiny fraction of the products, brands, styles, colors, sizes, etc. that are available. And in one of the most frustrating parts of the shopping experience, they frequently don’t have what you are looking for when you go into the store.

It is a very rare event when I go out shopping in local retailers that I come home with every item I was looking for. Even going to a store I know carries an item I want is often an unsatisfying experience. "Sorry you just drove 30 minutes and dealt with parking issues, crowds, etc., we are out of stock on that." ^&%$(. "Oh, you like those shoes? Sorry, we don’t carry that size in store but you can order it on our website." "We only carry the 2′ version of that cable in the store, if you want the 4′ you’ll have to order it on our website." Brand of a particular nutritional supplement? Let’s roll the dice and see if this store carries it and has it in stock at this very moment. My preferred brand/scent of antiperspirant? The Safeway stores in Denver seem to stock it, but not the ones around Seattle. And so on. As a result, I don’t bother going to stores. When I need something I just order it. Most of the time from Amazon.

While being able to deliver in one day, or same day, from a local retail outlet can be a very useful part of a fulfillment system, any attempt to make it the center of the experience replicates its bad characteristics in the online world. I don’t really care if Walmart or Target can deliver to my house in 20 minutes if neither carries the antiperspirant I want. Or if they are out of stock on the style, color, and size jeans I am looking for.

I’ve been living in an area where Amazon already offers free 1-day Prime delivery on many items for orders over $35. On Tuesday I realized I’d lost my Apple Pencil; I had a new one on Wednesday, despite my SUV being in the shop. Amazon also offers various same-day delivery programs in my area, though I haven’t made use of those services. The real news in Amazon announcing that it is moving Prime to 1-day delivery as the default is that it is building out its logistics system to support doing so for a very large portion of the items available on Amazon.com. And that is a whole different beast, both in complexity and in customer offering, than adding a delivery service from your local poorly stocked store. It’s the very same advantage that Jeff Bezos had over brick-and-mortar bookstores on Day One. And not a surprise for a company where "it is always Day One".

DMARC or Die

Let me ask a simple question: when are we going to get serious about dealing with unauthenticated email and its associated phishing and malware risks? If you think the industry is already taking this seriously, and that it is simply a hard problem, you are (IMHO) just wrong. Take this little snippet from the Microsoft Office 365 documentation on their handling of inbound mail that fails a Domain-based Message Authentication, Reporting, and Conformance (DMARC) check:

If the DMARC policy of the sending server is p=reject, EOP marks the message as spam instead of rejecting it. In other words, for inbound email, Office 365 treats p=reject and p=quarantine the same way.

In other words, in Microsoft’s infinite wisdom they ignore the domain owner’s instruction to shred, incinerate, and bury deep in the earth mail that fails the checks that owner established to prove mail really comes from them, and instead put that mail in the Junk folder where hundreds of millions of naive users will find it and believe it might be legitimate. This may have been a wise step back when DMARC was fresh and new in 2012; today it is simply irresponsible of Microsoft to favor legacy behaviors over a domain owner’s explicit instructions.

I don’t really want to pick on Microsoft, other than as a representative of the industry overall. We have the tools (SPF/DKIM/DMARC) to dramatically impact the SPAM problem but aren’t driving adoption, and proper usage, at a rate commensurate with the danger that unauthenticated email represents. SPF and DKIM have been with us for about 15 years. After 15 years we should no longer accept excuses such as SPF breaking legacy (pre-)Internet systems like listservers; there has been plenty of time for alternate, compliant systems to be deployed. Unfortunately nearly every SPF record seems to end with a soft-fail indicator, meaning "I don’t know who might legitimately send email on my behalf so don’t actually reject anything". DMARC, which really brings SPF and DKIM together into a useful framework, has only been adopted by 50% of Fortune 500 companies. And nearly all of them have DMARC policies of NONE, meaning just go ahead and deliver mail that fails authentication to the user’s inbox. WTF? And if they do take DMARC seriously, only to have Microsoft ignore instructions to REJECT mail that fails authentication, it’s enough to make a CISO drink.
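
To make those policy knobs concrete, here is roughly what the DNS TXT records in question look like (example.com, the address range, and the mailbox are placeholders; the tag syntax is standard SPF and DMARC):

```
; SPF: the trailing "~all" is the ubiquitous soft-fail lamented above
; ("don't actually reject anything"); changing it to "-all" makes it a hard fail.
example.com.        IN TXT "v=spf1 ip4:192.0.2.0/24 include:_spf.example.com ~all"

; DMARC: p=none just asks for reports, p=quarantine sends failures to spam,
; and p=reject tells receiving servers to refuse the message outright.
_dmarc.example.com. IN TXT "v=DMARC1; p=reject; rua=mailto:dmarc-reports@example.com"
```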

Is it going to take legislation to make the industry get serious? Maybe if Microsoft were subject to a lawsuit with treble damages because they delivered a malicious email to people’s junk folder rather than honor the DMARC REJECT policy we’d see some action. Not just by Microsoft, but by every organization fearful that new legislation had made it clear that failure to adopt well established anti-SPAM techniques subjected them to unlimited financial exposure.

We need a hard timetable for DMARC adoption, and if industry doesn’t do it then perhaps it will take a legislative push. In either case, we need a date by which all domains either establish a DMARC policy or have their mail rejected by recipient servers. We need a date by which a DMARC policy must be either REJECT or QUARANTINE. We need a date by which servers must enforce the DMARC policy rather than just check it. The latter is actually the first thing to be tackled; if someone has taken the trouble to establish a policy, a server should enforce it! Hear that, Microsoft? And we need a date by which REJECT is the only acceptable policy. Want to add some other milestones? Fine. But let’s stop with the excuses. It really doesn’t matter if this is a problem of the perfect being the enemy of the good, or of competing interests, or just inertia. Throw out the excuses and DMARC or Die.
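
If you want to gauge for yourself how far we are from that timetable, checking what a domain actually publishes takes only a few lines. A quick sketch, assuming the third-party dnspython package (the domains at the bottom are placeholders):

```python
import dns.resolver  # third-party package: dnspython

def dmarc_policy(domain):
    """Return the DMARC p= policy published for a domain, or None if there isn't one."""
    try:
        answers = dns.resolver.resolve("_dmarc." + domain, "TXT")
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer, dns.resolver.NoNameservers):
        return None
    for rdata in answers:
        record = b"".join(rdata.strings).decode("ascii", "replace")
        if record.lower().startswith("v=dmarc1"):
            for tag in record.split(";"):
                name, _, value = tag.strip().partition("=")
                if name.strip().lower() == "p":
                    return value.strip().lower()  # "none", "quarantine", or "reject"
    return None

for d in ["example.com", "example.org"]:  # substitute the domains you care about
    print(d, "->", dmarc_policy(d))
```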

The Travel Buddy PC

There has been a flurry of articles and blog posts lately on the question of whether the iPad Pro can replace a PC (Windows or macOS), and I thought it was time to wade into the muck. If you’ve read my blog over these last (almost) 9 years then you can probably guess where I land on this topic. But before we get there, let’s start with a few observations.

The other day one of our friends came to visit for a few days. While she has been an iPad user (both the 9.7″ and the iPad Mini) ever since Apple released them, in the past whenever she needed to do work she pulled out a MacBook. Not on this trip, however; instead she sat on our couch with a 12.9″ iPad Pro. When I queried her about it she said that when she traveled she preferred to take the iPad Pro with her rather than the MacBook. It hadn’t replaced the MacBook, but it addressed an overlapping requirement.

Another friend, a former Microsoft C-level executive, uses the iPad Pro in much the same way. He has a few Windows notebooks as well as a Mac, yet the device he always has with him is an iPad Pro. I have seen him use it to present to another former Microsoft C-level executive on a large monitor, and present to senior (including C-level) executives at both a large bank and a medium-large technology company. He can also frequently be seen using the Apple Pencil and iPad Pro to take notes in OneNote.

Our friends’ use cases caused me to reflect further on my wife’s and my own iPad Pro usage. While we both have notebooks, and a family desktop, for serious productivity work, our go-to devices for portable personal computing are our iPad Pros. Indeed, this blog post is being written on my new iPad Pro 11″. It is sitting on the ottoman, its base and keyboard stable (actually more stable than my top-heavy Surface Book 2 would be) while I sit on the edge of the couch. The viewing angle is almost right; I do wish I could adjust it a little more, but it’s certainly not a problem. While the Surface Pro is more adjustable, most traditional notebooks really don’t offer a better angle for how I’m sitting; their screens don’t go back far enough. I haven’t tried the 11″ on an airplane yet, but my 9.7″ worked great on an airline tray and adequately on my lap. The 11″ keyboard/case design is more stable than the one used on earlier iPad Pros and passes the lapability test far better than many of the 2-in-1 Windows PCs I’ve tried.

I regularly use an iPad Pro to manage my AWS account resources, research solutions to problems, and of course write emails. I also use it for spreadsheets and preparing presentations. Some of those (famous) narratives that are the bread and butter behind all decisions at Amazon were partially written on my iPad Pro 9.7″. I’ve even done some limited software development on it by using it to connect to both a remote Windows desktop and an Ubuntu Linux development machine. While I wouldn’t recommend the iPad Pro as the primary computer for any of these things, it does most of them adequately enough to let you leave your Windows or macOS notebook behind much of the time.

My iPad Pro is almost always with me. I slide it under the seat of the car when I am out and about, take it into restaurants when I dine alone, and take it to the doctor’s office, car dealer, etc. when I know I’m going to be waiting around. I take it to business meetings so I can take notes or do research. And usually it is the only computer I take with me when doing non-business travel.

I only take the Surface Book 2 when I am in full work mode. Then it travels between my home, a client, whatever I am currently using as an office, etc. I take it if I’m in the middle of building something serious, where the advantages of having a full WIMP user interface at my disposal make me more productive. But that’s my 10% use case. Most of the time my SB2 sits docked to a large monitor in my home office.

When people ask the question "Can an iPad Pro replace my notebook?" the answer is a clear "much of the time". For me the iPad Pro is ideal as what I call the Travel Buddy computer (or even Travel Buddy PC). It retains the application library and content consumption strengths of the original iPad, while getting to the 80-90% mark on content creation compared to similar Windows tablets/2-in-1s. Recent Windows systems like Microsoft’s Surface Go also fall into the Travel Buddy category, but are too weak in tablet usability, and too limited in application library, to address most users’ non-work desires for a Travel Buddy PC.

So what are the biggest limitations of the iPad Pro as a notebook replacement? As others have noted, the lack of a mouse or equivalent pointing device makes some work painful. In particular, cut/paste. The iPad Pro has an advantage over a PC in the broad adoption, within applications, of sharing entire objects, and sometimes that makes it feel superior. But if you need to take a precise region of data, like part of a list within a document, and copy it to another document, then the PC wins. PCs are also much better at multiple windows than the iPad Pro, although this is somewhat a matter of taste. As I’ve written in the past, I mostly run in full-screen mode no matter how big the screen I’m using. Sometimes I use two windows so I can look up data at the same time I’m filling in a form or writing a document. Well, the iPad Pro can do that. But if your work style is to keep 3, 4, 5+ windows open on the screen at the same time then…what on earth are you doing buying an 11″ or 12.9″ display device of any variety?

So could you replace your notebook with an iPad Pro? For many scenarios absolutely. For all scenarios no way. And in particular, could an iPad Pro become your only (non-phone) computing device? I think for a surprising number of the 1.5B PC users out there it could, but that is because many of them don’t really rely on the PC’s strengths. And for all of us, the computer you have with you always beats the computer you left at home. Which makes the iPad Pro a good alternative to a Windows notebook or MacBook as a Travel Buddy.

Snatching defeat from the jaws of victory

Once again Microsoft appears to have snatched defeat from the jaws of victory, this time repeating a key mistake from the Windows 8 era. Microsoft was on the path to a coup, launching the seemingly excellent Surface Go well ahead of Apple’s launch of the next generation of iPad Pros. It also launched the Surface Pro 6 ahead of Apple’s launch, though with a much smaller lead. So where did Microsoft go wrong? NO LTE. Oh, they promise LTE in the future, but futures don’t cut it in this case. This is exactly where Microsoft (and its ecosystem) screwed up back in 2013, and it has continued to screw up in successive launch cycles.

Back in 2013 the excellent Dell Venue 8 Pro, and other Windows tablets, launched with a promise of LTE, and then it never appeared. Within the Surface line Microsoft has either ignored LTE or delayed it until well beyond the initial launch, and when it did arrive they made it hard to buy (i.e., targeted the business sales channel) rather than featuring it. Now we have Microsoft singing the praises of "Always-Connected PCs", but they don’t walk the talk. For Microsoft, being "always connected" only applies to low-end ARM-based Windows 10 systems. And so far they haven’t even offered one of those themselves.

With Apple you just select WiFi-Only or WiFi+LTE as part of its normal sales processes, both online and in-store.  And they launch (and generally ship) the LTE models concurrently with the WiFi-Only models.

I was completely ready to spring for a Surface Go the moment I could get one with LTE, and then yesterday Apple launched the new generation of iPad Pros. There are a few things that the iPad Pro is not good at, like software development, but for my daily on-the-go needs it is near perfect. And most importantly, I will have one in my hands, WITH LTE, in a couple of weeks. So the moment has passed, Microsoft, and while you keep talking about being always connected, Apple is doing a much better job of walking the talk. The Surface Go likely isn’t going anywhere, and I’m not particularly hopeful about the "Always-Connected PC" initiative either.

Google goes to the dark side on JEDI

Every time I read an article on the U.S. Department of Defense large Cloud project known as JEDI I find myself suppressing an urge to comment.  Google dropping out of the bidding finally made that urge difficult to suppress.

It is almost certainly true that only Amazon (AWS) and Microsoft have the current breadth of offering to meet much of the JEDI requirements. It is equally true that neither of them has all the pieces needed for this contract; they are going to have to build new capabilities as well. Most articles I’ve read focus on certifications as a differentiator, and while those may represent a minimum bar for selling into this market and a demonstration of a Cloud’s maturity, they seem neither a significant differentiator nor a significant hindrance to a vendor’s ability to compete for the RFP. Put another way, if the rest of the RFP response showed overwhelming leadership then a roadmap for achieving the needed certifications would be sufficient to overcome the existing AWS and Microsoft certification leads.

The problem for every potential bidder is that they need to partner to meet the full RFP requirements and/or commit to significant developments that could negatively impact (e.g., in opportunity cost) their commercial offering roadmaps. The whining about JEDI being a single-source contract says more to me about tech industry disdain for partnering amongst major players than it does about the nature of the contract. DOD is used to many, if not most, major contracts involving partnerships amongst the top suppliers (aka competitors): Boeing/Lockheed, Lockheed/Boeing, Lockheed/Northrop Grumman, Boeing/Saab, etc. The right bid from a lead/prime with a lot of DOD experience would have a strong chance to challenge AWS and Microsoft. For example, IBM has lots of the pieces for a bid and decades of experience being a prime contractor for DOD. It is the latter, not their fragmented commercial cloud offerings, that makes them a serious contender to win JEDI.

The real question about JEDI, and likely the real meaning behind Google’s using lack of certifications as an excuse to drop out, is how much a vendor is willing to let the JEDI requirements impact their commercial roadmap. AWS’ Andy Jassy likes to say that there is no compression algorithm for experience. While that sometimes sounds like a marketing sound bite, there is a lot of truth to it. When the cloud was new, and enterprise adoption was near non-existent, AWS aggressively went after a number of deals for the experience they would provide. Those deals were key to getting AWS to its current leadership position, because they prepared an organization with only eCommerce DNA to address industries it otherwise couldn’t understand or relate to. One of those was the U.S. Intelligence Community’s Commercial Cloud Services (C2S) contract, which many point to as one of AWS’ key strengths in the JEDI bid. Certainly AWS wouldn’t be in a good position to win the JEDI deal without C2S, because it would face the "no compression algorithm for experience" dilemma. And while others may not have the direct classified-cloud experience C2S gave AWS, companies like Microsoft, IBM, and Oracle have decades of experience working with DOD and meeting its most demanding IT needs.

C2S is most important in the context of how much a young and small AWS was willing to impact its commercial roadmap to gain experience working in the toughest public sector environments. Both the learnings, and yes the optics, of being able to support the most demanding security environment have had a huge impact on AWS’ ability to attract large enterprises to its cloud. This is where Amazon’s focus on the long term comes into play. C2S was a drop in the bucket of public sector IT spending. JEDI is still just a toe in the water. AWS will value JEDI not only for the business it brings, but for the things it forces them to do to meet DOD’s requirements. Many of which it will bring back into its commercial offerings. Oracle will value it for giving their cloud a legitimacy they have yet to achieve. It could actually save their IaaS/PaaS offerings from oblivion. IBM seems more likely to value the revenue than the other benefits. Microsoft likely sees it as validation that the directions they’ve taken with Azure (including Azure Stack) have them equal to or ahead of AWS (without having to fall back on winning because the customer is an Amazon-retail competitor, or buying the business with a "strategic investment"). Sorry, I couldn’t resist taking a little dig at my Microsoft friends.

And Google? Google’s primary marketing thrust is that you should use Google Cloud because everyone wants to do things just like Google does. But if Google doesn’t want government to use AI the way they do, may in the future not want government to use some of their other technologies, and doesn’t want to disrupt their commercial roadmap to meet DOD requirements, then Google can’t bid on the deal. The same applications that Google doesn’t want its AI technology used in could make use of technologies like BigQuery and Spanner, so how can Google offer those as part of JEDI? And how much does Google want to focus its infrastructure work on being able to quickly stand up a new region at a newly established military base vs. continued development of its commercial regions? How hungry are they for this business? Apparently not very, as they’ve decided to go dark on the bidding.

The company that wins this business is going to be a company that is hungry for it, and not just for the revenue it brings.  That is always important of course, and being able to make a profit at it is just as important.  But in the end the winner is going to be, or at least should be, someone with a passion for the DOD customer base and for applying the learnings from JEDI to moving the Cloud up another notch in addressing broader customer needs.  I obviously see that from AWS and Microsoft, and Google already made it clear that isn’t the case for them.

The Big Non-Hack?

This week Bloomberg Businessweek (BBW) published “The Big Hack: How China Used a Tiny Chip to Infiltrate U.S. Companies” which claimed that 30 companies, most notably Apple and Amazon Web Services, had servers using hacked Chinese-made motherboards from U.S. manufacturer SuperMicro.  Apple, Amazon, SuperMicro, and even the Chinese government issued strong denials.  Additional denials are coming in as well, and right now BBW seems pretty far out on a limb with the story.  True or not, the article publicized real concerns about the security of the technology supply chain.  Concerns we are not taking seriously enough.

One bit of clarification (which is important, particularly if you don’t read the article carefully) is that the Amazon-related claim is about a company it acquired, Elemental Technologies. Allegedly the hardware hack in Elemental server products was discovered as part of Amazon’s pre-acquisition due diligence and nearly scuttled the deal. If there is any truth to the story, and Amazon gave quite a detailed response saying there isn’t, it should give some measure of assurance to AWS customers that AWS’ security processes caught this before the Elemental acquisition. One weird part of the story vis-à-vis AWS is that some of the hacked motherboards showed up in the AWS Beijing region. While I won’t say exactly why, that part of the story set off my BS detector. Otherwise, the AWS servers that run customer virtual machines (EC2 instances) and service control planes were not implicated in the story.

For all three major cloud providers I expect security practices that would either prevent or quickly uncover a hack such as the one discussed in the story. I have no personal knowledge of Google, but both Amazon and Microsoft are extremely thorough, sophisticated, and usually quite aggressive on the security front. Particularly when it comes to their own infrastructure. At AWS security is considered the #1 priority, and failure is treated as the ultimate risk for destroying customer trust. If the story about Elemental is even remotely true, discovering an actual hardware hack would have led AWS to implement numerous additional checks in its hardware acquisition and acceptance processes.

But to the meat of the issue, China is increasingly seen as a bad actor. When you combine repeated concerns about back doors in Chinese-made technology products with ongoing Intellectual Property theft concerns, rising wage costs, rising shipping costs, rapidly growing national security concerns, and the nascent trade war, I have to wonder how long until western companies just start removing China from the supply chain.  That doesn’t necessarily mean moving manufacturing “back” to the U.S. (or western Europe), it may mean moving to other low-cost countries.  Countries where, presumably, there is better protection of Intellectual Property and privacy.  And far less national security risk as well.  Basically, how long before western companies say the risks of having China in your supply chain far exceed the rewards?  For those wanting to sell to the U.S. Government, and likely many allies, the day of reckoning is already here. That noose will just keep getting tightened.

When will we see an accelerated move away from including China in the supply chain of technology products? If the BBW story turns out to be true, that will certainly accelerate things somewhat. If the trade war lasts for more than a few months, that will have a major impact. Few, if any, companies are going to try to figure out how to remove China from the supply chain of existing products or those well along in development. But probably every (non-startup) western company is looking at products just entering the development cycle and trying to figure out if there is a sensible way to not make that product in China or with Chinese-sourced components. Most will likely conclude there isn’t currently a sensible alternative, or decide to take the risk that the trade war will be resolved before they go into production. Many will at least take some initial steps to reduce their China supply chain exposure, such as seeking second sources outside China for key components. The longer the trade war goes on the more of them will conclude tariffs are a long-term part of the cost equation and shift away from China. And if another, confirmed, story of Chinese hardware hacking comes out during these deliberations? There will be a mad rush for the exit.

As for BBW, I’m concerned that the story doesn’t seem to have legs.  And if the story is false, or at least got a lot of the facts wrong, then it gives a serious black eye to reporting on the technology business.

The Product Shipping Tax

There is an observation I had back in the 1980s that both holds in today’s Cloud world and remains one of the toughest messages to communicate to senior leadership. When you ship a new product or service, major release, or even a major feature (in the Cloud world), your people resources for new feature development are permanently cut in half. The short-term message is no more palatable, but perhaps easier to communicate: for the first 6-12 months after a major release nearly 100% of your people will be unavailable (or their efforts severely degraded) for new feature development. So each budget cycle product teams end up asking for more staffing, even as it seems we are delivering less in the way of features. It isn’t that senior leaders don’t get that there is a tax on supporting existing products, from bug fixing to operations, but they do have trouble with the magnitude of it. For example, non-engineers (or those who haven’t done engineering recently) struggle with how costly yet necessary it is to pay down technical debt.
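
To see why the budget asks keep compounding, here is a toy model of the rule of thumb; the headcount is made up purely for illustration:

```python
# Toy model: after each major release, half of the current feature team is
# permanently absorbed into sustaining work (bug fixes, operations, tech debt).
feature = 20.0      # hypothetical headcount doing new feature work
sustaining = 0.0

for release in range(1, 5):
    moved = feature / 2              # the 50% tax from this release
    sustaining += moved
    feature -= moved
    print(f"After release {release}: {feature:.1f} on features, {sustaining:.1f} sustaining")

# Feature capacity halves every release (20 -> 10 -> 5 -> 2.5), so holding it
# constant means hiring roughly half the feature team's size every cycle,
# which is exactly the recurring budget ask described above.
```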

Two things happened to me in the 80s that led to my 50%/100% rule of thumb. The first was my experience as project leader of multiple releases. Each time we did a release I would find that the number of person-months I had to schedule to investigate customer issues, fix bugs, clean up code that had become unmaintainable, deal with dependencies (e.g., a new OS version breaking an existing product), revamp build systems, respond to corporate initiatives (e.g., you must switch to this new setup/installation system), etc. would go up. And over time I realized it would stabilize at about half the team.

The other thing that happened in the 80s is I went back and looked at multiple releases, including those I hadn’t been involved in, and plotted the incoming Software Performance Report (SPR) rate by month against a number of other metrics.  SPRs were a means for DEC customers to report bugs, request features, and otherwise communicate with the engineering team about issues.  There was no filter on these, even customers without support contracts could submit SPRs, so a complex feature might generate a lot of SPRs even though those resulted in a low unique bug rate. There were two interesting data points here.  The first was that incoming SPR rate started to rise dramatically about 60 days after release, the peak occurring around the 6 month mark.  While the incoming rate dropped off, it plateaued at a higher level after each release.  There were two causes for that, one being just having more features that needed support.  The other was that, thankfully, there was a rapidly growing customer base.  So even if you drove SPRs per Customer (one of my favorite overall product quality metrics) down, the growth in customers meant more SPRs.

The second data point was that there was a clear correlation between the number of check-ins for a release and the incoming SPR rate, so major releases not surprisingly resulted in more SPRs than minor releases. Based on this metric I was able to predict that the SPR rate for one new major release would be terrifyingly high, a prediction that sadly proved accurate. At peak nearly the entire development team was required to respond to SPRs, and for about 90 days before and after the peak there was a high interrupt load on most developers as SPRs hit for their areas, rendering them unproductive at working on new features.

The Cloud changes none of this, and perhaps makes it even worse.  Before you enter a beta or preview period you have no operational burden, minimal deployment burden, only modest urgency on fixing most bugs, etc.  The preview is as much about making sure you can operate at hyperscale as it is about traditional beta things like verifying that customers can use the service as intended.  Then the day you declare General Availability (GA) you have a 24×7 operational burden.  Production-impacting bugs become urgent.  It’s the day you start learning where you missed on preparing for hyperscale (see https://hal2020.com/2018/01/20/challenges-of-hyperscale-computing-part-2/ and https://hal2020.com/2018/08/25/challenges-of-hyperscale-computing-part-3/).  It’s the day customers start trying to do things you never intended, or perhaps never expected.  It’s the day that you start having to plan on paying down technical debt built up during development.  It’s the day you have to start dealing with disruptions like the Meltdown and Spectre security issues with an urgency that distracts from feature work. Etc.  So just like with a 1980s packaged product, for the first 6-12 months nearly the entire team will be unavailable for feature work and on an ongoing basis only half the team you had at launch will be available for feature work.

I tried for years to find ways to avoid the 50%/100% tax, but never succeeded. So each budget cycle I’d look at all we wanted to do, all that our customers wanted us to do, and go ask for a significant headcount increase. Each year I would face the pain of telling senior leadership how little feature work we could do without that increase. Each year they would challenge me, and I didn’t blame them. I never found a way to communicate the magnitude of the situation in the context of the budgeting exercise. In retrospect, what I should have done at Amazon is write a narrative, outside the "OP1" process, that made all this clear. I could have looked at data for numerous projects that would have (likely) supported my career-long observation. But that would have been too late to help with the decades at DEC and Microsoft where I failed to fully explain the need for the additional people. To be clear, I just about always got the people I needed. It was just more painful than it should have been.

So what prompted me to write this now? I’m watching as the first signs appear that Aurora PostgreSQL is getting past its "V1.0" 100% stage. For example, although Aurora PostgreSQL has not yet announced PostgreSQL 10 support, in some regions you can actually find it (10.4 specifically) in the version selector for creating Aurora PostgreSQL instances. Launch must be fairly imminent, with hopefully many more features coming in the next few months. Overall though, it reminded me that my 50%/100% rule still applies.

Challenges of Hyperscale Computing (Part 3)

Back in Part 2 I discussed the relationship between failures and the people resources needed to address them, and demonstrated why at hyperscale you can’t use people to handle failures.  In this part I’ll discuss how that impacts a managed service.  If you’ve wondered why it takes time, sometimes a seemingly unreasonable amount of time, for a new version to be supported, why certain permissions are withheld, why features may be disabled, etc. then you are in the right place.

tl;dr At hyperscale you need extreme automation.  That takes more time and effort than those who haven’t done it can imagine.  And you have to make sure the user can’t break your automation.

We probably all have used automation (e.g., scripts) at some point in our careers to accomplish repetitive operations.  In simple cases we do little or no error handling and just “deal with it” when the script fails.  For more complex scripts, perhaps triggered automatically on events or a schedule, we put in some simple error handling.  That might just focus on resolving the most common error conditions, and raising the proper notifications for uncommon or otherwise unhandled errors.  Moreover, the scripts are often written to manage resources that we (or a small cadre of our co-workers) own.  So a DBA might create a backup script that is used to do backups of all the databases owned by their team.  If the script fails then they, or another member of their team, are responsible for resolving the situation.  If the team makes a change to a database such that the scripts fail, the responsibility for resolving the issue remains with them.  This can be as human intensive or as automated as your environment supports, because it all rests with the same team.
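
As a concrete, entirely hypothetical example of that kind of team-owned automation, a backup script might handle the one failure mode the team sees regularly and page a human for everything else (the database names, paths, and pg_dump invocation below are assumptions for illustration):

```python
# Hypothetical team-owned backup script: retry the common transient failure,
# page a human for anything else. Names and paths are illustrative only.
import subprocess
import time

DATABASES = ["orders", "inventory"]   # databases this team owns (made up)
BACKUP_DIR = "/backups"               # assumed to exist and be writable

def notify_oncall(message):
    # Stand-in for whatever pager/email integration the team actually uses.
    print("PAGE ON-CALL:", message)

def backup(db):
    for attempt in range(3):          # handle the common transient failure
        result = subprocess.run(
            ["pg_dump", "--format=custom", f"--file={BACKUP_DIR}/{db}.dump", db]
        )
        if result.returncode == 0:
            return True
        time.sleep(30)                # brief back-off, then retry
    return False

for db in DATABASES:
    if not backup(db):
        # Everything we didn't anticipate lands back on the owning team.
        notify_oncall(f"backup of {db} failed after 3 attempts")
```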

In the case of a managed service the operational administration (“undifferentiated heavy lifting” such as backups, patching, failover configuration and operation, etc.) of the database instance is separated from the application-oriented administration (application security, schema design, stored procedure authoring, etc.).  The managed service provider creates automation around the operational administration, automation that must work against a vast number (i.e., “millions” was where we ended up in Part 2) of databases owned by a similarly large number of different organizations.

In Part 2 I demonstrated that the Escaped Failure Rate (EFR), that is, the rate of failures requiring human intervention, had to be 1 in 100 Billion or better in order to avoid the need for a large human shield (and the resulting costs) to address those failures.  Achieving 1 in 100 Billion requires an extreme level of automation.  For example, there are failure conditions which occur so infrequently that a DBA or System Engineer might not see them in their entire career.  At hyperscale, that error condition might present itself several times per day, and many times on a particularly bad day.  As an analogy, you are unlikely to be hit by lightning in your lifetime.  But it does happen on a regular basis, and sometimes a single strike can result in multiple casualties (77 in one example).  At hyperscale on any given day there will be a "lightning strike", and occasionally there will be one resulting in mass "casualties".  So you need to automate responses for conditions that are exceedingly rare as well as those that are common.
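
Some rough arithmetic shows why the bar has to be that high. The instance count and operation rate below are illustrative stand-ins, not actual AWS figures:

```python
# Illustrative only: escaped failures per year at various EFRs, assuming
# 1 million managed instances and 100 automated operations per instance per day.
instances = 1_000_000
ops_per_instance_per_day = 100
ops_per_year = instances * ops_per_instance_per_day * 365   # ~3.65e10 operations

for efr in (1e-6, 1e-9, 1e-11):
    escapes = ops_per_year * efr
    print(f"EFR {efr:.0e}: ~{escapes:,.1f} failures needing a human per year")

# An EFR of 1e-6 ("one in a million") still means ~36,500 escapes a year,
# roughly 100 every single day; you need something like 1 in 100 billion
# before an escape becomes a genuinely rare event.
```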

As the level of automation increases you have to pay attention to overall system complexity.  For example, if you are a programmer then you know that handling concurrency dramatically increases application complexity.  And DBAs know that a whole bunch of the complex work in database systems (e.g., the I in ACID) is focused on supporting concurrent transactions.  When thinking about automation, you make it dramatically more complex by allowing concurrent automation processes.  In other words, if you allow concurrent automation processes against the same object (e.g., a database instance) then you have to program them to handle any cases where they might interfere with one another.  For any two pre-defined processes, assuming they have no more than modest complexity, that might be doable.  But as soon as you allow a more general case, ensuring that the concurrent processes can successfully complete, and complete without human intervention, becomes impractical.  So when dealing with any one thing, for example a single database instance, you serialize the automation.

I kicked this series off discussing database size limits.  The general answer for why size limits exist is the interaction between the time it takes to perform a scale storage operation and how long you are willing to defer execution of other tasks.  Over time it became possible to perform scale storage on larger volumes within an acceptable time window, so the maximum size was increased.  With the advent of EBS Elastic Volumes the RDS automation for scale storage can (in most cases) complete very quickly.  As a result those operations don’t block other automation tasks, enabling 16TB data volumes for RDS instances.

The broader implications of the requirements for extreme automation are:

  • If you can’t automate it, you can’t ship it
  • If a user can interfere with your automation, then you can’t deliver on your service’s promises, and/or you can’t achieve the desired Escaped Failure Rate, and/or they will cause your automation to actually break their application
  • A developer is able to build a feature in a couple of days that might take weeks or months of effort to sufficiently automate before being exposed in a hyperscale environment

One of the key differences that customers notice about managed database services is that the privileges you have on the database instance are restricted.  Instead of providing the administrative user with the full privileges of the super user role (sysadmin, sysdba, etc.) of the database engine, Amazon RDS provides a Master user with a subset of the privileges those roles usually confer.  Privileges that would allow the DBA to take actions that break RDS’ automation are generally excluded. Likewise, customers are prohibited from SSHing into the RDS database instance because that would allow the customer to take actions that break RDS’ automation.  Other vendors’ managed database services have identical (or near identical) restrictions.

Let’s take a deeper look at the implication of restricted privileges and lack of SSH and how that interacts with our efforts to limit EFR.  When a new version of software is released it always comes with incompatibilities with earlier versions (and bugs of its own of course).  A classic example is where a new version fixes a bug with an older version.  Say a newer version of database engine X either fixes a bug where X-1 was ignoring a structural database corruption, or introduces a bug where X can’t handle some condition that was perfectly valid in X-1.  In either case, the upgrade in place process for taking a database from X-1 to X fails when the condition exists, leaving the database inaccessible until the condition is fixed.  To fix this you have to SSH into the instance and/or access resources that are not accessible to you. Now, let’s say this happens in 1 out of 1000 databases.  If the service provider doesn’t automate the handling of this condition then, since the customer can’t resolve it themselves, the service provider will need to step in 1000 times for the 1 million instance example.  Did you read Part 2?  That’s not a reasonable answer in a hyperscale environment.  So the managed service can’t offer version upgrade in place until they’ve both uncovered these issues, and created automation for handling them.

Similar issues impact the availability of new versions of database software (even without upgrade in place).  Changes (features or otherwise) that impact automation, be that creation of new automation or changes to existing automation, have to be analyzed and work completed to handle those changes.  Compatibility problems that will break currently supported configurations have to be dealt with.  Performance tuning of configurations has to be re-examined.  Dependencies have to be re-examined.  Etc.  And while some of this can be done prior to a database engine’s General Availability, often changes occur late in the engine’s release cycle.  A recent post in the Amazon RDS Forum was complaining about RDS’ lack of support for MySQL 8.0, which went GA last April.  So I checked both Google Cloud SQL and Microsoft Azure Database for MySQL and neither of them supported MySQL 8.0 yet either.  To be supportable at hyperscale, new releases require a lot of work.

Let me digress here a moment.  The runtime vs. management dichotomy goes back decades.  With traditional packaged software the management tools are usually way behind in supporting new runtime features.  With Microsoft SQL Server, for example, we would constantly struggle with questions like "We don’t have time to create DDL for doing this, so should we just expose it via DBCC or an Extended Stored Procedure?" or "This change is coming in too late in the cycle for SSMS support, is it ok to ship without tool support?" or "We don’t have time to make it easy for the DBA, so should we just write a whitepaper on how to roll your own?"  The SQL Server team implemented engineering process changes to improve the situation, basically slowing feature momentum to ensure adequate tools support was in place.  But I still see cases where that doesn’t happen.  With open source software (including database engines), the tooling often comes from parties other than the engine developers (or core community), so the dichotomy remains.

It’s not just that management support can’t fully be done until after the feature is working in the database engine (or runtime or OS or…); it is that for many features the effort to provide proper management exceeds the cost of developing the feature in the first place.  On DEC (later Oracle) Rdb I was personally involved in cases where I implemented a runtime feature in a couple of hours that turned into many person-days of work in tools.  Before I joined AWS I noticed that RDS for SQL Server didn’t support a feature that I would have expected to be trivial to support.  After I joined I pressed for its implementation, and while not a huge effort, it was still an order of magnitude greater than I would have believed before actually understanding the hyperscale automation requirements.  So while I’m writing this blog in the context of things running at hyperscale, all that has really changed in decades is that at hyperscale you can’t let the management aspects of software slide.

There is a lot more I could talk about in this area, but I’m going to stop now since I think I made the point.  At hyperscale you need ridiculously low Escaped Failure Rates.  You get those via extensive automation.  To keep your automation operating properly you have to lock down the environment so that a user can’t interfere with the automation.  That locked down environment forces you to handle even more situations via additional automation.

When all this works as intended you get benefits like those I described years ago in a blog post about Amazon RDS Multi-AZ.  You also get to have that managed high availability configuration for as little as $134 a year, which is less than the cost of an hour of DBA time.  And the cloud providers do this for millions of instances, which is just mind-boggling.  Particularly if you recall IBM founder Thomas Watson Sr.’s most famous quote, "I think there is a world market for maybe five computers."

Keezel – Another Internet Security Device

I’m always on the search for new security tools, and this time my hunt took me to Keezel.  For full disclosure, I liked the concept so much I made a token investment in Keezel via the crowdfunding site StartEngine.  Keezel is a device a little larger than a computer mouse that creates a secure WiFi hotspot you use between your devices and another WiFi (or wired Ethernet) network.  It uses a VPN to communicate over the public network, so your traffic can’t be snooped or tampered with on that network.  You connect it to a hotel, coffee shop, or other location that has a public/semi-public network you can’t fully trust, then you connect all your devices to the Keezel’s WiFi.  So, a VPN in a box.  Or puck, if you prefer.

Keezel has a few features beyond giving you a VPN.  It can block access to known phishing sites, and it also provides an ad blocker.  Both features are off by default but are easy to toggle on.  While you may already have software that provides these features, it no doubt has gaps.  For example, iOS only supports ad blocking in Safari itself.  And I’ve previously discussed how non-browser apps displaying web pages showed ads that attempted to download malware to a Windows PC.  Multiple layers of checks for phishing websites are also valuable, given that one source of dangerous-URL information may block a site before the others do.

Keezel has a built-in 8000mAh battery so you can use it for a day without plugging in.  You can also use the battery to charge your phone, etc.  The latter feature is more important than it sounds, because the battery makes the Keezel heavy.  When I travel with the Keezel I can leave one of my Mogix portable chargers behind, making it roughly weight-neutral from a backpack perspective.  It’s perfect for the all too frequent cases where the only seats available in an airport lounge or coffee shop are the ones without nearby outlets.

There is one big question mark over the Keezel: why use one instead of VPN software on each device?  There are a number of reasons.  The first is that you may have devices that can’t run VPN software.  The Keezel lets you take your Fire TV stick, Echo, and other "IoT" devices on the road while keeping them off unsafe networks.  The second is Keezel’s anti-phishing and ad-block technology.  The third is that VPN services often have a limit on the number of devices they support per subscription.  For example, ExpressVPN limits you to 3 simultaneous connections.  While that is fine most of the time, occasionally you may want to exceed that number.  Fourth, while you may be perfect about turning on your VPN whenever you connect to a public network, most people aren’t.  For example, what about your spouse or kids?  With their devices already set to automatically connect to the Keezel, all you need do is connect it to the public WiFi and every device in your party is automatically on a secure network.

The major downside I’ve found to Keezel is performance, as it peaks out at about 10Mbps for me.  Keezel says the range is 4-20Mbps.  I can do much better than that with ExpressVPN.  For example, on a 1Gbps FIOS connection I saw 400+ Mbps from an iPhone 7s Plus with no VPN, ~60 Mbps with ExpressVPN, and the aforementioned 10 Mbps from Keezel.  Of course public hotspots don’t usually offer high raw speeds, so the Keezel limits may actually be unnoticeable.  I haven’t tested it enough to be sure.

Pricing is also a factor to be considered.  ExpressVPN costs me $99/year.  A Keezel starts at $179 with the lifetime Basic service.  Basic has a speed limit of 500Kbps, so it is mostly for email and light browsing.  A device with a year of Premium service, which brings the "HD Streaming Speed", goes for $229.  Premium service can be extended (or added to a Basic device) for $60/year.  So while Keezel is initially a little expensive, over multiple years (or with many devices) it can work out to be quite cost-effective.
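
A quick break-even calculation using those list prices (a single device, no discounts, and only the prices quoted above assumed):

```python
# Cumulative cost: Keezel with Premium ($229 in year one, then $60/year)
# versus a $99/year ExpressVPN subscription.
for years in range(1, 6):
    keezel = 229 + 60 * (years - 1)
    expressvpn = 99 * years
    print(f"Year {years}: Keezel ${keezel} vs ExpressVPN ${expressvpn}")

# Keezel: 229, 289, 349, 409, 469.  ExpressVPN: 99, 198, 297, 396, 495.
# The hardware pulls ahead around year five, and sooner if it stands in for
# multiple per-device VPN subscriptions.
```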

There are some things I’d like to see from Keezel that would make it a better security device.  Blocking malware-serving sites, not just phishing sites, is a clear one.  Reports are another feature I’d like to see, since I like to spot-check my networks for potential bad actors.  Additional URL filtering capability (e.g., "family safety" as a filtering category) is also desirable.  Overall, I’d like Keezel to provide security features more comparable to the eero Plus service for eero devices.  And, of course, I would like to see much higher performance than it currently provides.

What is my personal bottom line on Keezel?  For day-to-day use, where I walk into a Starbucks and need to kill an hour between meetings, I will stick with ExpressVPN to protect any device that needs WiFi.  When I’m staying at a hotel, I’ll use the Keezel to create my own secure WiFi network.  For scenarios in-between?  I’m undecided.
