Amazon, Netflix, Standard Cloud APIs and the Inevitable Lock-in

A few weeks ago Adrian Cockcroft (Cloud Architect @ Netflix) wrote another very interesting post on his blog. Adrian opens the discussion by sharing his experience of why you may want to use public cloud services. While plenty of people (myself included) advocate these concepts, nothing beats hearing it first-hand from the people who actually run a business on this model. This is why I like to listen to and read Adrian. It’s no secret that Netflix runs its business on Amazon AWS, and that is the subject of the second part of Adrian’s post, admittedly the part that intrigued me the most.

The remaining part of his post is basically a public ask (or hope) to see AWS API-compatible clouds (or clones), possibly built around the OpenStack stack (no pun intended). He doesn’t seem shy about sharing his pessimism about OpenStack’s success (correct me if I am wrong, Adrian), but that isn’t the core of this post. Only time will tell who will succeed at what.

Going back to Adrian’s “ask”, I believe there are several reasons why he would like to see an AWS clone. Again, Adrian is welcome to set the record straight if I have misunderstood.

One reason is fairly logical and boils down to risk mitigation, additional resiliency and problem avoidance. I learned from another very interesting piece by Adrian that Netflix has a number of backup and data-retention policies. These include backing up data on S3, copying it across different AWS availability zones and eventually replicating it across different AWS regions. It makes perfect sense for Netflix to go one step further and duplicate this data at different service providers for an additional level of risk mitigation. This is, after all, what this slide from his interesting pitch was trying to convey (the pitch is highly recommended if you haven’t watched it yet).

I’d speculate that another good reason Adrian would like to see alternative public clouds based on clones of the AWS APIs is simply this: Netflix would like to have choices. What’s wrong with that? I wouldn’t expect anything less if I were them. Some would argue that Netflix doesn’t want to be locked into Amazon. I think the matter is a lot more complex and, in fact, I am not sure I (entirely) agree. I don’t even know whether avoiding a certain level of lock-in is possible at all (more on this later).

Warning: I am not trying to sell vCloud to Adrian Cockcroft or anyone else. By the way, I believe Adrian knows more about vCloud than I do.

That said, this is a hot topic. Adrian’s blog post (along with all the comments on the thread) reminded me of a couple of old blog posts I wrote last year: “Open standards, open source, OpenStack and the TCPIP of Cloud APIs” and “vSphere, vCloud and the Meaning of Being Open”, in which I tried to describe VMware’s strategy in terms of API standardization and choice of service providers. This is an oversimplified picture, from one of those blog posts, that focuses on the point I am trying to make: a common API that works across different service providers.

This picture primarily shows access to different service providers using the same interface, but the story doesn’t stop there. Since vCloud Director is a product you can buy, you can even build your own private cloud if you want to. As a consumer of cloud services, I regularly use a couple of internal labs (which mimic private clouds) as well as the public Stratogen cloud and another public cloud I am piloting with a big telco in Europe. I do have my choices.
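To make the “one interface, many providers” idea a bit more concrete, here is a minimal Python sketch. `CloudAPI`, `InMemoryCloud` and `deploy` are hypothetical names invented for illustration; they are not the vCloud API or any real SDK, and the in-memory class merely stands in for a real provider endpoint. The point is that the consumer code is written once against the shared interface, and moving between a private lab and a public provider only changes which instance gets passed in.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class CloudAPI(ABC):
    """Hypothetical common interface exposed identically by every provider."""

    @abstractmethod
    def create_vm(self, name: str) -> str:
        """Create a VM and return a provider-scoped identifier."""


class InMemoryCloud(CloudAPI):
    """Toy stand-in for one instance of the same stack run by some organization."""

    def __init__(self, provider: str) -> None:
        self.provider = provider
        self.vms: list[str] = []

    def create_vm(self, name: str) -> str:
        # Record the VM locally; a real provider would call its endpoint here.
        self.vms.append(name)
        return f"{self.provider}:{name}"


def deploy(api: CloudAPI, names: list[str]) -> list[str]:
    """Consumer code: written once against CloudAPI, never against a provider."""
    return [api.create_vm(n) for n in names]


# The same consumer code targets a private lab and two public providers.
private_lab = InMemoryCloud("private-lab")
public_a = InMemoryCloud("public-a")
public_b = InMemoryCloud("public-b")
for cloud in (private_lab, public_a, public_b):
    print(deploy(cloud, ["web-1", "db-1"]))
```

Note that this portability across providers does not remove the dependency on `CloudAPI` itself, which is exactly the lock-in question this post keeps coming back to.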

Here I am not specifically talking about the effort to make the vCloud APIs an industry standard. Lately I have come to the (personal) conclusion that a standard API is a function of its adoption, not of a theoretical agreement. I am instead talking about the choice of service providers the vCloud stack would be able to guarantee to consumers. After all, it’s one stack instantiated many times by different organizations (either private or public). I am not sure it’s a standard (yet), but it is certainly very consistent. And this is where I can hear you objecting: “it’s a lock-in”. And this is where I would reply: “is a certain minimum level of lock-in avoidable anyway?”

Let’s get into a bit more detail and explore the options this industry (more specifically, consumers and providers of cloud services) has.

API lock-in

First of all, what on earth is lock-in? How do you define it? To me at least, lock-in is a function of the time it takes to move to an alternative solution. In the context we are discussing here, lock-in is a function of how much time and effort it would take to rewrite your software (for example the Netflix software) to talk to a different cloud interface. Adrian says at some point that it wouldn’t be (too) difficult for Netflix to do that, but the very fact that he is looking for an AWS clone tells me he doesn’t want to get to that point (my speculation).

At this point, does it make any difference whether the APIs you are writing your solution against are the vCloud APIs, the AWS APIs or the future OpenStack native APIs (the APIs that expose the OpenStack personality, not the AWS clone interface)? I don’t think so. Lock-in isn’t so much about what you are writing against (be it the vCloud APIs, the OpenStack APIs or the Amazon AWS APIs); it is about how difficult it is to move away from it.

At the end of the day, as a consumer, you don’t have control over any of them anyway, so it makes no difference at all.

If you are a service provider, you are pretty much in the same situation whether you intend to use vCloud Director or OpenStack, unless you decide to take OpenStack, fork it and do whatever you want with it. In that case it’s a different kind of lock-in, and not necessarily a better one. Good luck with that.

Sure, if you are big enough you may be able to contribute to the main OpenStack project and see what you need or want implemented sooner rather than later, but frankly, if you are an organization of that size, chances are you have a say in the roadmap of a proprietary product too. I have seen that first-hand.

All in all, using available third-party software products (be they vCloud Director or OpenStack) to build clouds has the advantage of allowing consumers to connect to different service providers. That said, consumers who decide to use these service providers are essentially locking themselves into that specific interface/API, whatever that interface is.

I am not getting into the federation and hybrid cloud discussion here, because it would only help explain why choosing one interface over another could be better, which is not the point of this post anyway.

Service Provider lock-in

The other option to get more openness (or the perception thereof) would be to keep Amazon AWS as your “gold standard” and pray that other service providers implement a clone of its APIs (using OpenStack or any other tool). This is, to me, the worst of both worlds, since neither consumers nor providers have any control whatsoever over the AWS APIs (just as you’d have no control over the vCloud APIs or the potential OpenStack native APIs). On top of that, you’d have to deal with the complexity of creating and consuming cloned APIs that are fundamentally a reverse-engineering hack and that suffer from all the usual problems of copying someone else’s interfaces.

This is especially true when these interfaces change at the speed of light (given the pace at which Amazon is introducing new cloud services) and when they appear to be pretty complex to track in the first place.

In reality, Adrian was asking for only a subset of the AWS features to be cloned but, based on my past experience working for a company that was trying to be the overlay interface to everything, typically the only thing that works (somewhat) well across different virtualized platforms and interfaces is turning virtual machines on and off. I bet Netflix needs something more compelling than that to consider another service provider that claims to be compatible with the Amazon APIs. OK, I am exaggerating, but (hopefully) you see my point. If Amazon were to facilitate this cloning process or, better yet, provide (read: sell) its own enablement technology stack to service providers, the story would be very different. But I don’t think any service provider will succeed in implementing an AWS clone if Amazon doesn’t want that to happen.
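The “overlay interface to everything” problem can be sketched as a set intersection: an overlay that must behave identically on every backend can only expose the features all of them share. The backend names and capability labels below are hypothetical and chosen purely for illustration; they are not a real feature listing for AWS, vCloud or anyone else. Real clouds also differ in subtler ways (the same feature can exist with different semantics), so this toy model, if anything, understates the problem.

```python
from __future__ import annotations

# Hypothetical capability sets for three imaginary backends (illustrative only).
BACKENDS: dict[str, set[str]] = {
    "aws_like": {"start_vm", "stop_vm", "security_groups", "object_storage"},
    "vcloud_like": {"start_vm", "stop_vm", "virtual_datacenter", "vapp_network"},
    "other_cloud": {"start_vm", "stop_vm", "firewall_rules"},
}


def portable_features(backends: dict[str, set[str]]) -> set[str]:
    """Features an overlay can expose uniformly: those every backend supports."""
    return set.intersection(*backends.values())


def lost_features(backends: dict[str, set[str]]) -> dict[str, set[str]]:
    """Per backend, the features a lowest-common-denominator overlay cannot expose."""
    common = portable_features(backends)
    return {name: caps - common for name, caps in backends.items()}


print(sorted(portable_features(BACKENDS)))
for name, missing in lost_features(BACKENDS).items():
    print(name, "loses", sorted(missing))
```

With these made-up inputs the portable set shrinks to starting and stopping VMs, which mirrors the (admittedly exaggerated) point above.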

If I were evaluating this option as a consumer, I would just give up on the idea of consuming a clone of Amazon… and consume native Amazon AWS resources instead. Sure, you are limiting yourself to a single service provider (AWS), but I think it is better to be locked into Amazon than to have choices… that don’t work very well. Because, in the end, we all need to be pragmatic, don’t we?

Conclusions

In conclusion, I want to reiterate that whatever you choose is a bet, and you can’t really avoid a certain level of lock-in. It’s a fact of (IT) life. In the last 15 years I have come across a lot of vendors selling openness and freedom of choice. At the end of the day, they were just trying to sell another control point. They don’t call it lock-in because that would make the sales process a bit harder, but it is what it is.

This post is not meant to bash Amazon or OpenStack. As a matter of fact, I am bashing vCloud at least as much. It’s just a reality check of what’s going on and of how I see these things progressing for both consumers and providers of (IaaS) cloud services.

My message? Make your bet and keep your fingers crossed.

Perhaps I will be proven wrong. Oh well, it’s just my usual (less than) 2 cents.

Massimo.

5 comments to Amazon, Netflix, Standard Cloud APIs and the Inevitable Lock-in

  • Hi Massimo,

    This is a reasonable summary of what I was talking about.

    I’m more concerned about having similar concepts to AWS than the fine details of the API. For example if your cloud doesn’t use security groups but uses a completely different mechanism, it is a conceptual mismatch that is much harder to deal with than having different API calls for the same concept.

    In reality, almost all of the Netflix platform is based on Java APIs. If you implement the bits of the AWS API that the various Java SDKs actually use and don’t change the Java interfaces, then I have fairly transparent portability. That doesn’t help people who build their apps in Ruby or PHP, etc., but it would address a fairly large chunk of the market.

    • Massimo

      Hi Adrian. Thanks for reading the post and commenting. I am glad I captured the content of your blog and presentation reasonably correctly.

      You raise an excellent point in the first paragraph. I have been in meetings with large telcos that were trying to create an overlay interface that would then connect to various backend interfaces, and it was immediately clear that the challenge was not so much the semantics of the calls as the constructs those interfaces described. The vCloud vDC (virtual data center) is an example of a construct available in one interface that doesn’t exist in other similar products. You seem to be describing the same issues that telco was facing, which makes sense.

      Interesting point you make in the second paragraph. I guess we will just need to see how things work out in the end. I wish I had a crystal ball to see into the future.

      Thanks once again; great stuff you are doing at Netflix. You are truly leading a change in this industry.

      Massimo.

  • Hi Massimo
    This is a great post that reveals some of the issues with Amazon and the problematic implementation of Netflix over AWS.
    You are right, with cloud vendors lock-in is inevitable. This is why you won’t see companies like Google, Facebook, Yahoo, eBay, etc. going in that direction.

    Businesses that think about and are aware of scale cannot afford vendor lock-in. In this case, DIY is the way to avoid vendor lock-in as much as you can, but only some companies are aware of this these days and build the knowledge to do it in-house.

    We can also talk about the economics of running on clouds, but that is another subject.
    More on this here:
    http://techblog.outbrain.com/2011/04/lego-bricks-our-data-center-architechture/

    • Massimo

      Ori, I disagree that the way out is DIY. As a VMware vCloud Architect working with big service providers, I hear over and over again that they are not in the business of creating that layer themselves. They want “something” they can use out of the box, on top of which they can build their own “SP cloud personality”.

      As I briefly touched on in this post (when I mentioned forking OpenStack to get full control of that layer), I consider DIY another form of lock-in, per my definition of lock-in which is, again, the effort it takes to move away from a choice you made in the past. If you invested 2 years and a group of 30 people with vertical knowledge in the tool you built to bring a cloud service to market, and you then decide to de-commit from it (for whatever reason), aren’t you locked in, in a way?

      There are certain organizations (corner cases) where DIY is the right choice (Google, Facebook, the other usual suspects and, to some extent, Amazon). For the vast majority of the other players out there (other service providers and even more so customers), I don’t think DIY makes a lot of sense (IMO of course).

      Thanks. Massimo.

  • Adrian Cockcroft

    Netflix started into AWS with the goal that we could port our platform to another vendor in a few months if we had to. We are very good at making big complex changes extremely quickly, so normal organizations would take far longer…

    Here’s the real benefit of being on AWS: it’s a huge ecosystem, not a single vendor. I see hundreds of resumes go by, and almost all have AWS experience. I don’t remember seeing other cloud platforms on resumes apart from a few people who worked at Microsoft with Azure experience. There are also a huge number of products that interface to AWS: optimization, management, custom AMIs etc.

    By the logic in comments above, everyone should be running Linux laptops, not MacOS or Windows. A few people do, but they don’t get the benefits of the ecosystems that Apple and Microsoft have built.

    Until someone else builds an ecosystem to compare with AWS, or a clone that can leverage the existing AWS ecosystem, it remains the best option for getting products built faster. My bet is that it’s easier to clone and leverage AWS, but it’s at least plausible that someone like VMware or Microsoft could build an alternative cloud developer ecosystem.
