The Cloud Spectrum and the Amazon Dilemma

Last week at VMworld 2013 in San Francisco, among various other sessions, I presented “A Parallel Between vCloud Hybrid Service and Amazon Web Services” (session #PHC5123). Overall it went fairly well, with lots of positive feedback. The one that stood out for me was a tweet from Jack Clark.

I enjoyed reading that feedback because it meant I had successfully managed “…NOT to do this session… the Microsoft way!”. Also, since I always try to be a trusted advisor, I appreciated that my name was associated with the word “honesty”.

Funnily enough, it was only later that I realized who Jack is and why he was at my session. It’s always a pleasure and an honor to be quoted on The Register, particularly in the same paragraph as Bezos and in the same article as Mathew Lodge and Raghu Raghuram.

However, I am very disappointed that Jack didn’t pick up on the strong statements and commitments I made about the Outback [steakhouse] and their blooming onion (I am still unsure how I ended up talking about those in a vCHS/AWS session, but anyway…).

Instead he decided to quote me on the serious stuff. Which is kind of boring.

Kidding aside (for a moment), I tried to spend 80% of that session discussing the technical parallels between the two services: “this is how it works in AWS, this is how the same thing works in vCHS”. I tried to cover the three major areas: compute, storage and network.

The remaining 20% of the presentation was used to provide an industry positioning of the different services. Admittedly a highly debatable topic and, arguably, an academic discussion I should have entertained on my private blog rather than in a VMworld breakout session. But it is what it is; customers (and Jack) seem to have enjoyed it, so…

Now that we are here (on my private blog) I’d like to try to clarify my (personal) thoughts and share them with those readers who were not in the session.

The first concept I described is what I call the Cloud Spectrum. This is a natural follow-on to the concept I introduced in the Cloud Magic Rectangle.

As background, also make sure you read the “TCP-clouds, UDP-clouds, design for fail and AWS” blog post, as well as the “vCloud, OpenStack, Pets and Cattle” post.

In my VMworld session, I tried to collapse the three columns I had in the Cloud Magic Rectangle into two major deployment models:

“Enterprise” (for lack of a better name)

  • Traditional Linux / Windows Applications

  • Hybrid

  • Resilient (HA, DR)

  • Built-in Enterprise Backup / Restore of VMs & Files (Pets)

  • Typically consumed with a GUI

  • Compute Instances (e.g. VM) and Storage (e.g. VMDK) usually managed as “one entity”

  • Limited number of VMs, fairly stable in number

  • More geared towards a traditional SQL model (always consistent)

  • Fixed Cost – Capacity Planning

“Design for Fail” (for lack of a better name)

  • Cloud Applications

  • Standalone

  • Resiliency built into the Application (cloud infrastructure not resilient)

  • No heavy need to backup instances (Cattle)

  • Typically consumed via an API

  • Compute Instances (e.g. EC2) and Storage (e.g. EBS, S3) usually managed separately (see the sketch after this list)

  • Huge number of VMs, quickly varying in number (“we can auto-scale 50,000 VMs in 5 minutes”)

  • More geared towards a NoSQL model (eventually consistent)

  • PAYG – No need to assess capacity needs
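
To make the “managed separately” bullet more concrete, here is a minimal sketch using boto3 (the AWS SDK for Python). This is my own illustration rather than anything from the session deck, and the AMI ID, sizes and device name are placeholders. The point is that the instance, the volume and the attachment are three independent API operations with independent lifecycles, whereas in the “Enterprise” model the VM and its VMDK typically travel together as one entity.

```python
import boto3

ec2 = boto3.resource("ec2", region_name="us-east-1")

# 1. The compute instance is one API object...
instance = ec2.create_instances(
    ImageId="ami-xxxxxxxx",   # hypothetical placeholder AMI ID
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
)[0]
instance.wait_until_running()
instance.reload()  # refresh attributes (e.g. placement) after boot

# 2. ...and the storage volume is a separate API object, created on its own.
volume = ec2.create_volume(
    AvailabilityZone=instance.placement["AvailabilityZone"],
    Size=10,          # GiB, arbitrary for the example
    VolumeType="gp2",
)
boto3.client("ec2", region_name="us-east-1").get_waiter(
    "volume_available").wait(VolumeIds=[volume.id])

# 3. Attaching is an explicit third step. The volume can later be detached,
# snapshotted, or attached to a different instance: compute and storage
# lifecycles are fully decoupled.
volume.attach_to_instance(InstanceId=instance.id, Device="/dev/sdf")
```

None of this is wrong; it is just a different contract: the platform hands you independent building blocks and expects the application (or your automation) to compose them.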

The characteristics of these models are described in more detail in the posts I linked above.

At that point, I thought there was a need to visualize how different public cloud services map onto a graph representing the progression (from left to right) of an IT continuum moving from the first deployment model (Enterprise) towards the second (Design for Fail). That’s in fact how I see IT evolving (over time).

Enter the Cloud Spectrum:

Solid colors represent where the services are currently delivering in the context of the spectrum. Dotted rectangles represent where the services are aspirationally moving to.

Note that I don’t sit on the board of directors of any of these companies, so the ambitions I am calling out are speculative and based on common industry knowledge.

For example, I don’t see GCE (Google Compute Engine) being engineered natively for scenarios involving scale-up, single-image existing applications where the underlying platform can guarantee high availability and DR independently of the application itself (see again the TCP-clouds, UDP-clouds blog post). You can argue that grouping OpenStack and GCE together isn’t right and that OpenStack may be, aspirationally, trying to cover some (maybe not all) of the traditional enterprise workloads. Fair enough. Let’s not start debating the details of the size and shape of those rectangles.

Apparently the IT world is moving from left to right. We all agree on that.

What we instead typically end up discussing is how fast the world is moving. My stance is that, on average, the speed is glacial. But that’s me.

Pro-tip: the Netflix attitude is the exception, not the norm. I am also wondering whether this (move to the far right) will happen for everyone. One should start wondering when Google (and I mean, Google!) starts claiming that this NoSQL thing is just too hard. Quoting from the article:

“The reason Google built F1 was partly because of the frustration its engineers had found when dealing with non-relational systems with poor consistency problems”.

and again:

“In all such [eventually consistent] systems, we find developers spend a significant fraction of their time building extremely complex and error-prone mechanisms to cope with eventual consistency and handle data that may be out of date.”

Wow! Now, if engineers at Google can’t cope with the challenges of eventually consistent scenarios, imagine the average developer Joe.
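
If you have never been bitten by this, here is a deliberately naive Python toy (my own sketch, not Google’s F1 nor any real datastore) of the read-after-write surprise at the heart of those quotes: a write lands on one replica immediately and reaches the other only after a replication lag.

```python
import threading
import time

# Toy model of an eventually consistent store: two replicas, where writes
# hit one replica immediately and propagate to the other after a delay.
class EventuallyConsistentStore:
    def __init__(self, replication_lag=0.5):
        self.replicas = [{}, {}]
        self.lag = replication_lag

    def write(self, key, value):
        self.replicas[0][key] = value
        # Propagate to the second replica asynchronously.
        def replicate():
            time.sleep(self.lag)
            self.replicas[1][key] = value
        threading.Thread(target=replicate, daemon=True).start()

    def read(self, key, replica=1):
        # Reads routed to the lagging replica can return stale data (or None).
        return self.replicas[replica].get(key)

store = EventuallyConsistentStore()
store.write("balance", 100)
print(store.read("balance"))  # very likely None: the write has not propagated yet
time.sleep(1)
print(store.read("balance"))  # 100: the replicas have converged, "eventually"
```

Every guard you end up wrapping around that stale read (retries, version checks, client-side reconciliation) is exactly the “extremely complex and error-prone” machinery the quote complains about.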

Sadly, I think Oracle will continue to suck your money for the foreseeable future. Sorry about that.

Funnily enough, I just came across this interesting article from Ben Kepes. Not only is pontificating about designing for fail easier than actually designing for fail… apparently not even AWS was able to properly design its own services for fail. That doesn’t change anything about the awesomeness of the AWS services. It only speaks to how difficult it is to walk the talk (a problem every vendor has, VMware included).

As you can see from the slide (and as you can read in the press), vCHS was introduced, initially, with a particular value proposition in mind. That said, there is no doubt there is a desire to cover many other use cases going forward, including the design-for-fail one and, more generally, the whole space of “new applications”. The announcement that VMware will be offering Cloud Foundry on top of vCHS is a step in that direction.

In this Cloud Spectrum I am speculating that all the other vendors are looking at this space, but very few of them have a desire to support existing Enterprise applications that have not been designed with a “true cloud” in mind. Google is a good example of this, and Microsoft not guaranteeing any SLA for single VMs is another sign of “yes, I want to get Enterprise workloads, but I am not doing a lot to make myself appealing to them”. Your mileage may vary; consider me biased.

AWS is the 800-pound gorilla in this discussion. Notice, in the Cloud Spectrum, the little note that says “how far?”.

Enter the Amazon Dilemma:

This should be self-explanatory, and you don’t need an MBA from Stanford to get it.

Consider that AWS was speculated to have made roughly $2B in 2012. To put things in perspective, total IT spending in 2012 was in the ballpark of $3.6T (trillion dollars). In other words, AWS represents roughly 0.06% of total IT spending ($2B / $3,600B ≈ 0.056%).

Now, it is obvious that the AWS TAM isn’t the entire IT spectrum (they don’t [yet] have printing as a service, luckily for HP), but it would be fair, IMO, to say that for every dollar spent on AWS, customers spend several hundred dollars buying comparable hardware, software and services for traditional Enterprise deployments (typically on-prem, some from traditional outsourcing).

So here I am postulating the Amazon Dilemma:

If you are here [in the design-for-fail space], you think the world is going there [towards the extreme of design for fail] and you know that the bulk of the money to be made for the foreseeable future is there [on the far left side of the spectrum]… what do you do?

I’d pay a ton to be in those rooms and listen to the arguments business people are making to purist cloud engineers: “We need to grab that money!”. “No way! That is not cloud!”

Two years ago I speculated that AWS may be looking at introducing more Enterprise characteristics into their cloud offering. Quoting myself:

“Amazon is full of smart people and I think they are looking into this as we speak. While we are suggesting (to an elite of programmers) to design for fail, they are thinking about how to auto-recover their infrastructure from a failure (for the masses). I bet we will see more failure recovery across AZs and Regions types of services in one form or another from AWS. I believe they want to implement a TCP-cloud in the long run, since the UDP-cloud is not going to serve the majority of the users out there”.

This is an interesting dilemma AWS is facing. But it will be even more interesting to look at the faces of the clouderati if / when this happens.

Massimo.

11 comments to The Cloud Spectrum and the Amazon Dilemma

  • PJ

    Another excellent piece, Max. So true and so real that the enterprise/enterpricey space still has its feet firmly cemented to legacy patterns and architectures (and infrastructures, but at least that has been/is slightly changing). Tons of reasons for that, but anyway I really like the spectrum analogies and the weighted dynamics that inherently grow from this. Excellent job as always.

    • Massimo

      Thanks PJ. Appreciated your comment… particularly because I think you are a great forward-looking thinker… (but apparently still with your feet on the ground).

  • This is a good article highlighting the differences in IaaS implementations.

    VMware and Amazon AWS are coming from different places: VMware from the expensive enterprise datacentre world, and Amazon from the cheap web commodity infrastructure world. Amazon are really adding high availability, recovery etc. in their Platform as a Service offerings (RDS, Caching, Auto Scaling), typically where there is data to protect from failure.

    I don’t expect Amazon to offer the “expensive” enterprise datacentre high availability. I think they will just keep adding features to RDS etc. so that Enterprises can easily migrate their applications to these environments to get high availability, recovery and so on. As an example of this, see: http://aws.typepad.com/aws/2013/09/migrate-mysql-data-to-amazon-rds-and-back.html

    So here is the real difference: Amazon will provide HA, recovery etc. in their Platform as a Service offerings rather than down in the IaaS layer as VMware does. But VMware are also big on PaaS via Cloud Foundry.

    VMware and Amazon are heading to the same place – Platform as a Service – and this is when the real competition will start!!!

    • Massimo

      Those are in line with the conclusions at the end of my Magic Rectangle blog post.

      I am not 100% sure that delivering a highly reliable RDS service will be enough. Often “Pets” aren’t only databases but a broader set of workloads.

      Time will tell.

  • […] What applies to Azure vs vCHS applies to Amazon EC2 vs vCHS as well. Amazon however is more mature and feature rich than Windows Azure. Massimo Re Ferre’ , architect of VMware, wrote an interesting blog about this subject here. […]

  • […] vs “cloud enabled workloads” see Marcel van den Berg’s blog) [edit] I missed this great article from Massimo Re Ferre on “cloud spectrum” trends covering these workloads – a […]

  • Massimo, sorry I’m late to this discussion; I just discovered your excellent blog via @AndiMann. Some good points you make on the state of enterprise development and the shift to the right. Since I work in that enterprise space and am responsible for moving the architecture, I can verify the state as you see it. I would say that we definitely see the value of the design-for-fail shift in application development. But much like when SOA was introduced as a concept, and later APIs etc., which we ultimately adopted slowly, the average enterprise and developer is far behind the available capabilities/concepts that vendors/providers and analysts can provide. I think we are just starting to see some of the value in the design-for-fail concept, but shifting an enterprise is not a trivial undertaking. I’m glad you used Netflix as an example of the exception rather than the rule. While I do follow Netflix fairly closely, because I see some value in the way they do things, I’m also a realist, understanding that trying to compare to or emulate Netflix in the average enterprise is simply not practical or doable.

    • Massimo

      Thanks Mark. It’s always good to get confirmation from the real world. Twitter is becoming more like Second Life (a fake version of reality). Thanks!

  • Cloud Insider

    Good analysis in the first sections. Your Cloud Spectrum chart, however, does not represent the future; it is more a snapshot of the past or present. Many large enterprises are moving to the right too. For example, we are working with multiple Fortune 100 companies that are now doing next gen apps. So if you look at the future, IMO the large enterprises are also moving to the right. And yes, I am one of the people in the room you referenced :-) .

    • Massimo

      The Cloud Spectrum chart doesn’t really talk about how / where workloads get deployed. That chart talks about the technical capabilities of public cloud platforms relative to the characteristics of workloads.

      What you are suggesting is that the green circle on the right-hand side in the next slide exists. Yep, it does. Surely there are a lot of Enterprises that have “next gen apps” projects. But, as usual, it’s all relative: if they have a $1M next gen app project, what’s their total IT budget? That’s the point of that slide.

      There is also another interesting aspect to this, which is: that slide assumes that Enterprise apps = past = resiliency required from the infrastructure, whereas next gen apps = future = designed to sustain infra failure.

      What if (some) next gen distributed apps still require infrastructure resiliency for best operations? I’ll just throw it out there… I don’t know the answer.
