The Cost of Building Clouds

Last week I posted an article on the VMware vCloud corporate blog (re-posted here). That article talks about the extensibility of the core vCloud platform to use features that are not natively exposed. While the use case is centered around vShield App, the extensibility framework really provides infinite possibilities.

I am very excited about this because it really demonstrates how the core can be extended. While VMware customers and partners cannot modify the core itself, they can indeed extend it. At what cost though? This is what I'd like to touch on below.


Before even thinking about building a cloud, you need to answer a very simple question: "how much do I want to pay for it?"

This, usually, has a couple of dimensions:

  • How much does the software (Cloud Management Platform) cost?
  • How much does the labor cost? And is this cost one-shot or recurring?

Let's take a step back (another one). There are really three viable philosophies when you want to build a cloud (public, private, whatever): build-your-own, core / extended, and out-of-the-box.

The red part is labor cost.

The blue part is software cost (assuming there is a cost).

Let's be crisp: the first model (build-your-own) is for Amazon, Google, Microsoft, Rackspace. Anyone else?

Oh yes, perhaps this model is for (a few) other SPs or Enterprise customers that are trying to re-invent the wheel. They will, inevitably, undertake a gigantic and expensive migration project to move to the second model when they realize the mistake they made.

The second model (core / extended) is for big SPs and Enterprise customers that want to start with a solid existing software foundation on top of which to build their own customized solution.

The third model is for all the other SPs and Enterprise customers that prefer to have an out-of-the-box solution without any sort of customization and extension.

What do you mean by out-of-the-box and what do you mean by customized/extended?

In the context of this blog post, when I say out-of-the-box I mean the experience you'll get by taking a piece of (CMP) software and setting it up with a set of Next-Next-Next-Done wizards.

Done. Nothing more, nothing less. While there is obviously a labor cost associated with doing this, for the sake of this discussion we will round it to 0 and we will assume it's all software cost (assuming there is a cost associated with the CMP software).

When I say customized/extended I typically mean any of the following (for example):

  • I want to (or must) develop a web UI
  • I want to (or must) change a default web UI shipping with the core
  • I want to (or must) develop new APIs extending the core behavior
  • I want to (or must) change core APIs behavior
  • I want to (or must) change the core of the product
  • I want to (or must) develop workflows running on top of an orchestrator
  • I want to (or must) develop brand new scripts
  • I want to (or must) edit scripts shipping with the core software

This list should ideally include anything you can think of that sits between the out-of-the-box setup (see above) and "your" target solution.

Depending on what you want to (or must) do to implement "your" solution, the ratio between the red and blue part may change vastly (e.g. 80-20 or 20-80).

Where does VMware vCloud Director fit into all this?

VMware vCloud Director can be used to implement both the second and the third models I described above. Three years ago vCD was more of a black box that wasn't very easy to extend, customize or integrate. These days, with the introduction of new features such as notifications, blocking tasks, API extensions, metadata tagging and, in general, with a heavy use of orchestration technologies, you can really customize and extend vCloud Director beyond the default out-of-the-box behavior. The exciting extensions we discussed in the blog post I linked at the beginning are a good example of this.

Sure enough there are a number of things you cannot do because you can't modify the core (closed source). Some open source CMPs will even allow you to modify the core (for good or bad).

So what's the problem?

The (potential) problem here is the maintainability of the solution overall. When you deploy software in an out-of-the-box model, the vendor is essentially responsible for working out all of the hurdles associated with moving from one version of the stack to the next. To the point where, ideally, a vendor should be able to provide an upgrade button that allows the Enterprise customer or the SP to upgrade the stack transparently (again, without the red part mentioned above).

Let's go back to the very exciting use case I have mentioned at the beginning of this blog post. If you read that post you've noticed that the fundamental components of the architecture are vCenter Orchestrator and vShield Manager. Essentially a set of workflows hosted in vCO that call the vShield Manager APIs (when appropriately triggered by vCD blocking-tasks).

Warning: this is what could happen to your workflows moving from one version of vCO to the next version of vCO:

A couple of (potential) problems:

  • Your workflows may (potentially) break moving from one version of vCO to the next one
  • Elevating all modules comprising the stack to the next version may be subject to a lot of dependencies

The reference to the vCD 5.1 plugin requiring vCO 5.1 (vCO 4.x is not supported) reminded me of a slide I built some 10 years ago whose title was "HW/SW stack version dependencies (i.e. Nightmare)":

While this discussion has nothing to do with hardware, imagine the dependency nightmare you need to deal with in a stack composed of so many moving parts: "you have to upgrade product A but product B only works with the old version of product C, which however needs to be upgraded to be able to talk to the new version of product A". Well, if you have been in IT for more than 2 weeks you know what I am talking about.
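To make the A/B/C deadlock above a bit more tangible, here is a minimal sketch of a compatibility check you could run against a proposed stack before upgrading. The only rule taken from this post is the vCD 5.1 plugin requiring vCO 5.1. The product-a / product-b / product-c rules, the version numbers, and the function name are hypothetical, made up purely for illustration.

```python
# (component, version) -> {required component: set of acceptable versions}
# Only the "vcd-plugin" rule reflects something stated in this post.
# Everything named "product-*" is a hypothetical placeholder.
REQUIRES = {
    ("vcd-plugin", "5.1"): {"vco": {"5.1"}},       # vCO 4.x is not supported
    ("product-a", "2.0"): {"product-c": {"2.0"}},  # hypothetical: new A needs new C
    ("product-b", "1.0"): {"product-c": {"1.0"}},  # hypothetical: B only works with old C
}


def find_conflicts(stack):
    """Return one human-readable message per violated rule in the proposed stack."""
    conflicts = []
    for component, version in stack.items():
        rules = REQUIRES.get((component, version), {})
        for needed, allowed in rules.items():
            installed = stack.get(needed)
            if installed not in allowed:
                conflicts.append(
                    f"{component} {version} needs {needed} in {sorted(allowed)}, "
                    f"found {installed}"
                )
    return conflicts


if __name__ == "__main__":
    # Upgrading A while keeping B: no version of C satisfies both rules.
    for c_version in ("1.0", "2.0"):
        proposed = {"product-a": "2.0", "product-b": "1.0", "product-c": c_version}
        print(c_version, find_conflicts(proposed))
```

With these made-up rules, whichever version of product C you pick, one of the other two products complains. That is the nightmare in a nutshell, and a real stack has far more than three components.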

Even without customizing / extending (by developing workflows) there is enough complexity here to keep you busy for months when you need to upgrade your stack.

But we are digressing. Back to the vCD / vShield App integration we were discussing at the beginning, this is what the vShield 5.1 API Programming Guide says about vShield API compatibility:

This is similar to the warning above for the compatibility of vCO workflows.

In essence what's happening here is that, as the core moves to the next release, the labor part will have to be adjusted to cope with the new core:

And this means a lot more work. In particular:

  • existing scripts and workflows will need to be adapted to the new APIs and objects (assuming they have changed)
  • features implemented in the extensions need to be transitioned and delivered through the core (assuming the core has implemented the feature)

As you can see, this is not just about the cost of developing and maintaining the customization/extension, but it's also a rather challenging operational nightmare. I am not talking about a PoC. I am talking about a production environment at scale.

Could it be any worse than this?

It sounds hard given what we saw above. However, yes it could be worse than that. From at least a couple of angles.

The more sophisticated "your" solution is, the more dependencies you create, the more expensive it becomes to maintain those customizations and extensions. Last year I talked about the Frankencloud and the ABC of lock-in. If it costs 2 years and 2M$ to create a Frankencloud, it will cost you another 4M$ over 3 years to maintain it (the red part of the puzzle).
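As a back-of-the-envelope illustration of that arithmetic, here is a small sketch that adds up the red (labor) and blue (software) parts over the life of the solution. The 2M$ build and 4M$ maintenance figures are the ones used above. The 1.5M$ software figure and the function names are assumptions I made up just to show how the ratio plays out.

```python
def labor_total(build_cost, maintenance_cost):
    """Red part: one-shot development plus the recurring customization tax."""
    return build_cost + maintenance_cost


def labor_share(build_cost, maintenance_cost, software_cost):
    """Fraction of the total spend that is labor (red) rather than software (blue)."""
    red = labor_total(build_cost, maintenance_cost)
    return red / (red + software_cost)


if __name__ == "__main__":
    build = 2_000_000        # from the example above: 2M$ to create the Frankencloud
    maintenance = 4_000_000  # from the example above: 4M$ over 3 years to maintain it
    software = 1_500_000     # hypothetical CMP license cost over the same period

    print("red (labor):", labor_total(build, maintenance))
    print("red share of total:", labor_share(build, maintenance, software))
    print("maintenance vs. build:", maintenance / build)
```

The point of the sketch is the last line: in this example the recurring part is twice the one-shot part, so budgeting only for the initial development captures a third of the labor bill.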

Even worse than that, you may want to (or must) customize the core of a CMP software. I have always wondered what it takes to upgrade to a new release of an open source software when you took the previous release and heavily customized it. Oh well.

In general, while you may be getting the impression that I am picturing the vCloud platform as a mess to deal with, it is fair to say that the vCloud platform is still a couple of orders of magnitude easier to deal with compared to ANY other CMP software out there as of January 2013.

I am confused. What's the message here Massimo?

This post is not meant to scare you. I am not advising against customizing or extending things (either outside of the core or inside of the core). This post is more to create awareness that doing so doesn't come free of charge.

And, more importantly, this post is to remind you that customizations and extensions do not only have a one-time development effort (and cost). Rather, they carry a recurring customization tax you need to take into account when you lay out your strategy to build a cloud. Regardless of the CMP you are using.

Everyone loves the idea of extending and customizing stuff. No one really talks about the cost associated with actually doing that (at scale, in production, not in a PoC).

Again, this isn't to stop you from doing so. However I hope it helps you strike the best balance between the red and the blue parts. I'd like to spare you from finding this out by surprise 2 years (and 2M$) later.

For the Googles and Amazons of the world this is a no-brainer, to the point that they obviously built everything from scratch. How about you? How about the remaining 99.99% of the world population? What should your red vs. blue balance look like?

Adapting your needs to existing shipping software vs. adapting existing shipping software to your needs. That is the problem.

I don't have an answer for that, sorry, but hopefully the discussion above will help you make a more informed decision.