<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>IT 2.0</title>
	<atom:link href="http://it20.info/feed/" rel="self" type="application/rss+xml" />
	<link>http://it20.info</link>
	<description>Next Generation IT Infrastructures</description>
	<lastBuildDate>Thu, 09 Feb 2012 14:30:46 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Will we need a C for Nicira? God forbid!</title>
		<link>http://it20.info/2012/02/will-we-need-a-c-for-nicira-god-forbid/</link>
		<comments>http://it20.info/2012/02/will-we-need-a-c-for-nicira-god-forbid/#comments</comments>
		<pubDate>Thu, 09 Feb 2012 14:30:46 +0000</pubDate>
		<dc:creator>Massimo</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://it20.info/?p=481</guid>
		<description><![CDATA[<p align="justify">This morning I was on the phone with Ivan Pepelnjak (@ioshints) to decipher some of the paragraphs in one of his latest posts on Nicira Open vSwitch inside vSphere. He always has to bear with my stupid questions so I can see him (virtually), from time to time, facepalming some of my questions. Long <span style="color:#777"> . . . &#8594; Read More: <a href="http://it20.info/2012/02/will-we-need-a-c-for-nicira-god-forbid/">Will we need a C for Nicira? God forbid!</a></span>]]></description>
			<content:encoded><![CDATA[<p align="justify">This morning I was on the phone with Ivan Pepelnjak (@ioshints) to decipher some of the paragraphs in one of his latest posts on <a href="http://blog.ioshints.info/2012/02/nicira-open-vswitch-inside-vsphereesx.html"> Nicira Open vSwitch inside vSphere</a>. He always has to bear with my stupid questions, and from time to time I can (virtually) see him facepalming. Long story short, we cleared up a few doubts I had about his write-up, and I decided to ask him yet another borderline question. It went like this:</p>
<p align="justify">&#8220;Ivan, I see Nicira (and many others) using the word <em>open</em> extensively. I also see a lot of excitement from people who point to Nicira as a cross-hypervisor vendor, which conveys an idea of openness and a good feeling of not being locked in. However, I believe this problem is multidimensional: if people consider vSphere a lock-in for the traditional virtualization space, why aren&#8217;t people considering Nicira proprietary for what they call network virtualization? In the final analysis, why would one want to have 3 vendors to virtualize servers and 1 vendor to virtualize the network? What are your thoughts?&#8221;</p>
<p align="justify">At that point Ivan laughed out loud and I was sure my question was another facepalm. Oh well. But before we get there, let me show you a picture of what I had in mind when I asked that question; it shows why I thought that vSphere isn&#8217;t that different from Nicira NVP (from a lock-in vs. openness perspective):</p>
<p><img style="border: 0pt none;" src="http://www.it20.info/misc/pictures/WillweneedaCforNicira-Godforbid.jpg" alt="" width="722" height="372" border="0" /></p>
<p align="justify">If you classify vSphere as a mere compute virtualization layer (we need to talk about this, by the way, maybe in another post) and Nicira as a network virtualization layer, they are both pretty much &#8220;open&#8221; in terms of the objects they support. They just happen to be different objects because we put them into different categories. There is no reason to call a compute virtualization product a &#8220;lock-in&#8221; while not calling a network virtualization product a &#8220;lock-in&#8221;. In other words, if customers are strategically looking at different hypervisors (are they?) to avoid being locked in&#8230; why shouldn&#8217;t they look at different network virtualization products to avoid being locked in?</p>
<p align="justify">I am looking forward to the day when a <a href="../2012/02/the-abc-of-lock-in/">C comes in</a> and says <em> &#8220;oh wait, now you have Nicira and Pokera (a name I&#8217;ve just made up, don&#8217;t bother googling it)&#8230; let me manage them both for you in a single pane of glass&#8221;</em>. God forbid! My suggestion? Run! Run! Run!</p>
<p align="justify">And this is where the next massive mess of the compute era will begin, all over again, as we forget the most important cloud principle of all: economy of scale through simplification. Amazon docet.</p>
<p align="justify">I don&#8217;t envy you, Mr. Customer: you have the choice of either being <a href="../2011/09/amazon-netflix-standard-cloud-apis-and-the-inevitable-lock-in/">&#8220;inevitably locked-in&#8221;</a> or dying under a ton of scripts (or under a ton of expensive consultants writing them for you, for that matter). I honestly don&#8217;t see a third way.</p>
<p align="justify">But wait a moment, we left Ivan laughing and forgot about him! Perhaps he thinks that this is all wrong. Perhaps OpenFlow is so <em>open</em> that you can interchange vendors at will and avoid the lock-in everybody is concerned about. Well, it turned out, much to my surprise, that Ivan was laughing because he had linked my very own article <a href="../2012/02/the-abc-of-lock-in/">&#8220;The ABC of Lock-in&#8221;</a> in a comment on a blog post published on <a href="http://packetpushers.net/"> PacketPushers</a> that discussed this very same problem. Read it yourself <a href="http://packetpushers.net/is-openflow-losing-its-openness/"> here</a>. While there is admittedly some level of (theoretical?) interoperability between some of the components in an OpenFlow deployment, network professionals don&#8217;t seem so positive, and I&#8217;d be interested myself in seeing a real-life homogeneous production network built with multi-vendor technologies. Mine isn&#8217;t an academic question: we know everything is possible in a demo, or better yet in a PowerPoint deck. Mine is more of a practical question for real customers running real businesses. After all, in my opinion, getting an A and a B to interoperate with each other wouldn&#8217;t be any easier than having <a href="../2012/02/the-abc-of-lock-in/">a C homogenize an A and a B</a>.</p>
<p align="justify">Bear with me, please. I admittedly don&#8217;t understand a lot about networking, but I have been around long enough to see &#8220;the big picture&#8221; (hopefully).</p>
<p align="justify">I have just come to the conclusion that, perhaps, <em>open</em> is an abused word. What do you think?</p>
<p align="justify">Massimo.</p>
]]></content:encoded>
			<wfw:commentRss>http://it20.info/2012/02/will-we-need-a-c-for-nicira-god-forbid/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>The ABC of Lock-In</title>
		<link>http://it20.info/2012/02/the-abc-of-lock-in/</link>
		<comments>http://it20.info/2012/02/the-abc-of-lock-in/#comments</comments>
		<pubDate>Thu, 02 Feb 2012 17:03:18 +0000</pubDate>
		<dc:creator>Massimo</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://it20.info/?p=471</guid>
		<description><![CDATA[<p style="text-align: justify;" align="justify">There have been a lot of discussions lately about a topic I find extremely interesting: vendor lock-in.</p> <p style="text-align: justify;" align="justify">Multi-hypervisor is a discipline where you can apply the high level ranting below but you can really apply it to pretty much everything in IT.</p> <p style="text-align: justify;" align="justify">I started this blog <span style="color:#777"> . . . &#8594; Read More: <a href="http://it20.info/2012/02/the-abc-of-lock-in/">The ABC of Lock-In</a></span>]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify;" align="justify">There have been a lot of discussions lately about a topic I find extremely interesting: vendor lock-in.</p>
<p style="text-align: justify;" align="justify">Multi-hypervisor is one discipline where the high-level rant below applies, but you can really apply it to pretty much everything in IT.</p>
<p style="text-align: justify;" align="justify">I started this blog post by writing a couple of pages (as usual) and then realized no one would care to read them (how can I blame you?). So I summarized it in a few pictures. A picture is worth a thousand words. Always.</p>
<p style="text-align: justify;" align="justify">So the story goes like this&#8230; you (the customer) start with A and you build or buy an ecosystem of people, tools, knowledge, programs, scripts (yes, A has APIs) and a lot of other things you need to fully exploit the value of A.</p>
<p align="justify"><img src="http://www.it20.info/misc/pictures/TheABCofLockIn1.jpg" alt="" width="336" height="408" border="0" /></p>
<p align="justify">You (the customer) are happy, but then vendor C comes knocking on your door and tells you that you are locked into A. &#8220;It isn&#8217;t so easy to move away from it given all the investments you have made&#8221; he says. &#8220;Imagine if A were to apply a vTax at some point: God forbid!&#8221; C goes on. C tells you there is B now, which is good and cheap, and you can adopt both A and B so you are not locked into either. &#8220;Let C manage them for you transparently&#8221; he says. And this is what happens (in theory):</p>
<p align="justify"><img src="http://www.it20.info/misc/pictures/TheABCofLockIn2.jpg" alt="" width="431" height="415" border="0" /></p>
<p align="justify">Yeah, all of a sudden you (the customer) find out that (2 years and 2M$ of professional services later) you are&#8230; locked into C. Imagine now if C were to apply a cTax&#8230; God forbid! You would need to move to D, which is cheaper, and the story goes on and on. What&#8217;s your business? Bank transactions? Shoemaker? D&#8217;oh! I thought you wanted the infrastructure to disappear, not to become the center of your attention.</p>
<p align="justify">If you thought this was the end of a sad story, there is more. Actually, it gets a lot worse. It turns out that (2 years and 2M$ of professional services later) you can only send &#8220;heterogeneous&#8221; alerts (such as &lt;the disk is full&gt;) to operators in the middle of the night and perhaps present a web interface that lets a user power a VM on and off on both platform A and platform B. Oh, and did I mention that when A and B deliver new versions of their platforms you need to give C another good 2 years and 2M$ to &#8220;adapt it&#8221;? OK, now I have.</p>
<p align="justify">You thought this was the end, didn&#8217;t you? Well, not quite; there is even more:</p>
<p align="justify"><img src="http://www.it20.info/misc/pictures/TheABCofLockIn3.jpg" alt="" width="684" height="517" border="0" /></p>
<p align="justify">Since you can only send &#8220;the disk is full&#8221; type alerts and provision a VM from a portal (which is neither multi-hypervisor <span style="text-decoration: underline;">management</span> nor an <span style="text-decoration: underline;">IaaS cloud</span>, by the way), you have to build another ecosystem for B similar to the one you built for A, essentially doubling your past efforts (which is why many people argue that a multi-hypervisor strategy is inefficient).</p>
<p align="justify">Can it get any worse than this? I can&#8217;t think how. However, if it can, it will. Count on it.</p>
<p align="justify">Tip1: I have seen these things. First hand. You have every right not to trust me and to think I am biased now, though. That&#8217;s OK.</p>
<p align="justify">Tip2: In the interest of time (I&#8217;ve got work to do too) I exaggerated to make a point. Apply your common sense. Look at the forest, not at the trees, in this post. I was also having some fun with some of you. You know who you are.</p>
<p align="justify">Discuss below if you want. I am running out of time.</p>
<p align="justify">Massimo.</p>
<p align="justify"><span style="color: #ff0000;"><span style="text-decoration: underline;">Update</span>: reading the comments below, I am starting to realize there is a chance this post gets misread and misunderstood. I genuinely believe there is a difference between &#8220;being able to use both A + B as loosely coupled platforms&#8221; and &#8220;using C to avoid lock-in and managing multiple platforms as one&#8221;. This post was meant to say that the former is doable but can be inefficient, while the latter is just a unicorn thing. More in the discussion below.</span></p>
]]></content:encoded>
			<wfw:commentRss>http://it20.info/2012/02/the-abc-of-lock-in/feed/</wfw:commentRss>
		<slash:comments>31</slash:comments>
		</item>
		<item>
		<title>Virtualization Costs, Virtualization Advantages and the Case for Multi-Hypervisors</title>
		<link>http://it20.info/2012/01/virtualization-costs-virtualization-advantages-and-the-case-for-multi-hypervisors/</link>
		<comments>http://it20.info/2012/01/virtualization-costs-virtualization-advantages-and-the-case-for-multi-hypervisors/#comments</comments>
		<pubDate>Mon, 09 Jan 2012 15:41:17 +0000</pubDate>
		<dc:creator>Massimo</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://it20.info/?p=464</guid>
		<description><![CDATA[<p align="justify">Last week I came across an interesting blog post from Mark Thiele. The idea of the article is that, as virtualization becomes a relevant cost for IT, it becomes a target for savings. I tried to engage with Mark on twitter but discussing a matter like this in 140 chars becomes a bit frustrating. <span style="color:#777"> . . . &#8594; Read More: <a href="http://it20.info/2012/01/virtualization-costs-virtualization-advantages-and-the-case-for-multi-hypervisors/">Virtualization Costs, Virtualization Advantages and the Case for Multi-Hypervisors</a></span>]]></description>
			<content:encoded><![CDATA[<p align="justify">Last week I came across an interesting <a href="http://datacenterpulse.org/blogs/mark.thiele/why_enterprises_will_force_down_cost_virtualization">blog post</a> from Mark Thiele. The idea of the article is that, as virtualization becomes a relevant cost for IT, it becomes a target for savings. I tried to engage with Mark on Twitter, but discussing a matter like this in 140 characters gets a bit frustrating. So I decided to share my thoughts in a more structured way in this (hopefully) brief post.</p>
<p align="justify">Mark posted these two tables to demonstrate his theory:</p>
<p align="justify"><img src="http://www.it20.info/misc/pictures/VirtualizationCostsVirtualizationAdvantagesandtheCaseforMulti-Hypervisors-1.jpg" alt="" width="721" height="312" border="0" /></p>
<p align="justify">His theory is that, as virtualization now accounts for roughly 30% of the entire IT budget, it becomes a target for cost reduction within organizations. Perhaps I am reading too much into what Mark wrote, but my understanding is that he is pointing the finger at VMware for that &#8220;virtualization cost&#8221; and, while he does not call this out specifically, he is alluding to the use of competing products, perhaps in a mixed environment. Mark is welcome to chime in and set the record straight if that is not correct. However, I&#8217;ll go ahead and assume it. There are a lot of people thinking along these lines anyway.</p>
<p align="justify">I believe the numbers are plain wrong, the premises are plain wrong and, consequently, the conclusions are wrong. The following is a list of counterarguments to this theory I&#8217;d like to put on the table.</p>
<p align="justify"><strong>Wrong numbers</strong></p>
<p align="justify">I am wondering what that 30% of virtualization cost includes. One thing is certain: it is NOT the cost of the virtualization licenses alone. I used to work for a hardware vendor, and when we sold 10K$ / 15K$ worth of hardware for a new SMB virtualization project, it would typically be paired with a 3K$ VMware Essentials Plus license. And that 15K$ for the hardware was just a fraction of the entire IT budget. SAP or Oracle anyone? While I am not going to disclose anything particularly sensitive, let&#8217;s just say that, on average, an Enterprise buying &#8220;a few M$ worth of VMware ELA&#8221; usually has an IT budget in the ballpark of &#8220;a few <strong>hundred</strong> M$ in total&#8221;. I guess it is fair to say that the entire IT budget of an organization is roughly two orders of magnitude bigger than its VMware virtualization license costs. Either that 30% is a typo (perhaps it should be 3%) or there is an additional 27% of hidden cost when you deploy a virtualization solution. As usual, &#8220;in medio stat virtus&#8221;. More on this later.</p>
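<p align="justify">To make the &#8220;two orders of magnitude&#8221; argument concrete, here is a back-of-the-envelope sketch in Python. The 3 M$ and 300 M$ figures are assumptions standing in for &#8220;a few M$&#8221; and &#8220;a few hundred M$&#8221;; plug in your own numbers.</p>

```python
# Back-of-the-envelope check of the license share of the IT budget.
# All dollar figures are illustrative assumptions, not real customer data.

def share_of_budget(cost, total_budget):
    """Return cost as a percentage of total_budget."""
    return 100.0 * cost / total_budget


vmware_ela = 3_000_000      # assumed: "a few M$ worth of VMware ELA"
it_budget = 300_000_000     # assumed: "a few hundred M$ in total"
claimed_share = 30.0        # virtualization share quoted in Mark's table

license_share = share_of_budget(vmware_ela, it_budget)
unexplained = claimed_share - license_share

print(f"licenses: {license_share:.1f}% of the IT budget")
print(f"unexplained gap vs. the 30% claim: {unexplained:.1f} points")
```

<p align="justify">With these assumed inputs the licenses come to about 1% of the budget, and even the more generous 3% hypothesized above still leaves 27 points of the claimed 30% unexplained.</p>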
<p align="justify"><strong>Wrong premise </strong></p>
<p align="justify">In Mark&#8217;s theory, if you adopt virtualization your bottom line remains the same. You are basically shifting costs. If you used to spend &#8220;100&#8243; a few years ago, you are now spending &#8220;100&#8243; if you sum up the virtualization costs with the savings in the other areas. My first reaction was &#8220;why would you want to do that then?&#8221;. My second reaction was &#8220;this is plain wrong&#8221;. I have been working with customers implementing virtualization solutions for the last 10 years and all of them told me that the savings are enormous and many times the ROI associated to implementing virtualization is measured in months, not even in years. Once you reached that milestone, it&#8217;s all savings from that point on. Unfortunately I can&#8217;t quantify what&#8217;s the bottom line &#8220;after virtualization&#8221; but my gut feeling is that:</p>
<ul>
<li>
<p align="justify">it&#8217;s less (far?) than 100</p>
</li>
<li>
<p align="justify">the virtualization cost is still peanuts compared to many other areas of the IT budget.</p>
</li>
</ul>
<p align="justify">In Mark&#8217;s table the &#8220;virtualization cost&#8221; is twice as much as the cost of the &#8220;people&#8221;. Really? That is beyond me. We must be kidding.</p>
<p align="justify"><strong>Wrong metric </strong></p>
<p align="justify">Or at least partially wrong metric I should say. You can virtualize for many reasons. One is to lower IT costs (not shifting them). Another one is to achieve what you cannot achieve without virtualization. More agility and more business alignment someone would say. I&#8217;d like to stick on practical examples and I&#8217;ll say better DR and High Availability for your legacy applications.</p>
<p align="justify">Or, for example, how much ($) can you associate to the ability to deploy an application in a matter of minutes Vs a matter of weeks / months? I&#8217;ll give credit to Mark to recognize this when he says <em>&#8220;Now, please don&#8217;t read this the wrong way, I&#8217;m not an advocate of the thinking that IT is merely a place that helps us cut the cost of IT&#8221;. </em></p>
<p align="justify"><strong>Multi-hypervisors </strong></p>
<p align="justify">A lot of people think that a proper multi-hypervisor strategy would help lower the cost of virtualization. This is a very important matter and one that would require a very detailed analysis, which is not something I am going to do in this blog post anyway. &#8220;Multi-hypervisor&#8221; may mean a lot of things to different people, as there are many layers where you can integrate different stacks. People sometimes trivialize this complexity.</p>
<p align="justify">I am not conceptually against the idea of multi-hypervisors. However, I find it odd to think that a multi-hypervisor strategy could save you on license costs. There are situations where a multi-hypervisor strategy may make sense (I may end up writing something about it), but for the majority of Enterprise organizations out there it just makes little sense. In my opinion, at least.</p>
<p align="justify">This ties back to the numbers we discussed at the beginning. If we all agree that virtualization license costs are in the range of 3 to 5% (or less?) of the total IT budget, then it doesn&#8217;t make much sense to target them as an opportunity for savings. On the other hand, I can see that the &#8220;virtualization cost&#8221; category accounts not only for license costs but also for the associated training, tooling and skills needed to manage the solution you are building with those licenses.</p>
<p align="justify">Now, I still believe that these hidden costs aren&#8217;t 27% of the whole IT budget (they could be another good 3% to 5% perhaps), but the point is that the higher this latter number is, the more expensive it becomes for an organization to deploy and manage multiple hypervisors and virtualization stacks. This usually means duplicating tools and skills and, in the final analysis, duplicating efforts and costs.</p>
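<p align="justify">The trade-off described above can be sketched as a toy model in Python. Every parameter is a hypothetical knob, not measured data: the share of workloads moved to a second hypervisor, the license discount obtained on them, and how much of the existing ecosystem (tools, skills, processes) has to be duplicated to run the second stack.</p>

```python
# Toy model: is a second hypervisor worth it? Costs are expressed in points
# of the total IT budget; every input is a hypothetical assumption.

def multi_hypervisor_delta(license_pct, ecosystem_pct, moved_fraction,
                           license_discount, duplication_factor):
    """Change in virtualization cost, in points of the IT budget.

    Negative means a net saving, positive means a net extra cost.
    """
    license_saving = license_pct * moved_fraction * license_discount
    extra_ecosystem = ecosystem_pct * duplication_factor
    return extra_ecosystem - license_saving


# Assume licenses are 3% of the budget, half the workloads move to a
# hypervisor whose licenses cost half as much, and the second ecosystem
# costs half as much to run as the first one.
for ecosystem_pct in (0.5, 1.0, 3.0, 5.0):
    delta = multi_hypervisor_delta(3.0, ecosystem_pct, 0.5, 0.5, 0.5)
    print(f"ecosystem at {ecosystem_pct}%: {delta:+.2f} points")
```

<p align="justify">Under these assumptions the best case saves a fraction of a point of the IT budget, and once the ecosystem costs more than 1.5% of the budget the second stack costs more than it saves.</p>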
<p align="justify"><strong>In Conclusion&#8230;</strong></p>
<p align="justify">As you can see it&#8217;s easy to make up numbers and draw wrong conclusions from them. I have tried to give you a slightly different perspective assuming different numbers and different premises. Run your own numbers and feelings against this and Mark&#8217;s blog posts and come up with your own conclusion as whether you should actually lower those costs.</p>
<p align="justify">My way to look at this is that reducing the cost of virtualization in an organization is like trying to save on a 3% cost of the total cost of IT and, in doing so, potentially implementing something technically inferior that will drive up management costs and will lower the business advantages you have achieved. At the end of the day what you are buying is not licenses but &#8220;value for the money&#8221; and if many people are still buying VMware solutions in bulk numbers it may mean that people are not interested in saving 1% of the IT budget by dumping an excellent infrastructure solution that is delivering so much for them.</p>
<p align="justify">You have a right to disagree. I&#8217;d love to continue this discussion in the comments section if you want, certainly there is a lot left to say and argue over these numbers.</p>
<p align="justify">Massimo.</p>
]]></content:encoded>
			<wfw:commentRss>http://it20.info/2012/01/virtualization-costs-virtualization-advantages-and-the-case-for-multi-hypervisors/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>vCD Custom Portals and Backend Integrations in a Service Provider Environment</title>
		<link>http://it20.info/2011/12/vcd-custom-portals-and-backend-integrations-in-a-service-provider-environment/</link>
		<comments>http://it20.info/2011/12/vcd-custom-portals-and-backend-integrations-in-a-service-provider-environment/#comments</comments>
		<pubDate>Thu, 01 Dec 2011 09:24:13 +0000</pubDate>
		<dc:creator>Massimo</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://it20.info/?p=458</guid>
		<description><![CDATA[<p align="justify"> This article was originally posted on the VMware vCloud corporate blog. I am re-posting here for the convenience of the readers of my personal blog.</p> <p align="justify">This topic is (rightly so) coming up a lot lately with the Service Providers (SPs) I am working with so I thought I&#8217;d share some high level <span style="color:#777"> . . . &#8594; Read More: <a href="http://it20.info/2011/12/vcd-custom-portals-and-backend-integrations-in-a-service-provider-environment/">vCD Custom Portals and Backend Integrations in a Service Provider Environment</a></span>]]></description>
			<content:encoded><![CDATA[<p align="justify"><a href="http://blogs.vmware.com/vcloud/2011/11/vcd-custom-portals-and-backend-integrations-in-a-service-provider-environment.html"> This article</a> was originally posted on the <a href="http://blogs.vmware.com/vcloud/">VMware vCloud corporate blog</a>. I am re-posting here for the convenience of the readers of my personal blog.</p>
<p align="justify">This topic is (rightly so) coming up a lot lately with the Service Providers (SPs) I am working with, so I thought I&#8217;d share some high-level ideas on how we are engineering those clouds. This short article is meant to share some guiding principles on how to engineer custom portals and backend integrations for SPs that are adopting vCloud Director. Please note that this is a very broad topic, and if we were to get into all of the details and potential ramifications we would need a book, not a blog post.</p>
<p align="justify">So what makes it so unique? SPs have been building portals and integrations forever. Why would a vCD-based solution be any different? Well, let&#8217;s take a step back. There are two main reasons why Service Providers want to use vCloud Director:</p>
<ul>
<li>
<p align="justify">Avoid reinventing the wheel and use an out-of-the-box product that delivers the cloud backbone (RBAC, virtual data centers, security, multitenancy, etc.) on top of which they can create their own solution and value.</p>
</li>
<li>
<p align="justify">Expose the native vCloud APIs to enable federation with customers that are using VMware technologies (either vSphere or vCloud Director in so-called &#8220;private cloud&#8221; deployments).</p>
</li>
</ul>
<p align="justify">The next picture shows, at a very high level, the vCD architecture. A more detailed description can be found <a href="../2011/03/vshield-products-packaging-explained-with-a-focus-on-vcloud-director/"> here</a> if you are interested.</p>
<p><img src="http://www.it20.info/misc/pictures/vCDCustomPortalsandBackendIntegrationsinaServiceProviderenvironment1.jpg" alt="" width="467" height="351" border="0" /></p>
<p align="justify">APIs. APIs. APIs. If there is anything that matters in the cloud, it is the APIs; in other words, a programmable infrastructure. If you are a Service Provider interested in vCloud Director, you are probably interested in the vCloud APIs because, as we mentioned above, they let you reach a vast number of VMware customers by allowing them to connect to an &#8220;online compatible infrastructure&#8221;. You can read more about this hybrid cloud opportunity <a href="../2011/02/vmware-vcloud-connector-on-the-way-to-the-hybrid-clouds/"> here</a>; this is a high-level representation of the concept:</p>
<p><img src="http://www.it20.info/misc/pictures/vCDCustomPortalsandBackendIntegrationsinaServiceProviderenvironment2.jpg" alt="" width="811" height="610" border="0" /></p>
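<p align="justify">As a small illustration of what &#8220;programmable infrastructure&#8221; means in practice, the Python sketch below builds (without sending) the login request that opens a vCloud API session. It assumes the vCloud API 1.5 login conventions as we understand them: a POST to <code>/api/sessions</code> with HTTP Basic credentials in the form <code>user@org</code> and a versioned <code>Accept</code> header. The host name, user, organization and password are made up; check the vCloud API documentation for your version before relying on it.</p>

```python
import base64
import urllib.request


def build_vcloud_login_request(host, user, org, password, api_version="1.5"):
    """Build (but do not send) a vCloud API session login request.

    Assumes the vCloud API 1.5 login flow: POST to /api/sessions with HTTP
    Basic credentials in the form user@org and a versioned Accept header.
    """
    credentials = f"{user}@{org}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url=f"https://{host}/api/sessions",
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Accept": f"application/*+xml;version={api_version}",
        },
    )


# Hypothetical host and credentials, for illustration only.
req = build_vcloud_login_request("vcloud.example.com", "alice", "AcmeOrg", "secret")
print(req.get_method(), req.full_url)
# Sending it with urllib.request.urlopen(req) is expected, on success, to
# return an x-vcloud-authorization header to reuse as the session token.
```

<p align="justify">Tools that consume the vCloud APIs natively, such as the ones mentioned below, talk to this same kind of interface, which is why keeping it exposed matters.</p>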
<p align="justify">Browser-based access to the cloud is a no-brainer. You can read more <a href="../2011/02/my-cloud-consumer-experience-%E2%80%93-episode-4-managing-workloads-with-vcloud-connector/"> here</a> about how to use vCC (vCloud Connector) to connect to a public cloud. You can read more <a href="http://www.vcoteam.info/newsflash/vmware-released-the-vcenter-orchestrator-plug-in-update-for-vcloud-director-15.html"> here</a> if you are interested in connecting your vCO (vCenter Orchestrator) instance to a VMware cloud. These are just two examples of how the end-user can leverage a vCD-based public cloud. VMware, and the ecosystem as a whole, are coming out with a number of tools that interact with the vCloud APIs natively. VMware vFabric AppDirector is another good example of a tool consuming these programmable interfaces. I encourage you to have a look at the <a href="http://www.vmware.com/products/datacenter-virtualization/vfabric-appdirector/overview.html"> brief demo video available here</a>.</p>
<p align="justify">If it isn&#8217;t clear yet, this is why developing a ton of logic right on top of the vCloud APIs isn&#8217;t a good strategy if SPs want to offer a VMware-compatible cloud service. You want the vCloud APIs to be widely available and well exposed, not obscured by &#8220;a ton of scripts and workflows&#8221;. That is to say, building something that looks like the following picture may not be a good idea if you want to be part of what I call the <a href="../2010/09/vsphere-vcloud-and-the-meaning-of-being-open/"> <em>vCloud bus</em></a>:</p>
<p><img src="http://www.it20.info/misc/pictures/vCDCustomPortalsandBackendIntegrationsinaServiceProviderenvironment3.jpg" alt="" width="850" height="638" border="0" /></p>
<p align="justify">Do not do that. Please.</p>
<p align="justify">That said, let&#8217;s dig into what the SPs need and what their requirements are. An oversimplification of what they would like to achieve can be summarized as follows:</p>
<ul>
<li>
<p align="justify">They want to have a customized portal where they can keep their own traditional look and feel and potentially expose additional services.</p>
</li>
<li>
<p align="justify">They need to integrate into their backend systems through a mix of business and technical orchestration tools.</p>
</li>
</ul>
<p align="justify">So let&#8217;s try to take this apart and start with the first requirement. Ideally the SP would need to build a brand new portal (the out of the box vCloud Director web portal cannot be customized) or reuse an existing portal that they want to complement with the new vCloud Director based IaaS cloud services. As you can see this allows the SP to mesh vCD native services with other services that need to be exposed. These could be other VMware services that are not yet integrated into the vCloud API framework (VMware Chargeback or <a href="../2011/03/vshield-products-packaging-explained-with-a-focus-on-vcloud-director/">vShield App</a> come to mind) or totally different services that the SP would like to make available to external customers.</p>
<p><img src="http://www.it20.info/misc/pictures/vCDCustomPortalsandBackendIntegrationsinaServiceProviderenvironment4.jpg" alt="" width="772" height="579" border="0" /></p>
<p>There is only one principle that the SP needs to be conscious of when building this custom portal: the additional services exposed in the custom portal need to be loosely coupled with the vCloud Director services. In other words, the architect designing this needs to make sure that accessing vCD services through the native APIs doesn&#8217;t break consistency. Basically, the custom portal must not prevent users from accessing vCD through the out-of-the-box UI or the native vCloud APIs if basic native functionality is what they need. To put it (yet) another way, accessing the cloud via the native vCloud APIs / UIs shouldn&#8217;t break the consistency of the whole solution; it should only limit what users can do (as opposed to accessing a custom portal that has more advanced functionality).</p>
<p align="justify">This is, in essence, why we removed the &#8220;Orchestration / Logic&#8221; from the top of the vCloud APIs. If the SP builds the logic on top of those APIs, it essentially obscures them. In fact, allowing a user to access the obscured vCloud APIs directly would bypass the logic, which in turn would make the whole solution inconsistent.</p>
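<p align="justify">The consistency problem can be shown with a deliberately simplified, in-memory Python sketch. The <code>Cloud</code> class is a hypothetical stand-in for the native vCloud API and <code>WrapperPortal</code> for a custom layer that updates a CMDB as a side effect; neither reflects a real VMware interface.</p>

```python
# Hypothetical, in-memory stand-ins: a "cloud" playing the role of the
# native vCloud API, and a CMDB kept up to date by a custom wrapper.

class Cloud:
    """Stand-in for the native API: it only knows how to deploy vApps."""

    def __init__(self):
        self.vapps = set()

    def deploy_vapp(self, name):
        self.vapps.add(name)


class WrapperPortal:
    """Anti-pattern: backend logic lives in front of the API."""

    def __init__(self, cloud, cmdb):
        self.cloud = cloud
        self.cmdb = cmdb

    def deploy_vapp(self, name):
        self.cloud.deploy_vapp(name)
        self.cmdb.add(name)  # only runs if the user goes through the wrapper


cloud, cmdb = Cloud(), set()
portal = WrapperPortal(cloud, cmdb)

portal.deploy_vapp("web-01")  # through the custom logic: CMDB updated
cloud.deploy_vapp("db-01")    # straight to the native API: logic bypassed

print("in cloud but missing from CMDB:", sorted(cloud.vapps - cmdb))
```

<p align="justify">Running it reports <code>['db-01']</code>: the vApp deployed through the native API never reached the CMDB, which is exactly the inconsistency described above.</p>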
<p align="justify">So what do we do to satisfy the SPs&#8217; requirement of synchronizing the backend according to events that may occur at the vCloud Director level? The typical example SPs refer to is a scenario where an end-user deploys a new vApp and some logic (somewhere) must intercept this event to update a CMDB with the relevant information. Now, we could spend the rest of this post discussing the value of capturing a self-service vApp deployment in the cloud in such a CMDB, but we will leave that discussion for another post. The question is: if we can&#8217;t put this logic between the user and the vCloud APIs to intercept the event, how can the SP know what happened and track it properly? (The CMDB is just an example; it could be any backend system, such as a ticketing system.)</p>
<p align="justify">In vCD 1.5 VMware introduced a new feature called &#8220;vCloud Messages&#8221;, also known as &#8220;notifications&#8221; or &#8220;call-outs&#8221;. Essentially, vCloud Director 1.5 is able to track internal events and publish them on an AMQP message bus for an external module to consume. The picture below shows the flow where vCloud Director informs the AMQP bus that an event has occurred and the Orchestrator takes the proper action to update the backend systems:</p>
<p><img src="http://www.it20.info/misc/pictures/vCDCustomPortalsandBackendIntegrationsinaServiceProviderenvironment5.jpg" alt="" width="850" height="638" border="0" /></p>
<p align="justify">In this example a vApp is deployed using the vCloud APIs, vCloud Director puts a message on the AMQP bus saying that the vApp has been created, and the orchestrator module reads this message and then updates the CMDB. Note that the module where the logic is implemented connects to basically all modules in the infrastructure, since a notification may require actions that go beyond updating a back-end system.</p>
<p align="justify">It is also important to note that the diagram above is a logical representation. The &#8220;Additional Cloud Services&#8221; illustrated above can be delivered either by the Orchestration / Logic components or by entirely different subsystems available in the Service Provider infrastructure. In other words, there should also be a virtual link from the Custom Portal to the Orchestration / Logic components. The very same principles discussed above apply here as well: exposing additional services (made available by the orchestration layer) shouldn&#8217;t prevent or limit end-users from accessing their resources via the native vCloud APIs (or the UI, for that matter).</p>
<p align="justify">Perhaps it is worth spending a minute to better characterize the Orchestration / Logic brick. In a complex organization like a Service Provider, this brick may comprise multiple modules and products. Usually there are at least two components inside it, which I refer to as a Business Orchestrator and a Technical Orchestrator. The former is responsible for interacting with the back-end systems (it may even be considered part of the back-end systems), whereas the latter is responsible for interacting with the actual infrastructure components and modules. Graphically, it looks like this:</p>
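<p align="justify">The flow above can be sketched in a few lines of Python. This is a minimal, self-contained illustration and not VMware code. A standard-library <code>queue.Queue</code> stands in for the AMQP bus; a real deployment would consume from the broker that vCloud Director publishes to. The JSON payload layout, the <code>vapp.created</code> event name and the dictionary used as a CMDB are all hypothetical. The sketch shows the key design property: the logic runs behind the bus, so users keep calling the native vCloud APIs and nothing sits in their path.</p>

```python
import json
import queue

# Stand-in for the AMQP bus. A real orchestrator would subscribe to the
# broker that vCloud Director publishes its notifications to.
bus = queue.Queue()

# Hypothetical back-end system: a CMDB keyed by vApp name.
cmdb = {}


def notify_vapp_created(vapp: str, org: str, owner: str) -> None:
    """Simulate vCD publishing a notification after a vApp is deployed.

    The payload layout is illustrative only, not the vCD message schema.
    """
    bus.put(json.dumps({"event": "vapp.created", "vapp": vapp,
                        "org": org, "owner": owner}))


def handle(message: str) -> None:
    """Orchestrator logic: react to a single notification."""
    event = json.loads(message)
    if event["event"] == "vapp.created":
        cmdb[event["vapp"]] = {"org": event["org"], "owner": event["owner"]}
    # Other event types could open tickets, update billing, and so on.


def drain() -> int:
    """Process every pending message and return how many were handled."""
    handled = 0
    while True:
        try:
            message = bus.get_nowait()
        except queue.Empty:
            return handled
        handle(message)
        handled += 1


notify_vapp_created("web-vapp", "acme", "alice")
print(drain(), cmdb)  # 1 {'web-vapp': {'org': 'acme', 'owner': 'alice'}}
```

<p align="justify">Because the orchestrator only listens, a user who bypasses the custom portal and talks to the vCloud APIs directly still triggers the same back-end update.</p>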
<p><img src="http://www.it20.info/misc/pictures/vCDCustomPortalsandBackendIntegrationsinaServiceProviderenvironment6.jpg" alt="" width="675" height="511" border="0" /></p>
<p align="justify">One reason for this split is that the business orchestrator module plays a key role in the governance of the solution but doesn&#8217;t usually have the full range of adapters and connectors to talk to the infrastructure modules, so it leverages a technical orchestrator module to deal with that part. In most situations the Service Provider already has such a business orchestrator in place. In my experience, though, what&#8217;s usually missing is a more technical orchestrator module that interacts with the lower-level infrastructure components. This leads to lots of extra in-house development that is expensive, time-consuming and hard to maintain.</p>
<p align="justify">This is where <a href="http://www.vmware.com/products/vcenter-orchestrator/overview.html"> vCenter Orchestrator</a> comes in. As mentioned at the beginning of this post, you can use vCO as a cloud end-user tool to consume the vCloud APIs, but where vCO really shines is as a technical orchestrator acting behind the cloud to pull all the infrastructure pieces together. There is also a nice article that talks about how to <a href="http://www.vcoteam.info/learn-vco/building-your-custom-cloud-when-to-use-the-vcloud-api-or-the-vcenter-orchestrator-web-service.html"> extend vCloud Director capabilities using vCenter Orchestrator</a> (this ties back to the concept that additional cloud services exposed in the custom portal could be delivered by the orchestrator directly).</p>
<p align="justify">Note that what I have discussed here so far is the logical, high-level architecture of the solution. Different modules do not necessarily mean different products (although they often do). For example, there may be situations where a single product delivers both the portal and the business orchestration modules. <a href="http://www.vmware.com/products/datacenter-virtualization/service-manager/overview.html"> VMware Service Manager</a> is an example of such a product. As I said, big Service Providers often have this part historically covered already anyway.</p>
<p align="justify">In conclusion, it is advisable (if not imperative) for Service Providers to be able to expose the native vCloud APIs to maximize market opportunities and value for existing VMware customers. To do so, SPs need to follow proper design principles for backend integration and custom portal design. This brief blog post is only meant to be a starting point for outlining the critical issues involved.</p>
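<p align="justify">To make the split more concrete, here is a minimal Python sketch. The class and method names, the <code>backup</code> adapter and the approval flag are all hypothetical. The sketch only illustrates the division of labour described above: the business orchestrator owns governance (the approval and the CMDB record) and delegates anything that touches infrastructure to a technical orchestrator that holds the adapters.</p>

```python
from typing import Callable, Dict, List, Tuple

# An adapter takes an action name plus keyword parameters and returns a status.
Adapter = Callable[..., str]


class TechnicalOrchestrator:
    """Holds the connectors to infrastructure components (hypothetical adapters)."""

    def __init__(self) -> None:
        self.adapters: Dict[str, Adapter] = {}
        self.audit: List[Tuple[str, str]] = []

    def register(self, system: str, adapter: Adapter) -> None:
        self.adapters[system] = adapter

    def run(self, system: str, action: str, **params: str) -> str:
        if system not in self.adapters:
            raise KeyError(f"no adapter registered for {system!r}")
        result = self.adapters[system](action, **params)
        self.audit.append((system, action))  # record only successful calls
        return result


class BusinessOrchestrator:
    """Owns governance (approval, CMDB) and delegates infrastructure work."""

    def __init__(self, technical: TechnicalOrchestrator) -> None:
        self.technical = technical
        self.cmdb: Dict[str, Dict[str, bool]] = {}

    def enable_backup(self, vapp: str, approved: bool) -> str:
        if not approved:  # the governance decision stays at this layer
            return "rejected"
        self.technical.run("backup", "enable", target=vapp)
        self.cmdb.setdefault(vapp, {})["backup"] = True
        return "done"


tech = TechnicalOrchestrator()
# Hypothetical backup adapter: a real one would call a backup product's API.
tech.register("backup", lambda action, target: f"{action}:{target}")
business = BusinessOrchestrator(tech)
print(business.enable_backup("web-vapp", approved=True))  # done
print(tech.audit)  # [('backup', 'enable')]
```

<p align="justify">In this layout, adding support for a new infrastructure component means registering another adapter with the technical orchestrator. The business layer does not change.</p>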
<p align="justify">Massimo.</p>
]]></content:encoded>
			<wfw:commentRss>http://it20.info/2011/12/vcd-custom-portals-and-backend-integrations-in-a-service-provider-environment/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Amazon, Netflix, Standard Cloud APIs and the Inevitable Lock-in</title>
		<link>http://it20.info/2011/09/amazon-netflix-standard-cloud-apis-and-the-inevitable-lock-in/</link>
		<comments>http://it20.info/2011/09/amazon-netflix-standard-cloud-apis-and-the-inevitable-lock-in/#comments</comments>
		<pubDate>Wed, 14 Sep 2011 14:27:33 +0000</pubDate>
		<dc:creator>Massimo</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://it20.info/?p=451</guid>
		<description><![CDATA[<p style="text-align: justify;">A few weeks ago Adrian Cockcroft (Cloud Architect @ Netflix) wrote another very interesting post on his blog. Adrian warms up the discussion sharing his experience about the reasons for which you may want to use public cloud services. While there are a lot of people (including myself) sometimes advocating about these concepts, <span style="color:#777"> . . . &#8594; Read More: <a href="http://it20.info/2011/09/amazon-netflix-standard-cloud-apis-and-the-inevitable-lock-in/">Amazon, Netflix, Standard Cloud APIs and the Inevitable Lock-in</a></span>]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify;">A few weeks ago Adrian Cockcroft (Cloud Architect @ Netflix)  wrote <a href="http://perfcap.blogspot.com/2011/08/i-come-to-use-clouds-not-to-build-them.html"> another very interesting post on his blog</a>. Adrian warms up the discussion  sharing his experience about the reasons for which you may want to use public  cloud services. While there are a lot of people (including myself) sometimes  advocating about these concepts, there isn&#8217;t anything like hearing this first  hand from the people that are actually running a business out of this model. I  like to hear/read Adrian for this reason. It&#8217;s no secret that Netflix uses  Amazon AWS to run their business and this is the second part of Adrian&#8217;s post.  Admittedly the part that intrigued me the most.</p>
<p style="text-align: justify;">The remaining part of his post is basically a public ask (or  hope) to see AWS API compatible clouds (or clones),  possibly built around the OpenStack  stack (no pun intended). He doesn&#8217;t seem to be shy about sharing his pessimism  about OpenStack success (correct me if I am wrong Adrian) but this isn&#8217;t going to be the core of the post I am  writing . Only time will tell who will be successful in doing what.</p>
<p style="text-align: justify;">Going back to Adrian&#8217;s <em>&#8220;ask&#8221;</em> I believe there are a  number of reasons why he would like to see an AWS clone. Again Adrian is welcome to  set the record straight if I got the wrong understanding.</p>
<p style="text-align: justify;">One of the reasons is somewhat logical and it boils down to:  risk mitigation, additional resiliency and problem avoidance. I came to learn  from another <a href="http://blip.tv/datastax/replacing-datacenter-oracle-with-global-apache-cassandra-on-aws-5515987"> very interesting piece by Adrian</a> that Netflix has a number of  policies for backup and data retention. This includes backing up data on S3,  copying them in different AWS availability zones, and eventually replicating  them in different AWS regions. It only makes perfect sense for Netflix to go a  step further duplicating these data at different service providers for an additional  level of risk mitigation. This is after all what this slide was trying to convey  in his interesting pitch (highly recommended if you haven&#8217;t watched it yet):</p>
<p><img src="http://www.it20.info/misc/pictures/Amazon-Netflix-Standard-Cloud-APIsandtheinevitablelock-in1.jpg" border="0" alt="" width="864" height="651" /></p>
<p style="text-align: justify;">I&#8217;d speculate that another good reason for which Adrian would  like to see alternative public clouds based on clones of the AWS APIs is this:  Netflix would like to have choices. Simple. What&#8217;s wrong with that? I wouldn&#8217;t  expect anything less if I was them. Someone  would try to argue that Netflix doesn&#8217;t want to be <em>locked-in</em> into Amazon.  I think the matter is a lot more complex and, in fact, I am not sure I agree  (entirely) with that. I don&#8217;t even know if avoiding a certain level of <em>lock-in</em> is even possible at all anyway (more on this later).</p>
<p style="text-align: justify;">Warning: I am not trying to sell vCloud to Adrian Cockcroft  or anyone else. By the way I believe Adrian knows more about vCloud than I do. .</p>
<p style="text-align: justify;">Having this said this is a hot topic. Adrian&#8217;s blog post  (along with all comments on the thread) reminded me of a couple of old blog  posts I wrote last year. They are <a href="../2010/08/open-standards-open-source-openstack-and-the-tcpip-of-cloud-apis/"> &#8220;Open standards, open source, OpenStack and the TCPIP of Cloud APIs&#8221;</a> and <a href="../2010/09/vsphere-vcloud-and-the-meaning-of-being-open/"> &#8220;vSphere, vCloud and the Meaning of Being Open&#8221;</a> where I was trying  to describe VMware&#8217;s strategy in terms of API standardization and choice of  service providers. This is an oversimplified picture, from one of those blog  posts, that focuses on the point I am trying to make: a common API that works  across different service providers.</p>
<p><img src="http://www.it20.info/misc/pictures/Amazon-Netflix-Standard-Cloud-APIsandtheinevitablelock-in2.jpg" border="0" alt="" width="776" height="401" /></p>
<p style="text-align: justify;">This picture primarily shows access to different service providers using the same interface, but the story doesn&#8217;t stop here. Since vCloud Director is a product you can buy, you can even build your own private cloud if you want to. As a consumer of cloud services, I regularly use a couple of internal labs (that mimic private clouds) as well as the <a href="../2011/01/my-cloud-consumer-experience-episode-1-the-on-boarding/"> public Stratogen cloud</a> and another public cloud I am piloting with a big telco in Europe. I do have my choices.</p>
<p style="text-align: justify;">Here I am not specifically talking about the effort of making the vCloud APIs an industry standard. Lately I have come to the (personal) conclusion that a <em>standard API</em> is a function of its adoption and not of a theoretical agreement. I am instead talking about the choice of service providers the vCloud stack is able to guarantee to consumers. After all, it&#8217;s one stack instantiated many times by different organizations (either private or public). I am not sure it&#8217;s a standard (yet), but it is certainly very consistent. And this is where I can hear you claiming: &#8220;it&#8217;s a <em>lock-in</em>&#8221;. And this is where I would argue: &#8220;is a certain minimum level of <em>lock-in</em> avoidable anyway?&#8221;</p>
<p style="text-align: justify;">Let&#8217;s try to get into a bit more detail and explore the options this industry (more particularly, consumers and providers of cloud services) has.</p>
<p style="text-align: justify;"><strong>API lock-in</strong></p>
<p style="text-align: justify;">First of all, what on earth is a <em>lock-in</em>? How do you define it? A <em>lock-in</em>, to me at least, is a function of the time it takes to move to an alternative solution. In the context we are discussing here, a <em>lock-in</em> is a function of how much time and effort it would take to rewrite your software (for example the Netflix software) to talk to a different cloud interface. Adrian at some point says it wouldn&#8217;t be (too) difficult for Netflix to do that, but the very fact that he is looking for an AWS clone tells me he doesn&#8217;t want to get to that point (my speculation).</p>
<p style="text-align: justify;">At this point, does it make any difference whether the APIs you are writing your solution against are the vCloud APIs, the AWS APIs or the future OpenStack native APIs (that is, the APIs that expose the OpenStack personality, not the AWS clone interface)? I don&#8217;t think so. Lock-in isn&#8217;t so much about what you are writing against (be it the vCloud APIs, the OpenStack APIs or the Amazon AWS APIs); it is rather about how difficult it is to move away from it.</p>
<p style="text-align: justify;">At the end of the day, as a consumer, you don&#8217;t have control over any of those anyway. So it doesn&#8217;t make any difference at all.</p>
<p style="text-align: justify;">If you are a service provider, you are pretty much in the same situation whether you intend to use vCloud Director or OpenStack, unless you decide to take OpenStack, fork it and do whatever you want with it. In that case it&#8217;s a different kind of <em>lock-in</em>, and not necessarily a better one. Good luck with that.</p>
<p style="text-align: justify;">Sure, if you are big enough you may be able to contribute to the main OpenStack project and see what you need / want implemented sooner rather than later but, frankly, if you are an organization of such a size, chances are that you have a say in the roadmap of a proprietary product too. I have seen that first-hand.</p>
<p style="text-align: justify;">All in all, using available third-party software products (be it vCloud Director or OpenStack) to build clouds has the advantage of allowing consumers to connect to different service providers. That said, if users decide to consume services from these service providers, they are essentially locking themselves into that specific interface/API, whatever that interface is.</p>
<p style="text-align: justify;">I am not getting into the <a href="../2011/02/my-cloud-consumer-experience-%E2%80%93-episode-4-managing-workloads-with-vcloud-connector/"> federation and hybrid cloud discussion</a> here, because it would only serve to argue why choosing one interface over the other could be better, which is not the point of this post anyway.</p>
<p style="text-align: justify;"><strong>Service Provider lock-in</strong></p>
<p style="text-align: justify;">The other option to see more openness (or the perception thereof) would be to keep Amazon AWS as your &#8220;gold standard&#8221; and pray for other service providers to implement a clone of their APIs (using OpenStack or any other tool). This is, to me, the worst of both worlds, since both consumers and providers certainly have no control whatsoever over the AWS APIs (similarly to how you&#8217;d have no control over the vCloud APIs or the potential OpenStack native APIs). In addition to that, you&#8217;d have to deal with the complexity of creating and consuming a clone of those APIs that is fundamentally a reverse-engineering hack, and that will suffer from the generic problems of copying someone else&#8217;s interfaces.</p>
<p style="text-align: justify;">This is especially true when these interfaces are changing at the speed of light (given the pace at which Amazon is introducing new cloud services) and given that <a href="http://broadcast.oreilly.com/2011/08/the-ec2-api-as-a-defacto-standard.html"> the AWS interfaces appear to be pretty complex to track</a>.</p>
<p style="text-align: justify;">In reality, Adrian was asking for cloning only a subset of the features provided by AWS but, based on my past experience working for a company that was trying to be the overlay interface to everything, typically the only thing that works (somewhat) well across different virtualized platforms and interfaces is turning virtual machines on and off. I bet Netflix needs something more compelling than that to consider another service provider that claims to be compatible with the Amazon APIs. OK, I am exaggerating, but you (hopefully) see my point. If Amazon were to facilitate this <em>cloning process</em> or, better yet, if Amazon were to provide (read: sell) its own technology enablement stack to service providers, the story would be very different, but I don&#8217;t think any service provider will be successful in implementing an AWS clone if Amazon doesn&#8217;t want that to happen.</p>
<p style="text-align: justify;">If I were evaluating this option as a consumer, I would just give up on the idea of consuming a clone of Amazon&#8230;and I would just consume native Amazon AWS resources. Sure, you are limiting yourself to a single service provider (AWS), but I think it is better to be <em>locked-in</em> to Amazon than to have choices&#8230; that don&#8217;t work very well. Because, in the end, we all need to be pragmatic, don&#8217;t we?</p>
<p style="text-align: justify;"><strong>Conclusions</strong></p>
<p style="text-align: justify;">In conclusion, I just want to reiterate that it&#8217;s a bet you are making and you can&#8217;t really avoid a certain level of <em>lock-in</em>. It&#8217;s just a fact of (IT) life. In the last 15 years I have come across a lot of vendors selling openness and freedom of choice. At the end of the day they were just trying to sell another control point. They don&#8217;t call it a <em>lock-in</em>, as that would make the whole sales process a bit harder, but it is what it is.</p>
<p style="text-align: justify;">This post is not meant to bash Amazon or OpenStack. As a matter of fact, I am bashing vCloud at least as much. It&#8217;s just a reality check of what&#8217;s going on and how I see these things progressing for both consumers and providers of (IaaS) cloud services.</p>
<p style="text-align: justify;">My message? Make your bet and keep your fingers crossed.</p>
<p style="text-align: justify;">Perhaps I will be proven wrong. Oh well, it&#8217;s just my usual (less than) 2 cents.</p>
<p style="text-align: justify;">Massimo.</p>
]]></content:encoded>
			<wfw:commentRss>http://it20.info/2011/09/amazon-netflix-standard-cloud-apis-and-the-inevitable-lock-in/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>vCloud Director 1.0.1: Networking Samples</title>
		<link>http://it20.info/2011/06/vcloud-director-1-0-1-networking-samples/</link>
		<comments>http://it20.info/2011/06/vcloud-director-1-0-1-networking-samples/#comments</comments>
		<pubDate>Wed, 29 Jun 2011 14:50:50 +0000</pubDate>
		<dc:creator>Massimo</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://it20.info/?p=442</guid>
		<description><![CDATA[<p style="text-align: justify;">My old vCloud Director Networking for Dummies post is still going strong according to my blog statistics. I believe this is an indicator that people are looking for more information about this topic so I thought I&#8217;d give it a little bit more color and create a few real life examples on how <span style="color:#777"> . . . &#8594; Read More: <a href="http://it20.info/2011/06/vcloud-director-1-0-1-networking-samples/">vCloud Director 1.0.1: Networking Samples</a></span>]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify;">My old <a href="../2011/01/2010/09/vcloud-director-networking-for-dummies/"> vCloud Director Networking for Dummies</a> post is still going strong according  to my blog statistics. I believe  this is an indicator that people are looking for more information about this  topic so I thought I&#8217;d give it a little bit more color and create a few real  life examples on how that theory works in practice. I suggest you read the  Networking for Dummies post  linked above before you dive into this one.</p>
<p style="text-align: justify;">Note also that both the other post and this one are based on vCloud Director 1.0.1, which is the latest release available as of June 2011. Things may change in the future so, if the vCD release you are using when you read this is later than 1.0.1, chances are that things will be slightly different. I can&#8217;t really say more than this at this point.</p>
<p style="text-align: justify;">Last but not least, everything I will be doing below can be  done as a cloud consumer in self service mode. As a matter of fact I will be  doing everything as an Org Admin.</p>
<p style="text-align: justify;">
<p style="text-align: justify;"><strong>Introduction</strong></p>
<p style="text-align: justify;">To walk through an actual implementation of the networking  stack I&#8217;ll use my IT20 organization hosted in the <a href="http://www.stratogen.net/products/vmware-hosting.html">Stratogen cloud</a>.  This discussion starts with the description of the networking plumbing in my  vCloud organization. From the vCD UI it looks like this:</p>
<p style="text-align: justify;"><img style="border: 0pt none;" src="http://www.it20.info/misc/pictures/vCD1.0NetworkingSamples1.jpg" border="0" alt="" width="871" height="473" /></p>
<p style="text-align: justify;">From a logical perspective it looks like this:</p>
<p style="text-align: justify;"><img src="http://www.it20.info/misc/pictures/vCD1.0NetworkingSamples2.jpg" border="0" alt="" width="864" height="651" /></p>
<p style="text-align: justify;">My Org has four public Internet addresses that Stratogen associated with my &#8220;<em>Routed Network</em>&#8221; when they created the tenant. For security reasons I am not going to widely advertise them in this post.</p>
<p style="text-align: justify;">You can see these assigned addresses if you right-click the <em>Routed Network</em> and select &#8220;<em>Configure Services</em>&#8221;:</p>
<p style="text-align: justify;"><img src="http://www.it20.info/misc/pictures/vCD1.0NetworkingSamples3.jpg" border="0" alt="" width="627" height="656" /></p>
<p style="text-align: justify;">The last piece of the puzzle is the three vApps I have created in this Org, which we are going to connect to the various networks you have seen above. This should give you a practical idea of how things can be configured. The names of the vApps should be self-explanatory.</p>
<p style="text-align: justify;"><img src="http://www.it20.info/misc/pictures/vCD1.0NetworkingSamples4.jpg" border="0" alt="" width="871" height="473" /></p>
<p style="text-align: justify;">
<p style="text-align: justify;"><strong>Direct Internet Connection</strong></p>
<p style="text-align: justify;">Let&#8217;s start with the most simple of the networking scenarios. Note there  is a vApp called &#8220;Turnkey_Internet&#8221; which is comprised of a single VM. That VM  is connected to the &#8220;<em>Direct Internet</em>&#8221; connection available in my Org. I have only  one comment for this example: scaring! Never do this because you are in fact  plugging your VM directly into the Internet without any level of protection (other than  what you could have inside the Guest OS of course).</p>
<p style="text-align: justify;">This is how my VM is configured:</p>
<p style="text-align: justify;"><img src="http://www.it20.info/misc/pictures/vCD1.0NetworkingSamples5.jpg" border="0" alt="" width="965" height="771" /></p>
<p style="text-align: justify;">And this is how the VM fits into the logical network view:</p>
<p style="text-align: justify;"><img src="http://www.it20.info/misc/pictures/vCD1.0NetworkingSamples6.jpg" border="0" alt="" width="864" height="651" /></p>
<p style="text-align: justify;">The way this works is pretty straightforward and is explained in the <a href="../2011/01/2010/09/vcloud-director-networking-for-dummies/"> vCloud Director Networking for Dummies</a> post. Basically, the cloud administrator has configured a pool of available IP addresses for this &#8220;<em>External Network</em>&#8221; (since this is a vSphere PortGroup with native Internet connectivity, this pool contains native Internet IP addresses). The <em>Direct Internet</em> connection in my Org is nothing more than a pointer to this <em>vCD External Network</em>, which in turn is a pointer (with metadata) to the PortGroup backing it. As a result, the vNIC of my VM gets connected directly to this PortGroup, and vCD assigns the vNIC an IP address from the pool.</p>
<p style="text-align: justify;">I am glad Stratogen configured this network for me &#8211; as it is handy if you are experimenting with vCD networking &#8211; but in a real-life scenario you would never want to connect VMs to a connection like this (directly connected to the Internet). However, this may become pretty interesting if you, as an Enterprise, are using virtual data centers hosted in a cloud where the Service Provider has configured MPLS connectivity back to your headquarters. Something like this:</p>
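<p style="text-align: justify;">The chain of pointers described above (Org network, then vCD External Network, then vSphere PortGroup) and the pool-based address assignment can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions and not vCD&#8217;s implementation. The PortGroup name <code>PG-Internet</code> is hypothetical. The addresses come from the 203.0.113.0/24 documentation range. Handing out addresses in ascending order is also an assumption.</p>

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Tuple


class IpPool:
    """Static pool defined by the cloud administrator (simplified model)."""

    def __init__(self, first: str, last: str) -> None:
        start = ipaddress.IPv4Address(first)
        end = ipaddress.IPv4Address(last)
        # Assumption: addresses are handed out in ascending order.
        self.free: List[ipaddress.IPv4Address] = [
            start + i for i in range(int(end) - int(start) + 1)
        ]
        self.assigned: Dict[str, ipaddress.IPv4Address] = {}

    def allocate(self, vnic: str) -> ipaddress.IPv4Address:
        if vnic in self.assigned:  # a vNIC keeps the address it already has
            return self.assigned[vnic]
        if not self.free:
            raise RuntimeError("IP pool exhausted")
        self.assigned[vnic] = self.free.pop(0)
        return self.assigned[vnic]


@dataclass
class ExternalNetwork:
    """vCD External Network: a pointer (with metadata) to a vSphere PortGroup."""
    port_group: str
    pool: IpPool


@dataclass
class OrgDirectNetwork:
    """Direct Org network: nothing more than a pointer to an External Network."""
    external: ExternalNetwork


def connect_vnic(org_net: OrgDirectNetwork,
                 vnic: str) -> Tuple[str, ipaddress.IPv4Address]:
    """Follow the pointers: the vNIC lands on the PortGroup with a pool IP."""
    ext = org_net.external
    return ext.port_group, ext.pool.allocate(vnic)


internet = ExternalNetwork("PG-Internet", IpPool("203.0.113.10", "203.0.113.12"))
direct = OrgDirectNetwork(internet)
print(connect_vnic(direct, "turnkey-vm-nic0"))
# ('PG-Internet', IPv4Address('203.0.113.10'))
```

<p style="text-align: justify;">There is no translation or filtering anywhere in this path. The VM ends up holding a public address on the Internet-facing PortGroup, and that is exactly why the scenario is risky.</p>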
<p style="text-align: justify;"><img src="http://www.it20.info/misc/pictures/vCD1.0NetworkingSamples7.jpg" border="0" alt="" width="864" height="651" /></p>
<p style="text-align: justify;">It goes without saying that, in doing so, you are effectively dedicating an <em>External Network</em> (and in turn a PortGroup) to the IT20 Org. If for any reason you give another Org access to the same <em>External Network</em> (either &lt;Direct&gt; or &lt;Routed&gt; &#8211; see next section), you are essentially giving the other Org access to the IT20 MPLS network.</p>
<p style="text-align: justify;">
<p style="text-align: justify;"><strong>Routed Network &#8211; single-tier vApp</strong></p>
<p style="text-align: justify;">This is where things start to become more interesting, slightly more difficult to explain and very rich at the same time. I have another vApp called &#8220;Turnkey-Routed&#8221;. It contains a single VM, which is connected to the <em>Routed Network</em> available in the IT20 organization. You can imagine this Routed Network as a dedicated layer 2 segment protected by a firewall device (vShield Edge). For more information on how this works from a vSphere perspective, read the <a href="../2011/01/2010/09/vcloud-director-networking-for-dummies/"> vCloud Director Networking for Dummies</a> post. Essentially, the VM in this vApp gets assigned an IP address from the pool defined for this layer 2 segment. This is how vCD shows the details of the <em>Hardware Properties</em> for this virtual machine:</p>
<p style="text-align: justify;"><img src="http://www.it20.info/misc/pictures/vCD1.0NetworkingSamples8.jpg" border="0" alt="" width="965" height="729" /></p>
<p style="text-align: justify;">And this is how it logically fits into our diagram:</p>
<p style="text-align: justify;"><img src="http://www.it20.info/misc/pictures/vCD1.0NetworkingSamples9.jpg" border="0" alt="" width="864" height="651" /></p>
<p style="text-align: justify;">Note that in the diagram above we went a couple of steps further. Not only are we protecting the VM with the Edge: I have also configured the Edge to NAT the private IP. To do so, I have created a one-to-one mapping rule to one of the four Internet addresses Stratogen assigned to me. I have also configured a firewall rule that only allows traffic on port 12320 to reach the VM (this is because the Turnkey appliance uses particular ports to provide access to its SSH and web admin interfaces). How did I do this? Right-click the <em>Routed Network</em> and select <em>Configure Services</em>. Go to the &#8220;<em>External IP Mapping</em>&#8221; tab and configure the NAT rule:</p>
<p style="text-align: justify;"><img src="http://www.it20.info/misc/pictures/vCD1.0NetworkingSamples10.jpg" border="0" alt="" width="627" height="651" /></p>
<p style="text-align: justify;">You would then go to the &#8220;<em>Firewall</em>&#8221; tab, where you can configure the firewall rule I have described above (as an example).</p>
<p style="text-align: justify;"><img src="http://www.it20.info/misc/pictures/vCD1.0NetworkingSamples11.jpg" border="0" alt="" width="627" height="651" /></p>
<p style="text-align: justify;">I have just blocked all traffic coming into this VM except for traffic directed to port 12320. It&#8217;s as easy as that.</p>
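<p style="text-align: justify;">The combination of a one-to-one NAT rule and a default-deny firewall with a single exception can be expressed as a small Python function. The addresses are placeholders, because the real public IPs are deliberately not published here. The sketch applies the firewall check to the translated, private destination. That ordering is an assumption made for the illustration and does not document the exact evaluation order inside vShield Edge.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    dst_ip: str
    dst_port: int


# One-to-one (DNAT) mapping: public address -> private address of the VM.
NAT_ONE_TO_ONE = {"203.0.113.10": "192.168.0.100"}

# Default deny: only these (private IP, port) pairs are let through.
FIREWALL_ALLOW = {("192.168.0.100", 12320)}


def edge_inbound(pkt: Packet) -> Optional[Packet]:
    """Return the packet as delivered to the VM, or None if it is dropped."""
    private_ip = NAT_ONE_TO_ONE.get(pkt.dst_ip)
    if private_ip is None:
        return None  # no NAT rule for this public address
    translated = Packet(private_ip, pkt.dst_port)
    if (translated.dst_ip, translated.dst_port) not in FIREWALL_ALLOW:
        return None  # blocked by the default-deny policy
    return translated


print(edge_inbound(Packet("203.0.113.10", 12320)))
# Packet(dst_ip='192.168.0.100', dst_port=12320)
print(edge_inbound(Packet("203.0.113.10", 22)))  # None
```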
<p style="text-align: justify;">
<p style="text-align: justify;"><strong>Routed Network &#8211; multi-tier vApp </strong></p>
<p style="text-align: justify;">The single-tier vApp is still pretty simple. Let&#8217;s now focus on the third vApp I have mentioned. This is the &#8220;2Tiers&#8221; vApp, which consists of a front-end Windows VM (Win-Web) and a back-end Linux VM (REHL-DB). The idea is to provide IT20 customers with access to this application, protected by multiple levels of security. The first step is to connect the front-end to the Routed Network in the Org and NAT it. This is similar to what we have already done with the single-tier vApp discussed above, so I am not going to show screenshots of the NAT and Firewall configurations. Of course, the Win-Web VM has a different private IP, and I will be using another public IP to create the DNAT rule. This is what the logical layout looks like for this specific vApp. I am opening port 80 for this example:</p>
<p style="text-align: justify;"><img src="http://www.it20.info/misc/pictures/vCD1.0NetworkingSamples12.jpg" border="0" alt="" width="864" height="651" /></p>
<p style="text-align: justify;">As you can see, the back-end VM is not yet connected to any network. As I said, we want to provide an additional level of security for that VM, and we don&#8217;t want to connect it &#8220;directly&#8221; to the Org network. How do we do this? This is where the so-called &#8220;<em>vApp Networks</em>&#8221; come into play. You can imagine <em>vApp Networks</em> as layer 2 network segments dedicated (and only available) to the specific vApp they have been created for. In other words, a <em>vApp Network</em> created for one vApp cannot be used by any other vApp. If you want to know more about this concept, please refer again to the <a href="../2011/01/2010/09/vcloud-director-networking-for-dummies/"> vCloud Director Networking for Dummies</a> post.</p>
<p style="text-align: justify;">You can create <em>vApp Networks</em> in multiple ways, but the easiest one is to select the &#8220;Add Network&#8221; option in the drop-down menu for the vNIC connectivity available in the <em>Hardware Properties</em> of the VM:</p>
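<p style="text-align: justify;">The scoping rule can be captured in a tiny Python model. The <code>Network</code> type and the <code>can_attach</code> function are hypothetical, and the network name <code>DB-Net</code> is made up for the example. The model encodes only the rule stated above: an Org network is available to every vApp in the organization, while a vApp Network belongs to exactly one vApp.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Network:
    name: str
    # None marks an Org network; otherwise the name of the owning vApp.
    owner_vapp: Optional[str] = None


def can_attach(vapp: str, net: Network) -> bool:
    """A VM may use an Org network, or a vApp Network of its own vApp only."""
    return net.owner_vapp is None or net.owner_vapp == vapp


routed = Network("Routed Network")               # Org-level, shared
db_net = Network("DB-Net", owner_vapp="2Tiers")  # vApp Network (hypothetical name)

print(can_attach("2Tiers", db_net))          # True
print(can_attach("Turnkey-Routed", db_net))  # False
print(can_attach("Turnkey-Routed", routed))  # True
```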
<p style="text-align: justify;"><img src="http://www.it20.info/misc/pictures/vCD1.0NetworkingSamples13.jpg" border="0" alt="" width="969" height="231" /></p>
<p style="text-align: justify;">Selecting it kicks off a brief wizard that asks for the basic metadata needed to create a new network (Subnet Mask, Default Gateway, IP Pool etc.). You can then select whether you want to protect this dedicated <em>vApp Network</em> with NAT and Firewall functionality. You can do this in the <em>Networking</em> tab when you &#8220;Open&#8221; the vApp:</p>
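<p style="text-align: justify;">The values the wizard asks for have to be mutually consistent. The Python sketch below shows the kind of sanity checks worth doing before filling it in. The function name and the specific checks are my own assumptions and do not describe what vCD actually validates.</p>

```python
import ipaddress
from typing import List


def check_vapp_network(gateway: str, netmask: str,
                       pool_start: str, pool_end: str) -> List[str]:
    """Return a list of problems with the proposed network settings."""
    subnet = ipaddress.IPv4Network(f"{gateway}/{netmask}", strict=False)
    gw = ipaddress.IPv4Address(gateway)
    start = ipaddress.IPv4Address(pool_start)
    end = ipaddress.IPv4Address(pool_end)
    problems = []
    if gw in (subnet.network_address, subnet.broadcast_address):
        problems.append("gateway is the network or broadcast address")
    if start not in subnet or end not in subnet:
        problems.append("IP pool falls outside the subnet")
    if start > end:
        problems.append("IP pool start is after its end")
    if start <= gw <= end:
        problems.append("IP pool includes the default gateway")
    return problems


print(check_vapp_network("192.168.1.1", "255.255.255.0",
                         "192.168.1.10", "192.168.1.50"))  # []
```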
<p style="text-align: justify;"><img src="http://www.it20.info/misc/pictures/vCD1.0NetworkingSamples14.jpg" border="0" alt="" width="871" height="473" /></p>
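<p style="text-align: justify;">The wizard inputs have to be mutually consistent: the gateway and the IP pool must sit inside the subnet defined by the mask, and the gateway should not be handed out from the pool. As a minimal sketch, assuming those are the checks that matter (vCD may enforce others), Python&#8217;s standard ipaddress module can validate a hypothetical set of values:</p>

```python
import ipaddress


def validate_vapp_network(gateway: str, netmask: str,
                          pool_start: str, pool_end: str) -> ipaddress.IPv4Network:
    """Sanity-check the wizard inputs (gateway, netmask, IP pool).

    Returns the derived subnet, or raises ValueError if the values are inconsistent.
    """
    gw = ipaddress.IPv4Address(gateway)
    # strict=False derives the network address from the gateway itself.
    net = ipaddress.IPv4Network(f"{gateway}/{netmask}", strict=False)
    first_host = net.network_address + 1
    last_host = net.broadcast_address - 1
    start = ipaddress.IPv4Address(pool_start)
    end = ipaddress.IPv4Address(pool_end)

    if not (first_host <= start <= end <= last_host):
        raise ValueError(f"IP pool {start}-{end} must be an ordered range inside {net}")
    if start <= gw <= end:
        raise ValueError("the gateway address must not be part of the IP pool")
    return net


# Hypothetical values for the back-end vApp Network.
print(validate_vapp_network("192.168.100.1", "255.255.255.0",
                            "192.168.100.10", "192.168.100.50"))  # 192.168.100.0/24
```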
<p style="text-align: justify;">Let&#8217;s pause for a second here (too many screenshots to  digest).</p>
<p style="text-align: justify;">Don&#8217;t be fooled by the screenshots: what we are trying to do is create a logical layout like the one depicted below:</p>
<p style="text-align: justify;"><img src="http://www.it20.info/misc/pictures/vCD1.0NetworkingSamples15.jpg" border="0" alt="" width="864" height="651" /></p>
<p style="text-align: justify;">In a way, we are applying to this <em>vApp Network</em> the same NAT and Firewall principles that we applied to the <em>Routed Network</em> at the organization level. Where do you configure these rules for the Edge device that backs this <em>vApp Network</em>? Easy. Look at the last screenshot above and click <span style="text-decoration: underline;">Details</span>. Done.</p>
<p style="text-align: justify;">This is the tab where you configure the NAT rule so that the  DB private IP gets mapped to the Routed Network in the organization:</p>
<p style="text-align: justify;"><img src="http://www.it20.info/misc/pictures/vCD1.0NetworkingSamples16.jpg" border="0" alt="" width="627" height="651" /></p>
<p style="text-align: justify;">Below is the tab where you configure the Firewall rule to  allow DB traffic only (this rule is just an example):</p>
<p style="text-align: justify;"><img src="http://www.it20.info/misc/pictures/vCD1.0NetworkingSamples17.jpg" border="0" alt="" width="627" height="651" /></p>
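<p style="text-align: justify;">A firewall rule like the one above is easiest to reason about as an ordered list evaluated against each connection. The Python sketch below assumes first-match-wins semantics with a default deny, which is a common firewall convention but an assumption here; the 10.1.1.0/24 Org network and port 3306 (as if the back-end were MySQL) are hypothetical values, since the post does not name the database.</p>

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FirewallRule:
    action: str                 # "allow" or "deny"
    source: str                 # CIDR such as "10.1.1.0/24", or "any"
    dest_port: Optional[int]    # None matches any destination port


def evaluate(rules, src_ip: str, dst_port: int, default: str = "deny") -> str:
    """First matching rule wins; anything unmatched falls through to the default."""
    src = ipaddress.ip_address(src_ip)
    for rule in rules:
        src_ok = rule.source == "any" or src in ipaddress.ip_network(rule.source)
        port_ok = rule.dest_port is None or rule.dest_port == dst_port
        if src_ok and port_ok:
            return rule.action
    return default


# Hypothetical: the Org Routed Network is 10.1.1.0/24 and the DB listens on 3306.
db_rules = [FirewallRule("allow", "10.1.1.0/24", 3306)]
print(evaluate(db_rules, "10.1.1.20", 3306))  # allow
print(evaluate(db_rules, "10.1.1.20", 22))    # deny (SSH not permitted)
```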
<p style="text-align: justify;"><strong>Conclusions</strong></p>
<p style="text-align: justify;">Let&#8217;s now try to put all these pieces together and look at what the logical layout of the workloads running in the organization looks like as a whole:</p>
<p style="text-align: justify;"><img src="http://www.it20.info/misc/pictures/vCD1.0NetworkingSamples18.jpg" border="0" alt="" width="864" height="651" /></p>
<p style="text-align: justify;">As you can see, the self-service networking stack in vCloud Director is pretty powerful and flexible, although there are certainly things that could (and should) be done better. For example, you may argue there is a lot of NATting going on (and I would have a hard time arguing the opposite). But, as we said, this post is based on version 1.0.1 of the product and things may change in the future.</p>
<p style="text-align: justify;">Note that we haven&#8217;t covered an example of how to use the &#8220;<em>Internal Network</em>&#8221;, since it should be pretty straightforward. It&#8217;s basically a flat layer 2 network that doesn&#8217;t go anywhere and only allows the VMs attached to it to communicate with each other.</p>
<p style="text-align: justify;">I hope you found this post useful. I&#8217;d like to get your feedback.</p>
<p style="text-align: justify;">Massimo.</p>
]]></content:encoded>
			<wfw:commentRss>http://it20.info/2011/06/vcloud-director-1-0-1-networking-samples/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>The Cloud and the Sunset of the GHz-based CPU Metric</title>
		<link>http://it20.info/2011/06/the-cloud-and-the-sunset-of-the-ghz-based-cpu-metric/</link>
		<comments>http://it20.info/2011/06/the-cloud-and-the-sunset-of-the-ghz-based-cpu-metric/#comments</comments>
		<pubDate>Fri, 24 Jun 2011 08:47:14 +0000</pubDate>
		<dc:creator>Massimo</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://it20.info/?p=439</guid>
		<description><![CDATA[<p style="text-align: justify;">We have known this for years but it&#8217;s only when you get a slap in the face that you understand what&#8217;s going on for real: the GHz metric is useless these days. I was experimenting with vCloud Director the other day and I was checking out from the catalog my Turnkey Linux Core <span style="color:#777"> . . . &#8594; Read More: <a href="http://it20.info/2011/06/the-cloud-and-the-sunset-of-the-ghz-based-cpu-metric/">The Cloud and the Sunset of the GHz-based CPU Metric</a></span>]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify;">We have known this for years, but it&#8217;s only when you get a slap in the face that you understand what&#8217;s really going on: the GHz metric is useless these days. I was experimenting with vCloud Director the other day and checking out from the catalog my <a href="http://www.turnkeylinux.org/core">Turnkey Linux Core</a> virtual machine (I use it because it&#8217;s small and I can check it in and out of the catalog very quickly &#8211; it&#8217;s also a very nice distro!). This instance was launched in a cloud PoC I have recently started working on for a big SP, and I noticed it took quite some time to boot, noticeably longer than the 40-60 seconds it usually takes. Similarly, the user experience once booted was not as good as what I am used to. While I haven&#8217;t done any serious analysis of the problem, I am going to take a stab at what I believe was happening behind the scenes.</p>
<p style="text-align: justify;">A little background first. This service provider opted to use some fairly old IBM x86 servers to run this PoC. Since the PoC, for the moment, is focusing on functionality &#8211; rather than performance and scaling &#8211; we thought it was OK to use these servers. For the record, they are IBM System x 3850 (8863-Z1S): 4-socket servers with single-core 3.66GHz CPUs. Admittedly, pretty old kit. This is how they show up in vCenter:</p>
<p><img src="http://www.it20.info/misc/pictures/TheCloudandtheSunsetoftheGHz-basedCPUMetric1.jpg" border="0" alt="" width="361" height="269" /></p>
<p style="text-align: justify;">This is technology from 2004/2005 if memory serves me well. Consider that, while they &lt;should&gt; be 64-bit servers (I&#8217;d need to double check &#8211; can&#8217;t bother), they certainly do not have the CPU virtualization extensions &#8211; required in the latest vSphere releases &#8211; to support 64-bit guest OS&#8217;es. We found this out at the beginning, when we tried to instantiate a VM of that class. They have been working fine anyway and serve our testing needs pretty well.</p>
<p style="text-align: justify;">Back to the performance issue I was describing. You should know that when vCloud Director assigns a vDC using the PAYG model to an Organization, it sets a certain &#8220;value&#8221; for the vCPU. You can think about this value &#8211; roughly and conceptually &#8211; as something similar to the AWS ECU (<a href="http://en.wikipedia.org/wiki/Amazon_Elastic_Compute_Cloud#Elastic_compute_units">Elastic Compute Unit</a>). This is a good thing because it provides a mechanism for the cloud administrator to <em>normalize</em> the capacity of a vCPU. It also gives the provider a way to cap the workload (as you probably don&#8217;t want a single consumer to hog an entire core). For the record, vCD can also reserve part of that &#8220;speed&#8221; for the VM so that it can guarantee these reserved resources are always available. The picture below shows the screen where you set this value when creating an Organization vDC (these are all default values).</p>
<p><img src="http://www.it20.info/misc/pictures/TheCloudandtheSunsetoftheGHz-basedCPUMetric2.jpg" border="0" alt="" width="857" height="645" /></p>
<p style="text-align: justify;">Note that the default &#8220;speed&#8221; value for a vCPU in the PAYG model is 0.26GHz (or 260MHz if you will). This means that, when you deploy a VM in this vDC, vCloud Director configures a limit with that value on the vCPU. I am not sure how Amazon enforces the ECU on their infrastructure (or if they enforce it at all), but this is how vCD and vSphere cooperatively do it:</p>
<p><img src="http://www.it20.info/misc/pictures/TheCloudandtheSunsetoftheGHz-basedCPUMetric3.jpg" border="0" alt="" width="697" height="467" /></p>
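<p style="text-align: justify;">To make the relationship between these settings explicit, here is a small Python sketch. It assumes the VM limit is the number of vCPUs multiplied by the vCPU speed and that the reservation is the guaranteed percentage of that limit; the function name and the percentages are illustrative, so check the vCloud Director documentation for the exact behavior of your release.</p>

```python
from typing import Tuple


def payg_cpu_settings(num_vcpus: int, vcpu_speed_ghz: float,
                      guaranteed_pct: float) -> Tuple[int, int]:
    """Return (limit_mhz, reservation_mhz) for a VM in a PAYG Org vDC.

    Assumed model: the limit is vCPUs x vCPU speed, and the reservation is the
    guaranteed percentage of that limit.
    """
    limit_mhz = round(num_vcpus * vcpu_speed_ghz * 1000)
    reservation_mhz = round(limit_mhz * guaranteed_pct / 100)
    return limit_mhz, reservation_mhz


print(payg_cpu_settings(1, 0.26, 0))   # (260, 0)
print(payg_cpu_settings(2, 0.26, 50))  # (520, 260)
```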
<p style="text-align: justify;">To the point now. Everybody knows that x86 boxes have scaled CPU capacity exponentially over the last few years. Today, a latest-generation 4-socket server can have a ridiculous number of cores (up to 80 hardware threads). That&#8217;s one dimension of the scalability Intel and AMD have achieved. Another dimension is that the core itself has gone through some very profound technology enhancements and has gotten better and better. Let&#8217;s do some math and find out how much better.</p>
<p style="text-align: justify;">I am not going to do a scientific comparison (I wish I had the time). Instead, I am going to quickly leverage a couple of benchmarks to estimate the difference in efficiency between the old and the new cores. I am going to use the TPC-C benchmark, which is a simulated OLTP workload. It may not always be relevant, but it&#8217;s known to be CPU bound (although it does require a couple of hundred thousand disk spindles to avoid bottlenecking on the disk subsystem, which means: don&#8217;t bother trying it at home). Long story short, I took a TPC-C result for an IBM server equipped with the same CPUs we are using in this cloud PoC and compared it to a result for one of the IBM servers that support the latest generation of Intel Xeon processors:</p>
<p style="text-align: justify;"><a href="http://www.tpc.org/tpcc/results/tpcc_result_detail.asp?id=105042102">Old Benchmark</a>: 150,000 tpmC (4 sockets, 4 cores, 3.66GHz)</p>
<p style="text-align: justify;"><a href="http://www.tpc.org/tpcc/results/tpcc_result_detail.asp?id=110111601">New Benchmark</a>: 2,300,000 tpmC (4 sockets, 32 cores, 2.26GHz)</p>
<p style="text-align: justify;">We are not interested in the metric (tpmC = transactions  per minute C-workload) in absolute terms because we are using this metric just to  compare the CPUs. So for the two systems the math would (more or less) look like this:</p>
<ul style="text-align: justify;">
<li>Old server: 150K transactions per minute on 4 cores makes roughly 37.5K transactions per minute per 3.66GHz core, which means roughly <strong><span style="text-decoration: underline;">10 transactions per MHz</span></strong></li>
<li>New server: 2.3M transactions per minute on 32 cores makes roughly 72K transactions per minute per 2.26GHz core, which means roughly <strong><span style="text-decoration: underline;">32 transactions per MHz</span></strong></li>
</ul>
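<p style="text-align: justify;">The per-MHz arithmetic above is easy to reproduce. This Python sketch uses only the figures quoted from the two TPC-C results and divides throughput by cores and by MHz; like the text, it ignores memory, I/O and every other difference between the two systems.</p>

```python
# Figures from the two TPC-C results quoted above (tpmC = transactions per minute).
systems = {
    "old": {"tpmc": 150_000, "cores": 4, "mhz": 3660},
    "new": {"tpmc": 2_300_000, "cores": 32, "mhz": 2260},
}


def tpmc_per_mhz(tpmc: float, cores: int, mhz: float) -> float:
    """Throughput delivered by one MHz of a single core."""
    return tpmc / cores / mhz


per_mhz = {name: tpmc_per_mhz(**spec) for name, spec in systems.items()}
ratio = per_mhz["new"] / per_mhz["old"]

for name, value in per_mhz.items():
    print(f"{name}: {value:.1f} tpmC per core-MHz")  # old: 10.2, new: 31.8
print(f"one new-core MHz ~ {ratio:.1f} old-core MHz")  # 3.1
```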
<p style="text-align: justify;">I didn&#8217;t have time to triangulate with more benchmarks, so I will stick with this one and claim that a single MHz of a new core is worth about three MHz of the old core we are using in the PoC.</p>
<p style="text-align: justify;">Now I guess you have an idea why talking about MHz is meaningless at this point. I guess you also see why assigning &#8220;<em>260MHz</em>&#8221; to the vCPU tells only half of the story (the other half being: OK, but 260MHz of which core?). Yet there are still a lot of people out there who think that a 3GHz processor is faster than a 2.26GHz processor. I believe you also have an idea now why Amazon and VMware introduced these different metrics: they are basically a way for the provider of resources to normalize the actual capacity of the underlying CPUs and overcome the variance we have seen above. My initial performance problem was in fact solved by raising the value of the &#8220;vCPU speed&#8221; in vCloud Director: I assigned more GHz to the vCPU to offset the lower per-MHz efficiency of the core.</p>
<p style="text-align: justify;">Let me change gear here. What we have discussed so far is fine when you are dealing with VMs, since you can easily use a technique to buffer this variance (the &#8220;vCPU speed&#8221; or the Amazon &#8220;ECU&#8221;). However, this becomes a little trickier when you start dealing with <em>virtual data center</em> capacity. How do you normalize that? The easiest (and most user-friendly) way is to expose the capacity directly in GHz, which is what vCloud Director does today when configuring Organization vDCs in <em>reservation</em> or <em>allocation</em> mode.</p>
<p><img src="http://www.it20.info/misc/pictures/TheCloudandtheSunsetoftheGHz-basedCPUMetric4.jpg" border="0" alt="" width="691" height="451" /></p>
<p style="text-align: justify;">So what do we do? We all agree that 10GHz is no longer meaningful, but what is the alternative? You may argue that in a cloud environment you shouldn&#8217;t bother about low-level hardware implementation details because the whole purpose of the cloud is to hide them, right? On the other hand, we are talking about an IaaS type of cloud here, so a much higher-level metric such as &#8220;application response time&#8221; wouldn&#8217;t be applicable: vCloud Director doesn&#8217;t really manage the middleware and application part of the stack; that is out of its control.</p>
<p style="text-align: justify;">GHz may sound like the right thing to expose when you are providing virtual hardware capacity in an IaaS cloud, but the metric would need to be consistent across different providers (and we have seen this may not be the case if different providers are using different hardware technologies). An option would be to normalize this value similarly to how the CPU in the VM gets normalized. Sure, but how? With which metric? In the VM-based model you can expose a very well known metric / object: the vCPU. In that case you can pass on to the consumer the key to decrypt the amount of compute capacity of that object, <a href="http://aws.amazon.com/ec2/instance-types/">similarly to how Amazon does it with ECU</a>: &#8220;<em>One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor. This is also the equivalent to an early-2006 1.7 GHz Xeon processor</em>&#8221;. The keyword here is &#8220;equivalent&#8221;. This means you can use any CPU technology you want; someone will tweak a parameter so that your performance experience will always be the same. Which is what I have done above to fix my performance problem in the PoC.</p>
<p style="text-align: justify;">The vCloud Director challenge is slightly different from the Amazon challenge, though, in the sense that vCD is a technology that enables service providers to stand up cloud backbones quickly and efficiently&#8230; whereas Amazon is itself a Service Provider. So coordination and standardization aren&#8217;t an issue for them (as a matter of fact, ECU doesn&#8217;t really have any industry-wide meaning outside the context of the AWS platform).</p>
<p style="text-align: justify;">Similarly to how the industry is looking for standard APIs to consume cloud resources across different providers, I believe there is a need to standardize the metrics that describe the capacity those resources can deliver. Can this metric be an industry-standard benchmark like TPC-C (for example)? Or should it be more like a brand new synthetic value that combines a number of benchmarks covering a wider range of workload patterns? Or should it just be a normalized GHz number? Ironically enough, this problem can only be worse in a PaaS context, because PaaS is supposed to play at a level of the stack where the hardware infrastructure is completely hidden (and exposing GHz wouldn&#8217;t make any sense at all). However, PaaS doesn&#8217;t extend its reach to a level where higher-level metrics (such as application response time) can be used, because the application code falls under the consumer&#8217;s responsibility and not the PaaS provider&#8217;s. Which means you could have a PaaS layer that &#8220;screams&#8221; but an application layered on top of it that is a piece of junk (performance-wise).</p>
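<p style="text-align: justify;">One way to picture the normalization idea is a per-pool efficiency table that converts raw GHz into a reference unit and back. The unit definition, the pool names and the table are hypothetical; the only number taken from this post is the rough 3.1 ratio from the TPC-C comparison, and a real provider would need a more robust calibration.</p>

```python
# Hypothetical normalized unit: 1 unit = the work done by 1 GHz of the new core.
# Per-MHz efficiency of each hardware pool relative to that reference, taken
# from the rough TPC-C comparison above (one new-core MHz ~ 3.1 old-core MHz).
POOL_EFFICIENCY = {
    "new-xeon": 1.0,
    "old-xeon": 1 / 3.1,
}


def units_from_ghz(ghz: float, pool: str) -> float:
    """Normalized units that a raw GHz allocation on a given pool is worth."""
    return ghz * POOL_EFFICIENCY[pool]


def ghz_for_units(units: float, pool: str) -> float:
    """Raw GHz a provider must allocate on a pool to deliver the given units."""
    return units / POOL_EFFICIENCY[pool]


# The same "10GHz" vDC is worth very different amounts of work:
print(f"{units_from_ghz(10, 'new-xeon'):.1f} units")  # 10.0 units
print(f"{units_from_ghz(10, 'old-xeon'):.1f} units")  # 3.2 units
# ...and 10 units cost about 31GHz on the old hardware.
print(f"{ghz_for_units(10, 'old-xeon'):.1f} GHz")     # 31.0 GHz
```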
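<p style="text-align: justify;">If the answer were a synthetic value combining several benchmarks, one established way to combine per-workload ratios is the geometric mean (the approach SPEC uses for its composite scores). The sketch below shows only the aggregation step; the workload names and ratios are placeholders, not measured data.</p>

```python
import math
from typing import Dict


def composite_score(ratios: Dict[str, float]) -> float:
    """Combine per-workload performance ratios (system / reference) into one number.

    Uses the geometric mean: a 2x gain on one workload and a 2x loss on another
    cancel out, and comparisons between two systems do not depend on which
    machine was picked as the reference.
    """
    if not ratios:
        raise ValueError("need at least one benchmark ratio")
    if any(r <= 0 for r in ratios.values()):
        raise ValueError("ratios must be positive")
    return math.exp(sum(math.log(r) for r in ratios.values()) / len(ratios))


# Placeholder workload names and ratios, not measured data.
print(round(composite_score({"oltp": 2.0, "web": 8.0}), 6))    # 4.0
print(round(composite_score({"oltp": 2.0, "batch": 0.5}), 6))  # 1.0
```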
<p style="text-align: justify;">I&#8217;d also like to point out that you may consider having end-to-end governance of your entire stack, where you monitor high-level metrics for the services / applications and let the &#8220;governance system&#8221; deal with the monitoring and capacity planning of the virtual hardware and all the other layers above it. While I admit this would be a desirable state, we are not quite there at the moment.</p>
<p style="text-align: justify;">However, if you think about the separation of duties and roles that this multi-layer cloud stack brings in &#8211; a stack made of many different service interfaces, each of which has a provider and a consumer &#8211; we also need to make sure each of these interfaces can be measured consistently. In other words, it may not be a human being dealing with the measurement of these IaaS metrics; it may be a &#8220;governance system&#8221; that does that automatically on the human being&#8217;s behalf. But we still need to instrument these interfaces so that the &#8220;governance system&#8221; can deal with them.</p>
<p style="text-align: justify;">Imagine, for example, a situation where a SaaS provider is the consumer of external IaaS resources: end-to-end monitoring becomes difficult to achieve, hence the need to create more detailed SLA metrics between the various layers of the stack and their interfaces. In this specific example, how is the SaaS provider subscribing to IaaS resources? How are these virtual hardware resources going to be measured, monitored and enforced from a performance perspective when the two parties are separate entities with separate duties? How do you define those boundaries from an SLA perspective? That&#8217;s what we are debating here.</p>
<p style="text-align: justify;">To GHz or not to GHz? That is the problem. All in all, GHz-based capacity planning and monitoring is dead. However, we still seem to be flooded with IT tools that leverage it pretty heavily.</p>
<p style="text-align: justify;">I&#8217;d like to hear what you think and if you have any opinion  on how to address this problem.</p>
<p style="text-align: justify;">Massimo.</p>
]]></content:encoded>
			<wfw:commentRss>http://it20.info/2011/06/the-cloud-and-the-sunset-of-the-ghz-based-cpu-metric/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>The Italian Elections and the Case for Cloudburst</title>
		<link>http://it20.info/2011/06/the-italian-elections-and-the-case-for-cloudburst/</link>
		<comments>http://it20.info/2011/06/the-italian-elections-and-the-case-for-cloudburst/#comments</comments>
		<pubDate>Thu, 02 Jun 2011 12:11:32 +0000</pubDate>
		<dc:creator>Massimo</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://it20.info/?p=428</guid>
		<description><![CDATA[<p style="text-align: justify;">A few days ago we had a big election day in Italy for renewing a good part of the public local administration. In and of itself this wasn&#8217;t a big deal and something that wouldn&#8217;t have generated a lot of attention among the 60M people living here. However, without getting into a lot <span style="color:#777"> . . . &#8594; Read More: <a href="http://it20.info/2011/06/the-italian-elections-and-the-case-for-cloudburst/">The Italian Elections and the Case for Cloudburst</a></span>]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify;">A few days ago we had a big election day in Italy for renewing a good part of the local public administration. In and of itself this wasn&#8217;t a big deal, and it wouldn&#8217;t have generated a lot of attention among the 60M people living here. However, without getting into a lot of details, suffice it to say that this turned into yet another &#8220;do you like Mr. Berlusconi? Yes or No?&#8221; type of referendum. And this of course generated a lot of curiosity around the results. So what do most people working in an office do at 3PM, when the polls close? They connect to their favorite on-line news web sites and have a look at the exit poll statistics. And I am no different: killed by curiosity, I opened another tab in my browser and quickly pointed it to &#8220;<a href="http://corriere.it/">http://corriere.it</a>&#8221;, the internet home of the most important Italian newspaper: <em>Il Corriere della Sera</em>. What does this have to do with cloudburst, you may be wondering? Well, it does have to do with cloudburst because, after waiting a minute or so on &#8220;<em>waiting for corriere.it</em>&#8221;, this is what I got (the red emphasis and sketched question mark are mine):</p>
<p style="text-align: justify;"><img src="http://www.it20.info/misc/pictures/TheItalianElectionsandtheCaseforCloudburst1.jpg" border="0" alt="" width="845" height="401" /></p>
<p style="text-align: justify;">How disappointing! By the way, what I experienced personally is nothing new. It happens very often, every time &#8220;something interesting&#8221; happens. <a href="http://www.datacenterknowledge.com/archives/2009/06/25/michael-jackson-news-slows-web-sites/">Look at what happened when Michael Jackson passed away</a>, for example.</p>
<p style="text-align: justify;">Now, I know that there has been a lot of <a href="http://www.rationalsurvivability.com/blog/?p=3016">bashing regarding the concept of being able to &#8220;cloudburst&#8221; into the cloud</a>. I believe this is due to the fact that people tend to jump to extreme use cases. If you associate the term cloudburst with what it actually means in the non-IT world, then yes, you may think about a particular use case where your IT infrastructure needs to react in nano-seconds to an unanticipated, and somewhat catastrophic, event that may last 30 seconds or so. From <a href="http://en.wikipedia.org/wiki/Cloudburst">Wikipedia</a>:</p>
<p style="text-align: justify;"><em>&#8220;Cloudbursts descend from very high clouds, sometimes with tops above 15 kilometers&#8230;Meteorologists say the rain from a cloudburst is usually of the shower type with a fall rate equal to or greater than 100mm (3.94 inches) per hour&#8230; During a cloudburst, more than 2 cm of rain may fall in a few minutes. When there are instances of cloudbursts, the results can be disastrous.&#8221; </em></p>
<p style="text-align: justify;">In a world where people think that the Facebook and Google infrastructures are the norm, it doesn&#8217;t surprise me that someone may also think that that&#8217;s the only cloudburst scenario (and no, I am not referring to Chris Hoff here; in fact, I think he is one of the most realistic people I follow on Twitter).</p>
<p style="text-align: justify;">With the &#8220;Google and Facebook are the norm&#8221; mind-set, of course you are mentally led to think that an IT cloudburst is a fully automated process where your application can immediately react upon a sudden spike of connections and it goes out, automagically, deploying on-the-fly new instances of the application, modifying load balancer configurations to grab new traffic immediately. Oh, and of course it wouldn&#8217;t be a &#8220;real&#8221; IT cloudburst if, after a few sub-seconds of idle time, all of these extra resources are decommissioned, automagically, and everything returns back to normal. Yeah, dream on. I bet you are saying it&#8217;s &#8220;marketing bollocks&#8221;! Of course it is!</p>
<p style="text-align: justify;">So, if that is the meaning you associate with the term, I agree that all this IT cloudburst talk in the industry is, most of the time, just a bunch of marketing stuff (for the moment at least). I am certainly not selling the idea that your site could go from 300 front-end servers to 1,400 in a matter of milliseconds and contract back to 300 after an 18-second surge. All automagically. I am not that stupid.</p>
<p style="text-align: justify;">My concept of cloudburst is a little bit different. And practical. I am a simple guy and a pragmatic person. I don&#8217;t want to re-architect applications for failure or create chaos-monkey programs that kill production workloads to test their resiliency. Call me an old-school IT boy, but I just want to be able to see the exit polls on <a href="http://corriere.it/">http://corriere.it</a> next time.</p>
<p style="text-align: justify;">There are a couple of concepts regarding cloudburst that we should all consider: the reaction time, and whether the triggering event can be anticipated or not. The extreme example above obviously assumes a near real-time reaction triggered by an unexpected event. What I have in mind is a much longer lead time that allows you to scale the resources associated with an event you can anticipate. Something like&#8230; an <em>election day</em>, for example (or a scheduled marketing campaign for that matter, or anything that comes to mind with those two characteristics)!</p>
<p style="text-align: justify;">No, we are not talking about doing this with &#8220;a spacecraft equipped with a warp drive that may travel at velocities greater than that of light by many orders of magnitude, while circumventing the relativistic problem of time dilation&#8221;. We are not talking about micro-second response times. We are not even talking about a 4M$ titanic orchestrator product (that happens to come with an 8M$ professional services bill to implement it). I am talking about deploying (yes, even manually! How old school am I?) a few additional web servers in the cloud to cope with the anticipated demand so that I can look at my damn exit polls!</p>
<p style="text-align: justify;">Let me state this very clearly. I have no idea about the back-end infrastructure that <em>Il Corriere della Sera</em> is running. The only thing I know is that they are running Apache on Linux, according to <a href="http://uptime.netcraft.com/up/graph/?host=www.corriere.it">netcraft.com</a>. I can also imagine that they are using a traditional SQL back-end database (SQL Server? Oracle? MySQL? DB2? Who knows!?).</p>
<p style="text-align: justify;">I am not even sure whether they are a VMware customer (but one can always hope). Let me speculate they have 6 load-balanced web servers (just off the top of my head). They could be 3 or 9; it doesn&#8217;t make any difference when they are exhausted anyway! Whether they are virtual or physical I don&#8217;t care; does it make a difference in the end? I don&#8217;t think so. An exhausted set of virtual resources produces the same result as an exhausted set of physical resources: a browser error! Last but not least, I&#8217;d speculate that, given the nature of the on-line service they offer, the presentation tier (web, along with any application logic that goes with it) is the bottleneck rather than the back-end data repository or the network. So let&#8217;s say their deployment looks similar to this:</p>
<p><img src="http://www.it20.info/misc/pictures/TheItalianElectionsandtheCaseforCloudburst2.jpg" border="0" alt="" width="535" height="597" /></p>
<p style="text-align: justify;">So, if my assumptions are correct (yet to be demonstrated), why not <em>&#8220;cloudburst&#8221;</em>? Not in the extreme Star Trek sense above, but rather in a more pragmatic sense, where you could sign up with a public IaaS on-line service provider to get access to Pay-As-You-Go resources and double your front-end capacity just before you need it. Ideally this cloudburst should happen in a public cloud, because the whole idea of this extra spike is that it is going to use resources that you don&#8217;t have in house. Why not? Well, because most customers (if not all) out there are neither Google nor Facebook, and they <em>may not</em> have 40 spare servers in house at any point in time for capacity overflow! These customers may not have a critical mass of resources in house to deal with these peaks? You are not Google? You are no one! Come on, wake up folks! Life in real data centers is not as much fun as at Google and Facebook.</p>
<p style="text-align: justify;">What I have in mind to make <a href="http://corriere.it/">http://corriere.it</a> more scalable is actually fairly simple and doesn&#8217;t require them to turn into a new Google. While the scenario below could easily be implemented using different technologies (for example, an on-premise physical deployment extended to AWS virtual servers), I am going to describe, at a high level, what you&#8217;d need to do if you were a VMware customer with an on-premise vSphere deployment extending to a vCloud public Service Provider. I am describing this stack simply because I am more familiar with it:</p>
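<p style="text-align: justify;">Deciding how many extra front-end servers to rent is simple arithmetic once you have estimates. The Python sketch below assumes you know (or can measure) the requests per second one web server sustains and the expected peak; all figures, including the 6 on-premise servers speculated above and the 20% headroom, are hypothetical.</p>

```python
import math


def extra_servers_needed(peak_rps: float, rps_per_server: float,
                         current_servers: int, headroom: float = 0.2) -> int:
    """Cloud instances to add so the web tier can absorb peak_rps plus headroom.

    peak_rps and rps_per_server are estimates the site owner would have to
    measure; the 20% headroom default is an arbitrary illustrative choice.
    """
    required = math.ceil(peak_rps * (1 + headroom) / rps_per_server)
    return max(0, required - current_servers)


# Hypothetical: 6 on-premise servers at 500 req/s each, election peak of 6,000 req/s.
print(extra_servers_needed(6000, 500, current_servers=6, headroom=0))  # 6
print(extra_servers_needed(6000, 500, current_servers=6))              # 9
```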
<ul style="text-align: justify;">
<li>A few days before &#8220;the event&#8221;, you subscribe with a vCloud Service Provider. This entitles you to access public resources with a PAYG model. <a href="../2011/02/2011/01/my-cloud-consumer-experience-episode-1-the-on-boarding/">It shouldn&#8217;t take more than 104 minutes</a>.</li>
<li>Depending on the nature of the connections and related security you need, you can ask the provider to set up a VPN between your remote virtual data center and your own on-premise vSphere deployment. <a href="../2011/03/vshield-products-packaging-explained-with-a-focus-on-vcloud-director/">For more background on how this could be done you can read this</a>. This may be required if the web/application servers need to connect to a back-end database that is not reachable from outside the organization firewall (very likely).</li>
<li>You can then deploy new web/application server instances from vCenter; this may be as easy as cloning an existing web/application instance, or it may require slightly more manual work if you start from a generic OS template. This isn&#8217;t very different from what you&#8217;d need to do if you were to deploy a new on-premise instance of the same web/application server.</li>
<li>When you are done with those additional virtual server deployments, you can either <a href="../2011/02/2011/01/my-cloud-consumer-experience-%E2%80%93-episode-3-moving-vsphere-workloads-into-the-cloud/">manually export/import these instances from the on-premise vSphere deployment into your remote virtual data center</a>, or you can use <a href="../2011/02/my-cloud-consumer-experience-%E2%80%93-episode-4-managing-workloads-with-vcloud-connector/">vCloud Connector to move these workloads if you are a GUI aficionado</a>. Moving stuff between compatible infrastructures is obviously a huge plus here, as it simplifies a lot of the work. This may not be the case if your source is a physical environment and/or your target is AWS.</li>
<li>Last but not least you need to reconfigure the load balancer to include these new front-end instances to make them part of the <a href="http://corriere.it/">http://corriere.it</a> site.</li>
<li>When &#8220;the event&#8221; is over and the traffic is noticeably back to normal, you can reconfigure the load balancer and decommission these additional front-end instances in your remote virtual data center in the public cloud, to avoid incurring additional charges under the PAYG model.</li>
</ul>
<p style="text-align: justify;">This is not rocket science. You can even optimize the process above a little so that, if you have 3 or 4 or 7 of these &#8220;events&#8221; in a year, you can commission and decommission these workloads more efficiently. This can be done either manually or <a href="http://blogs.vmware.com/vcloud/2011/04/vmware-hosting-with-php-and-the-vcloud-api.html">with a little bit of simple scripting</a>, still without having to pay a 4M$ tax for a &#8220;cloud&#8221; orchestrator (i.e. shooting a fly with a bazooka). Same thing for the load balancing. No, I am not talking about a &#8220;global super fancy&#8221; load balancer that can balance workloads across sites with built-in DR algorithms, locality optimizations and the like. I am talking about the same load balancer you already use, which now also points to the remote web servers via the established VPN tunnel. Sexy? Not at all, but remember we are not talking about optimizing the data center and/or the network. We are just trying to fix a problem, which is server capacity exhaustion. If your problem is network related (latency and bandwidth), then you may want to do something else (perhaps use a global super fancy load balancer, why not?). So what I am suggesting is to extend the infrastructure like this:</p>
<p style="text-align: justify;"><img src="http://www.it20.info/misc/pictures/TheItalianElectionsandtheCaseforCloudburst3.jpg" border="0" alt="" width="1080" height="597" /></p>
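<p style="text-align: justify;">The load balancer part of the workflow described earlier is conceptually simple. The following Python sketch models it under explicit assumptions: a plain round-robin pool, implemented by a hypothetical <code>BackendPool</code> class (not the API of any real load balancer), to which the remote cloud front-ends are added for the duration of the event and from which they are removed afterwards. All host names are made up.</p>

```python
from itertools import cycle


class BackendPool:
    """Hypothetical round-robin pool; real load balancers expose their own APIs."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._rr = cycle(list(self.backends))

    def _reset(self):
        # Restart the round-robin rotation whenever pool membership changes.
        self._rr = cycle(list(self.backends))

    def add(self, backend):
        if backend not in self.backends:
            self.backends.append(backend)
            self._reset()

    def remove(self, backend):
        if backend in self.backends:
            self.backends.remove(backend)
            self._reset()

    def next_backend(self):
        return next(self._rr)


# On-premise front-ends (illustrative names).
pool = BackendPool(["web1.local", "web2.local"])

# "The event" starts: add front-ends running in the remote virtual data center,
# reachable through the VPN tunnel.
for host in ["web1.cloud", "web2.cloud"]:
    pool.add(host)
print([pool.next_backend() for _ in range(4)])
# ['web1.local', 'web2.local', 'web1.cloud', 'web2.cloud']

# Traffic is back to normal: remove the remote front-ends to stop PAYG charges.
for host in ["web1.cloud", "web2.cloud"]:
    pool.remove(host)
print(pool.backends)  # ['web1.local', 'web2.local']
```

<p style="text-align: justify;">Removing a backend here is immediate; a real load balancer would typically drain existing connections before taking a server out of rotation.</p>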
<p style="text-align: justify;">Note that so far we have been talking about events that can be anticipated and that give the <em>Il Corriere della Sera</em> IT administrators the lead time required to provision new cloud resources. The same concept could apply to unanticipated events, provided that &#8220;the event&#8221; lasts long enough to make the provisioning worth it. If an unanticipated big event generates an insane amount of traffic for a few days (and you have a clean, structured, semi-automated provisioning methodology), you may consider cloudbursting. If, on the other hand, an unanticipated event generates a lot of traffic for about 36 hours (and your provisioning is going to be manual, with some lengthy customization work), it may not make a lot of sense to cloudburst. Another picture to fix the concept in your mind (hopefully).</p>
<p><img src="http://www.it20.info/misc/pictures/TheItalianElectionsandtheCaseforCloudburst4.jpg" border="0" alt="" width="1101" height="516" /></p>
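<p style="text-align: justify;">The rule of thumb in the previous paragraph boils down to one ratio: the fraction of the event during which the extra capacity is actually online. Below is a minimal Python sketch of that reasoning; the 50% threshold and the provisioning times (about 4 hours semi-automated, about 24 hours manual) are hypothetical numbers chosen for illustration, not measurements.</p>

```python
def useful_fraction(event_hours, provisioning_hours):
    """Fraction of the event during which the burst capacity is actually online."""
    if event_hours <= 0:
        raise ValueError("event_hours must be positive")
    return max(0.0, event_hours - provisioning_hours) / event_hours


def worth_cloudbursting(event_hours, provisioning_hours, threshold=0.5):
    # threshold is an arbitrary, hypothetical policy knob: burst only if the
    # extra capacity is available for at least this fraction of the event.
    return useful_fraction(event_hours, provisioning_hours) >= threshold


# A multi-day event with semi-automated provisioning (about 4 hours).
print(worth_cloudbursting(event_hours=72, provisioning_hours=4))   # True
# A 36-hour event with manual provisioning and customization (about 24 hours).
print(worth_cloudbursting(event_hours=36, provisioning_hours=24))  # False
```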
<p style="text-align: justify;">Is this the sexy, sub-second type of IT cloudburst most Google-minded people think about? Probably not! But, hell, at least I will be able to see the exit polls next time. This is, in fact, a relatively low investment of money and time that can (potentially) produce a significantly better online service. But if you are in the business of over-engineering things, you may disagree with my point of view.</p>
<p style="text-align: justify;">My next action item is to go and talk to <em>Il Corriere della Sera</em> to see if this makes any sense to them, which is the only thing that matters at this point!</p>
<p style="text-align: justify;">Massimo.</p>
]]></content:encoded>
			<wfw:commentRss>http://it20.info/2011/06/the-italian-elections-and-the-case-for-cloudburst/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>TCP-clouds, UDP-clouds, &#8220;design for fail&#8221; and AWS</title>
		<link>http://it20.info/2011/04/tcp-clouds-udp-clouds-design-for-fail-and-aws/</link>
		<comments>http://it20.info/2011/04/tcp-clouds-udp-clouds-design-for-fail-and-aws/#comments</comments>
		<pubDate>Wed, 27 Apr 2011 14:05:21 +0000</pubDate>
		<dc:creator>Massimo</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://it20.info/?p=419</guid>
		<description><![CDATA[<p style="text-align: justify;">An entire Amazon AWS Region was recently down for four days. Everyone has got to blog something about it and this is my attempt. Just as a warning: this post may be highly controversial.</p> <p style="text-align: justify;">There has been a litany of tweets pontificating about how applications on AWS should be deployed in a <span style="color:#777"> . . . &#8594; Read More: <a href="http://it20.info/2011/04/tcp-clouds-udp-clouds-design-for-fail-and-aws/">TCP-clouds, UDP-clouds, &#8220;design for fail&#8221; and AWS</a></span>]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify;">An entire Amazon AWS Region <a href="http://www.theregister.co.uk/2011/04/21/amazon_web_services_outages_spans_zones/">was  recently down for  four days</a>. Everyone has got to  blog something about it and this is my attempt. Just as a warning: this post may be  highly controversial.</p>
<p style="text-align: justify;">There has been a litany of tweets pontificating about how applications on AWS should be deployed in a certain way to achieve the maximum level of availability, and how applications need to be &#8220;re-architected&#8221; to properly fit into the new <em>cloud paradigm</em>. Basically, the idea is that your application should be conceived, designed, architected, developed and deployed with failure in mind. Many call it &#8220;<em>design for fail</em>&#8221;. That is to say: software architects and developers should never assume that any given piece of the infrastructure is reliable.</p>
<p style="text-align: justify;">I beg to differ. I don&#8217;t like this idea, even though some of you will think I am a bit archaic.</p>
<p style="text-align: justify;"><a href="http://twitter.com/#%21/GeorgeReese">George Reese</a> wrote a great blog post titled <a href="http://broadcast.oreilly.com/2011/04/the-aws-outage-the-clouds-shining-moment.html">The AWS Outage: The Cloud&#8217;s Shining Moment</a> outlining the differences between the &#8220;<em>design for fail</em>&#8221; model and the &#8220;<em>traditional</em>&#8221; model. The traditional model, among other things, has high-availability and DR characteristics built right into the infrastructure, and these features are typically application-agnostic (a couple of years ago I wrote <a href="http://it20.info/misc/storagearchitecturesforvirtualization.htm">a big document on the various alternatives for HA and DR of virtual infrastructures</a>, if you are interested). George nailed the story very well: there are two different philosophies at play here. I don&#8217;t call these two models &#8220;<em>design for fail</em>&#8221; and &#8220;<em>traditional</em>&#8221; though. I call them <strong><em>UDP-clouds</em></strong> and <strong><em>TCP-clouds</em></strong> respectively. Let&#8217;s look at a summary of the characteristics of these two protocols.</p>
<p style="text-align: justify;"><img src="http://www.it20.info/misc/pictures/TCP-clouds-UDP-clouds-design-for-fail-and-AWS1.jpg" border="0" alt="" width="554" height="421" /></p>
<p style="text-align: justify;">In the context of cloud resiliency this is what that means:</p>
<p style="text-align: justify;"><img src="http://www.it20.info/misc/pictures/TCP-clouds-UDP-clouds-design-for-fail-and-AWS2.jpg" border="0" alt="" width="690" height="523" /></p>
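<p style="text-align: justify;">To make the analogy concrete, here is a toy Python simulation (an illustration only; the loss rate, seed and function names are arbitrary assumptions). Over a lossy channel, best-effort sending simply loses some messages, while a sender that retransmits until each message gets through delivers all of them at the cost of extra transmissions. In a TCP-cloud that retry logic lives in the infrastructure; in a UDP-cloud the application has to carry it.</p>

```python
import random


def send_best_effort(messages, loss_rate, rng):
    """UDP-like: each message is sent once; lost messages are simply gone."""
    return [m for m in messages if rng.random() >= loss_rate]


def send_reliable(messages, loss_rate, rng, max_attempts=100):
    """TCP-like: retransmit each message until it gets through (or give up)."""
    delivered, transmissions = [], 0
    for m in messages:
        for _ in range(max_attempts):
            transmissions += 1
            if rng.random() >= loss_rate:
                delivered.append(m)
                break
    return delivered, transmissions


rng = random.Random(42)
msgs = list(range(1000))
lossy = send_best_effort(msgs, loss_rate=0.1, rng=rng)
reliable, sent = send_reliable(msgs, loss_rate=0.1, rng=rng)
print(len(lossy), len(reliable), sent)
```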
<p style="text-align: justify;">AWS uses a UDP-cloud model because it doesn&#8217;t guarantee reliability at the infrastructure level. AWS essentially offers an efficient distributed computing platform that doesn&#8217;t have any built-in high availability services. The notion of <em>Availability Zones</em> and <em>Regions</em> is often misunderstood, since the names may imply that high availability is built into the EC2 service. That&#8217;s not the case: AWS suggests deploying in multiple Availability Zones simply to reduce the chance of concurrent failures. It&#8217;s mere statistics. In other words, if you deploy your application in a given Availability Zone, there is nothing in the AWS service that will &#8220;fail it over&#8221; to another Availability Zone (<a href="https://aws.amazon.com/rds/">RDS</a> is a vertical example that does that for MySQL, but I am talking about an application-agnostic service that does that for every application, regardless of its nature).</p>
<p style="text-align: justify;">Since I am not able at the moment to write a structured piece on this complex matter, let me write down some mixed and random thoughts, opinions and questions to try to make you think. I am giving you some food for thought. As for answers, please call me when you find them.</p>
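<p style="text-align: justify;">The &#8220;mere statistics&#8221; argument can be written down in a couple of lines. Assuming each Availability Zone is unavailable with probability <em>p</em> and that zone failures are independent, the chance that all <em>n</em> zones are down at once is <em>p</em><sup>n</sup>. The sketch below uses an illustrative p = 1%, not an AWS figure. Note that the independence assumption is exactly what an outage spanning multiple zones breaks, and that the math only helps if something (here, the application) actually fails over to the surviving zone.</p>

```python
def prob_all_down(p_failure, zones):
    """P(every zone is down at the same time), ASSUMING independent zone failures."""
    return p_failure ** zones


def availability(p_failure, zones):
    # The application survives as long as at least one zone is up
    # (and, crucially, something fails it over to that zone).
    return 1.0 - prob_all_down(p_failure, zones)


# Illustrative per-zone unavailability of 1%, not an AWS figure.
for n in (1, 2, 3):
    print(n, availability(0.01, n))
```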
<p style="text-align: justify;">
<p style="text-align: justify;"><strong>Isn&#8217;t this &#8220;design for fail&#8221; theory a step back? </strong></p>
<p style="text-align: justify;">What we have seen in the last decade was a trend where we were able to remove the complexity of <em><a href="http://en.wikipedia.org/wiki/Non-Functional_Requirements">non-functional requirements</a></em> from the traditional OS and push it down into the &#8220;virtual infrastructure&#8221; (arguably the backbone of any IaaS cloud). This is the point I was trying to get across during this <a href="../2008/09/plagiarism-did-paul-maritz-steal-my-pitch-for-the-vmworld-2008-keynote/">VMworld 2007 breakout session</a> 4 years ago. And now we are saying that we should put that logic back into the application (not even the Guest OS)? I thought the trend I have just described was quite successful and one of the many reasons for the success of virtualization deployments. Are we now questioning it? My idea is fairly simple, although I am open to being challenged: developers focus on <a href="http://en.wikipedia.org/wiki/Functional_requirement">functional requirements</a>, IT focuses on <a href="http://en.wikipedia.org/wiki/Non-Functional_Requirements">non-functional requirements</a> (which include resiliency and reliability, among other aspects). If you are interested, you can download the full deck <a href="../?attachment_id=48">here</a>. Note that I gave that presentation before joining VMware so, if you think I am biased, well, I am biased only because I bought into that school of thought long before I was on VMware&#8217;s payroll.</p>
<p style="text-align: justify;">
<p style="text-align: justify;"><strong>Excuse me? What did you say? NoSQL&#8230; to whom? </strong></p>
<p style="text-align: justify;">In his post George suggested exploring NoSQL solutions. Not a bad idea; however, other than the risk of losing transactions that he mentioned, I&#8217;d say 95% of the customers I have worked with so far would look at me strangely and ask: &#8220;what do you e x a c t l y mean by NoSQL? Is it a bad word?&#8221;. Let&#8217;s be honest, folks: this is not mainstream. If we want to create a cloud for an elite, I am fine with that. However, I am convinced that one of the key values of an IaaS infrastructure is, among others, providing a cloud-like experience (pay-as-you-go, elasticity, etc.) to traditional workloads. I am not philosophically against the idea of re-architecting applications; however, I am also convinced that, for every person thinking about writing a brand new Ruby application for a UDP-cloud leveraging NoSQL (pardon me?)&#8230; there are at least 1,000 poor sysadmins trying to figure out how to live with their traditional applications.</p>
<p style="text-align: justify;">
<p style="text-align: justify;"><strong>Can you afford a personal Chaos Monkey? </strong></p>
<p style="text-align: justify;">Some AWS customers have developed tools to test the resiliency of their applications. Do you remember the good old HA and DR plans? IT people would walk into the server room to power off servers, and eventually the entire datacenter, to simulate a failure and see if their HA and DR policies were working properly. If everything was fine, applications could survive the failure (more or less) transparently. This is what a <a href="http://www.readwriteweb.com/cloud/2010/12/chaos-monkey-how-netflix-uses.php">Chaos Monkey</a> tool does, but from a different perspective: these are software programs designed to break things randomly (on purpose) in order to see if the application itself is robust enough to survive those artificially created infrastructure issues in the cloud. In a TCP-cloud it would be the cloud provider running traditional tests to make sure the infrastructure can self-recover. In a UDP-cloud it is the developer who runs these Chaos Monkey tests to make sure the application can self-recover, since it&#8217;s been &#8220;<em>designed for fail</em>&#8221;. Now, my take is that if you are Netflix or the likes of NASA and JPMorgan (these two are just examples of big organizations &#8211; I am not even sure they are on Amazon), then you may have enough motivation and business reasons to re-architect your application for a UDP-cloud and create your own Chaos Monkey to test your &#8220;<em>design for fail</em>&#8221; deployment. Certainly <a href="http://www.readwriteweb.com/cloud/2010/12/chaos-monkey-how-netflix-uses.php">at Netflix they know what they are doing</a>, and in fact they seem not to have been impacted by this AWS outage. But if you are <a href="https://forums.aws.amazon.com/thread.jspa?threadID=65649&amp;tstart=0">these guys</a>, do you think you have the bandwidth, knowledge and time to re-architect the application and test it for failure?
That AWS forum discussion showed up during the four-day debacle, and it deserves a proper copy and paste just in case it gets lost:</p>
<p style="text-align: justify;"><em>&lt; Sorry, I could not get through in any other way. We are  a monitoring company and are monitoring hundreds of cardiac patients at home. We  were unable to see their ECG signals since 21st of April.</em></p>
<p style="text-align: justify;"><em>&gt; Man mission critical systems should never be ran in the  cloud. Just because AWS is HIPPA certified doesn&#8217;t mean it won&#8217;t go down for 48+  hours in a row.</em></p>
<p style="text-align: justify;"><em>&lt; Well, it is supposed to be reliable&#8230;Anyway, I am  begging anyone from Amazon team to contact us directly.</em></p>
<p style="text-align: justify;">This is shocking, isn&#8217;t it? Try arguing with them about NoSQL and &#8220;<em>design for fail</em>&#8221;. They probably barely understand the notion of Availability Zones and Regions. Don&#8217;t get me wrong. It&#8217;s not these people&#8217;s fault. They are not in the business of re-architecting applications with reliability in mind; they are in the business of helping their patients. Sure, you can argue that it was their fault if they failed. But the bottom line of this story is that they are not going to re-architect anything, nor write a Chaos Monkey. When they realize what happened, they will look for a TCP-cloud.</p>
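<p style="text-align: justify;">For readers who have never seen one, the core idea of a Chaos Monkey fits in a few lines: randomly terminate instances of a fleet and check whether the service still works. The Python sketch below is a toy model, not Netflix&#8217;s actual tool; the <code>Fleet</code> class, its &#8220;the service is up while at least <code>min_healthy</code> instances are up&#8221; rule and the instance names are all hypothetical.</p>

```python
import random


class Fleet:
    """Toy model of an application tier spread across instances."""

    def __init__(self, instances, min_healthy):
        self.up = set(instances)
        self.min_healthy = min_healthy

    def terminate(self, instance):
        self.up.discard(instance)

    def is_serving(self):
        # Hypothetical health rule: the service works while enough instances are up.
        return len(self.up) >= self.min_healthy


def chaos_monkey(fleet, kills, rng):
    """Kill `kills` random instances, one at a time, recording service health."""
    history = []
    for _ in range(kills):
        if not fleet.up:
            break
        victim = rng.choice(sorted(fleet.up))
        fleet.terminate(victim)
        history.append((victim, fleet.is_serving()))
    return history


fleet = Fleet([f"web-{i}" for i in range(6)], min_healthy=3)
for victim, serving in chaos_monkey(fleet, kills=4, rng=random.Random(7)):
    print(f"killed {victim}: service {'up' if serving else 'DOWN'}")
```

<p style="text-align: justify;">With six instances and a floor of three, the first three kills are absorbed and the fourth takes the service down, whichever instances the monkey picks.</p>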
<p style="text-align: justify;">
<p style="text-align: justify;"><strong>Design for fail: philosophy or necessity? </strong></p>
<p style="text-align: justify;">I hope you&#8217;ve got at least to this point because this is my  biggest struggle at the moment. The more I read about suggestions to design  applications for fail the more I miss whether these suggestions are tactical  or strategic. In other words, are you suggesting to design for fail simply because that&#8217;s the  way Amazon AWS works today (but you&#8217;d rather use an Amazon TCP-cloud if that was  available)? Or are you suggesting that, in any case, you should design an  application for fail because you are happy to deal with a UDP-cloud and  that&#8217;s how every cloud should behave? Are we saying that it&#8217;s strategically and  philosophically better to have developers deal with application high  availability and disaster tolerance because that&#8217;s what makes sense to do? Or  are we saying we need to do this because that&#8217;s the only option we have on  Amazon AWS (today) and there is no other choice? I know it may sound like a  rhetoric question but it&#8217;s actually not. Perhaps we need both models?</p>
<p style="text-align: justify;">
<p style="text-align: justify;"><strong>You don&#8217;t like the noise coming from the other apartments?  Buy the entire building! </strong></p>
<p style="text-align: justify;">This isn&#8217;t related to the outage and the resiliency of the  cloud but it relates to the  overall TCP-cloud Vs UDP-cloud discussion. Similar to the &#8220;<em>design for fail</em>&#8221;  there is the &#8220;<em>deploy for performance</em>&#8221; thread going on. In a multi-tenant environment (a  must-have to achieve economy of scale and elasticity) there is obviously  contention of resources. In an ideal world I&#8217;d like to be able to buy virtual  capacity for what I need and have a certain level of guarantee that that  capacity (or at least a contracted part of it) is always available for me. There  are of course circumstances where I can trade-off performance and availability  of capacity for a lower cost, but there are other situations where I cannot trade  that off. A TCP-cloud should (ideally) be able to deliver that guarantee. A UDP-cloud works  in best-effort mode and typically leverages statistical law to fight contention.  This is the statistical assumption: <em> not all users running on a shared infrastructure will be pushing like hell at  the same time (one would hope &#8211; finger  crossed)</em>.</p>
<p style="text-align: justify;">So what do you have to do if you are running on a UDP-cloud? <a href="http://perfcap.blogspot.com/2011/03/understanding-and-using-amazon-ebs.html"> You keep the other people out of your garden</a>.</p>
<p style="text-align: justify;">I think <a href="http://twitter.com/adrianco">Adrian</a> is a genius, but I don&#8217;t agree with his point of view:</p>
<p style="text-align: justify;"><em>&#8220;&#8230;you cannot control who you are sharing with and some  of the time you will be impacted by the other tenants, increasing variance  within each EC2 instance. You can minimize the variance by running on the  biggest instance type, e.g. m1.xlarge, or m2.4xlarge. In this case there isn&#8217;t  room for another big tenant, so you get as much as possible of the disk space  and network bandwidth to yourself.&#8221; </em></p>
<p style="text-align: justify;"><em>&#8220;&#8230;busy client can slow down other clients that share the  same EBS service resources. EBS volumes are between 1GB and 1TB in size. If you  allocate a 1TB volume, you reduce the amount of multi-tenant sharing that is  going on for the resources you use, and you get more consistent performance.  Netflix uses this technique, our high traffic EBS volumes are mostly 1TB,  although we don&#8217;t need that much space.&#8221;</em></p>
<p style="text-align: justify;"><em>&#8220;If you ever see public benchmarks of AWS that only use  m1.small, they are useless, it shows that the people running the benchmark  either didn&#8217;t know what they were doing or are deliberately trying to make some  other system look better.&#8221;</em></p>
<p>The last sentence is like saying that, if you buy a new apartment and then complain about the loud noise coming from the other apartments, it&#8217;s your fault: you should have bought the entire building and enjoyed the silence! Hell, Adrian, I say no! There must be a better way.</p>
<p style="text-align: justify;">I think there must be rules in place to keep the noise at an acceptable level, and if someone screams all the time, someone else should &#8220;enforce&#8221; silence without requiring you to buy an entire building just to cook and sleep in peace. That&#8217;s how it works in real life, and that&#8217;s how it should work in the cloud. In my opinion, at least.</p>
<p style="text-align: justify;">In cloud terms, I&#8217;d be OK if what I was buying always delivered a contracted baseline as a guarantee and could then burst (I said burst, <a href="http://twitter.com/beaker">Beaker</a>, not cloudburst) to higher throughput when there isn&#8217;t contention. What I would NOT be OK with is having no baseline at all, which means no predictable performance at any time. BTW, note that a few weeks ago Amazon took a step in the right direction by announcing what they call <a href="https://aws.amazon.com/dedicated-instances/">dedicated instances</a>. This is an attempt to solve the <a href="http://alan.blog-city.com/has_amazon_ec2_become_over_subscribed.htm">noisy neighbors</a> problem. However, in doing so they traded off multi-tenancy (hence the higher cost of such a service).</p>
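<p style="text-align: justify;">The &#8220;contracted baseline plus burst&#8221; model can be sketched as a two-pass allocation: each tenant first gets its demand up to its guaranteed baseline, then any spare capacity is shared among tenants that still want more. The equal-share (water-filling) policy for the spare capacity and the <code>allocate</code> function are assumptions for illustration, not how vCloud, AWS or any other specific cloud implements QoS.</p>

```python
def allocate(capacity, tenants):
    """tenants: {name: (baseline, demand)}. Returns {name: allocated capacity}.

    Assumes the sum of baselines does not exceed capacity (the provider's promise).
    """
    if sum(b for b, _ in tenants.values()) > capacity:
        raise ValueError("baselines oversubscribed")
    # Pass 1: the guaranteed part, never more than the tenant actually asks for.
    alloc = {t: min(b, d) for t, (b, d) in tenants.items()}
    spare = capacity - sum(alloc.values())
    # Pass 2: share spare capacity equally among tenants that still want more.
    hungry = {t for t, (_, d) in tenants.items() if d > alloc[t]}
    while spare > 1e-9 and hungry:
        share = spare / len(hungry)
        for t in sorted(hungry):
            extra = min(share, tenants[t][1] - alloc[t])
            alloc[t] += extra
            spare -= extra
        hungry = {t for t in hungry if tenants[t][1] - alloc[t] > 1e-9}
    return alloc


# A noisy tenant wants far more than its baseline; a quiet one stays under it.
print(allocate(100, {"noisy": (50, 1000), "quiet": (50, 40)}))
# {'noisy': 60.0, 'quiet': 40}
```

<p style="text-align: justify;">The noisy tenant can burst into idle capacity, but it can never eat into the quiet tenant&#8217;s baseline, which is the kind of &#8220;enforced silence&#8221; argued for above.</p>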
<p style="text-align: justify;">For the record, I have to say that I don&#8217;t think there is a single public cloud at the moment delivering such fine-grained QoS across all subsystems on rented resources. This is a generic discussion about TCP-clouds and UDP-clouds, and if you interpreted it as a vCloud vs. AWS shootout, you are mistaken. In fact, I think George gave vCloud too much credit in his blog by associating it with the &#8220;traditional&#8221; datacenter model. There is a gap between what we can deliver, in terms of non-functional requirements, with a raw vSphere deployment and what we can deliver with a vCloud Director 1.x implementation. I am not hiding this by any means; in fact <a href="http://blogs.vmware.com/vcloud/2010/05/public-cloud-adoption-curve-is-history-repeating-v2.html">you can read here (the post but, more importantly, the comments)</a> what I had to say about this. That said, <span style="text-decoration: underline;">I believe</span> VMware has a vision to fill that gap and create a true TCP-cloud. Last but not least, I don&#8217;t see why a VMware service provider partner shouldn&#8217;t be able to implement a vCloud-powered UDP-cloud if need be.</p>
<p style="text-align: justify;">
<p style="text-align: justify;"><strong>PaaS and Design for fail?</strong></p>
<p style="text-align: justify;">If I struggle with IaaS clouds (and I do), imagine how I feel about PaaS clouds. To me PaaS is all about raising the level of abstraction. IaaS is all about hiding infrastructure details. PaaS is all about hiding infrastructure and middleware details. In a PaaS you can upload your WAR file and that&#8217;s it. It&#8217;s the PaaS cloud provider that deals with the complexity of setting up, managing and maintaining the middleware stack that can run that WAR file (for example). Fundamentally, the developer should focus (even more than with IaaS) on the functional requirements of the application and let the cloud provider deal with the non-functional requirements. Last time I checked, HA and DR were still part of the non-functional requirements domain. Note that, ironically, it may be easier for a PaaS cloud provider to build out-of-the-box resiliency given the nature of the interfaces they expose. Amazon is already halfway there with their <a href="https://aws.amazon.com/rds/">RDS</a> &#8220;MySQL as a service&#8221;: they already offer automatic failover across Availability Zones, and they would just need to extend this failover support across Regions (which, by the way, would have helped with the recent failure). So, if my theory is sound, if you are architecting your application for PaaS you shouldn&#8217;t need to design for fail. Upload your WARs, create a db instance on the fly and you are done. The cloud provider will figure out how to fail over to the next server, to the next datacenter room or to another geography should a problem occur at any of these levels.</p>
<p style="text-align: justify;">
<p style="text-align: justify;"><strong>So why isn&#8217;t Amazon offering resiliency and reliability  as part of their cloud services in the end?</strong></p>
<p style="text-align: justify;">After all, they already address other non-functional requirements, such as automatic scaling of applications through tools like <a href="https://aws.amazon.com/autoscaling/">Autoscaling</a>. So why would Amazon offer auto-scaling services but not an automatic, application-agnostic, infrastructure-level recovery service across Availability Zones (or, even better, across Regions)? Guess what. It is at least two orders of magnitude easier to instantiate a new web server and add its IP to a load balancer than to implement a (reasonably performant) traditional backend database that can fail over geographically without losing transactions in case of a disaster. Dealing with stateless objects is a piece of cake. Try dealing with stateful objects if you can.</p>
<p style="text-align: justify;">I am sure Amazon doesn&#8217;t believe that autoscaling is something the cloud should do for developers while reliability and DR are something developers should deal with on their own. What do you think? My speculation is that they are simply not there yet. As simple as that. But don&#8217;t be fooled. Amazon is full of smart people and I think they are looking into this as we speak. While we are suggesting (to an elite of programmers) to design for fail, they are thinking about how to make their infrastructure auto-recover from failures (for the masses). I bet we will see more failure-recovery services across AZs and Regions, in one form or another, from AWS. I believe they want to implement a TCP-cloud in the long run, since a UDP-cloud is not going to serve the majority of users out there. Mark my words. I&#8217;ll have to link to this blog post once this happens and say &#8220;I told you so&#8221; (I hate this). And that is only going to be a good thing, because developers will go back to focusing on functionality and <span style="text-decoration: line-through;">IT</span> the cloud will continue to focus on making sure that functionality is (highly) available.</p>
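<p style="text-align: justify;">To see why the stateless side is the easy part, consider what a scaling decision for a web tier involves: arithmetic on the current load, followed by registering or deregistering IPs on the load balancer. The function below is a hypothetical target-style rule written for illustration; it is not the AWS Auto Scaling API, and the per-instance capacity and bounds are made-up parameters.</p>

```python
import math


def desired_instances(requests_per_sec, capacity_per_instance, min_n=2, max_n=20):
    """Hypothetical scaling rule for a stateless web tier (not the AWS Auto Scaling API)."""
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    # Clamp between a floor (for redundancy) and a ceiling (for cost control).
    return max(min_n, min(max_n, needed))


print(desired_instances(1250, 100))  # 13
```

<p style="text-align: justify;">Nothing in this rule needs to know about data. A stateful database has no equivalent one-liner: replicas must be kept consistent, and failing over to another geography without losing committed transactions generally requires synchronous replication, with the latency cost that comes with it.</p>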
<p style="text-align: justify;">
<p style="text-align: justify;">As I said, just food for thoughts. If you find definitive answers, please let me know.</p>
<p style="text-align: justify;">Last but not least, this is a good time to recall the disclosure of my blog (courtesy of a big copy and paste from <a href="http://twitter.com/samj">Sam Johnston</a>&#8217;s blog): &#8220;The views expressed on these pages are mine alone and not (necessarily) those of any current, future or former client or employer. As I reserve the right to review my position based on future evidence, they may not even reflect my own views by the time you read them. Protip: If in doubt, ask.&#8221;</p>
<p style="text-align: justify;">Massimo.</p>
]]></content:encoded>
			<wfw:commentRss>http://it20.info/2011/04/tcp-clouds-udp-clouds-design-for-fail-and-aws/feed/</wfw:commentRss>
		<slash:comments>27</slash:comments>
		</item>
		<item>
		<title>The 93.000 Firewall Rules Problem and Why Cloud is Not Just Orchestration</title>
		<link>http://it20.info/2011/03/the-93-000-firewall-rules-problem-and-why-cloud-is-not-just-orchestration/</link>
		<comments>http://it20.info/2011/03/the-93-000-firewall-rules-problem-and-why-cloud-is-not-just-orchestration/#comments</comments>
		<pubDate>Wed, 30 Mar 2011 20:14:19 +0000</pubDate>
		<dc:creator>Massimo</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://it20.info/?p=414</guid>
		<description><![CDATA[<p style="text-align: justify;">A few days ago I was in a very interesting meeting with a big Service Provider in Europe and I heard a lot of interesting comments. I&#8217;d like to quote the best that I heard which was &#8220;Oh a portal? Oh not another one&#8230; we have many of them already!&#8221; but this will <span style="color:#777"> . . . &#8594; Read More: <a href="http://it20.info/2011/03/the-93-000-firewall-rules-problem-and-why-cloud-is-not-just-orchestration/">The 93.000 Firewall Rules Problem and Why Cloud is Not Just Orchestration</a></span>]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify;">A few days ago I was in a very interesting meeting with a big  Service Provider in Europe and I heard a lot of interesting  comments. I&#8217;d like to  quote the best that I heard which was &#8220;<em>Oh a portal? Oh not another one&#8230; we  have many of them already!&#8221;</em> but this will open up a different can of worms  so I am not going to talk about this now. What I am going  to talk about relates to another comment someone made in the middle of the meeting which was &#8220;<em>&#8230;there  is a firewall with 93.000 rules configured</em>&#8220;.</p>
<p style="text-align: justify;">I can&#8217;t say to be a security expert by any stretch,  however they sound a lot to me. This was confirmed by someone with a lot of  background in this area saying that &#8220;<em>&#8230; they are a lot but the   record is a Cisco device (somewhere on  this earth) with 750.000 rules</em>&#8220;. Suddenly someone  else jumped into the discussion asking &#8220;<em>&#8230;and what happens when you fat  finger rule #457.986?</em>&#8220;. I thought this was a joke (however I am not sure).</p>
<p style="text-align: justify;">Before we make any step further, let&#8217;s try to dump, in a picture,  the layout of this scenario (at a very high level):</p>
<p><img src="http://www.it20.info/misc/pictures/the93kfirewallrulesproblemandwhycloudisnotjustorchestration1.jpg" border="0" alt="" width="864" height="651" /></p>
<p style="text-align: justify;">Basically the idea, pretty common these days, is that you have a multi-tenant virtual infrastructure with a number of VMs running on top of it. These VMs belong to different customers, and you keep them separate by means of standard layer 2 segregation (VLANs, if you will). The big (BIG) firewall at the bottom of the picture is the one holding the 93.000 rules that govern how these workloads talk to each other. By the way, this isn&#8217;t obvious in the picture, but each customer could (and will!) have more than one VLAN, because that&#8217;s how it works in this world (see below). So 93.000 firewall rules are just the tip of the iceberg&#8230; these Service Providers are dealing with other problems too, for example VLAN sprawl, along with all sorts of issues associated with it.</p>
<p style="text-align: justify;">So why is this a problem for an IaaS cloud? I think there are  at least a couple of dimensions to this problem.</p>
<p style="text-align: justify;"><strong>Manageability, serviceability and  scalability</strong></p>
<p style="text-align: justify;">The first dimension relates to &#8220;how on earth can you deal with such a beast?&#8221;. How do you manage this firewall but, even more importantly, how do you troubleshoot it? That&#8217;s why I am not sure that the person who referred to the &#8220;fat finger&#8221; problem was really joking. Again, my background is not security, so bear with me and please advise where I am missing something. However, whenever I mention situations like these to people who do have a security background, their typical reaction is:</p>
<ol style="text-align: justify;">
<li>they laugh first&#8230;</li>
<li>&#8230;and then scratch their heads.</li>
</ol>
<p style="text-align: justify;">So there must be something wrong somewhere, I  think.</p>
<p style="text-align: justify;">For the sake of clarity, I am not bashing the firewall administrator who configured 93.000 rules in that box. I think the problem is how networks (and related security) have worked until now, and the associated &#8220;best practices&#8221; we have built over the last 10 years. One could write a book on this but, in a nutshell, the way it works is that, to secure <em>&#8220;services&#8221;</em>, you need to create layer 2 domains (aka VLANs) that you connect by means of a firewall. Depending on what you need, you may have to create subnet-based rules and/or IP-based rules. Take this approach and apply it to a Service Provider with thousands of customers, each with a certain number of &#8220;<em>services</em>&#8221; deployed, and before you realize what&#8217;s going on you get to thousands of firewall rules in the blink of an eye.</p>
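<p style="text-align: justify;">A back-of-the-envelope model shows how quickly these numbers grow. The sketch below assumes (hypothetically) that every pair of a customer&#8217;s VLANs needs inter-VLAN rules and that every VLAN needs rules for its external traffic; the customer and VLAN counts are illustrative, not figures from the meeting.</p>

```python
def firewall_rules(customers, vlans_per_customer, rules_per_flow=1):
    """Rough rule count on a central firewall, under hypothetical assumptions:

    - every pair of a customer's VLANs needs inter-VLAN rules,
    - every VLAN needs rules for its external (Internet-facing) traffic.
    """
    intra = vlans_per_customer * (vlans_per_customer - 1) // 2
    external = vlans_per_customer
    return customers * (intra + external) * rules_per_flow


print(firewall_rules(customers=3000, vlans_per_customer=5))                    # 45000
print(firewall_rules(customers=3000, vlans_per_customer=5, rules_per_flow=2))  # 90000
```

<p style="text-align: justify;">Five VLANs per customer across 3,000 customers already yields tens of thousands of rules, and the pairwise term grows quadratically with the number of VLANs per customer.</p>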
<p style="text-align: justify;">
<p style="text-align: justify;"><strong>End-user self-service</strong></p>
<p style="text-align: justify;">The other dimension of  the problems we are discussing strictly pertains to  self-service, a key concept of paramount importance  in all cloud related discussions. This is a pattern I have seen over and over again at every  single Service Provider I have met so far: the usage of a central monolithic  firewall to serve multiple different tenants doesn&#8217;t allow  the SP to create  (easily) a self-service experience for the user. Why? Simply because the more  complex and more critical the object (whose functionalities you want to expose  to the end-user)  becomes, the more complex and critical the tool that mediates its access needs to be.  You could solve this problem by using a  dedicated physical firewall per each of the customers the SP is hosting. That would reduce the  complexity and the criticality to a level for which the effort of the SP would  be as low as telling the customer &#8220;<em>Here is how to access the device as  root</em>&#8220;. Between the lines you could read &#8220;<em>Screw it up and only your own  organization will be screwed up, I don&#8217;t care</em>&#8220;. It sounds great but this  isn&#8217;t very scalable nor manageable obviously. Do you deploy a new physical  firewall every time you get a new customer? Not the promise of cloud I&#8217;d  say if cloud is really about agility, scalability, pay-per-use and the list of  attributes goes on. These attributes have, in fact, very little to do with the  option of deploying a new physical device on-the-fly when needed.</p>
<p style="text-align: justify;">So what did all these SPs do when they stood up their so-called&#8230; &#8220;<em>clouds</em>&#8221;? They created a portal (probably one of the many we were talking about at the beginning) offering self-service capabilities for basic, simple tasks (such as VM provisioning), and they implemented a ticketing system for more advanced tasks (such as creating network security rules for the workloads being provisioned). Not very different from a traditional hosting solution, you may think. Well, that&#8217;s one of the reasons many people refer to this practice as &#8220;lipstick on a pig&#8221; (i.e. take a hosting solution, put a <em>cloud</em> label on it and sell it as if it were a <em>cloud</em>).</p>
<p style="text-align: justify;">
<p style="text-align: justify;"><strong>The role of the orchestrator </strong></p>
<p style="text-align: justify;">I always say that orchestration is not cloud, but cloud needs orchestration. Will orchestration alone help solve the problems we are discussing here? Personally, I don&#8217;t think so. I see orchestrators as tools meant to solve operational issues (especially at the scale a cloud infrastructure requires), not as tools that can fix broken architectures. If you take a stone and clean it, it doesn&#8217;t automagically become a gold nugget. It becomes a clean stone. The same goes for cloud. If you take a &#8220;<em>junk architecture</em>&#8221; and orchestrate it, does it become a &#8220;<em>great architecture</em>&#8221;? No, it becomes an &#8220;<em>orchestrated junk architecture</em>&#8221;. Better than having to deal with it manually&#8230; but still &#8220;<em>junk</em>&#8221;.</p>
<p style="text-align: justify;">Don&#8217;t get me wrong, I do think that  orchestration is key and you can&#8217;t have a cloud without (at least a certain  degree of) orchestration. However don&#8217;t think that a properly architected cloud  is just your &#8220;legacy&#8221; stuff with an additional kilo of orchestration workflows  and a nice new portal (&#8220;<em>Oh a portal? Oh not another one&#8230; we have  many of them already!</em>&#8220;)<em>.</em></p>
<p style="text-align: justify;">
<p style="text-align: justify;"><strong>Is there a way  out? </strong></p>
<p style="text-align: justify;">Yes, there is (I think). I believe there is, at this point, a shared feeling in the industry that an architecture like the one shown in the picture below is the way forward. So what is that vFW (aka <em>virtual Firewall</em>) below? <a href="../2011/03/vshield-products-packaging-explained-with-a-focus-on-vcloud-director/">At VMware we call it vShield Edge</a>. Other vendors may call it something else. Other vendors don&#8217;t have anything like this today (so expect some level of bashing from their sales reps in the field), but they may end up having it down the road (expect some level of embarrassment from the same sales reps who bashed this approach in the past). We started shipping vShield Edge less than a year ago, but we have seen a large number of people experimenting with an approach like this for years. Just recently I met another SP that told me they started looking into something like this 2 years ago, using virtual appliances from Vyatta. Also recently, <a href="../2011/02/vcloud-the-morphing-channel-behavior-and-neural-circuits/">I wrote about a small business partner getting into the &#8220;cloud&#8221; from a provider perspective and using the same model/architecture</a> without anyone telling them this was &#8220;the right&#8221; model: they figured it out themselves based on the challenges they were dealing with! And if this isn&#8217;t enough to convince you that there is a trend here, look at what <a href="http://aws.typepad.com/aws/2011/03/new-approach-amazon-ec2-networking.html">Amazon started pitching a couple of weeks ago</a>.</p>
<p><img src="http://www.it20.info/misc/pictures/the93kfirewallrulesproblemandwhycloudisnotjustorchestration2.jpg" border="0" alt="" width="864" height="651" /></p>
<p style="text-align: justify;">So what&#8217;s so neat about this model? The idea is pretty simple:  instead of using a monolithic physical firewall outside of the virtual  infrastructure domain, you can deploy different virtualization-aware firewalls  that are essentially backing the same VLAN(s) but do that in a more flexible and  agile way. Other than simplifying the complexity of a single object  configuration (the &#8220;<em>93.000 rules</em>&#8221; problem) you also gain easy  self-service through administration delegation. As we have said at the beginning  it is difficult to get controlled access to a shared device. However if you  create a virtual device that is only supposed to &#8220;rule&#8221; access to given VLANs  dedicated to a customer&#8230; you can easily delegate full access for that virtual  device to that specific customer. <a href="../2010/09/vcloud-director-networking-for-dummies/">This  is at at the core of the vCloud Director self-service capabilities</a>. In many  cases you&#8217;d still want to have the traditional physical device for data center  level protection against external attacks and advanced firewall features that  these virtual firewall may be missing today. However the complexity of its  configuration would be drastically reduced because the <em>workloads security  rules</em> would be managed directly on the virtual firewall devices.</p>
<p style="text-align: justify;">
<p style="text-align: justify;"><strong>Can we do even better? </strong></p>
<p style="text-align: justify;">We could do something better, yes! What we have been talking about so far is basically about keeping the very same number of VLANs and firewall rules&#8230; and spreading these rules across virtual firewalls. This solves many problems, for example around self-service (delegation of the entire device) and scalability (just deploy another virtual appliance when a new customer arrives), but by itself it doesn&#8217;t solve the problem of VLAN sprawl or of the 93,000 firewall rules (although these are now segmented into separate security domains dedicated to each customer). VMware has other technologies that may help address these remaining problems.</p>
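<p style="text-align: justify;">The limitation is easy to check with a few lines of Python: splitting a flat rule base by tenant shrinks each device&#8217;s rule set but leaves both the total rule count and the number of VLANs where they were. The tenant count, rules per tenant and VLANs per tenant used below are hypothetical; the 4,094 ceiling is the number of usable IDs in the 12-bit 802.1Q VLAN ID field.</p>

```python
from collections import defaultdict

USABLE_VLAN_IDS = 4094  # 802.1Q VLAN ID is 12 bits; IDs 0 and 4095 are reserved


def split_across_vfws(rules: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (tenant, rule) pairs so that each tenant's rules live on its own vFW."""
    devices = defaultdict(list)
    for tenant, rule in rules:
        devices[tenant].append(rule)
    return dict(devices)


def report(rules: list[tuple[str, str]], vlans_per_tenant: int) -> dict[str, object]:
    devices = split_across_vfws(rules)
    vlans = vlans_per_tenant * len(devices)
    return {
        "central_firewall_rules": len(rules),
        "vfw_count": len(devices),
        "largest_vfw_rules": max(len(r) for r in devices.values()),
        # Redistributed, not reduced: the grand total is unchanged.
        "total_rules_across_vfws": sum(len(r) for r in devices.values()),
        # VLAN sprawl is untouched as well.
        "vlans_needed": vlans,
        "fits_in_vlan_id_space": vlans <= USABLE_VLAN_IDS,
    }
```

<p style="text-align: justify;">With 1,500 hypothetical tenants holding 60 rules each, the report shows 90,000 rules either way and a largest vFW of only 60 rules, while three VLANs per tenant still add up to 4,500 VLANs, more than the 4,094 IDs a single 802.1Q domain offers.</p>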
<p style="text-align: justify;">The first one is called vCloud Director Network Isolation (vCDNI) in vCloud parlance, or vShield PortGroup Isolation (PGI) in vShield parlance. It&#8217;s basically a technology that allows you to <em>virtualize a VLAN</em>: different customers can be assigned dedicated vDS PortGroups that represent separate layer 2 domains&#8230; yet share the same VLAN ID. We use a technique called MAC-in-MAC encapsulation to implement this. Kamau just posted a very interesting blog post on how this works; <a href="http://www.borgcube.com/blogs/2011/03/vcd-network-isolation-vcdni/">you can read more here</a> if you are interested. This technology is already available and fully integrated in vCloud Director, so you can use it today if you want to.</p>
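<p style="text-align: justify;">To illustrate the general MAC-in-MAC idea (not vCDNI&#8217;s actual wire format, which this sketch does not try to reproduce), the Python below wraps a tenant&#8217;s complete Ethernet frame in an outer Ethernet header addressed host to host, plus a hypothetical 32-bit segment ID. The EtherType is a placeholder taken from the IEEE local experimental range, and the outer 802.1Q tag carrying the shared VLAN ID is omitted for brevity. The receiving host hands the inner frame only to vNICs attached to the same segment, which is what lets several isolated layer 2 domains share one VLAN ID.</p>

```python
import struct

# Placeholder EtherType from the IEEE 802 local experimental range (0x88B5).
# Hypothetical: this is NOT the value vCDNI puts on the wire.
ENCAP_ETHERTYPE = 0x88B5
OUTER_HEADER_LEN = 18  # 6 dst MAC + 6 src MAC + 2 EtherType + 4 segment ID


def mac_bytes(mac: str) -> bytes:
    """'00:50:56:aa:bb:cc' -> 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def encapsulate(inner_frame: bytes, outer_dst: str, outer_src: str, segment_id: int) -> bytes:
    """Wrap a tenant's complete Ethernet frame in an outer Ethernet header.

    The outer MACs address the physical hosts; segment_id (a hypothetical 32-bit
    field) says which isolated layer 2 domain the inner frame belongs to.
    """
    header = mac_bytes(outer_dst) + mac_bytes(outer_src)
    header += struct.pack("!HI", ENCAP_ETHERTYPE, segment_id)
    return header + inner_frame


def decapsulate(frame: bytes) -> tuple[int, bytes]:
    """Return (segment_id, inner_frame), rejecting frames without our EtherType."""
    ethertype, segment_id = struct.unpack("!HI", frame[12:OUTER_HEADER_LEN])
    if ethertype != ENCAP_ETHERTYPE:
        raise ValueError("not an encapsulated frame")
    return segment_id, frame[OUTER_HEADER_LEN:]


def deliver(frame: bytes, local_ports: dict[str, int]) -> list[str]:
    """Names of local vNIC ports (port -> segment) that should receive the inner frame."""
    segment_id, _inner = decapsulate(frame)
    return sorted(port for port, seg in local_ports.items() if seg == segment_id)
```

<p style="text-align: justify;">Because each tenant frame travels inside an outer header, the segment ID, rather than the shared VLAN ID, decides who receives it. The cost is extra bytes on every frame (18 in this sketch), so a real deployment has to account for MTU.</p>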
<p style="text-align: justify;">There is another elegant way to tackle VLAN sprawl and, more specifically, the proliferation of rules you have to create in the firewall(s): another vShield technology called vShield App. Think of vShield App as a vDS port-based firewall where you can say &#8220;<em>this vNIC can talk to this other vNIC over this particular port</em>&#8221;. The vNICs in question are connected to the same vDS PortGroup (i.e. in essence a single layer 2 domain). So imagine having a single network segment where you can create rules that mimic the deployment of a DMZ, an Application security zone, a Database security zone, etc. Instead of using three VLANs (in this example), you could use one and have this segmentation happen at the vDS layer via vShield App rules. The cool thing about App, in my opinion at least, is that it supports both the typical 5-tuple firewall rules and traditional vSphere constructs such as datacenters, clusters, resource pools and the like. So you can say that all VMs in this &#8220;container&#8221; can only communicate with VMs in that other &#8220;container&#8221; over a specific port. This way you can change IPs or add/remove VMs from the containers and the security policies still apply, which simplifies and shrinks the &#8220;<em>93,000 rules problem</em>&#8221;. For the sake of clarity, this vShield technology (App) isn&#8217;t integrated with vCloud Director today, but I hope you see a trend here.</p>
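<p style="text-align: justify;">The difference between container-based and IP-based rules can be sketched in Python. The <code>VM</code> and <code>ContainerRule</code> types, the container names and the addresses are all hypothetical and say nothing about how vShield App is actually implemented; the point is only that group membership is resolved when traffic is evaluated, whereas an IP-based firewall has to carry one 5-tuple per address pair.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    ip: str
    container: str  # e.g. a resource pool or cluster acting as a security zone (illustrative)


@dataclass(frozen=True)
class ContainerRule:
    """Allow traffic from any VM in src_container to any VM in dst_container on port."""
    src_container: str
    dst_container: str
    port: int
    protocol: str = "tcp"


def allowed(rules: list[ContainerRule], src: VM, dst: VM, port: int, protocol: str = "tcp") -> bool:
    """Default deny; membership is looked up at evaluation time, so IPs never appear in rules."""
    return any(
        r.src_container == src.container
        and r.dst_container == dst.container
        and r.port == port
        and r.protocol == protocol
        for r in rules
    )


def expand_to_five_tuples(rules: list[ContainerRule], inventory: list[VM]) -> list[tuple[str, str, str, int, str]]:
    """What an IP-based firewall would need instead: (src IP, src port, dst IP, dst port, protocol)."""
    tuples = []
    for r in rules:
        srcs = [vm for vm in inventory if vm.container == r.src_container]
        dsts = [vm for vm in inventory if vm.container == r.dst_container]
        # Source port is wildcarded, as is typical for client-to-server rules.
        tuples += [(s.ip, "*", d.ip, r.port, r.protocol) for s in srcs for d in dsts]
    return tuples
```

<p style="text-align: justify;">With two container rules and an inventory of two web VMs in the DMZ, one app VM and one DB VM, the expansion already yields three 5-tuples; add a second app VM and it becomes six, while the container rules stay at two and survive any IP renumbering.</p>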
<p style="text-align: justify;">Now imagine combining vCDNI with vShield App. You could, potentially, use one single VLAN to support multiple tenants and, within each &#8220;virtual VLAN&#8221;, create rules that represent multiple security zones, effectively mimicking DMZs, back ends, etc.</p>
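<p style="text-align: justify;">The combination can be sketched as a two-step check: first the segment (the &#8220;virtual VLAN&#8221;), then the zone rules inside that segment. <code>Endpoint</code>, the policy layout and the segment numbers are hypothetical; the sketch assumes every segment rides on the same physical VLAN and that zone rules are default deny.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    segment: int  # isolated layer 2 domain ("virtual VLAN"); all share one physical VLAN
    zone: str     # security zone inside that segment, e.g. "DMZ", "App", "DB"


# Per-tenant zone rules keyed by segment: {segment: {(src zone, dst zone, port), ...}}
ZonePolicy = dict[int, set[tuple[str, str, int]]]


def permitted(policy: ZonePolicy, src: Endpoint, dst: Endpoint, port: int) -> bool:
    # Step 1, isolation: endpoints in different segments can never talk,
    # even though the underlying VLAN ID is the same.
    if src.segment != dst.segment:
        return False
    # Step 2, zoning: inside the segment only explicitly allowed zone pairs pass.
    return (src.zone, dst.zone, port) in policy.get(src.segment, set())
```

<p style="text-align: justify;">In this model, adding a tenant adds a segment ID and a rule set rather than a VLAN, and adding a zone adds a rule rather than a subnet.</p>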
<p style="text-align: justify;">
<p style="text-align: justify;"><strong>Conclusions</strong></p>
<p style="text-align: justify;">While I focused a lot on the products I am working with at the moment, the message I wanted to pass along with this post is that the current network security model seems to be broken in a big way, especially in the context of cloud-like deployments where agility and self-service are the big mantras. There are alternative architectures that are proving to be better in this context, and there is a range of products that can implement them. I mentioned vShield and vCloud Director, but you can use other products if you want&#8230; as long as you fix that junk! The other point I was trying to make in this post is that orchestration by itself cannot fix a bad architecture, and that these two topics (architecture and orchestration) should really be treated as two separate workstreams when you design your cloud infrastructure. Once again, orchestration is not the means to fix a bad architecture.</p>
<p style="text-align: justify;">Now I talk as if I knew what I was saying. Funny.</p>
<p style="text-align: justify;">Massimo.</p>
<p style="text-align: justify;">
]]></content:encoded>
			<wfw:commentRss>http://it20.info/2011/03/the-93-000-firewall-rules-problem-and-why-cloud-is-not-just-orchestration/feed/</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
	</channel>
</rss>

