|
|
-
In this article, I'd like to document a setup I have been working on for a few days at the LSI office in Milano (great guys and free beverages there! Thanks!). LSI is the company from which IBM OEMs the DS3000, DS4000 and DS5000 lines of storage servers. Since I am trying to get a little bit more into the storage and network subsystems I wanted to spend a few days playing with those kits. I have concentrated on today's hot topic of Disaster Recovery and particularly the integration of LSI RVM (Remote Volume Mirroring) into VMware SRM (Site Recovery Manager). I have to admit that I am not a storage guru, nor have I looked too much into SRM, so most of the stuff you will find here might be pretty basic. This is clearly not an advanced read for the likes of Duncan Epping, nor for those that go to bed with the VMware vmkfstools CLI or "talk UUID." (I guess Duncan will get what I mean.) Yet it's intended to provide a bit of background about what happens behind the scenes (the "scenes" would be the GUIs of the various products involved in this case). The SRM part is really focused on the storage integration, which was the thing I was most interested in for this 2-day storage marathon. I like to treat these articles as a sort of personal log / documentation of what I have done (for future reference) so it will certainly serve me in the long run. Hopefully it will be of use for some of you, too.
Last but not least, while the bar on the right of your browser might suggest this is a long post, consider that it's full of screenshots! So without further ado, let's get started.
Basic Remote Mirror Setup
This part doesn't involve any specific SRM concept in action. It's just meant to describe the basic infrastructure setup (both logical and physical) as well as the way the storage replicates and how the VMware hosts deal with replicated LUNs. It is important to understand what happens at a lower level in order to move on and plug SRM on top of this. The picture below outlines how the logical layout of the infrastructure looks (including SRM):
For completeness, the following picture describes how the physical infrastructure looks instead:
As the picture outlines, the Virtual Center VMs in both sites also host the SRM service. Depending on the scale of your project you might want to have dedicated virtual machines to host the SRM instances or even dedicated physical servers. Milano, in our lab scenario, is the primary site while Roma is the DR site. As you can imagine, LUNs need to be replicated from the DS4700 in Milano onto the DS4800 in Roma. LSI calls this storage feature RVM (Remote Volume Mirroring) and it's essentially an advanced function that allows you to keep a copy of your LUNs on a remote storage server.
Notice that the DS4700 is a storage server that packs into a single 3U enclosure both controllers (A and B) as well as the first string of disks (more can be attached through FC ports on the rear). On the other hand, the DS4800 has a 4U "head" unit that hosts the controllers but doesn't include any disks in the base chassis. They can be added with external expansions (as in the picture above). You might guess that the DS4800 is a more powerful machine than the DS4700 and that, in a real life scenario, you might want to have that situation inverted. Your guess is correct, but for the sake of the tests this didn't matter since we weren't looking for ultimate performance. Also consider that any DS4xxx type of storage is "replication compatible" both ways with any other DS4xxx type of storage. And even DS5xxx!
Note: Other than the standard zoning so that each of the servers with two HBAs can see each of the two controllers on the storage array, please consider that for the RVM feature to work all controllers need to be connected in a certain way. Specifically for this scenario the last FC port of ControllerA on the DS4700 needs to be connected to the last FC port of ControllerA on the DS4800. Same zoning process for ControllerB. Without this extra SAN configuration RVM would not work. And no, having a single switch per site is not a best practice - you would need two in a real life environment.
The storage configuration (a summary of it) is described in the pictures below. Basically the DS4700 in Milano has a couple of LUNs that are dedicated to the local cluster and that do not replicate (these are VC-MILANO and SERVICE-MILANO). These LUNs host the Virtual Center instance as well as a Windows template. There are other LUNs (SRM-1-MILANO, SRM-2-MILANO, SRM-3-MILANO and SRM-4-MILANO) that are replicated onto the DS4800 in Roma. A simple synchronous mirroring configuration has been established.
The way you set this up is that you first create companion LUNs on the target: they need to be at least as big as the source LUNs, or bigger if you want.
Through the LSI Storage Manager (SANtricity) you then select the source LUN and you mirror it onto the remote storage: a list of DSxxx storage devices with the mirroring feature enabled is shown, as well as a list of compatible companion LUNs for each device. The DS4800 does not mask the replicated LUNs to the cluster in Roma. This means that the hosts in the cluster have no idea whatsoever that there are LUNs on that array that are in sync with the cluster in Milano. In our lab we have manually created SRM-1-ROMA, SRM-2-ROMA, SRM-3-ROMA, SRM-4-ROMA on the DS4800 (as you can see in the picture above) and then we went through the steps described to create the mirror.
Now that the replication is in place, the first test we did at the storage infrastructure level was to create a snapshot of a replicated LUN. From the Storage Manager we created a snapshot of SRM-1-ROMA leaving the mirror link between SRM-1-MILANO and SRM-1-ROMA in place as the picture below suggests:
This is how you would read the above picture: SRM-1-ROMA is a replica of a LUN coming from another storage server. As such it's in a read-only state (in fact you don't want to write onto it since it's continuously being updated by its master LUN on a remote storage). However, we took a snapshot of that R/O LUN at a certain point in time and we called it Snap-SRM-1-ROMA-1. This LUN is now enabled for R/W so it could be fully used as a point in time copy of an R/O LUN under replication.
The next step was then to manually map this snapshot to the cluster in Roma so the servers would be able to recognize it:
And this is when the "fun" begins.
************* Background information that you need to understand and be familiar with before you move on *********************************
There are two key parameters that rule how an ESX host deals with the LUNs: EnableResignature and DisallowSnapshotLUN.
It took me a while to digest them (and right now I think I am halfway there), but essentially DisallowSnapshotLUN (when active, which is the default) instructs the ESX host NOT to import the VMware Datastore if it recognizes it as a snapshot of an existing LUN. When the parameter is set to 0 (False), the ESX host is allowed to import the snapshot as a VMware Datastore without modifying its original name or its UUID.
EnableResignature (when active, which is NOT the default) instructs the ESX host to resign the LUN and import it as a new VMware Datastore (which gets labeled snap-xxxxxx-<Original Datastore Name>) with a new UUID. When this parameter is turned on, the DisallowSnapshotLUN value is irrelevant as the LUN gets resigned right away and imported as a new Datastore.
These parameters get very important (and very critical) when you are dealing with snapshots and clones on the same storage server and you try to give the original ESX hosts visibility of these new spaces. For example, if you try to expose to a given host/cluster the original LUN as well as its snapshot without resigning it, you risk data loss and inconsistency as the host/cluster will only make one of these two entities available (they are in fact essentially the same thing: same Datastore name, same UUID). When you are dealing with a remote copy of the LUN(s), this becomes a less important issue because you are basically importing a snapshot (or a mirror) into a different set of ESX hosts.
This should be enough for a dummy (like myself), but if you want to get into deeper details about these two parameters and the UUID thing I suggest you read one of Duncan's best articles as well as this post from Chad.
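To make the interplay between the two parameters easier to remember, here is a minimal Python sketch of the decision described above. It is only my reading of the behavior, not VMware code: the function name and the "xxxxxxxx" placeholder for the snapshot identifier are made up for illustration.

```python
def datastore_import_outcome(enable_resignature: bool,
                             disallow_snapshot_lun: bool,
                             name: str,
                             snap_id: str = "xxxxxxxx") -> str:
    """Describe what a host does with a VMFS volume it detects as a snapshot.

    Hypothetical helper that mirrors the rules described in the text.
    """
    if enable_resignature:
        # Resignaturing wins: DisallowSnapshotLUN is irrelevant in this case.
        return f"imported as snap-{snap_id}-{name} with a new UUID"
    if disallow_snapshot_lun:
        # Default settings: the LUN is visible but the Datastore is not imported.
        return "not imported"
    # Snapshots allowed, no resignature: same name, same UUID.
    return f"imported as {name} with its original UUID"


# Defaults (EnableResignature=0, DisallowSnapshotLUN=1)
print(datastore_import_outcome(False, True, "srm-1"))
# DisallowSnapshotLUN turned off
print(datastore_import_outcome(False, False, "srm-1"))
# EnableResignature turned on
print(datastore_import_outcome(True, True, "srm-1"))
```

The three calls correspond to the three combinations discussed in this article: the defaults, snapshots allowed, and resignature enabled.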
************************************************************************************************************************************************************
If you are now familiar with the background above you should guess what happens. Mapping the snapshot Snap-SRM-1-ROMA-1 to the cluster in Roma forced the ESX hosts to recognize the LUN after the rescan:
Since we left the parameters above at their defaults (EnableResignature=0, DisallowSnapshotLUN=1), the LUN doesn't show up as a VMware Datastore on any of the hosts in Roma:
This is the desired behavior since the hosts recognize this is a LUN that is coming from a different storage subsystem (so with a sort of "incompatible" UUID). As a matter of fact, you can manually add a brand new Datastore and the LUN above is shown as available space for a new VMFS file system (which we didn't create as we didn't want to destroy the content):
At this point we changed the DisallowSnapshotLUN parameter to 0 (that setting should read "Allow Snapshot to be imported"):
After this change (which doesn't require a reboot of the host), the hypervisor imports the VMware Datastore simply after a rescan of the HBAs:
Similarly, by changing the EnableResignature parameter to 1 and rescanning the HBAs, the Datastore gets imported with a new UUID and a new name as you can see from the picture below:
What I have described above (at a very high level) are basically the steps you would need to implement in order to manually deal with a DR procedure. SRM does that under the covers along with a number of other things, such as reconfiguring the VMs on the DR site (alternatively you would have to manually add them to the DR cluster after importing the Datastores). It's a common misconception that VMware SRM is a layer of additional technologies on top of what VI3 already provides (SRM today is not compatible with vSphere, but it should be soon). I think a better way to describe what SRM does is that it's a method to code all the actions you would have to manually implement in order to either test or run a Recovery Plan. Many refer to SRM as a "binary coded DR runbook." There is nothing SRM does that you couldn't do without it. But having SRM might save you time... and some risks (manual DR procedures might be error prone).
Site Recovery Manager Setup (Test the Recovery Plan)
In this section, we are going to essentially automate the manual process above by means of a DR orchestrator (in this case, it is called VMware Site Recovery Manager). This article is not intended to be a detailed description of the capabilities of SRM nor a step-by-step guide to its configuration. We will assume from now on the reader has a basic understanding of the product. Before we get into the details it is important to describe the virtual environments (guest OSes) we created in the production site. Notice that there are additional VMs that we have used to host a number of infrastructure services (such as the Virtual Center servers themselves). These VMs generally would be either hosted on external physical hardware or would not be subject to any SRM DR plan anyway. We will focus on what we pretend to be "production VMs" in our lab test. From this perspective we have essentially created three VMs (Web1, Web2, Web3) that we mapped onto the 4 LUNs described above (SRM-1-MILANO, SRM-2-MILANO, SRM-3-MILANO and SRM-4-MILANO). The following picture outlines the mappings.
- Web1 has two VMDK files associated with it. One is on the srm-1 VMware Datastore (which in turn is on the SRM-1-MILANO LUN) and the other is on srm-2.
- Web2 has a single VMDK file associated with it, which is on the srm-2 Datastore.
- Web3 is a bit more tricky. It has a VMDK on srm-3 and it also has an RDM (Raw Device Mapping) onto the SRM-4-MILANO LUN. Notice this LUN doesn't have an srm-4 Datastore associated because it's raw. Since the RDM mapping is set to virtual, Web3 has a VMDK pointer (on srm-3) to the SRM-4-MILANO raw LUN.
It is of paramount importance to understand how all the VMs interact with the Datastores / LUNs because there might be some consistency dependencies that SRM will have to deal with. In fact, once we have installed SRM as well as the LSI SRA (Storage Replication Adapter), this is what the "Configure Array Managers" window displays:
Have you noticed how the various LUNs get grouped together? The first group includes the srm-3 Datastore as well as the SRM-4-MILANO because there is a virtual RDM mapping from a VMDK file on srm-3 onto the fourth LUN. So they are somewhat dependent.
Similarly, there is another group that includes both srm-1 and srm-2. And that's because there are interdependencies, as you can infer from the picture with the layout of the VM disk configuration: Web1 is dependent on the first and on the second LUN so they need to be treated as a single Protection Group (you can't split them, as this would split the VM configuration and this wouldn't maintain data consistency!). However, now that you have to treat srm-1 and srm-2 as a single Datastore Group, SRM realizes what the other dependencies are. In fact, Web1 is not the only VM that is hosted (partially) on srm-2: Web2 is hosted on srm-2 and it must be included in the very same Protection Group. This is what you would see from a GUI perspective when selecting this Datastore Group:
When you select the Datastore or the Datastore Group, SRM automatically displays the VMs that are dependent on that Datastore or those Datastores. That's a read only field. Notice you can't select either srm-1 or srm-2 on its own: they are a single entity for SRM.
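The grouping logic described above can be sketched in a few lines of Python. This is just an illustration of the idea (storage devices that share a VM end up in the same group), written as a small union-find; it is not how SRM or the LSI SRA actually implement it, and the function name is my own.

```python
from collections import defaultdict


def datastore_groups(vm_storage):
    """Group storage devices that are tied together by the VMs using them.

    vm_storage maps a VM name to the non-empty list of Datastores / raw LUNs
    it depends on. Returns a sorted list of (devices, vms) pairs.
    """
    parent = {}

    def find(device):
        parent.setdefault(device, device)
        while parent[device] != device:
            parent[device] = parent[parent[device]]  # path compression
            device = parent[device]
        return device

    # Every device used by the same VM is merged into one group.
    for devices in vm_storage.values():
        first_root = find(devices[0])
        for other in devices[1:]:
            parent[find(other)] = first_root

    groups = defaultdict(lambda: (set(), set()))
    for vm, devices in vm_storage.items():
        group_devices, group_vms = groups[find(devices[0])]
        group_devices.update(devices)
        group_vms.add(vm)
    return sorted((sorted(d), sorted(v)) for d, v in groups.values())


# The lab layout described in this article.
layout = {
    "Web1": ["srm-1", "srm-2"],
    "Web2": ["srm-2"],
    "Web3": ["srm-3", "SRM-4-MILANO"],
}
for devices, vms in datastore_groups(layout):
    print(devices, "->", vms)
```

With the lab layout, srm-1 and srm-2 land in one group together with Web1 and Web2, while srm-3 and the raw SRM-4-MILANO LUN form the second group with Web3, which matches what the "Configure Array Managers" window shows.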
What we did from here is simple. We created two Protection Groups on the SRM instance hosted on the production site (Milano). These PGs build on top of the srm-1 / srm-2 Datastore Group and the srm-3 Datastore (which includes the RDM on the fourth LUN). Subsequently, we created a Recovery Plan on the DR site (Roma) which contains the failover instructions for these two Protection Groups. That's it.
Our production site is now protected. What we need to do is "Test" our Recovery Plan. One of the advantages of SRM is that it has built-in intelligence to simulate a DR. Obviously this process is not (and should not be) disruptive: you want to keep the replica of the LUNs in place and you don't want to shut down the VMs in production to run this test. How do you do so? It's easy. Let's push the Test button on the SRM GUI and go through the plan.
The trick here is that you want to create a dedicated environment (from a storage and network perspective) that doesn't interfere with the production environment. As soon as the test starts, a snapshot of the replicated LUNs is created (at least those that are in the Protection Group associated to the Recovery Plan that is being tested). It's conceptually identical to what we have already done with a manual snapshot (see above), but this time it is SRM that instructs the LSI SRA (Storage Replication Adapter) to create the snapshots and the SRA in turn talks natively to the LSI devices to do so. The SRA is basically the driver that SRM uses to communicate with the actual storage subsystem. You can see the snapshots being created in the next picture:
************* Background information that you need to understand and be familiar with before you move on *********************************
VMware SRM is configured by default to set the EnableResignature parameter to 1 (that means TRUE) on each of the hosts in the receiving cluster. This means that, independent of the behavior you configured on the hosts, SRM will always resign the LUNs when imported into the remote cluster in the DR site. This will cause the LUNs to be renamed with the (in)famous naming convention snap-xxxx-<Original Datastore Name>.
If you want to keep things clear and "human readable," you can change the SRM configuration to rename the Datastores back to their original names. This is achieved through the SRM configuration file vmware-dr.xml, located in the C:\Program Files\Site Recovery Manager\Config directory of the SRM server in the DR site. You have to identify the line
<fixRecoveredDatastoreNames>false</fixRecoveredDatastoreNames>
and modify it to:
<fixRecoveredDatastoreNames>true</fixRecoveredDatastoreNames>
Thanks to Duncan E. and Mike L. for their research.
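If you prefer to script this change rather than edit the file by hand, something along these lines should do. It's a minimal sketch using Python's standard xml.etree.ElementTree module; the sample XML wrapped around the fixRecoveredDatastoreNames element is an assumption of mine (the real vmware-dr.xml contains much more), which is why the code looks the element up by name wherever it sits.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for vmware-dr.xml.
# Only the fixRecoveredDatastoreNames element comes from the real file.
SAMPLE = """<Config>
  <SanProvider>
    <fixRecoveredDatastoreNames>false</fixRecoveredDatastoreNames>
  </SanProvider>
</Config>"""


def enable_fix_recovered_names(xml_text: str) -> str:
    """Return the XML with every fixRecoveredDatastoreNames set to 'true'."""
    root = ET.fromstring(xml_text)
    for element in root.iter("fixRecoveredDatastoreNames"):
        element.text = "true"
    return ET.tostring(root, encoding="unicode")


print(enable_fix_recovered_names(SAMPLE))
```

Against the real file you would use ET.parse() and tree.write() instead of strings. Keep a backup copy first, since ElementTree drops XML comments when it rewrites a file.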
It's important to understand that this will not change back the value of the EnableResignature parameter to 0. In fact the LUN will be resigned anyway but SRM will take an extra step to rename the Datastore back to its original name (effectively just deleting the snap-xxxx portion of the new Datastore name).
Not being an expert on this, I can only think that doing so is important when you want to maintain a decent naming convention, especially when you consider that a failback onto the production site would cause SRM to rename the Datastore into something like snap-xxxxx-snap-yyyyyy<Original Datastore Name> (which is indecent in my opinion). Apparently it would have been easier for SRM to configure the host to allow snapshot LUNs (DisallowSnapshotLUN = 0) and not bother in the first place with the resignature and the rename. But if VMware decided to do so, there must be other (hopefully good) reasons.
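To picture what the rename amounts to, here is a small Python sketch that strips one or more snap-xxxx- prefixes from a Datastore label. It's only an illustration of the naming convention: I am assuming the identifier after "snap-" is a hexadecimal string, and this is not SRM's actual code.

```python
import re

# One or more leading "snap-<hex id>-" prefixes (the hex format is an assumption).
_SNAP_PREFIX = re.compile(r"^(?:snap-[0-9a-f]+-)+", re.IGNORECASE)


def original_datastore_name(label: str) -> str:
    """Remove the resignature prefix(es) from a Datastore label."""
    return _SNAP_PREFIX.sub("", label)


print(original_datastore_name("snap-1a2b3c4d-srm-1"))
print(original_datastore_name("snap-0000beef-snap-1a2b3c4d-srm-2"))
```

The second call shows the failover-then-failback case, where the prefix would otherwise pile up.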
************************************************************************************************************************************************************
That said, we now have the background to understand the next picture, which outlines the storage configuration on the cluster at the DR site in Roma:
The Datastores have been imported with the original names due to the change in the vmware-dr.xml file. The UUIDs of the Datastores, however, have changed since they have been resigned. This is not a problem for SRM because the "place-holder vmx files" that are kept at the DR site do not contain any reference to the disk configuration of the VM. The Datastores are parsed during the execution of the Recovery Plan and the correct disks (with the actual UUIDs) get included in the final vmx prior to the startup of the VM.
Notice that the production VMs are being started off the snapshots that the LSI SRA has created and they are now connected to a so-called "Bubble Network." The Bubble Network is a standard VMware Virtual Switch with no Physical NICs connected to it that gets created for the time of the test. This allows the system administrator to test the restart of a copy of the VMs (currently running in production) without bothering about potential network conflicts. Of course at this time, the replica between the primary and DR sites is still in place and we are still fully protected from a potential disaster.
The test is being executed, and apparently everything has been running smoothly. At this point, SRM pauses for the system administrator to make an evaluation of the test (notice in the SANtricity Storage Manager how the snapshots also have been automatically mapped to the cluster):
Once the administrator is done with the checks he/she can push the "Continue" button, which essentially rolls back the Test. This, in a nutshell, includes shutting down the VMs in the DR site and deleting the snapshots taken from the replicated LUNs. Everything is now back to normal for the next Test to run (or a disaster to recover from).
Site Recovery Manager Setup (Run the Recovery Plan)
Running the Recovery Plan is different than testing the Recovery Plan. The most important difference is that SRM doesn't create snapshots of the replicated LUNs; rather it uses the replicated LUNs directly. The other difference is that the VMs on the recovery site are connected to the actual physical network and no longer to the "Bubble Network" that is used in the Test. Everything else is pretty similar to what we have seen already.
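To summarize the differences between the two modes, the following Python sketch lists the steps as I understand them from this lab. It's a simplification for illustration only: the function and the step descriptions are mine, and the real Recovery Plan contains more steps than these.

```python
from typing import List


def recovery_steps(mode: str) -> List[str]:
    """Ordered, human-readable steps for a 'test' or a real 'run'."""
    if mode not in ("test", "run"):
        raise ValueError("mode must be 'test' or 'run'")
    testing = mode == "test"
    steps = []
    if testing:
        # Non-disruptive: work on snapshots, replication keeps running.
        steps.append("snapshot the replicated LUNs (mirror stays in place)")
        steps.append("map the snapshots to the DR cluster")
    else:
        # Real failover: the replicated LUNs themselves are used.
        steps.append("reverse the mirror roles so the DR LUNs become active")
        steps.append("use the replicated LUNs directly (no snapshots)")
    steps.append("rescan the HBAs and import the resigned Datastores")
    network = "Bubble Network" if testing else "physical network"
    steps.append(f"power on the VMs connected to the {network}")
    if testing:
        steps.append("pause until the administrator pushes Continue")
        steps.append("shut down the test VMs and delete the snapshots")
    return steps


for mode in ("test", "run"):
    print(mode.upper())
    for number, step in enumerate(recovery_steps(mode), start=1):
        print(f"  {number}. {step}")
```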
As you can see, SRM instructed the LSI SRA to reverse the roles of the mirror: now the LUNs on the DS4800 (the storage server at the DR site in Roma) are "Active" and get replicated onto the "Passive" LUNs on the DS4700 in Milano. Most likely this is not what would happen in a real life disaster. In that case, the DS4700 would probably not be available (due to the disaster) so SRM would only activate the replicas on the DS4800 in the DR site.
At this point the VMs would be restarted on the cluster in Roma similarly to what happened in the Test scenario (with the exception that they would connect to the actual physical network since they are restarting there to really take over). Remember this is no longer a Test, it's a real Run of a real Recovery Plan. Doing this on a production environment will have devastating results!
At the end of the process, all production VMs (Web1, Web2 and Web3) would be running on the VI3 cluster in Roma which now effectively can be considered the new production site.
Failback
Failback is a nightmare, at least in my opinion. Unfortunately there is not a "Failback Button" on the SRM console. However, you could work on the VMware consoles to create a Recovery Plan that will move all the VMs currently running on the DR site (Roma, for us) onto the original production site (Milano, in our case). Rather than a real failback, I think it's more appropriate to define this as a new failover plan that happens to bring the workloads back to their original positions. VMware has published a useful document that, in chapter 6, describes the steps to failback from an SRM failover. It's a good read. There is only one caveat in that paper that would need further investigation: at some point in the failback process it's suggested to set the DisallowSnapshotLUN parameter on the hosts in the original site to 0 (it would be the hosts in Milano, in our case). This means that when the storage is brought back to the original place, the ESX hosts on the original production site would be able to import the Datastores without resigning them. Since this is done via SRM, it is inconsistent with the behavior we have noticed during the failover. SRM seems to automatically set (on the fly) the EnableResignature to 1 on the hosts where the LUNs are being re-activated, effectively forcing the hosts to re-sign the volumes - and thus making the DisallowSnapshotLUN irrelevant. Further investigation would be required to nail down this inconsistency between the documentation and the behavior we have noticed.
Massimo.
|
-
On the last day of March 2009 Intel officially unveiled its brand new Nehalem core architecture under the Xeon 5500 product name umbrella. There is not much to say about it other than it's impressive from a performance perspective. Just to give you a sense of what we are talking about, the new product - only available for 2-socket servers today and with up to 4 cores per socket - has published many benchmark numbers that are either on par with or slightly better than 4-socket Intel based servers with as many as 24 cores. One might wonder why a successful (and clever) company like Intel is going to cannibalize their highly profitable multi-socket market with a less profitable product such as the 5xxx Xeon series. And I think the answer to this question is in one of the slides they used to present Nehalem at the launch event:
These numbers are impressive, but I am pretty sure that if the SUN and IBM marketing people were ever able to read the small text at the bottom (which seems to be technically impossible), they would come up with something to counter them, as they are obviously presented in a way that favors Intel. I can't be certain, though, since I can't read the text myself and so I don't know the assumptions behind those numbers. What is important in this chart, however, is not the numbers (we know Nehalem has impressive performance per core) but the fact that Intel is now using Xeon to go after a 20+ billion dollar UNIX market. Up until now - and for the last 10 years - they had been using Itanic (ehm... I mean Itanium... sorry for the typo) to go after the IBM Power or the SUN Sparc processors to get a slice of the UNIX pie. This doesn't seem to be the case any longer. One might wonder where Itanium falls into all this: good question.
A bit of history on Itanium might help. Originally the Intel vision for the 64-bit Itanium was that it should have been the x86 32-bit follow-on product: the replacement for the Xeon brand basically. And they might have had a chance to succeed if AMD hadn't come out with a much smarter evolution for x86 32-bit processors: in case you are wondering, that would be an x86 64-bit architecture (namely AMD Opteron). When Intel understood they couldn't fight the Opteron with Itanium - since Opteron was 100% backward compatible with the Xeon software available whereas Itanium was basically not and would have required massive and painful application porting - they decided to introduce the same "enhancements" to their Xeon processors. This was initially referred to by Intel as x86-32e: obviously they couldn't say Xeon was 64-bit as it would have overlapped too much with Itanium so they preferred to stay with the ridiculous definition of "32-bit Extended". This was the time when they tried to pitch Itanium as the only "native" 64-bit processor whereas the Xeon (as well as the Opteron obviously) were "just extensions to current 32-bit architectures". And this is when they shot themselves in the foot: they tried to play with words (i.e. native sounds better than extended) but they forgot that, as far as IT is concerned, native means you have to port the application whereas extended means it's compatible. So, for most of the customers, eventually extended sounded much (much!) better than native. And this is when the perception of Itanium started to decline. I did a presentation at an IBM System x Symposium in France back in 2004 where I shared these thoughts. Interestingly enough, at that time we had an Itanium based System x box in our portfolio - the x455 - and I basically implied that Itanium (hence the x455) was at a dead-end and a useless product given the historical context we were facing.
This is for example a chart that I used in 2004 to predict Windows on Itanium had no real place and didn't make any sense at all; it took a while but I think MS now thinks along the same lines:
Funnily enough there was an Intel representative in the room who apparently didn't like these messages and decided to escalate and complain about my pitch to my line all the way to the General Manager of the IBM Systems and Technology Group (who reported directly to Lou Gerstner - CEO of IBM at that time). I was never officially involved in this complaint but the fact is that, later in the year, we dropped the x455. I like to think I gave a hint to the product marketing team on what to do, but more likely what I said in the session was a blessing from the field of what product management was going to do anyway (and for very good business reasons). For your information I have posted the entire Power Point deck in the Files section of my site if you want to have a look. You can download it here.
To make a long story short, Intel had nothing left to do but re-position Itanium as a high-end RISC replacement with the help of HP that, confident in its value and roadmap, decided to completely drop their own RISC offering - the HP PA-RISC processor - and jump onto the Intel Itanium processor as a strategic replacement. Intel tried to position Itanium as an open platform, mentioning they had dozens of OEMs offering servers based on that processor, but they usually forgot to mention that the vast majority of the sales numbers they were seeing were coming from HP, which is the only tier 1 server vendor today offering such a processor (IBM and Dell used to but they withdrew it and SUN never even attempted to).
As Xeon (and the AMD Opteron) became more and more enterprise-ready, the Itanium potential started to shrink even further, up to the point where Nehalem now seems to be the last nail in the Itanium coffin. Consider also that the first Nehalem incarnation is a CPU model for 2-socket servers (Xeon 5xxx). This might leave the impression that Itanium can address a much larger window as it shines on highly scalable boxes. The truth is that this is the first product iteration based on the Nehalem core. Later in the year Intel will announce a multi-socket Nehalem based CPU - aka Nehalem EX - capable of scaling up to 8 sockets (Xeon 7xxx series). This CPU will feature 8 cores and Hyper-Threading thus providing execution support for 128 simultaneous threads (8 sockets x 8 cores x 2 threads) in a single system image. Last but not least this new CPU will also feature additional enterprise functionalities such as MCA (Machine Check Architecture) which was one of the few things Intel used to position Itanium as "more enterprise" than Xeon. On paper a system like this could address 99.9% of the customers' requirements. This statement refers to performance, but we all know that performance is just one aspect of platform selection. This will cause some adjustments in the server market shares and this goes back to the fact that apparently Intel is cannibalizing their current high-end market. Most likely what they have in mind, instead, is that they want to push the bar further and enter even more aggressively into the UNIX market with a more appealing and serious offering (than Itanium) like Xeon. The idea is: I will cannibalize a high-end x86 profitable market today which is worth a few B$ with a lower-end and less profitable product, because I want to use its big brother (Nehalem EX) to go after a 20B$ UNIX market. Since a picture is worth 1000 words this is what I am trying to say:
Note that I am not implying this is what I think will happen. As I said, performance is just one metric in platform selection. I am only speculating on the view that Intel has going forward. I am not completely ruling out (either) that this view has a point given what's going on, and if this happens it will not only impact Itanium in the RISC space but other UNIX platforms as well.
Back to the Itanium discussion, last but not least it's worth mentioning that there is going to be a convergence in the Itanium Tukwila time frame (unsurprisingly delayed again) where you can drop this new CPU into a Nehalem standard socket (see the Update below). Intel has always pictured this flexibility as a means to lower Itanium development costs and make it more flexible/cheap for customers and OEMs to move from Xeon to Itanium. The reality is that at the end of the day you end up having a common system, with the same components, with the same CPU socket. At that point you'll have the choice of installing either a cheap, super fast Nehalem processor with an unmatched flexibility of OS flavours and ISV applications... or installing a more expensive, somewhat slow Itanium Tukwila processor with an embarrassing flexibility of choice of OSes and ISV applications (at least compared to the Xeon family). I am pretty sure there are some HP execs regretting the port of HP-UX onto Itanium rather than having ported it onto the x86 architecture - if only they had known 10 years ago what the x86 architecture would look like 10 years later.
It's well known that not only did Itanium not bring any profit, but its development costs have been impressive and slow sales never made up for them. In a word Intel has lost tons of money on Itanium. That said, there are obviously a number of issues that prevent Intel from immediately dropping the dead processor: for example contracts that they have signed with "these dozens of OEMs" - and one in particular which I won't mention (again) - that dropped their in-house developed CPU architecture to jump on Itanium. They cannot just say "hey we are dropping Itanium" and leave these vendors in the mud (especially one). So I guess it's fair to say that, officially, Itanium is alive and healthy; you can imagine what the reality is.
Massimo.
Update (10th June 2009): while Tukwila and Nehalem EX will share the same QPI bus the sockets of the two processors will continue to remain incompatible for the moment.
|
-
On Monday 16th Cisco unveiled its Unified Computing System (UCS). A few days ago I was briefed by some local Cisco guys about the product (err, the architecture, as they stressed). I assume that people reading this post know what Cisco is doing and are familiar with the announcement. In a nutshell, they have announced a new thing which is a mix of hardware (primarily) and software that comprises the following:
- their Unified Fabric technology (as it can be found in other products like the Nexus family of switches)
- their new Blade technology
- their Management technology (which is an OEM and supposedly customized version of the BMC BladeLogic software)
Consider there is not a lot of information available at the moment so most of the discussions are based on preliminary - and poor - initial documentation. This picture explodes the pieces and it's one of the few diagrams that is being shared by Cisco at this stage:
Never mind that I work for IBM and that many of my colleagues see this as a potential threat to our server hardware business (which I am sure is the case). In the final analysis I am a technology geek and that's how I run this personal blog. What I write here is my own unbiased (believe it or not) personal opinion.
I must admit I am fascinated by what Cisco is trying to achieve here. Ideally it sounds like a very compelling solution and something that anyone should be seriously evaluating for virtualization deployments. Having said that, as for all things in life - none excluded - there are pros and cons. I am not going to spend time talking about the pros as they are obvious and Cisco is certainly going to explain those to you in detail. These include, for example, the potential benefits of the Unified Fabric, which are enormous. I believe end-users reading this blog would be better served, at this point, by someone who starts to highlight the (potential) challenges of designing and implementing such a vision and architecture. This is done to balance the flow of "pros" you will be flooded with. Note this is nothing new on this blog: when VMware announced VMware 3i I wrote an article on the misleading marketing information that was associated with it; similarly I did a reality check for VMware Site Recovery Manager to underline its deficiencies rather than magnifying its excellences (that's what VMware marketing is paid for).
This is exactly what I'd like to do here with this new article: I'd like to underline the challenges that Cisco is facing. However I don't want to do that from a competing blade vendor's perspective (that's what the Dell/IBM/HP marketing organizations are for), but rather from a VMware virtualization expert (vExpert) perspective based on feedback from the field and the various customer projects I have been involved in, now and in the past.
(Physically) Unified Fabric? No, Divide et Impera!
Cisco is trying to capture a potential convergence in the datacenter. This is a process that started early in the 21st century when the major server vendors started to ship blade form factors: those blade chassis in fact integrate both Ethernet and Fibre Channel switches as well as compute nodes (i.e. blade servers). This wasn't an easy thing to do in organizations with very strong vertical specializations (and politics!) in the data center. That's why we still see an exaggerated number of "pass-through" technologies being used on blade chassis that basically externalize the thousands of Ethernet and Fibre Channel ports of each blade. This diminishes the intrinsic value of the blade technologies; however, it allows the blades to be connected to the legacy infrastructure switches. Most of the time, in fact, this is not done for technical reasons but merely for political reasons: "The server guys are responsible for servers, that's it; the network guys have their own infrastructure and that's (physically) separated from servers....". This is what usually happens in big organizations. I have been through that many times.
Having said that, I support the Cisco message: what these big accounts are doing is very inefficient and there is room for huge optimization if they could possibly get the internal political issues resolved. However I think this is one of the problems Cisco is going to face in promoting their Unified Fabric technologies. Well, in reality this situation is exacerbated by the fact that we are talking about a convergence of IP and Storage networks, so there is even more politics involved.
Unified Fabric, Weak security?
Once we get past the physical consolidation concerns I have discussed above and the customers have agreed to position the switches in a non-conventional location (i.e. closer to the servers than to the infrastructure), Cisco might face another concern related to security. As background, this desire to consolidate and reduce the cabling complexity associated with each VMware ESX server is nothing new. I discussed this very topic back in 2007 in the article "Infiniband Vs 10Gbit Ethernet... with an eye on virtualization". As you might see from the picture in the post (which I am attaching hereafter for your convenience) InfiniBand was supposed to deliver the same concept of I/O virtualization that is being evangelized by Cisco with their Unified Fabric:
This is very similar to the latest Cisco Nexus value proposition (hence to this UCS announcement as it's based on the Nexus core technology). No matter if it's InfiniBand or 10Gbit Unified Fabric, the biggest problem with this layout and architecture - as reported by customers and VMware network security experts in the forum threads linked below - is that each ESX server has a number of network security zones that best practices would require to be kept separate from each other. Many customers achieve this by creating network security zones (i.e. for the ConsoleOS, VMotion, iSCSI, VMs etc) by means of physically different network adapters that connect to physically separated network switches. For these customers VLAN and PortGroup technologies are not usually a viable option as they don't implement and guarantee the level of security and separation they need. In the picture above the critical issue lies in the fact that these physically and logically separated network segments need to collapse into a single Bridge/Switch for the whole I/O virtualization to work (be it InfiniBand or Cisco Unified Fabric).
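To make the concern concrete, here is a minimal Python sketch that checks which security zones end up sharing the same physical switch. The zone names come from the paragraph above; the switch labels (`sw-a`, `fabric`, and so on) and the function name are made up for illustration and do not correspond to any VMware or Cisco tool.

```python
from collections import defaultdict

def shared_switches(zone_to_switch):
    """Return the switches that carry more than one security zone.

    zone_to_switch maps a zone name to the (hypothetical) label of the
    physical switch its uplink is cabled to.
    """
    zones_by_switch = defaultdict(list)
    for zone, switch in zone_to_switch.items():
        zones_by_switch[switch].append(zone)
    # Only switches hosting two or more zones break physical separation.
    return {sw: sorted(zones) for sw, zones in zones_by_switch.items() if len(zones) > 1}

# Traditional layout: one physically separate switch per zone.
separated = {"ConsoleOS": "sw-a", "VMotion": "sw-b", "iSCSI": "sw-c", "VMs": "sw-d"}

# Converged layout: every zone collapses onto the same bridge/switch.
converged = {zone: "fabric" for zone in separated}

print(shared_switches(separated))   # no switch is shared
print(shared_switches(converged))   # all four zones share "fabric"
```

In the converged case every zone shows up under the same switch, which is exactly the situation that customers relying on physical separation object to.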
Last but not least, consider that this discussion is multidimensional. Not only is Cisco trying to unify all the different IP segments on the same wire - as already discussed - but they are also trying to unify both IP traffic and Fibre Channel traffic on the same wire (by means of a new technology called FCoE or Fibre Channel over Ethernet). Obviously this additional dimension adds even more potential security concerns than "simply" collapsing heterogeneous network security zones. There have been a number of interesting discussions on the VMware forum that I highly encourage you to read if you are interested in the matter. You can find them here and here.
This is going to be another challenge for Cisco.
Unified Computing? More like Partially Unified Computing
I don't really get the Cisco message here. I have already talked about how I see the technology trends in this industry; in a nutshell what's happening is that data centers are being transformed from vertical silos of servers and storage that (statically) support applications into pools of physical resources that can be used when they are needed. You can read more about these trends in this other article I wrote. The picture in the original post doesn't call out one important element of the architecture, which is the network: I didn't call it out because it was obviously there, but let's try to refine that diagram to draw the complete picture of the elements that comprise a virtualized data center.
A properly designed and innovative x86 virtualized data center requires these 4 distinct elements:
- A Shared Server infrastructure
- A Shared Network infrastructure
- A Shared Storage infrastructure
- The Virtualization software (which is the glue that ties together all these components)
Note: in a traditional virtual infrastructure the storage network (be it fibre or Ethernet) is physically separated from the IP network (which is typically Ethernet). In the context of the Unified Fabric there is a single network (based on 10Gbit technologies) that carries both storage and IP. This doesn't really change the idea of the diagram above; it actually reinforces the message, meaning that the Shared Network is also shared from a "protocol being carried" perspective.
One of the challenges customers have today is that these 4 elements are really managed and operated by different vertical (and specific) management tools: you have to use vCenter to manage VMware, you have to use the Server tools to manage the Shared Servers infrastructure, you have to use specific tools to manage and operate the Network infrastructure and ultimately you have to use specific GUIs to manage the shared disk space. This is not, by the way, a negative thing per se because it allows a customer to switch from one vendor to another at any level they want, thus allowing them to avoid lock-in. This is a concept that is historically at the very basis of any x86 deployment and one of the most important aspects that determined - and still determines - the success of this platform.
The point I am trying to make is that Cisco has "unified" only two of these four elements with their offering, namely Servers and Network:
What is this going to mean for customers from a "unification" perspective? Very little I think. Consider also that the servers themselves, frankly speaking, are probably the most commodity thing of all four from a management perspective simply because management standardization (such as IPMI and BMC) is allowing third parties to build an x86 management layer into their own products. A typical example of this is, funnily enough, the VMware effort to create a CIM-based interface to manage standard x86 servers (this implementation first appeared in ESXi and it's now available in the standard ESX version). This is an example of this concept:
I certainly don't want to downplay the challenges associated with managing a server farm but, if you ask me, extending an existing tool to add functionality that properly manages an x86 server deployment is not something that should be under scrutiny for a technology Nobel prize. So to speak. Ironically VMware is "unifying" Virtualization with server management whereas Cisco is "unifying" Network with server management. Not the holistic unification that is being discussed in the marketing announcements though.
Similarly to the "unified" management concept above, building a brand new x86 blade is a relatively easy task compared to building a brand new Storage subsystem or compared to building a brand new Virtualization software infrastructure element (ask Microsoft). So I am starting to wonder why they have chosen to (partially) "unify" starting from the easiest of the four elements. Here I am assuming that the innovative characteristics of their blades are either easily achievable by long standing tier 1 server vendors (Dell, HP, IBM, SUN) or are not strictly necessary as of today: the speculated 500+GB of memory support per Cisco blade seems cool but I am challenging the need for something like this given the current well known rules of thumb for sizing ESX hosts. Sure, Nehalem will change these numbers but even assuming a doubling of the amount of RAM required for a 2S/8Core system we are far, far away from the 500+GB Cisco specs.
Moreover, Cisco has clearly stated that they want to leave the Software Virtualization as well as the Shared Storage elements open. I don't want to provide more details here as I am not sure about the level of confidentiality associated with the info I have, but the key point is that they don't have a strategy that calls for a single Virtualization vendor nor a single Storage vendor. Enough for now. And this again leads me to wonder what sort of "unification" this is all about. What I have learned basically is that you can buy UCS and use, now or in the future, your storage vendor of choice - with the management framework that comes with it - as well as your virtualization of choice - again with the management framework that comes with it. You have to do this with all the benefits and challenges that end-users experience today in aggregating and integrating different vendors to create the ultimate virtualized infrastructure.
Don't get me wrong. I am a fan of this Unified Fabric concept and I hope it will take off as it will solve many of the enterprise customers' challenges associated with the management of the distributed infrastructure. There is lots of information available on the web, as I said, on the benefits of implementing this highly consolidated and "intelligent" fabric. This is from Chad Sakac (with EMC) and it discusses some of these benefits, for example.
What I am questioning is this Cisco move to extend their value proposition from the Unified Fabric into a market (x86 blades) that isn't really adding any additional benefit to their unification story. Reading through Chad's excellent post I can't really see what is unique about it compared with doing what he describes using alternative components such as Dell / HP / IBM / Sun servers and Dell / EMC / HP / IBM / NetApp / Sun storage, all tied together with the Cisco Nexus technology, which remains the real Cisco value add in this context. That's what I am missing.
That's the question I asked during the session a few days ago: what's in it for customers if they use a Cisco UCS infrastructure compared to an IBM BladeCenter + Cisco Nexus infrastructure? Granted, Nexus switches for the IBM BladeCenter do not exist today, so this is a hypothetical question. Sure, they have this "integrated management" framework but what's the value in it if what it does is simply manage a subset of the entire infrastructure? Customers will still be forced to deal with a number of vertical management pieces to operate the infrastructure end-to-end.
I am missing it unless there is some sort of grand plan behind the scenes to make the EMC and Cisco pair "more tied" (whatever that means). How about an "EMCisco"? I am going to copyright this term: a brief search on the Internet didn't find any result for this term used in the IT context (although apparently there is a DJ called EMCisco). This single IT entity would, in fact, be able to provide an end-to-end infrastructure comprised of virtualization software, network, servers and storage and they would be able to really integrate the whole thing into a single management and operational framework with a potentially much deeper integration (beyond the standard public APIs that interconnect the four different elements). The interesting part is that, as I said, the x86 server market - and its surroundings - is literally modular and no single customer that I know would be willing to be locked in in such a way (unless there are compelling reasons to do so - which I am not ruling out).
The bottom line is that, if I were malicious, I would be led to think that today Cisco is more interested in getting a slice of the 30B+ US$ x86 server market - on top of what they can do with their Unified Fabric solutions - through the development and integration of the most commodity piece of all the four elements. I can easily see what's in it for Cisco: easy additional money. I can't really see, so far, what's in it for customers.
I'll let Cisco give you the bright side of their new UCS platform. My role here was to show you the dark side of it (someone has to).
Massimo.
|
-
A few weeks ago I wrote a
tutorial on
how to deploy Hyper-V R2 on the IBM BladeCenter S where I demonstrated, among other things, how
to LiveMigrate from one blade to another. I didn't spend
too much time commenting on the implications this will have in the market.
In this article, I'd like to comment on some of those potential implications.
Reading my piece you might have had the impression that I was "backing"
Microsoft and putting Hyper-V R2 in the
spotlight. That was not my intention: in fact the geek at the bottom of my heart
just wanted to give it a try, as easy as it is. While I was pretty much happy with what I
saw, I was certainly not implying that Hyper-V R2 will be able to
match VMware Enterprise technologies (both current and future). In fact, I don't
honestly think that this is the case. Part of the misunderstanding is that, for some reason, this
industry has grown with the stereotype that a virtualization product that is
capable of moving a live workload from one server to another is to be considered
enterprise-grade. VMotion has become the industry benchmark for being an enterprise product. I want to
challenge this stereotype.
My article created a bit of confusion around this concept. "I
saw your article. Are you saying that Microsoft is going to be on par with
VMware?" is a common question I have heard a lot lately. I want to use this
new article to give you the "other
side of the coin" regarding the two important technologies Microsoft is going to bring
to the market, namely LiveMigrate and CSVs (Cluster Shared
Volumes). While having these two capabilities in the new product will help
Microsoft to overcome some limitations they have today for some deployment
scenarios, this doesn't mean these features could be used in all scenarios
(specifically Enterprise scenarios).
The devil is in the detail, so when you start digging a bit
into the LiveMigration technology, for example, you can find that:
"..... On a given server running Hyper-V, only one live migration (to or from
the server) can be in progress at a given time. For example, if you have a
four-node cluster, up to two live migrations can occur simultaneously if each
live migration involves different nodes....."
The full story is here:
http://technet.microsoft.com/en-us/library/dd443539.aspx
This is obviously documentation that relates to an early beta
of the product. But if they are going to stick with these limitations, it would be hard to imagine
wide deployments in enterprise scenarios where you might require multiple live
migration tasks going on cluster-wide at any point in time for resource
optimization reasons. So even assuming Microsoft (or Citrix with their new Essentials
for Hyper-V
package) comes out with some
sort of DRS-like product in the R2 timeframe, they might not have the
underlying infrastructure ready to leverage these add-on tools.
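A short Python sketch shows what the quoted restriction means for a DRS-like rebalancing run. It only models the rule stated in the TechNet excerpt (one live migration per node at a time); the greedy scheduler, the function names and the node labels are hypothetical and are not part of any Microsoft tool.

```python
def max_concurrent_migrations(node_count):
    """Each migration ties up two nodes (source and target), so at most
    node_count // 2 migrations can run at the same time."""
    return node_count // 2

def schedule_rounds(migrations):
    """Greedily pack (source, target) pairs into rounds so that no node
    takes part in more than one migration within a round."""
    rounds = []
    for src, dst in migrations:
        for rnd in rounds:
            busy = {node for pair in rnd for node in pair}
            if src not in busy and dst not in busy:
                rnd.append((src, dst))
                break
        else:
            # Every existing round already uses src or dst: open a new one.
            rounds.append([(src, dst)])
    return rounds

# Three moves requested at once on a four-node cluster.
wanted = [("n1", "n2"), ("n3", "n4"), ("n1", "n3")]
print(max_concurrent_migrations(4))
print(schedule_rounds(wanted))
```

With four nodes only two of the three requested moves can run together; the third has to wait for a second round, and on a busy cluster that queue is what a resource optimizer would be fighting against.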
The same goes for Cluster Shared Volumes: the devil is always in the details. If you
have read my previous article you might have had the impression that CSV will
deliver pretty much what
VMFS delivers today. Well, apparently yes, but again, if you dig a bit into the details you will
find out some limitations that might not be relevant for small deployments
but might be show-stoppers for enterprise deployments.

At the time of writing, these slides were publicly available at this
link. Kudos to Microsoft for not hiding these details and for letting the
people know about the limitations.
While it appears at first that CSVs are a "transparent" technology, the
reality is that as soon as you start pushing the envelope, they are not. How
many enterprise IT organizations today leverage storage replication technologies to implement
Disaster Recovery scenarios? Based on my experience I would say many of them.
Hyper-V R2 with CSVs will break this common implementation pattern if Microsoft isn't
able to overcome this limitation. A pattern that I would imagine all these
enterprise customers want to continue to leverage and something that is not just
bound to current VMware deployments as it's a technique that is being leveraged
by UNIX and Mainframe deployments as well to achieve High Availability and
Disaster Recovery.
These are just two examples. As I said, supporting techniques that
allow a live workload to fly from a physical server to another is just
one aspect, but probably not even the most important. The fact that you have a
small Cessna - and so you can technically fly - doesn't mean it's the
best, safest and most comfortable means of transportation to go from Milan to New York. For that you want to fly on a 767 (and in business class, if
possible!). Of course there are a lot of Cessnas around as they fit a
part of the market.
On the other hand, as I said in my previous article, Microsoft
has a tremendous asset: they are making (almost) everything available for free.
Which leads to at least a couple of interesting comments.
Does the price discussion really matter anyway?
The first comment is: does the price discussion really matter
anyway? The Microsoft
pricing strategy is such that when you have properly licensed your Windows guests
(typically via either Windows Server Enterprise or Datacenter SKUs), your
underlying Microsoft Hyper-V virtual infrastructure is already licensed by
definition. And this is true today. Suppose you have 50 Windows guests to
deploy on four 2-socket servers, for example. Most likely the cheapest way to
license these 50 guests is via the Windows Server 2008 Datacenter SKU, which is
licensed per physical socket and allows an unlimited number of guests. If you do
so, it doesn't really matter whether you want to use Hyper-V 2008 Server or a full-GUI Windows 2008 Server
w/ Hyper-V or a GUI-less Core Windows 2008 Server w/Hyper-V as your parent
partition. You have the right to use everything you
want for free including the Failover technology (Microsoft Cluster Server). This
excludes the MS Virtual Machine Manager but this won't change in the Hyper-V R2
time frame. So this claim that with Hyper-V R2 they will have more stuff
for free is a bit misleading in my opinion in the sense that they already
effectively provide many things today (not just the Hyper-V Server SKU) for free.
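The arithmetic behind the 50-guest example can be sketched in a few lines of Python. The server and socket counts come from the example above; the two prices are pure placeholders invented for this sketch (they are NOT Microsoft list prices), as is the simplification that the alternative is one license per guest.

```python
def datacenter_licenses(servers, sockets_per_server):
    """Datacenter is licensed per physical socket, regardless of guest count."""
    return servers * sockets_per_server

def break_even_guests(servers, sockets_per_server, price_per_socket, price_per_guest):
    """Smallest number of guests at which per-socket licensing costs no more
    than buying one license per guest (integer ceiling division)."""
    total = datacenter_licenses(servers, sockets_per_server) * price_per_socket
    return -(-total // price_per_guest)

# Placeholder prices, made up for illustration only.
PRICE_PER_SOCKET = 3000
PRICE_PER_GUEST = 1000

print(datacenter_licenses(4, 2))
print(break_even_guests(4, 2, PRICE_PER_SOCKET, PRICE_PER_GUEST))
```

Four 2-socket servers need 8 per-socket licenses no matter how many guests run on them; with these made-up prices anything above 24 guests favours the per-socket SKU, and 50 guests is well past that point. Plug in real quotes to see where your own break-even sits.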
There is a caveat to this, though, and it boils down to how customers are
going to license the Windows guests. The analysis above assumes that customers
are going to buy brand new licenses for their new deployment (because they had OEM Windows licenses on their old physical servers that could
not be repurposed, for example). If the customer has Windows licenses that they can repurpose
on the new virtual infrastructure, then the discussion on the cost of the virtual
infrastructure itself is no longer trivial. And, yes, there will be a big bonus in
this regard during the R2 timeframe - as the free Hyper-V Server R2 version will
have more features and fewer limitations than the current free version. The pricing discussion
can get very complicated, as mentioned in my
blog a few
months ago. It would be interesting to see some statistics on how customers have
currently licensed their legacy physical servers.
Last but not least, I am assuming that these customers are using Windows
guests on their Hyper-V infrastructure. While Microsoft supports a limited
number of Linux distributions (today SUSE, but they announced future support for Red Hat,
too), I don't see too many Linux-only customers leveraging
Hyper-V for their virtual infrastructure deployments.
Clearly the Microsoft virtualization strategy is different than the VMware virtualization strategy
The second comment regarding the (virtual) price war is this: clearly
the Microsoft virtualization strategy is different than the VMware
virtualization strategy. And the pricing strategy reflects that. I wrote another
article on this topic which I invite you to read. I am attaching hereafter
the picture for your convenience because I want to use it to back my point.

In a nutshell, Microsoft makes money out of the red part
whereas VMware makes money out of the blue part. Microsoft is probably going to
stick with their "Virtualization is a value item of the OS" strategy for some time
to come if the pricing schema for Hyper-V R2 (due early next year) is what they
are pitching today. Basically what they are doing is growing the
blue part, and giving it away for free. The only way they can sustain this is by
continuing to make money on the red part. This has at least a couple of
implications that are worth underlining:
- Their Linux strategy is pretty much opportunistic (well, it's obvious and
totally expected after all - it's a dumb statement) in the sense that they want to give customers with
a "few Linux servers here and there" the possibility to leverage the
Hyper-V infrastructure these customers are using for the majority of the
(Windows) VMs. Apart from the fact that a Linux shop (probably) would not want to use
Hyper-V for technical (or religious) reasons, it wouldn't even make sense
for Microsoft to go down that path because they would need to make the blue
part technologically compelling on its own while giving it away for free. There would be
no revenue stream for Microsoft in such a scenario so it's probably not worth the
effort for them.
- Microsoft will not have a business interest in making the
blue part grow too much as long as they are going to give it away for free.
This means that they won't be able to afford to be so aggressive in the
Virtual Appliance space because the JEOS concept is pretty dangerous to
their current business model, as you may gather from the
picture (the smaller the red part is, the less leverage they have). Unless
they radically change their software licensing model - which I wouldn't
rule out - I don't see how they could sustain an aggressive move toward this
JEOS concept. Consider also that the smaller the red part is, the easier it
could be for the ISV to migrate to a different OS. This is a generic
statement obviously and might not be applicable to specific situations.
All in all what Microsoft is doing is interesting and it will
benefit customers because it will keep VMware honest in what they are doing -
in terms of both technology and pricing. My speculation is that this is going to be a two-horse
race in the long run between VMware and a virtual
agglomerate comprised of Microsoft and its historical partner Citrix. There are
concerns and rumors in the industry - admittedly, I am personally backing them -
that Citrix has sort of lost interest in battling at the XenServer level, which
is now being distributed for free. Some people are speculating that
Citrix is shifting their strategy to expand and provide value on top of someone
else's basic virtualization offering (namely Microsoft Hyper-V) and losing focus
on their own commodity hypervisor and management offerings (XenServer). This is similar to what they are already doing with XenApp, which expands the core
Microsoft Terminal Services technology.
There is no question that the aggressive pricing move from
Microsoft in the R2 time frame will garner some reaction from VMware. I
don't have any insight but I wouldn't be totally surprised if VMware were to
provide VMotion either for free or in one of the less prestigious future
vSphere SKUs. There are enough technology deltas, on top of VMotion, that will
differentiate VMware from Microsoft (especially for enterprise deployments) that
will allow the guys in Palo Alto to continue to charge premium prices if they
want to.
However, I think that VMware will be at a fork sooner or
later: they could either continue to charge a premium for their unmatched
features to fill a need some of the Enterprise customers have (and that no one
in this industry can or will match), or they could substantially lower their
prices to appeal to many more customers - especially those that can't afford their
technologies. The theory is that you could earn $1,000 either by charging 1,000
customers $1, or by charging 100 customers $10. This always holds true unless you
figure out a way to charge 900 customers $1 and 100 customers $10. Their
Acceleration Kits are an attempt to achieve that value proposition, but what Microsoft is doing in the R2 time frame might
require a revisiting of the current VMware portfolio layout (which I am sure is in VMware plans). Of course, we need to remember R2 is still
about a year away so VMware has some time to think about this.
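For what it's worth, the segmentation arithmetic above is easy to check with a few lines of Python. The customer counts and prices are the ones from the paragraph; the `revenue` helper is just an illustrative name.

```python
def revenue(segments):
    """Total revenue for a list of (customers, price_per_customer) segments."""
    return sum(customers * price for customers, price in segments)

volume_only = revenue([(1000, 1)])            # many customers, low price
premium_only = revenue([(100, 10)])           # few customers, premium price
segmented = revenue([(900, 1), (100, 10)])    # both tiers at once

print(volume_only, premium_only, segmented)
```

Either single-price strategy yields $1,000, while serving both tiers at the same time yields $1,900, which is why a tiered portfolio is so attractive if it can be pulled off.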
Massimo.
|
-
I have just got back from VMworld 2009 Europe in Cannes. It was an interesting week and not just because we were on the Côte d'Azur (Azur, not Azure like in Windows Azure). There were a few interesting announcements, demos and breakout sessions going on at the Palais des Festivals during the week, so it would be difficult to make a ranking, but if I had to give my "virtual Oscar" to something I have seen.... that would be AppSpeed.
AppSpeed is a new technology that will take some sort of product shape during 2009 under the vSphere umbrella. Whether it's going to be part of the most expensive VDC-OS SKUs or a separate product, I don't know. The roots of this product are in an acquisition VMware made in the summer of 2008, when they acquired a company called B-Hive that developed a product called Conductor. Conductor - AppSpeed from now on - is an "SLA product" that basically takes apart the architecture of an application and creates a logical view of the sub-workloads taking place; a typical example is a multi-tier application that has web, application logic and database components. Not only that: the interesting part is that AppSpeed will monitor the performance of the workload the way end-users perceive it, that is, in terms of latency and time of execution. This means that once AppSpeed has built the logical mapping of the applications, the system administrator will have at their fingertips information such as, for example, how long the web front end takes to respond to the request (i.e. web server response time), how long it takes for the transaction to get to the DB server (i.e. network latency), and how long the DB server takes to respond back to the front end (i.e. DB server response time). If you want more information about AppSpeed you can see here; there is also a very nice on-line demo here.
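To illustrate the kind of per-tier breakdown described above, here is a minimal Python sketch. It is not the AppSpeed API (which I have not seen): the class, its field names and the sample numbers are all made up, and it treats the three timings as simply additive, which is a simplification.

```python
from dataclasses import dataclass

@dataclass
class TransactionTimings:
    """Per-hop timings for one request, in milliseconds (hypothetical fields)."""
    web_response_ms: int
    network_latency_ms: int
    db_response_ms: int

    def total_ms(self) -> int:
        # End-to-end time as the sum of the three hops (simplifying assumption).
        return self.web_response_ms + self.network_latency_ms + self.db_response_ms

    def slowest_hop(self) -> str:
        # Name of the hop contributing the most to the end-to-end time.
        hops = {
            "web server": self.web_response_ms,
            "network": self.network_latency_ms,
            "DB server": self.db_response_ms,
        }
        return max(hops, key=hops.get)

# Made-up sample values for one transaction.
sample = TransactionTimings(web_response_ms=120, network_latency_ms=15, db_response_ms=640)
print(sample.total_ms(), sample.slowest_hop())
```

The point of the decomposition is the second number: rather than "the application is slow", the administrator gets "the DB tier accounts for most of the response time".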
I see this as a huge step forward in virtual infrastructure deployments for two particular reasons that I am going to articulate hereafter.
The first reason is that this is what customers implementing virtualization have been asking me for since I started deploying these technologies. "How much is the ESX overhead?" is probably the most frequently asked question that I have heard in the last 10 years or so of virtualization implementations and evangelism. The good news is that the answer was easy: "it depends". The bad news is that it was rarely satisfactory for the customer. The fundamental problem we have had so far is that VMware systems administrators and the application folks use different metrics to check the health of the implementation. Systems administrators would usually monitor resource usage on the host (i.e. CPU, Memory etc) and say something like "your VM is only consuming 10% of its allotted resources so it's doing well". However the end-users use a different metric, such as "I don't care that it's only using 10% of its allotted resources, the fact of the matter is that the job takes 2 minutes to complete so it's slow!". AppSpeed is going to bridge these two disconnected worlds, giving the systems administrators higher level monitoring techniques that are very close to the language the end-users speak.
An interesting scenario that was pitched during the breakout session in Cannes was that AppSpeed could even be used in the pre-virtualization stage. The idea is that before virtualizing a given multi-tiered application (or part of it) you would use the AppSpeed sensors to build the logical map while the application is still running on one or more physical servers. That would give you a baseline for when you move the application into the virtual world. So, for example, if your transactional application deployed on your physical infrastructure has a 2-second response time or your batch workload has a 5-minute elapsed time of execution, you can then benchmark your new virtual deployments against these values to see whether virtualization has brought in some overhead (and how much). And with the "decomponentization" that AppSpeed does at the application level you should be able to drill down to the level where you can determine where the issue is. It's not yet clear to me whether the correlation between AppSpeed metrics and standard resource usage metrics is going to be done out-of-the-box by the VMware tools or whether it's the systems administrator who will have to match the two metrics.
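The baseline comparison itself is simple arithmetic, sketched here in Python. The 2-second and 5-minute baselines come from the example above; the "virtual" measurements are invented for the sake of the example, and the function name is hypothetical.

```python
def overhead_percent(physical_baseline, virtual_measurement):
    """Relative overhead of the virtual deployment versus the physical
    baseline, in percent. Both values must use the same unit."""
    if physical_baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100 * (virtual_measurement - physical_baseline) / physical_baseline

# Transactional workload: 2 s on physical, a made-up 2.3 s once virtualized (in ms).
print(overhead_percent(2000, 2300))

# Batch workload: 5 minutes on physical, a made-up 5.5 minutes once virtualized (in s).
print(overhead_percent(300, 330))
```

A negative result would simply mean the virtual deployment is faster than the physical baseline, which can happen when the target hosts are newer than the servers being retired.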
The second reason why I think this is an enormous step forward in virtualization deployments is that I have always laughed at those people referring, in the early days, to VMware ESX as the mainframe software for x86 servers. There is a fundamental difference between a VMware ESX server and a mainframe: mainframe operations are usually driven by "goal modes", in the sense that the administrator sets the goal - or the desired performance for a given workload - and lets the system figure out by itself the configuration of resources needed to deliver on that goal. While ESX has many of the knobs and parameters you could find on high end UNIX boxes and mainframes, its operations are still driven by "let's try to add more resources to that workload and see what happens". The pattern on ESX usually is:
-
The end-user complains that the application is slow (what does slow mean, by the way?)
-
The ESX administrator tries to add more resources (i.e. either increasing the CPU and memory shares or increasing the number of vCPUs and the amount of memory allocated to the VM)
-
The ESX administrator keeps his/her fingers crossed and goes back to the end-user to see if anything has changed
-
The end-user will either be happy or will continue to complain because the application is still slow (and the discussion would go on and on).
While AppSpeed won't magically add goal mode capabilities to the VMware infrastructure, it's clearly a step in that direction. Most likely, in the first incarnation of the product, the technology will only monitor the response time of a given application "passively", which would require a system administrator to work on the vSphere knobs to change the behaviour reported by AppSpeed. Continuing to speculate, it would be natural for VMware to get to that "goal mode" state where a system administrator (or the end-user directly through the vApp SLAs) would set the "response time" for the application and let the infrastructure figure out how to achieve that level of performance (and perhaps charge back accordingly).
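As a thought experiment only, the difference between "add resources and see what happens" and a goal mode can be sketched as a small feedback loop in Python. Everything here is hypothetical: the function names, the share values and the toy workload model are invented for illustration, and nothing below reflects an actual AppSpeed or vSphere interface.

```python
from typing import Callable


def tune_towards_goal(
    measure: Callable[[int], float],
    goal_seconds: float,
    shares: int = 1000,
    step: int = 500,
    max_shares: int = 8000,
) -> int:
    """Raise the share allocation step by step until the measured response
    time meets the goal, or until the ceiling is reached."""
    while measure(shares) > goal_seconds and shares < max_shares:
        shares = min(shares + step, max_shares)
    return shares


def toy_workload(shares: int) -> float:
    # Made-up model: response time shrinks as the VM gets more shares.
    return 4000.0 / shares


# The administrator states the goal; the loop works out the allocation.
print(tune_towards_goal(toy_workload, goal_seconds=2.0))
```

In the manual pattern described earlier, the human plays the role of the `while` loop; a goal mode would move that loop into the infrastructure.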
I am certainly not saying that vSphere (or any future incarnation of VMware's products) will get to the point of matching mainframe operations any time soon, but AppSpeed is certainly a move in that direction. It is also worth noting the different nature of the applications deployed on the mainframe and those deployed on x86 infrastructure. While applications deployed on the mainframe can usually be tuned by increasing or decreasing priority access to physical resources while keeping the same number of application instances, on VMware infrastructure you can either use the same technique or - most likely - you might be forced to clone those workloads to scale out (think of a web or application layer comprised of more VMs). This certainly adds complexity to the automation and the "goal mode" scenario, since it's not just a matter of tuning priority shares for an existing VM but rather a process that would need to provision and de-provision workload instances on the infrastructure. It can be done, but it's not as trivial as tuning a CPU power knob. The mainframe still rules in this space and it's always used as a benchmark for this sort of functionality. And beating it is not trivial.
The limited documentation and demos available for the technology would lead one to think that AppSpeed is able to respond to events by automatically triggering resource reconfigurations (either share reconfigurations or the spawning of new VMs), although I am not sure whether the capability demonstrated was an ad-hoc scenario implemented for the demo or an out-of-the-box capability natively integrated with the VMware infrastructure underneath. Since, as I said, this is not a trivial thing to achieve, I would speculate that, initially, the product will only have monitoring capabilities based on which a system administrator could take corrective actions. We'll see as we know more, though.
There are a couple of downsides to this technology, however. The first one is that it's obviously a VMware-oriented product, so one should expect meaningful end-to-end measurement only if the end-to-end application architecture runs on VMware. To be honest, VMware has countered this statement by saying that you can also probe applications that run on physical boxes; this is the case, for example, with complex multi-platform and multi-tier applications where the front-end might run on a VMware infrastructure while the back-end runs on a UNIX box. This leads to the second concern: this technology doesn't require any agent to be installed in the VM or on the physical host running the application - which is a good thing - but it requires the AppSpeed server to sniff the network (virtual or physical) in promiscuous mode. This might be a security concern for some organizations.
All in all, I would say AppSpeed is what any VMware system administrator was waiting for, hence it gets my "virtual Oscar" (I know they don't give Oscars at the Palais des Festivals... but nonetheless it sounds nice).
Massimo.
P.S. I have just been informed that due to previous trademark registrations the name AppSpeed might change at the product general availability. Still up in the air, but watch out for the potential new name.
|
-
My good friend at Microsoft, Giorgio Malusardi, noticed my post "Enterprise Virtualization in a Box" which was essentially an example of how to create a BladeCenter-contained VMware-enabled data center in a box (including servers, storage and networking). Giorgio challenged me with the task to create something similar using the Hyper-V Server R2 Beta that has just been announced. And I accepted the challenge!
This tutorial is going to document the setup of the environment based on what I have seen and done. I will share my point of view on what's going on, and the implications this will or might have in the x86 market, in another piece.
Microsoft Virtualization Background
For those of you that are missing the Microsoft basics it would be beneficial to set the stage. Right now, Microsoft is shipping the first version of their hypervisor - Hyper-V - by means of two different channels. The first one is as a component (or role) of their Microsoft Windows Server 2008 products. You can enable or disable this role in either a normal (GUI-based) Windows Server 2008 install, or a core (GUI-less) Windows Server 2008 install. Obviously, in order to get Hyper-V, you need to buy a Windows Server 2008 SKU (Hyper-V is included in any 64-bit x86 version of the Standard, Enterprise and Datacenter SKUs). The license rights for guests and included features - such as Failover Clustering technology - are determined by which SKU is purchased.
The second channel is as a free download from the Microsoft web site in a package called Microsoft Hyper-V Server 2008. In a nutshell this is basically a scaled-down version of Windows Server 2008 with the following restrictions and peculiarities:
-
It is a core install only (i.e. GUI-less as the only option)
-
The only role that it supports - which is enabled by default - is Hyper-V (for example, you can't enable the Failover Clustering role)
-
It doesn't include any license for Windows guest OSes
-
It does have a number of artificial limitations in terms of number of CPUs and amount of system memory supported.
That's what's available as of today. However, Microsoft recently announced the availability of the Beta version of Windows Server 2008 R2 and Hyper-V Server 2008 R2. Both these products will ship the second generation of the Hyper-V hypervisor and are currently scheduled to ship in about a year from now (roughly). With this Beta, Microsoft announced new features and new restrictions for the free package. The following table is a summary of the features in the current and future offerings:

* Cluster Shared Volumes is a technology currently in Beta that will ship along with the second generation of Hyper-V. It allows the NTFS file system to be used as if it were a "cluster file system" (à la VMFS, so to speak). See below in the document for more information on the CSV technology.
Those of you familiar with the Microsoft Virtualization technology will notice that the Windows Server 2008 R2 SKUs will have similar restrictions and limitations compared to the current releases. This statement obviously doesn't take into account new features introduced with the second generation of the hypervisor (such as Live Migration, for example). As you may have noticed, the biggest delta, both in terms of new features and artificial limitations, is between the currently shipping Hyper-V Server 2008 (first column from the left) and the future Hyper-V Server 2008 R2 (second column from the left). Among the many differences, it's specifically worth noting that the new (free!) product will support:
-
8 sockets (vs. current artificially limited 4)
-
1TB of memory (vs. current artificially limited 32GB)
-
Quick and Live Migration (vs. nothing)
-
Failover Clustering (vs. nothing)
-
Cluster Shared Volumes (vs. nothing)
The Hyper-V Server R2 Based Self-Contained Data Center
Back on track. As I said, the challenge was to replicate the VMware-based setup we built on the BladeCenter S. We used the very same hardware setup we had used for the VMware test. While we wanted to test the Hyper-V Server R2 Beta, it should be noted that the currently shipping Hyper-V solution works on the BladeCenter S today as well. This is a (generic) picture of the BladeCenter S chassis:
For this proof of concept, I decided to look at the things from the following perspective:
-
I wanted to focus on the Hyper-V Server R2 free product (and not on the general purpose Windows Server 2008 R2 w/ Hyper-V role enabled)
-
I wanted to focus on new technologies that will be shipping in the R2 timeframe. This includes CSV, Failover Clustering and Live Migration
-
I wanted to focus on what you could do with the future Microsoft free offering. This includes the standard free tools to manage the environment and obviously doesn't include the fee-based products such as Virtual Machine Manager (the current version wouldn't support Hyper-V Server R2 anyway and there is not a "sister Beta version" of VMM to test with the Hyper-V R2 Beta bits).
All this being said we can "replay" what I have done.
Hyper-V Server R2 Nodes Setup
First, I started installing Windows Hyper-V Server R2 on the two local disks of the two blades in the chassis. This is a picture taken from the Management Module of the BladeCenter S during the setup (remote attended install):
I could have set up the basic OS on the shared storage as well as dedicating a small LUN to each of the two blades but I remember there was a registry tweak to apply in the Windows 2003 timeframe to allow a single shared SAS/FC to handle both the C:\ drive as well as the shared storage in a MSCS scenario. I didn't want to get into that level of complexity, especially as it was not one of the main goals I had with this Proof of Concept. Enough to say that I am sure you could get rid of the local disks if you really want to.
The setup doesn't really ask too many things. Actually nothing. At the next reboot you are asked to change the Administrator password and off you go. This is what you get on a Hyper-V Server R2 Beta local console:
Through the Hyper-V Configuration panel (blue window), I did the following:
-
Changed the default Host Name (into HVR2NODO1 and HVR2NODO2)
-
Restarted server to apply the computer name settings
-
Changed the IP to static addresses (192.168.88.131/132)
-
Enabled RDP support
-
Configured Remote Management to allow WinRM and relax Firewall settings
-
Enabled an extra firewall setting (through the command netsh advfirewall firewall set rule group="Remote Volume Management" new enable=yes) for managing the disks through a remote MMC snap-in
-
Joined the domain (Windows 2008 R2 Domain created on a separate server on the network)
-
Added the domain Administrator to the local Administrators group (option 4 of the Hyper-V Configuration tool).
At this point - before enabling Failover Clustering support - I configured both blades to access two shared LUNs created with the IBM Storage Configuration Manager, which is the tool you can use to configure the BladeCenter S integrated storage. This picture shows that a Quorum LUN (10GB) and a CSV LUN (100GB) have been assigned to both blades in the chassis.
A restart of both blades allowed the domain change to take effect as well as the disks to be recognized by the two Hyper-V Server R2 instances (alternatively, a disk rescan would do this job).
Because of the fully redundant fabric architecture of the BladeCenter S, the two disks we have just configured (Quorum and CSV1) are seen twice by the hypervisor OS, since each blade has a dual path to the disks (this is, by the way, the big plus of this chassis with the integrated storage). Multipath I/O software needs to be installed on the Hyper-V hosts to manage the disks properly. This is done by first enabling Hyper-V-based MPIO support, which is not installed by default. The command "oclist" displays all features that have been enabled/disabled on the host, as you can see from the picture below:
On one of the two hosts, I manually enabled base Microsoft MPIO support (via the command "start /w ocsetup MultipathIo"), but this is not enough. I had to install storage-specific multipath software which interacts with the base Microsoft MPIO code. In IBM terms this is called IBM Subsystem Device Driver and can be downloaded off the external website. At the time of this writing, the package is located at this link and it's called the "SDDDSM Package for RSSM" (SDDDSM = Subsystem Device Driver Device Specific Module; RSSM = RAID SAS Switch Module). It's interesting to notice that the package in question has a typical Windows setup, so I was wondering how it could be installed on a GUI-less system. Well, launching the setup.exe did the job, as you can see in the following pictures.
First impression was that this was not really a GUI-less system, but rather a standard Windows system where explorer.exe was disabled. Well, never mind....
After the reboot the system was up and running again, and the hypervisor correctly reported only two disks being assigned to the blade (the 68GB disk is the local hard drive whereas the 100GB and the 10GB are the two LUNs I created with the Storage Configuration Manager utility).
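The effect of the multipath software on what the host reports can be pictured with a small Python model. This is only a conceptual sketch: the path labels ("switch-1", "switch-2") are illustrative placeholders, not what the host or the SDDDSM package actually displays, and the real grouping is done by the MPIO driver stack, not by anything resembling this code.

```python
from collections import defaultdict

# Each tuple is (LUN name, path used to reach it). With a dual-path fabric
# every shared LUN is reachable twice. Path labels are made up.
raw_paths = [
    ("Quorum", "switch-1"),
    ("Quorum", "switch-2"),
    ("CSV1", "switch-1"),
    ("CSV1", "switch-2"),
]


def group_paths_by_lun(paths):
    """Collapse per-path entries into one logical disk per LUN."""
    disks = defaultdict(list)
    for lun, path in paths:
        disks[lun].append(path)
    return dict(disks)


# Without multipath software the host lists one disk entry per path.
print(len(raw_paths), "disk entries without multipath software")

# With it, each LUN shows up once, with two paths behind it.
logical_disks = group_paths_by_lun(raw_paths)
print(len(logical_disks), "logical disks with multipath software:", logical_disks)
```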
On the second blade we found out right away that installing the IBM SDD software automatically enabled Windows base MPIO support (if it doesn't just use the command above to enable it).
At this point we enabled the Failover Clustering feature on both hosts via option #8 of the Hyper-V Configuration window. This enables the Microsoft Cluster Server code on the two hosts. The picture below shows what happens on the console when you enable this feature. The Cluster itself will be configured later.
This is pretty much it for the Hyper-V hosts setup. This concludes the configuration of the base support that needs to be done on the Hyper-V Server Configuration console. From now on we can do pretty much everything from the Microsoft remote tools.
Hyper-V Server R2 Nodes Configuration from a Remote Workstation
We can switch focus to a Windows 2008 R2 Server that we previously installed and configured to be a Domain Controller for our test bed. Remote administration of the Hyper-V hosts could either be accomplished from this host (after enabling some remote administrative tools that are disabled by default) or from a Vista / Windows 7 workstation using the latest RSAT tools available from the Microsoft web site. These tools include advanced Remote Administration MMC Snap-Ins that don't ship with the base client OS and allow you to perform enhanced tasks such as Live Migration. The latest release of these tools (in beta) can be downloaded here.
If you use the workstation it must be in the same domain you joined the HyperVR2 hosts to. If it is not in the same domain, extra configuration steps on the Hyper-V servers are required to relax cross-domain security restrictions. Since one of the purposes of this test was to demonstrate how you can remotely manage advanced hypervisor features using free tools, we have created an MMC configuration (that we called "MasterMMC") which includes the following Snap-Ins:
-
Remote Disk Management
-
Failover Cluster Manager
-
Hyper-V Manager.
I have used the Remote Disk Management tool to configure partitions and file systems on the two shared disks on both blades: I have assigned the Quorum LUN the Q: letter and the CSV1 LUN the X: letter on both nodes to prepare for cluster enablement. Initially I had a hard time getting to the Hyper-V nodes via this applet. I eventually managed to get to a stable state where I could manage the disks, but I have had many connection issues ("RPC Server unavailable") that I couldn't nail down to a particular problem. Firewall issues as well as bugs in the code (the pane didn't refresh properly, so I had to close and re-open the MasterMMC) might be potential causes.
The Hyper-V Manager Snap-In was more straightforward. The only thing I have done here is assign the second Gigabit adapter on the blade to a VirtualSwitch (called VMs in the screenshot below) that I have defined on both Hyper-V nodes. The first NIC (which I have configured with a static IP address at the beginning of the setup) remains assigned/dedicated to the parent partition.
This is the "network" (aka VirtualSwitch) to which you will connect the guests to get physical network access.
Notice that the BladeCenter S supports blades with up to 4 NICs configured. For this test only two NICs have been configured on each blade. Remember, Hyper-V currently does not allow NIC teaming at the hypervisor level (i.e. assigning more NICs to the same VirtualSwitch). Microsoft advises using third-party NIC software to create bonds of network adapters and assigning the resulting "bonded NIC" to the VirtualSwitch. It's not clear whether Hyper-V R2 is going to change this when they ship the gold code.
The next step is to configure the cluster across the nodes. This is not really Hyper-V Server specific, as the procedure is pretty similar to what you would do on a Windows 2008 Enterprise Server. It involves validating the hardware setup first with the built-in utility and then configuring the cluster properties (clustername, IP address etc).
Next, I enabled Cluster Shared Volumes. Those of you that are familiar with Hyper-V and Failover Clustering know that in order to manage a Guest as a single entity (i.e. independently "Quick Migrating" a VM from one host to another) the VM needs to be created on a dedicated shared LUN. This is, by the way, the configuration Microsoft usually advises. This has a number of implications in that you could easily run out of drive letters in the cluster (this can, however, be by-passed using specific mounting techniques), but more importantly it introduces a management overhead: you need to create a LUN for each VM you need to deploy, rather than leveraging a BIG shared LUN cluster-wise (like VMware VMFS allows you to do). That is what CSVs are all about: they provide a "cluster file system"-like environment where you can run a number of different guests on different hosts pointing to the same shared LUN. In fact, it's not by chance that I have assigned the blades a 10GB disk to be used as a dedicated Quorum, as well as a unique 100GB CSV1 LUN to be used concurrently as a shared repository to host multiple VMs. This is obviously a new and big benefit since the current Microsoft Cluster Server architecture is such that if a node owns and can access a LUN the other host in the cluster is inhibited from accessing it (at least until the group containing the LUN fails and the cluster changes its ownership).
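The management overhead of the one-LUN-per-VM model versus a single shared volume is easy to see with a bit of arithmetic, sketched here in Python. The helper name is made up, and the assumption that only A:, B: and C: are unavailable as drive letters is a simplification for the sake of the example (a real cluster would reserve more, such as the Q: used here for the Quorum).

```python
import string


def luns_needed(vm_count: int, shared_volume: bool) -> int:
    """Shared LUNs required to host vm_count clustered guests."""
    if vm_count < 0:
        raise ValueError("vm_count cannot be negative")
    if shared_volume:
        # One big CSV-style volume holds every guest.
        return 1 if vm_count > 0 else 0
    # One dedicated LUN per guest.
    return vm_count


# Assumption for this sketch: A:, B: and C: are taken, every other letter is free.
free_letters = [letter for letter in string.ascii_uppercase if letter not in "ABC"]

for vms in (10, 30):
    dedicated = luns_needed(vms, shared_volume=False)
    shared = luns_needed(vms, shared_volume=True)
    fits = dedicated <= len(free_letters)
    print(f"{vms} VMs: {dedicated} dedicated LUNs (fits in drive letters: {fits}), "
          f"or {shared} shared volume")
```

With dedicated LUNs the storage work grows with every guest you deploy; with a shared volume it stays constant, which is exactly the VMFS-like property the text describes.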
The picture below shows the disclaimer about CSV: they can only be used to host virtual machines in a Hyper-V R2 environment! This means they can't be used in a general purpose Windows Server 2008 Microsoft Failover Clustering scenario.
The cluster configuration wizard asks me which volumes I want to enable: CSV1 is the only remaining partition I have (the Q: drive has already been used for the Quorum):
Once the CSV has been enabled, on each cluster node a new directory structure appears. The default is "C:\ClusterStorage\Volume1"
This is a "virtual pointer" that refers to the CSV1 LUN and it shows up on each of the two blades and describes a sort of common/shared name space that both blades can access at the same time. This concept applies to virtual machines only and the usage of CSV cannot be extended to a general purpose cluster file system at the moment.
Now that we have a cluster set up and a CSV volume available, we are going to create a virtual machine. We point to the Hyper-V Manager Snap-In in the MasterMMC window and we configure the VM to be hosted on the CSVs explicitly choosing the common local name space that identifies the CSV on the Storage Area Network:
At first it seems odd to create a to-be-clustered virtual machine on a "C:" drive, but that's the way it works. Obviously the VM files won't be created on the local drive on the blade because, as I said, that path represents a location that is actually on the SAN. This is what our MasterMMC looks like in the end, once we have done all this:
So far we have only created the VM. It's not yet clustered, as there is no integration between the Hyper-V hosts and the Failover Clustering applet, using the free management tools. Microsoft Virtual Machine Manager is supposed to provide this integrated view and operations, but as I said at the beginning, the currently shipping VMM version doesn't manage Hyper-V R2 Beta hosts yet. Besides, it would be beyond the scope of this document anyway. So in order to clusterize the VM we have to explicitly and manually declare this VM as a clustered resource. The steps are similar to how you would configure any cluster resource; just make sure you select "Virtual Machine" as a resource type and then you are presented with a list of VMs that are running on the cluster hosts (i.e. both Hyper-V R2 Beta servers). Notice that the virtual machine needs to be powered off to be clusterized (otherwise the wizard will fail).
Once we have configured the resource we can bring the virtual machine on-line:
The resource (virtual machine) is now online and it's running on the second Hyper-V R2 node (HVR2NODO2) as you can see from the picture below:
At this point you can invoke from the Failover Clustering interface a "Live Migration" of the resource as you can see below:
And the virtual machine will start the live migration onto the other host:
During my test I have been able to successfully move the virtual machine from one node to the other with basically no downtime except for a ping or two:
Consider that the networking configuration might not be optimal, and we will have to see what Microsoft will suggest in terms of network subsystem setup in the context of Live Migrating a virtual machine. Having said this, losing one or two pings is usually something most web and client/server applications would be able to handle, and it's not too different from the experience you would have using alternative live migration technologies from other vendors such as VMware, Citrix, VirtualIron etc.
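If you want to quantify "a ping or two" rather than eyeball the console, a few lines of Python can count the lost echo requests in a saved ping transcript. This is a minimal sketch that assumes the usual English-language Windows ping output; the sample lines and the guest address 192.168.88.200 are made up for illustration and are not taken from my test.

```python
def count_lost_pings(lines):
    """Return (sent, lost) for a list of ping output lines.

    A line only counts as a success if it is a reply carrying a TTL, so
    that "Destination host unreachable" replies are not mistaken for
    successful answers from the guest.
    """
    sent = lost = 0
    for line in lines:
        line = line.strip()
        if line.startswith("Reply from") and "TTL=" in line:
            sent += 1
        elif line.startswith("Request timed out") or "unreachable" in line.lower():
            sent += 1
            lost += 1
    return sent, lost


# Illustrative transcript (not real data from the test).
sample = [
    "Reply from 192.168.88.200: bytes=32 time<1ms TTL=128",
    "Reply from 192.168.88.200: bytes=32 time<1ms TTL=128",
    "Request timed out.",
    "Reply from 192.168.88.200: bytes=32 time=2ms TTL=128",
]

sent, lost = count_lost_pings(sample)
print(f"{lost} of {sent} pings lost")
```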
The last test of this proof of concept is to create another virtual machine and demonstrate that the two can run simultaneously on the two Hyper-V R2 Beta hosts while residing on a common shared LUN through the CSV technology. These are the screenshots of the two virtual machines running on different hosts but residing on the same repository, which is the CSV1 volume mapped on both hosts as "C:\ClusterStorage\Volume1":
There is one pretty interesting thing in these, if you noticed. Despite the fact that both nodes can access the CSV at the very same time (otherwise they couldn't simultaneously run two virtual machines hosted on the same volume), the actual LUN is, at any point in time, "officially owned" by one of the two nodes (in this case the owner of the LUN is always HVR2NODO2). I must admit I have to dig more into the CSVs but they seem to be arbitrated and controlled by one node at a time. My assumption is the cluster node that is NOT the owner of the LUN would not use the owner of the LUN as a proxy to get there because this would hurt substantially the disk access performance (i.e. one node has direct access while the other node has a pass-through access through the owner of the LUN - not a viable scenario). Somehow the other node (i.e. HVR2NODO1) has direct access to the LUN performance-wise but it also must coordinate access rights with the official owner of the LUN itself (that is HVR2NODO2).
In a scenario like this it would be interesting to understand what happens when the node that is the owner of the CSV crashes.
To recap, this is the summary of my current setup:
Virtual machine | Running on | CSV owner
VirtualMachine ( ) | HVR2NODO2 | HVR2NODO2
VirtualMachine (2) | HVR2NODO1 | HVR2NODO2
In a cluster file system environment, if HVR2NODO2 fails, VirtualMachine(2) would continue to run on the other node (HVR2NODO1) without any interruption and VirtualMachine( ) would go off-line to restart on the same surviving node (HVR2NODO1).
So I turned off blade #2 in the chassis (which is HVR2NODO2) via the remote BladeCenter S Management Module (MM):
VirtualMachine(2) didn't experience any issue, either from a ping perspective or in terms of Failover Cluster Manager notifications. This would lead me to think that CSV ownership changes transparently, without any service interruption. This was somewhat expected, and the only point of concern was the ownership of the CSV (which apparently can be managed in a smart way). However, the other virtual machine experienced downtime. This was expected as well, since VirtualMachine( ) was running on HVR2NODO2, which was turned off "the hard way", so the failover algorithms had to kick in to bring it back on-line on the surviving node (HVR2NODO1) with a standard boot-up procedure.
Notice that the ping window first loses the link, then it starts to get a host destination unreachable message from the local IP address (192.168.88.133 is the host from which I am pinging). Eventually it starts to ping the guest again once it's brought back on-line.
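The behaviour observed in this test can be summarized as a toy model in Python. This is purely a sketch of what I saw from the outside (guests on the dead node restart on the survivor, guests on the survivor keep running, CSV ownership moves); the function and its data structures are invented for illustration and say nothing about how Failover Clustering actually arbitrates the volume internally.

```python
def fail_node(placement, csv_owner, failed, nodes):
    """Return (new_placement, new_csv_owner, restarted_vms) after a node dies.

    placement maps each VM name to the node it is running on.
    """
    survivors = [node for node in nodes if node != failed]
    if not survivors:
        raise RuntimeError("no surviving node left in the cluster")
    target = survivors[0]

    new_placement = {}
    restarted = []
    for vm, node in placement.items():
        if node == failed:
            # The guest went down with its host and boots again elsewhere.
            new_placement[vm] = target
            restarted.append(vm)
        else:
            # Guests on surviving nodes keep running untouched.
            new_placement[vm] = node

    # Ownership of the shared volume moves only if the owner is the dead node.
    new_owner = target if csv_owner == failed else csv_owner
    return new_placement, new_owner, restarted


nodes = ["HVR2NODO1", "HVR2NODO2"]
placement = {"VirtualMachine( )": "HVR2NODO2", "VirtualMachine(2)": "HVR2NODO1"}

new_placement, new_owner, restarted = fail_node(
    placement, csv_owner="HVR2NODO2", failed="HVR2NODO2", nodes=nodes
)
print("placement:", new_placement)
print("CSV owner:", new_owner)
print("restarted:", restarted)
```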
Preliminary Conclusions and Impressions
As I said at the beginning, I will write another piece on what I think the implications of these technologies will be in the market. From what I have seen so far, the Hyper-V R2 platform seems to be pretty stable (once I got past some weird issues with the Remote Disk Management stuff). Let's not forget that we will not see these technologies before year end 2009 or the beginning of 2010. This is the common speculation in the industry, anyway. While this will allow plenty of time for Microsoft to fix these problems, the fact that these are still one year away will give VMware some time to think about their main competitor... although I am sure all this is already on their radar in Palo Alto.
There are a number of aspects in the Microsoft technologies that I think are a long way from catching up with what VMware is doing. VMware had the advantage of starting to develop a true virtualization platform from a blank sheet. Microsoft, on the other hand, has a legacy of technologies, so virtualization for Microsoft seems more hammered-in than anything else. An example is the fact that when you create a Virtual Machine from the Hyper-V Manager, the default location is "C:\ProgramData\Microsoft\Windows\Hyper-V", which is not what I would define as a proper default location for hosting enterprise workloads (in fact, it looks more like a Microsoft Office document default location). This might sound simple, but it tells you a lot about the heritage Microsoft wants and needs to protect.
That's pretty much it for the negative part. As far as the positive aspects are concerned, everything you have seen here (except the BladeCenter S and the Windows guests!) is all software that is free of charge. And this is not a trivial aspect or something to overlook.
Massimo.
|
-
VMworld 2009 Europe is coming (last week of February). I was planning to go and I have just found out that they have also accepted one of the two topics I submitted for the break-out sessions. The title of the session that got selected is:
Virtual Infrastructures: Scale Up or Scale Out? Rack or Blade form factors?
This is the abstract as I entered it originally (I assume it will remain the same):
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
As virtualization is becoming mainstream many organizations are undergoing design efforts to properly deploy their new virtual infrastructures. These organizations usually want to do this within known best practices boundaries.
Two of the most common concerns in the design criteria surround the hardware footprint. Specifically two of the most frequently asked questions are:
1) Should I use many small servers or fewer bigger servers? 2) Should I use rack optimized servers or a blade form factor?
This session will briefly discuss the history of virtualization deployments in the context of the underlying hardware infrastructure and how it is morphing. Pros and cons of the Scale Up and Scale Out models will be discussed with real life examples and general recommendations for deploying many small boxes or few bigger high-end nodes. The session will also outline major differences and design considerations for deploying different form factors including rack servers, blade servers as well as non conventional x86 server footprints.
The objective for this session is to demonstrate that one solution doesn’t fit all needs and that each organization needs to assess its own requirements and pain points to determine the best hardware layout among the many. This session is supposed to empower these organizations with a list of design considerations in order to elaborate the server infrastructure layout that best meets their needs.
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
For those of you that are not patient I will give you the answer right away: It depends! (or IT depends?)
This is clearly not an ad for my session: I am not paid by the number of people that will sit down! By the way, as far as the salary is concerned, most of you know that I work for a hardware vendor (IBM). Despite that I am trying (well no... I guarantee!) to keep that session (fairly) technical and not a sales/marketing advertisement. The good thing about IBM is that we have hardware technologies in the x86 space that span pretty much the whole spectrum, so there is no (evident) conflict of interest in talking about one scenario Vs the other.
The ESX Scale Out Vs Scale Up dilemma is something that has always (professionally) fascinated me. In 2004 I was tired of hearing all the religious wars on the VMTN community forums about the advantages of one model Vs the other, so I decided to write a (hopefully balanced) Redpaper on the subject. The reviews were pretty favorable, as you can see (no, that was not my family voting - at least I don't think so), and most of the content and philosophies can still be applied these days.

You can still download this Redpaper at this link: http://www.redbooks.ibm.com/abstracts/redp3953.html?Open
Another "project" I have been lately working on with respect to this dilemma of scaling Up Vs Out is a table I have on my site whose title is Virtual Infrastructure: Platforms of Choice
The idea is that someone can look at the attributes and track down which hardware form factor delivers what she/he is looking for. It's still very much a work in progress, as you may notice. One of the many challenges of filling in a table like that (as well as of presenting a topic like this) is that the subject matter is, at least, two-dimensional. Scale Up Vs Scale Out is one dimension (i.e. big servers Vs small servers) and the "hardware form factor" is another dimension (i.e. racks Vs blades). There are rack optimized designs that scale out, other rack optimized designs that scale up, there are blades whose design is a natural fit for scaling out and there are also other blades whose design resembles a scale up solution (albeit with a number of limitations).
This discussion is not trivial. To add complexity to an already complex matter, other non-conventional form factors are emerging in the market, such as the IBM iDataPlex, which would be hard to define as a rack design (or even a blade design). At this point in time I am thinking about including some iDataPlex charts in the deck just to describe this new trend/architecture (as you can guess, my deck is well before draft stage - how would you define a PowerPoint document with one blank page?).
All in all, if you have comments or feedback on what you would like to see in a session like this, feel free to send me an e-mail: massimo@it20.info.
Looking forward to Cannes and if you come by, please stop and say hi.
Massimo.
|
-
In this post I am going to talk about a specific piece of hardware technology that is intercepting a specific virtualization industry trend. This piece of technology is called BladeCenter S. Those of you that have been reading my blog know I don't usually talk about IBM specific stuff (I work for IBM) but this time I felt that breaking my own rule was worth it. Believe it or not, I would have posted this anyway.
Before we get into the specifics of the technology, let me take a step back and briefly touch on the industry trend I was referring to. This is going to be basic stuff for most of the virtualization experts out there; plus, these concepts are not new and I have written/talked about them in the past. Having said this, sometimes it's good to pause for a second and try to summarize what is happening in this industry. Up until the late nineties (almost) every data center looked something like this:
Very inflexible and vertical silos. Each silo was comprised of the following building blocks:
Do you have 100 application services? Deploy 100 of these independent silos! Have you ever heard virtualization (true or appointed) experts talking about how bad life was those days? Look at the picture... and you can imagine how life was. I can tell you: it was very bad (compared to what we have today obviously, at that time it was... OK).
At the beginning of the 21st century we started to see the very first form of "visible" virtualization of an x86 IT infrastructure. I am using the word "visible" because someone might argue that the concept of virtualization was already included in the OS in the form of memory virtualization (physical memory Vs virtual memory etc.). I am not interested in these academic discussions and I am not interested in determining where virtualization first appeared in the x86 ecosystem (we could stay here for days without getting to any useful outcome). I want to focus more on tangible things that end-users/human beings (not IT geeks) understand and can appreciate. Having defined the context, the first form of "visible" virtualization of an x86 IT infrastructure was the storage, and particularly the consolidation of all Direct Attached Storage into a single pool of storage resources called SAN (Storage Area Network). And since my mantra is that a picture is worth 1000 words, here is how a common x86 IT infrastructure looked at the beginning of this century:
Note: If you ask 100 storage specialists nowadays what storage virtualization is, you might very well get 100 responses (perhaps more?) ranging from "Raid 0 is the basic form of storage virtualization" all the way to "a storage grid (whatever that is) is the only form of storage virtualization". I am using the word virtualization here in the context of storage to describe the high level practice of decoupling the disk subsystem from the servers and locating it in a common resource pool.
Back to basics, this is what customers have been doing for the last 10 years or so: getting rid of this locally attached / inefficient / inflexible disk subsystem and moving (almost) all the disk spindles into a central repository, the so-called Storage Server (the physical data repository attached to the SAN). The very first advantage that this has brought to customers is a more efficient and flexible way to use the storage space; someone might refer to this as Storage Consolidation. On the other hand, shared consolidated storage brought in (as a bonus I would say) a brand new architecture that allowed customers to do things that were simply not possible before. One example for all is High Availability clusters: in the good old days of DAS (and the inflexible silos described at the beginning) your application data would most likely be held physically on the same server that was running the application. Should that server fail, you could no longer access your data (unless you restored it from a backup); with SAN shared storage this changed, as you can now "attach on the fly" the same set of data to another server and restart the application from there while remaining consistent in terms of data persistency. Microsoft Cluster Server, anyone?
Well, time goes by and right now storage virtualization is no longer the hot topic (I guess everyone recognizes it as more of a prerequisite to run an efficient IT). The buzzword today is server virtualization and, if you think about it, it's the natural progression of what we have seen happening in the past: it's about taking the silo apart and moving additional stuff below the virtualization bar. We have done that with storage, who's next? Did I ever say a picture is worth 1000 words?
This is basically where we are today. VMware pioneered this concept some 10 years ago and there is now a string of companies that have realized the benefits of this and are working hard to deliver products to implement this idea. I started working on server virtualization some 8 years ago and at that time it was all about server consolidation (i.e. how many servers do you have? 100? we can bring them down to 5 etc.). The more I worked on it, the more I understood that we were only scratching the surface of its potential. Today server consolidation is still a huge advantage for those customers virtualizing, but it's clearly only one of the many advantage line items. As it was for storage virtualization, we started with the consolidation concept only to find out that there were many other hidden and indirect advantages as a bonus of doing that. One example for all is that, as you virtualize your Windows or Linux systems, it becomes far easier to create a Disaster Recovery plan for your x86 IT infrastructure.
Last but not least the server virtualization trend is intimately associated to the storage virtualization (i.e. SAN) trend for two key reasons:
-
the standard server virtualization best practices require shared storage to exploit all the benefits
-
server virtualization is allowing customers to get rid (completely) of locally attached storage. While data has historically been moved to a shared repository (SAN), the standard "2 x Raid1 drives pair" remained a (negative) legacy of the x86 deployments. The latest trends (embedded hypervisors on flash disks and/or PXE boot techniques for the hypervisors) will help get rid of all the local server spindles for good!
So why am I so excited about the BladeCenter S you might wonder? Well the BladeCenter S maps exactly the industry trend I have described above. Instead of going out for shopping and cabling together all these elements (servers, SANs, etc) BladeCenter S is a single package that contains them all: servers, storage and network! Enterprise Virtualization In-a-box! Or a data-center-in-a-box if you will!
What you see here is basically the physical view/package of the de-facto-standard hardware architecture to support virtual environments. The key point I am trying to outline here is that the disks you see integrated into the chassis are really connected to a true fully redundant internal SAN comprised of 2 x SAS redundant RAIDed switches. It essentially maps the standard servers to storage architecture blue-prints we have been using in the last few years to implement shared storage virtualized deployments. The following picture, for example, is an extract from the standard VMware SAN configuration guide and it illustrates this standard blue-print (which is mapped into the BladeCenter S internal architecture):
Notice that the only slight difference is that the SAS switches integrated into the BladeCenter S deliver both switch as well as SP functionalities.
It might help to share with you some more documentation I have been working on and that we presented at the local VMware Virtualization Forum that took place in Milan a few days ago. The following picture describes the internal architecture of the BladeCenter S in further detail:
Notice how the servers-storage connections are similar in concept to those in the standard VMware blueprint (but not limited to VMware deployments though) attached above. Each blade is equipped with a dual-port SAS HBA which in turn connects to 2 x SAS RAIDed switches which control the disks. For those of you familiar with the IBM storage products family this is very similar to what happens when you connect ESX servers to an external DS3200 SAS Storage Server configured with dual controllers. Since in the last few months I have been talking to customers and partners that were pretty confused about what this really is and how it compares to other implementations available in the industry I did want to outline what other blade vendors are doing to underline the differences:
While from a physical standpoint it might look pretty similar (i.e. "a chassis with a bunch of blades and a bunch of disks"), if you dig into the internals it's of course completely different. The other option outlined in the picture above involves dedicating a single blade (hence a Single Point Of Failure) with Windows Server 2003 Storage Server and a bunch of disks attached to it. The Windows instance running on the Storage Blade controls the disks and exposes them onto the internal Ethernet network via NFS/iSCSI protocols. This is how other blades in the chassis can "share" those disks. There are, obviously, fundamental differences between having a multi-purpose Windows blade sharing disks over the network and using a standard, fully redundant SAN approach comprised of a dedicated pair of purpose-designed SAS RAID switches that control the disks and map those disks to compute nodes (i.e. the blades dedicated to the virtual infrastructure). The following picture recalls the physical layout of the BladeCenter S with the integrated SAN.
On the left hand side you can see the front of the chassis where the disks (we had 4 of them in our demo on-site) and the blades (2 x HS21XM in our setup) are installed. On the right hand side the rear view of the BC S chassis shows the 2 x Ethernet switches (that can support up to 4 Ethernet connections from each of the blades) and 2 x SAS RAIDed switches (that control the disks on the front of the chassis and are connected to the blades by means of the SAS daughter cards).
Another interesting point I wanted to outline via this setup is that the BladeCenter S is really meant to be a self-contained data center. This doesn't only include the standard User Workloads (i.e. the guests that are going to support the customer's own environment such as Active Directory, Databases, Web Servers, Application Servers etc) but it also includes all the additional services that are required to configure, monitor and maintain the data center (in a box). Examples of these System Services include the vCenter service (red rectangle in the figure above), which can be installed on top of the virtual infrastructure, as well as what I refer to as the HW Management service, which is the suite of software products used to manage the hardware and its configuration (the yellow rectangle in the figure above - it might include things like IBM Systems Director, IBM Storage Configuration Manager etc). The logical view shows these two services (vCenter and HW Management) as external entities that map respectively to the ESX hosts comprising the virtual infrastructure and to the Management Module (MM for short) that is the heart of the BladeCenter chassis. There is no reason, though, why these services need to be installed physically outside of the BladeCenter "domain". A forward-looking take on these services is to consider them a sort of System Partition running side by side with the end-user workloads. These System Services, as of today, need to be installed manually but ideally in the future they could be distributed as Virtual Appliances (yes Virtual Appliances is my obsession, sorry) for a more streamlined and faster deployment.
In the next few screenshots I'd like to give you a high-level feeling of what happens when you connect to the HW Management service to configure the hardware components (the shared storage in this case). For this setup I have only installed the IBM Storage Configuration Manager in that HW Management System Partition.
First you connect, via web, to the SCM service. One of the main screens summarizes the actual internal hardware storage configuration, which is a RAID subsystem comprised of 2 x SAS switches:
Next is the physical view of the chassis. As you can see we have 4 x physical disks plugged into the front of the chassis and 2 x physical SAS switches in the back of the chassis (the two additional devices you notice in the front are the SAS controller caches). A maximum of 12 physical disks can be installed:
The following view details the characteristics of the physical hard disks:
Next we create a Storage Pool (aka Array) comprised of these 4 physical drives. This is a very basic configuration where we designate one of the disks as a global hot spare and three of the disks as a RAID 5 Storage Pool. Total available capacity is 2 disks' worth (the equivalent of 1 disk is used for parity in a RAID 5 array). Notice that the space available is basically 0 because I have already created LUNs out of this array (see next):
These are the two Logical Units (aka LUNs) that I have created using the Storage Pool described above. One is 90GB and the other one is 43GB in capacity:
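For those who prefer numbers to screenshots, here is a minimal Python sketch of the capacity arithmetic behind this Storage Pool. The function name is mine (it is not part of any IBM tool) and the sketch only restates the figures quoted above: 4 disks, 1 global hot spare, a 3-disk RAID 5 array and two LUNs of 90GB and 43GB.

```python
def raid5_usable_disks(total_disks: int, hot_spares: int) -> int:
    """Return how many disks' worth of capacity a single RAID 5 array offers.

    Hypothetical helper for illustration only: hot spares sit outside the
    array, and RAID 5 spends the equivalent of one disk on parity.
    """
    array_disks = total_disks - hot_spares
    if array_disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks in the array")
    return array_disks - 1


# Layout described in the post: 4 physical disks, 1 global hot spare.
usable = raid5_usable_disks(total_disks=4, hot_spares=1)

# LUN sizes described in the post, in GB.
luns_gb = [90, 43]
allocated_gb = sum(luns_gb)

print(f"usable capacity: {usable} disks' worth")  # usable capacity: 2 disks' worth
print(f"allocated to LUNs: {allocated_gb} GB")    # allocated to LUNs: 133 GB
```

The 133GB allocated to the two LUNs is why the pool shows basically no free space left.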
The following view lists the discovered SAS daughter cards (hence the corresponding blades) on the SAS fabric. Notice that each blade has two ports for redundancy and each port has its own SAS WWN. This is not any different from a standard FC configuration for those of you used to Storage Area Networks:
This is how I have mapped Servers to LUNs. On the left hand side I have listed both blades whereas on the right hand side I have listed both LUNs I have created. Doing so I allowed both blades to share both LUNs. There is no particular reason for which I have created 2 LUNs. I could have created 1 or 3 or 4 if I wanted/needed to and I would have been able to share them with both blades:
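The mapping above can be pictured as a simple host-to-LUN table. The following Python sketch is only an illustration: the host names esx1/esx2 are the ones used in this setup, while the LUN labels are placeholders I made up (the actual names are whatever you typed into the Storage Configuration Manager). It checks which LUNs are visible to every blade, which is the property the VMware cluster features rely on.

```python
# Host-to-LUN mapping as configured in this walkthrough.
# "lun_90gb" and "lun_43gb" are hypothetical labels, not the real LUN names.
lun_mapping = {
    "esx1": {"lun_90gb", "lun_43gb"},
    "esx2": {"lun_90gb", "lun_43gb"},
}


def shared_luns(mapping: dict[str, set[str]]) -> set[str]:
    """Return the LUNs that every host in the mapping can access."""
    lun_sets = list(mapping.values())
    if not lun_sets:
        return set()
    return set.intersection(*lun_sets)


print(sorted(shared_luns(lun_mapping)))  # ['lun_43gb', 'lun_90gb']
```

Had I mapped one of the LUNs to a single blade only, it would drop out of the result and any guest stored on it could not move to the other host.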
So far we have been working against the HW Management service to configure the hardware (this example is limited to configuring the shared storage). Now we can switch gears and connect to the other System Partition to manage the virtual infrastructure software. In this case we will connect to the vCenter service to configure our VMware infrastructure. Notice that, although I have been using a beta version of the next VMware virtual infrastructure product, everything you will see here can be done with the latest VI3 version available today.
The following screenshot outlines the overall configuration of our data-center-in-a-box. As you can see there are 2 blades equipped with ESX and they belong to a cluster. On these blades we have created the two management partitions we have been discussing (vCenter and HW Management). There are also some Guest templates I have created. One important thing to notice from this screenshot is that the first blade can access both shared SAS LUNs (for the record it can also access its own dedicated/local Storage1 VMFS volume):
The next picture confirms that both blades can access the shared LUNs created. This allows all VMware advanced features such as VMotion, DRS, HA etc:
Here we will attempt a VMotion of the HW Management partition running on esx1 onto the other host in the cluster:
The Guest is being moved from one host onto the other. Notice the status bar at the bottom:
And here the Guest has moved and it's now running on esx2 as you can see from the Summary pane (and the status bar at the bottom):
I truly believe that the BladeCenter S is a piece of technology that is sometimes undervalued. There is an enormous potential in it that many people haven't fully exploited. It's really what I would describe as a no-compromise Enterprise "pocket" data center. Not so much "pocket" after all: if you consider that an HS21XM blade could support, on average, some 15/20 VMs (depending on the workload), we are talking about a 7U Enterprise solution that could support around 100 VMs. Far more than what an average SMB shop might require.
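The "around 100 VMs" figure is back-of-the-envelope math, and a rough Python sketch makes the assumptions explicit. It assumes a chassis fully populated with six blades (the setup described in this post only had two installed) and takes the 15/20 VMs per blade range at face value. The reserved_blades parameter is my own addition to show what happens if you keep one blade free for failover; it is not a sizing rule from IBM or VMware.

```python
def vm_capacity(blades: int, vms_per_blade: tuple[int, int],
                reserved_blades: int = 0) -> tuple[int, int]:
    """Return the (low, high) VM count for a chassis.

    Hypothetical helper: multiplies the blades left after any reservation
    by the low and high ends of the per-blade VM range.
    """
    active = blades - reserved_blades
    if active < 0:
        raise ValueError("cannot reserve more blades than are installed")
    low, high = vms_per_blade
    return active * low, active * high


# Assumption: six blades installed, 15 to 20 VMs per blade.
full = vm_capacity(6, (15, 20))
with_spare = vm_capacity(6, (15, 20), reserved_blades=1)

print(full)        # (90, 120)
print(with_spare)  # (75, 100)
```

Either way you land in the neighborhood of 100 VMs in 7U.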
Massimo.
|
-
Early in 2007 I wrote a post whose title was "Will Microsoft Sunset VMware?". You can read it here. The closing of that post was:
> This analysis is as of April 2007. I am sure many things can and will change and I might be proven wrong. Let's see what happens.
I went through it this morning and I have to say that (so far) I have gotten it right. I could even republish it "as is" and it would still hold true even 18 months later (except Microsoft did change the name of their hypervisor!): Xen didn't really take over the world (and the KVM speculations I made are materializing now with RedHat and SUSE switching to KVM and abandoning Xen) and also all the thoughts about innovation, add-on value, cost and so forth do still make some sort of sense as of (end of) October 2008.
The reason I bring this topic to the foreground again on my blog is that more than ever I read comments on the blogosphere about how VMware is going to be eclipsed by Microsoft, given that the Redmond giant is engaging seriously. I am not ruling out this possibility, as no one knows what will happen in the future (one can only speculate given past and present experiences), but I wanted to stress that these people don't get (in my opinion) what's really going on here. And what's going on ... is a very big thing.
Let me try to be concise (something that I have never really mastered). Overall at VMware I think they are working out their plan at two different levels which I refer to as the tactical level and the strategic level.
At the tactical level, VMware is engaged in providing the best hypervisor and the best management tools to create a virtual infrastructure. At this level, they position VMware ESX as the best hypervisor Vs Microsoft Hyper-V, VMware VI3 (along with all the other tools they have announced in the last year or so) as the best management tools Vs the Microsoft System Center suite (which includes Virtual Machine Manager), and so on. All of this is aimed at supporting legacy Linux and Windows workloads in the best possible way.
After all, if you think about how today's virtual infrastructure (built on various software platforms such as VMware, Microsoft, Citrix or VirtualIron) is used, I think it's fair to say that your virtual machine can be defined as super flexible and powerful (virtual) hardware, but the software stack you run within the VM (i.e. the black box) is hardly different from the software stack you would be running on a physical box. So given a legacy Linux or Windows stack spread across many dozens, hundreds or even thousands of physical servers, what is the best target virtualization platform to make a giant P2V, so to speak? This is the tactical battle VMware is engaged in to stay ahead of Microsoft.
I agree that if you only look at things from this level, VMware could be in a dangerous position when it's all about "just" writing code to catch your competitor's feature set. We know MS is pretty good at that plus they have deep pockets they can throw at tons of developers to shrink the gap. Well, it's clearly not that easy and I am obviously exaggerating but you have got the idea: if it's just about "a tool" there is always a possibility that your competitors will catch you if they become serious about that. I think this is why many people think that VMware could become the next Netscape.
The strategic level at which VMware is engaged... actually I touched on this 18 months ago and that very same thought remains very much true, and it's materializing with the latest VMware messages. In that blog post (April 2007) I wrote:
> Changing the rules: perhaps one of the most important thing which is leading me to think that VMware will not be sunset is the fact that they (VMware) are thinking about "changing the rules" in the datacenter and of IT in general rather than viewing virtualization as a means to reduce the number of servers from 20 to 1. While the use of virtualization has originally being considered for Server Consolidation projects clearly this is now one of the many facets of the advantages that a virtualized Datacenter and a virtualized IT will gain (Disaster Recovery is certainly one example of these new scenarios). Another example of these new use cases for virtualization are Virtual Desktops hosted in the Datacenter that are changing the way Administrators are thinking about their distributed IT. The next frontier would be Virtual Appliances which is a very different way to develop and deploy applications compared to what we are doing today. In such a scenario the role of the Operating System would change drastically where some of the OS features would be drained into the virtual infrastructure while some others will be distributed as part of the application in a consolidated virtual machine file (that is the virtual appliance). This is a fascinating scenario and as you can imagine it involves more than just developing a hypervisor with a management interface to it: it involves creating a new culture on how we deal with IT, taking all the pieces apart and rebuild our datacenters in a much more efficient way.
I wouldn't know how to say it better in October 2008. Perhaps the only thing I can do is add a couple of pictures that would graphically outline this concept:
The silo on the left outlines what I think to be the Microsoft systems virtualization strategy. Systems being here a key word: MS does have a more articulated virtualization strategy that goes beyond virtualizing a piece of server hardware (so do VMware and Citrix, for the record). However this discussion is really centered on systems virtualization and the corresponding stack. Back to the point... at Microsoft they can't afford to compromise a very successful (and healthy) business such as Windows OS, so Windows does need to remain very central in their systems virtualization strategy. Windows is the means by which they deliver their value and Windows will be their strategic play. It's not by chance that they pitch Hyper-V as a Windows 2008 value item, for example. It's not by chance that they pitch Microsoft System Center as a toolset to properly manage both virtual and physical Windows deployments. It's not by chance that all of their products are Windows-based (except perhaps Office for Mac and a few others which would be fair to describe as "not the bulk of their business" anyway). We can go on and on but in the end we will always be gravitating around one central and critical word: Windows.
The silo on the right, on the other hand, outlines what I think to be the ultimate VMware strategy. They basically want the virtualization layer to become the Datacenter OS. I speculated about this at VMworld 2007 and they announced this at VMworld 2008 (read this irreverent post if you have time). VMware would like to challenge the current notion of the OS: they would like to take apart the OS we know and redistribute part of its features into their new Virtual Datacenter OS concept and part of its features into this new Just Enough OS (JEOS) concept. JEOS wraps the application and only provides minimal assistance to it (to the point it only needs to provide boot capabilities and a proper minimal run-time environment).
As you can gather from the pictures, it would be very difficult to map what Microsoft and VMware are trying to drive strategically and come up with an apples-to-apples comparison. This is the strategic challenge in which VMware is engaged. And the interesting thing is that they are not engaged against Microsoft; they are engaged against a whole industry that is used to looking at the x86 stack in a "slightly" different (and much less aggressive) way than VMware is, in my opinion, envisioning. As a matter of fact, we are still trying to get users to digest "virtualization" to support standard legacy software stacks (and it's not always easy). I am sure you can imagine what it will take for the industry as a whole to digest this new software stack layout. This is in fact, not by chance, one of the strongest value propositions Microsoft is promoting: all the benefits of virtualization without disruption and discontinuity from the past.
The final analysis: this is where the real battleground is for the next few years to come. If the industry embraces the VMware message and strategy and starts to redefine the software boundaries in the data center, then VMware will have the lead. If the industry does not embrace the VMware message and settles for the advantages of running a legacy software stack in a slim software bubble (VM) as opposed to running the same software stack on top of a dedicated physical box... then MS can cause much trouble for the VMware business, and VMware will be forced to continue the tactical battle I talked about at the beginning.
My speculation is that virtual appliances will have a huge role in this. Virtual appliances, by definition, implement the ultimate VMware vision. The success (or lack thereof) of virtual appliances will determine VMware's future as a winner or as a loser in the data center. VMware could well be the next Netscape but, what if it is the next Microsoft? Interesting dilemma. I don't know who is going to win and who is going to lose in the end, but I am certain Microsoft will not sunset VMware nor will VMware sunset Microsoft. The x86 market is healthy enough that, while the winners can really make tons of money, the losers will have their slice of the pie, too, for some time to come.
Massimo.
|
-
I have been working in IT for about 17 years now, 14 of them at IBM. From the first day I was exposed to the concept of a centralized IT where everything is fully controlled, fully secured, fully automated and easy to manage within the data center boundaries; on the other hand, whatever sits outside of the server room should be dumb and shouldn't impose any (major) maintenance tax on the IT organization. For those that have been around for a while this exactly describes how a mainframe operates (more or less).
"Unfortunately" (you can speculate on the quotation marks if you want) I have built my career at IBM on something that sits exactly at the other end of the spectrum compared to the mainframe: the x86-based server business (was PC Servers, was Netfinity, was xSeries, is now System x / BladeCenter). That's why I have enjoyed, in the last few years, looking at mainframes as the holy grail (or the pole star) towards which I'd like to push my "little" x86 servers.
So why is distributed IT broken? Simply because I think businesses have sold their soul to the devil as they compromised things like control, security, automation and low costs of operations for the nirvana of flexibility and low acquisition costs that came with x86 servers (and PCs). And since this is a client-server model, it has affected both the x86-based server portion of the data center and the (even more distributed) client environment. Client-Server here doesn't strictly pertain to the architecture of the applications; rather, it pertains to the devices one will end up managing no matter what the application architecture is: the application of choice might be Web-based but at the end of the day the IT organization will most likely be running the web server on an x86 Windows or Linux box and the end-user browser will run on a fully featured PC/Laptop running a Windows client OS. It's going to be a Client/Server world anyway, no matter the application architecture.
In this brief post I just want to show a couple of proof points of this broken IT model. The first one is a screenshot of a "server" I found during a local customer visit. Ready? Fasten your seat-belt please:
Now, this is not a guess, this is for sure (I did ask) a Microsoft Software Update Services (SUS) "Server". While the first sticker (on the green bezel) says "Test...", the other one features a "NON SPEGNERE", which means "DO NOT POWER OFF", so those of you who are thinking this was a sort of quick and dirty trial on the desk... should think twice about it. A couple of additional things you might want to notice: first, this "server" was physically located on an office desk, which means that the x86-based portion of that data center has basically left the actual physical data center rooms and has ramifications outside of them (very scary). Second, by no means is this a small SMB shop (I have seen production MAIL servers at those accounts that were even worse than this); no, this is a big enterprise customer with many thousands of (actual) servers. If such big organizations are doing things like these, what's going on in "our" server rooms (and outside of them!) is pretty scary, to say the least.
So much for the server side of things. How about the clients (desktops/laptops)? Do you remember those zero-maintenance 3270/5250 terminals we all used to access our AS/400 and mainframe programs? Well, I took this other picture a few days ago and, while it's not as scary as the one above, it tells a lot about where we have got to with desktop/laptop management:

It literally says:
--------------------------------------------------------------------------------------------------------------------------
Distribution point for 1GB additional memory (RAM) to install Lotus Notes 8.0.1
The laptop needs to be Powered Off! Not Hibernated!!!
--------------------------------------------------------------------------------------------------------------------------
The scary thing about this is that the organization going through this massive process has roughly 9,000 employees. If you compare this (little example) to the way a central processing unit with dumb terminals used to work, you start getting a feeling for how broken things are in the x86 (client-server) space.
Now I am 100% sure we won't go back to those days (nor am I suggesting that we try to do that), also because no one would want to give up the GUI experience for a green character interface (how the h%&l can I watch YouTube on a 3270 terminal?), but clearly something needs to be done. The good news is that there are technologies that will allow IT organizations to do this and get to the point where they do not need to trade off control, security and other important data center aspects to get the flexibility and experience end-users demand (and expect) in the 21st century.
Imagine... a world where your SUS "Server" will just be a service running in your server room (or someone else's server room out in the cloud) that doesn't require a "dedicated server" in your data center (and not even a dedicated desktop in the office - can you believe it?) and where your e-mail client update won't pre-req anyone to go to the office (and waste half a day) to get an additional 1GB of memory....
You may say I am a dreamer, but I am not the only one (where did I hear this?).
Massimo.
|
-
I think he did but, relax Paul, I am not going to sue you... ;-)
Joking aside, I was sitting at the VMworld 2008 Keynote in Las Vegas on Monday last week and I was somewhat surprised (perhaps even pleased) to see Paul touching on many innovative concepts I talked about last year at VMworld 2007 in my breakout session. Those of you that are entitled to download the official VMworld presentations can find it by searching on the portal for session number S288511 (Virtual Appliances and the New Datacenter: Changing the Rules); those of you that do not have an account on the VMworld portal can get a similar (superset) version here: http://it20.info/files/3/documentation/entry54.aspx
There are many concepts in common between the two "visions" and, if you have time to look at my pitch, those of you that have been at VMworld 2008 or that have been reading about the announcements will recognize the affinities. One example above all is the concept of the Datacenter OS that I discussed last year (VMware Virtual Datacenter OS anyone?). BTW this was somewhat funny as my session was due on Wednesday at VMworld and I was sitting in the VMworld 2007 Keynote when Diane Greene explicitly said "oh, and by the way, I want to make sure you all understand that we are not trying to create an OS here!". My reaction? For the first 5 seconds I was thinking about how to cancel my session, as it was clear to me that I would be giving contradictory messages, but then I thought "Mh, I don't think so; I think what you are building Diane is exactly what an OS is and what an OS is supposed to do and I am going to say that!". Well, I guess after all it was a good decision.
Talking about affinities between my pitch and the new VMware vision how about these examples? This is the VMware vision presented at VMworld 2008 which calls out the "Virtual Datacenter OS" concept:
Do the following slides sound familiar in the context of the vision? Notice the logo on the right hand side... it says VMworld 2007


Well I have to admit I had to use the legacy Virtual Appliance term where they have used the term vApp to identify the black-box but hey!... I have been working out the vision, didn't have too much time to work out the marketing details :-)
On another topic, the vCloud component of the vision was (very) intriguing too. I can't really find a good architectural picture of what they have been talking about but in essence the overview is as follows (taken from the VMware web site a couple of minutes ago):
Mh, do you mean something like this Paul?
I have never used the term Cloud since, while it's been around for some time, it was not a concept widely used by standard IT human beings (and quite frankly I have been exposed to it only recently too).
In conclusion of this post, what should I say? Maybe something like this?
For some reason, in 2007 I called out in red what has become the most important and appealing concept of VMworld 2008 (ok ok... among many others).
The “Virtual Infrastructure” (be it VMware VI3, Xen, Windows Virtualization) to potentially become “the Datacenter OS”
I think I heard the concept many times last week.
You will notice, by the way, that this presentation was really product agnostic. It could theoretically be applied to any virtualization software stack out there (not only to VMware - as I specified in the original closing statement). The problem is that to get there you have to believe in it and you have to be able to commit to the plan. And VMware seems to be well positioned for both. What I am trying to say here is that this is an open race but it is the only way forward (in my opinion) no matter who implements it. It doesn't take a visionary to understand that the current model is pretty broken. Even a geek like me understands that ... and by the way this is the reason I am not suing VMware: how can I sue them for coming out with the obvious next step?
Massimo.
P.S. Paul, if you ever read this article, understand I am giving you/VMware a hard time just for fun. Seriously, I really believe you rock and I look forward to seeing who is going to win this race.... don't rest on your laurels though (oh and if you read this please let me know as I want to call out in my CV that "Paul Maritz reads my blog" - thanks).
|
-
So finally it happened. Hypervisors are (essentially) free. I remember the very first engagement I had with VMware technologies some 8 years ago; that was the ESX 1.1 (beta) time frame: we did a Proof of Concept and closed the deal with a very satisfied customer... While they were very happy about the achievements, they always took the opportunity to remind me how expensive VMware (i.e. ESX 1.1) was. Well, time goes by I guess, and what used to be a large chunk of the project expenditure is now a piece of (business) commodity. "Business commodity" meaning that hypervisor vendors are no longer going to make money out of it, which is different from being a "technology commodity". Well I guess I will save the "control point concept" for another post: it's a long discussion - interesting though.
Back on track.
I am currently doing some research on virtualization vendors positioning in the x86 space and on July 24th Mike DiPetrillo posted a very interesting thought about the implication of making ESXi (yeah Mike, it's ESXi, no longer ESX 3.5i) free of charge. I suggest you read it carefully, along with all the interesting comments, on line here. It's a long thread but if you are among the 99.99% of x86 customers in the SMB space wondering "should I use VMware or Microsoft technologies to virtualize my datacenter?" I strongly suggest that you go through it. Mike did a great job (well other than getting the official brand name of his flagship product wrong... ;-) sorry Mike, I had to say that) in setting the stage.
I am not going to repeat what's in the post (as I assume you have read it at this point) but this is what he came up with (in blue pen) in order to virtualize some 30 Windows servers on as few as 3 physical hosts.
Need: Microsoft -- VMware
Basic Consolidation: $3,000 -- FREE
Centralized Management: $3,500 -- $2,995
Basic Advanced Features (Backup and Patching): $7,260 -- $2,995
There have been a number of comments regarding the fact that Mike used additional fee-based products (for patching, for example) whereas many customers would be fine with free tools such as WSUS (Windows Server Update Services). Many might also argue that you can't buy VMware products without software subscription and support (which I think Mike didn't take into account) whereas you can buy MS products without those. I am not interested in these micro-level details since, at the end of the day, getting to an apples-to-apples comparison is going to be impossible given the fact that both software vendors have offerings that can hardly intersect with each other: it would be like trying to find the face of a sphere... the sphere has many faces... actually an unlimited number.
I am probably one of the most agnostic persons around when it comes to the virtualization software to be used, as I don't have any vested interest in any of the parties involved. As long as customers and end-users can get "the most for the cheapest" I am the happiest person on this earth; that's why I so much welcome VMware and Microsoft engaging at this level to provide more value at a more reasonable cost.
Having said this, there is something in this analysis that I want to challenge (and not necessarily to put either one vendor or the other in a bad light - I want to do that based on my experience and for the sake of customers). Specifically I want to challenge the assumption that High-Availability should be taken out of the picture. I have always advocated that one of the primary reasons for which customers virtualize is for easy HA and DR. I have written about this feeling many times; here and here for example. So I wanted to re-run Mike's numbers taking into account Windows Server 2008 Enterprise Edition (which includes MS Cluster Server) and VMware VI3 Standard Edition (which includes VMware HA). I am not a master in pricing but I understand from Mike's post that Win 2008 EE is $4,000 so that's the number I am going to work with. On the VMware front things start getting a bit cumbersome. The best option (I think - other suggestions?) would be to use the "Standard" Accelerator Kit which is the counterpart of the "Foundation" kit Mike used. The "Standard" Accelerator Kit comes at $5,995 but it includes VMware Virtual Center Foundation and "only" 4 sockets of VI3 Standard licenses (the "Foundation" Accelerator Kit has 6 sockets of licenses). So I have to add another $2,995 for the additional 2-socket VI3 Standard license a-la-carte (yes apparently, by chance, the 2-socket "VI3 Standard" license comes at the same price as the "VI3 Foundation" Accelerator Kit).
So the new table, which includes High Availability functionalities, would look like this:
Need: Microsoft -- VMware
Basic Consolidation: $12,000 -- FREE -> Win 2008EE x 3 -- ESX 3i
Centralized Management (includes high availability): $12,500 -- $8,990 -> Win 2008EE x 3 + Systems Center VMM Workgroup -- VI3 Std Accelerator Kit + VI3 Std 2-sockets
Basic Advanced Features (includes high availability) (Backup and Patching): $16,260 -- $8,990 -> Win 2008EE x 3 + Systems Center Suite Enterprise -- VI3 Std Accelerator Kit + VI3 Std 2-sockets
So even if you add the mandatory subscription and support costs to the VMware column they continue to lead (substantially?) in terms of price / features. I want to underline again that someone might argue that some of the fee-based MS features are not strictly needed as the same result can be achieved paying less compared to what's in the table above. I'll leave it to you to work out the micro-details and perhaps you might find out that the MS stack might make more sense to you and your specific situation.
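For those who want to replay the arithmetic behind the table above, here is a minimal sketch. It only uses the list prices quoted in this post; the two Microsoft "add-on" amounts are my own reading of Mike's first table (the difference between each row and the basic consolidation row), so treat them as an assumption rather than official pricing.

```python
# List prices quoted in the post (US dollars). Not an official price list.
WIN2008_EE_PER_HOST = 4_000
HOSTS = 3

# Assumption: increments derived from Mike's original table
# ($3,500 - $3,000 and $7,260 - $3,000).
MS_MANAGEMENT_ADDON = 3_500 - 3_000
MS_ADVANCED_ADDON = 7_260 - 3_000

VI3_STD_ACCELERATOR_KIT = 5_995  # covers 4 sockets of VI3 Standard
VI3_STD_2_SOCKETS = 2_995        # extra license to reach 6 sockets


def ha_table():
    """Return {need: (microsoft_cost, vmware_cost)} including HA features."""
    ms_base = WIN2008_EE_PER_HOST * HOSTS
    vmware_ha = VI3_STD_ACCELERATOR_KIT + VI3_STD_2_SOCKETS
    return {
        "Basic Consolidation": (ms_base, 0),  # ESXi is free
        "Centralized Management": (ms_base + MS_MANAGEMENT_ADDON, vmware_ha),
        "Basic Advanced Features": (ms_base + MS_ADVANCED_ADDON, vmware_ha),
    }


if __name__ == "__main__":
    for need, (ms, vmware) in ha_table().items():
        print(f"{need}: ${ms:,} -- ${vmware:,}")
```

Changing any of the constants (a different Windows edition price, for example) is enough to re-run the comparison for your own situation.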
I have also to say however that many people have mentioned that Mike didn't take into account the soon to be released stand-alone Hyper-V version for $28. While it's true that this version can change the dynamics of the first table above, it is my understanding that this specific version will not support Enterprise features such as MS Cluster Server so it cannot be used to alter the pricing dynamics of the second table.
However, what I struggle to fully get is related to the last comment Mike made in the post:
>NOTE: Windows licenses were not calculated into the costs since we assumed that the average SMB customer will continue to use and run their existing Windows Server 2003 installations which they already own the licenses for. Based on lots of conversations with analysts, press, bloggers, and customers this is a safe guess for the next 1 - 2 years as Windows Server 2008 gets adopted. If you were to calculate in licenses costs then the best license to use would be Windows Server 2008 Datacenter which allows for unlimited VMs no matter which solution you use. You should then subtract $12,000 $3,000 from the Microsoft column in each example and add $6,000 to BOTH columns (Microsoft and VMware) to get the cost for early adopters of Windows Server 2008.
And then again in the comments section:
>Good point about OEM licenses. I tried to keep most of the complexities of Microsoft licensing out of this post, but yes, if you have an OEM license then you cannot reassign it to another server unless you bought Software Assurance for that license within 90-days of original purchase. I will make a note of this in the blog. I will also put on the to-do list to provide some insight into the complexities of Microsoft licensing.
>The OEM licenses could impact the cost model in 2 ways:
>1) If you decided you were going to stick with Windows 2003 for some time to come then you would purchase new Windows 2003 licenses for your VMs. You would still end up purchasing Windows 2008 licenses as well for the Hyper-V hosts and not for the VMware hosts. The only exception would be if you purchased Software Assurance with the Windows 2008 licenses in which case you get downgrade rights and could run Windows 2003 in your VMs. All of this gets complex since Software Assurance is only available through certain Microsoft license agreements which are not always present with SMB customers (our target example in this post).
>2) If you decide to go ahead and upgrade to Windows 2008 then the note at the end of the blog post still holds true on the impact to both columns.
>Thanks again for pointing this out. I for one really do hate OEM licenses since they're so restrictive. Cheaper - yes, but restrictive.
Mike is right to assume that most customers would continue to use Windows Server 2003 for some time to come. He is also right that, assuming a complete re-licensing needs to be done, Windows Server Datacenter is the best option given its "unlimited VMs policy". Last but not least he is also right in mentioning that usually these "upgrade" paths for the OEM versions are not available for the SMB customer set.
I think he is missing some specific scenarios and has made some wrong assumptions, though. When you license a host with Windows 2008 Datacenter, not only do you have unlimited virtual machine licenses, but you also have license entitlement for the underlying virtualization technology (i.e. Hyper-V / Parent Partition). This means that if, for some reason (see below), the customer needs to re-license all the 30 Windows instances he is virtualizing, he is going to license everything with Windows 2008 Datacenter, which includes both virtual machine and Hyper-V entitlements for the host. Notice that the customer could then install Windows 2003 guests as you have downgrade rights. This is an intriguing scenario because basically you are buying one (Datacenter) license from MS that entitles you to use both the virtualization layer and the guests. For the parallel VMware scenario the Datacenter license is still the one that makes the most sense, but the problem is that VMware wants its piece of the pie now (in addition to the piece customers have to buy from MS). This essentially makes the MS hypervisor solution REALLY FREE in the comparison of the two technologies.
So, taking the Guests licensing into account in the specific situations where you have to re-license them, we are looking at:
Need: Microsoft -- VMware
Basic Consolidation (includes Guests licensing through Windows 2008 Datacenter - 6 sockets x $3,000 each): $18,000 -- $18,000
Centralized Management (includes high availability): $18,500 -- $26,990
Basic Advanced Features (includes high availability) (Backup and Patching): $22,260 -- $26,990
You can also normalize this chart removing the $18,000 price for the Windows Datacenter licenses and the table would end up in listing only the "virtual infrastructure" costs:
Need: Microsoft -- VMware
Basic Consolidation (assumes but does not include Guests licensing through Windows 2008 Datacenter - 6 sockets x $3,000 each): FREE -- FREE
Centralized Management (includes high availability): $500 -- $8,990
Basic Advanced Features (includes high availability) (Backup and Patching): $4,260 -- $8,990
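The two tables above are related by a single constant (the Windows 2008 Datacenter guest licensing), which a short sketch makes explicit. The figures are the ones used in this post; the function and variable names are mine and purely illustrative.

```python
# Figures taken from the post (US dollars, list prices).
DATACENTER_PER_SOCKET = 3_000
SOCKETS = 6

# "Virtual infrastructure only" costs, i.e. the normalized table.
NORMALIZED = {
    "Basic Consolidation": (0, 0),
    "Centralized Management": (500, 8_990),
    "Basic Advanced Features": (4_260, 8_990),
}


def with_guest_licensing(normalized):
    """Add the Datacenter guest licensing cost to BOTH columns."""
    guests = DATACENTER_PER_SOCKET * SOCKETS
    return {
        need: (ms + guests, vmware + guests)
        for need, (ms, vmware) in normalized.items()
    }


if __name__ == "__main__":
    for need, (ms, vmware) in with_guest_licensing(NORMALIZED).items():
        print(f"{need}: ${ms:,} -- ${vmware:,}")
```

Since the guest licensing lands on both columns, the gap between the two vendors is the same in either table; only the "virtual infrastructure" rows decide the comparison.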
Again, based on my (limited) licensing know-how, these two new tables hold true for:
- Customers that cannot re-use OEM licenses
- Customers deploying a new virtual infrastructure who only have older Windows versions (i.e. Windows 2000) and want to use newer versions in the guests (i.e. Windows 2003 and Windows 2008)
- Customers deploying a new infrastructure from scratch
So basically the two statements above made by Mike do not always hold true:
> You would still end up purchasing Windows 2008 licenses as well for the Hyper-V hosts and not for the VMware hosts
> If you decide to go ahead and upgrade to Windows 2008 then the note at the end of the blog post still holds true on the impact to both columns
Assuming my understanding is correct (but I might be missing something) under these assumptions the MS solution stack comes in at a much cheaper price (especially if you account for mandatory VMware subscriptions and support fees). So the first question that needs to be answered is: does this analysis make any sense?
If it does, then the second real question that needs to be answered is... how many customers fall in the 3 scenarios described above that favor the MS licensing model?
How many faces does a sphere have?
Massimo.
|
-
Virtualization is a disruptive technology and we all know that. With this post I want to share with you some scenarios about how server (and storage) virtualization can drastically change the landscape for "small IT shops" (aka SMBs) in the context of High-Availability and Disaster/Recovery. Up until today server "high availability" was not for everyone, as it required a complexity and a cost that many IT shops could not sustain. I have already talked about the change of paradigm that a virtual infrastructure brings in when it comes to making a service highly available. You can read this post for more info.
However that covers a small portion of the bigger picture. Particularly that post assumes that you have a number of physical servers connecting to a central storage repository so that you can restart your VM (i.e. your service) should one physical server fail. Fair enough but this obviously doesn't cover the other important subsystem which is the storage subsystem. In a scenario like this the central storage repository is a so called SPOF (or Single Point Of Failure). If you are a big Bank / Telco organization (or if you have enough money to spend anyway) you can get something decent using Midrange / Enterprise Storage arrays that support all sort of replication features so that you can literally create a fully redundant infrastructure with no SPOF. It must be noticed that while it is true that even Entry level Storage arrays might have no internal SPOF it is common referring to these boxes as a single entity (hence a potential single source of issues). This is even more evident when you think about being able to stretch your virtual infrastructure across two buildings in the same Metropolitan Area Network (such as 3 physical servers in Building A and 3 physical servers in Building B comprising a single cluster): this obviously leaves you with the only choice of putting your single Storage array in either Building A or Building B.
I don't want to get into all sorts of discussions regarding how you would define a scenario like this. Someone refers to this as "DR", someone else refers to this as "Campus HA", someone else refers to this as "Continuous Availability". I am actually not interested in formal definitions. I am more interested in the fact that the vast majority of the customers I talk to (be they big Banks, big Telco's or the small SMB IT departments) would like to leverage virtualization technologies to be able to achieve this scenario in a much less complex way (typically a requirement of the big boys that have a very complex distributed IT infrastructure) or in a much less expensive way (typically a requirement of the not-so-big boys that have a tight budget). It is amazing to hear that one of the most appealing reasons for which all customers are virtualizing ... is not consolidating the servers (sure, this is important) but providing better high availability and DR mechanisms. Virtualization is really a paradigm shift.
I have already said that for the big players there are (expensive) technologies and products that allow you to achieve that. This is an example of a project I worked on a few years ago. But where does this leave the "small" shops? I am talking about customers that have from 10 to 15 Windows / Linux servers and that do not have a budget that allows them to afford Midrange Storage technologies with replication capabilities. These arrays are not enormously expensive (in fact I am assuming that whoever has more than 15 or 20 servers perhaps has a decent IT budget that allows them to buy these technologies). However IT departments that have up to 10 or 15 virtual instances (which, by the way, could be deployed on as few as two 2-socket systems, for redundancy, based on the assumptions I discussed in a previous post) might not be keen on buying a Midrange Storage array just for the purpose of being able to replicate and protect the data. Don't get me wrong, the "geeks" working for IT would love to have that kit in their hands; it's the "buyer" within the organization that would question the value for the money.
That's why when I was first introduced to technologies such as those from Lefthand Networks and then Datacore I was intrigued. What these companies do in essence is storage virtualization. They do it differently and the product packaging itself is not identical but essentially, bottom line, they hide the layout of the storage arrays and the way it is presented to the hosts and their OS'es. As I said they have a different approach in how they achieve this.
Lefthand Networks sells SAN "building blocks" that are effectively x86 servers with their own software on-board. This software simply turns those network-connected x86 servers equipped with a certain amount of Direct Access Storage into a highly available and distributed SAN that the computational hosts and their OS'es can access via an iSCSI protocol.
Datacore uses a slightly different philosophy as it is sold as a software package that you would install on a Windows host (typically more Windows hosts for redundancy and high availability reasons). Storage administrators would then give visibility of heterogeneous and distributed storage resources to these hosts, typically via FC but not limited to that, so that the Datacore software could present, using various protocols such as iSCSI and FC, to the hosts and their OS'es, a virtual storage repository.
These two pictures should clarify this brief explanation of the two technologies. 
Obviously this is not intended to be an exhaustive explanation of these vendors' technologies, features and strategies. Make sure you contact them directly should you be interested in knowing more about this.
In a typical scenario such as the one I have outlined in the pictures above, separate physical "storage islands" would be aggregated into a single virtual SAN by the Lefthand/Datacore software, which in turn provides a robust storage repository to (yet) other physical hosts running your production applications. That is how these products are usually positioned. How does all this tie into the HA and DR for the masses I was referring to? Well, it turns out that it is possible (and this is how both companies are now marketing their products as well) to install the software logic within virtual machines. Imagine a new innovative deployment scenario where you would create two special purpose virtual machines on two separate virtualized hosts and each virtual machine is associated with a vast amount of locally attached storage. At that point the Lefthand/Datacore software would turn those two VM's (with a bulk of local storage space associated to them) into your virtual SAN that is going to serve, through the iSCSI protocol, the other virtual machines (supporting your own production applications) running on the same two virtualized hosts. Confused? I think a picture is worth 1000 words.
Of all the functionality that these products provide, the most interesting, in the context of this post, is the ability to mirror the local content of the Storage VM's in order to create an active/active fully redundant iSCSI SAN. As you can see from the picture above, the logical layout is quite different from the physical layout. Let me try to explain: logically the two Storage VM's create the virtual iSCSI SAN. The virtualization layer then maps the LUN's made available by the clustered Storage VM's and uses these LUN's to create VMFS volumes to host the production virtual machines that support the business. The logical perspective differs from the physical perspective in the sense that, while logically the virtualization layers connect to an "external" iSCSI SAN and use it as an external service... the actual code that instantiates the virtual iSCSI SAN runs as redundant stand-alone virtual machines on top of the same virtualization layers.
One of the key advantages of such a setup is being able to tolerate the failure of one of the two systems and continue to operate. The following pictures illustrate what happens should either one of the servers or an entire building go off-line for some reason.
Since the Storage VM's replicate the local storage associated to them, either one of these entities can fail (be it the Storage VM itself, the virtualization layer, the physical server or the entire building) without affecting the availability of the VMFS volumes created on top of the mirror. This is transparent to the surviving virtualization layer as it could be compared to a failure of a Storage Controller within a redundant FC Storage Server. It is worth noticing that, in case of failure of the physical hosts or failure of the building, VM's can be either manually or automatically restarted on the surviving node depending on the virtualization layer being used and the feature set associated to it.
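To make the failure behavior a bit more tangible, here is a toy model of the two-building layout. It is only a sketch of the outcome described above: the class and function names are made up for illustration, and it says nothing about how Lefthand, Datacore or any hypervisor actually implement mirroring or restarts.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """One physical server running a Storage VM plus production VMs."""
    name: str
    up: bool = True
    storage_vm_up: bool = True
    vms: list = field(default_factory=list)

    def serves_storage(self):
        # A replica is reachable only if both the host and its Storage VM run.
        return self.up and self.storage_vm_up


def volume_available(hosts):
    """The mirrored volume stays online while at least one replica is served."""
    return any(h.serves_storage() for h in hosts)


def fail_host(hosts, name):
    """Mark a host as failed and restart its VMs on a surviving host."""
    failed = next(h for h in hosts if h.name == name)
    failed.up = False
    survivors = [h for h in hosts if h.up]
    if survivors and volume_available(hosts):
        # Manual or automatic restart, depending on the virtualization layer.
        survivors[0].vms.extend(failed.vms)
        failed.vms = []
    return hosts


if __name__ == "__main__":
    a = Host("building-a", vms=["mail", "db"])
    b = Host("building-b", vms=["web"])
    fail_host([a, b], "building-a")
    print(volume_available([a, b]), b.vms)
```

Losing both buildings (or both Storage VM's) is the only case in which `volume_available` returns `False`, which matches the "either one of these entities can fail" wording above.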
I am not getting into the details but consider that both of these products support Synchronous and Asynchronous replication of the data so that you can even tune your solution based on the distance between the two buildings. In the simplest scenario both buildings are in the same Metropolitan Area Network so you would treat the two servers as if they were installed in the same rack of a single building. On the other hand if your buildings are far apart (or otherwise not LAN-like connected) you can tune it to use Asynchronous replication and build something that is closer to a Disaster/Recovery plan (well I am oversimplifying here but you get the point).
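The difference between the two replication modes can be sketched in a few lines. This is a generic illustration of the concept (acknowledge after both copies are written versus acknowledge locally and ship later), with hypothetical names; it is not the way either vendor implements it.

```python
from collections import deque


class MirroredVolume:
    """Toy mirror between a local and a remote copy of a volume."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.local = {}
        self.remote = {}
        self.pending = deque()  # writes not yet shipped to the remote site

    def write(self, block, data):
        self.local[block] = data
        if self.synchronous:
            # Synchronous: the write completes only once both copies have it.
            self.remote[block] = data
        else:
            # Asynchronous: acknowledge now, replicate in the background.
            self.pending.append((block, data))

    def drain(self):
        """Ship all queued writes to the remote copy."""
        while self.pending:
            block, data = self.pending.popleft()
            self.remote[block] = data

    def pending_writes(self):
        """Writes that would be lost if the local site failed right now."""
        return len(self.pending)


if __name__ == "__main__":
    campus = MirroredVolume(synchronous=True)
    campus.write(0, "payroll")
    print(campus.remote == campus.local)

    dr = MirroredVolume(synchronous=False)
    dr.write(0, "payroll")
    print(dr.pending_writes())
```

In the asynchronous case anything still in `pending` at the time of a site failure is missing from the remote copy, which is why that mode is closer to a Disaster/Recovery setup than to local high availability.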
It is also worth noticing that since the Datacore software is an add-on that you (the customer or the integrator) would install on top of Windows, you can use any virtualization layer that allows you to create Microsoft Windows virtual machines making it very flexible in terms of deployment options. Lefthand on the other hand provides the Storage VM as a virtual appliance (thus making it more robust and easy to deploy in my opinion than a Windows add-on like Datacore) that is however, as of today at least, only available for VMware VI3 virtualized platforms.
This is clearly not something that you might want to look at in the context of a medium / big virtual infrastructure deployment where Midrange / Enterprise Storage arrays with their native virtualization and replication techniques offer a great deal in terms of performance, scalability and reliability. I don't want to downplay Lefthand and Datacore but I think there is a positioning that needs to be taken into account when comparing these products with the features of Midrange / Enterprise class Storage arrays. But in the context of the small IT shops, using the technologies described in this post you can achieve similar features at a small fraction of the cost, and it might make sense to do so.
Let's try to do the math. A couple of System x 3650 servers configured with 2 x Intel Quad-Core processors, 16GB of RAM, "a bunch" of local disks and network adapters might cost around $10,000 each (list price). This makes a $20,000 total (list price) for the hardware.
I am not an expert on Lefthand and Datacore pricing (nor do I want to become one) but as far as I have seen it would be fair to assume that, in a context and scope like this, the software to enable each Storage VM would cost around $5,000 (list price). This makes a total of $10,000 for the virtual SAN with remote replication capabilities.
Then comes the virtualization layer. Here there are a number of options (both from a vendor perspective as well as from a feature set perspective within the same vendor). Clearly if you want to use MS technologies to enable the virtual infrastructure solution (i.e. Hyper-V with Systems Center Virtual Machine Manager) the cost would be pretty low. On the other hand if you want to use VI3 Enterprise to enable it the costs would be higher (and so would the feature set). Obviously one should also take into account lower cost VMware VI3 alternatives (such as VMware VI3 Foundation) as well as alternative virtualization vendors such as Citrix and VirtualIron. All in all I think it would be fair to assume that one could spend, on average, $5,000 to enable both physical systems with a virtual infrastructure software (again it might be as low as $0 or as high as $10,000+ depending on what you want to achieve).
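Putting the three components together is simple arithmetic; the sketch below just sums the list prices assumed in the paragraphs above (the names are mine, and no discount is applied).

```python
# Rough list prices assumed in the text above (US dollars).
SERVER_LIST_PRICE = 10_000      # one System x 3650 as configured above
STORAGE_VM_SOFTWARE = 5_000     # Lefthand/Datacore software per Storage VM
SERVERS = 2


def solution_list_price(virtualization_layer=5_000):
    """Total list price for two servers, the virtual SAN and the hypervisor."""
    hardware = SERVER_LIST_PRICE * SERVERS
    virtual_san = STORAGE_VM_SOFTWARE * SERVERS
    return hardware + virtual_san + virtualization_layer


if __name__ == "__main__":
    for label, virt in [("low", 0), ("average", 5_000), ("high", 10_000)]:
        print(f"{label}: ${solution_list_price(virt):,}")
```

With the average assumption the total list price is $35,000, and it ranges from $30,000 to $40,000 (or more) depending on the virtualization layer you pick.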
All numbers we have mentioned (and assumed) are list prices and I will let you do the math on the average discounts you can achieve on each of those items (for example I know Lefthand has some bundle offerings that lower considerably the price). In general it would be fair to assume that, for a "low 2 digit thousands of US dollars" (what a great way to not tell you what I think a potential discount could be) you can get the following:
- A physical Server and Storage infrastructure capable of supporting up to 10 or 15 virtual images
- Integrated Server and Storage high availability and redundancy
- Compatible with all server virtualization enterprise features (live partition mobility / high availability for virtual machines / etc)
- Without the costs associated to a SAN environment
- With acceptable performance and good-enough scalability (in the context of small IT shops)
All this at a fraction of the costs compared to achieving similar characteristics using high end Enterprise class components and products. As I said this is not meant to be for everyone; Midrange / Enterprise Storage Server arrays should continue to be intended as the preferred choice for High Availability and Disaster/Recovery scenarios in many circumstances. This is however a great way for customers with limited budgets to achieve similar levels of features at a fraction of the price. This is not about cannibalizing the Midrange and Enterprise products market, but it is rather making similar level of features available to the masses (masses that would not be able to get the same features otherwise).
In closing this thread I'd also like to point out that virtualization is not, as many still think, (just) the capability to carve a number of software partitions out of a single physical system (aka server consolidation). Virtualization is really a paradigm shift within the data center that allows you to re-architect the entire stack (hardware, software, management) in a completely different (and better) way. It allows customers and vendors to look at the typical problems from a completely different angle, thinking out-of-the-box if you will. It allows you to solve problems in a way that, a few years ago, one could not even have thought possible.
Massimo.
|
-
Lately, there have been many discussions on the Internet and on various forums regarding the implementation of HA clustering technologies (namely and primarily Microsoft Cluster Server) within virtual machine environments (namely and primarily VMware infrastructures). Many customers are still treating virtual machines as if they were standard Windows servers (or Linux for what that matters) so this does make sense.
However there is a trend in this industry that is shifting typical infrastructure services from the multi-purpose operating systems into the virtual infrastructure. The top of the iceberg of this trend is called Virtual Appliances. While many view Virtual Appliances as a starting point of something big and new I really see them as the natural result (big and new) of this trend that is... turning the hypervisor into a so called Data Center OS. I have discussed this trend in a presentation that I did at VMworld 2007 in San Francisco and that you can access here.
If you stop for a minute and think about what is happening in this x86 virtualization industry, you'll notice that many infrastructure services that were typically loaded within the standard Windows OS are now being provided at the virtual infrastructure layer. An easy example would be network interface fault tolerance: nowadays in virtual environments you typically configure a virtual switch at the hypervisor level, built on a bond of two or more Ethernet adapters, and you associate virtual machines to the switch with a single virtual network connection. What you have done in this case is basically delegate to the virtual infrastructure the task of dealing with Ethernet connectivity problems. This is a very basic example and there are many others like it, such as storage configuration/redundancy/connectivity.
These two pictures should graphically outline this trend:

(for higher quality pictures please refer to the presentation linked above)
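To make the network fault tolerance example a bit more concrete, here is a minimal sketch of the idea in Python. It is only a toy model, not VMware code: the `Uplink` and `VirtualSwitch` classes and the adapter names are hypothetical, and the point is just that the virtual machine keeps a single virtual connection while the switch decides which physical adapter carries the traffic.

```python
from dataclasses import dataclass


@dataclass
class Uplink:
    """A physical Ethernet adapter that is part of the bond (hypothetical model)."""
    name: str
    link_up: bool = True


class VirtualSwitch:
    """Toy virtual switch: VMs attach once, the switch handles uplink failures."""

    def __init__(self, uplinks):
        self.uplinks = list(uplinks)

    def active_uplink(self):
        # Return the first adapter in the bond whose link is still up.
        for uplink in self.uplinks:
            if uplink.link_up:
                return uplink
        return None

    def send(self, vm_name, payload):
        uplink = self.active_uplink()
        if uplink is None:
            raise ConnectionError("all uplinks in the bond are down")
        return f"{vm_name} -> {uplink.name}: {payload}"


vswitch = VirtualSwitch([Uplink("vmnic0"), Uplink("vmnic1")])
before_failure = vswitch.send("vm-mail", "hello")

# Simulate a cable pull on the first adapter; the VM configuration is untouched.
vswitch.uplinks[0].link_up = False
after_failure = vswitch.send("vm-mail", "hello")

print(before_failure)
print(after_failure)
```

Nothing inside the guest changes between the two calls, which is exactly the kind of delegation to the virtual infrastructure described above.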
Back on track, one of the infrastructure services that is about to migrate from within the multi-purpose OS where the application runs all the way down into the virtual infrastructure is the High Availability service. In the VMware vocabulary this is called VMware HA: a piece of code/intelligence that is part of the VI3 offering and whose purpose is to protect virtual machines from host failures. Basically, should a host fail, all virtual machines running on top of that failed host get automatically restarted on the surviving nodes that are part of the same VMware HA Cluster. However, many readers would point out that there are at least a couple of very important architectural differences in how VMware HA compares to Microsoft Cluster Server implemented within virtual machines:
- In the case of VMware HA there is a single instance of the virtual machine (with the application) to be protected. The VM is started on a given node of the cluster depending on the status of the others (availability and resource utilization). Many people still think that the software stack loaded in the virtual machine is a Single Point Of Failure (imagine a Service Pack upgrade that goes wrong, for example, and you will have an unplanned downtime of the VM and in turn of the application). On the other hand a "virtual" MSCS solution requires two independent Windows nodes (virtual nodes in this case) so that should any problem occur within the software stack of a node it won't affect the availability of the application, which can be restarted on the other virtual node.
- In the case of VMware HA, you are really only monitoring the status of the physical server. Should a physical server go down the virtual machine is restarted on another node of the cluster. This scenario doesn't cover the software stack status within the VM nor, obviously, the application status within the VM (it should be noted that VI3.5 introduced experimental support for monitoring the status of the OS within the VM via VMware Tools heartbeat check-points). On the other hand in a Microsoft Cluster Server solution you would typically be protected from physical host failures (obviously) and you would also be able to monitor the application status so that a given service can be restarted on another MSCS node should it fail to start on the "primary" node even if the node has not failed.
This picture should outline the differences of these two approaches:

I guess you can easily see the philosophical differences between the two approaches. The first one is more traditional and tends to treat virtual machines as we have been treating physical servers in the last 10 years, applying the same practices and technologies. In the second picture, the philosophy is more innovative and tends to treat a VM as a simple object which leverages the new virtual infrastructure capabilities.
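The two behaviors compared above can also be sketched in a few lines of Python. This is a simplified model under my own assumptions and not how VMware HA or MSCS are actually implemented: the function names, the "free slots" notion of capacity and the host/node names are all hypothetical. The first function only reacts to a host failure; the second also looks at the application status inside each node.

```python
def ha_restart(placement, free_slots, failed_host):
    """Infrastructure-level HA: restart the VMs of a failed host on survivors.

    placement maps VM name -> host, free_slots maps host -> spare VM slots.
    """
    survivors = {h: s for h, s in free_slots.items() if h != failed_host}
    if not survivors:
        raise RuntimeError("no surviving host in the cluster")
    result = dict(placement)
    for vm, host in sorted(placement.items()):
        if host != failed_host:
            continue
        # Pick the surviving host with the most spare capacity.
        target = max(sorted(survivors), key=lambda h: survivors[h])
        if survivors[target] <= 0:
            raise RuntimeError(f"no capacity left to restart {vm}")
        result[vm] = target
        survivors[target] -= 1
    return result


def guest_cluster_owner(nodes, current_owner):
    """In-guest cluster: move the service if the host OR the application fails.

    nodes maps node name -> {"host_up": bool, "app_ok": bool}.
    """
    def usable(name):
        return nodes[name]["host_up"] and nodes[name]["app_ok"]

    if usable(current_owner):
        return current_owner
    for name in sorted(nodes):
        if name != current_owner and usable(name):
            return name
    return None  # nowhere left to run the service


placement = {"vm-a": "esx1", "vm-b": "esx1", "vm-c": "esx2"}
free_slots = {"esx1": 0, "esx2": 1, "esx3": 2}
after_host_failure = ha_restart(placement, free_slots, failed_host="esx1")

# The host is fine but the application inside node1 is not:
# only the in-guest cluster notices and fails over.
nodes = {
    "node1": {"host_up": True, "app_ok": False},
    "node2": {"host_up": True, "app_ok": True},
}
new_owner = guest_cluster_owner(nodes, current_owner="node1")

print(after_host_failure)
print(new_owner)
```

Note that `ha_restart` has no input at all describing what happens inside the guest, which is the limitation discussed in the second bullet.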
We are clearly at an inflection point now where many customers that used to do standard cluster deployments on physical servers (which was the only option to provide high availability) are debating how to proceed. They now have the choice to either continue to do so in virtual servers as opposed to physical servers (thus applying the same rules and practices, with little disruption as far as their IT organization policies are concerned) or turn to a brand new strategy to provide the same (or similar) high availability scenarios (at the cost of heavily changing the established rules and standards). The reason I am saying we are at an inflection point is that I really believe the second scenario is the future of x86 application deployments, but obviously as we stand today there are things that you cannot technically do or achieve with it. Plus, there is a cultural problem in moving from an established scenario to a new one.
The following table tries to summarize the advantages / disadvantages of both approaches:

| Characteristics | HA Cluster within the VM | HA Cluster at the virtual infrastructure level |
| --- | --- | --- |
| Easy Deployment | Not True / Can't be achieved | True / Can be achieved |
| SW stack redundancy | True / Can be achieved | Not True / Can't be achieved |
| Application Monitoring | True / Can be achieved | Not True / Can't be achieved |
| "Guest OS independent" high availability | Not True / Can't be achieved | True / Can be achieved |
| Allows to apply traditional practices and IT standards | True / Can be achieved | Not True / Can't be achieved |
| Allows to decouple appl functionalities from HA functionalities | Not True / Can't be achieved | True / Can be achieved |
| Easy to implement / inherit DR properties | Not True / Can't be achieved | True / Can be achieved |
These are a few of the characteristics many users are currently debating. Again, you can infer from the above that delegating this infrastructure service (i.e. HA) to the virtual infrastructure is a better way to implement a data center... at least in my opinion. Assuming proper and effective backup/restore procedures can be implemented for your virtual environments, assuming that you don't need strict application monitoring (or that HA clusters at the virtual infrastructure will improve over time) and assuming an IT organization can adapt easily to new deployment methods and standards... it's obviously where you want to go in the long term.
It is interesting to note that there are a number of limitations in deploying an MSCS solution in a VMware VI3 environment: one is the fact that the VMDK files corresponding to the C:\ drive of the virtual machine nodes need to reside on a local, non-shared VMFS volume of a given ESX host, which is typically a small partition on the local hard drives that also contains the hypervisor code. There are a number of other limitations on top of this, but this one alone shows how inflexible a Microsoft Cluster Server solution implemented on top of VMware VI3 can turn out to be: given the non-shared disks, there is no VMotion of the virtual nodes themselves.
Another problem associated with the usage of HA software packages within virtual machines is that VMware tends to add and withdraw support for this, seemingly at random, at every minor and/or major infrastructure release update. Sometimes I wonder whether these limitations imposed by VMware are due to technical challenges or to a deliberate strategy to discourage those customers that want to keep their traditional practices. In fact this underlines the nature of the VMware strategy, which is clearly not only that of introducing a hypervisor between your physical box and your legacy software stacks, practices and standards... their strategy is to literally scramble the entire data center in terms of software stacks, practices and standards. And it's not necessarily a bad thing if you think that these stacks, practices and standards are not optimal (as I do).
At this point I also must remind readers that MSCS is an example that might be confusing for the simple fact that this is the very same technology that MS will be using to cope with this new trend. The idea is that, instead of using MSCS within a couple of virtual machines as we described above, they will be using MSCS to act as the High Availability mechanism for Hyper-V similarly to what VMware HA does for ESX. These pictures should clarify the idea:

Last but not least I also should mention that VI3 and MSCS are respectively examples of an implementation of an HA solution at the virtual infrastructure level (VI3) and an example of an HA software package at the virtual machines level (MSCS) that I have been using throughout this document to describe the concept. There are other technologies that can be mapped to the same concept and the list hereafter is an attempt to mention some of these options:
| Virtual Infrastructure solutions w/ HA capabilities | High Availability Software Packages for setup in Virtual Machines |
| --- | --- |
| VMware Virtual Infrastructure | Microsoft Cluster Server (MSCS) |
| Microsoft Virtualization (Hyper-V w/ MSCS) | Veritas Cluster Server |
| VirtualIron Extended Enterprise | .................. |
| Citrix XenServer (HA module in roadmap) | |
| .................. | |
This oversimplifies a very complex matter; for example one could notice that VCS (Veritas Cluster Server) could be used either within a virtual machine environment (as reported in the table above) or as an alternative to VMware HA at the virtual infrastructure layer - similar to how MSCS can be used either within virtual machines or in conjunction with the Hyper-V parent partition. Interestingly enough, in such a context (i.e. used at the virtual infrastructure layer), VCS is potentially able to monitor application status provided the proper Veritas agents are loaded within the virtual machine guests...although this challenges the benefits of a deployment like this being potentially Guest OS agnostic.
Obviously all this discussion strictly pertains to typical HA scenarios where you have an application that deals with and manipulates data, and for which you need a shared storage solution. In all situations where the application is stateless and high availability can be achieved by load balancing multiple instances of it (a good example is a farm of web servers), both high availability and scalability are inherited from the layout of the application deployment and don't require any "infrastructure HA assist" (be it at the virtual infrastructure level or within the virtual machine).
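As a rough illustration of the stateless case, here is a minimal round-robin balancer sketch in Python. It is a toy under my own assumptions (the class name, the health-marking methods and the server names are hypothetical, and a real load balancer would run its own health checks), but it shows why a web farm survives the loss of an instance without any help from the infrastructure.

```python
class RoundRobinBalancer:
    """Toy load balancer that skips instances marked as down."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._next = 0

    def mark_down(self, instance):
        self.healthy.discard(instance)

    def mark_up(self, instance):
        if instance in self.instances:
            self.healthy.add(instance)

    def pick(self):
        # Try each instance at most once per request, in rotation order.
        for _ in range(len(self.instances)):
            candidate = self.instances[self._next]
            self._next = (self._next + 1) % len(self.instances)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instance available")


balancer = RoundRobinBalancer(["web1", "web2", "web3"])
normal = [balancer.pick() for _ in range(3)]

# web2 fails: requests simply keep flowing to the remaining instances.
balancer.mark_down("web2")
degraded = [balancer.pick() for _ in range(4)]

print(normal)
print(degraded)
```

Since no instance holds state, adding more instances to the list gives you scalability and losing one costs you only capacity.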
In the end my suggestion is that users evaluate the pros and cons of the "legacy" option vs. the pros and cons of the "new trend", which leverages virtual infrastructure capabilities, so that they can make educated decisions. Emotionally I do like the second option much more because it's... better. But I perfectly understand that many IT organizations have their own problems jumping on the bandwagon right away. By the way, I am totally for virtualization, but realistically I wouldn't rule out keeping some particularly critical x86 workloads on a physical MSCS cluster if that is required. Some organizations also like the idea of implementing N+1 clusters where you protect N independent physical servers using a single virtualized host, on top of which run N virtual images that are the MSCS node counterparts of the physical systems to be protected. While this sounds like an interesting scenario - and it is for some situations - it involves the same supportability and limitation concerns we have discussed above.
As a matter of fact, closing this long post, I have realized that I am ok with everything... except using HA software packages within virtual machines running on top of virtual infrastructures... It's just too complicated, too risky, too cumbersome... too "no way."
Massimo.
|
-
In the last few months I have been struggling to understand what is so different, in terms of mass adoption, between virtualizing server workloads and virtualizing desktop workloads (also known as "VDI" or "Virtual Desktop Infrastructure"). I have been exposed to this phenomenon of x86 virtualization since around 2000, when the idea was as simple as taking a high-end server and carving it into many small virtual servers. Similarly I have been exposed for the last 3 years to the other big use-case for x86 virtualization, which is "Desktop Virtualization", and I can tell you that the time it took for the first, traditional use-case to take off (through seeding the market with the idea - piloting and proofs of concept - mass adoption) was way shorter than the time it is taking for VDI to take off (going through the same phases). This doesn't mean that VDI is not taking off, but there is no doubt that 3 years after its introduction I have seen many more production implementations of VMware ESX than I have seen of VDI.
Why is that? Isn't VDI just virtualizing XP rather than Windows Server? Well not quite I would say. Let's dig into some of the details (not in strict order of importance).
- Desktop Virtualization alternatives. While I am focusing this discussion on the VDI concept there are some analysts that, for good reasons, point out that desktop virtualization is not just VDI (i.e. virtualizing Windows XP and putting it on a server in the back). There are other alternative architectures to "virtualize a desktop" such as Windows Terminal Services, Application Virtualization, OS streaming and many others. To complicate things further these technologies are sometimes complementary to each other and sometimes alternative to each other. So customers are challenged from the very beginning of a potential desktop virtualization project with a great deal of input and information that they find hard to understand and digest. In the server space this has never been a big deal since "virtualizing a server" has always had a single meaning, that of "hardware virtualization" (i.e. getting as many virtual hardware partitions as possible out of a single physical server). So in the server virtualization realm there was far less confusion than there is nowadays around all the potential architectures at the very different layers of the desktop software stack (and VDI is just one of these different architectures).
- VDI Products complexity. On top of the above complexity there is another one. In fact 8 years ago it was much easier to understand the products you needed to adopt a server virtualization model. If you used to buy 20 physical servers and install 20 Windows instances, now with server virtualization you would buy 2 physical servers, 2 VMware ESX 1.x licenses and install 20 Windows instances. As easy as that. You couldn't do much differently and it worked great (so why bother?). VMware has since introduced new versions of the software and enriched their value proposition "linearly" with Virtual Center 1.x and eventually with VI3. On the other hand to adopt a desktop virtualization model you have to buy a virtualization platform and a connection broker, and you need to decide which access device you want to use, etc. For every single layer of the architecture you have multiple implementations, which translate into multiple different products that are supposed to do similar things (if you want to know more about the architecture of VDI have a look at this presentation). As a result in the last few years this desktop virtualization market has been very "foaming", with ISVs entering this space, ISVs buying out other ISVs, etc. Clearly it is much more difficult right now to understand what to do and which ISV to buy a VDI solution from than it was 8 years ago for a customer interested in entering the server virtualization space.
- Overall cost of the solution. In the desktop space there is a predominant metric, "cost per seat", that you can hardly find in the server space. Sure, customers understand that a server virtualization solution could cost slightly more than a traditional layout of a string of small physical servers, but apparently they are more ready to discuss the benefits (in terms of TCO) of a virtualized solution and factor them into the overall costs. This is especially true when these customers are considering high-availability and disaster recovery solutions that are either very expensive in the standard physical space or not achievable at all. On the other hand the "cost of the desktop" is a very strong metric that most customers are using when discussing the overall costs of a desktop virtualization solution. A couple of days ago I met with a customer that, as part of a very large bid, was buying (branded and good quality) desktops for 233€ (monitor and Windows license included). Needless to say that in a VDI solution which comprises the back-end servers, the virtualization software, the proper Microsoft licenses, the connection broker software, the thin clients and the miscellaneous utilities you might want to use to complement the scenario, the cost per user might be VERY WELL above that 233€. While for a server virtualization scenario the overall acquisition price of the solution can get close to what a customer would pay for a standard physical deployment (or at least within a reasonable range that is offset by the tremendous advantages), to create a business case for VDI you have to include a detailed TCO analysis to get on par with a standard desktop deployment. And we all know how difficult it is to "sell" on TCO (especially to desktop buyers).
- Microsoft licensing. Of particular importance is the issue of MS licensing. Historically, customers have always bought Windows PCs and historically these Windows PCs have come with a so-called (very cheap) OEM Windows license (that is, when you buy a PC you get a Windows license tied to it). This OEM license CANNOT be used in a VDI scenario so you need to buy brand new licenses. And this is where the "fun" starts. This is a very bad story for customers both from a complexity perspective and from a cost perspective. At the time of this writing Windows licensing for virtual desktops is still pretty confusing: "Should I buy a retail version of the OS?", "Should I buy the VECD (Vista Enterprise Centralized Desktop) license under Software Assurance?", "What if I am not a customer with MS Software Assurance?" etc. All in all, whatever you decide best fits your scenario as a customer, it's going to be more expensive than the cheap OEM Windows license you used to buy tied to your desktop purchase. We all hope MS will make this transition easier for our customers but so far... not so good.
- End-user Experience. There is a big difference between virtualizing a server and virtualizing a desktop from an end-user perspective. You, as a CIO / Sys Admin, can virtualize a server or even the whole server farm and no one at your company would even notice it. It's just your own decision to do that or not. In a desktop virtualization scenario, as soon as you start deploying the first thin client you are opening it up to the whole company. Immediately you have exposed your decision to dozens / hundreds / thousands of other individuals that, for good reasons or political reasons, will start to challenge you. Good reasons might be the technical limitations that you have to compromise with as of today, limitations that sometimes make it hard for a thin client to match a standard desktop deployment in terms of local device attachment support / multimedia video performance / flexibility / off-line capabilities etc. I can assure you that no "average end-user" would ever realize that their mail system in the back is now running on a VM whereas yesterday it was physical; however even the most "IT-candid end-user" would understand that he / she is using Outlook from a "little box where I cannot even attach my iPod anymore" as opposed to the PC he / she was used to! And that is when the political problems start. On this I have always said that a very happy Sys Admin has a frustrated end-user base and, vice versa, a very frustrated Sys Admin has a happy end-user base. It's a matter of compromising as usual: VDI technology advancements will allow the CIO / Sys Admin to meet the standard business requirements, whereas end-users will need to understand that they can't treat their business access device as if it were their home PC.
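To show how the "cost per seat" comparison from the cost bullet works in practice, here is a small worked example in Python. Only the 233€ desktop price comes from the customer case mentioned above; every other figure (number of users, server, software, broker, thin client and license prices) is an invented placeholder, so treat the result as an illustration of the arithmetic and not as real pricing.

```python
def vdi_cost_per_seat(users, shared_costs, per_user_costs):
    """Spread the shared back-end costs over all users, then add per-user items."""
    return sum(shared_costs.values()) / users + sum(per_user_costs.values())


DESKTOP_COST_PER_SEAT = 233  # EUR, the bid price quoted in the text

# All values below are hypothetical placeholders, in EUR.
users = 1000
shared_costs = {
    "back-end servers": 150_000,
    "virtualization software": 50_000,
    "connection broker": 30_000,
}
per_user_costs = {
    "thin client": 150,
    "Windows license for VDI": 100,
}

vdi_seat = vdi_cost_per_seat(users, shared_costs, per_user_costs)
gap = vdi_seat - DESKTOP_COST_PER_SEAT

print(f"VDI cost per seat: {vdi_seat:.0f} EUR")
print(f"Gap the TCO analysis has to justify: {gap:.0f} EUR per user")
```

With numbers in this range the acquisition price alone cannot win the comparison, which is why the business case has to be built on TCO.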
I think these are some of the major road-blocks preventing VDI from becoming a reality and starting the massive deployment we have seen in the traditional server virtualization use-case. All in all I think that the root of the problem when trying to re-architect desktop deployments is that, whatever you do, it's basically a "hack". If you think about that for a minute, the WHOLE industry only has one default, which is "the end-user will be using a Windows desktop". Whatever you do with any technology that the industry is creating (be it an application, a physical USB device or whatever) to make it work in a different scenario... it is a "hack". We have implemented hacks with Terminal Servers and we are doing the same with VDI and any other technology such as Application Virtualization. As long as there is an industry that creates "stuff" for the PC and there is just a handful of people that try to make the "stuff" work differently in a different scenario... it will always be an uphill battle. I look forward to the day when the industry as a whole will embrace these non-PC deployments in a more structured way than the current "I'll do this assuming the PC and then someone will be able to hack it to make it work for alternative scenarios". I look forward to the day when the average "CIO Joe" that needs to create an IT infrastructure will not only think "I have 1000 users, I have to buy 1000 Windows PCs" but rather... "I have 1000 users, I need to buy a VDI solution for them".
At that point all these issues, such as product and architecture complexity, end-user experience, licensing etc., will fade away... because VDI will have become the "obvious / default" way to give end-users access to IT.
Massimo.
|
|
|