And the winner is… AppSpeed
I have just got back from VMworld 2009 Europe in Cannes. It was an interesting week, and not just because we were on the Côte d'Azur (Azur, not Azure as in Windows Azure). There were a few interesting announcements, demos and breakout sessions at the Palais des Festivals during the week, so it would be difficult to rank them, but if I had to give my "virtual Oscar" to something I saw... that would be AppSpeed.
AppSpeed is a new technology that will take some sort of product shape during 2009 under the vSphere umbrella. Whether it's going to be part of the most expensive VDC-OS SKUs or a separate product, I don't know. The roots of this product are in an acquisition VMware made in the summer of 2008, when they bought a company called B-Hive that developed a product called Conductor. Conductor - AppSpeed from now on - is an "SLA product" that basically takes apart the architecture of an application and creates a logical view of the sub-workloads taking place; a typical example is a multi-tier application that has web, application logic and database components. More than that, the interesting part is that AppSpeed monitors the performance of the workload the way end-users perceive it, that is, in terms of latency and execution time. This means that once AppSpeed has built the logical mapping of the application, the system administrator will have at their fingertips information such as how long the web front end takes to respond to a request (i.e. web server response time), how long it takes for the transaction to get to the DB server (i.e. network latency), and how long the DB server takes to respond back to the front end (i.e. DB server response time). If you want more information about AppSpeed you can see here; there is also a very nice on-line demo here.
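To make the idea of a per-tier breakdown concrete, here is a minimal Python sketch. It is not AppSpeed's API or data model (I have not seen either); the `Transaction` fields and the `breakdown` function are names I made up, and the sketch simply assumes that a network probe can timestamp each hop of a single web-to-DB request.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    """Hypothetical timestamps (in seconds) seen on the wire for one request."""
    request_at_web: float       # client request reaches the web front end
    query_leaves_web: float     # web tier sends its query to the DB server
    query_at_db: float          # query arrives at the DB server
    reply_leaves_db: float      # DB server sends its reply
    reply_at_web: float         # reply arrives back at the web tier
    response_leaves_web: float  # web tier answers the client


def breakdown(t: Transaction) -> dict:
    """Split the end-to-end response time into web, network and DB shares."""
    # Time spent on the wire between the two tiers, in both directions.
    network = (t.query_at_db - t.query_leaves_web) + (t.reply_at_web - t.reply_leaves_db)
    # Time the DB server spent working on the query.
    db = t.reply_leaves_db - t.query_at_db
    # What the end-user actually perceives.
    total = t.response_leaves_web - t.request_at_web
    # Whatever is left was spent inside the web tier itself.
    web = total - network - db
    return {"web": web, "network": network, "db": db, "total": total}


if __name__ == "__main__":
    sample = Transaction(0.00, 0.05, 0.06, 0.26, 0.27, 0.30)
    for tier, seconds in breakdown(sample).items():
        print(f"{tier:8s} {seconds:.3f} s")
```

The point of the sketch is only that the three numbers add up to the one figure the end-user cares about, which is what makes the conversation between administrators and application owners possible.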
I see this as a huge step forward in virtual infrastructure deployments for two reasons.
The first reason is that this is what customers implementing virtualization have been asking me for since I started deploying these technologies. "How much is the ESX overhead?" is probably the question I have heard most frequently in the last 10 years or so of virtualization implementations and evangelism. The good news is that the answer was easy: "it depends". The bad news is that it rarely satisfied the customer. The fundamental problem we have had so far is that VMware systems administrators and the application folks use different metrics to check the health of the implementation. Systems administrators usually monitor resource usage on the host (e.g. CPU, memory) and say things like "your VM is only consuming 10% of its allotted resources so it's doing well". The end-users, however, use a different metric: "I don't care that it's only using 10% of its allotted resources, the fact of the matter is that the job takes 2 minutes to complete so it's slow!". AppSpeed is going to bridge these two disconnected worlds by giving systems administrators higher-level monitoring techniques that are very close to the language the end-users speak.
An interesting scenario that was pitched during the breakout session in Cannes was that AppSpeed could even be used in the pre-virtualization stage. The idea is that before virtualizing a given multi-tiered application (or part of it) you would use the AppSpeed sensors to build the logical map while the application is still running on one or more physical servers. That would give you a baseline for when you move the application into the virtual world. So, for example, if your transactional application deployed on your physical infrastructure has a 2-second response time, or your batch workload has a 5-minute elapsed execution time, you can then benchmark your new virtual deployments against these values to see whether virtualization has introduced some overhead (and how much). And with the "decomponentization" that AppSpeed does at the application level, you should be able to drill down far enough to determine where the issue is. It's not yet clear to me whether the correlation between AppSpeed metrics and standard resource usage metrics is going to be done out-of-the-box by the VMware tools or whether the systems administrator will have to match the two sets of metrics by hand.
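The physical-versus-virtual comparison described above boils down to simple arithmetic. The following Python sketch shows it under my own assumptions: the per-tier numbers are invented for illustration (they are not measurements), and `overhead_pct` and `worst_tier` are hypothetical helper names, not anything AppSpeed exposes.

```python
def overhead_pct(baseline: float, virtual: float) -> float:
    """Relative slowdown of the virtual deployment, as a percentage of the baseline."""
    return (virtual - baseline) / baseline * 100.0


def worst_tier(baseline: dict, virtual: dict) -> str:
    """Name of the tier whose response time grew the most in absolute terms."""
    return max(baseline, key=lambda tier: virtual[tier] - baseline[tier])


# Invented per-tier response times (seconds) for a transaction that takes
# 2 seconds end to end on the physical servers.
physical = {"web": 0.50, "network": 0.10, "db": 1.40}
virtualized = {"web": 0.55, "network": 0.10, "db": 1.75}

if __name__ == "__main__":
    total_before = sum(physical.values())
    total_after = sum(virtualized.values())
    print(f"end-to-end overhead: {overhead_pct(total_before, total_after):.1f}%")
    print(f"tier to look at first: {worst_tier(physical, virtualized)}")
```

With a baseline like this, "how much is the ESX overhead?" stops being an "it depends" question, at least for that specific application.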
The second reason I think this is an enormous step forward in virtualization deployments is that I have always laughed at those who, in the early days, referred to VMware ESX as the mainframe software for x86 servers. There is a fundamental difference between a VMware ESX server and a mainframe: mainframe operations are usually driven by "goal modes", in the sense that the administrator sets the goal - the desired performance for a given workload - and lets the system figure out by itself the configuration of resources needed to deliver on that goal. While ESX has many of the knobs and parameters you could find on high-end UNIX boxes and mainframes, its operations are still driven by "let's try to add more resources to that workload and see what happens". The pattern on ESX usually is:
- the end-user complains that the application is slow (what does slow mean, by the way?)
- the ESX administrator tries to add more resources (i.e. either increasing the CPU and memory shares or increasing the number of vCPUs and the memory allocated to the VM)
- the ESX administrator keeps his/her fingers crossed and goes back to the end-user to see if anything has changed
- the end-user will either be happy or will continue to complain because the application is still slow (and the discussion goes on and on).
While AppSpeed won't magically add goal mode capabilities to the VMware infrastructure, it's clearly a step in that direction. Most likely, in the first incarnation of the product, the technology will only monitor the response time of a given application "passively", which means a system administrator will still have to work the vSphere knobs to change the behaviour reported by AppSpeed. Continuing to speculate, it would be natural for VMware to get to that "goal mode" state, where a system administrator (or the end-user directly, through the vApp SLAs) would set the "response time" for the application and let the infrastructure figure out how to achieve that level of performance (and perhaps charge back accordingly).
I am certainly not saying that vSphere (or any future incarnation of VMware's products) will match mainframe operations any time soon, but AppSpeed is certainly a move in that direction. It is also worth noting the different nature of the applications deployed on the mainframe and those deployed on x86 infrastructure. Applications deployed on the mainframe can usually be tuned by increasing or decreasing their priority access to physical resources while keeping the same number of application instances. On VMware infrastructure you can use the same technique, but - most likely - you will be forced to clone those workloads to scale out (think of a web or application layer made up of several VMs). This certainly adds complexity to the automation and to the "goal mode" scenario, since it's not just a matter of tuning priority shares for an existing VM but rather a process that needs to provision and de-provision workload instances on the infrastructure. It can be done, but it's not as trivial as turning a CPU power knob. The mainframe still rules in this space and it's always used as the benchmark for this sort of functionality. And beating it is not trivial.
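To show why scale-out makes "goal mode" harder than tuning shares, here is a deliberately naive Python sketch of the decision a goal-driven controller would have to take on every monitoring cycle. Everything in it is my own assumption: the `Tier` fields, the `MAX_SHARES` ceiling, the thresholds and the action names are invented for illustration and do not correspond to any vSphere or AppSpeed setting that I know of.

```python
from dataclasses import dataclass

MAX_SHARES = 4000  # illustrative ceiling, not an actual vSphere constant


@dataclass
class Tier:
    name: str
    instances: int    # number of VMs currently serving this tier
    cpu_shares: int   # relative CPU priority given to each VM
    scales_out: bool  # True if the tier can be cloned (e.g. web servers)


def next_action(tier: Tier, measured_s: float, goal_s: float,
                tolerance: float = 0.1) -> str:
    """Pick one corrective action for a tier, given its response-time goal."""
    if measured_s > goal_s * (1 + tolerance):
        # Missing the goal: try the cheap knob first, then provisioning.
        if tier.cpu_shares < MAX_SHARES:
            return "raise_shares"
        if tier.scales_out:
            return "add_instance"
        return "alert_admin"  # nothing left to automate
    if measured_s < goal_s * 0.5 and tier.scales_out and tier.instances > 1:
        # Comfortably under the goal: give capacity back.
        return "remove_instance"
    return "none"


if __name__ == "__main__":
    web = Tier("web", instances=2, cpu_shares=4000, scales_out=True)
    print(next_action(web, measured_s=3.0, goal_s=2.0))
```

The first branch is roughly the mainframe-style case, a priority knob on an existing instance. The `add_instance` and `remove_instance` branches are where the x86 world gets complicated, because behind each of those strings there is a whole provisioning (or de-provisioning) workflow.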
The limited documentation and demos available for the technology would lead one to think that AppSpeed is able to respond to events by automatically triggering resource reconfigurations (either reconfiguring shares or spawning new VMs). However, I am not sure whether the capability demonstrated was an ad-hoc scenario implemented for the demo or an out-of-the-box capability natively integrated with the VMware infrastructure underneath. Since, as I said, this is not a trivial thing to achieve, I would speculate that, initially, the product will only have monitoring capabilities, based on which a system administrator could take corrective actions. We'll see as we learn more, though.
There are, however, a couple of downsides to this technology. The first one is that it's obviously a VMware-oriented product, so one should expect meaningful end-to-end measurement only if the end-to-end application architecture runs on VMware. To be fair, VMware has countered this statement by saying that you can also probe applications that run on physical boxes; this is the case, for example, for complex multi-platform and multi-tier applications where the front-end might run on a VMware infrastructure while the back-end runs on a UNIX box. This leads to the second concern: the technology doesn't require any agent to be installed in the VM or on the physical host running the application - which is a good thing - but it does require the AppSpeed server to sniff the network (virtual or physical) in promiscuous mode. This might be a security concern for some organizations.
All in all, I would say AppSpeed is what every VMware system administrator has been waiting for, hence it gets my "virtual Oscar" (I know they don't give out Oscars at the Palais des Festivals... but nonetheless it sounds nice).
Massimo.
P.S. I have just been informed that, due to previous trademark registrations, the name AppSpeed might change when the product becomes generally available. It's still up in the air, but watch out for a potential new name.