From Scale Up vs Scale Out… to Scale Down

Those of you who have been following me on Twitter and on my blog know that I have been very focused on studying and monitoring the latest trends regarding which hardware platforms virtualization users are choosing for their infrastructures. This includes multiple points of view, such as simple sizing rules of thumb, potential reference architectures, and scale up vs. scale out strategies. I'd like to spend the next few minutes talking about what's going on lately in this respect, specifically in light of the latest (and future) hardware improvements we have seen or will see in the next few months. I am doing this because I have a very weird feeling about what's going on. Bear with me.

When I started working with VMware software back in 2001, the only value proposition we could imagine for it was so-called server consolidation: in essence, the process of consolidating many virtual instances - aka partitions or guests - onto a smaller number of physical servers. To make a long story short, down the road we realized that the value proposition was far more than just server consolidation as a means to reduce the cost of operations. It soon became pretty evident that there were many more advantages, including easier high availability for applications, easier Disaster Recovery scenarios, faster time-to-market for business applications, and many more. _Server consolidation_ was, at that point, just one of the many value items we know today.

Right now my feeling is that the advantage of stuffing more and more OS instances onto as few physical systems as possible is no longer even considered an advantage. To put it another way, it is still considered an advantage, but only to a certain extent. In fact, if consolidating more instances on fewer pieces of hardware were still one of the strategic objectives of a virtualization process, what you would see is a progression in the ratio of OS instances per physical system. Something like this:

  • 4-Socket single-core x86-based server with n GB of memory could support 10 VMs
  • 4-Socket dual-core x86-based server with n*2 GB of memory could support 20 VMs
  • 4-Socket quad-core x86-based server with n*4 GB of memory could support 40 VMs

The numbers above are just examples, used only to outline the mathematical progression I was mentioning. The high-level idea behind it is that the more powerful the systems become, the more OS instances you can consolidate onto them. Once you have strategically chosen a given hardware platform (whose main characteristic is the number of CPU sockets it can support), you will see higher consolidation ratios as the CPUs become more powerful (typically by doubling the number of cores from one generation to the next). Put into more mathematical language, the constant here is the number of CPUs. The speed of the CPU is a function of Moore's law, so to speak. As a result, the number of VMs that can be supported is a function of the CPU speed. Memory is also a function of the CPU speed, and it needs to be configured accordingly to keep a balanced system with the proper CPU-to-memory ratio.
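If it helps to see that "natural" sizing logic as something executable, here is a minimal sketch of it. The density and memory figures (VMS_PER_CORE, GB_PER_VM) are purely illustrative assumptions of mine, not measurements; they are only chosen so the output reproduces the 10 / 20 / 40 progression above.

```python
# Sketch of the scale-up mindset: sockets are the constant, cores double each
# generation (Moore's law), and VMs and memory grow as a function of that.
# All figures are hypothetical and only illustrate the progression.

SOCKETS = 4            # the constant: the chosen hardware platform
VMS_PER_CORE = 2.5     # assumption: consolidation density per core
GB_PER_VM = 4          # assumption: memory per VM to keep the system balanced

def scale_up_sizing(cores_per_socket):
    """Return (supported VMs, required memory in GB) for one host."""
    total_cores = SOCKETS * cores_per_socket
    vms = int(total_cores * VMS_PER_CORE)
    memory_gb = vms * GB_PER_VM
    return vms, memory_gb

for generation, cores in [("single-core", 1), ("dual-core", 2), ("quad-core", 4)]:
    vms, mem = scale_up_sizing(cores)
    print(f"4-Socket {generation}: ~{vms} VMs, ~{mem} GB of memory")
```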

That's what would happen (naturally) if server consolidation were the priority. However, I have noticed that it doesn't seem to be what's actually happening in the industry. I can think of many such situations, but the most emblematic to me involves a customer I have been working with very closely since 2001. We started by deploying 16-Socket single-core servers, then they moved to 8-Socket dual-core servers, then to 4-Socket quad-core servers, and they are now in the process of migrating to 2-Socket Nehalem-based servers. In a way, what is happening is that customers are inverting the mathematical constants and variables compared to what would be natural (see above). This is the approach and mindset most customers are using these days to size their "brick":

  • To support 20 VMs I would need a 8-Socket single-core system with n GB of memory
  • To support 20 VMs I would need a 4-Socket dual-core system with n GB of memory
  • To support 20 VMs I would need a 2-Socket quad-core system with n GB of memory

Wow. This is neither Scale Up nor Scale Out. This is indeed Scale Down!

Again, while the numbers are not tremendously unrealistic, they are only used to demonstrate, at a very high level, the mathematical progression that maps the mindset. As you can see, there is a trend in the industry right now that doesn't consider the number of VMs you can get on a system as a function of how fast and powerful the system is. It's quite the opposite: the speed of a system is determined as a function of the requirement to run a fixed number of VMs. Since the size of the memory is typically a function of the number of VMs, its configuration doesn't tend to vary drastically, because the number of VMs tends to remain the same. By the way, 20 / 25 VMs seems to be the average number most customers are defaulting to on each physical host, based on what I have seen.
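For symmetry with the earlier sketch, this is what the inverted (Scale Down) sizing looks like in code: the VM count becomes the constant, and the size of the box is derived from it. The same hypothetical density and memory figures are reused here; they are assumptions, not data.

```python
# Sketch of the scale-down mindset: the number of VMs is fixed, and the number
# of sockets shrinks as cores per socket grow. Memory does not shrink with the
# box, because it depends on the (constant) VM count. Figures are hypothetical.

TARGET_VMS = 20        # the constant most shops seem to default to (20-25)
VMS_PER_CORE = 2.5     # assumption, same as the previous sketch
GB_PER_VM = 4          # assumption, same as the previous sketch

def scale_down_sizing(cores_per_socket):
    """Return (required sockets, required memory in GB) for a fixed VM count."""
    cores_needed = TARGET_VMS / VMS_PER_CORE
    sockets = max(1, round(cores_needed / cores_per_socket))
    memory_gb = TARGET_VMS * GB_PER_VM
    return sockets, memory_gb

for generation, cores in [("single-core", 1), ("dual-core", 2), ("quad-core", 4)]:
    sockets, mem = scale_down_sizing(cores)
    print(f"{TARGET_VMS} VMs on {generation} CPUs: {sockets}-Socket host, {mem} GB of memory")
```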

There are a few reasons why this is happening. One of them is that most customers are not comfortable putting too many eggs in a single basket. They may be guessing that 20 / 25 partitions per host is a good trade-off between the disadvantage of potentially taking down multiple partitions at once and the advantage of having fewer physical servers (compared to a non-virtualized environment). For example, having 5 partitions per host would diminish the value of the latter too much, and having 100 partitions would increase the risk of the former too much. The consensus today does seem to be 20 / 25 partitions.

Another reason this is happening is the common perception that the smaller the virtualization brick is, the cheaper it is (due to the commoditization process we are seeing in the low-end x86 market). I don't have a definitive position on this - as I think it always depends. But there are a number of people in this industry who would claim that, while this may be a good approach for a small business that only has a few dozen partitions to deal with, it wouldn't work for an enterprise customer with thousands of partitions. The method would result in an improperly designed virtualized infrastructure due to the high number of physical low-end servers required.

The third - and last - reason I am mentioning here is a bit trickier and, in my opinion, more opportunistic. The x86 virtualization industry is largely driven by software vendors rather than hardware vendors. Software vendors in this space tend to prefer low-end commodity servers because, this way, they can provide the value at the software layer. There is no magic: the better the hardware is (in terms of scalability / resiliency / efficiency / etc.), the fewer infrastructure software features you need to make it an enterprise platform. On the other hand, if you use many low-end commodity x86 servers, you can tie them together into a single gigantic (virtual) enterprise platform through the value of the software running on them. The latter is what software vendors really love to hear these days, and that's what they are after.

If you are still following me and agree with the analysis to some extent, you'll realize that there are a number of implications caused by this trend.

One of the implications is that servers are now memory-bound. If you ask 10 virtualization architects in the x86 space, they will all tell you that the limiting factor in today's servers is the memory subsystem. Put another way, you reach the physical memory limit well before you manage to saturate the processors in a virtualized server. Have you ever wondered why that is the case? As users move backwards from 8-Socket servers to 4-Socket servers to 2-Socket servers, the number of memory slots available per server shrinks. That's how x86-based servers have been designed over the years: the more sockets the server has, the more memory slots are available. What is happening now is that customers tend to use much smaller servers because they can support the same number of partitions per physical host, but the memory requirements haven't changed. That's because the amount of memory needed is a function of the number of partitions running, and if the number of partitions is kept constant you will always need the same amount of memory.

That's the problem: you now have far fewer slots available to support the same amount of memory. While memory vendors have been able to squeeze more and more Gigabytes into the same DIMM form factor, this is not enough to keep the system balanced: CPU speed has improved at a faster pace than memory vendors have been able to shrink their parts and pack more capacity into a single DIMM. The outcome? You either configure very dense - and expensive! - memory modules into those fewer slots in the low-end servers, or you configure reasonably cheap DIMMs into those slots. The first approach sends the price of that virtualization brick through the roof; the second approach causes the system to be bottlenecked very soon by the memory subsystem, with the CPUs being used at a fraction of their potential. This is in fact what's happening: it is not uncommon these days to see virtualized systems being used - from a CPU perspective - at about 30-40%, while memory is already under heavy pressure and approaching its physical limit.
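A quick back-of-the-envelope sketch of that squeeze, under stated assumptions: slot counts, DIMM sizes and the CPU capacity figure below are hypothetical, in the right ballpark for the class of servers discussed but not vendor specs.

```python
# Why fewer sockets means either denser (pricier) DIMMs or underused CPUs.
# All figures are hypothetical assumptions used for illustration only.

TARGET_VMS = 25
GB_PER_VM = 4                              # assumption: average VM footprint
memory_needed = TARGET_VMS * GB_PER_VM     # 100 GB, regardless of the box size

# Fewer sockets -> fewer DIMM slots -> denser DIMMs to reach the same capacity.
for sockets, dimm_slots in [(8, 64), (4, 32), (2, 16)]:
    print(f"{sockets}-Socket, {dimm_slots} slots: needs >= "
          f"{memory_needed / dimm_slots:.1f} GB per DIMM to reach {memory_needed} GB")

# Or keep the DIMMs cheap and hit the memory wall with the CPUs largely idle.
cheap_dimm_gb = 4
memory_available = 16 * cheap_dimm_gb          # 64 GB on the 2-Socket box
vms_memory_bound = memory_available // GB_PER_VM
cpu_capacity_vms = 50                          # assumption: what the CPUs could sustain
print(f"Cheap DIMMs on the 2-Socket box: {vms_memory_bound} VMs before memory runs out, "
      f"CPUs at ~{vms_memory_bound / cpu_capacity_vms:.0%} of their potential")
```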

There is another aspect to consider which is even more "interesting." The cost of high-density memory seems, frankly, to be the excuse for being stuck in such a situation. After all, it may even be convenient, in some cases, to configure more expensive memory parts to double the number of partitions and put those wasted CPU cycles to good use. However, the real problem seems to be that most customers are mentally partitions-bound: "No matter the technology and its associated costs, I don't want to go beyond 20 / 25 partitions per physical host." If that is really the case - it's just my feeling so far - in the near future we won't need cheaper high-density memory DIMMs or more memory slots in low-end servers. Most likely, these customers will either start using 1-Socket servers - assuming these have the same memory support characteristics as the 2-Socket servers - or, more simply, they will start populating a single CPU package in 2-Socket-capable servers. At this pace we will be running single-socket Atom servers in about 24 to 36 months: Intel and AMD are warned!

This will also have further (and funny) implications. For example, the structure of all the industry benchmarks out there may become irrelevant in the future (assuming you consider it relevant today). All these benchmarks are designed to load the CPUs at 100% (configuring all other subsystems to cope with that) and to come out with a scalability number. In the server virtualization context, this number is typically expressed as the number of VMs a given n-Socket server can support. In the scenario I am picturing, this is completely useless. First of all, because of what we have said, memory is becoming the bottleneck in most situations, so these benchmarks should - at least - assume 100% memory load as the limiting factor of a given server configuration. What's the point of benchmarking a server running at 100% CPU utilization, for which you had to configure 1TB of memory and 3,000+ disk spindles to achieve that CPU load, when customers are using 128GB of memory and a few dozen spindles at best?

To make things worse, the number of VMs is not even a function of the speed of the server any more - as we argued - but rather it's becoming a constant in the equation. In the currently available benchmarks, in fact, the constant is the number of Sockets and their 100% load. To build a benchmark that maps exactly what's happening in the industry and could be of use to the community, one would need to design a performance test that determines the number and type of CPUs and memory DIMMs required to sustain a constant number of partitions (20 or 25). The lower the resources (and their price), the better the result.
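To make the idea concrete, here is a minimal sketch of what the scoring of such an "inverted" benchmark could look like: the partition count is fixed, and the winning configuration is simply the cheapest one that can sustain it. The configurations and prices below are invented for illustration, not real benchmark data.

```python
# Sketch of a fixed-partition benchmark score: cheapest eligible configuration
# wins. Entries are (label, VMs it can sustain, price in arbitrary units) and
# are entirely hypothetical.

TARGET_VMS = 25

configs = [
    ("2-Socket quad-core, 64 GB",  18, 10000),
    ("2-Socket quad-core, 128 GB", 30, 16000),
    ("4-Socket quad-core, 128 GB", 45, 28000),
]

eligible = [c for c in configs if c[1] >= TARGET_VMS]
best = min(eligible, key=lambda c: c[2])
print(f"Best result for a fixed {TARGET_VMS} partitions: {best[0]} at {best[2]} price units")
```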

While there is nothing wrong with all this, at the same time we need to acknowledge that it is the complete negation of the original Server Consolidation value item we started with back in 2001. The problem is that users may be leaving lots of money on the table because of inefficiencies due to underutilized resources and/or the management of many small Intel-based servers (think about the costs associated with power consumption or I/O cabling). This is far from an attempt to convince you that Scale Up is a better approach. I am OK with a Scale Out approach, too, as I can see the value of it. However, I see this Scale Down approach as a trend that won't allow users to exploit the full potential of what they could achieve by using the technologies properly. Perhaps I am having the wrong perception of what's going on; or perhaps I am having the right perception and I am wrong in questioning it. Either way, I'd be curious to hear what you think, if you have a spare minute.

Massimo.