Typical VXLAN Use Case

One of the problems VXLAN is supposed to solve is how to decouple (and abstract) compute capacity from the underlying network configuration. A lot of people whose background is solely in the compute space now know there is a solution, but don’t really get why there is a problem in the first place.

In this post I’ll attempt to describe the problem first and (in brief) the solution later.

Problem statement

The typical example of this scenario is that a VM needs to be deployed in a specific segment of the network, by which I mean a layer 2 broadcast domain. Free compute capacity should ideally drive the placement of this VM. Instead, what drives the placement is “where that specific network is available” across the clusters deployed. In fact, typically, each cluster has its own set of networks available. So if a specific network “is available” in a cluster that is utilized at 80%, that’s where you need to deploy your workload, even though there may be another cluster sitting somewhere else doing pretty much nothing.

Why can’t you make that network available to the idle cluster, one may argue? That’s the problem I’d like to double-click on now.

When people talk about this they tend to say that “the VLAN is not available in that idle cluster”. I believe talking about VLANs confuses people who don’t have a good networking background (like myself).

What happens here is that your access layer (TOR switches, for example) is configured for one or more VLANs, each with a specific network. For example, VLAN 200 is configured to use a specific subnet such as 192.168.10.0/24. This VLAN is routed at layer 3 to the other VLANs (or to other networks, if you will) available in the infrastructure by means of a router. In a vSphere environment a PortGroup on a vSwitch represents this VLAN, and VLAN 200 (along with potentially others) needs to be made available to a pNIC through a trunk on the access layer switch.
In a rack far away there may be another TOR switch serving another vSphere cluster. Let’s assume VLAN 300 is available (along with others) on this access layer switch and, through a trunk on the pNICs, to the cluster. This VLAN is configured with a 10.11.11.0/24 network segment. As you can imagine, placing a VM in either one of the clusters will determine its network personality. In other words, it’s not the same thing.
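
To make the placement constraint concrete, here is a minimal Python sketch of the situation above. The cluster names, utilization figures and the VLAN-to-subnet mapping are made up purely to mirror the example; this is an illustration, not how vSphere models its inventory.

    import ipaddress

    # Hypothetical inventory: each cluster only "sees" the VLANs trunked to its TOR.
    clusters = {
        "cluster-A": {"utilization": 0.80,
                      "vlans": {200: ipaddress.ip_network("192.168.10.0/24")}},
        "cluster-B": {"utilization": 0.10,
                      "vlans": {300: ipaddress.ip_network("10.11.11.0/24")}},
    }

    def candidate_clusters(required_subnet):
        """Return the clusters where the VM's required subnet is actually available."""
        needed = ipaddress.ip_network(required_subnet)
        return [name for name, c in clusters.items() if needed in c["vlans"].values()]

    # A VM that must live in 192.168.10.0/24 can only land on cluster-A,
    # even though cluster-B is almost idle.
    print(candidate_clusters("192.168.10.0/24"))   # ['cluster-A']

Placement is dictated purely by which networks are trunked where, not by the utilization figure. That is exactly the problem.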

So can’t you just configure VLAN 200 on this TOR? That is the confusing part. This isn’t so much a VLAN problem as a routing problem. You could indeed create a VLAN 200, but which IP network are you going to configure it with? If you assign it 192.168.10.0/24, that doesn’t mean you have created a single layer 2 domain spanning those two VLANs (they are still two distinct broadcast domains). You can certainly configure both of them with the very same IP schema, but the end result is that:

- VMs in one network won’t broadcast to the VMs in the other network.
- A VM in one network can’t reach a VM in the other network, because the other VM’s address is considered local, so the default gateway is never asked to route the traffic (see the sketch further down).
- Every other router/L3 switch will be confused, because it won’t know whether to send packets destined for 192.168.10.0/24 to the left or to the right VLAN.

The picture below depicts the limitation mentioned.
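
For the curious, this is roughly the gateway decision mentioned in the second bullet above, written as a simplified Python model (the addresses are the ones from the example; a real guest OS obviously does this in its IP stack, not in application code):

    import ipaddress

    def next_hop(src_ip, prefix_len, dst_ip, default_gw):
        """Simplified model of a host's forwarding decision: if the destination
        falls inside the local subnet the host ARPs for it directly on its own
        broadcast domain; otherwise it hands the packet to the default gateway."""
        local_net = ipaddress.ip_interface(f"{src_ip}/{prefix_len}").network
        if ipaddress.ip_address(dst_ip) in local_net:
            return "deliver locally (ARP on this broadcast domain)"
        return f"send to default gateway {default_gw}"

    # A VM in the 'left' VLAN 200 trying to reach a VM in the 'right' VLAN 200:
    # the destination looks local, so the router is never involved and the ARP
    # request never crosses into the other (separate) broadcast domain.
    print(next_hop("192.168.10.10", 24, "192.168.10.20", "192.168.10.1"))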

If you assign a different subnet to the VLAN 200 in the second cluster you can certainly route between it and the VLAN 200 on the first cluster (whose subnet is 192.168.10.0/24), but what would the point be if the objective is to create a flat layer 2 across these two switches and, ultimately, across these clusters?

So, as you can see, the core of the problem isn’t so much that “VLANs are not available”: it’s the routing and the segmentation of VLANs based on the IP subnets configured on them.

Can we create a flat layer 2 network across these elements? Yes, we can, for example by creating a GRE tunnel (or EtherIP, L2TPv3 or OTV for that matter) configured on the ingress and egress aggregation switches. These protocols, in a nutshell, can extend a layer 2 domain across a layer 3 tunnel.

Doing so you are essentially stretching VLAN 200 to the other side of the datacenter. This is different from having two “standalone” VLAN 200s in different locations of the data center.

This all sounds good, but it isn’t usually well received by network admins because it involves a lot of operational trouble. Consider that in order to create this tunnel, all the network gear involved (the ingress and egress aggregation switches) needs to be configured, perhaps manually, perhaps one box at a time.

The net result is that this doesn’t (usually) get done, and the only option is to deploy the VM on the cluster that has visibility of the VLAN that represents the IP network segment the VM needs to end up in.

The Solution

VXLAN provides the solution to the aforementioned problem. By creating an abstraction layer on top of the physical network infrastructure, VXLAN can bind the two separate layer 2 domains and make them look like one. It essentially presents to the application (or the VM, if you will) a contiguous flat layer 2 by connecting, over layer 3, two distinct domains.

This is not different from what the GRE approach described above would do. The difference is that here we do it in software running on the servers, leveraging the standard layer 3 routing already present in the network.

In other words, VXLAN encapsulates the layer 2 traffic and sends it over traditional layer 3 connectivity. GRE does a similar thing (conceptually at least) but requires the network to be reconfigured to do the encapsulation. VXLAN does it in an abstraction layer running on the server.
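
To give an idea of how thin this abstraction is, here is a minimal sketch of the encapsulation itself: an 8-byte VXLAN header (flags plus a 24-bit VXLAN Network Identifier) is prepended to the original Ethernet frame, and the result travels as a plain UDP payload (IANA port 4789) across the routed network. This only illustrates the wire format, not how the hypervisor’s VTEP actually implements it; the frame contents, the VNI value and the destination name are made up.

    import socket
    import struct

    VXLAN_UDP_PORT = 4789  # IANA-assigned UDP port for VXLAN

    def vxlan_encapsulate(inner_ethernet_frame: bytes, vni: int) -> bytes:
        """Prepend the 8-byte VXLAN header to an inner Ethernet frame.
        Header layout: 1 byte flags (0x08 = VNI valid), 3 reserved bytes,
        3 bytes of VNI, 1 reserved byte."""
        flags = 0x08
        header = struct.pack("!B3xI", flags, vni << 8)  # VNI sits in the upper 24 bits
        return header + inner_ethernet_frame

    inner_frame = bytes(64)                       # placeholder Ethernet frame
    packet = vxlan_encapsulate(inner_frame, vni=5000)

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # sock.sendto(packet, ("remote-vtep.example.com", VXLAN_UDP_PORT))  # illustrative only

The point is that everything the physical network sees is ordinary UDP/IP, which is why no switch or router needs to be reconfigured along the way.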

A lot has already been said about the mechanism VXLAN uses to achieve this (multicast) and I appreciate there is room for improvement in how it works. This post is not intended to go deep into the solution; it was more of a double click on the problem and why we need a “solution” in the first place.

Please note that what we discussed here is one of the two main use cases for VXLAN: creating a flat layer 2 network across a physical layer 3 network.

There is another use case we haven’t mentioned in this brief article: being able to carve out a number of virtual wires from a single VLAN.
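
Part of what makes that second use case attractive is simple arithmetic: an 802.1Q VLAN ID is 12 bits, while a VXLAN Network Identifier is 24 bits, so the number of available segments goes from thousands to millions.

    # 802.1Q VLAN IDs are 12 bits; VXLAN Network Identifiers (VNIs) are 24 bits.
    vlan_ids = 2 ** 12   # 4096 possible VLANs (a few IDs are reserved in practice)
    vnis     = 2 ** 24   # 16,777,216 possible virtual wires
    print(vlan_ids, vnis)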

Deja Vu

As I was writing this post my mind went back about 10 years, and I realized this is exactly the same problem VMware tackled with server virtualization: a static, inflexible server infrastructure that couldn’t easily be adapted to run workloads dynamically, where the deployment of a new physical server would take weeks.

We resorted to a layer of software that could provide the flexibility on top of a static set of resources that was difficult to provision and reconfigure.

The first wave of change came with ESX, where you could take an arbitrarily big server and slice it on the fly to create virtual instances out of that static server. In a way this reminds me of what VMware did with the Lab Manager logical networks in the early days (and now with VXLAN), where you could take a VLAN and slice it with a right click of the mouse, within the context of an application running on the server.

The second wave came with vMotion and DRS, where the abstraction no longer applied to a single server only: we started to tie together loosely coupled physical resources and make them appear as one to the application. In a way this reminds me of what we are doing with VXLAN, where we take a static, routed network backbone and create abstracted, flexible virtual wires to make it appear the way we want.

I understand and appreciate this may not be the most efficient way, from a performance perspective, to consume a network. I hear lots of networking experts saying that, and I don’t argue with it. But wasn’t the same argument made about server virtualization in the early days?

Interesting times ahead. Time will tell.

Massimo.

23 comments to Typical VXLAN Use Case

  • Nicely done.

    So, what you are saying is that VXLAN will enable VMware to be the VMware of Networking.

    Eric

  • Great article Massimo. I can’t wait to test VXLAN more with vCloud on a larger scale so I can see exactly how it’s implemented step by step.

  • Mario

    Hi Massimo!

    I don’t understand the use case.

    Is the “Core” in your pics only a router? Or is it a layer-3 switch? We use Catalyst 6500s as core switches, iirc, and I don’t see why you can’t have VLAN 200 span all aggregation and access switches and deliver it to both vSphere clusters. You say something about “the other side of the datacenter”, but I know we do just this, i.e. spanning VLANs, even across sites a couple of kilometers apart. Where’s the benefit of VXLAN?

    Don’t get me wrong, I think VXLAN is a very interesting development. I just don’t see why you would want to use it in this case, when your “Core” is a layer-3 switch and not just a router.

    cu

    Mario

    • Massimo

      Hi Mario. No problem, I am not getting you wrong. It does sound like a good discussion.

      As I said in my post, I don’t have a solid networking background (you might wonder why I write this stuff then?!), but I’d say that what you are referring to is somewhat unusual, in the sense that many customers I have been working with seem to prefer smaller layer 2 domains rather than infinitely big layer 2 domains (which may provide a great deal of flexibility, but also lead to a very challenging network design due to all the STP issues / limitations).

      Note that what I had in the picture / text was a very particular example of one VLAN being made available to a couple of clusters. What I am really referring to, in general, is enabling ALL clusters in the ENTIRE datacenter to access ALL layer 2 domains. If I go with your model I would need to create (correct me if I’m wrong) a trunk of ALL VLANs on ALL the ports of ALL the switches in the datacenter. Without getting into that (admittedly unrealistic) scenario, I think the common practice in most data centers is to limit the size of the layer 2 domains (and confine potential problems to as small a domain as possible) and connect them via layer 3 routing. So it isn’t so much (IMO at least) about the distance of the racks but more about the density of the data center.

      The flow in my post didn’t really take this scenario into account. I somewhat assumed these “small layer 2 domains” as a given and then ranted that the only solution is to tunnel them using the mechanisms I mentioned. You are somewhat saying that that given shouldn’t really be a given.

      BTW, what I am also learning is that these discussions (as I have already hinted in the blog post) aren’t very dissimilar to many of the discussions we have entertained in the server space. And the funny thing is that there isn’t a right/wrong answer. I remember all the arguments around scale-out vs scale-up server scenarios. Should I use a 64-CPU server (i.e. a big layer 2 domain) or should I use 32 small 2-CPU servers tied together (i.e. 32 small layer 2 domains routed at layer 3)? Ironically the arguments were very similar (i.e. separate fault domains etc).

      I am pretty sure that in the server space the war was won by the latter scenario, and I had assumed the same was true for the network domain. I know, from talking to many customers, that they acknowledge this (local VLAN availability) as a problem.

      What’s your thought / view?

      Thanks for chiming in.

      Massimo.

      • TJ

        I believe the real problem is the limit on the number of VLANs supported in a traditional deployment – 4096, IIRC. In a more-than-trivially-sized CSP deployment that is insufficient and, thus, we need a solution … like VxLAN.

      • Mario

        Hi Massimo,

        My networking background isn’t that solid, either. Enough to be a (as I hope) good vSphere admin but I have never worked as a network administrator.

        Maybe we’re talking about different sizes of virtual environments: we run some 1500 VMs. If you run tens of thousands of VMs on hundreds of vSphere hosts the game is probably quite different. We’re just not that big.

        I’m not sure you can compare the scale-out vs scale-up discussion to this one. The question is not: do I use a few large or lots of small broadcast domains? The question is: I’d like to have my clients (or even applications) in different broadcast domains in order to separate them from each other, so how do I manage this? And I think VXLAN tries to be the solution to that.

        cu

        Mario

  • Is this going to worry the network guys as much as when they discovered that we can deploy a multihomed server inside our ESX that spans all the trunks they gave us, without even involving one of their firewalls? :-)
    I foresee rogue networks coming up everywhere…

    • Massimo

      It may turn out to be different things for different organizations. Regardless of whether VMware is successful or not, IT seems to be moving to a software world anyway (and the network is no exception). Smart networking people will embrace this trend. Perhaps networking people that are close to retirement will continue to fight it :)
      Joking aside, it will be interesting to see what happens. In an ideal world server, network and storage people would work toward a common objective… but we know that ideal world doesn’t exist.

  • For those of you interested… Ivan wrote a follow up triggered (among other things) by the interesting commentary in this blog post: http://blog.ioshints.info/2012/05/layer-2-network-is-single-failure.html

    Enjoy.

  • Massimiliano

    Massimo thanks for the interesting article.

    Just a quick general comment on datacenter network design and how it will impact (maybe positively) technologies such as VXLAN.

    The network world hasn’t experienced big revolutions in the last few years, but we have to be aware that there might be one soon (well, it is happening).
    L2 multipathing technologies (TRILL, SPB and more vendor-oriented ones like Cisco FabricPath and Juniper QFabric) are re-shaping datacenter design, getting rid of the glorious Spanning Tree Protocol and transforming data centers from the traditional core->distribution->access design towards a fully meshed, loop-free L2 fabric with increased bisection bandwidth (previously limited by STP).

    A quick side note on the role of sysadmins vs network admins:
    IMHO it will not be possible for VM guys to remain unaware of (read: unskilled in) the underlying network; on the other side I ‘feel’ network engineers are quickly up-skilling on virtualization technologies… for me, serious networking knowledge is a necessary (though not sufficient) condition to be a good next-gen system administrator. On top of this, the same network convergence wave is touching the storage side of the equation (FCoE etc), so again, solid network knowledge will be necessary for storage admins.

    I think nobody knows exactly what’s going to happen when you put SDN in the mix… for sure exciting times are to come; in the meantime, let’s get a good networking book!

    Max

    • Massimo

      Agreed.

      I think the value of VXLAN is that you can (potentially) achieve that state-of-the-art networking without having to invest in these new technologies.
      Perhaps this parallel is a bit of a stretch, but I see VXLAN similarly to what DRS/vMotion does: tying together a cluster of smaller standard boxes to provide a “gigantic” compute engine. Now, I don’t want to compare QFabric/FabricPath/etc to a gigantic Unix physical server (too many differences), but this may give you a different perspective (one of the many).

      I don’t think VXLAN will stay forever. Particularly as applications mature and layer 2 boundaries become more and more meaningless, we may see lots of small, routed layer 2 islands be a good fit for next-gen applications. That’s another reason people may not be looking at huge investments to create a massive flat L2 datacenter at the infrastructure layer, perhaps.

      I also agree that boundaries between compute, storage and network are slowly going away, and that a network expert is better positioned for that transition than a compute expert is (learning the compute part is an order of magnitude easier than learning the network part). That’s why I laugh when I hear network people concerned about VMware trying to give network power to compute people. I always suggest looking at it the other way around: what if VMware is empowering network people to get visibility into the compute space?

      Massimo.

  • [...] Yesterday I got an email about configuring VXLAN. I was in the middle of re-doing my lab so I figured this would be a nice exercise. First I downloaded vShield Manager and migrated from regular virtual switches to a Distributed Switch environment. I am not going to go in to any depth around how to do this, this is fairly straight forward. Just right click the Distributed Switch and select “Add and Manage Hosts” and follow the steps. If you wondering what the use-case for VXLAN would be I recommend reading Massimo’s post. [...]

  • Really detailed and meticulous, Massimo. Great. Mario, as Massimo mentioned, VXLAN helps you scale and decouple your Layer 2 application addressing schema from the network addressing. This provides tremendous flexibility and agility to the server/app teams, while letting the network teams get rid of locked-in proprietary solutions like QFabric or FabricPath, where troubleshooting is an admin’s worst nightmare. There are standard multi-chassis solutions from multiple switch vendors that let you scale your Layer 2 domain (two Arista 7508 switches in an MLAG design with 768 10G line-rate non-blocking ports in aggregation). However, there is a limit, while an L3 fabric design lets you scale your designs beyond these ceilings.

    The second key benefit of VXLAN is the use of the inner frame’s fields to increase entropy in the outer encapsulation headers, in order to better leverage the standard multi-way ECMP L3 designs of most data centers. This allows better bandwidth and link utilization of your network.

  • Hey Massimo,

    Thanks for the post – keep’em coming!

    For some reason, I was getting wrapped around the axle about a square peg being hammered into a round hole: VXLAN as a L2 Data Center Interconnect. I’m a big fan of Ivan’s (blog.ioshints.com) and I’ve been trying hard to understand VXLAN and its use-cases. I’ve read the IETF draft (ok, the problem statement) but finally, thanks to this post, you’ve added the piece I’ve been missing. VXLAN is very useful inside the same datacenter but, if I recall Ivan’s objections, it doesn’t support traditional DCI protections from packet flooding or suboptimal routing when you do an inter-DC vMotion.

    I’ll be speaking about OTV and VXLAN in our upcoming VMUG in Dallas. I’ll be sure to point them your way, Massimo.

    Thanks and all the best!

    Mike

    http://VirtuallyMikeBrown.com
    https://twitter.com/VirtuallyMikeB
    http://LinkedIn.com/in/michaelbbrown

    • Ah! My bad (do people still say that!?) I always massacre Ivan’s URL. It’s actually blog.ioshints.info.

      Thanks,

      Mike

    • Massimo

      Thanks for the kind comments Mike. Yes, VXLAN is currently positioned as an intra-DC “thing”. Besides the challenges Ivan correctly outlines, consider that today the scope of the vWires is limited to a single vCenter. That is just to say that there are also practical limitations other than those performance/optimization challenges.

      Note that many of these limitations/concerns will go away as we march towards a better integration of vCNS and Nicira.

      Thanks again.

  • [...] use cases (when / when not to use it) VXLAN Primer-Part 1 VXLAN Primer-Part 2: Let’s Get Physical Typical VXLAN Use Case Digging Deeper into [...]
