NVGRE, VXLAN and what Microsoft is Doing RightPosted: October 3, 2011
There has been more movement in the industry towards L2 in L3 tunneling from the edge as the approach for tackling issues with virtual networking.
Hot on the heals of the VMWare/Cisco-led VXLAN announcement, an Internet draft on NVGRE (authored by Microsoft, Intel, Dell, Broadcom, and Arista, but my guess is that Microsoft is the primary driver) showed up with relatively little fanfare. You can check it out here.
In this blog post, I’ll briefly introduce NVGRE . However, I’d like to spend more time providing broader context on where these technologies fit into the virtual networking solution space. Specifically, I’ll argue that the tunneling protocol is a minor aspect of a complete solution, and we need open standards around the control and configuration interface as much as we need to standardize the wire format.
We’ll get to that in a few paragraphs. But first, What is NVGRE?
NVGRE is very similar to VXLAN (my comments on VXLAN here). Basically, it uses GRE as a method to tunnel L2 packets across an IP fabric, and uses 24 bits of the GRE key as a logical network discriminator (which they call a tenant network ID or TNI). By logical network discriminator, I mean it indicates which logical network a particular packet is part of. Also like VXLAN, logical broadcast is achieved through physical multicast.
A day in the life of a packet is simple. I’ll sketch out the case of a packet being sent from a VM. The vswitch, on receiving a packet from a vNic, does two lookups (a) it uses the destination MAC address to determine which tunnel to send the packet to (b) it uses the ingress vNic to determine the tenant network ID. If the MAC in (a) is known, the vswitch will cram the packet into the associated point-to-point GRE tunnel, setting the GRE key to the tenant network ID. If it isn’t know, it will tunnel the packet to the multicast address associated with that tenant network ID. Easy peasy.
While architecturally NVGRE is very similar to VXLAN, there are some differences that have practical implications.
On positive side, NVGRE’s use of GRE eases the compatibility requirement for existing hardware and software stacks. Many of the switching chips I’m familiar with already support GRE, so porting it to these environments is likely much easier than a non-supported tunneling format.
As another example, on a quick read of the RFC, it appears that Open vSwitch can already support NVGRE — it supports GRE, allows for looking and setting the GRE key (including masking, so can be limited to 24 bits), and it supports learning and explicit population of the logical L2 table. In Open vSwitch’s case, all of this can be driven through OpenFlow and the Open vSwitch configuration protocol.
On the other hand, GRE does not take advantage of a standard transport protocol (TCP/UDP), so logical flow information cannot be reflected in the outer header ports like you can for VXLAN. This means that ECMP hashing in the fabric cannot provide flow-level granularity which is desirable to take advantage of all bandwidth. As the protocol catches on, this is simple to address in hardware and will likely happen. [Update: Since writing this, I’ve been told that some hardware can use the GRE key in the ECMP hash, which improves the granularity of load balancing in the fabric. However, per-flow load-balancing (over the logical 5 tuple) is still not possible.]
OK, so there you have it. A very brief glimpse of NVGRE. Sure there is more to say, but I have a hard time getting excited about tunneling protocols. Why? That is what I’m going to talk about next.
Tunneling vs. Network Virtualization
So, while it is moderately interesting to explore NVGRE and VXLAN, it’s important to remember that the tunneling formats are really a very minor (and easily changed) component of a full network virtualization solution. That is, how a system tunnels, whether over UDP, or GRE, or CAPWAP or something else, doesn’t define what functions the system provides. Rather it specifies what the tunnel packets look like on the wire (an almost trivial consideration).
It’s also important to remember that these two proposals (VXLAN and NVGRE) in specific, while an excellent step in the right direction, are almost certainly going to see a lot of change going forward. As they stand now, they are fairly limited. For example, they only support L2 within the logical network. Also, they both abdicate a lot of responsibility to multicast. This limits scalability on many modern fabrics, requires the deployment of multicast which many large operators eschew, and has real shortcomings when it comes to speed of provisioning a new network, manageability, and security.
Clearly these issues are going to be addressed as the protocols mature. For example, it’s likely that there will be support for L2 and L3 in the logical network, as well as ACLs, QoS primitives, etc. Also, it is likely that there will be some primitives for support secure group joins, and perhaps even support for creating optimized multicast trees that don’t rely on the fabric (some existing solutions already do this).
So if we step back and look at the current situation. We have “standardized” the tunneling protocol and some basic mechanism for L2 in logical space, and not much else. However, we know that these protocols will need evolve going forward. And it’s very likely that this evolution will have non-trivial dependency on the control path.
Therefore, I would argue that the real issue is not “lets standardize the tunneling protocol” but rather, “lets standardize the control interface to configure the tunnels and associated state so system developers can use it address these and future challenges”. That is, if vswitches and pswitches of the future have souped up versions of NVGRE and VXLAN, something is still going to have to provide the orchestration of these primitives. And that is where the real function comes from. Neither of these proposals have addressed this, for example, the interface to provide the mapping from a vNIC to a tenant network ID is unspecified.
So what might this look like? I’ll use Open vSwitch to provide a specific example. Open vSwitch is a great platform for NVGRE (and soon VXLAN). It supports tens of thousands of GRE tunnels without a problem. And it allows setting and doing lookups on the GRE key. But what makes it immediately useful is not that it supports these tunneling primitives, but that a third party can pick it up, and use OpenFlow to configure those keys, tunnels, and any other network state (ACLs, L3, etc.) to build a full solution.
And that is the crux of the issue. If you get a solution that supports VXLAN or NVGRE, even though they are standardized, you are only getting a piece of the solution. The second piece we need, and should all ask for, is an open standard for configuring this interface. We (Open vSwitch) use OpenFlow. But any standard will do.
What about Microsoft?
From what I can tell (and admittedly, I don’t know very much), this seems to be what Microsoft is getting right. Rather than just specify a tunneling protocol, it appears that they’re also opening up the vswitch interface to support implementations from multiple vendors. While this itself doesn’t provide an open interface to virtual networking configuration, it does allow someone to do that work. For example, NEC has announced they will be releasing an OpenFlow compatible vSwitch for Windows Server 8. My guess is that this is a port of Open vSwitch. (Hey NEC, if you are using Open vSwitch, care to share the changes with the rest of the community?)
In any case, NEC will probably charge you for it, but we will work hard to make a full Open vSwitch port for Windows Server 8 available. Of course it will be open source, so we encourage users to get it for free and innovate in the source as well as use the open management protocols.
We’ll also be demoing Open vSwitch support for NVGRE within OpenStack/Quantum, so stay tuned for that.
Anyway, kudos to Microsoft for supporting L2 in L3 and adding a contender to the pool. It will be interesting to see what finally pops out of the IETF standardization process. I presume it will be some agglomeration achieved through consensus.
And double kudos for opening the vswitch interface. I think we’re going to see a lot of cool innovation in networking around Windows Server 8. And that’s good for all of us.