Thanks, Steve Herrod.
If you missed it, VMware, Cisco, Broadcom and others announced support for VXLAN a new tunneling format. For the curious, here is the RFC draft.
So quickly, what is VXLAN? It’s an L2 in L3 tunneling mechanism that supports L2 learning and tenancy information.
And what does it do for the world? A lot really. There currently is fair bit of effort in the virtualization space to address issues of L2 adjacency across subnets, VM mobility across L3 boundaries, overlapping IP spaces, service interposition, and a host of other sticky networking shortcomings. Like VXLAN, these solutions are often built on L2 in L3 tunneling, however the approaches are often form-fitted and rarely are they compatible with another implementation.
Announcing support for a common approach is a very welcome move from the industry. It will, of course, facilitate interoperability between implementations, and it will pave the way for broad support in hardware (very happy to see Broadcom and Intel in the announcements).
Ivan (as usual, ahead of the game) has already commented on some of the broader implications. His post is worth a read.
I don’t have too much to add outside of a few immediate comments.
First, we need an open implementation of this available. The Open vSwitch project has already started to dig in and will try and have something out soon — perhaps for the next OpenStack summit. More about this as the effort moves along.
Second, this is a *great* opportunity for the NIC vendors to support acceleration of VXLAN in hardware. It would be particularly nice if LRO support worked in conjunction with the tunneling so that interrupt overhead from the VMs is minimized, and the hardware handles segmentation and coalescing
Finally, it’s great to see this sort of validation for the L3 fabric, which is an excellent way to build a datacenter network. IGP + ECMP + a well understood topology (e.g. CLOS or Flattened butterfly) is the foundation of many large fabrics today, and I predict many more going forward.
[This post is written by Teemu Koponen, and Martin Casado. Teemu has architected and been involved in the implementation of multiple widely used OpenFlow controllers.]
Over the last year, there has been some discussion within the OpenFlow and broader SDN community about designing and standardizing a controller API in addition to the switch interface.
To be honest, standardizing a controller API sounds like a poor idea. The controller is software, not a protocol. To our knowledge there are 9 controllers in the wild today, and more than half of them are open source. If the community wants an open software platform, it should divert resources from arguing over standards to building open source software.
(yeah ok, this comic is only tangentially relevant … )
In addition to being a waste of time, premature standardization could be detrimental to the SDN effort. Minimally, it will unnecessarily constrain a fledgling software ecosystem. Software innovation should come through development and experience with deployment, not through consensus built on prophecies.
Further, it’s not clear how to design and standardize an interface to a (non-existent) large software system out of whole cloth. Even Unix which started in the early 70s did not get standardized until the mid/late 80s.
For SDN in particular, there is really very little experience building controllers systems in the wild today. We (the authors) have built four controllers, three which have seen production use, and two of which have multiple products built on them from different organizations. And still, we would certainly not argue for standardizing a particular interface. Hell, we’re not even clear if there is a one-size-fits-all interface for controllers targeted at drastically different deployment environments. In fact, our experience suggests the contrary.
In our opinion, if an interface is going to be standardized, it should follow the software model of standardization. Either (a) take a system that has demonstrated broad success and generality over many years and call it a standard, or (b) during the design phase of a particular software project, have the *developers* work to create a standard API with the appropriate community feedback (which is probably product and system developers).
But most importantly, and it really bears repeating, we all win if we let an unbridled software ecosystem flourish within the controller space.
OK, so while we think any discussion about standardizing a controller interface is extremely premature, the question of “what makes a good controller API” is absolutely crucial to SDN and very much worth discussing. However, there has been relatively little discourse on controller APIs. And that’s exactly what we’ll be focusing on in rest of this post. We’ll start by providing a brief look at common controller designs, and then take a closer look at Onix, the controller platform we work on.
Common Controller Types
If you survey the existing controller landscape there appears to be three broad classes of API.
Single Purpose Controllers: These controllers are built for a specific function. So while they may use some common code (like an OpenFlow library) to communicate with the switch, they are otherwise not built with different uses in mind. Or put another way, they lack a control-logic agnostic interface for extension, so there really is little more to talk about.
OpenFlow Controllers: Yeah, that is probably a confusing name, but bear with us. A number of controllers provide a high-level interface to OpenFlow (specifically) and some infrastructure for hosting one or more control “applications”. These are general purpose platforms meant for extension. However, the interface is built around the OpenFlow protocol itself, meaning the interface is a thin wrapper around each message, without offering higher-level abstractions. Therefore, as the protocol changes (e.g., a new message is added), the API necessarily reflects this change. Most controllers (including one we’ve written) fall in this camp.
In our experience, the primary problem with this tight coupling is that it is challenging to evolve any aspect of the control application, switch protocol, or controller state distribution (assuming the controllers form a cluster) without refactoring all the layers of the software stack in tandem.
General SDN Controllers: We’re sort of making these names up as we go along, but a general SDN controller is one in which the control application is fully decoupled from both the underlying protocol(s) communicating with the switches, and the protocol(s) exchanging state between controller instances (assuming the controllers can run in a cluster).
The decoupling is achieved by turning the network control problem into a state management problem, and only exposing the state to be managed to the application, and not the mechanisms by which it arrived.
Applications operate over control traffic, flow table state, configuration state, port state, counters, etc. However, how this data is pulled into the controller, and how any changes are pushed back down to the switch is not the application’s business. It can just happily modify network state as if it were local and the platform will take care of the state dissemination.
The platform we’ve been working on over the last couple of years (Onix) is of this latter category. It supports controller clustering (distribution), multiple controller/switch protocols (including OpenFlow) and provides a number of API design concessions to allow it to scale to very large deployments (tens or hundreds of thousands of ports under control). Since Onix is the controller we’re most familiar with, we’ll focus on it.
So, what does the Onix API look like? It’s extremely simple. In brief, it presents the network to applications as an eventually consistent graph that is shared among the nodes in the controller cluster. In addition, it provides applications with distributed coordination primitives for coordinating nodes’ access to that graph.
Conceptually the interface is as straightforward as it sounds, however understanding its use in practice, especially in a distributed setting, bears more discussion.
So, digging a little further …
The Network as an Eventually Consistent Graph
In Onix, the physical network is represented as a graph for the control application. Elements in the graph may represent physical objects such as switches, or their subparts, such as physical ports, or lookup tables. And it may contain logical entities such as logical ports, tunnels, BFD configuration, etc.
Control applications operate on the graph to find elements of interest and read and write to them. They can register for notifications about state changes in the elements, and they can even extend the existing elements to hold network state specific to a particular control logic.
We happen to call this graph “the NIB” (for Network Information Base).
The NIB abstracts away both the protocols for state synchronization between the controller and the switches and the protocols used for state synchronization between controllers (discussed further below).
Regarding switch state, the control applications operate on the state directly, independent of how it is synchronized with the switch. For example, control logic may traverse the NIB looking for a particular switch, modify some forwarding table entries, and then place a trigger to receive notification on any future changes (such as a port status change event).
Doing this within a single controller node is pretty straightforward. However, for resilience and scale, most production deployments will want to run with multiple controllers. Hence, the platform needs to support clustering.
Within Onix, the NIB is the central mechanism for distributed state sharing. That is, the NIB is actually a shared datastructure among the nodes in the cluster. If a node updates the NIB, eventually that update will propagate to the other nodes interested in that portion of the graph (assuming no network failures).
Under this model, the control logic remains unaware of the specifics of the distribution mechanisms for controller-to-controller state. However, it is the responsibility of the application logic to coordinate which controller instance is responsible for which portion of the graph at any given time. To aid with this, Onix includes a number of standard distributed coordination primitives such as distributed locking, leader election, etc. Using them, control applications can safely partition the work amongst the cluster.
Because the control logic is decoupled from the distribution mechanisms, the Onix may be configured to use different distributed datastores to share different parts of the graph. For more dynamic state, it may use high-performance, memory-only, key/value store, whereas for more static information (such as admin configured parts of the graph) it may prefer storage that is strongly consistent and provides durability.
Of course, a design like this pushes a lot of complexity to the application. For example, the application is responsible for all distributed coordination (though tools for doing so are provided), and if the application chooses to use an eventually consistent storage back-end, it must tolerate eventual consistency in the NIB (including conflicts, data disappearing, etc.)
An alternative approach would have been to constrain the distribution model to something simple (like strongly consistent updates to all data) which would have greatly simplified the API. However, this would come at a huge cost to scalability.
Which begs the question …
Is This the Right Interface?
Perhaps for some environments. Almost certainly not for others. There have been multiple control applications built on Onix, and it is used in large production deployments in the data centers, as well as in the access and core networks. However, it is probably too heavyweight for smaller networks (the home or small enterprise), and it is certainly too complex to use as a basic research tool.
The NIB is also a very low-level interface. Clearly, there is a lot of scaffolding one can build on top to aid in application development. For example, routing libraries, packet classification engines, network discovery components, configuration management frameworks, and what not.
And that is the beauty of software. If you don’t like an API, you fix it, build on it, or you build your own platform. Sure, as a platform matures, binary compatibility and stability of APIs becomes crucial, but that is more a matter of versioning than a priori standardization and design.
So our vote is to keep standards away from the controller design, to support and promote multiple efforts, and to let the ecosystem play out naturally.
Scott’s keynote at Ericsson research on SDN. I really encourage anyone who is interested in OpenFlow and/or SDN to view it. It is, in my opinion, the cleanest justification for SDN, and appropriately articulates where OpenFlow fits in the broader context (… as a minor mechanism that is capable, but generally unimportant).