The following post explains the key drivers for overlay virtual networking and a popular overlay technology called Virtual Extensible LAN (VXLAN). It walks through how to set up a working VXLAN design based on Arista vEOS on Ravello’s Networking Smart Lab. Additional command snippets from a multicast VXLAN setup using Cisco CSR1000V are also included.
The evolution of the data centre started with cloud computing, when Amazon introduced the first commercial product to provide a “cloud anytime and anywhere” service offering. The result has driven a major paradigm shift in networking, which forces us to rethink data centre architectures and how they service applications. Design goals must be in line with new scalability and multi-tenancy requirements that existing technologies cannot meet.
Overlay-based architectures provide some protection against switch table explosion. They offer a level of indirection that keeps switch table sizes from growing as the number of supported hosts increases, which addresses some of the scalability concerns. Putting design complexities aside, an overlay is a tunnel between two endpoints, enabling frames to be exchanged between those endpoints. As the diagram below shows, it’s a network that is built on top of another network. The overlay is a logical Layer 2 network and the underlay could be an IP or Layer 2 Ethernet fabric.
The ability to decouple virtual from physical drives new scalability requirements that push data centres to the edge. For example, the quantity of virtual machines deployed on a single physical server has increased. Each virtual machine has one or more virtual network cards (vNICs), each with a unique MAC address. In traditional switched environments, each server connects as a single end host and is viewed as a single device, so the upstream switch processes a low number of MAC addresses. Now, with the advent of server virtualization, the top-of-rack (ToR) switch accommodates multiple hosts on the same port. As a result, MAC table size must increase by an order of magnitude to accommodate the increased scalability requirement driven by server virtualization. Some vendors’ switches support only small MAC address tables, which may result in MAC address table overflow, causing unnecessary flooding and operational problems.
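To make the MAC table growth concrete, here is a rough back-of-the-envelope sketch in Python. The rack size, VM density and vNIC count are illustrative assumptions, not figures from any particular deployment, and the model only counts MAC addresses behind the switch's own server-facing ports.

```python
# All figures below are hypothetical and chosen only for illustration.
SERVERS_PER_TOR = 40   # physical servers attached to one ToR switch
VMS_PER_SERVER = 20    # virtual machines per physical server
VNICS_PER_VM = 1       # virtual NICs (and therefore MACs) per VM


def mac_entries(servers: int, vms_per_server: int = 0, vnics_per_vm: int = 1) -> int:
    """Estimate the MAC addresses a ToR switch learns on its server ports."""
    if vms_per_server == 0:
        # Non-virtualized case: one MAC address per physical server.
        return servers
    # Virtualized case: every vNIC on every VM contributes its own MAC.
    return servers * vms_per_server * vnics_per_vm


physical = mac_entries(SERVERS_PER_TOR)
virtualized = mac_entries(SERVERS_PER_TOR, VMS_PER_SERVER, VNICS_PER_VM)

print(f"physical servers only: {physical} MAC entries")
print(f"with virtualization:   {virtualized} MAC entries")
print(f"growth factor:         {virtualized // physical}x")
```

With these assumed numbers the same 40 ports go from 40 to 800 entries, which is the kind of jump that can overflow a small MAC table.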
Over the course of the last 10 years, the application has changed. Customers require complex application tiers in their public or private IaaS cloud deployments. Application stacks usually contain a number of tiers that require firewalling and/or load balancing services at the edges and within the application/database tier. Additionally, to support this infrastructure you need Layer 3 or Layer 2 segments between tiers. Layer 2 segments are needed where tiers exchange non-routable or keepalive packets.
The infrastructure must support multi-tenancy for numerous application stacks in the same cloud environment. Each application stack must remain independent of the others. Application developers continually request similar application connectivity models, such as the same IP addressing and security models, for both on-premise and cloud services. Applications that are “cloud centric” and born for the cloud are easy to support from a network perspective. On the other hand, applications optimized for “cloud-ready” status may require some network tweaking and overlay models to support coexistence. Ideally, application teams want to move the exact same model to cloud environments while keeping every application as an independent tenant. You may think that 4000 VLANs are enough, but once you deploy each application as a segment and each application has numerous segments, the 4000 VLAN limit is soon reached.
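The VLAN ceiling comes from the 12-bit VLAN ID field in the IEEE 802.1Q tag. The short Python sketch below shows how quickly that space is consumed; the segments-per-application values are hypothetical.

```python
VLAN_ID_BITS = 12          # width of the VLAN ID field in an 802.1Q tag
RESERVED_VLAN_IDS = 2      # IDs 0 and 4095 are reserved

usable_vlans = 2 ** VLAN_ID_BITS - RESERVED_VLAN_IDS


def max_applications(segments_per_app: int) -> int:
    """How many applications fit if each one consumes this many VLANs."""
    return usable_vlans // segments_per_app


print(f"usable VLAN IDs: {usable_vlans}")
for segments in (1, 3, 5, 10):  # hypothetical segments per application
    print(f"{segments:>2} segments per app -> {max_applications(segments)} applications")
```

At five segments per application stack, for example, the usable VLAN space covers only around 800 applications.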
Customers want the ability to run any VM on any server. This type of unlimited and distributed workload placement results in large virtual segments. Live VM mobility also requires Layer 2 connectivity between the virtual machines. All this, coupled with quick on-demand provisioning, is changing the provisioning paradigm and the technology choices we employ to design data centres.
We need a technology that decouples physical transport from the virtual network. The transport should run IP only, with all complexity handled at the edges. Keep complexity at the edge of the network and let the core do what it should do: forward packets as fast as possible, without making too many decisions. The idea of having a simple core and smart edges carrying out the intelligence (encapsulation) allows you to build a scalable architecture. Overlay virtual networking supports this concept: all virtual machine traffic generated between hosts is encapsulated and becomes an IP application. This is similar to how Skype (voice) uses IP to work on the Internet. The result of this logical connection is that the edge of the data centre has moved and potentially no longer exists in the actual physical network infrastructure. In a hypervisor environment, one could now say the logical overlay is decoupled from the physical infrastructure.
Virtual Extensible LAN (VXLAN) is a LAN extension over a Layer 3 network. It relays Layer 2 traffic over different IP subnets. It addresses the current scalability concerns with VLANs and can be used to stretch Layer 2 segments over a Layer 3 core, supporting VM mobility and stretched clusters. It may also be used as a Data Centre Interconnect (DCI) technology, allowing customers to have Multi-Chassis Link Aggregation Group (MLAG) configurations between two geographically dispersed sites.
VXLAN identifies individual Layer 2 domains by a 24-bit virtual network identifier (VNI). The following configuration snippet from Cisco CSR1000V displays VNI 4096 mapped to multicast group 22.214.171.124. In the CSR1000V setup, Protocol Independent Multicast (PIM) sparse mode is enabled in the core.
The VNI identifies the particular VXLAN segment and is typically determined based on the IEEE 802.1Q VLAN tag of the frame received. The 24-bit segment ID enables up to 16 million VXLAN segments in your network. VXLAN employs a MAC-over-IP/UDP overlay scheme that allows unmodified Layer 2 Ethernet frames to be encapsulated in IP/UDP datagrams and relayed transparently over the IP network. The main components of the VXLAN architecture are virtual tunnel end-points (VTEPs) and virtual tunnel interfaces (VTIs). The VTEP performs the encapsulation and de-encapsulation of the Layer 2 traffic. Each VTEP is identified by an IP address (assigned by the VTI), deployed as the tunnel endpoint.
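As a minimal sketch of what the encapsulation adds, the Python below builds and parses the 8-byte VXLAN header laid out in RFC 7348 and totals the per-frame overhead. The helper names are my own, and the overhead figure assumes an IPv4 underlay with no outer 802.1Q tag.

```python
import struct

VXLAN_UDP_PORT = 4789          # IANA-assigned destination UDP port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08    # the "I" flag: the VNI field is valid
MAX_SEGMENTS = 2 ** 24         # 24-bit VNI space, about 16 million segments

# Outer headers added to every frame (IPv4 underlay, untagged outer Ethernet).
ENCAP_OVERHEAD_BYTES = 14 + 20 + 8 + 8   # Ethernet + IPv4 + UDP + VXLAN


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for the given VNI."""
    if not 0 <= vni < MAX_SEGMENTS:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, followed by 24 reserved bits.
    # Word 2: VNI in the top 24 bits, followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header, checking the I flag."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return vni_word >> 8


header = pack_vxlan_header(4096)   # VNI 4096, as in the CSR1000V example
print(header.hex())                # 0800000000100000
print(unpack_vni(header))          # 4096
print(ENCAP_OVERHEAD_BYTES)        # 50
```

The 50 bytes of overhead are why the underlay MTU usually needs to be raised above the standard 1500 bytes when carrying VXLAN traffic.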
Earlier VXLAN implementations consisted of virtual Layer 2 segments with flooding via IP multicast in the transport network. They relied on traditional Layer 2 flooding/learning behaviour and did not have an explicit control plane. Broadcast, multicast and unknown unicast frames were flooded much as they are in a physical Ethernet network; the difference is that they were flooded using IP multicast. The configuration snippet is from the CSR1000V setup and displays multicast group 126.96.36.199 signalled with flag BCx (VXLAN group).
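The flood-and-learn behaviour described above can be sketched as a toy Python model. It is a simplification for illustration, not how any vendor implements a VTEP; the multicast group, VTEP addresses and host MAC addresses are all hypothetical.

```python
BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"
MCAST_GROUP = "239.1.1.1"        # hypothetical multicast group for the segment


class Vtep:
    """Toy VTEP that learns remote MACs from received VXLAN packets."""

    def __init__(self, ip: str, mcast_group: str):
        self.ip = ip
        self.mcast_group = mcast_group
        self.remote_macs = {}    # inner MAC address -> remote VTEP IP

    def outer_destination(self, dst_mac: str) -> str:
        """Choose the outer IP destination for a frame from a local host."""
        if dst_mac == BROADCAST_MAC:
            return self.mcast_group
        # Unknown unicast is flooded to the group; known unicast is tunnelled
        # directly to the VTEP behind which the MAC was learned.
        return self.remote_macs.get(dst_mac, self.mcast_group)

    def receive(self, outer_src_ip: str, inner_src_mac: str) -> None:
        """Learn which remote VTEP the inner source MAC sits behind."""
        self.remote_macs[inner_src_mac] = outer_src_ip


# Hypothetical addresses for two VTEPs and the host behind each.
r1, r2 = Vtep("172.16.0.1", MCAST_GROUP), Vtep("172.16.0.2", MCAST_GROUP)
H1_MAC, H2_MAC = "00:00:00:00:00:01", "00:00:00:00:00:02"

# 1. Host1 sends an ARP broadcast: R1 floods it to the multicast group.
print(r1.outer_destination(BROADCAST_MAC))
r2.receive(r1.ip, H1_MAC)        # R2 learns Host1 is behind R1

# 2. Host2 replies with unicast: R2 already knows where Host1 lives.
print(r2.outer_destination(H1_MAC))
r1.receive(r2.ip, H2_MAC)        # R1 learns Host2 is behind R2

# 3. Later traffic from Host1 to Host2 is unicast between the VTEPs.
print(r1.outer_destination(H2_MAC))
```

Only the first frame in each direction touches the multicast group; everything after that is plain unicast between the two VTEP addresses.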
Multicast VXLAN scales much better than VLANs, but you are still limited by the number of multicast entries. Multicast in the core is also undesirable, especially from a network manageability point of view. It’s another feature that has to be managed, maintained and troubleshot. A more scalable solution is to use unicast VXLAN. There are methods to introduce control planes to VXLAN, such as LISP and BGP-EVPN, and these will be discussed in later posts.
I set up my Arista vEOS lab environment on Ravello’s Network Smart Lab. It was easy to build the infrastructure using Ravello’s application canvas. One can either import the VMs with the Ravello import utility or use Ravello Repo to copy existing ‘blueprints’. I was able to source the Arista VMs from the Ravello Repo.
Once you have the VMs set up, move to the “Application” tab to create a new application. The application tab is where you build your deployment before you publish your virtual network deployment to either Google or Amazon Cloud. Ravello lets you configure some basic settings from the right-hand sections. I jumped straight to the network section and added the network interface settings.
The settings you enter in the canvas set up the basic network connectivity. For example, if you configure a NIC with address 10.1.1.1/24, Ravello cloud will automatically create the vSwitch. If you require inbound SSH access, add this external service from the services section. Ravello will automatically give you a public IP address and DNS name, allowing the use of your local SSH client.
Once the basic topology is set up, publish to the cloud and start configuring advanced features via the CLI.
The Arista vEOS design consists of a four-node setup. vEOS is EOS packaged as a VM. It’s free to download and can be found here. The two core nodes, R1 and R2, form a BGP peering with each other, simulating the IP core. There are no VLANs spanning the core; it’s a simple IP backbone, forwarding IP packets. Host1 and Host2 are IP endpoints and connect to access ports on R1 (VLAN 10) and R2 (VLAN 11) respectively. They are in the same subnet, separated by Layer 3. The hosts do not perform any routing.
Below is a VXLAN configuration snippet for R1. The majority of VXLAN configuration is done under the Vxlan interface settings.
The next snippet displays the VXLAN interface for R2. The loopbacks are used as the VXLAN VTEP endpoints, and you can see that the R2 VTEP loopback address is 172.16.0.2. The VNI is set to 10010 and is used to identify the VXLAN segment.
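One point worth spelling out is that VLAN IDs are only locally significant on each VTEP; it is the shared VNI that places Host1 and Host2 in the same Layer 2 segment. The Python sketch below models that mapping. The VLAN numbers, the VNI and R2's loopback come from the lab description; R1's loopback of 172.16.0.1 and the idea that both VLANs map to VNI 10010 are my assumptions, and the class and function names are purely illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class VtepConfig:
    """Illustrative model of one VTEP's VXLAN settings."""
    name: str
    source_ip: str                                    # loopback used as the VTEP address
    vlan_to_vni: dict = field(default_factory=dict)   # local VLAN ID -> VNI


def same_segment(a: VtepConfig, vlan_a: int, b: VtepConfig, vlan_b: int) -> bool:
    """Two access VLANs share a Layer 2 segment if they map to the same VNI."""
    vni_a = a.vlan_to_vni.get(vlan_a)
    vni_b = b.vlan_to_vni.get(vlan_b)
    return vni_a is not None and vni_a == vni_b


# R1's loopback address is assumed; R2's is taken from the lab description.
r1 = VtepConfig("R1", "172.16.0.1", {10: 10010})
r2 = VtepConfig("R2", "172.16.0.2", {11: 10010})

print(same_segment(r1, 10, r2, 11))   # True: VLAN 10 and VLAN 11 share VNI 10010
print(same_segment(r1, 10, r2, 10))   # False: R2 has no mapping for VLAN 10
```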
The MAC address table for R2 below shows the MAC address for Host2 only. The core supports Layer 3 only, and the only routes learnt are those for the VTEP endpoints.
Configurations for Arista vEOS can be found at this GitHub account.
VXLAN is a widely used Data Centre Interconnect (DCI) technology and can be implemented using Arista vEOS or Cisco CSR1000V to seamlessly connect data centres. Ravello’s Networking Smart Labs provides an easy way to model and test a VXLAN design before it is rolled out into production infrastructure.