  • October 11, 2015

How to build a large scale BGP MPLS Service Provider Network and model on AWS – Part 1

Matt Conran
Matt Conran is a Network Architect based out of Ireland and a prolific blogger at Network Insight. In his spare time he writes on topics including SDN, OpenFlow, NFV, OpenStack, Cloud, Automation and Programming.

Service Providers around the globe deploy MPLS networks, using label-switched paths to improve quality of service (QoS) and meet the specific SLAs that enterprises demand for their traffic. This is a two-part post: Part 1 (this post) introduces MPLS constructs and Part 2 describes how to set up a fully functional MPLS network using Ravello’s Network Smart Lab. Interested in playing with this 14-node Service Provider MPLS deployment? Just add this blueprint from Ravello Repo.

Multiprotocol Label Switching (MPLS)

One could argue that Multiprotocol Label Switching (MPLS) is a virtualization technique. It is used to build virtual topologies on top of physical topologies by creating connections between the network edges, explicitly called tunnels. MPLS is designed to reduce the forwarding information managed by the internal nodes of the network by tunneling through them. MPLS architectures build on the concept of complex edges and simple cores, allowing networks to scale and serve millions of customer routes.

MPLS changed the way we think about control planes. It pushed most of the control plane to the edge of the network. The MPLS Provider (P) nodes still run a control plane, but the complex decision making is now at the edge. In terms of architecture and scale, MPLS leads to very scalable networks and reduces the challenges of a distributed control plane. It allows service providers to connect millions of customers together and place them into separate VRF containers, allowing overlapping IP addressing and independent topologies.

The diagram below represents the high-level components of an MPLS network. Provider Edge (PE) nodes terminate customer connections, Provider (P) nodes label switch packets, and Route Reflectors (RR) are used for IPv4 and IPv6 route propagation.

Figure 1 Service Provider MPLS Network

How does MPLS work?

Virtual routing and forwarding (VRF) instances are like VLANs, except they operate at Layer 3 rather than Layer 2. A single VRF-capable router holds multiple logical routers under one management entity. Each VRF behaves like a full-blown router: it has its own routing protocols, topology, and independent set of interfaces and subnets. If you want to connect multiple routers with VRFs together, you can use a Layer 2 trunk, Generic Routing Encapsulation (GRE) or MPLS.

With the first two options, you need an interface per VRF on each router, which means a VLAN or GRE tunnel for each VRF, as well as a per-VRF routing protocol instance on every single hop. This is known as Multi-VRF. For example, with a Multi-VRF approach, if you have four routers in sequence and two VRFs, you have to run two copies of the routing protocol on every hop of the path. Every router in the path must have the VRFs configured, resulting in numerous routing protocol adjacencies and many convergence events if a single link fails. Multi-VRF with its hop-by-hop configuration does not scale and should be used across a maximum of one hop. A more scalable solution is a full-blown MPLS network, as described below.
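The scaling problem can be put into rough numbers. The following is a back-of-the-envelope sketch, assuming a linear chain of routers and one routing adjacency per VRF on every link; the function name and the VRF counts are illustrative and not taken from a real deployment.

```python
def multi_vrf_adjacencies(routers_in_chain: int, vrfs: int) -> int:
    """Count per-VRF routing adjacencies on a linear chain of routers.

    Assumption: each router-to-router link carries one adjacency per VRF.
    """
    if routers_in_chain < 2 or vrfs < 1:
        return 0
    links = routers_in_chain - 1
    return links * vrfs


if __name__ == "__main__":
    # Four routers in sequence, with a growing number of customer VRFs.
    for vrfs in (2, 100, 1000):
        total = multi_vrf_adjacencies(4, vrfs)
        print(f"{vrfs} VRFs over 4 routers: {total} adjacencies")
```

Under these assumptions the adjacency count grows linearly with the number of VRFs on every hop, which is the state an MPLS core avoids keeping on its P nodes.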

An end-to-end MPLS implementation is more common than Multi-VRF. It builds a Label Switched Path (LSP) between every ingress and egress router. To enable this functionality, you have to enable LDP on individual interfaces. Routers send LDP Hello messages and then establish an LDP session over TCP. LDP is enabled only on core (P to P, PE to P, P to RR) interfaces and not on user interfaces facing CE routers. Every router running LDP assigns a label to every prefix in the CEF table. These labels are then advertised to all LDP neighbors. Usually, you only need to advertise labels for the internal PE BGP next hops, which is enough to enable Label Switched Paths within the core.
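How the advertised labels chain together into an LSP can be sketched in a few lines. This is a simplified model, not router behavior as implemented: the router names, the starting label value and the helper functions are all hypothetical, and penultimate-hop popping is deliberately ignored.

```python
def build_lsp(path, first_label=16):
    """Return per-hop (router, in_label, out_label) entries for one prefix.

    Each router allocates a local label for the prefix and advertises it to
    its neighbors; a router's outgoing label is the local label of the next
    router on the path. Penultimate-hop popping is ignored for brevity.
    """
    # Hypothetical local label allocation: one label per router, starting at
    # 16 because labels 0-15 are reserved.
    local = {router: first_label + i for i, router in enumerate(path)}
    entries = []
    for i, router in enumerate(path):
        in_label = None if i == 0 else local[router]
        out_label = None if i == len(path) - 1 else local[path[i + 1]]
        entries.append((router, in_label, out_label))
    return entries


def labels_on_wire(entries):
    """Return the label carried on each link along the LSP, in order."""
    return [out for _, _, out in entries if out is not None]


if __name__ == "__main__":
    lsp = build_lsp(["PE1", "P1", "P2", "PE2"])
    for router, in_label, out_label in lsp:
        print(router, "in:", in_label, "out:", out_label)
    print("labels on the wire:", labels_on_wire(lsp))
```

The ingress PE pushes the label its neighbor advertised, and every P node only swaps one label for another, without needing to know the customer prefixes behind the egress PE.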

Provider (P) and Provider Edge (PE) Nodes

MPLS shifted the way we think about the control plane and pushed the intelligence of the network to the edges. It changed how the data plane and control plane interrelate. Decisions are now made at the edge of the network by devices known as Provider Edge (PE) routers. PEs control the best-path decisions and perform end-to-end path calculation and any encapsulation and decapsulation. The PEs run BGP with a special address family called VPNv4. The Provider (P) routers sit in the core of the network and switch packets based on labels. They do not contain end-to-end path information and require path information to the remote PE next hop address only. The PE nodes run the routing protocols with the CEs. No customer routing information should be passed into the core of the network. Any customer route convergence events are stopped at the PE nodes, protecting the core. The Ps run an internal IGP for remote PE reachability and LDP to assign labels to IP prefixes in the Cisco Express Forwarding (CEF) table.

The idea is to use a single protocol for all VRFs. BGP is the only routing protocol that is scalable enough and that allows extra attributes to be added to prefixes to make them unique. We redistribute routes from the customer's IGP, or connected networks, from the VRFs into Multiprotocol BGP (MP-BGP). You do the same at the other connected end, and the BGP updates are carried over the core. As discussed, transport between the routers in the middle can be Layer 2 transport, a GRE tunnel or MPLS with Label Switched Paths (LSP).

Route Distinguishers (RD) and Route Targets (RT)

MPLS/VPN networks have the concept of Route Distinguishers (RD) and Route Targets (RT).
The RD distinguishes one set of routes from another. It’s a number prepended to each route within a VRF. The number is used to identify which VRF the route belongs to. A number of options exist for the format of an RD, but essentially it is a flat number prepended to a route.

The following snippet displays a configuration example of an RT and an RD attached to a VRF instance named VRFSite_A.

ip vrf VRFSite_A
 rd 65000:100
 route-target export 65000:100
 route-target import 65000:100

A route in VRFSite_A is effectively advertised with 65000:100: prepended to its prefix.

RTs are used to share routes among sites. They are assigned to routes and control the import and export of routes to VRFs. At a basic level, for end-to-end communication, an export RT at one site must match an import RT at the other site. Allowing sites to import some RTs and export others enables the creation of complex MPLS/VPN topologies, such as hub and spoke, full mesh and partial mesh designs.
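The import and export matching rule can be illustrated with a small model. This is a sketch of the rule only, assuming hypothetical VRF names and RT values (65000:1 and 65000:2); it does not model BGP itself.

```python
from dataclasses import dataclass, field


@dataclass
class Vrf:
    name: str
    export_rts: set = field(default_factory=set)
    import_rts: set = field(default_factory=set)


def imports_from(receiver: Vrf, sender: Vrf) -> bool:
    """A receiver installs a sender's routes if it imports any RT the sender exports."""
    return bool(sender.export_rts & receiver.import_rts)


# Hub and spoke: the hub exports 65000:1 and imports 65000:2,
# while every spoke does the reverse.
hub = Vrf("hub", export_rts={"65000:1"}, import_rts={"65000:2"})
spoke_a = Vrf("spoke_a", export_rts={"65000:2"}, import_rts={"65000:1"})
spoke_b = Vrf("spoke_b", export_rts={"65000:2"}, import_rts={"65000:1"})

if __name__ == "__main__":
    print("spoke_a learns hub routes:", imports_from(spoke_a, hub))
    print("hub learns spoke_a routes:", imports_from(hub, spoke_a))
    print("spoke_a learns spoke_b routes:", imports_from(spoke_a, spoke_b))
```

Because the spokes never import the RT that other spokes export, spoke-to-spoke routes are not installed directly and traffic between spokes has to go through the hub.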

The following shows a Wireshark capture of an MP-BGP update. The RT value is displayed as an extended community with value 65000:10. The RD is also specified as the value 65000:10.

When routes are redistributed from a VRF into the VPNv4 BGP table, a 64-bit user-configurable route distinguisher (RD) is prepended to every IPv4 prefix to make it globally unique. The RD is not the VPN identifier. It is just a number that makes the IPv4 prefix globally unique. Usually, we use a 2-byte AS number plus a 4-byte decimal value, which, together with a 2-byte type field, make up the 64 bits.
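A minimal sketch of how the RD makes overlapping customer prefixes unique, assuming the type 0 RD layout (2-byte type, 2-byte AS number, 4-byte assigned number). The function names, the second RD value 65000:200 and the prefix 10.1.1.0/24 are illustrative assumptions.

```python
import ipaddress
import struct


def encode_rd_type0(asn: int, assigned: int) -> bytes:
    """Encode a type 0 RD: 2-byte type (0), 2-byte AS number, 4-byte number."""
    if not 0 <= asn <= 0xFFFF:
        raise ValueError("a type 0 RD needs a 2-byte AS number")
    if not 0 <= assigned <= 0xFFFFFFFF:
        raise ValueError("the assigned number must fit in 4 bytes")
    return struct.pack("!HHI", 0, asn, assigned)


def vpnv4_prefix(rd: bytes, ipv4_prefix: str):
    """Prepend the 8-byte RD to the 4-byte IPv4 network address.

    Returns (12-byte address, IPv4 prefix length).
    """
    network = ipaddress.ip_network(ipv4_prefix)
    return rd + network.network_address.packed, network.prefixlen


if __name__ == "__main__":
    # Two customers using the same private prefix, each in its own VRF.
    site_a = vpnv4_prefix(encode_rd_type0(65000, 100), "10.1.1.0/24")
    site_b = vpnv4_prefix(encode_rd_type0(65000, 200), "10.1.1.0/24")
    print(site_a[0].hex(), site_b[0].hex(), site_a != site_b)
```

The two customers advertise the same IPv4 prefix, but the resulting VPNv4 prefixes differ, so BGP treats them as separate routes.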

Route Targets (RT) are extended BGP communities; they are not part of the prefix but are attached to it. As discussed, the RT attribute controls the import process: it tells BGP which VRF the specific prefix should be inserted into.

The diagram below displays the generic actions involved in the traffic flow of an MPLS/VPN network. Labels are assigned to prefixes and sent in BGP update messages to the corresponding peers.

Multi-protocol Extensions for BGP

MP-BGP allows you to carry a variety of information. More recently, BGP has been used to carry MAC addresses with a feature known as EVPN. It also acts as a control and DDoS prevention tool, downloading PBR and ACL entries to the TCAM on routers with BGP FlowSpec. It is a very extensible protocol.

Label Distribution Protocol (LDP) is supported solely for IPv4 cores, which means that to transport IPv6 packets we either need LDPv6, which is not generally available, or have to somehow tunnel packets across an IPv4-only core. We need a mechanism to extend IPv4 BGP to carry IPv6 prefixes. A feature known as 6PE solves this and allows IPv6 packets to be labelled and sent over a BGP IPv4 TCP connection. The BGP session is still built over TCP over IPv4, but the “send-label” command allows BGP to assign a BGP label (not an LDP label) to IPv6 prefixes, enabling IPv6 transport over an IPv4 BGP session. In this case, both BGP and LDP are used to assign labels: BGP assigns a label to the IPv6 prefix and LDP assigns a label for remote PE reachability. This allows IPv6 services to be enabled over an IPv4 core. BGP is simply a TCP application carrying multiple types of traffic.
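The resulting two-label stack can be sketched as follows. This is an illustrative model only: the function name, the label values, the PE loopback 10.0.0.4 and the IPv6 prefix 2001:db8:1::/48 are assumptions, and the lookup tables stand in for what BGP and LDP would have learned.

```python
def build_6pe_stack(ipv6_prefix, remote_pe, bgp_labels, ldp_labels):
    """Return the label stack [outer, inner] the ingress PE would push.

    inner: label that BGP assigned to the IPv6 prefix at the remote PE
    outer: label that LDP assigned for reaching the remote PE over IPv4
    """
    inner = bgp_labels[(remote_pe, ipv6_prefix)]
    outer = ldp_labels[remote_pe]
    return [outer, inner]


if __name__ == "__main__":
    # Hypothetical label bindings learned by the ingress PE.
    bgp_labels = {("10.0.0.4", "2001:db8:1::/48"): 24}
    ldp_labels = {"10.0.0.4": 17}
    stack = build_6pe_stack("2001:db8:1::/48", "10.0.0.4", bgp_labels, ldp_labels)
    print("label stack (outer first):", stack)
```

The P routers only ever look at the outer LDP label, so they can stay IPv4-only while still carrying IPv6 traffic between the PEs.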


This post introduces the key concepts involved in creating a Service Provider MPLS network. Interested in building your own MPLS network? Read Part 2 of this article for step-by-step instructions on how to create a 14-node MPLS network using Ravello’s Network Smart Lab. You can also play with this fully functional MPLS network by adding this blueprint to your Ravello account.
