Wednesday Jun 18, 2014

Diving into OpenStack Network Architecture - Part 3 - Routing

In the previous posts we have seen the basic components of OpenStack networking and then described three simple use cases that explain how network connectivity is achieved. In this short post we will continue to explore the networking setup by looking at a more sophisticated (but still pretty basic) use case: routing between two isolated networks. Routing uses the same basic components to achieve inter-subnet connectivity, adding another namespace to create an isolated container that forwards traffic from one subnet to another.

Just to remind what we said in the first post, this is just an example using the out-of-the-box OVS plugin. This is only one of the options for networking in OpenStack, and there are many plugins that use different means.

Use case #4: Routing traffic between two isolated networks

In a real world deployment we would like to create different networks for different purposes, and we would also like to be able to connect those networks as needed. Since those two networks have different IP ranges we need a router to connect them. To explore this setup we will first create an additional network called net2, using 20.20.20.0/24 as its subnet. After creating the network we will launch an Oracle Linux instance and connect it to net2. This is how this looks in the network topology tab of the OpenStack GUI:
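From the command line, the same steps would look something like this, using the same net-create and subnet-create commands shown in part 2 below (output omitted):

# neutron net-create net2

# neutron subnet-create net2 20.20.20.0/24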

If we further explore what happened we can see that another namespace has appeared on the network node; this namespace will serve the newly created network. Now we have two namespaces, one for each network:

# ip netns list

qdhcp-63b7fcf2-e921-4011-8da9-5fc2444b42dd

qdhcp-5f833617-6179-4797-b7c0-7d420d84040c

To associate each namespace with its network ID we can use net-list or simply look at the network information in the UI:

# nova net-list

+--------------------------------------+-------+------+

| ID                                   | Label | CIDR |

+--------------------------------------+-------+------+

| 5f833617-6179-4797-b7c0-7d420d84040c | net1  | None |

| 63b7fcf2-e921-4011-8da9-5fc2444b42dd | net2  | None |

+--------------------------------------+-------+------+

Our newly created network, net2, has its own namespace, separate from net1's. When we look into the namespace we see that it has two interfaces: a loopback and an interface with an IP address which will also serve DHCP requests:

# ip netns exec qdhcp-63b7fcf2-e921-4011-8da9-5fc2444b42dd ip addr

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN

    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

    inet 127.0.0.1/8 scope host lo

    inet6 ::1/128 scope host

       valid_lft forever preferred_lft forever

19: tap16630347-45: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN

    link/ether fa:16:3e:bd:94:42 brd ff:ff:ff:ff:ff:ff

    inet 20.20.20.3/24 brd 20.20.20.255 scope global tap16630347-45

    inet6 fe80::f816:3eff:febd:9442/64 scope link

       valid_lft forever preferred_lft forever

Those two networks, net1 and net2, are not connected at this time. To connect them we need to add a router and connect both networks to it. OpenStack Neutron provides users with the capability to create a router to connect two or more networks; this router will simply be an additional namespace.

Creating a router with Neutron can be done from the GUI or from command line:

# neutron router-create my-router

Created a new router:

+-----------------------+--------------------------------------+

| Field                 | Value                                |

+-----------------------+--------------------------------------+

| admin_state_up        | True                                 |

| external_gateway_info |                                      |

| id                    | fce64ebe-47f0-4846-b3af-9cf764f1ff11 |

| name                  | my-router                            |

| status                | ACTIVE                               |

| tenant_id             | 9796e5145ee546508939cd49ad59d51f     |

+-----------------------+--------------------------------------+

We now connect the router to the two networks:

Checking which subnets are available:

# neutron subnet-list

+--------------------------------------+------+---------------+------------------------------------------------+

| id                                   | name | cidr          | allocation_pools                               |

+--------------------------------------+------+---------------+------------------------------------------------+

| 2d7a0a58-0674-439a-ad23-d6471aaae9bc |      | 10.10.10.0/24 | {"start": "10.10.10.2", "end": "10.10.10.254"} |

| 4a176b4e-a9b2-4bd8-a2e3-2dbe1aeaf890 |      | 20.20.20.0/24 | {"start": "20.20.20.2", "end": "20.20.20.254"} |

+--------------------------------------+------+---------------+------------------------------------------------+

Adding the 10.10.10.0/24 subnet to the router:

# neutron router-interface-add fce64ebe-47f0-4846-b3af-9cf764f1ff11 subnet=2d7a0a58-0674-439a-ad23-d6471aaae9bc

Added interface 0b7b0b40-f952-41dd-ad74-2c15a063243a to router fce64ebe-47f0-4846-b3af-9cf764f1ff11.

Adding the 20.20.20.0/24 subnet to the router:

# neutron router-interface-add fce64ebe-47f0-4846-b3af-9cf764f1ff11 subnet=4a176b4e-a9b2-4bd8-a2e3-2dbe1aeaf890

Added interface dc290da0-0aa4-4d96-9085-1f894cf5b160 to router fce64ebe-47f0-4846-b3af-9cf764f1ff11.

At this stage we can look at the network topology view and see that the two networks are connected to the router:

We can also see that the interfaces connected to the router use the IP addresses we defined as gateways for the subnets.
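We can verify this from the command line as well; listing the router's ports should show the two interfaces with their gateway IPs (output omitted here):

# neutron router-port-list my-router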

We can also see that another namespace was created for the router:

# ip netns list

qrouter-fce64ebe-47f0-4846-b3af-9cf764f1ff11

qdhcp-63b7fcf2-e921-4011-8da9-5fc2444b42dd

qdhcp-5f833617-6179-4797-b7c0-7d420d84040c

When looking into the namespace we see the following:

# ip netns exec qrouter-fce64ebe-47f0-4846-b3af-9cf764f1ff11 ip addr

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN

    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

    inet 127.0.0.1/8 scope host lo

    inet6 ::1/128 scope host

       valid_lft forever preferred_lft forever

20: qr-0b7b0b40-f9: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN

    link/ether fa:16:3e:82:47:a6 brd ff:ff:ff:ff:ff:ff

    inet 10.10.10.1/24 brd 10.10.10.255 scope global qr-0b7b0b40-f9

    inet6 fe80::f816:3eff:fe82:47a6/64 scope link

       valid_lft forever preferred_lft forever

21: qr-dc290da0-0a: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN

    link/ether fa:16:3e:c7:7c:9c brd ff:ff:ff:ff:ff:ff

    inet 20.20.20.1/24 brd 20.20.20.255 scope global qr-dc290da0-0a

    inet6 fe80::f816:3eff:fec7:7c9c/64 scope link

       valid_lft forever preferred_lft forever

We see the two interfaces, “qr-dc290da0-0a” and “qr-0b7b0b40-f9”. Those interfaces use the IP addresses which were defined as gateways when we created the networks and subnets. Those interfaces are connected to OVS:

# ovs-vsctl show

8a069c7c-ea05-4375-93e2-b9fc9e4b3ca1

    Bridge "br-eth2"

        Port "br-eth2"

            Interface "br-eth2"

                type: internal

        Port "eth2"

            Interface "eth2"

        Port "phy-br-eth2"

            Interface "phy-br-eth2"

    Bridge br-ex

        Port br-ex

            Interface br-ex

                type: internal

    Bridge br-int

        Port "int-br-eth2"

            Interface "int-br-eth2"

        Port "qr-dc290da0-0a"

            tag: 2

            Interface "qr-dc290da0-0a"

                type: internal

        Port "tap26c9b807-7c"

            tag: 1

            Interface "tap26c9b807-7c"

                type: internal

        Port br-int

            Interface br-int

                type: internal

        Port "tap16630347-45"

            tag: 2

            Interface "tap16630347-45"

                type: internal

        Port "qr-0b7b0b40-f9"

            tag: 1

            Interface "qr-0b7b0b40-f9"

                type: internal

    ovs_version: "1.11.0"

As we see, those interfaces are connected to “br-int” and tagged with the VLAN corresponding to their respective networks. At this point we should be able to successfully ping the router namespace using the gateway address (20.20.20.1 in this case):
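For example, pinging the net2 gateway from inside the net2 DHCP namespace should work, since both namespaces sit on the same VLAN:

# ip netns exec qdhcp-63b7fcf2-e921-4011-8da9-5fc2444b42dd ping -c 2 20.20.20.1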

We can also see that the VM with IP 20.20.20.2 can ping the VM with IP 10.10.10.2, and this is how we see the routing actually getting done:
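One way to actually watch the router forwarding traffic is to run tcpdump inside its namespace on one of the qr- interfaces while the ping is running; something like this should show the ICMP packets crossing the router:

# ip netns exec qrouter-fce64ebe-47f0-4846-b3af-9cf764f1ff11 tcpdump -n -i qr-0b7b0b40-f9 icmp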

The two subnets are connected to the namespace through interfaces inside it. Inside the namespace Neutron enabled forwarding by setting the net.ipv4.ip_forward parameter to 1; we can see that here:

# ip netns exec qrouter-fce64ebe-47f0-4846-b3af-9cf764f1ff11 sysctl net.ipv4.ip_forward

net.ipv4.ip_forward = 1

We can see that this net.ipv4.ip_forward setting is specific to the namespace and is not impacted by changing the parameter outside the namespace.
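A quick way to verify this is to compare the value inside and outside the namespace; on a host where forwarding is disabled globally, the two commands return different values:

# sysctl net.ipv4.ip_forward

# ip netns exec qrouter-fce64ebe-47f0-4846-b3af-9cf764f1ff11 sysctl net.ipv4.ip_forward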

Summary

When a router is created, Neutron creates a namespace called qrouter-<router id>. The subnets are connected to the router through interfaces on the OVS br-int bridge. The interfaces are tagged with the correct VLAN so they can connect to their respective networks. In the example above the interface qr-0b7b0b40-f9 is assigned IP 10.10.10.1 and is tagged with VLAN 1, which allows it to connect to “net1”. The routing action itself is enabled by the net.ipv4.ip_forward parameter being set to 1 inside the namespace.

This post shows how a router is created using just a network namespace. In the next post we will see how floating IPs work using iptables. This becomes a bit more sophisticated but still uses the same basic components.

@RonenKofman

 

 

Wednesday Jun 04, 2014

Diving into OpenStack Network Architecture - Part 2 - Basic Use Cases

 

In the previous post we reviewed several network components including Open vSwitch, network namespaces, Linux bridges and veth pairs. In this post we will take three simple use cases and see how those basic components come together to create a complete SDN solution in OpenStack. With those three use cases we will review almost the entire network setup and see how all the pieces work together. The use cases are:

1. Create a network – what happens when we create a network, and how we can create multiple isolated networks

2. Launch a VM – once we have networks we can launch VMs and connect them to networks

3. DHCP request from a VM – OpenStack can automatically assign IP addresses to VMs. This is done through a local DHCP service controlled by OpenStack Neutron. We will see how this service runs and what a DHCP request and response look like.

In this post we will show connectivity; we will see how packets get from point A to point B. We first focus on what a configured deployment looks like and only later discuss how and when the configuration is created. Personally I found it very valuable to see the actual interfaces and how they connect to each other through examples and hands-on experiments. After the end state is clear and we know how the connectivity works, we will take a step back, in a later post, and explain how Neutron configures the components to provide such connectivity.

We are going to get pretty technical shortly and I recommend trying these examples on your own deployment or using the Oracle OpenStack Tech Preview. Understanding these three use cases thoroughly, and knowing how to inspect them, will be very helpful when trying to debug a deployment in case something does not work.

Use case #1: Create Network

Creating a network is a simple operation that can be performed from the GUI or from the command line. When we create a network in OpenStack, the network is only available to the tenant who created it unless it is defined as “shared”, in which case it can be used by all tenants. A network can have multiple subnets, but for demonstration purposes and for simplicity we will assume that each network has exactly one subnet. Creating a network from the command line looks like this:

# neutron net-create net1

Created a new network:

+---------------------------+--------------------------------------+

| Field                     | Value                                |

+---------------------------+--------------------------------------+

| admin_state_up            | True                                 |

| id                        | 5f833617-6179-4797-b7c0-7d420d84040c |

| name                      | net1                                 |

| provider:network_type     | vlan                                 |

| provider:physical_network | default                              |

| provider:segmentation_id  | 1000                                 |

| shared                    | False                                |

| status                    | ACTIVE                               |

| subnets                   |                                      |

| tenant_id                 | 9796e5145ee546508939cd49ad59d51f     |

+---------------------------+--------------------------------------+

Creating a subnet for this network will look like this:

# neutron subnet-create net1 10.10.10.0/24

Created a new subnet:

+------------------+------------------------------------------------+

| Field            | Value                                          |

+------------------+------------------------------------------------+

| allocation_pools | {"start": "10.10.10.2", "end": "10.10.10.254"} |

| cidr             | 10.10.10.0/24                                  |

| dns_nameservers  |                                                |

| enable_dhcp      | True                                           |

| gateway_ip       | 10.10.10.1                                     |

| host_routes      |                                                |

| id               | 2d7a0a58-0674-439a-ad23-d6471aaae9bc           |

| ip_version       | 4                                              |

| name             |                                                |

| network_id       | 5f833617-6179-4797-b7c0-7d420d84040c           |

| tenant_id        | 9796e5145ee546508939cd49ad59d51f               |

+------------------+------------------------------------------------+

We now have a network and a subnet, on the network topology view this looks like this:

Now let’s dive in and see what happened under the hood. Looking at the control node we will discover that a new namespace was created:

# ip netns list

qdhcp-5f833617-6179-4797-b7c0-7d420d84040c

 

The name of the namespace is qdhcp-<network id> (see above). Let’s look into the namespace and see what’s in it:

# ip netns exec qdhcp-5f833617-6179-4797-b7c0-7d420d84040c ip addr

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN

    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

    inet 127.0.0.1/8 scope host lo

    inet6 ::1/128 scope host

       valid_lft forever preferred_lft forever

12: tap26c9b807-7c: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN

    link/ether fa:16:3e:1d:5c:81 brd ff:ff:ff:ff:ff:ff

    inet 10.10.10.3/24 brd 10.10.10.255 scope global tap26c9b807-7c

    inet6 fe80::f816:3eff:fe1d:5c81/64 scope link

       valid_lft forever preferred_lft forever

 

We see two interfaces in the namespace: one is the loopback and the other is an interface called “tap26c9b807-7c”. This interface has the IP address 10.10.10.3 and will also serve DHCP requests, in a way we will see later. Let’s trace the connectivity of the “tap26c9b807-7c” interface from the namespace. First stop is OVS; we see that the interface connects to the bridge “br-int” on OVS:

# ovs-vsctl show

8a069c7c-ea05-4375-93e2-b9fc9e4b3ca1

    Bridge "br-eth2"

        Port "br-eth2"

            Interface "br-eth2"

                type: internal

        Port "eth2"

            Interface "eth2"

        Port "phy-br-eth2"

            Interface "phy-br-eth2"

    Bridge br-ex

        Port br-ex

            Interface br-ex

                type: internal

    Bridge br-int

        Port "int-br-eth2"

            Interface "int-br-eth2"

        Port "tap26c9b807-7c"

            tag: 1

            Interface "tap26c9b807-7c"

                type: internal

        Port br-int

            Interface br-int

                type: internal

    ovs_version: "1.11.0"

 

In the output above we have a veth pair whose two ends are called “int-br-eth2” and “phy-br-eth2”; this veth pair is used to connect two OVS bridges, “br-eth2” and “br-int”. In the previous post we explained how to check veth connectivity using the ethtool command. It shows that the two are indeed a pair:

# ethtool -S int-br-eth2

NIC statistics:

     peer_ifindex: 10

.

.

 

# ip link

.

.

10: phy-br-eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000

.

.

Note that “phy-br-eth2” is connected to a bridge called "br-eth2", and one of this bridge's interfaces is the physical link eth2. This means that the network we have just created produced a namespace which is connected to the physical interface eth2. eth2 is the “VM network”, the physical interface to which all the virtual machines connect.

About network isolation:

OpenStack supports the creation of multiple isolated networks and can use several mechanisms to isolate the networks from one another. The isolation mechanism can be VLANs, VxLANs or GRE tunnels; this is configured as part of the initial setup, and in our deployment we use VLANs. When using VLAN tagging as an isolation mechanism, a VLAN tag is allocated by Neutron from a pre-defined pool of VLAN tags and assigned to the newly created network. By provisioning VLAN tags to the networks, Neutron allows the creation of multiple isolated networks on the same physical link. The big difference between this and other platforms is that the user does not have to deal with allocating and managing VLANs; the VLAN allocation and provisioning is handled by Neutron, which keeps track of the VLAN tags and is responsible for allocating and reclaiming them. In the example above net1 has the VLAN tag 1000, which means that whenever a VM is created and connected to this network, packets from that VM have to be tagged with VLAN tag 1000 to go on this particular network. This is true for namespaces as well: if we would like to connect a namespace to a particular network we have to make sure that packets to and from the namespace are correctly tagged when they reach the VM network.
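The pre-defined pool itself comes from the plugin configuration. With the OVS plugin used here it is defined by settings along these lines (the exact file location and range are deployment-specific assumptions, chosen to be consistent with net1 receiving tag 1000):

[OVS]

tenant_network_type = vlan

network_vlan_ranges = default:1000:2000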

In the example above we see that the namespace interface “tap26c9b807-7c” has VLAN tag 1 assigned to it. If we examine OVS we see that it has flows which modify VLAN tag 1 to VLAN tag 1000 when a packet goes out to the VM network on eth2, and vice versa. We can see this using the dump-flows command on OVS; for packets going to the VM network we see the modification done on br-eth2:

#  ovs-ofctl dump-flows br-eth2

NXST_FLOW reply (xid=0x4):

 cookie=0x0, duration=18669.401s, table=0, n_packets=857, n_bytes=163350, idle_age=25, priority=4,in_port=2,dl_vlan=1 actions=mod_vlan_vid:1000,NORMAL

 cookie=0x0, duration=165108.226s, table=0, n_packets=14, n_bytes=1000, idle_age=5343, hard_age=65534, priority=2,in_port=2 actions=drop

 cookie=0x0, duration=165109.813s, table=0, n_packets=1671, n_bytes=213304, idle_age=25, hard_age=65534, priority=1 actions=NORMAL

 

For packets coming from the interface to the namespace we see the following modification:

#  ovs-ofctl dump-flows br-int

NXST_FLOW reply (xid=0x4):

 cookie=0x0, duration=18690.876s, table=0, n_packets=1610, n_bytes=210752, idle_age=1, priority=3,in_port=1,dl_vlan=1000 actions=mod_vlan_vid:1,NORMAL

 cookie=0x0, duration=165130.01s, table=0, n_packets=75, n_bytes=3686, idle_age=4212, hard_age=65534, priority=2,in_port=1 actions=drop

 cookie=0x0, duration=165131.96s, table=0, n_packets=863, n_bytes=160727, idle_age=1, hard_age=65534, priority=1 actions=NORMAL

 

To summarize, we can see that when a user creates a network, Neutron creates a namespace, and this namespace is connected through OVS to the “VM network”. OVS also takes care of tagging packets from the namespace to the VM network with the correct VLAN tag, and knows to modify the VLAN for packets coming from the VM network to the namespace. Now let’s see what happens when a VM is launched and how it is connected to the “VM network”.

Use case #2: Launch a VM

Launching a VM can be done from Horizon or from the command line; this is how we do it from Horizon:

Attach the network:

And Launch
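From the command line the equivalent would look something like this (the image and flavor names are placeholders; the net-id is net1's ID from the net-list output above):

# nova boot --image <image-name> --flavor <flavor-name> --nic net-id=5f833617-6179-4797-b7c0-7d420d84040c "Oracle Linux"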

Once the virtual machine is up and running we can see the associated IP using the nova list command:

# nova list

+--------------------------------------+--------------+--------+------------+-------------+-----------------+

| ID                                   | Name         | Status | Task State | Power State | Networks        |

+--------------------------------------+--------------+--------+------------+-------------+-----------------+

| 3707ac87-4f5d-4349-b7ed-3a673f55e5e1 | Oracle Linux | ACTIVE | None       | Running     | net1=10.10.10.2 |

+--------------------------------------+--------------+--------+------------+-------------+-----------------+

The nova list command shows us that the VM is running and that the IP 10.10.10.2 is assigned to it. Let’s trace the connectivity from the VM to the VM network on eth2, starting with the VM definition file. The configuration files of the VM, including the virtual disk(s) in the case of ephemeral storage, are stored on the compute node at /var/lib/nova/instances/<instance-id>/. Looking into the VM definition file, libvirt.xml, we see that the VM is connected to an interface called “tap53903a95-82” which is connected to a Linux bridge called “qbr53903a95-82”:

<interface type="bridge">

    <mac address="fa:16:3e:fe:c7:87"/>

    <source bridge="qbr53903a95-82"/>

    <target dev="tap53903a95-82"/>

</interface>

 

Looking at the bridge using the brctl show command we see this:

# brctl show

bridge name     bridge id               STP enabled     interfaces

qbr53903a95-82          8000.7e7f3282b836       no              qvb53903a95-82

                                                        tap53903a95-82

 

The bridge has two interfaces, one connected to the VM (“tap53903a95-82”) and another one (“qvb53903a95-82”) connected to the “br-int” bridge on OVS:

# ovs-vsctl show

83c42f80-77e9-46c8-8560-7697d76de51c

    Bridge "br-eth2"

        Port "br-eth2"

            Interface "br-eth2"

                type: internal

        Port "eth2"

            Interface "eth2"

        Port "phy-br-eth2"

            Interface "phy-br-eth2"

    Bridge br-int

        Port br-int

            Interface br-int

                type: internal

        Port "int-br-eth2"

            Interface "int-br-eth2"

        Port "qvo53903a95-82"

            tag: 3

            Interface "qvo53903a95-82"

    ovs_version: "1.11.0"

 

As we showed earlier, “br-int” is connected to “br-eth2” on OVS using the veth pair int-br-eth2/phy-br-eth2, and br-eth2 is connected to the physical interface eth2. The whole flow end to end looks like this:

VM => tap53903a95-82 (virtual interface) => qbr53903a95-82 (Linux bridge) => qvb53903a95-82 (interface connecting the Linux bridge to the OVS bridge br-int) => int-br-eth2 (one end of the veth pair) => phy-br-eth2 (the other end) => eth2 (physical interface)
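The “qvb” and “qvo” interfaces are themselves the two ends of another veth pair, connecting the Linux bridge to br-int; as shown earlier, the pairing can be confirmed with ethtool, whose reported peer_ifindex should resolve (via ip link) to qvo53903a95-82:

# ethtool -S qvb53903a95-82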

The purpose of the Linux bridge connecting to the VM is to allow security group enforcement with iptables. Security groups are enforced at the edge point, which is the interface of the VM; since iptables cannot be applied to OVS bridges, we use a Linux bridge to apply them. In the future we hope to see this Linux bridge go away.
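To see the enforcement itself, we can list the iptables rules on the compute node and filter by the port ID; on a typical deployment this would show the security group chains and physdev rules referencing the tap device (the exact rule set varies):

# iptables -S | grep 53903a95-82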

VLAN tags: as we discussed in the first use case, net1 is using VLAN tag 1000, yet looking at OVS above we see that qvo53903a95-82 is tagged with VLAN tag 3 (the internal tag is local to the host, so it can differ from one run to another). The modification from VLAN tag 3 to 1000 on the way to the physical network is done by OVS as part of the packet flow on br-eth2, in the same way we showed before.

To summarize, when a VM is launched it is connected to the VM network through the chain of elements described here. As packets travel from the VM to the network and back, the VLAN tag is modified along the way.

Use case #3: Serving a DHCP request coming from the virtual machine

In the previous use cases we have shown that both the namespace called qdhcp-<network id> and the VM end up connected to the physical interface eth2 on their respective nodes, and both tag their packets with VLAN tag 1000. We also saw that the namespace has an interface with the IP 10.10.10.3. Since the VM and the namespace are connected to each other and have interfaces on the same subnet, they can ping each other; in this picture we see a ping from the VM, which was assigned 10.10.10.2, to the namespace:

The fact that they are connected and can ping each other can become very handy when something doesn’t work right and we need to isolate the problem. In such a case, knowing that we should be able to ping from the VM to the namespace and back can be used to trace the disconnect using tcpdump or other monitoring tools.
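For example, while the VM pings 10.10.10.3 we can run tcpdump on the tap interface inside the namespace; if the ICMP packets show up here, the entire chain between the VM and the namespace is working:

# ip netns exec qdhcp-5f833617-6179-4797-b7c0-7d420d84040c tcpdump -n -i tap26c9b807-7c icmp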

To serve DHCP requests coming from VMs on the network, Neutron uses a Linux tool called “dnsmasq”; this is a lightweight DNS and DHCP service (you can read more about it here). If we look at the dnsmasq process on the control node with the ps command we see this:

dnsmasq --no-hosts --no-resolv --strict-order --bind-interfaces --interface=tap26c9b807-7c --except-interface=lo --pid-file=/var/lib/neutron/dhcp/5f833617-6179-4797-b7c0-7d420d84040c/pid --dhcp-hostsfile=/var/lib/neutron/dhcp/5f833617-6179-4797-b7c0-7d420d84040c/host --dhcp-optsfile=/var/lib/neutron/dhcp/5f833617-6179-4797-b7c0-7d420d84040c/opts --leasefile-ro --dhcp-range=tag0,10.10.10.0,static,120s --dhcp-lease-max=256 --conf-file= --domain=openstacklocal

The service is attached to the tap interface in the namespace (“--interface=tap26c9b807-7c”). If we look at the hosts file we see this:

# cat  /var/lib/neutron/dhcp/5f833617-6179-4797-b7c0-7d420d84040c/host

fa:16:3e:fe:c7:87,host-10-10-10-2.openstacklocal,10.10.10.2

 

In the console output above you can see the MAC address fa:16:3e:fe:c7:87, which is the VM's MAC. This MAC address is mapped to IP 10.10.10.2, so when a DHCP request arrives with this MAC, dnsmasq will return 10.10.10.2. If we look into the namespace at the time we initiate a DHCP request from the VM (this can be done by simply restarting the network service in the VM) we see the following:

# ip netns exec qdhcp-5f833617-6179-4797-b7c0-7d420d84040c tcpdump -n

19:27:12.191280 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:fe:c7:87, length 310

19:27:12.191666 IP 10.10.10.3.bootps > 10.10.10.2.bootpc: BOOTP/DHCP, Reply, length 325

 

To summarize, the DHCP service is handled by dnsmasq, which is configured by Neutron to listen on the interface in the DHCP namespace. Neutron also configures dnsmasq with the combination of MAC and IP for each port, so when a DHCP request comes along the VM receives its assigned IP.

Summary

In this post we relied on the components described in the previous post and saw how network connectivity is achieved using three simple use cases. These use cases gave a good view of the entire network stack and helped us understand how an end to end connection is made between a VM on a compute node and the DHCP namespace on the control node. One conclusion we can draw from what we saw here is that if we launch a VM and it is able to perform a DHCP request and receive a correct IP, then there is reason to believe that the network is working as expected. We saw that a packet has to travel through a long list of components before reaching its destination, and if it has done so successfully this means that many components are functioning properly.

In the next post we will look at some more sophisticated services Neutron supports and see how they work. We will see that while there are some more components involved, for the most part the concepts are the same.

@RonenKofman

Wednesday May 28, 2014

Diving into OpenStack Network Architecture - Part 1

Before we begin

OpenStack networking has very powerful capabilities but at the same time it is quite complicated. In this blog series we will review an existing OpenStack setup using the Oracle OpenStack Tech Preview and explain the different network components through use cases and examples. The goal is to show how the different pieces come together and to provide a bigger-picture view of the network architecture in OpenStack. This can be very helpful to users making their first steps in OpenStack or anyone who wishes to understand how networking works in this environment. We will go through the basics first and build up the examples as we go.

According to the recent Icehouse user survey and the one before it, Neutron with the Open vSwitch plug-in is the most widely used network setup, both in production and in POCs (in terms of number of customers), and so in this blog series we will analyze this specific OpenStack networking setup. As we know there are many options for setting up OpenStack networking, and while Neutron + Open vSwitch is the most popular setup there is no claim that it is either the best or the most efficient option. Neutron + Open vSwitch is an example, one which provides a good starting point for anyone interested in understanding OpenStack networking. Even if you are using a different kind of network setup, such as a different Neutron plug-in, or not using Neutron at all, this will still be a good starting point for understanding the network architecture in OpenStack.

The setup we are using for the examples is the one used in the Oracle OpenStack Tech Preview. Installing it is simple and it would be helpful to have it as a reference. In this setup we use eth2 on all servers for the VM network; all VM traffic will flow through this interface. The Oracle OpenStack Tech Preview uses VLANs for L2 isolation to provide tenant and network isolation. The following diagram shows how we have configured our deployment:


This first post is a bit long and will focus on some basic concepts in OpenStack networking. The components we will be discussing are Open vSwitch, network namespaces, Linux bridges and veth pairs. Note that this is not meant to be a comprehensive review of these components; each component is described only as much as needed to understand the OpenStack network architecture. All the components described here can be further explored using other resources.

Open vSwitch (OVS)

In the Oracle OpenStack Tech Preview, OVS is used to connect virtual machines to the physical port (in our case eth2) as shown in the deployment diagram. OVS contains bridges and ports; the OVS bridges are different from the Linux bridges (controlled by the brctl command), which are also used in this setup. To get started, let’s view the OVS structure using the following command:

# ovs-vsctl show

7ec51567-ab42-49e8-906d-b854309c9edf

    Bridge br-int

        Port br-int

            Interface br-int

                type: internal

        Port "int-br-eth2"

            Interface "int-br-eth2"

    Bridge "br-eth2"

        Port "br-eth2"

            Interface "br-eth2"

                type: internal

        Port "eth2"

            Interface "eth2"

        Port "phy-br-eth2"

            Interface "phy-br-eth2"

    ovs_version: "1.11.0"

We see a standard post-deployment OVS on a compute node with two bridges and several ports hanging off each of them. The example above is from a compute node without any VMs; we can see that the physical port eth2 is connected to a bridge called “br-eth2”. We also see two ports, "int-br-eth2" and "phy-br-eth2", which are actually a veth pair forming a virtual wire between the two bridges; veth pairs are discussed later in this post.

When a virtual machine is created, a port is created on the br-int bridge and this port is eventually connected to the virtual machine (we will discuss the exact connectivity later in the series). Here is how OVS looks after a VM was launched:

# ovs-vsctl show

efd98c87-dc62-422d-8f73-a68c2a14e73d

    Bridge br-int

        Port "int-br-eth2"

            Interface "int-br-eth2"

        Port br-int

            Interface br-int

                type: internal

        Port "qvocb64ea96-9f"

            tag: 1

            Interface "qvocb64ea96-9f"

    Bridge "br-eth2"

        Port "phy-br-eth2"

            Interface "phy-br-eth2"

        Port "br-eth2"

            Interface "br-eth2"

                type: internal

        Port "eth2"

            Interface "eth2"

    ovs_version: "1.11.0"

Bridge "br-int" now has a new port, "qvocb64ea96-9f", which connects to the VM and is tagged with VLAN 1. Every VM that is launched will add a port on the “br-int” bridge for every network interface the VM has.

Another useful OVS command is dump-flows, for example:

# ovs-ofctl dump-flows br-int

NXST_FLOW reply (xid=0x4):

cookie=0x0, duration=735.544s, table=0, n_packets=70, n_bytes=9976, idle_age=17, priority=3,in_port=1,dl_vlan=1000 actions=mod_vlan_vid:1,NORMAL

cookie=0x0, duration=76679.786s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,in_port=1 actions=drop

cookie=0x0, duration=76681.36s, table=0, n_packets=68, n_bytes=7950, idle_age=17, hard_age=65534, priority=1 actions=NORMAL

As we see, the port which is connected to the VM uses VLAN tag 1, while the port on the VM network (eth2) uses tag 1000. OVS modifies the VLAN tag as packets flow from the VM to the physical interface and back. In OpenStack, the Open vSwitch agent takes care of programming the flows in Open vSwitch so users do not have to deal with this at all. If you wish to learn more about programming Open vSwitch, you can read about the ovs-ofctl command in the documentation at http://openvswitch.org.

Network Namespaces (netns)

Network namespaces are a very cool Linux feature that can be used for many purposes, and they are heavily used in OpenStack networking. A network namespace is an isolated container which can hold a network configuration that is not seen from outside the namespace. A network namespace can be used to encapsulate specific network functionality or provide a network service in isolation, as well as simply to help organize a complicated network setup. The Oracle OpenStack Tech Preview uses the latest Unbreakable Enterprise Kernel R3 (UEK3), which provides complete support for netns.

Let's see how namespaces work through a couple of examples. To control network namespaces we use the ip netns command.
Defining a new namespace:

# ip netns add my-ns

# ip netns list

my-ns

As mentioned, the namespace is an isolated container; we can perform all the normal actions in the namespace context using the exec command, for example running the ifconfig command:

# ip netns exec my-ns ifconfig -a

lo        Link encap:Local Loopback

          LOOPBACK  MTU:16436 Metric:1

          RX packets:0 errors:0 dropped:0 overruns:0 frame:0

          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:0

          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

We can run any command in the namespace context. This is especially useful for debugging with the tcpdump command; we can ping, ssh, or define iptables rules, all within the namespace.

Connecting the namespace to the outside world:
There are various ways to connect into a namespace and between namespaces; we will focus on how this is done in OpenStack. OpenStack uses a combination of Open vSwitch and network namespaces: OVS defines the interfaces, and then we can add those interfaces to a namespace.

So first let's add a bridge to OVS:

# ovs-vsctl add-br my-bridge

Now let's add a port to the bridge and make it internal:

# ovs-vsctl add-port my-bridge my-port

# ovs-vsctl set Interface my-port type=internal

And let's connect it into the namespace:

# ip link set my-port netns my-ns

Looking inside the namespace:

# ip netns exec my-ns ifconfig -a

lo        Link encap:Local Loopback

          LOOPBACK  MTU:65536 Metric:1

          RX packets:0 errors:0 dropped:0 overruns:0 frame:0

          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:0

          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

my-port   Link encap:Ethernet HWaddr 22:04:45:E2:85:21

          BROADCAST  MTU:1500 Metric:1

          RX packets:0 errors:0 dropped:0 overruns:0 frame:0

          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:0

          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

Now we can add more ports to the OVS bridge and connect them to other namespaces or other devices like physical interfaces.
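To make such a port usable inside the namespace we would bring it up and assign it an address, for example (the address below is arbitrary, for illustration only):

# ip netns exec my-ns ip link set my-port up

# ip netns exec my-ns ip addr add 10.0.0.1/24 dev my-port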

Neutron uses network namespaces to implement network services such as DHCP, routing, gateway, firewall, load balancing and more. In the next post we will go into this in further detail.

Linux Bridge and veth pairs

A Linux bridge is used to connect the port from OVS to the VM; every port goes from the OVS bridge to a Linux bridge and from there to the VM. The reason for using regular Linux bridges is security group enforcement: security groups are implemented using iptables, and iptables can only be applied to Linux bridges, not to OVS bridges.

Veth pairs are used extensively throughout the network setup in OpenStack and are also a good tool for debugging network problems. A veth pair is simply a virtual wire, so veths always come in pairs: typically one side of the pair connects to a bridge, and the other side connects to another bridge or is simply left as a usable interface.

In this example we will create some veth pairs, connect them to bridges and test connectivity. This example uses a regular Linux server, not an OpenStack node.
Creating a veth pair (note that we define names for both ends):

# ip link add veth0 type veth peer name veth1

# ifconfig -a

.

.

veth0     Link encap:Ethernet HWaddr 5E:2C:E6:03:D0:17

          BROADCAST MULTICAST  MTU:1500 Metric:1

          RX packets:0 errors:0 dropped:0 overruns:0 frame:0

          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000

          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

veth1     Link encap:Ethernet HWaddr E6:B6:E2:6D:42:B8

          BROADCAST MULTICAST  MTU:1500 Metric:1

          RX packets:0 errors:0 dropped:0 overruns:0 frame:0

          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000

          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

.

.

To make the example more meaningful we will create the following setup:

veth0 => veth1 => br-eth3 => eth3 ======> eth2 on another Linux server

br-eth3 – a regular Linux bridge which will be connected to veth1 and eth3

eth3 – a physical interface with no IP on it, connected to a private network

eth2 – a physical interface on the remote Linux box connected to the private network and configured with the IP 50.50.50.51

Once we create the setup we will ping 50.50.50.51 (the remote IP) through veth0 to test that the connection is up:

# brctl addbr br-eth3

# brctl addif br-eth3 eth3

# brctl addif br-eth3 veth1

# brctl show

bridge name     bridge id               STP enabled     interfaces

br-eth3         8000.00505682e7f6       no              eth3

                                                        veth1

# ifconfig veth0 50.50.50.50

# ping -I veth0 50.50.50.51

PING 50.50.50.51 (50.50.50.51) from 50.50.50.50 veth0: 56(84) bytes of data.

64 bytes from 50.50.50.51: icmp_seq=1 ttl=64 time=0.454 ms

64 bytes from 50.50.50.51: icmp_seq=2 ttl=64 time=0.298 ms

When the naming is not as obvious as in the previous example and we don't know which veth interfaces are paired, we can use the ethtool command to figure this out. The ethtool command returns an index which we can look up using the ip link command, for example:

# ethtool -S veth1

NIC statistics:

     peer_ifindex: 12

# ip link

.

.

12: veth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000

Summary

That’s all for now. We quickly reviewed OVS, network namespaces, Linux bridges and veth pairs. These components are heavily used in the OpenStack network architecture we are exploring, and understanding them well will be very useful when reviewing the different use cases. In the next post we will look at how the OpenStack network is laid out, connecting the virtual machines to each other and to the external world.

@RonenKofman

