Using RDS with IPv6 Networks

February 7, 2018 | 7 minute read

RDS is the open source Reliable Datagram Sockets protocol, developed by Oracle and contributed to the Linux kernel. Kernel developer Ka-Cheong Poon writes the following about his work to make the RDS code IPv6-capable.

 

Until now, RDS could only be used between peers with IPv4 addresses. As IPv6 deployment increases around the world, it is becoming more and more important for RDS to work between peers with IPv6 addresses. To meet this need, we have updated Oracle Linux RDS to support IPv6. This article explains how an existing C application that uses RDS can be changed to support IPv6. RFC 3493 describes the basic IPv6 socket API, and RFC 4291 describes the IPv6 addressing architecture. We have also updated the rds-tools package to support IPv6; some usage examples are given below.

RDS IPv6 support is designed so that only minimal changes are needed to make an application support IPv6. As before, the following call creates an RDS socket:

    int sd;
    sd = socket(AF_RDS, SOCK_SEQPACKET, 0);

The socket created can be used to communicate with a peer using either an IPv4 or an IPv6 address. This differs from, say, a TCP socket, whose address family must be specified when the socket is created. So when does an RDS socket become an IPv6 RDS socket? Before explaining that, let's define the following union to make address handling easier to illustrate:

    union sockaddr_ip {
        struct sockaddr_in      addr4;
        struct sockaddr_in6     addr6;
    };
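As a self-contained sketch, the union can be paired with a small helper that returns the address length matching whichever family is stored in it. The name `sockaddr_ip_len()` is hypothetical and not part of any RDS API. The `static_assert` documents the assumption, relied on by the family check later in this article, that the family field sits at the same offset in `struct sockaddr_in` and `struct sockaddr_in6`.

```c
#include <assert.h>      /* static_assert (C11) */
#include <stddef.h>      /* offsetof */
#include <string.h>      /* memset */
#include <netinet/in.h>  /* struct sockaddr_in, struct sockaddr_in6 */
#include <sys/socket.h>  /* AF_INET, AF_INET6, AF_UNSPEC, socklen_t */

union sockaddr_ip {
    struct sockaddr_in      addr4;
    struct sockaddr_in6     addr6;
};

/* Reading addr4.sin_family, whichever member was filled in, only works
 * because both family fields live at the same offset. */
static_assert(offsetof(struct sockaddr_in, sin_family) ==
              offsetof(struct sockaddr_in6, sin6_family),
              "sin_family and sin6_family must overlap");

/* Hypothetical helper: the length to pass to bind()/sendto() for the
 * family stored in the union, or 0 if it is neither IPv4 nor IPv6. */
static socklen_t sockaddr_ip_len(const union sockaddr_ip *ip)
{
    switch (ip->addr4.sin_family) {
    case AF_INET:
        return sizeof(struct sockaddr_in);
    case AF_INET6:
        return sizeof(struct sockaddr_in6);
    default:
        return 0;
    }
}
```

After getaddrinfo() has filled in the union with an IPv4 or IPv6 address, `sockaddr_ip_len()` should agree with the `ai_addrlen` it reported, so the length no longer has to be carried around separately.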

This union can store either an IPv4 or an IPv6 address. Suppose an application needs to use a user-supplied local address to communicate with a user-supplied peer address. It can use the following code to parse the supplied strings:

    char *user_supplied_laddr, *user_supplied_paddr;
    struct addrinfo *ainfo;
    union sockaddr_ip local_addr, peer_addr;
    socklen_t local_addrlen, peer_addrlen;

    if (getaddrinfo(user_supplied_laddr, NULL, NULL, &ainfo) != 0) {
        /* Error handling code */
    } else {
        /* Just use the first one returned. */
        local_addrlen = ainfo->ai_addrlen;
        memcpy(&local_addr, ainfo->ai_addr, local_addrlen);
        freeaddrinfo(ainfo);
        ...
    }

    if (getaddrinfo(user_supplied_paddr, NULL, NULL, &ainfo) != 0) {
        /* Error handling code */
    } else {
        /* Just use the first one returned. */
        peer_addrlen = ainfo->ai_addrlen;
        memcpy(&peer_addr, ainfo->ai_addr, peer_addrlen);
        freeaddrinfo(ainfo);
        ...
    }

    /* The following checks for an address family mismatch.  Note that the
     * address family field is at the same position in all socket address
     * structures.  Hence we can use sin_family to do the check.
     */
    if (local_addr.addr4.sin_family != peer_addr.addr4.sin_family) {
        /* Error handling code */
    }

The getaddrinfo(3) function understands both IPv4 and IPv6 addresses and fills in the socket address structure accordingly.
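The two parsing steps above can be folded into one helper. This is a minimal sketch: `parse_ip()` is a hypothetical name, and unlike the code above it passes hints with `AI_NUMERICHOST`, so only numeric addresses are accepted and no DNS lookup is made. Drop that flag if host names should be resolved as well.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* expose getaddrinfo() under strict -std modes */
#endif
#include <assert.h>
#include <netdb.h>           /* getaddrinfo, freeaddrinfo, AI_NUMERICHOST */
#include <string.h>          /* memset, memcpy */
#include <netinet/in.h>
#include <sys/socket.h>

union sockaddr_ip {
    struct sockaddr_in      addr4;
    struct sockaddr_in6     addr6;
};

/* Hypothetical helper: parse a numeric IPv4 or IPv6 address string into
 * *out.  Returns 0 on success, or a getaddrinfo() EAI_* code on failure. */
static int parse_ip(const char *str, union sockaddr_ip *out, socklen_t *outlen)
{
    struct addrinfo hints, *ainfo;
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;       /* accept either IPv4 or IPv6 */
    hints.ai_flags = AI_NUMERICHOST;   /* assumption: no name resolution */

    rc = getaddrinfo(str, NULL, &hints, &ainfo);
    if (rc != 0)
        return rc;

    /* Just use the first result, as the code above does. */
    assert(ainfo->ai_addrlen <= sizeof(*out));
    memset(out, 0, sizeof(*out));
    memcpy(out, ainfo->ai_addr, ainfo->ai_addrlen);
    *outlen = ainfo->ai_addrlen;
    freeaddrinfo(ainfo);
    return 0;
}
```

With this helper, the two getaddrinfo() blocks reduce to two calls followed by the same family mismatch check.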

Before communicating with the peer, the RDS socket must first be bound, as follows:

    if (bind(sd, (struct sockaddr *)&local_addr, local_addrlen) != 0) {
        /* Error handling code */
    }

If the user supplied an IPv4 address, the RDS socket becomes an IPv4 RDS socket; if the user supplied an IPv6 address, it becomes an IPv6 RDS socket. In other words, an RDS socket's address family is fixed at bind time. Note also that an RDS socket can be bound only once; a second bind() fails. So once the address family of an RDS socket is set, it cannot be changed, and the socket can communicate only with peers in the same family. This is the reason for the family mismatch check above.
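A minimal sketch of the bind step, assuming the union defined earlier: the hypothetical `bind_ip()` derives the length from the stored family, so the caller cannot pass a length that disagrees with it. Nothing in it is RDS-specific, so it can also be tried with a UDP socket on the loopback address on a machine without RDS; Linux UDP sockets likewise reject a second bind() with `EINVAL`.

```c
#include <arpa/inet.h>   /* htonl, for callers filling in addresses */
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

union sockaddr_ip {
    struct sockaddr_in      addr4;
    struct sockaddr_in6     addr6;
};

/* Hypothetical helper: bind sd to *addr using the length that matches
 * the stored family.  Returns 0, or -1 with errno set. */
static int bind_ip(int sd, const union sockaddr_ip *addr)
{
    socklen_t len;

    assert(addr != NULL);
    switch (addr->addr4.sin_family) {
    case AF_INET:
        len = sizeof(addr->addr4);
        break;
    case AF_INET6:
        len = sizeof(addr->addr6);
        break;
    default:
        errno = EAFNOSUPPORT;   /* this sketch's choice of error */
        return -1;
    }
    return bind(sd, (const struct sockaddr *)addr, len);
}
```

On an RDS socket, the family stored in `*addr` at this call is what makes the socket an IPv4 or an IPv6 RDS socket.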

The application can now talk to the peer as follows:

    char *msg;
    size_t msg_len;

    if (sendto(sd, msg, msg_len, 0, (struct sockaddr *)&peer_addr, peer_addrlen) < 0) {
        /* Error handling code */
    }
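The same-family rule can be enforced before sendto() is called. In this sketch, the hypothetical `send_to_peer()` takes the family the socket was bound with (for example, recorded by the application after bind()) and refuses a peer of a different family. Returning `EAFNOSUPPORT` is this sketch's own choice; we do not claim it is what the RDS kernel code returns for such a send.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>   /* ssize_t */

union sockaddr_ip {
    struct sockaddr_in      addr4;
    struct sockaddr_in6     addr6;
};

/* Hypothetical helper: send one message to *peer, but only if the peer
 * is in the family the socket was bound with. */
static ssize_t send_to_peer(int sd, int bound_family,
                            const union sockaddr_ip *peer,
                            const void *msg, size_t msg_len)
{
    socklen_t peer_len;

    assert(peer != NULL);
    if ((bound_family != AF_INET && bound_family != AF_INET6) ||
        peer->addr4.sin_family != bound_family) {
        errno = EAFNOSUPPORT;   /* this sketch's choice of error */
        return -1;
    }
    peer_len = (bound_family == AF_INET6) ? sizeof(peer->addr6)
                                          : sizeof(peer->addr4);
    return sendto(sd, msg, msg_len, 0,
                  (const struct sockaddr *)peer, peer_len);
}
```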

If the application wants to use the same socket to talk to another peer, it can do so, as long as the new peer address is in the same family as the bound address. As with other sockets, getsockname() can be used to obtain the address the socket is bound to:

    union sockaddr_ip ipaddr;
    socklen_t addrlen = sizeof(ipaddr);   /* in/out argument: must be initialized */

    if (getsockname(sd, (struct sockaddr *)&ipaddr, &addrlen) < 0) {
        /* Error handling code */
    }

If the socket is not yet bound, the returned address family is AF_UNSPEC, which can be used to determine whether the socket is bound. Note the use of union sockaddr_ip here, as the socket can be bound to either an IPv4 or an IPv6 address.
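Wrapping the call above gives a small check of a socket's current family. The helper name `bound_family()` is hypothetical. Following the description above, it would return `AF_UNSPEC` for an unbound RDS socket; other socket types, such as UDP, may report their own family even before bind(), so that particular test is specific to RDS.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

union sockaddr_ip {
    struct sockaddr_in      addr4;
    struct sockaddr_in6     addr6;
};

/* Hypothetical helper: the family of the address sd is bound to
 * (AF_INET, AF_INET6, or AF_UNSPEC for an unbound RDS socket), or -1 if
 * getsockname() fails. */
static int bound_family(int sd)
{
    union sockaddr_ip ipaddr;
    socklen_t addrlen = sizeof(ipaddr);   /* in/out: must be initialized */

    memset(&ipaddr, 0, sizeof(ipaddr));
    if (getsockname(sd, (struct sockaddr *)&ipaddr, &addrlen) < 0)
        return -1;
    return ipaddr.addr4.sin_family;
}
```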

In summary, changing an RDS application to use IPv6 addresses only requires changing the socket address structure used to store an IP address and the code that fills in that structure. Once the address structure is changed, all the other socket calls remain essentially the same as before.
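Putting the steps together, here is a sketch (not code taken from rds-tools) of a hypothetical `rds_open_bound()` that parses both addresses, rejects a family mismatch before creating the socket, and then creates and binds the RDS socket. Creating the socket needs a kernel with RDS support (typically the `rds` module loaded); without it, socket() fails and the function returns -1 with socket()'s errno.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <netdb.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>      /* AF_RDS, SOCK_SEQPACKET */

union sockaddr_ip {
    struct sockaddr_in      addr4;
    struct sockaddr_in6     addr6;
};

/* Parse a numeric IPv4/IPv6 string; returns 0 or an EAI_* code. */
static int parse_ip(const char *str, union sockaddr_ip *out, socklen_t *len)
{
    struct addrinfo hints, *ainfo;
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_flags = AI_NUMERICHOST;
    rc = getaddrinfo(str, NULL, &hints, &ainfo);
    if (rc != 0)
        return rc;
    assert(ainfo->ai_addrlen <= sizeof(*out));
    memset(out, 0, sizeof(*out));
    memcpy(out, ainfo->ai_addr, ainfo->ai_addrlen);
    *len = ainfo->ai_addrlen;
    freeaddrinfo(ainfo);
    return 0;
}

/* Hypothetical helper: return an RDS socket bound to laddr, with paddr
 * parsed into *peer for later sendto() calls; -1 with errno on failure. */
static int rds_open_bound(const char *laddr, const char *paddr,
                          union sockaddr_ip *peer, socklen_t *peer_len)
{
    union sockaddr_ip local;
    socklen_t local_len;
    int sd, saved;

    if (parse_ip(laddr, &local, &local_len) != 0 ||
        parse_ip(paddr, peer, peer_len) != 0) {
        errno = EINVAL;
        return -1;
    }
    /* The bound family decides which peers the socket can reach. */
    if (local.addr4.sin_family != peer->addr4.sin_family) {
        errno = EAFNOSUPPORT;   /* this sketch's choice of error */
        return -1;
    }
    sd = socket(AF_RDS, SOCK_SEQPACKET, 0);
    if (sd < 0)
        return -1;
    if (bind(sd, (struct sockaddr *)&local, local_len) != 0) {
        saved = errno;
        close(sd);
        errno = saved;
        return -1;
    }
    return sd;
}
```

A caller would then pass `peer` and `*peer_len` to sendto() as shown earlier.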

rds-tools package

The rds-tools package includes three tools: rds-info, rds-ping and rds-stress. All of them have been updated to support IPv6. The following are examples of their usage.

rds-ping

Suppose that on host A:

[host-a]> ip addr show ib0
9: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast state UP qlen 4096
    link/infiniband 80:00:02:08:fe:80:00:00:00:00:00:00:00:02:c9:03:00:0a:79:fd brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    inet6 2010:211::12/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::202:c903:a:79fd/64 scope link
       valid_lft forever preferred_lft forever

And on host B:

[host-b]> ip addr show ib0
6: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast state UP qlen 4096
    link/infiniband 80:00:02:08:fe:80:00:00:00:00:00:00:00:02:c9:03:00:0a:75:a5 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    inet6 2010:211::22/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::202:c903:a:75a5/64 scope link
       valid_lft forever preferred_lft forever

We can use rds-ping to test RDS IPv6 reachability between them by running:

[host-a]# rds-ping fe80::202:c903:a:75a5%ib0
   1: 160 usec
   2: 155 usec
   ...

fe80::202:c903:a:75a5 is host B's IPv6 link-local address. Since a link-local address must be associated with an interface, we append %ib0 (or %9, ib0's interface index in this case) to the address to specify that ib0 should be used to reach it. From host B, we can also run:

[host-b]# rds-ping 2010:211::12
   1: 162 usec
   2: 153 usec
   ...

In this case, we test host A's global address. There is no need to specify an interface for a global IPv6 address.
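The `%ib0` / `%9` suffix is the usual scoped-address notation for IPv6, and getaddrinfo() stores the resulting interface index in `sin6_scope_id`. We have not checked how rds-ping itself parses its argument; the sketch below, built around a hypothetical helper `ipv6_scope_id()`, only shows how a C application could obtain the scope. A numeric suffix such as `%9` is taken as the interface index, and a global address gets a scope of 0.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <netdb.h>
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: parse a numeric IPv6 string such as
 * "fe80::202:c903:a:75a5%ib0" and store its sin6_scope_id in *scope.
 * Returns 0 on success, -1 if the string is not a numeric IPv6 address. */
static int ipv6_scope_id(const char *str, uint32_t *scope)
{
    struct addrinfo hints, *ainfo;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET6;        /* IPv6 only */
    hints.ai_flags = AI_NUMERICHOST;   /* no DNS lookups */
    if (getaddrinfo(str, NULL, &hints, &ainfo) != 0)
        return -1;
    assert(ainfo->ai_family == AF_INET6);
    *scope = ((const struct sockaddr_in6 *)ainfo->ai_addr)->sin6_scope_id;
    freeaddrinfo(ainfo);
    return 0;
}
```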

rds-info

The default rds-info output is unchanged: to ensure backward compatibility, no IPv6 connection information is displayed by default. To display IPv6 connection information, use the "-a" option, which shows both IPv4 and IPv6 information. Continuing the rds-ping example, after running rds-ping as above on host A, we can use rds-info to check the RDS connection status:

[host-a]# /tmp/rds-info -Ina

RDS IB Connections:
    LocalAddr              RemoteAddr             Tos  SL  LocalDev             RemoteDev
    2010:211::12           2010:211::22           0    0   fe80::2:c903:a:79fd  fe80::2:c903:a:75a5
    fe80::202:c903:a:79fd  fe80::202:c903:a:75a5  0    0   fe80::2:c903:a:79fd  fe80::2:c903:a:75a5

RDS Connections:
    LocalAddr              RemoteAddr             Tos NextTX NextRX  Flgs
    2010:211::12           2010:211::22           0   5      5       --C-
    fe80::202:c903:a:79fd  fe80::202:c903:a:75a5  0   11     11      --C-

The output above shows two RDS connections on host A: one between the global addresses of hosts A and B, and the other between their link-local addresses. Because these connections use the InfiniBand interfaces, there are also two IB connections. Without the "-a" option, nothing would be shown, as all of this information is IPv6.

rds-stress

The address options "-s" and "-r" can now take IPv6 addresses. For example:

[host-a]> rds-stress -r 2010:211::12
waiting for incoming connection on 2010:211::12:4000
accepted connection from 2010:211::22::41735
negotiated options, tasks will start in 2 seconds
Starting up....
 ...

[host-b]> rds-stress -r 2010:211::22 -s 2010:211::12
connecting to 2010:211::12:4000
negotiated options, tasks will start in 2 seconds
Starting up....
 ...

The updated rds-stress can communicate with both old and updated versions of rds-stress.

Greg Marsden

