News, tips, partners, and perspectives for the Oracle Linux operating system and upstream Linux kernel work

Tips and Tricks for IPsec on Intel 10 GbE NICs

Shannon Nelson is a Linux kernel driver expert and kernel developer who has been looking at accelerating IPsec performance. In this blog post, he shows how to reduce the overhead of running with IPsec enabled.

IPsec has been gaining in popularity, but is quite a hit against network throughput, making multi-Gigabit network connections slow to Megabit speeds.  With support for IPsec hardware offload recently added to the Linux kernel's network stack, Oracle has added IPsec offload support to the kernel driver for Intel's 10 GbE family of NICs, bringing throughput back into the multi-Gigabit range.

IPsec Offload In Linux

IPsec (Internet Protocol Security), for encrypting network traffic, has been gaining in popularity as cloud-supported networks have grown.  However, it is quite a hit against network data throughput. Enabling full message encryption can easily take a 10 GbE link down to the 200 Mbps range, and suck down a lot of server CPU cycles in the process.

While other operating systems have supported offloading IPsec encryption to hardware for some time, the Linux kernel has only recently added it.  The initial patches to expand the XFRM framework were accepted into the 4.11 kernel in the spring of 2017 [1], and were first used by the Mellanox mlx5e network driver.  Some background for this work can be found in the IPsec presentations at the recent Netdevconf conferences [2] [3].  Similar work has also been done in DPDK implementations, but these bypass the Linux kernel and are not useful for normal applications [4].

Intel's current family of 10 GbE network devices originally came out in 2007 with the 82598, but hardware support for IPsec offload didn't appear until the 82599 (aka x540) was released in 2009.  Support for this hardware offload was added into the Microsoft Windows mini-driver at that time, but it was left unimplemented in the Linux driver.

The NICs are capable of offloading AES-128-GMAC and AES-128-GCM, and can offload 1024 Security Associations (SAs) for each of the Tx and Rx directions.  Only 128 incoming IP addresses can be specified, but several Rx SAs can share an IP address.  To make the Rx decode faster, special Content Addressable Memory is used for the Rx SA tables.

Oracle Activity

Oracle provides platforms that use Intel's 10 GbE device, so it is in our best interest to be sure that our customers have access to the security and performance they need to be successful.  Given the recent Linux kernel support, we embarked on adding support for the IPsec hardware offload in ixgbe, the driver for Intel's 10 GbE NICs.  Intel had done some early work to add this feature to their driver as the kernel support was being developed in 2016, with encouragement from Oracle developers, but their effort got sidetracked by other priorities.  We were able to build from this work as a head-start to a working implementation.

Theory of Operations

When the ixgbe driver is loaded and sets up its network data structures, it sets the NETIF_F_HW_ESP netdev feature flag to signal support for the IPsec offload, and initializes the xfrmdev_ops callbacks.  It also clears the hardware tables and sets up the software shadow tables, but leaves the offload engine disabled until the first Security Association is added in order to save on the chip's power requirements.  The software shadow tables track the hardware table contents for faster searches and for table reloads on hardware resets.
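
One way to confirm that the driver has advertised the offload is to look at the feature list that ethtool reports.  The sketch below assumes that the NETIF_F_HW_ESP flag shows up as esp-hw-offload in the 'ethtool -k' output; the helper name esp_offload_enabled is made up for this example and is not part of any tool.

```shell
#!/bin/sh
# Reads `ethtool -k <dev>` output on stdin and succeeds only if the
# ESP hardware offload feature is reported as "on".
# Assumption: the feature is listed under the name "esp-hw-offload".
esp_offload_enabled() {
    grep -Eq '^[[:space:]]*esp-hw-offload:[[:space:]]*on'
}

# Typical use on a live system (eth4 is the port used in this post):
#   ethtool -k eth4 | esp_offload_enabled && echo "IPsec offload advertised"
```

Because the helper only parses text, it can be tried against saved ethtool output without touching the NIC.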

As the user adds and removes the SAs and their encryption keys, the driver's xdo_dev_state_add and xdo_dev_state_delete functions are called to update the hardware tables.  When the last SA is removed, the offload engine is disabled, again to save on power requirements.  SAs can be managed on the Linux command line via the 'ip' command, or through third-party applications such as StrongSwan, LibreSwan, and others.
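
To see how many of the installed SAs were actually handed to the hardware, the output of 'ip xfrm state' can be scanned for its offload annotation.  This is a minimal sketch that assumes the installed iproute2 prints a "crypto offload parameters:" line under each offloaded SA; the function name count_offloaded_sas is hypothetical.

```shell
#!/bin/sh
# Reads `ip xfrm state` output on stdin and prints the number of SAs
# that carry an offload annotation.
# Assumption: offloaded SAs are marked with a line containing
# "crypto offload parameters:".
count_offloaded_sas() {
    awk '/crypto offload parameters:/ { n++ } END { print n + 0 }'
}

# Typical use on a live system:
#   ip xfrm state | count_offloaded_sas
```

A count of zero after adding SAs with the offload keyword would suggest that the request was not accepted by the driver.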

A "simple" pair of 'ip' commands to encrypt TCP traffic to and from a server through network port eth4 might look something like this:

ip xfrm policy add dir out src dst \
    proto tcp tmpl proto esp src dst \
    spi 0x07 mode transport reqid 0x07
ip xfrm policy add dir in src dst \
    proto tcp tmpl proto esp dst src \
    spi 0x07 mode transport reqid 0x07
ip xfrm state add proto esp src dst \
    spi 0x07 mode transport reqid 0x07 replay-window 32 \
    aead 'rfc4106(gcm(aes))' 0x44434241343332312423222114131211f4f3f2f1 128 \
    sel src dst proto tcp \
    offload dev eth4 dir out
ip xfrm state add proto esp dst src \
    spi 0x07 mode transport reqid 0x07 replay-window 32 \
    aead 'rfc4106(gcm(aes))' 0x44434241343332312423222114131211f4f3f2f1 128 \
    sel src dst proto tcp \
    offload dev eth4 dir in
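
The key material in the commands above is a fixed sample value.  For rfc4106(gcm(aes)) with a 128-bit key, the 'aead' argument takes 20 bytes: the 16-byte AES key followed by a 4-byte salt, and the trailing 128 is the ICV length in bits.  A rough sketch for producing random key material of that shape follows; the function name gen_rfc4106_key is made up for this example.

```shell
#!/bin/sh
# Prints "0x" followed by 40 hex digits: 16 bytes of AES-128 key
# plus 4 bytes of salt, read from the kernel's random source.
gen_rfc4106_key() {
    printf '0x'
    # -N20 stops after 20 bytes; tr strips od's spacing and newlines.
    od -An -v -tx1 -N20 /dev/urandom | tr -d ' \n'
    printf '\n'
}

gen_rfc4106_key
```

Both hosts must be given the same value, so the key would be generated once and copied to the peer over a trusted channel.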


A similar set of commands would be required on the remote host, but with the src and dst parameters swapped.  Both sides need to have the hardware offload enabled in order to get the throughput benefit.
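
The swap between the two hosts can be sketched as a small generator that prints, rather than runs, the four commands for one side.  The addresses 192.0.2.1 and 192.0.2.2 are placeholder documentation addresses, not the ones used in our test setup, and the function name print_xfrm_cmds is hypothetical; the SPI, reqid, and sample key are the ones from the example above.

```shell
#!/bin/sh
# Prints the policy and state commands for one host.
# $1 = this host's address, $2 = the peer's address,
# $3 = NIC to offload to, $4 = key material (key + salt, hex)
print_xfrm_cmds() {
    me=$1; peer=$2; dev=$3; key=$4
    sa="spi 0x07 mode transport reqid 0x07"
    aead="aead 'rfc4106(gcm(aes))' $key 128"

    # Policies: protect TCP leaving this host, expect ESP on TCP arriving.
    printf 'ip xfrm policy add dir out src %s dst %s proto tcp tmpl proto esp src %s dst %s %s\n' \
        "$me" "$peer" "$me" "$peer" "$sa"
    printf 'ip xfrm policy add dir in src %s dst %s proto tcp tmpl proto esp src %s dst %s %s\n' \
        "$peer" "$me" "$peer" "$me" "$sa"

    # States: one SA per direction, each bound to the NIC's offload engine.
    printf 'ip xfrm state add proto esp src %s dst %s %s replay-window 32 %s sel src %s dst %s proto tcp offload dev %s dir out\n' \
        "$me" "$peer" "$sa" "$aead" "$me" "$peer" "$dev"
    printf 'ip xfrm state add proto esp src %s dst %s %s replay-window 32 %s sel src %s dst %s proto tcp offload dev %s dir in\n' \
        "$peer" "$me" "$sa" "$aead" "$peer" "$me" "$dev"
}

KEY=0x44434241343332312423222114131211f4f3f2f1   # sample key from this post

echo "# commands for host A"
print_xfrm_cmds 192.0.2.1 192.0.2.2 eth4 "$KEY"
echo "# commands for host B (addresses swapped)"
print_xfrm_cmds 192.0.2.2 192.0.2.1 eth4 "$KEY"
```

Printing the commands first makes it easy to review them before pasting them into a root shell on each host.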

When a network packet is to be encrypted before sending, the packet's skb (socket buffer) is given to the driver and has a pointer to the SA information.  The network stack has already inserted the encryption headers into the data packet, but without filling in the final encryption information.  The driver sets up the hardware to do the encryption using the specific SA - in the ixgbe case, the driver sets up a special Tx Context Descriptor that contains the encryption information - and the driver hands the packets to the NIC hardware.  The encryption engine uses the indicated encryption key to encode the packet data, fills in the rest of the header data, and sends the packet on its way.

On receipt of a packet, the decryption engine looks at the packet header to see if it matches any of the Rx SAs that have been loaded.  If so, the key is used to decode the packet and the driver is informed there was a decryption.  The driver fills out a new packet skb with the decryption information and hands it up the kernel stack.  The XFRM receive code then strips off the extra headers before routing the packet to the destination user program.
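
A quick way to check that packets are really taking the hardware path is to watch the driver's statistics while traffic is flowing.  The sketch below assumes that the driver exports per-port counters with "ipsec" in their names through 'ethtool -S'; the names tx_ipsec and rx_ipsec in the sample are illustrative and may differ by driver version, and the function name ipsec_counters is hypothetical.

```shell
#!/bin/sh
# Reads `ethtool -S <dev>` output on stdin and prints every counter
# whose name mentions ipsec, one name=value pair per line.
ipsec_counters() {
    awk -F: '/ipsec/ {
        gsub(/[[:space:]]/, "", $1)   # counter name
        gsub(/[[:space:]]/, "", $2)   # counter value
        print $1 "=" $2
    }'
}

# Sample input standing in for real ethtool output:
printf '     rx_packets: 10\n     tx_ipsec: 5\n     rx_ipsec: 7\n' | ipsec_counters

# Typical use on a live system:
#   ethtool -S eth4 | ipsec_counters
```

If such counters stay at zero while the SAs are in use, the traffic is most likely still being handled by the software IPsec path.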

Current Status

At this writing, the driver's offload feature has been submitted to Intel's driver code tree and is expected to be pushed to the upstream net-next tree soon, targeting release in the v4.16 kernel.  The feature should be supported in the upcoming UEK5 distribution from Oracle.

In a simple TCP stream test on a pair of Oracle X5-2 systems over the 10 GbE NICs, the data throughput goes from around 330 Mbps using the default software IPsec to around 7 Gbps with the hardware offload enabled on both ends.  The checksum and TSO offloads are not yet implemented in conjunction with IPsec offload; the throughput should get close to line rate once these are completed.

Many thanks go to the Intel folks, especially Jesse Brandeburg, for their support in better understanding the hardware operations, and to Steffen Klassert and the XFRM folks for their help with using the XFRM framework.


1. xfrm: Add an IPsec hardware offloading API https://patchwork.ozlabs.org/patch/752710/

2. Netdevconf 1.2 IPsec workshop https://netdevconf.org/1.2/session.html?steffen-klassert

3. Netdevconf 2.2 IPsec workshop https://netdevconf.org/2.2/session.html?klassert-ipsec-workshop

4. Efficient serving of VPN endpoints on COTS server hardware https://www.net.in.tum.de/fileadmin/bibtex/publications/papers/CloudNet2016.pdf

Comments ( 1 )
  • Shannon Nelson Tuesday, April 3, 2018
    Since this writing, the patches to support TSO and checksum offloads in conjunction with IPsec offload have been pulled into Dave Miller's net-next tree and should appear in Linux v4.17. These new patches get us up to nearly line rate in a simple iperf test.