Oracle Linux kernel developer Alan Maguire continues our blog series on BPF, in which he presents an in-depth look at the kernel's "Berkeley Packet Filter", a useful and extensible kernel function for much more than packet filtering.
In the previous BPF blog entry, I warned against enabling generic segmentation offload (GSO) when using tc-bpf. The purpose of this blog entry is to describe how it can in fact be used, even for cases where BPF programs add encapsulation. One caveat, however: this is only true for the 5.2 kernel and later.
Here we will describe GSO briefly, explain why it matters for BPF, and finally demonstrate how new flags added to the bpf_skb_adjust_room helper make it possible to use GSO when encapsulation is added.
Generic segmentation offload builds on a hardware concept: Linux passes down a large packet, termed a megapacket, which the hardware then dices up into individual MTU-sized frames for transmission. In hardware this is termed TSO (TCP segmentation offload); GSO generalizes it beyond TCP to UDP, tunnels, etc., and performs the segmentation in software. The performance benefits are still significant even in software, and we find ourselves reaching line rate on 10Gb/s and faster NICs, even with an MTU of 1500 bytes. The benefits accrue because many of the per-packet costs in the networking stack are paid once by a single megapacket, rather than by each of the dozens of smaller MTU-sized packets that would otherwise traverse the stack.
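To put a rough number on "dozens", here is a minimal sketch of the arithmetic. The helper name gso_segment_count is hypothetical (it is not a kernel function), and the figures in the comments are illustrative assumptions: a 1500-byte MTU with 20-byte IPv4 and 20-byte TCP headers and no TCP options gives a 1460-byte segment size.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel API: how many segments of at most
 * `mss` payload bytes are needed to carry `payload_len` bytes.
 * This is plain ceiling division.
 */
static uint32_t gso_segment_count(uint32_t payload_len, uint32_t mss)
{
	if (mss == 0)
		return 0;	/* avoid dividing by zero on bad input */
	return (payload_len + mss - 1) / mss;
}

/*
 * Illustrative numbers (assumptions, not measurements):
 *   MTU 1500 - 20 (IPv4) - 20 (TCP, no options) = 1460 bytes per segment.
 *   A 64000-byte megapacket payload then needs
 *   gso_segment_count(64000, 1460) = 44 segments.
 */
```

Under those assumptions, work done once on the megapacket replaces work that would otherwise be repeated for each of the 44 individual packets.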
Now consider the BPF case. If we are doing processing or encapsulation in BPF, we are adding per-packet overhead. This overhead could come in the form of map lookups, adding encapsulation, etc. The beautiful thing about GSO is that it happens after tc-bpf processing, so any costs we accrue in BPF are only paid for the megapacket, rather than for each individual MTU-sized packet. As such, switching GSO on is highly desirable.
There is a problem, however. For GSO to work on encapsulated packets, the packets must mark their inner encapsulated headers. This is done for native tunnels via the skb_set_inner_[mac|transport]_header() functions, but if we added encapsulation in BPF, there was previously no way to mark the inner headers accordingly.
In BPF we carry out the marking of inner headers via flags passed to bpf_skb_adjust_room(). For usage examples, I would recommend looking at tools/testing/selftests/bpf/progs/test_tc_tunnel.c.
GRE and UDP tunnels are supported via the BPF_F_ADJ_ROOM_ENCAP_L4_GRE and BPF_F_ADJ_ROOM_ENCAP_L4_UDP flags. For L3, we need to specify if the inner header is IPv4 or IPv6 via the BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 and BPF_F_ADJ_ROOM_ENCAP_L3_IPV6 flags. Finally, if we have L2 encapsulation such as an inner Ethernet header or MPLS label(s), we need to pass in the inner L2 header length via BPF_F_ADJ_ROOM_ENCAP_L2(inner_maclen). So we simply OR together the flags that specify the encapsulation we are adding.
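As a rough illustration of ORing the flags together, the sketch below builds the flags value in plain C. The flag values are local copies of what I believe include/uapi/linux/bpf.h defines, reproduced here only so the snippet compiles on its own; a real BPF program should include <linux/bpf.h> instead. The encap_flags() helper is hypothetical and is not part of the kernel or the selftests.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumption: local copies of the bpf_skb_adjust_room() encapsulation
 * flags as defined in include/uapi/linux/bpf.h. Use the real header in
 * an actual BPF program rather than these copies.
 */
#define BPF_F_ADJ_ROOM_ENCAP_L3_IPV4	(1ULL << 1)
#define BPF_F_ADJ_ROOM_ENCAP_L3_IPV6	(1ULL << 2)
#define BPF_F_ADJ_ROOM_ENCAP_L4_GRE	(1ULL << 3)
#define BPF_F_ADJ_ROOM_ENCAP_L4_UDP	(1ULL << 4)
#define BPF_ADJ_ROOM_ENCAP_L2_MASK	0xffULL
#define BPF_ADJ_ROOM_ENCAP_L2_SHIFT	56
#define BPF_F_ADJ_ROOM_ENCAP_L2(len) \
	(((uint64_t)(len) & BPF_ADJ_ROOM_ENCAP_L2_MASK) << BPF_ADJ_ROOM_ENCAP_L2_SHIFT)

/*
 * Hypothetical helper: OR together the flags describing the
 * encapsulation being added.
 *   l3_ipv6      - nonzero selects the IPv6 L3 flag, zero selects IPv4
 *   l4_udp       - nonzero selects the UDP L4 flag, zero selects GRE
 *   inner_maclen - inner L2 header length in bytes (0 if there is none)
 */
static uint64_t encap_flags(int l3_ipv6, int l4_udp, unsigned int inner_maclen)
{
	uint64_t flags = 0;

	/* The L2 length is stored in an 8-bit field, so it must fit in 255. */
	assert(inner_maclen <= 0xff);

	flags |= l3_ipv6 ? BPF_F_ADJ_ROOM_ENCAP_L3_IPV6
			 : BPF_F_ADJ_ROOM_ENCAP_L3_IPV4;
	flags |= l4_udp ? BPF_F_ADJ_ROOM_ENCAP_L4_UDP
			: BPF_F_ADJ_ROOM_ENCAP_L4_GRE;
	flags |= BPF_F_ADJ_ROOM_ENCAP_L2(inner_maclen);
	return flags;
}
```

For example, encap_flags(0, 1, 14) describes a UDP tunnel with IPv4 at L3 and a 14-byte inner Ethernet header. In a tc-bpf program the resulting value would be passed as the final flags argument of bpf_skb_adjust_room(), alongside the skb, the length of the headers being added, and the adjustment mode; test_tc_tunnel.c shows the complete call in context.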
Generic Segmentation Offload and BPF work beautifully together, but BPF encapsulation presented a difficulty since GSO did not know that the encapsulation had been added. With 5.2 and later kernels, this problem is now solved!
Be sure to visit the previous installments of this series on BPF, here, and stay tuned for our next blog posts!
1. BPF program types
2. BPF helper functions for those programs
3. BPF userspace communication
4. BPF program build environment
5. BPF bytecodes and verifier
6. BPF Packet Transformation