eBPF (extended Berkeley Packet Filter) has revolutionized Linux observability and security by enabling user-defined programs to run directly within the kernel. This allows for high-performance, secure, and flexible network traffic processing. XDP (eXpress Data Path), in particular, has emerged as a powerful tool in the eBPF ecosystem: it filters and processes network packets very efficiently by dropping malicious traffic at the earliest stage, before it reaches the kernel's networking stack.

While TC (Traffic Control) and cgroup-based BPF solutions provide flexible and granular control over network traffic, XDP stands out for its performance. It operates at the driver level, ensuring that packets are handled with minimal overhead, which makes it ideal for high-throughput environments where low latency is critical.

In this guide, we focus on XDP and how to leverage its capabilities to build a high-performance firewall. We will cover how to:

  • Filter and drop unwanted traffic as soon as it enters the system
  • Detect SYN floods
  • Implement a lightweight firewall solution with minimal resource consumption

By diving into XDP, we will explore its advantages in terms of speed, security, and scalability. For a more comprehensive, full-stack approach, you can refer to our previous post, where we examined how to extend this firewall solution with TC and cgroup-based BPF.

Test Setup Details

  • IP address of the system: p.q.r.s
  • Blocked IP: w.x.y.z
  • Selected interface for testing: enp0s5
  • Tested on OL-UEK7 (5.15.0-310.184.5.2)
  • Required bpf and bcc packages
  • cgroup v2
  • pyroute2
  • Python 3.6.8. Tests may require tuning based on the Python version used.
  • The IPs have been intentionally masked; please update them as needed in the test setup.

XDP (eXpress Data Path)

XDP (eXpress Data Path) is a high-performance, low-latency packet processing framework that runs eBPF programs directly within the Linux kernel’s network driver layer. It handles packets as they enter the system, before they reach the networking stack, enabling ultra-low-latency operations like filtering, manipulation, or dropping packets at the earliest point. XDP offers an efficient alternative to traditional packet processing by bypassing higher networking layers, making it ideal for high-throughput, low-latency applications such as DDoS protection and rate limiting.

By hooking into the network device driver layer, XDP allows custom eBPF programs to be triggered when packets arrive at the network interface. These programs can inspect, modify, drop, or redirect packets based on specific logic, offering enhanced control over network traffic at an early stage in the packet processing path.

XDP Packet Control:

  • XDP_ABORTED: Drops the packet due to a program error and fires the xdp_exception tracepoint for debugging.
  • XDP_DROP: Immediately discards the packet at the driver level to minimize processing overhead.
  • XDP_PASS: Allows the packet to continue into the standard Linux networking stack for normal processing.
  • XDP_TX: Transmits the packet back out through the same network interface it arrived on.
  • XDP_REDIRECT: Forwards the packet to a different network interface, a different CPU, or a user-space AF_XDP socket.
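
For reference, each of these verdicts is a small integer returned by the XDP program. The sketch below mirrors the values of enum xdp_action from the kernel's linux/bpf.h header. The XdpAction class and the describe() helper are illustrative names of our own and are not part of BCC.

```python
from enum import IntEnum


class XdpAction(IntEnum):
    """Return codes of an XDP program (values as in enum xdp_action)."""
    XDP_ABORTED = 0   # program error: drop and fire the xdp_exception tracepoint
    XDP_DROP = 1      # discard the packet at the XDP hook
    XDP_PASS = 2      # hand the packet to the normal network stack
    XDP_TX = 3        # send it back out of the interface it arrived on
    XDP_REDIRECT = 4  # forward to another interface, CPU, or AF_XDP socket


def describe(code):
    """Map a raw return code to its symbolic name (hypothetical helper)."""
    try:
        return XdpAction(code).name
    except ValueError:
        return f"UNKNOWN({code})"


if __name__ == "__main__":
    for code in range(6):
        print(code, describe(code))
```

A helper like this can be handy when decoding raw action values from traces or counters in the Python control plane.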

XDP Modes Explained:

  1. Network Device Layer (Generic XDP):
    • Runs as a software fallback within the standard Linux network stack. It doesn’t require specific driver or hardware support and works with any network interface.
    • Slower compared to other XDP modes.
  2. Driver/Hardware Layer (Native XDP):
    • Integrated into the network driver, providing massive performance gains by making drop or redirect decisions early, saving significant CPU cycles.
    • Requires a network driver that has implemented the XDP hook.
  3. Offloaded to NIC (Offloaded XDP):
    • The packet is handled by the NIC hardware itself, offering the absolute highest performance and zero CPU overhead for filtered packets.
    • Requires specialized hardware (e.g., SmartNICs like Netronome or Mellanox).
  • Note: Native mode requires driver support, and Offloaded XDP requires hardware support. Please select the appropriate interface when using XDP in these modes.
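
The three modes map to attach flags, so a control plane can try the fastest mode first and fall back when the driver or hardware refuses it. The following is a minimal sketch of that idea. The flag values are those defined in linux/if_link.h, while attach_best_mode() and the attach callable it receives are hypothetical: in a real script the callable would wrap the library's attach call, which is not shown here.

```python
# Flag values as defined in linux/if_link.h
XDP_FLAGS_SKB_MODE = 1 << 1  # Generic XDP
XDP_FLAGS_DRV_MODE = 1 << 2  # Native XDP
XDP_FLAGS_HW_MODE = 1 << 3   # Offloaded XDP

# Fastest mode first
PREFERENCE = [
    ("hw", XDP_FLAGS_HW_MODE),
    ("drv", XDP_FLAGS_DRV_MODE),
    ("skb", XDP_FLAGS_SKB_MODE),
]


def attach_best_mode(attach, device):
    """Try each mode in order and return the name of the first that works.

    `attach` is any callable(device, flags) that raises on failure
    (an assumption of this sketch).
    """
    errors = {}
    for name, flags in PREFERENCE:
        try:
            attach(device, flags)
            return name
        except Exception as exc:
            errors[name] = exc
    raise RuntimeError(f"no XDP mode could be attached to {device}: {errors}")


if __name__ == "__main__":
    def fake_attach(device, flags):
        # Stand-in for a driver that supports native mode but not offload
        if flags == XDP_FLAGS_HW_MODE:
            raise OSError("Invalid argument")

    print(attach_best_mode(fake_attach, "enp0s5"))
```

Whichever mode succeeds should be remembered so that the same flag is used when detaching the program.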

XDP Probe:

  • In XDP SKB mode, the function do_xdp_generic() manages the probes and is invoked by __netif_receive_skb_core().
  • In XDP DRV, or native mode, the function bpf_dispatcher_xdp_func manages the probes. The invocation of bpf_dispatcher_xdp_func is driver-dependent; for example, in the mlx5 driver, the function mlx5e_xdp_handle calls the XDP handler, while in the igb driver, igb_run_xdp triggers the XDP handler.

Testing XDP Modes:

`xdp_check.py`
-------------

# LICENSE: GPLv2
from bcc import BPF
import argparse
import sys

program = """
int hello_xdp(struct xdp_md *ctx) {
    bpf_trace_printk("Packet intercepted!\\n");
    return XDP_PASS;
}
"""

parser = argparse.ArgumentParser(description="Attach XDP program to an interface")
parser.add_argument("interface", help="The network interface to attach to (e.g., eth0, vnet5)")
parser.add_argument("--mode", choices=["skb", "drv", "hw"], default="skb",
                    help="XDP mode: skb (Generic), drv (Native), or hw (Offload). Default is skb.")
args = parser.parse_args()
device = args.interface

# Map string modes to BCC constants
mode_map = {
    "skb": BPF.XDP_FLAGS_SKB_MODE,
    "drv": BPF.XDP_FLAGS_DRV_MODE,
    "hw":  BPF.XDP_FLAGS_HW_MODE
}
mode_flag = mode_map[args.mode]

def cleanup():
    """Forcefully removes any XDP program from the device."""
    print(f"\nCleaning up {device}...")
    try:
        BPF.remove_xdp(device, flags=mode_flag)
    except Exception:
        pass

cleanup()
try:
    b = BPF(text=program)
    fn = b.load_func("hello_xdp", BPF.XDP)

    print(f"Attaching XDP in {args.mode.upper()} mode to {device}...")
    b.attach_xdp(device, fn, flags=mode_flag)

    print("Success! Monitoring packets. Press Ctrl+C to stop.")
    b.trace_print()

except KeyboardInterrupt:
    print("\nDetected Ctrl+C.")
except Exception as e:
    print(f"An error occurred: {e}")
finally:
    cleanup()
    print("Exit successful.")

Results of the XDP mode test:

# python3 xdp_check.py vnet5 --mode skb

Cleaning up vnet5...
Attaching XDP in skb mode to vnet5...
Success! Monitoring packets. Press Ctrl+C to stop.
bpf_trace_printk: Packet intercepted!'
bpf_trace_printk: Packet intercepted!'

# python3 xdp_check.py  vnet5 --mode drv

Cleaning up vnet5...
Attaching XDP in drv mode to vnet5...
Success! Monitoring packets. Press Ctrl+C to stop.
bpf_trace_printk: Packet intercepted!'
bpf_trace_printk: Packet intercepted!'

# python3 xdp_check.py vnet5 --mode hw

Cleaning up vnet5...
Attaching XDP in hw mode to vnet5...
bpf: Attaching prog to vnet5: Invalid argument
An error occurred: Failed to attach BPF to device b'vnet5': Invalid argument

Cleaning up vnet5...
Exit successful.
  • XDP can process packets at the driver level (native mode) if the driver supports it. mlx5 supports this.
  • XDP can process packets in the generic network stack path in SKB mode, which works on any interface.
  • HW mode fails with "Invalid argument" because XDP offload to MLX is not yet available in Oracle Linux.

Recommended Practices for XDP Usage:

  • High-Performance, Low-Latency Environments: XDP is ideal for environments where performance is critical, such as high-speed network appliances, DDoS protection systems, or any scenario where low-latency packet filtering is required.
  • DDoS Protection: XDP can be used to drop unwanted traffic, such as flood attacks, at the earliest stage of packet processing, significantly reducing the impact on the system and allowing it to continue processing legitimate traffic with minimal resource consumption.
  • Rate-Limiting or Packet Filtering: If you need to implement packet rate-limiting, SYN flood prevention, or other forms of traffic control (e.g., dropping packets from a specific IP address in a 5-second window), XDP provides an efficient and scalable way to achieve this.
  • Packet Redirection: XDP is useful for scenarios where packets need to be redirected to another interface or processing path, such as in load balancing or network monitoring setups.
  • Stateful Firewalling and Deep Packet Inspection (DPI): XDP is not well-suited for these use cases.
  • XDP is strictly limited to incoming traffic. It cannot inspect or filter packets being sent out by the local host. For a full firewall that needs to control both incoming and outgoing data, XDP alone is not sufficient.
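
To make the rate-limiting idea concrete, here is a user-space model of a fixed-window, per-source limiter with a 5-second window. It is only a sketch of the logic: in an XDP program the per-source state would live in a BPF map keyed by source address, and the class name, limit, and timestamps below are illustrative assumptions.

```python
class FixedWindowLimiter:
    """Allow at most `limit` packets per source within each fixed window."""

    def __init__(self, limit, window=5.0):
        self.limit = limit
        self.window = window
        self.state = {}  # src_ip -> (window_start, packet_count)

    def allow(self, src_ip, now):
        """Return True to pass the packet, False to drop it."""
        start, count = self.state.get(src_ip, (now, 0))
        if now - start >= self.window:
            # The previous window has expired: start a new one
            start, count = now, 0
        count += 1
        self.state[src_ip] = (start, count)
        return count <= self.limit


if __name__ == "__main__":
    limiter = FixedWindowLimiter(limit=3)
    for t in [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]:
        verdict = "PASS" if limiter.allow("192.0.2.1", t) else "DROP"
        print(f"t={t:.0f}s {verdict}")
```

With a limit of 3, the fourth and fifth packets inside the first window are dropped, and the packet at t=5s opens a new window and passes again.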

Performance Impact of XDP

XDP is known for its low-latency, high-throughput performance. By processing packets at the earliest point in the kernel, within the network device driver, XDP can drop or redirect packets with minimal impact on system resources. This drastically reduces latency compared to traditional packet filtering methods, which often operate at higher layers of the networking stack. This helps build ultra-low-latency, high-throughput firewalls while reducing CPU utilization.

Firewall Based on XDP:

xdp_FireWall.py
---------------

# LICENSE: GPLv2
from bcc import BPF
import time
import ctypes
import socket
import struct

ebpf_code = """
#include <uapi/linux/bpf.h>
#include <uapi/linux/if_ether.h>
#include <uapi/linux/ip.h>
#include <uapi/linux/tcp.h>
#include <linux/in.h>

struct drop_evt_t {
    u32 src_ip;
};

BPF_HASH(black_list, u32, u8);
BPF_PERF_OUTPUT(drop_events);
BPF_ARRAY(stats, u64, 2); // Index 0: Total, Index 1: SYN Total

int xdp_firewall(struct xdp_md *ctx) {
    void *data_end = (void *)(long)ctx->data_end;
    void *data = (void *)(long)ctx->data;

    // Increment Total Packet Counter
    u32 total_key = 0;
    u64 *total_cnt = stats.lookup(&total_key);
    if (total_cnt) lock_xadd(total_cnt, 1);

    struct ethhdr *eth = data;
    if ((void*)eth + sizeof(*eth) > data_end) return XDP_PASS;
    if (eth->h_proto != htons(ETH_P_IP)) return XDP_PASS;

    struct iphdr *iph = data + sizeof(*eth);
    if ((void*)iph + sizeof(*iph) > data_end) return XDP_PASS;

    u32 src_ip = iph->saddr;


    if (black_list.lookup(&src_ip)) {
        struct drop_evt_t evt = { .src_ip = src_ip };
        drop_events.perf_submit(ctx, &evt, sizeof(evt));
        return XDP_DROP;
    }

    if (iph->protocol == IPPROTO_TCP) {
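        // Use the IP header length field so packets carrying IP options are parsed correctly
        u32 ip_hlen = iph->ihl * 4;
        if (ip_hlen < sizeof(*iph)) return XDP_PASS;
        struct tcphdr *tcp = (void*)iph + ip_hlen;
        if ((void*)tcp + sizeof(*tcp) > data_end) return XDP_PASS;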

        if (tcp->syn) {
            u32 syn_key = 1;
            u64 *syn_cnt = stats.lookup(&syn_key);
            if (syn_cnt) lock_xadd(syn_cnt, 1);
        }
    }

    return XDP_PASS;
}
"""

b = BPF(text=ebpf_code)
fn = b.load_func("xdp_firewall", BPF.XDP)

def print_drop_event(cpu, data, size):
    event = b["drop_events"].event(data)
    ip_str = socket.inet_ntoa(struct.pack('I', event.src_ip))
    print(f"[!] BLACKLIST DROP: {ip_str}")

b["drop_events"].open_perf_buffer(print_drop_event)

# Add Blocked IP
blocked_ip = "w.x.y.z"
packed_ip = struct.unpack("I", socket.inet_aton(blocked_ip))[0]
b["black_list"][ctypes.c_uint32(packed_ip)] = ctypes.c_uint8(1)

# Attach to interfaces
interfaces = ["enp0s5", "virbr0"]
for iface in interfaces:
    try:
        b.attach_xdp(iface, fn, BPF.XDP_FLAGS_SKB_MODE)
        print(f"[+] Attached to {iface}")
    except Exception as e:
        print(f"[!] Failed to attach to {iface}: {e}")

print("[*] Starting monitor (5s reporting interval)...")
last_report = time.time()

try:
    while True:
        # Poll for drop events
        b.perf_buffer_poll(timeout=100)

        current_time = time.time()
        if current_time - last_report >= 5:
            stats_map = b["stats"]
            total = stats_map[ctypes.c_int(0)].value
            syns = stats_map[ctypes.c_int(1)].value

            print(f"\n--- [STATISTICS - Last 5s] ---")
            print(f"Total Traffic: {total} pkts")
            print(f"SYN Traffic:   {syns} pkts")
            print(f"------------------------------\n")

            # Reset counters for the next 5s window
            stats_map[ctypes.c_int(0)] = ctypes.c_uint64(0)
            stats_map[ctypes.c_int(1)] = ctypes.c_uint64(0)
            last_report = current_time

except KeyboardInterrupt:
    print("\n[*] Shutting down...")
finally:
    for iface in interfaces:
        b.remove_xdp(iface, BPF.XDP_FLAGS_SKB_MODE)
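
The kernel program walks the packet header by header and checks the remaining length before every access. The same walk can be modeled in plain Python, which is useful for experimenting with the parsing logic without loading anything into the kernel. This is a sketch under our own assumptions: classify() and build_tcp_frame() are hypothetical helpers, the addresses are documentation examples, and the parser honors the IP header length field.

```python
import struct

ETH_P_IP = 0x0800
IPPROTO_TCP = 6
ETH_HLEN = 14
TCP_FLAG_SYN = 0x02
TCP_FLAG_ACK = 0x10


def classify(frame):
    """Return 'truncated', 'non-ip', 'ip', 'tcp' or 'tcp-syn' for a raw frame."""
    if len(frame) < ETH_HLEN:
        return "truncated"
    (eth_type,) = struct.unpack_from("!H", frame, 12)
    if eth_type != ETH_P_IP:
        return "non-ip"
    if len(frame) < ETH_HLEN + 20:
        return "truncated"
    ip_hlen = (frame[ETH_HLEN] & 0x0F) * 4  # IHL is counted in 32-bit words
    if ip_hlen < 20:
        return "truncated"
    if frame[ETH_HLEN + 9] != IPPROTO_TCP:
        return "ip"
    tcp_off = ETH_HLEN + ip_hlen
    if len(frame) < tcp_off + 20:
        return "truncated"
    flags = frame[tcp_off + 13]
    return "tcp-syn" if flags & TCP_FLAG_SYN else "tcp"


def build_tcp_frame(syn):
    """Build a minimal Ethernet/IPv4/TCP frame for testing (checksums left at zero)."""
    eth = bytes(6) + bytes(6) + struct.pack("!H", ETH_P_IP)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, IPPROTO_TCP, 0,
                     bytes([192, 0, 2, 1]), bytes([198, 51, 100, 2]))
    flags = TCP_FLAG_SYN if syn else TCP_FLAG_ACK
    tcp = struct.pack("!HHIIBBHHH", 12345, 80, 0, 0, 5 << 4, flags, 65535, 0, 0)
    return eth + ip + tcp


if __name__ == "__main__":
    print(classify(build_tcp_frame(syn=True)))
    print(classify(build_tcp_frame(syn=False)))
```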

Result of Firewall Based on XDP:

In system p.q.r.s:
-------------------
# python3 xdp_FireWall.py
[+] Attached to enp0s5
[+] Attached to virbr0
[*] Starting monitor (5s reporting interval)...

--- [STATISTICS - Last 5s] ---
Total Traffic: 18 pkts
SYN Traffic:   2 pkts
------------------------------


--- [STATISTICS - Last 5s] ---
Total Traffic: 8 pkts
SYN Traffic:   1 pkts
------------------------------

<<We can count the total packets and SYN packets.>>

In system w.x.y.z:
-----------------
# ping -S w.x.y.z p.q.r.s 
PING pqrs (p.q.r.s) 56(84) bytes of data.

<<We can observe the ping hang>>

In system p.q.r.s:
------------------ 

[!] BLACKLIST DROP: w.x.y.z
[!] BLACKLIST DROP: w.x.y.z

<<Notice that packets from w.x.y.z are dropped>>
<<Press Ctrl+C to stop the firewall and clean up>>

^C
[*] Shutting down...

In system w.x.y.z:
------------------
# ping -S w.x.y.z p.q.r.s 
PING pqrs (p.q.r.s) 56(84) bytes of data.

64 bytes from pqrs (p.q.r.s): icmp_seq=9 ttl=64 time=0.451 ms
64 bytes from pqrs (p.q.r.s): icmp_seq=10 ttl=64 time=0.412 ms
^C
--- pqrs ping statistics ---
7 packets transmitted, 4 received, 42.8571% packet loss, time 6139ms
rtt min/avg/max/mdev = 0.449/0.463/0.479/0.028 ms

<<Notice that the ping starts working once we stop the firewall>>
  • We can print the packet count and block packets from the desired IP using an XDP-based solution with eBPF.
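
The two counters reported every 5 seconds are enough for a simple SYN flood heuristic: flag a window when both the SYN rate and the share of SYN packets are unusually high. The sketch below shows one way to evaluate a window in the Python control plane. The function name and both thresholds are illustrative assumptions and would need tuning for real traffic.

```python
def syn_flood_suspected(total_pkts, syn_pkts, interval_s=5.0,
                        max_syn_rate=1000.0, max_syn_share=0.5):
    """Heuristic check on one reporting window.

    max_syn_rate  -- SYN packets per second considered suspicious (assumed)
    max_syn_share -- fraction of all packets that are SYNs (assumed)
    """
    if total_pkts == 0:
        return False
    syn_rate = syn_pkts / interval_s
    syn_share = syn_pkts / total_pkts
    return syn_rate > max_syn_rate and syn_share > max_syn_share


if __name__ == "__main__":
    # Values from the quiet window shown in the results above
    print(syn_flood_suspected(total_pkts=18, syn_pkts=2))
    # A hypothetical window dominated by SYN packets
    print(syn_flood_suspected(total_pkts=20000, syn_pkts=15000))
```

Requiring both conditions avoids flagging a busy but healthy server, where the SYN rate may be high while SYNs remain a small share of the traffic.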

Let’s Try XDP_ABORTED on the Firewall:

  • Replace XDP_DROP with XDP_ABORTED in the test case.
  • Repeat the earlier tests while monitoring the trace output below in a separate window:

# echo 1 > /sys/kernel/debug/tracing/events/xdp/xdp_exception/enable
# cat /sys/kernel/debug/tracing/trace_pipe
          <idle>-0       [001] ..s.1 576000.303905: xdp_exception: prog_id=1924 action=ABORTED ifindex=2
          <idle>-0       [001] .Ns.1 576005.456956: xdp_exception: prog_id=1924 action=ABORTED ifindex=2
          <idle>-0       [001] .Ns.1 576006.480948: xdp_exception: prog_id=1924 action=ABORTED ifindex=2
  • The XDP program returns XDP_ABORTED, causing the packet to trigger an exception. The occurrence of this exception can be observed in the traces.
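
When many exceptions are logged, it helps to summarize them per program and interface. The sketch below parses lines in the format shown in the trace output above and counts them. The regular expression and the count_exceptions() helper are our own, and the sample lines are copied from that output.

```python
import re
from collections import Counter

# Matches the xdp_exception fields as printed in trace_pipe
LINE_RE = re.compile(r"xdp_exception: prog_id=(\d+) action=(\w+) ifindex=(\d+)")


def count_exceptions(lines):
    """Count xdp_exception events per (prog_id, action, ifindex)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            key = (int(match.group(1)), match.group(2), int(match.group(3)))
            counts[key] += 1
    return counts


SAMPLE = [
    "<idle>-0       [001] ..s.1 576000.303905: xdp_exception: prog_id=1924 action=ABORTED ifindex=2",
    "<idle>-0       [001] .Ns.1 576005.456956: xdp_exception: prog_id=1924 action=ABORTED ifindex=2",
    "<idle>-0       [001] .Ns.1 576006.480948: xdp_exception: prog_id=1924 action=ABORTED ifindex=2",
]

if __name__ == "__main__":
    for (prog_id, action, ifindex), n in count_exceptions(SAMPLE).items():
        print(f"prog_id={prog_id} action={action} ifindex={ifindex}: {n} events")
```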

Let’s Experiment with XDP_REDIRECT:

xdp_FireWall_Redirect.py
---------------

# LICENSE: GPLv2
from bcc import BPF
import time
import ctypes
import socket
import struct
import os
from pyroute2 import IPRoute

ebpf_code = """
#include <uapi/linux/bpf.h>
#include <uapi/linux/if_ether.h>
#include <uapi/linux/ip.h>
#include <uapi/linux/pkt_cls.h>

BPF_HASH(black_list, u32, u8);
BPF_HASH(xdp_stats, u32, u64);
BPF_HASH(lo_ingress_stats, u32, u64);

int xdp_redirect_func(struct xdp_md *ctx) {
    u32 ifindex = ctx->ingress_ifindex;
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    u64 *cnt = xdp_stats.lookup(&ifindex);
    if (cnt) { lock_xadd(cnt, 1); }
    else { u64 val = 1; xdp_stats.update(&ifindex, &val); }

    struct ethhdr *eth = data;
    if ((void*)eth + sizeof(*eth) > data_end) return XDP_PASS;
    if (eth->h_proto != htons(ETH_P_IP)) return XDP_PASS;

    struct iphdr *iph = data + sizeof(*eth);
    if ((void*)iph + sizeof(*iph) > data_end) return XDP_PASS;

    if (black_list.lookup(&iph->saddr)) {
        // Log to kernel trace pipe (viewable via trace_print in Python)
        bpf_trace_printk("REDIRECT: IP %u to lo\\n", iph->saddr);
        return bpf_redirect(REDIRECT_IFINDEX, 0);
    }
    return XDP_PASS;
}

int tc_ingress_lo(struct __sk_buff *skb) {
    u32 key = 0;
    u64 *cnt = lo_ingress_stats.lookup(&key);
    if (cnt) { lock_xadd(cnt, 1); }
    else { u64 val = 1; lo_ingress_stats.update(&key, &val); }
    return TC_ACT_OK;
}
"""

in_iface = "enp0s5"
out_iface = "lo"
target_idx = socket.if_nametoindex(out_iface)

ebpf_code = ebpf_code.replace("REDIRECT_IFINDEX", str(target_idx))
b = BPF(text=ebpf_code)

# Attach Hooks
xdp_fn = b.load_func("xdp_redirect_func", BPF.XDP)
b.attach_xdp(in_iface, xdp_fn, BPF.XDP_FLAGS_SKB_MODE)

ip = IPRoute()
os.system(f"tc qdisc add dev {out_iface} clsact 2>/dev/null")
tci_fn = b.load_func("tc_ingress_lo", BPF.SCHED_CLS)
ip.tc("add-filter", "bpf", target_idx, ":1", fd=tci_fn.fd, name=tci_fn.name, parent="ffff:fff2", classid=1, direct_action=True)

# Blacklist setup
blocked_ip = "w.x.y.z"
packed_ip = struct.unpack("I", socket.inet_aton(blocked_ip))
b["black_list"][ctypes.c_uint32(packed_ip[0])] = ctypes.c_uint8(1)

print(f"[*] Monitoring {in_iface}. Reporting summary every 5s.")

try:
    last_report = time.time()
    while True:
        # Pull kernel trace messages (bpf_trace_printk).
        # With nonblocking=True the call returns immediately; msg is None when nothing is pending.
        try:
            (task, pid, cpu, flags, ts, msg) = b.trace_fields(nonblocking=True)
            if msg:
                print(f"[>] {msg.decode('utf-8', 'replace')}")
        except ValueError:
            pass

        current_time = time.time()
        if current_time - last_report >= 5:
            rx_enp = b["xdp_stats"].get(ctypes.c_uint32(socket.if_nametoindex(in_iface)), ctypes.c_uint64(0)).value
            lo_in = b["lo_ingress_stats"].get(ctypes.c_uint32(0), ctypes.c_uint64(0)).value

            print(f"\n--- [TRAFFIC REPORT - 5s WINDOW] ---")
            print(f" {in_iface} RX (XDP Ingress) : {rx_enp} pkts")
            print(f" {out_iface} RX (TC Ingress)   : {lo_in} pkts (Looped Back)")
            print(f"------------------------------------\n")

            b["xdp_stats"].clear()
            b["lo_ingress_stats"].clear()
            last_report = current_time

        time.sleep(0.01) # Prevent high CPU usage in the loop

except KeyboardInterrupt:
    print("\n[*] Cleaning up...")
finally:
    b.remove_xdp(in_iface, BPF.XDP_FLAGS_SKB_MODE)
    os.system(f"tc qdisc del dev {out_iface} clsact 2>/dev/null")
    ip.close()
  • Packets received on interface enp0s5 with source IP address w.x.y.z are selected and redirected to the loopback interface.
  • Ensure that the interface names match your test configuration.
  • The TC program on the loopback ingress hook counts arriving packets, which verifies that the redirect took place.

Result of XDP Redirect:

In system p.q.r.s:
-------------------
# python3 xdp_FireWall_Redirect.py
[*] Monitoring enp0s5. Reporting summary every 5s.

--- [TRAFFIC REPORT - 5s WINDOW] ---
 enp0s5 RX (XDP Ingress) : 8 pkts
 lo RX (TC Ingress)   : 0 pkts (Looped Back)
------------------------------------
<<We observed that we received some packets on interface enp0s5>>
<<There are no packets on the loopback interface>>

In system w.x.y.z:
-----------------
# ping -S w.x.y.z p.q.r.s 
PING pqrs (p.q.r.s) 56(84) bytes of data.

<<We can observe the ping hang>>

In system p.q.r.s:
-------------------

[>] REDIRECT: IP w.x.y.z to lo
[>] REDIRECT: IP w.x.y.z to lo
[>] REDIRECT: IP w.x.y.z to lo
[>] REDIRECT: IP w.x.y.z to lo

--- [TRAFFIC REPORT - 5s WINDOW] ---
 enp0s5 RX (XDP Ingress) : 14 pkts
 lo RX (TC Ingress)   : 4 pkts (Looped Back)
------------------------------------
<<The BPF trace for XDP-redirected packets can be observed.>>
<<At the same time, packets on the loopback interface can be monitored.>>
<<XDP redirected 4 packets in this instance, and the loopback interface received 4 packets>>
  • We can observe how XDP redirects the packet, and TC confirms this by checking the redirected interface.

Let’s Experiment with XDP_TX:

In this test case, the receiver uses XDP_TX to reflect ping packets immediately: the XDP program swaps the source and destination addresses in the Ethernet and IP headers and sends the packet back out of the interface it arrived on. On the sender, a TC program attached to the ingress hook detects these bounced packets, confirming that they were transmitted and received.

Receiver: xdp_FireWall_Bounce.py
---------------

# LICENSE: GPLv2
from bcc import BPF

# IP W.X.Y.Z as the u32 value the program reads on a little-endian host: 0xZYXW
# (the bytes stay in network byte order in the packet)
prog = """
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h>

int xdp_bounce(struct xdp_md *ctx) {
    void *data_end = (void *)(long)ctx->data_end;
    void *data = (void *)(long)ctx->data;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end) return XDP_PASS;
    if (eth->h_proto != bpf_htons(ETH_P_IP)) return XDP_PASS;

    struct iphdr *iph = (void *)(eth + 1);
    if ((void *)(iph + 1) > data_end) return XDP_PASS;

    if (iph->saddr != 0xZYXW || iph->protocol != IPPROTO_ICMP) {
        return XDP_PASS;
    }

    // Print source IP as a hex integer (compatible with older kernels)
    bpf_trace_printk("XDP: Bouncing ping from 0x%x\\n", iph->saddr);

    // MAC swap
    unsigned char tmp_mac[6];
    __builtin_memcpy(tmp_mac, eth->h_dest, 6);
    __builtin_memcpy(eth->h_dest, eth->h_source, 6);
    __builtin_memcpy(eth->h_source, tmp_mac, 6);

    // Swap IPs
    __u32 tmp_ip = iph->saddr;
    iph->saddr = iph->daddr;
    iph->daddr = tmp_ip;

    return XDP_TX;
}
"""

device = "enp0s5"
b = BPF(text=prog)
fn = b.load_func("xdp_bounce", BPF.XDP)

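# Try native (driver) mode first; fall back to generic (SKB) mode if the driver lacks XDP support
try:
    mode = BPF.XDP_FLAGS_DRV_MODE
    b.attach_xdp(device, fn, flags=mode)
    print(f"Bouncing on {device} (Native mode)")
except Exception:
    mode = BPF.XDP_FLAGS_SKB_MODE
    b.attach_xdp(device, fn, flags=mode)
    print(f"Bouncing on {device} (Generic/SKB mode)")

try:
    b.trace_print()
except KeyboardInterrupt:
    pass
finally:
    # Detach using the same mode flag that was used to attach
    b.remove_xdp(device, flags=mode)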


Sender: tc_FireWall_Capture.py
---------------
from bcc import BPF
from pyroute2 import IPRoute

# IP P.Q.R.S as the u32 value the program reads on a little-endian host: 0xSRQP
# (the bytes stay in network byte order in the packet)
prog = """
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <linux/pkt_cls.h>

int tc_capture(struct __sk_buff *skb) {
    void *data = (void *)(long)skb->data;
    void *data_end = (void *)(long)skb->data_end;

    struct ethhdr *eth = data;
    if (data + sizeof(*eth) > data_end) return TC_ACT_OK;
    if (eth->h_proto != bpf_htons(ETH_P_IP)) return TC_ACT_OK;

    struct iphdr *iph = data + sizeof(*eth);
    if (data + sizeof(*eth) + sizeof(*iph) > data_end) return TC_ACT_OK;

    if (iph->saddr == 0xSRQP && iph->protocol == IPPROTO_ICMP) {
        bpf_trace_printk("TC: Captured bounced ping from 0x%x\\n", iph->saddr);
    }

    return TC_ACT_OK;
}
"""

device = "enp0s5"
ipr = IPRoute()
idx = ipr.link_lookup(ifname=device)[0]

b = BPF(text=prog)
fn = b.load_func("tc_capture", BPF.SCHED_CLS)

try:
    ipr.tc("add", "clsact", idx)
except Exception:
    pass  # the clsact qdisc may already exist

ipr.tc("add-filter", "bpf", idx, ":1", fd=fn.fd, name=fn.name,
       parent="ffff:fff2", classid=1, direct_action=True)

print(f"Monitoring on {device}...")
try:
    b.trace_print()
except KeyboardInterrupt:
    pass
finally:
    ipr.tc("del", "clsact", idx)
  • Swapping the MAC addresses ensures that the sender's network interface accepts the returned frame as addressed to it. Swapping the IP addresses makes the packet appear to come from p.q.r.s and be destined for w.x.y.z, so the sender's network stack and the TC program treat it as valid inbound traffic. The rest of the packet, including the ICMP header, is left unchanged, and the IPv4 header checksum stays valid because exchanging the two address fields does not change the sum it is computed over.
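
One detail worth checking is that xdp_bounce rewrites the IP addresses without recomputing the IPv4 header checksum. That is safe because the checksum is a ones'-complement sum over 16-bit words, and exchanging the source and destination fields only reorders the words being added. The sketch below demonstrates this on a hand-built frame. All helper names, MAC addresses, and IP addresses are illustrative assumptions.

```python
import socket
import struct


def _fold_sum(header):
    """Ones'-complement sum of the header's 16-bit words."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ipv4_checksum_ok(header):
    """A valid header sums to 0xFFFF when the checksum field is included."""
    return _fold_sum(header) == 0xFFFF


def build_icmp_frame(src_ip, dst_ip):
    """Ethernet + IPv4 + 8-byte ICMP echo request (ICMP checksum left at zero)."""
    hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 0, 0, 64, 1, 0,
                      socket.inet_aton(src_ip), socket.inet_aton(dst_ip))
    csum = ~_fold_sum(hdr) & 0xFFFF
    hdr = hdr[:10] + struct.pack("!H", csum) + hdr[12:]
    eth = bytes.fromhex("020000000002") + bytes.fromhex("020000000001") + b"\x08\x00"
    return eth + hdr + bytes([8, 0, 0, 0, 0, 0, 0, 0])


def bounce(frame):
    """Swap MAC and IPv4 addresses, mirroring what xdp_bounce does."""
    eth = frame[6:12] + frame[0:6] + frame[12:14]
    ip = bytearray(frame[14:])
    ip[12:16], ip[16:20] = ip[16:20], ip[12:16]
    return eth + bytes(ip)


if __name__ == "__main__":
    original = build_icmp_frame("192.0.2.1", "198.51.100.2")
    bounced = bounce(original)
    print("checksum valid before:", ipv4_checksum_ok(original[14:34]))
    print("checksum valid after: ", ipv4_checksum_ok(bounced[14:34]))
```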

Result of XDP TX:

In system p.q.r.s:
-----------------
#  python3 ./xdp_FireWall_Bounce.py
<<As there are no incoming ICMP packets from w.x.y.z, there is no activity>>

In system w.x.y.z:
-----------------
#  python3 ./tc_FireWall_Capture.py
<<As there are no bounced-back packets from p.q.r.s, there is no activity>>

#  ping -S w.x.y.z p.q.r.s
PING pqrs (p.q.r.s) 56(84) bytes of data.
64 bytes from pqrs (p.q.r.s): icmp_seq=1 ttl=64 time=0.827 ms
64 bytes from pqrs (p.q.r.s): icmp_seq=2 ttl=64 time=0.779 ms
<<We started the ICMP traffic from w.x.y.z to p.q.r.s>>

In system p.q.r.s:
-----------------
bpf_trace_printk: XDP: Bouncing ping from 0xWXYZ'
bpf_trace_printk: XDP: Bouncing ping from 0xWXYZ'

<<We observed that ICMP packets from w.x.y.z are bounced by p.q.r.s>>

In system w.x.y.z:
-----------------
bpf_trace_printk: TC: Captured bounced ping from 0xPQRS'
bpf_trace_printk: TC: Captured bounced ping from 0xPQRS'

<<We captured bounced packets from p.q.r.s>>
  • The receiver bounced ICMP packets from the sender using XDP_TX, and the sender's TC program captured the packets that were bounced back.
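
Both scripts compare iph->saddr against a hard-coded hexadecimal constant, which must be written in the byte order the program sees when it reads the address as a 32-bit integer. Rather than working this out by hand, the value can be computed in Python. This is a small sketch with helper names of our own, and the example address is a documentation address, not one from the test setup.

```python
import socket
import struct
import sys


def ip_to_key(ip, byteorder=sys.byteorder):
    """Integer value of an IPv4 address when its 4 bytes are read as a u32."""
    return int.from_bytes(socket.inet_aton(ip), byteorder)


def key_to_ip(key, byteorder=sys.byteorder):
    """Inverse of ip_to_key: turn the integer back into dotted-quad form."""
    return socket.inet_ntoa(key.to_bytes(4, byteorder))


if __name__ == "__main__":
    ip = "192.0.2.1"
    # Constant to paste into the C code on a little-endian host
    print(hex(ip_to_key(ip, "little")))
    # With the host's native order, this equals the struct.unpack("I", ...) idiom
    print(ip_to_key(ip) == struct.unpack("I", socket.inet_aton(ip))[0])
```

The same conversion applies to the keys stored in the black_list map, so one helper can serve both the map setup and the constants.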

A Comparative Analysis of XDP, TC, and Cgroup SKB:

  1. XDP (eXpress Data Path)
    • How it works: Attaches to the network driver. It intercepts the raw packet as soon as it arrives at the Network Interface Card (NIC).
    • Positives: Unrivaled performance. It can drop millions of packets per second with minimal CPU usage. Early intervention stops attacks before they reach the main TCP/IP stack.
    • Negatives: No egress. It works only on incoming, or ingress, packets. You must manually parse packet headers such as Ethernet, IP, and TCP from raw bytes.
  2. TC (Traffic Control)
    • How it works: Runs after the kernel has parsed the packet and created a sk_buff structure.
    • Positives: Bi-directional filtering. It can filter both incoming and outgoing traffic. Rich metadata provides access to kernel-parsed information such as interface indices and socket data.
    • Negatives: Higher overhead. The kernel has already spent CPU cycles wrapping the packet before TC notices it.
  3. Cgroup (BPF_PROG_TYPE_CGROUP_SKB)
    • How it works: Attaches to a specific Linux control group, or cgroup, filtering traffic only for processes in that group.
    • Positives: Granular control. It can block traffic for a specific service or container without affecting the rest of the host.
    • Negatives: Late stage. Packets have already traveled through much of the kernel stack before reaching this hook. Only cgroup v2 supports this.

Conclusion

Developing a firewall with eBPF (Extended Berkeley Packet Filter) allows you to process network traffic directly within the Linux kernel with unmatched efficiency. By using Python as a control plane via libraries like BCC, you can write high-performance, C-based kernel logic while managing statistics and blocking rules from a user-friendly script.

When building a firewall to detect and block traffic such as SYN floods at high speed, not all eBPF hooks are created equal.

Hook Type | Justification
XDP (eXpress Data Path) | It sits at the lowest level of the network stack, allowing you to drop packets before the kernel even allocates memory for them.
TC (Traffic Control) | It’s a balanced choice. It handles both ingress and egress traffic and provides access to advanced kernel metadata (sk_buff), making it easier to manage complex flows.
cgroup SKB | It’s considered the container specialist. Best used when you want to enforce rules on a per-application, per-service, or per-container basis rather than for the whole system.

References