eBPF (extended Berkeley Packet Filter) has become a cornerstone of modern Linux observability and security. It allows you to run user-defined programs inside the Linux kernel in a safe, efficient way. With eBPF, we can write custom firewall logic that inspects and filters network traffic at various levels of the networking stack.

TC (Traffic Control), cgroup BPF (BPF_PROG_TYPE_CGROUP_SKB), and XDP (eXpress Data Path) are three widely used hooks for eBPF-based packet filtering, and together they cover the networking stack from the driver up to the socket layer. XDP provides the highest performance by dropping malicious traffic at the driver level before the rest of the kernel processes it. TC offers flexibility, enabling both ingress and egress filtering with access to the kernel metadata carried in the socket buffer. cgroup-based filters provide identity-aware security, allowing policies to be enforced directly on specific applications or containers. By combining the three, developers can balance wire-speed efficiency, bidirectional control, and process-level granularity.

In this guide, we’ll explore how to create a small Linux firewall that can:

  • Count packets
  • Detect SYN packets and count them
  • Drop packets from a specific IP
  • Print statistics in a 5-second window

We implement the firewall with eBPF using TC and cgroup BPF. We discuss these methods in terms of usability, performance, and flexibility to help you choose the right tool for the job.

Given the scope of XDP, covering it here would make this article too long. A follow-up post will cover XDP for the same use case.

Several other technologies can also achieve this objective. However, covering all of them would also make the article excessively long. These topics will be covered in future posts.

Test Setup Details

  • IP address of the system: p.q.r.s
  • Blocked IP: w.x.y.z
  • Selected interface for testing: enp0s5
  • Tested on OL-UEK7 (5.15.0-310.184.5.2)
  • Required bpf and bcc packages
  • cgroup v2
  • pyroute2
  • Python 3.6.8. Tests may require tuning based on the Python version in use.
  • The IPs have been intentionally masked; please update them as needed in the test setup.

Traffic Control (TC) eBPF Programs

TC eBPF programs (program type BPF_PROG_TYPE_SCHED_CLS) are a powerful tool for managing network traffic in Linux. They allow you to filter, shape, prioritize, and drop packets at the kernel level, making them well suited to building custom firewalls. TC integrates with the Linux kernel’s networking stack and manages how packets are queued, filtered, and processed on network interfaces. When a packet hits the TC hook, the kernel executes the attached eBPF program and uses its return value as the verdict that decides the packet’s fate.

TC Probe:

  • In TC, the function sch_handle_ingress() manages ingress probes and is invoked by __netif_receive_skb_core(), while sch_handle_egress() handles egress probes and is triggered by __dev_queue_xmit().
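
The ingress and egress hooks are addressed through fixed handles on the clsact qdisc. As a small illustration, the sketch below derives those handles in Python from the constants in <linux/pkt_sched.h>. The helper names tc_h_make and handle_to_str are hypothetical and exist only for this example. The script later in this section passes the ingress value as parent="ffff:fff2".

```python
# Constants mirrored from <linux/pkt_sched.h>.
TC_H_CLSACT = 0xFFFFFFF1      # same value as TC_H_INGRESS
TC_H_MIN_INGRESS = 0xFFF2     # minor number of the clsact ingress hook
TC_H_MIN_EGRESS = 0xFFF3      # minor number of the clsact egress hook


def tc_h_make(major: int, minor: int) -> int:
    """Python equivalent of the kernel's TC_H_MAKE() macro."""
    return (major & 0xFFFF0000) | (minor & 0x0000FFFF)


def handle_to_str(handle: int) -> str:
    """Format a 32-bit TC handle as the 'major:minor' hex string used by tc."""
    return f"{handle >> 16:x}:{handle & 0xFFFF:x}"


ingress_parent = handle_to_str(tc_h_make(TC_H_CLSACT, TC_H_MIN_INGRESS))
egress_parent = handle_to_str(tc_h_make(TC_H_CLSACT, TC_H_MIN_EGRESS))
print(ingress_parent, egress_parent)
```

Attaching the same program with the egress handle instead would, under these assumptions, filter outgoing rather than incoming packets.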

Packet Control in TC:

  • TC_ACT_OK: The kernel stops further TC processing for the packet and lets it continue through the networking stack.
  • TC_ACT_SHOT: The kernel immediately terminates the packet’s path and frees its memory, so it never reaches its destination.
  • TC_ACT_REDIRECT: The packet is sent directly to another interface, bypassing the standard stack. This verdict is often used with helper functions such as bpf_redirect().
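
To make the verdict logic concrete, here is a minimal user-space model in Python. It is only a sketch of the decision the firewall makes, not kernel code. The numeric values come from <linux/pkt_cls.h>; the function name tc_verdict and the sample addresses (taken from the 192.0.2.0/24 documentation range) are assumptions made for this example.

```python
# Verdict values as defined in <linux/pkt_cls.h>.
TC_ACT_OK = 0        # let the packet continue through the stack
TC_ACT_SHOT = 2      # drop the packet
TC_ACT_REDIRECT = 7  # hand the packet to another interface

VERDICT_NAMES = {
    TC_ACT_OK: "TC_ACT_OK",
    TC_ACT_SHOT: "TC_ACT_SHOT",
    TC_ACT_REDIRECT: "TC_ACT_REDIRECT",
}


def tc_verdict(src_ip: str, blocked: set) -> int:
    """Return the verdict a simple source-IP blocklist would produce."""
    return TC_ACT_SHOT if src_ip in blocked else TC_ACT_OK


blocked_ips = {"192.0.2.10"}  # hypothetical blocked address
for ip in ("192.0.2.10", "192.0.2.20"):
    print(ip, "->", VERDICT_NAMES[tc_verdict(ip, blocked_ips)])
```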

Advantages of TC for eBPF-based Firewalls:

  • Fine-Grained Control: TC allows you to filter traffic at the packet level based on IP, port, and protocol, enabling precise control over incoming and outgoing traffic.
  • Kernel-Level Efficiency: TC and eBPF operate in the kernel, ensuring low-latency and high-performance packet processing with minimal overhead.
  • Traffic Shaping & Rate Limiting: TC can shape traffic, apply bandwidth limits, and prioritize specific types of traffic, which is useful for DoS protection and network management.

Recommended Practices for TC Usage:

  • Simple, Stateless Firewalls: If you need efficient packet filtering based on IP, port, or protocol, TC is a great choice.
  • Traffic Shaping: Use TC for rate-limiting and prioritizing traffic (e.g., for DDoS protection or QoS).
  • High-Performance Requirements: TC is ideal for low-latency environments due to its kernel-space efficiency.

Performance Impact:

When used properly, TC with eBPF has minimal performance overhead. However, complex eBPF programs or high traffic volumes can increase CPU usage.

Firewall Based on TC:

tc_FireWall.py 
--------------

# LICENSE: GPLv2
from bcc import BPF
import time
import ctypes
import socket
import struct
import pyroute2  # Ensure this is installed: pip install pyroute2

ebpf_code = """
#include <uapi/linux/bpf.h>
#include <uapi/linux/if_ether.h>
#include <uapi/linux/ip.h>
#include <uapi/linux/tcp.h>
#include <uapi/linux/pkt_cls.h>
#include <linux/in.h>

struct drop_evt_t { u32 src_ip; };
BPF_HASH(black_list, u32, u8);
BPF_PERF_OUTPUT(drop_events);
BPF_ARRAY(stats, u64, 2);

int tc_firewall(struct __sk_buff *skb) {
    void *data_end = (void *)(long)skb->data_end;
    void *data = (void *)(long)skb->data;

    u32 total_key = 0;
    u64 *total_cnt = stats.lookup(&total_key);
    if (total_cnt) lock_xadd(total_cnt, 1);

    struct ethhdr *eth = data;
    if (data + sizeof(*eth) > data_end) return TC_ACT_OK;
    if (eth->h_proto != htons(ETH_P_IP)) return TC_ACT_OK;

    struct iphdr *iph = data + sizeof(*eth);
    if ((void*)iph + sizeof(*iph) > data_end) return TC_ACT_OK;

    u32 src_ip = iph->saddr;

    if (black_list.lookup(&src_ip)) {
        struct drop_evt_t evt = { .src_ip = src_ip };
        drop_events.perf_submit(skb, &evt, sizeof(evt));
        return TC_ACT_SHOT;
    }

    if (iph->protocol == IPPROTO_TCP) {
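        // Use the IHL field so that IP options do not shift the TCP header.
        if (iph->ihl < 5) return TC_ACT_OK;
        struct tcphdr *tcp = (void*)iph + iph->ihl * 4;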
        if ((void*)tcp + sizeof(*tcp) > data_end) return TC_ACT_OK;
        if (tcp->syn) {
            u32 syn_key = 1;
            u64 *syn_cnt = stats.lookup(&syn_key);
            if (syn_cnt) lock_xadd(syn_cnt, 1);
        }
    }
    return TC_ACT_OK;
}
"""

b = BPF(text=ebpf_code)
fn = b.load_func("tc_firewall", BPF.SCHED_CLS)

# Add blocked IP
blocked_ip = "w.x.y.z"
packed_ip = struct.unpack("I", socket.inet_aton(blocked_ip))[0]
b["black_list"][ctypes.c_uint32(packed_ip)] = ctypes.c_uint8(1)

interface = "enp0s5"  # CHANGE THIS to your interface
ipr = pyroute2.IPRoute()
idx = ipr.link_lookup(ifname=interface)[0]

try:
    # Add clsact qdisc (ingress/egress hook point)
    ipr.tc("add", "clsact", idx)
except Exception:
    print("[!] clsact already exists or failed to add.")

try:
    ipr.tc("add-filter", "bpf", idx, ":1", fd=fn.fd, name=fn.name,
           parent="ffff:fff2", classid=1, direct_action=True)
    print(f"[+] Attached TC filter to {interface}")
except Exception as e:
    print(f"[!] Filter attachment failed: {e}")

def print_drop_event(cpu, data, size):
    event = b["drop_events"].event(data)
    print(f"[!] TC DROP: {socket.inet_ntoa(struct.pack('I', event.src_ip))}")

b["drop_events"].open_perf_buffer(print_drop_event)
last_report = time.time()

try:
    while True:
        b.perf_buffer_poll(timeout=100)
        if time.time() - last_report >= 5:
            stats = b["stats"]
            print("\n--- [TC STATS] ---")
            print(f"Total: {stats[0].value} | SYN: {stats[1].value}")
            stats[0] = stats[1] = ctypes.c_uint64(0)
            last_report = time.time()
except KeyboardInterrupt:
    print("\n[*] Cleaning up...")
finally:
    ipr.tc("del", "clsact", idx)
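
One detail of the script worth isolating is how the blocked IP becomes a map key. The eBPF program reads iph->saddr as a u32 straight from the packet, so the bytes are in network order but interpreted in the host's native byte order. Unpacking the 4 bytes from socket.inet_aton() with the native "I" format yields the same integer. The sketch below shows the round trip; the helper names ip_to_key and key_to_ip and the sample address are assumptions for illustration.

```python
import socket
import struct
import sys


def ip_to_key(ip: str) -> int:
    """Convert a dotted-quad IPv4 string to the u32 the eBPF program sees."""
    # inet_aton() returns 4 bytes in network order; "I" reinterprets them
    # in native byte order, matching a raw u32 load of iph->saddr.
    return struct.unpack("I", socket.inet_aton(ip))[0]


def key_to_ip(key: int) -> str:
    """Convert a u32 map key back to a dotted-quad IPv4 string."""
    return socket.inet_ntoa(struct.pack("I", key))


example = "192.0.2.1"  # documentation address, used only as an example
key = ip_to_key(example)
print(sys.byteorder, hex(key), key_to_ip(key))
```

Using "!I" (network order) instead would produce a different integer on little-endian hosts, and the lookup in black_list would then never match.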

Result of Firewall Based on TC:

In system p.q.r.s:
-------------------
# python3 tc_FireWall.py
[+] Attached TC filter to enp0s5

--- [TC STATS] ---
Total: 7 | SYN: 0

--- [TC STATS] ---
Total: 9 | SYN: 1

<<We can observe the total packet count and SYN packets>>

In system w.x.y.z:
-----------------
# ping -S w.x.y.z p.q.r.s 
PING pqrs (p.q.r.s) 56(84) bytes of data.

<<The ping hangs>>

In system p.q.r.s:
------------------ 
--- [TC STATS] ---
Total: 5 | SYN: 0
[!] TC DROP: w.x.y.z
[!] TC DROP: w.x.y.z

<<Notice that packets from w.x.y.z are dropped>>
<<Press Ctrl+C to stop the firewall and clean up>>

^C
[*] Cleaning up...

In system w.x.y.z:
------------------
# ping -S w.x.y.z p.q.r.s 
PING pqrs (p.q.r.s) 56(84) bytes of data.

64 bytes from pqrs (p.q.r.s): icmp_seq=9 ttl=64 time=0.451 ms
64 bytes from pqrs (p.q.r.s): icmp_seq=10 ttl=64 time=0.412 ms
^C
--- pqrs ping statistics ---
11 packets transmitted, 3 received, 72.7273% packet loss, time 10219ms
rtt min/avg/max/mdev = 0.412/0.427/0.451/0.029 ms

<<Notice that the ping starts working once we stop the firewall>>
  • We can print packet counts and block packets from the desired IP using a TC-based solution with eBPF.

cgroup SKB

The BPF_PROG_TYPE_CGROUP_SKB program type allows us to attach logic directly to a control group (cgroup), giving us fine-grained control over the traffic generated by specific sets of applications. In the context of cgroup BPF, the kernel provides two primary attachment points (hooks) for filtering traffic: BPF_CGROUP_INET_INGRESS and BPF_CGROUP_INET_EGRESS. Both operate on the sk_buff (socket buffer), which is the primary data structure for network packets in the Linux kernel.

When a program is attached to these hooks, the kernel executes your eBPF code every time a packet crosses the cgroup boundary:

  • Return 1 (allow): The packet continues its journey through the stack.
  • Return 0 (drop): The kernel discards the packet immediately.

Cgroup SKB Probe:

  • For Cgroup SKB, the CGROUP_INET_INGRESS and CGROUP_INET_EGRESS probes are handled by __cgroup_bpf_run_filter_skb. For instance, in the TCP v4 receive path, it is invoked by tcp_v4_rcv().

Advantages of cgroup eBPF-based Firewalls:

  • Process Awareness: Unlike XDP or TC, which process packets at the interface level, cgroup BPF knows exactly which application owns the packet. This makes it ideal for containerized environments.
  • Simplified Policy: You do not need to track complex iptables rules or port mappings. You simply apply the firewall rule to the container’s cgroup. By the time a packet reaches the egress hook, the kernel has already performed routing and checksumming. This gives the eBPF program a clean packet representation.

Disadvantages:

  • cgroup v2 only: cgroup BPF programs (like BPF_PROG_TYPE_CGROUP_SKB) cannot be applied to cgroup v1. These hooks were designed specifically for the unified hierarchy of cgroup v2.
  • Higher Latency: Because cgroup hooks sit higher up the network stack (near the application layer), the packet has already traveled through much of the kernel’s networking code. This consumes more CPU cycles than early-stage filtering.
  • No Global View: It only observes traffic for the specific processes in that cgroup. If you need a global firewall for the entire host, you would need to attach it to the root cgroup or use a different method.

Performance Impact:

The performance impact of cgroup BPF is moderate because it sits higher up in the network stack.

Recommended Practices for cgroup SKB Usage:

  • Container Protection: One of the best use cases is guarding a container.
  • Egress Filtering: Use BPF_CGROUP_INET_EGRESS to prevent a specific application from reaching out to a malicious IP or unauthorized internal service.
  • Resource Accounting: This is excellent for tracking how many bytes or packets a specific user or service has consumed (the “count packets” requirement in our guide).
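
To guard a single container or service rather than the whole host, the program has to be attached to that workload's cgroup directory. As a rough sketch, the cgroup v2 path of a process can be read from /proc/<pid>/cgroup, where the unified hierarchy appears as the entry with ID 0 and an empty controller list. The helper names below are hypothetical, and the /sys/fs/cgroup mount point is an assumption that matches the test setup in this guide.

```python
import os

CGROUP2_MOUNT = "/sys/fs/cgroup"  # assumption: default cgroup v2 mount point


def cgroup_v2_relpath(proc_cgroup_text: str):
    """Return the cgroup v2 path found in /proc/<pid>/cgroup text, or None."""
    for line in proc_cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue  # skip malformed lines
        hierarchy_id, controllers, path = parts
        # The unified (v2) hierarchy has ID 0 and no controller names.
        if hierarchy_id == "0" and controllers == "":
            return path
    return None


def cgroup_dir_for_pid(pid: int) -> str:
    """Return the cgroup v2 directory of a running process (Linux only)."""
    with open(f"/proc/{pid}/cgroup") as f:
        rel = cgroup_v2_relpath(f.read())
    if rel is None:
        raise RuntimeError("no cgroup v2 entry found for this process")
    return os.path.join(CGROUP2_MOUNT, rel.lstrip("/"))


# Parse a made-up sample instead of touching /proc.
sample = "12:memory:/user.slice\n0::/system.slice/example.service\n"
print(cgroup_v2_relpath(sample))
```

The directory returned by cgroup_dir_for_pid() could then be used in place of the root cgroup path in the script below, so that only the chosen workload's traffic is filtered.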

When to Avoid It:

  • Global Host Firewall: Do not use cgroup BPF to stop a massive DDoS attack. By the time the packet reaches the cgroup hook, the kernel has already spent significant effort processing it. If your goal is to protect the entire server from external scans, a hook at the device level is more efficient.

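Note that the cgroup SKB firewall below does not block pings: ICMP echo requests are answered by the kernel itself before they reach the cgroup_skb ingress hook. The section “Why the Pings Are Not Dropped” after the test results explains this in more detail.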

Firewall Based on cgroup SKB:

cgroup_FireWall.py 
----------------------

# LICENSE: GPLv2
from bcc import BPF, BPFAttachType
import os
import time
import ctypes
import socket
import struct

ebpf_code = """
#include <uapi/linux/bpf.h>
#include <uapi/linux/if_ether.h>
#include <uapi/linux/ip.h>
#include <uapi/linux/tcp.h>
#include <linux/in.h>

BPF_HASH(black_list, u32, u8);
// Index 0: Total, Index 1: SYNs, Index 2: Dropped
BPF_ARRAY(stats, u64, 3);

int cgroup_firewall(struct __sk_buff *skb) {
    u32 total_key = 0;
    u32 drop_key = 2;
    u64 *total_cnt = stats.lookup(&total_key);
    if (total_cnt) lock_xadd(total_cnt, 1);

    if (skb->protocol != bpf_htons(ETH_P_IP)) return 1;

    u32 src_ip;
    if (bpf_skb_load_bytes(skb, 12, &src_ip, 4) < 0) return 1;

    // Blacklist Check
    if (black_list.lookup(&src_ip)) {
        u64 *drop_cnt = stats.lookup(&drop_key);
        if (drop_cnt) lock_xadd(drop_cnt, 1);
        return 0; // DROP packet
    }

    u8 proto;
    if (bpf_skb_load_bytes(skb, 9, &proto, 1) < 0) return 1;
    if (proto == IPPROTO_TCP) {
        u8 ihl_byte;
        if (bpf_skb_load_bytes(skb, 0, &ihl_byte, 1) < 0) return 1;
        u32 ip_hdr_len = (ihl_byte & 0x0F) * 4;

        u8 tcp_flags;
        if (bpf_skb_load_bytes(skb, ip_hdr_len + 13, &tcp_flags, 1) == 0) {
            if (tcp_flags & 0x02) {
                u32 syn_key = 1;
                u64 *syn_cnt = stats.lookup(&syn_key);
                if (syn_cnt) lock_xadd(syn_cnt, 1);
            }
        }
    }

    return 1; // PASS packet
}
"""

b = BPF(text=ebpf_code)
fn = b.load_func("cgroup_firewall", BPF.CGROUP_SKB)

blocked_ip = "w.x.y.z"
packed_ip = struct.unpack("I", socket.inet_aton(blocked_ip))[0]
b["black_list"][ctypes.c_uint32(packed_ip)] = ctypes.c_uint8(1)

cgroup_path = "/sys/fs/cgroup"
try:
    fd = os.open(cgroup_path, os.O_RDONLY)
    b.attach_func(fn, fd, BPFAttachType.CGROUP_INET_INGRESS)
    print(f"[+] Attached. Blocking {blocked_ip}...")
except Exception as e:
    print(f"[!] Error: {e}")
    raise SystemExit(1)  # exit() is only guaranteed in interactive sessions

print("Monitoring... Press Ctrl+C to stop.")
try:
    while True:
        time.sleep(5)
        s = b["stats"]
        print(f"--- [STATS] Total: {s[0].value} | SYNs: {s[1].value} | Dropped: {s[2].value} ---")
        # Reset stats
        s[0] = s[1] = s[2] = ctypes.c_uint64(0)
except KeyboardInterrupt:
    print("\n[*] Exiting...")
finally:
    b.detach_func(fn, fd, BPFAttachType.CGROUP_INET_INGRESS)
    os.close(fd)
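
Unlike the TC program, the cgroup program sees packets that start at the IP header, which is why it reads the source address at offset 12, the protocol at offset 9, and the TCP flags at IP header length + 13. The following sketch checks those offsets in plain Python by building a minimal IPv4 + TCP header and parsing it the same way. The function names, ports, and addresses are made up for this example, and checksums are left at zero.

```python
import socket
import struct


def build_ipv4_tcp(src: str, dst: str, tcp_flags: int) -> bytes:
    """Build a 20-byte IPv4 header followed by a 20-byte TCP header."""
    ip_hdr = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5,            # version 4, IHL 5 (5 * 4 = 20 bytes)
        0,                       # TOS
        40,                      # total length: 20 (IP) + 20 (TCP)
        0, 0,                    # identification, flags/fragment offset
        64,                      # TTL
        socket.IPPROTO_TCP,      # protocol, at byte offset 9
        0,                       # checksum (not computed in this sketch)
        socket.inet_aton(src),   # source address, at byte offset 12
        socket.inet_aton(dst),
    )
    tcp_hdr = struct.pack(
        "!HHLLBBHHH",
        12345, 8080,             # source and destination ports
        0, 0,                    # sequence and acknowledgement numbers
        5 << 4,                  # data offset: 5 words
        tcp_flags,               # flags byte, at offset 13 of the TCP header
        65535, 0, 0,             # window, checksum, urgent pointer
    )
    return ip_hdr + tcp_hdr


def parse_like_cgroup_prog(pkt: bytes):
    """Read the same fields, at the same offsets, as the eBPF program."""
    src_ip = socket.inet_ntoa(pkt[12:16])
    proto = pkt[9]
    ip_hdr_len = (pkt[0] & 0x0F) * 4
    is_syn = False
    if proto == socket.IPPROTO_TCP:
        is_syn = bool(pkt[ip_hdr_len + 13] & 0x02)
    return src_ip, proto, is_syn


print(parse_like_cgroup_prog(build_ipv4_tcp("192.0.2.10", "192.0.2.1", 0x02)))
```

The 0x02 mask tests only the SYN bit, so a SYN-ACK (0x12) is counted as well, which matches the behavior of the eBPF code.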

Result of Firewall Based on cgroup SKB:

In system p.q.r.s:
------------------

# python cgroup_FireWall.py
[+] Attached. Blocking w.x.y.z...
Monitoring... Press Ctrl+C to stop.
--- [STATS] Total: 29 | SYNs: 3 | Dropped: 0 ---

In system w.x.y.z:
------------------
# ping -S w.x.y.z p.q.r.s 
PING pqrs (p.q.r.s) 56(84) bytes of data.
64 bytes from pqrs (p.q.r.s): icmp_seq=9 ttl=64 time=0.451 ms
...
64 bytes from pqrs (p.q.r.s): icmp_seq=10 ttl=64 time=0.412 ms
^C
--- pqrs ping statistics ---
14 packets transmitted, 14 received, 0% packet loss, time 13303ms
rtt min/avg/max/mdev = 0.406/0.460/0.484/0.033 ms
<<Ping packets are not blocked>>
  • To test further, let’s use a simple client/server program.
Client: w.x.y.z says hi every second
    sock = socket(AF_INET, SOCK_STREAM, 0);
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(PORT);
    inet_pton(AF_INET, "p.q.r.s", &serv_addr.sin_addr);
    connect(sock, (struct sockaddr *)&serv_addr, sizeof(serv_addr));

Server: p.q.r.s replies with hello
    address.sin_family = AF_INET;
    address.sin_addr.s_addr = inet_addr("p.q.r.s");
    address.sin_port = htons(PORT);
    bind(server_fd, (struct sockaddr *)&address, sizeof(address));
    listen(server_fd, 3);
    new_socket = accept(server_fd, (struct sockaddr *)&address, (socklen_t*)&addrlen);

In system p.q.r.s:
------------------

# ./server
Server listening on p.q.r.s:8080...
Client says: hi
Reply sent: hello
Client says: hi
Reply sent: hello

<<Server p.q.r.s is responding to incoming packets>>

In system w.x.y.z:
------------------
# ./client
Sent: hi
Server replied: hello
Sent: hi
Server replied: hello

<<Client w.x.y.z is receiving replies from the server>>

In system p.q.r.s (New session):
--------------------------------
# python cgroup_FireWall.py
[+] Attached. Blocking w.x.y.z...
Monitoring... Press Ctrl+C to stop.
--- [STATS] Total: 13 | SYNs: 1 | Dropped: 6 ---
--- [STATS] Total: 17 | SYNs: 0 | Dropped: 1 ---

<<We can observe packet drops here>>

In system p.q.r.s:
------------------

# ./server
Server listening on p.q.r.s:8080...
Client says: hi
Reply sent: hello
Client says: hi
Reply sent: hello

<<No new response appears on server p.q.r.s>>

In system w.x.y.z:
------------------
# ./client
Sent: hi
Server replied: hello
Sent: hi
Server replied: hello
<<No new response appears on client w.x.y.z>>

In system p.q.r.s (New session):
--------------------------------
# python cgroup_FireWall.py
[+] Attached. Blocking w.x.y.z...
Monitoring... Press Ctrl+C to stop.
--- [STATS] Total: 13 | SYNs: 1 | Dropped: 6 ---
--- [STATS] Total: 17 | SYNs: 0 | Dropped: 1 ---
^C
[*] Exiting...

<<We stopped the firewall>>

In system p.q.r.s:
------------------

# ./server
Server listening on p.q.r.s:8080...
Client says: hi
Reply sent: hello
Client says: hi
Reply sent: hello

Client says: hi
Reply sent: hello
<<Server p.q.r.s starts responding again>>

In system w.x.y.z:
------------------
# ./client
Sent: hi
Server replied: hello
Sent: hi
Server replied: hello

Sent: hi
Server replied: hello
<<Client w.x.y.z starts receiving replies again>>
  • We can print packet counts and block packets from the desired IP using a cgroup SKB-based solution with eBPF.

Why the Pings Are Not Dropped:

  • In the Linux network stack, the cgroup_skb ingress hook (attached to BPF_CGROUP_INET_INGRESS) is executed very late in the packet’s journey, specifically at the socket layer.
  • When an ICMP echo request (ping) arrives, the kernel’s network stack identifies it as a protocol-level message. For standard pings, the kernel handles the response internally. This happens before the packet is ever delivered to a specific process’s socket.
  • Because cgroup_skb filters traffic associated with processes inside a cgroup, and the kernel’s internal ICMP responder does not belong to a user-space process in that cgroup, the hook is often bypassed for these automated kernel-level replies.

Conclusion

Developing a firewall with eBPF (Extended Berkeley Packet Filter) lets you process network traffic directly within the Linux kernel with high efficiency. By using Python as a control plane (via libraries like BCC), you can write high-performance, C-based kernel logic while managing statistics and blocking rules from a user-friendly script.

When building a firewall to detect and block traffic such as SYN floods at high speed, not all eBPF hooks are created equal.

In this guide, we explored two eBPF-based firewall solutions: TC and cgroup BPF. TC offers low-latency, global network filtering and is ideal for protecting the entire host or implementing traffic shaping and rate limiting. It is highly efficient and suitable for simple, stateless firewalls. On the other hand, cgroup BPF allows fine-grained control over traffic generated by specific applications or containers, making it ideal for process-level security in containerized environments.

While TC performs well for global protection, cgroup BPF provides detailed control over individual processes. For comprehensive security, especially in containerized or microservices environments, combining these methods with other kernel-level protections, such as XDP, can offer a robust solution to modern network security challenges. XDP will be covered in a future post.

References