HugeTLB users have long had two competing goals:

  • Predictability: reserve huge pages up front and fail early if capacity is not available.
  • Flexibility: avoid pinning too much memory in advance and allocate only when pages are actually touched.

For a long time, gigantic HugeTLB pages (such as 1 GiB pages on x86) lagged behind in this flexibility story. With Linux v6.19, overcommit support for gigantic HugeTLB pages is now available, closing a long-standing gap.

This post explains what changed, why it matters, and how to verify behavior on a real system.

HugeTLB Reservation: Two Different Layers

There are two separate layers of HugeTLB reservations. It is useful to distinguish between them clearly.

Global HugeTLB pool reservation

This is the system-wide pool (nr_hugepages) dedicated to HugeTLB allocations. Pages in this pool are not available to the regular page allocator for normal anonymous or page-cache use.

Example (default hugetlb page size):

# echo 10 > /proc/sys/vm/nr_hugepages
# grep HugePages /proc/meminfo

Per-mapping reservation (mmap()-time accounting)

When a process creates a HugeTLB mapping, the kernel tracks reservation state with struct resv_map:

  • private mappings: reservation state associated with the VMA (struct vm_area_struct)
  • shared mappings: reservation state associated with the mapping (struct address_space)

By default, HugeTLB tries to ensure enough pages are available to satisfy the mapping’s reservation. If not, mmap() fails early with -ENOMEM.

That early-fail behavior is intentional: for many workloads it is preferable to fail at setup time rather than fault later and receive SIGBUS during execution.

MAP_NORESERVE: Different Tradeoff, Different Failure Point

If userspace passes MAP_NORESERVE together with MAP_HUGETLB, reservation is relaxed and allocation pressure shifts toward fault time.

With MAP_NORESERVE, there is less upfront commitment and a higher chance of fault-time failure (SIGBUS). Without MAP_NORESERVE, the reservation is attempted at mmap() time; if the reservation fails, mmap() fails early with ENOMEM.

This is not universally “better” or “worse”. It depends on the workload’s tolerance for fault-time failure and its memory-efficiency goals.

HugeTLB Overcommit and Surplus Pages

HugeTLB overcommit is controlled per hugepage size (per hstate) by the tunable nr_overcommit_hugepages.

This setting does not increase the permanent HugeTLB pool size. Instead, it allows the kernel to temporarily allocate additional hugepages beyond the configured pool target (nr_hugepages) when demand requires it.

Huge pages allocated beyond the persistent pool are tracked as surplus pages. These pages represent temporary expansions of the HugeTLB pool that are permitted by the overcommit limit.

For 2 MiB HugeTLB pages:

# echo 10 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_overcommit_hugepages

This allows the system to allocate up to 10 additional hugepages beyond the configured nr_hugepages pool if needed.

Unlike persistent pool pages, surplus pages are not intended to remain in the pool permanently. When a surplus page is freed, the kernel may return it to the normal buddy allocator instead of keeping it in the HugeTLB pool. As a result, the HugeTLB pool can temporarily grow when demand increases and shrink back once those pages are released.

Why Gigantic HugeTLB Pages Are Special

HugeTLB reservation and overcommit mechanisms have existed for many years and apply to most HugeTLB page sizes. However, gigantic HugeTLB pages (for example, 1 GiB pages on x86) historically behaved differently.

Gigantic pages are defined as huge pages whose order exceeds what the buddy allocator can normally allocate at runtime. Because the buddy allocator cannot directly satisfy such high-order allocations, gigantic pages traditionally had to be reserved at boot time through kernel parameters. This requirement existed long before HugeTLB runtime pool management and overcommit mechanisms were introduced.

Later, in 2014, support was added to allocate gigantic pages dynamically at runtime, using page migration to assemble a contiguous region of memory large enough for a gigantic page. This removed the strict requirement that gigantic pages always be reserved during boot. However, the corresponding restriction on HugeTLB overcommit for gigantic pages remained in place.

Historically, gigantic overcommit support was unavailable, and attempts to set:

/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages

returned EINVAL.

With Linux v6.19, that limitation is removed.

Reference patch series: Usama Arif’s series

Reproducing Gigantic Overcommit Behavior

First, make sure the kernel is built with CONFIG_ARCH_HAS_GIGANTIC_PAGE selected. It is enabled by default on Oracle UEK6/UEK7/UEK8 kernels.

Set overcommit for 1 GiB pages:

# echo 0 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
# echo 8 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages

Inspect the gigantic hstate directly:

# grep . /sys/kernel/mm/hugepages/hugepages-1048576kB/{nr_hugepages,free_hugepages,resv_hugepages,surplus_hugepages,nr_overcommit_hugepages}

Why this matters: the Hugepagesize field in /proc/meminfo reflects only the default hstate and does not by itself confirm gigantic-page state transitions.

Example Program (1 GiB HugeTLB Mapping)

The following maps 4 GiB using 1 GiB HugeTLB pages and touches one byte per huge page to force fault/allocation:

/* LICENSE: GPLv2 */
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26	/* from <linux/mman.h>, in case libc lacks it */
#endif

#ifndef MAP_HUGE_1GB
#define MAP_HUGE_1GB (30 << MAP_HUGE_SHIFT)
#endif

int main(void)
{
	const size_t huge_page_size = 1UL << 30;        /* 1 GiB */
	const size_t alloc_size = 4UL * huge_page_size; /* 4 GiB */
	volatile unsigned char *p;

	void *addr = mmap(NULL, alloc_size, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS |
			  MAP_HUGETLB | MAP_HUGE_1GB,
			  -1, 0);
	if (addr == MAP_FAILED) {
		fprintf(stderr, "mmap failed: %s (errno=%d)\n",
			strerror(errno), errno);
		return 1;
	}

	printf("mmap OK: addr=%p size=%zu (%.1f GiB)\n",
	       addr, alloc_size,
	       (double)alloc_size / (1024.0 * 1024.0 * 1024.0));

	p = (volatile unsigned char *)addr;
	for (size_t off = 0; off < alloc_size; off += huge_page_size)
		p[off] = (unsigned char)(off / huge_page_size);

	puts("Touched all hugepages successfully.");

	if (munmap(addr, alloc_size) != 0) {
		fprintf(stderr, "munmap failed: %s (errno=%d)\n",
			strerror(errno), errno);
		return 2;
	}

	puts("munmap OK.");
	return 0;
}

Build and run:

gcc -O2 -Wall -o mmap_gigantic_hugetlb mmap_gigantic_hugetlb.c
./mmap_gigantic_hugetlb

When to Use Gigantic Overcommit

Gigantic overcommit is often useful when:

  • operators accept some fault-time risk in exchange for reduced upfront reservation
  • memory demand is bursty and static pool sizing is hard

It is often less suitable when:

  • hard allocation guarantees are required
  • fault-time SIGBUS is unacceptable

Final Takeaway

Linux v6.19 extends HugeTLB overcommit support to gigantic pages on systems where gigantic runtime allocation is supported. That is a meaningful capability improvement for operators and applications that want more elastic HugeTLB behavior without fully pre-reserving a large 1 GiB pool.

But the key operational truth remains: overcommit increases opportunity, not certainty. For gigantic pages, observability and realistic failure handling are still essential.

References