struct page, the Linux physical page frame data structure

October 14, 2020 | 5 minute read


Linux manages physical memory by dividing it into PAGE_SIZE pieces. Usually this is the same as the CPU's page size, between 4KiB and 64KiB. Each page has a small data structure (about 64 bytes) called struct page, which contains various pieces of information about the page.
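To make the one-descriptor-per-page idea concrete, here is a small userspace model in C. It assumes 4KiB pages and a single flat array of descriptors, roughly like the FLATMEM memory model, and all names are hypothetical. The kernel's real helpers for this are pfn_to_page() and page_to_pfn(), whose implementation depends on the memory model in use.

```c
#include <stdint.h>

/* Userspace model, not kernel code. Assumptions: 4KiB pages and one flat
 * array of page descriptors (similar to the FLATMEM memory model). */
#define MODEL_PAGE_SHIFT 12

struct model_page {
	unsigned long flags;
	unsigned long other[7];	/* stand-in for the remaining fields */
};

/* Page frame number: index of the PAGE_SIZE piece containing phys. */
static inline unsigned long model_phys_to_pfn(uint64_t phys)
{
	return (unsigned long)(phys >> MODEL_PAGE_SHIFT);
}

/* With a flat array, finding a page's descriptor is plain indexing. */
static inline struct model_page *model_pfn_to_page(struct model_page *map,
						   unsigned long pfn)
{
	return &map[pfn];
}

/* And the reverse: pointer difference gives back the frame number. */
static inline unsigned long model_page_to_pfn(const struct model_page *map,
					      const struct model_page *page)
{
	return (unsigned long)(page - map);
}
```

On a 64-bit build the model struct is eight words, or 64 bytes.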

If you allocate a page using the low level page allocator directly (e.g. alloc_page(), alloc_pages() and similar functions) then some of the fields in struct page are available for you to use. If you allocate it through another memory allocator, that allocator may be using the struct page for its own purposes, and so you may not use it. Consult the documentation for your memory allocator to see if it allows you to use any of it.

The primary users of struct page are the page cache and anonymous memory and they impose various restrictions on how you can use various fields within the structure. This document explains how you can use them safely.


Compound pages

If you use a function like alloc_pages() to allocate multiple pages, you get a struct page for each page you allocated. If you do not specify the __GFP_COMP flag, you can use each struct page as if you had allocated each page individually.

If you specify __GFP_COMP, the first page is marked as PageHead() and the other pages are marked as PageTail(); calling compound_head() on any of them returns the head page. This limits how you can use the struct page belonging to each tail page, but it allows the whole allocation to be treated as a single entity in the ways explained below. Hugetlbfs pages and Transparent Huge Pages (THP) are implemented using compound pages.
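The head/tail relationship can be modelled in userspace as follows. This is a sketch, not kernel code: model_prep_compound(), model_page_head(), model_page_tail() and model_compound_head() are hypothetical stand-ins for PageHead(), PageTail() and compound_head(). The encoding (bit 0 set in a tail page's first union word, the rest pointing at the head) follows the description under "Other fields", and an order-n allocation covers 1 << n pages.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit standing in for PG_head. */
#define MODEL_PG_HEAD (1UL << 0)

struct model_page {
	unsigned long flags;
	unsigned long compound_head;	/* first word of the union */
};

/* Turn 2^order contiguous, zeroed descriptors into one compound page. */
static inline void model_prep_compound(struct model_page *pages,
				       unsigned int order)
{
	size_t nr = (size_t)1 << order;

	pages[0].flags |= MODEL_PG_HEAD;
	for (size_t i = 1; i < nr; i++)	/* tails: head pointer with bit 0 set */
		pages[i].compound_head = (unsigned long)&pages[0] + 1;
}

static inline bool model_page_head(const struct model_page *p)
{
	return p->flags & MODEL_PG_HEAD;
}

static inline bool model_page_tail(const struct model_page *p)
{
	return p->compound_head & 1;
}

/* Tail pages yield their head; any other page yields itself. */
static inline struct model_page *model_compound_head(struct model_page *p)
{
	unsigned long head = p->compound_head;

	if (head & 1)
		return (struct model_page *)(head - 1);
	return p;
}
```

Because the descriptors are word-aligned, the head pointer always has bit 0 clear, which is what makes the tag bit free to use.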


Page flags

The first word (32 or 64 bits) is used for page flags. These encode the state of the page, for example whether it is locked or dirty. The upper bits of this word may also be used to encode the zone, NUMA node, and sparsemem section. For compound pages, some flags are used on every subpage, but most are used only on the head page.
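Here is a sketch of how state bits and placement information can share one word. The field widths and the node-above-zone order are assumptions for illustration, and the sparsemem section is left out; in the kernel the real widths depend on the configuration, and helpers such as page_to_nid() and page_zonenum() do the extraction.

```c
#include <limits.h>

/* Hypothetical layout: widths are assumptions, not kernel values. */
#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define MODEL_NODES_WIDTH 6
#define MODEL_ZONES_WIDTH 2
#define MODEL_NODES_SHIFT (MODEL_BITS_PER_LONG - MODEL_NODES_WIDTH)
#define MODEL_ZONES_SHIFT (MODEL_NODES_SHIFT - MODEL_ZONES_WIDTH)
#define MODEL_NODES_MASK ((1UL << MODEL_NODES_WIDTH) - 1)
#define MODEL_ZONES_MASK ((1UL << MODEL_ZONES_WIDTH) - 1)
/* Everything below the zone field is left for ordinary state flags. */
#define MODEL_FLAGS_MASK ((1UL << MODEL_ZONES_SHIFT) - 1)

/* Build a flags word: state bits low, zone and node in the top bits. */
static inline unsigned long model_pack(unsigned long state, unsigned int nid,
				       unsigned int zone)
{
	return (state & MODEL_FLAGS_MASK) |
	       ((nid & MODEL_NODES_MASK) << MODEL_NODES_SHIFT) |
	       ((zone & MODEL_ZONES_MASK) << MODEL_ZONES_SHIFT);
}

static inline unsigned int model_page_to_nid(unsigned long flags)
{
	return (flags >> MODEL_NODES_SHIFT) & MODEL_NODES_MASK;
}

static inline unsigned int model_page_zonenum(unsigned long flags)
{
	return (flags >> MODEL_ZONES_SHIFT) & MODEL_ZONES_MASK;
}
```

Every bit given to placement information is a bit no longer available for state flags, which is why the word is tight on 32-bit systems.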

Most page flags are defined by the page cache (or for anonymous pages), and are free for you to use for your own purposes. The flags which are used on all pages, and so are not available for you to reuse, are:

  • PageSlab (this page has been allocated to the slab allocator)

  • PageReserved (this page is special in some way)

  • PageHead (this page is the head of a compound page)

  • PageHWPoison (this page of memory is defective)

  • PageMlocked (this page is mapped into an mlock()ed region)

If you use the flags for your own purposes, make sure they are clear before the page is freed. Allocating a new bit for your own flag is unlikely to be popular, particularly on 32-bit systems, where the flags word is quite crowded.
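A sketch of borrowing a flag bit and clearing it before the page goes back, modelled in userspace with C11 atomics in place of the kernel's set_bit() and clear_bit(). MODEL_PG_MINE and its bit number are hypothetical; the free check is a stand-in for the sanity checks the allocator performs on freed pages.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical: a page-cache flag bit that our own pages reuse. */
#define MODEL_PG_MINE 5

struct model_page {
	atomic_ulong flags;
};

/* Atomic read-modify-write, so concurrent flag updates are not lost. */
static inline void model_set_page_mine(struct model_page *p)
{
	atomic_fetch_or(&p->flags, 1UL << MODEL_PG_MINE);
}

static inline void model_clear_page_mine(struct model_page *p)
{
	atomic_fetch_and(&p->flags, ~(1UL << MODEL_PG_MINE));
}

static inline bool model_page_mine(struct model_page *p)
{
	return atomic_load(&p->flags) & (1UL << MODEL_PG_MINE);
}

/* Stand-in for the free path: a leftover private flag is a bug. */
static inline bool model_free_page_ok(struct model_page *p)
{
	return !model_page_mine(p);
}
```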


Reference count

Each page has a reference count. A freshly allocated page has a reference count of 1. When the reference count reaches zero, usually by calling put_page(), the page will be freed back to the page allocator. You can use the reference count for your own needs, as long as you use it as a counter. Don't manipulate the page's reference count directly; instead, use the provided accessors such as get_page(), or functions like page_ref_freeze() for more complex modifications. Compound pages only use the reference count on the head page.
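The counting rules above can be modelled with C11 atomics. This is a userspace sketch: model_get_page(), model_put_page(), model_ref_freeze() and model_ref_unfreeze() are hypothetical stand-ins for get_page(), put_page(), page_ref_freeze() and page_ref_unfreeze(). The freeze succeeds only when the count equals the value the caller expects, and leaves it at zero until it is unfrozen.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct model_page {
	atomic_int refcount;
	bool freed;		/* set when the last reference is dropped */
};

static inline void model_alloc(struct model_page *p)
{
	atomic_init(&p->refcount, 1);	/* freshly allocated: count of 1 */
	p->freed = false;
}

static inline void model_get_page(struct model_page *p)
{
	atomic_fetch_add(&p->refcount, 1);
}

static inline void model_put_page(struct model_page *p)
{
	/* fetch_sub returns the old value: 1 means we were the last user */
	if (atomic_fetch_sub(&p->refcount, 1) == 1)
		p->freed = true;	/* the kernel would free the page here */
}

/* Replace the count with 0, but only if it is exactly `expected`. */
static inline bool model_ref_freeze(struct model_page *p, int expected)
{
	return atomic_compare_exchange_strong(&p->refcount, &expected, 0);
}

static inline void model_ref_unfreeze(struct model_page *p, int count)
{
	atomic_store(&p->refcount, count);
}
```

Freezing is what lets code check "nobody else holds this page" and act on it without a racing get_page() sneaking in between.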

The page cache speculatively increments the reference count on pages before checking to see whether they are still in the page cache, so you cannot rely on the reference count being stable. Tail pages have a reference count of zero, so the page cache will not temporarily increment their reference count.
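The speculative lookup can be sketched the same way. model_get_page_unless_zero() follows the idea behind the kernel's get_page_unless_zero(): take a reference only if the count is not already zero, so free pages and tail pages are never touched. The in_cache flag is a hypothetical stand-in for the real recheck that the page is still in the page cache.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct model_page {
	atomic_int refcount;
	atomic_bool in_cache;	/* hypothetical "still in the cache?" state */
};

/* Increment the count unless it is zero; a failed CAS reloads `old`. */
static inline bool model_get_page_unless_zero(struct model_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Take a reference first, then check the page is still where we found it. */
static inline struct model_page *model_speculative_lookup(struct model_page *p)
{
	if (!model_get_page_unless_zero(p))
		return NULL;		/* free page or tail page: count is zero */
	if (!atomic_load(&p->in_cache)) {
		/* raced with removal; the kernel would call put_page() here */
		atomic_fetch_sub(&p->refcount, 1);
		return NULL;
	}
	return p;
}
```

Between the increment and the decrement, anyone reading the count sees one extra reference, which is why the count is not stable.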


Map count

Pages which can be mapped to userspace record how many times they are mapped using the mapcount field. Like the reference count, it should be accessed only through accessor functions like page_mapcount(). Pages which will never be mapped to userspace (e.g. slab pages) may reuse the mapcount field for their own purpose.

If you do use mapcount, you should call page_mapcount_reset() before freeing the page. Don't call it if you are using mapcount for its intended purpose of counting how many times the page is mapped to userspace, or you will defeat a valuable debugging check!
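Why resetting matters: in kernels of this era the mapcount field is stored biased by one, so an unmapped page holds -1, page_mapcount() adds one back for an ordinary page, and page_mapcount_reset() stores -1. The sketch below models that in userspace with hypothetical names; model_free_check() stands in for the debugging check that a page being freed is not still mapped.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct model_page {
	atomic_int mapcount;	/* biased: -1 means "not mapped" */
};

static inline void model_mapcount_reset(struct model_page *p)
{
	atomic_store(&p->mapcount, -1);
}

static inline int model_page_mapcount(struct model_page *p)
{
	return atomic_load(&p->mapcount) + 1;
}

static inline void model_map(struct model_page *p)
{
	atomic_fetch_add(&p->mapcount, 1);
}

static inline void model_unmap(struct model_page *p)
{
	atomic_fetch_sub(&p->mapcount, 1);
}

/* Anything other than -1 at free time looks like a still-mapped page. */
static inline bool model_free_check(struct model_page *p)
{
	return atomic_load(&p->mapcount) == -1;
}
```

A page whose mapcount was borrowed for private data fails this check until it is reset, while a page that really is mapped should fail it, which is the bug the check exists to catch.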

Compound pages which are large enough and mapped to an appropriately aligned address in userspace can be mapped using larger TLB entries. The exact size of the TLB entry depends on the CPU, but Linux accounts for these mappings as if the CPU had PMD-sized TLB entries. The number of PMD mappings is stored in the first tail page of the compound page and accessed through compound_mapcount_ptr(). The number of page-sized TLB mappings is stored in the mapcount field of the individual pages.
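A simplified model of this accounting, assuming x86-64-style 4KiB base pages and 512-entry page tables, so that one PMD entry maps 512 × 4KiB = 2MiB. The counts are kept unbiased for clarity, the real kernel code has extra cases (for example, THPs mapped by both PMD and PTE entries at once) that are omitted here, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_PTRS_PER_PTE 512
#define MODEL_PMD_SIZE ((unsigned long)MODEL_PTRS_PER_PTE << MODEL_PAGE_SHIFT)
#define MODEL_HPAGE_NR MODEL_PTRS_PER_PTE	/* subpages in a PMD-sized THP */

struct model_thp {
	int compound_mapcount;		/* PMD mappings (first tail page in the kernel) */
	int subpage_mapcount[MODEL_HPAGE_NR];	/* PTE mappings of each subpage */
};

/* A PMD mapping needs a user address aligned to the PMD size. */
static inline bool model_pmd_aligned(unsigned long addr)
{
	return (addr & (MODEL_PMD_SIZE - 1)) == 0;
}

/* Subpage i is mapped once per PMD mapping plus once per PTE mapping. */
static inline int model_subpage_mapcount(const struct model_thp *thp, size_t i)
{
	return thp->compound_mapcount + thp->subpage_mapcount[i];
}
```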


Other fields

The five words in the first anonymous union of the struct page are mostly available to you. Please define your own struct as part of that union instead of reusing the already-defined fields.
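Here is a userspace model of what defining your own struct in the union looks like. The page cache overlay mirrors its use of the five words (two list pointers, mapping, index, private); the struct and field names are hypothetical, and it assumes pointers and unsigned long are the same size, as on Linux. The static assertions catch a private struct that would grow the union or move the word that overlaps mapping.

```c
#include <stddef.h>

struct model_cache_words {	/* roughly the page cache's view */
	void *lru_next;		/* word 1 */
	void *lru_prev;		/* word 2 */
	void *mapping;		/* word 3 */
	unsigned long index;	/* word 4 */
	unsigned long private_data;	/* word 5 */
};

struct model_my_words {		/* hypothetical private user */
	unsigned long my_head;	/* word 1: keep bit 0 clear */
	void *my_list;		/* word 2 */
	void *my_unused;	/* word 3: overlaps mapping, leave NULL */
	unsigned long my_cookie;	/* word 4 */
	unsigned long my_counter;	/* word 5 */
};

struct model_page {
	unsigned long flags;
	union {
		struct model_cache_words cache;
		struct model_my_words mine;
	};
	int mapcount;
	int refcount;
};

_Static_assert(sizeof(struct model_my_words) <= sizeof(struct model_cache_words),
	       "private fields must fit in the existing five words");
_Static_assert(offsetof(struct model_my_words, my_unused) ==
	       offsetof(struct model_cache_words, mapping),
	       "word 3 must line up with mapping");
```

Naming your own fields, rather than stashing values in index or private, makes it obvious to the next reader which user owns the page.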

The first word of the union is used to implement compound pages. If the bottom bit is set, then PageTail() is true and the rest of the word is used to refer to the head page. You can use this word for your own purposes as long as you keep the bottom bit clear. Some users reserve the entire word while others use the word to point to a data structure returned from kmalloc() (which is guaranteed to have the bottom bit clear).
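A sketch of the kmalloc()-pointer approach, with malloc() as the userspace stand-in: its result is aligned for any object type, so bit 0 is clear and the page is never mistaken for a tail page. struct my_page_info and the model_* helpers are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct model_page {
	unsigned long flags;
	unsigned long word1;	/* shares storage with compound_head */
};

struct my_page_info {		/* hypothetical per-page private data */
	int owner_id;
};

static inline bool model_page_tail(const struct model_page *p)
{
	return p->word1 & 1;
}

/* Refuse any value with bit 0 set, since that would fake a tail page. */
static inline bool model_set_info(struct model_page *p,
				  struct my_page_info *info)
{
	uintptr_t v = (uintptr_t)info;

	if (v & 1)
		return false;
	p->word1 = (unsigned long)v;
	return true;
}

static inline struct my_page_info *model_get_info(const struct model_page *p)
{
	return (struct my_page_info *)(uintptr_t)p->word1;
}
```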

There are no restrictions on using the second, fourth and fifth words of the union. If the page you have allocated can be mapped to userspace, you should avoid using the third word, which is shared with the mapping field. This prevents the page from being mistaken for a page cache page or an anonymous page. If you do use the third word, you should set it to NULL before freeing the page.
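The rule about the third word can be modelled like this. In the kernel, freeing a page whose mapping is not NULL is reported as a bad page; the sketch imitates that check with hypothetical names, using an anonymous union so that my_third_word and mapping occupy the same storage.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_page {
	unsigned long flags;
	union {
		void *mapping;		/* page cache / anon meaning of word 3 */
		void *my_third_word;	/* hypothetical private use */
	};
};

/* Simplified: a non-NULL mapping makes the page look file-backed or anon. */
static inline bool model_looks_like_file_or_anon(const struct model_page *p)
{
	return p->mapping != NULL;
}

/* Stand-in for the free path's check on word 3. */
static inline bool model_free_check(const struct model_page *p)
{
	return p->mapping == NULL;
}

/* What the private user must do before freeing the page. */
static inline void model_release_private(struct model_page *p)
{
	p->my_third_word = NULL;
}
```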

You should take care not to increase the size of struct page. The array of struct pages takes up a significant percentage of the RAM in the machine and is always allocated. If your feature is applicable to every page in the machine, you may be able to use the CONFIG_PAGE_EXTENSION mechanism to store additional per-page information, but usually it's best to allocate an additional data structure with kmalloc() and store a pointer to it in struct page.
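The cost is easy to quantify. With the sizes given at the top of this article (4KiB pages and a 64-byte struct page), the descriptors cost 1/64 of RAM, about 1.6%. The helper below is just that arithmetic, and the RAM size in the example is an assumption.

```c
#include <stdint.h>

/* Bytes of RAM spent on page descriptors for a machine of ram_bytes. */
static inline uint64_t model_struct_page_bytes(uint64_t ram_bytes,
					       uint64_t page_size,
					       uint64_t struct_page_size)
{
	return ram_bytes / page_size * struct_page_size;
}
```

For 16GiB of RAM that is 4,194,304 descriptors taking 256MiB, and making struct page one 8-byte word larger would cost another 32MiB.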

Matthew Wilcox
