Introducing Memdesc

What is a Memdesc?

In Linux, physical memory is divided into fixed-size chunks called pages. To keep track of all those pages, the kernel has a data structure: struct page. There is a struct page associated with every physical page frame in a system. A struct page can describe file-backed memory, anonymous memory, page tables, slab pages, and much more.

A struct page is typically only 64 bytes, and making it bigger is unreasonable: on x86-64 with 4 KiB pages, struct page already takes up approximately 1.6% of physical memory, so the smaller the better. As a result, struct page is now a plethora of fields crammed together in multiple unions.
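The 1.6% figure follows from simple arithmetic. The sketch below assumes a 64-byte struct page and 4 KiB pages; the function and macro names are made up for illustration and do not exist in the kernel.

```c
/* Assumed values for illustration: 64-byte struct page, 4 KiB pages. */
#define STRUCT_PAGE_SIZE 64UL
#define PAGE_SIZE_BYTES  4096UL

/* Bytes of struct page metadata needed to describe mem_bytes of RAM. */
unsigned long page_metadata_bytes(unsigned long mem_bytes)
{
        return (mem_bytes / PAGE_SIZE_BYTES) * STRUCT_PAGE_SIZE;
}

/* Metadata overhead as a percentage of the memory being described. */
double page_metadata_percent(void)
{
        return 100.0 * STRUCT_PAGE_SIZE / PAGE_SIZE_BYTES;
}
```

Under these assumptions, 1 GiB of RAM needs 16 MiB of struct pages, and the overhead is 64 / 4096 = 1.5625%, which rounds to the 1.6% quoted above.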

We are looking to split the users of struct page into dedicated memory descriptors. We’re calling this the memdesc project.

Guidelines for Memdescs

When introducing a new memory descriptor, it is vital to establish a good foundation for users to build on. The goal is to standardize a descriptor’s usage as much as possible, so all APIs should be generally applicable. For now, the descriptor should act as a simplified wrapper of struct page.

It’s also extremely important to include appropriate documentation – in both the code and the commit logs. Generally, commit logs should contain details on any fundamental changes, while code comments should point out current users and considerations about the code.

Recently, I introduced a memory descriptor for page tables – ptdesc. I will use this as a reference to highlight important things to look out for when creating, and implementing, memdescs.

Introducing a Memdesc

First, we must decide on the memory we want a descriptor for. You can find a detailed list on the memdesc wiki page: https://kernelnewbies.org/MatthewWilcox/Memdescs.

Once you have chosen the memory you want a descriptor for, the next step is to determine the members the descriptor needs. As the descriptor is effectively a wrapper around struct page for now, we can base it on the members found in struct page. To ensure proper aliasing with struct page, we must include placeholder variables for the fields the descriptor does not use; these will eventually be removed. We prefix these unused variables with a double underscore for easy identification in the future. Additionally, to organize the structure properly, we need to consider all users of the descriptor. Generally used variables simply get descriptive names. For niche users, we can introduce unions that alias appropriately.

Take a look at these snippets from the definition of a ptdesc (found in “include/linux/mm_types.h”):

    union {
            struct rcu_head pt_rcu_head;
            struct list_head pt_list;
            struct {
                    unsigned long _pt_pad_1;
                    pgtable_t pmd_huge_pte;
            };
    };
    unsigned long __page_mapping;

We consolidated these words of struct page based on the various page table users. This makes things much more readable and focused. The purpose of each variable here is documented in the comments preceding the descriptor for anyone reading the code:

    * @pt_rcu_head:      For freeing page table pages.
    * @pt_list:          List of used page tables. Used for s390 and x86.
    * @_pt_pad_1:        Padding that aliases with page's compound head.
    * @pmd_huge_pte:     Protected by ptdesc->ptl, used for THPs.
    * @__page_mapping:   Aliases with page->mapping. Unused for page tables.

Finally, it is extremely important to ensure our descriptor properly aliases with struct page. If it does not, we risk data corruption that is very hard to trace. For that reason, we have a compile-time sanity check that each ptdesc field sits at the same offset as the struct page field it aliases:

    #define TABLE_MATCH(pg, pt)                                                \
            static_assert(offsetof(struct page, pg) == offsetof(struct ptdesc, pt))
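To make the layout rules and the offset check concrete, here is a minimal userspace sketch. The types mock_page and mock_desc, their fields, and the MOCK_MATCH macro are hypothetical stand-ins invented for this example; they are not the kernel's real struct page or ptdesc layout.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* uintptr_t */

/* Hypothetical stand-in for struct page: NOT the kernel's real layout. */
struct mock_page {
        uintptr_t flags;
        void *lru_next;         /* first word of a two-word list slot */
        void *lru_prev;         /* second word of that slot */
        void *mapping;
        unsigned int refcount;
};

/* Hypothetical descriptor that overlays mock_page field for field. */
struct mock_desc {
        uintptr_t __page_flags;         /* placeholder, aliases flags */
        union {
                struct { void *next, *prev; } md_list;  /* general user */
                struct {
                        uintptr_t _md_pad_1;    /* skips the first word */
                        void *md_private;       /* niche user */
                };
        };
        uintptr_t __page_mapping;       /* placeholder, aliases mapping */
        unsigned int __page_refcount;   /* placeholder, aliases refcount */
};

#define MOCK_MATCH(pg, md)                                              \
        static_assert(offsetof(struct mock_page, pg) ==                 \
                      offsetof(struct mock_desc, md),                   \
                      "layout mismatch: " #pg)

MOCK_MATCH(flags, __page_flags);
MOCK_MATCH(lru_next, md_list);
MOCK_MATCH(lru_next, _md_pad_1);
MOCK_MATCH(lru_prev, md_private);
MOCK_MATCH(mapping, __page_mapping);
MOCK_MATCH(refcount, __page_refcount);
#undef MOCK_MATCH

/* The descriptor must never outgrow the structure it overlays. */
static_assert(sizeof(struct mock_desc) <= sizeof(struct mock_page),
              "descriptor larger than page");
```

If a field is added to or reordered in either structure without updating the other, one of the assertions fails at compile time rather than corrupting memory at run time.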

Establishing Memdesc helper functions

We will want to create helper functions that abstract complexity away from our users. Helper functions for a memdesc should set up an infrastructure that meets all users’ needs and allows for simple, targeted reimplementation. These helper functions should take and return our new memory descriptor, to discourage users from falling back on their own ad hoc handling of struct page.

Let’s look at some ptdesc helper functions:

    #define ptdesc_page(pt)                 (_Generic((pt),                \
            const struct ptdesc *:          (const struct page *)(pt),     \
            struct ptdesc *:                (struct page *)(pt)))


    #define ptdesc_folio(pt)                (_Generic((pt),                \
            const struct ptdesc *:          (const struct folio *)(pt),    \
            struct ptdesc *:                (struct folio *)(pt)))


    #define page_ptdesc(p)                  (_Generic((p),                 \
            const struct page *:            (const struct ptdesc *)(p),    \
            struct page *:                  (struct ptdesc *)(p)))

The first, most obvious, set of helpers required are those for casting between struct page and our new descriptor. Ptdescs have three casting helpers, each using _Generic so that a const pointer stays const across the cast. Due to the nature of functions such as pagetable_pte_ctor(), we need a way to get a folio from a ptdesc. However, we do not want a way to get a ptdesc from a folio: it is never necessary, and we would like to keep things that way.
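The const-preserving behavior of these macros can be tried outside the kernel. The following is a sketch with hypothetical types (demo_page, demo_desc) and hypothetical macro names. It only compares addresses, because dereferencing through the cast type relies on the kernel being built with -fno-strict-aliasing, which a userspace build may not use.

```c
/* Hypothetical stand-ins for struct page and a memory descriptor. */
struct demo_page { unsigned long flags; };
struct demo_desc { unsigned long __page_flags; };

#define demo_desc_page(d)       (_Generic((d),                          \
        const struct demo_desc *:       (const struct demo_page *)(d),  \
        struct demo_desc *:             (struct demo_page *)(d)))

#define demo_page_desc(p)       (_Generic((p),                          \
        const struct demo_page *:       (const struct demo_desc *)(p),  \
        struct demo_page *:             (struct demo_desc *)(p)))

/* Returns 1 when the casts round-trip to the same address. */
int demo_same_address(struct demo_page *p)
{
        /* Mutable pointer in, mutable pointer out. */
        struct demo_desc *d = demo_page_desc(p);
        /* Const pointer in, const pointer out. */
        const struct demo_desc *cd =
                demo_page_desc((const struct demo_page *)p);
        const struct demo_page *back = demo_desc_page(cd);

        return (void *)d == (void *)p &&
               (const void *)back == (const void *)p;
}
```

Assigning demo_desc_page(cd) to a plain struct demo_page * would draw a compiler diagnostic for discarding const. Passing a pointer of any other type fails to compile, because no _Generic association matches and there is no default. That second property is what makes it possible to leave a conversion deliberately undefined, as with folio to ptdesc.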

Ptdescs also have helper functions that standardize how page tables are allocated and freed:

    static inline struct ptdesc *pagetable_alloc(gfp_t gfp, unsigned int order)
    {
            struct page *page = alloc_pages(gfp | __GFP_COMP, order);

            return page_ptdesc(page);
    }


    static inline void pagetable_free(struct ptdesc *pt)
    {
            struct page *page = ptdesc_page(pt);

            __free_pages(page, compound_order(page));
    }

Before these helpers existed, architectures would call alloc_pages() and free_pages() themselves however they saw fit. Introducing dedicated allocation and freeing functions standardizes how page tables are created and destroyed. In the future, if we want to change the internal workings of page tables, we can simply modify the helper functions rather than address every architecture individually. Furthermore, this prevents misuse by keeping the strict allocation/free routine out of the caller’s hands.

The effect of this standardization is summarized in the commit log introducing these functions:

    pagetable_alloc() is defined to allocate new ptdesc pages as compound
    pages.  This is to standardize ptdescs by allowing for one allocation and
    one free function, in contrast to 2 allocation and 2 free functions.
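The idea of a single free function that needs no size argument can be illustrated with a userspace analogy. In the kernel helpers above, __GFP_COMP makes the allocation a compound page, and compound_order() recovers the order at free time. In the sketch below, a hypothetical descriptor simply stores its order. All demo_ names are invented for this example, and malloc/calloc stand in for the page allocator.

```c
#include <stdlib.h>

#define DEMO_PAGE_SIZE 4096UL   /* assumed page size for the sketch */

/* Hypothetical descriptor: remembers its own order, much as a compound
 * page does, so the free path never asks the caller for it. */
struct demo_ptdesc {
        unsigned int order;
        void *mem;              /* (1 << order) pages of backing memory */
};

unsigned long demo_live_bytes;  /* bookkeeping for the sketch only */

struct demo_ptdesc *demo_pagetable_alloc(unsigned int order)
{
        struct demo_ptdesc *pt = malloc(sizeof(*pt));

        if (!pt)
                return NULL;
        pt->mem = calloc(1UL << order, DEMO_PAGE_SIZE);
        if (!pt->mem) {
                free(pt);
                return NULL;
        }
        pt->order = order;
        demo_live_bytes += DEMO_PAGE_SIZE << order;
        return pt;
}

void demo_pagetable_free(struct demo_ptdesc *pt)
{
        if (!pt)
                return;
        /* The order comes from the descriptor, not from the caller. */
        demo_live_bytes -= DEMO_PAGE_SIZE << pt->order;
        free(pt->mem);
        free(pt);
}
```

Because the size travels with the descriptor, callers cannot free with the wrong order, and there is no need for separate single-page and multi-page variants of either function.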

Implementing a Memdesc

This is the easy part. The goal of the initial patchset should be to split out all direct accesses of struct page, while ensuring our new standardization rules do not break anything. We should be left with a kernel that compiles after removing any fields unique to the descriptor from struct page.

With respect to the pagetable_alloc() change mentioned earlier, this meant that any time code allocating a page table was converted, the corresponding freeing code also had to be converted, because pagetable_free() reads the order back from the compound page that pagetable_alloc() set up. On top of that, variables such as pmd_huge_pte are no longer found in struct page at all, but still exist as important variables in ptdescs.

Organizing a patchset

Follow the Linux kernel patch submission guidelines to prepare the patches. They can be found here: https://docs.kernel.org/process/submitting-patches.html.

Generally, a patchset is easiest to follow when the first patches are preparatory, followed by the creation of the descriptor. After that, introduce any helper functions, followed by the removal of the descriptor’s unique fields from struct page. Once ready, send the patchset to the mm mailing list (linux-mm@kvack.org) and wait for comments.

What’s next?

At this point, it’s time to start converting other users to the new descriptor. Any intended users of the new descriptor eventually need to be converted away from struct page. Depending on the complexity, some of these changes can be included in the introduction patchset as well. By gradually converting the rest of the code, your memory descriptor can eventually stand on its own!

Vishal Moola