Overview
Since the introduction of the maple tree[1] to the linux kernel, many enhancements have been upstreamed to improve the performance of the tree. One such feature in v6.12 is the addition of store type tracking[2]. This feature adds an enum[3] to struct ma_state[4] which tracks the type of store that will occur throughout the write code paths.
Motivation
Writing to the maple tree with preallocated memory requires performing two walks of the tree. The first walk is used to determine how many nodes to allocate in order to perform this write. The second walk is used to complete the write by inserting the given entry into a leaf node of the tree. In between these two walks, information such as the topography of the tree is lost and thus cannot be used in the second walk. The new store type feature works by saving information from the first walk and retaining it in the maple state so it can be used to optimize the second step of completing the write. The information that is saved is the type of store that’s needed to complete the write. These types are expressed in the store type enum. Each store type corresponds to an already existing function in the maple tree code and is used to execute a write depending on the different conditions of the current tree. The two step preallocation + write pattern can now be consolidated into one walk of the tree: when performing the write, mas->store_type will be populated and the code can immediately dispatch to the correct helper function and execute the write directly rather than performing an additional walk to determine which helper function to use.
Implementation
The maple tree code has multiple helper functions that perform a write based on different conditions of the tree. These conditions are listed in the below store_type enum. Each type of store also corresponds to a number of maple node allocations that are needed to complete the store.
enum store_type { wr_invalid, wr_new_root, wr_store_root, wr_exact_fit, wr_spanning_store, wr_split_store, wr_rebalance, wr_append, wr_node_store, wr_slot_store, };
wr_invalid
The wr_invalid case is used as an initial value on the creation of a maple state. It is further used to check for errors on the write path, as all writes must go through a code path which set a non wr_invalid value for the store type field.
wr_new_root
The wr_new_root case represents a write which overwrites the complete contents of the tree. This means the range of the write spans from 0 to ULONG_MAX. To complete this write, we create a new node which contains that entry, and set the tree’s root to be this node.
wr_store_root
The wr_store_root case occurs when a write happens on an empty tree or a pointer encoded tree.
wr_exact_fit
The wr_exact_fit case represents a write in which the span of this store matches the range of an already existing entry in the tree. To complete the write, simply overwrite the entry contained in this entry to the new entry requested. In the diagram below, the range of the write exactly matches the existing entry stored from a to b.
a b c d +--------+---+-------+ | | | | +--------+---+-------+ ^ ^ index. last
wr_spanning_store
The wr_spanning_store case occurs when a write has a range that would start in one node and end in another node. This situation is detected through the function mas_is_span_wr(). The diagram below shows how the range of the write starts at the leaf node l and extends to the sibling node r.
+--------+---+-------+ | | | | +--------+---+-------+ | | | | +----+-------+ +---------+--+ | l | | | r | | +----+-------+ +---------+--+ ^ ^ |__________________________| index last
wr_rebalance
The wr_rebalance case occurs when an entry is being erased from the tree. For a maple node to be valid, a minimum number of entries must be kept in a node. This number is defined in the code within the mt_min_slots[]
array[5]. As a result of this erase operation, the node may now contain less than the mt_min_slots
amount. A node in this state is defined as being insufficient. To fix this insufficiency, the node can be absorbed into a sibling node that has enough space for insufficient node’s entries as well as its own existing entries. The original node and its sibling have now been merged into one node and their parent node has now lost one entry because of this merge of two nodes to one. This can cause further insufficiencies in the parent and the rebalance operation will cascade up the tree. If the sibling node does not have enough space, the two nodes are combined and then split into two nodes which meet the minimum sufficiency requirements.
wr_split_store
The wr_split_store case occurs when trying to add an additional entry to a node that is already full. As a result, the node is split into two nodes. This causes an additional entry in the parent node. If the parent node is full, additional upward splits occur until the tree is valid.
wr_append
The wr_append case occurs when an entry is being added to the last slot of a node. This write can be completed without allocating any additional nodes. Append writes are uncommon because they cannot occur in RCU (Read-Copy Update) mode.
wr_node_store
A wr_node_store occurs when a write results in enough change to require a new node. This can happen through the need for shrinking or growing of the original node.
wr_slot_store
The wr_slot_store case involves writes which can be completed without having to allocate a new node. An example of this scenario could be expanding or shrinking the boundaries of an existing entry.
Results
Results were measured using the Phoronix t-test-1[6] memory test and stress-ng[7] mmap.
Phoronix t-test-1 (Seconds < Lower Is Better) v6.10-rc6 Threads: 1 33.15 Threads: 2 10.81 v6.10-rc6 + store type enum addition Threads: 1 32.69 Threads: 2 10.45 Stress-ng mmap 6.10_base store_type_v4 Duration User 2744.65 2769.40 Duration System 10862.69 10817.59 Duration Elapsed 1477.58 1478.35
Although performance results are not drastic, this change adds readability to the write code paths and removes an unneeded extra traversal of the maple tree.
References
- https://blogs.oracle.com/linux/post/the-maple-tree-a-modern-data-structure-for-a-complex-problem
- https://lore.kernel.org/lkml/20240814161944.55347-1-sidhartha.kumar@oracle.com/T/
- https://github.com/torvalds/linux/blob/master/include/linux/maple_tree.h#L151
- https://github.com/torvalds/linux/blob/master/include/linux/maple_tree.h#L438
- https://github.com/torvalds/linux/blob/master/lib/maple_tree.c#L126
- https://openbenchmarking.org/test/pts/t-test1
- https://github.com/ColinIanKing/stress-ng/blob/master/stress-mmap.c