Introduction
The blog “Anticipating Your Memory Needs” discussed the adaptivemm daemon, which monitors the rate at which memory pages of different orders are consumed and reclaimed on a system. Using this information, adaptivemm builds a mathematical model of the system's memory usage trend and uses that model to anticipate potential upcoming memory shortages. This allows it to act ahead of time, forcing reclamation and compaction of memory pages before a shortage occurs.
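To make the idea concrete, one simple form such a model can take is a least-squares line fitted to recent samples of free pages, used to estimate how long until free memory would drop to a given threshold. The sketch below is purely illustrative and does not reproduce adaptivemm's actual model; all names in it are hypothetical.

#include <stddef.h>

/*
 * Illustrative sketch only (not adaptivemm's actual model): fit a
 * least-squares line to recent samples of free pages and estimate how
 * many seconds remain until the free count drops to the given
 * threshold. Returns a negative value if no shortage is predicted.
 */
double predict_seconds_to_threshold(const double *t, const double *free_pages,
                                    size_t n, double threshold)
{
    double st = 0, sf = 0, stt = 0, stf = 0;

    for (size_t i = 0; i < n; i++) {
        st  += t[i];
        sf  += free_pages[i];
        stt += t[i] * t[i];
        stf += t[i] * free_pages[i];
    }

    double denom = n * stt - st * st;
    if (denom == 0.0)
        return -1.0;

    double slope = (n * stf - st * sf) / denom;   /* pages per second */
    double intercept = (sf - slope * st) / n;

    if (slope >= 0.0)
        return -1.0;                              /* free memory is not shrinking */

    /* Solve intercept + slope * t == threshold, relative to the last sample. */
    return (threshold - intercept) / slope - t[n - 1];
}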
An initial implementation of adaptivemm showed a significant reduction in the number of allocation and compaction stalls. As this implementation has seen substantial use with real-world workloads, further refinements have been made to improve its behavior. This update discusses some of those learnings and how adaptivemm evolved to address them.
Impact of Hugepages
As adaptivemm was deployed on database systems, an important side effect of how databases are deployed became a significant issue. Database systems tend to use a large number of pre-allocated hugetlbfs pages. These pages are not subject to reclamation, and on many such systems as much as 90% of memory can be allocated to hugetlbfs pages. When adaptivemm computes watermarks for such a system, it can arrive at a watermark that is too high for the amount of reclaimable memory, which can result in the OOM (Out Of Memory) killer being invoked as soon as adaptivemm changes the watermark_scale_factor.
To address this issue, adaptivemm needs to compute the watermark scale factor based only on reclaimable pages, which was accomplished with a simple change to the code:
/*
* Hugepages should not be taken into account for watermark
* calculations since they are not reclaimable
*/
total_managed -= total_hugepages;
To keep track of the current number of hugetlbfs pages on the system, adaptivemm reads the number of currently allocated hugepages of all sizes from /sys/kernel/mm/hugepages every time it wakes up and samples the current memory state. This allows adaptivemm to take any additional actions needed to account for changes in the number of hugepages on the system.
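A minimal sketch of such sampling is shown below; it sums the memory held by every hugepage pool exposed under /sys/kernel/mm/hugepages. The function name and structure are illustrative and may differ from adaptivemm's own implementation.

#include <dirent.h>
#include <stdio.h>

/*
 * Illustrative sketch: walk the per-size hugepage pools in sysfs and
 * return the total memory currently held in hugetlbfs pages, in kB.
 */
static unsigned long total_hugepage_kb(void)
{
    const char *base = "/sys/kernel/mm/hugepages";
    unsigned long total_kb = 0;
    struct dirent *de;
    DIR *dir = opendir(base);

    if (!dir)
        return 0;
    while ((de = readdir(dir)) != NULL) {
        unsigned long size_kb, count;
        char path[512];
        FILE *fp;

        /* Pool directories are named like "hugepages-2048kB". */
        if (sscanf(de->d_name, "hugepages-%lukB", &size_kb) != 1)
            continue;
        snprintf(path, sizeof(path), "%s/%s/nr_hugepages", base, de->d_name);
        fp = fopen(path, "r");
        if (!fp)
            continue;
        if (fscanf(fp, "%lu", &count) == 1)
            total_kb += size_kb * count;
        fclose(fp);
    }
    closedir(dir);
    return total_kb;
}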
Taking Multiple Zones into Account
The initial implementation of adaptivemm only considered “Normal” zone memory pages. This resulted in sub-optimal behavior on systems where a significant amount of memory resides in other zones, especially the DMA32 zone. As a result, adaptivemm was updated to include zones other than the “Normal” zone, which improved its performance significantly on systems with 16 GB of memory or less.
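The per-zone information needed for this is available from /proc/zoneinfo. The sketch below is illustrative rather than adaptivemm's actual parser; it sums the "managed" page counts of the DMA32 and Normal zones.

#include <stdio.h>
#include <string.h>

/*
 * Illustrative sketch: sum the "managed" page counts of the zones that
 * back normal allocations, rather than the Normal zone alone.
 */
static unsigned long managed_pages_all_zones(void)
{
    FILE *fp = fopen("/proc/zoneinfo", "r");
    char line[256], zone[64] = "";
    unsigned long total = 0, val;

    if (!fp)
        return 0;
    while (fgets(line, sizeof(line), fp)) {
        /* Remember which zone the following counters belong to. */
        if (sscanf(line, "Node %*d, zone %63s", zone) == 1)
            continue;
        if (sscanf(line, " managed %lu", &val) == 1 &&
            (strcmp(zone, "Normal") == 0 || strcmp(zone, "DMA32") == 0))
            total += val;
    }
    fclose(fp);
    return total;
}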
Limiting Scaling up of Watermarks
Another observation from adaptivemm's behavior with real-world workloads was that it could raise watermarks high enough for the current number of free pages to fall below the low watermark, causing the OOM killer to kill user processes immediately. This is highly undesirable, so the adaptivemm code was updated to never set the watermark scale factor to a value whose resulting low watermark would exceed the current number of free pages, thus avoiding inadvertently invoking the OOM killer.
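The safeguard can be approximated as in the sketch below, which uses the simplified relationship low ≈ min + managed_pages × watermark_scale_factor / 10000 and ignores per-zone details and the min_free_kbytes floor; the function and its policy choices are illustrative, not adaptivemm's exact code.

/*
 * Simplified sketch of the safety check: refuse to raise
 * watermark_scale_factor beyond the point where the projected low
 * watermark would exceed the pages currently free.
 */
static unsigned int clamp_scale_factor(unsigned int current_wsf,
                                       unsigned int proposed_wsf,
                                       unsigned long managed_pages,
                                       unsigned long min_wmark_pages,
                                       unsigned long free_pages)
{
    /* One watermark_scale_factor unit is roughly 1/10,000 of managed memory. */
    unsigned long unit = managed_pages / 10000 + 1;
    unsigned long projected_low = min_wmark_pages + unit * proposed_wsf;

    if (projected_low < free_pages)
        return proposed_wsf;                  /* raising is safe as requested */
    if (free_pages <= min_wmark_pages + unit)
        return current_wsf;                   /* no headroom; keep the current value */

    /* Largest factor whose projected low watermark stays below free pages. */
    return (unsigned int)((free_pages - min_wmark_pages) / unit);
}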
New Functionality in Version 1.5.0
Since adaptivemm computes the current number of hugetlbfs pages on the system every time it wakes up to evaluate the system memory state, it can detect any changes in the number of hugetlbfs pages. New functionality now supports using such changes as a trigger for making additional adjustments to the system.
The adaptivemm codebase has been further modularized to allow for one-time initialization of system tunables at adaptivemm startup and for continued tuning of those tunables in response to changes on the system. A fluctuation in the number of hugetlbfs pages is currently implemented as a trigger. This modular design makes it easy to add further triggers as required.
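The design can be pictured roughly as a table of handlers, each with a one-time init hook and a trigger callback; the structure and names below are hypothetical and only meant to convey the shape of the modular approach, not adaptivemm's actual code.

#include <stddef.h>
#include <stdio.h>

/* Hypothetical sketch of the modular trigger design; names are illustrative. */
struct tunable_handler {
    const char *name;
    void (*init)(void);                        /* one-time setup at startup */
    void (*on_hugepage_change)(long delta_kb); /* fires when the hugetlbfs pool changes */
};

static void negative_dentry_init(void)
{
    /* e.g. write an initial limit derived from non-hugepage memory */
}

static void negative_dentry_adjust(long delta_kb)
{
    printf("hugepage pool changed by %ld kB, recomputing dentry limit\n", delta_kb);
}

static const struct tunable_handler handlers[] = {
    { "negative-dentry-limit", negative_dentry_init, negative_dentry_adjust },
    /* further tunables and triggers slot in here */
};

static void dispatch_hugepage_change(long old_kb, long new_kb)
{
    for (size_t i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++)
        if (handlers[i].on_hugepage_change)
            handlers[i].on_hugepage_change(new_kb - old_kb);
}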
These two capabilities have been used to implement tuning of a new tunable in Oracle’s Unbreakable Enterprise Kernel (UEK) that limits the number of negative dentries in the cache based upon the current amount of non-hugepage memory on the system.
Future Functionality
adaptivemm can be tuned further to make smarter decisions about which actions to take in response to anticipated events. One possibility is to use the recently introduced /proc/sys/vm/compaction_proactiveness file to initiate more gradual compaction when the system is expected to run out of higher-order pages.
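As a rough illustration of what such an action could look like (this is not current adaptivemm behavior), raising the tunable's value makes the kernel compact memory more aggressively in the background; it accepts values from 0 to 100, with 20 as the default.

#include <stdio.h>

/*
 * Illustrative sketch: raise proactive compaction when a higher-order
 * page shortage is anticipated. Valid values are 0 to 100.
 */
static int set_compaction_proactiveness(int value)
{
    FILE *fp = fopen("/proc/sys/vm/compaction_proactiveness", "w");

    if (!fp)
        return -1;
    fprintf(fp, "%d\n", value);
    return fclose(fp);
}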
There are other VM tunables that adaptivemm could adjust automatically based on the current workload. Some of the tunables being considered are:
- watermark_boost_factor
- swappiness
- min_free_kbytes
- dirty_background_ratio
- extfrag_threshold
- vfs_cache_pressure
Resources
The Linux Foundation hosted a series of Live Mentorship webinars, one of which covered adaptivemm. This webinar provides a deeper exploration of adaptivemm and is accompanied by a demo and system-behavior data demonstrating how adaptivemm improves system performance.
The latest release of adaptivemm is available from the git repo: https://github.com/oracle/adaptivemm.