Detecting Kernel Memory Leaks using adaptivemm

Introduction

Memory is a valuable resource, and workload designers put significant effort into making full use of all the memory on a system. A memory leak on a system designed to put all of its memory to use can severely disrupt performance, stability and availability. Detecting and preventing memory leaks is therefore critically important.

Memory leaks can happen in userspace or in kernel space. Many tools are available to detect potential memory leaks in a userspace application during development. Kernel memory leaks can be harder to detect. They can happen when a subsystem allocates memory and then drops its references to that memory without freeing it. The Linux kernel provides tools like kmemleak, which scans for memory objects that have no references but have not been freed. This scanning has a performance impact and may not be suitable for a production machine. Other bugs in kernel code cause leaks that are much harder to detect, for example when a module or subsystem allocates pages, keeps references to them but never uses them again, and continues to allocate more pages. Another effort to detect memory leaks is memleak, one of the BCC (BPF Compiler Collection) tools.

Leveraging adaptivemm’s periodic scan

adaptivemm is a daemon that proactively monitors a Linux system. During periodic scans it builds and updates a model of the system's memory use based on recent trends. These scans can also be used to check whether memory use is growing suspiciously and to raise an alarm early, so system admins can check whether the system is experiencing a memory leak. By looking at memory usage reported by /proc/meminfo, /proc/zoneinfo and /proc/vmstat, adaptivemm can check whether the reported memory in use adds up to the amount of memory on the system. If the amount of memory that cannot be accounted for goes up consistently, the system might be leaking memory. When that happens, adaptivemm can report a potential memory leak and record both the data it used and how it arrived at this conclusion. This information can then help a system admin investigate the possible sources of the leak.

adaptivemm is available as an rpm package in UEK repositories for Oracle Linux. Source code is available on GitHub. Its operation can be configured through the configuration file, /etc/sysconfig/adaptivemmd or /etc/default/adaptivemmd depending upon the Linux distribution. The adaptivemm daemon can be launched through systemd or init services. It can also be started from the command line by the root user with:

$ /usr/sbin/adaptivemmd

Accounting for memory use

The memory management subsystem in the Linux kernel maintains a large number of statistics to account for various uses of memory. These statistics are reported in /proc/meminfo and /proc/vmstat. Not all fields in these files represent unique pages and some of the memory pages may be counted in more than one field. This requires that when accounting for memory, fields with overlapping accounting of memory be used carefully to avoid double counting any memory. With that in mind, fields of interest from /proc/meminfo are:

AnonPages: Non-file backed pages mapped into userspace
Buffers: Cache for disk blocks
Cached: Cache for files on disk (pagecache) as well as tmpfs and shmem
CmaTotal: Memory reserved for the Contiguous Memory Allocator (CMA)
KReclaimable: Kernel allocations that the kernel will attempt to reclaim under memory pressure. Includes SReclaimable (the reclaimable part of Slab) and other direct allocations with a shrinker
KernelStack: Memory consumed by the kernel stacks of all tasks
MemTotal: Total usable RAM on the system
PageTables: Memory consumed by userspace page tables
SwapCached: Memory that was swapped out and has been swapped back in but is still also in the swapfile. If memory is needed, it does not need to be swapped out again because it is already in the swapfile, which saves I/O
SUnreclaim: Part of Slab that cannot be reclaimed under memory pressure
SecPageTables: Memory consumed by secondary page tables
Unevictable: Memory allocated for userspace which cannot be reclaimed, such as mlocked pages, ramfs backing pages, secret memfd pages etc.
MemFree: Memory currently not in use

There is one more category of memory use that is not completely accounted for in /proc/meminfo and /proc/vmstat: hugetlbfs pages. A system might support multiple hugepage sizes, and to account for these pages correctly we need to look at /sys/kernel/mm/hugepages, which lists the number of pre-allocated hugepages of each supported size. Multiplying the number of pages by the page size for each supported size and adding the results gives the total memory consumed by hugepages.
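The hugepage accounting described above can be sketched in Python as follows. This is a minimal illustration, not adaptivemm's own code (adaptivemm is a separate daemon). The function name `hugepages_total_kb` and its `base` parameter are hypothetical. It assumes the usual sysfs layout of one `hugepages-<size>kB` directory per supported size, each containing an `nr_hugepages` file.

```python
import os
import re


def hugepages_total_kb(base="/sys/kernel/mm/hugepages"):
    """Return memory held by pre-allocated hugepages of all sizes, in kB.

    Hypothetical helper: walks directories named hugepages-<size>kB under
    `base` and adds nr_hugepages * size for each one.
    """
    if not os.path.isdir(base):
        # No hugepage support exposed here: nothing to account for.
        return 0
    total_kb = 0
    for entry in os.listdir(base):
        match = re.fullmatch(r"hugepages-(\d+)kB", entry)
        if match is None:
            continue  # ignore anything that is not a per-size directory
        size_kb = int(match.group(1))
        with open(os.path.join(base, entry, "nr_hugepages")) as f:
            nr_pages = int(f.read().strip())
        total_kb += nr_pages * size_kb
    return total_kb
```

Taking the directory as a parameter keeps the function easy to exercise against a scratch directory instead of the live sysfs tree.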

To arrive at the memory that can be accounted for, we can use the following formula:

accounted_mem = AnonPages + Buffers + Cached + CmaTotal + KReclaimable + KernelStack + PageTables + SwapCached + SUnreclaim + SecPageTables + Unevictable + MemFree + hugepages

That gives us the amount of memory not accounted for as:

unaccounted_mem = MemTotal - accounted_mem
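As a rough sketch of the two formulas above, the following Python parses text in the /proc/meminfo format and computes the unaccounted memory. The names `ACCOUNTED_FIELDS`, `parse_meminfo` and `unaccounted_kb` are hypothetical and do not come from adaptivemm. Treating a field that is missing from the input as 0 is an assumption made here so that the sketch tolerates kernels that do not report every field.

```python
# Fields summed into accounted_mem, as listed in the formula above.
ACCOUNTED_FIELDS = (
    "AnonPages", "Buffers", "Cached", "CmaTotal", "KReclaimable",
    "KernelStack", "PageTables", "SwapCached", "SUnreclaim",
    "SecPageTables", "Unevictable", "MemFree",
)


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value} dict."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def unaccounted_kb(meminfo, hugepages_kb=0):
    """Return MemTotal minus all memory that can be accounted for, in kB.

    Assumption: a field absent from `meminfo` counts as 0.
    """
    accounted = sum(meminfo.get(f, 0) for f in ACCOUNTED_FIELDS)
    accounted += hugepages_kb
    return meminfo["MemTotal"] - accounted
```

On a live system the input would be the contents of /proc/meminfo, for example `unaccounted_kb(parse_meminfo(open("/proc/meminfo").read()))`, with the hugepage total passed as the second argument.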

Memory not accounted for is consumed by kernel, firmware and driver/module text, and by memory allocated directly through the buddy allocator. This memory is difficult to track down. Unaccounted memory grows and shrinks as kernel subsystems call the buddy allocator to allocate memory for their use, but there is a baseline that will always be in use by the kernel, firmware and drivers/modules. adaptivemm tracks the floor of the unaccounted memory value, which most likely represents this baseline. Ideally adaptivemm starts immediately after all drivers/modules have been loaded by the kernel so it can establish a good baseline, but it can be started or restarted at any time. This is why it keeps tracking the floor rather than relying on a single initial measurement.

As an example, this is what /proc/meminfo reports on a system that has been running for a few days:

  MemTotal:       790680500 kB
  MemFree:        785261676 kB
  Buffers:           751908 kB
  Cached:           1710116 kB
  SwapCached:             0 kB
  Unevictable:        16576 kB
  AnonPages:          83392 kB
  KReclaimable:      331748 kB
  SUnreclaim:        281952 kB
  KernelStack:        15168 kB
  PageTables:          2148 kB
  SecPageTables:          0 kB
  CmaTotal:               0 kB

The number of hugepages is reported as:

$ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
0

$ cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
0

As a result:

hugepages = 0 * 1048576 + 0 * 2048 = 0

On this system, adding all the memory we can account for:

accounted_mem = 83392 + 751908 + 1710116 + 0 + 331748 + 15168 + 2148 + 0 + 281952 + 0 + 16576 + 785261676 + 0 = 788454684

Now:

unaccounted_mem = 790680500 - 788454684 = 2225816

We have 2225816 kB of memory that is unaccounted for, which is consumed by the kernel, firmware and drivers/modules at this point. If adaptivemm is started at this time, it will set this value as the baseline memory use. As the system continues to run, any new lower value for unaccounted memory resets the baseline. For example, the adaptivemm log on the same system later shows:

adaptivemmd[211429]: Base memory consumption updated to 1541744 K

Messages from adaptivemm are logged using the syslog() facility and can be found in the location where syslog is configured to log to.

Putting it all together

adaptivemm is composed of multiple source-level modules, each invoked during its periodic scan and all compiled into a single adaptivemmd binary. A new module that checks for potential memory leaks has been added to adaptivemm as a new function. It is controlled by the configuration option “ENABLE_MEMLEAK_CHECK”: the memory leak check is enabled by default and can be disabled by setting this option to 0. The memory leak check module is first initialized by computing a baseline memory use from the current memory use on the system.

The memory leak check function is invoked by the main function in adaptivemm during its periodic scan of the system. During each scan, unaccounted memory is computed using the formula introduced previously. If unaccounted memory is lower than the baseline memory use, the baseline is updated to this new value so it tracks the floor for this value.
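The floor tracking described above amounts to keeping a running minimum. The Python below is a minimal sketch of that idea and not adaptivemm's implementation. The class name `BaselineTracker` and its method names are hypothetical.

```python
class BaselineTracker:
    """Track the lowest unaccounted-memory value seen so far (the floor)."""

    def __init__(self, initial_unaccounted_kb):
        # The first measurement becomes the initial baseline.
        self.baseline_kb = initial_unaccounted_kb

    def update(self, unaccounted_kb):
        """Lower the baseline if this scan found a new floor.

        Returns True when the baseline changed, False otherwise.
        """
        if unaccounted_kb < self.baseline_kb:
            self.baseline_kb = unaccounted_kb
            return True
        return False
```

With the numbers from the earlier example, a tracker initialized with 2225816 would later move its baseline down to 1541744 when a scan measures that value, and would ignore any higher measurement in between.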

During each scan, if unaccounted memory grows by more than 10% compared to the last scan for 10 consecutive scans, it may signal a slow memory leak. This instance is logged along with the numbers that led to this conclusion, current contents of /proc/meminfo and relevant fields of /proc/meminfo that changed by more than 10%. If unaccounted memory more than doubles over 3 consecutive scans, it may indicate a sudden memory leak. This instance is logged along with the same data as before.
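The two heuristics above could be sketched in Python as below. This is an illustration under stated assumptions, not adaptivemm's code. The class `LeakDetector`, its constants and the returned strings are all hypothetical. In particular, the sketch assumes that "more than doubles over 3 consecutive scans" means more than doubling relative to the previous scan on each of 3 consecutive scans; the text could also be read as doubling across a 3-scan window.

```python
class LeakDetector:
    """Flag suspicious growth of unaccounted memory across scans."""

    SLOW_SCANS = 10    # consecutive scans with growth above 10%
    SUDDEN_SCANS = 3   # consecutive scans with growth above 100% (assumed reading)

    def __init__(self):
        self.prev_kb = None
        self.slow_streak = 0
        self.sudden_streak = 0

    def scan(self, unaccounted_kb):
        """Feed one scan result; return 'sudden', 'slow' or None."""
        verdict = None
        if self.prev_kb is not None:
            # Integer comparisons avoid floating-point rounding.
            grew_10pct = unaccounted_kb * 10 > self.prev_kb * 11
            doubled = unaccounted_kb > self.prev_kb * 2
            self.slow_streak = self.slow_streak + 1 if grew_10pct else 0
            self.sudden_streak = self.sudden_streak + 1 if doubled else 0
            if self.sudden_streak >= self.SUDDEN_SCANS:
                verdict = "sudden"
            elif self.slow_streak >= self.SLOW_SCANS:
                verdict = "slow"
        self.prev_kb = unaccounted_kb
        return verdict
```

Any scan that does not meet a growth threshold resets the corresponding streak, so only uninterrupted growth triggers a report. A real implementation would also gather the /proc/meminfo contents to log alongside the verdict.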

System admins can monitor adaptivemm logs for messages regarding memory leaks and investigate based upon the fields in /proc/meminfo that changed when adaptivemm suspects a memory leak. At higher logging levels, the memory leak check module logs messages to show how it is updating the baseline memory usage and logs changes to unaccounted memory by more than 10% over consecutive scans.

Authors

Khalid Aziz
