What is MemAvailable?
MemAvailable is a statistic reported in the /proc/meminfo file. It is a rough estimate of how much memory can be easily allocated on the system without creating memory pressure – i.e. without swapping or direct reclaim. The estimate includes some page cache memory, since clean inactive page cache pages can be reused without any I/O, so there is no cost to reclaiming them. It does not include all of the page cache, though: some page cache pages are bound to be “hot” – i.e. they are in active use and would likely be read back in if reclaimed.
The estimate also includes some reclaimable slab caches; just as with the page cache, the kernel assumes that roughly half of them can be reclaimed easily. Finally, the MemAvailable calculation excludes the reserved pages in each zone, computed as the per-zone lowmem_reserve plus the per-zone high watermark – these reserves ensure that a memory allocation does not deplete a zone too much.
Typically, there are 3 zones (sometimes more) – DMA, DMA32, and Normal. The DMA and DMA32 zones are used by legacy devices/drivers that have limitations on how much memory they can address. If a device can only access addresses below 4 GB, it is limited to allocations from the DMA32 zone (and lower, i.e. DMA). The Normal zone is the rest of RAM, which is not reserved for legacy drivers and is available to all users. lowmem_reserve_ratio is a sysctl tunable that controls how much memory is reserved in those zones and protected from being exhausted by allocations intended for a “higher” zone. For instance, zones DMA and DMA32 might have non-zero lowmem_reserve_ratio entries so that they are not overused by zone Normal allocations. (Zone Normal, on the other hand, does not need to be “protected” in this manner from other zones.)
Every zone has 3 watermarks – min, low, and high. These values determine when kswapd is woken up for background memory reclaim (when free memory dips below the zone’s low watermark), when processes might block on direct reclaim (when free memory dips further, below the zone’s min watermark), and when the Out Of Memory (OOM) killer is invoked (when reclaim does not succeed). When a zone’s free memory is above its high watermark, there is enough free memory in that zone, and kswapd can go back to sleep (if it was running).
If the system has a large number of reserved pages (either lowmem_reserve per zone or high watermark values) and not much page cache or reclaimable slab cache, it is possible for MemAvailable to be the same as, or even less than, MemFree.
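As a quick illustration (not kernel code), here is a small Python sketch that pulls MemFree and MemAvailable out of meminfo-style text; the sample values come from a system discussed later in this post, and on a live system you would read /proc/meminfo itself:

```python
# Sample meminfo-style text; on a real system, read open("/proc/meminfo").
sample = """\
MemTotal:       15943040 kB
MemFree:         1929792 kB
MemAvailable:     326336 kB"""

def parse_meminfo(text):
    """Return a dict mapping each meminfo field name to its value in kB."""
    fields = {}
    for line in text.splitlines():
        name, rest = line.split(":", 1)
        fields[name.strip()] = int(rest.split()[0])
    return fields

info = parse_meminfo(sample)
# MemAvailable can be far below MemFree when reserves are large.
print(info["MemFree"], info["MemAvailable"])  # 1929792 326336
```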
How is MemAvailable computed?
Let’s dissect the code that computes MemAvailable. The function where this is done is si_mem_available(). The data here was sampled from an ARM system running a UEK6 (5.4.17-2136.331.7.el8uek.aarch64) kernel, with a base PAGE_SIZE of 64 KB. But the code is the same in newer kernels – this calculation hasn’t changed in UEK7, UEK8 or latest upstream (as of v6.19).
This is an ARM system with a PAGE_SIZE of 64 KB:
# getconf PAGE_SIZE
65536
1. The baseline for MemAvailable is, of course, MemFree, which is NR_FREE_PAGES:
available = global_zone_page_state(NR_FREE_PAGES) - totalreserve_pages;
The number of free pages can be read from /proc/vmstat. On the test system:
nr_free_pages 28052
2. Next: how is totalreserve_pages computed?
totalreserve_pages represents the amount of memory that is protected, or reserved, in each zone, and thus not available for general allocations. As mentioned earlier, it includes two parts: the high watermark of each zone, plus any lowmem_reserve pages. The function that computes this is calculate_totalreserve_pages() in mm/page_alloc.c.
...
/* Find valid and maximum lowmem_reserve in the zone */
for (j = i; j < MAX_NR_ZONES; j++) {
        if (zone->lowmem_reserve[j] > max)
                max = zone->lowmem_reserve[j];
}

/* we treat the high watermark as reserved pages. */
max += high_wmark_pages(zone);
...
lowmem_reserve is computed for each zone in setup_per_zone_lowmem_reserve():
...
lower_zone->lowmem_reserve[j] =
        managed_pages / sysctl_lowmem_reserve_ratio[idx];
...
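To make the formula concrete, here is a rough Python sketch; the ratios are this system’s defaults (vm.lowmem_reserve_ratio = 256 256 32 0), but the managed-page count is a hypothetical stand-in:

```python
# Sketch of the lowmem_reserve formula. The ratios are this system's
# defaults; the managed-page count below is hypothetical.
ratios = {"DMA": 256, "DMA32": 256, "Normal": 32}

def lowmem_reserve(managed_pages, ratio):
    # mirrors: lower_zone->lowmem_reserve[j] = managed_pages / ratio
    return managed_pages // ratio

# If the higher zones manage 194048 pages (hypothetical), a ratio of 256
# yields a reserve of 758 pages in the lower zone.
print(lowmem_reserve(194048, ratios["DMA"]))  # 758
```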
This value is also reported in /proc/zoneinfo as protection:
$ grep reserve sos_commands/kernel/sysctl_-a
vm.lowmem_reserve_ratio = 256 256 32 0
$ grep -E 'Node|protection' proc/zoneinfo
Node 0, zone DMA
protection: (0, 0, 758, 758)
Node 0, zone DMA32
protection: (0, 0, 758, 758)
Node 0, zone Normal
protection: (0, 0, 0, 0)
Node 0, zone Movable
protection: (0, 0, 0, 0)
High watermarks per zone are also available in /proc/zoneinfo. These watermarks cannot be set directly by the user – however, sysctls like vm.min_free_kbytes, vm.watermark_scale_factor, and vm.watermark_boost_factor influence the watermark values.
$ grep -E 'Node|high ' proc/zoneinfo
Node 0, zone DMA
high 9382
Node 0, zone DMA32
high 0
Node 0, zone Normal
high 12399
Node 0, zone Movable
high 0
$ grep -i min_free sos_commands/kernel/sysctl_-a
vm.min_free_kbytes = 230400
Adding these up, totalreserve_pages comes to:
max(0, 0, 758, 758) + 9382 (DMA) +
max(0, 0, 758, 758) + 0 (DMA32) + <-- This does not count; see below
0 + 12399 (Normal) +
0 + 0 (Movable)
= 22539
DMA32 can be ignored as its managed_pages is 0:
...
if (max > managed_pages)
        max = managed_pages;
...
So now, available = 28052 - 22539 = 5513.
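The totalreserve_pages arithmetic can be reproduced with a short Python sketch. The protection arrays and high watermarks are the values from /proc/zoneinfo above; the managed-page counts for DMA and Normal are hypothetical stand-ins, chosen large enough that the managed_pages cap does not apply to them:

```python
# Per-zone inputs: protection (lowmem_reserve) array, high watermark,
# managed pages. DMA32 and Movable manage zero pages, so their reserve
# is capped to 0; the DMA and Normal managed counts are hypothetical.
zones = {
    "DMA":     {"protection": (0, 0, 758, 758), "high": 9382,  "managed": 160000},
    "DMA32":   {"protection": (0, 0, 758, 758), "high": 0,     "managed": 0},
    "Normal":  {"protection": (0, 0, 0, 0),     "high": 12399, "managed": 200000},
    "Movable": {"protection": (0, 0, 0, 0),     "high": 0,     "managed": 0},
}

def totalreserve_pages(zones):
    total = 0
    for z in zones.values():
        reserve = max(z["protection"])        # maximum lowmem_reserve in the zone
        reserve += z["high"]                  # the high watermark counts as reserved
        reserve = min(reserve, z["managed"])  # capped at the zone's managed pages
        total += reserve
    return total

print(totalreserve_pages(zones))          # 22539
print(28052 - totalreserve_pages(zones))  # available so far: 5513
```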
3. Let’s include some page cache:
...
/*
* Not all the page cache can be freed, otherwise the system will
* start swapping. Assume at least half of the page cache, or the
* low watermark worth of cache, needs to stay.
*/
pagecache = pages[LRU_ACTIVE_FILE] + pages[LRU_INACTIVE_FILE];
pagecache -= min(pagecache / 2, wmark_low);
available += pagecache;
...
From /proc/vmstat:
nr_zone_inactive_file 29129
nr_zone_active_file 10262
Here, wmark_low is computed as:
for_each_zone(zone)
        wmark_low += low_wmark_pages(zone);
which is:
$ grep -E 'Node|low ' proc/zoneinfo
Node 0, zone DMA
low 9184
Node 0, zone DMA32
low 0
Node 0, zone Normal
low 11698
Node 0, zone Movable
low 0
i.e. wmark_low = 9184 + 11698 = 20882.
So now:
pagecache = 29129 + 10262 = 39391
pagecache = 39391 - min(39391/2, 20882) = 19696
available += pagecache
available = 5513 + 19696 = 25209
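The same step, sketched in Python with the counters from /proc/vmstat and the wmark_low sum from above:

```python
# File LRU counters from /proc/vmstat, plus the summed low watermarks.
nr_active_file = 10262
nr_inactive_file = 29129
wmark_low = 9184 + 11698  # per-zone low watermarks: 20882 pages

pagecache = nr_active_file + nr_inactive_file
# At least half the page cache, or wmark_low worth of it, is assumed to stay.
pagecache -= min(pagecache // 2, wmark_low)

available = 5513 + pagecache  # running total from the previous step
print(pagecache, available)   # 19696 25209
```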
4. Next: include some reclaimable slab cache
...
/*
* Part of the reclaimable slab and other kernel memory consists of
* items that are in use, and cannot be freed. Cap this estimate at the
* low watermark.
*/
reclaimable = global_node_page_state_pages(NR_SLAB_RECLAIMABLE_B) +
global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE);
available += reclaimable - min(reclaimable / 2, wmark_low);
...
From /proc/vmstat again:
nr_slab_reclaimable 5903
nr_kernel_misc_reclaimable 0
That is:
reclaimable = 5903
available += 5903 - min(5903/2, 20882) = 2952
available = 25209 + 2952 = 28161 pages
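And the final step in the same Python sketch:

```python
# From /proc/vmstat: reclaimable slab plus misc reclaimable kernel memory.
reclaimable = 5903 + 0
wmark_low = 20882  # summed per-zone low watermarks, as before

# Half of it (capped at wmark_low) is assumed to be pinned in use.
available = 25209 + reclaimable - min(reclaimable // 2, wmark_low)
print(available)  # 28161 pages
```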
With a PAGE_SIZE of 64 KB, MemAvailable is therefore 1802304 KB, and MemFree is 1795328 KB. This more or less matches what’s in /proc/meminfo:
MemTotal: 15943040 kB
MemFree: 1794944 kB
MemAvailable: 1801984 kB
Buffers: 15872 kB
Cached: 2782464 kB
...
That is, even with a page cache of ~2.5 GB, MemAvailable is more or less the same as MemFree, which is normal for this system. On another aarch64 system (running a real workload), the gap was much starker:
# getconf PAGE_SIZE
65536
$ cat /proc/meminfo
MemTotal: 15943040 kB
MemFree: 1929792 kB <--
MemAvailable: 326336 kB <--
Buffers: 305920 kB
Cached: 2348480 kB
SwapCached: 0 kB
Active: 3283840 kB
Inactive: 286208 kB
Active(anon): 1059264 kB
Inactive(anon): 9728 kB
Active(file): 2224576 kB
Inactive(file): 276480 kB
...
For comparison, here are the numbers on an idle x86_64 test VM (with a 4 KB PAGE_SIZE):
$ uname -r
5.4.17-2136.337.2.el8uek.x86_64
$ getconf PAGE_SIZE
4096
$ grep -E 'Node|protection' /proc/zoneinfo
Node 0, zone DMA
protection: (0, 2631, 15602, 15602, 15602)
Node 0, zone DMA32
protection: (0, 0, 12970, 12970, 12970)
Node 0, zone Normal
protection: (0, 0, 0, 0, 0)
Node 0, zone Movable
protection: (0, 0, 0, 0, 0)
Node 0, zone Device
protection: (0, 0, 0, 0, 0)
$ grep -E 'Node|high ' /proc/zoneinfo
Node 0, zone DMA
high 21
Node 0, zone DMA32
high 4269
Node 0, zone Normal
high 21049
Node 0, zone Movable
high 0
Node 0, zone Device
high 0
# sysctl vm.lowmem_reserve_ratio
vm.lowmem_reserve_ratio = 256 256 32 0 0
# sysctl vm.min_free_kbytes
vm.min_free_kbytes = 67584
# less /proc/meminfo
MemTotal: 16078384 kB
MemFree: 14968324 kB
MemAvailable: 15218692 kB
Buffers: 2248 kB
Cached: 491164 kB
totalreserve_pages = 15602 + 12970 + 21 + 4269 + 21049 = 53911 pages = 211 MB.
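As a sanity check, the same sum in Python (assuming, as the sum above does, that the managed_pages cap does not change any zone’s reserve):

```python
# (max protection value, high watermark) per zone from /proc/zoneinfo on
# the x86_64 VM; the managed_pages cap is assumed not to apply here.
per_zone = [
    (15602, 21),    # DMA
    (12970, 4269),  # DMA32
    (0, 21049),     # Normal
    (0, 0),         # Movable
    (0, 0),         # Device
]
total = sum(reserve + high for reserve, high in per_zone)
print(total)                          # 53911 pages
print(round(total * 4 / 1024), "MB")  # 211 MB with 4 KB pages
```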
This is a much smaller reserve than on the aarch64 system, even though both have the same amount of RAM (16 GB) – the only difference is the default PAGE_SIZE.
Summary
I’d like to emphasize just two takeaways from this post:
- MemAvailable is not an accurate statistic; it’s just a rough back-of-the-envelope calculation for how much memory is available on your system.
- On systems with bigger default PAGE_SIZE values, MemAvailable can be quite low, due to reserved memory.
If the latter leads to memory pressure or performance problems, consider lowering the zone watermarks or disabling watermark boosting altogether. Watermark boosting is done in the reclaim/compaction path to let kswapd run a little longer, so that it can free up more pages and create more higher-order chunks (thus reducing fragmentation). The boost decays naturally over time as memory pressure eases, but boosting the watermarks also increases the amount of memory the kernel considers reserved – thereby lowering MemAvailable. Depending on your workload and system configuration, you might benefit from disabling watermark boosting (via vm.watermark_boost_factor) if you are seeing very low MemAvailable values. This needs to be done carefully, after expert analysis, as it could do more harm than good if there is real memory pressure on the system.