X

News, tips, partners, and perspectives for the Oracle Solaris operating system

Changes to ZFS ARC Memory Allocation in 11.3

New in Solaris 11.3 is a kernel memory allocation subsystem called the
kernel object manager, or KOM. The first consumer of this subsystem is the
ZFS ARC.

Prior to Solaris 11.3, the ZFS ARC allocated its memory from the kernel heap
space using kmem caches. This has several drawbacks: first, internal
fragmentation can result in memory used by the ARC not being reclaimed by the
system. This problem is particularly acute if large pages are being used, since
the buffer size is considerably smaller than the large page size -- even one
buffer still allocated will prevent the system from freeing the large page.
Another drawback of ZFS ARC using the kernel heap is that all of the kernel
heap is non-relocatable in memory, and thus must reside in the kernel cage.
This can lead to issues allocating large pages or performing DR memory remove
operations once the ARC has grown large, even if it shrinks successfully. As a
workaround for the cage growth issue, many sysadmins have limited the size of
the ZFS ARC cache in /etc/system. Finally, scalability of ARC shrinking prior
to Solaris 11.3 is limited by heap page unmapping speed on large SPARC systems.

In Solaris 11.3, the ZFS ARC allocates its memory through KOM. The metadata
which is frequently accessed by ZFS (such as directory files) remains in
the kernel cage, but the vast majority of the cache which is not frequently
accessed by ZFS now resides outside of the kernel cage, where it can be
relocated by DR and page coalescing. KOM uses a slab size of 2M on x86 or 4M on
SPARC, so internal fragmenation is much less of an issue than it was with 256M
heap pages on SPARC. Scalability is vastly improved, as KOM takes advantage of
64-bit systems by using the seg_kpm framework for its address translations.

With this change, many systems which required limiting the ARC size will no
longer require a hard limit, since the system is able to manage its memory much
better. Metadata heavy workloads, and systems hosting kernel zones, will still
need to limit the ARC size through /etc/system tuning in Solaris 11.3, however.

Join the discussion

Comments ( 2 )
  • sean Tuesday, September 1, 2015

    One issue I've had with ZFS ARC is vmstat won't report ARC as 'free' memory. On the contrast with UFS vmstat would report cachelist in 'free' column.

    Did/will this usage of KOM 'fix' this issue?


  • Eric Lowe Tuesday, November 10, 2015

    ZFS ARC is not 'free' since it does not use the page cache. There are no plans to change this, it was an architectural decision in ZFS.


Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.Captcha