Solaris 11.3: Optimized Shared Memory (OSM)

One of the new features in Solaris 11.3 is Optimized Shared Memory
(OSM), which is a new mode for System V shared memory designed to marry
the best parts of Intimate Shared Memory (ISM) and
Dynamic Intimate Shared Memory (DISM). OSM has been a
private feature in Solaris since Solaris 11 and Solaris 10 update 11,
but the interfaces were not documented outside of Oracle. The only
consumer has been the Oracle Database, which uses OSM instead of DISM
from Oracle 12c onwards.[1]

In Solaris 11.3, we are now documenting and supporting the use of OSM for a
wider class of consumers. The OSM kernel support has also been substantially
improved.

OSM segments are created using shmget_osm(2), a new interface
similar to shmget(2), but which adds a "granule_size" argument:

  int shmget_osm(key_t key, size_t size, int shmflg, size_t granule_size);

granule_size must be a power of 2 greater than or equal to
sysconf(_SC_OSM_PAGESIZE_MIN), and size must be a
multiple of granule_size.
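A minimal sketch of checking these rules before calling shmget_osm(2). The helper names (osm_is_pow2, osm_params_ok, osm_round_size) are hypothetical, and min_pagesize is passed in rather than read from sysconf(_SC_OSM_PAGESIZE_MIN) so that the arithmetic builds and runs on any system.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if x is a nonzero power of 2. */
bool osm_is_pow2(size_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Check the shmget_osm(2) size rules: granule_size is a power of 2
 * no smaller than the minimum OSM page size, and size is a multiple
 * of granule_size. min_pagesize stands in for
 * sysconf(_SC_OSM_PAGESIZE_MIN) (an assumption of this sketch).
 */
bool osm_params_ok(size_t size, size_t granule_size, size_t min_pagesize)
{
    return osm_is_pow2(granule_size) &&
           granule_size >= min_pagesize &&
           size % granule_size == 0;
}

/* Round a requested size up to a multiple of granule_size (a power of 2). */
size_t osm_round_size(size_t size, size_t granule_size)
{
    return (size + granule_size - 1) & ~(granule_size - 1);
}
```

On Solaris 11.3 one would presumably pass sysconf(_SC_OSM_PAGESIZE_MIN) as min_pagesize, round the desired size with osm_round_size(), and then call shmget_osm(key, size, shmflg, granule_size) with the checked values.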

The granule_size is the unit of operation on the OSM
segment: the segment will be mapped at a granule_size-aligned address,
and memcntl(2) and madvise(2) operations on the segment
must also be aligned to granule_size boundaries. It also bounds
the largest page size that the segment can use, so on machines which
support very large pages (256m, 1g, 2g or
larger), larger granule sizes will allow the system to use larger
mappings, increasing your TLB reach.
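Because memcntl(2) and madvise(2) ranges must be granule-aligned, code that tracks arbitrary byte ranges has to round them first. The sketch below uses hypothetical names and plain arithmetic (no OSM calls), assumes granule is a power of 2, and offers two roundings: the smallest aligned range covering the bytes, and the largest aligned range contained in them.

```c
#include <stddef.h>
#include <stdint.h>

/* A byte range within an attached segment (hypothetical helper type). */
struct osm_range {
    uintptr_t addr;
    size_t len;
};

/* Smallest granule-aligned range covering [addr, addr + len). */
struct osm_range osm_outer_range(uintptr_t addr, size_t len, size_t granule)
{
    uintptr_t mask = (uintptr_t)granule - 1;
    uintptr_t start = addr & ~mask;               /* round start down */
    uintptr_t end = (addr + len + mask) & ~mask;  /* round end up */
    struct osm_range r = { start, (size_t)(end - start) };
    return r;
}

/* Largest granule-aligned range inside [addr, addr + len); may be empty. */
struct osm_range osm_inner_range(uintptr_t addr, size_t len, size_t granule)
{
    uintptr_t mask = (uintptr_t)granule - 1;
    uintptr_t start = (addr + mask) & ~mask;      /* round start up */
    uintptr_t end = (addr + len) & ~mask;         /* round end down */
    struct osm_range r = { start, end > start ? (size_t)(end - start) : 0 };
    return r;
}
```

Rounding outward seems the natural choice when locking, and rounding inward when unlocking, since unlocking a partially covered granule would also discard data outside the requested range.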

When attaching a segment created with shmget_osm(2) using shmat(2),
shmflg must not contain SHM_SHARE_MMU,
and shmaddr must be either 0 or an address that is a multiple
of the segment's granule_size (the address returned by
a successful shmat() will be aligned to
granule_size). An OSM segment can be mapped SHM_RDONLY by some
processes and read-write by others at the same time.[2]
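A sketch of an shmat(2) wrapper that applies these attach rules before making the call. The kernel enforces the rules itself; the wrapper only fails earlier. The names are hypothetical, EINVAL is this sketch's own choice of errno, and the SHM_SHARE_MMU test is compiled only where that Solaris-specific flag is defined.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* shmaddr must be NULL or a multiple of the segment's granule size. */
bool osm_shmaddr_ok(const void *shmaddr, size_t granule)
{
    return shmaddr == NULL || ((uintptr_t)shmaddr & (granule - 1)) == 0;
}

/* Attach an OSM segment, rejecting arguments the text says are invalid. */
void *osm_shmat(int shmid, const void *shmaddr, int shmflg, size_t granule)
{
#ifdef SHM_SHARE_MMU
    if (shmflg & SHM_SHARE_MMU) {   /* ISM flag: not allowed for OSM */
        errno = EINVAL;
        return (void *)-1;
    }
#endif
    if (!osm_shmaddr_ok(shmaddr, granule)) {
        errno = EINVAL;
        return (void *)-1;
    }
    /* On success the returned address is granule-aligned. */
    return shmat(shmid, shmaddr, shmflg);
}
```

A process that only needs to read the segment would pass SHM_RDONLY in shmflg, while other processes attach the same segment read-write.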

At creation time, an OSM SHM segment is entirely unlocked; any attempt
to access any part of it will fault with FLTBOUNDS (delivered as SIGSEGV
with an si_code of SEGV_ACCERR). Calling:

   ret = memcntl(addr, len, MC_LOCK_GRANULE, 0, 0, 0);

attempts to "lock" the requested range, which must lie within an OSM SHM
segment; both addr and len must be aligned to the segment's granule
size. If the call succeeds, any parts of the range which were not already
"lock"ed become accessible and are filled with zeros. The locked
memory is charged against the project.max-locked-memory and
zone.max-locked-memory resource controls.
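A hedged sketch of a wrapper around the MC_LOCK_GRANULE call above. It assumes MC_LOCK_GRANULE is a preprocessor macro in <sys/mman.h> on Solaris 11.3 and fails with ENOTSUP where it is not defined; the name osm_lock_granules and the up-front alignment check (returning EINVAL) are this sketch's own.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/mman.h>

/* Lock [addr, addr + len) of an attached OSM segment. */
int osm_lock_granules(void *addr, size_t len, size_t granule)
{
    size_t mask = granule - 1;

    /* granule must be a power of 2; addr and len must be multiples of it. */
    if (granule == 0 || (granule & mask) != 0 ||
        ((uintptr_t)addr & mask) != 0 || (len & mask) != 0) {
        errno = EINVAL;
        return -1;
    }
#ifdef MC_LOCK_GRANULE
    /* Newly locked granules become accessible and zero-filled; the
       memory counts against project/zone max-locked-memory. */
    return memcntl((caddr_t)addr, len, MC_LOCK_GRANULE, 0, 0, 0);
#else
    errno = ENOTSUP;   /* no OSM on this system */
    return -1;
#endif
}
```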

If a portion of the OSM segment is no longer needed, calling:

   ret = memcntl(addr, len, MC_UNLOCK_GRANULE, 0, 0, 0);

(again, we must target an OSM segment, and addr and len must be multiples
of the segment's granule_size) will "unlock" the covered granules.
Once the granules are unlocked, any access to them will SIGSEGV as before,
and any data which was in the granules is discarded.
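Putting the pieces together, a sketch of one segment's life: create, attach, lock a granule, use it, unlock it, then detach and remove. It assumes the Solaris 11.3 headers declare shmget_osm() in <sys/shm.h> and define MC_LOCK_GRANULE, MC_UNLOCK_GRANULE and _SC_OSM_PAGESIZE_MIN as macros; elsewhere it compiles to a stub that fails with ENOTSUP. The function name is hypothetical and error handling is kept minimal.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 0 on success, -1 with errno set on failure. */
int osm_lifecycle_demo(void)
{
#if defined(__sun) && defined(MC_LOCK_GRANULE) && defined(_SC_OSM_PAGESIZE_MIN)
    long min = sysconf(_SC_OSM_PAGESIZE_MIN);
    if (min <= 0)
        return -1;
    size_t granule = (size_t)min;
    size_t size = 4 * granule;          /* a multiple of granule */

    int id = shmget_osm(IPC_PRIVATE, size, IPC_CREAT | 0600, granule);
    if (id == -1)
        return -1;

    char *p = shmat(id, NULL, 0);       /* no SHM_SHARE_MMU for OSM */
    if (p == (char *)-1) {
        int saved = errno;
        (void) shmctl(id, IPC_RMID, NULL);
        errno = saved;
        return -1;
    }

    int rc = -1;
    /* Lock the first granule: it becomes accessible and zero-filled. */
    if (memcntl((caddr_t)p, granule, MC_LOCK_GRANULE, 0, 0, 0) == 0) {
        memset(p, 0xab, granule);
        /* Unlock it: its contents are discarded and access faults again. */
        rc = memcntl((caddr_t)p, granule, MC_UNLOCK_GRANULE, 0, 0, 0);
    }

    int saved = errno;
    (void) shmdt(p);
    /* Any granules still locked are released once the segment is
       removed and all mappings are gone. */
    (void) shmctl(id, IPC_RMID, NULL);
    errno = saved;
    return rc;
#else
    errno = ENOTSUP;
    return -1;
#endif
}
```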

Unlike MC_LOCK/mlock(3C), granule locks are not associated
with the process that established them; the memory stays locked until
either the granules are unlocked or the shared memory segment has been
removed with IPC_RMID and all mappings to it are gone.

In combination, OSM gives you a shared memory segment which, like ISM,
does not require a process running as root[3],
but whose memory usage can be adjusted dynamically, like DISM.

The new 11.3 OSM kernel support uses significantly less kernel data than
the pre-11.3 implementation. This means that both locking and unlocking
granules, as well as the final destruction of the SHM segment, are much
more efficient in 11.3.

For the Oracle Database, the speed of granule locking is mostly hidden
from the administrator, because the database can report "ready" (and
return from the "STARTUP" operation) long before the SGA is fully allocated
and available. On the other hand, database shutdown waits until the
SGA destruction is complete. In our testing, database shutdown time
was where speedups were the most noticeable.

[1] Oracle 12c uses OSM when the SGA
size is allowed to vary; I'm not a DBA, but I believe OSM is used when
SGA_MAX_SIZE and SGA_TARGET parameters are set.

[2] Mapping an OSM segment both read-only and
read-write is only efficient on 11.3 and later. This can be detected
programmatically: if sysconf(_SC_OSM_VERSION) >= 2, such mappings are efficient.
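The version test above as a small function. This is a sketch: it assumes _SC_OSM_VERSION is a preprocessor macro (as Solaris's _SC_* names are) and treats its absence as "no OSM"; the function name is hypothetical.

```c
#include <stdbool.h>
#include <unistd.h>

/* True if mixed SHM_RDONLY and read-write OSM mappings are efficient. */
bool osm_mixed_mappings_efficient(void)
{
#ifdef _SC_OSM_VERSION
    /* sysconf() returns -1 for unsupported names, which fails the test. */
    return sysconf(_SC_OSM_VERSION) >= 2;
#else
    return false;   /* headers know nothing about OSM */
#endif
}
```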

[3] DISM is locked using
mlock(3C), which requires a process (Oracle's is called
oradism) with the proc_lock_memory privilege
(see privileges(5); root runs with all privileges) to hold onto the locked memory. If that process dies
for any reason, the memory becomes unlocked, causing I/O performance
degradation. ISM and OSM are governed by the max-locked-memory
resource control (see resource_controls(5)), which can be set on a per-project or
per-zone basis, and do not require additional privileges to lock the
memory.
