Now It Can Be Told

Guest Author

For thirty years, people have speculated on the identity of Deep Throat.
Last week the mystery was finally revealed: the man called Deep Throat
was not John Dean, not G Gordon Liddy, not Al Haig -- no, it was none
other than Mark Felt. That's right, Mark Felt.

I'm sorry -- who? Who the f\*\*\* is Mark Felt? Felt, F-E-L-T? Let me
think for a minute... Felt... Felt... nope, never heard of him. Ever.
You've got to be kidding me. I've waited thirty years for this?

Well, if you think that revelation was a letdown, just keep scrolling:
because while the chattering classes in Washington were playing the
Deep Throat parlor game, an even less compelling question was nagging
at far fewer people: where did the
slab allocator
get its name?

About 20 years ago Kellogg's ran a commercial for Special K cereal
with the tag line, "Can you pinch an inch?" The implication was
that you were overweight if you could pinch more than an inch of fat
on your waist -- and that hoovering a bowl of corn flakes would help.

Back in high school I was watching TV with some friends, and this
commercial came on. The voice intoned: "Can you pinch an inch?"
Without missing a beat, Tommy, who weighed about 250 pounds, reached
for his midsection and offered his response: "Hell, I can grab a slab!"

Fast forward 10 years. I was debugging a heap corruption bug in the
Solaris 2.2 kernel. The kernel had actually crashed in the allocator
itself, as often happens when something corrupts the heap. Trying to
disentangle the state, I had the crash dump in one window and the source
code for the Solaris 2.2 kernel memory allocator in another.

The code had all the hallmarks of what we in the industry call "swill":
mysterious hard-coded constants, global locks, fixed-size arrays,
unions, macros that implicitly reference global state, and variable
names that are at once long and uninformative. A small taste:

if ((listp->fl_slack >= 2) || (tmpsiz == max)) {
listp->fl_slack -= 2;
HEADFREE(listp, bufp);
if (tmpsiz < max)
bufp->fb_union.fb_state = DELAYED;
else {
LOOKUP(addr, bufp->fb_union.fb_poolp);
if (--(bufp->fb_union.fb_poolp->bp_inuse) == 0 &&
bufp->fb_union.fb_poolp != Km_NewSmall &&
bufp->fb_union.fb_poolp != Km_NewBig) {
if (Km_IdleI < NIDLEP) {
if ((bufp->fb_union.fb_poolp->bp_status &
BP_IDLE) == 0) {
Km_Idlep[Km_IdleI++] = ...

and on and on it went.

All of this might have been tolerable if the allocator performed well,
or was space-efficient, or had some other redeeming virtue. But no.
It was big, fat, slow, and had to die. I went home that night determined
to burn it to the ground. (The allocator, not my home.)

The basic idea for the new allocator was to grab a page of memory,
carve it up into equal-size chunks, and use a reference count to
keep track of how many chunks were allocated. When the reference
count went to zero, the page could be freed.

That worked fine for small buffers (say 1/8 of a page or smaller),
but wouldn't quite work for larger buffers: those would require more
than one page. So I couldn't just grab a page; I'd have to grab a ...
something. "Grab" must have been a mental trigger: all of a sudden I
remembered Tommy, and how much I loved that line, and that sealed it:
we would grab a slab, and this would be called the slab allocator.

Now you know. And Mark Felt is looking pretty dang interesting.

Technorati Tag: href="http://www.technorati.com/tag/OpenSolaris" rel="tag">OpenSolaris

Technorati Tag: href="http://www.technorati.com/tag/Solaris" rel="tag">Solaris

Join the discussion

Comments ( 3 )
  • RJ Davis Saturday, September 10, 2005
    When will ZFS arrive in Solaris 10? Will I be able to use it for my root disk? Any updated documentation yet? Thanks.
  • kimo Monday, October 3, 2005
    Have you seen this posting re:
    distributed lock-free slab-allocator pseudo-code...
    A per-thread memory pool /w simple lock-free LIFO can be used for a
    slab-allocator. Blocks of memory are allowed to be allocated in one thread
    and freed in another thread. A LIFO single-linked list allows a thread to
    efficiently pass the block back to the thread that allocated it. A simple
    reference count is used to keep track of how many allocations a thread has
    Psuedo code below from Chris Thomasson
  • zuzu Wednesday, December 14, 2005
    Matt Dillon is implementing the slab allocator architecture for his FreeBSD fork, entitled DragonFly BSD. I would like more attention (such as by the Google search engine) drawn to this fact.



    Following this point is Matt Dillon's explanation of the Slab Allocator, taken right from the cvs entry:


    The slab allocator breaks allocations up into approximately 80 zones based on their size. Each zone has a chunk size (alignment). For example, all allocations in the 1-8 byte range will allocate in chunks of 8 bytes. Each size zone is backed by one or more blocks of memory. The size of these blocks is fixed at ZoneSize, which is calculated at boot time to be between 32K and 128K. The use of a fixed block size allows us to locate the zone header given a memory pointer with a simple masking operation.

    The slab allocator operates on a per-cpu basis. The cpu that allocates a zone block owns it. free() checks the cpu that owns the zone holding the memory pointer being freed and forwards the request to the appropriate cpu through an asynchronous IPI. This request is not currently optimized but it can theoretically be heavily optimized ('queued') to the point where the overhead becomes inconsequential. As of this commit the malloc_type information is not MP safe, but the core slab allocation and deallocation algorithms, non-inclusive the having to allocate the backing block, ARE MP safe. The core code requires no mutexes or locks, only a critical section.

    Each zone contains N allocations of a fixed chunk size. For example, a 128K zone can hold approximately 16000 or so 8 byte allocations. The zone is initially zero'd and new allocations are simply allocated linearly out of the zone. When a chunk is freed it is entered into a linked list and the next allocation request will reuse it. The slab allocator heavily optimizes M_ZERO operations at both the page level and the chunk level.

    The slab allocator maintains various undocumented malloc quirks such as ensuring that small power-of-2 allocations are aligned to their size, and malloc(0) requests are also allowed and return a non-NULL result. kern_tty.c depends heavily on the power-of-2 alignment feature and ahc depends on the malloc(0) feature. Eventually we may remove the malloc(0) feature.
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.