Now It Can Be Told

For thirty years, people have speculated on the identity of Deep Throat. Last week the mystery was finally revealed: the man called Deep Throat was not John Dean, not G Gordon Liddy, not Al Haig -- no, it was none other than Mark Felt. That's right, Mark Felt.

I'm sorry -- who? Who the f\*\*\* is Mark Felt? Felt, F-E-L-T? Let me think for a minute... Felt... Felt... nope, never heard of him. Ever. You've got to be kidding me. I've waited thirty years for this?

Well, if you think that revelation was a letdown, just keep scrolling: because while the chattering classes in Washington were playing the Deep Throat parlor game, an even less compelling question was nagging at far fewer people: where did the slab allocator get its name?

About 20 years ago Kellogg's ran a commercial for Special K cereal with the tag line, "Can you pinch an inch?" The implication was that you were overweight if you could pinch more than an inch of fat on your waist -- and that hoovering a bowl of corn flakes would help.

Back in high school I was watching TV with some friends, and this commercial came on. The voice intoned: "Can you pinch an inch?" Without missing a beat, Tommy, who weighed about 250 pounds, reached for his midsection and offered his response: "Hell, I can grab a slab!"

Fast forward 10 years. I was debugging a heap corruption bug in the Solaris 2.2 kernel. The kernel had actually crashed in the allocator itself, as often happens when something corrupts the heap. Trying to disentangle the state, I had the crash dump in one window and the source code for the Solaris 2.2 kernel memory allocator in another.

The code had all the hallmarks of what we in the industry call "swill": mysterious hard-coded constants, global locks, fixed-size arrays, unions, macros that implicitly reference global state, and variable names that are at once long and uninformative. A small taste:

mutex_enter(&kma_lock);
if ((listp->fl_slack >= 2) || (tmpsiz == max)) {
        listp->fl_slack -= 2;
        HEADFREE(listp, bufp);
        if (tmpsiz < max)
                bufp->fb_union.fb_state = DELAYED;
        else {
                LOOKUP(addr, bufp->fb_union.fb_poolp);
                if (--(bufp->fb_union.fb_poolp->bp_inuse) == 0 &&
                   bufp->fb_union.fb_poolp != Km_NewSmall &&
                   bufp->fb_union.fb_poolp != Km_NewBig) {
                        if (Km_IdleI < NIDLEP) {
                                if ((bufp->fb_union.fb_poolp->bp_status &
                                    BP_IDLE) == 0) {
                                        Km_Idlep[Km_IdleI++] = ...

and on and on it went.

All of this might have been tolerable if the allocator performed well, or was space-efficient, or had some other redeeming virtue. But no. It was big, fat, slow, and had to die. I went home that night determined to burn it to the ground. (The allocator, not my home.)

The basic idea for the new allocator was to grab a page of memory, carve it up into equal-size chunks, and use a reference count to keep track of how many chunks were allocated. When the reference count went to zero, the page could be freed.

That worked fine for small buffers (say 1/8 of a page or smaller), but wouldn't quite work for larger buffers: those would require more than one page. So I couldn't just grab a page; I'd have to grab a ... something. "Grab" must have been a mental trigger: all of a sudden I remembered Tommy, and how much I loved that line, and that sealed it: we would grab a slab, and this would be called the slab allocator.

Now you know. And Mark Felt is looking pretty dang interesting.


Technorati Tag:
Technorati Tag:
Comments:

When will ZFS arrive in Solaris 10? Will I be able to use it for my root disk? Any updated documentation yet? Thanks.

Posted by RJ Davis on September 10, 2005 at 01:27 AM PDT #

Have you seen this posting re: distributed lock-free slab-allocator pseudo-code... A per-thread memory pool /w simple lock-free LIFO can be used for a slab-allocator. Blocks of memory are allowed to be allocated in one thread and freed in another thread. A LIFO single-linked list allows a thread to efficiently pass the block back to the thread that allocated it. A simple reference count is used to keep track of how many allocations a thread has made: Psuedo code below from Chris Thomasson http://groups.google.com/group/comp.programming.threads/browse_frm/thread/8245f4b48591fc69

Posted by kimo on October 03, 2005 at 03:47 PM PDT #

Matt Dillon is implementing the slab allocator architecture for his FreeBSD fork, entitled DragonFly BSD. I would like more attention (such as by the Google search engine) drawn to this fact.


http://www.dragonflybsd.org/
http://www.shiningsilence.com/dbsdlog/archives/000004.html

Following this point is Matt Dillon's explanation of the Slab Allocator, taken right from the cvs entry:

SLAB ALLOCATOR FEATURES

The slab allocator breaks allocations up into approximately 80 zones based on their size. Each zone has a chunk size (alignment). For example, all allocations in the 1-8 byte range will allocate in chunks of 8 bytes. Each size zone is backed by one or more blocks of memory. The size of these blocks is fixed at ZoneSize, which is calculated at boot time to be between 32K and 128K. The use of a fixed block size allows us to locate the zone header given a memory pointer with a simple masking operation.

The slab allocator operates on a per-cpu basis. The cpu that allocates a zone block owns it. free() checks the cpu that owns the zone holding the memory pointer being freed and forwards the request to the appropriate cpu through an asynchronous IPI. This request is not currently optimized but it can theoretically be heavily optimized ('queued') to the point where the overhead becomes inconsequential. As of this commit the malloc_type information is not MP safe, but the core slab allocation and deallocation algorithms, non-inclusive the having to allocate the backing block, ARE MP safe. The core code requires no mutexes or locks, only a critical section.

Each zone contains N allocations of a fixed chunk size. For example, a 128K zone can hold approximately 16000 or so 8 byte allocations. The zone is initially zero'd and new allocations are simply allocated linearly out of the zone. When a chunk is freed it is entered into a linked list and the next allocation request will reuse it. The slab allocator heavily optimizes M_ZERO operations at both the page level and the chunk level.

The slab allocator maintains various undocumented malloc quirks such as ensuring that small power-of-2 allocations are aligned to their size, and malloc(0) requests are also allowed and return a non-NULL result. kern_tty.c depends heavily on the power-of-2 alignment feature and ahc depends on the malloc(0) feature. Eventually we may remove the malloc(0) feature.

Posted by zuzu on December 13, 2005 at 09:56 PM PST #

Post a Comment:
Comments are closed for this entry.
About

bonwick

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today