 macros and powers of two

Unlike many userland applications, there are large portions of the
kernel that need to do bit-level manipulations. These generally fall
into a few distinct areas:

• Drivers communicating with hardware
• Hash functions, bitmasks, and other high-performance algorithms
• Memory allocators, page manipulators, etc.

In this note, I'm going to peek into the last group, which
typically needs to do a lot of power-of-2 based alignment work, and look
at a set of macros used to abstract away some of the bit-fiddling.

The main low-level allocators in Solaris are
kmem
and
vmem;
kmem handles your standard page-backed memory allocation, while
vmem handles the more abstract concept of allocating from spaces
of integers. kmem actually layers on top of vmem, but
the details are beyond the scope of this blog entry.

While implementing vmem,
Jeff Bonwick introduced a
set of handy macros for dealing with things relative to powers of two.
The macros live in uts/common/sys/sysmacros.h,
and are defined thusly:

/\*
\* Macros for various sorts of alignment and rounding when the alignment
\* is known to be a power of 2.
\*/
#define

P2ALIGN(x, align)

((x) & -(align))
#define

P2PHASE(x, align)

((x) & ((align) - 1))
#define

P2NPHASE(x, align)

(-(x) & ((align) - 1))
#define

P2ROUNDUP(x, align)

(-(-(x) & -(align)))
#define

P2END(x, align)

(-(~(x) & -(align)))
#define

P2PHASEUP(x, align, phase)

((phase) - (((phase) - (x)) & -(align)))
#define

P2CROSS(x, y, align)

(((x) \^ (y)) > (align) - 1)
/\*
\* Determine whether two numbers have the same high-order bit.
\*/
#define

P2SAMEHIGHBIT(x, y)

(((x) \^ (y)) < ((x) & (y)))

The purpose of this blog entry is to talk about what they do and how
they work; if you don't want the challenge spoiled, you should stop

Alright, now that they've all left, we can dive in. First, remember
the basic equation of two's complement arithmetic:
-x = ~x + 1

That is, to negate a number, complement all of its bits and add one.
If I assume that the number has a lowest set
bit, and capital
Xs represent complements of the corresponding xs, here
are some relations for any non-zero binary number:
x = ...xxxxx10000...
x-1 = ...xxxxx01111...
~x = ...XXXXX01111...
-x = ~x+1 = ...XXXXX10000...

Now, for powers of two, there is only one set bit, so all of the
xs are 0. This means that (x-1) is a mask for the offset
modulo x, and -x is a mask which clears the offset. In addition,
all numbers are unsigned, align is a power-of-2, and
offset is less than align. These relations in mind,
we can look at what each of the above macros do:

macroformulaaction
P2ALIGNx & -align
clears all bits below align, rounding x down to the
next lower multiple of align
P2PHASEx & (align - 1)
clears all bits above or equal to align, getting(x % align), the phase of x with regards toalign
P2NPHASE(-x) & (align - 1)
This is the first tricky formula. We negate x, and get
its phase. The result is the same as (align - P2PHASE(x,
align)) % align
, but is much faster to compute.
P2ROUNDUP-((-x) & -align)
This can be re-written as -P2ALIGN(-x, align), which shows how
the operation works; "negate, round down, negate" is the same as
"round up".
So we round x up to the next align boundary.
P2END-((~x) & -align)This can be re-written as-((-x-1) & -align) = P2ROUNDUP(x+1, align); round up(x+1) to the next align boundary. This gets
the end address of the current align block.
P2PHASEUP
(phase -

((phase - x) &

-align))

This can be re-written as (phase + P2ROUNDUP(x - phase, align)).
So what we've done is rounded up x to have a specificphase with regards to align. (If you've got
some linear algebra under your belt, this is similar to
transforming f(y) = P2ROUNDUP(y, align) into a
coordinate system with x = y + phase)
P2CROSS(x \^ y) > (align-1)(x \^ y) has bits set wherever x and y
differ. This just tests to see if they differ in any position
above the alignment, which will only be the case if [x, y]
crosses an align boundary.
P2SAMEHIGHBIT(x \^ y) < (x & y)
Similar to P2CROSS; here, we're checking that the first bit that
differs is less than the highest set bit the two have in common.

From the above, here are some trivial relations between the various
functions:

P2ALIGN(x, align)   = x - P2PHASE(x, align)
P2ROUNDUP(x, align) = x + P2NPHASE(x, align)
P2END(x, align) = P2ALIGN(x, align) + align

Finally, the one drawback of these macros is the fact that they
are macros. This means that (as usual) the argument parsing
can leave much to be desired. The biggest problem is that there is no
type enforcement; if x, y, or align are not
unsigned integers of the same width, bad results can occur. Hence the
(recently added) macros, P2\*_TYPED, defined
underneath
the originals, which allow an explicit type to be specified.

Despite this drawback, these macros tend to be much clearer than the same
things done by hand in the code. They probably aren't used enough, in fact.

Footnotes

 Of course, zero doesn't have a
lowest set bit, but you can treat it as being 2bits, and get