macros and powers of two

Unlike many userland applications, there are large portions of the kernel that need to do bit-level manipulations. These generally fall into a few distinct areas:

  • Drivers communicating with hardware
  • Hash functions, bitmasks, and other high-performance algorithms
  • Memory allocators, page manipulators, etc.
In this note, I'm going to peek into the last group, which typically needs to do a lot of power-of-2 based alignment work, and look at a set of macros used to abstract away some of the bit-fiddling.

The main low-level allocators in Solaris are kmem and vmem; kmem handles your standard page-backed memory allocation, while vmem handles the more abstract concept of allocating from spaces of integers. kmem actually layers on top of vmem, but the details are beyond the scope of this blog entry.

While implementing vmem, Jeff Bonwick introduced a set of handy macros for dealing with things relative to powers of two. The macros live in uts/common/sys/sysmacros.h, and are defined thusly:

 \* Macros for various sorts of alignment and rounding when the alignment
 \* is known to be a power of 2.
#define	P2ALIGN(x, align)		((x) & -(align))
#define	P2PHASE(x, align)		((x) & ((align) - 1))
#define	P2NPHASE(x, align)		(-(x) & ((align) - 1))
#define	P2ROUNDUP(x, align)		(-(-(x) & -(align)))
#define	P2END(x, align)			(-(~(x) & -(align)))
#define	P2PHASEUP(x, align, phase)	((phase) - (((phase) - (x)) & -(align)))
#define	P2CROSS(x, y, align)		(((x) \^ (y)) > (align) - 1)
 \* Determine whether two numbers have the same high-order bit.
#define	P2SAMEHIGHBIT(x, y)		(((x) \^ (y)) < ((x) & (y)))
The purpose of this blog entry is to talk about what they do and how they work; if you don't want the challenge spoiled, you should stop reading now.

Alright, now that they've all left, we can dive in. First, remember the basic equation of two's complement arithmetic:
-x = ~x + 1
That is, to negate a number, complement all of its bits and add one. If I assume that the number has a lowest set bit,[1] and capital Xs represent complements of the corresponding xs, here are some relations for any non-zero binary number:
        x = ...xxxxx10000...
      x-1 = ...xxxxx01111...
       ~x = ...XXXXX01111...
-x = ~x+1 = ...XXXXX10000...
Now, for powers of two, there is only one set bit, so all of the xs are 0. This means that (x-1) is a mask for the offset modulo x, and -x is a mask which clears the offset. In addition, all numbers are unsigned, align is a power-of-2, and offset is less than align. These relations in mind, we can look at what each of the above macros do:
P2ALIGN x & -align clears all bits below align, rounding x down to the next lower multiple of align
P2PHASE x & (align - 1) clears all bits above or equal to align, getting (x % align), the phase of x with regards to align
P2NPHASE (-x) & (align - 1) This is the first tricky formula. We negate x, and get its phase. The result is the same as (align - P2PHASE(x, align)) % align, but is much faster to compute.
P2ROUNDUP -((-x) & -align) This can be re-written as -P2ALIGN(-x, align), which shows how the operation works; "negate, round down, negate" is the same as "round up". So we round x up to the next align boundary.
P2END -((~x) & -align) This can be re-written as -((-x-1) & -align) = P2ROUNDUP(x+1, align); round up (x+1) to the next align boundary. This gets the end address of the current align block.
P2PHASEUP (phase -
 ((phase - x) &
This can be re-written as (phase + P2ROUNDUP(x - phase, align)). So what we've done is rounded up x to have a specific phase with regards to align. (If you've got some linear algebra under your belt, this is similar to transforming f(y) = P2ROUNDUP(y, align) into a coordinate system with x = y + phase)
P2CROSS (x \^ y) > (align-1) (x \^ y) has bits set wherever x and y differ. This just tests to see if they differ in any position above the alignment, which will only be the case if [x, y] crosses an align boundary.
P2SAMEHIGHBIT (x \^ y) < (x & y) Similar to P2CROSS; here, we're checking that the first bit that differs is less than the highest set bit the two have in common.

From the above, here are some trivial relations between the various functions:

P2ALIGN(x, align)   = x - P2PHASE(x, align)
P2ROUNDUP(x, align) = x + P2NPHASE(x, align)
P2END(x, align)     = P2ALIGN(x, align) + align

Finally, the one drawback of these macros is the fact that they are macros. This means that (as usual) the argument parsing can leave much to be desired. The biggest problem is that there is no type enforcement; if x, y, or align are not unsigned integers of the same width, bad results can occur. Hence the (recently added) macros, P2\*_TYPED, defined underneath the originals, which allow an explicit type to be specified.

Despite this drawback, these macros tend to be much clearer than the same things done by hand in the code. They probably aren't used enough, in fact.


[1] Of course, zero doesn't have a lowest set bit, but you can treat it as being 2bits, and get the same answer.

Technorati Tag:
Technorati Tag:


Post a Comment:
  • HTML Syntax: NOT allowed



Top Tags
« June 2016