fixnums in the VM
By john.rose on Mar 12, 2008
Dynamic languages typically perform arithmetic using dynamically typed references to boxed numbers. Normally (at least in a JVM) a dynamically typed reference is a pointer to an object in the heap, whose header in turn contains the required dynamic type information. But language-specific implementations often use pseudo-pointers to represent a commonly used subset of numbers, with the actual bits of the pseudo-pointer carrying a value field “payload”. Also, because their relaxed typing encourages data structure reuse, dynamic languages typically feature a few small but ever-present types like Lisp’s cons cell which are overwhelmingly common in the heap. Again, language specific implementations have historically provided “headerless” representations for these, in which the fields are stored in the heap but are not accompanied by a header.
The JVM can support fixnums and other headerless objects for the sake of these languages, and even for Java. The idea is to make ordinary object pointers (sometimes called oops) coexist with more specialized formats for headerless objects, which we will call iops and xops. The techniques are mature and well-known, and the overheads (of extra tag checking before pointer usage) can be controlled by optimizations already used widely in JVMs. In particular, the Hotspot JVM could cleanly represent fixnums and other immediate data types by modifying extending its oop type.
In Hotspot, a typical mature JVM, an object header is two words, a mark and a klass. (That ‘k ’is not a typo, but a dodge to avoid a reserved keyword in C++.) The mark is used for bookkeeping of operations like synchronization and identity hash code. The klass is a full machine pointer, to a metadata object which describes the object’s layout, methods, and other type information. A dynamic typing operation must first load the klass and then query its contents. (The actual query logic may require one more dependent load, from the klass object.) This object layout is flexible and general, but it requires two machine words of overhead in every object, plus a possible word or so of alignment overhead, plus (of course) the payload of the object, i.e., the actual fields containing application information.
In many language-specific systems, integers of 20-30 bits or less are compressed into pseudo-pointers. Lisp and Ruby call them fixnums, while Smalltalk uses the name
SmallInteger. Some languages, like Emacs Lisp, simply call them “integer” and do not allow overflow. Most dynamic languages check for arithmetic overflow beyond the 20-30 bits, and either signal an error or provide a more or less seamless transition from fixnum to bignum, i.e., from a machine word subrange to a multiple precision array-based representation.
In the case of a Java
Integer on a 32-bit Hotspot JVM, the 32-bit payload (a
Integer.value field) is accompanied by a 96 additional bits, a mark, a klass, and a word of alignment padding, for a total of 128 bits. Moreover, if there are (say) six references to this integer in the world (threads plus heap), those references also occupy 192 bits, for a total of 320 bits. On a 64-bit machine, everything is twice as big, at least at present: 256 bits in the object (which now includes 96 bits of padding), and 384 bits elsewhere. By contrast, six copies of an unboxed primitive integer occupy 192 bits. The point of introducing fixnums is that the overhead of dynamically typed integers can be much closer to that minimum of 192 bits than it is now. (The number six is chosen arbitrarily to model a common use case, a number that has been boxed and then copied a few times by procedure calling or other computation.) Here is a summary of the sizes:
|Representation||Heap bits||Reference bits (N=6)||Total bits||Comparison|
|fixnum, 32-bit JVM||0||192||192||1.00|
|object, 32-bit JVM||128||192||320||1.67|
|fixnum, 64-bit JVM||0||384||384||2.00|
|object, 64-bit JVM||256||384||640||3.33|
When Java introduced autoboxing of primitive ints into
Integer objects, application codes began to perform implicit creation of integer objects, and the allocation rate went up. Of course it depends on the application, but codes which mix generic collections (like
List<) with primitive values routinely box and unbox frequently. The implicit boxing is done by the factory method
Integer.valueOf(int). This method has been carefully specified to allow memoization or uniquification techniques, so that more references can share fewer copies of the same value.
In the case of a widely shared heap object, such as
Integer.valueOf(13), the heap overhead is negligible, but there is still an execution overhead from loads of
klass internals, and
Integer.value fields, with associated costs of cache traffic and occupancy. With fixnums, dynamically typed arithmetic can proceed with no memory references at all.
Note that in a 32-bit system compromises must be made in representing the 32-bit value of an integer. Historically, fixnums represent integers in widths like 30, 28, or 24 bits, leaving a few additional bits available to distinguish the fixnum from ordinary object pointers and other kinds of pointer-like values. In a 32-bit JVM, some calls to
Integer.valueOf will not be able to return a fixnum; they will have to product ordinary objects. But in practice, no applications use all 32 bits of every integer; most integers are small values like array indices, and can be compressed into less than 32 bits. This is true (to a lesser extent) of longs, floats, and doubles. Smaller values like characters and booleans can be stored in full precision.
On a 64-bit JVM, there is so much “slack” in the pointer layout that the full 32 bit range of values can be represented in fixnum format. Even larger values like longs and doubles can fit into the pointer layout, with a little compression. But regardless of pointer size, some
Integer object will always have to be ordinary objects, such as in strange edge cases where reference equality is significant, or when (wrongheadedly) a code synchronizes on a
This reasoning can be applied, and the fixnum optimization constructed, for any JVM type with the following properties:
- The object contains a payload of a machine word or less.
- The payload is immutable (a final and/or private field, unmodified outside the constructor).
- If the payload is a machine word, it is usually compressible (e.g, via sign truncation).
- The type is heavily used, enough to justify the complexity of a headerless representation.
- The object is never used for synchronization, or can be created in a locked state.
The last requirement (no synchronization) is necessary, since synchronization is a kind of object state that can potentially live in an object’s header, regardless of the immutability of its payload. Since the type
Object itself supports synchronization, it is hard to avoid in the JVM. However, there are two observations here which can let us make progress:
- Objects which the user merely shares (and does not allocate) can behave as if someone else had locked them and will never unlock them.
- Types not already in the Java language do not need to support synchronization.
The first case covers
Integer.valueOf and the other factory methods used by autoboxing. The second case could cover closures, method handles, tuples, complex numbers, etc. It will probably be a good idea to standardize on a marker of some sort which disables the locking feature for a given class (and its subclasses). A marker interface
Unsynchronized is probably best, since synchronization is, for better or worse, a public operation.
Let’s call any pseudo-pointer format which carries its payload directly an immediate object pseudo-pointer, or iop. Below we will discuss more specific implementation tactics for iops.
Headerless pairs, tuples
A similar analysis can be made for small objects, objects for which the extra space occupied by an ordinary header is a significant overhead. fFor example, in some Lisp applications, the majority of the heap is occupied by small two-element tuples called cons cells, which Lisp uses to build lists and trees. Removing the JVM header from a cons cell can double the effectiveness of memory—and, more importantly, of cache.
(A further optimization called cdr coding could double memory density again, almost, by eliding next pointers if they immutably point to a following cons cell created at the same time. But this would require interior pointers on the JVM, a topic which must wait for another day.)
Although a headerless pair would be referred to via a pointer, it would not qualify as an ordinary object pointer, but rather as a specialized headerless pointer. Let’s call this an extraordinary object pointer, or xop. As with an iop, an xop must somehow be tagged to distinguish it from an ordinary object pointer. (Following Smalltalk tradition, Hotspot calls the latter an oop.) Along with the tagging, some means of extracting a klass value is needed. The klass could be bit-encoded into the reference as with an iop, especially on 64-bit systems. Since an xop refers to memory, the klass could also be stored in memory in such a way (e.g., a card header) that it is shared by several nearby objects of the same type. Or it could just be stored in an abbreviated object header (one-word, markless).
An extraordinary object pointer could be useful for any JVM type with the following properties:
- The object contains a payload of a few machine words.
- The payload need not be directly synchronized. (E.g., it is immutable, or does not need multiword transactions, or is single-threaded, or is locked by an external mutex.)
- The type is heavily used, enough to justify the complexity of a headerless representation.
- The object is never used for synchronization, or can be created in a locked state.
Types which might possibly benefit from this treatment include any immutable tuple type, method references or closures, Lisp cons cells, multipart numeric types like complex, or strings.
(Pronunciation Warning: In homage to Dr. Seuss, I intend to pronounce these neologisms monosyllabically as Yopps and Zopps.)
In JVMs where this makes sense, such as Hotspot, one can divide the machine word of an oop into an implementation-defined set of three fields, tag, klass, and value. The sizes of these are tunable, but on 32-bit systems the tag is generally 2-3 bits, the klass 0-5 bits, and the value 24-31 bits. On 64-bit machines, the klass may in fact be a restricted pointer value (31 bits or so), with klass and tag sharing one 32-bit subword and the value in the other. For jumbo payloads, the value could be expanded to almost 64 bits, at the expense of klass and tag.
Tag: On most machines, the tag is in the high bits. On machines that require address alignment, the tag could be the low bits, some of which are non-zero (the traditional “low-tag” representation), though this could impose an extra check on unaligned operations. In all cases, tag bits are a fixed combination that is distinct from all valid heap pointer values and in addition should (if at all possible) cause a trap of some sort if it is used. This will enable certain favorable compilation tricks, exactly like the ones that JVMs use today for null pointer decoding.
Klass: On 32-bit systems, the klass must be an index into a table of well-known classes. On 64-bit systems it can be a more general pointer. But the generality would still limited to classes with single immutable value fields, and factory methods like
Integer.valueOf! So the generality of a full klass pointer is only useful if it helps with performance.
The tag and klass fields can often be unified. For example, if the three low bits are used for an alignment-based tag, then a unified field (say, the low eight bits) can be used, where any klass value can be used as long as at least one of its three low bits is non-zero. Or, the tag field can be used to control the width of the klass field, allowing a tunable range of value widths.
Value: For integers and other values of 32 bits or less, the value field is obviously a clipped copy of what would ordinarily be stored in the
value field of the class, if the reference were a regular object. With the low-tag representation, the value can be extracted in a single instruction (signed right shift). For floats, the value can be zero-filled to the right.
A straw-man design
Though I do not have time for it at present, if I were to try out this design space today in Hotspot, I might start hacking with the following specific parameters:
- 32-bit JVM (more challenging than 64-bit, but a more immediately usable result)
- 28-bit long immediate or 20-bit short immediate value field, left justified
- 2-bit right-justified tag (supported directly by SPARC memory bus, requires explicit testing on Intel)
- 2-bit tag values of 1 or 3 are reserved for iops
- only iops have klass fields
- if 2-bit tag is 1, the klass field is 2 bits and value is 28 bits
- 28-bit long immediate value represents an int (later, perhaps long, float, and/or double)
- if 2-bit tag is 3, the klass field is 10 bits and value is 20 bits
- initial short immediate classes are void, byte, boolean, short, and char.
- xops to be attempted only after iops are working
- 2-bit tag of 2 reserved for xops
- a klass pointer at the front of each xops, at value+4, and the first field at value+8
- in a xop, the word at value+0 is part of the previous object; it overlays the mark field of any oop
Semantic features of fixnums and other iops
- The call
java.lang.Integer.valueOf((int)x)returns a fixnum if x is in the supported subrange.
Integermethods just work as normal on this alternate representation. Internally, the JVM codes them as a tiny if/then/else, controlled by a tag check.
- Ditto for all Object and Number methods (toString, etc.).
wait, etc.: All iops must behave (w.r.t. subsequent synchronization attempts) as if they were locked immediately upon creation, and are never unlocked.
- Explicit construction expressions
new Integer(x))must continue to create ordinary objects.
- Compiler escape analysis may sometimes be able to weaken explicitly allocated integers to iops.
- The identity hash code of any iop is obtained by extracting as many of its value and klass bits as will fit in the 31-bit result.
A number of other classes might merit the iop treatment. Here is a list ordered by roughly decreasing plausbility. Note that there are two alternate representations for strings; they could co-exist with some care.
- very short strings (value packs 0-3 7-bit bytes or one 20-bit extended char)
- well-known interned strings (value is index into global table of packed UTF8 sequences)
- some zero-length arrays or strings (note: this should combine the length check with a tag check)
Semantic features of cons cells, tuples, and other xops
Because of the interactions with object identity, extraordinary object pointers fall into two categories: Immutable and mutable.
In the immutable case, user-level semantics are simple, while the JVM maintains the fiction that object identity and structure equality are the same. We will use the complex number (a 2-tuple of doubles) as an example:
- The complex type is defined from the start as unsynchronized, so that the JVM is free to omit headers uniformly.
- The expression
new Complex(x,y)always produces a headerless object, as does the underlying
Objectmethods just work as normal on this headerless representation. Internally, the JVM codes them as a tiny if/then/else, controlled by a tag check.
monitorenter, etc.: Complex numbers, like all other xops and iops, must behave as if they are “locked at birth”.
acmpinstructions (Java’s reference equality comparisons) are illegal when applied to two complex numbers (or any two xops). Enforcing this requires extra bit testing, mitigated by some type analysis in the compiler. As an exception, the code of the complex class can use reference equality comparison without restriction on the self type. (That is, the operator acts like a protected method.)
- The identity hash code of complex number is computed by calling its
Objectmust be overridden by
- The JVM is free to clone and unclone equivalent complex number instances, in the GC and/or on the fly, without notifying the application.
Note that, under these rules and those of Java, objects produced by
Integer.valueOf can be represented as immutable xops when they do not fit in iops.
The semantics of mutable xops are somewhat less regular, and so the concept may not be as valuable. We will use the (mutable) cons cell as an example. Here are the points where they differ from their immutable cousins:
- The identity hash code of a mutable cons cell is -1. This lets identity-sensitive hash tables detect unsuitable keys quickly. (Or should
System.identityHashCodethrow an error?)
- As with regular oops, the system is not free to clone or unclone mutable cons cells.
- Reference equality works the same as with oops.
- Some types implemented with non-oops may be invisibly split by the JVM into ordinary and immediate or extraordinary formats. Additional type inference and tag testing is required wherever such splits can occur, or where super types like
Objectmay disguise the non-oops.
Integer.value(etc.) must first test the tag, and then issue either a load or a shift to obtain the value field.
- Bytecodes which access the object header must also test the tag. (These include
- Invocations of
Number) methods must also test the tag. Access to unrelated classes (e.g.,
String) will be unaffected.
- Compilers can use the usual combination of static analysis and profile-driven optimism to make the overheads go away.
- If bus alignment is being used to check low-tags, and a super class has a field of size N (e.g., a
Object.klassheader field of size 4), be sure that any possible non-oops have low bits that are misaligned modulo N, or else check the bits explicitly. This could creep up, say with an attempted memory access to the
Byte.valuefield via a iop that happened to look like an address.
We have examined low-level techniques for headerless objects, and investigated how to integrate them into the JVM and even (in some cases) the Java language. The techniques are on a sliding scale from the straightforward (low tag bit for immediate
Integer only) to the subtle (compile-time and profile-driven type estimation). Implementing even some of this in an existing JVM is likely to be an adventure on the order of, say, compressing oops. A language implementor might prefer, at first, to try these techniques in the closed environment of a from-scratch, specialized VM. But it is probably more profitable in the end to fit this these techniques into an optimized, mature JIT and GC, such as Hotspot... Especially now that Hotspot is part of the Open JDK, and even has a Multi-Language VM subproject specifically for these explorations!
Footnote  ...certain favorable compilation tricks... What compilation tricks are those, you ask? Well, most pointer indirection instructions in a JVM will never provoke null exceptions. Yet the JVM is obligated to always defend against possible nulls. Some instructions can statically be proven never to see null, while many others cannot be so proven. The latter are compiled optimistically, so that if a null ever does come through, the instruction will cause some sort of hardware-level trap. In that case the null check comes for free with the CPU address decode logic. The recovery from a failure of such a check is very complex and painful, but complete; the result is that the newly failing instruction is emulated somehow in the interpreter or by other cleanup code. It is OK for that recovery to be very expensive, especially if the JVM recompiles the offending instruction to be guarded by an explicit null check. The JVM keeps a historical profile about exceptional events, which the compiler consults; it optimistically uses an implicit null check when the profile looks clear, and the system backs off to a slightly more expensive explicit check when that fails. The same thing can be done with tag bits, if they look ugly enough to make the CPU throw a trap.
Bug Reference: 6674617, dynamic languages need more efficient autoboxing