value types in the vm

value types in the vm

Or, enduring values for a changing world.

Introduction

A value type is a data type which, generally speaking, is designed for being passed by value in and out of methods, and stored by value in data structures. The only value types which the Java language directly supports are the eight primitive types. Java indirectly and approximately supports value types, if they are implemented in terms of classes. For example, both Integer and String may be viewed as value types, especially if their usage is restricted to avoid operations appropriate to Object. In this note, we propose a definition of value types in terms of a design pattern for Java classes, accompanied by a set of usage restrictions. We also sketch the relation of such value types to tuple types (which are a JVM-level notion), and point out JVM optimizations that can apply to value types.

This note is a thought experiment to extend the JVM’s performance model in support of value types. The demonstration has two phases. Initially the extension can simply use design patterns, within the current bytecode architecture, and in today’s Java language. But if the performance model is to be realized in practice, it will probably require new JVM bytecode features, changes to the Java language, or both. We will look at a few possibilities for these new features.

[Posted 3/21/2012, updated as marked 3/24/2012 and 3/05/2014.]

An Axiom of Value

In the context of the JVM, a value type is a data type equipped with construction, assignment, and equality operations, and a set of typed components, such that, whenever two variables of the value type produce equal corresponding values for their components, the values of the two variables cannot be distinguished by any JVM operation.

Here are some corollaries: A value type is immutable, since otherwise a copy could be constructed and the original could be modified in one of its components, allowing the copies to be distinguished. Changing the component of a value type requires construction of a new value. The equals and hashCode operations are strictly component-wise. If a value type is represented by a JVM reference, that reference cannot be successfully synchronized on, and cannot be usefully compared for reference equality.

A value type can be viewed in terms of what it doesn’t do. We can say that a value type omits all value-unsafe operations, which could violate the constraints on value types. These operations, which are ordinarily allowed for Java object types, are pointer equality comparison (the acmp instruction), synchronization (the monitor instructions), all the wait and notify methods of class Object, and non-trivial finalize methods. The clone method is also value-unsafe, although for value types it could be treated as the identity function. Finally, and most importantly, any side effect on an object (however visible) also counts as an value-unsafe operation.

A value type may have methods, but such methods must not change the components of the value. It is reasonable and useful to define methods like toString, equals, and hashCode on value types, and also methods which are specifically valuable to users of the value type.

Representations of Value

Value types have two natural representations in the JVM, unboxed and boxed. An unboxed value consists of the components, as simple variables. For example, the complex number x=(1+2i), in rectangular coordinate form, may be represented in unboxed form by the following pair of variables:

/*Complex x = Complex.valueOf(1.0, 2.0):*/
double x_re = 1.0, x_im = 2.0;

These variables might be locals, parameters, or fields. Their association as components of a single value is not defined to the JVM. Here is a sample computation which computes the norm of the difference between two complex numbers:

double distance(/*Complex x:*/ double x_re, double x_im,
        /*Complex y:*/ double y_re, double y_im) {
    /*Complex z = x.minus(y):*/
    double z_re = x_re - y_re, z_im = x_im - y_im;
    /*return z.abs():*/
    return Math.sqrt(z_re*z_re + z_im*z_im);
}

A boxed representation groups component values under a single object reference. The reference is to a ‘wrapper class’ that carries the component values in its fields.

(A primitive type can naturally be equated with a trivial value type with just one component of that type. In that view, the wrapper class Integer can serve as a boxed representation of value type int.)

The unboxed representation of complex numbers is practical for many uses, but it fails to cover several major use cases: return values, array elements, and generic APIs. The two components of a complex number cannot be directly returned from a Java function, since Java does not support multiple return values. The same story applies to array elements: Java has no ‘array of structs’ feature. (Double-length arrays are a possible workaround for complex numbers, but not for value types with heterogeneous components.) By generic APIs I mean both those which use generic types, like Arrays.asList and those which have special case support for primitive types, like String.valueOf and PrintStream.println. Those APIs do not support unboxed values, and offer some problems to boxed values. Any ‘real’ JVM type should have a story for returns, arrays, and API interoperability.

The basic problem here is that value types fall between primitive types and object types. Value types are clearly more complex than primitive types, and object types are slightly too complicated. Objects are a little bit dangerous to use as value carriers, since object references can be compared for pointer equality, and can be synchronized on. Also, as many Java programmers have observed, there is often a performance cost to using wrapper objects, even on modern JVMs.

Even so, wrapper classes are a good starting point for talking about value types. If there were a set of structural rules and restrictions which would prevent value-unsafe operations on value types, wrapper classes would provide a good notation for defining value types. This note attempts to define such rules and restrictions.

Let’s Start Coding

Now it is time to look at some real code. Here is a definition, written in Java, of a complex number value type.

@ValueSafe
public final class Complex implements java.io.Serializable {
    // immutable component structure:
    public final double re, im;
    private Complex(double re, double im) {
        this.re = re; this.im = im;
    }
    // interoperability methods:
    public String toString() { return "Complex("+re+","+im+")"; }
    public List<Double> asList() { return Arrays.asList(re, im); }
    public boolean equals(Complex c) {
        return re == c.re &amp;&amp; im == c.im;
    }
    public boolean equals(@ValueSafe Object x) {
        return x instanceof Complex &amp;&amp; equals((Complex) x);
    }
    public int hashCode() {
        return 31*Double.valueOf(re).hashCode()
                + Double.valueOf(im).hashCode();
    }
    // factory methods:
    public static Complex valueOf(double re, double im) {
        return new Complex(re, im);
    }
    public Complex changeRe(double re2) { return valueOf(re2, im); }
    public Complex changeIm(double im2) { return valueOf(re, im2); }
    public static Complex cast(@ValueSafe Object x) {
        return x == null ? ZERO : (Complex) x;
    }
    // utility methods and constants:
    public Complex plus(Complex c)  { return new Complex(re+c.re, im+c.im); }
    public Complex minus(Complex c) { return new Complex(re-c.re, im-c.im); }
    public double abs() { return Math.sqrt(re*re + im*im); }
    public static final Complex PI = valueOf(Math.PI, 0.0);
    public static final Complex ZERO = valueOf(0.0, 0.0);
}

This is not a minimal definition, because it includes some utility methods and other optional parts. The essential elements are as follows:

  1. The class is marked as a value type with an annotation.
  2. The class is final, because it does not make sense to create subclasses of value types.
  3. The fields of the class are all final. (I.e., the type is immutable.)
  4. From the supertype Object, all public non-final methods are overridden.
  5. The constructor is private.

Beyond these bare essentials, we can observe the following features in this example, which are likely to be typical of all value types:

  1. One or more factory methods are responsible for value creation, including a component-wise valueOf method.
  2. There are utility methods for complex arithmetic and instance creation, such as plus and changeIm.
  3. There are static utility constants, such as PI.
  4. The type is serializable, using the default mechanisms.
  5. There are methods for converting to and from dynamically typed references, such as asList and cast.

The Rules

In order to use value types properly, the programmer must avoid value-unsafe operations. A helpful Java compiler should issue errors (or at least warnings) for code which provably applies value-unsafe operations, and should issue warnings for code which might be correct but does not provably avoid value-unsafe operations. No such compilers exist today, but to simplify our account here, we will pretend that they do exist.

A value-safe type is any class, interface, or type parameter marked with the @ValueSafe annotation, or any subtype of a value-safe type. If a value-safe class is marked final, it is in fact a value type. All other value-safe classes must be abstract. The non-static fields of a value class must be final, and all its constructors must be private.

Under the above rules, a standard interface could be helpful to define value types like Complex. Here is an example:

@ValueSafe
public interface ValueType extends java.io.Serializable {
    // All methods listed here must get redefined.
    // Definitions must be value-safe, which means
    // they may depend on component values only.
    List<? extends Object> asList();
    int hashCode();
    boolean equals(@ValueSafe Object c);
    String toString();
}

//@ValueSafe inherited from supertype:
public final class Complex implements ValueType { ...

The main advantage of such a conventional interface is that (unlike an annotation) it is reified in the runtime type system. It could appear as an element type or parameter bound, for facilities which are designed to work on value types only. More broadly, it might assist the JVM to perform dynamic enforcement of the rules for value types.

Besides types, the annotation @ValueSafe can mark fields, parameters, local variables, and methods. (This is redundant when the type is also value-safe, but may be useful when the type is Object or another supertype of a value type.) Working forward from these annotations, an expression E is defined as value-safe if it satisfies one or more of the following:

  1. The type of E is a value-safe type.
  2. E names a field, parameter, or local variable whose declaration is marked @ValueSafe.
  3. E is a call to a method whose declaration is marked @ValueSafe.
  4. E is an assignment to a value-safe variable, field reference, or array reference.
  5. E is a cast to a value-safe type from a value-safe expression. ([Added 3/24/2012:] The issue of null pollution is discussed below.)
  6. E is a conditional expression E0 ? E1 : E2, and both E1 and E2 are value-safe.

Assignments to value-safe expressions and initializations of value-safe names must take their values from value-safe expressions.

[Added 3/24/2012:] A cast from an arbitrary value to a value-safe type is almost value-safe, except for the possibility of a null operand, which will cause a null pointer exception whenever the value is converted to an unboxed representation.

A value-safe expression may not be the subject of a value-unsafe operation. In particular, it cannot be synchronized on, nor can it be compared with the “==” operator, not even with a null or with another value-safe type.

In a program where all of these rules are followed, no value-type value will be subject to a value-unsafe operation. Thus, the prime axiom of value types will be satisfied, that no two value type will be distinguishable as long as their component values are equal.

More Code

To illustrate these rules, here are some usage examples for Complex:

Complex pi = Complex.valueOf(Math.PI, 0);
Complex zero = pi.changeRe(0);  //zero = pi; zero.re = 0;
ValueType vtype = pi;
@SuppressWarnings("value-unsafe")
  Object obj = pi;
@ValueSafe Object obj2 = pi;
obj2 = new Object();  // ok
List<Complex> clist = new ArrayList<Complex>();
clist.add(pi);  // (ok assuming List.add param is @ValueSafe)
List<ValueType> vlist = new ArrayList<ValueType>();
vlist.add(pi);  // (ok)
List<Object> olist = new ArrayList<Object>();
olist.add(pi);  // warning: "value-unsafe"
boolean z = pi.equals(zero);
boolean z1 = (pi == zero);  // error: reference comparison on value type
boolean z2 = (pi == null);  // error: reference comparison on value type
boolean z3 = (pi == obj2);  // error: reference comparison on value type
synchronized (pi) { }  // error: synch of value, unpredictable result
synchronized (obj2) { }  // unpredictable result
Complex qq = pi;
qq = null;  // possible NPE; warning: "null-unsafe"
qq = (Complex) obj;  // warning: "null-unsafe"
qq = Complex.cast(obj);  // OK
@SuppressWarnings("null-unsafe")
  Complex empty = null;  // possible NPE
qq = empty;  // possible NPE (null pollution)

The Payoffs

It follows from this that either the JVM or the java compiler can replace boxed value-type values with unboxed ones, without affecting normal computations. Fields and variables of value types can be split into their unboxed components. Non-static methods on value types can be transformed into static methods which take the components as value parameters.

Some common questions arise around this point in any discussion of value types. Why burden the programmer with all these extra rules? Why not detect programs automagically and perform unboxing transparently? The answer is that it is easy to break the rules accidently unless they are agreed to by the programmer and enforced. Automatic unboxing optimizations are tantalizing but (so far) unreachable ideal. In the current state of the art, it is possible exhibit benchmarks in which automatic unboxing provides the desired effects, but it is not possible to provide a JVM with a performance model that assures the programmer when unboxing will occur. This is why I’m writing this note, to enlist help from, and provide assurances to, the programmer. Basically, I’m shooting for a good set of user-supplied “pragmas” to frame the desired optimization.

Again, the important thing is that the unboxing must be done reliably, or else programmers will have no reason to work with the extra complexity of the value-safety rules. There must be a reasonably stable performance model, wherein using a value type has approximately the same performance characteristics as writing the unboxed components as separate Java variables.

There are some rough corners to the present scheme. Since Java fields and array elements are initialized to null, value-type computations which incorporate uninitialized variables can produce null pointer exceptions. One workaround for this is to require such variables to be null-tested, and the result replaced with a suitable all-zero value of the value type. That is what the “cast” method does above.

[Added:] The introduction of nulls into value-safe variables and expression can be called null pollution, by analogy with the concept of heap pollution caused by unchecked generic type conversions. A polluting null value may propagate through value-safe variables and expressions, until someone unboxes it into a value type, at which point it will cause a null pointer exception. For this reason alone, a cast from a non-value-safe expression is not value-safe; all non-null casts are useful and safe. It might be reasonable, at some point, for a cast to a value-safe type to deal specially with nulls, either forcing a NPE, or substituting a box of zero values (if the cast is to a concrete type). I think experimentation is needed to decide the right answer here.

Generically typed APIs like List<T> will continue to manipulate boxed values always, at least until we figure out how to do reification of generic type instances. Use of such APIs will elicit warnings until their type parameters (and/or relevant members) are annotated or typed as value-safe. Retrofitting List<T> is likely to expose flaws in the present scheme, which we will need to engineer around. Here are a couple of first approaches:

public interface java.util.List<@ValueSafe T> extends Collection<T> { ...
public interface java.util.List<T extends Object|ValueType> extends Collection<T> { ...

(The second approach would require disjunctive types, in which value-safety is “contagious” from the constituent types.)

With more transformations, the return value types of methods can also be unboxed. This may require significant bytecode-level transformations, and would work best in the presence of a bytecode representation for multiple value groups, which I have proposed elsewhere under the title “Tuples in the VM”.

But for starters, the JVM can apply this transformation under the covers, to internally compiled methods. This would give a way to express multiple return values and structured return values, which is a significant pain-point for Java programmers, especially those who work with low-level structure types favored by modern vector and graphics processors. The lack of multiple return values has a strong distorting effect on many Java APIs.

Even if the JVM fails to unbox a value, there is still potential benefit to the value type. Clustered computing systems something have copy operations (serialization or something similar) which apply implicitly to command operands. When copying JVM objects, it is extremely helpful to know when an object’s identity is important or not. If an object reference is a copied operand, the system may have to create a proxy handle which points back to the original object, so that side effects are visible. Proxies must be managed carefully, and this can be expensive. On the other hand, value types are exactly those types which a JVM can “copy and forget” with no downside.

Array types are crucial to bulk data interfaces. (As data sizes and rates increase, bulk data becomes more important than scalar data, so arrays are definitely accompanying us into the future of computing.) Value types are very helpful for adding structure to bulk data, so a successful value type mechanism will make it easier for us to express richer forms of bulk data.

Unboxing arrays (i.e., arrays containing unboxed values) will provide better cache and memory density, and more direct data movement within clustered or heterogeneous computing systems. They require the deepest transformations, relative to today’s JVM. There is an impedance mismatch between value-type arrays and Java’s covariant array typing, so compromises will need to be struck with existing Java semantics. It is probably worth the effort, since arrays of unboxed value types are inherently more memory-efficient than standard Java arrays, which rely on dependent pointer chains.

It may be sufficient to extend the “value-safe” concept to array declarations, and allow low-level transformations to change value-safe array declarations from the standard boxed form into an unboxed tuple-based form. Such value-safe arrays would not be convertible to Object[] arrays. Certain connection points, such as Arrays.copyOf and System.arraycopy might need additional input/output combinations, to allow smooth conversion between arrays with boxed and unboxed elements.

Alternatively, the correct solution may have to wait until we have enough reification of generic types, and enough operator overloading, to enable an overhaul of Java arrays.

Implicit Method Definitions

The example of class Complex above may be unattractively complex. I believe most or all of the elements of the example class are required by the logic of value types. If this is true, a programmer who writes a value type will have to write lots of error-prone boilerplate code. On the other hand, I think nearly all of the code (except for the domain-specific parts like plus and minus) can be implicitly generated.

Java has a rule for implicitly defining a class’s constructor, if no it defines no constructors explicitly. Likewise, there are rules for providing default access modifiers for interface members. Because of the highly regular structure of value types, it might be reasonable to perform similar implicit transformations on value types. Here’s an example of a “highly implicit” definition of a complex number type:

public class Complex implements ValueType {  // implicitly final
    public double re, im;  // implicitly public final
    //implicit methods are defined elementwise from te fields:
    //  toString, asList, equals(2), hashCode, valueOf, cast
    //optionally, explicit methods (plus, abs, etc.) would go here
}

In other words, with the right defaults, a simple value type definition can be a one-liner. The observant reader will have noticed the similarities (and suitable differences) between the explicit methods above and the corresponding methods for List<T>.

Another way to abbreviate such a class would be to make an annotation the primary trigger of the functionality, and to add the interface(s) implicitly:

public @ValueType class Complex { ... // implicitly final, implements ValueType

(But to me it seems better to communicate the “magic” via an interface, even if it is rooted in an annotation.)

Implicitly Defined Value Types

So far we have been working with nominal value types, which is to say that the sequence of typed components is associated with a name and additional methods that convey the intention of the programmer. A simple ordered pair of floating point numbers can be variously interpreted as (to name a few possibilities) a rectangular or polar complex number or Cartesian point. The name and the methods convey the intended meaning.

But what if we need a truly simple ordered pair of floating point numbers, without any further conceptual baggage? Perhaps we are writing a method (like “divideAndRemainder”) which naturally returns a pair of numbers instead of a single number. Wrapping the pair of numbers in a nominal type (like “QuotientAndRemainder”) makes as little sense as wrapping a single return value in a nominal type (like “Quotient”). What we need here are structural value types commonly known as tuples.

For the present discussion, let us assign a conventional, JVM-friendly name to tuples, roughly as follows:

public class java.lang.tuple.$DD extends java.lang.tuple.Tuple { 
    double $1, $2;
}

Here the component names are fixed and all the required methods are defined implicitly. The supertype is an abstract class which has suitable shared declarations. The name itself mentions a JVM-style method parameter descriptor, which may be “cracked” to determine the number and types of the component fields.

The odd thing about such a tuple type (and structural types in general) is it must be instantiated lazily, in response to linkage requests from one or more classes that need it. The JVM and/or its class loaders must be prepared to spin a tuple type on demand, given a simple name reference, $xyz, where the xyz is cracked into a series of component types. (Specifics of naming and name mangling need some tasteful engineering.)

Tuples also seem to demand, even more than nominal types, some support from the language. (This is probably because notations for non-nominal types work best as combinations of punctuation and type names, rather than named constructors like Function3 or Tuple2.) At a minimum, languages with tuples usually (I think) have some sort of simple bracket notation for creating tuples, and a corresponding pattern-matching syntax (or “destructuring bind”) for taking tuples apart, at least when they are parameter lists. Designing such a syntax is no simple thing, because it ought to play well with nominal value types, and also with pre-existing Java features, such as method parameter lists, implicit conversions, generic types, and reflection. That is a task for another day.

Other Use Cases

Besides complex numbers and simple tuples there are many use cases for value types. Many tuple-like types have natural value-type representations. These include rational numbers, point locations and pixel colors, and various kinds of dates and addresses.

Other types have a variable-length ‘tail’ of internal values. The most common example of this is String, which is (mathematically) a sequence of UTF-16 character values. Similarly, bit vectors, multiple-precision numbers, and polynomials are composed of sequences of values. Such types include, in their representation, a reference to a variable-sized data structure (often an array) which (somehow) represents the sequence of values. The value type may also include ‘header’ information.

Variable-sized values often have a length distribution which favors short lengths. In that case, the design of the value type can make the first few values in the sequence be direct ‘header’ fields of the value type. In the common case where the header is enough to represent the whole value, the tail can be a shared null value, or even just a null reference. Note that the tail need not be an immutable object, as long as the header type encapsulates it well enough. This is the case with String, where the tail is a mutable (but never mutated) character array.

Field types and their order must be a globally visible part of the API. The structure of the value type must be transparent enough to have a globally consistent unboxed representation, so that all callers and callees agree about the type and order of components that appear as parameters, return types, and array elements. This is a trade-off between efficiency and encapsulation, which is forced on us when we remove an indirection enjoyed by boxed representations. A JVM-only transformation would not care about such visibility, but a bytecode transformation would need to take care that (say) the components of complex numbers would not get swapped after a redefinition of Complex and a partial recompile. Perhaps constant pool references to value types need to declare the field order as assumed by each API user.

This brings up the delicate status of private fields in a value type. It must always be possible to load, store, and copy value types as coordinated groups, and the JVM performs those movements by moving individual scalar values between locals and stack. If a component field is not public, what is to prevent hostile code from plucking it out of the tuple using a rogue aload or astore instruction? Nothing but the verifier, so we may need to give it more smarts, so that it treats value types as inseparable groups of stack slots or locals (something like long or double).

My initial thought was to make the fields always public, which would make the security problem moot. But public is not always the right answer; consider the case of String, where the underlying mutable character array must be encapsulated to prevent security holes. I believe we can win back both sides of the tradeoff, by training the verifier never to split up the components in an unboxed value. Just as the verifier encapsulates the two halves of a 64-bit primitive, it can encapsulate the the header and body of an unboxed String, so that no code other than that of class String itself can take apart the values.

Similar to String, we could build an efficient multi-precision decimal type along these lines:

public final class DecimalValue extends ValueType {
    private final long header;
    private final BigInteger digits;
    public DecimalValue valueOf(int value, int scale) {
        assert(scale >= 0);
        return new DecimalValue(((long)value << 32) + scale, null);
    }
    public DecimalValue valueOf(long value, int scale) {
        if (value == (int) value)
            return valueOf((int)value, scale);
        return new DecimalValue(-scale, new BigInteger(value));
    }
}

Values of this type would be passed between methods as two machine words. Small values (those with a significand which fits into 32 bits) would be represented without any heap data at all, unless the DecimalValue itself were boxed.

(Note the tension between encapsulation and unboxing in this case. It would be better if the header and digits fields were private, but depending on where the unboxing information must “leak”, it is probably safer to make a public revelation of the internal structure.)

Note that, although an array of Complex can be faked with a double-length array of double, there is no easy way to fake an array of unboxed DecimalValues. (Either an array of boxed values or a transposed pair of homogeneous arrays would be reasonable fallbacks, in a current JVM.) Getting the full benefit of unboxing and arrays will require some new JVM magic.

Although the JVM emphasizes portability, system dependent code will benefit from using machine-level types larger than 64 bits. For example, the back end of a linear algebra package might benefit from value types like Float4 which map to stock vector types. This is probably only worthwhile if the unboxing arrays can be packed with such values.

More Daydreams

A more finely-divided design for dynamic enforcement of value safety could feature separate marker interfaces for each invariant. An empty marker interface Unsynchronizable could cause suitable exceptions for monitor instructions on objects in marked classes. More radically, a Interchangeable marker interface could cause JVM primitives that are sensitive to object identity to raise exceptions; the strangest result would be that the acmp instruction would have to be specified as raising an exception.

@ValueSafe
public interface ValueType extends java.io.Serializable,
        Unsynchronizable, Interchangeable { ...
public class Complex implements ValueType {
    // inherits Serializable, Unsynchronizable, Interchangeable, @ValueSafe
    ...

It seems possible that Integer and the other wrapper types could be retro-fitted as value-safe types. This is a major change, since wrapper objects would be unsynchronizable and their references interchangeable. It is likely that code which violates value-safety for wrapper types exists but is uncommon. It is less plausible to retro-fit String, since the prominent operation String.intern is often used with value-unsafe code.

We should also reconsider the distinction between boxed and unboxed values in code. The design presented above obscures that distinction. As another thought experiment, we could imagine making a first class distinction in the type system between boxed and unboxed representations. Since only primitive types are named with a lower-case initial letter, we could define that the capitalized version of a value type name always refers to the boxed representation, while the initial lower-case variant always refers to boxed. For example:

complex pi = complex.valueOf(Math.PI, 0);
Complex boxPi = pi;  // convert to boxed
myList.add(boxPi);
complex z = myList.get(0);  // unbox

Such a convention could perhaps absorb the current difference between int and Integer, double and Double. It might also allow the programmer to express a helpful distinction among array types.

As said above, array types are crucial to bulk data interfaces, but are limited in the JVM. Extending arrays beyond the present limitations is worth thinking about; for example, the Maxine JVM implementation has a hybrid object/array type. Something like this which can also accommodate value type components seems worthwhile. On the other hand, does it make sense for value types to contain short arrays? And why should random-access arrays be the end of our design process, when bulk data is often sequentially accessed, and it might make sense to have heterogeneous streams of data as the natural “jumbo” data structure. These considerations must wait for another day and another note.

More Work

It seems to me that a good sequence for introducing such value types would be as follows:

  1. Add the value-safety restrictions to an experimental version of javac.
  2. Code some sample applications with value types, including Complex and DecimalValue.
  3. Create an experimental JVM which internally unboxes value types but does not require new bytecodes to do so. Ensure the feasibility of the performance model for the sample applications.
  4. Add tuple-like bytecodes (with or without generic type reification) to a major revision of the JVM, and teach the Java compiler to switch in the new bytecodes without code changes.

A staggered roll-out like this would decouple language changes from bytecode changes, which is always a convenient thing.

A similar investigation should be applied (concurrently) to array types. In this case, it seems to me that the starting point is in the JVM:

  1. Add an experimental unboxing array data structure to a production JVM, perhaps along the lines of Maxine hybrids. No bytecode or language support is required at first; everything can be done with encapsulated unsafe operations and/or method handles.
  2. Create an experimental JVM which internally unboxes value types but does not require new bytecodes to do so. Ensure the feasibility of the performance model for the sample applications.
  3. Add tuple-like bytecodes (with or without generic type reification) to a major revision of the JVM, and teach the Java compiler to switch in the new bytecodes without code changes.

That’s enough musing me for now. Back to work!

[Added:] Actually, here’s a little more context for this note, with some acknowledgements: The need for immutable objects has been known to Java designers from the beginning, as can be seen from the design of the String class, and the early introduction of blank finals (in which I participated, back in the day). The numeric community (remember Java Grande?) has kept Java’s designers aware from Day One of the need for tuple-like or struct-like types, starting with complex numbers, and for the need for more flexible control of array layout. It has been a perennial puzzle, to us Java folk, how to reconcile the highly reference-like nature of Java values with the use cases for value types.

This note presents the results of my own puzzling on this topic for (yes) 15 years, along with some proposals for finishing the puzzle. This work on this has been informed by the thoughts of the original authors of Java, as well as very old language designs (think APL in the ’70s and function programming work in the ’80s), as well as current language designs (think C# and Scala). This note is also partial fulfillment of a promise to many colleagues over the years to propose a suitable solution. As a compiler (JIT) geek, I have seen many hopefully “automagic” scalarization or unboxing optimizations strain and fail to achieve their promise. I have concluded that user advice is necessary. (This is also, I think, the tacit conclusion of decades of research into numeric loop auto-parallelization, but I’m not an expert in that.)

This topic, viewed historically, is so huge that I have not even tried to adequately link all the relevant web pages. It is easy to Google them up by the hundreds. However, I would like to acknowledge my discussions with the JavaFX graphics library designers, including Per Bothner, who detailed to me their frustrations in programming in Java to modern GPUs—a series of brick wall encounters which are a telling repeat of the frustrations of the numerics community. I also appreciate the eloquent plea for JVM value types by Ben Hutchison, almost four years ago.

So, who wants to hack up an explicitly unboxing javac and/or JVM? I offer the mlvm repository as a home for the work.

[Added 2/05/2014:] Martin Buchholz recently dug up an set of design notes by James Gosling from the aforesaid 15 years ago. It is no accident that they have a resemblance to the current note.

Watch the OpenJDK project, including the JEP page for more information on value objects and other initiatives.

[Added 2/07/2014:] Sam Umbach pointed out some inconsistencies in the use of the private access modifier; thanks Sam! Some of them came from the fact that I couldn’t decide (while writing the post) whether value type fields should always be public. On the “pro” side, the value would be structurally transparent, a mainly esthetic advantage. On the “con” side, the value would be insecure. Result: Values should be allowed to have private fields, and the amended post reflects this everywhere.

I also fixed some broken links, due to my misuse of Markdown. (And Markdown and Smartypants are lovely things.)

Comments:

Do you know where I can study the JVM memory model?

Posted by Leonardo on March 21, 2012 at 09:17 PM PDT #

Sounds interesting. Isn't at least part of it what JSR 308 also works on?
The Complex and Decimal types would be extremely beneficial to other JSRs like 354, which are unhappy with what current numeric types, especially BigDecimal have to offer.

Posted by Werner Keil on March 22, 2012 at 06:18 AM PDT #

This is great stuff. Keep doing it!

I assume enums would be value-safe types by default? Don't enums provide a good example of what can be created implicitly based on definition?

As a Clojure developer, I can see many places where value types would be useful (almost everywhere in fact :). I can imagine the implementation of Clojure would significantly benefit from the existence of value types. I see some interesting comparisons between the unboxed values and the work occurring in the Clojure 1.4 tagged reader literals as well.

Posted by Alex Miller on March 22, 2012 at 06:59 AM PDT #

Hello,

Very interesting post. I do have some questions:
- Do you have plans to support some kind of value types for Java 8, or is it (for the moment) an experiment?
- What's the main differences between what you propose and value types in C#? It seems to me that Value Types here would be much more powerful than their C# equivalent. Am I right? (To be honest, I don't like C# very much)

Posted by Hervé on March 22, 2012 at 06:04 PM PDT #

Did you have a look at the stuff Scala is doing at the moment?

Considering that most of the features already exist at the language level in Scala imho it would make sense to compare and verify if the proposed changes to the JVM work out the way you expect/hope.

Something different: Did you consider retrofitting primitives as Value Types from the Java language POV and make it usable that way in Generics?

Posted by Simon on March 23, 2012 at 04:06 AM PDT #

I see the benefit of value types.
But please be careful. Don't turn Java into another Scala with strange implicit magic, immutability inquisitors, functional preachers and the like...
Keep it explicit - keep it simple. I'm all for magic inside the JVM but don't twist the language and the compiler beyond recognition.

Posted by azivarka on March 23, 2012 at 03:25 PM PDT #

Have you seen http://docs.scala-lang.org/sips/pending/value-classes.html ? It's the Scala take on a similar idea; should allow unboxed value types and implicit wrappers without extra allocation.

Posted by Alex Cruise on March 24, 2012 at 02:59 AM PDT #

@Leonardo: Google is your friend.

@Werner: My proposal does not require 308, but is much better with 308. Inference of @NonNull would help address null pollution, and type parameter annotation would probably help also. Actually, it's even better with expression annotations.

@Alex: Enums do not need to be value-safe, since there is a finite set of them. But, yes, enums provide me some "political cover" for pushing the boundaries on implicit definitions in Java. Regarding Clojure, yes: I am formulating this stuff in terms of Java because it is the JVM system programming language, but I expect non-Java languages will get the best use out of it.

@Hervé: This blog does not make schedule announcements. Regarding C#, it's hard to say, especially since I haven't figured out the best way to express unboxing arrays. (Although I've been thinking about that almost as long.)

@Simon, Alex: Synchronicity! Very cool. Scala is an honored member of the JVM family, and I always hope that the Scala folks will sanity-check my JVM proposals. About retrofitting primitives, yes, there is some brief discussion of that in the Daydreams section. It might be possible.

@Azivarka: Don't worry; Java will never be Scala. There is room in the ecosystem for the luna moth and the lightning bug, the dragon fish and the carp. On the other hand, if you thought Java generics were an unwelcome intrusion, you won't like value types either; it's the same play.

Posted by John Rose on March 24, 2012 at 08:27 AM PDT #

@John,

Great suggestion, I would encourage you to put this in the language. At various times I have experimented with work similar to this and maybe my experience might be of some use.

I wrote an extended Java compiler:

http://dl.acm.org/citation.cfm?id=1082285

that had Value types which toStringed, compared, and hashed on field values, but could be mutable or immutable and Immutable types which inherited from Value types which are very similar to your @ValueSafe types and have most of the same restrictions for the same reasons (the primary difference is that I built my Immutable type via inheritance from my Value type rather than in one hit). My compiler doesn’t currently unbox the types, but see comments below. The compiler does however enforce the restrictions placed on the Value and Immutable types.

I provided an ImmutableArrayList for holding my Immutable types and this solves the typing problem of putting Immutables into Lists and arrays that you mention (like Immutable, ImmutableArrayList is a ValueArrayList with extra restrictions). I also provide a standard conversion mechanism between mutable and immutable types to prevent excessive object creation.

My design notes
-------------
I quite like the building up of the types in stages via inheritance, like my Immutable and would recommend this approach. I no longer maintain the compiler but my next piece of planned work was NeverNull and Primitive types. The NeverNull type, as the name suggests, means an instance of a NeverNull type is guaranteed to be non-null. Interface Primitive extends Immutable and NeverNull and would allow the compiler to unbox instances, as you describe, and arrays of instances into multiple arrays, one for each field (you mentioned this implementation option in passing).

I took a deliberate design philosophy of only allowing the types to be annotated with the restrictions. The reason for this was a bad experience with const in C++, which can be applied to fields etc.

Say in C++ you have an Integer class with a max method, what is the definition of max? There can be only one sensible definition right? A max method that would accept consts and return a const, i.e. Integer const max (Integer const other) const {…}. But that would fail for: Integer nci1, nci2; …; nci1 = nci1.max(nci2), yet that is perfectly reasonable. A partial solution is that the class writer has to provide const and non-const versions of everything, i.e. a second max method Integer max (Integer other) {…}. That is a real pain and worse; how do you compare a const with a non-const, what is the return type?

Therefore I suggest limiting the Immutable annotation, your @ValueSafe annotation, to types only. The problem goes away then, since it makes the class designer king. This is the same reason why traits work better than extension methods; the class designer is king.
Inheriting annotations
Unfortunately you can’t inherit an annotation on an interface (@Inherited works only for classes); this is a real pain! I have mentioned this in a few places including via email to Joe Darcy, but no luck in getting this changed. Unfortunately that means ValueType as an interface won’t work and it will need to be an abstract class.

Alternatively you could get @InheritedOrImplemented added to Java, which would work for either interfaces or classes (I gather from Joe Darcy that this requires a class file format change).

Tuple types
----------
I suggested a tuple library:

http://code.google.com/p/google-collections/issues/detail?id=43

two features of this library are that Tuples are Lists (no need for an asList method) and T1 extends T0, T2 extends T1, etc., this is helpful when processing tuples recursively (T0 acts as an empty option type). The syntax for declaring a tuple is T1<E1>, T2<E1, E2> etc. The syntax for making a tuple is t(“A”), t(“A”, 1), etc. IE t is an overloaded factory method imported via a static import. This syntax works very well in practice, which indicates that a library rather than a language change is all that is required for adding tuples to Java.

The simplest way to write an Immutable type using my compiler is to extend a tuple, e.g. class Complex extends T2<Double, Double>, but unfortunately generics are erased and therefore you need to hand code for performance. However if tuples were made a special case of a reified generic class then this problem would go away!

Good luck in getting Immutable, tuples, etc. into Java,

-- Howard.

Posted by Howard Lovatt on March 25, 2012 at 05:57 PM PDT #

Great posting, the need for "value-like" things cannot be denied, despite the "everything is an object approach" of most OO languages.

According to your suggestion, primitive types and non-value types are compared with ==, valueSafe types in the contrary must be compared using equals (the use of == will be marked as an error for them). Wouldn't it be better then to redirect the symbol == to the equals-method for valueSafe types (similar to what Scala does? especially if the VM generates the equals-method anyway)? The client programmer then only has to know one symbol (== in that case) for comparing two variables, no more need for the error-prone distinction between two comparisons.

You said, "... the class is final, because it does not make sense to create subclasses of value types... ". I believe that's right, however that statement does not seem to be self-evident. Can you give reasons why a value type should not be subclassed?

Posted by Tim on March 26, 2012 at 10:32 PM PDT #

Sounds great.

I had Complex classes way back in SDJ(at SF.net) ever since the late 90s, but it should make it far easier to properly deal with sensor values and measurements.

Something like:
@ValueSafe
public class Measurement<Q extends Quantity<Q>> implements java.io.Serializable, Quantity<Q> {
// immutable component structure:
public final double v, Unit<Q> u;
private Complex(double v, Unit<Q> u) {
this.v = v; this.u = u;
}
[...]

Whatever version of Java this targets (9 or 10 I assume) should also finally have proper Sensor support, previously promised for Java 8, but the latest slides no longer mention that for good reasons.
There's work to do that other languages, like F# sure are going to show you at the MS-hosted language conference this week...

Posted by Werner Keil on April 04, 2012 at 01:59 AM PDT #

While it would be nice to have value types in Java, your proposal is a design smell for me.

Personally I never liked the @Override annotation instead of introducing an override keyword.

I understand that you did not want to break existing code, but with this "keywords as annotations" that Java has been suffering since Java 5, we are slowly getting into annotation hell.

What type of language will Java become, when half of the code is just annotations?

Posted by Paulo Pinto on April 22, 2012 at 03:43 AM PDT #

An interesting article.

I just read actually only the first part (until 'The Payoffs') and thought about the distinction between 'value-safe' and 'value-unsafe' operations.

Could it be a solution if the suggested interface 'ValueType' enforces special actions by the compiler and/or vm, like 'Serializable'? I mean that only one instance per 'value' may exist within the vm. With that, comparison by == is no problem. This will slow down factoring a boxed value but the compiler can optimize so that wrapped values only used when needed (and not for simply transfer values).

Another enforcement could be a default value instead of 'null'. All primitive types have a defined default value. If a ValueTyp must not have fields other than of primitive types or any other ValueTyp you have also a well defined default value for each. The use of 'null' can then be prohibit or be a synonym to that default value.

Posted by Andrä on May 11, 2012 at 02:34 AM PDT #

I understand how tagging with an interface implemented or an annotation will be useful for development. But in a released future JVM, I hope for a context-specific keyword like: public tuple Complex {...}

Java has already class, interface, enumeration ("enum"), annotation ("@interface"), then I would prefer addition of "tuple" keyword specifically to this context (scope of type definition).

I think, there is no risk of confusion with an existing variable, method, type name of current code at this position in code. If context-specific parser is too much problematic, I would prefer "@class" as keyword, like annotation has "@interface":
public @class Complex {...}

Posted by Daniel Latrémolière on August 08, 2012 at 05:32 AM PDT #

Why not extend the byte code to support multiple return types? This type of extension could then be taken advantage of by various language byte code compilers. From a byte code perspective supporting a maximum return size of 64bits is the biggest obstacles to support efficient light weight structures in languages written for the JVM.

If the byte code were extended then language implementors would be able to explicitly and deterministically support efficient light-weight structures while exposing this capability to developers using programming idioms or language structures that are best suited for that language.

I think such a generic and powerful byte code extension could come first, followed by language specific extensions to support concepts such as value types, tuples, complex type etc...

I would be great to see something similar your vreturn (tuples_in_the_vm 2007) be the next JVM byte code extension.

Posted by guest on October 25, 2012 at 05:06 PM PDT #

We really need these features.

Would you consider naming the new operator 'freeze' instead of lockPermanently?

Cheers,

Peter.

Posted by Peter on April 14, 2013 at 09:34 PM PDT #

Great post! I'm glad to have stumbled upon it from your post on struct tearing.

I found two passages confusing, but I'm not sure whether they are errors or lack of understanding on my part:

"The non-static fields of a value class must be non-public and final": Do you mean "non-private" instead? That would be consistent with your earlier statement that value types be structurally transparent.

"protected private final BigInteger digits": Is this introducing a new "protected private" access modifier? Or is this intended to be "protected"?

Cheers!
-Sam

Posted by Sam Umbach on March 07, 2014 at 11:44 AM PST #

Post a Comment:
  • HTML Syntax: NOT allowed
About

John R. Rose

Java maven, HotSpot developer, Mac user, Scheme refugee.

Once Sun and present Oracle engineer.

Search

Categories
Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today