value types in the vm, infant edition

A number of folks have been working hard for months on a credible proposal for value types in the VM. I am happy to announce that we have made our first public posting, with many concrete recommendations. Enjoy: State of the Values, April 2014: Infant Edition. And remember: "Codes like a class, works like an int!"
Comments:

This is a very exciting development. Is there any sense when this feature will be released? I have wanted this for years. Java is such a great language, and would be so much more amazing for many math-intensive algorithms with this feature. In particularly performance-critical code, I've been forced to "unwrap" data that should be in "structs" into multiple arrays of primitive types to get around the lack of this feature.
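A minimal sketch of the "unwrapping" workaround described above, assuming a hypothetical 2D point type; the class and method names are made up for illustration. Instead of an array of Point objects, each field gets its own primitive array:

```java
// Hypothetical example: a list of 2D points stored as two parallel
// primitive arrays instead of an array of Point objects.
final class PointColumns {
    private final double[] xs;
    private final double[] ys;

    PointColumns(int size) {
        xs = new double[size];
        ys = new double[size];
    }

    void set(int i, double x, double y) {
        xs[i] = x;
        ys[i] = y;
    }

    double x(int i) { return xs[i]; }
    double y(int i) { return ys[i]; }

    // Sum of distances from the origin; touches only flat primitive arrays.
    double totalNorm() {
        double sum = 0.0;
        for (int i = 0; i < xs.length; i++) {
            sum += Math.sqrt(xs[i] * xs[i] + ys[i] * ys[i]);
        }
        return sum;
    }
}
```

The cost is that the "struct" no longer exists as a type: callers pass indices around, and every new field means another array.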

Posted by guest on May 02, 2014 at 05:42 PM PDT #

I get the feeling you might be looking at the problem upside-down.

If, as you say, the main obstacle to pass-by-value for complex types is the possibility that such an object might be used as a synchronization monitor, then it sounds like 90% of what you're after could be achieved with vastly less complexity by allowing a class to be flagged as "nomonitor" (which would imply that the class is final and so are its fields, each of which must be a primitive or another nomonitor type?). At that point, all the optimizations you could do on immutable data structures are available to the VM. That approach would provide most of the value of what you're proposing with the fewest changes to the language.

In that scenario, there is the issue that now instances of interfaces are not guaranteed to be java.lang.Object in the sense of having wait() and the ability to be synchronized on (but I think you have that problem anyway). Either you solve that by making it a runtime error to synchronize on such an object (not ideal but doesn't break existing code), or you require a signature like

public <T extends IFace & Object> void syncsOnElements(Set<T> objs)

so that in order to use synchronization primitives the compiler has to see a type that will definitely have them. You could use your autoboxing proposal for that, but of course two chunks of code that box the same object will not necessarily wind up using the same monitor, so I don't see that as being of help.
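The boxing concern can be illustrated in today's Java. This is only a sketch: BoxedPoint and its box method are hypothetical stand-ins for whatever boxing a value type would do, not anything from the proposal.

```java
final class MonitorDemo {
    public static void main(String[] args) {
        BoxedPoint a = BoxedPoint.box(1, 2);
        BoxedPoint b = BoxedPoint.box(1, 2);

        // Same contents, different identities, therefore different monitors.
        System.out.println(a.x == b.x && a.y == b.y); // true
        System.out.println(a == b);                   // false

        synchronized (a) {
            // Holding a's monitor says nothing about b's monitor.
            System.out.println(Thread.holdsLock(a));  // true
            System.out.println(Thread.holdsLock(b));  // false
        }
    }
}

// Hypothetical stand-in for a boxed value.
final class BoxedPoint {
    final int x;
    final int y;

    BoxedPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Each call allocates a fresh heap object, the way boxing a value might.
    static BoxedPoint box(int x, int y) {
        return new BoxedPoint(x, y);
    }
}
```

Two threads that each box the "same" value and synchronize on the result would therefore not exclude each other.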

In general, this seems like a situation where less is more. In terms of wanting value types, something vastly simpler would do the job nicely: the ability to implement interfaces on a value type is nice-to-have but not critical; the ability to implement equals and hashCode is possibly dangerous (calling equals() on a value object is asking the question "are the contents of this memory region identical to that one?", which is surely not helped by the ability to futz with that computation).

That's if the goal is to provide "value types" which behave in some way that's detectable to the developer coding against them. If the goal is simply to enable the kinds of optimization the VM could do against immutable data, probably the "nomonitor" approach does the trick for that.

I suspect that the "nomonitor" approach, which leaves the way such objects are managed under the hood up to the VM, has other advantages: the VM can decide at runtime whether pass-by-reference or pass-by-value is appropriate, based on the size of the data structure and on heuristics from previous calls to the same code.

Posted by Tim Boudreau on May 03, 2014 at 03:23 PM PDT #

What is the explosion of complexity with non-final fields? If all fields are final, why can't it be passed by reference to avoid the copying overhead of pass by value? What is the difference between this proposal and C# structs?

Posted by Clay on May 05, 2014 at 07:39 AM PDT #

In addition to graphics programming which has been mentioned, there are a few problem spaces that I've encountered where something along the lines of ValueTypes could be of great significance (but the “Devil is in the Details”):

1) Succinct data structures: no, the thing I'm modelling is not a "long", nor is it two's complement, but I have to call it a "long" because it is most efficient to use 64-bit "chunks" of memory for storing binary data (see EnumSet, and also tries, DAWGs, MSA-FSA, et al.).

2) Modelling small finite data types that do not naturally fit within 8-, 16-, 32-, or 64-bit boundaries. The nucleotides of DNA are {A,C,G,T}; ideally I would model them as 2-bit binary values (A=00, C=01, T=10, G=11), so that I could make contiguous DNA strands (lists) and operate on them using binary operators (>> << ^ ~) for DNA sequencing. A long could store a 32-nucleotide strand.
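A rough sketch of that 2-bit packing in present-day Java, using the A=00, C=01, T=10, G=11 encoding given above. The class name, method names, and the choice to put the first base in the low bits are assumptions made for illustration:

```java
final class DnaStrand {
    // Index in this string is the 2-bit code: A=00, C=01, T=10, G=11.
    private static final String ALPHABET = "ACTG";

    // Packs up to 32 nucleotides into one long, first base in the low bits.
    static long pack(String bases) {
        if (bases.length() > 32) {
            throw new IllegalArgumentException("at most 32 bases fit in a long");
        }
        long packed = 0L;
        for (int i = 0; i < bases.length(); i++) {
            int code = ALPHABET.indexOf(bases.charAt(i));
            if (code < 0) {
                throw new IllegalArgumentException("not a nucleotide: " + bases.charAt(i));
            }
            packed |= ((long) code) << (2 * i);
        }
        return packed;
    }

    // Reads the base at position i; >>> is the unsigned shift, so the sign bit is harmless.
    static char baseAt(long packed, int i) {
        return ALPHABET.charAt((int) ((packed >>> (2 * i)) & 0b11));
    }

    static String unpack(long packed, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(baseAt(packed, i));
        }
        return sb.toString();
    }

    // With this encoding, XOR with binary 10 swaps A<->T and C<->G,
    // so a whole strand is complemented with one XOR and one mask.
    static long complement(long packed, int length) {
        long mask = (length == 32) ? -1L : (1L << (2 * length)) - 1;
        return (packed ^ 0xAAAAAAAAAAAAAAAAL) & mask;
    }
}
```

For example, DnaStrand.unpack(DnaStrand.complement(DnaStrand.pack("GATTACA"), 7), 7) yields "CTAATGT". The strand length has to be carried separately, which is exactly the kind of bookkeeping a real value type could encapsulate.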

...I have a few others, (i.e. Cache Oblivious Data Structures /OffHeap/Unsafe and Language interoperability use cases which others have discussed) but (stepping back for a moment) I ask myself (philosophically):

Is the fear of "breaking data encapsulation" in Java holding us back? That is, are we trying too hard to fit every problem into aggregates of 8-, 16-, 32-, and 64-bit two's complement primitives, and as a result resorting to hacks for performance?

Seems to me Java (along with many other languages) does great at modelling numeric problem spaces (due to the historical value of computing trajectories, a la ENIAC). However, IMHO Java leans too heavily on assuming we want to model everything as being processed in scalar fashion within a numerical context (bytes are signed? WHAT?).

Ideally, programmers like myself would love the "option" to model things at a binary level (just gimme chunks of unsigned bits, i.e. an unsigned long and maybe a good BitVector, and I can take it from there). Modelling all problems as the interactions between primitives or aggregations of signed primitives (and good old unsigned char) makes problems involving large non-numerical data or structures/tuples more "interesting"...

I'm advocating more power through "low level" binary means (gimme some unsigned types... Josh Bloch seems to think this is one of Java's big "mistakes"). I can't speak to the complexity and efficiency of the interaction between unsigned types or ValueTypes and language features like generics and autoboxing, but it seems to me this is a big opportunity if done right.
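For context, Java 8 does not add unsigned types, but it does add static helper methods on the wrapper classes that reinterpret the existing signed primitives as unsigned. A small sketch of what that looks like (the class name UnsignedDemo is just for illustration):

```java
final class UnsignedDemo {
    public static void main(String[] args) {
        byte b = (byte) 0xFF;
        System.out.println(b);                      // -1, because byte is signed
        System.out.println(Byte.toUnsignedInt(b));  // 255

        long allOnes = -1L; // all 64 bits set
        System.out.println(Long.toUnsignedString(allOnes));        // 18446744073709551615
        System.out.println(Long.compareUnsigned(allOnes, 1L) > 0); // true
        System.out.println(allOnes >>> 60);                        // 15, unsigned shift
    }
}
```

This covers arithmetic and printing, but the type system still cannot distinguish a "bag of 64 bits" from a signed number, which is the gap the comment is pointing at.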

Cheers,
Eric

Posted by M. Eric DeFazio on May 14, 2014 at 07:52 AM PDT #

I have a couple of proposals that I think would be interesting for how to use value types. There are two: how to define them, and how to use them. The two proposals are separate. For now I will write up the first.

1. Defining value types
A. What about requiring a primary constructor to indicate that a class is a value type? There is something like this in C# 6, but there it only makes a primary constructor easier to define. In Java we could use it differently and say that a class declared this way is a value type. It would still be possible to have more than one explicit constructor, but I don't think we should force developers to write a constructor when it isn't necessary. Why? According to the document, value-type constructors are really factory methods, so there is no new;dup;init dance in the bytecodes.

So I think that to define a value type we would only need to write:

    final class Point(int x, int y) {
        boolean equals(Point p) {
            return this.x == p.x && this.y == p.y;
        }
    }
x and y are automatically final and public members, so this could be compiled exactly like:

    final __ByValue class Point {
        public final int x;
        public final int y;

        public Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        public boolean equals(Point that) {
            return this.x == that.x && this.y == that.y;
        }
    }
Simple, concise and contributes to productivity.
We can also do:

    final class Point(int x, int y) {
        private int c; // if I want

        boolean equals(Point p) {
            return this.x == p.x && this.y == p.y;
        }

        public static Point getFunPoint() {
        }

        public static void maFunction(Point p, boolean b, double d) {
        }
    }

I also thought about something new:

B. Why not an implicit default equals? If I write:

    final class Point(int x, int y) {}

and the JVM doesn't find an explicit equals, it can do a field-by-field comparison, as if I had written:

    public boolean equals(Point that) {
        return this.x == that.x && this.y == that.y;
    }

So a Point can be defined in one line:

final class Point(int x, int y){}

C. If we don't want to define anything else in a value type, the braces can be optional, as in lambda expressions :)
So to define a value type we would only need to write:

final class Point(int x, int y);
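For comparison, here is roughly what the one-line Point has to look like in Java as it stands today. This is a sketch for contrast, not part of the proposal. One deliberate difference: equals takes Object, because an equals(Point) method overloads rather than overrides Object.equals, and collections such as HashSet only call the Object version. The hashCode formula is an arbitrary choice for illustration.

```java
final class Point {
    public final int x;
    public final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Field-by-field comparison, the behaviour the proposal would make implicit.
    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Point)) {
            return false;
        }
        Point that = (Point) other;
        return this.x == that.x && this.y == that.y;
    }

    // Must agree with equals: equal points produce equal hash codes.
    @Override
    public int hashCode() {
        return 31 * x + y;
    }
}
```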

Posted by Bilal Soidik on May 14, 2014 at 10:04 AM PDT #

I have a different proposal that solves similar problems, but (I think) has some clear advantages over “Value types”:

- does not require such big changes to JVM (e.g. no new opcodes)
- works better with current java coding practises (supports both mutable and immutable types).
- in some cases improves performance of existing java programs without need to change anything in them.

Proposal:

1) Add one new special class to java:

    public class ValuesArray<E> implements List<E> {
        native E get(int index);
        native E set(int index, E element);
        native int size();
        static native <E> ValuesArray<E> create(E seed, int size);
        ….
    }

This will behave similarly to ArrayList with fixed size, but instead of storing references to objects, it will store values of its fields. So it is something like array of struct instead of array of references. Methods:

create(E seed, int size) - will create new ValuesArray of type E, specified size and filled with values from seed object. It also stores type E (=seed.getClass()).
set(index, element) - instead of storing reference to object, it will store values of its fields.
get(index) - will read values and box them to new object of type E.

Of course there will be some limitations on which classes can be used as E. It must be a final class, and it must support construction of boxed values:
- for mutable objects: all fields of E are public and non-final, and it has a public no-param constructor.
- for immutable objects: it should have a constructor that takes the same parameters as the object's fields and assigns those parameters to them.
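The intended semantics can be approximated by hand in plain Java for a single element type. This is only a sketch under that assumption: MutablePoint and MutablePointArray are hypothetical names, the storage is two parallel int arrays rather than anything native, and the List interface is left out for brevity.

```java
import java.util.Arrays;

// Hypothetical element type following the "mutable objects" rule above:
// final class, public non-final fields, public no-param constructor.
final class MutablePoint {
    public int x;
    public int y;

    public MutablePoint() {}
}

// Hand-written emulation of the proposed ValuesArray for one element type.
final class MutablePointArray {
    private final int[] xs;
    private final int[] ys;

    private MutablePointArray(int size) {
        xs = new int[size];
        ys = new int[size];
    }

    // Creates an array of the given size filled with the seed's field values.
    static MutablePointArray create(MutablePoint seed, int size) {
        MutablePointArray array = new MutablePointArray(size);
        Arrays.fill(array.xs, seed.x);
        Arrays.fill(array.ys, seed.y);
        return array;
    }

    int size() {
        return xs.length;
    }

    // Copies the element's field values in; the reference itself is not kept.
    // Returns the previous element, as List.set does.
    MutablePoint set(int index, MutablePoint element) {
        MutablePoint previous = get(index);
        xs[index] = element.x;
        ys[index] = element.y;
        return previous;
    }

    // Boxes the stored values into a new object on every call.
    MutablePoint get(int index) {
        MutablePoint boxed = new MutablePoint();
        boxed.x = xs[index];
        boxed.y = ys[index];
        return boxed;
    }
}
```

One visible consequence: mutating an object after passing it to set does not change the array, and get(i) == get(i) is false, so code that relies on element identity would behave differently than with ArrayList.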

2) Improve escape analysis in the JVM:
- It should be able to detect that a field of a class never escapes and “inline” it. (E.g. a Line object with 2 Point fields p1 and p2 can be stored as one object on the heap, if the JVM can determine that p1 and p2 never escape.)
- The JVM should be able to use escape analysis results across method calls as well. Current escape analysis can allocate an object on the stack only if all uses of that object are within inlined code. This should be improved so that “on stack” and “inlined field” objects can be passed to other methods without the need for unboxing. Not always, but at least in the most common/simple cases (e.g. when it is a “no escape” parameter).

3) Add some support for escape analysis in the language. E.g.:
- A “@NoEscape” annotation for class fields. For the JVM this will be just a hint, but for the developer it should cause compile-time errors/warnings when he tries to use the field in a way that breaks the “escape analysis” rules.
- A “@ValueType” annotation for classes. It makes sure that the class is suitable for ValuesArray.
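A sketch of how the Line example might look with such annotations. The two annotations are declared locally only so that the code compiles; no existing compiler or JVM gives them any meaning, and the retention and target settings are guesses.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotations from the proposal, declared here as plain markers.
@Retention(RetentionPolicy.CLASS)
@Target(ElementType.FIELD)
@interface NoEscape {}

@Retention(RetentionPolicy.CLASS)
@Target(ElementType.TYPE)
@interface ValueType {}

@ValueType
final class Pt {
    final double x;
    final double y;

    Pt(double x, double y) {
        this.x = x;
        this.y = y;
    }
}

final class Line {
    // Both points are created inside Line and never handed out, so in
    // principle they could be laid out inline in the Line object.
    @NoEscape private final Pt p1;
    @NoEscape private final Pt p2;

    Line(double x1, double y1, double x2, double y2) {
        this.p1 = new Pt(x1, y1);
        this.p2 = new Pt(x2, y2);
    }

    // Only reads fields of p1 and p2; neither reference leaves this class.
    double length() {
        double dx = p2.x - p1.x;
        double dy = p2.y - p1.y;
        return Math.sqrt(dx * dx + dy * dy);
    }
}
```

Adding a getter such as `Pt start() { return p1; }` would let p1 escape, which is the kind of use the proposed compile-time check would presumably flag.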

Posted by Palo Marton on June 17, 2014 at 05:15 AM PDT #

Great to see this moving along. Struct-like enhancements to Java like this are great.

My vote for __MakeValue is the null string.

I'd love it to be: Point p1 = (1, 34);

it'd fit well with assignment to return values as:

(x, y) = p1;
or
(x, y) = getRandomPoint();

Posted by guest on June 18, 2014 at 09:41 AM PDT #

About

John R. Rose

Java maven, HotSpot developer, Mac user, Scheme refugee.

Once Sun and present Oracle engineer.
