    April 2, 2012

Monday at Microsoft Lang.NEXT

John Rose
We are having a blast at Microsoft Lang.NEXT.

For the record, I posted my talk about Java 8.

[Update 4/06] The videos are already coming out from Channel 9, including my talk on Java 8 (mostly about closures) and an update from Jeroen Frijters on hosting the JVM on .NET.

I recommend Martin Odersky’s keynote about new facilities for expression reflection in Scala. As befits a typeful language, the notation for what Lisp-ers think of as “backquote-comma” and F#-ers call code quotation is based on the type system. The amazing part (at least to me) is that there is no modification to the language parser: no special pseudo-operators or new brackets. I guess this is a generalization of C#’s special treatment of anonymous lambda expressions, added for LINQ.

Here is an example based on the talk, showing a macro definition and its use of a typeful expression template:

def assert(cond: Boolean, msg: Any) = macro Asserts.assertImpl

object Asserts {
  def raise(msg: Any) = throw new AssertionError(msg)
  def assertImpl(c: Context)
                (cond: c.Expr[Boolean], msg: c.Expr[Any]): c.Expr[Unit] =
    c.reify( if (!cond.eval) raise(msg.eval) )
}

The last code line is roughly comparable to a string template "if (!$cond) raise($msg)" or a Lisp backquote `(if (not ,cond) (raise ,msg)), but the Scala version is hygienically parsed, scoped, and typed. Note also the crucial use of path-dependent types (c.Expr) to allow compilers the freedom to switch up their representations.

Speaking of C#, Mads Torgersen talked about new features in C# for asynchronous programming: the async and await keywords, which go with the Task type pattern (which includes the varying notions of promise and future). You use async/await in nested pairs to shift code styles between coroutined (async) and blocking (awaited). The source notation is the same, but coroutined code is compiled using state machines. It is similar to (but internally distinct from) the older generator/yield notation. To me it looks like a fruitful instance of the backquote-comma notational pattern.
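A rough JVM analogue of the await pattern can be sketched with composable futures. This is only a sketch, using Java 8's CompletableFuture (which postdates this conference); the names AsyncDemo and fetchLength are made up for illustration. Composing continuations in source order, while the runtime schedules the work asynchronously, is roughly what the C# compiler's state machine does for awaited code:

```java
import java.util.concurrent.CompletableFuture;

public class AsyncDemo {
    // Hypothetical async operation: returns a Task-like future value
    // instead of blocking the caller.
    static CompletableFuture<Integer> fetchLength(String url) {
        return CompletableFuture.supplyAsync(url::length);
    }

    public static void main(String[] args) throws Exception {
        // Each thenCompose/thenApply step plays the role of an `await`:
        // the code reads sequentially, but runs as chained continuations.
        CompletableFuture<Integer> total =
            fetchLength("http://x")
                .thenCompose(a -> fetchLength("http://yy")
                    .thenApply(b -> a + b));
        System.out.println(total.get());  // prints 17 (8 + 9 characters)
    }
}
```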

Coroutines are a good way to break up small computations across multiple cores, so it is not surprising that they are a hot topic. Robert Griesemer’s talk on Go was a tour de force of iterative interactive development, in which he started with a serialized Mandelbrot image server and added 25% performance (on two cores: 200 ms to 160 ms) by small code changes to partition the request into coroutined tasks. Each task generated a line of the result image, so the adjustment was simple and natural. At a cost to interoperability, a segmented stack design allows tasks to be scheduled cheaply; fresh stacks start at 4 kilobytes. The use of CSP and a built-in channel type appear to guide the programmer around pitfalls associated with concurrent access to data structures. This is good, since Go data structures are low-level and C-like, allowing many potential race conditions.
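The row-per-task partition from the Go demo can be sketched on the JVM with an executor pool standing in for goroutines. This is a minimal sketch, not the actual demo code: the pixel function below is an arbitrary stand-in for the per-pixel Mandelbrot iteration, and the class and method names are invented. The point is the shape of the change: each scan line becomes an independent task.

```java
import java.util.concurrent.*;
import java.util.*;

public class RowPartition {
    // Stand-in for per-pixel work (the real demo computed Mandelbrot iterations).
    static int pixel(int x, int y) { return (x * 31 + y * 17) & 0xFF; }

    public static void main(String[] args) throws Exception {
        int w = 8, h = 8;
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<Future<int[]>> rows = new ArrayList<>();
        // One task per scan line: a simple, natural partition because
        // the rows of the image are independent of one another.
        for (int y = 0; y < h; y++) {
            final int yy = y;
            rows.add(pool.submit(() -> {
                int[] row = new int[w];
                for (int x = 0; x < w; x++) row[x] = pixel(x, yy);
                return row;
            }));
        }
        long sum = 0;  // combine the rows (here, just checksum them)
        for (Future<int[]> f : rows)
            for (int v : f.get()) sum += v;
        pool.shutdown();
        System.out.println(sum);
    }
}
```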

Go includes a structural (non-nominal) interface type system which makes it simple to connect disparate data structures; the trick is that an interface-bearing data structure is accompanied by a locally created vtable. Such fine-grained interfacing reminds me of an oldie-but-goodie, the Russell programming language.
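Java's interfaces are nominal, so the structural trick does not translate directly, but the locally-created-vtable idea can be imitated by hand. In this sketch (hypothetical types Point and Tag, using Java 8 method references), the bound method reference plays the role of the adapter table that the Go compiler builds implicitly whenever a value with the right method set is assigned to an interface:

```java
public class StructuralDemo {
    // A one-method interface: think of it as a one-entry vtable.
    interface Stringer { String string(); }

    // Two unrelated types that happen to have a method of the right shape;
    // neither one declares `implements Stringer`.
    static class Point {
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        String render() { return "(" + x + "," + y + ")"; }
    }
    static class Tag {
        String name;
        Tag(String n) { name = n; }
        String label() { return "#" + name; }
    }

    public static void main(String[] args) {
        // In Go the compiler creates these adapters (itables) implicitly;
        // here we build each "vtable" by hand with a bound method reference.
        Stringer a = new Point(1, 2)::render;
        Stringer b = new Tag("go")::label;
        System.out.println(a.string() + " " + b.string());
    }
}
```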

During Q/A I noted that their language design includes the “DOTIMES bug”, and their demo code required a non-obvious workaround to fix it. This was answered in the usual circular way, to the effect that the language spec implies the broken behavior, so users just have to live with it. (IMO, it is like a small land mine in the living room.) Happily for other programmers, C# is fixing the problem, and Java never had the problem, because of the final variable capture rule. Really, language designers, what is so hard about defining that each loop iteration gets a distinct binding of the loop variable? Or at least emitting a diagnostic when a loop variable gets captured?
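To make the Java side of that claim concrete: each iteration of an enhanced for loop introduces a fresh, effectively final binding, so every closure captures its own value. A small sketch (using the Java 8 lambdas discussed in my talk; the names are mine):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

public class CaptureDemo {
    public static void main(String[] args) {
        List<IntSupplier> thunks = new ArrayList<>();
        // The loop variable is a distinct, effectively final binding
        // on each iteration, so each lambda captures a different value.
        for (int i : new int[]{0, 1, 2}) {
            thunks.add(() -> i);
        }
        for (IntSupplier t : thunks) System.out.print(t.getAsInt());
        System.out.println();  // prints 012, not 222
    }
}
```

A language with the DOTIMES bug prints the final value three times, because all the closures share one mutable loop variable.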

(By the way, the Java community is interested in coroutines also, and Lukas Stadler has built a prototype for us to experiment with. It seems to me that there is a sweet spot in there somewhere, with user-visible async evaluation modes and JVM-mediated transforms, that can get us to a similar place. As a bonus, I would hope that the evaluation modes would also scale down to generators and up to heterogeneous processing arrays; is that too much to ask? Perhaps a Scala-like DSL facility is the right Archimedean lever to solve these problems.)

Walter Bright and Andrei Alexandrescu presented cool features of the D language. A key problem in managing basic blocks is reliably pairing setup and cleanup actions, such as opening a file and closing it. C++ solves this with stack objects equipped with destructors, a pattern which is now called RAII (resource acquisition is initialization). As of version 7, Java finally (as it were) has a similar mechanism, although it must be requested via a special syntax. Similarly, D has a special syntax (scope(exit)) for declaring cleanup statements inline, immediately next to the associated setups. This is intriguingly similar to Go's defer keyword, except that Go defers dynamically pile up on the enclosing call frame, while the D construct is strictly lexical. Also, D has two extra flavors of cleanup, which apply only to abnormal or normal exits. The great benefit of such constructs is being able to bundle setups and cleanups adjacently, without many additional layers of block nesting. D also ensures pointer safety using 2-word fat pointers. (This reminded me of Sam Kendall’s early work with Bounds Check C, which used 3-word fat pointers. D is a good venue for such ideas.)
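For concreteness, the Java 7 form is try-with-resources. In this small sketch, Resource is a made-up AutoCloseable; its close() is the cleanup half of the pair, declared right next to the setup, and it runs on both normal and abnormal exit (so it corresponds to D's plain scope(exit), not the success- or failure-only flavors):

```java
public class CleanupDemo {
    // A resource whose close() is the paired cleanup action.
    static class Resource implements AutoCloseable {
        final String name;
        Resource(String name) {
            this.name = name;
            System.out.println("open " + name);
        }
        @Override public void close() {
            System.out.println("close " + name);
        }
    }

    public static void main(String[] args) {
        // Setup and cleanup are bundled in the header; close() is called
        // automatically when the block exits, normally or by exception.
        try (Resource r = new Resource("f")) {
            System.out.println("use " + r.name);
        }
    }
}
```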

In keeping with the keynote theme of quasi-reflective computation at compile time, D has a macro facility introduced with the (confusing) keyword mixin, and called “CTFE” (compile-time function execution). Essentially, D code is executed by the compiler to extrude more D code (as flat text) which the compiler then ingests. The coolest part of all this is the pure attribute in the D type system, which more or less reliably marks functions which are safe to execute at compile time. There is also an immutable attribute for marking data which is safe for global and compile-time consumption.

Here are some other tidbits gleaned from my notes:

  • As data continues to scale up, Von Neumann machines and their data structures—arrays with peek and poke operations—are straining to keep up. The buzzword this week for well-behaved Big Data is monotonic. I am thinking that Java’s long awaited next array version should not look much like an array at all. (Rich Hickey, you are ahead of the pack here with Clojure.) Jeroen Frijters pointed out to me that embedding value type structs in arrays exposes them to A/B/A bugs. Glad I held off on that blog post...
  • As the Von Neumann Memory continues to morph into the Non-Von Network, systems need to decouple the two ends of every query operation, the request from the response, even in the case of single memory loads. This is why asynchronous computations, coroutines, channels, etc. are important. The design problem is to give programmers sequential notations without requiring the hardware to serialize in the same order. Yes, this has to add “magic” to the language; either we do that or (as Mads pointed out) set programmers the impossible task of expressing all desequentializations as explicit callbacks or continuations. The question is which magic will be less surprising and more transparent. Also: which magic will scale into the future, as new programmers are trained on less rigidly sequential styles. (Hint: The future does not contain the statement x = x+1.)
  • On the other end of the programming scale are the transparently partitionable bulk operations, such as those supplied by databases and mathematical systems (from APL to Julia). These keep the elements of the computation invisible so they can be efficiently scheduled. Even beyond that are the Big Ideas of system modeling and logic programming, which were discussed in several talks.
  • Still, programming tasks will apparently always use collections, at least in the sense of increasingly aggregated data values. This is why we are not done exploring the design space occupied by Java collections, LINQ, etc. It seems to me that much of the brain-power traditionally devoted to language design ought to be pointed at aggregate computation design, whether or not that requires new languages. Martin Odersky’s comment was that designing libraries is just as hard as designing a language. Also, John Cook notes that, from the point of view of the user (the non-programmer domain specialist), R has a language, rather than R is a language. What R also has is really useful arrays and all the specialist tools in easy reach.
  • Kunle Olukotun’s talk showed how to design aggressively for parallelism without inventing a whole new language. The key is DSLs, domain specific languages, for which Scala provides a fertile testbed. (Perhaps this is a good way, in the future, to put specialist tools in easy reach?)
  • Here is an important design point about languages: Matlab- and R-style numeric arrays are very much alive and well. John Cook grounded our considerations with a presentation of how the R language is used, and the Julia project team was present to talk about their efforts to support numerics.
  • The Julia folks have tackled the hard problem of doing numeric stacks in a modular way. The problem is harder than it looks; it requires symmetric dispatching of binary operations and normalizing representations composed from independent types. For example, what happens when you divide two Gaussian integers (Complex[Int])? Do you get a Complex[Rational], and if so, why? What happens when you replace Complex and Rational by types defined by non-communicating users? Apparently their framework can handle such things.
  • A bunch of CS professors are cooperatively designing their own pedagogical language, Grace. Besides their sensitivity to theoretical concerns, this team uniquely brings to the table a limitless supply of experimental subjects, also known as “students”. Good luck, fellows! (Really! Even if I had my irony fonts installed, I wouldn’t use them here.)
  • The new Windows Runtime (WinRT) system API emphasizes asynchronous programming with tasks. I guess we programmers are being weaned off of threads; at least, it is high time, since threads are bulky compared to virtualized CPUs. WinRT also provides much richer and more flexible management of component API metadata, repurposing CLR metadata formats to interconnect native, managed, and HTML5 code. Compilers (e.g., F#—there was a talk on this) are being modified to read the metadata on the fly, instead of requiring build-time binding generators.
  • Great quote on Moore’s End: “Taking a sabbatical is no longer an option for accelerating your program.” (Andrei Alexandrescu)
  • There was an awesome (though hard to understand) demo of the Roslyn project, in which the Visual IDE language processors are opened up to allow (amazingly) easy insertion of code analysis and transforms.
  • This year the conference was called Lang.NEXT instead of Lang.NET, in order to welcome “non-managed” languages like C++. There was a good panel on native languages. The end result of having the managed and native folks talk, I think, was that the bright line between managed and native became dimmer and more nuanced. This may have defused some latent partisanship. In any case, for those who believe that managed language JITs are inherently poor creatures devoid of heroics, please see the HotSpot architecture wiki. The HotSpot JVM is described in various papers (though there are not enough of them); Arnold Schwaighofer’s thesis provides a recent medium-length account.
  • I am grateful to the organizers for making a lively and comfortable conference. They have set a high bar for such events. We at Oracle will do our best to reciprocate at this year’s JVM Language Summit this summer.
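The Gaussian-integer question from the Julia discussion can be made concrete with a toy sketch. The Rational and Complex classes below are hypothetical minimal types of my own, hard-wired to one promotion; the hard part, which the Julia framework tackles and this sketch does not, is making the Int-to-Rational promotion generic across types defined by non-communicating users:

```java
public class PromoteDemo {
    // Hypothetical minimal rational number, reduced to lowest terms.
    static final class Rational {
        final long num, den;
        Rational(long n, long d) {
            if (d < 0) { n = -n; d = -d; }
            long g = gcd(Math.abs(n), d);
            num = (g == 0) ? 0 : n / g;
            den = (g == 0) ? 1 : d / g;
        }
        static long gcd(long a, long b) { return b == 0 ? a : gcd(b, a % b); }
        public String toString() { return num + "/" + den; }
    }

    // Complex over Rational: the result type of Gaussian-integer division,
    // since the quotient's components are generally not integers.
    static final class Complex {
        final Rational re, im;
        Complex(Rational re, Rational im) { this.re = re; this.im = im; }
        public String toString() { return re + " + " + im + "i"; }  // naive formatting
    }

    // (a+bi)/(c+di) = ((ac+bd) + (bc-ad)i) / (c^2+d^2)
    static Complex divide(long a, long b, long c, long d) {
        long n = c * c + d * d;
        return new Complex(new Rational(a * c + b * d, n),
                           new Rational(b * c - a * d, n));
    }

    public static void main(String[] args) {
        System.out.println(divide(1, 1, 1, 2));  // (1+i)/(1+2i)
    }
}
```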
