Complexity in language design

People talk about the "conceptual weight" of language features. Here's a way to make that notion more precise, and more accurate too.

The JLS is structured bottom-up: universal elements like grammar, types, values and names; then Java artefacts both major (packages, classes, interfaces) and minor (arrays, exceptions); then back to universal concepts like statements and expressions. (The chapters on execution and binary compatibility are in the wrong place; they should be at the end with definite assignment and the memory model.)

We can exploit this structure when adding a new language feature. By asking which JLS chapters would be affected by the feature, we can gain an idea of the feature's semantic impact, and thus its complexity. For example, adding a new statement form is done near the end, so it's a minor addition. Adding a new keyword is done up front, implying there are many artefacts the new syntax could interact with - such as the names of all your variables.

So if a new language feature L needs changes in a set of chapters S, the complexity of L is:

complexity(L) = |S| * degree(S) / min(S)

where degree(S) is the number of cross-references between chapters necessary to fully describe L. An approximation of this factor is P(|S|,2) - note a permutation, not a combination, because a forward reference and a backward reference each add complexity.

S should never include 1, the introduction, or 18, which is really an appendix (the syntactic grammar it collects is already introduced piecemeal in chapters 3-15). You'd have to watch out for a couple of things: some chapters (notably 6) trivially recap or preview other chapters, and example code should not generate vacuous cross-references.

So, consider these JDK 1.5 changes:
- Hexadecimal floating-point literals: S={3}, degree(S)=1. Complexity: 0.33.
- Enumerations: S={3,8,9,13,14,15,16}, degree(S)=a. Complexity: 2.33a.
- Generics: S={4,5,8,9,10,13,15}, degree(S)=b. Complexity: 1.75b.
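The three figures can be reproduced mechanically. A minimal sketch (assuming the coefficient is |S| divided by min(S), which matches all three worked values above; multiply by the feature's degree to get the complexity, and note the class and method names are mine):

```java
import java.util.Collections;
import java.util.Set;

public class JlsComplexity {
    // Coefficient |S| / min(S); multiplying by degree(S) gives the
    // complexity figure. (The form is inferred from the three worked
    // values above: 1/3, 7/3 and 7/4.)
    static double coefficient(Set<Integer> chapters) {
        return chapters.size() / (double) Collections.min(chapters);
    }

    public static void main(String[] args) {
        System.out.printf("hex fp literals: %.2f%n",
                coefficient(Set.of(3)));                          // 0.33
        System.out.printf("enums:           %.2fa%n",
                coefficient(Set.of(3, 8, 9, 13, 14, 15, 16)));    // 2.33a
        System.out.printf("generics:        %.2fb%n",
                coefficient(Set.of(4, 5, 8, 9, 10, 13, 15)));     // 1.75b
    }
}
```

Touching an early chapter (small min(S)) inflates the result, matching the intuition that new lexical syntax interacts with everything defined later.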

Since b is probably much higher than a (1.75b exceeds 2.33a only once b > 1.33a), 1.75b will be higher than 2.33a, reflecting the sense that the complexity of generics is higher than the complexity of enums. However, to really capture how much higher, you'd want to improve the granularity of S's elements from chapters to sections or subsections, so that |S| shoots up for generics. You'd also want to force the measure's range into a reasonable value set by means of constant factors. (Different factors would be needed for different granularities of S.) Finally, since I'm claiming that a total order exists for the chapters, the breadth of the changes - i.e. max(S)-min(S) - could be used in the formula.
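The breadth factor is easy to state in code. A minimal sketch (this is the hypothetical extra term suggested above, not part of the core measure; names are mine):

```java
import java.util.Collections;
import java.util.Set;

public class JlsBreadth {
    // max(S) - min(S): how far apart, in the JLS's bottom-up ordering,
    // the chapters affected by a feature lie.
    static int breadth(Set<Integer> chapters) {
        return Collections.max(chapters) - Collections.min(chapters);
    }

    public static void main(String[] args) {
        System.out.println(breadth(Set.of(3)));                       // hex fp literals: 0
        System.out.println(breadth(Set.of(3, 8, 9, 13, 14, 15, 16))); // enums: 13
        System.out.println(breadth(Set.of(4, 5, 8, 9, 10, 13, 15)));  // generics: 11
    }
}
```

On this factor too, enums score higher than generics, because enums reach all the way from lexical syntax (chapter 3) to definite assignment (chapter 16).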

I don't use this measure in real life but it is fairly plausible. Comparing an abstract view of a single feature against the abstract view of the whole language might serve as a proxy for the effort needed to implement and test the feature in a compiler.

Comments:

Hexadecimal floating-points literals: S={3}, degree(S)=1. Complexity: 0.333, isn't it?

Posted by George on May 03, 2007 at 08:12 PM PDT #

Ha! I found this article via reddit's programming section, read it, thought "Heh, I must tell Alex about that..." then on a sudden hunch checked the URL...

Posted by Alaric Snell-Pym on May 03, 2007 at 08:35 PM PDT #

George, you're right of course. Thanks for pointing it out.

Posted by Alex on May 04, 2007 at 03:13 AM PDT #

It is an interesting idea but I don't think it really captures the complexity. I.E. I would rate generics as more complicated than enums. Perhaps lines in the compiler might be a better measure.

Posted by Howard Lovatt on May 07, 2007 at 10:15 AM PDT #

About

Alex Buckley is the Specification Lead for the Java language and JVM at Oracle.
