RedMonk at FOSDEM: Lies, Damned Lies, and Statistics
By Geertjan-Oracle on Feb 02, 2014
Imagine you switch on the TV and you're just in time to catch the start of the weather report. The reporter is smiling cheerfully into the camera, while saying:
"We haven't been able to measure weather statistics in the whole country. In fact, we don't really know exactly how large the country is. Nevertheless, that being said, based on the weather statistics that we did manage to get hold of, here is the state of the weather today."
That, honestly, is what happened at FOSDEM yesterday in a packed out (people standing in the aisles, sitting on the floor, lined up outside in the corridor) RedMonk session entitled "What a Long Strange Trip It's Been: The Past, Present and Future of Java". (Here's the abstract.)
Now, Steve O'Grady was perfectly frank about this in the session (and says it again in his blog article): "to be included in this analysis, a language must be observable within both GitHub and Stack Overflow." And that's really all that's been analyzed: GitHub and Stack Overflow. (As a corollary, you then end up in the endlessly hilarious and inherently unresolvable discussion about what it means when there are many questions on Stack Overflow about a language: (a) that it is popular, (b) that it is complicated, (c) that it is badly documented, (d) some combination of the above, or (e) something else.)
OK, so then I think the title "The RedMonk Programming Language Rankings: January 2014" is really too heavy for the analysis that's been done and the content it provides, especially since by their own admission they haven't surveyed closed-source corporate software development, nor do they have any clue about how to do so. I'd have thought they'd have built up a database of software companies around the world, built relationships with those companies, and, thanks to those relationships, be able to regularly poll them on the software (and languages and frameworks and libraries and IDEs) they're using. However, this turns out quite clearly not to be the case.
And I don't think it's a credible defence to say, "hey, I told you that we didn't analyze anything other than GitHub and Stack Overflow". A lot of organizations take the analysis of RedMonk very seriously (and that of similar organizations, such as Forrester, where the difference is that you have to pay for the results and you get an impressively authenticated PDF rather than a free blog entry). They take it seriously in the same way that a weather reporter saying "the weather today has been sunny with blue skies" is taken seriously: no one reads the small print that says "hey, we really don't know how large the country is and we possibly haven't visited most of it". In the case of the weather report, that would be absurd, and a weather service that said so wouldn't have a particularly strong basis to exist.
Personally, I have no reason to make this point other than a concern that we really do need usable and independent research on the ubiquity of programming languages (and frameworks and libraries and IDEs). After all, the conclusions reached by RedMonk favor Java, which is my personal favorite language and the one I have been supporting and promoting for the past 10 years.
In other words, I have nothing to gain by calling RedMonk's bluff. (I'd argue that even if you call your own bluff, it's still bluff.) They reached a conclusion I would have wanted them to reach: Java is not only alive, it is vibrant, self-sustaining, and used everywhere by everyone in all manner of application development. However, in a year or so, RedMonk may do another scan (or analysis or whatever) of GitHub and Stack Overflow and then come to completely different conclusions. And those conclusions would be just as invalid, and just as silly to base anything on, as the ones they have most recently come up with and presented at FOSDEM yesterday:
OK, thanks, Java is very much alive and vibrant and is used for things other than enterprise applications. (Later that day I met up with John Kostaras from NATO, where they work on a massive Java desktop system that manages European air defence, which of course is yet another example of corporate work being done in Java that falls outside of RedMonk's research. In fact, if RedMonk were to research the ubiquity of Java desktop development, they'd say the Java desktop is dead, but only because NATO and all the other massive enterprises in the defence, aerospace, and banking domains don't use GitHub and Stack Overflow.) The call to action at the end was that we should spread the word that Java is alive. Great. Everyone already knew that, except maybe analysts, who may have been scanning (or analyzing or whatever the verb is) some other repository somewhere and have come up with contradictory conclusions.
I'm sure I wasn't the only one left with the question marks outlined above. Several of the questions at the end of the session asked more or less the same thing, and the response was more or less the same each time, along these lines: "Hey, we don't know how large the programming world is, we have no way of knowing that, we have to do the best we can with the data that we can get".
If that is really the case then, in the first place, I really appreciate the honesty. In the second place, however, I'd suggest that analysts start actually building relationships with companies and developers at companies, rather than with repositories and on-line discussion forums. In the third place, it's invalid to draw any broad conclusions from this analysis. The worst thing that one can do is to say, essentially, "yes, the weather statistics are incomplete and, what's worse, we have no clue how incomplete, nevertheless, that having been said, the skies are sunny and blue".
As a final point, the first photo above of the "packed out" RedMonk session (with people standing in the aisles, sitting on the floor, lined up outside in the corridor) might indicate that there was massive interest in the state of Java and especially in what RedMonk had to say about it, which might further underline (a) that Java is very popular and (b) that RedMonk is taken very seriously. Both those things may, of course, be true. But the photo doesn't prove that at all because... pretty much all the rooms at FOSDEM were "packed out". It's a free conference with a massive turnout and... surprisingly small rooms.