The move from Mercurial to Git provided an opportunity to consolidate the source code repositories.
May 14, 2021
Have you ever built your own Java Development Kit from source? Most end users of the JDK will not need to build their own JDK from the Oracle source code. I’ve needed to do that only a few times when I was running on the OpenBSD UNIX-like system, which is not one of the three supported platforms.
Sure, you might want to build your own JDK to try out a new feature that you think should be added to Java. You might choose to build from source to be sure you are running a more trustworthy binary. Having the complete source code readily available, and now in a more commonly used download format, means it is easier than ever to build your own JDK. Yes, the process is better documented and more easily configured than in the past. But it’s still a bit confusing.
The source code for the OpenJDK recently moved from the Mercurial version control system (VCS) to the Git VCS and the GitHub repository system, and that’s probably a good thing. Here’s some history, why it matters, and what you need to know.
Blame or thank BitKeeper
If it weren’t for Larry McVoy, developers might not have Git, or GitHub, or Mercurial.
Larry McVoy worked at Sun Microsystems from 1988 to 1994. While there, he presciently suggested that Sun’s UNIX (Solaris) operating system and Novell should be merged and open sourced to head off the growth of Microsoft Windows NT (and Linux). Sun’s management didn’t listen.
McVoy also developed a code repository and version control system called TeamWare. When McVoy left Sun, he expanded that work into a new version control system called BitKeeper. It was much faster at dealing with large codebases and large changes than the two main open source version control systems then in use: Concurrent Versions System (CVS) and Subversion.
BitKeeper was a commercial software product by McVoy’s company, BitMover. He gave a free license to the Linux kernel project, with certain conditions, one being that there would be no reverse engineering of the product.
There was consternation among the Linux community over Linus Torvalds' willingness to use a closed source tool for maintaining the open source Linux kernel, but it was the best tool for the job, and the angst merely simmered on the back burner for a few years.
Eventually Andrew Tridgell, known as the creator of the Samba project (which provides Microsoft network file system compatibility for UNIX and Linux), tried to reverse-engineer the BitKeeper network protocols in order to write an open source client for the server. This action, whether right or wrong, precipitated a rift in the space-time continuum: McVoy completely rescinded the Linux project’s license for BitKeeper, bringing kernel development to an abrupt halt.
That rift led to the development of both Git and Mercurial, as you can read about in Zack Brown’s “A Git origin story” in Linux Journal.
Here’s the short version. Torvalds created his own VCS, Git, which he says he named after himself: For nonnative speakers of English, a “git” is defined as “an unpleasant, contemptible, or frustratingly obtuse person.” The Git VCS took over support of the Linux kernel in a matter of weeks.
At about the same time, Matt Mackall started writing another VCS, Mercurial. He claims to have named it Mercurial after McVoy’s temperament, given how abruptly the Linux project’s BitKeeper license was canceled.
Both Git and Mercurial offer similar features and performance. But over time, Git has largely won over the hearts and minds of developers. Mercurial remains in use but is far less popular.
Git is preinstalled on many operating system distributions, and it can easily be added to most others, using pkg_add or whatever standard tool you use for installing third-party software. On systems that don’t include such a tool (such as Windows), you can download the binary, or download Git’s source code and build it yourself.
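As a rough illustration of the paragraph above, the following sketch reports whether Git is already present and, if it is not, prints the usual install command for a few common package managers. It installs nothing, and the choice of package managers is only an assumption about what a reader might have.

```shell
#!/bin/sh
# Sketch: report whether Git is installed; if it is not, suggest (but do not
# run) an install command based on which package manager is on the PATH.
if command -v git >/dev/null 2>&1; then
    git_status="installed: $(git --version)"
elif command -v pkg_add >/dev/null 2>&1; then
    git_status="missing; try: pkg_add git"               # OpenBSD
elif command -v apt-get >/dev/null 2>&1; then
    git_status="missing; try: sudo apt-get install git"  # Debian, Ubuntu
elif command -v dnf >/dev/null 2>&1; then
    git_status="missing; try: sudo dnf install git"      # Fedora, RHEL
elif command -v brew >/dev/null 2>&1; then
    git_status="missing; try: brew install git"          # Homebrew on macOS
else
    git_status="missing; download it from https://git-scm.com/downloads"
fi
echo "$git_status"
```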
Why not Mercurial?
One of the largest projects using Mercurial was Sun’s Java OpenJDK, spread over multiple repositories.
There is nothing inherently wrong with Mercurial. It’s just that Mercurial has been seen as an also-ran for well over a decade, and many developers simply skipped it unless they were really dedicated to some project that used it. There are still many projects using it, though; there’s a list at Mercurial’s website that is somewhat out of date (it still shows Java). Interestingly, that page ends with a link to a list of projects that moved from Mercurial to Git.
Why Git? Why GitHub?
Git is now almost the de facto standard in this area; it’s hard to get a job as a developer unless you are comfortable using Git. Git offers branching, letting developers work on multiple sets of changes at the same time, and merging, pulling multiple branches into another branch, typically the main one.
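For readers who have not used branching and merging, here is a minimal sketch that runs entirely in a throwaway directory. The repository, the branch name feature-greeting, and the file name are hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Sketch: create a scratch repository, branch, commit, and merge.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

# First commit on the default branch
echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"
main_branch=$(git rev-parse --abbrev-ref HEAD)  # "master" or "main", depending on Git settings

# Start a branch and make a change there
git checkout -q -b feature-greeting
echo "hello, world" > greeting.txt
git commit -q -am "Expand greeting"

# Switch back and merge the branch into the default branch
git checkout -q "$main_branch"
git merge -q feature-greeting

merged_content=$(cat greeting.txt)
commit_count=$(git rev-list --count HEAD)
echo "$merged_content ($commit_count commits on $main_branch)"
```

Because nothing else changed on the default branch in the meantime, Git can perform this merge as a simple fast-forward; diverging branches would produce a merge commit instead.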
A related platform is GitHub, a cloud-based Git service with plenty of additions. You create an account and then create any number of Git repositories. Repositories can be public (for open source or open info) or private. GitHub is free for individuals and reasonably priced for organizations.
To see what GitHub looks like, check out one of my smaller repositories, which is for PdfShow, a tiny PDF-based presentation tool, or one a bit larger, javasrc, which has sample Java program fragments (including the code samples from my Java Cookbook).
There are several advantages of GitHub over plain or do-it-yourself Git hosting.
A comprehensive web interface. The GitHub web interface has been smoothed over by many talented hands, and it works well. The browser interface allows you to view repositories’ contents and history, view individual files, and edit and commit text files.
Issues processing. GitHub provides a bug tracking facility called Issues, since not all things that need attention are bugs. You can tag issues as bugs, enhancements, or a dozen other tags, and you can even define your own tags. In a team environment, the repository’s administrator can assign responsibility for an issue to specific individuals.
Pull requests. A pull request is a submission of a diff, or a list of changes, to the owner of the repository, asking the owner to pull those changes into the main branch of the repository. This is very powerful because, unlike mailing diff listings around, a pull request is aware of other activity on the repository and will tell the owner whether the diff would still apply cleanly. The pull request allows other developers to comment, and it doesn’t get mangled by different email systems’ treatment of line endings, spaces, and tabs. The changes in a pull request do not have to come from the same repository; in fact, they commonly come from a different one. A person making changes would typically fork (that is, make their own copy of) a repository, work on that fork, test the changes, and then send a pull request to the admin of the original repository. That admin could accept the pull request to merge the code in, offer comments on how to improve the code, or reject the code outright.
Social coding. These features add up to one of GitHub’s mottoes: social coding. Teams or individual developers can work together without being in the same office or even the same organization. Social coding was big before the pandemic, and of course it’s even bigger now.
Actions. In GitHub, actions let you perform one or more workflow-related actions when commits are made. An action can be as simple or complex as you need: from something as simple yet important as building and running tests up to building a Docker image and deploying it at scale with Kubernetes or some other technology. If you are logged in and on your own repository’s main page, GitHub has a list of the two dozen or so available ready-to-configure actions under the Actions tab. There is also a public list of more than 8,300 prebuilt actions.
As you can see, there are plenty of reasons why the OpenJDK project chose Git and GitHub.
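The fork-and-pull-request flow described under "Pull requests" can be sketched on the command line. To keep the example runnable offline, it simulates the original project and the fork with local bare repositories; on GitHub these would be remote URLs, and the pull request itself would be opened in the web interface. All repository and branch names here are hypothetical.

```shell
#!/bin/sh
# Sketch of the fork-and-pull-request flow, simulated with local repositories.
# "upstream.git" stands in for the original project, "fork.git" for your fork.
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for the original project, with a single commit
git init -q seed
cd seed
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > README
git add README
git commit -q -m "Initial commit"
cd ..
git clone -q --bare seed upstream.git

# "Fork": your own server-side copy of the original repository
git clone -q --bare upstream.git fork.git

# Clone the fork, and keep a second remote pointing at the original
git clone -q fork.git mywork
cd mywork
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add upstream ../upstream.git

# Do the work on a topic branch and push it to the fork
git checkout -q -b fix-readme
echo "v2" > README
git commit -q -am "Update README"
git push -q origin fix-readme

# A pull request would now ask upstream to merge fix-readme from the fork.
pushed_branch=$(git ls-remote --heads origin fix-readme | awk '{print $2}')
echo "pushed to fork: $pushed_branch"
```

Keeping the original project as a second remote (conventionally named upstream) makes it easy to fetch its latest changes and check that the topic branch still applies cleanly before asking for a merge.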
Why the move, and why now?
Before the move from Mercurial, the OpenJDK lived in several repositories. It was decided that it would be best to consolidate into a single repository.
At the time this move was first planned, in the Java 9 era, the JDK consisted of eight different repositories. To build the JDK, a developer had to download all eight repositories into the correct directory structure. Commits that involved more than one repository were common, yet Mercurial offered no mechanism to enforce atomicity or consistency across them. To quote from JEP 296, “Consolidate the JDK forest into a single repository,” created in 2016:
The multiplicity of repos presents a larger than necessary barrier to entry to new developers and has led to workarounds such as the “get source” script.
The following list of completed JEPs and project documents provides insight into the four major stages of the GitHub consolidation and migration:
The team has succeeded admirably: The JDK is now built out of a single repository.
Getting the source code and building the OpenJDK
Get the source code. Assuming you already have Git installed, you can download the code. Bear in mind that the codebase is not tiny.
$ cd ~/git # or wherever you want the code
# Do not do this until you read the whole article
$ git clone https://github.com/openjdk/jdk
$ du -sh jdk
1.7G jdk
Yes, the full OpenJDK/jdk download is 1.7 GB. Good thing that storage is cheap these days! This repository includes the history of the OpenJDK back to some initial commits in December 2007 for Java 7, created by an account mischievously named “J. Duke.”
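Much of that size is history. If you only want to build the current code and do not need the commit log, Git's standard --depth option creates a shallow clone that fetches only the most recent commits. The sketch below demonstrates the effect on a small local repository (with hypothetical names) so that it runs offline; the real-world equivalent is shown in a comment, and how much it would actually save on the JDK repository is not something measured here.

```shell
#!/bin/sh
# Sketch: compare a full clone with a shallow (--depth 1) clone.
# For the JDK itself, the equivalent would be:
#   git clone --depth 1 https://github.com/openjdk/jdk
set -e
work=$(mktemp -d)
cd "$work"

# Build a small repository with three commits of history
git init -q origin-repo
cd origin-repo
git config user.name "Demo User"
git config user.email "demo@example.com"
for n in 1 2 3; do
    echo "revision $n" > file.txt
    git add file.txt
    git commit -q -m "Revision $n"
done
cd ..

# A file:// URL is used because --depth is ignored for plain local paths
git clone -q "file://$work/origin-repo" full
git clone -q --depth 1 "file://$work/origin-repo" shallow

full_count=$(git -C full rev-list --count HEAD)
shallow_count=$(git -C shallow rev-list --count HEAD)
echo "full clone: $full_count commits; shallow clone: $shallow_count commit"
```

A shallow clone can be converted into a complete one later with git fetch --unshallow if the history turns out to be needed.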
Build prerequisites. There is a long list of prerequisites. For UNIX and Linux, you basically need a working modern system with a compiler (clang or gcc). For macOS, you need a current version of Apple’s Xcode development suite.
Since UNIX systems (including BSD and macOS) and Linux were built largely by, and depend upon the work of, independent or third-party software developers, these systems tend to have more of the prerequisites preinstalled or built in than Microsoft Windows platforms do. Thus, building on Windows requires more setup. Refer to the OpenJDK build documentation.
Your first build. My developer Mac system has a lot of tools already on it, including the full Xcode package, so I thought I’d try without even worrying about the prerequisites. For the first build, I simply checked out OpenJDK, typed configure and then make, and that was enough to get me a working instance of Java 17. Here is the log, with a lot of boring parts cut out.
$ git clone https://github.com/openjdk/jdk
$ cd jdk
$ bash configure
... tons of output ...
$ make
... tons more output ...
Finished building target 'default (exploded-image)' in configuration 'macosx-x86_64-server-release'
$ ls build/macosx-x86_64-server-release/jdk
_optimize_image_exec.cmdline conf modules
_optimize_image_exec.log include release
$ build/macosx-x86_64-server-release/jdk/bin/java --version
openjdk version "17-internal" 2021-09-14
OpenJDK Runtime Environment (build 17-internal+0-adhoc.ian.jdk)
OpenJDK 64-Bit Server VM (build 17-internal+0-adhoc.ian.jdk, mixed mode)
The build worked!
I didn’t specify which Java version to build, so by default I built the latest version, Java 17, which is slated for general availability as of September 14, 2021; hence the date stamp being in the future and the word internal in the version string. The JDK repository represents the leading edge of the Java SE platform. There are also stable repositories for Java 16, 15, 14, 13, 12, 11, and 8; some of these have two repositories. The one with the letter u at the end of its name (such as jdk16u) gets security fixes. The one without the u does not.
My build took just over half an hour on an older Mac with a 2.9 GHz Intel dual-core (4 thread) Core i7 with 16 GB of RAM and a 2 TB hard drive with 600 GB of free space.
As pointed out in the prerequisites document, the JDK is not a tiny program: Building it goes best with a bunch of CPU cores, a fast disk, and lots of RAM and disk space. My system falls a bit short of the recommendations on the disk side, because a solid-state disk (SSD) is recommended. However, my machine doesn’t have an internal SSD available for testing. An external SSD connected over USB 2.0 would probably be slow.
The OpenJDK project has moved from Mercurial on java.net to Git on GitHub. Along the way, the eight repositories that were once needed to build the JDK were merged into one. This gives you the source code to build a complete instance of Java SE.