Mercurial OpenJDK Questions

Some miscellaneous Q&A on Mercurial and the decisions made in transitioning OpenJDK to it. Hopefully this makes sense to people.

The OpenJDK repositories are at

Why Mercurial? See the analysis done by the Solaris team for the details. From what I have read and seen, the two choices for a fast, good DSCM (Distributed SCM) were Mercurial and Git (both seem like excellent systems). Using the same SCM as Solaris made a great deal of sense. So far, remote developers in places where internet connections are slow love Mercurial. After the initial clone is created, push and pull are very fast, since effectively only the actual changes (diffs) go over the wire. Clones onto local disk, and multiple clones on the same disk, are very fast and efficient on Solaris and Linux, and Mercurial works on Windows too. The ability to pull changes via http and push changes via ssh is a major advantage too. And of course, it's "open source". ;^)
Why no imported history from TeamWare or the posted OpenJDK Subversion repositories? Legally, we had constraints on what history we could even try to provide. Technically, importing history from one SCM to another is not as easy as it sounds. The ideal history is the history of the changesets actually created by the developers, and creating any other kind of history (per snapshot or per promotion) seemed problematic to me. If the old history details stayed in the old SCM data, and that data was guaranteed to be accurate, then it seemed right to leave it there and not confuse the issue. I'm sure there are people who would disagree with me on this. TeamWare doesn't have a changeset model; it has a looser putback model based on groups of files, which does not carry enough information to create the kind of changesets Mercurial wants. History conversion from something like CVS, Subversion, or a more changeset-like SCM would be another story.
Why a forest and not one big repository? Unlike Solaris, we had already broken up the JDK sources into separate workspaces: originally hotspot and j2se, and later we split j2se into jdk, langtools, corba, jaxp, and jaxws. Each of these separate repositories represents a pile of sources managed by a separate team or delivered independently of the enclosing jdk product. Having them available as independently buildable repositories was considered a major advantage, and more importantly, some of the developer teams really wanted to keep themselves separate. It's that 'software silo' concept where developers don't want anything changing on them except what's in their immediate area. ;^) From a Release Engineering point of view, or for anyone who has to build the whole thing, this forest of repositories is a royal pain; a single repository would have been preferred. But the Mercurial forest extension seemed to provide an answer, so we went with the forest. Could the forest extension and Mercurial itself handle these nested repositories better? I'm sure they could, and will over time. I would not recommend creating separate repositories lightly; they need to be independently buildable and relatively free of dependencies on the other repositories.
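
To make the "forest" idea concrete, here is a minimal sketch of what it looks like on disk: a top-level repository with other repositories nested inside it. This is not how the forest extension itself is implemented; the function name find_repositories is hypothetical, and the only assumption is that a Mercurial repository is a directory containing a .hg directory.

```python
import os


def find_repositories(root):
    """Return the relative paths of all directories under root that hold a .hg directory.

    Hypothetical helper for illustration only; "." stands for the top-level repository.
    """
    found = []
    for dirpath, dirnames, _filenames in os.walk(root):
        if ".hg" in dirnames:
            found.append(os.path.relpath(dirpath, root))
            # Do not descend into the repository metadata itself.
            dirnames.remove(".hg")
    return sorted(found)
```

Anyone who has to build the whole thing needs every entry in that list to be present and in step with the others, which is where the extra work of a forest comes from.
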
Why the dual repository setup (gates)? When someone pushes to a Mercurial repository, there is a window of time, before the hooks that might roll back those changesets have run, in which someone pulling from that repository could get the changesets that are about to be rolled back. In addition, we have seen situations where people have managed to 'push -f' multiple heads (they forgot to merge) to a shared repository, creating very confused co-workers and potentially a chain of silly merge changesets. So we only have people push to a gate (you cannot pull from a gate), where the changesets are checked and verified, and then a simple push happens to the matching repository that people pull from. Nobody can ever pull multiple heads, and nobody gets changesets that failed validation. The dual setup lets us better protect the integrity of the repository.
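
The gate idea can be sketched as a toy model. This is only an illustration of the behavior described above, not the actual OpenJDK hook code: the class names, the passes_checks flag, and the single-head rule as written here are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Changeset:
    node: str                   # hypothetical changeset id
    parents: tuple = ()         # ids of the parent changesets
    passes_checks: bool = True  # stand-in for whatever the gate verifies


@dataclass
class Repo:
    changesets: dict = field(default_factory=dict)

    def heads(self):
        """Changesets that are not the parent of any other changeset."""
        parents = {p for cs in self.changesets.values() for p in cs.parents}
        return sorted(node for node in self.changesets if node not in parents)


class Gate:
    """People push here; they only ever pull from self.public."""

    def __init__(self):
        self.public = Repo()

    def push(self, incoming):
        # Work on a private copy so pullers never see unverified changesets.
        candidate = Repo(dict(self.public.changesets))
        for cs in incoming:
            candidate.changesets[cs.node] = cs
        if not all(cs.passes_checks for cs in incoming):
            return False  # failed validation, public repository untouched
        if len(candidate.heads()) > 1:
            return False  # multiple heads, someone forgot to merge
        self.public = candidate  # the "simple push" to the pull side
        return True
```

In this model, a push of two unmerged children of the same parent is rejected, and the same push with a merge changeset added is accepted, so the repository people pull from always has at most one head.
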
Why so many different sets of repositories? By having each team keep its changes isolated until they are integrated into a master area, the team is more likely to find its own mistakes and regressions before they reach everyone else. Testing is focused on the areas that have changed and can validate the entire team's contribution to the product. In addition, builds of these various team areas (if archived) can be used to isolate regressions, quickly narrowing a regression down to the team, and then the changesets can be used to narrow it down to the individual author, so that it can hopefully be isolated and corrected quickly.
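
The two-step narrowing described above (first to a team, then to a changeset and its author) can be sketched roughly as follows. Everything here is hypothetical: the function names, the data layout, and the assumption that once a regression appears in a team's history it stays in every later changeset. Mercurial's own bisect command automates a similar search over real changesets.

```python
def first_bad(items, is_bad):
    """Binary search for the index of the first bad item, or None if none is bad.

    Assumes items are ordered oldest to newest and that every item after
    the first bad one is also bad.
    """
    if not items or not is_bad(items[-1]):
        return None
    lo, hi = 0, len(items) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(items[mid]):
            hi = mid
        else:
            lo = mid + 1
    return lo


def isolate_regression(team_changesets, team_build_is_bad, changeset_is_bad):
    """Return (team, node, author) for the changeset that introduced a regression.

    team_changesets maps a team name to its list of (node, author) pairs,
    oldest first. Returns None if no team build shows the regression.
    """
    for team, changesets in team_changesets.items():
        if not team_build_is_bad(team):
            continue  # this team's archived build is clean, skip its changesets
        index = first_bad(changesets, changeset_is_bad)
        if index is not None:
            node, author = changesets[index]
            return team, node, author
    return None
```

The point of the sketch is the order of the checks: the archived team builds rule out whole teams cheaply, and only the remaining team's changesets need to be searched.
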
Why not subsets of repositories for the teams? Good question. Initially we thought we could make the team areas sparse sets, but the complete jdk is a forest (a set of repositories), and a subset doesn't provide the teams with a complete set of jdk sources. So, to make life easier for us (the initial Mercurial transition engineers), each team got a complete set. Having a complete set of repositories gives each team the possibility of building a complete jdk from a known quantity. This situation may change as we gain more experience with Mercurial and forests.

I'll add more to this as time goes on, assuming people find it useful. Add your questions to the comments and I'll try to answer them.



