OpenJDK, Mercurial, and The Changeset View
By kto on Mar 27, 2008
Why do I have to create a "Merge" changeset when there was nothing to merge?
For most of us old TeamWare users, and maybe users of other SCM tools, all the Mercurial "Merge" changesets (or, as some people politely call them, 'merge turds') seem confusing and messy. If the changes don't involve the same file, it can be hard to understand why you need a Merge changeset at all.
What did TeamWare look like?
In TeamWare a 'resolve' was necessary only when there was a conflict, meaning that two people had changed the same file. The tool 'filemerge' provided a way to easily deal with each file conflict, but merging changes is and will always be a risky business. Everyone has had an experience with a 'bad merge', and they are nasty problems. No Source Code Management (SCM) tool completely removes the need for merging, and our only hope is for the merging tools to help us out here. It is probably true that a Distributed SCM like TeamWare, Mercurial, or git may create the need for more frequent merging, but the end result is often the same as with a non-Distributed SCM, so maybe with a DSCM the merge work is also distributed? Anyway, I digress.
With TeamWare, the 'resolve' action resulted in multiple revisions in each SCCS file that had a conflict. The TeamWare tool 'vertool' provided a way to pick an SCCS file and view its revision history. Again, this was on a per-file basis, and although that created some benefits for developers, like being able to 'putback' just one file change, it also made it a little difficult to record the true state of the entire workspace.
Here is a snapshot of vertool in action for anyone that hasn't seen it:
Notice the SCCS revision graph. When conflicts happen, the graph gets a little more complicated, but unless the changes are abandoned, it always connects back up to the main trunk of the graph. With TeamWare, every file was controlled with SCCS, and every file had its own graph. The connections between files were never formally managed by TeamWare, but TeamWare provided some tools like 'freezept' to let you try to manage them.
And with Mercurial ...
The changes come in changesets, or grouped changes to files, which are treated and tracked as changes to the repository. Yes, the changes are made to specific files, but the revision tracking is done for the entire repository. When a merge situation happens in Mercurial, and they will be frequent, a new changeset has to be created to potentially carry any file merge changes, but most importantly to identify the merged or joined result of two changesets. An ordinary changeset has one parent changeset (the very first changeset in a repository has none), but a Merge changeset has two. Every time you do an 'hg pull' that adds new changesets to your repository, and your repository has changesets that have not been pushed yet, you have created what is called a 'multiple head' situation and you will need a Merge changeset. A 'head' is a changeset with no descendants. The tip changeset is a head, and it must be the only head if you want to push your changesets to the OpenJDK repositories (we do not allow any multiple head pushes to the OpenJDK repositories). Unfortunately, this means that people who do frequent "syncs" with their parent repository may create many Merge changesets. That's just the way it is; like taxes, we will need to learn to live with it.
The 'hg view' command of Mercurial can provide some insight into this Merge business. To use 'hg view' you need to:
- Enable the hgk extension in your ~/.hgrc file.
- Make sure that the hgk tool is available from your PATH environment variable setting. You may need to download the Mercurial source bundle that matches the version of Mercurial you are using and get the hgk file from the contrib directory.
- Make sure the wish tool is available from your PATH environment variable setting. Note that Solaris Express has a /usr/bin/wish that works, and Mac OS X 10.5 has a /usr/bin/wish that works, but you may need to do a little searching to find a wish that is acceptable to hgk. Solaris 10 and older machines may have one at /opt/sfw/bin/wishx or /usr/sfw/bin/wisk8.
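The checklist above can be sanity-checked with a few lines of shell. This is only a sketch: check_tool is a helper name I made up, and the ~/.hgrc lines in the comments show the usual way of enabling the extension (older Mercurial documentation spells the entry 'hgext.hgk ='), so compare them with the documentation for your Mercurial version.

```shell
#!/bin/sh
# Sketch only: report whether the tools that 'hg view' depends on are on PATH.
#
# The extension itself is enabled with an entry like this in ~/.hgrc:
#   [extensions]
#   hgk =
# If hgk is not on your PATH, the extension can also be told where it lives:
#   [hgk]
#   path = /some/directory/hgk      (hypothetical location)

# check_tool is a made-up helper: prints where a tool was found,
# and returns non-zero if it is not on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: NOT found on PATH"
        return 1
    fi
}

for tool in hg hgk wish; do
    check_tool "$tool" || true      # keep going so every tool gets reported
done
```

If wish turns up under a different name on your system, pass that name to check_tool instead.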
For example, to see what the most recent changesets pushed to the OpenJDK jdk7/jdk repository look like:
hg clone http://hg.openjdk.java.net/jdk7/jdk yourjdk
cd yourjdk
hg view
You should then see something like this:
Looks a little like a public transportation system. Notice the groups of changesets created by individual developers, usually generated one right after the other. If two developers manage to line up (luck), the sequence is simple; otherwise, the second one to push has to do a pull and create a Merge changeset first. Layered on top of that are the integrations of the various teams into the master repository, which should appear as major additions to the graph.
Because a changeset is a revision of the entire repository, this has tremendous benefits. For example, anyone can re-create the state of a repository (all the files) as of any changeset by simply doing:
hg clone -r 82c85cfd8402 yourjdk trimmedjdk
This creates a separate repository that represents the state of everything as of that specific changeset id (which happens to be a changeset I created, specifically http://hg.openjdk.java.net/jdk7/jdk7/jdk/rev/82c85cfd8402).
I hope this has been helpful to at least a few people. Send me comments if I can clarify this more for people.
For more on the hgk extension go to http://www.selenic.com/mercurial/wiki/index.cgi/HgkExtension. Thanks to Chris Mason for creating this great extension.