SCM Migration: The Big Picture
By mkupfer on Feb 15, 2008
When Steve Lau left Sun at the end of last September, I became the go-to guy inside Sun for the migration to Mercurial. I had thought that I had a good high-level grasp of the project. But after getting blindsided a couple times by dependencies I hadn't considered, I drew up a diagram to help me get oriented, identify stakeholders, and maybe anticipate future issues.
Here's a slightly simplified version of the original diagram from the whiteboard in my office:
Blue parallelograms indicate repositories, tan boxes are software modules, solid lines indicate data flow, and dashed lines tie users with the modules that they're using. The three red-rimmed boxes (gk tools, gate hooks, and onbld tools) are where most of the development effort is going.
The primary simplifications in this diagram are
- the data flow from the project gate actually goes through the SCM front-end before going through the gate hooks.
- I've omitted the consolidation's clone workspace (a nightly snapshot of the gate)
- I've omitted the bridge between the current ON workspace in TeamWare and the Mercurial repository that is shadowing it
Even so, this is a moderately busy diagram. There are several components to keep track of and make sure they all fit together.
Most of the work so far has been in the area of the ON build (onbld) tools, pieces of which are used by other consolidations and by the Solaris Companion project. Many of the changes are related to making the tools work with Mercurial as well as with TeamWare/SCCS. We've also had to consider the implications of moving everything outside the Sun firewall, which has meant rethinking interfaces to things like the bug database and our RTI (Request To Integrate) system.
We haven't done as much work on the gatekeeper (gk) tools, although we've started to think about design issues. Many of the design decisions boil down to this question: do we make the minimal set of changes needed to work with Mercurial, or do we make more extensive changes so that the tools can make better use of the features provided by Mercurial? In some cases we are staying with the current approach. For example, we are using separate repositories for build snapshots, rather than using branches and tags in the main gate repository. In other cases we will be changing the tools to use Mercurial features. For example, any automated post-putback processing will be driven directly by Mercurial hooks, rather than the email-based hook system that is needed with TeamWare.
Another set of interesting design decisions has centered around the use of gate hooks to enforce various style and bookkeeping rules. With the current TeamWare setup, we enforce these rules after a putback (at least for ON). The putback triggers various checks, and if your putback violates a rule, you get notified of the problem and given a short window to fix it or your putback is reverted. The gate is normally configured so that anyone (inside Sun) can putback.
While this approach worked when Solaris was closed source, we expect it not to scale for OpenSolaris, where the repository is accessible from anywhere on the Internet and both Sun employees and non-employees can have commit rights. Certain Mercurial hooks can abort a putback ("push" in Mercurial terms), so we could move all the post-putback checks to pre-transaction checks. But moving more checks means more work (e.g., testing), which means a longer time before we can move to Mercurial. So the question becomes which checks really need to happen before putback, and which ones can happen after putback. The check to ensure that a putback has an approved RTI probably needs to happen prior to the putback. The check for adherence to the C style rules can happen after the putback, at least for now.
The opensolaris.org webapp has various bits of functionality for source code management. A project leader or gatekeeper can use the webapp to create, destroy, and lock repositories, as well as to manage commit rights for the project's repositories. Unfortunately, the current set of operations is limited. For example, a gatekeeper might want to lock a repository for most users, but allow access for a specific large project. Alas, this lock granularity is not currently supported. Furthermore, all the controls are currently through a web-based interface, with no scripting hooks. Although there is currently work to improve the webapp and make it easier to change, this work is unlikely to be finished in time for us to make any changes that we expect gatekeepers to want. So we will need to think about other ways to provide the needed functionality, such as giving gatekeepers shell access to the server that hosts the repositories.
The SCM front-end gives a user access to repositories by creating a chroot environment which contains only the repositories that the user has commit privileges for. (Access to other repositories is done via the "anon" user.) If the user reports being unable to pull from, or push to, a repository, the problem could be with the SCM program itself, the SCM front-end, or some other general system service. This diagnosis typically requires shell access to the servers.
We are using Nagios to monitor the health of the servers and services on opensolaris.org. We have written a couple simple Nagios plugins to monitor the Mercurial and Subversion services. As we gain experience with the system, we could update the probes to check for specific failure scenarios.
OpenGrok makes it into this diagram because it makes a private snapshot of each repository that it indexes, so as to provide a consistent view of the tree. We once managed to break the OpenGrok indexing of ON by trying to undo (rollback) a particular putback, so that it would vanish completely from the repository. We didn't know to roll back OpenGrok's snapshot repository as well. So the next time OpenGrok tried to pull from the Mercurial onnv-gate, it created a branch that had to be merged. This was not something OpenGrok was prepared for, so the snapshot tree was not updated. After several days, we started getting complaints from ON teams who couldn't find their recent putbacks in OpenGrok. We figured out the problem, replaced OpenGrok's snapshot repositories, and vowed not to undo/rollback any future putbacks.
So that's the "big picture" of what the SCM Migration project is working on. If you've been frustrated by how long things are taking, well, we're not happy about it, either. Our hope is that by keeping the entire picture in mind, we will not have any serious problems when we finally do move.