"You're a...Gatekeeper? Uh huh. What's a Gatekeeper?"
By Brad Wetmore on Feb 11, 2008
(You might want to read Kelly O'Hair's "OpenJDK Mercurial Wheel" blog entry before reading this.)
Besides my normal job as a developer in the Java Security and Networking (JSN) and the Java Tools/Libraries (TL) groups, I have been tasked from time to time as the "Gatekeeper" (also known as an "Integrator") for the JSN group. Some of you have asked on the IRC channel #openjdk, "What's a Gatekeeper?" Good question. Ask any of the N gatekeepers, and you'll get N different answers.
Since I'm a musician by night, I had to distill it down to a song that's been running through my head this morning (with apologies to Donny and Marie Osmond, as I'm an unfortunate product of the 1970s):
I'm a little bit Developer,
And I'm a little bit Release Engineer.
I've got a little bit of SHA-1 and Blowfish,
With a whole lot of Makefiles in my soul.
Don't know if it's good or bad,
but I know I love it so...
Ok, scratch those last two lines. Gatekeeper is an under-recognized, thankless, but absolutely necessary (IMHO) job.
A quick bit of terminology:
MASTER repository: The master workspace from which products are eventually built. All changes eventually filter into this set of repositories. Also known as "The Golden Source", "Top Level Workspace", or more concretely:
Group repository: The repository that individual developers clone and where they eventually submit their work for inclusion in the MASTER repository. Developers generally work in the group repository assigned to their functional group (e.g. JSN, TL, 2D, etc.). In my case:
Repository Updates: The process by which changes are merged and placed into a shared repository. This is performed by a developer in the case of Dev/Group repositories, or by a special person called a Gatekeeper (below) who merges the Group/MASTER repositories. There are two types of updates:
- Rebase: The process of bringing down code from higher-level repositories to local repositories (e.g. MASTER to JSN, or JSN to DEV). Rebasing is done fairly frequently (I try to do it daily).
- Integration: The process of pushing up changes from a repository to a higher-level repository (e.g. DEV to JSN, or JSN to MASTER). The gatekeeper is sometimes also known as an "Integrator" as they will be "integrating" changes into the MASTER. Gatekeeper integrations are generally done every two weeks or as needed.
So, what is a Gatekeeper? At a high level, we're the technical liaisons between specific development groups and Release Engineering (RE). It's our job to take changes from developers via the group repositories, merge them with the MASTER repositories, then build/test those changes. If all looks good, I move the changes into the appropriate repositories. If all doesn't look good, I put on my archaeologist hat, go figure out what broke, and why. Hearing from your gatekeeper is not the way to start your day.
Why have a gatekeeper? Why not just integrate directly into the MASTER? A couple of reasons:
Less breakage: Assume for a second that developers could integrate their code directly to the MASTER repositories. If new code breaks the build, EVERYONE is affected, not just a smaller subgroup.
Let changes bake with your technical peers before releasing to the world. Each gatekeeper is responsible for a specific functional area. (S)he can run tests specific to that area, and find problems before they ever reach the MASTER. As brilliant as our RE organization is, they are not going to want to investigate why obj.toString() is now throwing a NoSuchMethodError.
Quiescent Source Base during Integrations: Changesets need to fit well with previous changesets. In a project this large, the build/test cycle can take several hours. Thus, the source base needs to be unchanging during this period so Gatekeepers can build/test/integrate with the current bits. If First-Come-First-Served integrations were allowed, your careful build/test cycle could be invalidated by someone changing a single line somewhere. You could hope/pray that your changes are compatible, but that's far too risky. Thus Gatekeepers are assigned specific time slots (currently 12 hours), and are guaranteed only they have write-access to the MASTER repositories during that time. This hierarchy model has proved the most expedient for the large numbers of changes that happen in this project.
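To make the "only one gatekeeper writes at a time" idea concrete, here is a minimal sketch of an exclusive gate claim. It is not how the real gate is enforced (the post does not say); the lock path and the `claim_gate`/`release_gate` names are hypothetical, and the only trick it relies on is that `mkdir` fails if the directory already exists.

```shell
#!/bin/sh
# Hypothetical lock location; a real gate would be enforced on the server side.
GATE_LOCK="${GATE_LOCK:-/tmp/master-gate.lock}"

# claim_gate OWNER: succeed only if nobody else currently holds the gate.
claim_gate() {
    # mkdir is atomic: exactly one caller can create the directory.
    if mkdir "$GATE_LOCK" 2>/dev/null; then
        echo "$1" > "$GATE_LOCK/owner"
        return 0
    fi
    echo "gate is held by $(cat "$GATE_LOCK/owner" 2>/dev/null)" >&2
    return 1
}

# release_gate OWNER: only the current holder may release the gate.
release_gate() {
    [ "$(cat "$GATE_LOCK/owner" 2>/dev/null)" = "$1" ] || return 1
    rm -f "$GATE_LOCK/owner" && rmdir "$GATE_LOCK"
}
```

With this sketch, `claim_gate JSN` followed by `claim_gate TL` leaves TL waiting until `release_gate JSN` has run, which mirrors waiting for the previous gatekeeper to release the gate.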
So what do I actually do? Again, each gatekeeper will give different answers, but basically, here's what the job involves:
Provide a stable gate for my developers (also affectionately known as "gatelings").
Merge and build the repositories nightly on as many platforms as possible (solaris-sparc/solaris-sparcv9, solaris-i586/solaris-amd64, linux-i586/linux-amd64, and windows-i586). My builds include most repositories (jdk, deploy, langtools, jax*, etc.). I generally don't build docs/install/etc.
Most gatelings generally only need to build the 1-2 repositories they are directly modifying, but sometimes their changes will incompatibly affect other workspaces. Gatekeepers are the last line of defense before that code hits the MASTER, so I need to assure myself that your changes won't break the rest of the product. Developers should be doing this themselves, but it's a sad fact that not everyone is as responsible when it comes to building/testing their code.
Test nightly. Depending on the functional group, there are several test suites available. In my case, I have the developer unit/regression tests (in the test subdirectory of each repository), the JCK, and the internal Sun Software Quality Engineering (SQE) tests.
About a week before an integration, provide builds to the SQE teams. They have a procedure called "Pre-Integration Testing" (aka PIT), which is a much more involved testing process than the gatekeepers could run nightly. Gatekeepers normally build/test on a specific reference OS (e.g. Microsoft Windows Server 2000), but the SQE teams will test on many of the other available platforms (e.g. Windows XP, Windows Vista, etc.). If all goes well, they will issue a test report called a "PIT Certificate," which is their blessing that the expected changes don't appear to break anything.
Copy changesets from one set of repositories to the other. (Rebase/Integration)
Update bug status to reflect the changesets just put into the MASTER.
Breathe a sigh of relief, I'm done! (For this week anyways...). Oh, and...
Pray the integrator with the next slot does not call you at home in 6 hours, especially if you have the afternoon/evening integration.
In theory, the MASTER should never be broken; we work pretty hard to make sure that doesn't happen, and breakage is pretty rare. The group repositories...well, let's just say it does happen on occasion. Don't make me come find you because your integration broke the build; neither of us will have a good day. ;)
That's pretty much it. Gatekeeping officially takes about 50% of my time. (I always crack myself up when I say that!)
Only true gearheads/propellerheads need continue.
What's that you say, you want even more specifics? (curious little bugger, aren't you! ;) )
Why have yet another set of repositories for the build/test/integrate phase? Simple: the process can be done in a disposable area (the blue 'JSN Gatekeeper' box above). In case of problems like a bad merge or other breakage, we can simply blow away these repositories and start over.
Ok...how about some pseudo code to make this a little more concrete? Understand I'm doing a lot of handwaving here, otherwise I'll be here all night simplifying the scripts I currently use. (Each gatekeeper usually has their own set of scripts, because each gate tends to have different requirements.)
If I'm doing an integration build to the MASTER, wait until the gate has been released by the previous gatekeeper. Pray the previous gatekeeper didn't introduce any problems.
% hg fclone JSN ws
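% cd ws
% hg fpull MASTER # bring MASTER's changesets into every repository in the forest
% hg fmerge/fupdate # merge where both sides changed, otherwise just update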
% hg foutgoing JSN
% hg foutgoing MASTER
% webrev JSN MASTER
% gnumake long_list_of_options all # on all platforms
% cd jdk/test; jtreg -testjdk:path_to_built_jdk long_list_of_testdirs # on all platforms
% cd JCK6a; javatest -cp:path_to_JCK long_list_of_tests # on all platforms
% kickOffInternalSQETests.sh # on all platforms
I generally build both the product and OpenJDK variants. At this point, examine all the build logs. Everything needs to be clean. If not, find out why.
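The "examine all the build logs" step can be partly automated. The following is a minimal sketch, not one of the actual gatekeeper scripts: the `check_logs` name, the `*.log` layout, and the search patterns are all assumptions for illustration, and a real gate would need a far more careful list of patterns.

```shell
#!/bin/sh
# check_logs DIR: print suspicious lines from DIR/*.log.
# Returns 0 if the logs look clean, 1 if something matched,
# 2 if the logs could not be read (e.g. no *.log files at all).
check_logs() {
    status=0
    # /dev/null is added so grep always prefixes matches with the file name.
    # The patterns are illustrative guesses only.
    grep -n -i -E 'error|failed' "$1"/*.log /dev/null || status=$?
    case $status in
        0) echo "logs are NOT clean" >&2; return 1 ;;
        1) echo "logs are clean"; return 0 ;;
        *) echo "could not read logs in $1" >&2; return 2 ;;
    esac
}
```

A scan like this only narrows down where to look; a match still means putting on the archaeologist hat and reading the log.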
If we're creating PIT builds, send these bits to the SQE teams for testing. Wait for your PIT Certificate, then repeat the steps above when ready to integrate.
If I'm rebasing (roughly nightly):
% hg foutgoing JSN # Make sure you're putting the expected bits back.
% hg fpush JSN
If everything looks good and it's integration day:
% hg foutgoing MASTER # Make sure you're putting the expected bits back.
% hg fpush MASTER
% mail -s "JSN Integration complete" firstname.lastname@example.org
There are several variations on the theme, but those are the main steps.
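As one such variation, the "check what is outgoing, then push" pair can be wrapped so the push only happens after an explicit confirmation. This is a minimal sketch under assumptions: the `integrate` function and the `HG` override variable are hypothetical, it uses only the `foutgoing` and `fpush` commands shown above, and it assumes `foutgoing` exits non-zero when it fails.

```shell
#!/bin/sh
# HG can be overridden (for a dry run or a stub); defaults to the real hg.
HG="${HG:-hg}"

# integrate TARGET: show the outgoing changesets, push only after a typed "yes".
integrate() {
    target=$1
    # Assumption: a non-zero exit here means something went wrong, so stop.
    "$HG" foutgoing "$target" || return 1
    printf 'Push these changesets to %s? (yes/no) ' "$target"
    read -r answer
    if [ "$answer" != "yes" ]; then
        echo "aborted, nothing pushed"
        return 1
    fi
    "$HG" fpush "$target"
}
```

Typing anything other than `yes` leaves the target untouched, which is the cheap way to avoid putting back unexpected bits.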
I hope that gives you a little more background about what it is we do. There's a lot of gatekeeper lore I could regale you with, such as the noose made out of a power cable reserved for folks guilty of committing really heinous acts of carelessness, but I'll save that for another post.