"You're a...Gatekeeper? Uh huh. What's a Gatekeeper?"

(You might want to read Kelly O'Hair's "OpenJDK Mercurial Wheel" blog entry before reading this.)

Besides my normal job as a developer in the Java Security and Networking (JSN) and the Java Tools/Libraries (TL) groups, I have served from time to time as the "Gatekeeper" (also known as an "Integrator") for the JSN group.  Some of you have asked on the IRC channel #openjdk, "What's a Gatekeeper?"  Good question.  Ask any of the N gatekeepers, and you'll get N different answers.

Since I'm a musician by night, I had to distill it down to a song that's been running through my head this morning (with apologies to Donny and Marie Osmond, as I'm an unfortunate product of the 1970s):

I'm a little bit Developer,
And I'm a little bit Release Engineer.
I've got a little bit of SHA-1 and Blowfish,
With a whole lot of Makefiles in my soul.
Don't know if it's good or bad,
but I know I love it so...

Ok, scratch those last two lines.  Gatekeeper is an under-recognized, thankless, but absolutely necessary (IMHO) job.

A quick bit of terminology:

  • MASTER repository:  The master workspace from which products are eventually built.  All changes eventually filter into this set of repositories.  Also known as "The Golden Source", "Top Level Workspace", or more concretely:

    http://hg.openjdk.java.net/jdk7/jdk7

  • Group repository:  The repository that individual developers clone and where they eventually submit their work for inclusion in the MASTER repository.  Developers generally work in the group repository assigned to their functional group (e.g. JSN, TL, 2D).  In my case:

    http://hg.openjdk.java.net/jdk7/jsn

  • Repository Updates:  The process by which changes are merged and placed into a shared repository.  This is performed by a developer in the case of Dev/Group repositories, or by a special person called a Gatekeeper (below) who merges the Group/MASTER repositories.  There are two types of updates:

      • Rebase:  The process of bringing down code from higher-level repositories to local repositories (e.g. MASTER to JSN, or JSN to DEV).  Rebasing is done fairly frequently (I try to do it daily).
      • Integration:  The process of pushing up changes from a repository to a higher-level repository (e.g. DEV to JSN, or JSN to MASTER).  The gatekeeper is sometimes also known as an "Integrator" as they will be "integrating" changes into the MASTER.  Gatekeeper integrations are generally done every two weeks or as needed.
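
To make the direction of the two update types concrete, here is a minimal Python sketch. The DEV/JSN/MASTER ordering comes from the examples above; the numeric levels and the function name are illustrative assumptions, not part of any OpenJDK tooling.

```python
# Illustrative model only: the numeric levels and names are assumptions for this sketch.
LEVELS = {"DEV": 0, "JSN": 1, "MASTER": 2}  # higher number = higher-level repository


def update_type(source: str, target: str) -> str:
    """Classify a repository update by the direction the changes travel."""
    src, dst = LEVELS[source], LEVELS[target]
    if src == dst:
        raise ValueError("source and target are at the same level")
    # Changes flowing down the hierarchy are a rebase; flowing up, an integration.
    return "rebase" if src > dst else "integration"


print(update_type("MASTER", "JSN"))  # rebase
print(update_type("JSN", "MASTER"))  # integration
```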

So, what is a Gatekeeper?  At a high level, we're the technical liaisons between specific development groups and Release Engineering (RE).  It's our job to take changes from developers via the group repositories, merge them with the MASTER repositories, then build/test those changes.  If all looks good, we move the changes into the appropriate repositories.  If all doesn't look good, I put on my archaeologist hat and go figure out what broke, and why.  Hearing from your gatekeeper is not the way you want to start your day.


[Image: Gate layout]

Why have a gatekeeper?  Why not just integrate directly into the MASTER?  A couple of reasons:

  1. Less breakage:  Assume for a second that developers could integrate their code directly to the MASTER repositories.  If new code breaks the build, EVERYONE is affected, not just a smaller subgroup.

  2. Let changes bake with your technical peers before releasing to the world.  Each gatekeeper is responsible for a specific functional area.  (S)he can run tests specific to that area, and find problems before they ever reach the MASTER.  As brilliant as our RE organization is, they are not going to want to investigate why obj.toString() is now throwing a NoSuchMethodError.

  3. Quiescent Source Base during Integrations:  Changesets need to fit well with previous changesets.  In a project this large, the build/test cycle can take several hours.  Thus, the source base needs to be unchanging during this period so Gatekeepers can build/test/integrate with the current bits.  If First-Come-First-Served integrations were allowed, your careful build/test cycle could be invalidated by someone changing a single line somewhere.  You could hope/pray that your changes are compatible, but that's far too risky.  Thus Gatekeepers are assigned specific time slots (currently 12 hours), and are guaranteed that only they have write access to the MASTER repositories during that time.  This hierarchical model has proved the most expedient for the large number of changes that happen in this project.
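
The time-slot rule in point 3 can be sketched in a few lines of Python. The 12-hour slot length comes from the text; the schedule entries, the dates, and both helper functions are hypothetical and only illustrate the idea of one writer per slot.

```python
from datetime import datetime, timedelta

SLOT_LENGTH = timedelta(hours=12)  # slot length quoted in the text

# Hypothetical schedule: (slot start, gatekeeper group that owns the slot).
SCHEDULE = [
    (datetime(2009, 4, 20, 0, 0), "JSN"),
    (datetime(2009, 4, 20, 12, 0), "TL"),
]


def slot_owner(now: datetime):
    """Return the group holding the MASTER write slot at `now`, or None."""
    for start, group in SCHEDULE:
        if start <= now < start + SLOT_LENGTH:
            return group
    return None


def may_push_to_master(group: str, now: datetime) -> bool:
    """Only the owner of the current slot may write to MASTER."""
    return slot_owner(now) == group


print(may_push_to_master("JSN", datetime(2009, 4, 20, 3, 0)))   # True
print(may_push_to_master("JSN", datetime(2009, 4, 20, 13, 0)))  # False
```

Because nobody else can write during the slot, a build/test cycle that started at the beginning of the slot is still valid when the push happens.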

So what do I actually do?  Again, each gatekeeper will give different answers, but basically, here's what the job involves:

  1. Provide a stable gate for my developers (also affectionately known as "gatelings").

  2. Merge and build the repositories nightly on as many platforms as possible (solaris-sparc/solaris-sparcv9, solaris-i586/solaris-amd64, linux-i586/linux-amd64, and windows-i586).  My builds include most repositories (jdk, deploy, langtools, jax*, etc.).  I generally don't build docs/install/etc.

    Most gatelings generally only need to build the 1-2 repositories they are directly modifying, but sometimes their changes will incompatibly affect other workspaces.  Gatekeepers are the last line of defense before that code hits the MASTER, so I need to assure myself that your changes won't break the rest of the product.  Developers should be doing this themselves, but it's a sad fact that not everyone is as responsible when it comes to building/testing their code.

  3. Test nightly.  Depending on the functional group, there are several test suites available.  In my case, I have the developer unit/regression tests (in the test subdirectory of each repository), the JCK, and the internal Sun Software Quality Engineering  (SQE) tests.

  4. About a week before an integration, provide builds to the SQE teams.  They have a procedure called "Pre-Integration Testing" (aka PIT), which is a much more involved testing process than the gatekeepers could run nightly.  Gatekeepers normally build/test on a specific reference OS (e.g. Microsoft Windows Server 2000), but the SQE teams will test on many of the other available platforms (e.g. Windows XP, Windows Vista, etc.).  If all goes well, they will issue a test report called a "PIT Certificate," which is their blessing that the expected changes don't appear to break anything.

  5. Copy changesets from one set of repositories to the other.  (Rebase/Integration)

  6. Update bug status to reflect the changesets just put into the MASTER.

  7. Breathe a sigh of relief, I'm done!  (For this week anyways...).   Oh, and...

    Pray the integrator with the next slot does not call you at home in 6 hours, especially if you have the afternoon/evening integration.

In theory, the MASTER should never be broken.  We work pretty hard to make sure that doesn't happen, and it's pretty rare.  The group repositories...well, let's just say it does happen on occasion.  Don't make me come find you because your integration broke the build; neither of us will have a good day.  ;)

That's pretty much it.  Gatekeeping officially takes about 50% of my time.  (I always crack myself up when I say that!)

Only true gearheads/propellerheads need continue.


What's that you say, you want even more specifics?  (curious little bugger, aren't you!  ;) )

Why have yet another set of repositories for the build/test/integrate phase?  Simple, the process can be done in a disposable area (the blue 'JSN Gatekeeper' box above).  In case of problems like a bad merge or other breakage, we can simply blow away these repositories and start over.

Ok...how about some pseudo code to make this a little more concrete?  Understand I'm doing a lot of handwaving here, otherwise I'll be here all night simplifying the scripts I currently use.  (Each gatekeeper usually has their own set of scripts, because each gate tends to have different requirements.)

If I'm doing an integration build to the MASTER, wait until the gate has been released by the previous gatekeeper.  Pray the previous gatekeeper didn't introduce any problems.

% hg fclone JSN ws
% hg pull MASTER ws
% cd ws
% hg fmerge/fupdate
% hg foutgoing JSN
% hg foutgoing MASTER
% webrev JSN MASTER

% gnumake long_list_of_options all                                     # on all platforms
% cd jdk/test; jtreg -testjdk:path_to_built_jdk long_list_of_testdirs  # on all platforms
% cd JCK6a; javatest -cp:path_to_JCK long_list_of_tests                # on all platforms
% kickOffInternalSQETests.sh                                           # on all platforms

I generally build both the product and OpenJDK variants.  At this point, examine all the build logs.  Everything needs to be clean.  If not, find out why.

If we're creating PIT builds, send these bits to the SQE teams for testing.  Wait for your PIT Certificate, then repeat the steps above when ready to integrate.

If I'm rebasing (roughly nightly):  

% hg foutgoing JSN               # Make sure you're putting the expected bits back.
% hg fpush JSN

If everything looks good and it's integration day:

% hg foutgoing MASTER            # Make sure you're putting the expected bits back.
% hg fpush MASTER
% mail -s "JSN Integration complete" jdk-gk@openjdk.java.net

There are several variations on the theme, but those are the main steps.
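
The decision flow behind those commands can be summarized in a short Python sketch. Everything here is a stand-in: the `StepResult` type, the step names, and the `next_action` function are hypothetical, and the real work is done by the hg, gnumake, jtreg and javatest invocations shown above.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    """Outcome of one build or test step on one platform (hypothetical type)."""
    step: str       # e.g. "build", "jtreg", "jck", "sqe"
    platform: str   # e.g. "linux-i586"
    clean: bool     # True if the log shows no failures


def next_action(results, integration_day: bool) -> str:
    """Decide what the gatekeeper does once the build/test cycle has finished."""
    failures = [r for r in results if not r.clean]
    if failures:
        # Everything needs to be clean; if not, find out why before pushing anything.
        bad = ", ".join(f"{r.step}@{r.platform}" for r in failures)
        return f"investigate: {bad}"
    # A clean run goes to MASTER on integration day, otherwise back to the group forest.
    return "push to MASTER" if integration_day else "push to JSN"


results = [
    StepResult("build", "solaris-sparc", True),
    StepResult("jtreg", "linux-i586", False),
]
print(next_action(results, integration_day=True))  # investigate: jtreg@linux-i586
```

The sketch leaves out the PIT Certificate wait and the notification mail; it only shows that a single unclean log on any platform blocks both kinds of push.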

I hope that gives you a little more background about what it is we do.  There's a lot of gatekeeper lore I could regale you with, such as the noose made out of a power cable reserved for folks guilty of committing really heinous acts of carelessness, but I'll save that for another post.

Comments:

Hi Brad,

Thank you very much for this interesting blog. Unfortunately, there is still one thing I don't understand: where does the "gate" repository (i.e. http://hg.openjdk.java.net/jdk7/jsn-gate in this case) enter the picture in your example?

If I understand your blog, Kelly's blogs and the OpenJDK documentation correctly, the gate repositories are "read/write" while the group repositories are "read only". So developers should only push to the gates and pull from the usual group repositories. After some testing, the gate repository pushes submitted changes forward into the corresponding group repository (at least that's how it is explained in "Producing a Changeset" at http://www.openjdk.org/guide/producingChangeset.html).

The "Producing a Changeset" document mentions that before a push to a gate, the corresponding gate should be pulled and merged, in order to avoid build problems in the gate. This contradicts the information Kelly gives in his "Mercurial OpenJDK Questions" blog at http://blogs.sun.com/kto/entry/mercurial_openjdk_questions, where he writes: "..you cannot pull from a gate..".

I now have the following questions:

1. How is the gate rebased: does it pull from its corresponding group repository before pushing new changes?
2. If a group repository is pushed into the master, does this also happen through a gate repository? Is there a pull and merge from the master first in order to rebase? Or does this rebasing happen from the master-gate?
3. If I understood you right, you rebase your group repository on a daily basis. But how is the group's gate repository (e.g. jsn-gate in this example) rebased?

I would be really happy, if you could answer some of my questions.

Thank you and best regards,
Volker

Posted by Volker H. Simonis on April 22, 2009 at 02:49 AM PDT #

It's actually fairly simple, and unfortunately the term "gate" does get overloaded. In the older Teamware world, there was just a single physical workspace, which we simply called a "gate". The term has stuck when writing about our newer processes.

To restate, each OpenJDK work area (TL/JSN/MASTER) is made up of several Mercurial repositories (aka a forest of repositories, including hotspot/jdk/etc.). For each work area, there are actually two complete sets of these forests which are almost always exact copies of each other: <name> and <name>-gate. The only time they're not the same is when we're in the process of doing a push.

When setting up our OpenJDK mercurial world, we felt it a good idea to impose sanity checks (jcheck) on the incoming changesets. To avoid the impact of unsuccessful pushes on people just trying to read a valid forest, a secondary forest is used that only accepts pushes and does those checks, validating the changesets before they actually move to the read-only forest. Only good changesets are pushed forward into the read-only forest. If a changeset is bad, it is throttled with no impact to the read-only forest.

Like ordinary developers, we gatekeepers pull from <name>, do our work, then push back into <name>-gate. As the push is underway, the sanity checks are done at the <name>-gate: checking that the incoming changesets aren't on a blacklist, that there are no duplicate bugids, that the users are in the contributors database, etc. If all checks pass, then the changesets are immediately pushed from <name>-gate to <name>.

So in my example above, what I was actually doing was:

rebase: pushes changesets from MASTER to JSN.
% hg fpush <JSN>-gate

integration: pushes changesets from JSN to MASTER.
% hg fpush <MASTER>-gate
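
The gate behaviour described above can be modelled in a short Python sketch. This is not jcheck's implementation: the function name, the dictionary keys, and the sample data are all hypothetical, and rejecting the whole push on the first bad changeset is an assumption made to keep the example small.

```python
def push_to_gate(incoming, forest, blacklist, contributors):
    """Model a push to <name>-gate: validate everything, then forward or reject.

    `incoming` is a list of dicts with hypothetical keys "id", "bugid", "user".
    `forest` is a list standing in for the read-only <name> forest.
    Returns a tuple (accepted, reason).
    """
    seen_bugids = {cs["bugid"] for cs in forest}
    for cs in incoming:
        if cs["id"] in blacklist:
            return False, f"changeset {cs['id']} is blacklisted"
        if cs["bugid"] in seen_bugids:
            return False, f"duplicate bugid {cs['bugid']}"
        if cs["user"] not in contributors:
            return False, f"unknown contributor {cs['user']}"
        seen_bugids.add(cs["bugid"])
    # All checks passed: only now do the changesets reach the read-only forest.
    forest.extend(incoming)
    return True, "ok"


# Hypothetical data: one changeset already in the forest, one incoming duplicate.
jsn = [{"id": "a1", "bugid": "6000001", "user": "dev1"}]
ok, why = push_to_gate(
    [{"id": "b2", "bugid": "6000001", "user": "dev1"}],
    jsn, blacklist=set(), contributors={"dev1"},
)
print(ok, why)   # False duplicate bugid 6000001
print(len(jsn))  # 1
```

The point of the model is the ordering: the read-only forest is only touched after every check has passed, so a failed push never disturbs people who are just pulling.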

> 3. If I understood you right, you rebase your group repository on a daily basis.

Mostly daily. Could be every so often depending on how much traffic there was.

> But how is the group's gate repository (e.g. jsn-gate in this example) rebased?

Hopefully now answered above.

Posted by Brad Wetmore on April 22, 2009 at 03:59 AM PDT #

Hi Brad,

Please could you help me by telling me what concepts/languages I need to learn in order to be a Java gatekeeper? ( I know, Java is a must, but does the job require really good programming skills?)

Thanks,
S

Posted by S on October 31, 2010 at 12:51 PM PDT #

S,

Some of the concepts/languages that I would suggest:

Be very detail and process oriented (both define/follow process)
Strong Java and Shell scripting (knowing a little about a lot of areas)
Fast Debug/Isolation skills (you don't have to fix things, but at least be able to engage the right people)

Posted by Bradford Wetmore on November 01, 2010 at 05:31 AM PDT #

About

Brad currently works in the Java Security and Network Group, Java Standard Edition.
