Wednesday Dec 12, 2007

Mercurial OpenJDK Questions

Just some misc Q&A on Mercurial and the decisions made in transitioning OpenJDK to it. Hopefully this makes sense to people.

The OpenJDK repositories are at

Why Mercurial? See the analysis done by the Solaris team for details as to why Mercurial. From what I have read and seen, the two choices for a fast, good DSCM (Distributed SCM) were Mercurial or Git (both seem like excellent systems). Using the same SCM as Solaris made a great deal of sense. So far, the remote people in places where the internet connections are slow love Mercurial. After an initial clone is created, the push/pull actions are very fast, since only the actual changes or diffs are effectively going down the wire (so to speak). Clones onto local disk and multiple clones on the same disk are very fast and efficient on Solaris and Linux, and Mercurial works on Windows too. The ability to pull changes via http and push changes via ssh is a major advantage too. And of course, it's "open source". ;^)
Why no imported history from TeamWare or the posted OpenJDK SubVersion repositories? Legally we had constraints on what history we could even try to provide. Technically, importing history from one SCM to another is not as easy as it sounds. The ideal history is the history of the changesets actually created by the developers, and creating any other kind of history (per snapshot or per promotion) seemed problematic to me. If the end result was that the old history details were left in the old SCM data, and that was guaranteed to be accurate history data, then it seemed right to leave it there and not confuse the issue. I'm sure there are people that would disagree with me on this. TeamWare doesn't have a changeset model; it has a loose 'group of files' putback model that does not record enough to reconstruct the kind of changesets Mercurial wants. History conversion from something like CVS, SubVersion, or a more changeset-like SCM would be another story.
Why a forest and not one big repository? Unlike Solaris, we already had broken up the JDK sources into separate workspaces, e.g. hotspot and j2se originally; later we split j2se into jdk, langtools, corba, jaxp, and jaxws. Each of these separate repositories represents a pile of sources managed by a separate team or delivered independently from the enclosing jdk product. Having them available as independently buildable repositories was considered a major advantage, and more importantly, some of the developer teams really wanted to keep themselves separate. It's that 'software silo' concept where developers don't want anything changing on them except what's in their immediate area. ;^) From a Release Engineering point of view, or for someone that has to build the whole thing, this forest of repositories is a royal pain; a single repository would have been preferred. But the Mercurial forest extension seemed to provide an answer, so we went with the forest. Could the forest extension and Mercurial itself handle these nested repositories better? I'm sure they could, and will over time. I would not recommend creating separate repositories lightly; they need to be independently buildable and relatively free of dependencies on the other repositories.
Why the dual repository setup (gates)? When someone does a push to a Mercurial repository, and before the hooks are run that might rollback those changesets, there is a window of time where someone pulling from that repository could get these rolled back changesets. In addition, we have seen situations where people have managed to 'push -f' multiple heads (forgot to merge) to a shared repository, creating very confused co-workers and potentially a chain of silly merge changesets. So we only have people push to a gate (you cannot pull from a gate), where the changesets are checked and verified, and a simple push happens to the matching pull gate. Nobody can ever pull multiple heads, and nobody gets changesets that failed validation. So having the dual setup will allow us to better protect the integrity of the repository.
Why so many different sets of repositories? By having each team keep their changes isolated until they integrate into a master area, they will be more likely to find their own mistakes and regressions before everyone else. Testing is focused on the areas that have changed and can validate the entire team's contribution to the product. In addition, builds of these various team areas (if archived) can be used to isolate regressions, quickly narrowing it down to the team, and then the changesets can be used to isolate it down to the individual author, hopefully being able to quickly isolate and correct the regression.
Why not subsets of repositories for the teams? Good question. Initially we thought we could make the team areas sparse sets, but the complete jdk is a forest or set of repositories, and a subset doesn't provide the teams with a complete set of jdk sources. So to make our (the initial Mercurial Transition engineers) lives easier each team got a complete set. Having a complete set of repositories provides the teams the possibility of building a complete jdk with a known quantity. This situation may change as we gain more experience with Mercurial and forests.
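As a rough illustration of the dual repository (gate) idea described above, here is one way a push gate could be wired to its matching pull repository using two standard Mercurial hooks: pretxnchangegroup (a non-zero exit rolls the incoming changesets back) and changegroup (runs once they have been accepted). This is only a sketch and not the actual OpenJDK gate configuration; the function name install_gate_hooks, the validation script, and all of the paths are hypothetical.

```shell
#!/bin/sh
# Sketch only: append gate hooks to a push gate's .hg/hgrc.
# install_gate_hooks, the validator script and all paths are hypothetical.
install_gate_hooks() {
    gate_repo=$1      # repository people push to
    pull_repo=$2      # repository people pull from
    validator=$3      # script that exits non-zero to reject a push

    cat >> "$gate_repo/.hg/hgrc" <<EOF
[hooks]
# Runs before the pushed changesets become permanent; failure rolls them back.
pretxnchangegroup.validate = $validator
# Runs after a successful push; forwards the accepted changesets.
changegroup.forward = hg push $pull_repo
EOF
}
```

It would be called once per gate, for example install_gate_hooks /path/to/gate /path/to/pull-repo /path/to/validate.sh. Keeping people from pulling directly from the gate is a separate access-control matter that this sketch does not cover.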

I'll add more to this as time goes on, assuming people find it useful. Add your questions to the comments, I'll try and answer them.


Friday Nov 16, 2007

Transitioning from TeamWare to Mercurial

Just thought I would pull together some basic guidelines for anyone transitioning from TeamWare workspaces to Mercurial repositories.

I'm assuming in this information that multiple branches will not be maintained in the Mercurial repositories. Mercurial will allow you to maintain multiple branches of development in a single repository. It's my opinion that multiple branches of development can be handled more safely and reliably by just having a separate clone for each branch. Anyone who has experienced the equivalent feature in SCCS files, where separate revision trees can be maintained on a file, will probably agree.

TeamWare Basics

Skip this section if you are a regular TeamWare user.

A TeamWare workspace consists of a directory of files under SCCS control; each file is managed individually. Throughout the TeamWare workspace are directories called SCCS that contain s.filename files, each of which holds the original file as it was first entered into the SCCS directory, plus the deltas needed to convert that original file to each of the increasing numeric revisions of the file. A read-only copy of the file is kept in the parent directory of the SCCS directory; editing the file requires you to use the 'sccs edit filename' command. A 'sccs delget filename' command is used to define a new revision of the file. Each revision of a file can contain a comment. SCCS manages source files and revisions to source files. TeamWare manages batches of SCCS files. TeamWare features and SCCS features get blurred sometimes, but SCCS can be, and often is, used independently of TeamWare.

TeamWare allows for any number of workspaces with a child/parent relationship and also has a very nice code merging tool called filemerge. TeamWare allowed you to have partial workspaces. The top of the workspace contains a Codemgr_wsdata directory that holds various TeamWare book-keeping files. Its drawbacks, besides not being open source, are around performance and the lack of features like changesets and revision markings (tags). Over the years the shortcomings have been somewhat corrected with various scripts and tools written by various teams.

Mercurial For TeamWare Users

Mercurial (like TeamWare) allows you to have any number of repositories (assume repository==workspace) and allows you to access repositories via NFS paths or with ssh:// or even http:// paths.

In many ways, at a high functional level, your Mercurial experience will be similar to the experiences you have had with TeamWare, but the details are vastly different, especially if you have become dependent on the specific format of a TeamWare workspace or the contents of the SCCS files.

At the very top of the Mercurial repository is a hidden directory called .hg which holds the Mercurial book-keeping files plus a secure set of all "committed" sources and changes to those sources.

Unlike TeamWare, where the visible source files were read-only until you explicitly used 'sccs edit' on them, the Mercurial "working set" sources are all read-write, and you are free to edit these files at any time. So by default you will have a working set of read-write sources and the more permanent committed files that are saved in your .hg directory.

With Mercurial, all changes to a repository are done via "changesets", which are originally created with an 'hg commit' somewhere along the line. An ideal changeset would be all the file changes/renames/deletes/adds for one particular bug, but a changeset can be small or very large. New files, deleted files, and renamed files must all be done via a changeset. You use 'hg commit' to commit file changes into a "changeset" in your own repository; it doesn't go anywhere unless someone pulls it from your repository, or you push the changeset somewhere. The 'hg pull' is like the TeamWare bringover command, and 'hg push' is like the TeamWare putback command, well sort of. Both 'hg push' and 'hg pull' move "changesets" (committed changes) to and from the .hg directories of repositories. Your working set files are NOT automatically updated when the contents of the .hg directory change (where changesets are kept); you must explicitly run 'hg update' to update your working set files. And it's important to note that with Mercurial you do not "pull or push files" but the changesets or changes to the entire repository. This is very different from TeamWare which manages SCCS files, where you could bringover or putback individual files.

The changeset concept is like a repository-wide SCCS revision number: one changeset id defines the state of the entire repository. A changeset that has no child changesets is called a "head", and there should only be one head, which is also called the tip. But when you do a pull, you often end up with multiple "head" changesets, and the goal is to perform an 'hg merge' and then 'hg commit' a new "merge" changeset that will become the single "head" or "tip". Regardless of whether any specific file changes conflict, a merge changeset will always be needed to get back to one "head".

Roughly Equivalent Command Mappings

NOTE: Optimally, the use of 'hg commit' should be done after all the file adds, deletes, renames, and edits are done. An ideal changeset is one that contains all the changes for a particular feature or bug fix.

| Action | TeamWare | Mercurial |
| --- | --- | --- |
| Create new workspace/repository | workspace create | hg init |
| Create a child workspace/repository | bringover -p parent -w child . | hg clone parent child |
| Add a file | sccs create filename | hg add filename && hg commit |
| Delete a file | workspace filerm filename | hg remove filename && hg commit |
| Rename a file | workspace filemv filename1 filename2 | hg rename filename1 filename2 && hg commit |
| Change a file | sccs edit filename && vi filename && sccs delget filename | vi filename && hg commit |
| Rename a workspace/repository | workspace move oldpath newpath | mv oldpath newpath |
| Delete a workspace/repository | workspace delete -f path | rm -f -r path |
| Verify workspace/repository | workspace check | hg verify |
| Pull changes from another workspace/repository | bringover -p parentpath . && resolve | hg pull parentpath && hg update && hg merge && hg commit |
| Push changes to another workspace/repository | putback -p parentpath . | hg push parentpath |
| Get the name of the workspace/repository | workspace name | hg root |
| Get the name of the default parent workspace/repository | workspace parent | hg paths |
| Resolve merge conflicts | resolve | hg merge |
| Freshen the working set files | sccs get filename | hg update (updates the whole working set, not a single file) |
| Check for incoming changes | bringover -n -p parentpath . | hg incoming parentpath |
| Check on outgoing changes | putback -n -p parentpath . | hg outgoing parentpath |
| Details on changes to a single file | sccs prs filename \|\| sccs prt filename | hg log -v filename && hg annotate filename |
| Viewing file changes | sccs diff filename | hg diff filename |
| Undo file changes | sccs unedit filename | hg revert filename |
| Listing uncommitted edited files | workspace find -c -OR- sccs tell | hg status |
| Listing all managed files | workspace updatenames && cat Codemgr_wsdata/nametable \| sort \| cut -d' ' -f1 | hg locate |
| List a source file annotated with the revisions | Various custom scripts | hg annotate filename |

Using Mercurial Example

Here is a simple example of pulling a repository and making a changeset.

In this example /export2/build_integration/repos/control is a path to a small Mercurial repository.

    # # Step 1: Check hg version
    #   Make sure you have access to the right Mercurial.
    <1> rm -f -r ${HOME}/MercurialExercises/Exercise1
    <2> mkdir -p ${HOME}/MercurialExercises/Exercise1/temp
    <3> hg version
    Mercurial Distributed SCM (version 0.9.3)
    Copyright (C) 2005, 2006 Matt Mackall
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    # # Step 2: Clone a test control repository
    #   Creates your own private repository to play with (may take a few minutes)
    <9> rm -f -r ${HOME}/MercurialExercises/Exercise1/your_repos
    <10> mkdir -p ${HOME}/MercurialExercises/Exercise1/your_repos
    <11> cd ${HOME}/MercurialExercises/Exercise1/your_repos
    <12> hg clone /export2/build_integration/repos/control control
    requesting all changes
    adding changesets
    adding manifests
    adding file changes
    added 1 changesets with 24 changes to 24 files
    24 files updated, 0 files merged, 0 files removed, 0 files unresolved
    # # Step 3: Private clone of the above repository for testing push/pull.
    #   Clone a second private repository.
    <13> cd ${HOME}/MercurialExercises/Exercise1/your_repos
    <14> hg clone control control-work
    24 files updated, 0 files merged, 0 files removed, 0 files unresolved
    # # Step 4: Pretend like something changed and do a 'pull'
    #   Pulling changes now will say 'no changes'.
    <15> cd ${HOME}/MercurialExercises/Exercise1/your_repos/control
    <16> hg pull
    pulling from /export2/build_integration/repos/control
    searching for changes
    no changes found
    <17> cd ${HOME}/MercurialExercises/Exercise1/your_repos/control-work
    <18> hg pull
    pulling from ${HOME}/MercurialExercises/Exercise1/your_repos/control
    searching for changes
    no changes found
    # # Step 5: Make a change in your work area and push it
    #   Demonstrates a simple file change, commit and push of a change.
    <19> cd ${HOME}/MercurialExercises/Exercise1/your_repos/control-work
    <20> echo '#harmless' >> make/Makefile
    <21> hg status
    M make/Makefile
    <22> hg diff
    diff -r 48e79d6618ee make/Makefile
    --- a/make/Makefile	Sun Sep 30 17:55:14 2007 -0700
    +++ b/make/Makefile	Thu Oct 04 19:08:49 2007 -0700
    @@ -442,3 +442,4 @@ sponsors-bringover: sponsors-freshen
     .PHONY: all build what clobber insane freshen \\
     	fastdebug_build debug_build product_build setup
    +#harmless
    <23> hg commit -m "9999999: Fixed world peace"
    <24> hg outgoing
    searching for changes
    changeset:   1:0d317da1a3c4
    tag:         tip
    user:        ${USER}
    date:        Thu Oct 04 19:08:53 2007 -0700
    summary:     9999999: Fixed world peace
    <25> hg push
    pushing to ${HOME}/MercurialExercises/Exercise1/your_repos/control
    searching for changes
    adding changesets
    adding manifests
    adding file changes
    added 1 changesets with 1 changes to 1 files
    # # Step 6: Inspect the push (watch the working set files)
    <26> cd ${HOME}/MercurialExercises/Exercise1/your_repos/control
    <27> tail -1 make/Makefile
    <28> hg status
    <29> hg update
    1 files updated, 0 files merged, 0 files removed, 0 files unresolved
    <30> tail -1 make/Makefile
    #harmless

Using TeamWare and webrev to import a changeset

The tool webrev creates a set of web pages that can be used to browse code changes, but more recent versions also create patch files (in a format that looks like "diff -r -u" output) that can be fed into gpatch (GNU patch) or a similar tool to apply the changes.

In this example /export2/build_integration/ws7/control is a path to a TeamWare integration workspace and /export2/build_integration/repos/control is a path to an equivalent Mercurial repository.

    # # Step 1: Bringover a test control workspace
    #   Creates your own private workspace (may take a few minutes)
    <1> rm -f -r ${HOME}/MercurialExercises/Exercise2/your_ws
    <2> mkdir -p ${HOME}/MercurialExercises/Exercise2/your_ws
    <3> cd ${HOME}/MercurialExercises/Exercise2/your_ws
    <4> bringover -q -p /export2/build_integration/ws7/control -w control .
    Parent workspace: /export2/build_integration/ws7/control
    Child workspace:  ${HOME}/MercurialExercises/Exercise2/your_ws/control
    Examined files: 55
    Bringing over contents changes: 55
    Examined files: 55
    Contents Summary:
          55   create
    # # Step 2: Private child workspace of the above workspace to hold changes.
    <5> cd ${HOME}/MercurialExercises/Exercise2/your_ws
    <6> bringover -q -p control -w control-work .
    Parent workspace: ${HOME}/MercurialExercises/Exercise2/your_ws/control
    Child workspace:  ${HOME}/MercurialExercises/Exercise2/your_ws/control-work
    Examined files: 55
    Bringing over contents changes: 55
    Examined files: 55
    Contents Summary:
          55   create
    # # Step 3: Make changes in the child workspace
    <7> cd ${HOME}/MercurialExercises/Exercise2/your_ws/control-work/make
    <8> sccs edit Makefile
    new delta 1.314
    444 lines
    <9> echo '#harmless' >> Makefile
    <10> sccs delget -y'9999999: Fixed world peace' Makefile
    No id keywords (cm7)
    1 inserted
    0 deleted
    444 unchanged
    No id keywords (cm7)
    445 lines
    <11> cd ..
    <12> putback -n .
    Parent workspace: ${HOME}/MercurialExercises/Exercise2/your_ws/control
    Child workspace:  ${HOME}/MercurialExercises/Exercise2/your_ws/control-work
    Examined files: 55
    Would put back contents changes: 1
    update: make/Makefile
    Examined files: 55
    Contents Summary:
           1   update
          54   no action (unchanged)
    No changes were put back
    <13> webrev -l .
       SCM detected: teamware
     File list from: 'putback -n  .' ...  Done.
          Workspace: ${HOME}/MercurialExercises/Exercise2/your_ws/control-work
    Compare against: ${HOME}/MercurialExercises/Exercise2/your_ws/control
          Output to: ${HOME}/MercurialExercises/Exercise2/your_ws/control-work/webrev
       Output Files:
    		 patch cdiffs udiffs sdiffs frames old new
     Generating PDF: Skipped: no output available
         index.html: Done.
    <14> cat webrev/control-work.patch
    --- old/make/Makefile	Thu Oct  4 19:09:19 2007
    +++ new/make/Makefile	Thu Oct  4 19:09:19 2007
    @@ -442,3 +442,4 @@
     .PHONY: all build what clobber insane freshen \\
     	fastdebug_build debug_build product_build setup
    +#harmless
    # # Step 4: Clone a test control repository
    #   Creates your own private repository to play with (may take a few minutes)
    <15> rm -f -r ${HOME}/MercurialExercises/Exercise2/your_repos
    <16> mkdir -p ${HOME}/MercurialExercises/Exercise2/your_repos
    <17> cd ${HOME}/MercurialExercises/Exercise2/your_repos
    <18> hg clone /export2/build_integration/repos/control control
    requesting all changes
    adding changesets
    adding manifests
    adding file changes
    added 1 changesets with 24 changes to 24 files
    24 files updated, 0 files merged, 0 files removed, 0 files unresolved
    <19> hg clone control control-work
    24 files updated, 0 files merged, 0 files removed, 0 files unresolved
    # # Step 5: Import the patch into the repository
    <20> cd ${HOME}/MercurialExercises/Exercise2/your_repos/control-work
    <21> gpatch -u -p1 < ${HOME}/MercurialExercises/Exercise2/your_ws/control-work/webrev/control-work.patch
    patching file make/Makefile
    <22> hg status
    M make/Makefile
    <23> hg diff
    diff -r 48e79d6618ee make/Makefile
    --- a/make/Makefile	Sun Sep 30 17:55:14 2007 -0700
    +++ b/make/Makefile	Thu Oct 04 19:09:32 2007 -0700
    @@ -442,3 +442,4 @@ sponsors-bringover: sponsors-freshen
     .PHONY: all build what clobber insane freshen \\
     	fastdebug_build debug_build product_build setup
    +#harmless
    <24> hg commit -m "9999999: Fixed world peace"
    <25> hg outgoing
    searching for changes
    changeset:   1:23e0962ced6d
    tag:         tip
    user:        ${USER}
    date:        Thu Oct 04 19:09:36 2007 -0700
    summary:     9999999: Fixed world peace
    <26> hg push
    pushing to ${HOME}/MercurialExercises/Exercise2/your_repos/control
    searching for changes
    adding changesets
    adding manifests
    adding file changes
    added 1 changesets with 1 changes to 1 files

Beginner Gotchas for TeamWare Users

Not setting up your ~/.hgrc file: The name you define in ~/.hgrc with "[ui]" and "username=" is the name that will be permanently recorded in the changesets you create with 'hg commit'. I don't recommend adding your email address in username, but that's up to you; just keep in mind it will be public information when your changesets reach a public repository. TeamWare/SCCS used your system username, but very few TeamWare workspaces were ever made public.
Forgot the 'hg update': After an 'hg pull' (aka bringover), don't forget the 'hg update', or use 'hg pull -u'. The default pull and push just update the changesets and don't update your read-write working set of files. You need to be careful about updating the working set files on shared repositories, since they could get updated while others are viewing them.
Forgot to merge: After an 'hg pull', you need to 'hg update', and if you have local changesets of your own that are not in the parent you will also need to 'hg merge' and 'hg commit'. If you forget you will end up with multiple heads and a more difficult time merging later.
Forgot to commit after a merge (multiple heads): After 'hg merge' you need to 'hg commit'. The merge just prepares you for the 'hg commit' of a merge changeset. If you forget you will end up with multiple heads and a more difficult time merging later.
Making accidental edits: Mercurial working set files are always read-write and ready to edit; no 'sccs edit' action is necessary. Use 'hg status' to monitor what files you have changed.
Using the wrong relative path: File paths supplied to 'hg' commands are relative to the current directory; the TeamWare bringover and putback commands want paths relative to the root of the workspace, regardless of the current directory.
Not defining the file .hgignore for 'hg status': The 'hg status' command tells you what outstanding changes you have in your working set. By default it looks at the entire repository, so if there are files created during a build, you want 'hg status' to ignore those files. Make sure you define the .hgignore file so that 'hg status' will only find files in the directories you want managed by the repository. TeamWare never really helped with the problem of forgetting to 'sccs create' your files; 'hg status' solves this common problem.
Using NFS/UFS for team integration areas: TeamWare for the most part was designed around sharing data via NFS or UFS file systems. Mercurial can work the same way, but when using it for team integration areas we recommend the use of the ssh:// parent path mechanisms described in the Mercurial Book. Unless everyone in the team or group is in the same Unix group, has the same default group, and uses 'umask 2', using NFS/UFS will be problematic. Mercurial obeys the strict Unix rules of file creation and permissions, and over time TeamWare has adjusted itself (perhaps improperly) to avoid the file permission issues you can see with Mercurial.
Too quick on the 'hg commit': Once a changeset is created (the 'hg commit') and pushed, it's pretty permanent. Make sure that before the 'hg commit' happens the changes are correct, reviewed, the right ones, and complete; otherwise you'll be creating yet another changeset to correct your mistakes.
Doing a push with outstanding working set changes: The 'hg push' will not detect any outstanding changes to your working set; it just pushes the existing changesets. ALWAYS use 'hg status' before an 'hg push' to make sure you have created all your changesets with 'hg commit', unless of course you have changes you don't want to push.
Committing a sensitive file: Accidental additions of sensitive source files can be a big problem. Completely removing a sensitive file that has been accidentally added to a repository can be a real problem. Be very careful what files you add to a repository! Adding non-open source files to an 'open source' repository will inflict major pain on many people.
Doing anything to the .hg files: Don't mess with the .hg data files; if you do you are INSANE, leave that to the Mercurial professionals. If you suspect they have been corrupted, use 'hg verify' to check. Backups are always important, so make sure you keep a relatively recent backup repository. If you can't 'hg rollback', save the repository somewhere, clone a fresh copy from your parent, remove the working set files completely from the clone, and copy in the working set from your corrupted repository (but not the .hg files). Now you can use the standard 'hg status' and 'hg diff' to see what file changes you may have lost and adjust.
Using SCCS keywords: Mercurial by default does not support anything like SCCS keywords in files. You should remove these or find another solution.
Looking for putback comments or history files: Changeset comments represent BOTH the SCCS comments and the effective TeamWare putback comment.
Using problematic filenames: Watch out for directory and filenames that only differ in case (e.g. test and Test); at least on the Mac and Windows these can be troublesome. Long pathnames (>255 characters) can also be a problem.
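For the first and the .hgignore gotchas above, here is a minimal sketch of what the two files might contain. It writes them to a scratch directory so they can be reviewed before being copied into place. The username value and the ignore patterns are placeholders I made up for illustration, not project policy; adjust them to whatever your build actually generates.

```shell
#!/bin/sh
# Sketch only: write a minimal hgrc and .hgignore into a scratch directory
# so they can be reviewed before being copied into place.
# The name and ignore patterns below are placeholders, not project policy.
scratch=$(mktemp -d)

# The [ui] username is what gets recorded in every changeset you commit.
cat > "$scratch/hgrc" <<'EOF'
[ui]
username = yourname
EOF

# Patterns for build output that 'hg status' should not report.
cat > "$scratch/hgignore" <<'EOF'
syntax: glob
*.class
*.o

syntax: regexp
^build/
EOF

echo "Review, then copy $scratch/hgrc to ~/.hgrc"
echo "and $scratch/hgignore to .hgignore at the top of the repository"
```

With an .hgignore like this in place, 'hg status' lists only modified files and genuinely new sources, which makes a forgotten 'hg add' much easier to spot.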

Converting a TeamWare Workspace to a Mercurial Repository

Converting a TeamWare workspace to a Mercurial repository (without history) is pretty trivial:

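       # Bring the sources into a scratch workspace and detach it from its parent
       bringover -p your_workspace -w /tmp/repo .
       cd /tmp/repo
       workspace parent -u
       # Purge the TeamWare book-keeping
       rm -f -r Codemgr_wsdata
       rm -f -r deleted_files
       # csh: put every file in 'edit' mode (writable), then drop the SCCS directory
       foreach i ( `find . -name SCCS` )
          ( cd $i/.. && sccs edit SCCS )
          rm -f -r $i
       end
       hg init
       hg add
       # follow with 'hg commit' to record the initial changeset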

A simple source tree can be turned into a Mercurial repository with just hg init; hg add. Turning a TeamWare workspace into a plain source tree is relatively simple too, I just create a separate workspace, purge a few files, make sure all the sources are in 'edit' mode, and remove the SCCS directories.

Performance Comparisons and Data

Nothing but good news in this area, for both time and space.

Many of the past tricks used to speed up TeamWare bringovers and putbacks, especially over slow connections should not be necessary with Mercurial, it is very fast. The initial 'hg clone' of a repository should be considerably faster, but the most important actions of 'hg pull' or 'hg push' will be so much faster you may question if the action actually happened. Unlike TeamWare, only the changesets are transported, and many fewer files are accessed and in a more efficient manner.

The size of the repositories should also be smaller (at least 50% smaller) than the equivalent TeamWare workspace; this isn't surprising given the lack of compression in, and the age of, the SCCS file format.

Wednesday Oct 31, 2007

Working in a Mercurial World

So how do you work with a Distributed SCM? There are many answers, the easy answer is that you clone the forest, make the change in your local forest, create the changeset and push the changeset. Well, that works. But maybe you are working on multiple fixes, and you don't want to repeatedly clone over the network (even if it is fast), so here is another model similar to the way many of the Sun developers worked with TeamWare:

The "incoming" forest is effectively just a local clone of the team forest for TL (Tools & Libraries), or whatever team forest you decide your change belongs in. Note that this TL forest may be sparse; it depends on the team as to what portion of the MASTER forest the team forest will have. The fix1-fix3 forests are also local forest clones where you might be working on specific fixes or features. Once a fix is finalized, reviewed, tested, and ready to go, you would create the changeset (or changesets) with an 'hg commit' and push the changeset to the outgoing forest. How long each fix takes will determine how often you may need to sync with the TL area via the incoming clone. You can push your outgoing changes in batches or as frequently as you want. Pushing anything to a repository requires a sync with the parent forest first, of course.

Some people like to sync often, others wait until just before doing the push. One concern with Mercurial is that each sync may create a merge changeset, depending on whether anything is pulled over. So frequent syncs could create many unnecessary merge changesets.

My tendency is to investigate the Mercurial "mq" extension and see if the fix1-fix3 forests could just be one forest using the "mq" extension. See chapter 12 in the Mercurial Book.


Monday Oct 29, 2007

OpenJDK Mercurial Wheel

Sorry Dorothy, we aren't in Kansas anymore, and there isn't just one repository anymore. ;\^)

The JDK team has been using TeamWare (also a Distributed SCM like Mercurial) for a very long time, and the strategy adopted involves having different teams (usually based on functionality) push changes through specific team areas rather than everyone integrating into one MASTER area. Each team can focus their testing on the changes their team is making, and also protect themselves from regressions made by other teams. It also allows for changes to be "baked" before being pushed into the MASTER area.

There is some overhead here, in that an assigned integrator for each team will need to periodically sync up or merge with the MASTER area, test the merge, and push the resulting merge up to the MASTER area. Sometimes this happens every few days, sometimes every week, and sometimes every two weeks. It depends on many factors. And some of these areas may not push directly to the MASTER area; it's up to the integrator and the team to decide if they want another ply on the wheel (so to speak). For example, the hotspot team has GC, Runtime, Compilers, and Serviceability areas (sometimes called baselines) that those hotspot teams push changes to, and those changes then get pushed to the hotspot area (sometimes called "main" or "main/baseline").

And of course, all integrations to the MASTER area are done using a basic reservation model so that the merge and integration is not interrupted or complicated with someone else pushing changes to the MASTER area.

Hopefully this illustration will help.

At any point in time, every one of these areas could be different in different ways, depending on how often the integrators sync up with the MASTER area. For the most part (with some exceptions) there is little overlap in the actual files changed in these areas, so often the merges are fairly simple, but they can get nasty. So if you need to talk to an integrator, remember, they don't get paid extra for being an integrator, so be nice. ;^)

Anyone considering a change to the OpenJDK should go to the OpenJDK email aliases and connect with the appropriate team for the change they are making.

Expect more details on this in the days ahead.


Sunday Oct 21, 2007

OpenJDK Mercurial Forest

We are getting pretty close now to having the OpenJDK Mercurial forest content. A forest is just a directory tree or set of directories that can contain multiple repositories. Each of the repositories is independent, and they are grouped only by the location of their directories. Here is an illustration that may help understand the layout and content of a full OpenJDK forest:

In many cases developers may only need to deal with one or two repositories, so their own local forests may be pretty sparse; they may not necessarily need all of the OpenJDK forest. However, verifying that a change doesn't impact the build of the entire forest may require a developer to have a full forest.

A distributed SCM like Mercurial (or TeamWare) allows for distributed development but also isolated development between teams. Each team has an integration area where team members push their changes to, and one team member is assigned the task of integrating those changes into the MASTER forest which is used to create the promoted builds and the final product. A change will trickle from an individual's forest to the team's integration forest and finally to the MASTER forest. This is essentially the historic model that we have used with TeamWare workspaces. An integration area has the freshest changes for its own team, but unless it has been sync'd with the MASTER it may contain stale sources in other areas.

So there are decisions about what part of the forest you are interested in, and what integration area you might want to see changes from.


Tuesday May 08, 2007

NetBeans 6 Mercurial Village

NetBeans 6 and Mercurial

I just happened to have a picture of the Village where NetBeans 6 and its new Mercurial plugin were created.

Ok, not really, I'm just kidding, this is actually Plimoth Plantation, a re-creation of the original 1627 settlement in Plymouth, Mass. But both are pretty cool; if you are ever in Massachusetts, make sure you put Plymouth on the list. It's quite educational.

As for NetBeans 6, it's much easier to access than Plimoth Village :\^) Just go to, download and install NetBeans 6 Preview (Milestone 9). Once you get it started, go to Tools->Plugins, find the Mercurial plugin and install it. The editor has changed quite a bit, but seems pretty solid so far. I have a small Java project called JPRT (maybe 150 source files, maybe 35,000 lines of Java code), that has been in Mercurial for quite some time now, and this NetBeans 6 and Mercurial plugin is just right for me. Very handy. I'm in the process of converting to NetBeans 6 now, should be interesting.

The complete OpenJDK was released today at JavaOne 2007 and is now available at Along with the OpenJDK sources are some NetBeans 6 projects, which are documented at As of today, these projects haven't been tested with an OpenJDK Mercurial repository yet, we will be doing that in the next few months.

Granted we haven't released any OpenJDK Mercurial repositories just yet, but as soon as we catch our breath from doing the OpenJDK launch, we will start on the Mercurial transition.

By the way, a great deal of people have been working long hours to make this OpenJDK release at JavaOne 2007 happen. I've been involved in some of it myself, but it took a dedicated and talented team of people to make this all happen. It might seem like a trivial thing to open up some sources, but trust me, it wasn't trivial. My special thanks go out to the Release Engineering Team, who spent some long nights getting those OpenJDK source bundles ready.



Various blogs on JDK development procedures, including building, build infrastructure, testing, and source maintenance.

