Thursday Oct 08, 2009

Teamware/SCCS history conversion to Mercurial


Originally posted in December 2007; I've added some new references and some possible strategies at the end.

Silver Falls, Oregon. No, it doesn't use Mercurial, yet.

Just a few notes on converting source file change history from Teamware/SCCS to Mercurial. These are just notes because, for the JDK and for any Teamware JDK workspaces being converted, we don't plan on converting the old source change history into Mercurial. The major reason is a legal one, and you can imagine what the legal issues are with regard to non-open sources that become open. I won't get into that. But there are some technical issues too, which I will try to cover in case someone decides to attempt such a conversion.

Why convert the revision history?

The complete source history is an extremely valuable asset: being able to know who made a change years ago, and when, is often essential to understanding a problem in a product. Initially we wanted to preserve this source change history and assumed it wasn't a difficult job. Most engineers have been upset that our current plans don't include this history conversion, but read on if you are curious about the problems we encountered.

The Basic Idea

The basic idea in doing an 'ideal' source history conversion would be to create a Mercurial changeset for each Teamware putback. That means you need to identify the putback event, the specific SCCS revisions of the files, and any file renames or deletes. And each changeset is built upon the previous changeset, so the ordering of the changesets is critical here.
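To make the "one changeset per putback" idea concrete, here is a minimal sketch of replaying a single putback into a Mercurial working directory. It assumes a lot that the Teamware history does not actually give you: a hypothetical manifest of `path SID` lines for the putback (building that manifest is the hard part this post is about), a hypothetical `SCCS_ROOT` variable naming a workspace with the usual `SCCS/s.file` layout, and a putback time already converted to seconds since the epoch and treated as UTC. The helper name `commit_putback` is made up for illustration; `get` is the standard SCCS command (on Solaris it lives in /usr/ccs/bin).

```sh
# commit_putback USER EPOCH MESSAGE < manifest
#
# Hypothetical helper: replays one Teamware putback as one Mercurial
# changeset. The manifest on stdin has one "path SID" pair per line,
# e.g. "src/share/classes/Foo.java 1.42". SCCS_ROOT (also hypothetical)
# names a workspace whose history files live at DIR/SCCS/s.FILE.
commit_putback() {
  _user=$1 _epoch=$2 _msg=$3
  while read -r _path _sid; do
    _dir=$(dirname "$_path")
    _sfile="$SCCS_ROOT/$_dir/SCCS/s.$(basename "$_path")"
    mkdir -p "$_dir" || return 1
    # SCCS get: -r picks the revision, -p writes its text to stdout,
    # -s suppresses the informational messages.
    get -s -p -r "$_sid" "$_sfile" > "$_path" || return 1
  done
  # Record adds/removes and commit with the original author and time.
  # "EPOCH 0" is Mercurial's "seconds offset" date form; offset 0 is UTC.
  hg addremove -q &&
    hg commit -u "$_user" -d "$_epoch 0" -m "$_msg"
}
```

Calling this once per putback, strictly in time order, is what makes each changeset build on the previous one. Deletes and renames are not handled at all in this sketch: files not named in the manifest are simply left as they were.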

Sounds simple, right? Well, read on; it's not so simple.

The Problems

History Files: You need to understand how the Teamware history file works. The Codemgr_wsdata/history file in a workspace does not propagate, so the details of a putback won't percolate around your tree of workspaces. Each workspace has a history of the Teamware events that happened to it, but not the details of anything that happened to the other workspaces. So to get an accurate Teamware history, you need the entire tree of integration workspaces (any workspace that might be the target of a putback), including all that ever existed, and then you need to fold all the events in these history files together in the proper time order. The more complicated the Teamware workspace integration tree, the more difficult this task becomes. The JDK workspaces (there are many different workspaces) each have 6-20 different integration workspace instances, and some of these workspaces go back quite a few years, so we are talking about some major source change history here.
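At its core, folding the per-workspace histories together is a merge by timestamp. The sketch below assumes the hard part has already been done: each workspace's Codemgr_wsdata/history has been reduced, by some step not shown here, to tab-separated lines of `epoch`, `workspace`, `event`, `details`. Both that intermediate format and the function name `merge_histories` are hypothetical.

```sh
# merge_histories FILE...
#
# Hypothetical helper: merges per-workspace event lists into one timeline.
# Each FILE holds tab-separated lines: epoch<TAB>workspace<TAB>event<TAB>details
# Output is ordered by epoch (numerically), then by workspace name, so that
# events with identical timestamps come out in a repeatable order.
merge_histories() {
  _tab=$(printf '\t')
  sort -t "$_tab" -k1,1n -k2,2 "$@"
}
```

The same change may well appear in several workspaces' histories (a putback into one workspace followed by bringovers into others), so a real conversion would still have to decide which merged events represent the same change.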

SCCS file revision numbers: The Teamware history lists only the files involved in a putback or bringover, not the specific SCCS revision numbers of those files. So matching the specific SCCS revisions of files to the putback event that delivered them is not trivial. (I think there may have been a Teamware option to record the SCCS revision numbers in the workspace history file, but it is off by default, which is a shame.) Creating a nice, neat Mercurial changeset means you need to somehow match up the file lists and timestamps of the putbacks with the individual SCCS revision numbers of the source files. Unfortunately, SCCS files record a time but no timezone, so anyone who attempts this kind of history conversion will need lots of fuzzy timestamp logic to match the right SCCS revisions with the putbacks. The username is included in both the Teamware history file and the SCCS revisions, so that may also help, except that the integrator who puts back a change often isn't the same person who created the SCCS revision.
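One way to make the fuzzy timestamp matching concrete, for a single file named in a putback, is to take the newest SCCS delta whose recorded time is no later than the putback time plus a skew allowance for the unknown timezone. This sketch assumes the deltas have already been converted to `epoch SID` lines (for example from `prs` output; that date conversion is not shown), and the name `pick_delta` is hypothetical. A skew of 14 hours (50400 seconds), the largest real-world UTC offset, is one plausible choice; a larger skew tolerates more timezone confusion but is more likely to grab a delta that really came after the putback, which is where usernames and comments would have to break ties.

```sh
# pick_delta PUTBACK_EPOCH MAX_SKEW_SECONDS < deltas
#
# Hypothetical helper. stdin: one "epoch SID" line per SCCS delta of a file.
# Prints the SID of the newest delta recorded no later than
# PUTBACK_EPOCH + MAX_SKEW_SECONDS; exits 1 if no delta qualifies.
pick_delta() {
  awk -v pb="$1" -v skew="$2" '
    # The skew allows for SCCS times written in an unknown timezone
    $1 <= pb + skew && (best == "" || $1 > best_t) { best_t = $1; best = $2 }
    END {
      if (best == "") exit 1
      print best
    }
  '
}
```

For example, `printf '100 1.1\n200 1.2\n5000 1.3\n' | pick_delta 150 100` prints `1.2`: delta 1.2 is stamped after the putback but inside the skew window, while 1.3 is far outside it.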

SCCS Revision Tree: The SCCS revision tree for each file can be a fairly complex graph, depending on how many merges happened to the file. You might be able to use just the top-level SCCS revision number, but the SCCS comments on the other revisions contain important information that should be preserved.
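If only the top-level revision of each file ends up in a changeset, the comments from the other deltas (including branch deltas created by merges) could at least be folded into the changeset description. A minimal sketch, assuming the deltas have been dumped as `SID<TAB>comment` lines with single-line comments; the helper name `merge_delta_comments` is hypothetical.

```sh
# merge_delta_comments < deltas
#
# Hypothetical helper. stdin: "SID<TAB>comment" per SCCS delta, newest first.
# Prints each non-empty comment once, tagged with the SID it came from,
# so comments recorded only on branch deltas are not lost.
merge_delta_comments() {
  awk '
    BEGIN { FS = "\t" }
    $2 != "" && !($2 in seen) {    # skip empty and already-seen comments
      seen[$2]
      printf "%s: %s\n", $1, $2
    }
  '
}
```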

Deleted files: Teamware deletes files by moving them to a top-level deleted_files directory. So they don't really get deleted, just renamed. However, a common practice with many teams is to purge the deleted_files directory once a product reaches a major milestone. So some files may actually be permanently gone, and this needs to be taken into consideration: once that has happened, you can no longer recreate the complete source tree for the points in time when those files still existed.

Empty comments: Empty SCCS revision comments and empty putback comments would also create problems if you planned to use these comments, or cookies of information in them (e.g. bug ids or change request numbers), to connect files to putback events. More specific SCCS revision comments and putback comments would make this job easier.
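Where comments do carry cookies such as bug ids, they can be used to link SCCS revisions to putbacks. This sketch assumes bug ids are 7-digit numbers (as JDK bug ids of that era were) and intersects the ids found in an SCCS delta comment with those in a putback comment; both function names are hypothetical. An empty comment on either side yields nothing, which is exactly the problem described above.

```sh
# bugids_in TEXT : print each distinct 7-digit number found in TEXT.
# Runs of digits of any other length are ignored.
bugids_in() {
  printf '%s\n' "$1" | awk '{
    s = $0
    while (match(s, /[0-9]+/)) {
      if (RLENGTH == 7) print substr(s, RSTART, RLENGTH)
      s = substr(s, RSTART + RLENGTH)
    }
  }' | sort -u
}

# shared_bugids SCCS_COMMENT PUTBACK_COMMENT : print bug ids found in both.
shared_bugids() {
  { bugids_in "$2"; echo --; bugids_in "$1"; } |
    awk '$0 == "--" { past = 1; next }
         !past      { want[$0]; next }
         $0 in want'
}
```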


We considered multiple approaches to a source revision history conversion. You could come at it from the putback events, using the history files to identify 'real' changesets, and hope any deleted files are still around; what you'd use as Revision 1 of the files might be a little tricky. Or you could look only at the SCCS revisions and figure out, via timestamps, usernames, and perhaps SCCS comments, which files were changed as a group. Or a combination of both. Or you could come at it from a time perspective, e.g. all the changes to get you from April 1, 2004 to May 1, 2004.

The simple approach of one changeset per SCCS revision isn't really that simple, because Mercurial changesets have an order to them. To do it right, you'd need to view the Teamware workspace as a large graph of file nodes, each with a small sub-graph of SCCS revisions. Then pick a time T as Revision 1 of the Mercurial sources, find all the file instances at time T, add these files as a changeset to Mercurial, then repeat that for T+1. Or perhaps T+N, where N is selected by sampling timestamps after T for a quiet time (to avoid picking a time that might split up file changes that happened as a group). Just some wild ideas.
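The "quiet time" idea can be sketched as cutting the time-ordered stream of SCCS deltas wherever no delta was recorded for longer than some gap, and treating each resulting group as one candidate changeset. The input format (`epoch path SID`, sorted by epoch) and the helper name `group_by_quiet_gap` are assumptions, and the groups it produces are only a guess at what was really changed together.

```sh
# group_by_quiet_gap GAP_SECONDS < deltas
#
# Hypothetical helper. stdin: "epoch path SID" lines sorted by epoch
# (for example via sort -n). Prefixes each line with a group number,
# starting a new group whenever more than GAP_SECONDS of quiet time
# separates a delta from the one before it.
group_by_quiet_gap() {
  awk -v gap="$1" '
    NR == 1 || $1 - last > gap { group++ }
    { print group, $0; last = $1 }
  '
}
```

For example, `printf '100 a.c 1.1\n130 b.c 1.4\n5000 c.c 1.2\n' | group_by_quiet_gap 600` puts the first two deltas in group 1 and the third in group 2. Picking the gap is the real judgment call: too small splits related changes, too large lumps unrelated ones together.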

But it just feels wrong: there is no putback data, the files won't be grouped correctly, and the resulting repository would contain inaccurate source states in any of these converted changesets.

We never fully explored all the approaches because once the legal constraint came in, there seemed no need to pursue it. It's an interesting and complicated problem, but ultimately one we decided we didn't have to solve.


So the bottom line is that whatever can be created would likely have questionable data if someone asked for the sources as of a particular date, or wanted to know the state of the entire source tree when a given change was made. It's hard to ever be perfect here, and not being perfect could send a few engineers down some deep rabbit holes. :^(

The old history isn't being destroyed, it's just being left in the old Teamware workspaces. So we will still have access to it, just not via Mercurial repositories. As time passes, we'll build up new and better history in our Mercurial repositories, and maybe by the time I retire, it won't matter much. ;^)

Update: Some Ideas

Jesse's conversion script turns out to be a possibility. He documents the problems with it, but it's certainly a step toward something.

With the OpenJDK6 repositories, which were originally in TeamWare, we had two ways to gain some history. First, with each build promotion while in TeamWare, we saved a source bundle, so we had a raw snapshot of the source for each build. Using these bundles as the working set files, we started rev0 with the Build 0 source bundle, then for each build promoted after that, repeated these steps:

  1. Delete the working set files
  2. Copy in new working set files from the source bundle for Build promotion N.
  3. Run: hg addremove ; hg commit --message "Build Promotion N" ; hg tag BuildN
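The three steps above can be sketched as a loop. This is not the script that was actually used for OpenJDK6; it assumes, hypothetically, that each build's source bundle has been unpacked into `BUNDLE_DIR/bN`, and `import_builds` is a made-up name. The one design choice that matters is leaving `.hg` and `.hgtags` alone during the delete step: deleting `.hgtags` would make `hg addremove` record its removal and drop the earlier BuildN tags.

```sh
# import_builds BUNDLE_DIR N...
#
# Hypothetical helper: run from the top of an hg repository. For each
# build number N, replaces the working set with the unpacked source
# bundle at BUNDLE_DIR/bN (an assumed layout) and records it.
import_builds() {
  _bundles=$1
  shift
  for _n in "$@"; do
    [ -d "$_bundles/b$_n" ] || return 1
    # 1. Delete the working set files. Keep .hg (the repository itself)
    #    and .hgtags (otherwise earlier BuildN tags would be dropped).
    for _f in * .[!.]* ..?*; do
      case "$_f" in .hg|.hgtags) continue ;; esac
      [ -e "$_f" ] || [ -h "$_f" ] || continue   # unmatched glob pattern
      rm -rf -- "$_f"
    done
    # 2. Copy in the working set files for build promotion N
    cp -R "$_bundles/b$_n/." . || return 1
    # 3. Record adds/removes, commit, and tag the build
    hg addremove -q &&
      hg commit -m "Build Promotion $_n" &&
      hg tag "Build$_n" || return 1
  done
}
```

For example, `import_builds /bundles 0 1 2` would produce commits tagged Build0, Build1, and Build2, in that order.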
This provides a coarse-grained history; not great, but it can be very valuable for narrowing down when a change came in. Adding more specific history required patch files applied in between the build promotions, but you needed the patches, you needed to know which build a change went into, and most critically, you needed to know the order of the patches or changes in case two fixes modified the same files. Ultimately, it worked for OpenJDK6, to a degree. The Build Promotion revisions were accurate, but sometimes getting the others accurate was hard to do. And unfortunately, all the changesets in OpenJDK6 look like they were created by me, which is right in a way, but I really wasn't responsible for many of the patches. So the authors, dates, and SCCS comments were not included, but the bug ids were.

Anyway, just thought I would update this rather old posting.


Saturday Feb 21, 2009

Are you a House Elf?

For those of you who haven't seen the Harry Potter movies or read the books, a House Elf is an obedient servant who must obey the orders of their masters. They have their own special magical powers but are bound to their master until freed with a present of clothes (most of them wear old pillowcases). When they do go against orders, they must punish themselves. They pride themselves on their work and are very loyal servants.

In the Harry Potter stories, Dobby (in the picture above) had to punish himself many times for going against his master, but he did it to protect Harry Potter, and because his master was just plain evil. Dobby was an unusual House Elf; he stood up for what he thought was right, even though it was painful.

Are you a House Elf? Are we all House Elves? Most people I know and work with take great pride in what they do. I would like to think that when the time comes, and we are asked to do something we feel is wrong, that we all behave like Dobby, and do the right thing.


Tuesday Jul 08, 2008

Alaskan Cruise Vacation

Just returned June 26th from a 10-day Alaskan Cruise aboard the Dawn Princess, leaving from and returning to San Francisco.

This was my family's 3rd cruise, and our second cruise to Alaska. It's the first time we left from San Francisco; for our previous 7-day Alaskan cruise in 1999 we flew to Anchorage, Alaska, and that cruise left us in Vancouver, and for our cruise around the Hawaiian islands we flew to Hawaii. We enjoy the Alaskan cruises, and in particular the Princess Cruise Line, but what do we know, having cruised only 3 times. During the cruise they awarded a free cruise to someone who had cruised with Princess Cruise Line (just Princess, mind you) 72 times (over 700 days of cruising!).

For anyone who has never been on a cruise, it's a very spoiling experience. The food is excellent (and part of the cruise price), the rooms are small (but who stays in their room other than to sleep), the shows are free (granted, they should have paid me for a couple of them), the view is to die for (assuming you like the ocean), the staff was excellent (even in the face of a few rude passengers), and the ports in Alaska are fascinating.

Why did we do a cruise? Well, our first choice was to fly to the East coast, maybe Washington D.C. or New York. But once you add up the flights, the hotels, the food, the rental car, etc., the cruise looked better: the Alaska Cruise inside stateroom was something like $1100 or $1200 per person, double occupancy. It included food, needed no rental car and no hotel, and leaving from San Francisco meant no airplanes or airport hassles. Granted, drinks, soda, shore excursions, and misc expenses will happen on a cruise, but overall, we decided on the cruise. (I'm really beginning to hate airplanes and airports.)

The Dawn Princess holds something like 1,900 passengers (plus 900 or so crew members) and is no small ship; however, it is one of the smaller Princess ships (some take 3,000 passengers). Our previous Alaskan Cruise was on the Sun Princess, a twin of the Dawn Princess, and although the ship was fine, I think I would rather have tried a different ship, just to try a different ship. :^)

Here is the itinerary:

Day  Port                                        Arrive    Depart
1    San Francisco, California                   12:00 AM  4:00 PM
2    At Sea
3    At Sea
4    Ketchikan, Alaska                           7:00 AM   3:00 PM
5    Juneau, Alaska                              8:00 AM   10:00 PM
6    Skagway, Alaska                             7:00 AM   8:30 PM
7    Tracy Arm Fjord, Alaska (Scenic Cruising)   5:00 AM   10:00 AM
8    At Sea
9    Victoria, British Columbia                  6:00 AM   2:00 PM
10   At Sea
11   San Francisco, California                   7:00 AM

I'll try to add some photos when I can wrestle the camera away from my wife. :^)

It was a great cruise, had lots of fun. Next time we will probably try to visit some new ports in Alaska.


Thursday Jun 12, 2008

Removing duplicate PATH entries

It seemed like everywhere, people just kept adding things to PATH without regard to whether they were already in PATH. I don't suspect that long PATH values are a performance problem on Linux and Solaris, but on Windows??? I don't pretend to completely understand all the places in a Windows system where PATH is processed and repeatedly scanned and the directories repeatedly probed, but it seemed like an easy thing to fix for all platforms, ... or so I thought. So I investigated how I could remove duplicate entries from the PATH variable using shell commands, and I ended up with an awk command that worked pretty well.

Below, I added newlines inside the single-quoted argument to awk so it is easier to read; you may need to mush those lines together into one long line.

# Given a PATH like string and a separator, remove duplicate entries
removeDups() # string sep
{
  if [ "${osname}" = "windows" ] ; then
    # Windows: turn \ into /, then compare entries case-insensitively
    printf "%s\n" "$1" | \
      sed -e 's@\\@/@g' | \
      ${AWK} -F"$2" \
       '{ \
          a[toupper($1)]; \
          printf "%s",$1; \
          for(i=2;i<=NF;i++){ \
            if(!(toupper($i) in a)){ \
              a[toupper($i)]; \
              printf "%s%s",FS,$i; \
            } \
          }; \
          printf "\n"; \
        }'
  else
    # Linux/Solaris: exact (case-sensitive) comparison of entries
    printf "%s\n" "$1" | \
      ${AWK} -F"$2" \
       '{ \
          a[$1]; \
          printf "%s",$1; \
          for(i=2;i<=NF;i++){ \
            if(!($i in a)){ \
              a[$i]; \
              printf "%s%s",FS,$i; \
            } \
          }; \
          printf "\n"; \
        }'
  fi
}

# OS name: Linux or SunOS, pot luck on Windows
osname="`uname -s`"
if [ "`echo ${osname} | grep -i CYGWIN`" != "" ] ; then
  osname="windows"
fi

# Separator: ';' if PATH holds one (Windows style), otherwise ':'
sep=":"
case "${PATH}" in
  *\;*) sep=";" ;;
esac

# Need particular AWK (on Solaris, /usr/bin/awk is the old awk; use nawk)
AWK="awk"
if [ "${osname}" = "SunOS" ] ; then
  AWK="nawk"
fi

# Get new path setting
newpath=`removeDups "${PATH}" "${sep}"`

# Redefine your PATH setting
if [ "${PATH}" != "${newpath}" ] ; then
  echo "# Resetting PATH to remove duplicates"
  PATH="${newpath}"
  export PATH
fi
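One awk-free alternative is a sketch of the same first-occurrence-wins de-duplication in plain POSIX sh, using IFS field splitting and a `case` pattern to test membership. The function name `dedup_path` is made up, and unlike the Windows branch of removeDups it compares entries case-sensitively and does not rewrite backslashes.

```sh
# dedup_path STRING SEP : print STRING with duplicate SEP-separated
# entries removed, keeping the first occurrence of each (case-sensitive).
dedup_path() {
  _in=$1 _sep=$2 _out=
  _old_ifs=$IFS
  IFS=$_sep
  set -f                       # no globbing while splitting on IFS
  for _d in $_in; do
    case "$_sep$_out$_sep" in
      *"$_sep$_d$_sep"*) ;;    # already kept, drop this one
      *) _out=${_out:+$_out$_sep}$_d ;;
    esac
  done
  set +f                       # note: turns globbing back on unconditionally
  IFS=$_old_ifs
  printf '%s\n' "$_out"
}
```

For example, `dedup_path '/usr/bin:/bin:/usr/bin:/opt/x' ':'` prints `/usr/bin:/bin:/opt/x`, so `PATH=$(dedup_path "$PATH" :)` would do the reset. The quoted part of the `case` pattern is matched literally, so entries containing `*` or `?` are compared correctly.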

Maybe someone else can get something out of this. Or suggest an even better way. ;^)


Tuesday Jun 10, 2008

CSN Concert at Wente Vineyards

Last night we drove down the street to Wente Vineyards and saw Crosby, Stills & Nash. They have a long history that sometimes includes Neil Young.

CSNY, gray hair and all! (We only saw CSN) Teach Your Children, back when they were younger. :^)

Their harmony was still there; it was a great concert. It was a bit expensive at Wente, $207 per person, but that included a meal, and we were probably 100 feet or so from the stage. The weather was fantastic too.


Saturday May 10, 2008

ZZ Top

Last night we drove to Dixon, California and saw the legendary hard rock band ZZ Top.

ZZ Top as they looked in the 1980s, and they still look this way in 2008! Sharp Dressed Man Music Video (One of my favorites)



Various blogs on JDK development procedures, including building, build infrastructure, testing, and source maintenance.

