Tuesday Oct 12, 2010

Watch This Space

With the latest XWiki update, opensolaris.org now has working watchpoints. The UI for watching a page is pretty obvious--there's a watch/unwatch link in the upper right of the page. But I couldn't figure out how to watch an entire space (e.g., the Tools community space). Today I stumbled across it. The 2nd menu in the upper left shows the space that the current page belongs to. One of the operations you can invoke via that menu is "watch space".

Sunday Jul 25, 2010

Finding ON Flag Day Announcements

ON flag day emails are archived automatically to an internal server and then mirrored externally. Unfortunately, folks sometimes give the internal URL when referring to a particular flag day message, which doesn't do you much good if you don't have access to the Oracle/Sun internal network.

While it would be preferable to have the external URL to start with, the mapping from the internal URL to external URL is pretty simple. Replace "onnv.sfbay.sun.com" with "static.opensolaris.org/on". So, for example,




Tuesday Jul 20, 2010

"make all" in usr/src/uts

I've started work to fix a problem with Install, which is a script that kernel developers can use to help install test kernels. So I needed to build a kernel for testing, but I didn't need to build anything outside the kernel. I thought I'd just set up an environment using bldenv], then build using dmake, i.e.,

$ bldenv -d my-env.sh
$ cd $SRC/uts
$ dmake all

Notice I used "dmake all", not "dmake install". This is because Install extracts kernel modules from the source tree, not the proto area.

When my build finished, I noticed that there were build errors.

/usr/ccs/bin/ld -o ../cheetah/debug64/libcpu.so -G -znoreloc -h 'cpu/$CPU' debug64/cpu_module.o debug64/mach_cpu_module.o
ld: fatal: file ../cheetah/debug64/libcpu.so: open failed: No such file or directory
\*\*\* Error code 1
dmake: Fatal error: Command failed for target `../cheetah/debug64/libcpu.so'
Current working directory /builds/kupfer/6952783/usr/src/uts/sun4u/serengeti/unix
\*\*\* Error code 1
The following command caused the error:
BUILD_TYPE=DBG64 VERSION='6952783:66c93397e15b' dmake  symcheck.targ
dmake: Fatal error: Command failed for target `symcheck.debug64'
Current working directory /builds/kupfer/6952783/usr/src/uts/sun4u/serengeti/unix
\*\*\* Error code 1
The following command caused the error:
(cd ../../../sun4u/serengeti/unix; pwd; \\
CPU_DIR=../cheetah SYM_MOD=../cheetah/obj64/unix.sym dmake symcheck)
dmake: Fatal error: Command failed for target `obj64/unix.sym'
Current working directory /builds/kupfer/6952783/usr/src/uts/sun4u/serengeti/cheetah
\*\*\* Error code 1
The following command caused the error:
BUILD_TYPE=OBJ64 VERSION='6952783:66c93397e15b' dmake  all.targ
dmake: Fatal error: Command failed for target `all.obj64'
Current working directory /builds/kupfer/6952783/usr/src/uts/sun4u/serengeti/cheetah
\*\*\* Error code 1
The following command caused the error:
cd cheetah; pwd; dmake  all
dmake: Fatal error: Command failed for target `cheetah'
Current working directory /builds/kupfer/6952783/usr/src/uts/sun4u/serengeti
\*\*\* Error code 1
The following command caused the error:
cd serengeti; pwd; THISIMPL=serengeti dmake  all
dmake: Fatal error: Command failed for target `serengeti'
Current working directory /builds/kupfer/6952783/usr/src/uts/sun4u
Waiting for 3 jobs to finish

This was odd for a couple reasons.

First, I used bldenv -d, which sets up an environment for building debug kernel modules. Yet the "obj64" directory names indicated that non-debug modules were being built.

Second, the log seemed to indicate that the non-debug build had some sort of dependency on the debug build. That couldn't be right--the nightly script, which most of us use to drive our builds, does the non-debug build before it does the debug build. So any such dependency would break nightly.

Just what was going on here?

Well, the first thing you have to understand is that the kernel makefiles make heavy use of macros to determine what exactly to build. Do you want 32-bit binaries or 64-bit? Debug or non-debug?

After some poking around, I noticed that for the kernel, the "install" target uses DEF_BUILDS, which is going to specify debug or non-debug (one or the other). The "all" target uses ALL_BUILDS, which specifies both debug and non-debug.

So that answers my first question. I was building a non-debug module because I had used "dmake all" instead of "dmake install".

But why was a non-debug build depending on a debug binary?

Some tracing through the makefiles shows that the symcheck target in uts/sun4u/serengeti/unix/Makefile uses the SYM_BUILDS macro, which that makefile sets to $(DEF_BUILDSONLY64) (i.e., just the 64-bit variant of $(DEF_BUILDS)). Because I had used "bldenv -d", SYM_BUILDS was thus set to "debug64", even though it was being invoked for a non-debug module.

What can we conclude from this? Well, if you're building the kernel in a bldenv environment, it's better to use "make install" than "make all", at least on SPARC systems. (It looks like the x86 builds don't use the symcheck target.) It might be possible to restructure the kernel makefiles to avoid this issue, but I don't understand enough of what's going on here to say for sure.

Wednesday Nov 25, 2009

Signed Crypto Gets Its Own Tarball

The OS/Net (ON) component of OpenSolaris has some closed-source code. The binaries for this code (well, the binaries that are redistributable) are made available to non-Sun developers in the form of a compressed tar file, which the build tools incorporate into BFU archives or packages. These closed-bins tarballs also contain binaries for open-source cryptographic code. To satisfy US government regulations, the OpenSolaris cryptography framework requires that certain crypto binaries be signed. Most external developers don't have the necessary key and certificate to sign their binaries, so we provide a working set for them.

This setup has worked okay since the launch of OpenSolaris in 2005, but it's got a couple problems. First, bindrop, the script that puts the crypto binaries into the closed-bins tarball, works off a hard-coded list. As with any manually-maintained list, this introduces a risk that it will not be updated when a new crypto module is added to the system. Second, bindrop gets the crypto binaries by extracting them from the SVR4-format packages that are generated from the ON gate. With the upcoming move to IPS, those packages will go away, and it will be much harder to extract the binaries from the IPS packages that will replace them.

So I'm working on changes to the way we deliver the signed crypto binaries. First, we'll be splitting the crypto out into its own tarball. This gives us more flexibility about when we deliver the crypto binaries. Second, instead of using a hard-coded list, we'll scan the proto area, which is the staging area before files are packaged. Any properly signed binaries will be included in the crypto tarball. If you're interested, CR 6855998 has more details.

The code for this has been written, though it still needs a little more polishing, like making sure that error messages are handled correctly. I'm hoping to get this into build 130, but it might slip into 131.

Saturday Oct 10, 2009

SCM Mounts: Done (Almost)

I've finished the workaround for the sshd privileges issue. I ended up writing a simple setuid C program so that our PAM module could unmount the loopback filesystems. I had been using an RBAC-based approach, but that requires that the user own the mount point for each loopback mount. The more I worked on it, the more failure scenarios I ran into because of that requirement. The setuid approach had none of those issues, and it turned out to be much simpler to code than I had been expecting.

So the changes have been committed to the repository for the SCM infrastructure, and the new bits have been deployed on the backup SCM server. The only thing left is to deploy on the primary SCM server.

Unfortunately, this doesn't mean I'll now have time to finish off the OSCON trip report. Instead, I'll be focusing on a change to the way we deliver crypto binaries to ON developers.

Friday Sep 25, 2009

Progress with SCM Mounts

I've been busy implementing a workaround to the sshd privileges issue that I mentioned a couple months ago. Unfortunately, this has meant some delays in my OSCON trip report, but I hope to post the next installment of that sometime in the next week.

Besides implementing the actual workaround, I've been beefing up the scripts that we use for setting up a test environment. Even though we have a backup production system that I could use for testing, it's safer to test a change of this size on a separate system. I sleep a lot better knowing that if I break something during development, I definitely won't inconvenience users.

I've also been learning about the privilege-management facilities that are available in (Open)Solaris. We had some problems finding a concise but sufficiently detailed writeup of how the mount/unmount privileges work. While the information is present in the umount(2) man page, a much clearer explanation is given in the output from "ppriv -lv".

Saturday Aug 22, 2009

Still Reviewing Web Pages

I'm still reviewing web pages in preparation for the migration of opensolaris.org to XWiki. So far I've finished reviewing the ON Developer Reference and the SCM-related pages in the Tools web space.

The only thing left for (my) review is the SCM Migration project web space. Since that project is no longer active, I don't plan to look too carefully. But a sanity check does seem in order, and maybe there are some obsolete pages or attachments that we can delete.

The ON Developer Reference (DevRef) is a particularly tough case for the migration software because of its extensive use of anchored links. I had been planning to retire the XML (Docbook) source that the DevRef currently uses, and keep everything in XWiki markup, but I'm not looking forward to fixing all the cross-references. So I'm having second thoughts about that strategy.

Sunday Aug 16, 2009

OpenSolaris.org Moving to XWiki

I'm afraid I haven't made any progress on my OSCON trip report this week. We've started the beta testing for migrating opensolaris.org from the current portal application to XWiki. I've been reviewing the ON Developer Reference, and it's taken more of my time than I had expected. (In fact, I'm still not done.)

If you're a community or project leader, please do take the time to review the pages that you're responsible for. Some issues will be easier to fix if they are identified before the migration. And the migration team needs user feedback to help identify which issues cause the most trouble.

If you just have a question, you can ask on the website-discuss list (at opensolaris.org). If you're sure you've found a bug, either in the migration code or in the new XWiki-based site, go to defect.opensolaris.org and file a bug under Development: product=website, component=site-wiki.

Tuesday Jul 21, 2009

Unwanted Mounts

As described in the design document, source code access on opensolaris.org is done via ssh. The user doesn't invoke ssh directly. Rather, the user runs Mercurial (or Subversion), which invokes ssh using its standard processing for ssh URLs. Once connected to the server, a custom restricted shell invokes the server-side program. This is all done in a chroot environment, with loopback mounts providing access to only those repositories that the user has write access to.

The loopback mounts are created when the user logs in, and they are torn down when the source code management (SCM) operation completes. This is done by way of a custom PAM module. As part of the session's "open" processing, the module determines what repositories to grant access to, and it establishes those mount points. As part of the session's "close" processing, it removes those mount points.

We recently noticed that the loopback mounts were not getting unmounted. This causes a couple problems. One is that thousands of unused loopback mounts accumulate on the server. If nothing else, this makes life more difficult for administrators.

The lingering mounts can also lead to a denial of service problem, which we've witnessed a few times. The problem occurs if a repository is deleted and recreated while there is still a loopback mount for it. Future references to the loopback mount will fail with an error. This can interfere with the setup of a user's loopback mounts in a subsequent login, resulting in a situation where users are unable to access recently created repositories. Worse, attempts to unmount the broken loopback mount fail, and lofs doesn't support forced unmount. So the only way to recover is to reboot the server.

After the third or so instance of this, we decided to figure out why the loopback mounts were not getting unmounted. Arguments can be passed to a PAM module by putting them after the module name in /etc/pam.conf, and there's a convention to enable debugging output with the argument "debug", e.g.,

other	session requisite	pam_foo.so.1	debug

For this to be useful, syslogd needs to be configured to display the debug output. For example, put

auth.debug	/var/adm/auth.log

in syslog.conf and utter

# svcadm restart system/system-log

Once we made these two changes, we could see that the session-open routine was running normally, but it didn't look like the session-close routine was getting invoked.

This seemed awfully strange, so we enabled PAM framework debugging with

# touch /etc/pam_debug

(This, too, requires that syslogd be configured to put auth.debug output somewhere accessible.)

This showed that our session-close routine was, in fact, being invoked.

Looking more closely at the session-closed routine, we noticed that it checks what user it is invoked as. If it's not invoked as uid 0, it bails out, before doing any debug logging. Moving the debug logging to come before the uid check confirmed that it was running as the user whose session was ending.

Some Googling revealed a known issue in OpenSSH (from which the Solaris SSH is derived) in which the session-close routine is called as the session's user, not uid 0.

From the comments in the OpenSSH Bugzilla, it looks like a fix is available from upstream, so we're hopeful that we just need to talk to the Sun SSH team about getting the fix into OpenSolaris. We're also looking into possible workarounds, in case the fix can't be pulled in promptly.

Update 2009-09-16

I filed a bug for this: 6869790.

The current status is that the Solaris SSH team is discussing possible fixes, but they haven't come up with a good approach yet. Just reverting the code isn't an option because it would break support for hardware acceleration. And the upstream privilege separation code is different from the code in Solaris, so they can't just use the upstream patch.

Friday Feb 13, 2009

OpenSolaris and gnuserv

I installed OpenSolaris 2008.11 on my notebook (a VAIO TX) several weeks ago. I've been tweaking the environment, in preparation for the day when I move to OpenSolaris on my desktop system.

One of the issues that came up was that gnuserv would exit immediately after being started. This meant that every time I wanted an editor (e.g., for a Mercurial commit), I had to wait for a new XEmacs process to start.

I looked around for some sort of error message but couldn't find anything. I finally started XEmacs using truss -f. Looking at the truss output, I saw that gnuserv was looking in /etc/hosts and not finding an entry with the notebook's hostname ("loiosh").

I added "loiosh" to the localhost ( line, and that fixed the problem.

Sunday Oct 05, 2008

Printing opensolaris.org Pages with Recent Builds

Back in August I upgraded my desktop to snv_95. I sometimes print pages from opensolaris.org to read during my commute, but with snv_95 the pages came out pretty much unreadable. They looked like they had been through several fax transmissions, with blotchy, almost indecipherable characters. At the time I chalked it up to known issues with fonts and went back to running an earlier build (thank you, Live Upgrade).

I revisited the issue last week, after noticing that the headers and footers from Firefox looked okay. It was just the main text that was messed up. I checked my preferences (Content>Fonts&Colors>Advanced)--the checkbox "Allow pages to choose their own fonts" was enabled. I disabled it and tried again, and now the printed pages are legible.

Wednesday Apr 09, 2008

Converting Projects to Mercurial

One of the things that we consider when deprecating components of (Open)Solaris is how users move from the old software to the new software. We've applied that principle to the SCM Migration project, so we've been working on documentation (e.g., a Mercurial cheat sheet for TeamWare users), and the updated tools work with both TeamWare and Mercurial. Also, we don't want to tie the schedules of large projects to the SCM Migration schedule or vice versa. So we need to support projects that are begun under TeamWare, but which are still under development when we're ready to move the gate from TeamWare to Mercurial. That support is provided by a new script called wx2hg.

In general, it's hard to convert a TeamWare workspace to Mercurial, at least if you want to maintain history. But ON already has a policy that putbacks should (usually) add a single delta. That is, any project-specific history will be lost anyway. That makes the job of wx2hg a lot easier.

Suppose you have a project gate--call it my-proj--that is a child of onnv-gate, the ON master gate. We already maintain a Mercurial mirror of onnv-gate, which I will call onnv-hg for now. So when you're ready to move to Mercurial, what you want is a child of onnv-hg. That child should have the same changes relative to onnv-hg that my-proj has relative to onnv-gate.

It turns out that it is pretty easy for wx2hg to do this. The wx front-end keeps track of renames and files with contents changes. So wx2hg just needs to get that information from wx and apply it to a child of onnv-hg. The rest of the script is error detection and handling.

Let's walk through an example.

Suppose I have a workspace that deletes all the SCCS helper scripts in usr/src/tools. And to demonstrate renames, it renames the scripts directory makefile to Makefile.new.

$ pwd
$ putback -n

Would put back name changes: 10

rename from: usr/src/tools/scripts/Makefile
         to: usr/src/tools/scripts/Makefile.new
rename from: usr/src/tools/scripts/sccscheck.1
         to: deleted_files/usr/src/tools/scripts/sccscheck.1
rename from: usr/src/tools/scripts/sccscheck.sh
         to: deleted_files/usr/src/tools/scripts/sccscheck.sh
rename from: usr/src/tools/scripts/sccscp.1
         to: deleted_files/usr/src/tools/scripts/sccscp.1
rename from: usr/src/tools/scripts/sccscp.sh
         to: deleted_files/usr/src/tools/scripts/sccscp.sh
rename from: usr/src/tools/scripts/sccshist.sh
         to: deleted_files/usr/src/tools/scripts/sccshist.sh
rename from: usr/src/tools/scripts/sccsmv.1
         to: deleted_files/usr/src/tools/scripts/sccsmv.1
rename from: usr/src/tools/scripts/sccsmv.sh
         to: deleted_files/usr/src/tools/scripts/sccsmv.sh
rename from: usr/src/tools/scripts/sccsrm.1
         to: deleted_files/usr/src/tools/scripts/sccsrm.1
rename from: usr/src/tools/scripts/sccsrm.sh
         to: deleted_files/usr/src/tools/scripts/sccsrm.sh

The following files are currently checked out and have been edited in workspace
No changes were put back

Note that although Makefile.new is checked out, it need not be.

Converting this to Mercurial is simple. If your TeamWare workspace is in a directory that you have write access to, just point wx2hg at it.

$ pwd
$ /opt/onbld/bin/wx2hg tw.no-sccs-tools.demo

wx2hg first creates a Mercurial child (this step can take a few minutes). The child is created in the same directory as the TeamWare workspace, with the same name plus "-hg".

requesting all changes
adding changesets
adding manifests
adding file changes
added 6349 changesets with 91335 changes to 49774 files
44994 files updated, 0 files merged, 0 files removed, 0 files unresolved

wx2hg then initializes wx if you haven't already done so. If the workspace is already under wx control, it does a "wx update" to ensure it will get up-to-date information about the workspace.

Initializing wx...
New renamed file list:
New active file list:
Will backup wx and active files if necessary
wx initialization complete

wx2hg then checks out all the files with contents changes. We want to put the files into Mercurial with unexpanded SCCS keywords, and checking them out is a quick hack to help us do so.

usr/src/tools/scripts/Makefile.new already checked out

wx2hg then processes the rename list.

rename usr/src/tools/scripts/Makefile -> usr/src/tools/scripts/Makefile.new
rename usr/src/tools/scripts/sccscheck.1 -> deleted_files/usr/src/tools/scripts/sccscheck.1
rename usr/src/tools/scripts/sccscheck.sh -> deleted_files/usr/src/tools/scripts/sccscheck.sh
rename usr/src/tools/scripts/sccscp.1 -> deleted_files/usr/src/tools/scripts/sccscp.1
rename usr/src/tools/scripts/sccscp.sh -> deleted_files/usr/src/tools/scripts/sccscp.sh
rename usr/src/tools/scripts/sccshist.sh -> deleted_files/usr/src/tools/scripts/sccshist.sh
rename usr/src/tools/scripts/sccsmv.1 -> deleted_files/usr/src/tools/scripts/sccsmv.1
rename usr/src/tools/scripts/sccsmv.sh -> deleted_files/usr/src/tools/scripts/sccsmv.sh
rename usr/src/tools/scripts/sccsrm.1 -> deleted_files/usr/src/tools/scripts/sccsrm.1
rename usr/src/tools/scripts/sccsrm.sh -> deleted_files/usr/src/tools/scripts/sccsrm.sh

After the renames, it applies a patch for each modified file...

patching file usr/src/tools/scripts/Makefile.new

...and then you're done.

$ ls -dF \*demo\*
tw.no-sccs-tools.demo/		tw.no-sccs-tools.demo-hg/

You can verify that wx2hg transferred all your changes:

$ cd tw.no-sccs-tools.demo-hg
$ hg diff -g
diff --git a/usr/src/tools/scripts/sccscheck.1 b/deleted_files/usr/src/tools/scripts/sccscheck.1
rename from usr/src/tools/scripts/sccscheck.1
rename to deleted_files/usr/src/tools/scripts/sccscheck.1
diff --git a/usr/src/tools/scripts/sccscheck.sh b/deleted_files/usr/src/tools/scripts/sccscheck.sh
rename from usr/src/tools/scripts/sccscheck.sh
rename to deleted_files/usr/src/tools/scripts/sccscheck.sh
diff --git a/usr/src/tools/scripts/sccscp.1 b/deleted_files/usr/src/tools/scripts/sccscp.1
rename from usr/src/tools/scripts/sccscp.1
rename to deleted_files/usr/src/tools/scripts/sccscp.1
diff --git a/usr/src/tools/scripts/sccscp.sh b/deleted_files/usr/src/tools/scripts/sccscp.sh
rename from usr/src/tools/scripts/sccscp.sh
rename to deleted_files/usr/src/tools/scripts/sccscp.sh
diff --git a/usr/src/tools/scripts/sccshist.sh b/deleted_files/usr/src/tools/scripts/sccshist.sh
rename from usr/src/tools/scripts/sccshist.sh
rename to deleted_files/usr/src/tools/scripts/sccshist.sh
diff --git a/usr/src/tools/scripts/sccsmv.1 b/deleted_files/usr/src/tools/scripts/sccsmv.1
rename from usr/src/tools/scripts/sccsmv.1
rename to deleted_files/usr/src/tools/scripts/sccsmv.1
diff --git a/usr/src/tools/scripts/sccsmv.sh b/deleted_files/usr/src/tools/scripts/sccsmv.sh
rename from usr/src/tools/scripts/sccsmv.sh
rename to deleted_files/usr/src/tools/scripts/sccsmv.sh
diff --git a/usr/src/tools/scripts/sccsrm.1 b/deleted_files/usr/src/tools/scripts/sccsrm.1
rename from usr/src/tools/scripts/sccsrm.1
rename to deleted_files/usr/src/tools/scripts/sccsrm.1
diff --git a/usr/src/tools/scripts/sccsrm.sh b/deleted_files/usr/src/tools/scripts/sccsrm.sh
rename from usr/src/tools/scripts/sccsrm.sh
rename to deleted_files/usr/src/tools/scripts/sccsrm.sh
diff --git a/usr/src/tools/scripts/Makefile b/usr/src/tools/scripts/Makefile.new
rename from usr/src/tools/scripts/Makefile
rename to usr/src/tools/scripts/Makefile.new
--- a/usr/src/tools/scripts/Makefile.new
+++ b/usr/src/tools/scripts/Makefile.new
@@ -50,11 +50,6 @@ SHFILES= \\
 	nightly \\
 	onblddrop \\
 	protocmp.terse \\
-	sccscheck \\
-	sccscp \\
-	sccshist \\
-	sccsmv \\
-	sccsrm \\
 	sdrop \\
 	webrev \\
 	ws \\

Note that you still need to do "hg commit" to check in your new version.

All this assumes that your workspace is in sync with /ws/onnv-clone. If it isn't you may get messages like

wx2hg: can't rename: usr/src/tools/scripts/sccscheck.1 doesn't exist.


wx2hg: usr/src/tools/scripts/Makefile.new: parent mismatch; 
  resync with /ws/onnv-clone or specify branch point with -r hg_rev.

Doing a bringover from /ws/onnv-clone, and resolving any conflicts, should fix things up.

You may also see a message like

Please run
  hg --cwd /export/kupfer/tonic/wx2hg-tests/tw.no-sccs-tools.demo-hg update -C
before retrying.

This is telling you you can reuse the Mercurial child, but you need to reset it first. Once you've resynched with /ws/onnv-clone and run the "hg ... update..." command, you use the -t option to tell wx2hg to reuse the Mercurial child. For example,

/opt/onbld/bin/wx2hg -t tw.no-sccs-tools.demo-hg tw.no-sccs-tools.demo

There's more that wx2hg can do, but those features won't be needed until ON moves to Mercurial. If you get stuck using wx2hg, you can ask for help on the SCM migration team list (scm-migration-dev at opensolaris dot org).

Friday Feb 15, 2008

SCM Migration: The Big Picture

When Steve Lau left Sun at the end of last September, I became the go-to guy inside Sun for the migration to Mercurial. I had thought that I had a good high-level grasp of the project. But after getting blindsided a couple times by dependencies I hadn't considered, I drew up a diagram to help me get oriented, identify stakeholders, and maybe anticipate future issues.

Here's a slightly simplified version of the original diagram from the whiteboard in my office:

Blue parallelograms indicate repositories, tan boxes are software modules, solid lines indicate data flow, and dashed lines tie users with the modules that they're using. The three red-rimmed boxes (gk tools, gate hooks, and onbld tools) are where most of the development effort is going.

The primary simplifications in this diagram are

  • the data flow from the project gate actually goes through the SCM front-end before going through the gate hooks.
  • I've omitted the consolidation's clone workspace (a nightly snapshot of the gate)
  • I've omitted the bridge between the current ON workspace in TeamWare and the Mercurial repository that is shadowing it

Even so, this is a moderately busy diagram. There are several components to keep track of and make sure they all fit together.

Most of the work so far has been in the area of the ON build (onbld) tools, pieces of which are used by other consolidations and by the Solaris Companion project. Many of the changes are related to making the tools work with Mercurial as well as with TeamWare/SCCS. We've also had to consider the implications of moving everything outside the Sun firewall, which has meant rethinking interfaces to things like the bug database and our RTI (Request To Integrate) system.

We haven't done as much work on the gatekeeper (gk) tools, although we've started to think about design issues. Many of the design decisions boil down to this question: do we make the minimal set of changes needed to work with Mercurial, or do we make more extensive changes so that the tools can make better use of the features provided by Mercurial? In some cases we are staying with the current approach. For example, we are using separate repositories for build snapshots, rather than using branches and tags in the main gate repository. In other cases we will be changing the tools to use Mercurial features. For example, any automated post-putback processing will be driven directly by Mercurial hooks, rather than the email-based hook system that is needed with TeamWare.

Another set of interesting design decisions has centered around the use of gate hooks to enforce various style and bookkeeping rules. With the current TeamWare setup, we enforce these rules after a putback (at least for ON). The putback triggers various checks, and if your putback violates a rule, you get notified of the problem and given a short window to fix it or your putback is reverted. The gate is normally configured so that anyone (inside Sun) can putback.

While this approach worked when Solaris was closed source, we expect it not to scale for OpenSolaris, where the repository is accessible from anywhere on the Internet and both Sun employees and non-employees can have commit rights. Certain Mercurial hooks can abort a putback ("push" in Mercurial terms), so we could move all the post-putback checks to pre-transaction checks. But moving more checks means more work (e.g., testing), which means a longer time before we can move to Mercurial. So the question becomes which checks really need to happen before putback, and which ones can happen after putback. The check to ensure that a putback has an approved RTI probably needs to happen prior to the putback. The check for adherence to the C style rules can happen after the putback, at least for now.

The opensolaris.org webapp has various bits of functionality for source code management. A project leader or gatekeeper can use the webapp to create, destroy, and lock repositories, as well as to manage commit rights for the project's repositories. Unfortunately, the current set of operations is limited. For example, a gatekeeper might want to lock a repository for most users, but allow access for a specific large project. Alas, this lock granularity is not currently supported. Furthermore, all the controls are currently through a web-based interface, with no scripting hooks. Although there is currently work to improve the webapp and make it easier to change, this work is unlikely to be finished in time for us to make any changes that we expect gatekeepers to want. So we will need to think about other ways to provide the needed functionality, such as giving gatekeepers shell access to the server that hosts the repositories.

The SCM front-end gives a user access to repositories by creating a chroot environment which contains only the repositories that the user has commit privileges for. (Access to other repositories is done via the "anon" user.) If the user reports being unable to pull from, or push to, a repository, the problem could be with the SCM program itself, the SCM front-end, or some other general system service. This diagnosis typically requires shell access to the servers.

We are using Nagios to monitor the health of the servers and services on opensolaris.org. We have written a couple simple Nagios plugins to monitor the Mercurial and Subversion services. As we gain experience with the system, we could update the probes to check for specific failure scenarios.

OpenGrok makes it into this diagram because it makes a private snapshot of each repository that it indexes, so as to provide a consistent view of the tree. We once managed to break the OpenGrok indexing of ON by trying to undo (rollback) a particular putback, so that it would vanish completely from the repository. We didn't know to roll back OpenGrok's snapshot repository as well. So the next time OpenGrok tried to pull from the Mercurial onnv-gate, it created a branch that had to be merged. This was not something OpenGrok was prepared for, so the snapshot tree was not updated. After several days, we started getting complaints from ON teams who couldn't find their recent putbacks in OpenGrok. We figured out the problem, replaced OpenGrok's snapshot repositories, and vowed not to undo/rollback any future putbacks.

So that's the "big picture" of what the SCM Migration project is working on. If you've been frustrated by how long things are taking, well, we're not happy about it, either. Our hope is that by keeping the entire picture in mind, we will not have any serious problems when we finally do move.

Friday Aug 17, 2007

ksh93 Putback

April Chin put back ksh93 into the ON gate this morning. Woohoo! I'm delighted to have a modern, open-source Korn shell in OpenSolaris, and I'm looking forward to when we can (someday) retire the old Solaris ksh. Many thanks to April, Roland Mainz, and Don Cragun for all their work, as well as to everyone who participated in the project reviews and discussions.

Friday May 18, 2007

Defeating the OpenSolaris Address Mangler

The opensolaris.org webapp includes an automatic email address mangler to make it harder for spammers to harvest email addresses. But it's not very smart, and it mangles things that aren't email addresses, like device paths and repository URLs. If you're editing an HTML page on the web site and you want to bypass the email mangler, replace "@" with "@", as in


Random information that I hope will be interesting to Oracle's technical community. The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.


« August 2016