By mkupfer on Jul 21, 2009
As described in the design document, source code access on opensolaris.org is done via ssh. The user doesn't run ssh directly. Rather, the user runs Mercurial (or Subversion), which invokes ssh using its standard ssh URLs. Once the connection reaches the server, a custom restricted shell runs the server-side program. This is all done in a chroot environment, with loopback mounts providing access to only those repositories that the user has write access to.
The loopback mounts are created when the user logs in, and they are torn down when the source code management (SCM) operation completes. This is done by way of a custom PAM module. As part of the session's "open" processing, the module determines what repositories to grant access to, and it establishes those mount points. As part of the session's "close" processing, it removes those mount points.
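To make the open/close pairing concrete, here is a minimal sketch in Python. It is a toy model, not the actual PAM module: the ACL dictionary, the `/chroot/<user>/<repo>` path layout, and all function names are assumptions made for illustration, and a set stands in for the kernel's mount table.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Toy stand-in for one SCM login; nothing here touches real mounts."""
    user: str
    mounts: list = field(default_factory=list)


def writable_repos(user, acl):
    """Return the repositories the user may write to (hypothetical ACL lookup)."""
    return sorted(repo for repo, writers in acl.items() if user in writers)


def open_session(user, acl, mount_table):
    """Session "open": record one loopback mount per writable repository."""
    session = Session(user)
    for repo in writable_repos(user, acl):
        mount_point = f"/chroot/{user}/{repo}"  # hypothetical path layout
        mount_table.add(mount_point)
        session.mounts.append(mount_point)
    return session


def close_session(session, mount_table):
    """Session "close": tear down every mount the open step created."""
    while session.mounts:
        mount_table.discard(session.mounts.pop())
```

If `close_session` never does its work, every login leaves its entries behind in `mount_table`, which is the accumulation described next.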
We recently noticed that the loopback mounts were not getting unmounted. This causes a couple of problems. One is that thousands of unused loopback mounts accumulate on the server. If nothing else, this makes life more difficult for administrators.
The lingering mounts can also lead to a denial-of-service problem, which we've witnessed a few times. The problem occurs if a repository is deleted and recreated while there is still a loopback mount for it. Future references to the loopback mount will fail with an error. This can interfere with the setup of a user's loopback mounts in a subsequent login, resulting in a situation where users are unable to access recently created repositories. Worse, attempts to unmount the broken loopback mount fail, and the loopback file system doesn't support forced unmount. So the only way to recover is to reboot the server.
After the third or so instance of this, we decided to figure out why the loopback mounts were not getting unmounted. Arguments can be passed to a PAM module by putting them after the module name in /etc/pam.conf, and there's a convention to enable debugging output with the argument "debug", e.g.,
other session requisite pam_foo.so.1 debug
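As a rough illustration of how that entry breaks down, the sketch below splits a pam.conf line into its fields (service, module type, control flag, module path, then module arguments) and checks for the "debug" argument. The function names are made up for this example; a real module receives its arguments through argc/argv from the PAM framework and does not parse pam.conf itself.

```python
def parse_pam_conf_line(line):
    """Split one pam.conf entry into its fields.

    Layout: service, module type, control flag, module path, then any
    module arguments.
    """
    fields = line.split()
    if len(fields) < 4:
        raise ValueError(f"incomplete pam.conf entry: {line!r}")
    service, module_type, control_flag, module_path, *args = fields
    return {
        "service": service,
        "module_type": module_type,
        "control_flag": control_flag,
        "module_path": module_path,
        "args": args,
    }


def debug_enabled(entry):
    """True if the conventional "debug" argument was passed to the module."""
    return "debug" in entry["args"]
```

For the entry above, `args` comes out as `["debug"]`, so `debug_enabled` returns true and the module would emit its debug messages.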
For this to be useful, syslogd needs to be configured to display the debug output. For example, put a suitable debug-level entry in syslog.conf and utter
# svcadm restart system/system-log
Once we made these two changes, we could see that the session-open routine was running normally, but it didn't look like the session-close routine was getting invoked.
This seemed awfully strange, so we enabled PAM framework debugging with
# touch /etc/pam_debug
(This, too, requires that syslogd be configured to send auth.debug output somewhere accessible.)
This showed that our session-close routine was, in fact, being invoked.
Looking more closely at the session-close routine, we noticed that it checks what user it is invoked as. If it's not invoked as uid 0, it bails out before doing any debug logging. Moving the debug logging to come before the uid check confirmed that the routine was running as the user whose session was ending.
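The effect of that ordering is easy to reproduce in miniature. The following is only a sketch of the control flow, not the module's code: the function names and log messages are invented, and a plain list stands in for syslog.

```python
def close_session_original(uid, log):
    """Uid check first: a non-root caller returns before anything is logged."""
    if uid != 0:
        return False
    log.append("close_session: removing loopback mounts")
    return True


def close_session_instrumented(uid, log):
    """Debug logging first, so a non-root invocation still leaves a trace."""
    log.append(f"close_session: invoked as uid {uid}")
    if uid != 0:
        log.append("close_session: not uid 0, skipping unmount")
        return False
    log.append("close_session: removing loopback mounts")
    return True
```

Called with a non-zero uid, the first version leaves the log empty, which from the outside looks exactly like a routine that was never invoked. The second version records the uid before bailing out.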
Some Googling revealed a known issue in OpenSSH (from which the Solaris SSH is derived) in which the session-close routine is called as the session's user, not uid 0.
From the comments in the OpenSSH Bugzilla, it looks like a fix is available from upstream, so we're hopeful that we just need to talk to the Sun SSH team about getting the fix into OpenSolaris. We're also looking into possible workarounds, in case the fix can't be pulled in promptly.
I filed a bug for this: 6869790.
The current status is that the Solaris SSH team is discussing possible fixes, but they haven't come up with a good approach yet. Just reverting the code isn't an option because it would break support for hardware acceleration. And the upstream privilege separation code is different from the code in Solaris, so they can't just use the upstream patch.