Monday May 26, 2014

Solaris 11.2: presentations in Holland

I'm giving presentations on, among other things, the new features in Solaris 11.2 on June 12th and 19th, 2014; register here:

I'll be highlighting some new features but will also show how new and old features are combined and how Solaris 11 is our first operating system with a holistic design philosophy.

Friday May 16, 2014

Solaris 11.2: unlink(2)/link(2) for directories: your time is up.

Some thirty years ago, the 4.2BSD Unix release included two new system calls: mkdir(2) and rmdir(2). Before that time, to make a directory you first needed to call mknod(2) and then create the "." and ".." links. When you removed a directory, you would remove those two links and finally unlink the directory itself. As an ordinary user could neither call mknod(2) nor call unlink(2) on a directory, the mkdir(1) and rmdir(1) commands were set-uid root. A cursory inspection of the UNIX-V7 code showed that both commands likely had security bugs.

Did 4.2BSD remove the ability to link or unlink directories? It didn't. It was probably kept temporarily for backward compatibility. But many years and many Unix releases later, it is still there: neither Sun (in SunOS or Solaris) nor Oracle (in Solaris 11/11 or 11.1) removed it.

If you ask fsck(1M), the final arbiter of what is a valid UFS file system, it will complain loudly when you make an additional hard link to a directory, and the problem generally required system administrator intervention. This was later hidden by logging UFS: fsck was hardly ever run once UFS logging was introduced, especially after it became the default. In tmpfs it was a good way to lose swap, hide data or confuse the kernel. Special code was needed in find(1) and du(1) so they would not lose their way when the file system is not a tree but a cyclic graph.
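To illustrate the kind of special code a tree walker needs, here is a minimal sketch, assuming the walker keeps a stack of the (device, inode) pairs of the directories it is currently inside. `struct dir_id` and `dir_is_ancestor()` are hypothetical names for this example, not the actual find(1) or du(1) source. If a subdirectory's identity is already on that stack, descending into it would revisit an ancestor and loop forever.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical: identity of a directory, its device plus inode number. */
struct dir_id {
	dev_t	dev;
	ino_t	ino;
};

/*
 * Return 1 if (dev, ino) already appears among the "depth" directories
 * on the path from the starting point to the current directory, i.e.
 * descending into it would form a cycle; return 0 otherwise.
 */
static int
dir_is_ancestor(const struct dir_id *path, size_t depth, dev_t dev, ino_t ino)
{
	assert(path != NULL || depth == 0);

	for (size_t i = 0; i < depth; i++) {
		if (path[i].dev == dev && path[i].ino == ino)
			return (1);
	}
	return (0);
}
```

A walker would push each directory's identity (from stat(2)) before descending, call `dir_is_ancestor()` for each subdirectory it finds, and skip (or report) any subdirectory that is already on the path.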

It is one of the reasons why, when Solaris Zones were developed, we decided that non-global zones can only run without the {SYS_LINKDIR} privilege, and why ZFS, when we introduced it, came without the ability to use link(2) or unlink(2) on directories. VxFS also doesn't allow additional hard links to directories. And no-one complained!

This discrepancy between the global zone and non-global zones, and between ZFS and the other file systems, gave us problems when developing code: code that ran in a tmpfs file system in the global zone suddenly stopped working when moved to a non-global zone; code that worked on UFS stopped working when moved to ZFS or to a non-global zone. As Linux never allowed unlink(2) on directories, code developed there could have a disastrous effect when run on Solaris with (not-so) appropriate privileges. There were at least two cases during the development of Solaris 11.2 where we were bitten by this problem in code we developed ourselves.

The time has come to disable link(2) and unlink(2) on directories, and that is what we have done in Solaris 11.2. The {SYS_LINKDIR} privilege still exists in Solaris 11.2, but it is obsolete and has no effect. We will likely remove it in a future minor release.
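If you want to check whether any of your own code still depends on directory hard links, a small probe like the one below can help. It is a minimal sketch: `try_link_directory()` is a hypothetical helper, and it deliberately does not assume which errno value a particular system returns when it refuses the link, only that the call fails on systems (such as Solaris 11.2 and Linux) that do not allow it.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical probe: try to create a second hard link "alias" to the
 * existing directory "dir". Returns 0 if link(2) succeeded (the old,
 * now disabled behavior) or the errno value link(2) failed with.
 */
static int
try_link_directory(const char *dir, const char *alias)
{
	struct stat st;

	if (stat(dir, &st) != 0)
		return (errno);
	assert(S_ISDIR(st.st_mode));	/* only meaningful for directories */

	if (link(dir, alias) != 0)
		return (errno);
	return (0);			/* an extra directory link now exists */
}
```

For example, `try_link_directory("/tmp/scratch", "/tmp/scratch2")` should return a non-zero errno value on a system that no longer allows link(2) on directories.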

Is this a sudden incompatible change? Perhaps, but it is well within the limits of the specification, and using this feature only leads to downtime and support calls. Sorry for removing this rope from your toolbox.

Monday May 12, 2014

Solaris 11: Evolution of v_path.

In Solaris 10, Eric Schrock (now at Delphix) added vnode-to-pathname functionality to the kernel: it stored in the vnode the pathname used to find a file, but it did not handle renames, nor did it elide ".." from the stored pathnames; the stored pathname was generally a full pathname from the root of the global zone. It was used for getcwd(3) and for the path subdirectory in /proc/pid/.

The v_path was implemented as a hint: whenever it was retrieved, e.g., for getcwd(3) or for the /proc file system, the actual path was computed and the current zone's root directory was removed from it.

When I started to work on the Extended Policy and later on the Immutable Global zone, it was clear that the v_path was very useful but it wasn't ready for those projects.

The Immutable Non-Global Zone (Solaris 11/11)

In the IMNGZ we need to compute the pathname and then check it against the black-list and the white-list. However, at the point where we do that, the kernel is deep inside the file system code and we can't verify and recompute the pathname, as we might be holding locks that are needed further down. But since we are protecting a particular set of files, and those files cannot be changed or renamed, it is safe to use the v_path as more than a hint. We did need to elide ".." and simplify pathnames; this is done directly when we set the v_path for a newly created pathname: if the code tries to add "..", it instead removes the last component of the pathname. We also needed to prevent linking protected files into the non-protected file space, as that would circumvent the MWAC(5) protection offered in an IMNGZ.
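The ".." elision described above can be sketched in userland C. This is an illustration under stated assumptions, not the Solaris kernel code: `path_append_component()` is a hypothetical helper that appends one component to an absolute, already-simplified path, ignores ".", and turns ".." into removal of the last component, so the stored path never contains "." or "..".

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: append one pathname component to the absolute,
 * already-simplified path in buf (of size bytes). "." is ignored and
 * ".." removes the last component instead of being stored; "/.." is "/".
 * Returns 0 on success, -1 if the result would not fit in buf.
 */
static int
path_append_component(char *buf, size_t size, const char *comp)
{
	size_t len = strlen(buf);

	assert(len > 0 && buf[0] == '/');	/* absolute path expected */

	if (comp[0] == '\0' || strcmp(comp, ".") == 0)
		return (0);			/* nothing to add */

	if (strcmp(comp, "..") == 0) {
		char *slash = strrchr(buf, '/');

		if (slash == buf)
			buf[1] = '\0';		/* parent of "/x" or "/" is "/" */
		else
			*slash = '\0';		/* strip the last component */
		return (0);
	}

	/* Room for an optional '/', the component and the NUL. */
	size_t need = len + (len > 1 ? 1 : 0) + strlen(comp) + 1;
	if (need > size)
		return (-1);
	if (len > 1)
		buf[len++] = '/';
	strcpy(buf + len, comp);
	return (0);
}
```

Starting from "/", appending "usr", "lib" and then ".." leaves "/usr", which is the simplification done when a v_path is set.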

The Extended Policy (Solaris 11.1)

The Extended Policy applies to all filenames in the file system, including those that can be renamed. This is why we put some effort into handling renames better. We now update the v_path on rename(2) in all file systems; a link(2) is handled as a rename(2) too, as our observation is that the new name outlives the first. This new behavior works well for leaf nodes, but there is no efficient algorithm that can handle the rename of a directory and all its children; yet we have no option other than using the v_path, for the same reasons as for the IMNGZ. When we recalculate the pathname, e.g., for /proc or for getcwd(), and find the v_path wanting, we update it to the newly computed path, including all the directories making up the full pathname.
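The "hint that is repaired when found wanting" idea can be shown with a userland analogy. This is a sketch only: the kernel stores v_path in the vnode and recomputes paths by other means, whereas here the hypothetical `struct path_hint` pairs a cached path with the device and inode of the file it should name, and the hint is trusted only while stat(2) confirms it.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical analogue of a pathname hint: a cached path plus the
 * identity (device, inode) of the file it is supposed to name.
 */
struct path_hint {
	dev_t	dev;
	ino_t	ino;
	char	*path;	/* malloc()ed; may be stale after a rename */
};

/* Trust the hint only if its path still resolves to the same file. */
static int
path_hint_valid(const struct path_hint *hint)
{
	struct stat st;

	assert(hint != NULL);
	if (hint->path == NULL || stat(hint->path, &st) != 0)
		return (0);
	return (st.st_dev == hint->dev && st.st_ino == hint->ino);
}

/*
 * Replace a hint found wanting with a freshly computed path, e.g. after
 * a rename(2). Returns 0 on success, -1 if memory is exhausted.
 */
static int
path_hint_update(struct path_hint *hint, const char *newpath)
{
	char *copy = strdup(newpath);

	if (copy == NULL)
		return (-1);
	free(hint->path);
	hint->path = copy;
	return (0);
}
```

The design choice mirrors the text: a rename makes the hint stale rather than wrong forever, and the next lookup that notices the mismatch stores the recomputed path.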

One possible security risk is that a vnode has an incorrect v_path and the Extended Policy grants more privileges for that v_path than it does for the actual pathname. As this can only happen if the file once lived in that location, this is not actually a risk: the process was able to use those privileges on that file in the past. We do make sure that linking is not allowed when the Extended Policy grants more privileges for the new pathname.
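The linking rule amounts to a subset test on privilege sets. As a minimal sketch, assuming the privileges a policy grants for a pathname can be represented as a bitmask (the `privmask_t` type and `link_allowed()` function are hypothetical, not the Solaris policy code), a link is allowed only if the new pathname grants nothing the existing one does not:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: the privileges a policy grants, one bit per privilege. */
typedef uint64_t privmask_t;

/*
 * Allow a new link only if the privileges granted for the new pathname
 * are a subset of those granted for the existing pathname.
 */
static int
link_allowed(privmask_t old_path_privs, privmask_t new_path_privs)
{
	return ((new_path_privs & ~old_path_privs) == 0);
}
```

For example, a file whose current name grants privileges {A, B} may be linked to a name granting only {A}, but not to one granting {A, C}.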

The secpolicy_*() routines needed an update so that the Extended Policy can make decisions about files or directories that do not exist yet; as an extra benefit, privilege debugging now reports more detail, because more information is available deep down in the policy routines. In Solaris 11.0 only the parent directory was shown:

solaris11.0$ ppriv -De mkdir /casper
mkdir[11162]: missing privilege "ALL" (euid = 12345, syscall = 102) for "/" needed at zfs_zaccess+0x2c8
mkdir: Failed to make directory "/casper"; Permission denied

In Solaris 11.1 we know the full filename to be created and also show that with privilege debugging:

solaris11.1$ ppriv -De mkdir /casper
mkdir[13924]: missing privilege "ALL" (euid = 12345, syscall = 102) for "/casper" needed at zfs_zaccess+0x245
mkdir: Failed to make directory "/casper"; Permission denied

In Solaris 11.2 we also show the syscall name:

solaris 11.2$ ppriv -De mkdir /casper
mkdir[17488]: missing privilege "ALL" (euid = 12345, syscall = "mkdirat") for "/casper" at zfs_zaccess+0x245
mkdir: Failed to make directory "/casper"; Permission denied

Getcwd(3), realpath(3) fixes.

As part of the Extended Policy project, fixes to getcwd() and realpath() were made during the development of Solaris 11.1.  We've also put some of these fixes in 11.0 SRUs and in Solaris 10 patches. These fixes are the following:

  • Improved getcwd()/realpath() performance in zones.
  • Improved getcwd()/realpath() performance in the case of renaming (in some cases 1000x faster)
  • Fix getcwd() for chrooted process when the current working directory is not under the root directory. (This was a regression of the in-kernel getcwd())
  • Don't fail with EACCES so quickly
  • No limit on the size of the returned path from getcwd() and realpath()
  • realpath() moved into the kernel and the frealpath() system call (Solaris 11.1 and later only)

Several operating systems have "extended" getcwd(3) to return an unrestricted pathname when called as follows:

   char *cwd = getcwd(NULL, 0);

unfortunately, this is strictly forbidden by the standard:

     The getcwd() function shall fail if:


     EINVAL    The size argument is 0.

So on Solaris you have to call getcwd() in a loop with an ever larger buffer until it no longer returns NULL with errno set to ERANGE; alternatively, use realpath(".", NULL), in which case we can return a long pathname.

Both are actually a lot faster than running your own userland getcwd() implementation, and such implementations are also more likely to fail.
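Both approaches can be sketched in a few lines of C. `getcwd_alloc()` and `getcwd_realpath()` are hypothetical helper names, and the 64-byte starting size is arbitrary (deliberately small so the retry loop is exercised). realpath() with a NULL second argument, which makes it allocate the result, is specified by POSIX.1-2008.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * getcwd() of unlimited length: retry with a buffer twice as large for
 * as long as getcwd() fails with ERANGE. The caller must free() the
 * result; NULL is returned (with errno set) on any other failure.
 */
static char *
getcwd_alloc(void)
{
	size_t size = 64;	/* deliberately small; grows as needed */

	for (;;) {
		char *buf = malloc(size);
		int err;

		if (buf == NULL)
			return (NULL);
		if (getcwd(buf, size) != NULL)
			return (buf);
		err = errno;		/* save before free() */
		free(buf);
		if (err != ERANGE) {
			errno = err;
			return (NULL);
		}
		if (size > SIZE_MAX / 2) {
			errno = ENAMETOOLONG;
			return (NULL);
		}
		size *= 2;
	}
}

/* The alternative: let realpath() allocate a buffer of the right size. */
static char *
getcwd_realpath(void)
{
	return (realpath(".", NULL));
}
```

Either result must be released with free(). The realpath() variant avoids the retry loop entirely, which is why it is the simpler choice when it is available.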

Tuesday May 31, 2005

South Park Studio

I guess we all have to do this now, so here's my self-portrait. After I pointed my kids to this, they and their friends spent a whole afternoon creating images of themselves and their mothers and fathers. Well, I preempted them and did mine before they had a chance.

Friday Dec 10, 2004


Sometimes completely unexpected events take place, such as this little mishap with our car. Modern MPVs are pretty forgiving, and you might not immediately notice that you've got a flat, especially not if the driver hasn't driven that particular vehicle for a while and you have the fans blowing and music playing at the same time. Until you hit the highway, that is. So, what happened next? A lot of noise and a rather pungent smell of burning rubber before we even hit 50 or 60, and in the few seconds it took us to figure out that we had to pull over, the tire gave way; it had half left the rim before we came to a full stop on the hard shoulder. And here's what's left of it:
blown out tire

With cars zooming past at 120 km/h (75 mph) on a fairly narrow stretch of highway on a cold, dark night with wife and kids in the car, you feel like someone has just painted a target on you, and you anxiously wait until help arrives. And it did, within 5 or 10 minutes.

Thursday Jul 08, 2004


As I've been posting to Usenet since 1988, I didn't think it would be so hard to start a weblog; but on Usenet I have been mostly reactive, replying to other people's questions, answers and thoughts.

A weblog feels like talking into thin air.

As I work specifically on Solaris Security, I will be focussing on a few things that I have done for Solaris 10 which I think are neat. I'll try to explain some of the design decisions we made and how and why they are different from, e.g., Trusted Solaris and other systems with similar features. Stay tuned.

(And yes, I will update the Solaris FAQ too, in a while)


