Solaris 11: Evolution of v_path.

In Solaris 10, Eric Schrock (now at Delphix) added vnode-to-pathname functionality to the kernel. It stored the pathname used to find a file in the vnode, but it did not handle renames, nor did it elide ".." from the stored pathnames. The stored pathname was generally a full pathname from the root of the global zone.  It was used for getcwd(3) and for the path subdirectory in /proc/pid/.

The v_path was implemented as a hint: whenever it was retrieved, e.g., for getcwd(3) or for the /proc file system, the actual path was computed and the current zone's root directory was removed.

When I started to work on the Extended Policy and later on the Immutable Global zone, it was clear that the v_path was very useful but it wasn't ready for those projects.

The Immutable Non-Global Zone (Solaris 11/11)

In the IMNGZ we need to compute the pathname and then check it against the black-list and the white-list. However, at the point where we do that, the kernel is deep inside the file system code, and we can't verify and recompute the pathname because we might be holding locks that we need further down. Since we are protecting a particular set of files, and those files cannot be changed or renamed, it is safe to use the v_path as if it were more than a hint.  We did need to elide ".." and simplify pathnames; this is done directly when we set the v_path for a newly created pathname: if the code tries to add a "..", it instead removes the last component of the pathname. We also needed to prevent linking protected files into the non-protected file space, as that would circumvent the MWAC(5) protection offered in an IMNGZ.
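The ".." handling described above can be illustrated with a small userland sketch. This is not the kernel code: path_append() is a hypothetical helper, and it assumes the buffer already holds an absolute, simplified path starting with "/". It only shows the idea that adding ".." removes the last component instead of being stored.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not the Solaris kernel routine): append one
 * component to an absolute, already simplified path in buf.
 * Returns 0 on success, -1 if the result would not fit in bufsz bytes.
 */
static int
path_append(char *buf, size_t bufsz, const char *comp)
{
    size_t len = strlen(buf);
    size_t clen = strlen(comp);

    if (bufsz < 2)
        return (-1);

    /* An empty component or "." leaves the path unchanged. */
    if (clen == 0 || strcmp(comp, ".") == 0)
        return (0);

    if (strcmp(comp, "..") == 0) {
        /* Remove the last component instead of storing "..". */
        char *slash = strrchr(buf, '/');

        if (slash == NULL || slash == buf)
            (void) strcpy(buf, "/");    /* parent of "/x" or "/" is "/" */
        else
            *slash = '\0';
        return (0);
    }

    /* Need room for an optional separator, the component and the NUL. */
    if (len + (len > 1 ? 1 : 0) + clen + 1 > bufsz)
        return (-1);

    if (len > 1)
        buf[len++] = '/';               /* no extra slash after the root */
    (void) memcpy(buf + len, comp, clen + 1);
    return (0);
}
```

Starting from "/", appending "export", "home" and then ".." leaves "/export" in the buffer, so a stored path never contains a ".." component.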

The Extended Policy (Solaris 11.1)

The Extended Policy applies to all filenames in the filesystem, including those that can be renamed.  This is why we put some effort into handling renames better.  We now update the v_path name on rename(2) in all file systems; we also treat a link(2) as a rename(2), as the observation is that the new name outlives the first name.  This new behavior works well with leaf nodes, but there is no efficient algorithm that can handle the rename of a directory and all its children. Yet we have no option other than using v_path, for the same reasons we have for the IMNGZ. When we recalculate the pathname, e.g., for /proc or for getcwd(), and we find the stored v_path wanting, we update the v_path to the newly computed path, including all directories making up the full pathname.

One possible security risk is that a vnode has an incorrect v_path and the Extended Policy gives more privileges on that v_path than it gives for the actual pathname.  As this can only happen if the file once lived in that location, this is not actually a risk at all; the process was able in the past to use those privileges on that file. We do make sure that linking is not allowed when the Extended Policy gives more privileges for the new pathname.

An update was needed for the secpolicy_*() routines to allow the Extended Policy to make a decision about files or directories that do not exist yet. As an extra benefit, privilege debugging now gives even more detail, as we have more information deep down in the policy routines:

solaris11.0$ ppriv -De mkdir /casper
mkdir[11162]: missing privilege "ALL" (euid = 12345, syscall = 102) for "/" needed at zfs_zaccess+0x2c8
mkdir: Failed to make directory "/casper"; Permission denied

In Solaris 11.1 we know the full filename to be created and also show that with privilege debugging:

solaris11.1$ ppriv -De mkdir /casper
mkdir[13924]: missing privilege "ALL" (euid = 12345, syscall = 102) for "/casper" needed at zfs_zaccess+0x245
mkdir: Failed to make directory "/casper"; Permission denied

In Solaris 11.2 we also show the syscall name:

solaris11.2$ ppriv -De mkdir /casper
mkdir[17488]: missing privilege "ALL" (euid = 12345, syscall = "mkdirat") for "/casper" at zfs_zaccess+0x245
mkdir: Failed to make directory "/casper"; Permission denied

Getcwd(3), realpath(3) fixes.

As part of the Extended Policy project, fixes to getcwd() and realpath() were made during the development of Solaris 11.1.  We've also put some of these fixes in 11.0 SRUs and in Solaris 10 patches. These fixes are the following:

  • Improved getcwd()/realpath() performance in zones.
  • Improved getcwd()/realpath() performance in the case of renaming (in some cases 1000x faster)
  • Fix getcwd() for chrooted process when the current working directory is not under the root directory. (This was a regression of the in-kernel getcwd())
  • Don't fail with EACCES so quickly
  • No limit on the size of the returned path from getcwd() and realpath()
  • realpath() moved into the kernel and the frealpath() system call (Solaris 11.1 and later only)

Several operating systems have "extended" getcwd(3) to return an unrestricted pathname when called as follows:

   char *cwd = getcwd(NULL, 0);

Unfortunately, this is strictly forbidden by the standard:

     The getcwd() function shall fail if:


     EINVAL    The size argument is 0.

So in Solaris you have to loop with a longer and longer buffer until getcwd() no longer returns NULL with errno set to ERANGE. Alternatively, you can use realpath(".", NULL), in which case we can return a long pathname.

Both are actually a lot faster than running your own userland getcwd() implementation, and such implementations are more likely to fail.
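The buffer-growing loop described above can be sketched as follows. The name getcwd_alloc() and the starting size of 64 bytes are my own choices for illustration; only getcwd(), realloc() and the ERANGE behavior come from the standard.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the current working directory in a
 * malloc()ed buffer of whatever size is needed, or NULL with errno set.
 * The caller must free() the result.
 */
static char *
getcwd_alloc(void)
{
    size_t size = 64;       /* arbitrary starting size */
    char *buf = NULL;

    for (;;) {
        char *nbuf = realloc(buf, size);

        if (nbuf == NULL) {
            free(buf);      /* realloc() left the old buffer intact */
            return (NULL);
        }
        buf = nbuf;

        if (getcwd(buf, size) != NULL)
            return (buf);

        if (errno != ERANGE) {
            /* A real failure, not just a short buffer. */
            int saved = errno;

            free(buf);
            errno = saved;
            return (NULL);
        }
        size *= 2;          /* buffer too small: retry with a larger one */
    }
}
```

A caller simply does char *cwd = getcwd_alloc(); and frees the result when done. The one-call alternative mentioned above is realpath(".", NULL), which also returns an allocated buffer.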

