Saturday Feb 20, 2016

Solaris 11.3 SRU 5.6: updates in ps(1) and /proc/<pid>/{cmdline,environ,execname}

Almost as soon as Solaris 2.0 was released, people started to complain about the limited command-line output of ps(1): it was truncated at 80 characters. The standard ps(1) command was also unable to print environment variables.

The /usr/ucb/ps command could, but it had to trawl through the address space of the target process. To do so it needs at least the same privileges and uids/gids as the target process, to prevent privilege escalation; simply having the {proc_owner} privilege is not sufficient.

When we added pkill(1) and pgrep(1), they too were limited in the same way: they could only search the first 80 bytes of the command line (PRARGSZ) and the first 16 bytes of the command name (PRFNSZ).

These were serious limitations; for one, it became difficult to find a specific java process, as a typical java command line is much longer than 80 bytes and the important jar file often appears beyond the 80-byte limit.

Of course, our customers did not like this limit either.

We fixed this problem in Solaris 12, and now also in Solaris 11.3 SRU 5.6, by adding three new files under /proc/<pid>:

  • cmdline - all original arguments separated by NUL bytes
  • environ - all original environment values separated by NUL bytes
  • execname - the original program name given to exec.

The cmdline and execname files are publicly readable; the environ file is restricted to the owner of the process and to processes that have the {proc_owner} privilege. The cmdline and environ files are very similar to those found under Linux; however, they do not reflect the actual argument vectors in the process's address space, so they do not reflect changes made by the programs themselves.
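To illustrate the NUL-separated layout described above, here is a minimal C sketch that reads /proc/<pid>/cmdline and prints each argument on its own line. It assumes only what is stated above (arguments separated by NUL bytes) and tolerates a final argument without a trailing NUL; the helper names read_proc_file(), count_nul_strings() and print_cmdline() are hypothetical and not part of any Solaris API. Pointing it at environ instead should work the same way, subject to that file's access restrictions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count the NUL-separated strings in buf[0..len). */
size_t count_nul_strings(const char *buf, size_t len)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++)
        if (buf[i] == '\0')
            n++;
    /* A final string without a terminating NUL still counts. */
    if (len > 0 && buf[len - 1] != '\0')
        n++;
    return n;
}

/* Hypothetical helper: read a whole file into a NUL-terminated malloc'd buffer. */
char *read_proc_file(const char *path, size_t *lenp)
{
    FILE *fp = fopen(path, "r");
    size_t cap = 4096, len = 0, got;
    char *buf;

    if (fp == NULL)
        return NULL;
    buf = malloc(cap);
    while (buf != NULL && (got = fread(buf + len, 1, cap - len, fp)) > 0) {
        len += got;
        if (len == cap) {               /* buffer full: grow before reading more */
            char *nbuf = realloc(buf, cap * 2);
            if (nbuf == NULL) {
                free(buf);
                buf = NULL;
            } else {
                buf = nbuf;
                cap *= 2;
            }
        }
    }
    fclose(fp);
    if (buf != NULL) {
        buf[len] = '\0';                /* len < cap here, so this is in bounds */
        *lenp = len;
    }
    return buf;
}

/* Print every argument of process pid; returns -1 if cmdline is unreadable. */
int print_cmdline(long pid)
{
    char path[64];
    size_t len, argc = 0;
    char *buf;

    snprintf(path, sizeof path, "/proc/%ld/cmdline", pid);
    if ((buf = read_proc_file(path, &len)) == NULL)
        return -1;
    /* Step from one NUL-terminated argument to the next. */
    for (size_t i = 0; i < len; i += strlen(buf + i) + 1)
        printf("argv[%zu] = %s\n", argc++, buf + i);
    free(buf);
    return 0;
}
```

Because the whole file is read, a long java command line is no longer cut off at 80 bytes; the jar file simply shows up as one of the later arguments.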

A new -o format option, "env", was added to ps(1); ps(1) now uses the new files and displays the full command line.

As neither ps(1) nor ps(1b) needs to open /proc/<pid>/as anymore, fewer privileges are needed and read access to the executable is no longer required. This is a big performance win for ps(1b), especially when binaries on NFS are in the mix.

Because I essentially backported the ps and /proc changes from Solaris 12, the full list of bugs and enhancements is as follows:

        PSARC/2015/207 /proc/<pid>/{cmdline,environ,execname} extensions to /proc.
        15742822 SUNBT7092685 Extend /proc interfaces to allow ps(1) to show more of the command
        15420404 SUNBT6599384 pgrep/pkill don't find processes with 16 char filenames or match ...
        19669195 memory-leak in ucb_procinfo of ucbps.c:569
        15227016 SUNBT5100626 ps(1) sometimes shows an empty string for the ttyname
        15282779 SUNBT6313436 /usr/ucb/ps malloc() failure results in unexpected argument parsing
        14966583 SUNBT4157509 /usr/ucb/ps not bsd or sunos 4.x compatible on command line
        15488063 SUNBT6715628 ps -d makes -z have no effect
        21447952 /usr/ucb/ps gxw hangs, but w/out the w does not; never open /proc/<pid>/as
        21297345 procfs limits the size of the control messages
        15582848 SUNBT6872216 ps command needs to keep track of prior name/uid information
        15584899 SUNBT6875625 ps command should chdir to /proc to remove lock contention

Monday May 12, 2014

Solaris 11: Evolution of v_path.

In Solaris 10, Eric Schrock (now at Delphix) added vnode-to-pathname functionality to the kernel: the vnode stored the pathname that was used to find the file. It did not handle renames, nor did it elide ".." from the stored pathnames; the stored pathname was generally a full pathname from the root of the global zone. It was used for getcwd(3) and for the path subdirectory in /proc/<pid>/.

The v_path was implemented as a hint: whenever it was retrieved, e.g., for getcwd(3) or for the /proc file system, the actual path was computed and the current zone's root directory was removed from it.
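Removing the current zone's root directory from a global-zone pathname amounts to a prefix check that must stop at a component boundary. The following user-level C sketch illustrates only that step; strip_zone_root() is a hypothetical name, and it assumes the zone root is given without a trailing slash (or as "/" for the global zone).

```c
#include <string.h>

/*
 * Hypothetical sketch: turn a global-zone pathname into the pathname seen
 * inside a zone whose root is zroot.  Returns a pointer into path, "/" if
 * path is the zone root itself, or NULL if path lies outside the zone.
 */
const char *strip_zone_root(const char *path, const char *zroot)
{
    size_t n = strlen(zroot);

    if (n == 1 && zroot[0] == '/')
        return path;            /* global zone: nothing to strip */
    if (strncmp(path, zroot, n) != 0)
        return NULL;            /* different prefix */
    if (path[n] == '\0')
        return "/";             /* the zone root itself */
    if (path[n] != '/')
        return NULL;            /* e.g. "/zones/z1/rootx" is not inside the zone */
    return path + n;
}
```

For example, with a zone root of /zones/z1/root, the stored path /zones/z1/root/export/home is reported inside that zone as /export/home.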

When I started to work on the Extended Policy and later on the Immutable Global zone, it was clear that the v_path was very useful but it wasn't ready for those projects.

The Immutable Non-Global Zone (Solaris 11/11)

In the IMNGZ we need to compute the pathname and then check it against the black-list and the white-list. However, at the point where we do that, the kernel is deep inside the file system code and we cannot verify and recompute the pathname, as we might be holding locks that are needed further down. But since we are protecting a particular set of files, and those files cannot be changed or renamed, it is safe to treat the v_path as more than a hint. We did need to elide ".." and simplify pathnames; this is done directly when we set the v_path for a newly created pathname: if the code tries to add a "..", it instead removes the last component of the pathname. We also needed to prevent linking protected files into the non-protected file space, as that would circumvent the MWAC(5) protection offered in an IMNGZ.
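The ".." rule above can be sketched as a small path-building helper: appending "." changes nothing and appending ".." drops the last component instead of being stored. This is a user-level illustration under the assumption that the path is absolute and already simplified; append_component() is a hypothetical name, not the kernel routine.

```c
#include <string.h>

/*
 * Hypothetical sketch: append one component to the absolute path stored
 * in path[0..size).  "." is ignored and ".." removes the last component
 * instead of being stored.  Returns 0 on success, -1 if it would not fit.
 */
int append_component(char *path, size_t size, const char *comp)
{
    size_t len = strlen(path);

    if (strcmp(comp, ".") == 0)
        return 0;
    if (strcmp(comp, "..") == 0) {
        char *slash = strrchr(path, '/');
        if (slash == path)
            path[1] = '\0';     /* parent of "/x" (and of "/") is "/" */
        else if (slash != NULL)
            *slash = '\0';      /* "/a/b" becomes "/a" */
        return 0;
    }
    /* Add a separator unless the path is just "/". */
    int need_sep = !(len == 1 && path[0] == '/');
    if (len + need_sep + strlen(comp) + 1 > size)
        return -1;
    if (need_sep)
        path[len++] = '/';
    strcpy(path + len, comp);
    return 0;
}
```

Starting from "/", appending "usr", "lib" and then ".." leaves "/usr", so the stored pathname never contains a ".." component.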

The Extended Policy (Solaris 11.1)

The Extended Policy applies to all filenames in the file system, including those that can be renamed. This is why we put some effort into handling renames better. We now update the v_path on rename(2) in all file systems; a link(2) is handled like a rename(2) as well, because we observed that the new name generally outlives the original one. This works well for leaf nodes, but there is no efficient algorithm that can handle the rename of a directory and all its children; yet we have no option other than to use v_path, for the same reasons as in the IMNGZ. When we recalculate the pathname, e.g., for /proc or for getcwd(), and find the v_path wanting, we update it to the newly computed path, including all directories making up the full pathname.
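The lazy repair described above can be modelled in user space. In this simplified sketch a vnode is reduced to a hypothetical struct vn with a name, a parent pointer and a cached v_path; the real kernel finds parents through ".." lookups and checks the hint more cheaply, and for simplicity this model recomputes on every call. It only illustrates how one recomputation refreshes the stale hint of a node and of every directory above it.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for a vnode. */
struct vn {
    const char *name;       /* name in the parent directory ("" for the root) */
    struct vn *parent;      /* NULL for the root */
    char *v_path;           /* cached full pathname: only a hint, may be stale */
};

/* Replace the cached hint with a private copy of path. */
static void set_hint(struct vn *vp, const char *path)
{
    char *copy = malloc(strlen(path) + 1);

    if (copy == NULL)
        return;             /* keep the old hint if memory is short */
    strcpy(copy, path);
    free(vp->v_path);
    vp->v_path = copy;
}

/*
 * Compute the real pathname of vp into buf by walking up to the root and,
 * on the way back down, replace every hint found wanting: the hint of vp
 * itself and those of all directories making up the pathname.
 */
static void recompute(struct vn *vp, char *buf, size_t size)
{
    if (vp->parent == NULL) {
        snprintf(buf, size, "/");
    } else {
        recompute(vp->parent, buf, size);
        size_t len = strlen(buf);
        snprintf(buf + len, size - len, "%s%s",
            strcmp(buf, "/") == 0 ? "" : "/", vp->name);
    }
    if (vp->v_path == NULL || strcmp(vp->v_path, buf) != 0)
        set_hint(vp, buf);
}

/* Return the (possibly repaired) cached pathname of vp. */
const char *vn_path(struct vn *vp)
{
    char buf[1024];

    recompute(vp, buf, sizeof buf);
    return vp->v_path;
}
```

Renaming a directory in this model only changes its name field; the children keep their stale hints until the next lookup of one of them recomputes the path and repairs both the child and the directories above it.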

One possible security risk is that a vnode has an incorrect v_path and the Extended Policy grants more privileges for that v_path than for the actual pathname. As this can only happen if the file once lived at that location, this is not actually a risk: the process was able to use those privileges on that file in the past. We do make sure that linking is not allowed when the Extended Policy grants more privileges for the new pathname.

The secpolicy_*() routines needed an update to allow the Extended Policy to make decisions about files or directories that do not exist yet. As an extra benefit, privilege debugging now gives more detail, because more information is available deep down in the policy routines:

solaris11.0$ ppriv -De mkdir /casper
mkdir[11162]: missing privilege "ALL" (euid = 12345, syscall = 102) for "/" needed at zfs_zaccess+0x2c8
mkdir: Failed to make directory "/casper"; Permission denied

In Solaris 11.1 we know the full pathname of the file to be created, and privilege debugging shows it as well:

solaris11.1$ ppriv -De mkdir /casper
mkdir[13924]: missing privilege "ALL" (euid = 12345, syscall = 102) for "/casper" needed at zfs_zaccess+0x245
mkdir: Failed to make directory "/casper"; Permission denied

In Solaris 11.2 we also show the syscall name:

solaris 11.2$ ppriv -De mkdir /casper
mkdir[17488]: missing privilege "ALL" (euid = 12345, syscall = "mkdirat") for "/casper" at zfs_zaccess+0x245
mkdir: Failed to make directory "/casper"; Permission denied

Getcwd(3), realpath(3) fixes.

As part of the Extended Policy project, fixes to getcwd() and realpath() were made during the development of Solaris 11.1.  We've also put some of these fixes in 11.0 SRUs and in Solaris 10 patches. These fixes are the following:

  • Improved getcwd()/realpath() performance in zones.
  • Improved getcwd()/realpath() performance in the case of renaming (in some cases 1000x faster)
  • Fix getcwd() for chrooted process when the current working directory is not under the root directory. (This was a regression of the in-kernel getcwd())
  • Don't fail with EACCES so quickly
  • No limit on the size of the returned path from getcwd() and realpath()
  • realpath() moved into the kernel and the frealpath() system call (Solaris 11.1 and later only)

Several operating systems have "extended" getcwd(3) to return an unrestricted pathname when called as follows:

   char *cwd = getcwd(NULL, 0);

Unfortunately, this is strictly forbidden by the standard:

     The getcwd() function shall fail if:


     EINVAL    The size argument is 0.

So in Solaris you have to loop with an ever larger buffer until getcwd() no longer returns NULL with errno set to ERANGE, or you can use realpath(".", NULL), in which case we can return a long pathname.

Both are actually a lot faster than running your own userland getcwd() implementation, and such implementations are also more likely to fail.



