Friday Nov 04, 2016

Non-reboot immutable zone (as of 11.3 SRU 12)

Immutable Zones

In Solaris 11 we introduced Immutable Zones.  In Solaris 11.2 we added the Immutable Global Zone.  Please read my earlier blog if you are unfamiliar with Immutable Zones.

What is new: Why was a reboot required?

When a Solaris system reboots after the system was just installed or upgraded, self assembly is performed.  This phase requires the system to be read-write, even when it is an immutable (global) zone.  All services which require self-assembly specify that the self-assembly-complete milestone is dependent on them. When the self-assembly-complete milestone is reached, the system reboots immutable.

As the self-assembly-complete milestone isn't reached until man-index and several other services have run their course, the second boot may come quite a bit later.  Use the following command to list all the services:

# svcs -d self-assembly-complete

The self-assembly-service itself reboots the system; for non-global zones this is pretty quick but for global zones this can be pretty slow.  We have always found this a bug rather than a feature so we decided to fix that for the next Solaris release.  Of course, if self-assembly needs to wait for man-index and other services to complete, not requiring a reboot was only a small part of the puzzle we needed to solve.

In Solaris 11.2 with the design of the immutable global zone, we created the Trusted Path.  Processes which run under the Trusted Path can modify files which are normally read-only in immutable zones.  While this feature was invented to allow modification and updates of an Immutable Global Zone as well as modifications to non-global Immutable Zones using the zlogin -T or -U options, we realized that this could be used to do certain self-assembly operations asynchronously. Asynchronous updates would be interrupted if we rebooted so we needed to fix both problems at the same time.

We implemented this in the next Solaris release; one of our customers, who got early access to that release, filed a service request for a back port of the feature, and the rest is history.

What is new in Solaris 11.3 SRU 12?

We introduce a new milestone, immutable-setup, which is reached early during boot and sets the zone's file-mac-profile.  Services which need to know whether the system is immutable need to wait until that milestone is reached.  For native zones this is already done by zoneadmd before starting the zone.

When the self-assembly-complete milestone is reached, the zone becomes immutable at that point.  In earlier releases of Solaris a reboot would happen:

[NOTICE: This read-only system transiently booted read/write]
[NOTICE: Now that self assembly has been completed, the system is rebooting]

as of Solaris 11.3 SRU 12 you will see:

[NOTICE: switching to read-only mode]

and the system will continue while man-index asynchronously continues to format manual pages.

But that is not all!  It is now much easier to configure an immutable global zone and, again, without requiring a reboot:

# zonecfg -z global set file-mac-profile=<profile>

# zoneadm -z global apply

You will see the same message on /dev/console.

Saturday Feb 20, 2016

Solaris 11.3 SRU 5.6: updates in ps(1) and /proc/<pid>/{cmdline,environ,execname}

Almost as soon as Solaris 2.0 was released, people started to complain about the limit of the ps(1) command line output; it was limited to 80 characters. The standard ps(1) command was also not able to print the environment variables.

The /usr/ucb/ps command could, but it needed to trawl through the address space of the target process.  In order to do so it needs to have at least the same privileges and uids/gids to prevent privilege escalation.  Simply having the {proc_owner} privilege is not sufficient.

When we added pkill(1)/pgrep(1), they too were limited in the same way: they could only search the first 80 bytes of the command line (PRARGSZ) and the first 16 bytes of the command name (PRFNSZ).

These were serious limitations; for one, it became difficult to find a specific java process, as the typical java command line is generally much larger than 80 bytes and often the important jar file is beyond the 80 byte limit.

 Of course, our customers did not like this limit either.

 We fixed this problem in Solaris 12 and now also in Solaris 11.3 SRU 5.6 by adding three new files under /proc/<pid>:

  • cmdline - all original arguments separated by NUL bytes
  • environ -  all original environment values separated by NUL bytes
  • execname - the original program name given to exec.

The cmdline and execname files are publicly readable; the environ file is restricted to the owner of the process or to processes which have the {proc_owner} privilege. The cmdline and environ files are very similar to those found under Linux; however, these do not reflect the actual argument vectors in the process' address space, so they do not show changes made by the programs themselves.
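Because the entries in cmdline and environ are separated by NUL bytes, they are awkward to read directly on a terminal. A minimal sketch of a helper that prints one entry per line follows; the function name print_nul_separated is made up for this illustration and is not part of Solaris.

```shell
# Hypothetical helper: print each NUL-separated entry of a file on its own line.
# Intended for files such as /proc/<pid>/cmdline or /proc/<pid>/environ.
print_nul_separated() {
    # Translate every NUL byte into a newline.
    tr '\000' '\n' < "$1"
}
```

For example, "print_nul_separated /proc/$$/cmdline" would list the arguments of the current shell. An argument that itself contains a newline will be split over several lines, so this is meant for inspection by eye, not for exact parsing.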

A new -o format option "env" was added to ps(1); the new files are used and ps(1) will now display the full command line.
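As a rough sketch of how this might be used, the helper below prints the full command line and then the environment of a single process. The function name show_process is hypothetical; "env" is the new format name mentioned above, while "pid", "args" and the -p option are the long-standing ps(1) ones. The exact layout of the output is not shown here, as it may differ between releases.

```shell
# Hypothetical helper: show the command line and environment of one process.
show_process() {
    # Full command line (no longer cut off at 80 characters).
    ps -o pid,args -p "$1"
    # Environment, using the new "env" format option; this is subject to
    # the same access restrictions as /proc/<pid>/environ.
    ps -o pid,env -p "$1"
}
```

For example, "show_process $$" would display the current shell.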

As neither ps(1) nor ps(1b) needs to open /proc/<pid>/as, fewer privileges are now needed and read access to the executable is no longer required: this is a big performance win for ps(1b), especially when NFS binaries are in the mix.

As I basically back ported changes to ps and /proc from Solaris 12, the whole list of bugs and enhancements is as follows:

        PSARC/2015/207 /proc/<pid>/{cmdline,environ,execname} extensions to /proc.
        15742822 SUNBT7092685 Extend /proc interfaces to allow ps(1) to show more of the command
        15420404 SUNBT6599384 pgrep/pkill don't find processes with 16 char filenames or match ...
        19669195 memory-leak in ucb_procinfo of ucbps.c:569
        15227016 SUNBT5100626 ps(1) sometimes shows an empty string for the ttyname
        15282779 SUNBT6313436 /usr/ucb/ps malloc() failure results in unexpected argument parsing
        14966583 SUNBT4157509 /usr/ucb/ps not bsd or sunos 4.x compatible on command line
        15488063 SUNBT6715628 ps -d makes -z have no effect
        21447952 /usr/ucb/ps gxw hangs, but w/out the w does not; never open /proc/<pid>/as
        21297345 procfs limits the size of the control messages
        15582848 SUNBT6872216 ps command needs to keep trackof prior name/uid information
        15584899 SUNBT6875625 ps command should chdir to /proc to remove lock contention

Tuesday Jul 07, 2015

Solaris 11.3: rtc(1m) no longer warps the time by default

On x86 hardware Solaris derives the time-of-day from the "real-time-clock (RTC)"; traditionally, this clock is defined as ticking in "local time".

That has always been problematic when you have multiple OSes installed on the same hardware. All OSes will want to change the RTC when they believe that we have just crossed one of the daylight saving time boundaries. With Solaris 11's new boot environments, it is even a problem when you only have one OS installed but with multiple boot environments.  That is why Solaris 11 prefers to run the RTC in UTC so it never needs to be changed and all the boot environments are perfectly happy.

If you want to change the timezone for the RTC as recorded in /etc/rtc_config, you would use "rtc -z <timezone>"; unfortunately, the existing behavior was to warp the time. That was rather surprising as most of the time the system's time is properly set. This behavior has always annoyed me but it wasn't until some customer complained about this behavior in comp.unix.solaris that I realized that I wasn't the only person annoyed and so I decided to fix it.

In Solaris 11.3 rtc(1m) will no longer warp the time.  If you really want to warp the time you will now need to use the new "-w" option.

Solaris 11.3: New Immutable Global Zone file-mac-profile: dynamic-zones

In Solaris 11.2 we introduced the Immutable Global Zone.  Just like the Immutable Zones introduced in Solaris 11/11, it supports three different file-mac-profiles: strict, fixed-configuration and flexible-configuration.

To refresh your memory, these three file-mac-profiles as well as the default value, "none",  are described in zonecfg(1m) as follows:

           There are currently four supported values for this property:  none,
           strict, fixed-configuration, and flexible-configuration.

           none  makes the zone exactly the same as a normal, r/w zone. strict
           allows no exceptions to the read-only  policy.  fixed-configuration
           allows  the zone to write to files in and below /var, except direc-
           tories containing configuration files:


           flexible-configuration is equal to fixed-configuration, but  allows
           writing to files in /etc in addition.

In Solaris 11.3 we are adding a fourth file-mac-profile: dynamic-zones.  It should be seen as sitting between fixed-configuration and flexible-configuration.

This particular profile is only valid for the global zone; it allows the administrator to create and destroy non-global zones, kernel zones, etc.

While this is already possible with flexible-configuration, that file-mac-profile also allows much of the system configuration to be changed; with the other profiles, creating or destroying a zone requires using the Trusted Path.  The dynamic-zones profile is a compromise: it keeps the configuration of the system restricted, yet it does allow a user with the proper authorizations to create and destroy zones.
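Selecting the new profile should work the same way as for the other profiles. The following is only a sketch: the function name set_global_profile is made up for this illustration, and it simply wraps the zonecfg and zoneadm commands for the global zone, followed by a zonecfg info query to display the configured value.

```shell
# Hypothetical helper: switch the global zone to the given file-mac-profile.
# Must be run with sufficient privileges in the global zone.
set_global_profile() {
    profile=$1
    # Record the profile in the zone configuration ...
    zonecfg -z global set "file-mac-profile=$profile" &&
    # ... apply it to the running system ...
    zoneadm -z global apply &&
    # ... and show what is now configured.
    zonecfg -z global info file-mac-profile
}
```

For example, "set_global_profile dynamic-zones" would select the profile described here.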

The dynamic-zones profile was targeted specifically at using an immutable global zone on the OpenStack Nova compute nodes.

Solaris 11.3: New per-share, per-instance reserved port property for NFS

It seems like a lifetime ago that I added the following question to the Solaris FAQ:

7.8) How can I make the NFS server ignore unprivileged clients?

    In a restricted environment, i.e., an environment where the
    administrator controls root access, you can enhance NFS security
    by setting the "NFS_PORTMON" variable.  This variable is set in
    /etc/system, like this:

    * Prior to Solaris 2.5
    set nfs:nfs_portmon = 1

    * Solaris 2.5 and later
    set nfssrv:nfs_portmon = 1

You could wonder why this was never the default.  The answer is that reserved ports are a BSD Unix invention from the time that computers were large and centrally administered; an invention later copied to all Unix-like operating systems, but outside of that world it makes little sense. As a result, many NFS clients can use any port and might not be able to restrict the ports they use.

The "nfs_portmon" variable was global; Solaris has evolved and now has multiple different NFS server instances (one for each zone); customers also have requested to have a per-share setting.

In Solaris 11.3 we introduce a new sharectl property:

# sharectl get -p resvport nfs

as well as a new resvport share option:

# zfs get share.nfs.sys.resvport build/casper
NAME          PROPERTY                    VALUE  SOURCE
build/casper  share.nfs.sec.sys.resvport  off    default

The sharectl property is global for the NFS server instance; if it is set to true, this overrides the per-share properties.  If a system is upgraded, it will take the value from /etc/system and it will log a message that in future sharectl(1m) should be used instead.

When the sharectl property is set to false, you can set resvport for each share individually.  As you can see, this is restricted to the "sys" security mode; when proper security such as Kerberos V is used, we do not verify that the NFS client uses privileged ports.
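A sketch of how the two settings might be combined to require reserved ports for a single share is given below. The function name require_resvport is hypothetical, and the values are assumptions taken from the text and the output above: "false" for the sharectl property and "on" for the share property, whose name is copied from the zfs get output.

```shell
# Hypothetical helper: require reserved ports for one share only.
require_resvport() {
    dataset=$1
    # Keep the instance-wide setting off so per-share values are honoured.
    sharectl set -p resvport=false nfs &&
    # Turn the reserved-port check on for this share (sec=sys only).
    zfs set share.nfs.sec.sys.resvport=on "$dataset"
}
```

For example, "require_resvport build/casper" would apply this to the dataset shown above.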

It goes without saying that actual NFS security can only be had when using a security mode other than "sys".



