Wednesday Sep 24, 2014

getcwd(NULL, 0) revisited

Earlier I claimed that the POSIX standard didn't allow an extension in which the following statement returned anything other than NULL while setting errno to EINVAL:

char *cwd = getcwd(NULL, 0);

However, the standard also says:

"If buf is a null pointer, the behavior of getcwd() is unspecified."

so the GNU/Linux extension is perfectly legal.  I was clearly wrong.

Many applications check for this behavior at configure time and, when Solaris getcwd() is found wanting, replace the standard Solaris getcwd() with a concoction that runs much slower and fails more often. We're therefore changing getcwd() to allocate sufficient memory when it is called with a NULL buffer pointer and a size of zero.

Of course, you could already have that in Solaris 11, if you replaced getcwd(NULL, 0) with realpath(".", NULL).
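
Portable code can use the extension where it exists and fall back otherwise. A minimal sketch, assuming a POSIX.1-2008 system on which realpath() allocates its result when the second argument is NULL; the function name current_dir() is hypothetical:

```c
#define _XOPEN_SOURCE 700	/* for realpath() */
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the current working directory in
 * malloc()ed storage; the caller must free() it.  getcwd(NULL, 0)
 * allocating its own buffer is an extension (POSIX leaves a NULL buf
 * unspecified), so if it fails with EINVAL, fall back to
 * realpath(".", NULL), which allocates the result itself.
 */
char *
current_dir(void)
{
	char *cwd = getcwd(NULL, 0);

	/* Only inspect errno after a failure. */
	if (cwd == NULL && errno == EINVAL)
		cwd = realpath(".", NULL);
	return (cwd);
}
```

Either way the caller releases the result with free().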

Addendum: this fix is included in Solaris 11.2 SRU 3

Friday May 02, 2014

Solaris 11.2: No Limits

In the past, I have lifted a number of limits in Solaris:

  • In Solaris 11.0, I increased NGROUPS_MAX to 1024 (from 32); also available since Solaris 10u8.
  • In Solaris 11.1, I added support for more than 16 groups for NFS AUTH_SYS authentication
  • In Solaris 11.1, I changed the system calls getcwd() and realpath() to support returning pathnames longer than MAXPATHLEN (and introduced frealpath() while I was in that code)

So what did I change in Solaris 11.2? It was about time to look at the restrictions on user names and group names.

In a micro release, such as a Solaris 11 update, we cannot modify constants such as LOGNAME_MAX because of binary compatibility; we can only do that in a future minor release. However, we can modify the code that limits user names. These are the bugs we have fixed, which shows how much work it actually was:

    14933330 SUNBT4033673 getlogin causes passwd to fail if login name is longer than 8 chars
    14954449 SUNBT4109819 programs inconsistently limit the size of user names
    15059729 SUNBT4435330 logname(1) prints out only part of long login name
    15178384 SUNBT4927530 *w* w(1) truncates usernames to 8 chars
    15393621 SUNBT6551524 su truncates LOGNAME for long usernames.
    15436992 SUNBT6627292 *cron* confused about username lengths
    15550167 SUNBT6819489 *su* sulog source username truncated to 8 chars but not destination
    15574163 SUNBT6857992 ps -u does not support usernames longer than 10 chars
    15579148 SUNBT6866548 last command does not support usernames longer than 8 characters
    17528753 group name handling in Solaris is a standards violation
    17528788 useradd(1m) user name handling problems
    17600453 bug 15226690, find with long usernames, not completely fixed
    17600724 The fix for 14954449 misses some programs (in.rlogind, in.rshd. zone*, dump)
    17625438 group file updates very inefficient.
    17625458 pwck lives in the past
    18068180 SunSSH truncates usernames/home directories with %.100s
    18068355 A few programs still limit the size of user names.
    18068215 passmgmt invents its own limits for the sizes of entries in /etc/passwd

In general, the code was changed to lift limits, but we are still limited by the format of the utmpx file: the maximum length of a user name that can be stored there is 32 bytes. This is now a safe limit, and we support user names of up to 32 characters, despite protests from useradd(1m). getlogin() and getlogin_r() can return a string of at most 33 characters, including the terminating NUL character. Of course, getlogin_r() will not store past the end of the buffer given to it, but it will now accept a buffer of any size. Among the programs changed are:

  • logname(1)
  • w(1)
  • who(1)
  • last(1)
  • ls(1) - now a 64-bit executable
  • find(1) - now a 64-bit executable
  • passmgmt(1)
  • useradd/usermod/roleadd/rolemod(1m)
  • sshd(1m)
  • repquota(1m)
  • zfs(1)
  • yppasswd(1)
  • tar(1)
  • lastcomm(1)
  • cron(1) etc
  • newtask(1)
  • ps(1)
  • wall(1)
  • rwall(1)
  • zlogin(1)
  • grpck(1)
  • pwck(1)
  • login(1)
  • in.rexecd(1m), in.rshd(1m), in.rlogind(1m)

And libraries such as libsocket (for the remote shell, remote login and rexec protocols).
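
Code that reads user names straight out of utmpx records has to cope with the fixed-size field. A minimal sketch, assuming the 32-byte user field stated above and assuming the field is NUL-padded but not NUL-terminated when a name fills it; UT_NAME_LEN and copy_ut_user() are hypothetical names. A 33-byte buffer, the same bound given above for getlogin(), is always enough:

```c
#define _POSIX_C_SOURCE 200809L	/* for strnlen() */
#include <stddef.h>
#include <string.h>

/* Hypothetical: size of the utmpx user field, 32 bytes per the text. */
#define UT_NAME_LEN 32

/*
 * Copy a fixed-size, possibly unterminated utmpx-style user field into
 * buf as a NUL-terminated string.  Returns 0 on success, or -1 if buf
 * (bufsz bytes) cannot hold the name plus its NUL; a buffer of
 * UT_NAME_LEN + 1 bytes always suffices.
 */
int
copy_ut_user(const char field[UT_NAME_LEN], char *buf, size_t bufsz)
{
	/* Never read past the field, even if no NUL is present. */
	size_t len = strnlen(field, UT_NAME_LEN);

	if (len + 1 > bufsz)
		return (-1);	/* caller's buffer is too small */
	(void) memcpy(buf, field, len);
	buf[len] = '\0';
	return (0);
}
```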

I can only wonder why so many applications cache the names returned by getpwuid() and getgrgid() in fixed-size character arrays.
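
A minimal sketch of the alternative: a small uid-to-name cache that stores heap copies of the names, so nothing is truncated. cached_username() and struct uname_entry are hypothetical names, and falling back to the numeric uid when there is no passwd entry is an assumption modelled on what ls -l prints:

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() */
#include <pwd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical cache entry: the name is strdup()ed, so any length fits. */
struct uname_entry {
	uid_t uid;
	char *name;
	struct uname_entry *next;
};

static struct uname_entry *uname_cache;

/* Return a cached user name for uid, or NULL if memory runs out. */
const char *
cached_username(uid_t uid)
{
	struct uname_entry *e;
	struct passwd *pw;
	char numbuf[32];

	for (e = uname_cache; e != NULL; e = e->next) {
		if (e->uid == uid)
			return (e->name);
	}

	if ((e = malloc(sizeof (*e))) == NULL)
		return (NULL);

	if ((pw = getpwuid(uid)) != NULL) {
		/* Copy: getpwuid() may reuse its storage on the next call. */
		e->name = strdup(pw->pw_name);
	} else {
		/* No passwd entry: use the numeric uid instead. */
		(void) snprintf(numbuf, sizeof (numbuf), "%lu",
		    (unsigned long)uid);
		e->name = strdup(numbuf);
	}
	if (e->name == NULL) {
		free(e);
		return (NULL);
	}
	e->uid = uid;
	e->next = uname_cache;
	uname_cache = e;
	return (e->name);
}
```

A second lookup of the same uid returns the same pointer without calling getpwuid() again.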

For reasons known only in New Jersey, we didn't allow group names longer than 8 characters, and we limited the characters to lower case letters and digits. As there is no manifest constant defining the size of a group name, there is no problem increasing it, so we now support up to 32 characters, and we accept all portable file name characters in a group name (lower and upper case letters, digits, dot, hyphen and underscore), as long as the name doesn't start with a hyphen. Other than programs caching the result of getgrgid(), I found no other limits on the length of a group name in our code.
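
The naming rule above can be written as a small check. A minimal sketch, assuming the 32-character maximum described here and plain ASCII; valid_groupname() and GROUPNAME_LEN_SKETCH are hypothetical names, not a Solaris interface. It tests character ranges explicitly rather than calling isalnum(), whose result depends on the locale:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical: the 32-character limit described in the text. */
#define GROUPNAME_LEN_SKETCH 32

/*
 * True if name is 1..32 characters long, uses only portable file name
 * characters (A-Z, a-z, 0-9, '.', '_', '-') and does not start with '-'.
 */
bool
valid_groupname(const char *name)
{
	size_t len = strlen(name);
	size_t i;

	if (len == 0 || len > GROUPNAME_LEN_SKETCH || name[0] == '-')
		return (false);

	for (i = 0; i < len; i++) {
		char c = name[i];

		if (!((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
		    (c >= '0' && c <= '9') || c == '.' || c == '_' || c == '-'))
			return (false);
	}
	return (true);
}
```

For example, "Dev.Team_2" is accepted, while "-wheel" and "bad name" are rejected.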

Thursday May 01, 2014

Solaris 11.2: Immutable Global Zone

This blog entry is a bit more substantial; it requires some knowledge of Solaris Zones, Immutable Zones and Solaris administration in general. It is high-level; in the future I'm hoping to get down to the nuts and bolts.

Immutable Zones

In Solaris 11 we added the Read-Only Root Non-Global Zones, marketed as Immutable Zones; this is a feature that makes a zone tamper-proof.

An Immutable Zone is configured simply by setting the "file-mac-profile" property to one of "strict" (not much is writable), "fixed-configuration" or "flexible-configuration" (the configuration is writable but binaries and such are not). This is all implemented in the kernel, based on pathnames and depending on the context; the super-user in the global zone can still update the zone or even modify protected files, as long as that is not done from within the zone.

We have made some changes to Immutable Non-Global Zones (IMZ, for short) that came out of developing the Immutable Global Zone (IMGZ): we have added a new feature, the "Trusted Path" (TP). When logged in through the Trusted Path using the "-T" option to zlogin(1m), you can now modify protected files from within the zone. This is much safer, as you no longer need to give out root access in the global zone, nor do you need to boot the IMZ in writable mode. In the following example, we log in to the zone "fixed", which has been configured with the fixed-configuration file-mac-profile. A normal root login doesn't allow us to modify "/etc/passwd"; I'm using touch(1) under privilege debugging to illustrate the error and what caused it. When we log in with the "-T" option, we can suddenly modify "/etc/passwd" because we're now in the Trusted Path. Notice also that the output from privilege debugging has been clarified: it points to the MWAC(5) manual page, and it now identifies the system call by name instead of by number as before.

# zlogin fixed
[Connected to zone 'fixed' pts/3]
Oracle Corporation      SunOS 5.11      11.2    April 2014
root@fixed:~# ppriv -De touch /etc/passwd
touch[117063]: MWAC(5) policy violation (euid = 0, syscall = "utimensat") for "/etc/passwd" at fop_setattr+0x10b
touch: cannot change times on /etc/passwd: Read-only file system
root@fixed:~# logout
[Connection to zone 'fixed' pts/3 closed]

# zlogin -T fixed
[Connected to zone 'fixed' pts/3]
Oracle Corporation      SunOS 5.11      11.2    April 2014
root@fixed:~# ppriv -De touch /etc/passwd

Additionally, we have restricted the use of mount(1m) in an IMZ: while we previously allowed arbitrary loopback mounts, we now only allow loopback mounts on empty directories, unless the file or directory being mounted on isn't protected by MWAC(5).

Immutable Global Zone

In order to prevent tampering with the file system, we have extended Immutable Zones in Solaris 11.2 to the global zone; using the same mechanism, you can now configure the global zone as an IMGZ. As there is no "super-global" zone, a different mechanism has been designed to enter the Trusted Path. This doesn't apply to kernel zones, which still have a bare-metal global zone controlling them. Some additional steps need to be taken; they are listed here.

Preparing the global zone to become an immutable global zone

Maintenance of an immutable global zone is only possible through the Trusted Path, and the Trusted Path is only available on the console, so make sure the console is accessible through the ILOM, a serial connection or the graphical console.

Once a system is configured as an immutable global zone, the break sequence (F1-A on a graphical console; <break> or the alternate break sequence, CR-tilde-<ctl-b>, on a serial console) will instead start the Trusted Path login. An immediate second break sequence acts as a standard break sequence: it starts the kernel debugger (if it is loaded), drops to the OBP, etc.

Configuring the Immutable Global Zone

The global zone is configured through zonecfg(1m) by picking the appropriate file-mac-profile for your situation; the allowed values are the same as for non-global immutable zones: "strict", "fixed-configuration" and "flexible-configuration". See zonecfg(1m).

Note that if the system uses DHCP to configure its network interfaces, the "flexible-configuration" profile must be selected.

        # zonecfg -z global
        zonecfg:global> set file-mac-profile=flexible-configuration

The "rpool" dataset will be restricted, but sub-datasets can be made unrestricted using "add dataset":

        zonecfg:global> add dataset
        zonecfg:global:dataset> set name=rpool/export
        zonecfg:global:dataset> end

        zonecfg:global> add dataset
        zonecfg:global:dataset> set name=rpool/zones
        zonecfg:global:dataset> end

In this example we add "rpool/export" and "rpool/zones", writable datasets for users and for zones respectively. An immutable global zone can only run zones in unrestricted datasets. All children of an unrestricted dataset are also unrestricted.

Note that all datasets on other zpools are unrestricted, so there is no need to add them with "add dataset".

When the configuration is committed, the boot information is written and the boot archive is updated:

        zonecfg:global> commit
        updating /platform/sun4u/boot_archive

Once the system is configured, reboot it; it will then come up with an immutable global zone.

Maintenance of the immutable global zone

An immutable zone cannot be updated other than through the Trusted Path login, or when the system is booted in writable mode using the "-w" boot flag. Note that if you try to reboot the immutable zone with "reboot -- -w", the argument is ignored unless the command is run from the Trusted Path login.

After using the break-sequence on the console, you should be greeted with:

        trusted path console login:

Log in and assume the root role; at that point, the ordinary commands used to update the system are available, including "pkg update", "beadm activate", and "zonecfg" should you need to change the global zone's configuration.

A separate PAM stack can be configured for tpdlogin(1).

When "pkg update" is performed, the first boot of the immutable global zone is read-write; the system needs this to perform its self-assembly steps. Once those steps have been performed, the system reboots, and in this second boot the system is immutable again.

Wednesday Apr 30, 2014

Solaris 11.2: User, Pid and Commands in netstat(1m)

As it has been years since I've blogged, let me start with one of the smallest features I added to Solaris 11.2: an option to netstat(1m) that allows administrators to figure out who is using which port, and which process or command is using a particular network connection.

As there is little or no similarity between other netstat implementations, we picked our own option letter, "-u". At the same time, we realigned the columns, as the standard widths didn't fit modern TCP window sizes, the length of Unix socket paths, etc. For unprivileged users, we've also removed unusable information such as the "kernel addresses", leaving a bit more room, though an 80-column terminal isn't really wide enough for all of the information. Alignment is only guaranteed with -n, of course.

Our implementation doesn't use /proc as the Linux implementation does, nor does it look through /dev/kmem as lsof(1m) does; instead, we get the information directly from the kernel. While some of the information might be out of date, we can give information about sockets in TIME_WAIT or CLOSE_WAIT, even when the latter sockets haven't been accepted yet! Sockets owned by the kernel are listed as well. This works in the global zone, non-global zones, kernel zones *and* even Solaris 10 branded zones; the latter use the "native" Solaris 11.2 netstat command.

Here is some sample output, partially hidden by how we format blogs (so, install Solaris 11.2 and all will be revealed)

% netstat -aun

   Local Address        Remote Address      User    Pid      Command       State
-------------------- -------------------- -------- ------ -------------- ----------
      *.50258                             root       1038 syslogd        Idle
      *.*                                 root        133 in.mpathd      Unbound
      *.*                                 root        133 in.mpathd      Unbound
      *.*                                 netadm      721 nwamd          Unbound
      *.*                                 netadm      721 nwamd          Unbound
      *.123                               root        961 ntpd           Idle
      *.123                               root        961 ntpd           Idle
                                          root        961 ntpd           Idle
10.311.249.18.123                         root        961 ntpd           Idle
      *.111                               daemon      980 rpcbind        Idle
      *.*                                 daemon      980 rpcbind        Unbound
      *.41327                             daemon      980 rpcbind        Idle
      *.111                               daemon      980 rpcbind        Idle
      *.*                                 daemon      980 rpcbind        Unbound
      *.37058                             daemon      980 rpcbind        Idle
      *.*                                 root        988 in.ndpd        Unbound
      *.*                                 root        999 statd          Unbound
      *.*                                 root        999 statd          Unbound
      *.39150                             root        999 statd          Idle
      *.43382                             root        999 statd          Idle
      *.4045                              daemon     1008 lockd          Idle
      *.4045                              daemon     1008 lockd          Idle
      *.56874                             root       1004 inetd          Idle
      *.37069                             root       1004 inetd          Idle
      *.42765                             root       1148 mountd         Idle
      *.64957                             root       1148 mountd         Idle
      *.2049                              root       1150 nfsd           Idle
      *.2049                              root       1150 nfsd           Idle

   Local Address                     Remote Address                   User    Pid      Command       State      If
--------------------------------- --------------------------------- -------- ------ -------------- ---------- -----
      *.*                                                           root        133 in.mpathd      Unbound    
      *.*                                                           netadm      721 nwamd          Unbound    
      *.123                                                         root        961 ntpd           Idle       
::1.123                                                             root        961 ntpd           Idle       
      *.111                                                         daemon      980 rpcbind        Idle       
      *.*                                                           daemon      980 rpcbind        Unbound    
      *.41327                                                       daemon      980 rpcbind        Idle       
      *.*                                                           root        988 in.ndpd        Unbound    
      *.39150                                                       root        999 statd          Idle       
      *.4045                                                        daemon     1008 lockd          Idle       
      *.37069                                                       root       1004 inetd          Idle       
      *.42765                                                       root       1148 mountd         Idle       
      *.2049                                                        root       1150 nfsd           Idle       

   Local Address        Remote Address      User     Pid     Command     Swind  Send-Q  Rwind  Recv-Q    State
-------------------- -------------------- -------- ------ ------------- ------- ------ ------- ------ -----------
                           *.*            root        133 in.mpathd           0      0  128000      0 LISTEN
      *.111                *.*            daemon      980 rpcbind             0      0  128000      0 LISTEN
      *.*                  *.*            daemon      980 rpcbind             0      0  128000      0 IDLE
      *.111                *.*            daemon      980 rpcbind             0      0  128000      0 LISTEN
      *.*                  *.*            daemon      980 rpcbind             0      0  128000      0 IDLE
      *.36887              *.*            root        999 statd               0      0  128000      0 LISTEN
      *.65159              *.*            root        999 statd               0      0  128000      0 LISTEN
10.311.249.18.58810  10.312.132.13.636    root        851 nscd            49232      0  128872      0 ESTABLISHED
      *.4045               *.*            daemon     1008 lockd               0      0 1049200      0 LISTEN
      *.4045               *.*            daemon     1008 lockd               0      0 1048952      0 LISTEN
      *.22                 *.*            root       1030 sshd                0      0  128000      0 LISTEN
                           *.*            root       1068 sendmail            0      0  128000      0 LISTEN
                           *.*            root       1068 sendmail            0      0  128000      0 LISTEN
      *.47629              *.*            root       1148 mountd              0      0  128000      0 LISTEN
      *.35906              *.*            root       1148 mountd              0      0  128000      0 LISTEN
      *.2049               *.*            root       1150 nfsd                0      0 1049200      0 LISTEN
      *.2049               *.*            root       1150 nfsd                0      0 1048952      0 LISTEN
                           *.*            pkg5srv    1600                     0      0  128000      0 LISTEN
10.311.249.18.857    10.311.246.25.2049   casper        0 <kernel>        49232      0 1049800    116 ESTABLISHED
10.311.249.18.22     10.311.249.34.64127  root       1030 sshd           263536     63  128872      0 ESTABLISHED
                           *.*            casper     1969 sshd                0      0  128000      0 LISTEN

   Local Address                     Remote Address                   User    Pid      Command      Swind  Send-Q  Rwind  Recv-Q   State      If
--------------------------------- --------------------------------- -------- ------ -------------- ------- ------ ------- ------ ----------- -----
::1.5999                                *.*                         root        133 in.mpathd            0      0  128000      0 LISTEN      
      *.111                             *.*                         daemon      980 rpcbind              0      0  128000      0 LISTEN      
      *.*                               *.*                         daemon      980 rpcbind              0      0  128000      0 IDLE        
      *.36887                           *.*                         root        999 statd                0      0  128000      0 LISTEN      
      *.4045                            *.*                         daemon     1008 lockd                0      0 1049200      0 LISTEN      
      *.22                              *.*                         root       1030 sshd                 0      0  128000      0 LISTEN      
::1.25                                  *.*                         root       1068 sendmail             0      0  128000      0 LISTEN      
      *.47629                           *.*                         root       1148 mountd               0      0  128000      0 LISTEN      
      *.2049                            *.*                         root       1150 nfsd                 0      0 1049200      0 LISTEN      
::1.6010                                *.*                         casper     1969 sshd                 0      0  128000      0 LISTEN      
::1.51794                         ::1.6010                          casper     1970 xterm           130880      0  139264      0 ESTABLISHED 
::1.6010                          ::1.51794                         casper     1969 sshd            139060      0  130880      0 ESTABLISHED 

Active UNIX domain sockets
Type       User        Pid Command        Local Address                           Remote Address
stream-ord casper     1969 sshd            (socketpair)                            (socketpair)
stream-ord casper     1969 sshd            (socketpair)                            (socketpair)
stream-ord casper     1969 sshd            (socketpair)                            (socketpair)
stream-ord casper     1969 sshd            (socketpair)                            (socketpair)
stream-ord casper     1969 sshd            (socketpair)                            (socketpair)
stream-ord root        372 dbus-daemon    /var/run/dbus/system_bus_socket
stream-ord root       1028 rmvolmgr                                               /var/run/dbus/system_bus_socket
stream-ord root        372 dbus-daemon    /var/run/dbus/system_bus_socket
stream-ord root        943 hald                                                   /var/run/dbus/system_bus_socket
stream-ord root       1004 inetd          /system/volatile/inetd.uds
stream-ord root        943 hald           /system/volatile/hald/dbus-TM2nMhzrpM
stream-ord root        993 hald-addon-sto                                         /system/volatile/hald/dbus-TM2nMhzrpM
stream-ord pkg5srv    1601 httpd.worker   /system/volatile/pkg/sysrepo/wsgi.1601.0.1.sock
stream-ord root        943 hald           /system/volatile/hald/dbus-TM2nMhzrpM
dgram      root        988 in.ndpd        /system/volatile/in.ndpd_mib
stream-ord root        988 in.ndpd        /system/volatile/in.ndpd_ipadm
stream-ord root        970 hald-addon-cpu                                         /system/volatile/hald/dbus-TM2nMhzrpM
stream-ord root        943 hald           /system/volatile/hald/dbus-MIhDasTVfy
stream-ord root        944 hald-runner                                            /system/volatile/hald/dbus-MIhDasTVfy
stream-ord root        943 hald           /system/volatile/hald/dbus-MIhDasTVfy
stream-ord root        943 hald           /system/volatile/hald/dbus-TM2nMhzrpM
stream-ord root        372 dbus-daemon    /var/run/dbus/system_bus_socket
stream-ord root        922 console-kit-da                                         /var/run/dbus/system_bus_socket
stream-ord root        196 rad            /system/volatile/rad/radsocket-unauth
stream-ord root        372 dbus-daemon     (socketpair)                            (socketpair)
stream-ord root        372 dbus-daemon     (socketpair)                            (socketpair)
stream-ord root        196 rad            /system/volatile/rad/radsocket
stream-ord root        372 dbus-daemon    /var/run/dbus/system_bus_socket

Adding the -v option also gives you the command line:

   Local Address        Remote Address      User    Pid     State       Command
-------------------- -------------------- -------- ------ ---------- ----------------
      *.50258                             root       1038 Idle       /usr/sbin/syslogd
      *.*                                 root        133 Unbound    /lib/inet/in.mpathd
      *.*                                 root        133 Unbound    /lib/inet/in.mpathd
      *.*                                 netadm      721 Unbound    /lib/inet/nwamd
      *.*                                 netadm      721 Unbound    /lib/inet/nwamd
      *.123                               root        961 Idle       /usr/lib/inet/ntpd -p /var/run/ -g

And for half-closed connections, you also get the information you want:

   Local Address        Remote Address      User     Pid     Command     Swind  Send-Q  Rwind  Recv-Q    State
-------------------- -------------------- -------- ------ ------------- ------- ------ ------- ------ -----------
                                          casper     1033 closewait      130880      0  139264      0 FIN_WAIT_2
                                          casper     1031 closewait      139264      0  130880      0 CLOSE_WAIT
                                          casper     1033 closewait      130880      0  139264      0 FIN_WAIT_2
                                          casper     1031 closewait      139264      0  130880      0 CLOSE_WAIT
                                          casper     1033 closewait      130880      0  139264      0 FIN_WAIT_2
                                          casper     1031 closewait      139264      0  130880      0 CLOSE_WAIT

PS: I used the Hollywood IP extension to masquerade the IP addresses.

Thursday Sep 22, 2005


After I advanced the state of Solaris on the Ferrari 3400 with frkit, someone suggested that I should get one of each of the new laptops we at Sun may decide to standardize on. That's how it came to be that I now have both a Ferrari 3400 and a Ferrari 4000.

But today in the mail, I got a message telling me of yet another laptop heading my way. This time it's a lightweight Fujitsu s2110, again an AMD64-based laptop, as those are the ones we like best.

Perhaps I should plot the laptops I got against time and see if I can estimate the curve; I think I got one in '96, one in 2000, another in December 2004, then again 6 months later and yet again 3 months later. At this accelerating pace, it'd be one a day at Christmas and one per hour early next year. Hm, perhaps not likely.

Solaris keeps improving rapidly when it comes to device support; and while things in the laptop space appear to be moving forward very rapidly, there also appears to be some gravitation toward common chipsets. Graphics are often an issue, but because the Ferrari 4000 comes with an ATI X700, the Xorg ati driver is being updated much more quickly than before.

The Ferrari 3400 is relatively well supported in S10, though I think you really need my powernow driver and even then it still runs fairly hot to the touch.

The Ferrari 4000 requires some external drivers, but then, so does most bleeding-edge hardware, regardless of OS. For the Ferrari 4000, you'll need to download the ethernet driver "bcme" from and we're working hard on getting OSS sound to work nicely on it. The 4000 runs much cooler than the 3400, but the downside is that its fan is always blowing, albeit quietly. Probably because of a device enumeration bug, FireWire does not yet work. The SD card reader is a special device and, unfortunately, we do not support it.

Of course, we're working on getting our Broadcom ethernet driver, "bge", and Broadcom's own driver, "bcme", merged and shipped as a single driver.

Cardbus support is coming for all laptops, as the cardbus interface is properly standardized and they all work more or less the same.

I haven't gotten the Fujitsu yet, so I can't tell how well that will run and/or whether tweaking is necessary.

I don't like to recommend any particular brand or kind of laptop, but one recommendation I can make is this: run Solaris Express on it. You will get all the laptop features you may want much sooner. Such features include new drivers, Xorg support for new hardware, ACPI support, newboot and bug fixes (in some cases, the difference between a device working and not working is just a small fix in an existing driver).

S10 was a huge leap forward and brought Solaris for x86/x64 to a point where it again runs on lots of (server) hardware. In Solaris Express, there is much more room for desktop/laptop innovation. We now ship several different x64 desktop platforms, so the x64 desktop/laptop space has much more visibility inside Sun.

If you want performant OpenGL, the only choice you have now is buying a laptop with an nVidia graphics chip and installing the nVidia "closed source" driver.

On the wireless front, things are moving, but slowly; more on that here soon, so watch this space.

Bluetooth is still a barren landscape when it comes to Solaris; I can't use the Bluetooth rodent that came with the Ferrari 4000 (I'm saying rodent because it's quite a bit bigger than a mouse).

Note: I've just started the laptop-discuss list at

Wednesday Apr 27, 2005

The End of Realmode Boot

I've already mentioned two great new features in our current development release: ACPICA and USB hotplug.

But there's one change that's much more far reaching than that: Newboot.

Most Solaris x86 users will be familiar with the blue screen/device configuration assistant/boot sequence and how ancient some of it feels. Perhaps few are aware that the DCA is actually a realmode, DOS-like environment in which each boot device requires its own realmode driver. These drivers had to be compiled with a 16-bit compiler and 16-bit MASM, which were not available for ready money anywhere. While the official build environment required NT, I managed to build it in environments ranging from MS Windows 98 and 2000 on actual PCs to Caldera DOS 7 on a SunPCi card (which allowed for automatic building, which was great fun). Now that this piece of shameful history lies in the past, I am not afraid to confess.

But as of last Sunday, April 17th, 2005, we have "legacy free" newboot. Newboot uses GRUB with UFS support, so we now have native GRUB support and a menu we can edit from inside Solaris. Device enumeration is done entirely using ACPI.

Because newboot skips the device configuration assistant and boots a single large file containing all the kernel device drivers, startup is quite a bit quicker, and we can boot from any bootable device as long as the kernel also supports it, so that it can mount root.

And we've reverted to white-on-black consoles; surprisingly enough, this again takes some getting used to.

One thing to note is that before, you may have had to disable ACPI in the kernel and the BIOS; with Newboot + ACPICA, you actually stand a much better chance of the system working with all the default settings: ACPI on, ACPI 2.0 enabled. Even with legacy USB enabled, the system now has a much better chance of working than before.

But this is a radical change, and with PC BIOSes and hardware being what they are, there are interesting times ahead. So please take it for a test drive when this hits Solaris Express in a few months' time.

As of this writing, it's a bit in the balance whether you'll get to see the source first as part of OpenSolaris or the binaries as part of a Solaris Express.

Tuesday Apr 26, 2005

Yet Another Desktop/Laptop Usability Step

"Solaris Nevada" build 14 is proving to be another quantum leap for Solaris desktop usability.

I discussed the new USB hotplug support in vold before, but in the last few days we've also gotten the virtual keyboard/mouse driver into the next Solaris release. People often complained that their laptop keyboard died until the next reboot when they plugged in a USB or other keyboard. Well, not anymore! We now have virtualized keyboard and mouse drivers which collect events from all available keyboards and present them through a single virtual keyboard and mouse. It is still possible to use the devices as separate devices in case you have a multi-head/multi-user environment, but for the common case of a single system with multiple keyboards (laptop + external keyboard) this is another big step.

You can plug in the other keyboard at any time, running under X or the commandline, it just works.

Solaris FAQ Updated

For the first time in many years (2.5 years, to be precise), I've updated the Solaris FAQ.

Much more work is needed on it but at least this is a start. I'm hoping to update it more regularly now. It's also still here but it seems to be doing fine there.

Wednesday Apr 20, 2005

ACPICA in Solaris

With the long history of neglect that Solaris on x86 endured, quite a few components had become extremely stale and fragile. And this wasn't just a lack of device drivers but also a lack of basic new functionality in the core OS.

This week saw another quantum leap: the integration of Intel's ACPI reference implementation (ACPICA) into the next Solaris release.

For years I wanted to have battery support on my old VAIO and later on my Ferrari. And I wanted a power button that did something, etc. I tried to make do with the old "acpi_intp" interpreter which was part of Solaris; but it leaked memory like a sieve and was limited in functionality. Integrating ACPICA looked daunting but fortunately someone made an actual project out of this and the end result is that we now have a state of the art ACPI interpreter in Solaris.

There are basically only two ACPI interpreters in widespread use: the Windows one and the Intel one; by leveraging Intel's source, we stand a fair chance of having Solaris work with more ACPI BIOSes. If your system required ACPI to be turned off for Solaris to work, you may find yourself forced to switch it on when you upgrade later this year.

I've been distributing ACPICA and a number of other useful Solaris binaries in a single internal kit called "frkit" (originally aimed at Ferraris but now running on countless systems); frkit includes ACPICA, a power button/battery handler, an AMD PowerNOW! power management module, a GNOME battery monitor, and our development CardBus and wireless drivers plus tools.

One of the more interesting parts of that is possibly the "NDISulator" port from FreeBSD, which allows the Sun Ferraristi to use the built-in Broadcom wireless on their Acer Ferraris in both 32-bit and 64-bit mode.

ACPICA is just phase one of a larger project; we have not yet bothered much with the "P" (for power) from ACPI; but we hope to leverage the new implementation to provide the necessary "S3" and "S4" sleep state support.

The speed at which new features started working on my Ferrari, which I've now had for 4 months, is in stark contrast with my VAIO, which I got not long before S9 for x86 was postponed. It's clear that we needed a ramp-up after the wind-down, but it seems to be going more quickly than ever before.

Tuesday Apr 19, 2005

USB hotplug finally works

Today I was pleasantly surprised to see the latest putback to SNV, our next Solaris release (soon on an OpenSolaris source server near you).

Before this putback you could hotplug and eject media in devices (SD cards and such in card readers, floppies in floppy drives), but you had to restart vold to mount USB pen drives. Now you can insert and remove them, and they're mounted and unmounted automatically.

This was really one of my most wanted features and it's great to finally have it. Will it make it into an update? I certainly hope so; but you can always try Solaris Express.

Thursday Apr 14, 2005

Solaris 10 Encryption Supplement download

The Solaris 10 encryption supplement is available for download here. Apparently, it's difficult to find by going through the Sun download sites, so I give a link here.

The supplement adds 256-bit AES and 448-bit Blowfish; DES, 3DES, 128-bit AES, Blowfish and RC4 are already in the standard release. In other words, the standard release gets you what you would have gotten with the encryption supplement for older releases, and the S10 Encryption Supplement takes it one step further. The main reason for not making this part of the Solaris CDs is import restrictions, rather than export restrictions.

Tuesday Apr 12, 2005

Timezones and multi boot.

One of the things that has always bothered me is that on x86 systems you cannot really run multiple operating systems and survive the DST change. That's because the hardware clock runs in local time, and local time is ambiguous. The system cannot tell whether the DST change has already been applied or not, so it needs to record this fact in the filesystem (that's why Solaris on x86 has the "rtc -c" cron job). If you boot all your OSes in turn after the changeover, your system will be N hours off once they're all done adjusting the time. The problem is probably best summarized here.

On Unix this was solved long ago by running the clock in the UTC or GMT timezone; that clock is unambiguous, give or take a leap second, and allows multiple versions of the OS to coexist.

Last week, someone in comp.unix.solaris pointed out to me that there is a hidden registry key in later releases of MS-Windows. I already knew how to fix up Solaris, so I combined the two into:

       Set the following registry key (it does not exist by default, so you have to create it):


       In the Control Panel's Date and Time settings, check the "automatically adjust" check box.

        Boot into Solaris and run:

              rtc -c -z UTC

        then correct the clock with date/rdate
        (if you use Live Upgrade, lumount your other partition(s) and copy the
        /etc/rtc_config file to each of them)

        In Linux, you'll need to run "timeconfig" and select "RTC set to GMT".

Note that even if you don't multiboot, it's probably a good idea to run "rtc -c -z UTC" and then correct the date; for one thing, you won't be bitten by the AMD64 timezone bug we had in Solaris 10.

Monday Nov 29, 2004

Fujitsu Lifebook B112 running Solaris 10

As one of two resident Solaris engineers in Holland, I sometimes get strange requests, such as one from an IT operations person who was given this really old laptop: a Fujitsu Lifebook B112 with 96MB of memory and a 3GB hard disk. It was running Solaris 7 and he had no root password.

Hacking it was not too difficult; installing Solaris 10, of course, would be more fun.

There are a few challenges in getting a vintage laptop up and running: it has no onboard networking and no bootable CD drive. And Solaris 10 no longer supports booting from the Xircom PE3 parallel port ethernet adaptor. And where would you even find such a device?

Well, it turns out that we were indeed able to locate a Xircom PE3 adaptor, and armed with "perl" we could make the S9 "pe" driver run under S10. The "pe" driver is no longer supported because we obsoleted GLDv0, but by making pe load "misc/GLD" rather than "misc/gld", and installing the old GLD driver, we added the device to the miniroot on the S10 install server. Using an S9 boot floppy, we then booted S10 from the server and started the install. All of Solaris 10 wouldn't fit, so we removed a few components to cut it down to fit in the 2.8GB partition (the installer is a bit generous in allocating space, so in the end we only had 2.0GB installed).

After churning for around 3 hours, Solaris had been installed (pe ethernet is very slow and a 233MHz Pentium is not very fast). I had to go home (it was Friday) and resolved to see how far we'd get on Monday. But I couldn't wait.

The next hack was getting into the system, which was sitting idle with the install finished, from the install server (we didn't dare reboot because it would come up without ethernet if we did). After looking with snoop, I found that there are actually two ways of doing that. The first is the new "eventhook" mechanism we have for DHCP: whenever a DHCP event occurs, the eventhook script is run. The second method was even simpler: it turns out that Solaris 10 init stat()s inittab every 5 minutes. So I added a line to inittab which popped up an xterm over my VPN tunnel to my house. I then added the "pe" ethernet driver, finished some other configuration by hand and rebooted. It came up, still with pe, but I soon killed it remotely while fiddling with CardBus and the PCMCIA Ethernet card.

With my own modified PCIC driver I was able to get the Lifebook to use "pcelx0", with all supported devices. Xorg came up without a hitch too; it took just a little fiddling to get it to use the external monitor, but no luck with the touchscreen yet. Here's what it looked like: [image: Lifebook B112 running Solaris, desktop login]

Even sound was simple; ``update_drv -a -i '"ESS1879"' sbpro'' and the sound driver attached.

Thursday Aug 12, 2004

The Solaris BOF

Yesterday's Solaris BOF at the Usenix Security Symposium was well attended (about 70 people); there were a great many questions about zones, which shows there is great interest in the topic. (I did both a zones and a privileges presentation.)

Talking to people in the conference hallways is great, and I'm still being tackled with questions the day after the BOF (we ran out of time for questions).

Thursday Jul 22, 2004

Solaris Privileges

So what makes Solaris Privileges different? Why didn't we copy something else like Trusted Solaris Privileges or "POSIX" capabilities?

Let's start from what we formulated as our requirements near the beginning of our project.

One of the important features of Solaris is complete binary backward compatibility; in order to offer that, we needed to design the privilege subsystem in such a manner that current practices, binaries and products would continue to work. Of course, some have solved this issue by providing a system-wide knob to turn: root / root + privileges / just privileges. We don't like knobs in our OS, specifically not ones which drastically alter the behaviour of a system. They make it harder to develop software, since it needs to work with all settings; certain products may require conflicting settings, and so on. So we decided on a "per-process" knob which is largely automatic.

With backward compatibility comes the onus on the software developer to develop future-proof interfaces; that ruled out all other interfaces, as they all have fixed bitmaps, fixed privilege/capability numbers and fixed structure sizes in the programmer-visible parts of the system. Solaris Privileges have none of that. And while we could safely reuse the names of the Trusted Solaris interfaces, we cannot redefine interfaces, even from a defunct standard. So we have interfaces which smell like Trusted Solaris but with a completely new userland representation of privileges and privilege sets. We can never have more signals, but we can have more privileges and more privilege sets!

The privileges and privilege sets in Solaris 10 are represented to userland processes and non-core kernel modules as strings; privilege sets are bitmasks of undetermined size which can only be allocated through the C library routines. Privilege set names are also strings, not plain integer indices, which gives us even more flexibility. A Solaris binary compiled for 4 privilege sets of 32 privileges each will continue to work on a Solaris system with 5 privilege sets of 64 privileges each, even with all the privileges' internal representations renumbered.
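To illustrate why name-based privileges and run-time-sized sets keep binaries working when the system grows or renumbers its privileges, here is a minimal C sketch. Everything in it (`namedset_t`, `nset_alloc()`, `nset_add()`, `nset_ismember()`, `nset_free()` and the name table) is hypothetical and is not the Solaris `priv_*` API; the table merely borrows Solaris privilege names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical "system side": the name table and the set size are only known
 * at run time and are never compiled into callers. Reordering or extending
 * this table does not affect code that only uses names. */
static const char *const sys_priv_names[] = {
    "proc_fork", "proc_exec", "file_link_any", "proc_session", "proc_info",
    "net_privaddr",
};
#define NPRIVS (sizeof sys_priv_names / sizeof sys_priv_names[0])
#define WORD_BITS 32u

/* Opaque to callers in spirit: they only ever hold a pointer to it. */
typedef struct namedset {
    size_t nwords;
    uint32_t bits[];            /* flexible array member, sized at run time */
} namedset_t;

/* Map a privilege name to its current internal index, or -1 if unknown. */
static int priv_index(const char *name)
{
    for (size_t i = 0; i < NPRIVS; i++)
        if (strcmp(sys_priv_names[i], name) == 0)
            return (int)i;
    return -1;
}

/* Allocate an empty set large enough for however many privileges exist. */
static namedset_t *nset_alloc(void)
{
    size_t nwords = (NPRIVS + WORD_BITS - 1) / WORD_BITS;
    namedset_t *s = calloc(1, sizeof *s + nwords * sizeof s->bits[0]);
    if (s != NULL)
        s->nwords = nwords;
    return s;
}

static void nset_free(namedset_t *s)
{
    free(s);
}

/* Add a privilege by name; fails for names the system does not know. */
static bool nset_add(namedset_t *s, const char *name)
{
    assert(s != NULL);
    int i = priv_index(name);
    if (i < 0)
        return false;
    s->bits[i / WORD_BITS] |= 1u << (i % WORD_BITS);
    return true;
}

/* Test membership by name. */
static bool nset_ismember(const namedset_t *s, const char *name)
{
    int i = priv_index(name);
    return i >= 0 && ((s->bits[i / WORD_BITS] >> (i % WORD_BITS)) & 1u) != 0;
}
```

A caller written against such an interface only handles names and a pointer, so growing the table from 6 to 600 entries, or reshuffling it, requires no change on the caller's side.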

Evolving from the Super user model

The traditional super-user model is fairly straightforward; a process has three uids associated with it: the effective uid, the real uid and the saved uid. A privileged process is a process which runs with effective uid 0. A process can temporarily relinquish privileges by setting the effective uid back to the real uid; the saved uid either remains 0, or can be set to the real uid as well, in which case the process has permanently relinquished its privileged status. If the saved uid is 0, the process can swap the effective uid back and forth, implementing a form of privilege bracketing. Of course, once such a process is compromised, an exploit can also swap the effective uid back to 0. And the only choice is to have all privileges available, or none.
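The traditional scheme can be modeled in a few lines of C. This is a simplified sketch, not the real kernel logic: `struct ucred_model` and the `model_*` functions are hypothetical, and the `seteuid()` rule is reduced to "an unprivileged process may only switch to its real or saved uid".

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical model of a traditional Unix credential (not a kernel struct). */
struct ucred_model {
    uid_t ruid;   /* real uid */
    uid_t euid;   /* effective uid */
    uid_t suid;   /* saved uid */
};

/* Privileged means running with effective uid 0: all or nothing. */
static bool is_privileged(const struct ucred_model *c)
{
    return c->euid == 0;
}

/* Simplified seteuid(): a privileged process may pick any uid,
 * an unprivileged one only its real or saved uid. */
static bool model_seteuid(struct ucred_model *c, uid_t uid)
{
    if (!is_privileged(c) && uid != c->ruid && uid != c->suid)
        return false;
    c->euid = uid;
    return true;
}

/* Permanently give up privilege, much like setuid(ruid) run with euid 0:
 * all three uids become the real uid, so (assuming ruid != 0) uid 0 can
 * never be regained. */
static bool model_drop_forever(struct ucred_model *c)
{
    if (!is_privileged(c) && !model_seteuid(c, 0))
        return false;
    c->euid = c->suid = c->ruid;
    assert(c->euid == c->ruid && c->suid == c->ruid);
    return true;
}
```

For example, a set-uid root program run by uid 1000 starts as { ruid 1000, euid 0, suid 0 } and can toggle its effective uid between 1000 and 0 (bracketing) until it calls `model_drop_forever()`; an exploit running inside it before that point can toggle just as easily.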

In the typical privilege model, the powers formerly associated with uid 0 (PFAWU0) are split into a number of privileges, and each process has three different privilege sets: the Effective, Permitted and Inheritable sets, or E, P and I for short. E closely models the effective uid and P is very much like the saved/real uids. The Effective set determines which privileges a process has active; this is the set of privileges the kernel verifies its privilege checks against. The Permitted set is very much like the saved uid; it contains the privileges a process is allowed to use. So a process is free to remove any privilege from E, and it is free to add any privilege to E as long as that privilege is in P. The Inheritable set allows a process to pass privileges on to subprocesses; e.g., in case you want to run a webserver with a particular uid but with the additional privilege that allows it to bind to port 80. You put the privilege in the inheritable set (if it is in your permitted or effective set) and the executable will run with that privilege. Privilege bracketing is then performed by adding privileges to E and removing them from E; when a process is done and wishes to relinquish all privileges forever, it removes them from P (which automatically removes them from E as well). The fourth privilege set we use in Solaris is the "Limit" set. This set is the upper limit of the privileges a process and its offspring can ever obtain. Solaris uses the limit set for a number of additional things: it is used to restrict the power of the super user in non-global zones, and it is used as a mechanism to determine which privileges a backward-compatible uid 0 process runs with.
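The E and P rules above amount to simple set operations. The sketch below uses a fixed 32-bit mask purely for illustration (exactly the kind of fixed representation the Solaris interfaces avoid); the `M_*` bits, `struct privsets` and the `ps_*` helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fixed-width mask; real Solaris sets are opaque. */
typedef uint32_t privmask_t;

enum {
    M_PROC_FORK     = 1u << 0,
    M_PROC_EXEC     = 1u << 1,
    M_NET_PRIVADDR  = 1u << 2,
    M_FILE_DAC_READ = 1u << 3,
};

struct privsets {
    privmask_t E;   /* Effective: what privilege checks are made against */
    privmask_t P;   /* Permitted: what may be put into E */
    privmask_t I;   /* Inheritable: passed on across exec */
    privmask_t L;   /* Limit: upper bound for the process and its children */
};

/* Bracketing up: add privileges to E, only if all of them are in P. */
static bool ps_raise(struct privsets *s, privmask_t p)
{
    if ((p & ~s->P) != 0)
        return false;
    s->E |= p;
    return true;
}

/* Bracketing down: removing privileges from E is always allowed. */
static void ps_lower(struct privsets *s, privmask_t p)
{
    s->E &= ~p;
}

/* Relinquishing: removing from P is permanent and also removes from E. */
static void ps_relinquish(struct privsets *s, privmask_t p)
{
    s->P &= ~p;
    s->E &= s->P;
    assert((s->E & ~s->P) == 0);   /* invariant: E is a subset of P */
}
```

A webserver-like process holding the privileged-port privilege in P can raise it just around its bind() and lower it afterwards; once it has relinquished it from P, no later raise can succeed.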

So how is this compatibility achieved? Well, after a long debate, the answer we came up with was really simple: if we want an implementation to be backward compatible for applications which don't know better, what's simpler than a single per-process knob? We've called this knob "Privilege Awareness" (PA). To explain it, we introduce the notion of observability: the kernel operates on the observed effective (EO) and observed permitted (PO) sets, which contrast with the implementation sets (EI and PI), the actual bits in the kernel credential for a process. The kernel then sees the privilege sets as follows:

EO = euid == 0 && !PA ? L : EI
PO = any uid == 0 && !PA ? L : PI

The observed effective set closely follows the effective uid, and the observed permitted set models the fact that if any of your uids is 0, you can recover an effective uid of 0. A process becomes privilege aware if it modifies its E or P set, or when it requests to become privilege aware. A PA process can also request to become non-PA, but such a transition is only possible if the observed sets can be kept constant across it. The kernel will try to drop PA-ness on exec().
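The two observability rules translate directly into code. In this hypothetical model, `struct cred_model` holds the three uids, the PA flag, the implementation sets EI and PI, and the limit set L; the masks are fixed-width only for the sake of the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

typedef uint32_t privmask_t;   /* hypothetical fixed-width privilege mask */

struct cred_model {
    uid_t ruid, euid, suid;    /* traditional uids */
    bool pa;                   /* privilege aware? */
    privmask_t EI, PI;         /* implementation effective/permitted sets */
    privmask_t L;              /* limit set */
};

/* EO = euid == 0 && !PA ? L : EI */
static privmask_t observed_E(const struct cred_model *c)
{
    return (c->euid == 0 && !c->pa) ? c->L : c->EI;
}

/* PO = any uid == 0 && !PA ? L : PI */
static privmask_t observed_P(const struct cred_model *c)
{
    bool any_root = c->ruid == 0 || c->euid == 0 || c->suid == 0;
    return (any_root && !c->pa) ? c->L : c->PI;
}
```

A non-PA process with effective uid 1000 but saved uid 0 thus observes only its implementation effective set, yet its observed permitted set is the whole limit set, mirroring its ability to swap back to uid 0.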

The privilege sets can be modified by the process itself, but the kernel also modifies them at exec(2) time using the following rules:
            I' = L & I               (L intersected with I)
            E' = P' = I'
            L remains unchanged

For your typical process this is a no-op, as the typical process has the following privileges:
            I = P = E = { basic }
            L = { all zone privileges }
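The exec(2) rules, and the fact that they leave a typical process unchanged, can be checked with a small model. Only the three rules quoted above are modeled; set-uid executables and the other cases the kernel handles are ignored, and the mask values used with it are made up.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t privmask_t;   /* hypothetical fixed-width privilege mask */

struct privsets {
    privmask_t E, P, I, L;
};

/* Apply the exec(2) rules from the text to a copy of the sets. */
static struct privsets exec_transition(struct privsets s)
{
    s.I = s.L & s.I;           /* I' = L & I */
    s.E = s.P = s.I;           /* E' = P' = I' */
    /* L remains unchanged */
    assert((s.E & ~s.L) == 0); /* nothing beyond the limit set survives */
    return s;
}
```

With I = P = E = { basic } and L = { all zone privileges }, the result is identical to the input; a privilege placed in I survives exec only if it is also in L.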

As seen here, the system defines a set of privileges known as the "basic" set: privileges required for operations which traditionally weren't privileged in Unix. The design of the basic privilege set and the specific rules about its use make sure that it too can be extended in the future with new privileges, without requiring modification of applications that use a feature which then becomes privileged. The current basic privileges in Solaris are the privilege needed to fork(), the privilege needed to exec(), the privilege needed to create hard links to files not owned by the current effective uid, the privilege needed to send signals to processes outside of your current session, and the privilege needed to see other users' processes.
In order to work properly in an environment with a changed basic set, a process should specify the privileges it needs as the basic set, plus the non-basic privileges it needs, minus the basic privileges it knows it doesn't need. When the basic set is then extended in a later Solaris release, the process is guaranteed to continue to work.
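The difference between expressing needs relative to the basic set and hard-coding a list can be shown with masks. The `M_*` bits are hypothetical stand-ins named after the basic privileges described above, plus one pretend future basic privilege; neither function is a Solaris API.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t privmask_t;   /* hypothetical fixed-width privilege mask */

/* Hypothetical bits named after the basic privileges listed above. */
enum {
    M_PROC_FORK     = 1u << 0,
    M_PROC_EXEC     = 1u << 1,
    M_FILE_LINK_ANY = 1u << 2,
    M_PROC_SESSION  = 1u << 3,
    M_PROC_INFO     = 1u << 4,
    M_NET_PRIVADDR  = 1u << 5,  /* a non-basic privilege */
    M_FUTURE_BASIC  = 1u << 6,  /* stands in for a basic privilege added later */
};

/* Recommended: basic + extra - known-unneeded, computed against whatever
 * the basic set is on the running system. */
static privmask_t relative_set(privmask_t basic, privmask_t extra,
                               privmask_t unneeded)
{
    privmask_t r = (basic | extra) & ~unneeded;
    assert((r & unneeded) == 0);
    return r;
}

/* Fragile alternative: a fixed list of the privileges the author knew of. */
static privmask_t absolute_set(void)
{
    return M_PROC_FORK | M_PROC_EXEC | M_NET_PRIVADDR;
}
```

When a later release adds `M_FUTURE_BASIC` to the basic set, `relative_set()` picks it up automatically, while `absolute_set()` silently loses an operation that used to be unprivileged.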

What I've outlined up to this point is mostly an enabling technology; in and by itself it does not make the system more secure; but it allows us to harden the system and reduce the risk.

Why privileges should not be orthogonal

My background is very much in hardening rather than Trusted Computing, so I have always felt that the privilege model as employed in a number of operating systems has one serious weakness: most of these operating systems define privileges which allow you to acquire more privileges. Typical examples are individual privileges which allow you to write directly to disks or kernel memory. What is the point of such privileges? You might just as well give a process requiring such a privilege all privileges. In Solaris we have defined a very simple rule which I've dubbed the principle of privilege escalation prevention; the basic rule is this: "an operation needs at least as many privileges to be performed as can be gained by executing it". Simply put: if you want to write to /dev/kmem, you will need all privileges! If you want to control a process, you will need at least as many privileges as that process has. If you want to assign privileges to another process, you must have those privileges yourself. If you want to mount on top of something, you must own that something.

And when we find more such holes, we will plug them. Of course, we still have a bit of a problem with the user with uid 0: he still owns all the files, and we have not restricted him from writing to his files. So in order to practice safe computing, run with a different uid and perhaps a few extra privileges.

Anyway, that's all for today; I'm off to the beach, so see you in two weeks or so. Or see you at Usenix Security in San Diego! In the next episode we'll shed some light on the repercussions of changing the credential and the resulting visible changes in userland; we'll answer such important questions as "will door_cred() survive privileges?"
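The escalation-prevention rule boils down to a subset test: whatever an operation could give you must already be in your effective set. This sketch models each example from the paragraph with a hypothetical helper; treating the target process's privileges as a single mask and "all privileges" as a full 32-bit mask are simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t privmask_t;   /* hypothetical fixed-width privilege mask */
#define ALL_PRIVS UINT32_MAX   /* hypothetical "all privileges" */

/* Core rule: an operation that could give the caller `gain` is allowed only
 * if the caller's effective set already contains every privilege in it. */
static bool no_escalation(privmask_t caller_E, privmask_t gain)
{
    return (gain & ~caller_E) == 0;
}

/* Writing to kernel memory can yield anything, so it demands everything. */
static bool may_write_kmem(privmask_t caller_E)
{
    return no_escalation(caller_E, ALL_PRIVS);
}

/* Controlling a process can yield whatever that process holds. */
static bool may_control(privmask_t caller_E, privmask_t target_privs)
{
    return no_escalation(caller_E, target_privs);
}

/* Assigning privileges to another process requires holding them yourself. */
static bool may_assign(privmask_t caller_E, privmask_t to_assign)
{
    return no_escalation(caller_E, to_assign);
}
```

In this model no single "write kernel memory" privilege exists that is weaker than the full set, which is precisely the point of the rule.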


