Recent Posts

Solaris

Solaris 11.3 SRU 5.6: updates in ps(1) and /proc/<pid>/{cmdline,environ,execname}

Almost as soon as Solaris 2.0 was released, people started to complain about the limit on the ps(1) command line output: it was limited to 80 characters. The standard ps(1) command was also unable to print the environment variables. The /usr/ucb/ps command could, but it needed to trawl through the address space of the target process. To do so, it needs at least the same privileges and uids/gids as the target process, a requirement that prevents privilege escalation. Simply having the {proc_owner} privilege is not sufficient.

When we added pkill(1)/pgrep(1), they too were limited in the same way: they could only search the first 80 bytes of the command line (PRARGSZ) and the first 16 bytes of the command name (PRFNSZ). These were serious limitations; for one, it became difficult to find a specific java process, as the typical java command line is much longer than 80 bytes and the important jar file is often beyond the 80-byte limit. Of course, our customers did not like this limit either.

We fixed this problem in Solaris 12 and now also in Solaris 11.3 SRU 5.6 by adding three new files under /proc/<pid>:

    cmdline  - all original arguments separated by NUL bytes
    environ  - all original environment values separated by NUL bytes
    execname - the original program name given to exec

The cmdline and execname files are publicly readable; the environ file is restricted to the owner of the process and to processes which have the {proc_owner} privilege. The cmdline and environ files are very similar to those found under Linux; however, they hold the original argument and environment vectors, so they do not reflect changes made by the programs themselves.

A new -o format option "env" was added to ps(1); the new files are used and ps(1) will now display the full command line.

As neither ps(1) nor ps(1b) needs to open /proc/<pid>/as, fewer privileges are now needed and read access to the executable is no longer required: this is a big performance win for ps(1b), especially when NFS binaries are in the mix.

As I basically backported the changes to ps and /proc from Solaris 12, the whole list of bugs and enhancements is as follows:

    PSARC/2015/207 /proc/<pid>/{cmdline,environ,execname} extensions to /proc.
    15742822 SUNBT7092685 Extend /proc interfaces to allow ps(1) to show more of the command
    15420404 SUNBT6599384 pgrep/pkill don't find processes with 16 char filenames or match ...
    19669195 memory-leak in ucb_procinfo of ucbps.c:569
    15227016 SUNBT5100626 ps(1) sometimes shows an empty string for the ttyname
    15282779 SUNBT6313436 /usr/ucb/ps malloc() failure results in unexpected argument parsing
    14966583 SUNBT4157509 /usr/ucb/ps not bsd or sunos 4.x compatible on command line
    15488063 SUNBT6715628 ps -d makes -z have no effect
    21447952 /usr/ucb/ps gxw hangs, but w/out the w does not; never open /proc/<pid>/as
    21297345 procfs limits the size of the control messages
    15582848 SUNBT6872216 ps command needs to keep trackof prior name/uid information
    15584899 SUNBT6875625 ps command should chdir to /proc to remove lock contention
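The NUL-separated layout of cmdline and environ is easy to consume from a script. The following is a minimal sketch, not the way ps(1) itself reads these files; the helper names are made up for illustration, and the handling of an optional trailing NUL is an assumption, since the post only says the fields are separated by NUL bytes.

```python
def split_nul_separated(data: bytes) -> list[str]:
    """Split a NUL-separated byte string into its fields."""
    # Assumption: a trailing NUL, if present, terminates the last field
    # rather than introducing an extra empty one.
    if data.endswith(b"\0"):
        data = data[:-1]
    if not data:
        return []
    return [field.decode("utf-8", errors="replace") for field in data.split(b"\0")]


def read_cmdline(pid: int) -> list[str]:
    """Return the argument vector stored in /proc/<pid>/cmdline."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        return split_nul_separated(f.read())
```

With this, finding the jar file of a java process becomes a matter of scanning the list returned by read_cmdline(pid) instead of hoping it falls within the first 80 bytes.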

General

Solaris 11.3: New per-share, per-instance reserved port property for NFS

It sounds like a lifetime ago that I added the following question to the Solaris FAQ:

    7.8) How can I make the NFS server ignore unprivileged clients?

    In a restricted environment, i.e., an environment where the
    administrator controls root access, you can enhance NFS security
    by setting the "NFS_PORTMON" variable.  This variable is set in
    /etc/system, like this:

    * Prior to Solaris 2.5
    set nfs:nfs_portmon = 1
    * Solaris 2.5 and later
    set nfssrv:nfs_portmon = 1

You could wonder why this was never the default. The answer is that reserved ports are a BSD Unix invention from the time when computers were large and centrally administered; the invention was later copied to all Unix-like operating systems, but outside of that world it makes little sense. As a result, many NFS clients can use any port and might not be able to restrict the ports they use.

The "nfs_portmon" variable was global; Solaris has evolved and now has multiple NFS server instances (one for each zone), and customers have also asked for a per-share setting.

In Solaris 11.3 we introduce a new sharectl property:

    # sharectl get -p resvport nfs
    resvport=false

as well as a new resvport share option:

    # zfs get share.nfs.sys.resvport build/casper
    NAME          PROPERTY                    VALUE  SOURCE
    build/casper  share.nfs.sec.sys.resvport  off    default

The sharectl property is global for the NFS server instance; if it is set to true, it overrides the per-share properties. If a system is upgraded, it will take the value from /etc/system and log a message that, in future, sharectl(1m) should be used instead. When the sharectl property is set to false, you can set resvport for each share individually.

As you can see, this is restricted to the "sys" security mode; when proper security such as Kerberos V is used, we do not verify that the NFS client uses privileged ports. It goes without saying that actual NFS security can only be had when using a security mode other than "sys".
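The interaction between the instance-wide property and the per-share option can be summarised in a few lines. This is only a sketch of the rules as described above, not the NFS server's code; the function names are hypothetical, and the threshold of 1024 is the conventional BSD boundary for reserved ports.

```python
RESERVED_PORT_LIMIT = 1024  # ports below this are "reserved" by BSD convention


def resvport_required(instance_resvport: bool, share_resvport: bool) -> bool:
    """A true instance-wide setting overrides whatever the share says."""
    return instance_resvport or share_resvport


def client_port_acceptable(client_port: int, instance_resvport: bool,
                           share_resvport: bool, sec_mode: str = "sys") -> bool:
    """Decide whether a request from client_port passes the resvport check."""
    # The check only applies to the "sys" security mode.
    if sec_mode != "sys":
        return True
    if not resvport_required(instance_resvport, share_resvport):
        return True
    return 0 < client_port < RESERVED_PORT_LIMIT
```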

Solaris

Solaris 11.3: New Immutable Global Zone file-mac-profile: dynamic-zones

In Solaris 11.2 we introduced the Immutable Global Zone. Just like the Immutable Zones introduced in Solaris 11/11, it supports three different file-mac-profiles: strict, fixed-configuration and flexible-configuration.

To refresh your memory, these three file-mac-profiles, as well as the default value, "none", are described in zonecfg(1m) as follows:

    There are currently four supported values for this property: none,
    strict, fixed-configuration, and flexible-configuration.
    none makes the zone exactly the same as a normal, r/w zone. strict
    allows no exceptions to the read-only policy. fixed-configuration
    allows the zone to write to files in and below /var, except
    directories containing configuration files:

        /var/ld
        /var/lib/postrun
        /var/pkg
        /var/spool/cron,
        /var/spool/postrun
        /var/svc/manifest
        /var/svc/profiles

    flexible-configuration is equal to fixed-configuration, but allows
    writing to files in /etc in addition.

In Solaris 11.3 we are adding a fourth file-mac-profile: dynamic-zones. It should be seen as sitting between fixed-configuration and flexible-configuration. This particular profile is only valid for the global zone; it allows the administrator to create and destroy non-global zones, kernel zones, etc.

This is already possible with flexible-configuration, but that file-mac-profile also allows much of the system configuration to be changed. With the other profiles, creating or destroying a zone requires using the Trusted Path. The dynamic-zones profile is a compromise: it restricts changes to the configuration of the system, yet it allows a user with the proper authorizations to create and destroy zones.

The dynamic-zones profile was targeted specifically at using an immutable global zone on the OpenStack Nova compute nodes.
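To make the zonecfg(1m) description of fixed-configuration and flexible-configuration concrete, here is a small sketch of the pathname rules it describes. It is an illustration only: the real policy is enforced in the kernel, the function names are invented, and the code assumes it is handed absolute, already simplified pathnames.

```python
from pathlib import PurePosixPath

# Directories below /var that stay read-only, as listed in zonecfg(1m).
PROTECTED_VAR_DIRS = (
    "/var/ld",
    "/var/lib/postrun",
    "/var/pkg",
    "/var/spool/cron",
    "/var/spool/postrun",
    "/var/svc/manifest",
    "/var/svc/profiles",
)


def is_in_or_below(path: str, directory: str) -> bool:
    """True if path is the directory itself or lives somewhere below it."""
    p, d = PurePosixPath(path), PurePosixPath(directory)
    return p == d or d in p.parents


def fixed_configuration_allows_write(path: str) -> bool:
    """Writable in and below /var, except the protected directories."""
    if not is_in_or_below(path, "/var"):
        return False
    return not any(is_in_or_below(path, d) for d in PROTECTED_VAR_DIRS)


def flexible_configuration_allows_write(path: str) -> bool:
    """Like fixed-configuration, but /etc is writable in addition."""
    return fixed_configuration_allows_write(path) or is_in_or_below(path, "/etc")
```

Matching on path components rather than on string prefixes matters here: /var/pkgfoo is not below /var/pkg and so remains writable.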

General

Solaris 11.2: unlink(2)/link(2) for directories: your time is up.

Some thirty years ago, the 4.2BSD Unix release included two new system calls: mkdir(2) and rmdir(2). Before that time, in order to make a directory, you first needed to call mknod(2) and then create the "." and ".." links. To remove a directory, you would remove those two links and finally unlink the directory itself. As you couldn't call mknod(2) as an ordinary user, nor could you call unlink(2) on a directory, the mkdir(1) and rmdir(1) commands were set-uid root. A cursory inspection of UNIX V7 showed that both commands likely had security bugs.

Did 4.2BSD remove the ability to link or unlink directories? It didn't. It was probably kept temporarily for backward compatibility. But many years and many Unix releases later, it is still there; neither Sun in SunOS or Solaris, nor Oracle in Solaris 11/11 or 11.1, removed it.

If you ask fsck(1m), the final arbiter of what is a valid UFS file system, it will complain loudly about an additional hardlink to a directory, and fixing it generally required system administrator intervention. This was later hidden by logging UFS; fsck has hardly ever been run since the introduction of UFS logging, especially once it became the default. In tmpfs it was a good way to lose swap, hide data or confuse the kernel. Special code was needed in find(1) and du(1) so that they do not lose their way when the file system isn't a tree but rather a cyclic graph.

It is one of the reasons why, when Solaris Zones were developed, we decided that non-global zones can only be run without the {SYS_LINKDIR} privilege, and why ZFS, when we introduced it, came without the ability to use link(2) or unlink(2) on directories. VxFS also doesn't allow additional hardlinks to directories. And no-one complained!

This discrepancy between the global zone and non-global zones, and between ZFS and the rest, gave us problems when developing code: code run in a tmpfs file system in the global zone suddenly stopped working when moved to a non-global zone; code that worked before in UFS stopped working when moved to ZFS or to a non-global zone. As Linux never allowed unlink(2) on directories, code developed there might suddenly have disastrous effects when it was run with (not-so) appropriate privileges under Solaris. There were at least two cases during the development of Solaris 11.2 when we were bitten by this problem in code we developed ourselves.

The time has arrived to disable link(2) and unlink(2) on directories, and that is what we have done in Solaris 11.2. The {SYS_LINKDIR} privilege still exists in Solaris 11.2 but it is obsolete and has no effect. We will likely remove it in a future minor release.

Is this a sudden incompatible change? Perhaps, but it is well within the limits of the specification, and using this feature only leads to downtime and support calls. Sorry for removing this rope from your toolbox.
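For code that has to remove arbitrary paths, the portable approach is to never hand a directory to unlink(2) in the first place. A minimal sketch, with a helper name invented for this example:

```python
import os
import stat


def remove_path(path: str) -> None:
    """Remove one file system entry without calling unlink() on a directory."""
    # lstat() does not follow symbolic links, so a symlink that points at a
    # directory is removed as a link rather than treated as a directory.
    mode = os.lstat(path).st_mode
    if stat.S_ISDIR(mode):
        os.rmdir(path)   # fails if the directory is not empty
    else:
        os.unlink(path)
```

Code written this way behaves the same on UFS, tmpfs and ZFS, in the global zone and in non-global zones.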

General

Solaris 11: Evolution of v_path.

In Solaris 10, Eric Schrock (now at Delphix) added vnode-to-pathname functionality in the kernel; it stored the pathname used to find a file in the vnode, but it did not handle renames, nor did it elide ".." from the stored pathnames. The pathname stored was generally a full pathname from the root of the global zone. It was used for getcwd(3) and for the path subdirectory in /proc/pid/. The v_path was implemented as a hint, and whenever it was retrieved, e.g., for getcwd(3) or for the /proc file system, the actual path was computed and the current zone's root directory was removed.

When I started to work on the Extended Policy and later on the Immutable Global Zone, it was clear that the v_path was very useful but it wasn't ready for those projects.

The Immutable Non-Global Zone (Solaris 11/11)

In the IMNGZ we need to compute the pathname and then check the pathname against the black-list and the white-list. However, at the point where we do that, the kernel is deep inside the file system code and we can't verify and recompute the pathname, as we might be holding locks that we need further down. But since we are protecting a particular set of files and those files cannot be changed or renamed, it is safe to use the v_path as if it is more than a hint. We did need to elide ".." and simplify pathnames; this is done directly when we are setting the v_path for a newly created pathname, and if the code tries to add a ".." it instead removes the last component of the pathname. We also needed to prevent linking protected files into the non-protected file space, as that would circumvent the MWAC(5) protection offered in an IMNGZ.

The Extended Policy (Solaris 11.1)

The Extended Policy applies to all filenames in the file system, including those that can be renamed. This is why we put some effort into handling renames better.

We now update the v_path name on rename(2) in all file systems; we also handle a link(2) as a rename(2), as the observation is that the new name outlives the first name. This new behavior works well with leaf nodes, but there is no efficient algorithm that can handle the rename of a directory and all its children, yet we have no option other than using v_path, for the same reasons we have for the IMNGZ. When we recalculate the pathname, e.g., for /proc or for getcwd(), and we find it wanting, we update the v_path to the newly computed path, including all directories making up the full pathname.

One possible security risk is that a vnode has an incorrect v_path and the Extended Policy gives more privileges on that v_path than it gives for the actual pathname. As this can only happen if the file once lived in that location, this is not actually a risk at all; the process was able in the past to use those privileges on that file. We do make sure that linking is not allowed when the Extended Policy gives more privileges for the new pathname.

An update was needed for the secpolicy_*() routines to allow the Extended Policy to make a decision about files or directories that do not exist yet; as an extra benefit, privilege debugging now gives even more information, as we have more information deep down in the policy routines:

    solaris11.0$ ppriv -De mkdir /casper
    mkdir[11162]: missing privilege "ALL" (euid = 12345, syscall = 102) for "/" needed at zfs_zaccess+0x2c8
    mkdir: Failed to make directory "/casper"; Permission denied

In Solaris 11.1 we know the full filename to be created and also show that with privilege debugging:

    solaris11.1$ ppriv -De mkdir /casper
    mkdir[13924]: missing privilege "ALL" (euid = 12345, syscall = 102) for "/casper" needed at zfs_zaccess+0x245
    mkdir: Failed to make directory "/casper"; Permission denied

In Solaris 11.2 we also show the syscall name:

    solaris11.2$ ppriv -De mkdir /casper
    mkdir[17488]: missing privilege "ALL" (euid = 12345, syscall = "mkdirat") for "/casper" at zfs_zaccess+0x245
    mkdir: Failed to make directory "/casper"; Permission denied

Getcwd(3), realpath(3) fixes.

As part of the Extended Policy project, fixes to getcwd() and realpath() were made during the development of Solaris 11.1. We've also put some of these fixes in 11.0 SRUs and in Solaris 10 patches. These fixes are the following:

    Improved getcwd()/realpath() performance in zones.
    Improved getcwd()/realpath() performance in the case of renaming (in some cases 1000x faster)
    Fix getcwd() for chrooted process when the current working directory is not under the root directory. (This was a regression of the in-kernel getcwd())
    Don't fail with EACCES so quickly
    No limit on the size of the returned path from getcwd() and realpath()
    realpath() moved into the kernel and the frealpath() system call (Solaris 11.1 and later only)

Several operating systems have "extended" getcwd(3) to return an unrestricted pathname when called as follows:

    char *cwd = getcwd(NULL, 0);

Unfortunately, this is strictly forbidden by the standard:

    The getcwd() function shall fail if:
    ....
    EINVAL    The size argument is 0.

So in Solaris you have to loop with a longer and longer buffer until getcwd() no longer returns NULL with errno set to ERANGE, or you can use realpath(".", NULL), in which case we can return a long pathname. Both are actually a lot faster than running your own userland getcwd() implementation, and such implementations are more likely to fail.
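The "longer and longer buffer" loop looks roughly as follows. To keep the example short and directly runnable it drives the C library's getcwd() through Python's ctypes module instead of being written in C; it assumes a POSIX system where ctypes.CDLL(None) exposes libc, and the function name is made up for this sketch.

```python
import ctypes
import errno
import os

# Assumption: on a POSIX system, CDLL(None) gives access to libc's getcwd().
_libc = ctypes.CDLL(None, use_errno=True)
_libc.getcwd.argtypes = [ctypes.c_char_p, ctypes.c_size_t]
_libc.getcwd.restype = ctypes.c_void_p   # NULL is returned to Python as None


def getcwd_with_growing_buffer(initial_size: int = 64) -> str:
    """Call getcwd() with ever larger buffers until the path fits."""
    size = max(1, initial_size)          # a size of 0 would yield EINVAL
    while True:
        buf = ctypes.create_string_buffer(size)
        if _libc.getcwd(buf, size) is not None:
            return os.fsdecode(buf.value)
        err = ctypes.get_errno()
        if err != errno.ERANGE:          # a real failure, not "buffer too small"
            raise OSError(err, os.strerror(err))
        size *= 2                        # buffer too small: retry with twice the size
```

Doubling the buffer keeps the number of retries logarithmic in the length of the path, which matters once pathnames may exceed MAXPATHLEN.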

Solaris

Solaris 11.2: No Limits

In the past, I have increased a number of limits in Solaris:

    In Solaris 11.0, I increased NGROUPS_MAX to 1024 (from 32); also available since Solaris 10u8.
    In Solaris 11.1, I added support for more than 16 groups for NFS AUTH_SYS authentication
    In Solaris 11.1, I changed the system calls getcwd() and realpath() to support returning pathnames longer than MAXPATHLEN (and introduced frealpath() while I was in that code)

So what did I change in Solaris 11.2? It was about time to look at the restrictions on user names and group names. In a micro release, such as a Solaris 11 update, we cannot modify constants such as LOGNAME_MAX because of binary compatibility; we can only do that in a future minor release. However, we can modify the code that limits usernames. These are the bugs we have fixed, and the list shows how much work it actually was:

    14933330 SUNBT4033673 getlogin causes passwd to fail if login name is longer than 8 chars
    14954449 SUNBT4109819 programs inconsistently limit the size of user names
    15059729 SUNBT4435330 logname(1) prints out only part of long login name
    15178384 SUNBT4927530 *w* w(1) truncates usernames to 8 chars
    15393621 SUNBT6551524 su truncates LOGNAME for long usernames.
    15436992 SUNBT6627292 *cron* confused about username lengths
    15550167 SUNBT6819489 *su* sulog source username truncated to 8 chars but not destination
    15574163 SUNBT6857992 ps -u does not support usernames longer than 10 chars
    15579148 SUNBT6866548 last command does not support usernames longer than 8 characters
    17528753 group name handling in Solaris is a standards violation
    17528788 useradd(1m) user name handling problems
    17600453 bug 15226690, find with long usernames, not completely fixed
    17600724 The fix for 14954449 misses some programs (in.rlogind, in.rshd. zone*, dump)
    17625438 group file updates very inefficient.
    17625458 pwck lives in the past
    18068180 SunSSH truncates usernames/home directories with %.100s
    18068355 A few programs still limit the size of user names.
    18068215 passmgmt invents its own limits for the sizes of entries in /etc/passwd

In general, the code was changed to lift limits, but we are still limited by the format of the utmpx file. The maximum length of a username that can be stored there is 32 bytes. This is now a safe limit and we support user names of up to 32 characters, despite protests from useradd(1m). getlogin() and getlogin_r() can return a string of at most 33 characters, including the final NUL character. Of course, getlogin_r() will not store past the end of the buffer given to it, but it will now accept a buffer of any size.

Programs changed are, among others:

    logname(1)
    w(1)
    who(1)
    last(1)
    ls(1) - now a 64 bit executable
    find(1) - now a 64 bit executable
    passmgmt(1)
    useradd/usermod/roleadd/rolemod(1m)
    sshd(1mr)
    repquota(1m)
    zfs(1)
    yppasswd(1)
    tar(1)
    lastcomm(1)
    cron(1) etc
    newtask(1)
    ps(1)
    wall(1)
    rwall(1)
    zlogin(1)
    grpck(1)
    pwck(1)
    login(1)
    in.rexecd(1m), in.rshd(1m), in.rlogind(1m)

And libraries such as libsocket (remote shell/remote login/rexec protocol).

I could only wonder why so many applications cache the return value of getpwuid() and getgrgid(), and do so in a fixed-size character array.

For reasons known only in New Jersey, we didn't allow group names over 8 characters and limited the characters to lower case and digits. As there is no manifest constant defining the size of a group name, there is no problem increasing it, so we currently support up to 32 characters, and we now accept all portable file name characters in a group name (lower and upper case, digits, dot, hyphen and underscore), as long as the name doesn't start with a hyphen. Other than programs caching the result of getpwuid(), I found no other limits on the length of a group name in our code.
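The new group name rules are simple enough to express as a check. The sketch below merely restates the rules given above (at most 32 characters, portable file name characters, no leading hyphen); it is not the validation code used by groupadd or grpck, and the names in it are invented for the example.

```python
import re

MAX_GROUP_NAME_LEN = 32

# Portable file name characters: letters, digits, dot, hyphen, underscore.
_PORTABLE_CHARS = re.compile(r"[A-Za-z0-9._-]+")


def is_valid_group_name(name: str) -> bool:
    """Check a group name against the rules described in the post."""
    if not name or len(name) > MAX_GROUP_NAME_LEN:
        return False
    if name.startswith("-"):
        return False
    return _PORTABLE_CHARS.fullmatch(name) is not None
```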

Solaris

Solaris 11.2: Immutable Global Zone

This blog is a bit more substantial; it requires some knowledge about Solaris Zones, Immutable Zones and Solaris administration in general. It is high-level; in future I'm hoping to get down to the nuts and bolts.

Immutable Zones

In Solaris 11 we added the Read-Only Root Non-Global Zones, marketed as Immutable Zones; this is a feature that makes a zone tamper-proof.

An Immutable Zone is configured simply by setting the "file-mac-profile" to one of "strict" (not much is writeable), "fixed-configuration" and "flexible-configuration" (configuration is writeable but binaries and such are not). This is all implemented in the kernel, based on pathnames and depending on the context; the super-user in the global zone can still update the zone or even modify protected files, as long as that is not done from within the zone.

We have made some changes to Immutable Non-Global Zones (IMZ, for short) that came out of developing the Immutable Global Zones (IMGZ). We have added a new feature, the "Trusted Path (TP)"; when logged in through the Trusted Path using the "-T" option to zlogin(1m), you can now modify protected files from within the zone. This is much safer, as you no longer need to give root access in the global zone, nor do you need to boot the IMZ in writeable mode.

In the following example, we log in to the zone "fixed", which has been configured with the fixed-configuration file-mac-profile. A normal root login doesn't allow us to modify "/etc/passwd"; I'm using touch(1) under privilege debugging to illustrate the error and what caused the error. When we log in with the "-T" option we suddenly can modify "/etc/passwd" because we're now in the Trusted Path. Notice also that the output from privilege debugging has been clarified; it points to the MWAC(5) manual page and it now also lists the system call name and not the number as it did before.

    # zlogin fixed
    [Connected to zone 'fixed' pts/3]
    Oracle Corporation SunOS 5.11 11.2 April 2014
    root@fixed:~# ppriv -De touch /etc/passwd
    touch[117063]: MWAC(5) policy violation (euid = 0, syscall = "utimensat") for "/etc/passwd" at fop_setattr+0x10b
    touch: cannot change times on /etc/passwd: Read-only file system
    root@fixed:~# logout
    [Connection to zone 'fixed' pts/3 closed]
    # zlogin -T fixed
    [Connected to zone 'fixed' pts/3]
    Oracle Corporation SunOS 5.11 11.2 April 2014
    root@fixed:~# ppriv -De touch /etc/passwd
    root@fixed:~#

Additionally, we have restricted the use of mount(1m) in an IMZ: while we allowed random loopback mounts before, we now only allow loopback mounts on empty directories, unless the file or directory isn't protected by MWAC(5).

Immutable Global Zone

In order to prevent tampering with the file system, we have extended Immutable Zones in Solaris 11.2 to the global zone; using the same mechanism you can now configure the global zone as an IMGZ. As there is no "super-global" zone, a different mechanism has been designed to enter the Trusted Path. A kernel zone still has a bare-metal zone controlling it, so this doesn't apply to kernel zones. Some additional steps need to be taken and they are listed here.

Preparing the global zone for immutable global zone

Maintenance of the global zone is only possible using Trusted Path access, and the Trusted Path is only available on the console, so make sure the console is accessible through the ILOM, a serial connection or the graphical console.

Once a system is configured as an immutable global zone, the break sequence (F1-A on a graphical console, <break> or the alternate break sequence (CR-tilde-<ctl-b>) on a serial console) will instead start the Trusted Path login. (An immediate second break sequence will work as a standard break sequence: start the kernel debugger (if it is loaded), drop to the OBP, etc.)

Configuring the Global Immutable Zone

The configuration of the global zone is done through zonecfg(1m) by picking the appropriate file-mac-profile for your situation; the allowed values are the same as for non-global immutable zones: "strict", "fixed-configuration", "flexible-configuration". See zonecfg(1m).

Note that if the system uses DHCP to set network interfaces, "flexible-configuration" must be selected.

    # zonecfg -z global
    zonecfg:global> set file-mac-profile=flexible-configuration

The "rpool" dataset will be restricted, but sub-datasets can be unrestricted using "add dataset":

    zonecfg:global> add dataset
    zonecfg:global:dataset> set name=rpool/export
    zonecfg:global:dataset> end
    zonecfg:global> add dataset
    zonecfg:global:dataset> set name=rpool/zones
    zonecfg:global:dataset> end

In this example we add "rpool/export" and "rpool/zones": writable datasets for users and for zones. An immutable global zone can only run zones in unrestricted datasets. All the children of an unrestricted dataset are also unrestricted.

Note that all datasets on other zpools are unrestricted and there is no need to add them with "add dataset".

After committing the zonecfg, boot information is written and the boot archive is updated:

    zonecfg:global> commit
    updating /platform/sun4u/boot_archive

When the system is configured, it should be rebooted; the system will then boot with an immutable global zone.

Maintenance of the immutable global zone

An immutable zone cannot be updated other than through the Trusted Path login or when the system is booted in writeable mode by using the "-w" flag when booting. Note that if you try to reboot the immutable zone with "reboot -- -w", the argument is ignored when not performed through the Trusted Path login.

After using the break sequence on the console, you should be greeted with:

    trusted path console login:

Log in and assume the root role; at that point the ordinary commands used to update the system are available. This includes "pkg update", "beadm activate" and also "zonecfg" if the need arises to change the global zone's configuration.

A separate pam stack can be configured for tpdlogin(1).

When "pkg update" is performed, the first boot of the immutable global zone is read-write; this is needed by the system to perform the needed self-assembly steps. When the self-assembly steps have been performed, the system will reboot, and in this second boot the system will be immutable again.

Solaris

Solaris 11.2: User, Pid and Commands in netstat(1m)

As it has been years since I've blogged, let me start with one of the smallest features I added to Solaris 11.2: an option to netstat(1m) that allows administrators to figure out who is using which port and which process or command is using a particular network connection.

As there is little or no similarity between other netstat implementations, we picked our own option letter, "-u". At the same time we realigned the columns, as the standard width didn't fit modern TCP window sizes, the length of Unix sockets, etc. We've also removed, for unprivileged users, unusable information such as the "kernel addresses", leaving a bit more room, though an 80-column terminal isn't really enough room for all of the information. Alignment is only guaranteed with -n, of course.

Our implementation doesn't use /proc like the Linux implementation does, nor does it look through /dev/kmem like lsof(1m) does; instead we get the information directly in the kernel. While some of the information might be out of date, we can give information about sockets in TIME_WAIT or CLOSE_WAIT, even when the latter sockets haven't been accepted yet! Additionally, those sockets owned by the kernel are also listed.

This works in the global zone, non-global zones, kernel zones *and* even in Solaris 10 branded zones; the latter uses the "native" Solaris 11.2 netstat command.

Here is some sample output, partially hidden by how we format blogs (so, install Solaris 11.2 and all will be revealed):

    % netstat -aun

    UDP: IPv4
    Local Address        Remote Address       User     Pid    Command        State
    -------------------- -------------------- -------- ------ -------------- ----------
    *.50258                                   root     1038   syslogd        Idle
    *.*                                       root     133    in.mpathd      Unbound
    *.*                                       root     133    in.mpathd      Unbound
    *.*                                       netadm   721    nwamd          Unbound
    *.*                                       netadm   721    nwamd          Unbound
    *.123                                     root     961    ntpd           Idle
    *.123                                     root     961    ntpd           Idle
    127.0.0.1.123                             root     961    ntpd           Idle
    10.311.249.18.123                         root     961    ntpd           Idle
    *.111                                     daemon   980    rpcbind        Idle
    *.*                                       daemon   980    rpcbind        Unbound
    *.41327                                   daemon   980    rpcbind        Idle
    *.111                                     daemon   980    rpcbind        Idle
    *.*                                       daemon   980    rpcbind        Unbound
    *.37058                                   daemon   980    rpcbind        Idle
    *.*                                       root     988    in.ndpd        Unbound
    *.*                                       root     999    statd          Unbound
    *.*                                       root     999    statd          Unbound
    *.39150                                   root     999    statd          Idle
    *.43382                                   root     999    statd          Idle
    *.4045                                    daemon   1008   lockd          Idle
    *.4045                                    daemon   1008   lockd          Idle
    *.56874                                   root     1004   inetd          Idle
    *.37069                                   root     1004   inetd          Idle
    *.42765                                   root     1148   mountd         Idle
    *.64957                                   root     1148   mountd         Idle
    *.2049                                    root     1150   nfsd           Idle
    *.2049                                    root     1150   nfsd           Idle

    UDP: IPv6
    Local Address        Remote Address       User     Pid    Command        State      If
    -------------------- -------------------- -------- ------ -------------- ---------- -----
    *.*                                       root     133    in.mpathd      Unbound
    *.*                                       netadm   721    nwamd          Unbound
    *.123                                     root     961    ntpd           Idle
    ::1.123                                   root     961    ntpd           Idle
    *.111                                     daemon   980    rpcbind        Idle
    *.*                                       daemon   980    rpcbind        Unbound
    *.41327                                   daemon   980    rpcbind        Idle
    *.*                                       root     988    in.ndpd        Unbound
    *.39150                                   root     999    statd          Idle
    *.4045                                    daemon   1008   lockd          Idle
    *.37069                                   root     1004   inetd          Idle
    *.42765                                   root     1148   mountd         Idle
    *.2049                                    root     1150   nfsd           Idle

    TCP: IPv4
    Local Address        Remote Address       User     Pid    Command        Swind   Send-Q Rwind   Recv-Q State
    -------------------- -------------------- -------- ------ -------------- ------- ------ ------- ------ -----------
    127.0.0.1.5999       *.*                  root     133    in.mpathd      0       0      128000  0      LISTEN
    *.111                *.*                  daemon   980    rpcbind        0       0      128000  0      LISTEN
    *.*                  *.*                  daemon   980    rpcbind        0       0      128000  0      IDLE
    *.111                *.*                  daemon   980    rpcbind        0       0      128000  0      LISTEN
    *.*                  *.*                  daemon   980    rpcbind        0       0      128000  0      IDLE
    *.36887              *.*                  root     999    statd          0       0      128000  0      LISTEN
    *.65159              *.*                  root     999    statd          0       0      128000  0      LISTEN
    10.311.249.18.58810  10.312.132.13.636    root     851    nscd           49232   0      128872  0      ESTABLISHED
    *.4045               *.*                  daemon   1008   lockd          0       0      1049200 0      LISTEN
    *.4045               *.*                  daemon   1008   lockd          0       0      1048952 0      LISTEN
    *.22                 *.*                  root     1030   sshd           0       0      128000  0      LISTEN
    127.0.0.1.25         *.*                  root     1068   sendmail       0       0      128000  0      LISTEN
    127.0.0.1.587        *.*                  root     1068   sendmail       0       0      128000  0      LISTEN
    *.47629              *.*                  root     1148   mountd         0       0      128000  0      LISTEN
    *.35906              *.*                  root     1148   mountd         0       0      128000  0      LISTEN
    *.2049               *.*                  root     1150   nfsd           0       0      1049200 0      LISTEN
    *.2049               *.*                  root     1150   nfsd           0       0      1048952 0      LISTEN
    127.0.0.1.1008       *.*                  pkg5srv  1600                  0       0      128000  0      LISTEN
    10.311.249.18.857    10.311.246.25.2049   casper   0      <kernel>       49232   0      1049800 116    ESTABLISHED
    10.311.249.18.22     10.311.249.34.64127  root     1030   sshd           263536  63     128872  0      ESTABLISHED
    127.0.0.1.6010       *.*                  casper   1969   sshd           0       0      128000  0      LISTEN

    TCP: IPv6
    Local Address        Remote Address       User     Pid    Command        Swind   Send-Q Rwind   Recv-Q State       If
    -------------------- -------------------- -------- ------ -------------- ------- ------ ------- ------ ----------- -----
    ::1.5999             *.*                  root     133    in.mpathd      0       0      128000  0      LISTEN
    *.111                *.*                  daemon   980    rpcbind        0       0      128000  0      LISTEN
    *.*                  *.*                  daemon   980    rpcbind        0       0      128000  0      IDLE
    *.36887              *.*                  root     999    statd          0       0      128000  0      LISTEN
    *.4045               *.*                  daemon   1008   lockd          0       0      1049200 0      LISTEN
    *.22                 *.*                  root     1030   sshd           0       0      128000  0      LISTEN
    ::1.25               *.*                  root     1068   sendmail       0       0      128000  0      LISTEN
    *.47629              *.*                  root     1148   mountd         0       0      128000  0      LISTEN
    *.2049               *.*                  root     1150   nfsd           0       0      1049200 0      LISTEN
    ::1.6010             *.*                  casper   1969   sshd           0       0      128000  0      LISTEN
    ::1.51794            ::1.6010             casper   1970   xterm          130880  0      139264  0      ESTABLISHED
    ::1.6010             ::1.51794            casper   1969   sshd           139060  0      130880  0      ESTABLISHED

    Active UNIX domain sockets
    Type       User     Pid    Command        Local Address  Remote Address
    stream-ord casper   1969   sshd           (socketpair) (socketpair)
    stream-ord casper   1969   sshd           (socketpair) (socketpair)
    stream-ord casper   1969   sshd           (socketpair) (socketpair)
    stream-ord casper   1969   sshd           (socketpair) (socketpair)
    stream-ord casper   1969   sshd           (socketpair) (socketpair)
    stream-ord root     372    dbus-daemon    /var/run/dbus/system_bus_socket
    stream-ord root     1028   rmvolmgr       /var/run/dbus/system_bus_socket
    stream-ord root     372    dbus-daemon    /var/run/dbus/system_bus_socket
    stream-ord root     943    hald           /var/run/dbus/system_bus_socket
    stream-ord root     1004   inetd          /system/volatile/inetd.uds
    stream-ord root     943    hald           /system/volatile/hald/dbus-TM2nMhzrpM
    stream-ord root     993    hald-addon-sto /system/volatile/hald/dbus-TM2nMhzrpM
    stream-ord pkg5srv  1601   httpd.worker   /system/volatile/pkg/sysrepo/wsgi.1601.0.1.sock
    stream-ord root     943    hald           /system/volatile/hald/dbus-TM2nMhzrpM
    dgram      root     988    in.ndpd        /system/volatile/in.ndpd_mib
    stream-ord root     988    in.ndpd        /system/volatile/in.ndpd_ipadm
    stream-ord root     970    hald-addon-cpu /system/volatile/hald/dbus-TM2nMhzrpM
    stream-ord root     943    hald           /system/volatile/hald/dbus-MIhDasTVfy
    stream-ord root     944    hald-runner    /system/volatile/hald/dbus-MIhDasTVfy
    stream-ord root     943    hald           /system/volatile/hald/dbus-MIhDasTVfy
    stream-ord root     943    hald           /system/volatile/hald/dbus-TM2nMhzrpM
    stream-ord root     372    dbus-daemon    /var/run/dbus/system_bus_socket
    stream-ord root     922    console-kit-da /var/run/dbus/system_bus_socket
    stream-ord root     196    rad            /system/volatile/rad/radsocket-unauth
    stream-ord root     372    dbus-daemon    (socketpair) (socketpair)
    stream-ord root     372    dbus-daemon    (socketpair) (socketpair)
    stream-ord root     196    rad            /system/volatile/rad/radsocket
    stream-ord root     372    dbus-daemon    /var/run/dbus/system_bus_socket

Adding the option -v, you also get the command line:

    UDP: IPv4
    Local Address        Remote Address       User     Pid    State      Command
    -------------------- -------------------- -------- ------ ---------- ----------------
    *.50258                                   root     1038   Idle       /usr/sbin/syslogd
    *.*                                       root     133    Unbound    /lib/inet/in.mpathd
    *.*                                       root     133    Unbound    /lib/inet/in.mpathd
    *.*                                       netadm   721    Unbound    /lib/inet/nwamd
    *.*                                       netadm   721    Unbound    /lib/inet/nwamd
    *.123                                     root     961    Idle       /usr/lib/inet/ntpd -p /var/run/ntp.pid -g
    ...

And for half-closed connections, you also get the information you want:

    TCP: IPv4
    Local Address        Remote Address       User     Pid    Command        Swind   Send-Q Rwind   Recv-Q State
    -------------------- -------------------- -------- ------ -------------- ------- ------ ------- ------ -----------
    127.0.0.1.55770      127.0.0.1.4321       casper   1033   closewait      130880  0      139264  0      FIN_WAIT_2
    127.0.0.1.4321       127.0.0.1.55770      casper   1031   closewait      139264  0      130880  0      CLOSE_WAIT
    127.0.0.1.54943      127.0.0.1.4321       casper   1033   closewait      130880  0      139264  0      FIN_WAIT_2
    127.0.0.1.4321       127.0.0.1.54943      casper   1031   closewait      139264  0      130880  0      CLOSE_WAIT
    127.0.0.1.41279      127.0.0.1.4321       casper   1033   closewait      130880  0      139264  0      FIN_WAIT_2
    127.0.0.1.4321       127.0.0.1.41279      casper   1031   closewait      139264  0      130880  0      CLOSE_WAIT
    ...

PS: I used the Hollywood IP extension to masquerade the IP addresses.


OpenSolaris

OGB election

About a week ago I accepted my nomination for the OGB, after being nominated by Garrett D'Amore in mid-February.

Why do I nominate myself?

I've always felt a strong sense of community with all folks involved with Unix, SunOS and later Solaris. Having earned the dubious distinction of running one of the few Solaris 2.1 sites in production and sharing my experiences of that time with the world, I can truly say that I have been part of the Solaris community pretty much from the day it was born.

I joined Sun some years later, in 1995, and continued to be outward facing and involved with the community, regardless of whatever folly reigned at Sun at the time, such as the edict that all outside communication needed to be approved by a PR person. Surely they wouldn't have found the time to approve my 1000s of posts, even if they had found the will.

As the most prolific Sun employee/poster in OpenSolaris I believe I have firmly established my role as a community player: leading the laptop community and sharing some of the stuff I made through the OpenSolaris website.

I also think the OGB needs a person who is well-versed in Solaris "ON" development; someone who knows an ARC from a C-team and who has more than a passing knowledge of our development process.

As an OGB member, I foremost want to focus on getting the open development process moving ahead more smoothly; and this does mean direct commit access. The current system is too much of a bottleneck for external development. As a Sun employee, I can look at both sides of the fence, which can help resolve issues between Sun and the community. But I also believe in quality all the time; it is what makes Solaris fairly stable to use, even when using the more or less experimental releases.

I've written a bunch of code and have literally built 100s of kernels; some of them blew up spectacularly, but code from others found its way back into (Open)Solaris, such as Solaris privileges, getpeerucred() et al. I've distributed experimental code (acpidrv, powernow) and even experimented with the X server. As a security person, I have needed to touch larger parts of the system than many of my peers; security bugs know no boundaries.

I make a point of always running the latest release of Solaris Nevada on most of my systems, that is, unless there's fatal breakage. So my little home server, my laptop and my desktops all run Snv_59 today.

Once we have established all our procedures, I see the OGB pretty much as a hands-off body. We are there to quell conflicts but I don't think we should be pussyfooting around the mailing lists; a spade is a spade so let's not call it by another name. Arguments are healthy and should not be suppressed unless they become destructive. I like to think that developers like it that way: just enjoy doing their thing with as little outside interference as possible.

Yes, I'm late in posting this; let me just say that life was pretty full the last few weeks, both workwise (moving office and some urgent matters which required me to skip some vacation) and personally (buying a new house). Lame excuses, I know.

Let your vote be counted!


Solaris

Laptops

After advancing the state of Solaris on the Ferrari 3400 with frkit, someone suggested that I should get one of each new laptop we at Sun may decide to standardize on. That's how it came to be that I now have both a Ferrari 3400 and a Ferrari 4000.

But today in the mail, I got a message telling me of yet another laptop heading my way. This time a lightweight Fujitsu s2110, again an AMD64 based laptop, as those are the ones we like best.

Perhaps I should make a plot of the laptops I got and when, and then see if I can estimate the curve; I think I got one in '96, one in 2000, another one in Dec 2004, and then again 6 months later and yet again 3 months later; with this accelerating pace it'd be one a day at Christmas and one per hour early next year. Hm, perhaps not likely.

Solaris keeps on improving rapidly when it comes to device support; and while in the laptop space things appear to be moving forward very rapidly, there also appears to be some gravitating toward common chipsets. Graphics are often an issue, but the fact that the Ferrari 4000 comes with an ATI X700 has the consequence that the Xorg ati driver gets updated much more quickly than before.

The Ferrari 3400 is relatively well supported in S10, though I think you really need my powernow driver and even then it still runs fairly hot to the touch.

The Ferrari 4000 requires some external drivers, but then, so does most bleeding-edge hardware, regardless of OS. For the Ferrari 4000, you'll need to download the ethernet driver "bcme" from broadcom.com, and we're working hard on getting OSS sound to work nicely on it. The 4000 runs much cooler than the 3400, but the downside is that it always has its fan blowing, albeit quietly. Probably because of a device enumeration bug, the firewire does not yet work. The SD card reader is a special device and we do not support it, unfortunately.

Of course, we're working on getting our Broadcom ethernet driver "bge" and the one by Broadcom, "bcme", merged and shipped as a single driver.

Cardbus support is coming for all laptops, as the cardbus interface is properly standardized and they all work more or less the same.

I haven't gotten the Fujitsu yet, so I can't tell how well that will run and/or whether tweaking is necessary.

I don't like to recommend any particular brand or kind of laptop; one recommendation which I can make is this: run Solaris Express on it. It will get all the laptop features you may want much sooner. Such features include new drivers, Xorg support for new hardware, ACPI support, newboot, and bug fixes (in some cases the difference between a device working and not working is just a small fix in an existing driver).

S10 was a huge leap forward and brought Solaris for x86/x64 to a point where it again runs on lots of (server) hardware. In Solaris Express, there is much more room for desktop/laptop innovation. We now ship several different x64 desktop platforms, so the x64 desktop/laptop space has much more visibility inside Sun.

If you want performant OpenGL, the only choice you have now is buying a laptop with an nVidia graphics chip and installing the nVidia "closed source" driver.

On the wireless front, things are moving, but slowly; more on that here soon. So watch this space.

Bluetooth is still a barren landscape when it comes to Solaris; I can't use the bluetooth rodent that came with the Ferrari 4000 (I'm saying rodent because it's quite a bit bigger than a mouse).

Note: I've just started the laptop-discuss list at opensolaris.org


OpenSolaris

First Installment (of frkit)

I've teased people before about the nifty hacks I've been doing for my Ferrari 3400 laptop. The hacks I did and the tool I wrote to make the distribution easier were so well liked that there was this "meme" propagating that whenever we got even cooler laptops, I should get the first one. And so it happened: I literally got the first Ferrari 4000 shipped to Sun. Now, this is a whole different beast than the Ferrari 3400 and I haven't gotten quite to the same comfort level yet.

I've long promised to make all of the neat stuff available, but legalities are the difficult part of such a venture. But now with OpenSolaris and a supported license scheme (plus management buy-in), I feel comfortable releasing the stuff which I wrote or which was derived from source now available under the CDDL.

The first installment includes my single CPU "PowerNow!(tm)" driver and my battery driver and utility. What the heck, let's throw in the mdb scripts which enable the additional keys on the Acer keyboards (mail, www, P1, P2, audio control). Some of these appear to be standard controls and may work for internet keyboards as well.

The tar.gz files all come with an install script which will take care of all the details of the installation; the battery driver requires ACPICA, which is only included in Solaris Nevada (11) build 14 and later.

I'll see what I can do about the GNOME battery utility we've done as well; oh, sorry for the somewhat lacking documentation.

Update: I've added acpipowertool, a small graphic battery meter by Matt Simmons, and fixed some installation issues for root users without "nm" in $PATH.

Update2: (2005/7/31) I've upgraded powernow so it works for more systems and to better integrate powernowadm with SMF; acpidrv is also updated to do a little bit more thermal zone handling. acpidrv only works for Solaris Express build 14 and later; powernow should work with Solaris 10 GA also.

Update3: frkit is for some time now available as runnable script at www.opensolaris.org in the


OpenSolaris

User Credentials and all that

Peter Harvey's story reminds me of the unforeseen consequences of creating the ucred in Solaris 10. The ucred was motivated by two factors: the introduction of privileges and a way to propagate information about process credentials through the system in userland.

Before Solaris 10, we had several mechanisms, some internal, some public, all propagating a subset of that information.

in sys/door.h:

    /*
     * Structure used to return info from door_cred
     */
    typedef struct door_cred {
        uid_t dc_euid;      /* Effective uid of client */
        gid_t dc_egid;      /* Effective gid of client */
        uid_t dc_ruid;      /* Real uid of client */
        gid_t dc_rgid;      /* Real gid of client */
        pid_t dc_pid;       /* pid of client */
        int   dc_resv[4];   /* Future use */
    } door_cred_t;

in sys/tl.h:

    #define TL_OPT_PEER_CRED 10
    typedef struct tl_credopt {
        uid_t  tc_uid;      /* Effective user id */
        gid_t  tc_gid;      /* Effective group id */
        uid_t  tc_ruid;     /* Real user id */
        gid_t  tc_rgid;     /* Real group id */
        uid_t  tc_suid;     /* Saved user id (from exec) */
        gid_t  tc_sgid;     /* Saved group id (from exec) */
        uint_t tc_ngroups;  /* number of supplementary groups */
    } tl_credopt_t;

in rpc/svc.h:

    /*
     * Obtaining local credentials.
     */
    typedef struct __svc_local_cred_t {
        uid_t euid;     /* effective uid */
        gid_t egid;     /* effective gid */
        uid_t ruid;     /* real uid */
        gid_t rgid;     /* real gid */
        pid_t pid;      /* caller's pid, or -1 if not available */
    } svc_local_cred_t;

and in the project I missed this one in sys/stropts.h:

    struct k_strrecvfd {    /* SVR4 expanded syscall interface structure */
        struct file *fp;
        uid_t uid;
        gid_t gid;
        char fill[8];
    };

There was also the need to be able to enquire about other processes and perhaps network connections and packets; a getpeereid interface was requested.

Now, what information should such an interface return? Network interfaces often only allow you to shape requests as a blob of bytes. And that blob needs to have a predictable maximum size too. As you can see from the above examples, even declaring a number of filler elements is not sufficient; none of the above structures which include a filler have space for the full complement of 16 groups, let alone Pete's proposed 65536 maximum number of groups.

The most natural way of implementing a blob with such restrictions is using an opaque data structure with accessor functions (in <ucred.h>):

    extern ucred_t *ucred_get(pid_t pid);
    extern void ucred_free(ucred_t *);
    extern uid_t ucred_geteuid(const ucred_t *);
    extern uid_t ucred_getruid(const ucred_t *);
    extern uid_t ucred_getsuid(const ucred_t *);
    extern gid_t ucred_getegid(const ucred_t *);
    extern gid_t ucred_getrgid(const ucred_t *);
    extern gid_t ucred_getsgid(const ucred_t *);
    extern int ucred_getgroups(const ucred_t *, const gid_t **);
    extern const priv_set_t *ucred_getprivset(const ucred_t *, priv_ptype_t);
    extern uint_t ucred_getpflags(const ucred_t *, uint_t);
    extern pid_t ucred_getpid(const ucred_t *); /* for door_cred compatibility */
    extern size_t ucred_size(void);
    extern int getpeerucred(int, ucred_t **);
    extern zoneid_t ucred_getzoneid(const ucred_t *);
    extern projid_t ucred_getprojid(const ucred_t *);

The ucred_t itself is defined in sys/ucred.h, a header which isn't installed on the system because programs are not supposed to use it; it is a private interface between the kernel and the library.

One function of note is perhaps ucred_size(), which returns the maximum size of a credential on the system; it can be used to size credentials allocated on the stack or embedded in structures. In many cases, the system will just allocate one for you and return the allocated one, but the interfaces have been structured so you can reuse ones returned earlier or ones you allocated yourself.

By now you may be asking yourself where you get creds; well, here are some examples in the OpenSolaris source code: nscd getting a door cred, rpcbind getting an rpc caller credential and the use of the TL option by RPC.

And your typical use of the function in an inetd started daemon:

    #include <ucred.h>

    int
    main(int argc, char **argv)
    {
        ucred_t *uc = NULL;

        if (getpeerucred(0, &uc) == 0) {
            /* we know something about the caller */
            ucred_free(uc);
        }
        return (0);
    }

And a slightly bigger example where we use XPG4 recvmsg to receive UCRED control messages:

    /*
     * Receive UDP packets on file descriptor 0 (as handed over by inetd),
     * print where each packet came from and look for the UCRED control
     * message that comes with it.
     */
    #include <sys/socket.h>
    #include <sys/uio.h>
    #include <sys/signal.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <string.h>
    #include <netdb.h>
    #include <stdlib.h>
    #include <arpa/inet.h>
    #include <ucred.h>

    int
    main(int argc, char **argv)
    {
        struct sockaddr_storage stor;
        struct sockaddr_in *sin = (struct sockaddr_in *)&stor;
        struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)&stor;
        ssize_t bytes;
        union {
            struct cmsghdr hdr;
            unsigned char buf[2048];
            double align;
        } cbuf;
        unsigned char buf[2048];
        struct msghdr msg;
        struct cmsghdr *cmsg;
        struct iovec iov;
        int one = 1;

        msg.msg_name = &stor;
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        iov.iov_base = buf;

        /* ask for the destination address and the sender's credential */
        setsockopt(0, IPPROTO_IP, IP_RECVDSTADDR, &one, sizeof (one));
        setsockopt(0, IPPROTO_IPV6, IPV6_RECVPKTINFO, &one, sizeof (one));
        setsockopt(0, SOL_SOCKET, SO_RECVUCRED, &one, sizeof (one));

        alarm(30);

        while (1) {
            char abuf[256];

            /* recvmsg() overwrites the lengths, so reset them each time */
            msg.msg_control = &cbuf;
            msg.msg_controllen = sizeof (cbuf);
            msg.msg_namelen = sizeof (stor);
            iov.iov_len = sizeof (buf);

            bytes = recvmsg(0, &msg, 0);
            if (bytes >= 0) {
                if (msg.msg_namelen != 0 &&
                    connect(0, (struct sockaddr *)&stor, msg.msg_namelen) != 0)
                    exit(1);

                printf("you connected from %s with the credential\n",
                    inet_ntop(stor.ss_family,
                    stor.ss_family == AF_INET ?
                    (void *)&sin->sin_addr : (void *)&sin6->sin6_addr,
                    abuf, sizeof (abuf)));

                for (cmsg = CMSG_FIRSTHDR(&msg); cmsg;
                    cmsg = CMSG_NXTHDR(&msg, cmsg)) {
                    if (cmsg->cmsg_level == SOL_SOCKET &&
                        cmsg->cmsg_type == SCM_UCRED) {
                        ucred_t *uc = (ucred_t *)CMSG_DATA(cmsg);
                        /* We have a ucred here !! */
                    }
                }
                /* disconnect again so the next sender can be received */
                if (msg.msg_namelen != 0)
                    (void) connect(0, NULL, 0);
            } else {
                exit(1);
            }
        }
    }

But thinking back to Pete's problem, we see a problem when increasing max groups; even worse, this libnsl private data structure is abused and multiple copies exist which need to be kept in sync (so parts of the system broke when I changed it in this one place). The bug is an illustration of why cut & paste programming doesn't work and why, even when you share a private definition, you must use a proper header file. I filed the bug as soon as I did the quick fix for the Solaris Express respin, the bug is


Solaris

The End of Realmode Boot

I've already mentioned two great new features in our current development release: ACPICA and USB hotplug. But there's one change that's much more far reaching than that: Newboot.

Most Solaris x86 users will be familiar with the blue screen/device configuration assistant/boot sequence and how ancient some of that feels. Perhaps few are aware that the DCA is actually a realmode DOS-like environment where each boot device requires its own realmode driver. These drivers needed to be compiled with a 16 bit compiler and 16 bit MASM, not available for ready money anywhere. While the official build environment required NT, I managed to build it on environments ranging from MS Windows 98 and 2000 on actual PCs to Caldera DOS 7 on a SunPCi card (which allowed for automatic building, which was great fun). Now that this piece of shameful history lies in the past, I am not afraid to confess.

But as of last Sunday, April 17th, 2005, we have "legacy free" newboot. Newboot uses grub with ufs support, so we now have native grub support and a menu we can edit from inside Solaris. Device enumeration is done completely using ACPI. Because it skips the device configuration assistant and boots a single large file with all kernel device drivers, startup is quite a bit quicker, and we can boot from any bootable device as long as we also support it in the kernel so we can mount root.

And we've reverted back to white on black consoles; this again takes some getting used to, surprisingly enough.

One thing to note is that before, you may have had to disable ACPI in the kernel and the BIOS; with Newboot + ACPICA, you actually stand a much better chance of the system working with all the default settings: ACPI on, ACPI 2.0 enabled. Even with legacy USB enabled, the system now has a much better chance of working than before.

But this is a radical change, and PC BIOSes and hardware being what they are, there are interesting times ahead. So please test drive when this hits Solaris Express in a few months' time.

As of this writing, it's a bit in the balance whether you'll get to see the source first as part of OpenSolaris or the binaries as part of a Solaris Express.


Solaris

ACPICA in Solaris

With the long history of neglect that Solaris on x86 endured, quite a few components got to be extremely stale and fragile. And this wasn't just a lack of device drivers but also a lack of basic new functionality in the core OS.

This week saw another quantum leap: the induction of Intel's ACPI reference implementation (ACPICA) into the next Solaris release.

For years I wanted to have battery support on my old VAIO and later on my Ferrari. And I wanted a power button that did something, etc. I tried to make do with the old "acpi_intp" interpreter which was part of Solaris, but it leaked memory like a sieve and was limited in functionality. Integrating ACPICA looked daunting, but fortunately someone made an actual project out of this and the end result is that we now have a state of the art ACPI interpreter in Solaris.

There are basically only two ACPI interpreters in widespread use: the Windows one and the Intel one; by leveraging Intel's source, we stand a fair chance of having Solaris work with more ACPI BIOSes. If your system required ACPI to be turned off for Solaris to work, you may find yourself forced to switch it on when you upgrade later this year.

I've been distributing acpica and a number of other useful Solaris binaries in a single internal kit called "frkit" (originally aimed at Ferraris but now running on countless systems); frkit includes acpica, a powerbutton/battery handler, an AMD PowerNOW! power management module, a GNOME battery monitor, and our development cardbus and wireless drivers + tools. One of the more interesting parts of that is possibly the "NDISulator" port from FreeBSD, which allows the Sun Ferraristi to use the builtin Broadcom wireless on their Acer Ferraris in 32 and 64 bit mode.

ACPICA is just phase one of a larger project; we have not yet bothered much with the "P" (for power) from ACPI, but we hope to leverage the new implementation to provide the necessary "S3" and "S4" sleep state support.

The speed at which new features work on my Ferrari, which I've had now for 4 months, is in stark contrast with my Vaio, which I got not too long before S9 for x86 was postponed. It's clear that we needed a ramp-up after the wind-down, but it seems to be going more quickly than ever before.


Solaris

Timezones and multi boot.

One of the things that has always been bothering me is the fact that on x86 systems you cannot really run multiple operating systems and survive the timezone change. That's because the clock runs in local time, and local time is ambiguous. The system cannot tell whether the DST change has happened or not, so it needs to record this fact in the filesystem (that's why Solaris on x86 has the "rtc -c" cronjob). If you boot all your OSes in turn after the changeover, your system will be N hours off once they're all done adjusting time. The problem is probably best summarized here

On Unix this was long solved by running the clock in the UTC or GMT timezone; that clock is unambiguous, give or take a leap second, and allows multiple versions of the OS to coexist.

Last week, it was pointed out to me in comp.unix.solaris that there is a hidden registry key in later releases of MS-Windows. I already knew how to fix up Solaris, so I combined this to:

Set the following registry key (it does not exist!): HKLM\SYSTEM\CurrentControlSet\Control\TimeZoneInformation\RealTimeIsUniversal (REG_DWORD = 1)

In the control panel with Day&Time settings, check the "automatically adjust" check box.

Boot into Solaris and run: rtc -c -z UTC, then correct the clock with date/rdate (if you use liveupgrade, lumount your other partition(s) and copy the /etc/rtc_config file to all of them).

In Linux, you'll need to run "timeconfig" and select "RTC set to GMT".

Note that if you don't multiboot, it's probably also a good idea to run "rtc -c -z UTC" and then correct the date; for one, you won't be bitten by the AMD64 timezone bug we had in Solaris 10.


OpenSolaris

Open Solaris CAB

It has been a busy week flying to SFO and having our first CAB meeting. The first good thing that happened was that KLM had finally changed the aging and horrible MD11 for shiny new Boeing 777s with personal video. I have a bit of a problem with the new immigration procedures, and I like the Brazilian government's stand on this.

I had met Rich Teer fleetingly before in the hallways of the Menlo Park Campus, so he recognized me when we checked in at the same time; we probably were on the same BART train from the airport. But I had never met the others. I feel we have a great team with very many different competences: Roy Fielding with his experience of Apache's governance model, Simon Phipps with his tireless evangelism, Al Hopper with his tireless Solaris on x86 enthusiasm, Rich Teer, a SPARC fan and accomplished author, and myself as Solaris engineering representative, being on the more technical side of things.

Are we just marketing, as the Register would have it? No, we're very serious about it. Is the CAB just a bunch of YES-men? Can we get the respect of our community if we are?

Sun takes both Open Solaris and its independence from Sun seriously; Jonathan Schwartz came to meet us but none of the other executives were allowed at our meeting. He talked to us at length and was very serious about clearing up the roadblocks that we had already determined to be on the path to OpenSolaris. It is clear that they want us to succeed and want us to be independent. Jonathan even stayed for lunch.

The Sun press conference we took part in was a first for me. The press was not hostile and mostly asked questions which were to the point; some more than others.

We have a lot of work to do and will do most of it in email on a publicly readable mailing list.

The second day we listened to Jonathan's keynote at the OSBC conference and spent the afternoon doing interviews with the press, followed by a press reception and the Sun engineering dinner/Open Solaris launch party at Lulu's. And guess what, we were able to make the Americans walk all over town; the itinerary was all "5 min cab ride" and some such nonsense.

On the final day I took Ben Rockwood's advice and tried out "Clam chowder in sourdough" after taking the cable car to Lombard Street and walking down to the harbor. The weather was gorgeous; the same cannot be said of the weather in Amsterdam, which is now unseasonably cold.


Solaris

Fujitsu Lifebook B112 running Solaris 10

As one of two resident Solaris Engineers in Holland, you sometimes get strange requests, such as one from an IT operations person who was given this really old laptop: a Fujitsu Lifebook B112; 96MB, 3GB harddisk. It was running Solaris 7 and he had no root password. Hacking it was not too difficult; installing Solaris 10, of course, would be more fun.

There are a few challenges getting a vintage laptop up and running; it has no onboard networking and no bootable CD. And Solaris 10 does not support booting from the Xircom PE3 parallel port ethernet card anymore. And where would you even find such a device?

Well, it turns out that we were indeed able to locate a Xircom PE3 adaptor, and armed with "perl" we could make the S9 "pe" driver run under S10. The "pe" driver is no longer supported because we obsoleted GLDv0, but by making PE load "misc/GLD" rather than "misc/gld" and installing the old GLD driver, we added the device to the miniroot on the S10 install server. Armed with an S9 boot floppy we then booted S10 from the server and started the install. It couldn't fit all of Solaris 10, so we removed a few components to cut it down so it would fit in the 2.8GB partition (the installer is a bit generous in allocating space, so at the end we only had 2.0 GB installed).

After churning for around 3 hours, Solaris had been installed (PE ethernet is very slow and a 233MHz Pentium is not very fast). I had to go home (Friday) and resolved to see how far we'd get on Monday. But I couldn't wait.

The next hack was attacking, from the install server, a system sitting idle with the install finished (we didn't dare reboot because it'd come up w/o ethernet if we did). After looking with snoop I found that there are actually two ways of doing that: the first one is the new "eventhook" mechanism we have for dhcp; whenever a dhcp event occurs, the eventhook script is run. The second method was even simpler; it turns out that Solaris 10 init stat's inittab every 5 minutes. So I added a line to inittab which popped an xterm up over my VPN tunnel to my house. I added the "PE" ethernet driver, finished some other config stuff by hand and rebooted.

And it came up, still with PE, but I soon killed it remotely while fiddling with cardbus and the PCMCIA Ethernet card.

With my own modified PCIC driver I was able to get the Lifebook to use "pcelx0", with all supported devices. Xorg came up without a hitch too, with just a little bit of fiddling to get it to use the external monitor; no luck with the touchscreen yet. Here's what it looked like

Even sound was simple; ``update_drv -a -i '"ESS1879"' sbpro'' and the sound driver attached.


Solaris

Solaris Privileges

So what makes Solaris Privileges different? Why didn't we copy something else like Trusted Solaris Privileges or "POSIX" capabilities?Let's start from what we formulated as our requirements near the beginning of our project.One of the important features of Solaris is complete binary backward compatibility; in order to offer that we needed to design the privilege subsystem in such a manner that current practices, binaries and products would continue to work. Of course, some have solved this issue by providing a system wide knob to turn: root / root + privileges / just privileges. We don't like knobs in our OS; specifically not ones which drastically alter the behaviour of a system. It makes it harder to develop software; it needs to work for all settings. Certain productsmay require conflicting settings, and so on. So we decided on a "per-process" knob which is largely automaticWith backward compatibility comes the onus on the software developer to develop future proof interfaces; that ruled out all other interfacesas they all have fixed bitmaps and fixed privilege/capability numbers, fixed structure sizes in the programmer visible parts of the system.Solaris Privileges have none of that. And while we could savely reuse the names of the Trusted Solaris interfaces we can not redefine interfaces even from a defunct standard. So we have interfaces which smell like Trusted Solaris but with a completely new userland representation of privileges and privilege sets. We can never have more signals; but we can have more privileges and more privilege sets!The privileges and privilege sets in Solaris 10 are represented to userland processes and non-core kernel modules as strings; privilegesets are bitmasks of undetermined size; they can only be allocated through the C library routines. Privilege set names are also strings andnot plain integer indices; this gives us even more flexibility. 
A Solaris binary compiled for 4 privilege sets of 32 privileges each will continue to work on a Solaris system with 5 privilege sets, each of which can contain 64 privileges, and with all the privileges having their internal representation renumbered.

Evolving from the Super user model

The traditional super user model is fairly straightforward: a process has three uids associated with it, the effective uid, the real uid and the saved uid. A privileged process is a process which runs with an effective uid of 0. A process can temporarily relinquish privileges by setting the effective uid back to the real uid; the saved uid remains 0 or can be set to the real uid as well; in the latter case the process has permanently relinquished its privileged status. But if the saved uid is 0, the process can swap the effective uid back and forth, implementing some form of privilege bracketing. Of course, once such a process is compromised, an exploit can also swap the effective uid back to 0. And the only choice is to have all privileges available or none.

In your typical Unix privilege model the powers formally associated with uid 0 (PFAWU0) are split into a number of privileges; each process has three different privilege sets: the Effective, Permitted and Inheritable sets, or E, P and I for short. E closely models the effective uid and P is very much like the saved/real uids: the Effective set determines which privileges a process has active; this is the set of privileges the kernel verifies its privilege checks against. The Permitted set is very much like the saved uid; it contains the privileges a process is allowed to use. So a process is free to remove any privilege from E, and it is free to add whatever privilege it wants to E as long as it carries that privilege in P. The Inheritable set allows a process to pass privileges on to subprocesses, e.g., in case you want to run a web server with a particular uid but with the additional privilege allowing it to bind to port 80.
You put the privilege in the inheritable set (if it is in your permitted or effective set) and the executable will run with that privilege.

Privilege bracketing is then performed by adding privileges to E and removing them from E; when a process is done and wishes to relinquish all privileges forever, it removes them from P (which automatically causes removal from E).

The fourth privilege set we use in Solaris is the "Limit" set, L. This privilege set is the upper limit of the privileges a process and its offspring can ever obtain. Solaris uses the limit set for a number of additional things: it is used to restrict the power of the super user in non-global zones, and it is used as a mechanism to determine with what privileges a backward compatible uid 0 process runs.

So how is this compatibility achieved? Well, after a long debate the answer we came up with was really simple: if we want an implementation to be backward compatible for applications which don't know better, what's simpler than a single per-process knob? This knob we've called "Privilege Awareness" (PA), and in order to explain it we introduce the notion of observability: we make the kernel operate on the observed effective (EO) and observed permitted (PO) sets; these contrast with the implementation sets (EI and PI), the actual bits in the kernel credential for a process.

The kernel then sees the privilege sets as follows:

    EO = (euid == 0 && !PA) ? L : EI
    PO = (any uid == 0 && !PA) ? L : PI

The observed effective set closely follows the effective uid, and the observed permitted set models the fact that if any of your uids is 0 you can recover an effective uid of 0. A process becomes privilege aware if it modifies its E or P set or when it requests to become privilege aware. A PA process can also request to become non-PA, but such a transition is only possible if the observed sets can be made to remain constant across the transition.
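The two rules for the observed sets can be written down as a small Python model. This is only a sketch of the formulas above, not kernel code: the `Cred` class and its field names are made up for illustration, and `LIMIT` is a small stand-in for "all zone privileges".

```python
from dataclasses import dataclass

# Sample data: the five basic privileges plus two others as a stand-in limit set.
BASIC = frozenset({"proc_fork", "proc_exec", "file_link_any",
                   "proc_session", "proc_info"})
LIMIT = BASIC | {"net_privaddr", "sys_mount"}


@dataclass(frozen=True)
class Cred:
    """Hypothetical, simplified process credential."""
    ruid: int
    euid: int
    suid: int
    pa: bool            # privilege aware?
    e_impl: frozenset   # implementation effective set (EI)
    p_impl: frozenset   # implementation permitted set (PI)
    limit: frozenset    # limit set (L)


def observed_effective(c):
    # EO = (euid == 0 && !PA) ? L : EI
    return c.limit if (c.euid == 0 and not c.pa) else c.e_impl


def observed_permitted(c):
    # PO = (any uid == 0 && !PA) ? L : PI
    any_root = 0 in (c.ruid, c.euid, c.suid)
    return c.limit if (any_root and not c.pa) else c.p_impl


# A non-PA root process, the same process once privilege aware, and a
# set-uid root program that has temporarily dropped its effective uid.
legacy_root = Cred(0, 0, 0, False, BASIC, BASIC, LIMIT)
aware_root = Cred(0, 0, 0, True, BASIC, BASIC, LIMIT)
dropped = Cred(100, 100, 0, False, BASIC, BASIC, LIMIT)
```

In this model `legacy_root` observes L in both sets, `aware_root` observes exactly its implementation sets, and `dropped` has only the basic set effective while L stays permitted, mirroring the fact that it can still swap its effective uid back to 0.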
The kernel will try to drop PAness on exec().

The privilege sets can be modified by the process itself, but the kernel also modifies them at exec(2) time using the following rules:

    I' = L & I (L intersected with I)
    E' = P' = I'
    L remains unchanged

For your typical process this is a no-op, as the typical process has the following privileges:

    I = P = E = { basic }
    L = { all zone privileges }

As seen here, the system defines a set of privileges known as the "basic" set; it is a set of privileges required for operations which traditionally weren't privileged in Unix. The design of the basic privilege set and the specific rules about its use make sure that it too can be extended in future with new privileges, without requiring applications which use the privilege feature now to be modified. The current set of basic privileges in Solaris consists of the privilege needed to fork(), the privilege needed to exec(), the privilege needed to create hard links to files not owned by the current effective uid, the privilege needed to send signals to processes outside of your current session and the privilege needed to see other users' processes.

In order to work properly in an environment with a changed basic set, a process would specify the privileges it needs as the basic set + non-basic privileges minus the basic privileges it knows it doesn't need. When the basic set is then extended in a later Solaris release, the process is guaranteed to continue to work.

What I've outlined up to this point is mostly an enabling technology; in and by itself it does not make the system more secure, but it allows us to harden the system and reduce the risk.

Why privileges should not be orthogonal

My background is very much a hardening background and not a Trusted Computing background, so I have always felt that the privilege model as employed in a number of operating systems has one serious weakness: most of these operating systems define privileges which allow you to acquire more privileges.
Typical for such privileges are single privileges which allow you to write directly to disks or kernel memory. What is the point of such privileges? You can just as well give a process requiring such privileges all privileges. In Solaris we have defined a very simple rule which I've dubbed the principle of privilege escalation prevention; the basic rule is this: "an operation needs at least as many privileges to be performed as can be gained by executing it". Simply put: if you want to write to /dev/kmem, you will need all privileges! If you want to control a process, you will need at least as many privileges as that process has. If you want to assign privileges to another process, you must have those privileges. If you want to mount on top of something, you must own that something.

And when we find more of such holes we will plug them. Of course, we still have a little bit of a problem with the user with uid 0: he still owns all the files and we have not restricted him in writing to his files. So in order to practice safe computing, run with a different uid and perhaps a few extra privileges.

Anyway, that's all for today; off to the beach and see you in two weeks or so. Or see you at Usenix Security in San Diego! In the next episode we'll shed some light on the repercussions of changing the credential and the visible changes obviated in userland; we'll answer such important questions as "will door_cred() survive privileges?"

