Privilege (Set Me Free)
By comay on Jan 09, 2007
One of the perhaps lesser-known features of Solaris Containers, or Zones, is that applications running inside these virtualized environments execute with fewer privileges than applications executing outside the container. This is enforced through the Solaris Privileges framework, which was also introduced in the Solaris 10 release.
When comparing virtualization solutions, typically OS level virtualization mechanisms like Zones or FreeBSD Jails are thought to provide less security than mechanisms where a machine architecture is virtualized, such as with the family of products from VMware or with paravirtualization mechanisms such as Xen, in which the guest OS is ported to the virtualized machine architecture. One reason for that is there is usually weaker separation between virtualized OS environments since at several levels in the kernel there is some sharing of data structures and code paths.
However in some cases, OS level virtualization provides an advantage for certain aspects of security. For example, with Solaris Containers the privilege mechanism in the kernel enforces limitations on the types of operations an application can perform. Consider the case of the ability to create or "plumb" a software networking interface using ifconfig(1M) or set an IP address on that interface. In some situations, one wants to allow such operations inside a virtualized environment because a particular application requires the ability to change an existing IP address or to toggle an interface up or down. The ramification of this, however, is that a malicious or naive user inside the environment might change their IP address to something not expected with the results ranging from disruption in the network topology to the potential of spoofing another machine on the network. In addition, most applications do not actually require the ability to change their environment's IP address or create new network interfaces or even know the name of the interface in their environment. Rather, they typically want one or more IPv4 or IPv6 addresses which they can bind(3SOCKET) to.
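To make the last point concrete, here is a minimal sketch of what a typical network application does: it binds to an address that has already been assigned to its environment and never touches interface configuration. The helper name bind_to_address and the use of the loopback address are assumptions made for illustration; Python's socket module wraps the same bind call.

```python
import socket


def bind_to_address(address: str, port: int = 0) -> tuple:
    """Bind a TCP socket to an address the environment already owns.

    No interface name and no interface configuration are involved;
    the application only needs an address it is allowed to bind to.
    Port 0 asks the OS to pick any free port.
    """
    family = socket.AF_INET6 if ":" in address else socket.AF_INET
    with socket.socket(family, socket.SOCK_STREAM) as sock:
        sock.bind((address, port))
        # getsockname() returns (host, port, ...) for the bound socket.
        return sock.getsockname()[:2]


if __name__ == "__main__":
    host, bound_port = bind_to_address("127.0.0.1")
    print(f"bound to {host}:{bound_port}")
```

Nothing in this sketch needs to know which interface carries the address, which is why withholding interface configuration from a container costs most applications nothing.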
In the case of Solaris Containers, the privilege to set the IP address of an interface is not given to any application running inside a container, and there is no way for an application to escalate or grow its set of privileges beyond those it started out with. The end result, in this example, is that the root password or super-user privileges can be given to a user inside a container, but that user will be unable to manipulate or affect the topology of the network or impersonate another machine and potentially gain access to its network traffic.1
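The "no escalation" property can be pictured with a toy model. This is only a sketch of the idea, not how the kernel implements it: the class ZoneProcess and its methods are hypothetical names, and the privilege names are simply those that appear elsewhere in this post.

```python
class ZoneProcess:
    """Toy model of a process whose privileges are capped by its zone."""

    def __init__(self, zone_limit, initial):
        self._limit = frozenset(zone_limit)
        # A process can never start with more than the zone allows.
        self.privileges = frozenset(initial) & self._limit

    def drop(self, name):
        """Giving up a privilege is always allowed."""
        self.privileges = self.privileges - {name}

    def acquire(self, name):
        """Any attempt to gain a privilege is checked against the cap."""
        if name not in self._limit:
            raise PermissionError(f"{name} is outside the zone's limit set")
        self.privileges = self.privileges | {name}


if __name__ == "__main__":
    # Illustrative limit set that does not include sys_time.
    root_shell = ZoneProcess({"net_icmpaccess"}, {"net_icmpaccess", "sys_time"})
    print(sorted(root_shell.privileges))
    try:
        root_shell.acquire("sys_time")
    except PermissionError as err:
        print("denied:", err)
```

In this model even a process that asks for everything ends up with only what the zone's limit allows, which mirrors why handing out the root password inside a container does not hand out the withheld privileges.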
Until recently, the set of privileges a container's applications were limited to was fixed. However starting with both Solaris Express 5/06 and Solaris 10 11/06, the global zone administrator can change this set of privileges. What this means from a practical point of view is that containers can become more capable by adding some of the privileges that are not usually present. An example here might be the ability to run DTrace from within the container2. Dan provided an excellent writeup on the details for doing so here.
As another example, by adding some additional privileges to the container's default privilege set, a Network Time Protocol (NTP) server can be deployed in the container, which is preferable from a security point of view, especially for a server that might be facing a hostile Internet. In order to configure the container appropriately, the list of privileges that it requires needs to be known. Solaris 10 currently ships with the 3-5.93e version of xntpd(1M), which is the daemon that implements the NTP server capability. This particular daemon can actually take advantage of three privileges that are not normally present within a container. The first, perhaps obviously, is the privilege to change the system clock - sys_time. With the addition of this privilege, xntpd will be able to successfully set the system clock when it needs to.
However it also turns out that the daemon tries to both lock down its memory pages and also run in the real-time scheduling class. It does this so that the daemon can maintain accurate time particularly in the face of other system activity. These two operations are also covered by unique privileges - proc_lock_memory and proc_priocntl.
Tying these privileges3 together, we can take an existing container and configure it to be an NTP server. In this example, Sun's internal network routes IP multicast, so I will leverage that to connect to the network's NTP servers listening on the standard NTP multicast address of 224.0.1.1. Consider this update to the configuration of the zone myzone:
global# zonecfg -z myzone
zonecfg:myzone> set limitpriv=default,proc_lock_memory,proc_priocntl,sys_time
zonecfg:myzone> commit
zonecfg:myzone> exit
global# zoneadm -z myzone boot
Then from within the newly booted container, I will set up the configuration of the server itself and start the service:
myzone# cp -p /etc/inet/ntp.client /etc/inet/ntp.conf
myzone# svcadm enable network/ntp
The property that was set in the container's configuration, limitpriv, consists of a comma-separated list of privileges. In this particular example, the container's privilege set is limited to the standard default set of privileges plus the three additional privileges required by the NTP server.
It is worthwhile to note that privileges can also be taken away by preceding them with an exclamation mark (!) or a minus sign (-). This can allow a container to be booted in which applications have even fewer privileges than usual. For example, to take away the ability to generate ICMP datagrams from the zone named twilight, the global zone administrator would configure the container as follows:
global# zonecfg -z twilight set limitpriv=default,!net_icmpaccess
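The way a limitpriv value expands into a privilege set can be sketched in a few lines. This is a simplified model rather than the actual zonecfg parser: the function name resolve_limitpriv is hypothetical, tokens are assumed to be processed left to right, and ILLUSTRATIVE_DEFAULT is a tiny stand-in for the real default set, which is much longer.

```python
# Stand-in for the real default set; only a few names, for illustration.
ILLUSTRATIVE_DEFAULT = frozenset({"net_icmpaccess", "proc_fork", "proc_exec"})


def resolve_limitpriv(spec: str, default: frozenset = ILLUSTRATIVE_DEFAULT) -> frozenset:
    """Expand a limitpriv-style string into a set of privilege names.

    'default' pulls in the default set, a bare name adds a privilege,
    and a name preceded by '!' or '-' removes it.
    """
    result = set()
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        if token == "default":
            result |= default
        elif token[0] in "!-":
            result.discard(token[1:])
        else:
            result.add(token)
    return frozenset(result)


if __name__ == "__main__":
    ntp = resolve_limitpriv("default,proc_lock_memory,proc_priocntl,sys_time")
    twilight = resolve_limitpriv("default,!net_icmpaccess")
    print(sorted(ntp))
    print(sorted(twilight))
```

Under these assumptions the NTP example yields the default set plus three names, and the twilight example yields the default set without net_icmpaccess.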
There are a few restrictions on what privileges can be added to a container as well as some concerning which ones can be removed. For more details, please see the original proposal and the ensuing discussion on the zones-discuss mailing list. This proposal and many others concerning containers and other parts of OpenSolaris have benefited greatly from the participation of the OpenSolaris Zones Community. Information about each of these proposals can be found here.
2 The ability to use DTrace inside a non-global zone is at the present time restricted to Solaris Express as some additional changes to DTrace were required. However, these changes should be appearing in an upcoming Solaris 10 release.
3 Starting with Solaris Express 11/06, the privilege to lock memory has actually been added to the container's default set. This is because additional resource controls have been added that can limit the amount of memory applications within a container can lock so it is no longer necessary to make this privilege an optional one.