Thursday Jun 27, 2013

Improving Manageability of Virtual Environments

Boot Environments for Solaris 10 Branded Zones

Until recently, Solaris 10 Branded Zones on Solaris 11 suffered one notable regression: Live Upgrade did not work. The individual packaging and patching tools work correctly, but the ability to upgrade Solaris while the production workload continued running did not exist. A recent Solaris 11 SRU (Solaris 11.1 SRU 6.4) restored most of that functionality, albeit with a slightly different concept, different commands, and not every detail of the original feature set. This new method gives you the ability to create and manage multiple boot environments (BEs) for a Solaris 10 Branded Zone, to modify the active BE or any inactive BE, and to do so while the production workload continues to run.

Background

In case you are new to Solaris: Solaris includes a set of features that enables you to create a bootable Solaris image, called a Boot Environment (BE). This newly created image can be modified while the original BE is still running your workload(s). There are many benefits, including improved uptime and the ability to reboot into (or downgrade to) an older BE if a newer one has a problem.

In Solaris 10 this set of features was named Live Upgrade. Solaris 11 applies the same basic concepts to the new packaging system (IPS) but there isn't a specific name for the feature set. The features are simply part of IPS. Solaris 11 Boot Environments are not discussed in this blog entry.

Although a Solaris 10 system can have multiple BEs, until recently a Solaris 10 Branded Zone (BZ) in a Solaris 11 system did not have this ability. This limitation was addressed recently, and that enhancement is the subject of this blog entry.

This new implementation uses two concepts. The first is the use of a ZFS clone for each BE. This makes it very easy to create a BE, or many BEs. This is a distinct advantage over the Live Upgrade feature set in Solaris 10, which had a practical limitation of two BEs on a system, when using UFS. The second new concept is a very simple mechanism to indicate the BE that should be booted: a ZFS property. The new ZFS property is named com.oracle.zones.solaris10:activebe (isn't that creative? ;-) ).

It's important to note that the property is inherited from the original BE's file system to any BEs you create. In other words, all BEs in one zone have the same value for that property. When the (Solaris 11) global zone boots the Solaris 10 BZ, it boots the BE that has the name that is stored in the activebe property.

Here is a quick summary of the actions you can use to manage these BEs:

To create a BE:

  • Create a ZFS clone of the zone's root dataset

To activate a BE:

  • Set the ZFS property of the root dataset to indicate the BE

To add a package or patch to an inactive BE:

  • Mount the inactive BE
  • Add packages or patches to it
  • Unmount the inactive BE

To list the available BEs:

  • Use the "zfs list" command.

To destroy a BE:

  • Use the "zfs destroy" command.

Preparation

Before you can use the new features, you will need a Solaris 10 BZ on a Solaris 11 system. You can use these three steps - on a real Solaris 11.1 server or in a VirtualBox guest running Solaris 11.1 - to create a Solaris 10 BZ. The Solaris 11.1 environment must be at SRU 6.4 or newer.
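One quick way to confirm the SRU level of the Solaris 11.1 host is to inspect the "entire" incorporation; a sketch (the output shown is abbreviated and illustrative):

s11# pkg info entire | grep -i version
      Version: 0.5.11 (Oracle Solaris 11.1.6.4.0)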

  1. Create a flash archive on the Solaris 10 system
    s10# flarcreate -n s10-system /net/zones/archives/s10-system.flar
  2. Configure the Solaris 10 BZ on the Solaris 11 system
    s11# zonecfg -z s10z
    Use 'create' to begin configuring a new zone.
    zonecfg:s10z> create -t SYSsolaris10
    zonecfg:s10z> set zonepath=/zones/s10z
    zonecfg:s10z> exit
    s11# zoneadm list -cv
      ID NAME             STATUS     PATH                           BRAND     IP    
       0 global           running    /                              solaris   shared
       - s10z             configured /zones/s10z                    solaris10 excl  
    
  3. Install the zone from the flash archive
    s11# zoneadm -z s10z install -a /net/zones/archives/s10-system.flar -p
    

You can find more information about the migration of Solaris 10 environments to Solaris 10 Branded Zones in the documentation.

The rest of this blog entry demonstrates the commands you can use to accomplish the aforementioned actions related to BEs.

New features in action

Note that the demonstration of the commands occurs in the Solaris 10 BZ, as indicated by the shell prompt "s10z# ". Many of these commands can be performed in the global zone instead, if you prefer. If you perform them in the global zone, you must change the ZFS file system names.
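For example, assuming the zonepath /zones/s10z corresponds to a dataset named rpool/zones/s10z (an assumption about this particular system; confirm the actual names with zfs list), the equivalent commands from the global zone might look like this:

s11# zfs list -r rpool/zones/s10z
s11# zfs get com.oracle.zones.solaris10:activebe rpool/zones/s10z/rpool/ROOT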

Create

The only complicated action is the creation of a BE. In the Solaris 10 BZ, create a new "boot environment" - a ZFS clone. You can assign any name to the final portion of the clone's name, as long as it meets the requirements for a ZFS file system name.

s10z# zfs snapshot rpool/ROOT/zbe-0@snap
s10z# zfs clone -o mountpoint=/ -o canmount=noauto rpool/ROOT/zbe-0@snap rpool/ROOT/newBE
cannot mount 'rpool/ROOT/newBE' on '/': directory is not empty
filesystem successfully created, but not mounted
You can safely ignore that message: we already know that / is not empty! We have merely told ZFS that the default mountpoint for the clone is the root directory.

(Note that a Solaris 10 BZ that has a separate /var file system requires additional steps. See the MOS document mentioned at the bottom of this blog entry.)

List the available BEs and active BE

Because each BE is represented by a clone of the rpool/ROOT dataset, listing the BEs is as simple as listing the clones.

s10z# zfs list -r rpool/ROOT
NAME               USED  AVAIL  REFER  MOUNTPOINT
rpool/ROOT        3.55G  42.9G    31K  legacy
rpool/ROOT/zbe-0     1K  42.9G  3.55G  /
rpool/ROOT/newBE  3.55G  42.9G  3.55G  /
The output shows that two BEs exist. Their names are "zbe-0" and "newBE".

You can tell Solaris that one particular BE should be used when the zone next boots by using a ZFS property. Its name is com.oracle.zones.solaris10:activebe. The value of that property is the name of the clone that contains the BE that should be booted.

s10z# zfs get com.oracle.zones.solaris10:activebe rpool/ROOT
NAME        PROPERTY                             VALUE  SOURCE
rpool/ROOT  com.oracle.zones.solaris10:activebe  zbe-0  local

Change the active BE

When you want to change the BE that will be booted next time, you can just change the activebe property on the rpool/ROOT dataset.

s10z# zfs get com.oracle.zones.solaris10:activebe rpool/ROOT
NAME        PROPERTY                             VALUE  SOURCE
rpool/ROOT  com.oracle.zones.solaris10:activebe  zbe-0  local
s10z# zfs set com.oracle.zones.solaris10:activebe=newBE rpool/ROOT
s10z# zfs get com.oracle.zones.solaris10:activebe rpool/ROOT
NAME        PROPERTY                             VALUE  SOURCE
rpool/ROOT  com.oracle.zones.solaris10:activebe  newBE  local
s10z# shutdown -y -g0 -i6
After the zone has rebooted:
s10z# zfs get com.oracle.zones.solaris10:activebe rpool/ROOT
rpool/ROOT  com.oracle.zones.solaris10:activebe  newBE  local
s10z# zfs mount
rpool/ROOT/newBE                /
rpool/export                    /export
rpool/export/home               /export/home
rpool                           /rpool
Mount the original BE to see that it's still there.
s10z# zfs mount -o mountpoint=/mnt rpool/ROOT/zbe-0
s10z# ls /mnt
Desktop                         export                          platform
Documents                       export.backup.20130607T214951Z  proc
S10Flar                         home                            rpool
TT_DB                           kernel                          sbin
bin                             lib                             system
boot                            lost+found                      tmp
cdrom                           mnt                             usr
dev                             net                             var
etc                             opt

Patch an inactive BE

At this point, you can modify the original BE. If you would prefer to modify the new BE, you can restore the original value to the activebe property and reboot, and then mount the new BE to /mnt (or another empty directory) and modify it.
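A minimal sketch of that alternative, reusing the commands shown elsewhere in this entry:

s10z# zfs set com.oracle.zones.solaris10:activebe=zbe-0 rpool/ROOT
s10z# shutdown -y -g0 -i6
After the zone reboots into zbe-0:
s10z# zfs mount -o mountpoint=/mnt rpool/ROOT/newBE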

Let's mount the original BE so we can modify it. (The first command is only needed if you haven't already mounted that BE.)

s10z# zfs mount -o mountpoint=/mnt rpool/ROOT/zbe-0
s10z# patchadd -R /mnt -M /var/sadm/spool 104945-02
Note that the typical usage will be:
  1. Create a BE
  2. Mount the new (inactive) BE
  3. Use the package and patch tools to update the new BE
  4. Unmount the new BE
  5. Reboot
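Putting those five steps together, a minimal sketch might look like this (the BE name zbe-1, the snapshot name, and the patch ID are illustrative):

s10z# zfs snapshot rpool/ROOT/newBE@patching
s10z# zfs clone -o mountpoint=/ -o canmount=noauto rpool/ROOT/newBE@patching rpool/ROOT/zbe-1
s10z# zfs mount -o mountpoint=/mnt rpool/ROOT/zbe-1
s10z# patchadd -R /mnt -M /var/sadm/spool 104945-02
s10z# zfs umount rpool/ROOT/zbe-1
s10z# zfs set com.oracle.zones.solaris10:activebe=zbe-1 rpool/ROOT
s10z# shutdown -y -g0 -i6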

Delete an inactive BE

ZFS clones are children of their parent file systems. In order to destroy the parent, you must first "promote" the child. This reverses the parent-child relationship. (For more information on this, see the documentation.)

The original rpool/ROOT file system is the parent of the clones that you create as BEs. In order to destroy an earlier BE that is the parent of other BEs, you must first promote one of the child BEs to be the ZFS parent. Only then can you destroy the original BE.

Fortunately, this is easier to do than to explain:

s10z# zfs promote rpool/ROOT/newBE 
s10z# zfs destroy rpool/ROOT/zbe-0
s10z# zfs list -r rpool/ROOT
NAME               USED  AVAIL  REFER  MOUNTPOINT
rpool/ROOT        3.56G   269G    31K  legacy
rpool/ROOT/newBE  3.56G   269G  3.55G  /

Documentation

This feature is so new, it is not yet described in the Solaris 11 documentation. However, MOS note 1558773.1 offers some details.

Conclusion

With this new feature, you can add packages and patches to the boot environments of a Solaris 10 Branded Zone. This ability improves the manageability of these zones, and makes their use more practical. It also means that you can use the existing P2V tools with earlier Solaris 10 updates, and modify the environments after they become Solaris 10 Branded Zones.

Wednesday Jun 12, 2013

Comparing Solaris 11 Zones to Solaris 10 Zones

Many people have asked whether Oracle Solaris 11 uses sparse-root zones or whole-root zones. I think the best answer is "both and neither, and more" - but that's a wee bit confusing. :-) This blog entry attempts to explain that answer.

First a recap: Solaris 10 introduced the Solaris Zones feature set, way back in 2005. Zones are a form of server virtualization called "OS (Operating System) Virtualization." They improve consolidation ratios by isolating processes from each other so that they cannot interact. Each zone has its own set of users, naming services, and other software components. One of the many advantages is that there is no need for a hypervisor, so there is no performance overhead. Many data centers run tens to hundreds of zones per server!

In Solaris 10, there are two models of package deployment for Solaris Zones. One model is called "sparse-root" and the other "whole-root." Each form has specific characteristics, abilities, and limitations.

A whole-root zone has its own copy of the Solaris packages. This allows the inclusion of other software in system directories - even though that practice has been discouraged for many years. Although it is also possible to modify the Solaris content in such a zone, e.g. patching a zone separately from the rest, this was highly frowned on. :-( (More importantly, modifying the Solaris content in a whole-root zone may lead to an unsupported configuration.)

The other model is called "sparse-root." In that form, instead of copying all of the Solaris packages into the zone, the directories containing Solaris binaries are re-mounted into the zone. This allows the zone's users to access them at their normal places in the directory tree. Those are read-only mounts, so a zone's root user cannot modify them. This improves security, and also reduces the amount of disk space used by the zone - 200 MB instead of the usual 3-5 GB per zone. These loopback mounts also reduce the amount of RAM used by zones because Solaris stores only one copy in RAM of a program that is in use by several zones. This model also has disadvantages. One disadvantage is the inability to add software into system directories such as /usr. Also, although a sparse-root zone can be migrated to another Solaris 10 system, it cannot be moved to a Solaris 11 system as a "Solaris 10 Zone."
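In zonecfg terms, a Solaris 10 sparse-root zone is simply one whose configuration contains inherit-pkg-dir resources for the Solaris binary directories; the default template sets these up automatically (zone name hypothetical, output abbreviated):

global# zonecfg -z sparsezone
sparsezone: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:sparsezone> create
zonecfg:sparsezone> info inherit-pkg-dir
inherit-pkg-dir:
        dir: /lib
inherit-pkg-dir:
        dir: /platform
inherit-pkg-dir:
        dir: /sbin
inherit-pkg-dir:
        dir: /usr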

In addition to those contrasting characteristics, here are some characteristics of zones in Solaris 10 that are shared by both packaging models:

  • A zone can modify its own configuration files in /etc.
  • A zone can be configured so that it manages its own networking, or so that it cannot modify its network configuration.
  • It is difficult to give a non-root user in the global zone the ability to boot and stop a zone, without giving that user other abilities.
  • In a zone that can manage its own networking, the root user can do harmful things like spoof other IP addresses and MAC addresses.
  • It is difficult to assign network packet processing to the same CPUs that a zone uses. This can lead to unpredictable performance and performance troubleshooting challenges.
  • You cannot run a large number of zones in one system (e.g. 50) that each manages its own networking, because that would require the assignment of more physical NICs than are available (e.g. 50).
  • Except when managed by Ops Center, zones cannot be safely stored on NAS.
  • Solaris 10 Zones cannot be NFS servers.
  • The fsstat command does not report statistics per zone.

Solaris 11 Zones use the new packaging system of Solaris 11. Unlike Solaris 10, their configuration does not offer a choice of packaging models. Instead, two (well, four) different models of "immutability" (changeability) are offered. The default model allows a privileged zone user to modify the zone's content. The other (three) limit the content that can be changed: none at all, or one of two overlapping sets of configuration files. (See "Configuring and Administering Immutable Zones".)
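For example, the immutability model of a Solaris 11 zone is selected with its file-mac-profile property; a sketch (zone name hypothetical):

global# zonecfg -z myzone
zonecfg:myzone> set file-mac-profile=fixed-configuration
zonecfg:myzone> exit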

Solaris 11 addresses many of those limitations. With the characteristics listed above in mind, the following table shows the similarities and differences between zones in Solaris 10 and in Solaris 11.

Characteristic | Solaris 10 Whole-Root | Solaris 10 Sparse-Root | Solaris 11 | Solaris 11 Immutable Zones
Each zone has a copy of most Solaris packages | Yes | No | Yes | Yes
Disk space used by a zone (typical) | 3.5 GB | 100 MB | 500 MB | 500 MB
A privileged zone user can add software to /usr | Yes | No | Yes | No
A zone can modify its Solaris programs | True | False | True | False
Each zone can modify its configuration files | Yes | Yes | Yes | No
Delegated administration | No | No | Yes | Yes
A zone can be configured to manage its own networking | Yes | Yes | Yes | Yes
A zone can be configured so that it cannot manage its own networking | Yes | Yes | Yes | Yes
A zone can be configured with resource controls | Yes | Yes | Yes | Yes
Integrated tool to measure a zone's resource consumption (zonestat) | No | No | Yes | Yes
Network processing automatically happens on that zone's CPUs | No | No | Yes | Yes
Zones can be NFS servers | No | No | Yes | Yes
Per-zone fsstat data | No | No | Yes | Yes

As you can see, the statement "Solaris 11 Zones are whole-root zones" is only true using the narrowest definition of whole-root zones: those zones which have their own copy of Solaris packaging content. But there are other valuable characteristics of sparse-root zones that are still available in Solaris 11 Zones. Also, some Solaris 11 Zones do not have some characteristics of whole-root zones.

For example, the table above shows that you can configure a Solaris 11 zone that has read-only Solaris content. And Solaris 11 takes that concept further, offering the ability to tailor that immutability. It also shows that Solaris 10 sparse-root and whole-root zones are more similar to each other than to Solaris 11 Zones.

Conclusion

Solaris 11 Zones are slightly different from Solaris 10 Zones. The former can achieve the goals of the latter, and they also offer features not found in Solaris 10 Zones. Solaris 11 Zones offer the best of Solaris 10 whole-root zones and sparse-root zones, and offer an array of new features that make Zones even more flexible and powerful.

Tuesday Feb 12, 2013

Solaris 10 1/13 (aka "Update 11") Released

Larry Wake lets us know that Solaris 10 1/13 has been released and is available for download.

Tuesday Feb 10, 2009

Zones to the Rescue

Recently, Thomson Reuters "demonstrated that RMDS [Reuters Market Data System software] performs better in a virtualized environment with Solaris Containers than it does with a number of individual Sun server machines."

This enabled Thomson Reuters to break the "million-messages-per-second barrier."

The performance improvement is probably due to the extremely high bandwidth, low latency characteristics of inter-Container network communications. Because all inter-Container network traffic is accomplished with memory transfers - using default settings - packets 'move' at computer memory speeds, which are much better than common 100Mbps or 1Gbps ethernet bandwidth. Further, that network performance is much more consistent without extra hardware - switches and routers - that can contribute to latency.

Articles can be found at: http://finance.yahoo.com/news/Sun-Microsystems-and-Thomson-bw-14306924.html

Wednesday Apr 09, 2008

Effortless Upgrade: Solaris 8 System to Solaris 8 Container

[Update 23 Aug 2008: Solaris 9 Containers was released a few months back. Reply to John's comment: yes, the steps to use Solaris 9 Containers are the same as the steps for Solaris 8 Containers]

Since 2005, Solaris 10 has offered the Solaris Containers feature set, creating isolated virtual Solaris environments for Solaris 10 applications. Although almost all Solaris 8 applications run unmodified in Solaris 10 Containers, sometimes it would be better to just move an entire Solaris 8 system - all of its directories and files, configuration information, etc. - into a Solaris 10 Container. This has become very easy - just three commands.

Sun offers a Solaris Binary Compatibility Guarantee which demonstrates the significant effort that Sun invests in maintaining compatibility from one Solaris version to the next. Because of that effort, almost all applications written for Solaris 8 run unmodified on Solaris 10, either in a Solaris 10 Container or in the Solaris 10 global zone.

However, there are still some data centers with many Solaris 8 systems. In some situations it is not practical to re-test all of those applications on Solaris 10. It would be much easier to just move the entire contents of the Solaris 8 file systems into a Solaris Container and consolidate many Solaris 8 systems into a much smaller number of Solaris 10 systems.

For those types of situations, and some others, Sun now offers Solaris 8 Containers. These use the "Branded Zones" framework available in OpenSolaris and first released in Solaris 10 in August 2007. A Solaris 8 Container provides an isolated environment in which Solaris 8 binaries - applications and libraries - can run without modification. To a user logged in to the Container, or to an application running in the Container, there is very little evidence that this is not a Solaris 8 system.

The Solaris 8 Container technology rests on a very thin layer of software which performs system call translations - from Solaris 8 system calls to Solaris 10 system calls. This is not binary emulation, and the number of system calls with any difference is small, so the performance penalty is extremely small - typically less than 3%.

Not only is this technology efficient, it's very easy to use. There are five steps, but two of them can be combined into one:

  1. install Solaris 8 Containers packages on the Solaris 10 system
  2. patch systems if necessary
  3. archive the contents of the Solaris 8 system
  4. move the archive to the Solaris 10 system (if step 3 placed the archive on a file system accessible to the Solaris 10 system, e.g. via NFS, this step is unnecessary)
  5. configure and install the Solaris 8 Container, using the archive
The rest of this blog entry is a demonstration of Solaris 8 Containers, using a Sun Ultra 60 workstation for the Solaris 8 system and a Logical Domain on a Sun Fire T2000 for the Solaris 10 system. I chose those two only because they were available to me. Any two SPARC systems could be used as long as one can run Solaris 8 and one can run Solaris 10.

Almost any Solaris 8 revision or patch level will work, but Sun strongly recommends applying the most recent patches to that system. The Solaris 10 system must be running Solaris 10 8/07, and requires the following minimum patch levels:

  • Kernel patch: 127111-08 (-11 is now available) which needs
    • 125369-13
    • 125476-02
  • Solaris 8 Branded Zones patch: 128548-05
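Before installing anything, you can confirm the patch levels on the Solaris 10 host; a quick sketch (output is illustrative):

s10-system# uname -v
Generic_127111-11
s10-system# showrev -p | grep 128548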
The first step is installation of the Solaris 8 Containers packages. You can download the Solaris 8 Containers software packages from http://sun.com/download. Installing those packages on the Solaris 10 system is easy and takes 5-10 seconds:
s10-system# pkgadd -d . SUNWs8brandr  SUNWs8brandu  SUNWs8p2v
Now we can patch the Solaris 10 system, using the patches listed above.

After patches have been applied, it's time to archive the Solaris 8 system. In order to remove the "archive transfer" step I'll turn the Solaris 10 system into an NFS server and mount it on the Solaris 8 system. The archive can be created by the Solaris 8 system, but stored on the Solaris 10 system. There are several tools which can be used to create the archive: Solaris flash archive tools, cpio, pax, etc. In this example I used flarcreate, which first became available on Solaris 8 2/04.

s10-system# share /export/home/s8-archives
s8-system# mount s10-system:/export/home/s8-archives /mnt
s8-system# flarcreate -S -n atl-sewr-s8 /mnt/atl-sewr-s8.flar
Creation of the archive takes longer than any other step - 15 minutes to an hour, or even more, depending on the size of the Solaris 8 file systems.

With the archive in place, we can configure and install the Solaris 8 Container. In this demonstration the Container was "sys-unconfig'd" by using the -u option. The opposite of that is -p, which preserves the system configuration information of the Solaris 8 system.

s10-system# zonecfg -z test8
zonecfg:test8> create -t SUNWsolaris8
zonecfg:test8> set zonepath=/zones/roots/test8
zonecfg:test8> add net
zonecfg:test8:net> set address=129.152.2.81
zonecfg:test8:net> set physical=vnet0
zonecfg:test8:net> end
zonecfg:test8> exit
s10-system# zoneadm -z test8 install -u -a /export/home/s8-archives/atl-sewr-s8.flar
              Log File: /var/tmp/test8.install.995.log
            Source: /export/home/s8-archives/atl-sewr-s8.flar
        Installing: This may take several minutes...
    Postprocessing: This may take several minutes...

            Result: Installation completed successfully.
          Log File: /zones/roots/test8/root/var/log/test8.install.995.log
This step should take 5-10 minutes. After the Container has been installed, it can be booted.
s10-system# zoneadm -z test8 boot
s10-system# zlogin -C test8
At this point I was connected to the Container's console. It asked the usual system configuration questions, and then rebooted:
[NOTICE: Zone rebooting]

SunOS Release 5.8 Version Generic_Virtual 64-bit
Copyright 1983-2000 Sun Microsystems, Inc.  All rights reserved

Hostname: test8
The system is coming up.  Please wait.
starting rpc services: rpcbind done.
syslog service starting.
Print services started.
Apr  1 18:07:23 test8 sendmail[3344]: My unqualified host name (test8) unknown; sleeping for retry
The system is ready.

test8 console login: root
Password:
Apr  1 18:08:04 test8 login: ROOT LOGIN /dev/console
Last login: Tue Apr  1 10:47:56 from vpn-129-150-80-
Sun Microsystems Inc.   SunOS 5.8       Generic Patch   February 2004
# bash
bash-2.03# psrinfo
0       on-line   since 04/01/2008 03:56:38
1       on-line   since 04/01/2008 03:56:38
2       on-line   since 04/01/2008 03:56:38
3       on-line   since 04/01/2008 03:56:38
bash-2.03# ifconfig -a
lo0:1: flags=1000849 mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
vnet0:1: flags=1000843 mtu 1500 index 2
        inet 129.152.2.81 netmask ffffff00 broadcast 129.152.2.255

At this point the Solaris 8 Container exists. It's accessible on the local network, existing applications can be run in it, or new software can be added to it, or existing software can be patched.

To extend the example, here is the output from the commands I used to limit this Solaris 8 Container to only use a subset of the 32 virtual CPUs on that Sun Fire T2000 system.

s10-system# zonecfg -z test8
zonecfg:test8> add dedicated-cpu
zonecfg:test8:dedicated-cpu> set ncpus=2
zonecfg:test8:dedicated-cpu> end
zonecfg:test8> exit
bash-3.00# zoneadm -z test8 reboot
bash-3.00# zlogin -C test8

Console:
[NOTICE: Zone rebooting]

SunOS Release 5.8 Version Generic_Virtual 64-bit
Copyright 1983-2000 Sun Microsystems, Inc.  All rights reserved

Hostname: test8
The system is coming up.  Please wait.
starting rpc services: rpcbind done.
syslog service starting.
Print services started.
Apr  1 18:14:53 test8 sendmail[3733]: My unqualified host name (test8) unknown; sleeping for retry
The system is ready.
test8 console login: root
Password:
Apr  1 18:15:24 test8 login: ROOT LOGIN /dev/console
Last login: Tue Apr  1 18:08:04 on console
Sun Microsystems Inc.   SunOS 5.8       Generic Patch   February 2004
# psrinfo
0       on-line   since 04/01/2008 03:56:38
1       on-line   since 04/01/2008 03:56:38

Finally, for those who were counting, the "three commands" were, at a minimum, flarcreate, zonecfg and zoneadm.

Tuesday Apr 08, 2008

ZoiT: Solaris Zones on iSCSI Targets (aka NAC: Network-Attached Containers)

Introduction

Solaris Containers have a 'zonepath' ('home') which can be a directory on the root file system or on a non-root file system. Until Solaris 10 8/07 was released, a local file system was required for this directory. Containers that are on non-root file systems have used UFS, ZFS, or VxFS. All of those are local file systems - putting Containers on NAS has not been possible. With Solaris 10 8/07, that has changed: a Container can now be placed on remote storage via iSCSI.

Background

Solaris Containers (aka Zones) are Sun's operating system level virtualization technology. They allow a Solaris system (more accurately, an 'instance' of Solaris) to have multiple, independent, isolated application environments. A program running in a Container cannot detect or interact with a process in another Container.

Each Container has its own root directory. Although viewed as the root directory from within that Container, that directory is also a non-root directory in the global zone. For example, a Container's root directory might be called /zones/roots/myzone/root in the global zone.

The configuration of a Container includes something called its "zonepath." This is the directory which contains a Container's root directory (e.g. /zones/roots/myzone/root) and other directories used by Solaris. Therefore, the zonepath of myzone in the example above would be /zones/roots/myzone.

The global zone administrator can choose any directory to be a Container's zonepath. That directory could just be a directory on the root partition of Solaris, though in that case some mechanism should be used to prevent that Container from filling up the root partition. Another alternative is to use a separate partition for that Container, or one shared among multiple Containers. In the latter case, a quota should be used for each Container.
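As one hedged example, if ZFS is used for that shared file system, a per-Container dataset with a quota is a simple way to enforce the limit (dataset names and the 8 GB quota are hypothetical):

global# zfs create rpool/zones
global# zfs create -o quota=8g rpool/zones/myzone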

Local file systems have been used for zonepaths. However, many people have strongly expressed a desire for the ability to put Containers on remote storage. One significant advantage to placing Containers on NAS is the simplification of Container migration - moving a Container from one system to another. When using a local file system, the contents of the Container must be transmitted from the original host to the new host. For small, sparse zones this can take as little as a few seconds. For large, whole-root zones, this can take several minutes - a whole-root zone is an entire copy of Solaris, taking up as much as 3-5 GB. If remote storage can be used to store a zone, the zone's downtime can be as little as a second or two, during which time a file system is unmounted on one system and mounted on another.

Here are some significant advantages to iSCSI over SANs:

  1. the ability to use commodity Ethernet switching gear, which tends to be less expensive than SAN switching equipment
  2. the ability to manage storage bandwidth via standard, mature, commonly used IP QoS features
  3. iSCSI networks can be combined with non-iSCSI IP networks to reduce the hardware investment and consolidate network management. If that is not appropriate, the two networks can be separate but use the same type of equipment, reducing costs and the types of in-house infrastructure management expertise needed.

Unfortunately, a Container cannot 'live' on an NFS server, and it's not clear if or when that limitation will be removed.

iSCSI Basics

iSCSI is simply "SCSI communication over IP." In this case, SCSI commands and responses are sent between two iSCSI-capable devices, which can be general-purpose computers (Solaris, Windows, Linux, etc.) or specific-purpose storage devices (e.g. Sun StorageTek 5210 NAS, EMC Celerra NS40, etc.). There are two endpoints to iSCSI communications: the initiator (client) and the target (server). A target publicizes its existence. An initiator binds to a target.

The industry's design for iSCSI includes a large number of features, including security. Solaris implements many of those features. Details can be found in the Solaris documentation.

In Solaris, the command iscsiadm(1M) configures an initiator, and the command iscsitadm(1M) configures a target.

Steps

This section demonstrates the installation of a Container onto a remote file system that uses iSCSI for its transport.

The target system is an LDom on a T2000, and looks like this:

System Configuration:  Sun Microsystems  sun4v
Memory size: 1024 Megabytes
SUNW,Sun-Fire-T200
SunOS ldg1 5.10 Generic_127111-07 sun4v sparc SUNW,Sun-Fire-T200
Solaris 10 8/07 s10s_u4wos_12b SPARC
The initiator system is another LDom on the same T2000 - although there is no requirement that LDoms are used, or that they be on the same computer if they are used.
System Configuration:  Sun Microsystems  sun4v
Memory size: 896 Megabytes
SUNW,Sun-Fire-T200
SunOS ldg4 5.11 snv_83 sun4v sparc SUNW,Sun-Fire-T200
Solaris Nevada snv_83a SPARC
The first configuration step is the creation of the storage underlying the iSCSI target. Although UFS could be used, let's improve the robustness of the Container's contents and put the target's storage under control of ZFS. I don't have extra disk devices to give to ZFS, so I'll make some and use them for a zpool - in real life you would use disk devices here:
Target# mkfile 150m /export/home/disk0
Target# mkfile 150m /export/home/disk1
Target# zpool create myscsi mirror /export/home/disk0 /export/home/disk1
Target# zpool status
  pool: myscsi
 state: ONLINE
 scrub: none requested
config:

        NAME                  STATE     READ WRITE CKSUM
        myscsi                ONLINE       0     0     0
          /export/home/disk0  ONLINE       0     0     0
          /export/home/disk1  ONLINE       0     0     0
Now I can create a zvol - an emulation of a disk device:
Target# zfs list
NAME    USED  AVAIL  REFER  MOUNTPOINT
myscsi   86K   258M  24.5K  /myscsi
Target# zfs create -V 200m myscsi/jvol0
Target# zfs list
NAME           USED  AVAIL  REFER  MOUNTPOINT
myscsi         200M  57.9M  24.5K  /myscsi
myscsi/jvol0  22.5K   258M  22.5K  -
Creating an iSCSI target device from a zvol is easy:
Target# iscsitadm list target
Target# zfs set shareiscsi=on myscsi/jvol0
Target# iscsitadm list target
Target: myscsi/jvol0
    iSCSI Name: iqn.1986-03.com.sun:02:c8a82272-b354-c913-80f9-db9cb378a6f6
    Connections: 0
Target# iscsitadm list target -v
Target: myscsi/jvol0
    iSCSI Name: iqn.1986-03.com.sun:02:c8a82272-b354-c913-80f9-db9cb378a6f6
    Alias: myscsi/jvol0
    Connections: 0
    ACL list:
    TPGT list:
    LUN information:
        LUN: 0
            GUID: 0x0
            VID: SUN
            PID: SOLARIS
            Type: disk
            Size:  200M
            Backing store: /dev/zvol/rdsk/myscsi/jvol0
            Status: online

Configuring the iSCSI initiator takes a little more work. There are three methods to find targets. I will use a simple one. After telling Solaris to use that method, it only needs to know what the IP address of the target is.

Note that the example below uses "iscsiadm list ..." several times, without any output. The purpose is to show the difference in output before and after the command(s) between them.

First let's look at the disks available before configuring iSCSI on the initiator:

Initiator# ls /dev/dsk
c0d0s0  c0d0s2  c0d0s4  c0d0s6  c0d1s0  c0d1s2  c0d1s4  c0d1s6
c0d0s1  c0d0s3  c0d0s5  c0d0s7  c0d1s1  c0d1s3  c0d1s5  c0d1s7
We can view the currently enabled discovery methods, and enable the one we want to use:
Initiator# iscsiadm list discovery
Discovery:
        Static: disabled
        Send Targets: disabled
        iSNS: disabled
Initiator# iscsiadm list target
Initiator# iscsiadm modify discovery --sendtargets enable
Initiator# iscsiadm list discovery
Discovery:
        Static: disabled
        Send Targets: enabled
        iSNS: disabled
At this point we just need to tell Solaris which IP address we want to use as a target. It takes care of all the details, finding all disk targets on the target system. In this case, there is only one disk target.
Initiator# iscsiadm list target
Initiator# iscsiadm add discovery-address 129.152.2.90
Initiator# iscsiadm list target
Target: iqn.1986-03.com.sun:02:c8a82272-b354-c913-80f9-db9cb378a6f6
        Alias: myscsi/jvol0
        TPGT: 1
        ISID: 4000002a0000
        Connections: 1
Initiator# iscsiadm list target -v
Target: iqn.1986-03.com.sun:02:c8a82272-b354-c913-80f9-db9cb378a6f6
        Alias: myscsi/jvol0
        TPGT: 1
        ISID: 4000002a0000
        Connections: 1
                CID: 0
                  IP address (Local): 129.152.2.75:40253
                  IP address (Peer): 129.152.2.90:3260
                  Discovery Method: SendTargets
                  Login Parameters (Negotiated):
                        Data Sequence In Order: yes
                        Data PDU In Order: yes
                        Default Time To Retain: 20
                        Default Time To Wait: 2
                        Error Recovery Level: 0
                        First Burst Length: 65536
                        Immediate Data: yes
                        Initial Ready To Transfer (R2T): yes
                        Max Burst Length: 262144
                        Max Outstanding R2T: 1
                        Max Receive Data Segment Length: 8192
                        Max Connections: 1
                        Header Digest: NONE
                        Data Digest: NONE
The initiator automatically finds the iSCSI remote storage, but we need to turn this into a disk device. (Newer builds seem to not need this step, but it won't hurt. Looking in /devices/iscsi will help determine whether it's needed.)
Initiator# devfsadm -i iscsi
Initiator# ls /dev/dsk
c0d0s0    c0d0s3    c0d0s6    c0d1s1    c0d1s4    c0d1s7    c1t7d0s2  c1t7d0s5
c0d0s1    c0d0s4    c0d0s7    c0d1s2    c0d1s5    c1t7d0s0  c1t7d0s3  c1t7d0s6
c0d0s2    c0d0s5    c0d1s0    c0d1s3    c0d1s6    c1t7d0s1  c1t7d0s4  c1t7d0s7
Initiator# ls -l /dev/dsk/c1t7d0s0
lrwxrwxrwx   1 root     root         100 Mar 28 00:40 /dev/dsk/c1t7d0s0 ->
../../devices/iscsi/disk@0000iqn.1986-03.com.sun%3A02%3Ac8a82272-b354-c913-80f9-db9cb378a6f60001,0:a

Now that the local device entry exists, we can do something useful with it. Installing a new file system requires the use of format(1M) to partition the "disk" but it is assumed that the reader knows how to do that. However, here is the first part of the format dialogue, to show that format lists the new disk device with its unique identifier - the same identifier listed in /devices/iscsi.
Initiator# format
Searching for disks...done

c1t7d0: configured with capacity of 199.98MB

AVAILABLE DISK SELECTIONS:
       0. c0d0 
          /virtual-devices@100/channel-devices@200/disk@0
       1. c0d1 
          /virtual-devices@100/channel-devices@200/disk@1
       2. c1t7d0 
          /iscsi/disk@0000iqn.1986-03.com.sun%3A02%3Ac8a82272-b354-c913-80f9-db9cb378a6f60001,0
Specify disk (enter its number): 2
selecting c1t7d0
[disk formatted]
Disk not labeled.  Label it now? no

Let's jump to the end of the partitioning steps, after assigning all of the available disk space to partition 0:
partition> print
Current partition table (unnamed):
Total disk cylinders available: 16382 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders        Size            Blocks
  0       root    wm       0 - 16381      199.98MB    (16382/0/0) 409550
  1 unassigned    wu       0                0         (0/0/0)          0
  2     backup    wu       0 - 16381      199.98MB    (16382/0/0) 409550
  3 unassigned    wm       0                0         (0/0/0)          0
  4 unassigned    wm       0                0         (0/0/0)          0
  5 unassigned    wm       0                0         (0/0/0)          0
  6 unassigned    wm       0                0         (0/0/0)          0
  7 unassigned    wm       0                0         (0/0/0)          0

partition> label
Ready to label disk, continue? y

The new raw disk needs a file system.
Initiator# newfs /dev/rdsk/c1t7d0s0
newfs: construct a new file system /dev/rdsk/c1t7d0s0: (y/n)? y
/dev/rdsk/c1t7d0s0:     409550 sectors in 16382 cylinders of 5 tracks, 5 sectors
        200.0MB in 1024 cyl groups (16 c/g, 0.20MB/g, 128 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 448, 864, 1280, 1696, 2112, 2528, 2944, 3232, 3648,
Initializing cylinder groups:
....................
super-block backups for last 10 cylinder groups at:
 405728, 406144, 406432, 406848, 407264, 407680, 408096, 408512, 408928, 409344

Back on the target:
Target# zfs list
NAME           USED  AVAIL  REFER  MOUNTPOINT
myscsi         200M  57.9M  24.5K  /myscsi
myscsi/jvol0  32.7M   225M  32.7M  -
Finally, the initiator has a new file system, on which we can install a zone.
Initiator# mkdir /zones/newroots
Initiator# mount /dev/dsk/c1t7d0s0 /zones/newroots
Initiator# zonecfg -z iscuzone
iscuzone: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:iscuzone> create
zonecfg:iscuzone> set zonepath=/zones/newroots/iscuzone
zonecfg:iscuzone> add inherit-pkg-dir
zonecfg:iscuzone:inherit-pkg-dir> set dir=/opt
zonecfg:iscuzone:inherit-pkg-dir> end
zonecfg:iscuzone> exit
Initiator# zoneadm -z iscuzone install
Preparing to install zone <iscuzone>.
Creating list of files to copy from the global zone.
Copying <2762> files to the zone.
Initializing zone product registry.
Determining zone package initialization order.
Preparing to initialize <1162> packages on the zone.
...
Initialized <1162> packages on zone.
Zone <iscuzone> is initialized.
Installation of these packages generated warnings: 
The file  contains a log of the zone installation.

There it is: a Container on an iSCSI target on a ZFS zvol.

Zone Lifecycle, and Tech Support

There is more to management of Containers than creating them. When a Solaris instance is upgraded, all of its native Containers are upgraded as well. Some upgrade methods work better with certain system configurations than others. This is true for UFS, ZFS, other local file system types, and iSCSI targets that use any of them for underlying storage.

You can use Solaris Live Upgrade to patch or upgrade a system with Containers. If the Containers are on a traditional file system which uses UFS (e.g. /, /export/home) LU will automatically do the right thing. Further, if you create a UFS file system on an iSCSI target and install one or more Containers on it, the ABE will also need file space for its copy of those Containers. To mimic the layout of the original BE you could use another UFS file system on another iSCSI target. The lucreate command would look something like this:

# lucreate -m /:/dev/dsk/c0t0d0s0:ufs   -m /zones:/dev/dsk/c1t7d0s0:ufs -n newBE

Conclusion

If you want to put your Solaris Containers on NAS storage, Solaris 10 8/07 will help you get there, using iSCSI.

Wednesday Sep 05, 2007

New Zones Features

On September 4, 2007, Solaris 10 8/07 became available. You can download it - free of charge - at: http://www.sun.com/software/solaris/. Just click on "Get Software" and follow the instructions.

This update to Solaris 10 has many new features. Of those, many enhance Solaris Containers either directly or indirectly. This update brings the most important changes to Containers since they were introduced in March of 2005. A brief introduction to them seems appropriate, but first a review of the previous update.

Solaris 10 11/06 added four features to Containers. One of them is called "configurable privileges" and allows the platform administrator to tailor the abilities of a Container to the needs of its application. I blogged about configurable privileges before, so I won't say any more here.

At least as important as that feature was the new ability to move (also called 'migrate') a Container from one Solaris 10 computer to another. This uses the 'detach' and 'attach' sub-commands to zoneadm(1M).
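A minimal sketch of that flow (host and zone names are hypothetical; the zonepath contents must be copied between hosts by some means, such as tar or cpio over NFS):

host1# zoneadm -z myzone detach
(copy the contents of the zonepath, e.g. /zones/myzone, from host1 to host2)
host2# zonecfg -z myzone
zonecfg:myzone> create -a /zones/myzone
zonecfg:myzone> exit
host2# zoneadm -z myzone attach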

Other minor new features included:

  • rename a zone (i.e. Container)
  • move a zone to a different place in the file system on the same computer
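A sketch of those two operations (zone names and paths are hypothetical; the zone must be halted first):

global# zonecfg -z web01
zonecfg:web01> set zonename=web02
zonecfg:web01> exit
global# zoneadm -z web02 move /zones/web02-new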

New Features in Solaris 10 8/07 that Enhance Containers

New Resource Management Features

Solaris 10 8/07 has improved the resource management features of Containers. Some of these are new resource management features and some are improvements to the user interface. First I will describe three new "RM" features.

Earlier releases of Solaris 10 included the Resource Capping Daemon. This tool enabled you to place a 'soft cap' on the amount of RAM (physical memory) that an application, user or group of users could use. Excess usage would be detected by rcapd. When it did, physical memory pages owned by that entity would be paged out until the memory usage decreased below the cap.

Although it was possible to apply this tool to a zone, it was cumbersome and required cooperation from the administrator of the Container. In other words, the root user of a capped Container could change the cap. This made it inappropriate for potentially hostile environments, including service providers.

Solaris 10 8/07 enables the platform administrator to set a physical memory cap on a Container using an enhanced version of rcapd. Cooperation of the Container's administrator is not necessary - only the platform administrator can enable or disable this service or modify the caps. Further, usage has been greatly simplified to the following syntax:

global# zonecfg -z myzone
zonecfg:myzone> add capped-memory
zonecfg:myzone:capped-memory> set physical=500m
zonecfg:myzone:capped-memory> end
zonecfg:myzone> exit
The next time the Container boots, this cap (500MB of RAM) will be applied to it. The cap can also be modified while the Container is running, with:
global# rcapadm -z myzone -m 600m
Because this cap does not reserve RAM, you can over-subscribe RAM usage. The only drawback is the possibility of paging.

For more details, see the online documentation.

Virtual memory (i.e. swap space) can also be capped. This is a 'hard cap.' In a Container which has a swap cap, an attempt by a process to allocate more VM than is allowed will fail. (If you are familiar with system calls: malloc() will fail with ENOMEM.)

The syntax is very similar to the physical memory cap:

global# zonecfg -z myzone
zonecfg:myzone> add capped-memory
zonecfg:myzone:capped-memory> set swap=1g
zonecfg:myzone:capped-memory> end
zonecfg:myzone> exit
This limit can also be changed for a running Container:
global# prctl -n zone.max-swap -v 2g -t privileged   -r -e deny -i zone myzone
Just as with the physical memory cap, if you want to change the setting for a running Container and for the next time it boots, you must use zonecfg and prctl or rcapadm.

The third new memory cap is locked memory. This is the amount of physical memory that a Container can lock down, i.e. prevent from being paged out. By default a Container now has the proc_lock_memory privilege, so it is wise to set this cap for all Containers.

Here is an example:

global# zonecfg -z myzone
zonecfg:myzone> add capped-memory
zonecfg:myzone:capped-memory> set locked=100m
zonecfg:myzone:capped-memory> end
zonecfg:myzone> exit

Simplified Resource Management Features

Dedicated CPUs

Many existing resource management features have a new, simplified user interface. For example, "dedicated-cpus" re-use the existing Dynamic Resource Pools features. But instead of needing many commands to configure them, configuration can be as simple as:

global# zonecfg -z myzone
zonecfg:myzone> add dedicated-cpu
zonecfg:myzone:dedicated-cpu> set ncpus=1-3
zonecfg:myzone:dedicated-cpu> end
zonecfg:myzone> exit
After using that command, when that Container boots, Solaris:
  1. removes a CPU from the default pool
  2. assigns that CPU to a newly created temporary pool
  3. associates that Container with that pool, i.e. only schedules that Container's processes on that CPU
Further, if the load on that CPU exceeds a default threshold and another CPU can be moved from another pool, Solaris will do that, up to the maximum configured amount of three CPUs. Finally, when the Container is stopped, the temporary pool is destroyed and its CPU(s) are placed back in the default pool.

Also, four existing project resource controls can now be applied to Containers:

global# zonecfg -z myzone
zonecfg:myzone> set max-shm-memory=100m
zonecfg:myzone> set max-shm-ids=100
zonecfg:myzone> set max-msg-ids=100
zonecfg:myzone> set max-sem-ids=100
zonecfg:myzone> exit
Fair Share Scheduler

A commonly used method to prevent "CPU hogs" from impacting other workloads is to assign a number of CPU shares to each workload, or to each zone. The relative number of shares assigned per zone guarantees a relative minimum amount of CPU power. This is less wasteful than dedicating a CPU to a Container that will not completely utilize the dedicated CPU(s).

Several steps were needed to configure this in the past. Solaris 10 8/07 simplifies this greatly: now just two steps are needed. The system must use FSS as the default scheduler. This command tells the system to use FSS as the default scheduler the next time it boots.

global# dispadmin -d FSS
Also, the Container must be assigned some shares:
global# zonecfg -z myzone
zonecfg:myzone> set cpu-shares=100
zonecfg:myzone> exit
Shared Memory Accounting

One feature simplification is not a reduced number of commands, but reduced complexity in resource monitoring. Prior to Solaris 10 8/07, the accounting of shared memory pages had an unfortunate subtlety. If two processes in a Container shared some memory, per-Container summaries counted the shared memory usage once for every process that was sharing the memory. It would appear that a Container was using more memory than it really was.

This was changed in 8/07. Now, in the per-Container usage section of prstat and similar tools, shared memory pages are only counted once per Container.
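For example, the per-zone summary lines printed by prstat now reflect that accounting (a sketch; output not shown):

global# prstat -Z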

Global Zone Resource Management

Solaris 10 8/07 adds the ability to persistently assign resource controls to the global zone and its processes. These controls can be applied:
  • pool
  • cpu-shares
  • capped-memory: physical, swap, locked
  • dedicated-cpu: ncpus, importance
Example:
global# zonecfg -z global
zonecfg:global> set cpu-shares=100
zonecfg:global> set scheduling-class=FSS
zonecfg:global> exit
Use those features with caution. For example, assigning a physical memory cap of 100MB to the global zone will surely cause problems...

New Boot Arguments

The following boot arguments can now be used:

Argument or Option | Meaning
-s | Boot to the single-user milestone
-m <milestone> | Boot to the specified milestone
-i </path/to/init> | Boot the specified program as 'init'. This is only useful with branded zones.

Allowed syntaxes include:

global# zoneadm -z myzone boot -- -s
global# zoneadm -z yourzone reboot -- -i /sbin/myinit
ozone# reboot -- -m verbose
In addition, these boot arguments can be stored with zonecfg, for later boots.
global# zonecfg -z myzone
zonecfg:myzone> set bootargs="-m verbose"
zonecfg:myzone> exit

Configurable Privileges

Of the existing three DTrace privileges, dtrace_proc and dtrace_user can now be assigned to a Container. This allows the use of DTrace from within a Container. Of course, even the root user in a Container is still not allowed to view or modify kernel data, but DTrace can be used in a Container to look at system call information and profiling data for user processes.

Also, the privilege proc_priocntl can be added to a Container to enable the root user of that Container to change the scheduling class of its processes.
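A sketch of how those privileges can be granted, using the zone's limitpriv property (zone name hypothetical):

global# zonecfg -z myzone
zonecfg:myzone> set limitpriv="default,dtrace_proc,dtrace_user,proc_priocntl"
zonecfg:myzone> exit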

IP Instances

This is a new feature that allows a Container to have exclusive access to one or more network interfaces. No other Container, even the global zone, can send or receive packets on that NIC.

This also allows a Container to control its own network configuration, including routing, IP Filter, the ability to be a DHCP client, and others. The syntax is simple:

global# zonecfg -z myzone
zonecfg:myzone> set ip-type=exclusive
zonecfg:myzone> add net
zonecfg:myzone:net> set physical=bge1
zonecfg:myzone:net> end
zonecfg:myzone> exit

IP Filter Improvements

Some network architectures call for two systems to communicate via a firewall box or other piece of network equipment. It is often desirable to create two Containers that communicate via an external device, for similar reasons. Unfortunately, prior to Solaris 10 8/07 that was not possible. In 8/07 the global zone administrator can configure such a network architecture with the existing IP Filter commands.

Upgrading and Patching Containers with Live Upgrade

Solaris 10 8/07 adds the ability to use Live Upgrade tools on a system with Containers. This makes it possible to apply an update to a zoned system, e.g. updating from Solaris 10 11/06 to Solaris 10 8/07. It also drastically reduces the downtime necessary to apply some patches.

The latter ability requires more explanation. An existing challenge in the maintenance of zones is patching - each zone must be patched when a patch is applied. If the patch must be applied while the system is down, the downtime can be significant.

Fortunately, Live Upgrade can create an Alternate Boot Environment (ABE) and the ABE can be patched while the Original Boot Environment (OBE) is still running its Containers and their applications. After the patches have been applied, the system can be re-booted into the ABE. Downtime is limited to the time it takes to re-boot the system.

An additional benefit can be seen if there is a problem with the patch and that particular application environment. Instead of backing out the patch, the system can be re-booted into the OBE while the problem is investigated.
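A sketch of that patch cycle (the BE name, patch directory, and patch ID are placeholders):

global# lucreate -n patchedBE
global# luupgrade -t -n patchedBE -s /var/tmp/patches 120011-14
global# luactivate patchedBE
global# init 6
If the patch causes a problem, luactivate the original BE and reboot again to back out.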

Branded Zones (Branded Containers)

Sometimes it would be useful to run an application in a Container, but the application is not yet available for Solaris, or is not available for the version of Solaris that is being run. To run such an application, a special Solaris environment could be created that only runs applications for that version of Solaris, or for that operating system.

Solaris 10 8/07 contains a new framework called Branded Zones. This framework enables the creation and installation of Containers that are not the default 'native' type of Containers, but have been tailored to run 'non-native' applications.

Solaris Containers for Linux Applications

The first brand to be integrated into Solaris 10 is the brand called 'lx'. This brand is intended for x86 applications which run well on CentOS 3 or Red Hat Enterprise Linux 3. This brand is specific to x86 computers. The name of this feature is Solaris Containers for Linux Applications.
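Configuring an lx Container looks much like configuring a native one, apart from the brand template; a sketch (zone name and path are hypothetical):

global# zonecfg -z lxzone
zonecfg:lxzone> create -t SUNWlx
zonecfg:lxzone> set zonepath=/zones/lxzone
zonecfg:lxzone> exit
Installation then uses zoneadm install pointed at CentOS 3.x install media; see the lx brand documentation for the exact options.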

Conclusion

This was only a brief introduction to these many new and improved features. Details are available in the usual places, including http://docs.sun.com, http://sun.com/bigadmin, and http://www.sun.com/software/solaris/utilization.jsp.

Thursday May 24, 2007

How Much Does It Cost to Upgrade to Solaris 10?

This week we have a guest blogger. Christine works with me in the Solaris Adoption Practice. She doesn't have a blog but suddenly finds that this week she has something to say. Here is her two-part discussion about starting your Solaris 10 Migration. She can be reached at christine DOT tran AT sun.com.

Part 1: "How much does it cost to upgrade to Solaris 10?"

We in the Solaris Adoption group get questions like this once in a while, from customers and from account managers. It has grown more frequent in recent weeks. The answer is: "It depends!" This answer exasperates some people and drives others up the wall. They try asking it another way: "How long does it take?" "Give me a ball-park level-of-effort estimate." "What's the cost of NOT upgrading to Solaris 10?"

The answer is: It depends.

I can understand the impetus behind the question. Before buying a car, a phone plan, a Twinkie, I want to know how much it costs. I'm trading my moneys for goods or service, and I want to know how much I'm in for. For crying out loud if they can estimate how much it costs to raise a child ($165,630 for 17 years, before inflation), you can tell me how much it costs to upgrade to Solaris 10! What is this "it depends" monkey business?

Introducing change into any system will cost you. It doesn't matter if you're patching, upgrading a firmware version, Apache, or the entire OS. At the very least it'll cost you time: even if you don't incur downtime, you will have expended time to do the task. Let's say you'll expend five minutes of time to introduce a patch to your system. There's your cost: five minutes. Will you install the patch? "Sure. Why not?" How about one hour? Will you install the patch if it takes you one hour? How about eight hours? "We-l-l-l ... that's a wee bit long for a patch, isn't it? What's the patch for?"

And THERE is the first question you should ask, "What is it for?" If the patch fixes a rarely used OpenGL library, you may patch if it takes five minutes. You may not patch if it takes one hour, unless you had other patches to apply. You certainly are not going to spend eight hours. On the other hand if the patch fixes a security vulnerability with open exploits, you definitely are going to patch even if it takes you eight hours.

Before collaring your sales rep and asking him how much it will cost to upgrade to Solaris 10, ask "What is it for?" The cost is related to the benefits derived; a price tag by itself is meaningless.

Should you upgrade to Solaris 10? From 2.6, Solaris 8, Solaris 9, migrate from some other OS? Perhaps not. Perhaps your system is running at the utilization rate you want, at the stable patch level you want, you're not going to be adding new hardware, you don't expect to grow your users, your disks, your bandwidth, you don't want any new feature. In short, you are perfectly happy with your server and there's no improvement you can think to make. In that case, no, you probably shouldn't spend the effort to run Solaris 10.

Of course, this scenario only exists in a perfect world. Solaris 10 is so full of Wheaties goodness that it would be nearly impossible to find a case where it's not an improvement on a previous version of Solaris. However, upgrading to Solaris 10 only to be running the latest, the hippest, the coolest, will give you no quantifiable benefit except that you're now hip and cool.

Begin with a problem in your datacenter that you want to solve. Make it a business problem, and not a technical problem. Not "poor network performance", but "buyers take too long to go from checkout to payment." Not "consolidation" but "maintenance contracts and license on 25 web servers eating up IT budget."

These problems will resolve into technical problems: want a faster network stack, want consolidation but guaranteed computing resources per server, want faster and better development tools, want to re-use old JBODs. The technical problems will be solved by Solaris 10.

If you answer yes to all of the above, and have other requirements beyond that, prioritize and pick the top three. You're going to spend time, money, and manpower. Keeping the migration scope small makes the project manageable, and the relationship between cost and benefit clear.

Part II: "How do I upgrade to Solaris 10?"

You've identified the top three business problems that Solaris 10 will solve. As an example, let's make them:

  1. payment transactions are too slow; users have to wait 5 seconds for a payment to complete
  2. no plan for projected user growth, which is expected to triple in 6 months
  3. acquisition of a competitor, who wants to keep their own server environment

Let's say Solaris 10's improved TCP/IP stack will solve problem number 1, buying a big honking new server will solve problem number 2, and deploying zones on that big honking new server to mimic your acquisition's environment will solve problem number 3.

Your pilot migration plan will focus on the "before" and "after" of these three things. You want to know if upgrading to Solaris 10 will make your transactions faster, if new hardware will give you room to grow, and if you can host your acquisition without adding footprint in your datacenter. Granted, these are strawman problems simplified for the brevity of this post, but they are not very far removed from the real problems in real datacenters or mom-n-pop shops.

Planning

1. Scope your migration. Let's say you're just migrating 10 older platforms to a Sun Fire V890. You want to see faster payment transactions, enough memory and CPU power to triple the workload, and several zones to mimic the environment of your recent acquisition. Decide on the target: for example, you'll want to end up with Solaris 10 11/06, the most current patch cluster, and application version x.y.

2. Take an inventory. What applications will be loaded on the V890, what firmware, what drivers? Are there supporting applications that will have to be carried over? Don't forget any network or SAN support, like trunking or PowerPath.

3. Check your inventory. Is your COTS application supported by the vendor on Solaris 10? Sun can help you check this. If your application is developed in-house, can it pass Appcert, the tool that flags binary incompatibilities between Solaris versions? If you have application versions that are not supported on Solaris 10, can you upgrade the application?

4. Understand your operating environment. Do you know how your applications are backed up? Where are traps and alerts being sent? What will you do for Disaster Recovery? What firewall policy or user policy is in effect? What are the intersections between the applications that will be moved?

5. Have test methods and criteria. What tests will verify that your application is functional? Are there performance benchmarks? Have you taken a performance baseline measurement? A baseline measurement indicates how you are performing now. After migration, take another measurement and compare it with your baseline. (A minimal baseline sketch appears after this list.)

6. Some /etc/system parameters have been obsoleted in Solaris 10, replaced by Resource Management controls. Check your /etc/system to see if you need to convert these parameters. (See the shared-memory example after this list.)

7. Solaris 10 comes with SMF (Service Management Facility). Are there scripts in /etc/init.d and /etc/rc?.d that you want to convert to SMF manifests? (A short SMF example follows this list.)

8. There are special considerations for using zones. Have a plan for creating, patching, and managing zones. (A zone-creation sketch follows this list.)

9. Do you have a Change Management framework? If yes, fit your migration activities into this framework. File your migration plan with the Change Management Board.
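
For step 5, the baseline measurement does not have to be elaborate. Here is a minimal sketch using standard Solaris tools; the output file names and the transaction-driver script are hypothetical, and the s9# prompt simply denotes the pre-migration system:

s9# vmstat 30 20 > /var/tmp/baseline-vmstat.txt       # CPU and memory, 20 samples at 30-second intervals
s9# iostat -xnz 30 20 > /var/tmp/baseline-iostat.txt  # per-device I/O statistics over the same period
s9# /usr/bin/time ./payment-test.sh                   # hypothetical script that drives one payment transaction

Run the same commands on the V890 after the migration and compare the results against these files.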
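
For step 6, the classic conversion is System V shared memory tuning, which moves from /etc/system into a project's resource controls. A sketch, assuming a project named user.oracle and a 4 GB limit (both values are made up for illustration):

# Old /etc/system entry, now obsolete:
#   set shmsys:shminfo_shmmax = 4294967296

s10# projadd user.oracle                                                       # create a project for the application owner
s10# projmod -s -K "project.max-shm-memory=(privileged,4G,deny)" user.oracle   # set the replacement resource control
s10# prctl -n project.max-shm-memory -i project user.oracle                    # verify, once the application runs in the project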
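
For step 7, legacy rc scripts continue to run under Solaris 10, but converting them to SMF gives you dependency handling and automatic restart. Once you have written a manifest for your application (the manifest path and service name below are hypothetical), registering it looks like this:

s10# svccfg import /var/svc/manifest/site/payment-app.xml   # register the service with SMF
s10# svcadm enable site/payment-app                         # start it now and at every boot
s10# svcs -l site/payment-app                               # review its state, dependencies, and log file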
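
For step 8, creating one of the zones for the acquired environment is mostly a matter of zonecfg and zoneadm. A sketch, in which the zone name, path, interface, and address are all invented:

s10# zonecfg -z acq1
zonecfg:acq1> create
zonecfg:acq1> set zonepath=/zones/acq1
zonecfg:acq1> set autoboot=true
zonecfg:acq1> add net
zonecfg:acq1:net> set physical=ce0
zonecfg:acq1:net> set address=192.168.10.21/24
zonecfg:acq1:net> end
zonecfg:acq1> commit
zonecfg:acq1> exit
s10# zoneadm -z acq1 install
s10# zoneadm -z acq1 boot
s10# zlogin -C acq1      # console login to answer the initial system identification questions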

Action

10. Upgrade a test environment first.

11. Depending on your standard installation practice, write up an install plan for upgrading to Solaris 10. It can be as simple as taking a backup and inserting a Solaris 10 CD, or as complex as un-encapsulating rootdg, removing disks from Veritas Volume Manager, and using Live Upgrade. The install plan should have a section on how to fall back if the installation does not go well. (A Live Upgrade sketch appears after this list.)

12. Have a good backup. Test your good backup.

13. Schedule a maintenance window and follow your installation plan. Install any additional drivers, firmware, support packages, and mandatory patches. Install your zones. Install your applications. (Patch and package commands are sketched after this list.)

14. Perform functional tests (do the OS and applications work?), performance tests (does the application run as well as or better than before?), and integration tests (does the application interact and perform as before when interfacing with other applications?). You documented these test methods and expected results in Step 5.

15. Can you perform routine administration tasks as before? Check cron jobs, log management, backups, system management, patching, name service, directory service, remote management, and anything else you do in your daily care-and-feeding routine. You documented these tasks in Step 4.

16. Repeat the previous two steps for your zones and the applications running in a zone.
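
To expand on step 11, a Live Upgrade flavor of the install plan might look roughly like this. The disk slice, BE names, and install-image path are placeholders, and the s9# prompt again stands for whatever release you are upgrading from:

s9# lucreate -c s9be -n s10be -m /:/dev/dsk/c1t1d0s0:ufs    # copy the running BE onto a spare slice
s9# luupgrade -u -n s10be -s /net/installserver/export/s10   # upgrade the inactive BE to Solaris 10
s9# luactivate s10be                                         # mark the new BE as the one to boot
s9# init 6                                                   # reboot into Solaris 10

Falling back is a matter of activating the original BE (luactivate s9be) and rebooting again.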
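
For the drivers, patches, and packages in step 13, the standard tools apply. The patch ID and package name here are purely hypothetical:

s10# patchadd /var/tmp/123456-01          # apply a patch that was unpacked into /var/tmp
s10# pkgadd -d /var/tmp SUNWexamplepkg    # add a package from a spool directory
s10# showrev -p | grep 123456             # confirm the patch is now installed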

It's important to define an endpoint for your pilot, where you stop work and examine whether your migration has fulfilled your objectives. In other words, now your application is twice as fast, your new server can support three times the workload, and it contains zones which enclose a separate working environment for your recently-acquired business unit.

A successful small pilot can be broadened to cover a production environment. Document your pilot project, and repeat it for your production environment. From here, repeat the steps, with modifications from your experience, to upgrade another part of your datacenter. Or you can write another project plan for an OS refresh to gradually transform the rest of your datacenter. But you're done with the hardest part: the beginning.

Monday Dec 18, 2006

Zones and Configurable Privileges

Part 2 of Many

Another network feature that won't work in a non-global zone in Solaris 10 3/05, 1/06, or 6/06 is the service "dhcp-server". I wondered if appropriate privileges could be assigned to a zone, using Solaris 10 11/06, in order to enable that service to work properly in a non-global zone.
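
Solaris 10 11/06 makes this possible through the zonecfg limitpriv property, which defines the upper bound of privileges that a zone's processes may hold. A minimal sketch of the mechanism follows; the zone name dhcpz is made up, and sys_time is used only to illustrate the syntax, not because it has anything to do with DHCP:

s11# zonecfg -z dhcpz
zonecfg:dhcpz> set limitpriv="default,sys_time"   # "default" plus whatever extra privileges the workload needs
zonecfg:dhcpz> commit
zonecfg:dhcpz> exit
s11# zoneadm -z dhcpz reboot                      # the new limit takes effect the next time the zone boots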

But how do you know which privilege(s) are needed? Although a tool to analyze an executable (and the libraries that it uses) for necessary privileges would be very useful, I am not aware of such a tool. However, there is a tool which will analyze a running program: privdebug.

I used dhcpmgr(1M) to configure the global zone of one Solaris 10 system to be a DHCP server, and told another Solaris 10 system to be a DHCP client by creating the appropriate /etc/dhcp.<interface-name> file. Then I ran privdebug to start gathering data.

After running privdebug as:

# ./privdebug.pl -n in.dhcpd -v -f
its output looked something like this (abbreviated slightly):
STAT TIMESTAMP          PPID   PID    PRIV                 CMD
USED 481061858324       7      1489   proc_fork            in.dhcpd
USED 481063008106       1489   1490   sys_resource         in.dhcpd
USED 481067169173       1489   1490   net_privaddr         in.dhcpd
USED 481067214515       1489   1490   net_privaddr         in.dhcpd
USED 481067261082       1489   1490   net_privaddr         in.dhcpd
USED 7602182665254      7      2307   proc_fork            in.dhcpd
USED 7602184084176      2307   2308   sys_resource         in.dhcpd
USED 7602195780436      1      2308   net_privaddr         in.dhcpd
USED 7602195826717      1      2308   net_privaddr         in.dhcpd
USED 7602195874362      1      2308   net_privaddr         in.dhcpd
USED 7617671777513      1      2308   net_icmpaccess       in.dhcpd
USED 7618028208673      1      2308   sys_net_config       in.dhcpd
USED 7618028224029      1      2308   sys_net_config       in.dhcpd
USED 7618028622618      1      2308   sys_net_config       in.dhcpd
USED 7618937845453      1      2308   sys_net_config       in.dhcpd
USED 7618937861126      1      2308   sys_net_config       in.dhcpd
USED 7786427652239      1      2308   net_icmpaccess       in.dhcpd
USED 7786782253121      1      2308   sys_net_config       in.dhcpd
USED 7786782266742      1      2308   sys_net_config       in.dhcpd
USED 7786782417242      1      2308   sys_net_config       in.dhcpd
With that list, it was easy to check each of the privileges that in.dhcpd used against the list of privileges that are allowed in a non-global zone.

Although proc_fork, sys_resource, net_privaddr and net_icmpaccess are in a non-global zone's default list of privileges, sys_net_config is not allowed in a non-global zone. Because of that, a non-global zone cannot be a DHCP server using Solaris 10 11/06.
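
If you want to reproduce that comparison, the trace reduces to a short list of distinct privilege names, which you can then check against the privileges available inside a zone. The awk column number matches the output above; the "zone" keyword to ppriv is how I recall listing a zone's available privileges, so treat it as an assumption and check privileges(5) on your release:

# Distinct privileges used by in.dhcpd (PRIV is the fifth column; stop privdebug with Ctrl-C):
# ./privdebug.pl -n in.dhcpd -v -f | awk 'NR>1 {print $5}' | sort -u

# From inside a non-global zone, list the privileges that the zone can ever hold:
zone# ppriv -l zone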

That was a fun experiment, but in order to make a non-global zone a DHCP server we must wait for the Crossbow project to add sufficient IP instance functionality, along with its new sys_ip_config privilege. The latter will be allowed in a non-global zone.

About

Jeff Victor writes this blog to help you understand Oracle's Solaris and virtualization technologies.

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.
