Tuesday Mar 10, 2015

Maintaining Configuration Files in Solaris 11.2

Introduction

Have you used Solaris 11 and wondered how to maintain customized system configuration files? In the past, and on other Unix/Linux systems, maintaining these configuration files was fraught with peril: extra bolt-on tools were needed to track changes, verify that inappropriate changes had not been made, and fix the files when something broke them.

A combination of features added to Solaris 10 and 11 addresses those problems. This blog entry describes the current state of the related features, and demonstrates the method that was designed and implemented to automatically deploy and track changes to configuration files, verify their consistency, and fix configuration files that "broke." Further, these new features are tightly integrated with the Solaris Service Management Facility introduced in Solaris 10 and the packaging system introduced in Solaris 11.

Background

Solaris 10 added the Service Management Facility, which significantly improved on the old, unreliable pile of scripts in /etc/rc#.d directories. This also allowed us to move from the old model of system configuration information stored in ASCII files to a database of configuration information. The latter change reduces the risk associated with manual or automated modifications of text files. Each modification is the result of a command that verifies the correctness of the change before applying it. That verification process greatly reduces the opportunities for a mistake that can be very difficult to troubleshoot.
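
As an illustration of that command-driven model, here is a rough sketch using the sendmail property that appears later in this entry; svccfg checks the value against the property's declared type before storing it, and svcadm refresh makes the running service re-read its configuration:

# svcprop -p config/path_to_sendmail_mc svc:/network/smtp:sendmail    # read the current value
# svccfg -s svc:/network/smtp:sendmail \
    setprop config/path_to_sendmail_mc = astring: /etc/mail/cf/cf/custom_sm.mc
# svcadm refresh svc:/network/smtp:sendmail                           # push the change to the running service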

During updates to Solaris 10 and 11 we continued to move configuration files into SMF service properties. However, some configuration files remain, and we wanted to provide better integration between the Solaris 11 packaging facility (IPS) and those remaining configuration files. This blog entry demonstrates some of that integration, using features added up through Solaris 11.1.

Many Solaris systems need customized email delivery rules. In the past, providing those rules required replacing /etc/mail/sendmail.cf with a custom file. However, this created the need to maintain that file - restoring it after a system update, verifying its integrity periodically, and potentially fixing it if someone or something broke it.

Method

IPS provides the tools to accomplish those goals, specifically:

  1. maintain one or more versions of a configuration file in an IPS repository
  2. use IPS and AI (Automated Installer) to install, update, verify, and potentially fix that configuration file
  3. automatically perform the steps necessary to re-configure the system with a configuration file that has just been installed or updated.

The rest of this entry assumes that you understand Solaris 11 and IPS.

In this example, we want to deliver a custom sendmail.cf file to multiple systems. We will do that by creating a new IPS package that contains just one configuration file. We need to create the "precursor" to a sendmail.cf file (sendmail.mc), which sendmail will expand when it starts. We also need to create a custom manifest for the package. Finally, we must create an SMF service profile, which will cause Solaris to understand that a new sendmail configuration is available and should be integrated into its database of configuration information.

Here are the steps in more detail.

  1. Create a directory ("mypkgdir") that will hold the package manifest and a directory ("contents") for package contents.
    $ mkdir -p mypkgdir/contents
    $ cd mypkgdir
    
    Then create the configuration file that you want to deploy with this package. For this example, we simply copy an existing configuration file.
    $ cp /etc/mail/cf/cf/sendmail.mc contents/custom_sm.mc
    
  2. Create a manifest file in mypkgdir/sendmail-config.p5m: (the entity that owns the computers is the fictional corporation Consolidated Widgets, Inc.)
    set name=pkg.fmri value=pkg://cwi/site/sendmail-config@8.14.9,1.0
    set name=com.cwi.info.name value=Solaris11sendmail
    set name=pkg.description value="ConWid sendmail.mc file for Solaris 11, accepts only local connections."
    set name=com.cwi.info.description value="Sendmail configuration"
    set name=pkg.summary value="Sendmail configuration"
    set name=variant.opensolaris.zone value=global value=nonglobal
    set name=com.cwi.info.version value=8.14.9
    set name=info.classification value=org.opensolaris.category.2008:System/Core
    set name=org.opensolaris.smf.fmri value=svc:/network/smtp:sendmail
    depend fmri=pkg://solaris/service/network/smtp/sendmail type=require
    file custom_sm.mc group=mail mode=0444 owner=root \
       path=etc/mail/cf/cf/custom_sm.mc
    file custom_sm_mc.xml group=mail mode=0444 owner=root \
       path=lib/svc/manifest/site/custom_sm_mc.xml        \
       restart_fmri=svc:/system/manifest-import:default   \
       refresh_fmri=svc:/network/smtp:sendmail            \
       restart_fmri=svc:/network/smtp:sendmail
    
    
    The "depend" line tells IPS that the package smtp/sendmail must already be installed on this system. If it isn't, Solaris will install that package before proceeding to install this package.
    The line beginning "file custom_sm.mc" gives IPS detailed metadata about the configuration file, and indicates the full pathname - within an image - at which the macro should be stored. The final "file" action specifies the local file name of the service profile (more on that later), and the location to store it during package installation. It also lists three actuators: SMF services to refresh (re-configure) or restart at the end of package installation. The first of those imports new manifests and service profiles. Importing the service profile changes the property path_to_sendmail_mc. The other two re-configure and restart sendmail. Those two actions expand and then use the new configuration file - the goal of this entire exercise!

  3. Create a service profile:
    $ svcbundle -o contents/custom_sm_mc.xml -s bundle-type=profile \
        -s service-name=network/smtp -s instance-name=sendmail -s enabled=true \
        -s instance-property=config:path_to_sendmail_mc:astring:/etc/mail/cf/cf/custom_sm.mc
    
    That command creates the file custom_sm_mc.xml, which describes the profile. The sole purpose of that profile is to set the sendmail service property "config/path_to_sendmail_mc" to the name of the new sendmail macro file.
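    
    For reference, the generated profile looks roughly like the sketch below; the exact layout and attributes that svcbundle produces may differ slightly (the property-group type shown is an assumption):
    <?xml version="1.0" ?>
    <!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
    <service_bundle type="profile" name="network/smtp">
      <service version="1" type="service" name="network/smtp">
        <instance enabled="true" name="sendmail">
          <!-- "application" is an assumption; svcbundle chooses the property-group type -->
          <property_group type="application" name="config">
            <propval type="astring" name="path_to_sendmail_mc"
                     value="/etc/mail/cf/cf/custom_sm.mc"/>
          </property_group>
        </instance>
      </service>
    </service_bundle>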

  4. Verify correctness of the manifest. In this example, the Solaris repository is mounted at /mnt/repo1. For most systems, "-r" will be followed by the repository's URI, e.g. http://pkg.oracle.com/solaris/release/ or a data center's repository.
    $ pkglint -c /tmp/pkgcache -r /mnt/repo1 sendmail-config.p5m
    Lint engine setup...
    Starting lint run...
    
    $
    
    As usual, the lack of output indicates success.

  5. Create the package and make it available in a repo to a test IPS client.
    Note: The documentation explains these steps in more detail.
    Note: this example stores a repo in /var/tmp/cwirepo. This will work, but I am not suggesting that you place repositories in /var/tmp. You should place a repo in a directory that is publicly available.
    $ pkgrepo create /var/tmp/cwirepo
    $ pkgrepo -s /var/tmp/cwirepo set publisher/prefix=cwi
    $ pkgsend -s /var/tmp/cwirepo publish -d contents sendmail-config.p5m
    pkg://cwi/site/sendmail-config@8.14.9,1.0:20150305T163445Z
    PUBLISHED
    $ pkgrepo verify -s /var/tmp/cwirepo
    Initiating repository verification.
    $ pkgrepo info -s /var/tmp/cwirepo
    PUBLISHER PACKAGES STATUS           UPDATED
    cwi       1        online           2015-03-05T16:39:13.906678Z
    $ pkgrepo list -s /var/tmp/cwirepo
    PUBLISHER NAME                                          O VERSION
    cwi       site/sendmail-config                            8.14.9,1.0:20150305T163913Z
    $ pkg list -afv -g /var/tmp/cwirepo
    FMRI                                                                         IFO
    pkg://cwi/site/sendmail-config@8.14.9,1.0:20150305T163913Z                   ---
    
    

With all of that, you can use the usual IPS packaging commands. I tested this by adding the "cwi" publisher to a running native Solaris Zone and making the repo available as a loopback mount:

# zlogin testzone mkdir /var/tmp/cwirepo
# zonecfg -rz testzone
zonecfg:testzone> add fs
zonecfg:testzone:fs> set dir=/var/tmp/cwirepo
zonecfg:testzone:fs> set special=/var/tmp/cwirepo
zonecfg:testzone:fs> set type=lofs
zonecfg:testzone:fs> end
zonecfg:testzone> commit
zone 'testzone': Checking: Mounting fs dir=/var/tmp/cwirepo
zone 'testzone': Applying the changes
zonecfg:testzone> exit
# zlogin testzone
root@testzone:~# pkg set-publisher -g /var/tmp/cwirepo cwi
root@testzone:~# pkg info -r sendmail-config
          Name: site/sendmail-config
       Summary: Sendmail configuration
   Description: ConWid sendmail.mc file for Solaris 11, accepts only local
                connections.
      Category: System/Core
         State: Not installed
     Publisher: cwi
       Version: 8.14.9
 Build Release: 1.0
        Branch: None
Packaging Date: March  5, 2015 08:14:22 PM
          Size: 1.59 kB
          FMRI: pkg://cwi/site/sendmail-config@8.14.9,1.0:20150305T201422Z

root@testzone:~#  pkg install site/sendmail-config
           Packages to install:  1
            Services to change:  2
       Create boot environment: No
Create backup boot environment: No
DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
Completed                                1/1           2/2      0.0/0.0    0B/s

PHASE                                          ITEMS
Installing new actions                         12/12
Updating package state database                 Done
Updating package cache                           0/0
Updating image state                            Done
Creating fast lookup database                   Done
Updating package cache                           2/2

root@testzone:~# pkg verify  site/sendmail-config
root@testzone:~#

Installation of that package causes several effects. Obviously, the custom sendmail configuration file custom_sm.mc is placed into the directory /etc/mail/cf/cf. The sendmail daemon is restarted, automatically expanding that file into a sendmail.cf file and using it. I have noticed that, on occasion, it is necessary to refresh and restart the sendmail service manually.
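
If that happens, the following sketch shows the manual recovery: verify the property that the profile delivered, then refresh and restart the service:

# svcprop -p config/path_to_sendmail_mc svc:/network/smtp:sendmail   # should report /etc/mail/cf/cf/custom_sm.mc
# svcadm refresh svc:/network/smtp:sendmail                          # re-read the configuration
# svcadm restart svc:/network/smtp:sendmail                          # re-expand custom_sm.mc and restart the daemon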

Conclusion

The result of all of that is an easily maintained configuration file. These concepts can be used with other configuration files, and can be extended to more complex sets of configuration files.

For more information, see these documents:

Acknowledgements

I appreciate the assistance of Dave Miner, John Beck, and Scott Dickson, who helped me understand the details of these features. However, I am responsible for any errors.

Monday Dec 08, 2014

Provisioning Solaris 11

Oracle Solaris 11 introduced the Image Packaging System (IPS), a new packaging system in which Solaris software components are delivered. It replaces the System V Release 4 ("SVR4") packaging system used by Solaris 2.0 through Solaris 10. If you are learning Solaris 11, learning about IPS is a must! The links below will take you to the documents, videos, blog entries, and other artifacts that I think are the most important ones to begin with.
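
As a taste of day-to-day IPS use, here is a rough sketch of the basic commands (the package name is just an example):

# pkg publisher                      # list the configured package publishers
# pkg search -r grep                 # search the repository for packages that deliver "grep"
# pkg install text/gnu-grep          # install a package (example package name)
# pkg info text/gnu-grep             # display its metadata
# pkg update                         # update all installed packages to the newest allowed versions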

Blog Entries, Screencasts


Downloadable


How-to Guides, Other Papers


Official Documentation

Wednesday Nov 12, 2014

Details of Solaris Zones as Hard Partitions

Oracle offers the ability to license much of its software, including the Oracle Database, based on the quantity of CPUs that will run the software. When performance goals can be met with only a subset of the computer's CPUs, it may be appropriate to limit licensing costs by choosing a processor-based licensing metric and using one of the hardware partitioning technologies included with the computer.

In general, computers that run the Oracle Solaris OS can use a variety of resource management features covering CPU, memory, and I/O. Because Solaris features are highly integrated, using resource management does not mean giving up other features. In particular, the use of resources by workloads running in Solaris Zones can be constrained.

The document Hard Partitioning With Oracle Solaris Zones explains the different Solaris features that can be used to limit software licensing costs when a processor-based metric is used. It also demonstrates the use of those features. The approved methods include the ability to limit a Solaris Zone to a specific quantity of CPUs, or the ability to limit a set of Solaris Zones to a specific quantity of shared CPUs.
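
As one example of the zonecfg side of this - a sketch only; the zone name is hypothetical, and the document above, not this sketch, defines which configurations qualify as hard partitions - a zone can be limited to a fixed quantity of CPUs with the dedicated-cpu resource:

# zonecfg -z dbzone                  # "dbzone" is a hypothetical zone name
zonecfg:dbzone> add dedicated-cpu
zonecfg:dbzone:dedicated-cpu> set ncpus=8
zonecfg:dbzone:dedicated-cpu> end
zonecfg:dbzone> commit
zonecfg:dbzone> exit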

Tuesday Aug 26, 2014

Migratory Solaris Kernel Zones

The Introduction

Oracle Solaris 11.2 introduced Oracle Solaris Kernel Zones. Kernel Zones (KZs) offer a midpoint between traditional operating system virtualization and virtual machines. They exhibit the low overhead and low management effort of Solaris Zones, and add the best parts of the independence of virtual machines.

A Kernel Zone is a type of Solaris Zone that runs its own Solaris kernel. This gives each Kernel Zone complete independence of software packages, as well as other benefits.

One of the more interesting new abilities that Kernel Zones bring to Solaris Zones is the ability to "pause" a running KZ and "resume" it on a different computer - or the same computer, if you prefer.

Of what value is the ability to "pause" a zone? One potential use is moving a workload from a smaller computer (with too few CPUs, or insufficient RAM) to a larger one. Some workloads do not maintain much state, and can restart quickly, and so they wouldn't benefit from suspend/resume. Others, such as static (read-only) databases, may take 30 minutes to start and obtain good performance. The ability to suspend, but not stop, the workload and its operating system can be very valuable.

Another possible use of this ability is the staging of multiple KZs which have already booted and, perhaps, have started to run a workload. Instead of booting in a few minutes, the workload can continue from a known state in just a few seconds. Further, the suspended zone can be "unpaused" on the computer of your choice. Suspended kernel zones are like a nest of dozing ants, waiting to take action at a moment's notice.

This blog entry shows the steps to create and move a KZ, highlighting both the Solaris iSCSI implementation as well as kernel zones and their suspend/resume feature. Briefly, the steps are:

  1. Create shared storage
  2. Make shared storage available to both computers - the one that will run the zone, at first, as well as the computer on which the zone will be resumed.
  3. Configure the zone on each system.
  4. Install the zone on one system.
  5. "Warm migrate" the zone by pausing it, and then, on the other computer, resuming it.

Links to relevant documentation and blogs are provided at the bottom.

The Method

The Kernel Zones suspend/resume feature requires the use of storage accessible by multiple computers. However, neither Kernel Zones nor suspend/resume requires a specific type of shared storage. In Solaris 11.2 the only types of shared storage that support zones are iSCSI and Fibre Channel. This blog entry uses iSCSI.

The example below uses three computers. One is the iSCSI target, i.e. the storage server. The other two run the KZ, one at a time. All three systems run Solaris 11.2, although the iSCSI features below also work on earlier updates of Solaris 11, on a ZFS Storage Appliance (the current family shares the brand name ZS3), or on another type of iSCSI target.

In the commands shown below, the prompt "storage1#" indicates commands that would be entered on the iSCSI target. Similarly, "node1#" indicates commands that you would enter on the first computer that will run the kernel zone. The few commands preceded by the prompt "bothnodes#" must be run on both node1 and node2. The name of the kernel zone is "ant1".

For simplicity, the example below ignores security concerns. (More about security below.)

Finally, note that these commands should be run by a non-root user who prefaces each command with the pseudo-command "sudo". ;-)

Step 1. Provide shared storage for the kernel zone. The zone only needs one device for its zpool. Redundancy is provided by the zpool in the iSCSI target. (For a more detailed explanation, see the link to the COMSTAR documentation in the section "The Links" below.)

storage1# pkg install group/feature/storage-server           # Install necessary software.
storage1# svcadm enable stmf:default                         # Enable that software.
storage1# zfs create rpool/zvols                             # A dataset for the zvol.
storage1# zfs create -V 20g rpool/zvols/ant1                 # Create a zvol as backing store.
storage1# stmfadm create-lu /dev/zvol/rdsk/rpool/zvols/ant1  # Create a back-end LUN.
Logical unit created: 600144F068D1CD00000053ECD3D20001
storage1# stmfadm list-lu
LU Name: 600144F068D1CD00000053ECD3D20001
storage1# stmfadm add-view 600144F068D1CD00000053ECD3D20001
storage1# stmfadm list-view -l 600144F068D1CD00000053ECD3D20001
View Entry: 0
Host group : All
Target Group : All
LUN : Auto

storage1# svcadm enable -r svc:/network/iscsi/target:default # Enable the target service.
storage1# svcs -l iscsi/target
fmri svc:/network/iscsi/target:default
name iscsi target
enabled true
state online
next_state none
state_time August 10, 2014 03:58:50 PM EST
logfile /var/svc/log/network-iscsi-target:default.log
restarter svc:/system/svc/restarter:default
manifest /lib/svc/manifest/network/iscsi/iscsi-target.xml
dependency require_any/error svc:/milestone/network (online)
dependency require_all/none svc:/system/stmf:default (online)
storage1# itadm create-target
Target iqn.1986-03.com.sun:02:238d10b8-cca8-ef7a-e095-e1132d91c4a5
successfully created
storage1# itadm list-target -v
TARGET NAME                                                  STATE    SESSIONS
iqn.1986-03.com.sun:02:238d10b8-cca8-ef7a-e095-e1132d91c4a5  online   0
        alias:                  -
        auth:                   none (defaults)
        targetchapuser:         -
        targetchapsecret:       unset
        tpg-tags:               default

Step 2A. Configure initiators. Configuring iSCSI on the two iSCSI initiators uses exactly the same commands on each, so I'll list them just once.

bothnodes# svcadm enable network/iscsi/initiator
bothnodes# iscsiadm modify discovery --sendtargets enable # The simplest discovery method.
bothnodes# iscsiadm add discovery-address 192.168.1.11    # IP address of the storage server.
At this point, the initiator will automatically discover all of the iSCSI LUNs offered by that target. One way to view the list of them is with the format(1M) command.
bothnodes# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
....
       1. c0t600144F068D1CD00000053ECD3D20001d0 
          /scsi_vhci/disk@g600144f068d1cd00000053ecd3d20001
...

Step 2B. On each of the two computers that will host the zone, identify the Storage Uniform Resource Identifiers ("SURI") - see suri(5) for more information. This command tells you the SURI of that LUN, in each of multiple formats. We'll need this SURI to specify the storage for the kernel zone.

bothnodes# suriadm lookup-uri c0t600144F068D1CD00000053ECD3D20001d0
iscsi://house/luname.naa.600144f068d1cd00000053ecd3d20001
iscsi://house/target.iqn.1986-03.com.sun:02:238d10b8-cca8-ef7a-e095-e1132d91c4a5,lun.1
dev:dsk/c0t600144F068D1CD00000053ECD3D20001d0
Step 2C. When you suspend a kernel zone, its RAM pages must be stored temporarily in a file. In order to resume the zone on a different computer, the "suspend file" must be on storage that both computers can access. For this example, we'll use an NFS share. (Another iSCSI LUN could be used instead.) The method shown below is not particularly secure, although the suspended image is first encrypted. Secure methods would require the use of other Solaris features, but they are not the topic of this blog entry.
storage1# zfs create -p rpool/export/suspend
storage1# zfs set share.nfs=on rpool/export/suspend
That share must be made available on both nodes, with appropriate permissions.
node1# mkdir /mnt/suspend
node1# mount -F nfs storage1:/export/suspend /mnt/suspend
node2# mkdir /mnt/suspend
node2# mount -F nfs storage1:/export/suspend /mnt/suspend
Step 3. Configure a kernel zone, using the two iSCSI LUNs and a system profile. You can configure a kernel zone very easily. The only required settings are the name and the use of the kernel zone template. The name of the latter is SYSsolaris-kz. That template specifies a VNIC, 2GB of dedicated RAM, 1 virtual CPU, and local storage that will be configured automatically when the zone is installed. We need shared storage instead of local storage, so one of the first steps will be deleting the local storage resource. That device will have an ID number of zero. After deleting that resource, we add the LUN, using the SURI determined earlier.
node1# zonecfg -z ant1
  Use 'create' to begin configuring a new zone.
  zonecfg:ant1> create -t SYSsolaris-kz
  zonecfg:ant1> remove device id=0
  zonecfg:ant1> add device
  zonecfg:ant1:device> set storage=iscsi://house/luname.naa.600144f068d1cd00000053ecd3d20001
  zonecfg:ant1:device> set bootpri=0
  zonecfg:ant1:device> info
  device:
        match not specified
        storage: iscsi://house/luname.naa.600144f068d1cd00000053ecd3d20001
        id: 1
        bootpri: 0
  zonecfg:ant1:device> end
  zonecfg:ant1> add suspend
  zonecfg:ant1:suspend> set path=/mnt/suspend/ant1.sus
  zonecfg:ant1:suspend> end
  zonecfg:ant1> exit
We can create a reusable configuration profile.
node1# sysconfig create-profile  -o ant1
[The usual sysconfig conversation ensues...]

Step 4. Install and boot the kernel zone.

node1# zoneadm -z ant1 install -c ant1/sc_profile.xml
Progress being logged to /var/log/zones/zoneadm.20140815T155143Z.ant1.install
pkg cache: Using /var/pkg/publisher.
 Install Log: /system/volatile/install.15996/install_log
 AI Manifest: /tmp/zoneadm15390.W_a4NE/devel-ai-manifest.xml
  SC Profile: /usr/share/auto_install/sc_profiles/enable_sci.xml
Installation: Starting ...

        Creating IPS image
        Installing packages from:
            solaris
                origin:  http://pkg.oracle.com/solaris/release/
        The following licenses have been accepted and not displayed.
        Please review the licenses for the following packages post-install:
          consolidation/osnet/osnet-incorporation
        Package licenses may be viewed using the command:
          pkg info --license 

DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
Completed                            483/483   64276/64276  543.7/543.7    0B/s

PHASE                                          ITEMS
Installing new actions                   87530/87530
Updating package state database                 Done
Updating package cache                           0/0
Updating image state                            Done
Creating fast lookup database                   Done
Installation: Succeeded
        Done: Installation completed in 207.564 seconds.

node1# zoneadm -z ant1 boot

Step 5. With all of the hard work behind us, we can "warm migrate" the zone. The first step is preparation of the destination system - "node2" in our example - by applying the zone's configuration to the destination.

node1# zonecfg -z ant1 export -f /mnt/suspend/ant1.cfg
node2# zonecfg -z ant1 -f /mnt/suspend/ant1.cfg
The "detach" operation does not delete anything. It merely tells node1 to cease considering the zone to be usable.
node1# zoneadm -z ant1 suspend
node1# zoneadm -z ant1 detach

A separate "resume" sub-command for zoneadm was not necessary. The "boot" sub-command fulfills that purpose.

node2# zoneadm -z ant1 attach
node2# zoneadm -z ant1 boot
Of course, "warm migration" is different from "live migration" in one important respect: the duration of the service outage. Live migration achieves a service outage that lasts a small fraction of a second. In one experiment, warm migration of a kernel zone created a service outage that lasted 30 seconds. It's not live migration, but is an important step forward, compared to other types of Solaris Zones.

The Notes

  1. This example used a zpool as back-end storage. That zpool provided data redundancy, so additional redundancy was not needed within the kernel zone. If unmirrored devices (e.g. physical disks) were specified in zonecfg, then data redundancy should be achieved within the zone. Fortunately, you can specify two devices in zonecfg, and "zoneadm ... install" will automatically mirror them (see the sketch after this list).
  2. In a simple network configuration, the steps above create a kernel zone that has normal network access. More complicated networks may require additional steps, such as VLAN configuration, etc.
  3. Some steps regarding file permissions on the NFS mount were omitted for clarity. This is one of the security weaknesses of the steps shown above. All of the weaknesses can be addressed by using additional Solaris features. These include, but are not limited to, iSCSI features (iSNS, CHAP authentication, RADIUS, etc.), NFS security features (e.g. NFS ACLs, Kerberos, etc.), RBAC, etc.
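
Regarding note 1, a sketch of what specifying two devices might look like in the ant1 configuration; the second LUN name is a placeholder, and giving both devices a boot priority is what lets the installer build a mirrored root pool:

node1# zonecfg -z ant1
zonecfg:ant1> add device
zonecfg:ant1:device> set storage=iscsi://house/luname.naa.600144f068d1cd00000053ecd3d20001
zonecfg:ant1:device> set bootpri=0
zonecfg:ant1:device> end
zonecfg:ant1> add device
zonecfg:ant1:device> set storage=iscsi://house/luname.naa.<second-lun-guid>   # placeholder for a second LUN
zonecfg:ant1:device> set bootpri=0
zonecfg:ant1:device> end
zonecfg:ant1> exit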

The Links

Thursday Apr 24, 2014

Oracle Solaris 11.2 Launch

On April 29th, Oracle will launch Oracle Solaris 11.2. This version will add significant new features that reinforce Solaris' position as the leading cloud OS.

These new features:

  • Further increase the flexibility of Solaris system virtualization
  • Simplify the creation of private and public clouds
  • Add unique software-defined networking (SDN) capabilities
  • Reduce management effort via OpenStack integration

To attend the launch event live in New York City, or view the live webcast, visit http://oracle.com/goto/solaris-11-2.

Wednesday Aug 28, 2013

Threads, as far as the eye can see...

Recently, I contributed to a new white paper that addresses the question "ok... now I have a computer with over 1,000 hardware threads... what do I do with all of those threads?" The topics include details of the newest SPARC S3 core, workload consolidation and server virtualization, multi-threaded programming, and more.

Oracle published the paper, which you can find here: http://www.oracle.com/technetwork/server-storage/sun-sparc-enterprise/documentation/index.html. My personal thanks go to Dr. Foxwell, who filled the role of engineer-herder for this project, and also to the other co-authors: Ruud, JeffS, and Darryl.

Tuesday Jul 16, 2013

AIX2Solaris

Maximize the Value, Minimize the Effort

Are you among the people migrating workloads from IBM AIX to Oracle Solaris 11? Even if you have not yet begun this migration, you will benefit from the knowledge contained in these resources:
  1. an online comparison of features
  2. the IBM AIX to Oracle Solaris Technology Mapping Guide.

Thursday Jul 11, 2013

Hands-On Training

Curious about Oracle Solaris, Oracle Linux or Oracle VM?
Or are you beyond curious, and in need of hands-on experience?

Oracle is proud to host "Virtual Sysadmin Day!" During this event, you can learn how to build a secure, multi-level application deployed using the virtualization capabilities of Oracle Solaris 11, or take part in many other activities.

You must register for Virtual Sysadmin Day to attend. At that site you can also view the agenda and pre-event instructions to prepare your laptop or desktop.

Thursday Jun 27, 2013

Improving Manageability of Virtual Environments

Boot Environments for Solaris 10 Branded Zones

Until recently, Solaris 10 Branded Zones on Solaris 11 suffered one notable regression: Live Upgrade did not work. The individual packaging and patching tools worked correctly, but the ability to upgrade Solaris while the production workload continued running did not exist. A recent Solaris 11 SRU (Solaris 11.1 SRU 6.4) restored most of that functionality, although with a slightly different concept, different commands, and without all of the feature details. This new method gives you the ability to create and manage multiple boot environments (BEs) for a Solaris 10 Branded Zone, to modify the active or any inactive BE, and to do so while the production workload continues to run.

Background

In case you are new to Solaris: Solaris includes a set of features that enables you to create a bootable Solaris image, called a Boot Environment (BE). This newly created image can be modified while the original BE is still running your workload(s). There are many benefits, including improved uptime and the ability to reboot into (or downgrade to) an older BE if a newer one has a problem.

In Solaris 10 this set of features was named Live Upgrade. Solaris 11 applies the same basic concepts to the new packaging system (IPS) but there isn't a specific name for the feature set. The features are simply part of IPS. Solaris 11 Boot Environments are not discussed in this blog entry.

Although a Solaris 10 system can have multiple BEs, until recently a Solaris 10 Branded Zone (BZ) in a Solaris 11 system did not have this ability. This limitation was addressed recently, and that enhancement is the subject of this blog entry.

This new implementation uses two concepts. The first is the use of a ZFS clone for each BE. This makes it very easy to create a BE, or many BEs. This is a distinct advantage over the Live Upgrade feature set in Solaris 10, which had a practical limitation of two BEs on a system, when using UFS. The second new concept is a very simple mechanism to indicate the BE that should be booted: a ZFS property. The new ZFS property is named com.oracle.zones.solaris10:activebe (isn't that creative? ;-) ).

It's important to note that the property is inherited from the original BE's file system to any BEs you create. In other words, all BEs in one zone have the same value for that property. When the (Solaris 11) global zone boots the Solaris 10 BZ, it boots the BE that has the name that is stored in the activebe property.

Here is a quick summary of the actions you can use to manage these BEs:

To create a BE:

  • Create a ZFS clone of the zone's root dataset

To activate a BE:

  • Set the ZFS property of the root dataset to indicate the BE

To add a package or patch to an inactive BE:

  • Mount the inactive BE
  • Add packages or patches to it
  • Unmount the inactive BE

To list the available BEs:

  • Use the "zfs list" command.

To destroy a BE:

  • Use the "zfs destroy" command.

Preparation

Before you can use the new features, you will need a Solaris 10 BZ on a Solaris 11 system. You can use these three steps - on a real Solaris 11.1 server or in a VirtualBox guest running Solaris 11.1 - to create a Solaris 10 BZ. The Solaris 11.1 environment must be at SRU 6.4 or newer.

  1. Create a flash archive on the Solaris 10 system
    s10# flarcreate -n s10-system /net/zones/archives/s10-system.flar
  2. Configure the Solaris 10 BZ on the Solaris 11 system
    s11# zonecfg -z s10z
    Use 'create' to begin configuring a new zone.
    zonecfg:s10z> create -t SYSsolaris10
    zonecfg:s10z> set zonepath=/zones/s10z
    zonecfg:s10z> exit
    s11# zoneadm list -cv
      ID NAME             STATUS     PATH                           BRAND     IP    
       0 global           running    /                              solaris   shared
       - s10z             configured /zones/s10z                    solaris10 excl  
    
  3. Install the zone from the flash archive
    s11# zoneadm -z s10z install -a /net/zones/archives/s10-system.flar -p
    

You can find more information about the migration of Solaris 10 environments to Solaris 10 Branded Zones in the documentation.

The rest of this blog entry demonstrates the commands you can use to accomplish the aforementioned actions related to BEs.

New features in action

Note that the demonstration of the commands occurs in the Solaris 10 BZ, as indicated by the shell prompt "s10z# ". Many of these commands can be performed in the global zone instead, if you prefer. If you perform them in the global zone, you must change the ZFS file system names.

Create

The only complicated action is the creation of a BE. In the Solaris 10 BZ, create a new "boot environment" - a ZFS clone. You can assign any name to the final portion of the clone's name, as long as it meets the requirements for a ZFS file system name.

s10z# zfs snapshot rpool/ROOT/zbe-0@snap
s10z# zfs clone -o mountpoint=/ -o canmount=noauto rpool/ROOT/zbe-0@snap rpool/ROOT/newBE
cannot mount 'rpool/ROOT/newBE' on '/': directory is not empty
filesystem successfully created, but not mounted
You can safely ignore that message: we already know that / is not empty! We have merely told ZFS that the default mountpoint for the clone is the root directory.

(Note that a Solaris 10 BZ that has a separate /var file system requires additional steps. See the MOS document mentioned at the bottom of this blog entry.)

List the available BEs and active BE

Because each BE is represented by a clone of the rpool/ROOT dataset, listing the BEs is as simple as listing the clones.

s10z# zfs list -r rpool/ROOT
NAME               USED  AVAIL  REFER  MOUNTPOINT
rpool/ROOT        3.55G  42.9G    31K  legacy
rpool/ROOT/zbe-0     1K  42.9G  3.55G  /
rpool/ROOT/newBE  3.55G  42.9G  3.55G  /
The output shows that two BEs exist. Their names are "zbe-0" and "newBE".

You can tell Solaris that one particular BE should be used when the zone next boots by using a ZFS property. Its name is com.oracle.zones.solaris10:activebe. The value of that property is the name of the clone that contains the BE that should be booted.

s10z# zfs get com.oracle.zones.solaris10:activebe rpool/ROOT
NAME        PROPERTY                             VALUE  SOURCE
rpool/ROOT  com.oracle.zones.solaris10:activebe  zbe-0  local

Change the active BE

When you want to change the BE that will be booted next time, you can just change the activebe property on the rpool/ROOT dataset.

s10z# zfs get com.oracle.zones.solaris10:activebe rpool/ROOT
NAME        PROPERTY                             VALUE  SOURCE
rpool/ROOT  com.oracle.zones.solaris10:activebe  zbe-0  local
s10z# zfs set com.oracle.zones.solaris10:activebe=newBE rpool/ROOT
s10z# zfs get com.oracle.zones.solaris10:activebe rpool/ROOT
NAME        PROPERTY                             VALUE  SOURCE
rpool/ROOT  com.oracle.zones.solaris10:activebe  newBE  local
s10z# shutdown -y -g0 -i6
After the zone has rebooted:
s10z# zfs get com.oracle.zones.solaris10:activebe rpool/ROOT
rpool/ROOT  com.oracle.zones.solaris10:activebe  newBE  local
s10z# zfs mount
rpool/ROOT/newBE                /
rpool/export                    /export
rpool/export/home               /export/home
rpool                           /rpool
Mount the original BE to see that it's still there.
s10z# zfs mount -o mountpoint=/mnt rpool/ROOT/zbe-0
s10z# ls /mnt
Desktop                         export                          platform
Documents                       export.backup.20130607T214951Z  proc
S10Flar                         home                            rpool
TT_DB                           kernel                          sbin
bin                             lib                             system
boot                            lost+found                      tmp
cdrom                           mnt                             usr
dev                             net                             var
etc                             opt

Patch an inactive BE

At this point, you can modify the original BE. If you would prefer to modify the new BE, you can restore the original value to the activebe property and reboot, and then mount the new BE to /mnt (or another empty directory) and modify it.
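
A sketch of that alternative path (the patch ID is a placeholder):

s10z# zfs set com.oracle.zones.solaris10:activebe=zbe-0 rpool/ROOT   # boot the original BE next time
s10z# shutdown -y -g0 -i6
After the zone reboots into zbe-0:
s10z# zfs mount -o mountpoint=/mnt rpool/ROOT/newBE
s10z# patchadd -R /mnt -M /var/sadm/spool <patch-id>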

Let's mount the original BE so we can modify it. (The first command is only needed if you haven't already mounted that BE.)

s10z# zfs mount -o mountpoint=/mnt rpool/ROOT/zbe-0
s10z# patchadd -R /mnt -M /var/sadm/spool 104945-02
Note that the typical usage will be (see the consolidated command sketch after this list):
  1. Create a BE
  2. Mount the new (inactive) BE
  3. Use the package and patch tools to update the new BE
  4. Unmount the new BE
  5. Reboot
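
A consolidated sketch of that sequence; the snapshot, BE, and patch names are placeholders, and <activeBE> stands for whichever BE is currently booted:

s10z# zfs snapshot rpool/ROOT/<activeBE>@patching
s10z# zfs clone -o mountpoint=/ -o canmount=noauto rpool/ROOT/<activeBE>@patching rpool/ROOT/patchBE   # 1. create a BE
s10z# zfs mount -o mountpoint=/mnt rpool/ROOT/patchBE                       # 2. mount the new (inactive) BE
s10z# patchadd -R /mnt -M /var/sadm/spool <patch-id>                        # 3. update the new BE
s10z# zfs umount rpool/ROOT/patchBE                                         # 4. unmount the new BE
s10z# zfs set com.oracle.zones.solaris10:activebe=patchBE rpool/ROOT        #    activate it
s10z# shutdown -y -g0 -i6                                                   # 5. reboot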

Delete an inactive BE

ZFS clones are children of their parent file systems. In order to destroy the parent, you must first "promote" the child. This reverses the parent-child relationship. (For more information on this, see the documentation.)

The original BE's file system is the parent of the clones that you create as BEs. In order to destroy an earlier BE that is the parent of other BEs, you must first promote one of the child BEs to be the ZFS parent. Only then can you destroy the original BE.

Fortunately, this is easier to do than to explain:

s10z# zfs promote rpool/ROOT/newBE 
s10z# zfs destroy rpool/ROOT/zbe-0
s10z# zfs list -r rpool/ROOT
NAME               USED  AVAIL  REFER  MOUNTPOINT
rpool/ROOT        3.56G   269G    31K  legacy
rpool/ROOT/newBE  3.56G   269G  3.55G  /

Documentation

This feature is so new, it is not yet described in the Solaris 11 documentation. However, MOS note 1558773.1 offers some details.

Conclusion

With this new feature, you can add packages and patches to boot environments of a Solaris 10 Branded Zone. This ability improves the manageability of these zones, and makes their use more practical. It also means that you can use the existing P2V tools with earlier Solaris 10 updates, and modify the environments after they become Solaris 10 Branded Zones.

Fastest Engineered System

Quick! It's not too late - you can still register for today's webcast announcement of the new Oracle SuperCluster T5-8.

Wednesday Jun 12, 2013

Comparing Solaris 11 Zones to Solaris 10 Zones

Many people have asked whether Oracle Solaris 11 uses sparse-root zones or whole-root zones. I think the best answer is "both and neither, and more" - but that's a wee bit confusing. :-) This blog entry attempts to explain that answer.

First a recap: Solaris 10 introduced the Solaris Zones feature set, way back in 2005. Zones are a form of server virtualization called "OS (Operating System) Virtualization." They improve consolidation ratios by isolating processes from each other so that they cannot interact. Each zone has its own set of users, naming services, and other software components. One of the many advantages is that there is no need for a hypervisor, so there is no performance overhead. Many data centers run tens to hundreds of zones per server!

In Solaris 10, there are two models of package deployment for Solaris Zones. One model is called "sparse-root" and the other "whole-root." Each form has specific characteristics, abilities, and limitations.

A whole-root zone has its own copy of the Solaris packages. This allows the inclusion of other software in system directories - even though that practice has been discouraged for many years. Although it is also possible to modify the Solaris content in such a zone, e.g. patching a zone separately from the rest, this was highly frowned on. :-( (More importantly, modifying the Solaris content in a whole-root zone may lead to an unsupported configuration.)

The other model is called "sparse-root." In that form, instead of copying all of the Solaris packages into the zone, the directories containing Solaris binaries are re-mounted into the zone. This allows the zone's users to access them at their normal places in the directory tree. Those are read-only mounts, so a zone's root user cannot modify them. This improves security, and also reduces the amount of disk space used by the zone - 200MB instead of the usual 3-5 GB per zone. These loopback mounts also reduce the amount of RAM used by zones because Solaris only stores in RAM one copy of a program that is in use by several zones. This model also has disadvantages. One disadvantage is the inability to add software into system directories such as /usr. Also, although a sparse-root can be migrated to another Solaris 10 system, it cannot be moved to a Solaris 11 system as a "Solaris 10 Zone."
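
For readers who never used them, a sketch of how the two Solaris 10 models are chosen at configuration time (the zone names and paths are hypothetical):

s10# zonecfg -z sparsezone
zonecfg:sparsezone> create                   # default template: sparse root; /lib, /platform, /sbin
                                             # and /usr are inherited as read-only loopback mounts
zonecfg:sparsezone> set zonepath=/zones/sparsezone
zonecfg:sparsezone> exit
s10# zonecfg -z wholezone
zonecfg:wholezone> create -b                 # "blank" template: no inherit-pkg-dir resources, so the
                                             # zone receives its own copy of the Solaris packages
zonecfg:wholezone> set zonepath=/zones/wholezone
zonecfg:wholezone> exit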

In addition to those contrasting characteristics, here are some characteristics of zones in Solaris 10 that are shared by both packaging models:

  • A zone can modify its own configuration files in /etc.
  • A zone can be configured so that it manages its own networking, or so that it cannot modify its network configuration.
  • It is difficult to give a non-root user in the global zone the ability to boot and stop a zone, without giving that user other abilities.
  • In a zone that can manage its own networking, the root user can do harmful things like spoof other IP addresses and MAC addresses.
  • It is difficult to assign network packet processing to the same CPUs that a zone uses. This can lead to unpredictable performance and to performance troubleshooting challenges.
  • You cannot run a large number of zones in one system (e.g. 50) that each manages its own networking, because that would require assigning more physical NICs than are available (e.g. 50).
  • Except when managed by Ops Center, zones cannot be safely stored on NAS.
  • Solaris 10 Zones cannot be NFS servers.
  • The fsstat command does not report statistics per zone.

Solaris 11 Zones use the new packaging system of Solaris 11. Their configuration does not offer a choice of packaging models, as Solaris 10 does. Instead, two (well, four) different models of "immutability" (changeability) are offered. The default model allows a privileged zone user to modify the zone's content. The other (three) limit the content which can be changed: none, or two overlapping sets of configuration files. (See "Configuring and Administering Immutable Zones".)
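
A sketch of how one of those models is selected; the zone name is hypothetical, and the file-mac-profile values I recall from the documentation are none, flexible-configuration, fixed-configuration, and strict:

# zonecfg -z webzone
zonecfg:webzone> set file-mac-profile=fixed-configuration   # configuration files become read-only;
                                                            # most of /var remains writable for logs
zonecfg:webzone> exit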

Solaris 11 addresses many of those limitations. With the characteristics listed above in mind, the following table shows the similarities and differences between zones in Solaris 10 and in Solaris 11.

Characteristic                                                       | Solaris 10 Whole-Root | Solaris 10 Sparse-Root | Solaris 11 | Solaris 11 Immutable Zones
Each zone has a copy of most Solaris packages                        | Yes                   | No                     | Yes        | Yes
Disk space used by a zone (typical)                                  | 3.5 GB                | 100 MB                 | 500 MB     | 500 MB
A privileged zone user can add software to /usr                      | Yes                   | No                     | Yes        | No
A zone can modify its Solaris programs                               | True                  | False                  | True       | False
Each zone can modify its configuration files                         | Yes                   | Yes                    | Yes        | No
Delegated administration                                             | No                    | No                     | Yes        | Yes
A zone can be configured to manage its own networking                | Yes                   | Yes                    | Yes        | Yes
A zone can be configured so that it cannot manage its own networking | Yes                   | Yes                    | Yes        | Yes
A zone can be configured with resource controls                      | Yes                   | Yes                    | Yes        | Yes
Integrated tool to measure a zone's resource consumption (zonestat)  | No                    | No                     | Yes        | Yes
Network processing automatically happens on that zone's CPUs         | No                    | No                     | Yes        | Yes
Zones can be NFS servers                                             | No                    | No                     | Yes        | Yes
Per-zone fsstat data                                                 | No                    | No                     | Yes        | Yes

As you can see, the statement "Solaris 11 Zones are whole-root zones" is only true using the narrowest definition of whole-root zones: those zones which have their own copy of Solaris packaging content. But there are other valuable characteristics of sparse-root zones that are still available in Solaris 11 Zones. Also, some Solaris 11 Zones do not have some characteristics of whole-root zones.

For example, the table above shows that you can configure a Solaris 11 zone that has read-only Solaris content. And Solaris 11 takes that concept further, offering the ability to tailor that immutability. It also shows that Solaris 10 sparse-root and whole-root zones are more similar to each other than to Solaris 11 Zones.

Conclusion

Solaris 11 Zones are slightly different from Solaris 10 Zones. The former can achieve the goals of the latter, and they also offer features not found in Solaris 10 Zones. Solaris 11 Zones offer the best of Solaris 10 whole-root zones and sparse-root zones, and offer an array of new features that make Zones even more flexible and powerful.

Tuesday Apr 23, 2013

More SPARC T5 Performance Results

Performance results for the new SPARC T5 systems keep coming in...

Last week, SPEC published the most recent result for the SPECjbb2013-MultiJVM benchmark. This benchmark "is relevant to all audiences who are interested in Java server performance, including JVM vendors, hardware developers, Java application developers, researchers and members of the academic community" according to SPEC.

All of the published results are at: http://www.spec.org/jbb2013/results/jbb2013multijvm.html.

For the first table below, I selected all of the max-JOPS results greater than 50,000 JOPS using the most recent Java version, for the SPARC T5-2 and for competing systems. From the SPECjbb2013 data, I derived two new values, max-JOPS/chip and max-JOPS/core. The latter value compensates for the different quantity of cores used in one of the tests. Finally, the "Advantage of T5" column shows the portion by which the T5-2 cores perform better than the other systems' cores. For example, on this benchmark a 32-core T5-2 computer demonstrated 15% better per-core performance than an HP DL560p with the same number of cores.

As you can see, a SPARC T5 core is faster than an Intel Xeon core, compared against competing systems with 32 or more cores.

Model                   | CPU           | Chips | Cores | OS                                | max-JOPS | Date Published | max-JOPS per chip | max-JOPS per core | Advantage of T5
SPARC T5-2              | SPARC T5      | 2     | 32    | Solaris 11.1                      | 75658    | April 2013     | 37829             | 2364              | -
HP ProLiant DL560p Gen8 | Intel E5-4650 | 4     | 32    | Windows Server 2008 R2 Enterprise | 67850    | April 2013     | 16963             | 2120              | 12%
HP ProLiant DL560p Gen8 | Intel E5-4650 | 4     | 32    | RHEL 6.3                          | 66007    | April 2013     | 16502             | 2063              | 15%
HP ProLiant DL980 G7    | Intel E7-4870 | 8     | 80    | RHEL 6.3                          | 106141   | April 2013     | 13268             | 1327              | 78%
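
To make the derived columns concrete: the T5-2's 75,658 max-JOPS divided by 2 chips gives 37,829 per chip, and divided by 32 cores gives roughly 2,364 per core; the "Advantage of T5" column compares per-core values, e.g. 2,364 / 2,063 ≈ 1.15, or about 15% over the RHEL-based DL560p.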

The SPECjbb2013 benchmark also includes a performance measure called "critical-JOPS." This measurement represents the ability of a system to achieve high levels of throughput while still maintaining a short response time. The performance advantage of the T5 cores is even more pronounced.

Model                   | CPU           | Chips | Cores | OS                                | critical-JOPS | Date Published | critical-JOPS per chip | critical-JOPS per core | Advantage of T5
SPARC T5-2              | SPARC T5      | 2     | 32    | Solaris 11.1                      | 23334         | April 2013     | 11667                  | 729                    | -
HP ProLiant DL560p Gen8 | Intel E5-4650 | 4     | 32    | Windows Server 2008 R2 Enterprise | 16199         | April 2013     | 4050                   | 506                    | 44%
HP ProLiant DL560p Gen8 | Intel E5-4650 | 4     | 32    | RHEL 6.3                          | 18049         | April 2013     | 4512                   | 564                    | 29%
HP ProLiant DL980 G7    | Intel E7-4870 | 8     | 80    | RHEL 6.3                          | 23268         | April 2013     | 2909                   | 291                    | 151%

As always, care should be taken in choosing a benchmark that is similar to the workload that you will run on a computer. For example, if you plan to implement a database server, using the SPECint benchmark will not help you, because that benchmark merely measures the performance of the CPU cores and speed and size of memory caches (and perhaps the memory system). It does not measure performance of network or disk I/O, and both of those are important factors in database performance - especially storage I/O.

According to the SPECjbb2013 design document, this benchmark "exercises the CPU, memory and network I/O, but not disk I/O." Because of this, it can be used as a simple method to estimate relative Java processing performance. From the data shown in the tables above, it is clear that the newest SPARC cores deliver Java performance that is competitive with the most recent Intel Xeon CPU cores.

Edit [2013.04.23]: Jim Laurent uses the same benchmark results in a quick look at the smooth scalability of Solaris 11, compared to RHEL6.

For more information on recent SPARC T5 world records, see https://blogs.oracle.com/BestPerf/.

SPEC and the benchmark name SPECjbb are registered trademarks of Standard Performance Evaluation Corporation (SPEC). Results as of 4/21/2013, see http://www.spec.org for more information.

Wednesday Apr 17, 2013

IDC Reviews New SPARC T5 and M5 Servers

On April 5, IDC published an article that provides their view of the recent announcement of Oracle T5 and M5 servers.

IDC's conclusion: "Oracle has invested deeply in improving`the performance of the T-series processors it developed following its acquisition of Sun Microsystems in 2010. It has pushed its engineering efforts to release new SPARC processor technology — providing a much more competitive general-purpose server platform. This will provide an immediate improvement for its large installed base, even as it lends momentum to a new round of competition in the Unix server marketplace."

IDC also noted the "dramatic performance gains for SPARC, with 16-core microprocessor technology based on three years of IP (intellectual property) development at Oracle, following Oracle's acquisition of Sun Microsystems Inc. in January, 2010."

The new T5 servers use SPARC T5 processor chips that offer more than double the performance of SPARC T4 chips, which were released just over a year ago. And the T4 chips, in turn, were a significant departure from all previous SPARC CMT CPUs, in that the T4 chips offered excellent performance for single-threaded workloads.

The new M5 servers use up to 32 SPARC M5 processors, each using the same "S3" SPARC cores as the T5 chips.

Wednesday Mar 27, 2013

New SPARC Servers Outrun the Competition - by Leaps and Bounds!

In case you missed yesterday's launch of Oracle's new SPARC server line, based on the SPARC T5 and M5 processors, here is a brief summary - and the obligatory links to details...

The new SPARC T5 chip uses the "S3" core which has been in the SPARC T4 generation for over a year. That core offers, among other things, 8 hardware threads, two simultaneous integer pipelines, and some other compute units as well (FP, crypto, etc.). The S3 core also includes the instructions necessary to work with the SPARC hypervisor that implements SPARC virtual machines (Oracle VM Server for SPARC, previously called Logical Domains or LDoms) including Live Migration of VMs.

However, four significant improvements have been made in the new systems:

  1. Each T5 chip has 16 cores, instead of the T4's 8 cores per chip - an increase made possible by a die shrink (to 28 nm).
  2. An increase in clock rate to 3.6 GHz yields an immediate 20% improvement in processing over T4 systems.
  3. Increased chip scalability allows up to 8 CPU chips per system, doubling the maximum number of cores in the mid-range systems.
  4. In addition to the mid-range servers, now the high-end M5-32 also supports OVM Server for SPARC (LDoms), while maintaining the ability to use hard partitions (Dynamic Domains) in that system. (The T5-based servers (PDF) also have LDoms, just like the T4-based systems.)

The result of those improvements is the "world's fastest microprocessor." Between the four T5 (mid-range) systems and the M5-32, this new generation of systems has already achieved 17 performance world records.

Some of the simpler comparisons that were made yesterday include (see the press release for substantiation):

  1. An Oracle T5-8 (8 CPU sockets) running Solaris has a higher SAP performance rating than an IBM Power 780 (8 sockets) running AIX.
  2. A single, 2-socket T5-2 has three times the performance, at 13% of the cost, of two Power 770's - on a JD Edwards performance test.
  3. Two T5-2 servers have almost double the Siebel performance of two Power 750 servers - at one-fourth the price.
  4. One 8-processor T5-8 outperforms an 8-processor Power 780 - at one-seventh the cost - on the common SPECint_rate 2006 benchmark.

The new high-end SPARC system - the M5-32 - sports 192 cores (1,536 hardware threads) of compute power. It can also be packed with 32 TB (yes, terabytes!) of RAM. Put your largest DB entirely in RAM, and see how fast it goes!

Oracle has refreshed its entire SPARC server line all at once, greatly improving performance - not only compared to the previous SPARC generation, but also compared to the current generation of servers from other manufacturers.

Monday Mar 25, 2013

New SPARC Chips, New Servers

On Tuesday, Oracle will announce new SPARC servers with the world's fastest microprocessor. Considering that the current SPARC processors already have performance comparable with the newest from competing architectures, the performance of these new processors should give you the best real-world performance for your enterprise workloads.

You can register to watch the event live at 4:00 PM EDT (New York).

About

Jeff Victor writes this blog to help you understand Oracle's Solaris and virtualization technologies.

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.
