Wednesday Aug 28, 2013

Threads, as fas the eye can see...

Recently, I contributed to a new white paper that addresses the question "ok... now I have a computer with over 1,000 hardware threads... what do I do with all of those threads?" The topics include details of the newest SPARC S3 core, workload consolidation and server virtualization, multi-threaded programming, and more.

Oracle published the paper, which you can find here: http://www.oracle.com/technetwork/server-storage/sun-sparc-enterprise/documentation/index.html. My personal thanks go to Dr. Foxwell, who filled the role of engineer-herder for this project, and also to the other co-authors: Ruud, JeffS, and Darryl.

Tuesday Jul 16, 2013

AIX2Solaris

Maximize the Value, Minimize the Effort

Are you among the people migrating workloads from IBM AIX to Oracle Solaris 11? Even if you have not yet begun this migration, you will benefit from the knowledge contained in these resources:
  1. an online comparison of features
  2. the IBM AIX to Oracle Solaris Technology Mapping Guide.

Thursday Jul 11, 2013

Hands-On Training

Curious about Oracle Solaris, Oracle Linux or Oracle VM?
Or are you beyond curious, and in need of hands-on experience?

Oracle is proud to host "Virtual Sysadmin Day!" During this event, you will learn how to build a secure, multi-level application deployed using virtualization capabilities of Oracle Solaris 11, and/or many other activities.

You must register for Virtual Sysadmin Day to attend. At that site you can also view the agenda and pre-event instructions to prepare your laptop or desktop.

Thursday Jun 27, 2013

Improving Manageability of Virtual Environments

Boot Environments for Solaris 10 Branded Zones

Until recently, Solaris 10 Branded Zones on Solaris 11 suffered one notable regression: Live Upgrade did not work. The individual packaging and patching tools work correctly, but the ability to upgrade Solaris while the production workload continued running did not exist. A recent Solaris 11 SRU (Solaris 11.1 SRU 6.4) restored most of that functionality, although with a slightly different concept, different commands, and without all of the feature details. This new method gives you the ability to create and manage multiple boot environments (BEs) for a Solaris 10 Branded Zone, and modify the active or any inactive BE, and to do so while the production workload continues to run.

Background

In case you are new to Solaris: Solaris includes a set of features that enables you to create a bootable Solaris image, called a Boot Environment (BE). This newly created image can be modified while the original BE is still running your workload(s). There are many benefits, including improved uptime and the ability to reboot into (or downgrade to) an older BE if a newer one has a problem.

In Solaris 10 this set of features was named Live Upgrade. Solaris 11 applies the same basic concepts to the new packaging system (IPS) but there isn't a specific name for the feature set. The features are simply part of IPS. Solaris 11 Boot Environments are not discussed in this blog entry.

Although a Solaris 10 system can have multiple BEs, until recently a Solaris 10 Branded Zone (BZ) in a Solaris 11 system did not have this ability. This limitation was addressed recently, and that enhancement is the subject of this blog entry.

This new implementation uses two concepts. The first is the use of a ZFS clone for each BE. This makes it very easy to create a BE, or many BEs. This is a distinct advantage over the Live Upgrade feature set in Solaris 10, which had a practical limitation of two BEs on a system, when using UFS. The second new concept is a very simple mechanism to indicate the BE that should be booted: a ZFS property. The new ZFS property is named com.oracle.zones.solaris10:activebe (isn't that creative? ;-) ).

It's important to note that the property is inherited from the original BE's file system to any BEs you create. In other words, all BEs in one zone have the same value for that property. When the (Solaris 11) global zone boots the Solaris 10 BZ, it boots the BE that has the name that is stored in the activebe property.

Here is a quick summary of the actions you can use to manage these BEs:

To create a BE:

  • Create a ZFS clone of the zone's root dataset

To activate a BE:

  • Set the ZFS property of the root dataset to indicate the BE

To add a package or patch to an inactive BE:

  • Mount the inactive BE
  • Add packages or patches to it
  • Unmount the inactive BE

To list the available BEs:

  • Use the "zfs list" command.

To destroy a BE:

  • Use the "zfs destroy" command.

Preparation

Before you can use the new features, you will need a Solaris 10 BZ on a Solaris 11 system. You can use these three steps - on a real Solaris 11.1 server or in a VirtualBox guest running Solaris 11.1 - to create a Solaris 10 BZ. The Solaris 11.1 environment must be at SRU 6.4 or newer.

  1. Create a flash archive on the Solaris 10 system
    s10# flarcreate -n s10-system /net/zones/archives/s10-system.flar
  2. Configure the Solaris 10 BZ on the Solaris 11 system
    s11# zonecfg -z s10z
    Use 'create' to begin configuring a new zone.
    zonecfg:s10z> create -t SYSsolaris10
    zonecfg:s10z> set zonepath=/zones/s10z
    zonecfg:s10z> exit
    s11# zoneadm list -cv
      ID NAME             STATUS     PATH                           BRAND     IP    
       0 global           running    /                              solaris   shared
       - s10z             configured /zones/s10z                    solaris10 excl  
    
  3. Install the zone from the flash archive
    s11# zoneadm -z s10z install -a /net/zones/archives/s10-system.flar -p
    

You can find more information about the migration of Solaris 10 environments to Solaris 10 Branded Zones in the documentation.

The rest of this blog entry demonstrates the commands you can use to accomplish the aforementioned actions related to BEs.

New features in action

Note that the demonstration of the commands occurs in the Solaris 10 BZ, as indicated by the shell prompt "s10z# ". Many of these commands can be performed in the global zone instead, if you prefer. If you perform them in the global zone, you must change the ZFS file system names.

Create

The only complicated action is the creation of a BE. In the Solaris 10 BZ, create a new "boot environment" - a ZFS clone. You can assign any name to the final portion of the clone's name, as long as it meets the requirements for a ZFS file system name.

s10z# zfs snapshot rpool/ROOT/zbe-0@snap
s10z# zfs clone -o mountpoint=/ -o canmount=noauto rpool/ROOT/zbe-0@snap rpool/ROOT/newBE
cannot mount 'rpool/ROOT/newBE' on '/': directory is not empty
filesystem successfully created, but not mounted
You can safely ignore that message: we already know that / is not empty! We have merely told ZFS that the default mountpoint for the clone is the root directory.

(Note that a Solaris 10 BZ that has a separate /var file system requires additional steps. See the MOS document mentioned at the bottom of this blog entry.)

List the available BEs and active BE

Because each BE is represented by a clone of the rpool/ROOT dataset, listing the BEs is as simple as listing the clones.

s10z# zfs list -r rpool/ROOT
NAME               USED  AVAIL  REFER  MOUNTPOINT
rpool/ROOT        3.55G  42.9G    31K  legacy
rpool/ROOT/zbe-0     1K  42.9G  3.55G  /
rpool/ROOT/newBE  3.55G  42.9G  3.55G  /
The output shows that two BEs exist. Their names are "zbe-0" and "newBE".

You can tell Solaris that one particular BE should be used when the zone next boots by using a ZFS property. Its name is com.oracle.zones.solaris10:activebe. The value of that property is the name of the clone that contains the BE that should be booted.

s10z# zfs get com.oracle.zones.solaris10:activebe rpool/ROOT
NAME        PROPERTY                             VALUE  SOURCE
rpool/ROOT  com.oracle.zones.solaris10:activebe  zbe-0  local

Change the active BE

When you want to change the BE that will be booted next time, you can just change the activebe property on the rpool/ROOT dataset.

s10z# zfs get com.oracle.zones.solaris10:activebe rpool/ROOT
NAME        PROPERTY                             VALUE  SOURCE
rpool/ROOT  com.oracle.zones.solaris10:activebe  zbe-0  local
s10z# zfs set com.oracle.zones.solaris10:activebe=newBE rpool/ROOT
s10z# zfs get com.oracle.zones.solaris10:activebe rpool/ROOT
NAME        PROPERTY                             VALUE  SOURCE
rpool/ROOT  com.oracle.zones.solaris10:activebe  newBE  local
s10z# shutdown -y -g0 -i6
After the zone has rebooted:
s10z# zfs get com.oracle.zones.solaris10:activebe rpool/ROOT
rpool/ROOT  com.oracle.zones.solaris10:activebe  newBE  local
s10z# zfs mount
rpool/ROOT/newBE                /
rpool/export                    /export
rpool/export/home               /export/home
rpool                           /rpool
Mount the original BE to see that it's still there.
s10z# zfs mount -o mountpoint=/mnt rpool/ROOT/zbe-0
s10z# ls /mnt
Desktop                         export                          platform
Documents                       export.backup.20130607T214951Z  proc
S10Flar                         home                            rpool
TT_DB                           kernel                          sbin
bin                             lib                             system
boot                            lost+found                      tmp
cdrom                           mnt                             usr
dev                             net                             var
etc                             opt

Patch an inactive BE

At this point, you can modify the original BE. If you would prefer to modify the new BE, you can restore the original value to the activebe property and reboot, and then mount the new BE to /mnt (or another empty directory) and modify it.

Let's mount the original BE so we can modify it. (The first command is only needed if you haven't already mounted that BE.)

s10z# zfs mount -o mountpoint=/mnt rpool/ROOT/zbe-0
s10z# patchadd -R /mnt -M /var/sadm/spool 104945-02
Note that the typical usage will be:
  1. Create a BE
  2. Mount the new (inactive) BE
  3. Use the package and patch tools to update the new BE
  4. Unmount the new BE
  5. Reboot

Delete an inactive BE

ZFS clones are children of their parent file systems. In order to destroy the parent, you must first "promote" the child. This reverses the parent-child relationship. (For more information on this, see the documentation.)

The original rpool/ROOT file system is the parent of the clones that you create as BEs. In order to destroy an earlier BE that is that parent of other BEs, you must first promote one of the child BEs to be the ZFS parent. Only then can you destroy the original BE.

Fortunately, this is easier to do than to explain:

s10z# zfs promote rpool/ROOT/newBE 
s10z# zfs destroy rpool/ROOT/zbe-0
s10z# zfs list -r rpool/ROOT
NAME               USED  AVAIL  REFER  MOUNTPOINT
rpool/ROOT        3.56G   269G    31K  legacy
rpool/ROOT/newBE  3.56G   269G  3.55G  /

Documentation

This feature is so new, it is not yet described in the Solaris 11 documentation. However, MOS note 1558773.1 offers some details.

Conclusion

With this new feature, you can add and patch packages to boot environments of a Solaris 10 Branded Zone. This ability improves the manageability of these zones, and makes their use more practical. It also means that you can use the existing P2V tools with earlier Solaris 10 updates, and modify the environments after they become Solaris 10 Branded Zones.

Fastest Engineered System

Quick! It's not too late - you can still register for today's webcast announcement of the new Oracle SuperCluster T5-8.

Wednesday Jun 12, 2013

Comparing Solaris 11 Zones to Solaris 10 Zones

Many people have asked whether Oracle Solaris 11 uses sparse-root zones or whole-root zones. I think the best answer is "both and neither, and more" - but that's a wee bit confusing. :-) This blog entry attempts to explain that answer.

First a recap: Solaris 10 introduced the Solaris Zones feature set, way back in 2005. Zones are a form of server virtualization called "OS (Operating System) Virtualization." They improve consolidation ratios by isolating processes from each other so that they cannot interact. Each zone has its own set of users, naming services, and other software components. One of the many advantages is that there is no need for a hypervisor, so there is no performance overhead. Many data centers run tens to hundreds of zones per server!

In Solaris 10, there are two models of package deployment for Solaris Zones. One model is called "sparse-root" and the other "whole-root." Each form has specific characteristics, abilities, and limitations.

A whole-root zone has its own copy of the Solaris packages. This allows the inclusion of other software in system directories - even though that practice has been discouraged for many years. Although it is also possible to modify the Solaris content in such a zone, e.g. patching a zone separately from the rest, this was highly frowned on. :-( (More importantly, modifying the Solaris content in a whole-root zone may lead to an unsupported configuration.)

The other model is called "sparse-root." In that form, instead of copying all of the Solaris packages into the zone, the directories containing Solaris binaries are re-mounted into the zone. This allows the zone's users to access them at their normal places in the directory tree. Those are read-only mounts, so a zone's root user cannot modify them. This improves security, and also reduces the amount of disk space used by the zone - 200MB instead of the usual 3-5 GB per zone. These loopback mounts also reduce the amount of RAM used by zones because Solaris only stores in RAM one copy of a program that is in use by several zones. This model also has disadvantages. One disadvantage is the inability to add software into system directories such as /usr. Also, although a sparse-root can be migrated to another Solaris 10 system, it cannot be moved to a Solaris 11 system as a "Solaris 10 Zone."

In addition to those contrasting characteristics, here are some characteristics of zones in Solaris 10 that are shared by both packaging models:

  • A zone can modify its own configuration files in /etc.
  • A zone can be configured so that it manages its own networking, or so that it cannot modify its network configuration.
  • It is difficult to give a non-root user in the global zone the ability to boot and stop a zone, without giving that user other abilities.
  • In a zone that can manage its own networking, the root user can do harmful things like spoof other IP addresses and MAC addresses.
  • It is difficult to assign network patcket processing to the same CPUs that a zone used. This could lead to unpredictable performance and performance troubleshooting challenges.
  • You cannot run a large number of zones in one system (e.g. 50) that each managed its own networking, because that would require assignment of more physical NICs than available (e.g. 50).
  • Except when managed by Ops Center, zones could not be safely stored on NAS.
  • Solaris 10 Zones cannot be NFS servers.
  • The fsstat command does not report statistics per zone.

Solaris 11 Zones use the new packaging system of Solaris 11. Their configuration does not offer a choice of packaging models, as Solaris 10 does. Instead, two (well, four) different models of "immutability" (changeability) are offered. The default model allows a privileged zone user to modify the zone's content. The other (three) limit the content which can be changed: none, or two overlapping sets of configuration files. (See "Configuring and Administering Immutable Zones".)

Solaris 11 addresses many of those limitations. With the characteristics listed above in mind, the following table shows the similarities and differences between zones in Solaris 10 and in Solaris 11. (Cells in a row that are similar have the same background color.)

Characteristic Solaris 10
Whole-Root
Solaris 10
Sparse-Root
Solaris 11 Solaris 11
Immutable Zones
Each zone has a copy of most Solaris packagesYesNo YesYes
Disk space used by a zone (typical)3.5 GB100 MB 500MB500MB
A privileged zone user can add software to /usrYesNo YesNo
A zone can modify its Solaris programsTrueFalse TrueFalse
Each zone can modify its configuration filesYesYes YesNo
Delegated administrationNoNo YesYes
A zone can be configured to manage its own networkingYesYes YesYes
A zone can be configured so that it cannot manage its own networkingYesYes YesYes
A zone can be configured with resource controlsYesYes YesYes
Integrated tool to measure a zone's resource consumption (zonestat)NoNo YesYes
Network processing automatically happens on that zone's CPUsNoNo YesYes
Zones can be NFS serversNoNoYesYes
Per-zone fsstat dataNoNoYesYes

As you can see, the statement "Solaris 11 Zones are whole-root zones" is only true using the narrowest definition of whole-root zones: those zones which have their own copy of Solaris packaging content. But there are other valuable characteristics of sparse-root zones that are still available in Solaris 11 Zones. Also, some Solaris 11 Zones do not have some characteristics of whole-root zones.

For example, the table above shows that you can configure a Solaris 11 zone that has read-only Solaris content. And Solaris 11 takes that concept further, offering the ability to tailor that immutability. It also shows that Solaris 10 sparse-root and whole-root zones are more similar to each other than to Solaris 11 Zones.

Conclusion

Solaris 11 Zones are slightly different from Solaris 10 Zones. The former can achieve the goals of the latter, and they also offer features not found in Solaris 10 Zones. Solaris 11 Zones offer the best of Solaris 10 whole-root zones and sparse-root zones, and offer an array of new features that make Zones even more flexible and powerful.

Tuesday Apr 23, 2013

More SPARC T5 Performance Results

Performance results for the new SPARC T5 systems keep coming in...

Last week, SPEC published the most recent result for the SPECjbb013-MultiJVM benchmark. This benchmark "is relevant to all audiences who are interested in Java server performance, including JVM vendors, hardware developers, Java application developers, researchers and members of the academic community" according to SPEC.

All of the published results are at: http://www.spec.org/jbb2013/results/jbb2013multijvm.html.

For the first table below, I selected all of the max-JOPS results greater than 50,000 JOPS using the most recent Java version, for the SPARC T5-2 and for competing systems. From the SPECjbb2013 data, I derived two new values, max-JOPS/chip and max-JOPS/core. The latter value compensates for the different quantity of cores used in one of the tests. Finally, the "Advantage of T5" column shows the portion by which the T5-2 cores perform better than the other systems' cores. For example, on this benchmark a 32-core T5-2 computer demonstrated 15% better per-core performance than an HP DL560p with the same number of cores.

As you can see, a SPARC T5 core is faster than an Intel Xeon core, compared against competing systems with 32 or more cores.

Model CPU Chips Cores OS max-JOPS Date Published max-JOPS per chip max-JOPS per core Advantage of T5
SPARC T5-2 SPARC T5 2 32 Solaris 11.1 75658 April 2013 37829 2364
HP ProLiant DL560p Gen8 Intel E5-4650 4 32 Windows Server 2008 R2 Enterprise 67850 April 2013 16963 2120 12%
HP ProLiant DL560p Gen8 Intel E5-4650 4 32 RHEL 6.3 66007 April 2013 16502 2063 15%
HP ProLiant DL980 G7 Intel E7-4870 8 80 RHEL 6.3 106141 April 2013 13268 1327 78%

The SPECjbb2013 benchmark also includes a performance measure called "critical-JOPS." This measurement represents the ability of a system to achieve high levels of throughput while still maintaining a short response time. The performance advantage of the T5 cores is even more pronounced.

Model CPU Chips Cores OS critical- JOPS Date Published critical- JOPS per chip critical- JOPS per core Advantage of T5
SPARC T5-2 SPARC T5 2 32 Solaris 11.1 23334 April 2013 11667 729
HP ProLiant DL560p Gen8 Intel E5-4650 4 32 Windows Server 2008 R2 Enterprise 16199 April 2013 4050 506 44%
HP ProLiant DL560p Gen8 Intel E5-4650 4 32 RHEL 6.3 18049 April 2013 4512 564 29%
HP ProLiant DL980 G7 Intel E7-4870 8 80 RHEL 6.3 23268 April 2013 2909 291 151%

As always, care should be taken in choosing a benchmark that is similar to the workload that you will run on a computer. For example, if you plan to implement a database server, using the SPECint benchmark will not help you, because that benchmark merely measures the performance of the CPU cores and speed and size of memory caches (and perhaps the memory system). It does not measure performance of network or disk I/O, and both of those are important factors in database performance - especially storage I/O.

According to the SPECjbb2013 design document, this benchmark "exercises the CPU, memory and network I/O, but not disk I/O." Because of this, it can be used as a simple method to estimate relative Java processing performance. From the data shown in the tables above, it is clear that the newest SPARC cores deliver Java performance that is competitive with the most recent Intel Xeon CPU cores.

Edit [2013.04.23]: Jim Laurent uses the same benchmark results in a quick look at the smooth scalability of Solaris 11, compared to RHEL6.

For more information on recent SPARC T5 world records, see https://blogs.oracle.com/BestPerf/.

SPEC and the benchmark name SPECjbb are registered trademarks of Standard Performance Evaluation Corporation (SPEC). Results as of 4/21/2013, see http://www.spec.org for more information.

Wednesday Apr 17, 2013

IDC Reviews New SPARC T5 and M5 Servers

On April 5, IDC published an article that provides their view of the recent announcement of Oracle T5 and M5 servers.

IDC's conclusion: "Oracle has invested deeply in improving`the performance of the T-series processors it developed following its acquisition of Sun Microsystems in 2010. It has pushed its engineering efforts to release new SPARC processor technology — providing a much more competitive general-purpose server platform. This will provide an immediate improvement for its large installed base, even as it lends momentum to a new round of competition in the Unix server marketplace."

IDC also noted the "dramatic performance gains for SPARC, with 16-core microprocessor technology based on three years of IP (intellectual property) development at Oracle, following Oracle's acquisition of Sun Microsystems Inc. in January, 2010."

The new T5 servers use SPARC T5 processor chips that offer more than double the performance of SPARC T4 chips, which were released just over a year ago. And the T4 chips, in turn, were a significant departure from all previous SPARC CMT CPUs, in that the T4 chips offered excellent performance for single-threaded workloads.

The new M5 servers use up to 32 SPARC M5 processors, each using the same "S3" SPARC cores as the T5 chips.

Wednesday Mar 27, 2013

New SPARC Servers Outrun the Competition - by Leaps and Bounds!

In case you missed yesterday's launch of Oracle's new SPARC server line, based on the SPARC T5 and M5 processors, here is a brief summary - and the obligatory links to details...

The new SPARC T5 chip uses the "S3" core which has been in the SPARC T4 generation for over a year. That core offers, among other things, 8 hardware threads, two simultaneous integer pipelines, and some other compute units as well (FP, crypto, etc.). The S3 core also includes the instructions necessary to work with the SPARC hypervisor that implements SPARC virtual machines (Oracle VM Server for SPARC, previously called Logical Domains or LDoms) including Live Migration of VMs.

However, four significant improvements have been made in the new systems:

  1. 16 cores on each T5 chip, instead of T4's 8 cores per chip, was made possible because of a die shrink (to 28 nm).
  2. An increase in clock rate to 3.6 GHz yields an immediate 20% improvement in processing over T4 systems.
  3. Increased chip scalability allows up to 8 CPU chips per system, doubling the maximum number of cores in the mid-range systems.
  4. In addition to the mid-range servers, now the high-end M5-32 also supports OVM Server for SPARC (LDoms), while maintaining the ability to use hard partitions (Dynamic Domains) in that system. (The T5-based servers (PDF) also have LDoms, just like the T4-based systems.)

The result of those is the "world's fastest microprocessor." Between the four T5 (mid-range) systems and the M5-32, this new generation of systems has already achieved 17 performance world records.

Some of the simpler comparisons that were made yesterday include (see the press release for substantiation):

  1. An Oracle T5-8 (8 CPU sockets) running Solaris has a higher SAP performanace rating than an IBM Power 780 (8 sockets) running AIX.
  2. A single, 2-socket T5-2 has three times the performance, at 13% of the cost, of two Power 770's - on a JD Edwards performance test.
  3. Two T5-2 servers have almost double the Siebel performance of two Power 750 servers - at one-fourth the price.
  4. One 8-processor T5-8 outperforms an 8-processor Power 780 - at one-seventh the cost - on the common SPECint_rate 2006 benchmark.

The new high-end SPARC system - the M5-32 - sports 192 cores (1,536 hardware threads) of compute power. It can also be packed with 32 TB (yes, terabytes!) of RAM. Put your largest DB entirely in RAM, and see how fast it goes!

Oracle has refreshed its entire SPARC server line all at once, greatly improving performance - not only compared to the previous SPARC generation, but also compared to the current generation of servers from other manufacturers.

Monday Mar 25, 2013

New SPARC Chips, New Servers

On Tuesday, Oracle will announce new SPARC servers with the world's fastest microprocessor. Considering that the current SPARC processors already have performance comparable with the newest from competing architectures, the performance of these new processors should give you the best real-world performance for your enterprise workloads.

You can register to watch the event live at 4:00 PM EDT (New York).

Wednesday Feb 13, 2013

Solaris Stories

Demand for Solaris continues to increase, as shown by these recent customer references:

Tuesday Feb 12, 2013

Solaris 10 1/13 (aka "Update 11") Released

Larry Wake lets us know that Solaris 10 1/13 has been released and is available for download.

Tuesday Jan 22, 2013

Analyst commentary on Solaris and SPARC

Forrester's Richard Fichera updates and confirms his earlier views on the present and future of Solaris and SPARC.

Wednesday Nov 14, 2012

Webcast: New Features of Solaris 11.1 and Solaris Cluster 4.1

If you missed last week's webcast of the new features in Oracle Solaris 11.1 you can view the recording. The speakers discuss changes that improve performance and scalability, particularly for Oracle DB, and many other enhancements.

New features include Optimized Shared Memory (improves DB startup time), accelerated kernel locks (improves Oracle RAC performance and scalability), virtual memory improvements, a DTrace data collecter in the DB, Zones installed on Shared Storage (simplifies migration), Data Center Bridging, and Edge Virtual Bridging.

To view the archived webcast, you must register and use the URL that you receive in e-mail.

Tuesday Nov 13, 2012

Oracle Solaris: Zones on Shared Storage

Oracle Solaris 11.1 has several new features. At oracle.com you can find a detailed list.

One of the significant new features, and the most significant new feature releated to Oracle Solaris Zones, is casually called "Zones on Shared Storage" or simply ZOSS (rhymes with "moss"). ZOSS offers much more flexibility because you can store Solaris Zones on shared storage (surprise!) so that you can perform quick and easy migration of a zone from one system to another. This blog entry describes and demonstrates the use of ZOSS.

ZOSS provides complete support for a Solaris Zone that is stored on "shared storage." In this case, "shared storage" refers to fiber channel (FC) or iSCSI devices, although there is one lone exception that I will demonstrate soon. The primary intent is to enable you to store a zone on FC or iSCSI storage so that it can be migrated from one host computer to another much more easily and safely than in the past.

With this blog entry, I wanted to make it easy for you to try this yourself. I couldn't assume that you have a SAN available - which is a good thing, because neither do I! :-) What could I use, instead? [There he goes, foreshadowing again... -Ed.]

Developing this entry reinforced the lesson that the solution to every lab problem is VirtualBox. ;-) Oracle VM VirtualBox (its formal name) helps here in a couple of important ways. It offers the ability to easily install multiple copies of Solaris as guests on top of any popular system (Microsoft Windows, MacOS, Solaris, Oracle Linux (and other Linuxes) etc.). It also offers the ability to create a separate virtual disk drive (VDI) that appears as a local hard disk to a guest. This virtual disk can be moved very easily from one guest to another. In other words, you can follow the steps below on a laptop or larger x86 system.

Please note that the ability to use ZOSS to store a zone on a local disk is very useful for a lab environment, but not so useful for production. I do not suggest regularly moving disk drives among computers. [Update, 2013.01.28: Apparently the previous sentence caused some confusion. I do recommend the use of Zones on Shared Storage in production environments, when appropriate storage is used. "Appropriate storage" would include SAN or iSCSI at this point. I do not recommend using ZOSS with local disks in production because doing so would require moving the disks between computers.]

In the method I describe below, that virtual hard disk will contain the zone that will be migrated among the (virtual) hosts. In production, you would use FC or iSCSI LUNs instead. The zonecfg(1M) man page details the syntax for each of the three types of devices.

Why Migrate?

Why is the migration of virtual servers important? Some of the most common reasons are:
  • Moving a workload to a different computer so that the original computer can be turned off for extensive maintenance.
  • Moving a workload to a larger system because the workload has outgrown its original system.
  • If the workload runs in an environment (such as a Solaris Zone) that is stored on shared storage, you can restore the service of the workload on an alternate computer if the original computer has failed and will not reboot.
  • You can simplify lifecycle management of a workload by developing it on a laptop, migrating it to a test platform when it's ready, and finally moving it to a production system.

Concepts

For ZOSS, the important new concept is named "rootzpool". You can read about it in the zonecfg(1M) man page, but here's the short version: it's the backing store (hard disk(s), or LUN(s)) that will be used to make a ZFS zpool - the zpool that will hold the zone. This zpool:

  • contains the zone's Solaris content, i.e. the root file system
  • does not contain any content not related to the zone
  • can only be mounted by one Solaris instance at a time

Method Overview

Here is a brief list of the steps to create a zone on shared storage and migrate it. The next section shows the commands and output.
  1. You will need a host system with an x86 CPU (hopefully at least a couple of CPU cores), at least 2GB of RAM, and at least 25GB of free disk space. (The steps below will not actually use 25GB of disk space, but I don't want to lead you down a path that ends in a big sign that says "Your HDD is full. Good luck!")
  2. Configure the zone on both systems, specifying the rootzpool that both will use. The best way is to configure it on one system and then copy the output of "zonecfg export" to the other system to be used as input to zonecfg. This method reduces the chances of pilot error. (It is not necessary to configure the zone on both systems before creating it. You can configure this zone in multiple places, whenever you want, and migrate it to one of those places at any time - as long as those systems all have access to the shared storage.)
  3. Install the zone on one system, onto shared storage.
  4. Boot the zone.
  5. Provide system configuration information to the zone. (In the Real World(tm) you will usually automate this step.)
  6. Shutdown the zone.
  7. Detach the zone from the original system.
  8. Attach the zone to its new "home" system.
  9. Boot the zone.
The zone can be used normally, and even migrated back, or to a different system.

Details

The rest of this shows the commands and output. The two hostnames are "sysA" and "sysB".

Note that each Solaris guest might use a different device name for the VDI that they share. I used the device names shown below, but you must discover the device name(s) after booting each guest. In a production environment you would also discover the device name first and then configure the zone with that name. Fortunately, you can use the command "zpool import" or "format" to discover the device on the "new" host for the zone.

The first steps create the VirtualBox guests and the shared disk drive. I describe the steps here without demonstrating them.

  1. Download VirtualBox and install it using a method normal for your host OS. You can read the complete instructions.
  2. Create two VirtualBox guests, each to run Solaris 11.1. Each will use its own VDI as its root disk.
  3. Install Solaris 11.1 in each guest.Install Solaris 11.1 in each guest. To install a Solaris 11.1 guest, you can either download a pre-built VirtualBox guest, and import it, or install Solaris 11.1 from the "text install" media. If you use the latter method, after booting you will not see a windowing system. To install the GUI and other important things, login and run "pkg install solaris-desktop" and take a break while it installs those important things.
  4. Life is usually easier if you install the VirtualBox Guest Additions because then you can copy and paste between the host and guests, etc. You can find the guest additions in the folder matching the version of VirtualBox you are using. You can also read the instructions for installing the guest additions.
  5. To create the zone's shared VDI in VirtualBox, you can open the storage configuration for one of the two guests, select the SATA controller, and click on the "Add Hard Disk" icon nearby. Choose "Create New Disk" and specify an appropriate path name for the file that will contain the VDI. The shared VDI must be at least 1.5 GB. Note that the guest must be stopped to do this.
  6. Add that VDI to the other guest - using its Storage configuration - so that each can access it while running. The steps start out the same, except that you choose "Choose Existing Disk" instead of "Create New Disk." Because the disk is configured on both of them, VirtualBox prevents you from running both guests at the same time.
  7. Identify device names of that VDI, in each of the guests. Solaris chooses the name based on existing devices. The names may be the same, or may be different from each other. This step is shown below as "Step 1."

Assumptions

In the example shown below, I make these assumptions.
  • The guest that will own the zone at the beginning is named sysA.
  • The guest that will own the zone after the first migration is named sysB.
  • On sysA, the shared disk is named /dev/dsk/c7t2d0
  • On sysB, the shared disk is named /dev/dsk/c7t3d0

(Finally!) The Steps

Step 1) Determine the name of the disk that will move back and forth between the systems.
root@sysA:~# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c7t0d0 
          /pci@0,0/pci8086,2829@d/disk@0,0
       1. c7t2d0 
          /pci@0,0/pci8086,2829@d/disk@2,0
Specify disk (enter its number): ^D
Step 2) The first thing to do is partition and label the disk. The magic needed to write an EFI label is not overly complicated.
root@sysA:~# format -e c7t2d0
selecting c7t2d0
[disk formatted]

FORMAT MENU:
...
format> fdisk
No fdisk table exists. The default partition for the disk is:

  a 100% "SOLARIS System" partition

Type "y" to accept the default partition,  otherwise type "n" to edit the
 partition table. n
SELECT ONE OF THE FOLLOWING:
...
Enter Selection: 1
...
  G=EFI_SYS    0=Exit? f
SELECT ONE...
...
6

format> label
...
Specify Label type[1]: 1
Ready to label disk, continue? y

format> quit

root@sysA:~# ls /dev/dsk/c7t2d0
/dev/dsk/c7t2d0

Step 3) Configure zone1 on sysA.
root@sysA:~# zonecfg -z zone1
Use 'create' to begin configuring a new zone.
zonecfg:zone1> create
create: Using system default template 'SYSdefault'
zonecfg:zone1> set zonename=zone1
zonecfg:zone1> set zonepath=/zones/zone1
zonecfg:zone1> add rootzpool
zonecfg:zone1:rootzpool> add storage dev:dsk/c7t2d0
zonecfg:zone1:rootzpool> end
zonecfg:zone1> exit
root@sysA:~#
oot@sysA:~# zonecfg -z zone1 info
zonename: zone1
zonepath: /zones/zone1
brand: solaris
autoboot: false
bootargs:
file-mac-profile:
pool:
limitpriv:
scheduling-class:
ip-type: exclusive
hostid:
fs-allowed:
anet:
...
rootzpool:
        storage: dev:dsk/c7t2d0
Step 4) Install the zone. This step takes the most time, but you can wander off for a snack or a few laps around the gym - or both! (Just not at the same time...)
root@sysA:~# zoneadm -z zone1 install
Created zone zpool: zone1_rpool
Progress being logged to /var/log/zones/zoneadm.20121022T163634Z.zone1.install
       Image: Preparing at /zones/zone1/root.

 AI Manifest: /tmp/manifest.xml.RXaycg
  SC Profile: /usr/share/auto_install/sc_profiles/enable_sci.xml
    Zonename: zone1
Installation: Starting ...

              Creating IPS image
Startup linked: 1/1 done
              Installing packages from:
                  solaris
                      origin:  http://pkg.us.oracle.com/support/
DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
Completed                            183/183   33556/33556  222.2/222.2  2.8M/s

PHASE                                          ITEMS
Installing new actions                   46825/46825
Updating package state database                 Done
Updating image state                            Done
Creating fast lookup database                   Done
Installation: Succeeded

        Note: Man pages can be obtained by installing pkg:/system/manual

 done.

        Done: Installation completed in 1696.847 seconds.


  Next Steps: Boot the zone, then log into the zone console (zlogin -C)

              to complete the configuration process.

Log saved in non-global zone as /zones/zone1/root/var/log/zones/zoneadm.20121022T163634Z.zone1.install
Step 5) Boot the Zone.
root@sysA:~# zoneadm -z zone1 boot
Step 6) Login to zone's console to complete the specification of system information.
root@sysA:~# zlogin -C zone1
Answer the usual questions and wait for a login prompt. Then you can end the console session with the usual "~." incantation.

Step 7) Shutdown the zone so it can be "moved."

root@sysA:~# zoneadm -z zone1 shutdown
Step 8) Detach the zone so that the original global zone can't use it.
root@sysA:~# zoneadm list -cv
  ID NAME             STATUS     PATH                           BRAND    IP
   0 global           running    /                              solaris  shared
   - zone1            installed  /zones/zone1                   solaris  excl
root@sysA:~# zpool list
NAME          SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool        17.6G  11.2G  6.47G  63%  1.00x  ONLINE  -
zone1_rpool  1.98G   484M  1.51G  23%  1.00x  ONLINE  -
root@sysA:~# zoneadm -z zone1 detach
Exported zone zpool: zone1_rpool
Step 9) Review the result and shutdown sysA so that sysB can use the shared disk.
root@sysA:~# zpool list
NAME    SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool  17.6G  11.2G  6.47G  63%  1.00x  ONLINE  -
root@sysA:~# zoneadm list -cv
  ID NAME             STATUS     PATH                           BRAND    IP
   0 global           running    /                              solaris  shared
   - zone1            configured /zones/zone1                   solaris  excl
root@sysA:~# init 0
Step 10) Now boot sysB and configure a zone with the parameters shown above in Step 1. (Again, the safest method is to use "zonecfg ... export" on sysA as described in section "Method Overview" above.) The one difference is the name of the rootzpool storage device, which was shown in the list of assumptions, and which you must determine by booting sysB and using the "format" or "zpool import" command.

When that is done, you should see the output shown next. (I used the same zonename - "zone1" - in this example, but you can choose any valid zonename you want.)

root@sysB:~# zoneadm list -cv
  ID NAME             STATUS     PATH                           BRAND    IP
   0 global           running    /                              solaris  shared
   - zone1            configured /zones/zone1                   solaris  excl
root@sysB:~# zonecfg -z zone1 info
zonename: zone1
zonepath: /zones/zone1
brand: solaris
autoboot: false
bootargs:
file-mac-profile:
pool:
limitpriv:
scheduling-class:
ip-type: exclusive
hostid:
fs-allowed:
anet:
        linkname: net0
...
rootzpool:
        storage: dev:dsk/c7t3d0
Step 11) Attaching the zone automatically imports the zpool.
root@sysB:~# zoneadm -z zone1 attach
Imported zone zpool: zone1_rpool
Progress being logged to /var/log/zones/zoneadm.20121022T184034Z.zone1.attach
    Installing: Using existing zone boot environment
      Zone BE root dataset: zone1_rpool/rpool/ROOT/solaris
                     Cache: Using /var/pkg/publisher.
  Updating non-global zone: Linking to image /.
Processing linked: 1/1 done
  Updating non-global zone: Auditing packages.
No updates necessary for this image.

  Updating non-global zone: Zone updated.
                    Result: Attach Succeeded.
Log saved in non-global zone as /zones/zone1/root/var/log/zones/zoneadm.20121022T184034Z.zone1.attach

root@sysB:~# zoneadm -z zone1 boot
root@sysB:~# zlogin zone1
[Connected to zone 'zone1' pts/2]
Oracle Corporation      SunOS 5.11      11.1    September 2012
Step 12) Now let's migrate the zone back to sysA. Create a file in zone1 so we can verify it exists after we migrate the zone back, then begin migrating it back.
root@zone1:~# ls /opt
root@zone1:~# touch /opt/fileA
root@zone1:~# ls -l /opt/fileA
-rw-r--r--   1 root     root           0 Oct 22 14:47 /opt/fileA
root@zone1:~# exit
logout

[Connection to zone 'zone1' pts/2 closed]
root@sysB:~# zoneadm -z zone1 shutdown
root@sysB:~# zoneadm -z zone1 detach
Exported zone zpool: zone1_rpool
root@sysB:~# init 0
Step 13) Back on sysA, check the status.
Oracle Corporation      SunOS 5.11      11.1    September 2012
root@sysA:~# zoneadm list -cv
  ID NAME             STATUS     PATH                           BRAND    IP
   0 global           running    /                              solaris  shared
   - zone1            configured /zones/zone1                   solaris  excl
root@sysA:~# zpool list
NAME    SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool  17.6G  11.2G  6.47G  63%  1.00x  ONLINE  -
Step 14) Re-attach the zone back to sysA.
root@sysA:~# zoneadm -z zone1 attach
Imported zone zpool: zone1_rpool
Progress being logged to /var/log/zones/zoneadm.20121022T190441Z.zone1.attach
    Installing: Using existing zone boot environment
      Zone BE root dataset: zone1_rpool/rpool/ROOT/solaris
                     Cache: Using /var/pkg/publisher.
  Updating non-global zone: Linking to image /.
Processing linked: 1/1 done
  Updating non-global zone: Auditing packages.
No updates necessary for this image.

  Updating non-global zone: Zone updated.
                    Result: Attach Succeeded.
Log saved in non-global zone as /zones/zone1/root/var/log/zones/zoneadm.20121022T190441Z.zone1.attach

root@sysA:~# zpool list
NAME          SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool        17.6G  11.2G  6.47G  63%  1.00x  ONLINE  -
zone1_rpool  1.98G   491M  1.51G  24%  1.00x  ONLINE  -
root@sysA:~# zoneadm -z zone1 boot
root@sysA:~# zlogin zone1
[Connected to zone 'zone1' pts/2]
Oracle Corporation      SunOS 5.11      11.1    September 2012
root@zone1:~# zpool list
NAME    SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool  1.98G   538M  1.46G  26%  1.00x  ONLINE  -
Step 15) Check for the file created on sysB, earlier.
root@zone1:~# ls -l /opt
total 1
-rw-r--r--   1 root     root           0 Oct 22 14:47 fileA

Next Steps

Here is a brief list of some of the fun things you can try next.
  • Add space to the zone by adding a second storage device to the rootzpool. Make sure that you add it to the configurations of both zones!
  • Create a new zone, specifying two disks in the rootzpool when you first configure the zone. When you install that zone, or clone it from another zone, zoneadm uses those two disks to create a mirrored pool. (Three disks will result in a three-way mirror, etc.)

Conclusion

Hopefully you have seen the ease with which you can now move Solaris Zones from one system to another.

About

Jeff Victor writes this blog to help you understand Oracle's Solaris and virtualization technologies.

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today