Tuesday Apr 29, 2014

Solaris 11.2 Highlights [Part 1] in 6 Minutes or Less

This is not the complete list, of course. Just a few hand-picked ones.

First things first, Solaris 11.2 beta is out.

URLs: Download | What's New in Solaris 11.2 | Information Library (documentation)


Zones related ..

Kernel Zones

Kernel Zones make it possible to run a non-global (local) zone at a kernel version different from that of the global zone. A kernel zone can be patched or updated independently, without rebooting the global zone. In other words, kernel zones are independent and isolated environments with a full kernel and user environment.

Creating a Kernel Zone:

  1. If not available, install the kernel zone brand
    # pkg install brand/brand-solaris-kz
  2. Create and install a kernel zone using the existing zonecfg and zoneadm commands. The only difference compared to creating a non-kernel zone (the zones we have been creating for the past 10 years) is the template to be used -- by default, SYSdefault template is used. To create a kernel zone, use SYSsolaris-kz template instead.

    # zonecfg -z <zone-name> create -t SYSsolaris-kz
    # zoneadm -z <zone-name> install
    # .. continue with the rest of the steps to complete zone configuration ..

Kernel Zones can be used in combination with logical domains (Oracle VM Server for SPARC), but cannot be used in combination with other virtualization solutions, such as Oracle VM VirtualBox, that do not support nested virtualization.

Live Zone Re-configuration

This release (11.2) added support for the dynamic re-configuration of local zones. Now the following configuration changes do not require a zone reboot.

  • Resource controls and pools
  • Network configuration
  • Adding or removing file systems
  • Adding or removing virtual and physical devices
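
As a rough sketch of what this looks like on the command line: to my understanding, 11.2 exposes live reconfiguration through `zoneadm apply` (push the persistent configuration to the running zone) and `zonecfg -r` (edit the live configuration directly). The helper below only prints the commands by default. The `run` wrapper, the function names and the zone name `appzone` are illustrative and not part of Solaris.

```shell
#!/bin/sh
# Dry-run wrapper: print the command unless DRY_RUN=0 is set in the environment.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Push the persistent (already committed) configuration to a running zone.
apply_live_config() {
    run zoneadm -z "$1" apply
}

# Open the live configuration of a running zone for temporary changes.
edit_live_config() {
    run zonecfg -z "$1" -r
}
```

Calling `apply_live_config appzone` prints the zoneadm command for review; setting DRY_RUN=0 executes it instead.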

Read-Only Global Zones

Recent releases of Solaris have support for Immutable Non-Global Zones already. Solaris 11.2 extends the immutable zone support to Global Zones. Immutable zones will have a read-only zone root.

Make a Global Zone Read-Only/Immutable by:

# zonecfg -z global set file-mac-profile=fixed-configuration

Installing Packages across multiple Non-Global Zones from the Global Zone

  • The -r option of pkg can be used to install, update, or uninstall software packages in all non-global zones from the global zone.
  • Use the -Z option along with -r to exclude a zone from the package operation. Similarly, use -z along with -r to apply the package operation only to a specific zone.
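
A small sketch of how the three variants fit together. The helper only builds and prints the pkg command line so it can be reviewed before running it in the global zone. The function name `pkg_recursive`, its `only`/`except` keywords and the zone name used below are made up for illustration.

```shell
#!/bin/sh
# Print a recursive pkg command line.
# Usage: pkg_recursive <install|update|uninstall> <package> [only|except <zone>]
pkg_recursive() {
    op=$1 pkg_name=$2 mode=${3:-all} zone=${4:-}
    case $mode in
        all)    echo "pkg $op -r $pkg_name" ;;            # every non-global zone
        only)   echo "pkg $op -r -z $zone $pkg_name" ;;   # just the named zone
        except) echo "pkg $op -r -Z $zone $pkg_name" ;;   # all but the named zone
        *)      echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}
```

For example, `pkg_recursive install storage/svm except appserv` prints the command that would install the package in every non-global zone other than appserv.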

Multiple Boot Environments for Solaris 10 Zones

Multiple BE support has been extended to Solaris 10 Zones in this release. This feature is useful when performing operations such as patching within a Solaris 10 environment running on a Solaris 11 system.

CMT Aware Zones and Resource Pool Configuration

It is now possible to allocate CMT based resources -- vCPUs, cores and sockets -- using the existing zonecfg and poolcfg commands. This is useful from a performance and/or licensing point of view, as it provides flexibility and control for managing licensing boundaries or dedicating hardware resources solely to a zone.

Cloud related ..

Centralized Cloud Management with OpenStack

Solaris 11.2 is the first release to incorporate a complete OpenStack distribution. OpenStack allows managing and sharing compute, network and storage resources in the data center through a centralized web portal. In other words, now administrators can set up an enterprise ready private cloud Infrastructure-as-a-Service (IaaS) environment with ease.

Check this quick How-To article out at Oracle Technology Network -- Getting Started with OpenStack on Oracle Solaris 11.2

Cloning and Disaster Recovery with Unified Archives

Unified Archives is a new native archive type that enables quick cloning for rapid application deployment in the cloud, as well as fast and reliable disaster recovery. Both bare metal and virtual environments are supported. Check the archiveadm(1M) man page for details.

Create a clone archive of a system
# archiveadm create ./clone.uar

Create bootable media
# archiveadm create-media ./archive.uar                           # USB image
# archiveadm create-media -f iso <other options> ./bootarch.uar   # ISO image

Create a full system recovery archive
# archiveadm create --recovery ./recovery.uar

Extract information from a Unified Archive
# archiveadm info somearchive.uar

To be continued .. Stay tuned.

Friday Apr 27, 2012

Solaris Volume Manager (SVM) on Solaris 11

SVM is not installed on Solaris 11 by default.

# metadb
-bash: metadb: command not found

# /usr/sbin/metadb
-bash: /usr/sbin/metadb: No such file or directory

Install it using the pkg utility.

# pkg info svm
pkg: info: no packages matching the following patterns you specified are
installed on the system.  Try specifying -r to query remotely:


# pkg info -r svm
          Name: storage/svm
       Summary: Solaris Volume Manager
   Description: Solaris Volume Manager commands
      Category: System/Core
         State: Not installed
     Publisher: solaris
       Version: 0.5.11
 Build Release: 5.11
Packaging Date: October 19, 2011 06:42:14 AM 
          Size: 3.48 MB
          FMRI: pkg://solaris/storage/svm@0.5.11,5.11-

# pkg install storage/svm
           Packages to install:   1
       Create boot environment:  No
Create backup boot environment: Yes
            Services to change:   1

DOWNLOAD                                  PKGS       FILES    XFER (MB)
Completed                                  1/1     104/104      1.6/1.6

PHASE                                        ACTIONS
Install Phase                                168/168 

PHASE                                          ITEMS
Package State Update Phase                       1/1 
Image State Update Phase                         2/2 

# which metadb

This time metadb may fail with a different error.

# metadb
metadb: <HOST>: /dev/md/admin: No such file or directory

Check if md.conf exists.

# ls -l  /kernel/drv/md.conf 
-rw-r--r--   1 root     sys          295 Apr 26 15:07 /kernel/drv/md.conf

Dynamically re-scan md.conf so the device tree gets updated.

# update_drv -f md

# ls -l  /dev/md/admin
lrwxrwxrwx   1 root root 31 Apr 20 10:12 /dev/md/admin -> ../../devices/pseudo/md@0:admin

# metadb
metadb: <HOST>: there are no existing databases

Now Solaris Volume Manager is ready to use.

#  metadb -f -a c0t5000CCA00A5A7878d0s0

# metadb
        flags           first blk       block count
     a        u         16              8192          /dev/dsk/c0t5000CCA00A5A7878d0s0

Friday Oct 08, 2010

Is it really Solaris versus Windows & Linux?

(Even though the title explicitly states "Solaris Versus .. ", this blog entry is equally applicable to all the operating systems in the world with a few changes.)

Lately I have seen quite a few e-mails and heard a few customer representatives talking about the performance of their application(s) on Solaris, Windows and Linux. Typically they go like the following, with a bunch of supporting data (all numbers) and no hardware configuration specified whatsoever.

  • "Transaction X is nearly twice as slow on Solaris compared to the same transaction running on Windows or Linux"
  • "Transaction X runs much faster on my Windows laptop than on a Solaris box"

Lack of awareness and taking the hardware completely out of the discussions and context are the biggest problems with complaints like these. Those claims make sense only when the underlying hardware is the same in all test cases. For example, comparing a single user, single threaded transaction running on Windows, Linux and Solaris on x86 hardware is appropriate (as long as the type and speed of the processor are identical), but not against Solaris running on SPARC hardware. This is mainly because the processor architecture is completely different for x86 and SPARC platforms.

Besides, these days Oracle offers two types of SPARC hardware - 1. T-series and 2. M-series, which serve different purposes though they are compatible with each other. It is hard to compare and analyze the performance differences between different SPARC offerings (T- and M-series) too without a proper understanding of the characteristics of the CPUs in use. Choosing the right hardware for the right job is the key.

It is improper to compare the business transactions running on x86 with SPARC systems or even between different types of SPARC systems, and to incorrectly attribute the hardware strength or weakness to the operating system that runs on top of the bare metal. If there is so much discrepancy among different operating environments, it is recommended to spend some time understanding the nuances of the testing hardware before spending enormous amounts of time trying to tune the application and the operating system.

The bottom line: in addition to the software (application + OS), hardware plays an important role in the performance and scalability of an application. So, unless the testing hardware is the same for all test cases on different operating systems, don't focus on the operating system alone and make hasty decisions to switch to other operating platforms. Carefully choose appropriate hardware for the task at hand.

Sunday Jul 19, 2009

Solaris 10: Zone Creation for Dummies

(Reproducing the three-and-a-half-year-old blog entry, a top 5 one, "as is" from my other blog hosted on blogger. Source URL: http://technopark02.blogspot.com/2006/02/solaris-10-zone-creation-for-dummies.html)

About Zones

In its simplest form, a zone is a virtual operating system environment created within a single instance of the Solaris operating system. Efficient resource utilization is the main goal of this technology.

Solaris 10's zone partitioning technology can be used to create local zones that behave like virtual servers. All local zones are controlled from the system's global zone. Processes running in a zone are completely isolated from the rest of the system. This isolation prevents processes that are running in one zone from monitoring or affecting processes that are running in other zones. Note that processes running in a local zone can be monitored from global zone; but the processes running in a global zone or even in another local zone cannot be monitored from a local zone.

As of now, the upper limit for the number of zones that can be created/run on a system is 8192; of course, depending on the resource availability, a single system may or may not run all the configured zones effectively.

Global Zone

When we install Solaris 10, a global zone gets installed automatically, and the core operating system runs under the global zone. To list the running zones, we can use the zoneadm command (add the -c option to include zones that are configured but not running):

 % zoneadm list -v
  ID NAME             STATUS         PATH
   0 global           running        /

Global zone is the only one:

  • bootable from the system hardware
  • to be used for system-wide administrative control, such as physical devices, routing, or dynamic reconfiguration (DR). i.e., the global zone is the only zone that is aware of all devices and all file systems
  • from which a non-global zone can be configured, installed, managed, or uninstalled. i.e., the global zone is the only zone that is aware of the existence of non-global (local) zones and their configurations. It is not possible to create local zones within a local zone

Steps to create a Local Zone


Prerequisites:

  • Plenty of disk space to hold the newly installed zone. It needs at least 2G of space to copy the essential files to the local zone, plus the disk space needed by the application(s) you are planning to run in this zone; and
  • A dedicated IP for network connectivity

Basic Zone creation steps with examples:

  1. Check the disk space & network configuration
     % df -h /
     Filesystem             size   used  avail capacity  Mounted on
     /dev/dsk/c1t1d0s0       29G    22G   7.1G    76%    /
     % ifconfig -a
     lo0: flags=2001000849 mtu 8232 index 1
             inet netmask ff000000
     eri0: flags=1000843 mtu 1500 index 2
             inet netmask fffffe00 broadcast
  2. Since there is more than 5G free space, I've decided to install a local zone under /zones.
     % mkdir /zones
  3. Next step is to define/create the zone root. This is the path to zone's root directory that is relative to the global zone's root directory. Zone root must be owned by root user with the mode 700. This will be used in setting the zonepath property, during the zone creation process
     % cd /zones
     % mkdir appserver
     % chmod 700 appserver
     % ls -l
     total 2
     drwx------   2 root     root         512 Feb 17 12:46 appserver
  4. Create & configure a new 'sparse root' local zone, with root privileges
     % zonecfg -z appserv
     appserv: No such zone configured
     Use 'create' to begin configuring a new zone.
     zonecfg:appserv> create
     zonecfg:appserv> set zonepath=/zones/appserver
     zonecfg:appserv> set autoboot=true
     zonecfg:appserv> add net
     zonecfg:appserv:net> set physical=eri0
     zonecfg:appserv:net> set address=
     zonecfg:appserv:net> end
     zonecfg:appserv> add fs
     zonecfg:appserv:fs> set dir=/repo2
     zonecfg:appserv:fs> set special=/dev/dsk/c2t40d1s6
     zonecfg:appserv:fs> set raw=/dev/rdsk/c2t40d1s6
     zonecfg:appserv:fs> set type=ufs
     zonecfg:appserv:fs> set options=[noforcedirectio]
     zonecfg:appserv:fs> end
     zonecfg:appserv> add inherit-pkg-dir
     zonecfg:appserv:inherit-pkg-dir> set dir=/opt/csw
     zonecfg:appserv:inherit-pkg-dir> end
     zonecfg:appserv> info
     zonepath: /zones/appserver
     autoboot: true
             dir: /lib
             dir: /platform
             dir: /sbin
             dir: /usr
             dir: /opt/csw
             physical: eri0
     zonecfg:appserv> verify
     zonecfg:appserv> commit
     zonecfg:appserv> exit

    Sparse Root Zone Vs Whole Root Zone (Updated 05/07/2008)

    In a Sparse Root Zone, the directories /usr, /sbin, /lib and /platform will be mounted as loopback file systems. That is, although all those directories appear as normal directories under the sparse root zone, they will be mounted as read-only file systems. Any change to those directories in the global zone can be seen from the sparse root zone.

    However, if you need the ability to write into any of the directories listed above, you may need to configure a Whole Root Zone. For example, software such as ClearCase needs write permission to the /usr directory. In that case configuring a Whole Root Zone is the way to go. The steps for creating and configuring a new 'Whole Root' local zone are as follows:

     % zonecfg -z appserv
     appserv: No such zone configured
     Use 'create' to begin configuring a new zone.
     zonecfg:appserv> create
     zonecfg:appserv> set zonepath=/zones/appserver
     zonecfg:appserv> set autoboot=true
     zonecfg:appserv> add net
     zonecfg:appserv:net> set physical=eri0
     zonecfg:appserv:net> set address=
     zonecfg:appserv:net> end
     zonecfg:appserv> add inherit-pkg-dir
     zonecfg:appserv:inherit-pkg-dir> set dir=/opt/csw
     zonecfg:appserv:inherit-pkg-dir> end
     zonecfg:appserv> remove inherit-pkg-dir dir=/usr
     zonecfg:appserv> remove inherit-pkg-dir dir=/sbin
     zonecfg:appserv> remove inherit-pkg-dir dir=/lib
     zonecfg:appserv> remove inherit-pkg-dir dir=/platform
     zonecfg:appserv> info
     zonepath: /zones/appserver
     autoboot: true
             dir: /opt/csw
             physical: eri0
     zonecfg:appserv> verify
     zonecfg:appserv> commit
     zonecfg:appserv> exit

    Brief explanation of the properties that I added:

    • zonepath=/zones/appserver

    Local zone's root directory, relative to the global zone's root directory. i.e., the local zone will have all the bin, lib, usr, dev, net, etc, var, opt and other directories physically under the /zones/appserver directory

    • autoboot=true

    boot this zone automatically when the global zone is booted

    • physical=eri0

    eri0 card is used for the physical interface

    • address= is the IP address. It must have all necessary DNS entries

    [Added 08/25/08] The whole add fs section adds the file system to the zone. In this example, the file system that is being exported to the zone is an existing UFS file system.

    • set dir=/repo2

    /repo2 is the mount point in the local zone

    • set special=/dev/dsk/c2t40d1s6 set raw=/dev/rdsk/c2t40d1s6

    Grant access to the block (/dev/dsk/c2t40d1s6) and raw (/dev/rdsk/c2t40d1s6) devices so the file system can be mounted in the non-global zone. Make sure the block device is not mounted anywhere right before installing the non-global zone. Otherwise, the zone installation may fail with ERROR: file system check </usr/lib/fs/ufs/fsck> of </dev/rdsk/c2t40d1s6> failed: exit status <33>: run fsck manually. In that case, unmount the file system that is being exported, uninstall the partially installed zone (zoneadm -z <zone> uninstall) then install the zone from the scratch (no need to re-configure the zone, just do a re-install).

    • set type=ufs

    The file system is of type UFS

    • set options=[noforcedirectio]

    Mount the file system with the option noforcedirectio [/Added 08/25/08]

    • dir=/opt/csw

    read-only path, will be lofs'd (loopback mounted) from the global zone. Note: it works for sparse root zones only -- a whole root zone cannot have any shared file systems

    The zonecfg subcommands verify and commit verify and commit the zone configuration, respectively. Note that it is not necessary to commit the zone configuration explicitly; it is done automatically when we exit from the zonecfg tool. info displays information about the current configuration.

  5. Check the state of the newly created/configured zone
     % zoneadm list -cv
       ID NAME             STATUS         PATH
        0 global           running        /
        - appserv          configured     /zones/appserver
  6. Next step is to install the configured zone. It takes a while to install the necessary packages
     % zoneadm -z appserv install
     /zones must not be group writable.
     could not verify zonepath /zones/appserver because of the above errors.
     zoneadm: zone appserv failed to verify
     % ls -ld /zones
     drwxrwxr-x   3 root     root         512 Feb 17 12:46 /zones

    Since /zones must not be group writable, let's change the mode to 700.

     % chmod 700 /zones
     % ls -ld /zones
     drwx------   3 root     root         512 Feb 17 12:46 /zones
     % zoneadm -z appserv install
     Preparing to install zone .
     Creating list of files to copy from the global zone.
     Copying <2658> files to the zone.
     Initializing zone product registry.
     Determining zone package initialization order.
     Preparing to initialize <1128> packages on the zone.
     Initialized <1128> packages on zone.
     Zone  is initialized.
     Installation of these packages generated errors: 
     Installation of <2> packages was skipped.
     Installation of these packages generated warnings: <CSWbdb3 CSWtcpwrap 
      CSWreadline CSWlibnet CSWlibpcap CSWjpeg CSWzlib  CSWcommon CSWpkgget SMCethr CSWxpm 
      SMClsof SMClibgcc SMCossld OpenSSH SMCtar SUNWj3dmx CSWexpat CSWftype2 CSWfconfig 
      CSWiconv  CSWggettext CSWlibatk CSWpango CSWpng CSWtiff CSWgtk2 CSWpcre CSWlibmm 
      CSWgsed CSWlibtool CSWncurses CSWunixodbc CSWoldap  CSWt1lib CSWlibxml2 CSWbzip2 
      CSWlibidn CSWphp>
     The file  contains a log of the zone installation.
  7. Verify the state of the appserv zone, one more time
     % zoneadm list -cv
       ID NAME             STATUS         PATH
        0 global           running        /
        - appserv          installed      /zones/appserver
  8. Boot up the appserv zone. Let's note down the ifconfig output to see how it changes after the local zone boots up. Also observe that there is no answer from the server yet, since it is not up
     % ping
     no answer from
     % ifconfig -a
     lo0: flags=2001000849 mtu 8232 index 1
             inet netmask ff000000
     eri0: flags=1000843 mtu 1500 index 2
             inet netmask fffffe00 broadcast
             ether 0:3:ba:2d:0:84
     % zoneadm -z appserv boot
     zoneadm: zone 'appserv': WARNING: eri0:1: no matching subnet found in netmasks(4) for; 
     using default of
     % zoneadm list -cv
       ID NAME             STATUS         PATH
        0 global           running        /
        1 appserv          running        /zones/appserver
     % ping
     is alive
     % ifconfig -a
     lo0: flags=2001000849 mtu 8232 index 1
             inet netmask ff000000
     lo0:1: flags=2001000849 mtu 8232 index 1
             zone appserv
             inet netmask ff000000
     eri0: flags=1000843 mtu 1500 index 2
             inet netmask fffffe00 broadcast
             ether 0:3:ba:2d:0:84
     eri0:1: flags=1000843 mtu 1500 index 2
             zone appserv
             inet netmask ffff0000 broadcast

    Observe that the zone appserv has its own virtual instance of lo0, the system's loopback interface, and that the zone's IP address is also being served by the eri0 network interface.

  9. Log in to the zone console and perform the internal zone configuration. The zlogin utility can be used to enter a zone. The first time we log in to the console, we get a chance to answer a series of questions for the desired zone configuration. The -C option of zlogin can be used to log in to the zone console.
     % zlogin -C -e [ appserv
     [Connected to zone 'appserv' console]
     Select a Language
       0. English
       1. es
       2. fr
     Please make a choice (0 - 2), or press h or ? for help: 0
     Select a Locale
       0. English (C - 7-bit ASCII)
       1. Canada (English) (UTF-8)
       2. Canada-English (ISO8859-1)
       3. U.S.A. (UTF-8)
       4. U.S.A. (en_US.ISO8859-1)
       5. U.S.A. (en_US.ISO8859-15)
       6. Go Back to Previous Screen
     Please make a choice (0 - 6), or press h or ? for help: 0
      Enter the host name which identifies this system on the network.  The name
       must be unique within your domain; creating a duplicate host name will cause
       problems on the network after you install Solaris.
       A host name must have at least one character; it can contain letters,
       digits, and minus signs (-).
         Host name for eri0:1 appserv v440appserv
     System identification is completed. 
     rebooting system due to change(s) in /etc/default/init
     [NOTICE: Zone rebooting]
     SunOS Release 5.11 Version snv_23 64-bit
     Copyright 1983-2005 Sun Microsystems, Inc.  All rights reserved.
     Use is subject to license terms.
     Hostname: v440appserv
     v440appserv console login: root
     Feb 17 15:15:30 v440appserv login: ROOT LOGIN /dev/console
     Sun Microsystems Inc.   SunOS 5.11      snv_23  October 2007

That is all there is to the creation of a local zone. Now simply log in to the newly created zone, just like connecting to any other system in the network.
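
The interactive zonecfg session in step 4 can also be scripted. The following is a minimal sketch that writes the same basic settings (zonepath, autoboot, one network interface) to a command file, which zonecfg accepts through its -f option. The function name `make_zone_cmdfile` is hypothetical, and the IP address in the usage note is a documentation placeholder, not the address used in the original setup.

```shell
#!/bin/sh
# Emit a zonecfg command file for a zone with one network interface.
# Arguments: zonepath, physical interface, IP address (all supplied by the caller).
make_zone_cmdfile() {
    zonepath=$1 nic=$2 addr=$3
    cat <<EOF
create
set zonepath=$zonepath
set autoboot=true
add net
set physical=$nic
set address=$addr
end
verify
commit
EOF
}
```

Typical use would be `make_zone_cmdfile /zones/appserver eri0 192.0.2.10 > /tmp/appserv.cfg` followed by `zonecfg -z appserv -f /tmp/appserv.cfg`. Keeping the configuration in a file also makes it easy to review or reuse for another zone.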

[New 08/27/2008] Mounting file systems in a non-global zone

Sometimes it might be necessary to export file systems or create new file systems when the zone is already running. This section's focus is on exporting block devices and the raw devices in such situations i.e., when the local zone is already configured.

Exporting the Raw Device(s) to a non-global zone

If the file system does not exist on the device, raw devices can be exported as they are, so the file system can be created inside the non-global zone using the normal newfs command.

The following example shows how to export the raw device to a non-global zone when the zone is already configured.

# zonecfg -z appserv
zonecfg:appserv> add device
zonecfg:appserv:device> set match=/dev/rdsk/c5t0d0s6
zonecfg:appserv:device> end
zonecfg:appserv> verify
zonecfg:appserv> commit
zonecfg:appserv> exit

In this example /dev/rdsk/c5t0d0s6 is being exported.

After the zonecfg step, reboot the non-global zone to make the raw device visible inside the non-global zone. After the reboot, check the existence of the raw device.

# hostname

# ls -l /dev/rdsk/c5t0d0s6
crw-r-----   1 root     sys      118, 126 Aug 27 14:33 /dev/rdsk/c5t0d0s6

Now that the raw device is accessible within the non-global zone, we can use the regular Solaris commands to create any file system like UFS.


# newfs -v c5t0d0s6
newfs: construct a new file system /dev/rdsk/c5t0d0s6: (y/n)? y
mkfs -F ufs /dev/rdsk/c5t0d0s6 1140260864 -1 -1 8192 1024 251 1 120 8192 t 0 -1 8 128 n
Warning: 4096 sector(s) in last cylinder unallocated
/dev/rdsk/c5t0d0s6: 1140260864 sectors in 185590 cylinders of 48 tracks, 128 sectors
 556768.0MB in 11600 cyl groups (16 c/g, 48.00MB/g, 5824 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 98464, 196896, 295328, 393760, 492192, 590624, 689056, 787488, 885920,
Initializing cylinder groups:
super-block backups for last 10 cylinder groups at:
 1139344160, 1139442592, 1139541024, 1139639456, 1139737888, 1139836320,
 1139934752, 1140033184, 1140131616, 1140230048

Exporting the Block Device(s) to a non-global zone

If the file system exists on the device, block devices can be exported as they are, so the file system can be mounted inside the non-global zone using the normal Solaris command, mount.

The following example shows how to export the block device to a non-global zone when the zone is already configured.

# zonecfg -z appserv
zonecfg:appserv> add device
zonecfg:appserv:device> set match=/dev/dsk/c5t0d0s6
zonecfg:appserv:device> end
zonecfg:appserv> verify
zonecfg:appserv> commit
zonecfg:appserv> exit

In this example /dev/dsk/c5t0d0s6 is being exported.

After the zonecfg step, reboot the non-global zone to make the block device visible inside the non-global zone. After the reboot, check the existence of the block device; and mount the file system within the non-global zone.

# hostname

# ls -l /dev/dsk/c5t0d0s6
brw-r-----   1 root     sys      118, 126 Aug 27 14:40 /dev/dsk/c5t0d0s6

# fstyp /dev/dsk/c5t0d0s6

# mount /dev/dsk/c5t0d0s6 /mnt

# df -h /mnt
Filesystem             size   used  avail capacity  Mounted on
/dev/dsk/c5t0d0s6      535G    64M   530G     1%    /mnt

Mounting a file system from the global zone into the non-global zone

Sometimes it is desirable to have the flexibility of mounting a file system in the global zone or non-global zone on-demand. In such situations, rather than exporting the file systems or block devices into the non-global zone, create the file system in the global zone and mount the file system directly from the global zone into the non-global zone. Make sure to unmount that file system in the global zone if mounted, before attempting to mount it in the non-global zone.


In the non-global zone:

# mkdir /repo1

In the global zone:

# df -h /repo1
/dev/dsk/c2t40d0s6     134G    64M   133G     1%    /repo1

# umount /repo1

# ls -ld /zones/appserv/root/repo1
drwxr-xr-x   2 root     root         512 Aug 27 14:45 /zones/appserv/root/repo1

# mount /dev/dsk/c2t40d0s6 /zones/appserv/root/repo1

Now go back to the non-global zone and check the mounted file systems.

# hostname

# df -h /repo1
Filesystem             size   used  avail capacity  Mounted on
/repo1                 134G    64M   133G     1%    /repo1

To unmount the file system from the non-global zone, run the following command from the global zone.

# umount /zones/appserv/root/repo1

Removing the file system from the non-global zone


Earlier in the zone creation step, the block device /dev/dsk/c2t40d1s6 was exported and mounted on the mount point /repo2 inside the non-global zone. To remove the file system completely from the non-global zone, run the following in the global zone.

# zonecfg -z appserv
zonecfg:appserv> remove fs dir=/repo2
zonecfg:appserv> verify
zonecfg:appserv> commit
zonecfg:appserv> exit

Reboot the non-global zone for this setting to take effect.

Shutting down and booting up the local zones (Updated 01/15/2008)

  1. To bring down the local zone:
     % zlogin appserv shutdown -i 0
  2. To boot up the local zone:
     % zoneadm -z appserv boot

Just for the sake of completeness, the following steps show how to remove a local zone.

Steps to delete a Local Zone

  1. Shut down the local zone
     % zoneadm -z appserv halt
     % zoneadm list -cv
       ID NAME             STATUS         PATH
        0 global           running        /
        - appserv          installed      /zones/appserver
  2. Uninstall the local zone -- remove the root file system
     % zoneadm -z appserv uninstall
     Are you sure you want to uninstall zone appserv (y/[n])? y
     % zoneadm list -cv
       ID NAME             STATUS         PATH
        0 global           running        /
        - appserv          configured     /zones/appserver
  3. Delete the configured local zone
     % zonecfg -z appserv delete
     Are you sure you want to delete zone appserv (y/[n])? y
     % zoneadm list -cv
       ID NAME             STATUS         PATH
        0 global           running        /

[New: 07/14/2009]

Cloning a Non-Global Zone

The following instructions are for cloning a non-global zone on the same system. The example shown below clones the siebeldb zone. After the cloning process, a brand new zone oraclebi emerges as a replica of siebeldb zone.


# zoneadm list -cv
  ID NAME             STATUS     PATH                           BRAND    IP    
   0 global           running    /                              native   shared
   - siebeldb         installed  /zones/dbserver                native   excl

  1. Export the configuration of the zone that you want to clone/copy
    # zonecfg -z siebeldb export > /tmp/siebeldb.config.cfg
  2. Change the settings of the new zone that differ from the existing one -- for example, IP address, data set names, network interface etc. To make these changes, edit /tmp/siebeldb.config.cfg

  3. Create the zone root directory for the new zone being created
    # mkdir /zones3/oraclebi
    # chmod 700 /zones3/oraclebi
    # ls -ld /zones3/oraclebi
    drwx------   2 root     root         512 Mar 12 15:41 /zones3/oraclebi
  4. Create a new (empty, non-configured) zone in the usual manner with the edited configuration file as an input
    # zonecfg -z oraclebi -f /tmp/siebeldb.config.cfg
    # zoneadm list -cv
      ID NAME             STATUS     PATH                           BRAND    IP    
       0 global           running    /                              native   shared
       - siebeldb         installed  /zones/dbserver                native   excl   
       - oraclebi         configured /zones3/oraclebi               native   excl
  5. Ensure that the zone you intend to clone/copy is not running
    # zoneadm -z siebeldb halt
  6. Clone the existing zone
    # zoneadm -z oraclebi clone siebeldb
    Cloning zonepath /zones/dbserver...

    This step takes at least 5 minutes to clone the whole zone. Larger zones may take longer to complete the cloning process.

  7. Boot the newly created zone
    # zoneadm -z oraclebi boot

    Bring up the halted zone (the source zone) as well, if you wish.

  8. Log in to the console of the new zone to configure IP, networking, etc., and you are done.
    # zlogin -C oraclebi
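
The edit in step 2 can be done by hand, but the zonepath change at least is easy to script. Here is a minimal sketch, assuming the exported file contains a line of the form `set zonepath=<path>` (which is what zonecfg export normally writes). The function name `retarget_zonepath` is hypothetical. Other settings, such as the IP address, still need to be changed separately.

```shell
#!/bin/sh
# Rewrite the zonepath in an exported zone configuration read from stdin.
# Arguments: old zonepath, new zonepath. The paths must not contain '|',
# and the old path is treated as a sed regular expression.
retarget_zonepath() {
    old=$1 new=$2
    sed "s|^set zonepath=$old\$|set zonepath=$new|"
}
```

With the paths from this example, that would be `retarget_zonepath /zones/dbserver /zones3/oraclebi < /tmp/siebeldb.config.cfg > /tmp/oraclebi.config.cfg`, with the resulting file then passed to zonecfg -f in step 4.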

[New: 07/15/2009]

Migrating a Non-Global Zone from One Host to Another

Keywords: Solaris, Non-Global Zone, Migration, Attach, Detach

The following instructions demonstrate, with examples, how to migrate the non-global zone orabi to another server.

# zoneadm list -cv
  ID NAME             STATUS     PATH                           BRAND    IP    
   0 global           running    /                              native   shared
   4 siebeldb         running    /zones/dbserver                native   excl  
   - orabi            installed  /zones3/orabi                  native   shared

  1. Halt the zone to be migrated, if running
    # zoneadm -z orabi halt
  2. Detach the zone. Once detached, it will be in the configured state
    # zoneadm -z orabi detach
    # zoneadm list -cv
      ID NAME             STATUS     PATH                           BRAND    IP    
       0 global           running    /                              native   shared
       4 siebeldb         running    /zones/dbserver                native   excl  
       - orabi            configured /zones3/orabi                  native   shared
  3. Move the zonepath for the zone to be migrated from the old host to the new host.

    Do the following on the old host:

    # cd /zones3
    # tar -Ecf orabi.tar orabi
    # compress orabi.tar
    # sftp newhost
    Connecting to newhost...
    sftp> cd /zones3
    sftp> put orabi.tar.Z
    Uploading orabi.tar.Z to /zones3/orabi.tar.Z
    sftp> quit

    On the new host:

    # cd /zones3
    # uncompress orabi.tar.Z
    # tar xf orabi.tar
  4. On the new host, configure the zone.

    Create the equivalent zone orabi on the new host -- use the zonecfg command with the -a option and the zonepath on the new host. Make any required adjustments to the configuration and commit the configuration.

    # zonecfg -z orabi
    orabi: No such zone configured
    Use 'create' to begin configuring a new zone.
    zonecfg:orabi> create -a /zones3/orabi
    zonecfg:orabi> info
    zonename: orabi
    zonepath: /zones3/orabi
    brand: native
    autoboot: false
    limitpriv: all,!sys_suser_compat,!sys_res_config,!sys_net_config,!sys_linkdir,!sys_devices,!sys_config,!proc_zone,!dtrace_kernel,!sys_ip_config
    ip-type: shared
     dir: /lib
     dir: /platform
     dir: /sbin
     dir: /usr
     address: IPaddress
     physical: nxge1
     defrouter not specified
    zonecfg:orabi> set capped-memory
    zonecfg:orabi:capped-memory> set physical=8G
    zonecfg:orabi:capped-memory> end
    zonecfg:orabi> commit
    zonecfg:orabi> exit
  5. Attach the zone on the new host with a validation check and update the zone to match a host running later versions of the dependent packages
    # ls -ld /zones3
    drwxrwxrwx   5 root     root         512 Jul 15 12:30 /zones3
    # chmod g-w,o-w /zones3
    # ls -ld /zones3
    drwxr-xr-x   5 root     root         512 Jul 15 12:30 /zones3
    # zoneadm -z orabi attach -u
    Getting the list of files to remove
    Removing 1740 files
    Remove 607 of 607 packages
    Installing 1878 files
    Add 627 of 627 packages
    Updating editable files
    The file  within the zone contains a log of the zone update.
    # zoneadm list -cv
      ID NAME             STATUS     PATH                           BRAND    IP    
       0 global           running    /                              native   shared
       - orabi            installed  /zones3/orabi                  native   shared


    It is possible to force the attach operation without performing the validation. You can do so with the -F option.

    # zoneadm -z orabi attach -F

    Be careful when using this option because it could lead to an incorrect configuration, and an incorrect configuration could result in undefined behavior.
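
Step 5 above removes group and world write permission from /zones3 before attaching the zone. The following is a minimal sketch of a pre-check for that condition. The function name parent_dir_is_safe is hypothetical, and it only inspects the mode string printed by ls -ld.

```shell
# Hypothetical helper: succeed only if the directory is NOT writable by
# group or others, judged from the mode string printed by `ls -ld`.
parent_dir_is_safe() {
    mode=$(ls -ld "$1" | cut -c1-10)          # e.g. drwxr-xr-x
    group_w=$(printf '%s' "$mode" | cut -c6)  # group write bit
    other_w=$(printf '%s' "$mode" | cut -c9)  # world write bit
    [ "$group_w" != "w" ] && [ "$other_w" != "w" ]
}

# Usage (path taken from the example above):
#   parent_dir_is_safe /zones3 || chmod g-w,o-w /zones3
```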

[New: 07/19/2009]

Tip: How to find out whether you are connected to the primary OS instance or to a virtual instance

If the command zonename returns global, then you are connected to the OS instance that was booted from the physical hardware. If you see any string other than global, you are connected to a virtual OS instance (a non-global zone).

Alternatively, try running the prstat -Z or zoneadm list -cv commands. If you see exactly one non-zero Zone ID, it is an indication that you are connected to a non-global zone.
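
The zonename check above can be wrapped in a small helper for use in scripts. This is only a sketch; the function name zone_kind is made up for illustration, and it classifies whatever name it is given.

```shell
# Classify a zone name: the global zone is always named "global".
zone_kind() {
    if [ "$1" = "global" ]; then
        echo "global"
    else
        echo "non-global"
    fi
}

# On a Solaris system with zones, the name would come from zonename(1):
#   zone_kind "$(zonename)"
```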


Monday Apr 06, 2009

Controlling [Virtual] Network Interfaces in a Non-Global Solaris Zone

In the software world, some tools like SAP NetWeaver's Adaptive Computing Controller (ACC) require full control over a network interface, so they can bring the NICs up or down at will to fulfill their responsibilities. Those tools may function normally on Solaris 10 [and later] as long as they are run in the global zone. However, there might be some trouble when those tools are run in a non-global zone, especially on machines with only one physical network interface installed, and when the non-global zones are created with the default configuration. This blog post suggests a few solutions to get around those issues, so the tools can function the way they normally do in the global zone.

If the machine has only one NIC installed, there are at least two issues that will prevent tools like ACC from working in a non-global zone.

  1. Since there is only one network interface on the system, it is not possible to dedicate that interface to the non-global zone where ACC is supposed to run. Hence all the zones, including the global zone, must share the physical network interface.
  2. When the physical network interface is being shared across multiple zones, it is not possible to plumb/unplumb the network interface from a Shared-IP Non-Global Zone. Only the root users in the global zone can plumb/unplumb the lone physical network interface.
    • When a non-global zone is created with the default configuration, a Shared-IP zone is created. Shared-IP zones have separate IP addresses, but share the IP routing configuration with the global zone.

Fortunately, Solaris has a solution to the aforementioned issues in the form of network virtualization. Crossbow is the code name for network virtualization in Solaris. Crossbow provides the necessary building blocks to virtualize a single physical network interface into multiple virtual network interfaces (VNICs), so the solution to the issue at hand is to create a virtual network interface, and then to create an Exclusive-IP Non-Global Zone using the virtual NIC. The rest of this blog post demonstrates the simple steps to create a VNIC and to configure a non-global zone as an Exclusive-IP zone.

Create a Virtual Network Interface using Crossbow

  • Make sure the OS has Crossbow functionality
    global# cat /etc/release
                     Solaris Express Community Edition snv_111 SPARC
               Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
                            Use is subject to license terms.
                                 Assembled 23 March 2009

    Crossbow has been integrated into Solaris Express Community Edition (Nevada) build 105 - hence all Nevada builds starting with build 105 will have the Crossbow functionality. OpenSolaris 2009.06 and the next major update to Solaris 10 are expected to have the support for network virtualization out-of-the-box.

  • Check the existing zones and the available physical and virtual network interfaces.
    global# zoneadm list -cv
      ID NAME             STATUS     PATH                           BRAND    IP    
       0 global           running    /                              native   shared
    global# dladm show-link
    LINK        CLASS    MTU    STATE    OVER
    e1000g0     phys     1500   up       --

    In this example, there is only one NIC, e1000g0, on the server; and there are no non-global zones installed.

  • Create a virtual network interface based on device e1000g0 with an automatically generated MAC address. If the NIC has factory MAC addresses available, one of them will be used. Otherwise, a random address is selected. The auto mode is the default action if none is specified.
    global# dladm create-vnic -l e1000g0 vnic1
  • Check the available network interfaces one more time. Now you should be able to see the newly created virtual NIC in addition to the existing physical network interface. It is also possible to list only the virtual NICs.
    global# dladm show-link
    LINK        CLASS    MTU    STATE    OVER
    e1000g0     phys     1500   up       --
    vnic1       vnic     1500   up       e1000g0
    global# dladm show-vnic
    LINK         OVER         SPEED  MACADDRESS           MACADDRTYPE         VID
    vnic1        e1000g0      1000   2:8:20:32:9:10       random              0
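
The two checks in this section (is the build new enough for Crossbow, and which VNICs sit on a given link) can be scripted. The sketch below is illustrative only: the function names are made up, and it assumes the /etc/release and dladm show-vnic layouts shown above.

```shell
# Print the Nevada build number found in /etc/release-style text on stdin
# (for example, "snv_111" prints 111). Prints nothing if no snv_ tag exists.
nevada_build() {
    sed -n 's/.*snv_\([0-9][0-9]*\).*/\1/p' | head -1
}

# Succeed if the build read from stdin is 105 or later, the build in which
# Crossbow was integrated according to the note above.
has_crossbow() {
    build=$(nevada_build)
    [ -n "$build" ] && [ "$build" -ge 105 ]
}

# Print the names of the VNICs created over link $1, reading
# `dladm show-vnic` output (one header line plus rows) from stdin.
vnics_over() {
    awk -v link="$1" 'NR > 1 && $2 == link { print $1 }'
}

# Usage:
#   has_crossbow < /etc/release && dladm show-vnic | vnics_over e1000g0
```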

Create a Non-Global Zone with the VNIC

  • Create an Exclusive-IP Non-Global Zone with the newly created VNIC being the primary network interface.
    global # mkdir -p /export/zones/sapacc
    global # chmod 700 /export/zones/sapacc
    global # zonecfg -z sapacc
    sapacc: No such zone configured
    Use 'create' to begin configuring a new zone.
    zonecfg:sapacc> create
    zonecfg:sapacc> set zonepath=/export/zones/sapacc
    zonecfg:sapacc> set autoboot=false
    zonecfg:sapacc> set ip-type=exclusive
    zonecfg:sapacc> add net
    zonecfg:sapacc:net> set physical=vnic1
    zonecfg:sapacc:net> end
    zonecfg:sapacc> verify
    zonecfg:sapacc> commit
    zonecfg:sapacc> exit
    global # zoneadm -z sapacc install
    global # zoneadm -z sapacc boot
    global #  zoneadm list -cv
      ID NAME             STATUS     PATH                           BRAND    IP    
       0 global           running    /                              native   shared
       1 sapacc           running    /export/zones/sapacc        	native   excl
  • Configure the new non-global zone including the IP address and the network services
    global # zlogin -C -e [ sapacc
      > Confirm the following information.  If it is correct, press F2;             
        to change any information, press F4.                                        
                      Host name: sap-zone2
                     IP address:                                       
        System part of a subnet: Yes                                                
                    Enable IPv6: No                                                 
                  Default Route: Detect one upon reboot                             
  • Inside the non-global zone, check the status of the VNIC and the status of the network service
    local# hostname
    local# zonename
    local# ifconfig -a
    lo0: flags=2001000849 mtu 8232 index 1
            inet netmask ff000000 
    vnic1: flags=1000843 mtu 1500 index 2
            inet netmask ffffff00 broadcast
            ether 2:8:20:32:9:10 
    lo0: flags=2002000849 mtu 8252 index 1
            inet6 ::1/128 
    local# svcs svc:/network/physical
    STATE          STIME    FMRI
    disabled       13:02:18 svc:/network/physical:nwam
    online         13:02:24 svc:/network/physical:default
  • Check the network connectivity.

    From inside the non-global zone to the outside world:

    local# ping -s sap29
    PING sap29: 56 data bytes
    64 bytes from sap29 ( icmp_seq=0. time=0.680 ms
    64 bytes from sap29 ( icmp_seq=1. time=0.452 ms
    64 bytes from sap29 ( icmp_seq=2. time=0.561 ms
    64 bytes from sap29 ( icmp_seq=3. time=0.616 ms
    ----sap29 PING Statistics----
    4 packets transmitted, 4 packets received, 0% packet loss
    round-trip (ms)  min/avg/max/stddev = 0.452/0.577/0.680/0.097

    From the outside world to the non-global zone:

    remotehostonWAN# telnet sap-zone2
    Connected to sap-zone2.sun.com.
    Escape character is '^]'.
    login: test
    Sun Microsystems Inc.   SunOS 5.11      snv_111 November 2008
    -bash-3.2$ /usr/sbin/ifconfig -a
    lo0: flags=2001000849 mtu 8232 index 1
            inet netmask ff000000 
    vnic1: flags=1000843 mtu 1500 index 2
            inet netmask ffffff00 broadcast
    lo0: flags=2002000849 mtu 8252 index 1
            inet6 ::1/128 
    -bash-3.2$ exit
    Connection to sap-zone2 closed.
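
The interactive zonecfg session earlier in this section can also be driven from a command file with zonecfg -f. The following is a sketch under that assumption. The function name and the /tmp path are hypothetical, and the function only prints the commands.

```shell
# Print a zonecfg command script for an Exclusive-IP zone.
# $1 = zonepath, $2 = VNIC (or other link) to hand over to the zone.
exclusive_ip_zone_cfg() {
    cat <<EOF
create
set zonepath=$1
set autoboot=false
set ip-type=exclusive
add net
set physical=$2
end
verify
commit
EOF
}

# Usage (zone name and file path are illustrative):
#   exclusive_ip_zone_cfg /export/zones/sapacc vnic1 > /tmp/sapacc.cfg
#   zonecfg -z sapacc -f /tmp/sapacc.cfg
```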

Dynamic [Re]Configuration of the [Virtual] Network Interface in a Non-Global Zone

  • Finally, try unplumbing and plumbing the virtual network interface inside the Exclusive-IP Non-Global Zone
    global # zlogin -C -e [ sapacc
    [Connected to zone 'sapacc' console]
    zoneconsole# ifconfig vnic1 unplumb
    zoneconsole# /usr/sbin/ifconfig -a
    lo0: flags=2001000849 mtu 8232 index 1
            inet netmask ff000000
    zoneconsole# ifconfig vnic1 plumb
    zoneconsole# ifconfig vnic1 netmask up
    zoneconsole# /usr/sbin/ifconfig -a
    lo0: flags=2001000849 mtu 8232 index 1
            inet netmask ff000000
    vnic1: flags=1000843 mtu 1500 index 2
            inet netmask ffffff00 broadcast
    lo0: flags=2002000849 mtu 8252 index 1
            inet6 ::1/128

As simple as that! Before we conclude, be informed that prior to Crossbow, Solaris system administrators were required to use Virtual Local Area Networks (VLAN) to achieve similar outcomes.

Check the Zones and Containers FAQ if you are stuck in a strange situation or if you need some interesting ideas around virtualization on Solaris.

Saturday Feb 07, 2009

Mounting Windows' NTFS on [Open]Solaris x86/x64

The steps outlined in this blog post are derived from the Miscellaneous filesystem support for OpenSolaris on x86 (link removed on 12/23/14) web page. I just added a few examples to illustrate the steps to mount a partition with an NTFS filesystem that exists on an external hard drive (in this case, a Seagate FreeAgent external hard drive).

Step-by-Step instructions to mount NTFS filesystem on [Open]Solaris

  1. Install the packages FSWpart and FSWfsmisc.

  2. Find the logical device name for the NTFS partition. The -l option of the rmformat command lists all removable devices along with their device names.

    # rmformat -l   
    Looking for devices...
         1. Logical Node: /dev/rdsk/c1t0d0p0
            Physical Node: /pci@0,0/pci-ide@1f,1/ide@1/sd@0,0
            Connected Device: MATSHITA UJDA750 DVD/CDRW 1.60
            Device Type: DVD Reader
            Bus: IDE
            Access permissions: 
         2. Logical Node: /dev/rdsk/c2t0d0p0
            Physical Node: /pci@0,0/pci1179,1@1d,7/storage@1/disk@0,0
            Connected Device: Seagate  FreeAgentDesktop 100F
            Device Type: Removable
            Bus: USB
            Size: 953.9 GB
            Access permissions: 
  3. Identify the NTFS partition on the external disk with the help of fdisk
    # fdisk /dev/rdsk/c2t0d0p0
                 Total disk size is 60800 cylinders
                 Cylinder size is 32130 (512 byte) blocks
          Partition   Status    Type          Start   End   Length    %
          =========   ======    ============  =====   ===   ======   ===
              1                 IFS: NTFS         0  60800    60801    100
       1. Create a partition
       2. Specify the active partition
       3. Delete a partition
       4. Change between Solaris and Solaris2 Partition IDs
       5. Exit (update disk configuration and exit)
       6. Cancel (exit without updating disk configuration)
    Enter Selection: 6

    In this example, partition #1 i.e., c2t0d0p1 has the NTFS filesystem.

  4. Mount the NTFS partition just like mounting a UFS filesystem using the mount command. Pass ntfs as the argument to the command line option -F. Since the filesystem is mounted in a slightly different way than the conventional way, use /usr/bin/xlsmounts to see the detailed mount table information.

    # mount -F ntfs /dev/dsk/c2t0d0p1 /mnt
    # /usr/bin/xlsmounts
      PHYSICAL DEVICE                 LOGICAL DEVICE      FS    PID         ADDR Mounted on
    /dev/dsk/c2t0d0p1              /dev/dsk/c2t0d0p1    ntfs   6755 /mnt
    # ls /mnt
    expForSun.dmp             MySQL5.1                   RECYCLER
    medium-64-bit             $RECYCLE.BIN               System Volume Information

    Notice the under ADDR column in the output of xlsmounts. NTFS mount uses userland NFSv2 server to access the filesystems on raw partitions. That is why the mount was shown as NFS client mounted from

  5. To unmount the NTFS filesystem, use /usr/bin/xumount. Solaris standard umount command unmounts the filesystem but does not terminate the background NFS server process.

    # /usr/bin/xumount /mnt
             - OR -
    # /usr/bin/xumount /dev/dsk/c2t0d0p1

Check the Miscellaneous filesystem support for OpenSolaris on x86 page (link removed on 12/23/14) and Moinak Ghosh's blog post Mount and Access NTFS and Ext2FS from Solaris x86 for the rest of the fine details.

Sunday Nov 30, 2008

PeopleSoft on Solaris 10: Fixing the "msgget: No space left on device" Error

(Crossposting the 8+ month old blog entry from my other blog hosted on blogger. Source URL:

When a large number of application server processes are configured in a single PeopleSoft domain, or cumulatively across multiple domains, it is very likely that the PeopleSoft application server domain boot process will fail with errors like:

Booting server processes ...
exec PSSAMSRV -A -- -C psappsrv.cfg -D CS90SPV -S PSSAMSRV :
113954.ben15!PSSAMSRV.29746.1.0: LIBTUX_CAT:681: ERROR: Failure to create message queue
113954.ben15!PSSAMSRV.29746.1.0: LIBTUX_CAT:248: ERROR: System init function failed, Uunixerr = : 
                   msgget: No space left on device
113954.ben15!tmboot.29708.1.-2: CMDTUX_CAT:825: ERROR: Process PSSAMSRV at ben15 failed with /T 
                   tperrno (TPEOS - operating system error)

In this particular example, the PeopleSoft Enterprise is running on a Solaris 10 system. Fortunately the error message is very clear in this case; and the failure is related to the message queues. During the domain boot up process, there is a call to msgget() to create a message queue. If the call to msgget() succeeds, it returns a non-negative integer that serves as the identifier for the newly created message queue. However in the case of a failure, it returns -1 and sets the error number to EACCES, EEXIST, ENOENT or ENOSPC depending on the underlying reason.

From the above error messages it is clear that msgget() failed with the errno set to ENOSPC (No space left on device). The man page of msgget(2) has the following explanation for the ENOSPC error code on Solaris:

     The msgget() function will fail if:
     ENOSPC    A message queue identifier is to  be  created  but
               the  system-imposed limit on the maximum number of
               allowed  message  queue  identifiers  system  wide
               would be exceeded. See NOTES.


     The system-imposed limit on  the  number  of  message  queue
     identifiers  is  maintained on a per-project basis using the
     project.max-msg-ids resource control.

That is enough of a clue to suspect the configured limit on the number of message queue identifiers.

Prior to the release of Solaris 10, the /etc/system System V IPC tunable, msgsys:msginfo_msgmni, was used to control the maximum number of message queues that can be created. The default value on pre-Solaris 10 systems is 50.

With the release of Solaris 10, the majority of the System V IPC tunables were obsoleted and equivalent resource controls were created for the remaining tunables to reduce the administrative overhead. On Solaris 10 and later versions, System V IPC can be tuned on a per-project basis using the newly introduced resource controls.

On any Solaris 10 system, the resource control, project.max-msg-ids, replaced the old /etc/system tunable, msginfo_msgmni. And the default value has been raised to 128.

Now back to the failure in PeopleSoft environment. Let's first check the current value configured for project.max-msg-ids.

  • Get the project ID.
     % id -p
    uid=222227(psft) gid=2294(dba) projid=3(default)
  • Examine the project.max-msg-ids resource control for the project with ID 3, using the prctl utility.
     % prctl -n project.max-msg-ids -i project 3
    project: 3: default
    NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
            privileged        128       -   deny                                 -
            system          16.8M     max   deny                                 -

Alternatively, run the command ipcs -q to check the number of active message queues. Note that the project with ID 3 is configured to create a maximum of 128 (default) message queues. When the domain boot fails this way, the number of active message queues in the ipcs -q output is likely to be at or near the value configured for project.max-msg-ids.
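
The comparison between the ipcs -q output and the limit can be scripted. The sketch below assumes that each message queue row in the ipcs output starts with the type letter q in its first column, which is how Solaris prints it as far as I know. The function names are made up for illustration.

```shell
# Count message queues in `ipcs -q` output read from stdin.
# Assumption: each queue row has "q" as its first field; header lines do not.
count_msg_queues() {
    awk '$1 == "q" { n++ } END { print n + 0 }'
}

# Succeed when the number of queues ($1) has reached the limit ($2).
queues_exhausted() {
    [ "$1" -ge "$2" ]
}

# Usage (128 is the default value of project.max-msg-ids mentioned above):
#   active=$(ipcs -q | count_msg_queues)
#   queues_exhausted "$active" 128 && echo "message queue limit reached"
```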

Since it appears that the configured PeopleSoft domain(s) need more than 128 message queues in order to bring up all the application server processes that constitute the PeopleSoft Enterprise, the solution is to increase the value for the resource control, project.max-msg-ids, to a value beyond 128. For the sake of simplicity, let's increase it to 256 (2 * the default value, that is). Again, the prctl utility can be used to set the new value for the resource control.

  • Assume the privileges of the 'root' user
     % su
  • Increase the maximum value for the message queue identifiers to 256 using the prctl utility.
     # prctl -n project.max-msg-ids -r -v 256 -i project 3
  • Verify the new maximum value for the message queue identifiers
     # prctl -n project.max-msg-ids -i project 3
    project: 3: default
    NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
            privileged        256       -   deny                                 -
            system          16.8M     max   deny                                 -

With this change, the PeopleSoft Enterprise should boot up without the Failure to create message queue .. msgget: No space left on device errors.

Before we conclude, note that the above mentioned solution is not persistent across multiple operating system reboots. To make it persistent, create a new project using the projadd command. The man page for projadd(1M) has an example showing the creation of a project.

Friday Nov 21, 2008

Oracle on Solaris 10 : Fixing the 'ORA-27102: out of memory' Error

(Crossposting the 2+ year old blog entry from my other blog hosted on blogger. Source URL:


As part of a database tuning effort you increase the SGA/PGA sizes, and Oracle greets you with an ORA-27102: out of memory error message, even though the system had enough free memory to serve the needs of Oracle.

SQL> startup
ORA-27102: out of memory
SVR4 Error: 22: Invalid argument

$ oerr ORA 27102
27102, 00000, "out of memory"
// *Cause: Out of memory
// *Action: Consult the trace file for details

Not so helpful. Let's look at the alert log for some clues.

% tail -2 alert.log
WARNING: EINVAL creating segment of size 0x000000028a006000
fix shm parameters in /etc/system or equivalent

Oracle is trying to create a 10G shared memory segment (the size depends on the SGA/PGA sizes), but the operating system (Solaris in this example) responded with an invalid argument (EINVAL) error message. There is a little hint about setting shm parameters in /etc/system.
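
The segment size in the alert log is printed in hexadecimal. As a quick worked example, shell arithmetic can convert it to bytes and to GiB. The function names here are only illustrative.

```shell
# Convert a hexadecimal byte count (as printed in the alert log) to decimal.
hex_to_bytes() {
    echo $(( $1 ))
}

# Express a byte count in GiB with two decimal places.
bytes_to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 * 1024 * 1024) }'
}

hex_to_bytes 0x000000028a006000                    # prints 10905214976
bytes_to_gib "$(hex_to_bytes 0x000000028a006000)"  # prints 10.16
```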

Prior to Solaris 10, the shmsys:shminfo_shmmax parameter had to be set in /etc/system to the maximum size of a shared memory segment that can be created. 8M is the default value on Solaris 9 and prior versions, whereas 1/4th of the physical memory is the default on Solaris 10 and later. On a Solaris 10 (or later) system, it can be verified as shown below:

% prtconf | grep Mem
Memory size: 32760 Megabytes

% id -p
uid=59008(oracle) gid=10001(dba) projid=3(default)

% prctl -n project.max-shm-memory -i project 3
project: 3: default
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
        privileged      7.84GB      -   deny                                 -
        system          16.0EB    max   deny                                 -

Now it is clear that the system is using the default value of about 8G (7.84GB) in this scenario, whereas the application (Oracle) is trying to create a memory segment (10G) larger than 8G. Hence the failure.

So, the solution is to configure the system with a value large enough for the shared segment being created, so Oracle succeeds in starting up the database instance.

On Solaris 9 and prior releases, it can be done by adding the following line to /etc/system, followed by a reboot for the system to pick up the new value.

set shmsys:shminfo_shmmax = 0x000000028a006000

However, the shminfo_shmmax parameter was obsoleted with the release of Solaris 10, and Sun doesn't recommend setting this parameter in /etc/system even though it works as expected.

On Solaris 10 and later, this value can be changed dynamically on a per-project basis with the help of the resource control facilities. This is how we do it on Solaris 10 and later:

% prctl -n project.max-shm-memory -r -v 10G -i project 3

% prctl -n project.max-shm-memory -i project 3
project: 3: default
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
        privileged      10.0GB      -   deny                                 -
        system          16.0EB    max   deny                                 -

Note that changes made with the prctl command on a running system are temporary, and will be lost when the system is rebooted. To make the changes permanent, create a project with projadd command and associate it with the user account as shown below:

% projadd -p 3  -c 'eBS benchmark' -U oracle -G dba  -K 'project.max-shm-memory=(privileged,10G,deny)' OASB
% usermod -K project=OASB oracle

Finally, make sure the project is created with the projects -l or cat /etc/project commands.

% projects -l
        projid : 3
        comment: "eBS benchmark"
        users  : oracle
        groups : dba
        attribs: project.max-shm-memory=(privileged,10737418240,deny)

% cat /etc/project
OASB:3:eBS benchmark:oracle:dba:project.max-shm-memory=(privileged,10737418240,deny)
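
The byte value in the listing above is simply 10G expressed in bytes, and the /etc/project line is a colon-separated record of name, project ID, comment, users, groups and attributes. The sketch below reproduces both. The function names are made up, and the snippet only prints the line. It does not edit /etc/project, which is better left to projadd.

```shell
# Convert whole GiB to bytes, e.g. 10 prints 10737418240.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Print one /etc/project line: name:projid:comment:users:groups:attributes
project_entry() {
    printf '%s:%s:%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

limit=$(gib_to_bytes 10)
project_entry OASB 3 'eBS benchmark' oracle dba \
    "project.max-shm-memory=(privileged,${limit},deny)"
```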

With these changes, Oracle should start the database up normally.

SQL> startup
ORACLE instance started.

Total System Global Area 1.0905E+10 bytes
Fixed Size                  1316080 bytes
Variable Size            4429966096 bytes
Database Buffers         6442450944 bytes
Redo Buffers               31457280 bytes
Database mounted.
Database opened.

Related information:

  1. What's New in Solaris System Tuning in the Solaris 10 Release?
  2. Resource Controls (overview)
  3. System Setup Recommendations for Solaris 8 and Solaris 9
  4. Man page of prctl(1)
  5. Man page of projadd

Addendum : Oracle RAC settings

Anonymous Bob suggested the following settings for Oracle RAC in the form of a comment for the benefit of others who run into similar issue(s) when running Oracle RAC. I'm pasting the comment as is (Disclaimer: I have not verified these settings):

Thanks for a great explanation, I would like to add one comment that will help those with an Oracle RAC installation. Modifying the default project covers oracle processes great and is all that is needed for a single instance DB. In RAC however, the CRS process starts the DB and it is a root owned process and root does not use the default project. To fix ORA-27102 issue for RAC I added the following lines to an init script that runs before the init.crs script fires.

# Recommended Oracle RAC system params
ndd -set /dev/udp udp_xmit_hiwat 65536
ndd -set /dev/udp udp_recv_hiwat 65536

# For root processes like crsd
prctl -n project.max-shm-memory -r -v 8G -i project system
prctl -n project.max-shm-ids -r -v 512 -i project system

# For oracle processes like sqlplus
prctl -n project.max-shm-memory -r -v 8G -i project default
prctl -n project.max-shm-ids -r -v 512 -i project default

So simple yet it took me a week working with Oracle and SUN to come up with that answer...Hope that helps someone out.

# posted by Blogger Bob : 6:48 AM, April 25, 2008

Saturday Nov 01, 2008

Ramifications of the Solaris 10 kernel patch 137111


A recent code change in Solaris 10 inadvertently exposed an inherent bug in some 32-bit applications that rely on their own memory allocators. Due to this, some third-party applications that worked without KU 137111 may crash on Solaris 10/SPARC with KU 137111 (any revision).

Symptoms & the Cause

It was identified that the majority of such application failures are due to the applications' custom memory allocator that incorrectly returns 4-byte aligned mutexes in place of the required 8-byte aligned mutexes. In Solaris, mutex_t and pthread_mutex_t structures have been defined to be aligned on an 8-byte boundary. Both of those structures contain the upad64_t member, which is a double even for the 32-bit applications. The natural alignment of a double is 8 bytes, and per the SPARC Compliance Definition 2.4, the structures must be aligned according to their strictest member. That is, applications which create 4-byte aligned mutexes are technically non-compliant on Solaris/SPARC (for the sake of simplicity, such code will be referred to as the non-complying code for the remainder of this blog entry).

Due to a change in the implementation of the userland mutexes introduced by CR 6296770 in KU 137111-01, objects of type mutex_t and pthread_mutex_t must start at 8-byte aligned addresses. If this requirement is not satisfied, all non-compliant applications on Solaris/SPARC may fail with the signal SEGV with a callstack similar to the following one or with similar callstacks containing the function mutex_trylock_process.

  *_atomic_cas_64(0x141f2c, 0x0, 0xff000000, 0x1651, 0xff000000, 0x466d90)
  set_lock_byte64(0x0, 0x1651, 0xff000000, 0x0, 0xfec82a00, 0x0)
  fast_process_lock(0x141f24, 0x0, 0x1, 0x1, 0x0, 0xfeae5780)
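
As a quick worked example of the alignment rule, the remainder of an address modulo 8 tells whether it sits on an 8-byte boundary. The sketch below assumes that the first argument to fast_process_lock in the callstack above, 0x141f24, is the address of the mutex. That is an assumption on my part, and the function name is made up.

```shell
# Print the remainder of an address modulo 8; 0 means 8-byte aligned.
offset_in_8() {
    echo $(( $1 % 8 ))
}

offset_in_8 0x141f24   # prints 4: 4-byte aligned, but not 8-byte aligned
offset_in_8 0x141f20   # prints 0: 8-byte aligned
```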

Patches & the Next Steps

Note that only non-compliant 32-bit applications will be affected by the KU 137111. All other complying 32-bit applications continue to run as expected even with the KU 137111, hence the customers, partners, ISVs and the other software vendors must understand that it is not a Solaris issue. Customers running into this issue must work with the respective software vendors to obtain a patch/fix. We suggest that ISVs and other software vendors proactively check their 32-bit native code for any discrepancies like the one mentioned in this blog entry.

In our testing of some of the enterprise applications, we have identified Oracle's Siebel CRM as one of the potential applications that is vulnerable to the KU 137111. It appears that IBM's Lotus Domino Server is also prone to a crash on Solaris 10 with the same kernel patch. Speaking of these two known cases, Oracle/Siebel and IBM/Lotus Domino customers (running Solaris) should approach Oracle and IBM Corporations respectively, not Sun Microsystems, for a proper fix.

As it may take some time for the ISVs / software vendors to identify and fix the non-complying code in their applications, Sun is planning to provide an interim fix to the mutex byte alignment issue in the form of a Solaris kernel patch. As of this writing, we expect the fix to be integrated into the KU 137137-07. The fix is already available in the latest update of Solaris, Solaris 10 10/08. Those who cannot upgrade to Solaris 10 10/08 from the prior versions of Solaris 10 must wait for the patch KU 137137-09 [Updated 12/07/08; previously listed as 137137-07].

One must note that the fix in Solaris is a tentative one that allows the non-complying code to run on SPARC hardware for the time being. There is no guarantee that the non-complying code continues to run 'as is' in the future with new Solaris kernel patches and/or major updates/releases of the Solaris operating system. So the best long term solution is for the software vendors to fix the non-compliant code before it is too late.


Steve S and Roger F of Sun Microsystems.

Monday Oct 13, 2008

Siebel on Sun CMT hardware : Best Practices

The following suggested best practices are applicable to all Siebel deployments on CMT hardware (Tx00, T5x20, T5x40) running Solaris 10 [Note: some of this tuning applies to Siebel running on conventional hardware running Solaris]. These recommendations are based on our observations from the 14,000 user benchmark on Sun SPARC Enterprise T5440. Your mileage may vary.

All Tiers
  • Ensure that the system's firmware is up-to-date.

  • Upgrade to the latest update release of Solaris 10.

      Note to customers running Siebel on Solaris 10 5/08: apply the kernel patch 137137-07 as soon as it is available on the sunsolve.sun.com web site. Patch 137137-07 and later revisions, as well as Solaris 10 10/08, will have the workaround for a critical Siebel-specific bug. Oracle Corporation will eventually fix the bug in their codebase. In the meantime, Solaris is covering for Siebel and all other 32-bit applications with their own memory allocators that return unaligned mutexes. Check the RFE 6729759 Need to accommodate non-8-byte-aligned mutexes and Oracle's Siebel support document 735451.1 Do NOT apply Kernel Patch 137111-04 on Solaris 10 for more details.

  • Enable 256M large pages on all nodes. By default, the latest update of Solaris 10 will use a maximum of 4M pages even when 256M pages are a good fit.

      256M pages can be enabled with the following /etc/system tunable.
      * 256M pages
      set max_uheap_lpsize=0x10000000
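
      As a quick sanity check on the tunable above, the hexadecimal value can be converted to megabytes in the shell. This is only a minimal sketch; the helper name hex_to_mb is made up for illustration. On a Solaris system, pagesize -a lists the page sizes the platform supports, and pmap -xs <pid> shows the page sizes a running process is actually using.

```shell
# hex_to_mb is a hypothetical helper: convert a byte count to megabytes.
hex_to_mb() {
    echo $(( $1 / 1024 / 1024 ))
}

# Value assigned to max_uheap_lpsize in /etc/system; prints 256.
hex_to_mb 0x10000000

# Solaris only: list the page sizes supported by the platform.
if command -v pagesize >/dev/null 2>&1; then
    pagesize -a
fi
```

      Changes to /etc/system take effect only after a reboot.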

  • Proactively avoid running into stdio's limit of 256 file descriptors.

      Set the following in a shell or add the following lines to the shell's profile (bash/ksh).
      ulimit -n 2048
      export LD_PRELOAD_32=/usr/lib/extendedFILE.so.1:$LD_PRELOAD_32

      Technically the file descriptor limit can be set to as high as 65536. However from the application's perspective, 2048 is a reasonable limit.
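
      The ulimit step can be made a little more defensive in a shell profile. The sketch below raises the soft limit only when it is currently lower than the target; the helper name needs_raise and the variable names are assumptions made for this example.

```shell
# needs_raise is a hypothetical helper: succeed (return 0) when the current
# soft limit is a number lower than the target.
needs_raise() {
    case "$1" in
        unlimited) return 1 ;;
    esac
    [ "$1" -lt "$2" ]
}

target=2048
current=$(ulimit -n)

if needs_raise "$current" "$target"; then
    # This fails when the hard limit is lower than the target.
    ulimit -n "$target" || echo "could not raise descriptor limit to $target" >&2
fi

echo "descriptor limit: $(ulimit -n)"
```

      Leaving an already higher limit alone avoids accidentally lowering a value that was set elsewhere.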

  • Improve scalability with MT-hot memory allocation library, libumem or libmtmalloc.

    To improve the scalability of multi-threaded workloads, preload an MT-hot object-caching memory allocation library such as libumem(3LIB) or mtmalloc(3MALLOC).

      For example, to preload the libumem library, set the LD_PRELOAD_32 environment variable in the shell (bash/ksh) as shown below.

      export LD_PRELOAD_32=/usr/lib/libumem.so.1:$LD_PRELOAD_32

      The web and application servers in the Siebel Enterprise stack are 32-bit. However, Oracle 10g or 11g RDBMS on Solaris 10 SPARC is 64-bit. Hence the path to the libumem library in the preload statement differs slightly in the database tier, as shown below.

      export LD_PRELOAD_64=/usr/lib/sparcv9/libumem.so.1:$LD_PRELOAD_64

    Be aware that the trade-off is an increase in memory footprint -- you may notice a 5 to 20% increase in the memory footprint with one of these MT-hot memory allocation libraries preloaded. Also, not every Siebel application module benefits from MT-hot memory allocators. The recommendation is to experiment before implementing in production environments.
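
    When both extendedFILE and libumem are preloaded from the same profile, a small helper keeps the preload list tidy. The following is a sketch only: prepend_lib is a hypothetical function name, and it follows the colon-separated form used in the export lines above. It skips a library that is already in the list and avoids a trailing colon when the variable starts out empty.

```shell
# prepend_lib is a hypothetical helper: put a library at the front of a
# colon-separated preload list unless it is already present.
prepend_lib() {
    lib=$1
    list=$2
    case ":$list:" in
        *":$lib:"*) echo "$list" ;;
        *) echo "${lib}${list:+:$list}" ;;
    esac
}

LD_PRELOAD_32=$(prepend_lib /usr/lib/extendedFILE.so.1 "${LD_PRELOAD_32:-}")
LD_PRELOAD_32=$(prepend_lib /usr/lib/libumem.so.1 "$LD_PRELOAD_32")
export LD_PRELOAD_32
echo "$LD_PRELOAD_32"
```

    On the database tier the same helper could be applied to LD_PRELOAD_64 with /usr/lib/sparcv9/libumem.so.1.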

  • TCP/IP tunables

    The application fared well with the following set of TCP/IP parameters on Solaris 10 5/08.

    ndd -set /dev/tcp tcp_time_wait_interval 60000
    ndd -set /dev/tcp tcp_conn_req_max_q 1024
    ndd -set /dev/tcp tcp_conn_req_max_q0 4096
    ndd -set /dev/tcp tcp_ip_abort_interval 60000
    ndd -set /dev/tcp tcp_keepalive_interval 900000
    ndd -set /dev/tcp tcp_rexmit_interval_initial 3000
    ndd -set /dev/tcp tcp_rexmit_interval_max 10000
    ndd -set /dev/tcp tcp_rexmit_interval_min 3000
    ndd -set /dev/tcp tcp_smallest_anon_port 1024
    ndd -set /dev/tcp tcp_slow_start_initial 2
    ndd -set /dev/tcp tcp_xmit_hiwat 799744
    ndd -set /dev/tcp tcp_recv_hiwat 799744
    ndd -set /dev/tcp tcp_max_buf  8388608
    ndd -set /dev/tcp tcp_cwnd_max  4194304
    ndd -set /dev/tcp tcp_fin_wait_2_flush_interval 67500
    ndd -set /dev/udp udp_xmit_hiwat 799744
    ndd -set /dev/udp udp_recv_hiwat 799744
    ndd -set /dev/udp udp_max_buf 8388608
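
    Values set with ndd do not survive a reboot, so the commands above are usually placed in a startup script. The sketch below shows one way to drive them from a table; apply_settings, the settings variable, and the DRY_RUN switch are names invented for this example, and only a few of the parameters are repeated to keep it short. By default it only prints the commands it would run.

```shell
# A subset of the parameters listed above, as "driver parameter value" triples.
settings='
/dev/tcp tcp_time_wait_interval 60000
/dev/tcp tcp_conn_req_max_q 1024
/dev/tcp tcp_conn_req_max_q0 4096
/dev/udp udp_max_buf 8388608
'

# apply_settings is a hypothetical helper. With DRY_RUN=1 (the default here)
# it prints the ndd commands; with DRY_RUN=0 it runs them (root on Solaris)
# and reads each value back with "ndd -get".
apply_settings() {
    echo "$settings" | while read -r dev param value; do
        [ -n "$dev" ] || continue
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ndd -set $dev $param $value"
        else
            ndd -set "$dev" "$param" "$value" &&
                echo "$param = $(ndd -get "$dev" "$param")"
        fi
    done
}

apply_settings
```

    Reading each value back after setting it makes a rejected setting visible immediately instead of during a load test.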

Siebel Application Tier
  • All T-series systems (T1000/T2000, T5120/T5220, T5140/T5240, T5440) support the 256M page size. However, Siebel's siebmtshw script restricts the page size to 4M. Comment out the following lines in $SIEBEL_HOME/siebsrvr/bin/siebmtshw.
      # This will set 4M page size for Heap and 64 KB for stack

  • Experiment with fewer Siebel Object Managers.

      Configure the Object Managers in such a way that each OM handles at least 200 active users. Siebel's standard recommendation of 100 or fewer users per Object Manager is suitable for conventional systems but not ideal for CMT systems like Tx000, T5x20, T5x40, T5440. Sun's CMT systems are ideal for running multi-threaded processes with tons of LWPs per process. Besides, the overall memory footprint improves significantly with fewer Siebel Object Managers.
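
      To see what the two ratios mean for a given user population, the number of Object Managers can be worked out with shell arithmetic. This is a back-of-the-envelope sketch; oms_needed is a made-up helper name, and the result is not a statement about how many OMs the benchmark actually ran.

```shell
# oms_needed is a hypothetical helper: Object Managers needed for a user
# population at a given number of users per OM, rounded up.
oms_needed() {
    users=$1
    per_om=$2
    echo $(( (users + per_om - 1) / per_om ))
}

oms_needed 14000 100    # prints 140
oms_needed 14000 200    # prints 70
```

      Doubling the users per OM halves the number of OM processes, which is where the memory footprint savings come from.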

  • Try the Oracle 11g R1 client in the application tier. Oracle 10g R2 clients may crash under high load. For the symptoms of the crash, check "Solaris/SPARC: Oracle 11gR1 client for Siebel 8.0".

      The Oracle 10g R2 32-bit client is supposed to have a fix for the process crash issue - however, it wasn't verified in our test environment.

Siebel Database Tier
  • Eliminate double buffering by forcing the file system to use direct I/O.

    Oracle database caches the data in its own cache within the shared global area (SGA), known as the database block buffer cache. Database reads and writes are cached in the block buffer cache so that subsequent accesses to the same blocks do not need to re-read the data from the operating system. On the other hand, file systems on Solaris default to reading the data through the global file system cache for improved I/O operations. That is, by default each read is potentially cached twice - one copy in the operating system's file system cache, and the other copy in Oracle's block buffer cache. In addition to double caching, there is also some extra CPU overhead for the code which manages the operating system's file system cache. The solution is to eliminate double caching by forcing the file system to bypass the OS file system cache when reading from and writing to the disk.

      In the 14,000 user benchmark setup, the UFS file systems (holding the data files and the redo logs) were mounted with the forcedirectio option.

      mount -o forcedirectio /dev/dsk/<partition> <mountpoint>
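
      The mount command above lasts only until the file system is unmounted or the system reboots. To keep the option across reboots, it can go in the mount options field of /etc/vfstab. The sketch below prints such an entry; vfstab_line is a hypothetical helper, and the slice and mount point are placeholders, not the benchmark layout.

```shell
# vfstab_line is a hypothetical helper: print an /etc/vfstab entry for a UFS
# file system mounted with forcedirectio. Fields, in order: device to mount,
# device to fsck, mount point, FS type, fsck pass, mount at boot, options.
vfstab_line() {
    slice=$1
    mountpoint=$2
    printf '/dev/dsk/%s\t/dev/rdsk/%s\t%s\tufs\t2\tyes\tforcedirectio\n' \
        "$slice" "$slice" "$mountpoint"
}

# c1t0d0s6 and /oradata are placeholder names.
vfstab_line c1t0d0s6 /oradata
```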

  • Store data files separate from the redo log files -- If the data files and the redo log files are stored on the same disk drive and if that disk drive fails, the files cannot be used in the database recovery procedures.

      In the 14,000 user benchmark setup, two Sun StorageTek 2540 arrays were connected to the T5440 - one array held the data files, whereas the other held the Oracle redo log files.

  • Size online redo logs to control the frequency of log switches.

      In the 14,000 user benchmark setup, two online redo logs were configured, each with 10 GB of disk space. With all 14,000 concurrent users on-line, there was only one log switch in a 60 minute period.

  • If the storage array supports the read-ahead feature, enable it. When 'read-ahead enabled' is set to true, the write will be committed to the cache as opposed to the disk, and the OS signals the application that the write has been committed.

    Oracle Database Initialization Parameters

  • Set Oracle's initialization parameter DB_FILE_MULTIBLOCK_READ_COUNT to an appropriate value. The DB_FILE_MULTIBLOCK_READ_COUNT parameter specifies the maximum number of blocks read in one I/O operation during a sequential scan.

      In the 14,000 user benchmark configuration, DB_BLOCK_SIZE was set to 8 kB. During the benchmark run, the average reads were around 18.5 kB per second. Hence setting DB_FILE_MULTIBLOCK_READ_COUNT to a high value does not necessarily improve the I/O performance. A value of 8 for the database init parameter DB_FILE_MULTIBLOCK_READ_COUNT seemed to perform better.

  • On T5240 and T5440 servers, set the database initialization parameter CPU_COUNT to 64. Otherwise, by default Oracle RDBMS assumes 128 and 256 for the CPU_COUNT on T5240 and T5440 respectively. Oracle's optimizer might use a completely different execution plan when it notices such a large number for the CPU_COUNT; and the resulting execution plan need not necessarily be an optimal one. In the 14,000 user benchmark, setting CPU_COUNT to 64 produced optimal execution plans.

  • On T5240 and T5440 servers, explicitly set the database initialization parameter _enable_NUMA_optimization to FALSE. On these multi-socket servers, _enable_NUMA_optimization will be set to TRUE by default. During the 14,000 user benchmark run, we noticed intermittent shadow process crashes with the default behavior. We didn't realize any additional gains either with the default NUMA optimizations.
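
    Taken together, the three database settings above could be written to a text parameter file roughly as follows. This is a sketch under assumptions: print_init_params is a made-up helper, the output assumes a plain-text pfile rather than an spfile, and the values are simply the ones quoted from the benchmark. The first line also works out the size of one multiblock read, which is DB_FILE_MULTIBLOCK_READ_COUNT times DB_BLOCK_SIZE.

```shell
# Size of one multiblock read: DB_FILE_MULTIBLOCK_READ_COUNT x DB_BLOCK_SIZE.
block_kb=8
mbrc=8
echo "multiblock read size: $(( block_kb * mbrc )) kB"

# print_init_params is a hypothetical helper that emits the settings
# discussed above in text parameter file (pfile) syntax.
print_init_params() {
    cat <<EOF
db_block_size=$(( block_kb * 1024 ))
db_file_multiblock_read_count=$mbrc
cpu_count=64
_enable_NUMA_optimization=FALSE
EOF
}

print_init_params
```

    With 8 kB blocks, a read count of 8 caps a multiblock read at 64 kB.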

Siebel Web Tier
  • Upgrade to the latest service pack of Sun Java Web Server 6.1 (32-bit).

  • Run the Sun Java Web Server in multi-process mode by setting the MaxProcs directive in magnus.conf to a value that is greater than 1. In the multi-process mode, the web server can handle requests using multiple processes with multiple threads in each process.

      When you specify a value greater than 1 for MaxProcs, the web server relies on the operating system to distribute connections among the web server processes. However, many modern operating systems, including Solaris, do not distribute connections evenly, particularly when there is a small number of concurrent connections.

  • Tune the maximum simultaneous requests by setting the RqThrottle parameter in the magnus.conf file to an appropriate value. A value of 1024 was used in the 14,000 user benchmark.
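
In magnus.conf each of these directives sits on its own line as a name followed by a value. The sketch below pulls the two tuning directives out of a file so they can be checked after an edit; show_tuning is a hypothetical helper, and the MaxProcs value of 2 is only a placeholder because the benchmark value is not stated here.

```shell
# show_tuning is a hypothetical helper: print the MaxProcs and RqThrottle
# directives found in a magnus.conf file, one "name value" pair per line.
show_tuning() {
    awk '$1 == "MaxProcs" || $1 == "RqThrottle" { print $1, $2 }' "$1"
}

# Demonstrate on a throwaway file. MaxProcs 2 is a placeholder value;
# RqThrottle 1024 is the value quoted from the benchmark.
conf=$(mktemp)
printf 'MaxProcs 2\nRqThrottle 1024\n' > "$conf"
show_tuning "$conf"
rm -f "$conf"
```

The web server has to be restarted for changes to magnus.conf to take effect.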


