Thursday Feb 16, 2017

Resource Juggling with Zones

Using resource pools to control the amount of CPU available to zones is not exactly a new feature.  In fact, for a long time, creating pools and binding zones to them was the only way to implement resource management for zones.  In more recent versions of Solaris, properties like "capped-cpu" were added to the zone configuration, making much of this easier to configure.  In many cases, the mechanisms provide all you need.

One aspect, however, is not well covered in the documentation: Shared resource pools.  Since I was recently asked to explain this to a customer, I thought I might as well publish it here for everyone's benefit...

Let's imagine this scenario:

  • We have a two tier application consisting of an application and a database tier.
  • Both software tiers come with a core-based licensing scheme similar to what Oracle uses for its products.  For both, it is legal to use resource pools for license capping.
  • We own licenses worth 2 cores for each of the two tiers: 2 for the application and 2 for the database.
  • We want two environments for this application - one for production and one for testing.  Unfortunately, we need both cores of each tier for production, and we can't afford additional licenses for the testing environment.

The obvious solution is to share the resources for both test and production.  If required, we can also throttle the test environment to give priority to production.  Here's how to do this:

In a first step, let's look at the global zone's CPU resources and create the two production zones.  Before we start working with pools, they'll just share everything with the global zone.  I'll be using the "zonecores" script from here to visualize some of this.

root@mars:~# ./zonecores  -l
# Socket, Core, Strand and Zone Overview
Socket	Core	Strands	Zones
2	0	0,1,2,3,4,5,6,7 none
2	1	8,9,10,11,12,13,14,15 none
2	2	16,17,18,19,20,21,22,23 none
2	3	24,25,26,27,28,29,30,31 none
2	4	32,33,34,35,36,37,38,39 none
2	5	40,41,42,43,44,45,46,47 none
2	6	48,49,50,51,52,53,54,55 none
2	7	56,57,58,59,60,61,62,63 none
2	8	64,65,66,67,68,69,70,71 none
2	9	72,73,74,75,76,77,78,79 none
2	10	80,81,82,83,84,85,86,87 none
2	11	88,89,90,91,92,93,94,95 none
2	12	96,97,98,99,100,101,102,103 none
2	13	104,105,106,107,108,109,110,111 none
2	14	112,113,114,115,116,117,118,119 none
2	15	120,121,122,123,124,125,126,127 none
2	16	128,129,130,131,132,133,134,135 none
2	17	136,137,138,139,140,141,142,143 none
2	18	144,145,146,147,148,149,150,151 none
2	19	152,153,154,155,156,157,158,159 none

root@mars:~# zoneadm list -ivc
  ID NAME             STATUS      PATH                         BRAND      IP    
   0 global           running     /                            solaris    shared
   3 db-prod          running     /system/zones/db-prod        solaris    excl  
   4 app-prod         running     /system/zones/app-prod       solaris    excl  

root@mars:~# for i in db-prod app-prod; do zonecfg -z $i info pool; done

root@mars:~# ./zonecores 
# Checking Whole Core Assignments
OK - Zone db-prod using default pool.
OK - Zone app-prod using default pool.

root@mars:~# zlogin app-prod 'psrinfo |wc -l'

In the above example, we see the following:

  • The global zone has 20 cores (obviously some sort of LDom) which, at 8 strands/core, give 160 vCPUs.
  • Right now, none of these cores are assigned to any zone's resource pool.
  • We have two zones up and running.  Neither of them has a resource pool assigned to it right now.
  • So the two zones share all 160 vCPUs with the global zone, which also means that the "CPU count" in the zones is 160.
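If you'd rather not count cores and strands by hand, a small awk filter will do it.  This is only a sketch: the function name count_cpus is made up for this example, and it assumes input in the four-column layout that "zonecores -l" printed above.  The two sample lines stand in for the real output.

```shell
#!/bin/sh
# Count cores and strands from "zonecores -l"-style lines
# (columns: socket, core, comma-separated strand list, zones).
# count_cpus is a hypothetical helper, not part of the zonecores script.
count_cpus() {
    awk '$1 ~ /^[0-9]+$/ { cores++; strands += split($3, s, ",") }
         END { printf "%d cores, %d strands\n", cores, strands }'
}

# Two sample lines in the format of the listing above.
count_cpus <<'EOF'
2 0 0,1,2,3,4,5,6,7 none
2 1 8,9,10,11,12,13,14,15 none
EOF
```

On the real system you would pipe "./zonecores -l" into count_cpus instead of the sample lines.  Header and comment lines are skipped because their first column is not a number.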

Next, we'll create two resource pools, one for each application tier, and associate each with a processor set with 2 cores.  Then we'll bind each zone to its resource pool and recount CPUs.

root@mars:~# poolcfg -c 'create pool app-pool'
root@mars:~# poolcfg -c 'create pool db-pool'
root@mars:~# poolcfg -c 'create pset app-pset (uint pset.min = 16 ; 
                                               uint pset.max = 16 )'
root@mars:~# poolcfg -c 'create pset db-pset (uint pset.min = 16 ; 
                                              uint pset.max = 16 )'
root@mars:~# poolcfg -c 'associate pool app-pool ( pset app-pset) '
root@mars:~# poolcfg -c 'associate pool db-pool ( pset db-pset) '
root@mars:~# pooladm -c

root@mars:~# poolstat
 id pool                 size used load
  0 pool_default          128 0.00 0.58
  3 db-pool                16 0.00 0.14
  2 app-pool               16 0.00 0.15

root@mars:~# psrset
user processor set 1: processors 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
user processor set 2: processors 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

root@mars:~# psrinfo |wc -l

root@mars:~# zonecfg -z app-prod set pool=app-pool
root@mars:~# zoneadm -z app-prod apply
zone 'app-prod': Checking: Setting pool=app-pool
zone 'app-prod': Applying the changes
root@mars:~# zonecfg -z app-prod info pool
pool: app-pool

root@mars:~# zonecfg -z db-prod set pool=db-pool
root@mars:~# zoneadm -z db-prod apply
zone 'db-prod': Checking: Setting pool=db-pool
zone 'db-prod': Applying the changes
root@mars:~# zonecfg -z db-prod info pool
pool: db-pool

root@mars:~# zlogin app-prod 'psrinfo |wc -l'

root@mars:~# zlogin db-prod 'psrinfo |wc -l'

root@mars:~# ./zonecores 
# Checking Whole Core Assignments
OK - Zone db-prod using all 8 strands of core 0.
OK - Zone db-prod using all 8 strands of core 1.
OK - Zone app-prod using all 8 strands of core 2.
OK - Zone app-prod using all 8 strands of core 3.

Things to note in the example above:

  • The resource pool facility was already enabled.  If that's not the case, run "pooladm -e" first.
  • Whenever you change the pool configuration, you need to persist and enable those changes with "pooladm -c".
  • The binding of the zones to their pools was done using zone live reconfiguration.  So the zones were not rebooted for this operation.
  • Once complete, each zone now has exclusive use of 2 cores.  The other 16 cores remain for the global zone.
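One detail that is easy to trip over: licenses are counted in cores, but pset.min and pset.max are counted in CPUs, i.e. strands.  The following sketch does the conversion and prints the matching poolcfg command for review.  The helper names pset_size and pset_cmd are invented for this example, and the 8 strands per core are an assumption taken from the machine shown above.

```shell
#!/bin/sh
# Hypothetical helpers: turn a licensed core count into the strand count
# that pset.min and pset.max expect, and print the poolcfg command.
STRANDS_PER_CORE=8   # assumption: 8 strands/core, as on the system above

pset_size() {
    # $1 = number of cores
    echo $(( $1 * STRANDS_PER_CORE ))
}

pset_cmd() {
    # $1 = "create" or "modify", $2 = pset name, $3 = number of cores
    size=$(pset_size "$3")
    echo "poolcfg -c '$1 pset $2 (uint pset.min = $size ; uint pset.max = $size)'"
}

pset_size 2                  # strands needed for a 2-core license
pset_cmd create app-pset 2   # prints the command only, does not run it
```

Setting pset.min and pset.max to the same value keeps the processor set at a fixed size, which is what you want for license capping.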

Next, we'll need the two zones for the testing environment.  Right after creation, they'll share CPUs with the global zone.  We can't afford that, so we'll assign them to their respective tier's pools.

root@mars:~# zoneadm list -ivc
  ID NAME             STATUS      PATH                         BRAND      IP    
   0 global           running     /                            solaris    shared
   3 db-prod          running     /system/zones/db-prod        solaris    excl  
   7 app-prod         running     /system/zones/app-prod       solaris    excl  
   8 app-test         running     /system/zones/app-test       solaris    excl  
   9 db-test          running     /system/zones/db-test        solaris    excl  

root@mars:~# for i in db-test app-test ; do zonecfg -z $i info pool ; done

root@mars:~# ./zonecores 
# Checking Whole Core Assignments
OK - Zone db-prod using all 8 strands of core 0.
OK - Zone db-prod using all 8 strands of core 1.
OK - Zone app-prod using all 8 strands of core 2.
OK - Zone app-prod using all 8 strands of core 3.
OK - Zone app-test using default pool.
OK - Zone db-test using default pool.

root@mars:~# zonecfg -z app-test set pool=app-pool
root@mars:~# zoneadm -z app-test apply
zone 'app-test': Checking: Setting pool=app-pool
zone 'app-test': Applying the changes
root@mars:~# zonecfg -z db-test  set pool=db-pool
root@mars:~# zoneadm -z db-test apply
zone 'db-test': Checking: Setting pool=db-pool
zone 'db-test': Applying the changes

root@mars:~# for i in db-test app-test ; do zonecfg -z $i info pool ; done
pool: db-pool
pool: app-pool

root@mars:~# zlogin db-test 'psrinfo |wc -l'

root@mars:~# ./zonecores 
# Checking Whole Core Assignments
OK - Zone db-prod using all 8 strands of core 0.
OK - Zone db-prod using all 8 strands of core 1.
OK - Zone app-prod using all 8 strands of core 2.
OK - Zone app-prod using all 8 strands of core 3.
OK - Zone app-test using all 8 strands of core 2.
OK - Zone app-test using all 8 strands of core 3.
OK - Zone db-test using all 8 strands of core 0.
OK - Zone db-test using all 8 strands of core 1.

root@mars:~# ./zonecores -s
# Checking Core Resource Sharing
INFO - Core 0 used by 2 zones!
	 -> db-prod
	 -> db-test
INFO - Core 1 used by 2 zones!
	 -> db-prod
	 -> db-test
INFO - Core 2 used by 2 zones!
	 -> app-prod
	 -> app-test
INFO - Core 3 used by 2 zones!
	 -> app-prod
	 -> app-test

So now we have two pairs of zones that each share their common resource pool.  Let's assume that we ran out of steam in the application tier and our friendly license manager granted a license upgrade to 3 cores for the application.  Of course we don't want to spoil the party by shutting down the application for this change:

root@mars:~# poolcfg -c 'modify pset app-pset (uint pset.min = 24; 
                                               uint pset.max = 24)'
root@mars:~# pooladm -c
root@mars:~# poolstat
 id pool                 size used load
  0 pool_default          120 0.00 0.03
  3 db-pool                16 0.00 0.00
  2 app-pool               24 0.00 0.00

root@mars:~# ./zonecores -sc
# Checking Core Resource Sharing
INFO - Core 0 used by 2 zones!
	 -> db-prod
	 -> db-test
INFO - Core 1 used by 2 zones!
	 -> db-prod
	 -> db-test
INFO - Core 2 used by 2 zones!
	 -> app-prod
	 -> app-test
INFO - Core 3 used by 2 zones!
	 -> app-prod
	 -> app-test
INFO - Core 4 used by 2 zones!
	 -> app-prod
	 -> app-test
# Checking Whole Core Assignments
OK - Zone db-prod using all 8 strands of core 0.
OK - Zone db-prod using all 8 strands of core 1.
OK - Zone app-prod using all 8 strands of core 2.
OK - Zone app-prod using all 8 strands of core 3.
OK - Zone app-prod using all 8 strands of core 4.
OK - Zone app-test using all 8 strands of core 2.
OK - Zone app-test using all 8 strands of core 3.
OK - Zone app-test using all 8 strands of core 4.
OK - Zone db-test using all 8 strands of core 0.
OK - Zone db-test using all 8 strands of core 1.

Great!  We added the core to the application environment on the fly.

Of course, we want to make sure that the test environment doesn't starve production by using too much CPU.  There are two simple ways to achieve this:  We can, within the pool, limit the number of CPUs available to the test zones.  This is also called CPU capping.  Or we can use the Solaris Fair Share Scheduler to guarantee a certain percentage of all available CPU to production.  In the next example, we'll limit the test database zone to just 1 core and configure CPU shares for the application environment to give production a 75% guarantee:

root@mars:~# zonecfg -z app-prod set scheduling-class=FSS
root@mars:~# zonecfg -z app-prod set cpu-shares=300      
root@mars:~# zonecfg -z app-test set scheduling-class=FSS
root@mars:~# zonecfg -z app-test set cpu-shares=100
root@mars:~# zoneadm -z app-prod apply
zone 'app-prod': Checking: Adding rctl name=zone.cpu-shares
zone 'app-prod': Checking: Setting scheduling-class=FSS
zone 'app-prod': Applying the changes
root@mars:~# zoneadm -z app-test apply
zone 'app-test': Checking: Adding rctl name=zone.cpu-shares
zone 'app-test': Checking: Setting scheduling-class=FSS
zone 'app-test': Applying the changes

root@mars:~# zonecfg -z db-test 'add capped-cpu;set ncpus=8;end;commit'
root@mars:~# zoneadm -z db-test apply
zone 'db-test': Checking: Modifying rctl name=zone.cpu-cap
zone 'db-test': Applying the changes

Again, a few things to note:

  • We changed the default scheduling class to the Fair Share Scheduler in the application zones.  Although this works with the zone running, I recommend restarting the zone at a convenient time for best results.
  • Potentially, you could also combine both controls: capped CPU and CPU shares.  But this makes configuration rather confusing and also somewhat defeats the purpose of CPU shares.  I don't recommend this.
  • In SuperCluster environments, the Fair Share Scheduler is not supported for database environments.  I recommend adhering to this in other installations as well, at least whenever RAC is involved.  Of course, resource pools without the FSS are very much supported in database environments on SuperCluster.  This is how database zones are configured by default.
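In case the 75% looks like magic: a zone's guarantee is simply its shares divided by the sum of the shares of all zones competing in the same pool.  Here is a minimal sketch of that arithmetic, with a made-up helper name (fss_percent) and integer math only.  Keep in mind that shares only matter when the zones actually compete for CPU; an idle test zone leaves everything to production.

```shell
#!/bin/sh
# Hypothetical helper: percentage of CPU guaranteed to a zone under FSS.
# $1 = this zone's shares, remaining arguments = shares of the other
# zones bound to the same pool.  Integer arithmetic, result rounded down.
fss_percent() {
    mine=$1
    total=0
    for s in "$@"; do
        total=$(( total + s ))
    done
    echo $(( mine * 100 / total ))
}

fss_percent 300 100   # app-prod against app-test
fss_percent 100 300   # app-test against app-prod
```

With the share values from the example, this gives 75 for app-prod and 25 for app-test.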

Finally, a word about licensing.  The above example uses license restrictions as a motivation for CPU pools.  The way it is implemented in this example should, to the best of my knowledge, also satisfy the requirements for hard partitioning and thus license capping of Oracle core-based licenses.  However, only Oracle's License Management Services is authorized to bless such a configuration, so I strongly recommend validating any configuration with them before using it.  Of course, the same is true for any other software vendor's license capping rules.  Always get the software vendor's blessing for such a configuration.

As always, here are some links to documentation and references for further reading:

Friday Jul 22, 2016

Setting up Owncloud on Solaris

I recently had this private little project to try out Owncloud and Nextcloud for personal use.  But since I tried it on Solaris, I thought I might as well share a short summary here for whoever might find it useful.

To deploy either Owncloud or Nextcloud on Solaris, you generally follow the command-line installation instructions, using the Linux manual installation for guidance.  They are very short and straightforward.  However, there are a few Solaris specifics, like package dependencies, which are not documented.  Here's what you'll need to do:

  • I installed in a non-global zone (aiming to make it immutable once it's all up and running).  To resolve all the dependencies, you'll need to install these packages right after deploying the empty zone (not sure I need all those apache packages...):
  • Make sure your zone has internet access and DNS resolution.  It will need it to use the Owncloud/Nextcloud appstore.
  • It is easiest to install and run Owncloud/Nextcloud as webservd, since then you don't have to bother with tweaking apache into using a different user.
  • You'll need to enable a few extensions for php.  You do this in /etc/php/5.6/conf.d/extensions.ini.  Here are the ones I enabled; I'm not sure I need them all...
  • Create a config file for the mysql extension in /etc/php/5.6/conf.d/mysql.ini.  I took the example from the Admin Guide.
  • I wanted to have a separate ZFS dataset for the software, the data and the mysql database.  This would give me snapshot capability as well as write access to the data once the zone is immutable.
    • Delegate a ZFS dataset to the zone.
      zonecfg -z nextcloud info dataset
      	name: datapool/nextcloud
      	alias: nextcloud
    • Create some filesystems in the dataset to host software, data and database
      root@nextcloud:~# zfs list -r nextcloud
      nextcloud          243M  2.52T  38.6K  /nextcloud
      nextcloud/apache  38.0K  2.52T  38.0K  /nextcloud/apache
      nextcloud/data    17.5M  2.52T  17.5M  /nextcloud/server/nextcloud/data
      nextcloud/mysql    146M  2.52T   146M  /nextcloud/mysql
      nextcloud/server  79.2M  2.52T  79.2M  /nextcloud/server
    • Change the mysql default to point to the new location:
      svccfg -s mysql:version_56 setprop mysql/data=/nextcloud/mysql/data 
      svccfg -s mysql:version_56 refresh
  • Now just follow the Admin Guide to create the mysql database:
    svcadm enable mysql
    mysqladmin -u root password "secret"
    mysql -u root -p
    mysql> create user 'admin'@'localhost' identified by 'secret';
    Query OK, 0 rows affected (0.25 sec)
    mysql> create database if not exists nextcloud ;
    Query OK, 1 row affected (0.00 sec)
    mysql> GRANT ALL PRIVILEGES ON nextcloud.* TO 'admin'@'localhost' identified by 'secret';
    Query OK, 0 rows affected (0.00 sec)
  • And finally, perform the installation:
    php occ maintenance:install --database "mysql" --database-name "nextcloud" --database-user "root" --database-pass "secret"\
    --admin-user "admin" --admin-pass "secret"
  • The rest is no different to the Linux installation.  You'll need to configure apache to serve the application.  Don't forget to do this with SSL if you're actually running this on the internet!
  • Don't forget to tighten file security as described in the Admin Guide!
  • Once done, I turned my zone immutable for additional security.  For this to work, I had to redirect the apache logs to a writable directory, so I created another zfs dataset in the nextcloud pool and had apache send its logs there.  To turn immutability on, just do
    zoneadm -z nextcloud halt
    zonecfg -z nextcloud set file-mac-profile=fixed-configuration
    zoneadm -z nextcloud boot

Have fun!

Wednesday Jun 12, 2013

Growing the root pool

Some small in-between laptop experiences...  I finally decided to throw away that other OS (I used it so rarely that I regularly had to use the password reset procedure...).  That gave me another 50g of valuable laptop disk space - fortunately on the right part of the disk.  So in theory, all I'd have to do is resize the Solaris partition, tell ZFS about it and be happy...  Of course, there are the usual pitfalls.

To avoid confusion, much of this is x86 related.  On normal SPARC servers, you don't have any of the problems for which I describe solutions here...

First of all, you should *not* try to resize the partition that hosts your rpool while Solaris is up and running.  It works, but there are nicer ways to do a shutdown.  (What happens is that fdisk will not only create the new partition, but also write a default label in that partition, which means that ZFS will not find its slice, which will make Solaris very unresponsive...)  The right way to do this is to boot off something else (PXE, USB, DVD, whatever) and then change the partition size.  Once that's done, re-create the slice for the ZFS rpool.  The important part is to use the very same starting cylinder.  The length, naturally, will be larger.  (At least, I had to do that, since the original zpool lived in a slice.)

After that, it's back to the book:  Boot Solaris and choose one of "zpool set autoexpand=on rpool" or "zpool online -e rpool c0t0d0s0" and there you go - 50g more space.

Did I forget to mention that I actually did a full backup before all of this?  I must be getting old...

Wednesday Nov 07, 2012

20 Years of Solaris - 25 Years of SPARC!

I don't usually duplicate what can be found elsewhere.  But this is worth an exception.

20 Years of Solaris - Guess who got all those innovation awards!
25 Years of SPARC - And the future has just begun :-)

Check out those pages for some links pointing to the past, and, more interesting, to the future...

There are also some nice videos: 20 Years of Solaris - 25 Years of SPARC

(Come to think of it - I got to be part of all but the first 4 years of Solaris.  I must be getting older...)

Tuesday Apr 17, 2012

Solaris Zones: Virtualization that Speeds up Benchmarks

One of the first questions that typically comes up when I talk to customers about virtualization is the overhead involved.  Now we all know that virtualization with hypervisors comes with an overhead of some sort.  We should also all know that exactly how big that overhead is depends on the type of workload as much as it depends on the hypervisor used.  While there have been attempts to create standard benchmarks for this, quantifying hypervisor overhead is still mostly hidden in the mists of marketing and benchmark uncertainty.  However, what always raises eyebrows is when I come to Solaris Zones (called Containers in Solaris 10) as an alternative to hypervisor virtualization.  Since Zones are, greatly simplified, nothing more than a group of Unix processes contained by a set of rules which are enforced by the Solaris kernel, it is quite evident that there can't be much overhead involved.  Nevertheless, since many people think in hypervisor terms, there is almost always some doubt about this claim of zero overhead.  And as much as I find the explanation with technical details compelling, I also understand that seeing is so much better than believing.  So - look and see:

The Oracle benchmark teams are so convinced of the advantages of Solaris Zones that they actually use them in the configurations for public benchmarking.  Solaris resource management will also work in a non Zones environment, but Zones make it just so much easier to handle, especially with some of the more complex benchmark configurations.  There are numerous benchmark publications available using Solaris Containers, dating back to the days of the T5440.  Some recent examples, all of them world records, are:

The use of Solaris Zones is documented in all of these benchmark publications.

The benchmarking team also published a blog entry detailing how they make use of resource management with Solaris Zones to actually increase application performance.  That almost asks for calling this "negative overhead", if the term weren't somewhat misleading.

So, if you ever need to substantiate why Solaris Zones have no virtualization overhead, point to these (and probably some more) published benchmarks.

Monday Mar 19, 2012

Setting up a local AI server - easy with Solaris 11

Many things are new in Solaris 11, and Autoinstall is one of them.  If, like me, you've known Jumpstart for the last 2 centuries or so, you'll have to start from scratch.  Well, almost, as the concepts are similar, and it's not all that difficult.  Just new.

I wanted to have an AI server that I could use for demo purposes, on the train if need be.  That answers the question of hardware requirements: portable.  But let's start at the beginning.

First, you need an OS image, of course.  In the new world of Solaris 11, it is now called a repository.  The original can be downloaded from the Solaris 11 page at Oracle.   What you want is the "Oracle Solaris 11 11/11 Repository Image", which comes in two parts that can be combined using cat.  MD5 checksums for these (and all other downloads from that page) are available closer to the top of the page.

With that, building the repository is quick and simple:

# zfs create -o mountpoint=/export/repo rpool/ai/repo
# zfs create rpool/ai/repo/sol11
# mount -o ro -F hsfs /tmp/sol-11-1111-repo-full.iso /mnt
# rsync -aP /mnt/repo /export/repo/sol11
# umount /mnt
# pkgrepo rebuild -s /export/repo/sol11/repo
# zfs snapshot rpool/ai/repo/sol11@fcs
# pkgrepo info -s  /export/repo/sol11/repo
solaris   4292     online           2012-03-12T20:47:15.378639Z

That's all there is to it.  I also took a snapshot, just to be on the safe side.  You never know when one will come in handy.  To use this repository, you could just add it as a file-based publisher:

# pkg set-publisher -g file:///export/repo/sol11/repo solaris

In case I want to access this repository through a (virtual) network, I'll now quickly activate the repository service:

# svccfg -s application/pkg/server \
setprop pkg/inst_root=/export/repo/sol11/repo
# svccfg -s application/pkg/server setprop pkg/readonly=true
# svcadm refresh application/pkg/server
# svcadm enable application/pkg/server

That's all you need - now point your browser to http://localhost/ to view your beautiful repository-server. Step 1 is done.  All of this, by the way, is nicely documented in the README file that's contained in the repository image.

Of course, we already have updates to the original release.  You can find them in MOS in the Oracle Solaris 11 Support Repository Updates (SRU) Index.  You can simply add these to your existing repository or create separate repositories for each SRU.  The individual SRUs are self-sufficient and incremental - SRU4 includes all updates from SRU2 and SRU3.  With ZFS, you can also get both: A full repository with all updates and at the same time incremental ones up to each of the updates:

# mount -o ro -F hsfs /tmp/sol-11-1111-sru4-05-incr-repo.iso /mnt
# pkgrecv -s /mnt/repo -d /export/repo/sol11/repo '*'
# umount /mnt
# pkgrepo rebuild -s /export/repo/sol11/repo
# zfs snapshot rpool/ai/repo/sol11@sru4
# zfs set snapdir=visible rpool/ai/repo/sol11
# svcadm restart svc:/application/pkg/server:default
The normal repository is now updated to SRU4.  Thanks to the ZFS snapshots, there is also a valid repository of Solaris 11 11/11 without the update located at /export/repo/sol11/.zfs/snapshot/fcs . If you like, you can also create another repository service for each update, running on a separate port.
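
For the separate repository service mentioned above, the usual approach is to add another instance to the pkg/server SMF service.  The sketch below only prints the commands so you can review them first.  The helper name depot_instance_cmds, the instance name "s11fcs" and the port 10081 are my own picks for illustration, and I'm assuming the snapshot's repository sits in the "repo" directory below the snapshot path named above.  Please check the property names against the pkg.depotd documentation for your release before running anything.

```shell
#!/bin/sh
# Print (do not run) the commands for an additional, read-only
# pkg/server instance.  depot_instance_cmds is a hypothetical helper.
depot_instance_cmds() {
    # $1 = instance name, $2 = TCP port, $3 = repository root
    inst="application/pkg/server:$1"
    cat <<EOF
svccfg -s application/pkg/server add $1
svccfg -s $inst addpg pkg application
svccfg -s $inst setprop pkg/port=$2
svccfg -s $inst setprop pkg/inst_root=$3
svccfg -s $inst setprop pkg/readonly=true
svcadm refresh $inst
svcadm enable $inst
EOF
}

# Hypothetical values: instance "s11fcs" on port 10081, serving the @fcs snapshot.
depot_instance_cmds s11fcs 10081 /export/repo/sol11/.zfs/snapshot/fcs/repo
```

Since a snapshot can't be written to, the instance is set to read-only, just like the main repository service above.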

But now let's continue with the AI server.  Just a little bit of reading in the documentation makes it clear that we will need to run a DHCP server for this.  Since I already have one active (for my SunRay installation) and since it's a good idea to have these kinds of services separate anyway, I decided to create this in a Zone.  So, let's create one first:

# zfs create -o mountpoint=/export/install rpool/ai/install
# zfs create -o mountpoint=/zones rpool/zones
# zonecfg -z ai-server
zonecfg:ai-server> create
create: Using system default template 'SYSdefault'
zonecfg:ai-server> set zonepath=/zones/ai-server
zonecfg:ai-server> add dataset
zonecfg:ai-server:dataset> set name=rpool/ai/install
zonecfg:ai-server:dataset> set alias=install
zonecfg:ai-server:dataset> end
zonecfg:ai-server> commit
zonecfg:ai-server> exit
# zoneadm -z ai-server install
# zoneadm -z ai-server boot ; zlogin -C ai-server

Give it a hostname and IP address at first boot, and there's the Zone.  As its publisher for Solaris packages, it will use the "System Publisher" from the Global Zone.  The /export/install filesystem, of course, is intended to be used by the AI server.  Let's configure it now:

# zlogin ai-server
root@ai-server:~# pkg install install/installadm
root@ai-server:~# installadm create-service -n x86-fcs -a i386 \
-s pkg://solaris/install-image/solaris-auto-install@5.11,5.11- \
-d /export/install/fcs -i -c 3

With that, the core AI server is already done.  What happened here?  First, I installed the AI server software.  IPS makes that nice and easy.  If necessary, it'll also pull in the required DHCP-Server and anything else that might be missing.  Watch out for that DHCP server software.  In Solaris 11, there are two different versions.  There's the one you might know from Solaris 10 and earlier, and then there's a new one from ISC.  The latter is the one we need for AI.  The SMF service names of both are very similar.  The "old" one is "svc:/network/dhcp-server:default". The ISC-server comes with several SMF-services. We at least need "svc:/network/dhcp/server:ipv4". 

The command "installadm create-service" creates the installation service. It's called "x86-fcs", serves the "i386" architecture and gets its boot image from the repository of the system publisher, using version 5.11,5.11-, which is Solaris 11 11/11.  (The option "-a i386" in this example is optional, since the install server itself runs on an x86 machine.) The boot environment for clients is created in /export/install/fcs and the DHCP server is configured for 3 IP addresses starting at  This configuration is stored in a very human-readable form in /etc/inet/dhcpd4.conf.  An AI service for SPARC systems could be created in the very same way, using "-a sparc" as the architecture option.

Now we would be ready to register and install the first client.  It would be installed with the default "solaris-large-server" using the publisher "" and would query its configuration interactively at first boot.  This makes it very clear that an AI server is really only a boot server.  The true source of the packages to install can be different.  Since I don't like these defaults for my demo setup, I did some extra config work for my clients.

The configuration of a client is controlled by manifests and profiles.  The manifest controls which packages are installed and how the filesystems are laid out.  In that, it's very much like the old "rules.ok" file in Jumpstart.  Profiles contain additional configuration like root passwords, primary user account, IP addresses, keyboard layout etc.  Hence, profiles are very similar to the old sysid.cfg file.

The easiest way to get your hands on a manifest is to ask the AI server we just created to give us its default one.  Then modify that to our liking and give it back to the install server to use:

root@ai-server:~# mkdir -p /export/install/configs/manifests
root@ai-server:~# cd /export/install/configs/manifests
root@ai-server:~# installadm export -n x86-fcs -m orig_default \
-o orig_default.xml
root@ai-server:~# cp orig_default.xml s11-fcs.small.local.xml
root@ai-server:~# vi s11-fcs.small.local.xml
root@ai-server:~# more s11-fcs.small.local.xml
<!DOCTYPE auto_install SYSTEM "file:///usr/share/install/ai.dtd.1">
  <ai_instance name="S11 Small fcs local">
        <zpool name="rpool" is_root="true">
          <filesystem name="export" mountpoint="/export"/>
          <filesystem name="export/home"/>
          <be name="solaris"/>
    <software type="IPS">
          <!-- Specify locales to install -->
          <facet set="false">facet.locale.*</facet>
          <facet set="true"></facet>
          <facet set="true">facet.locale.de_DE</facet>
          <facet set="true">facet.locale.en</facet>
          <facet set="true">facet.locale.en_US</facet>
        <publisher name="solaris">
          <origin name=""/>
        By default the latest build available, in the specified IPS
        repository, is installed.  If another build is required, the
        build number has to be appended to the 'entire' package in the
        following form:

      <software_data action="install">

root@ai-server:~# installadm create-manifest -n x86-fcs -d \
-f ./s11-fcs.small.local.xml 
root@ai-server:~# installadm list -m -n x86-fcs
Manifest             Status    Criteria 
--------             ------    -------- 
S11 Small fcs local  Default   None
orig_default         Inactive  None

The major points in this new manifest are:

  • Install "solaris-small-server"
  • Install a few locales less than the default.  I'm not that fluent in French or Japanese...
  • Use my own package service as publisher, running on IP address
  • Install the initial release of Solaris 11:  pkg:/entire@0.5.11,5.11-

Using a similar approach, I'll create a default profile interactively and use it as a template for a few customized building blocks, each defining a part of the overall system configuration.  The modular approach makes it easy to configure numerous clients later on:

root@ai-server:~# mkdir -p /export/install/configs/profiles
root@ai-server:~# cd /export/install/configs/profiles
root@ai-server:~# sysconfig create-profile -o default.xml
root@ai-server:~# cp default.xml general.xml; cp default.xml mars.xml
root@ai-server:~# cp default.xml user.xml
root@ai-server:~# vi general.xml mars.xml user.xml
root@ai-server:~# more general.xml mars.xml user.xml
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<service_bundle type="profile" name="sysconfig">
  <service version="1" type="service" name="system/timezone">
    <instance enabled="true" name="default">
      <property_group type="application" name="timezone">
        <propval type="astring" name="localtime" value="Europe/Berlin"/>
  <service version="1" type="service" name="system/environment">
    <instance enabled="true" name="init">
      <property_group type="application" name="environment">
        <propval type="astring" name="LANG" value="C"/>
  <service version="1" type="service" name="system/keymap">
    <instance enabled="true" name="default">
      <property_group type="system" name="keymap">
        <propval type="astring" name="layout" value="US-English"/>
  <service version="1" type="service" name="system/console-login">
    <instance enabled="true" name="default">
      <property_group type="application" name="ttymon">
        <propval type="astring" name="terminal_type" value="vt100"/>
  <service version="1" type="service" name="network/physical">
    <instance enabled="true" name="default">
      <property_group type="application" name="netcfg">
        <propval type="astring" name="active_ncp" value="DefaultFixed"/>
  <service version="1" type="service" name="system/name-service/switch">
    <property_group type="application" name="config">
      <propval type="astring" name="default" value="files"/>
      <propval type="astring" name="host" value="files dns"/>
      <propval type="astring" name="printer" value="user files"/>
    <instance enabled="true" name="default"/>
  <service version="1" type="service" name="system/name-service/cache">
    <instance enabled="true" name="default"/>
  <service version="1" type="service" name="network/dns/client">
    <property_group type="application" name="config">
      <property type="net_address" name="nameserver">
          <value_node value=""/>
    <instance enabled="true" name="default"/>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<service_bundle type="profile" name="sysconfig">
  <service version="1" type="service" name="network/install">
    <instance enabled="true" name="default">
      <property_group type="application" name="install_ipv4_interface">
        <propval type="astring" name="address_type" value="static"/>
        <propval type="net_address_v4" name="static_address" 
        <propval type="astring" name="name" value="net0/v4"/>
        <propval type="net_address_v4" name="default_route" 
      <property_group type="application" name="install_ipv6_interface">
        <propval type="astring" name="stateful" value="yes"/>
        <propval type="astring" name="stateless" value="yes"/>
        <propval type="astring" name="address_type" value="addrconf"/>
        <propval type="astring" name="name" value="net0/v6"/>
  <service version="1" type="service" name="system/identity">
    <instance enabled="true" name="node">
      <property_group type="application" name="config">
        <propval type="astring" name="nodename" value="mars"/>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<service_bundle type="profile" name="sysconfig">
  <service version="1" type="service" name="system/config-user">
    <instance enabled="true" name="default">
      <property_group type="application" name="root_account">
        <propval type="astring" name="login" value="root"/>
        <propval type="astring" name="password" 
        <propval type="astring" name="type" value="role"/>
      <property_group type="application" name="user_account">
        <propval type="astring" name="login" value="stefan"/>
        <propval type="astring" name="password" 
        <propval type="astring" name="type" value="normal"/>
        <propval type="astring" name="description" value="Stefan Hinker"/>
        <propval type="count" name="uid" value="12345"/>
        <propval type="count" name="gid" value="10"/>
        <propval type="astring" name="shell" value="/usr/bin/bash"/>
        <propval type="astring" name="roles" value="root"/>
        <propval type="astring" name="profiles" value="System Administrator"/>
        <propval type="astring" name="sudoers" value="ALL=(ALL) ALL"/>
root@ai-server:~# installadm create-profile -n x86-fcs -f general.xml
root@ai-server:~# installadm create-profile -n x86-fcs -f user.xml
root@ai-server:~# installadm create-profile -n x86-fcs -f mars.xml \
-c ipv4=
root@ai-server:~# installadm list -p

Service Name  Profile     
------------  -------     
x86-fcs       general.xml

root@ai-server:~# installadm list -n x86-fcs -p

Profile      Criteria 
-------      -------- 
general.xml  None
mars.xml     ipv4 =
user.xml     None
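
With the configuration split up like this, adding another client mostly means cloning the host-specific profile.  Here's a minimal sketch of that idea.  The helper name clone_profile is my own invention, and it assumes the template contains a single "nodename" propval like the one in mars.xml.  It only rewrites the node name; the network addresses would still have to be adjusted by hand.

```sh
#!/bin/sh
# Hypothetical helper (not part of installadm): print a copy of a
# host-specific profile with the nodename value replaced.
# Usage: clone_profile TEMPLATE NEW_NODENAME > new-profile.xml
clone_profile() {
    template=$1
    new_name=$2
    # Replace only the value attribute of the nodename propval.
    sed "s/\(name=\"nodename\" value=\"\)[^\"]*\"/\1${new_name}\"/" "$template"
}
```

Something like "clone_profile mars.xml venus > venus.xml" would then give you a starting point for the next client.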

Here's the idea behind these files:

  • "general.xml" contains settings valid for all my clients.  Stuff like DNS servers, for example, which in my case will always be the same.
  • "user.xml" only contains user definitions.  That is, a root password and a primary user.
    Both of these profiles will be valid for all clients (for now).
  • "mars.xml" defines network settings for an individual client.  This profile is associated with an IP-Address.  For this to work, I'll have to tweak the DHCP-settings in the next step:
root@ai-server:~# installadm create-client -e 08:00:27:AA:3D:B1 -n x86-fcs
root@ai-server:~# vi /etc/inet/dhcpd4.conf
root@ai-server:~# tail -5 /etc/inet/dhcpd4.conf
host 080027AA3DB1 {
  hardware ethernet 08:00:27:AA:3D:B1;
  filename "01080027AA3DB1";

This completes the client preparations.  I manually added the IP-Address for mars to /etc/inet/dhcpd4.conf.  This is needed for the "mars.xml" profile.  Disabling arbitrary DHCP-replies will shut up this DHCP server, making my life in a shared environment a lot more peaceful ;-)
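
As the DHCP entry shows, the host label is the client's MAC address without the colons, and the boot file name is that same string with "01" in front.  If you have more than a handful of clients, a small helper saves some typing.  This is only a sketch: the function names are made up, and the "01" prefix is taken from the example entry above.

```sh
#!/bin/sh
# Hypothetical helpers: derive the DHCP host label and the boot file
# name from a MAC address, following the pattern in the entry above.

# 08:00:27:AA:3D:B1 -> 080027AA3DB1
mac_to_label() {
    printf '%s\n' "$1" | tr -d ':' | tr 'abcdef' 'ABCDEF'
}

# 08:00:27:AA:3D:B1 -> 01080027AA3DB1
mac_to_bootfile() {
    printf '01%s\n' "$(mac_to_label "$1")"
}
```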

Note: The above example shows the configuration for x86 clients.  SPARC clients have a slightly different entry in the dhcp config file, again with some manual tweaking to create a fixed IP address for my client:

subnet netmask {
  option broadcast-address;
  option routers;

class "SPARC" {
  match if not (substring(option vendor-class-identifier, 0, 9) = "PXEClient");
  filename "";

host sparcy {
   hardware ethernet 00:14:4f:fb:52:3c ;
   fixed-address ;
Now, I of course want this installation to be completely hands-off.  For this to work, I'll need to modify the grub boot menu for this client slightly.  You can find it in /etc/netboot.  "installadm create-client" will create a new boot menu for every client, identified by the client's MAC address.  The template for this can be found in a subdirectory with the name of the install service, /etc/netboot/x86-fcs in our case.  If you don't want to change this manually for every client, modify that template to your liking instead.
root@ai-server:~# cd /etc/netboot
root@ai-server:~# cp menu.lst.01080027AA3DB1
root@ai-server:~# vi menu.lst.01080027AA3DB1
root@ai-server:~# diff menu.lst.01080027AA3DB1
< default=1
< timeout=10
> default=0
> timeout=30
root@ai-server:~# more menu.lst.01080027AA3DB1

title Oracle Solaris 11 11/11 Text Installer and command line
	kernel$ /x86-fcs/platform/i86pc/kernel/$ISADIR/unix -B install_media=htt
	module$ /x86-fcs/platform/i86pc/$ISADIR/boot_archive

title Oracle Solaris 11 11/11 Automated Install
	kernel$ /x86-fcs/platform/i86pc/kernel/$ISADIR/unix -B install=true,inst
	module$ /x86-fcs/platform/i86pc/$ISADIR/boot_archive
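
The same two changes shown in the diff (boot the second menu entry by default, and shorten the timeout) can also be scripted instead of made in vi.  A minimal sketch, assuming the menu file uses the "default=" and "timeout=" syntax seen above; the function name is hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: print a copy of a GRUB menu.lst in which the
# Automated Install entry (index 1) is the default and the timeout is
# 10 seconds.  The input file itself is not modified.
# Usage: make_hands_off MENU_FILE > MENU_FILE.new
make_hands_off() {
    sed -e 's/^default=.*/default=1/' \
        -e 's/^timeout=.*/timeout=10/' "$1"
}
```

Writing to a new file and moving it into place afterwards keeps the original around in case something goes wrong.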

Now just boot the client off the network using PXE-boot.  For my demo purposes, that's a client from VirtualBox, of course.   Again, if this were a SPARC system, you'd instead be typing "boot net:dhcp - install" at the OK prompt and then just watch the installation.

That's all there is to it.  And despite the fact that this blog entry is a little longer - that wasn't that hard now, was it?

Wednesday Feb 29, 2012

Solaris Fingerprint Database - How it's done in Solaris 11

Many remember the Solaris Fingerprint Database. It was a great tool to verify the integrity of a Solaris binary.  Unfortunately, it went away with the rest of SunSolve, and was not revived in the replacement, "My Oracle Support".  Here's the good news:  It's back for Solaris 11, and it's better than ever!

It is now totally integrated with IPS...

Monday Feb 20, 2012

Solaris 11 submitted for EAL4+ certification

Solaris 11 has been submitted to the Canadian Common Criteria Scheme for certification at level EAL4+. They will be certifying against the protection profile "Operating System Protection Profile (OS PP)" as well as the following extensions:

  • Advanced Management (AM)
  • Extended Identification and Authentication (EIA)
  • Labeled Security (LS)
  • Virtualization (VIRT)

EAL4+ is the highest level typically achievable for commercial software, and is the highest level mutually recognized by 26 countries, including Germany and the USA. Completion of the certification lies in the hands of the certification authority.

You can check the current status of this certification (as well as other certified Oracle software) on the page Oracle Security Evaluations.

Friday Oct 07, 2011

Solaris 11 Launch

There have been many questions and rumors about the upcoming launch of Solaris 11.  Now it's out:  Watch the webcast on

November 9, 2011
at 10am ET

You're invited to join!

(I hope to get around to summarizing all the OpenWorld announcements, especially around T4, soon...)

Monday Aug 01, 2011

Oracle Solaris Studio 12.3 Beta Program

The beta program for Oracle Solaris Studio 12.3 is now open for participation.  Anyone willing to test the newest compiler and developer tools is welcome to join.  You may expect performance improvements over earlier versions of Studio as well as GCC that make testing worth your while.

Happy testing!

Thursday Mar 31, 2011

What's up with Solaris 11?

Interested in the upcoming Solaris 11?  What will be the highlights?  What exactly is the new packaging format, how does the new installer work?  What do the analysts think?

All this will be covered in the Solaris Online Forum on April 14, starting at 9 am PST.  This will be a live event where you can ask questions. (A recording will be available afterwards.)  Speakers are all high level members of development and product management.

All further details can be found at the registration page.

Friday Dec 17, 2010

Solaris knows Hardware - pgstat explains it

When Sun's engineering teams observed large differences in memory latency on the E25K, they introduced the concept of locality groups (lgrp) into Solaris 9 9/02. Locality groups describe the hierarchy of system components, which can differ considerably from one hardware system to the next. When creating processes and scheduling them onto CPUs for execution, Solaris will try to minimize the distance between CPU and memory for optimal latency. This feature, known as Memory Placement Optimization (MPO), can significantly enhance performance, depending on hardware and application.

There are, among many other things, thousands of counters in the Solaris kernel. They can be queried using kstat, cpustat, or more widely used tools like mpstat or iostat. The counters made available with cpustat in particular depend heavily on the underlying hardware. As a result, it hasn't always been easy to analyze the performance benefit of MPO and the utilization of individual parts of the hardware using these counters. For cpustat, there was only a Perl script called corestat to help understand T1/T2 core utilization. This has finally changed with Solaris 11 Express.

There are now three new commands: lgrpinfo, pginfo and pgstat.

lgrpinfo shows the hierarchy of the lgroups - the NUMA-architecture of the hardware. This can be useful when configuring resource groups (for containers or standalone) to select the right CPUs.

pginfo shows a different view of this information: a tree of the hardware hierarchy. The leaves of this tree are the individual integer and floating-point units of each core.  Here's a little example from a T2 LDom configured with 16 strands from different cores:

# pginfo -v
0 (System [system]) CPUs: 0-15
|-- 3 (Data_Pipe_to_memory [chip]) CPUs: 0-7
|   |-- 2 (Floating_Point_Unit [core]) CPUs: 0-3
|   |   `-- 1 (Integer_Pipeline [core]) CPUs: 0-3
|   `-- 5 (Floating_Point_Unit [core]) CPUs: 4-7
|       `-- 4 (Integer_Pipeline [core]) CPUs: 4-7
`-- 8 (Data_Pipe_to_memory [core,chip]) CPUs: 8-15
    `-- 7 (Floating_Point_Unit [core,chip]) CPUs: 8-15
        |-- 6 (Integer_Pipeline) CPUs: 8-11
        `-- 9 (Integer_Pipeline) CPUs: 12-15

As you can see, the mapping of strands to pipelines and cores is easily visible.

Finally, pgstat is a worthy successor to corestat. It gives you a good overview of the utilization of all components. Here's another example from the same LDom, which also happens to show almost 100% core utilization, something I don't see very often...

# pgstat -Apv 1 2
0 System [system] - - - 100.0% 99.6% 0.4% 0.0% 0-15
3 Data_Pipe_to_memory [chip] - - - 100.0% 99.1% 0.9% 0.0% 0-7
2 Floating_Point_Unit [core] 0.0% 179K 1.3B 100.0% 99.1% 0.9% 0.0% 0-3
1 Integer_Pipeline [core] 80.0% 1.3B 1.7B 100.0% 99.1% 0.9% 0.0% 0-3
5 Floating_Point_Unit [core] 0.0% 50K 1.3B 100.0% 99.1% 0.9% 0.0% 4-7
4 Integer_Pipeline [core] 80.2% 1.3B 1.7B 100.0% 99.1% 0.9% 0.0% 4-7
8 Data_Pipe_to_memory [core,chip] - - - 100.0% 100.0% 0.0% 0.0% 8-15
7 Floating_Point_Unit [core,chip] 0.0% 80K 1.3B 100.0% 100.0% 0.0% 0.0% 8-15
6 Integer_Pipeline 76.4% 1.3B 1.7B 100.0% 100.0% 0.0% 0.0% 8-11
9 Integer_Pipeline 76.4% 1.3B 1.7B 100.0% 100.0% 0.0% 0.0% 12-15
0 System [system] - - - 100.0% 99.7% 0.3% 0.0% 0-15
3 Data_Pipe_to_memory [chip] - - - 100.0% 99.5% 0.5% 0.0% 0-7
2 Floating_Point_Unit [core] 0.0% 76K 1.2B 100.0% 99.5% 0.5% 0.0% 0-3
1 Integer_Pipeline [core] 79.7% 1.2B 1.5B 100.0% 99.5% 0.5% 0.0% 0-3
5 Floating_Point_Unit [core] 0.0% 42K 1.2B 100.0% 99.5% 0.5% 0.0% 4-7
4 Integer_Pipeline [core] 79.8% 1.2B 1.5B 100.0% 99.5% 0.5% 0.0% 4-7
8 Data_Pipe_to_memory [core,chip] - - - 100.0% 99.9% 0.1% 0.0% 8-15
7 Floating_Point_Unit [core,chip] 0.0% 80K 1.2B 100.0% 99.9% 0.1% 0.0% 8-15
6 Integer_Pipeline 76.3% 1.2B 1.5B 100.0% 100.0% 0.0% 0.0% 8-11
9 Integer_Pipeline 76.4% 1.2B 1.5B 100.0% 99.8% 0.2% 0.0% 12-15


------HARDWARE------ ------SOFTWARE------
0 System [system] - - - - - 100.0% 100.0% 100.0% 0-15
3 Data_Pipe_to_memory [chip] - - - - - 100.0% 100.0% 100.0% 0-7
2 Floating_Point_Unit [core] 76K 1.2B 0.0% 0.0% 0.0% 100.0% 100.0% 100.0% 0-3
1 Integer_Pipeline [core] 1.2B 1.5B 79.7% 79.7% 80.0% 100.0% 100.0% 100.0% 0-3
5 Floating_Point_Unit [core] 42K 1.2B 0.0% 0.0% 0.0% 100.0% 100.0% 100.0% 4-7
4 Integer_Pipeline [core] 1.2B 1.5B 79.8% 79.8% 80.2% 100.0% 100.0% 100.0% 4-7
8 Data_Pipe_to_memory [core,chip] - - - - - 100.0% 100.0% 100.0% 8-15
7 Floating_Point_Unit [core,chip] 80K 1.2B 0.0% 0.0% 0.0% 100.0% 100.0% 100.0% 8-15
6 Integer_Pipeline 1.2B 1.5B 76.3% 76.3% 76.4% 100.0% 100.0% 100.0% 8-11
9 Integer_Pipeline 1.2B 1.5B 76.4% 76.4% 76.4% 100.0% 100.0% 100.0% 12-15

The exact meaning of these values is nicely described in the manpage for pgstat, so I'll leave the interpretation to the reader. With this little tool, performance analysis, especially on T2/T3 systems, will be even more fun ;-)

Tuesday Sep 14, 2010

How auto_reg really works in Solaris 10 09/10

The newest update of Solaris 10 (09/10) brings a new feature called autoregistration. This can be automated using the new "auto_reg" option in the sysidcfg file.  Or rather, it can be sometimes.  Due to an (already known) bug, this new parameter is ignored by the GUI-installer, which will query you for the registration details no matter what you put in the sysidcfg file.  The GUI-installer usually runs if you have a screen (and video card) attached to the system.  On headless servers, the text-installer runs, which correctly acts upon "auto_reg" settings.

As a workaround for workstations, use "boot net - install nowin" instead of the usual "boot net - install", and you're all set.

 Many thanks to Peter Tribble, who suffered through this and eventually found the solution for me.

Monday Jun 07, 2010

prstat and microstate accounting

You never stop learning.  As a reply to my last blog entry, it was pointed out to me that with Solaris 10, microstate accounting is always enabled, and prstat supports this with the option "-m".  This option removes the moving average lags from the values displayed, and is much more accurate.  I wanted to know more about the background.  Eric Schrock was kind enough to provide it on his blog.  Here's a short summary.

The legacy output of prstat (and some of the other monitoring commands) represents moving averages based on regular samples.  With higher CPU frequencies, it is more and more likely that some scheduling events will be missed completely by these samples.  This makes the reports more and more unreliable.  Microstate accounting collects event statistics for every event, when the event happens.  Thanks to some implementation tricks introduced with Solaris 10, this is now efficient enough to be turned on all the time.  If you use this more precise data with prstat, a CPU hog will show up immediately, showing 100% CPU on all threads involved.  In this way, you're much more precise, and you needn't convert from the number of CPUs in the system to the corresponding percentage as in the example in my blog entry.  A single-threaded process will be visible instantly. This is easier to do, easier to understand, less error prone and more exact.
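
To illustrate the conversion that "-m" saves you: in the legacy output, CPU usage is reported relative to the whole system, so a single-threaded CPU hog on a machine with N CPUs tops out at roughly 100/N percent.  Here's a small sketch of that arithmetic; the function name is my own, and the 32-CPU figure is just an example.

```sh
#!/bin/sh
# Hypothetical helper: the CPU percentage at which one fully busy
# thread shows up in legacy (non -m) prstat output on a system with
# the given number of CPUs.
legacy_hog_percent() {
    awk -v ncpu="$1" 'BEGIN { printf "%.1f\n", 100 / ncpu }'
}

# Example: on 32 CPUs, a single-threaded hog shows up at about 3.1%.
legacy_hog_percent 32

# With microstate accounting (for example "prstat -mL"), the same
# thread is reported at close to 100% in the USR column instead.
```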

I've also updated the presentation to represent this.

Thanks for the hint - you know who you are!  It's from things like this that I notice that I've been using prstat and the like (successfully) for too long.  It's just like Eric mentioned in his blog: This great feature slipped past me, with all the more prominent stuff like containers, ZFS, SMF etc.  Thanks again!

Thursday Mar 11, 2010

Demo for Solaris Resource Manager

Here's an example of how to build such a live demo:

  1. Create two projects:
    add to /etc/project:


  2. Start your interactive demo:

    newtask -p srm-demo java -jar /usr/java/demo/jfc/Java2D/Java2Demo.jar &

  3. Now start as many background processes as you need:

    newtask -p srm-loader &
    where could be:

    while true
    do
        echo adflkjasdflkjasdflkjasdflkj > /dev/null
    done

  4. Now move your processes to the FSS class

    priocntl -s -c FSS -i projid 101
    priocntl -s -c FSS -i projid 100

    or back to the TS class to show the effect of resource management.

    priocntl -s -c TS -i projid 100
    priocntl -s -c TS -i projid 101


News, tips and things worth knowing about SPARC, CMT, performance and its analysis, as well as experiences with Solaris on the server and the laptop.

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

