Tuesday Jul 28, 2015

A trip down memory lane

In Scott Lynn's announcement of Oracle's membership in the Open Container Initiative, he gives a great summary of how Solaris virtualization got to where it is today.  Quite an interesting read!

Thursday Jul 16, 2015

Solaris 11.3 zones blog entries

When I was interviewing with the zones team a number of years ago, I was told that Zones were the peanut butter that was spread throughout the operating system.  I'm not so sure peanut butter is exactly the analogy that I'd go for... perhaps something a bit more viscous and hip, like Sriracha sauce.  Whatever the analogy, there's a lot of zones-related innovation throughout Solaris by people who don't work on the zones team.  Here's a sampling of zones-related hotness in Solaris 11.3 blogged about elsewhere.

Enjoy!

Thursday Jul 09, 2015

Multi-CPU bindings for Solaris Project

Traditionally, assigning specific processes to a certain set of CPUs has been done by using processor sets (and resource pools). This is quite useful, but it requires hard partitioning of the processors in the system. That means we can't restrict process A to run on CPUs 1, 2, and 3 and process B to run on CPUs 3, 4, and 5, because those sets overlap.

There is another way to assign CPUs to processes, called processor affinity, or Multi-CPU binding (MCB for short). Oracle Solaris 11.2 introduced MCB, as described in pbind(1M) and processor_affinity(2). With the release of Oracle Solaris 11.3, we have a new interface to assign, modify, and remove MCBs: Solaris projects.

Briefly, a Solaris project is a collection of processes with predefined attributes. These attributes include various resource controls one can apply to the processes that belong to the project. For more details, see project(4) and resource_controls(5). What's new is that MCB becomes simply another resource control we can manage through Solaris projects.

We start by making a new project with an MCB property. We assume that we have enough privilege to create a project, that there is no project called test-project on the system, and that all CPUs listed in the project.mcb.cpus entry exist in the system and are online. We also assume that the listed CPUs are in the resource pool to which the current zone is bound. For manipulating projects, we use the standard command line tools projadd(1M)/projdel(1M)/projmod(1M).

root@sqkx4450-1:~# projects -l test-project
projects: project "test-project" does not exist
root@sqkx4450-1:~# projadd -K project.mcb.cpus=0,3-5,9-11 -K project.mcb.flags=weak -K project.pool=pool_default test-project
root@sqkx4450-1:~# projects -l test-project
test-project
        projid : 100
        comment: ""
        users  : (none)
        groups : (none)
        attribs: project.mcb.cpus=0,3-5,9-11
                 project.mcb.flags=weak
                 project.pool=pool_default

This means that processes in test-project will be weakly bound to CPUs 0, 3, 4, 5, 9, 10, and 11. (Note: For the concept of strong/weak binding, see processor_affinity(2). In short, strong binding guarantees that processes will run ONLY on the designated CPUs, while weak binding makes no such guarantee.)

The next step is to assign some processes to test-project. If we know the PIDs of the target processes, this can be done with newtask(1).

root@sqkx4450-1:~# newtask -c 4156 -p test-project
root@sqkx4450-1:~# newtask -c 4170 -p test-project
root@sqkx4450-1:~# newtask -c 4184 -p test-project

Let's check the result by using the following command.

root@sqkx4450-1:~# pbind -q -i projid 100
pbind(1M): pid 4156 weakly bound to processor(s) 0 3 4 5 9 10 11.
pbind(1M): pid 4170 weakly bound to processor(s) 0 3 4 5 9 10 11.
pbind(1M): pid 4184 weakly bound to processor(s) 0 3 4 5 9 10 11.

Good. Now suppose we want to change the binding type to strong binding. In that case, all we need to do is change the value of project.mcb.flags to "strong", or even delete the project.mcb.flags key, because the default value is "strong".
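In the transcript below we take the first route and set the value explicitly. For the second route, a hypothetical sketch (assuming projmod(1M)'s -r mode for removing -K attribute values; this command is not from the original session) would be:

root@sqkx4450-1:~# projmod -r -K project.mcb.flags=weak test-project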

root@sqkx4450-1:~# projmod -s -K project.mcb.flags=strong test-project
root@sqkx4450-1:~# projects -l test-project
test-project
        projid : 100
        comment: ""
        users  : (none)
        groups : (none)
        attribs: project.mcb.cpus=0,3-5,9-11
                 project.mcb.flags=strong
                 project.pool=pool_default

Things look good, but...

root@sqkx4450-1:~# pbind -q -i projid 100
pbind(1M): pid 4156 weakly bound to processor(s) 0 3 4 5 9 10 11.
pbind(1M): pid 4170 weakly bound to processor(s) 0 3 4 5 9 10 11.
pbind(1M): pid 4184 weakly bound to processor(s) 0 3 4 5 9 10 11.

Nothing actually changed! WARNING: By default, projmod(1M) only modifies the project configuration file; it does not attempt to apply the change to the project's processes. To do that, use the "-A" option.

root@sqkx4450-1:~# projmod -A test-project
root@sqkx4450-1:~# pbind -q -i projid 100
pbind(1M): pid 4156 strongly bound to processor(s) 0 3 4 5 9 10 11.
pbind(1M): pid 4170 strongly bound to processor(s) 0 3 4 5 9 10 11.
pbind(1M): pid 4184 strongly bound to processor(s) 0 3 4 5 9 10 11.

Now, suppose we want to change the list of CPUs, but oops, we made some typos.

root@sqkx4450-1:~# projmod -s -K project.mcb.cpus=0,3-5,13-17 -A test-project
projmod: Updating project test-project succeeded with following warning message.
WARNING: Following ids of cpus are not found in the system:16-17
root@sqkx4450-1:~# projects -l test-project
test-project
        projid : 100
        comment: ""
        users  : (none)
        groups : (none)
        attribs: project.mcb.cpus=0,3-5,13-17
                 project.mcb.flags=strong
                 project.pool=pool_default

Our system has CPUs 0 to 15, not up to 17, so we get a warning. But the command succeeded anyway; it simply ignores the missing CPUs.

root@sqkx4450-1:~# pbind -q -i projid 100
pbind(1M): pid 4156 strongly bound to processor(s) 0 3 4 5 13 14 15.
pbind(1M): pid 4170 strongly bound to processor(s) 0 3 4 5 13 14 15.
pbind(1M): pid 4184 strongly bound to processor(s) 0 3 4 5 13 14 15.

And one more thing: if you just want to check the validity of the project file, use projmod(1M) without any options.

root@sqkx4450-1:~# projmod
projmod: Validation warning on line 6, WARNING: Following ids of cpus are not found in the system:16-17

But projmod is not so tolerant if it can't find any of the given CPUs at all.

root@sqkx4450-1:~# projmod -s -K project.mcb.cpus=17-20 -A test-project
projmod: WARNING: Following ids of cpus are not found in the system:17-20
projmod: ERROR: All of given multi-CPU binding (MCB) ids are not found in the system: project.mcb.cpus=17-20
root@sqkx4450-1:~# projects -l test-project
test-project
        projid : 100
        comment: ""
        users  : (none)
        groups : (none)
        attribs: project.mcb.cpus=0,3-5,13-17
                 project.mcb.flags=strong
                 project.pool=pool_default

Now we see an ERROR, which actually makes the command fail. Please read the error message carefully when you see it. Note that the project configuration is not updated either.

Before moving on to the next topic, one small but important tip: how do we clear MCB from a project? Set the value of project.mcb.cpus to "none" and remove project.mcb.flags if it is present.
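The commands that made this change aren't shown in the transcript below; a hypothetical sketch (assuming projmod(1M)'s -s and -r modes for substituting and removing -K attribute values) might look like:

root@sqkx4450-1:~# projmod -s -K project.mcb.cpus=none test-project
root@sqkx4450-1:~# projmod -r -K project.mcb.flags=strong test-project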

root@sqkx4450-1:~# projects -l test-project
test-project
        projid : 100
        comment: ""
        users  : (none)
        groups : (none)
        attribs: project.mcb.cpus=none
                 project.pool=pool_default
root@sqkx4450-1:~# projmod -A test-project
root@sqkx4450-1:~# pbind -q -i projid 100
root@sqkx4450-1:~#

Let's move on to slightly more advanced usage. In Oracle Solaris systems, as in others, CPUs are grouped into larger units. Currently these are 'cores', 'sockets', 'processor-groups', and 'lgroups'. Utilizing these units can improve performance by taking advantage of the hardware design. (I'm less familiar with those topics, so have a look at the following post about lgroups: Locality Group Observability on Solaris.) MCB for projects supports all of these CPU groupings. The usage is simple: just change "project.mcb.cpus" to "project.mcb.cores", "project.mcb.sockets", "project.mcb.pgs", or "project.mcb.lgroups".

Note: To get information about the CPU topology on a given system, use the following commands: "psrinfo -t" shows the cpu/core/socket structure, "pginfo" shows processor groups, and "lgrpinfo -c" shows lgroups.
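As before, the projmod invocation that switched the project from CPU ids to sockets isn't shown below; a hypothetical sketch (using the same -r and -s modes as above) might be:

root@sqkx4450-1:~# projmod -r -K project.mcb.cpus=none test-project
root@sqkx4450-1:~# projmod -s -K project.mcb.sockets=1 test-project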

root@sqkx4450-1:~# projects -l test-project
test-project
        projid : 100
        comment: ""
        users  : (none)
        groups : (none)
        attribs: project.mcb.sockets=1
                 project.pool=pool_default
root@sqkx4450-1:~# projmod -A test-project
root@sqkx4450-1:~# pbind -q -i projid 100
pbind(1M): pid 4156 strongly bound to processor(s) 1 5 9 13.
pbind(1M): pid 4170 strongly bound to processor(s) 1 5 9 13.
pbind(1M): pid 4184 strongly bound to processor(s) 1 5 9 13.

These examples cover the basics of MCB for projects. For more details, refer to the appropriate man pages, but let me briefly summarize a few features we didn't cover here. And a final warning: many features used in this post are not supported on Oracle Solaris 11.2, even ones not directly related to MCB.

1. newtask(1) also understands projects. When we set MCB for a project in the project configuration file, an unprivileged user who is a member of the project can use newtask(1) to put new processes, or his or her existing processes, into it.

2. For the Solaris projects APIs, see libproject(3LIB). Warning: some features currently work only with the 64-bit version of the library.

3. There are many other project attributes. Combining them with MCB usually causes no problems, but there is one exception: project.pool. Ignoring all the details, there is only one important guideline when using both project.pool and project.mcb.(cpus|cores|sockets|pgs|lgroups): all of the CPUs in project.mcb.(cpus|cores|sockets|pgs|lgroups) should reside in the pool named by project.pool.

When we don't specify project.pool but do use project.mcb.(cpus|cores|sockets|pgs|lgroups), the system ASSUMES that project.pool is the default pool of the current zone. In this case, when we apply the project's attributes to processes, we'll see the following warning message.

root@sqkx4450-1:~# projects -l test-project
test-project
        projid : 100
        comment: ""
        users  : (none)
        groups : (none)
        attribs: project.mcb.cpus=0,3-5,9-11
                 project.mcb.flags=weak
root@sqkx4450-1:~# projmod -A test-project
projmod: Updating project test-project succeeded with following warning message.
WARNING: We bind the target project to the default pool of the zone because an multi-CPU binding (MCB) entry exists.

Man page references:
    General information:
        Project file configuration: project(4)
        How to manage resource controls with projects: resource_controls(5)
    Project utilization:
        Get information about projects: projects(1)
        Manage projects: projadd(1M) / projdel(1M) / projmod(1M)
        Assign a process to a project: newtask(1)
        Project control APIs: libproject(3LIB)
    Existing interfaces dealing with MCB:
        command line interface: pbind(1M)
        system call interface: processor_affinity(2)
    Processor information:
        psrinfo(1M) / pginfo(1M) / lgrpinfo(1M)

Managing Orphan Zone BEs

Zone boot environments that do not have any global zone BE associated with them - called orphan ZBEs - are generally a byproduct of migrating a zone from one host to another. Managing them has been a tough nut to crack, as it required messy manual steps to get rid of them, or to retain them, during migration or otherwise. Solaris 11.3 introduces changes to zoneadm(1M) and beadm(1M) to manage them better.

To find out more about these enhancements, click here

rcapd enhancements in Solaris 11.3

The resource capping daemon, rcapd, has been a key memory resource manager for solaris(5) zones and projects, limiting their RSS usage to an administrator-set cap. There was a need to reduce the complexity of its configuration and to give the administrator a handle for managing out-of-control zones and projects that were slowing down the system due to cap enforcement. Solaris 11.3 introduces these changes, among other optimizations to rcapd, to improve cap-enforcement effectiveness and application performance.

To learn more about these enhancements and how to use them to your advantage, click here.

Secure multi-threaded live migration for kernel zones

As mentioned in the What's New document, Solaris 11.3 now supports live migration for kernel zones!  Let's try it out.

As mentioned in the fine manual, live migration requires the use of zones on shared storage (ZOSS) and a few other things.  In Solaris 11.2, we could use logical units (i.e., fibre channel) or iSCSI.  Always living on the edge, I decided to try out the new ZOSS NFS feature.  Since the previous post did such a great job of explaining how to set it up, I won't go into the details.  Here's what my zone configuration looks like:

zonecfg:mig1> info
zonename: mig1
brand: solaris-kz
...
anet 0:
        ...
device 0:
	match not specified
	storage.template: nfs://zoss:zoss@kzx-05/zones/zoss/%{zonename}.disk%{id}
	storage: nfs://zoss:zoss@kzx-05/zones/zoss/mig1.disk0
	id: 0
	bootpri: 0
virtual-cpu:
	ncpus: 4
capped-memory:
	physical: 4G
keysource:
	raw redacted

And the zone is running.

root@vzl-216:~# zoneadm -z mig1 list -s
NAME             STATUS           AUXILIARY STATE                               
mig1             running                                        

In order for live migration to work, the kz-migr and rad:remote services need to be online.  They are disabled by default.

# svcadm enable -s svc:/system/rad:remote svc:/network/kz-migr:stream
# svcs svc:/system/rad:remote svc:/network/kz-migr:stream
STATE          STIME    FMRI
online          6:40:12 svc:/network/kz-migr:stream
online          6:40:12 svc:/system/rad:remote

While these services are only needed on the remote end, I enable them on both sides because there's a pretty good chance that I will migrate kernel zones in both directions.  Now we are ready to perform the migration.  I'm migrating mig1 from vzl-216 to vzl-212.  Both vzl-216 and vzl-212 are logical domains on T5s.

root@vzl-216:~# zoneadm -z mig1 migrate vzl-212
Password: 
zoneadm: zone 'mig1': Importing zone configuration.
zoneadm: zone 'mig1': Attaching zone.
zoneadm: zone 'mig1': Booting zone in 'migrating-in' mode.
zoneadm: zone 'mig1': Checking migration compatibility.
zoneadm: zone 'mig1': Starting migration.
zoneadm: zone 'mig1': Suspending zone on source host.
zoneadm: zone 'mig1': Waiting for migration to complete.
zoneadm: zone 'mig1': Migration successful.
zoneadm: zone 'mig1': Halting and detaching zone on source host.

Afterwards, we see that the zone is now configured on vzl-216 and running on vzl-212.

root@vzl-216:~# zoneadm -z mig1 list -s
NAME             STATUS           AUXILIARY STATE                               
mig1             configured                                    
root@vzl-212:~# zoneadm -z mig1 list -s
NAME             STATUS           AUXILIARY STATE                               
mig1             running                

Ok, cool.  But what really happened?  During the migration, I was also running tcptop, one of our demo dtrace scripts.  Unfortunately, it doesn't print the pretty colors: I added those so we can see what's going on.

root@vzl-216:~# dtrace -s /usr/demo/dtrace/tcptop.d
Sampling... Please wait.
...

2015 Jul  9 06:50:30,  load: 0.10,  TCPin:      0 Kb,  TCPout:      0 Kb
  ZONE    PID LADDR           LPORT RADDR           RPORT      SIZE
     0    613 10.134.18.216      22 10.134.18.202   48168       112
     0   2640 10.134.18.216   60773 10.134.18.212   12302       137
     0    613 10.134.18.216      22 10.134.18.202   60194       336

2015 Jul  9 06:50:35,  load: 0.10,  TCPin:      0 Kb,  TCPout: 832420 Kb
  ZONE    PID LADDR           LPORT RADDR           RPORT      SIZE
     0    613 10.134.18.216      22 10.134.18.202   48168       208
     0   2640 10.134.18.216   60773 10.134.18.212   12302       246
     0    613 10.134.18.216      22 10.134.18.202   60194       480
     0   2640 10.134.18.216   45661 10.134.18.212    8102      8253
     0   2640 10.134.18.216   41441 10.134.18.212    8102 418467721
     0   2640 10.134.18.216   59051 10.134.18.212    8102 459765481

...

2015 Jul  9 06:50:50,  load: 0.41,  TCPin:      1 Kb,  TCPout: 758608 Kb
  ZONE    PID LADDR           LPORT RADDR           RPORT      SIZE
     0   2640 10.134.18.216   60773 10.134.18.212   12302       388
     0    613 10.134.18.216      22 10.134.18.202   60194       544
     0    613 10.134.18.216      22 10.134.18.202   48168       592
     0   2640 10.134.18.216   45661 10.134.18.212    8102    119032
     0   2640 10.134.18.216   59051 10.134.18.212    8102 151883984
     0   2640 10.134.18.216   41441 10.134.18.212    8102 620449680

2015 Jul  9 06:50:55,  load: 0.48,  TCPin:      0 Kb,  TCPout:      0 Kb
  ZONE    PID LADDR           LPORT RADDR           RPORT      SIZE
     0    613 10.134.18.216      22 10.134.18.202   60194       736
^C

In the first sample, we see that vzl-216 (10.134.18.216) has established a RAD connection to vzl-212.  We know it is RAD because it is over port 12302.  RAD is used to connect the relevant zone migration processes on the two machines.  One connection between the zone migration processes is used for orchestrating various aspects of the migration.  There are two others that are used for synchronizing memory between the machines.  In each of the samples, there is also some traffic from a couple of ssh sessions I have between vzl-216 and another machine.

As the amount of kernel zone memory increases, the number of connections will also increase.  Currently that scaling factor is one connection per 2 GB of kernel zone memory, with an upper limit based on the number of CPUs in the machine.  (As a sanity check against the example above: mig1 has 4 GB of RAM, which works out to two memory-synchronization connections, and indeed two of the port 8102 streams in the tcptop output carry nearly all of the data.)  The scaling is limited by the number of CPUs because each connection corresponds to a sending and a receiving thread.  Those threads are responsible for encrypting and decrypting the traffic.  The multiple connections can work nicely with IPMP's outbound load sharing and/or link aggregations to spread the load across multiple physical network links.  The algorithm for selecting the number of connections may change from time to time, so don't be surprised if your observations don't match what is shown above.

All of the communication between the two machines is encrypted.  The RAD connection (in this case) is encrypted with TLS, as described in rad(1M).  This RAD connection supports a series of calls that are used to negotiate various things, including encryption parameters for connections to kz-migr (port 8102).  You have control over the encryption algorithm used with the -c <cipher> option to zoneadm migrate.  You can see the list of available ciphers with:

root@vzl-216:~# zoneadm -z mig1 migrate -c list vzl-212
Password: 
source ciphers: aes-128-ccm aes-128-gcm none
destination ciphers: aes-128-ccm aes-128-gcm none

If for some reason you don't want to use encryption, you can use migrate -c none.  There's not much reason to do that, though.  The default encryption, aes-128-ccm, makes use of hardware crypto instructions found in all of the SPARC and x86 processors that are supported with kernel zones.  In tests, I regularly saturated a 10 gigabit link while migrating a single kernel zone.

One final note: if you don't like typing the root password every time you migrate, you can also set up key-based ssh authentication between the two machines.  In that case, you will use a command like:

# zoneadm -z <zone> migrate ssh://<remotehost>

Happy secure live migrating! 

Wednesday Jul 08, 2015

Kernel zone suspend now goes zoom!

Solaris 11.2 had the rather nice feature that you could have kernel zones automatically suspend and resume across global zone reboots.  We've made some improvements in this area in Solaris 11.3 to help in cases where more CPU cycles can make suspend and resume go faster.

As a recap, automatic suspend/resume of kernel zones across global zone reboots is accomplished by having a suspend resource and setting autoboot=true and autoshutdown=suspend.

# zonecfg -z kz1
zonecfg:kz1> set autoboot=true
zonecfg:kz1> set autoshutdown=suspend
zonecfg:kz1> select suspend
zonecfg:kz1:suspend> info
suspend:
	path.template: /export/%{zonename}.suspend
	path: /export/kz1.suspend
	storage not specified
zonecfg:kz1:suspend> end
zonecfg:kz1> exit

When a graceful reboot is performed (that is, shutdown -r or init 6), svc:/system/zones:default will suspend the zone as it shuts down and resume it as the system boots.  Obviously, reading from memory and writing to disk has a tendency to saturate the disk bandwidth.  To create a more balanced system, the suspend image is compressed.  While this greatly slows down the write rate, several kernel zones suspending concurrently would still saturate the available bandwidth in typical configurations.  More balanced and faster - good, right?

Well, this more balanced system came at a cost: when suspending a single zone, the performance was not so great.  For example, a basic kernel zone with 2 GB of RAM on a T5 ldom shows:

# tail /var/log/zones/kz1.messages
...
2015-07-08 12:33:15 notice: NOTICE: Zone suspending
2015-07-08 12:33:39 notice: NOTICE: Zone halted
root@vzl-212:~# ls -lh /export/kz1.suspend
-rw-------   1 root     root        289M Jul  8 12:33 /export/kz1.suspend
# bc -l
289 / 24
12.04166666666666666666

Yikes - 12 MB/s to disk (289 MB in the 24 seconds between the suspend and halt messages).  During this time, I used prstat -mLc -n 5 1 and iostat -xzn and could see that the compression thread in zoneadmd was using 100% of a CPU while the disk alternated between idle periods and bursts of activity as zfs flushed out each transaction group.  Note that this rate of 12 MB/s is artificially low because some other things that happen before and after writing the suspend file may take up to a couple of seconds.

I then updated my system to the Solaris 11.3 beta release and tried again.  This time things look better.

# zoneadm -z kz1 suspend
# tail /var/log/zones/kz1.messages
...
2015-07-08 12:59:49 info: Processing command suspend flags 0x0 from ruid/euid/suid 0/0/0 pid 3141
2015-07-08 12:59:49 notice: NOTICE: Zone suspending
2015-07-08 12:59:58 info: Processing command halt flags 0x0 from ruid/euid/suid 0/0/0 pid 0
2015-07-08 12:59:58 notice: NOTICE: Zone halted
# ls -lh /export/kz1.suspend 
-rw-------   1 root     root        290M Jul  8 12:59 /export/kz1.suspend
# echo 290 / 9 | bc -l
32.22222222222222222222

That's better, but not great.  Remember what I said above about the rate being artificially low?  While writing the multi-threaded suspend/resume support, I also created some super secret debug code that gives more visibility into the rate.  That shows:

Suspend raw: 1043 MB in 5.9 sec 177.5 MB/s
Suspend compressed: 289 MB in 5.9 sec 49.1 MB/s
Suspend raw-fast-fsync: 1043 MB in 3.5 sec 299.1 MB/s
Suspend compressed-fast-fsync: 289 MB in 3.5 sec 82.8 MB/s

What this is telling me is that my kernel zone with 2 GB of RAM had 1043 MB that actually needed to be suspended - the rest was blocks of zeroes.  The total suspend time was 5.9 seconds, giving a read-from-memory rate of 177.5 MB/s and a write-to-disk rate of 49.1 MB/s.  The -fsync lines say that if suspend didn't fsync(3C) the suspend file before returning, it would have completed in 3.5 seconds, for a compressed write rate of 82.8 MB/s.  That's looking better.

In another experiment, let's take the storage out of the picture as the limiting factor.  This time we'll use 16 GB of RAM and write the suspend image to /tmp.

# zonecfg -z kz1 info
zonename: kz1
brand: solaris-kz
autoboot: true
autoshutdown: suspend
...
virtual-cpu:
	ncpus: 12
capped-memory:
	physical: 16G
suspend:
	path: /tmp/kz1.suspend
	storage not specified

To ensure that most of the RAM wasn't just blocks of zeroes (and as such wouldn't land in the suspend file), I created a tar file of /usr in kz1's /tmp and made copies of it until the kernel zone's RAM was rather full.
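Roughly, that amounts to something like this inside the kernel zone (a sketch only; the number of copies you need depends on how much RAM you want to dirty, and because /tmp is tmpfs the copies consume guest memory):

root@kz1:~# tar cf /tmp/usr.tar /usr
root@kz1:~# for i in 1 2 3 4 5; do cp /tmp/usr.tar /tmp/usr.tar.$i; done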

This time around, we are able to write the 15 GB of active memory in 52.5 seconds.  Notice that this is roughly 15 times the amount of memory in only about double the time of our Solaris 11.2 baseline.

Suspend raw: 15007 MB in 52.5 sec 286.1 MB/s
Suspend compressed: 5416 MB in 52.5 sec 103.3 MB/s

While the focus of this entry has been multi-threaded compression during suspend, it's also worth noting that:

  • The suspend image is also encrypted. If someone gets a copy of the suspend image, it doesn't mean that they can read the guest memory.  Oh, and the encryption is multi-threaded as well.
  • Decryption is also multi-threaded.
  • And so is uncompression.  The parallel compression and uncompression code is freely available, even. :)

The performance numbers here should be taken with a grain of salt.  Many factors influence the actual rate you will see.  In particular:

  • Different CPUs have very different performance characteristics.
  • If the zone has a dedicated-cpu resource, only the CPUs that are dedicated to the zone will be used for compression and encryption.
  • More CPUs tend to go faster, but only to a certain point.
  • Various types of storage will perform vastly differently.
  • When many zones are suspending or resuming at the same time, they will compete for resources.

And one last thing... for those of you who are too impatient to wait for Solaris 11.3 to try this out, it is actually available in Solaris 11.2 SRU 8 and later.

Shared Storage on NFS for Kernel Zones

In Solaris 11.2, zones could be installed on shared storage (ZOSS) using iSCSI devices.  With the Solaris 11.3 Beta, shared storage for kernel zones can also be placed on NFS files.

To set up an NFS SURI (storage URI), you'll need to identify the NFS host, share, and path where the file will be placed, plus the user and group allowed to access the file.  The file does not need to exist, but the parent directory of the file must.  The user and group are specified so a user can control access to their zone storage via NFS.

Then in the zone configuration, you can set up a device (including a boot device) using an NFS SURI that looks like:
    - nfs://user:group@host/NFS_share/path_to_file

If the file does not yet exist, you'll need to specify a size.

Here's my setup of a 16g file for the zone root on an NFS share "/test" on system "sys1", owned by user "user1".  My NFS server has this mode/owner for the directory /test/z1kz:

# ls -ld /test/z1kz
drwx------   2 user1  staff          4 Jun 12 12:36 /test/z1kz 
  

In zonecfg for the kernel zone "z1kz", select device 0 (the boot device) and set storage and create-size:

zonecfg:z1kz> select device 0
zonecfg:z1kz:device> set storage=nfs://user1:staff@sys1/test/z1kz/z1kz_root
zonecfg:z1kz:device> set create-size=16g
zonecfg:z1kz:device> end
zonecfg:z1kz> info device
device 0:
    match not specified
    storage: nfs://user1:staff@sys1/test/z1kz/z1kz_root
    create-size: 16g
    id: 0 
    bootpri: 0
zonecfg:z1kz> commit 

To add another device to this kernel zone, do:

zonecfg:z1kz> add device
zonecfg:z1kz:device> set storage=nfs://user1:staff@sys1/test/z1kz/z1kz_disk1 
zonecfg:z1kz:device> set create-size=8g
zonecfg:z1kz:device> end 
zonecfg:z1kz> commit

When installing the kernel zone, use the "-x storage-create-missing" option to create the NFS files owned by user1:staff.

# zoneadm -z z1kz install -x storage-create-missing
<output deleted>
#

On my NFS server:

# ls -l /test/z1kz
total 407628 
-rw-------   1 user1  staff    8589934592 Jun 12 12:36 z1kz_disk1
-rw-------   1 user1  staff    17179869184 Jun 12 12:43 z1kz_root 

When the zone is uninstalled, the "-x force-storage-destroy-all" option will be needed to destroy the NFS files z1kz_root and z1kz_disk1.  If that option isn't used, the NFS files will still exist on the NFS server after the zone is uninstalled.
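For reference, a sketch of that uninstall (the command itself isn't shown in the original post):

# zoneadm -z z1kz uninstall -x force-storage-destroy-all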



Different time in different zones

Ever since zones were introduced way back in Solaris 10, there has been demand for the ability for a zone to set its own time.  In Solaris 11.3, that is finally possible, and we deliver it by default for solaris(5) and solaris10(5) branded zones.

For more information on how to enable and use this new feature, click here.

Saturday Feb 28, 2015

One image for native zones, kernel zones, ldoms, metal, ...

In my previous post, I described how to convert a global zone to a non-global zone using a unified archive.  Since then, I've fielded a few questions about whether this same approach can be used to create a master image that is used to install Solaris regardless of virtualization type (including no virtualization).  The answer is: of course!  That was one of the key goals of the project that invented unified archives.

In my earlier example, I was focused on preserving the identity and other aspects of the global zone and knew I had only one place that I planned to deploy it.  Hence, I chose to skip media creation (--exclude-media) and used a recovery archive (-r).  To generate a unified archive of a global zone that is ideal for use as an image for installing to another global zone or native zone, just use a simpler command line.

root@global# archiveadm create /path/to/golden-image.uar

Notice that by using fewer options we get something that is more usable.

What's different about this image compared to the one created in the previous post?

  • This archive has an embedded AI ISO that will be used if you install a kernel zone from it.  That is, zoneadm -z kzname install -a /path/to/golden-image.uar will boot from that embedded AI image and perform an automated install from the archive.
  • This archive contains only the active boot environment; other ZFS snapshots are not archived.
  • This archive has been stripped of its identity.  When installing, you either need to provide a sysconfig profile (see the sketch after this list) or interactively configure the system or zone on the console or zone console on the first post-installation boot.
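For example, a fully hands-off install of a kernel zone from such an archive might look like this (just a sketch; the zone name and profile path are hypothetical, and -c supplies the sysconfig profile mentioned above):

root@global# zoneadm -z kz1 install -a /path/to/golden-image.uar -c /path/to/sc_profile.xml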

Friday Feb 20, 2015

global to non-global conversion with multiple zpools

Suppose you have a global zone with multiple zpools that you would like to convert into a native zone.  You can do that, thanks to unified archives (introduced in Solaris 11.2) and dataset aliasing (introduced in Solaris 11.0).  The source system looks like this:

root@buzz:~# zoneadm list -cv
  ID NAME             STATUS      PATH                         BRAND      IP
   0 global           running     /                            solaris    shared
root@buzz:~# zpool list
NAME    SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool  15.9G  4.38G  11.5G  27%  1.00x  ONLINE  -
tank   1008M    93K  1008M   0%  1.00x  ONLINE  -
root@buzz:~# df -h /tank
Filesystem             Size   Used  Available Capacity  Mounted on
tank                   976M    31K       976M     1%    /tank
root@buzz:~# cat /tank/README
this is tank

Since we are converting a system rather than cloning it, we want a recovery archive, so we use the -r option.  Also, since the target is a native zone, there's no need for the unified archive to include media.

root@buzz:~# archiveadm create --exclude-media -r /net/kzx-02/export/uar/p2v.uar
Initializing Unified Archive creation resources...
Unified Archive initialized: /net/kzx-02/export/uar/p2v.uar
Logging to: /system/volatile/archive_log.1014
Executing dataset discovery...
Dataset discovery complete
Preparing archive system image...
Beginning archive stream creation...
Archive stream creation complete
Beginning final archive assembly...
Archive creation complete

Now we will go to the global zone that will have the zone installed.  First, we must configure the zone.  The archive contains a zone configuration that is almost correct, but it needs a little help because archiveadm(1M) doesn't know the particulars of where you will deploy it.

Most examples that show configuration of a zone from an archive use the non-interactive mode.  Here we use the interactive mode.

root@vzl-212:~# zonecfg -z p2v
Use 'create' to begin configuring a new zone.
zonecfg:p2v> create -a /net/kzx-02/export/uar/p2v.uar

After the create command completes (in a fraction of a second), we can see the configuration that was embedded in the archive.  I've trimmed out a bunch of uninteresting stuff from the anet resource.

zonecfg:p2v> info
zonename: p2v
zonepath.template: /system/zones/%{zonename}
zonepath: /system/zones/p2v
brand: solaris
autoboot: false
autoshutdown: shutdown
bootargs:
file-mac-profile:
pool:
limitpriv:
scheduling-class:
ip-type: exclusive
hostid:
tenant:
fs-allowed:
[max-lwps: 40000]
[max-processes: 20000]
anet:
        linkname: net0
        lower-link: auto
    [snip]
attr:
        name: zonep2vchk-num-cpus
        type: string
        value: "original system had 4 cpus: consider capped-cpu (ncpus=4.0) or dedicated-cpu (ncpus=4)"
attr:
        name: zonep2vchk-memory
        type: string
        value: "original system had 2048 MB RAM and 2047 MB swap: consider capped-memory (physical=2048M swap=4095M)"
attr:
        name: zonep2vchk-net-net0
        type: string
        value: "interface net0 has lower-link set to 'auto'.  Consider changing to match the name of a global zone link."
dataset:
        name: __change_me__/tank
        alias: tank
rctl:
        name: zone.max-processes
        value: (priv=privileged,limit=20000,action=deny)
rctl:
        name: zone.max-lwps
        value: (priv=privileged,limit=40000,action=deny)

In this case, I want to be sure that the zone's network uses a particular global zone interface, so I need to muck with that a bit.

zonecfg:p2v> select anet linkname=net0
zonecfg:p2v:anet> set lower-link=stub0
zonecfg:p2v:anet> end

The zpool list output at the beginning of this post showed that the system had two ZFS pools: rpool and tank.  We need to tweak the configuration to point the tank virtual ZFS pool at the right ZFS file system.  The name in the dataset resource refers to the location in the global zone.  This particular system has a zpool named export - a more basic Solaris installation would probably need to use rpool/export/....  The alias in the dataset resource needs to match the name of the secondary ZFS pool in the archive.

zonecfg:p2v> select dataset alias=tank
zonecfg:p2v:dataset> set name=export/tank/%{zonename}
zonecfg:p2v:dataset> info
dataset:
        name.template: export/tank/%{zonename}
        name: export/tank/p2v
        alias: tank
zonecfg:p2v:dataset> end
zonecfg:p2v> exit

I did something tricky above - I used a template property to make it easier to clone this zone configuration and have the dataset name point at a different dataset.

Let's try an installation.  NOTE: Before you get around to booting the new zone, be sure the old system is offline, or else you will have IP address conflicts.

root@vzl-212:~# zoneadm -z p2v install -a /net/kzx-02/export/uar/p2v.uar
could not verify zfs dataset export/tank/p2v: filesystem does not exist
zoneadm: zone p2v failed to verify

Oops.  I forgot to create the dataset.  Let's do that.  I use -o zoned=on to prevent the dataset from being mounted in the global zone.  If you forget that, it's no biggie - the system will fix it for you soon enough.

root@vzl-212:~# zfs create -p -o zoned=on export/tank/p2v
root@vzl-212:~# zoneadm -z p2v install -a /net/kzx-02/export/uar/p2v.uar
The following ZFS file system(s) have been created:
    rpool/VARSHARE/zones/p2v
Progress being logged to /var/log/zones/zoneadm.20150220T060031Z.p2v.install
    Installing: This may take several minutes...
 Install Log: /system/volatile/install.5892/install_log
 AI Manifest: /tmp/manifest.p2v.YmaOEl.xml
    Zonename: p2v
Installation: Starting ...
        Commencing transfer of stream: 0f048163-2943-cde5-cb27-d46914ec6ed3-0.zfs to rpool/VARSHARE/zones/p2v/rpool
        Commencing transfer of stream: 0f048163-2943-cde5-cb27-d46914ec6ed3-1.zfs to export/tank/p2v
        Completed transfer of stream: '0f048163-2943-cde5-cb27-d46914ec6ed3-1.zfs' from file:///net/kzx-02/export/uar/p2v.uar
        Completed transfer of stream: '0f048163-2943-cde5-cb27-d46914ec6ed3-0.zfs' from file:///net/kzx-02/export/uar/p2v.uar
        Archive transfer completed
        Changing target pkg variant. This operation may take a while
Installation: Succeeded
      Zone BE root dataset: rpool/VARSHARE/zones/p2v/rpool/ROOT/solaris-recovery
                     Cache: Using /var/pkg/publisher.
Updating image format
Image format already current.
  Updating non-global zone: Linking to image /.
Processing linked: 1/1 done
  Updating non-global zone: Syncing packages.
No updates necessary for this image. (zone:p2v)
  Updating non-global zone: Zone updated.
                    Result: Attach Succeeded.
        Done: Installation completed in 165.355 seconds.
  Next Steps: Boot the zone, then log into the zone console (zlogin -C)
              to complete the configuration process.
Log saved in non-global zone as /system/zones/p2v/root/var/log/zones/zoneadm.20150220T060031Z.p2v.install
root@vzl-212:~# zoneadm -z p2v boot

After booting, we see that everything in the zone is in order.

root@vzl-212:~# zlogin p2v
[Connected to zone 'p2v' pts/3]
Oracle Corporation      SunOS 5.11      11.2    September 2014
root@buzz:~# svcs -x
root@buzz:~# zpool list
NAME    SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool  99.8G  66.3G  33.5G  66%  1.00x  ONLINE  -
tank    199G  49.6G   149G  24%  1.00x  ONLINE  -
root@buzz:~# df -h /tank
Filesystem             Size   Used  Available Capacity  Mounted on
tank                   103G    31K       103G     1%    /tank
root@buzz:~# cat /tank/README
this is tank
root@buzz:~# zonename
p2v
root@buzz:~#

Happy p2v-ing!  Or rather, g2ng-ing.

Thursday Jan 15, 2015

fronting isolated zones

This is a continuation of a series of posts.  While this one may be interesting all on its own, you may want to start from the top to get the context.

In this post, we'll create teeter - the load balancer.  This zone will be a native (solaris brand) zone.  The intent of this arrangement is that paying customers get served by the zone named premium while the freeloaders have to scrape by with free.  Since that logic is clearly highly dependent on each webapp, I'll take the shortcut of using a much more simplistic load balancer.

Once again, we'll configure the zone's networking from the global zone.  This time around both networks get a static configuration - one attached to the red (192.168.1.0/24) network and the other attached to the global zone's first network interface.

root@global:~# zonecfg -z teeter
Use 'create' to begin configuring a new zone.
zonecfg:teeter> create
zonecfg:teeter> set zonepath=/zones/%{zonename}
zonecfg:teeter> select anet linkname=net0
zonecfg:teeter:anet> set lower-link=balstub0
zonecfg:teeter:anet> set allowed-address=192.168.1.1/24
zonecfg:teeter:anet> set configure-allowed-address=true
zonecfg:teeter:anet> end
zonecfg:teeter> add anet
zonecfg:teeter:anet> set lower-link=net0
zonecfg:teeter:anet> set allowed-address=10.134.17.196/24
zonecfg:teeter:anet> set defrouter=10.134.17.1
zonecfg:teeter:anet> set configure-allowed-address=true
zonecfg:teeter:anet> end
zonecfg:teeter> exit

root@global:~# zoneadm -z teeter install
The following ZFS file system(s) have been created:
    zones/teeter
Progress being logged to /var/log/zones/zoneadm.20150114T222949Z.teeter.install
       Image: Preparing at /zones/teeter/root.
...
Log saved in non-global zone as /zones/teeter/root/var/log/zones/zoneadm.20150114T222949Z.teeter.install
root@global:~# zoneadm -z teeter boot
root@global:~# zlogin -C teeter
   sysconfig, again.  Gee, I really should have created a sysconfig.xml...

In a solaris zone, there are no dependencies that bring in the Apache web server, so it needs to be installed.

root@vzl-212:~# zlogin teeter
[Connected to zone 'teeter' pts/3]
Oracle Corporation    SunOS 5.11    11.2    December 2014
root@teeter:~# pkg install apache-22
...

Once the web server is installed, we'll configure a simple load balancer using mod_proxy_balancer.

root@teeter:~# cd /etc/apache2/2.2/conf.d/
root@teeter:/etc/apache2/2.2/conf.d# cat > mod_proxy_balancer.conf <<EOF
<Proxy balancer://mycluster>
BalancerMember http://192.168.1.3:80
BalancerMember http://192.168.1.4:80
</Proxy>
ProxyPass /test balancer://mycluster 
EOF
root@teeter:/etc/apache2/2.2/conf.d# svcadm enable apache22

To see if this is working, we will use a symbolic link on the NFS server to point to a unique file on each of the web servers.  Unless you are trying to paste your output into Oracle's blogging software, you won't need to define $download as I did.

root@global:~# ln -s /tmp/hostname /export/web/hostname
root@global:~# zlogin free 'hostname > /tmp/hostname'
root@global:~# zlogin premium 'hostname > /tmp/hostname'
root@global:~# download=cu; download=${download}rl
root@global:~# $download -s http://10.134.17.196/test/hostname
premium
root@global:~# $download http://10.134.17.196/test/hostname
free
root@global:~# for i in {1..100}; do \
    $download -s http://10.134.17.196/test/hostname; done| sort | uniq -c
  50 free
  50 premium

This concludes this series.  Surely there are things that I've glossed over and many more interesting things I could have done.  Please leave comments with any questions and I'll try to fill in the details.

Wednesday Jan 14, 2015

stamping out web servers

This is a continuation of a series of posts.  While this one may be interesting all on its own, you may want to start from the top to get the context.

The diagram above shows one global zone with a few zones in it.  That's not very exciting in a world where we need to rapidly provision new instances that are preconfigured and as hack-proof as we can make them.  This post will show how to create a unified archive that includes the kernel zone configuration and content that makes for a hard-to-compromise web server.  I'd like to say impossible to compromise, but history has shown us that software has bugs that affect everyone across the industry.

We'll start off by configuring and installing a kernel zone called web.  It will have two automatic networks, each attached to the appropriate etherstub.  Notice the use of template properties - using %{zonename} and %{id} means that we don't have to futz with so much of the configuration when we configure the next zone based on this one.

root@global:~# zonecfg -z web
Use 'create' to begin configuring a new zone.
zonecfg:web> create -t SYSsolaris-kz
zonecfg:web> select device id=0
zonecfg:web:device> set storage=dev:zvol/dsk/zones/%{zonename}/disk%{id}
zonecfg:web:device> end
zonecfg:web> select anet id=0
zonecfg:web:anet> set lower-link=balstub0
zonecfg:web:anet> set allowed-address=192.168.1.2/24
zonecfg:web:anet> set configure-allowed-address=true
zonecfg:web:anet> end
zonecfg:web> add anet
zonecfg:web:anet> set lower-link=internalstub0
zonecfg:web:anet> set allowed-dhcp-cids=%{zonename}
zonecfg:web:anet> end
zonecfg:web> info
zonename: web
brand: solaris-kz
autoboot: false
autoshutdown: shutdown
bootargs:
pool:
scheduling-class:
hostid: 0xdf87388
tenant:
anet:
        lower-link: balstub0
        allowed-address: 192.168.1.2/24
        configure-allowed-address: true
        defrouter not specified
        allowed-dhcp-cids not specified
        link-protection: "mac-nospoof, ip-nospoof"
        ...
        id: 0
anet:
        lower-link: internalstub0
        allowed-address not specified
        configure-allowed-address: true
        defrouter not specified
        allowed-dhcp-cids.template: %{zonename}
        allowed-dhcp-cids: web
        link-protection: mac-nospoof
        ...
        id: 1
device:
        match not specified
        storage.template: dev:zvol/dsk/zones/%{zonename}/disk%{id}
        storage: dev:zvol/dsk/zones/web/disk0
        id: 0
        bootpri: 0
capped-memory:
        physical: 2G
zonecfg:web> exit
root@global:~# zoneadm -z web install
Progress being logged to /var/log/zones/zoneadm.20150114T193808Z.web.install
pkg cache: Using /var/pkg/publisher.
 Install Log: /system/volatile/install.4391/install_log
 AI Manifest: /tmp/zoneadm3808.vTayai/devel-ai-manifest.xml
  SC Profile: /usr/share/auto_install/sc_profiles/enable_sci.xml
Installation: Starting ...

        Creating IPS image
        Installing packages from:
            solaris
                origin:  file:///export/repo/11.2/repo/
        The following licenses have been accepted and not displayed.
        Please review the licenses for the following packages post-install:
          consolidation/osnet/osnet-incorporation
        Package licenses may be viewed using the command:
          pkg info --license <pkg_fmri>

DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
Completed                            451/451   63686/63686  579.9/579.9    0B/s

PHASE                                          ITEMS
Installing new actions                   86968/86968
Updating package state database                 Done
Updating package cache                           0/0
Updating image state                            Done
Creating fast lookup database                   Done
Installation: Succeeded
        Done: Installation completed in 431.445 seconds.

root@global:~# zoneadm -z web boot
root@global:~# zlogin -C web
        Perform sysconfig.  Allow networking to be configured automatically.
        ~~.   (one ~ for ssh, one for zlogin -C)
root@global:~# zlogin web

At this point, networking inside the zone should look like this:

root@web:~# ipadm
NAME              CLASS/TYPE STATE        UNDER      ADDR
lo0               loopback   ok           --         --
   lo0/v4         static     ok           --         127.0.0.1/8
   lo0/v6         static     ok           --         ::1/128
net0              ip         ok           --         --
   net0/v4        inherited  ok           --         192.168.1.2/24
net1              ip         ok           --         --
   net1/v4        dhcp       ok           --         192.168.0.2/24
   net1/v6        addrconf   ok           --         <IPv6addr>

Configure the NFS mounts for web content (/web) and IPS repos (/repo).

root@web:~# cat >> /etc/vfstab
192.168.0.1:/export/repo/11.2/repo - /repo      nfs     -       yes     -
192.168.0.1:/export/web -       /web    nfs     -       yes     -
^D
root@web:~# svcadm enable -r nfs/client

Now, update the pkg image configuration so that it uses the repository from the correct path. 

root@web:~# pkg set-publisher -O file:///repo/ solaris

Update the Apache configuration so that it looks to /web for the document root.

root@web:~# vi /etc/apache2/2.2/httpd.conf

This involves (at a minimum) changing DocumentRoot to "/web" and changing the <Directory "/var/apache/2.2/htdocs"> line to <Directory "/web">.  Your needs will be different and probably more complicated - this is not an Apache tutorial and I'm not qualified to give one.  After modifying the configuration file, start the web server.
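Concretely, the two edits amount to roughly the following in httpd.conf (a minimal sketch only; everything else in the file stays as it was):

DocumentRoot "/web"

<Directory "/web">
    # ... keep the directives from the original htdocs <Directory> block here ...
</Directory>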

root@web:~# svcadm enable -r svc:/network/http:apache22

This is a good time to do any other configuration (users, other software, etc.) that you need.  If you did the changes above really quickly, you may also want to wait for first-boot tasks like man-index to complete.  Allowing it to complete now means that it won't need to be redone for every instance of this zone that you create.
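One way to keep an eye on that (assuming the usual svc:/application/man-index service name; adjust if your system differs):

root@web:~# svcs application/man-index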

Since this is the type of zone that shouldn't need to have its configuration changed a whole lot, let's use the immutable global zone (IMGZ) feature to lock down the web zone.  Note that we can use IMGZ inside a kernel zone because a kernel zone is itself another global zone.

root@web:~# zonecfg -z global set file-mac-profile=fixed-configuration
updating /platform/sun4v/boot_archive
root@web:~# init 6

Back in the global zone, we are ready to create a clone archive once the zone reboots.

root@global:~# archiveadm create -z web /export/web.uar
Initializing Unified Archive creation resources...
Unified Archive initialized: /export/web.uar
Logging to: /system/volatile/archive_log.6835
Executing dataset discovery...
Dataset discovery complete
Creating install media for zone(s)...
Media creation complete
Preparing archive system image...
Beginning archive stream creation...
Archive stream creation complete
Beginning final archive assembly...
Archive creation complete

Now that the web clone unified archive has been created, it can be used on this machine or any other with a similar global zone configuration (etherstubs of same names, dhcp server, same nfs exports, etc.) to quickly create new web servers that fit the model described in the diagram at the top of this post.  To create the free kernel zone:

root@global:~# zonecfg -z free
Use 'create' to begin configuring a new zone.
zonecfg:free> create -a /export/web.uar
zonecfg:free> select anet id=0
zonecfg:free:anet> set allowed-address=192.168.1.3/24
zonecfg:free:anet> end
zonecfg:free> select capped-memory
zonecfg:free:capped-memory> set physical=4g
zonecfg:free:capped-memory> end
zonecfg:free> add virtual-cpu
zonecfg:free:virtual-cpu> set ncpus=2
zonecfg:free:virtual-cpu> end
zonecfg:free> exit

If I were doing this for a purpose other than this blog post, I would have also created a sysconfig profile and passed it to zoneadm install.  This would have made the first boot completely hands-off.
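A sketch of what that would have looked like (the profile path is hypothetical; -c passes the sysconfig profile, as hinted at in the console output further below):

root@global:~# zoneadm -z free install -a /export/web.uar -c /path/to/sc_profile.xml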

root@global:~# zoneadm -z free install -a /export/web.uar
...
root@global:~# zoneadm -z free boot
root@global:~# zlogin -C free
[Connected to zone 'free' console]
   run sysconfig because I didn't do zoneadm install -c sc_profile.xml
SC profile successfully generated as:
/etc/svc/profile/sysconfig/sysconfig-20150114-210014/sc_profile.xml
...

Once we log into free, we see that there's no more setup to do.

root@free:~# df -h -F nfs
Filesystem             Size   Used  Available Capacity  Mounted on
192.168.0.1:/export/repo/11.2/repo
                       194G    36G       158G    19%    /repo
192.168.0.1:/export/web
                       158G    31K       158G     1%    /web
root@free:~# ipadm
NAME              CLASS/TYPE STATE        UNDER      ADDR
lo0               loopback   ok           --         --
   lo0/v4         static     ok           --         127.0.0.1/8
   lo0/v6         static     ok           --         ::1/128
net0              ip         ok           --         --
   net0/v4        inherited  ok           --         192.168.1.3/24
net1              ip         ok           --         --
   net1/v4        dhcp       ok           --         192.168.0.3/24
   net1/v6        addrconf   ok           --         fe80::8:20ff:fed0:5eb/10

Nearly identical steps can be taken for the deployment of premium.  The key difference is that we are dedicating two cores (add dedicated-cpu; set cores=...) rather than allocating virtual CPUs (add virtual-cpu; set ncpus=...).  That is, no one else can use any of the CPUs on premium's cores, but free has to compete with the rest of the system for CPU time.

root@global:~# psrinfo -t
socket: 0
  core: 201457665
    cpus: 0-7
  core: 201654273
    cpus: 8-15
  core: 201850881
    cpus: 16-23
  core: 202047489
    cpus: 24-31
root@global:~# zonecfg -z premium
zonecfg:premium> create -a /export/web.uar
zonecfg:premium> select anet id=0
zonecfg:premium:anet> set allowed-address=192.168.1.4/24
zonecfg:premium:anet> end
zonecfg:premium> select capped-memory
zonecfg:premium:capped-memory> set physical=8g
zonecfg:premium:capped-memory> end
zonecfg:premium> add dedicated-cpu
zonecfg:premium:dedicated-cpu> set cores=201850881,202047489
zonecfg:premium:dedicated-cpu> end
zonecfg:premium> exit

The install and boot of premium are then the same as those of free.  After both zones are up, we can see that psrinfo reports the number of cores for premium but not for free.

root@global:~# zlogin free psrinfo -pv
The physical processor has 2 virtual processors (0-1)
  SPARC-T5 (chipid 0, clock 3600 MHz)
root@global:~# zlogin free prtconf | grep Memory
Memory size: 4096 Megabytes
root@global:~# zlogin premium psrinfo -pv
The physical processor has 2 cores and 16 virtual processors (0-15)
  The core has 8 virtual processors (0-7)
  The core has 8 virtual processors (8-15)
    SPARC-T5 (chipid 0, clock 3600 MHz)
root@global:~# zlogin premium prtconf | grep Memory
Memory size: 8192 Megabytes

That's enough for this post.  Next time, we'll get teeter going.

in-the-box NFS and pkg repository

This is the third in a series of short blog entries.  If you are new to the series, I suggest you start from the top.

As shown in our system diagram, the kernel zones have no direct connection to the outside world.  This will make it quite hard for them to apply updates.  To get past that, we will set up a pkg repository in the global zone and export it via NFS to the zones.  I won't belabor the topic of local IPS repositories, because our fine doc writers have already covered that.

As a quick summary, I first created a ZFS file system for the repo.  On this system, export is a separate pool with its topmost dataset mounted at /export.  By default, /export is the rpool/export dataset - you may need to adjust the commands to match your system.

root@global:~# zfs create -p export/repo/11.2

I then followed the procedure in MOS Doc ID 1928542.1 for creating a Solaris 11.2 repository, including all of the SRUs.  That resulted in having a repo with the solaris publisher at /export/repo/11.2/repo.

Since I have a local repo for the kernel zones to use, I figured the global zone may as well use it too.

root@global:~# pkg set-publisher -O  file:///export/repo/11.2/repo/ solaris

To make this publisher accessible (read-only) to the zones on the 192.168.0.0/24 network, it needs to be NFS exported.

root@global:~# share -F nfs -o ro=@192.168.0.0/24 /export/repo/11.2/repo

Now I'll get ahead of myself a bit - I haven't actually covered the installation of the free or premium zones yet.  Let's pretend we have a kernel zone called web and we want the repository to be accessible at /repo inside web.

root@global:~# zlogin web
root@web:~# vi /etc/vfstab
   (add an entry)
root@web:~# grep /repo /etc/vfstab
192.168.0.1:/export/repo/11.2/repo - /repo      nfs     -       yes     -
root@web:~# svcadm enable -r nfs/client

If svc:/network/nfs/client was already enabled, use mount /repo instead of svcadm enable.  Once /repo is mounted, update the solaris publisher.

root@web:~# pkg set-publisher -O  file:///repo/ solaris

In this example, we also want to have some content shared from the global zone into each of the web zones.  To make that possible:

root@global:~# zfs create export/web
root@global:~# share -F nfs -o ro=@192.168.0.0/24 /export/web

Again, this is exported read-only to the zones.  Adjust for your own needs.

That's it for this post.  Next time we'll create a unified archive that can be used for quickly stamping out lots of web zones.

in-the-box networking

In my previous post, I described a scenario where a couple of networks are needed to shuffle NFS and web traffic between a few zones.  In this post, I'll describe the configuration of that networking.  As a reminder, here's the configuration we are after.


The green (192.168.0.0/24) network is used by the two web server zones that need to connect to services in the global zone.  The red (192.168.1.0/24) network is used for communication between the load balancer and the web servers.  The basis for this simplistic in-the-box network is an etherstub.

The red network is a bit simpler than the green network, so we'll start with that.

root@global:~# dladm create-etherstub balstub0

That's it!  The (empty) network has been created simply by creating an etherstub.  As zones are configured to use balstub0 as the lower-link in their anet resources, they will attach to this network.
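For example, a zone attaches to this network with an anet resource along these lines (just a sketch; "example" is a placeholder zone name, and the full zone configurations appear in the later posts in this series):

zonecfg:example> add anet
zonecfg:example:anet> set lower-link=balstub0
zonecfg:example:anet> end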

The green network is just a little bit more involved because there will be services (DHCP and NFS) in the global zone that will use this network.

root@global:~# dladm create-etherstub internalstub0
root@global:~# dladm create-vnic -l internalstub0 internal0
root@global:~# ipadm create-ip internal0
root@global:~# ipadm create-addr -T static -a 192.168.0.1/24 internal0
internal0/v4

That wasn't a lot harder.  What we did here was create an etherstub named internalstub0.  On top of it, we created a vnic called internal0, attached an IP interface to it, and then set a static IP address.

As was mentioned in the introductory post, we'll have DHCP manage the IP address allocation for zones that use internalstub0.  Setting that up is pretty straightforward too.

root@global:~# cat > /etc/inet/dhcpd4.conf
default-lease-time 86400;
log-facility local7;
subnet 192.168.0.0 netmask 255.255.255.0 {
    range 192.168.0.2 192.168.0.254;
    option broadcast-address 192.168.0.255;
}
^D
root@global:~# svcadm enable svc:/network/dhcp/server:ipv4

The real Solaris blogs junkie will recognize this as a simplified version of something from the Maine office.

About

Contributors:

  • Mike Gerdts - Principal Software Engineer
  • Lawrence Chung - Software Engineer
  • More coming soon!
