• July 8, 2015

Kernel zone suspend now goes zoom!

Mike Gerdts
Principal Software Engineer

Solaris 11.2 had the rather nice feature that you can have kernel zones automatically suspend and resume across global zone reboots.  We've made some improvements in this area in Solaris 11.3 to help in the cases where more CPU cycles could make suspend and resume go faster.

As a recap, automatic suspend/resume of kernel zones across global zone reboots can be accomplished by having a suspend resource, setting autoboot=true and autoshutdown=suspend.

# zonecfg -z kz1
zonecfg:kz1> set autoboot=true
zonecfg:kz1> set autoshutdown=suspend
zonecfg:kz1> exit
zonecfg:kz1:suspend> info

path.template: /export/%{zonename}.suspend

path: /export/kz1.suspend

storage not specified

When a graceful reboot is performed (that is, shutdown -r or init 6) svc:/system/zones:default will suspend the zone as it shuts down and resume it as the system boots.  Obviously, reading from memory and writing to disk would have the inclination to saturate the disk bandwidth.  To create a more balanced system, the suspend image is compressed.  While this greatly slows down the write rate, several kernel zones that were concurrently suspending would still saturate available bandwidth in typical configurations.  More balanced and faster - good, right?

Well, this more balanced system came at a cost.  When suspending one zone the performance was not so great.  For example, a basic kernel zone with 2 GB of RAM on a T5 ldom shows:

# tail /var/log/zones/kz1.messages
2015-07-08 12:33:15 notice: NOTICE: Zone suspending
2015-07-08 12:33:39 notice: NOTICE: Zone halted
root@vzl-212:~# ls -lh /export/kz1.suspend
-rw-------   1 root     root        289M Jul  8 12:33 /export/kz1.suspend
# bc -l
289 / 24

Yikes - 12 MB/s to disk.  During this time, I used prstat -mLc -n 5 1 and iostat -xzn and could see that the compression thread in zoneadmd was using 100% of a CPU and the disk had idle times then spurts of being busy as zfs flushed out each transaction group.  Note that this rate of 12 MB/s is artificially low because some other things are going on before and after writing the suspend file that may take up to a couple of seconds.

I then updated my system to the Solaris 11.3 beta release and tried again.  This time things look better.

# zoneadm -z kz1 suspend
# tail /var/log/zones/kz1.messages
2015-07-08 12:59:49 info: Processing command suspend flags 0x0 from ruid/euid/suid 0/0/0 pid 3141
2015-07-08 12:59:49 notice: NOTICE: Zone suspending
2015-07-08 12:59:58 info: Processing command halt flags 0x0 from ruid/euid/suid 0/0/0 pid 0
2015-07-08 12:59:58 notice: NOTICE: Zone halted
# ls -lh /export/kz1.suspend 
-rw-------   1 root     root        290M Jul  8 12:59 /export/kz1.suspend
# echo 290 / 9 | bc -l

That's better, but not great.  Remember what I said about the rate being artificially low above?  While writing the multi-threaded suspend/resume support, I also created some super secret debug code that gives more visibility into the rate.  That shows:

Suspend raw: 1043 MB in 5.9 sec 177.5 MB/s
Suspend compressed: 289 MB in 5.9 sec 49.1 MB/s
Suspend raw-fast-fsync: 1043 MB in 3.5 sec 299.1 MB/s
Suspend compressed-fast-fsync: 289 MB in 3.5 sec 82.8 MB/s

What this is telling me is that my kernel zone with 2 GB of RAM actually had 1043 MB that actually needed to be suspended - the remaining was blocks of zeroes.  The total suspend time was 5.9 seconds, giving a read from memory rate of 177.5 MB/s and write to disk rate of 49.1 MB/s.  The -fsync lines are saying that if suspend didn't fsync(3C) the suspend file before returning, it would have completed in 3.5 seconds, giving a suspend rate of 82.8 MB/s.  That's looking better.

In another experiment, we aim to make the storage not be the limiting factor. This time, let's do 16 GB of RAM and write the suspend image to /tmp.

# zonecfg -z kz1 info
zonename: kz1
brand: solaris-kz
autoboot: true
autoshutdown: suspend

ncpus: 12

physical: 16G

path: /tmp/kz1.suspend

storage not specified

To ensure that most of the RAM wasn't just blocks of zeroes (and as such wouldn't be in the suspend file), I created a tar file of /usr in kz1's /tmp and made copies of it until the kernel zone's RAM was rather full.

This time around, we are seeing that we are able to write the 15 GB of active memory in 52.5 seconds.  Notice that this is roughly 15x the amount of memory in only double the time from our Solaris 11.2 baseline.

Suspend raw: 15007 MB in 52.5 sec 286.1 MB/s
Suspend compressed: 5416 MB in 52.5 sec 103.3 MB/s

While the focus of this entry has been multi-threaded compression during suspend, it's also worth noting that:

  • The suspend image is also encrypted. If someone gets a copy of the suspend image, it doesn't mean that they can read the guest memory.  Oh, and the encryption is multi-threaded as well.
  • Decryption is also multi-threaded.
  • And so is uncompression.  The parallel compression and uncompression code is freely available, even. :)

The performance numbers here should be taken with a grain of salt.  Many factors influence the actual rate you will see.  In particular:

  • Different CPUs have very different performance characteristics.
  • If the zone has a dedicated-cpu resource, only the CPU's that are dedicated to the zone will be used for compression and encryption.
  • More CPUs tend to go faster, but only to a certain point.
  • Various types of storage will perform vastly differently.
  • When many zones are suspending or resuming at the same time, they will compete for resources.
And one last thing... for those of you that are too impatient to wait until Solaris 11.3 to try this out, it is actually in Solaris 11.2 SRU 8 and later.

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.Captcha