Tuesday Jul 07, 2015

Changes to ZFS ARC Memory Allocation in 11.3

New in Solaris 11.3 is a kernel memory allocation subsystem called the
kernel object manager, or KOM. The first consumer of this subsystem is the
ZFS ARC.

Prior to Solaris 11.3, the ZFS ARC allocated its memory from the kernel heap
space using kmem caches. This has several drawbacks: first, internal
fragmentation can result in memory used by the ARC not being reclaimed by the
system. This problem is particularly acute if large pages are being used, since
the buffer size is considerably smaller than the large page size -- even one
buffer still allocated will prevent the system from freeing the large page.
Another drawback of ZFS ARC using the kernel heap is that all of the kernel
heap is non-relocatable in memory, and thus must reside in the kernel cage.
This can lead to issues allocating large pages or performing DR memory remove
operations once the ARC has grown large, even if it shrinks successfully. As a
workaround for the cage growth issue, many sysadmins have limited the size of
the ZFS ARC cache in /etc/system. Finally, scalability of ARC shrinking prior
to Solaris 11.3 is limited by heap page unmapping speed on large SPARC systems.

In Solaris 11.3, the ZFS ARC allocates its memory through KOM. The metadata
which is frequently accessed by ZFS (such as directory files) remains in
the kernel cage, but the vast majority of the cache which is not frequently
accessed by ZFS now resides outside of the kernel cage, where it can be
relocated by DR and page coalescing. KOM uses a slab size of 2M on x86 or 4M on
SPARC, so internal fragmentation is much less of an issue than it was with 256M
heap pages on SPARC. Scalability is vastly improved, as KOM takes advantage of
64-bit systems by using the seg_kpm framework for its address translations.

With this change, many systems which required limiting the ARC size will no
longer require a hard limit, since the system is able to manage its memory much
better. Metadata-heavy workloads, and systems hosting kernel zones, will still
need to limit the ARC size through /etc/system tuning in Solaris 11.3, however.
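
The post does not name the /etc/system setting involved. As a minimal sketch, assuming the long-standing zfs_arc_max tunable is the one meant, the line to add can be generated from a limit in GiB. The helper name arc_max_line is hypothetical, and the tunable should be checked against the tunable parameters reference for your release.

```shell
#!/bin/sh
# Sketch only: zfs_arc_max is an assumed tunable name (not given in the post),
# and arc_max_line is a hypothetical helper.

# Print the /etc/system line that caps the ARC at the given number of GiB.
arc_max_line() {
    gib=$1
    bytes=$((gib * 1024 * 1024 * 1024))
    echo "set zfs:zfs_arc_max=$bytes"
}

# Example: a 4 GiB cap.
arc_max_line 4
# prints: set zfs:zfs_arc_max=4294967296
```

Settings in /etc/system are read at boot, so a line like this only takes effect after a restart.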

Friday Mar 16, 2012

Great Solaris 10 features paving the way to Solaris 11

Karoly Vegh writes on the Oracle Systems Blog Austria about what you can do with Solaris 10 today that will get you ready for Solaris 11.

Even today, many people still use Solaris 10 as if it were a patch update to Solaris 8 or 9, missing out on the power behind Live Upgrade, Zones, resource management, and ZFS. Learning more about these will help set your feet on the road to the even more sophisticated capabilities of Oracle Solaris 11.

[Read More]

Friday Aug 19, 2011

Replacing the system HDD by a larger one on Solaris 11 X86

A write-up on replacing the internal disk drive of a Solaris 11 Express laptop with a larger one, using ZFS mirroring and ZFS split.
[Read More]

Tuesday Nov 30, 2010

Upgrading ZFS

Upgrading to the latest release of Solaris doesn't automatically upgrade your existing ZFS pools and datasets. So if you're trying to take advantage of cool new features like dedup and crypto on those datasets, you'll have to update ZFS first.

Upgrading Your ZFS Pools

Beware - upgrading your root ZFS pool will essentially invalidate your prior boot environments as you will no longer be able to boot into them.

The version property will tell you where you're currently at:

bleonard@opensolaris:~$ zpool get version rpool
rpool  version   14       local

And the -v (lower case) option will tell you what's available:

bleonard@opensolaris:~$ zpool upgrade -v
This system is currently running ZFS pool version 31.

The following versions are supported:

---  --------------------------------------------------------
 1   Initial ZFS version
 2   Ditto blocks (replicated metadata)
 3   Hot spares and double parity RAID-Z
 4   zpool history
 5   Compression using the gzip algorithm
 6   bootfs pool property
 7   Separate intent log devices
 8   Delegated administration
 9   refquota and refreservation properties
 10  Cache devices
 11  Improved scrub performance
 12  Snapshot properties
 13  snapused property
 14  passthrough-x aclinherit
 15  user/group space accounting
 16  stmf property support
 17  Triple-parity RAID-Z
 18  Snapshot user holds
 19  Log device removal
 20  Compression using zle (zero-length encoding)
 21  Deduplication
 22  Received properties
 23  Slim ZIL
 24  System attributes
 25  Improved scrub stats
 26  Improved snapshot deletion performance
 27  Improved snapshot creation performance
 28  Multiple vdev replacements
 29  RAID-Z/mirror hybrid allocator
 30  Encryption
 31  Improved 'zfs list' performance

For more information on a particular version, including supported releases,
see the ZFS Administration Guide.

You can move to any version you'd like using the -V (upper case) option; however, I'm just going to move to the latest and greatest:
bleonard@opensolaris:~$ pfexec zpool upgrade rpool
This system is currently running ZFS pool version 31.

Successfully upgraded 'rpool' from version 14 to version 31

Just to verify the upgrade:

bleonard@opensolaris:~$ zpool get version rpool
rpool  version   31       default
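
If you would rather target a specific version with -V, the check and the command can be scripted. This is a rough sketch rather than a tested procedure: pool_version and upgrade_command are hypothetical helpers, and the command is printed for review instead of being run.

```shell
#!/bin/sh
# Sketch only: pool_version and upgrade_command are hypothetical helpers.

# Read `zpool get version <pool>` output on stdin and print the version number.
# Lines whose second column is not "version" (such as a header) are skipped.
pool_version() {
    awk '$2 == "version" { print $3 }'
}

# Print the upgrade command if the pool is below the target version.
# Arguments: pool name, current version, target version.
upgrade_command() {
    if [ "$2" -lt "$3" ]; then
        echo "pfexec zpool upgrade -V $3 $1"
    else
        echo "# $1 is already at version $2"
    fi
}

# Sample line copied from the output above; on a live system use:
#   zpool get version rpool | pool_version
current=$(printf 'rpool  version   14       local\n' | pool_version)
upgrade_command rpool "$current" 21
# prints: pfexec zpool upgrade -V 21 rpool
```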

Upgrading Your ZFS Datasets

The process is very similar. First, if you're interested in the version you're currently at:

bleonard@opensolaris:~$ zfs get version rpool
rpool  version   3        -

Then to see what's available:

bleonard@opensolaris:~$ zfs upgrade -v
The following filesystem versions are supported:

---  --------------------------------------------------------
 1   Initial ZFS filesystem version
 2   Enhanced directory entries
 3   Case insensitive and File system unique identifier (FUID)
 4   userquota, groupquota properties
 5   System attributes

For more information on a particular version, including supported releases,
see the ZFS Administration Guide.

I'm going to upgrade all the file systems at once. However, there is the option to upgrade just a particular file system, for example:

bleonard@opensolaris:~$ pfexec zfs upgrade rpool/export/home
1 filesystems upgraded 

Or a file system and its descendants:

bleonard@opensolaris:~$ pfexec zfs upgrade -r rpool/export
1 filesystems upgraded
1 filesystems already at this version

But to simply upgrade all:

bleonard@opensolaris:~$ pfexec zfs upgrade -a
25 filesystems upgraded
2 filesystems already at this version

This process may take several minutes to complete, depending on the number of file systems that need to be upgraded.
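
Before upgrading everything, it can help to see which file systems are actually behind. The sketch below filters tab-separated name/version pairs, the format produced by zfs get with -H and -o name,value. The helper name below_version is hypothetical, and the sample data is made up for illustration.

```shell
#!/bin/sh
# Sketch only: below_version is a hypothetical helper.

# Read "name<TAB>version" lines on stdin and print the names whose version
# is below the target. Entries with a non-numeric version ("-") are skipped.
below_version() {
    awk -F '\t' -v target="$1" '$2 ~ /^[0-9]+$/ && $2 + 0 < target + 0 { print $1 }'
}

# Illustrative input; on a live system use:
#   zfs get -H -r -o name,value version rpool | below_version 5
printf 'rpool\t5\nrpool/export\t3\nrpool/export/home\t3\n' | below_version 5
# prints:
#   rpool/export
#   rpool/export/home
```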

Wednesday Nov 03, 2010


I was about to write a new blog on Solaris' Fault Management Architecture when I realized my friend Bob Netherton had beaten me to it by almost 3 years.

We like to tout the benefits of FMA a lot, but it's often hard to demonstrate because you don't want to go around destroying CPUs and memory modules just to see FMA in action. However, by creating a ZFS pool with files as disks, it's quite easy to demonstrate and that's exactly what Bob does in his blog ZFS and FMA - Two great tastes ......

Note, there's also an FMA Demo Kit that you can use to simulate faults to other hardware components, but I haven't played with that myself yet. Other resources I found helpful:

Tuesday Sep 28, 2010

Understanding the Space Used by ZFS

Until recently, I've been confused and frustrated by the zfs list output as I try to clear up space on my hard drive.

[Read More]

Tuesday Jun 16, 2009

File Version Explorer

If you're using OpenSolaris you should certainly have Time Slider enabled, at least on your home directory.
[Read More]

Tuesday Apr 28, 2009

ZFS Compression - A Win-Win

This may sound counterintuitive, but turning on ZFS compression not only saves space, but also improves performance. This is because the time it takes to compress and decompress the data is shorter than the time it takes to read and write the uncompressed data to disk (at least on newer laptops with multi-core chips).

To turn on compression simply run:

pfexec zfs set compression=on rpool

All the child datasets of rpool will inherit this setting. For example:

bleonard@opensolaris:~$ zfs get compression rpool/export/home
NAME               PROPERTY     VALUE              SOURCE
rpool/export/home  compression  on                 inherited from rpool/export

Note, only new data written after turning on compression is affected. Your existing data is left in its uncompressed state.

To check the compression you're achieving on a dataset, use the compressratio property:

bleonard@opensolaris:~$ zfs get compressratio rpool/export/home
NAME               PROPERTY       VALUE              SOURCE
rpool/export/home  compressratio  1.00x  

I'm seeing 1.00x because I just enabled compression. Over time, as I write new data, this number will increase. 

An Example

This part is optional but will give you a better feeling for how compression works.

Start by creating a new (therefore, empty) file system, ensuring compression is off (otherwise it will inherit the setting from rpool):

bleonard@opensolaris:~$ pfexec zfs create -o compression=off rpool/test
bleonard@opensolaris:~$ cd /rpool/test

Copy a file into the test file system, turn on compression, and copy it again:

bleonard@opensolaris:/rpool/test$ pfexec cp /platform/i86pc/kernel/amd64/unix unix1
bleonard@opensolaris:/rpool/test$ pfexec zfs set compression=on rpool/test
bleonard@opensolaris:/rpool/test$ pfexec cp /platform/i86pc/kernel/amd64/unix unix2

Check the compression ratio:

bleonard@opensolaris:/rpool/test$ pfexec zfs get compressratio rpool/test
rpool/test  compressratio  1.28x       -

Check the difference in file size:

bleonard@opensolaris:/rpool/test$ du -hs u*
1.7M	unix1
936K	unix2

Also note the difference between using du and ls:

bleonard@opensolaris:/rpool/test$ ls -lh
total 2.6M
-rwxr-xr-x 1 root root 1.6M 2009-04-28 13:32 unix1*
-rwxr-xr-x 1 root root 1.6M 2009-04-28 13:33 unix2*

ls does not show the compressed file size! See ZFS Compression, when du and ls appear to disagree for a great explanation of this.

Next, delete unix1, which was not compressed, and notice how the compression ratio for the file system rises accordingly:

bleonard@opensolaris:/rpool/test$ pfexec rm unix1
bleonard@opensolaris:/rpool/test$ pfexec zfs get compressratio rpool/test
rpool/test  compressratio  1.77x       - 

Finally, clean up:

bleonard@opensolaris:/rpool/test$ cd
bleonard@opensolaris:~$ pfexec zfs destroy rpool/test
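
As a rough check on the numbers above, the ratio can be approximated as logical size divided by space used on disk. This is only a sketch: ratio is a hypothetical helper, and the inputs are the rounded figures from the ls and du output, so the result will not match the reported compressratio exactly.

```shell
#!/bin/sh
# Sketch only: ratio is a hypothetical helper, fed with rounded sizes.

# Print logical size divided by on-disk size, formatted like compressratio.
# Arguments: logical size in KB, on-disk size in KB.
ratio() {
    awk -v l="$1" -v p="$2" 'BEGIN { printf "%.2fx\n", l / p }'
}

# unix2 alone: about 1.6M (1638K) according to ls, 936K on disk according to du.
ratio 1638 936
# prints: 1.75x
```

That is close to the 1.77x reported once unix1 was removed. The remaining gap most likely comes from the rounding in the -h output.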

Saturday Apr 11, 2009

Another OpenSolaris presentation from Community One

Nick Solter and Dave Miner (authors of OpenSolaris Bible) presented a session called Becoming an OpenSolaris Power User, which covers topics such as ZFS, DTrace and networking at Community One. Source: Nick Solter's blog.

Tuesday Mar 03, 2009

OpenSolaris 2008.11 Training in Brno

Martin Man, Lubos Kocman and I (all three CZOSUG leaders) organized a full-day training at the Faculty of Informatics, Masaryk University, Brno. Sun partners with this university and we cooperate a lot around OpenSolaris. I've been to the university about 6 times so far. The most recent training was focused on 2008.11 and it was also an install fest. Here are the slides from the event:
[Read More]

Tuesday Feb 03, 2009

Allowing ZFS Snapshots

I find myself working with ZFS snapshots quite a bit, and as a convenience, I'd prefer if I didn't have to prefix the commands with pfexec. For example:
pfexec zfs rollback rpool/vbox@clean

Fortunately, it is possible to delegate the permission to run ZFS commands to my user account using zfs allow. For example:

pfexec zfs allow bleonard snapshot,rollback,mount rpool/vbox

Note, the ability to 'mount' is required in order to create and roll back snapshots. See the zfs man page for details.

To see the permissions assigned to a file system:

bleonard@opensolaris:~$ zfs allow rpool/vbox
Local+Descendent permissions on (rpool/vbox)
	user bleonard mount,rollback,snapshot

Now rollbacks are a bit easier:

zfs rollback rpool/vbox@clean

You can delegate any of the zfs commands, including allow. For more details see Delegating ZFS Permissions.
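
If you delegate often, the command line can be assembled from a list of permissions. The sketch below prints the command for review rather than running it. delegate_cmd is a hypothetical helper, and the unallow example assumes you later want to revoke the same permissions with zfs unallow.

```shell
#!/bin/sh
# Sketch only: delegate_cmd is a hypothetical helper that prints a command.

# Build a zfs allow or unallow command line.
# Arguments: action (allow or unallow), user, dataset, then the permissions.
delegate_cmd() {
    action=$1
    user=$2
    dataset=$3
    shift 3
    # Join the remaining arguments with commas.
    perms=$(IFS=,; echo "$*")
    echo "pfexec zfs $action $user $perms $dataset"
}

delegate_cmd allow bleonard rpool/vbox snapshot rollback mount
# prints: pfexec zfs allow bleonard snapshot,rollback,mount rpool/vbox

delegate_cmd unallow bleonard rpool/vbox snapshot rollback mount
# prints: pfexec zfs unallow bleonard snapshot,rollback,mount rpool/vbox
```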

Thursday Jan 08, 2009

A Hands on Introduction to ZFS Pools

If you have a USB hub and some spare USB sticks, Stefan Schneider has an excellent 3-part tutorial on working with ZFS pools.

The tutorials are independent of one another, so for example, if you just want to experiment with RAIDZ2, just do Part 3.

Tuesday Nov 04, 2008

Screencast: ZFS Basics

I started to record various screencasts for OpenSolaris to evangelize different features of Solaris/OpenSolaris and make it easier for people to discover how to use these features. The first demo I created shows the basics of using ZFS. The screencast requires Flash player and contains audio as well. Enjoy!


Thursday Oct 23, 2008

VirtualBox Rollback

As a software developer, I use VirtualBox to test how my applications look, feel and behave across different operating systems and/or browser combinations.

However, how do you keep the virtual machine clean? Over time it will become as friendly to your application as your host operating system. VirtualBox does have a Snapshot feature. However, if you have multiple machines, it could be tedious to roll them back individually. A better option in this case is to use a ZFS snapshot. By putting VirtualBox in its own ZFS file system, you can take a snapshot whenever you like, and then roll back to your clean slate once testing is complete. Here's how I set it up:

[Read More]

Tuesday Jul 15, 2008

Increasing Storage Capacity

One of the ways I run OpenSolaris is under VirtualBox. The virtual disk image I initially created was only 8 GB and yesterday I found myself out of space. Running out of disk space is a situation we've all found ourselves in before, regardless of whether we're running virtualized or not. Initially I just thought I could add another disk to the pool, but it turns out that it is not possible to boot OpenSolaris from a striped disk (see: OpenSolaris will not boot after adding to the ZFS pool). So my options are to upgrade the entire disk or to move one or more of my file systems onto a new disk. For this entry I'm going to cover the latter, by moving /export off of the boot disk.

[Read More]

The Observatory is a blog for users of Oracle Solaris. Tune in here for tips, tricks and more as we explore the Solaris operating system from Oracle.

