Wednesday Jan 07, 2015

ZFS Encryption in Oracle ZFS Storage Appliance

With the 2013.1.3.0 (aka OS8.3) release of the software for the Oracle ZFS Storage Appliance the underlying ZFS encryption functionality is now available for use.  This is the same ZFS encryption that is available in general purpose Solaris but with appliance interfaces added for key management.

I originally wrote the following quick start guide for our internal test engineers and other developers while we were developing the functionality, and since it is now available I thought I'd share it here. It walks through the required steps to configure encryption on the ZFSSA and perform some basic operations with keys and encrypted shares. Note that the BUI and CLI screenshots do not show exactly the same system and configuration.

Setup Encryption with LOCAL keystore (CLI)

The first step is to set up the master passphrase; then we can create keys that will be assigned to encrypted shares.

brm7330-020:> shares encryption 
brm7330-020:shares encryption> show
                              okm => Manage encryption keys
                            local => Manage encryption keys

brm7330-020:shares encryption> local
brm7330-020:shares encryption local> show
             master_passphrase = 

                       keys => Manage this Keystore's Keys

brm7330-020:shares encryption local> set master_passphrase
Enter new master_passphrase: 
Re-enter new master_passphrase: 
             master_passphrase = *********

Setup Encryption with LOCAL keystore (BUI)


Creating Keys

Now let's create our first key. The only thing we have to provide is the keyname field; this is the name that is used in the CLI and BUI when assigning a key to a project or share.

It is possible to provide a hex encoded raw 256 bit key in the key field; if that is not provided, a new randomly generated key value is used instead.  Note that the keys are stored in an encrypted form using the master_passphrase supplied above. For this simple walkthrough we will let the system generate the key value for us.
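
If you would rather supply the key value yourself, a hex encoded 256 bit value can be produced off the appliance. The following Python is only a minimal sketch: the helper name `new_hex_key` is made up for illustration, and it assumes the key field accepts a plain 64 character hex string as described above.

```python
import secrets


def new_hex_key(bits: int = 256) -> str:
    """Return a randomly generated key of `bits` bits, hex encoded.

    Hypothetical helper, not part of any appliance tooling.
    """
    if bits <= 0 or bits % 8 != 0:
        raise ValueError("key size must be a positive whole number of bytes")
    # secrets draws from the operating system's cryptographic random source.
    return secrets.token_hex(bits // 8)
```

Calling `new_hex_key()` returns 64 hex characters (32 bytes). The CLI session that follows leaves the key field empty, so the appliance generates the value itself.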

brm7330-020:shares encryption local> keys create
brm7330-020:shares encryption local>
                        cipher = AES
                           key = 
                       keyname = (unset)
brm7330-020:shares encryption local key (uncommitted)> set keyname=MyFirstKey
                       keyname = MyFirstKey (uncommitted)
brm7330-020:shares encryption local key (uncommitted)> commit

If we were doing this from the BUI it would look like this:


Setup Encryption with OKM keystore (CLI)

For OKM you need to set the agent_id, the server IP address (NOT hostname) and the registration_pin given to you by your OKM security officer. The example below shows an already configured setup for OKM.

brm7330-020:> shares encryption 
brm7330-020:shares encryption> show
                              okm => Manage encryption keys
                            local => Manage encryption keys
brm7330-020:shares encryption> okm
brm7330-020:shares encryption okm> show
                      agent_id = ExternalClient041
              registration_pin = *********
                   server_addr =
                             keys => Manage this Keystore's Keys

We are now ready to create our first encrypted share/project.

Creating an Encrypted Share

Creating an encrypted project results in all shares in that project being encrypted; by default the shares (filesystems & LUNs) inherit the encryption properties from the parent project.

brm7330-020:shares> project myproject
brm7330-020:shares myproject (uncommitted)> set encryption=aes-128-ccm
                    encryption = aes-128-ccm (uncommitted)
brm7330-020:shares myproject (uncommitted)> set keystore=LOCAL
                      keystore = LOCAL (uncommitted)
brm7330-020:shares myproject (uncommitted)> set keyname=MyFirstKey 
                       keyname = MyFirstKey (uncommitted)
brm7330-020:shares myproject (uncommitted)> commit

That is it: all shares we create under this project are now automatically encrypted with AES 128 CCM using the key named "MyFirstKey" from the LOCAL keystore.

Let's now create a filesystem in our new project and show that it inherited the encryption properties:

brm7330-020:shares> select myproject
brm7330-020:shares myproject> filesystem f1
brm7330-020:shares myproject/f1 (uncommitted)> commit
brm7330-020:shares myproject> select f1
brm7330-020:shares myproject/f1> get encryption keystore keyname keystatus
                    encryption = aes-128-ccm (inherited)
                      keystore = LOCAL (inherited)
                       keyname = MyFirstKey (inherited)
                     keystatus = available
brm7330-020:shares myproject/f1> done

For the BUI, the filesystem and LUN creation dialogs allow selection of encryption properties.


Key Change

It is possible to change the key associated with a Project/Share at any time, even while it is in use by client systems.

Let's now create an additional key and perform a key change on the project we have just created.

brm7330-020:> shares encryption local keys create
brm7330-020:shares encryption local key (uncommitted)> set keyname=MySecondKey
                       keyname = MySecondKey (uncommitted)
brm7330-020:shares encryption local key (uncommitted)> commit

Now let's change the key used for "myproject" and all the shares in it that are inheriting the key properties:

brm7330-020:> shares select myproject 
brm7330-020:shares myproject> set keyname=MySecondKey
                       keyname = MySecondKey (uncommitted)
brm7330-020:shares myproject> commit

If we look at the keyname property of our share "myproject/f1" we will see it has changed. The filesystem remained shared during the key change and was accessible to clients writing to it.

brm7330-020:shares myproject> select f1 get keyname
                       keyname = MySecondKey (inherited)
brm7330-020:shares myproject>


Deleting Keys

Deletion of a key is a very fast and effective way to make a large amount of data inaccessible.  Keys can be deleted even if they are in use.  If the key is in use a warning will be given and confirmation is required.  All shares using that key will be unshared and will no longer be able to be accessed by clients.

Example of deleting a key that is in use:

brm7330-020:shares encryption local keys> destroy keyname=MyFirstKey
This key has the following dependent shares:
Destroying this key will render the data inaccessible. Are you sure? (Y/N) Y

A similar message is displayed via a popup dialog in the BUI.


Now let's look at a share in a project that was using that key:

brm7330-010:> shares select HR select EMEA
brm7330-010:shares HR/EMEA> get encryption keystore keyname keystatus
                    encryption = aes-128-ccm (inherited)
                      keystore = LOCAL (inherited)
                       keyname = 1 (inherited)
                     keystatus = unavailable

Thursday Jul 31, 2014

OpenStack Security integration for Solaris 11.2

As a part-time member/meddler of the Solaris OpenStack engineering team I was asked to create some posts for the team's new OpenStack blog.

I've so far written up two short articles, one covering using ZFS encryption with Cinder, and one on Immutable OpenStack VMs.

Tuesday Oct 30, 2012

New ZFS Encryption features in Solaris 11.1

Solaris 11.1 brings a few small but significant improvements to ZFS dataset encryption.  There is a new readonly property 'keychangedate' that shows the date and time of the last wrapping key change (basically the last time 'zfs key -c' was run on the dataset). This is similar to the 'rekeydate' property, which shows the last time we added a new data encryption key.

$ zfs get creation,keychangedate,rekeydate rpool/export/home/bob
NAME                   PROPERTY       VALUE                  SOURCE
rpool/export/home/bob  creation       Mon Mar 21 11:05 2011  -
rpool/export/home/bob  keychangedate  Fri Oct 26 11:50 2012  local
rpool/export/home/bob  rekeydate      Tue Oct 30  9:53 2012  local

The above example shows that we have changed both the wrapping key and added new data encryption keys since the filesystem was initially created.  If we haven't changed a wrapping key then it will be the same as the creation date.  It should be obvious but for filesystems that were created prior to Solaris 11.1 we don't have this data so it will be displayed as '-' instead.

Another change that I made was to relax the restriction that the size of the wrapping key needed to match the size of the data encryption key (i.e. the size given in the encryption property).  In Solaris 11 Express and Solaris 11, if you set encryption=aes-256-ccm we required that the wrapping key be 256 bits in length.  This restriction was unnecessary and made it impossible to select encryption property values with key lengths 128 and 192 when the wrapping key was stored in the Oracle Key Manager.  This is because currently the Oracle Key Manager stores AES 256 bit keys only.  Now with Solaris 11.1 this restriction has been removed.

There is still one case where the wrapping key size and data encryption key size will always match, and that is where the keysource property sets the format to be 'passphrase'. Since this is a key generated internally to libzfs, and to preserve compatibility on upgrade from older releases, the code will always generate a wrapping key (using PKCS#5 PBKDF2 as before) that matches the key length of the encryption property.

The pam_zfs_key module has been updated so that it allows you to specify encryption=off.

There were also some bug fixes, including no longer attempting to load keys for datasets that are delegated to zones, and some fixes to error paths. Together these ensure that we can support Zones On Shared Storage where all the datasets in the ZFS pool are encrypted, which I discussed in my previous blog entry.

If there are features you would like to see for ZFS encryption please let me know (direct email or comments on this blog are fine, or if you have a support contract, have your support rep log an enhancement request).



Thursday Sep 13, 2012

To encryption=on or encryption=off a simple ZFS Crypto demo

I've just been asked twice this week how I would demonstrate ZFS encryption really is encrypting the data on disk.  It needs to be really simple and the target isn't forensics or cryptanalysis just a quick demo to show the before and after.

I usually do this small demo using a pool based on files so I can run strings(1) on the "disks" that make up the pool. The demo will work with real disks too but it will take a lot longer (how much longer depends on the size of your disks).  The file hamlet.txt is this one from

# mkfile 64m /tmp/pool1_file
# zpool create clear_pool /tmp/pool1_file
# cp hamlet.txt /clear_pool
# grep -i hamlet /clear_pool/hamlet.txt | wc -l

Note the number of times hamlet appears

# zpool export clear_pool
# strings /tmp/pool1_file | grep -i hamlet | wc -l

Note the number of times hamlet appears on disk - it is 2 more because the file is called hamlet.txt and file names are in the clear as well and we keep at least two copies of metadata.

Now let's encrypt the file systems in the pool.
Note you MUST use a new pool file; don't reuse the one from above.

# mkfile 64m /tmp/pool2_file
# zpool create -O encryption=on enc_pool /tmp/pool2_file
Enter passphrase for 'enc_pool': 
Enter again: 
# cp hamlet.txt /enc_pool
# grep -i hamlet /enc_pool/hamlet.txt | wc -l

Note the number of times hamlet appears is the same as before

# zpool export enc_pool
# strings /tmp/pool2_file | grep -i hamlet | wc -l

Note the word hamlet doesn't appear at all!
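
If strings(1) is not to hand, the same before and after count can be approximated in Python. This is only a rough sketch of the `strings | grep -i | wc -l` pipeline above, not a replacement for it: the function names are made up, it treats any run of four or more printable ASCII characters as a "string", and it reads the whole pool file into memory (fine for the 64m files used here).

```python
import re

# Runs of at least four printable ASCII characters, roughly what
# strings(1) prints with its default settings.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def count_matching_strings(data: bytes, word: bytes) -> int:
    """Count printable runs in `data` that contain `word`, ignoring case."""
    needle = word.lower()
    return sum(
        1
        for run in PRINTABLE_RUN.finditer(data)
        if needle in run.group().lower()
    )


def count_in_file(path: str, word: str) -> int:
    """Apply count_matching_strings() to the contents of the file at `path`."""
    with open(path, "rb") as f:
        return count_matching_strings(f.read(), word.encode("ascii"))
```

For example, `count_in_file("/tmp/pool1_file", "hamlet")` and `count_in_file("/tmp/pool2_file", "hamlet")` could be compared after both pools have been exported.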

As I said above, this isn't intended as "proof" that ZFS does encryption properly, just as a quick demo.

Wednesday Nov 09, 2011

User home directory encryption with ZFS

ZFS encryption has a very flexible key management capability, including the option to delegate key management to individual users.  We can use this together with a PAM module I wrote to provide per user encrypted home directories.  My laptop and workstation at Oracle are configured like this:

First let's set up console login for encrypted home directories:

    root@ltz:~# cat >> /etc/pam.conf<<_EOM
    login auth     required create
    other password required

The first line ensures that when we log in on the console bob's home directory is created as an encrypted ZFS file system if it doesn't already exist. The second one ensures that the passphrase for it stays in sync with his login password.

Now let's create a new user 'bob' who looks after his own encryption key for his home directory. Note that we do not specify '-m' to useradd, so that pam_zfs_key will create the home directory when the user logs in.

root@ltz:~# useradd bob
root@ltz:~# passwd bob
New Password: 
Re-enter new Password: 
passwd: password successfully changed for bob
root@ltz:~# passwd -f bob
passwd: password information changed for bob

We have now created the user bob with an expired password. Let's log in as bob and see what happens:

    ltz console login: bob
    Choose a new password.
    New Password: 
    Re-enter new Password: 
    login: password successfully changed for bob
    Creating home directory with encryption=on.
    Your login password will be used as the wrapping key.
    Last login: Tue Oct 18 12:55:59 on console
    Oracle Corporation      SunOS 5.11      11.0    November 2011
    -bash-4.1$ /usr/sbin/zfs get encryption,keysource rpool/export/home/bob
    NAME                   PROPERTY    VALUE              SOURCE
    rpool/export/home/bob  encryption  on                 local
    rpool/export/home/bob  keysource   passphrase,prompt  local

Note that bob first had to change the expired password. After he provided a new login password, a new ZFS file system for bob's home directory was created. The new login password that bob chose is also the passphrase for this ZFS encrypted home directory. This means that at no time did the administrator ever know the passphrase for bob's home directory. After the machine reboots bob's home directory won't be mounted anymore until bob logs in again.  If we want bob's home directory to be unmounted and the key removed from the kernel when bob logs out (even if the system isn't rebooting) then we can add the 'force' option to the module line in /etc/pam.conf.

If users login with GDM or ssh then there is a little more configuration needed in /etc/pam.conf to enable pam_zfs_key for those services as well.

root@ltz:~# cat >> /etc/pam.conf<<_EOM
gdm     auth requisite
gdm     auth required 
gdm     auth required 
gdm     auth required  create
gdm     auth required 

root@ltz:~# cat >> /etc/pam.conf<<_EOM
sshd-kbdint     auth requisite
sshd-kbdint     auth required 
sshd-kbdint     auth required 
sshd-kbdint     auth required  create
sshd-kbdint     auth required 

Note that this only works when we are logging in to SSH with a password. It does not work with pubkey authentication, because the encryption passphrase for the home directory hasn't been supplied. However, pubkey and gssapi will work for later authentications after the home directory is mounted, since the ZFS passphrase is supplied during that first ssh or gdm login.

Tuesday May 10, 2011

Encrypting /var/tmp & swap in Solaris 11 Express

As some readers might remember from previous posts it isn't possible in Solaris 11 Express to boot from an encrypted ZFS dataset.  However it is possible to have encrypted swap space and thus (by default) an encrypted /tmp.  That still leaves /var/tmp unencrypted.

First let's look at swap space encryption.  That is as simple as putting the word "encrypted" into the mount options field of /etc/vfstab for the swap device.  If swap is a ZVOL then ZFS encryption will be used; if swap is a raw disk slice or file then lofi will be interposed between the device/file using a randomly generated key.  That is a fully supported solution in Solaris 11 Express implemented by the swapadd command.

For encrypting /var/tmp we need to go beyond the provided services. The following (unsupported) method takes its lead from what I did for swapadd and applies it to /var/tmp.  Note however that this assumes that nothing in /var/tmp should be preserved on boot, and its contents won't even be readable from another boot environment, so if you use this don't put anything into /var/tmp that you want to access after a reboot.

This takes advantage of the fact that in SMF we can place dependencies onto other services without modifying them.  So while the following makes some basic assumptions about the Solaris ZFS datasets layout it doesn't require modifying any existing binaries or configuration files.

We create a new service, svc:/site/system/filesystem/tmp:default. This service will create an encrypted dataset for /var/tmp using the manifest and method script that follow:

<?xml version='1.0'?>
<!DOCTYPE service_bundle SYSTEM '/usr/share/lib/xml/dtd/service_bundle.dtd.1'>


 The contents of this file are subject to the terms of the
 Common Development and Distribution License (the "License").
 You may not use this file except in compliance with the License.

 You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 See the License for the specific language governing permissions
 and limitations under the License.

 When distributing Covered Code, include this CDDL HEADER in each
 file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 If applicable, add the following below this CDDL HEADER, with the
 fields enclosed by brackets "[]" replaced with your own identifying
 information: Portions Copyright [yyyy] [name of copyright owner]


 Copyright (c) 2011, Oracle and/or its affiliates. All rights reserved.


<service_bundle type='manifest' name='darrenm:etmp'>


        <create_default_instance enabled='true' />

        <single_instance />

                <service_fmri value='svc:/system/cryptosvc' />

                <service_fmri value='svc:/system/filesystem/minimal' />

                timeout_seconds='30' />

                timeout_seconds='1' />

        <property_group name='startd' type='framework'>
                <propval name='duration' type='astring' value='transient' />



# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or
# See the License for the specific language governing permissions
# and limitations under the License.
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
# Copyright (c) 2011, Oracle and/or its affiliates. All rights reserved.
# /lib/svc/method/site-etmp

zfs destroy rpool/tmp > /dev/null 2>&1
zfs create -o mountpoint=/var/tmp -o encryption=on -o keysource=raw,file:///dev/random rpool/tmp
chmod 1777 /var/tmp

exit 0

On reboot a new dataset for /var/tmp will be created.  It would be possible to have a more sophisticated method script that doesn't use a hardcoded dataset name of rpool/tmp, but this seems sufficient for now.  It will look something like this:

darrenm-pc:pts/1$ cd /var/tmp
darrenm-pc:pts/1$ df -hl .
Filesystem             Size   Used  Available Capacity  Mounted on
rpool/tmp              113G    79K        77G     1%    /var/tmp
darrenm-pc:pts/1$ ls -ld .
drwxrwxrwt   5 root     root           5 May 10 14:39 ./
darrenm-pc:pts/1$ zfs get encryption,keysource,keystatus rpool/tmp
NAME       PROPERTY    VALUE                   SOURCE
rpool/tmp  encryption  on                      local
rpool/tmp  keysource   raw,file:///dev/random  local
rpool/tmp  keystatus   available               -



Friday Nov 19, 2010

ZFS encryption what is on disk ?

This article is about what is and isn't stored encrypted on disk for ZFS datasets that are encrypted and how we do the actual encryption. It does require some understanding of Solaris and ZFS debugging tools.

The first important thing to understand about ZFS encryption is that it does not provide "full disk" encryption: you will still be able to tell that a disk holding data encrypted by ZFS is part of a ZFS pool.

This is in part because one of the requirements for adding encryption support to ZFS was that a given ZFS pool be able to contain a mix of encrypted and cleartext datasets and those that are encrypted be able to use different algorithms/keylengths and different encryption keys.

We also require that the key material does not need to have been made available in order for pool wide operations and certain dataset operations (such as zfs destroy) to succeed.  One of the most important pool wide operations is scrub/resilver; we need to ensure that hotspare, disk replacement and self healing work even if the key material has never been made available to this running instance of the system. We must also be able to claim (but not necessarily replay) the log blocks (ZIL) on reboot after power loss or panic without requiring the key material (ZFS must remain consistent on disk at all times).

What this means is that even in a pool where all of the datasets are marked as being encrypted (e.g. zpool create -O encryption=on tank ...) there is some ZFS metadata that is always in the clear.

What is always in the clear even for encrypted datasets?
  • ZFS pool layout
  • Dataset names
  • Pool and dataset properties, including user defined properties
    • compression, encryption, share, etc.
  • Dataset level quotas (zfs set quota)
  • Dataset delegations (zfs allow)
  • The pool history (zpool history)
  • All dnode blocks
    • Needed to traverse the pool for resilver/scrub
  • Block pointer
    • The blkptr_t contains the MAC/AuthTag from AES-CCM or AES-GCM in the top 96 bits of the checksum field. The SHA256 checksum is truncated to provide this 96 bits of space.
      • The checksum for an encrypted block is always sha256-mac
    • The 96bit IV for the block is in dva[2] of the blkptr_t
      • This means that an encrypted block can have a maximum of 2 copies not 3

What is encrypted when a dataset is encrypted?

  • All file content written to a ZFS filesystem via the ZPL/VFS interfaces (i.e. POSIX interfaces)
    • open(2), write(2), mmap(2), etc.
  • All POSIX (and ZFS filesystem) metadata: ACLs, file and directory names, permissions, system and extended attributes on files and all file timestamps
    • ZPL metadata is normally contained in the bonusbuf area of a dnode_phys_t but the dnode is in the clear on disk. For encrypted datasets the bonusbuf is always empty and the content that would normally have been there is pushed out to an encrypted "spill" block, called a System Attribute block.  Normally for ZPL filesystems spill blocks are only used for files with large ACLs.
  • System Attribute (spill) blocks (used for any purpose)
  • All data written to a ZVOL
  • User/group quota information for ZFS filesystems, both the policy and space accounting (zfs set userquota@ | groupquota@)
  • FUID mappings for UNIX <-> CIFS user identities
  • All of the above if it is in a ZIL (ZFS Intent Log) record.
    • Note that the actual ZIL blocks have block pointers and a record header that includes the sizing information that is in the clear.
  • Data encryption keys
    • These are stored in an on disk keychain referenced from the dsl_dir_phys_t. 

The ondisk keychain

The keychain entries are ZAP objects that are indexed by the transaction they were created in. The entries are individually wrapped by the dataset's wrapping key each with their own IV and an indicator of what wrapping key algorithm was used (at this time the wrapping key crypto algorithm always matches the encryption property).  Every encrypted dataset has at least one keychain entry.  Clones have their own keychain and do not reference the one of their origin, because the clone may have a different wrapping key and the clone may have different keychain entries to its origin.

Encrypting a block

Each ZFS on disk block (smallest size is 512 bytes, largest is 128k) is encrypted using AES in either CCM or GCM mode as indicated by the encryption property. Even though CCM and GCM provide the ability to have additional authenticated data that isn't encrypted, this isn't used because (with the exception of the ZIL blocks) all data in the block is encrypted.  A 96 bit IV per disk block is used and both CCM and GCM are requested to provide a 96 bit MAC/AuthTag in addition to the ciphertext.  While we could get a larger MAC, space in the ZFS on disk blkptr_t is very tight and we need to leave some of it available for future features.  After encryption each block is also checksummed by the ZIO pipeline using SHA256 (fletcher is not available for encrypted datasets).

IV generation for encrypted blocks

Every encrypted on disk block has its own IV (stored in dva[2] of the blkptr_t).  The IV is generated by taking the first 96 bits of a SHA256 hash of the contents of the zbookmark_t and the transaction the block was first written in.  We actually have all this information available both at read and write time so we don't need to store the IV in the simplest case. However snapshots, clones and deduplication, as well as some (non encryption related) future features, complicate this so we do store the IV.
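
The shape of that calculation can be sketched in a few lines of Python. This is illustrative only and is not the ZFS code: the function name is made up, and the way the bookmark fields and the transaction number are serialised before hashing (five 64 bit big-endian integers) is purely an assumption. Only the "SHA256, keep the first 96 bits" step comes from the description above.

```python
import hashlib
import struct


def block_iv(objset: int, obj: int, level: int, blkid: int, txg: int) -> bytes:
    """Derive a 96 bit IV from bookmark-like fields and the birth txg.

    The field names mirror what a zbookmark_t identifies; the byte
    layout fed to SHA256 here is hypothetical.
    """
    # Assumed serialisation: 64 bit big-endian values, level is signed.
    material = struct.pack(">QQqQQ", objset, obj, level, blkid, txg)
    # Keep the first 96 bits (12 bytes) of the SHA256 digest.
    return hashlib.sha256(material).digest()[:12]
```

Because every input is known at both write and read time, a value like this could be recomputed on read, which is why storing it is only needed for the more complicated cases mentioned above.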

If dedup=on for the dataset the per block IVs are generated differently.  They are generated by taking an HMAC-SHA256 of the plaintext and using the left most 96 bits of that as the IV.  The key used for the HMAC-SHA256 is different to the one used by AES for the data encryption, but is stored (wrapped) in the same keychain entry. Just like the data encryption key, a new one is generated when doing a 'zfs key -K <dataset>'.  Obviously we couldn't calculate this IV when doing a read so it has to be stored.
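
A minimal Python sketch of the dedup case follows. The function and parameter names are made up for illustration; it simply computes an HMAC-SHA256 over the plaintext with a separate key and keeps the left most 96 bits, as described above.

```python
import hashlib
import hmac


def dedup_iv(iv_key: bytes, plaintext: bytes) -> bytes:
    """Return the left most 96 bits of HMAC-SHA256(iv_key, plaintext).

    `iv_key` stands in for the separate HMAC key held in the keychain
    entry; it must not be the AES data encryption key.
    """
    return hmac.new(iv_key, plaintext, hashlib.sha256).digest()[:12]
```

Identical plaintext under the same HMAC key always gives the same IV, so identical blocks encrypt to identical ciphertext and can be deduplicated. A different HMAC key, for example after 'zfs key -K', gives a different IV.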

ZIL blocks

The ZIL log blocks are written in exactly the same way regardless of whether the ZIL is in the main pool disks or a separate intent log (slog) is being used.  The ZIL blocks are encrypted in a different way to blocks going through the "normal" write path; this is because log blocks are formatted on disk differently anyway.  The log blocks are chained together and have a header (zil_chain_t) that indicates what size the log block is and the blkptr_t to the next block, as well as an embedded checksum that chains the blocks together.  For encrypted log blocks the MAC from AES CCM/GCM is also stored in this header (zil_chain_t).   It is log blocks rather than log records that are encrypted.  Within a given log block there may be multiple log records.  Some of these log records may contain pointers to blocks that were written directly (via dmu_sync). In order for us to claim the ZIL when the pool is imported, these embedded block pointers need to be readable even if the encryption keys are not available (which they won't be in most cases during the claim phase).  This means that we don't encrypt whole log blocks: the log record headers and any blkptr_t embedded in a log record are in the clear, and the rest of the log block content is encrypted.

How is the passphrase turned into a wrapping key (keysource=passphrase,prompt)?

When the dataset 'keysource' property indicates that a passphrase should be used we have to derive a wrapping key from it.  The wrapping key is derived from the passphrase provided and a per dataset salt (which is stored as a hidden property of the dataset) by using PKCS#5 PBKDF2 with HMAC-SHA1 and 1000 iterations.  The wrapping key is not stored on disk.  The salt is randomly generated when the dataset is created (with keysource=passphrase,prompt) and changed each time 'zfs key -c' is run, so even if the passphrase the user provides is the same, the salt and thus the actual wrapping key will be different.
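
The derivation can be illustrated with Python's standard library. This is a sketch rather than the libzfs implementation: the function name, the UTF-8 encoding of the passphrase and the 8 byte salt in the usage note are assumptions. The PBKDF2, HMAC-SHA1 and 1000 iterations parameters come from the text above.

```python
import hashlib

ITERATIONS = 1000  # iteration count stated in the article


def derive_wrapping_key(passphrase: str, salt: bytes, key_bits: int = 128) -> bytes:
    """Derive a wrapping key with PKCS#5 PBKDF2 using HMAC-SHA1.

    `key_bits` should match the key length of the dataset's encryption
    property (128, 192 or 256).
    """
    if key_bits not in (128, 192, 256):
        raise ValueError("AES key length must be 128, 192 or 256 bits")
    return hashlib.pbkdf2_hmac(
        "sha1",
        passphrase.encode("utf-8"),  # assumed encoding
        salt,
        ITERATIONS,
        dklen=key_bits // 8,
    )
```

With a fresh salt such as `os.urandom(8)` (the salt length is an assumption), the same passphrase yields a different wrapping key each time, which matches the behaviour described for 'zfs key -c'.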

Looking at the on disk structures

Using mdb macros and zdb we can actually look at some of this.  Remember that mdb and zdb are debugging tools only, use of mdb on a live kernel without understanding what you are doing can corrupt data.  The interfaces used below are not committed interfaces and are subject to change.

Firstly, using mdb on the live kernel (of an x86 machine) I've placed a breakpoint on the zio_decrypt function. Let's look at the block pointer using the mdb blkptr dcmd:

[2]> <rdi::print zio_t io_bp | ::blkptr
[L0 PLAIN_FILE_CONTENTS] SHA256_MAC OFF LE contiguous unique encrypted 1-copy
size=20000L/20000P birth=10L/10P fill=1

This blkptr_t is for the contents of a file, we can see that it is encrypted and we only have one copy of it - so only one DVA entry. The checksum is SHA256_MAC so the actual MAC value is 2e24913e6b94fbd569cf3cd9.  The blkptr macro doesn't show us the IV that is stored in DVA[2], but we can see that if we print the raw structure using ::print

[2]> <rdi::print zio_t io_bp->blk_dva[2]
blk_dva[2] = {
    blk_dva[2].dva_word = [ 0x521926d500000000, 0x3b13ba46ab9f8a51 ]

Now let's use zdb to look at some things (the output is trimmed slightly for the sake of this article):

# zdb -dd -e tank

Dataset mos [META], ID 0, cr_txg 4, 311K, 54 objects

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    1    16K    16K  96.0K    32K   84.38  DMU dnode
         1    1    16K     1K  1.50K     1K  100.00  object directory
         2    1    16K    512      0    512    0.00  DSL directory
         3    1    16K    512  1.50K    512  100.00  DSL props


        26    1    16K   128K  18.0K   128K  100.00  SPA history


        36    1    16K   128K      0   128K    0.00  bpobj
        37    1    512    512  3.00K     1K  100.00  DSL keychain
        38    1    16K     4K  12.0K     4K  100.00  SPA space map
...

This pool (tank) currently has 3 datasets, one of which is encrypted.  We can see from the above zdb output that the keychains are kept in the special "mos" dataset along with some other pool wide metadata.  Now lets look at those keychains in a bit more detail by asking zdb to be more verbose (again the output is trimmed to show relevant information only):

    # zdb -dddd -e tank
    Object  lvl   iblk   dblk  dsize  lsize   %full  type
        37    1    512    512  3.00K     1K  100.00  DSL keychain
        dnode flags: USED_BYTES 
        dnode maxblkid: 1
        Fat ZAP stats:
                Pointer table:
                        32 elements
                        zt_blk: 0
                        zt_numblks: 0
                        zt_shift: 5
                        zt_blks_copied: 0
                        zt_nextblk: 0
                ZAP entries: 2
                Leaf blocks: 1
                Total blocks: 2
                zap_block_type: 0x8000000000000001
                zap_magic: 0x2f52ab2ab
                zap_salt: 0x16c6fb
                Leafs with 2^n pointers:
                        5:      1 *
                Blocks with n*5 entries:
                        0:      1 *
                Blocks n/10 full:
                        9:      1 *
                Entries with n chunks:
                        9:      2 **
                Buckets with n entries:
                        0:     14 **************
                        1:      2 **
        Keychain entries by txg:
                txg 5 : wkeylen = 136
                txg 85 : wkeylen = 136

The above keychain object shows it has two entries in it: the lowest numbered one (5) is from when the dataset was initially created, and the second one (85) is there because I had run 'zfs key -K tank/fs' on the dataset a little later.  Now let's use zdb to illustrate what I discussed in the previous article about assured delete: clones can have a different set of entries in the keychain to their origin.

To illustrate this I ran the following:

# zfs snapshot tank/fs@1
# zfs clone -K tank/fs@1 tank/fsc1
# zfs key -K tank/fs

First let's look at the keychain object 37 which is for tank/fs, and then at the keychain object for the clone (I've trimmed the output a little more this time):

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
        37    2    512    512  7.50K     2K  100.00  DSL keychain
        Keychain entries by txg:
                txg 5 : wkeylen = 136
                txg 85 : wkeylen = 136
                txg 174 : wkeylen = 136

     Object  lvl   iblk   dblk  dsize  lsize   %full  type
        101    1    512    512  4.50K  1.50K  100.00  DSL keychain
        Keychain entries by txg:
                txg 5 : wkeylen = 136
                txg 85 : wkeylen = 136
                txg 152 : wkeylen = 136

What we see above is that the original tank/fs dataset now has an additional entry from the 'zfs key -K tank/fs' that was run.  The keychain for the clone (object 101) also has three entries in it: it shares the same entries as tank/fs for txg 5 and txg 85 (though they may be encrypted differently on disk depending on where the wrapping key is inherited from) and it has a unique entry created at txg 152.  We can see similar information by looking at the 'zpool history -il' output:

2010-11-19.05:58:25 [internal encryption key create txg:85] rekey succeeded dataset = 33 [user root on borg-nas]
2010-11-19.06:05:59 [internal encryption key create txg:152] rekey succeeded dataset = 96 from dataset = 77 [user root on borg-nas]
2010-11-19.06:06:40 [internal encryption key create txg:174] rekey succeeded dataset = 33 [user root on borg-nas]
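
The two keychain listings above can be modelled as maps from txg to wrapped key. The following is a toy model in Python, not ZFS code: the names `key_txg_for_block`, `fs_keychain` and `clone_keychain` are hypothetical, and the rule that a block is covered by the newest entry at or before its birth txg is an assumption made here for illustration.

```python
from bisect import bisect_right

# Toy model of the two keychains shown by zdb: txg -> label for the wrapped key.
# The labels are placeholders; the real entries are 136-byte wrapped keys.
fs_keychain = {5: "key@5", 85: "key@85", 174: "key@174"}      # object 37, tank/fs
clone_keychain = {5: "key@5", 85: "key@85", 152: "key@152"}   # object 101, tank/fsc1


def key_txg_for_block(keychain, birth_txg):
    """Return the txg of the entry assumed to cover a block born in birth_txg.

    Assumption: a block is covered by the newest keychain entry whose txg
    is not later than the block's birth txg.
    """
    txgs = sorted(keychain)
    index = bisect_right(txgs, birth_txg)
    if index == 0:
        raise KeyError(f"no keychain entry at or before txg {birth_txg}")
    return txgs[index - 1]


# Entries present in both keychains (inherited from before the clone was taken).
shared = sorted(set(fs_keychain) & set(clone_keychain))
print("shared entries:", shared)
print("tank/fs block born at txg 160 uses entry",
      key_txg_for_block(fs_keychain, 160))
print("tank/fsc1 block born at txg 160 uses entry",
      key_txg_for_block(clone_keychain, 160))
```

Under that assumption, data written to the clone after txg 152 is covered by an entry that the origin does not have, which is what makes the clone's unique data independently destroyable.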

What is decrypted in memory?

As already mentioned, the data encryption keys are stored wrapped (encrypted) on disk, but they are stored in memory in the clear along with the wrapping key (we need the wrapping key to stay around for 'zfs key -K' and for 'zfs create' where the keysource property is inherited).  They are stored only in non swappable kernel memory (though remember you can swap on an encrypted ZVOL).  They are accessible to someone with all privileges who is able to use mdb on the live kernel or on a crash dump - but so is your plaintext data.  A suitable hardware keystore could be used so that key material is only ever inside its FIPS 140 boundary, but that support is not yet complete (note this is not a commitment from Oracle to provide this support in any future release of ZFS or Solaris) - there would be no on disk change required to support it though.

Any data or metadata blocks that are encrypted on disk are held in the in-memory cache (ARC) in the clear.  This is required because the in-memory ARC buffers are sometimes "loaned" using zero copy to other parts of the system - including other file systems such as NFS and CIFS.  If this is too much of a risk for you then you can force the system to always go back to disk and decrypt blocks only when needed, but note that you will not benefit from the caching and this will have a significant performance penalty: zfs set primarycache=metadata <dataset>.

The L2ARC is not currently available for use by encrypted datasets (note this is not a commitment from Oracle to provide this support in any future release of ZFS or Solaris); this is equivalent to having done 'zfs set secondarycache=none <dataset>'. The DDT for deduplication is not encrypted data; it is pool wide metadata (stored in the MOS), so it can still be stored in the L2ARC.

All of the above article content could have been discovered by reading the zfs(1M) man page and using mdb, DTrace and zdb while experimenting on a live system, which is actually how I wrote the article.  There is a lot more you can examine about the on disk and in memory state of Solaris, not just ZFS, by using mdb and DTrace - neither of which you can hide from, since the kernel modules have CTF data in them with full structure definitions.  Note though that unless the interfaces/structures are documented in the Solaris DDI, or other official documentation from Oracle, you are looking at implementation details that are subject to change - often even in an update/patch.

Tuesday Nov 16, 2010

Choosing a value for the ZFS encryption property

The 'on' value for the ZFS encryption property maps to 'aes-128-ccm', because it is the fastest of the 6 available modes of encryption currently provided and is believed to provide sufficient security for many deployments.  Depending on the filesystem/zvol workload you may not be able to notice (or care if you do notice) the difference between the AES key lengths and modes.  However note that at this time I believe the collective wisdom in the cryptography community appears to be to recommend AES128 over AES256. [Note that this is not a statement of Oracle's endorsement or verification of that research].

Both CCM and GCM are provided so that if one turns out to have flaws (and modes of an encryption algorithm sometimes do have flaws independent of the base algorithm), hopefully the other will still be available for use safely.

On systems without hardware/cpu support for Galois multiplication (for example Intel Westmere or SPARC T3) GCM will be slower because the Galois field multiplication has to happen in software without any hardware/cpu assist.  However depending on your workload you might not even notice the difference between CCM and GCM.

One reason you may want to select aes-128-gcm rather than aes-128-ccm is that GCM is one of the modes for AES in NSA Suite B but CCM is not.

ZFS encryption was designed and implemented to be extensible to new algorithm/mode combinations for data encryption and key wrapping.

Are there symmetric algorithms, for data encryption, other than AES that are of interest?

The wrapping key algorithm currently matches the data encryption key algorithm, is there interest in providing different wrapping key algorithms and configuration properties for selecting which one ? For example doing key wrapping with an RSA keypair/certificate ?  

[Note this is not a commitment from Oracle to implementing/providing any suggested additions in any release of any product but if there are others of interest we would like to know so they can be considered.]

Monday Nov 15, 2010

Having my secured cake and Cloning it too (aka Encryption + Dedup with ZFS)

The main goal of encryption is to make the (presumably sensitive) cleartext data indistinguishable from random data.  Good file system encryption usually aims to have the same plaintext encrypt to different ciphertext at least when written at a different "location" even if the same key is used.  One way to achieve that is to derive the initialisation vector (IV) from where the blocks of the files are stored on disk.  In this respect the encryption support in ZFS is no different: by default we derive the IV from a combination of which dataset / object the block is for and also when (in which transaction) it was written.  This means that the same block of plaintext data written to a different file in the same filesystem will get a different IV and thus different ciphertext.  Since ZFS is copy-on-write and we use the transaction identifier, it also means that if we "overwrite" the same block of a file at a later time it still ends up having a different IV and thus different ciphertext.  Each encrypted dataset in ZFS has a different set of data encryption keys (see my earlier post on assured delete for more details on that), so across datasets both the IV and the encryption key change, which gives a really high level of confidence of getting different ciphertext when the same plaintext is written to different datasets.
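
As a rough illustration of a location-derived IV, here is a minimal Python sketch. It is not the ZFS implementation: the function `derive_iv`, the use of SHA-256, the inclusion of a block number, the 12 byte length and all the numeric identifiers are assumptions chosen only to show the idea that the same plaintext gets a different IV depending on where and when it is written.

```python
import hashlib
import struct


def derive_iv(dataset_id, object_id, block_id, txg, length=12):
    """Toy IV: a hash of where the block lives and which transaction wrote it.

    Hypothetical construction for illustration only; the real derivation
    used by ZFS is not described in this post.
    """
    location = struct.pack(">QQQQ", dataset_id, object_id, block_id, txg)
    return hashlib.sha256(location).digest()[:length]


# The same plaintext block written to two different files (objects)
# in the same dataset, in the same transaction group.
iv_file_a = derive_iv(dataset_id=33, object_id=8, block_id=0, txg=90)
iv_file_b = derive_iv(dataset_id=33, object_id=9, block_id=0, txg=90)

# The same block of file A "overwritten" later: copy-on-write means the
# new copy is written in a later transaction group.
iv_file_a_later = derive_iv(dataset_id=33, object_id=8, block_id=0, txg=120)

for name, iv in [("file A", iv_file_a),
                 ("file B", iv_file_b),
                 ("file A, later txg", iv_file_a_later)]:
    print(f"{name:18} {iv.hex()}")
```

All three IVs differ, so even with one key the three writes would produce three different ciphertexts.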

The goal of deduplication in storage is to coalesce matching disk blocks into a smaller number of copies (ideally 1, but in ZFS that number depends on the value of the copies property on the dataset and the pool wide dedupditto property, so it could be more than 1).  Given the above description of how we do encryption it would seem that encryption and deduplication are fundamentally at odds with each other - and usually that is true.

When we write a block to disk in ZFS it goes through the ZIO pipeline and in doing so a number of transforms are optionally applied to the data:  compress -> encryption -> checksum -> dedup -> raid.

The deduplication step uses the checksums of the blocks to find suitable matches. This means it is acting on the already compressed and encrypted data.  Also in ZFS deduplication matches are searched for in all datasets in the pool with dedup=on.

So we have very little chance of getting any deduplication hits with encrypted datasets because of how the IV is generated and the fact that each dataset has its own set of encryption keys.  In fact not getting hits with deduplication is actually a good test that we are using different keys and IVs and thus getting different ciphertext for the same plaintext.

So encryption=on + dedup=on is pointless, right ?

Not so with ZFS, I wasn't happy about giving up on deduplication for encrypted datasets, so we found a solution, it has some restrictions but I think they are reasonable and realistic ones.

Within what I'll call a "clone family", ie all datasets are clones of the same original dataset or are clones of those clones, we would be sharing data encryption keys in the default case, because they share data (again see my earlier post on assured delete for info on the data encryption keys). So I found a method of generating the IV such that within the "clone family" we will get dedup hits for the same plaintext.  For this to work you must not run 'zfs key -K' on any of the clones and you must not pass '-K' to 'zfs clone' when you create your clones.  Note that dedup does not apply to child datasets only to the snapshots/clones, and by that I mean it doesn't break you just won't get deduplication matches.
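
To make the interaction between the write pipeline and dedup concrete, here is a toy Python model of compress -> encrypt -> checksum -> dedup. Everything in it is a stand-in: `toy_encrypt` is an insecure XOR keystream and not AES-CCM or AES-GCM, `Pool` is a hypothetical class, and `content_iv` (an IV computed from the key and the plaintext) is only one possible way to get dedup hits within a clone family. The post does not say which method ZFS actually uses, so treat that part as an assumption.

```python
import hashlib
import hmac
import zlib


def toy_encrypt(key, iv, data):
    """XOR data with a SHA-256 based keystream. Stand-in cipher, NOT secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def location_iv(dataset, obj, txg):
    """IV that depends on where and when the block is written."""
    return hashlib.sha256(f"{dataset}/{obj}@{txg}".encode()).digest()[:12]


def content_iv(key, plaintext):
    """Assumed clone-family scheme: IV depends only on the key and the data."""
    return hmac.new(key, plaintext, hashlib.sha256).digest()[:12]


class Pool:
    """Pool-wide dedup table keyed by the checksum of the stored block."""

    def __init__(self):
        self.ddt = {}  # checksum -> reference count

    def write(self, key, iv, plaintext):
        compressed = zlib.compress(plaintext)            # compress
        ciphertext = toy_encrypt(key, iv, compressed)    # encrypt
        checksum = hashlib.sha256(ciphertext).hexdigest()  # checksum
        hit = checksum in self.ddt                       # dedup lookup
        self.ddt[checksum] = self.ddt.get(checksum, 0) + 1
        return hit


block = b"same patch contents" * 100
family_key = b"K" * 16   # shared by a master and its clones (no -K used)
other_key = b"X" * 16    # an unrelated encrypted dataset

pool = Pool()

# Location-derived IVs: same key and data, different place -> no dedup hit.
pool.write(family_key, location_iv("project-one", 8, 90), block)
loc_hit = pool.write(family_key, location_iv("project-two", 8, 91), block)

# Content-derived IVs: a hit inside the clone family, none outside it.
pool.write(family_key, content_iv(family_key, block), block)
family_hit = pool.write(family_key, content_iv(family_key, block), block)
outsider_hit = pool.write(other_key, content_iv(other_key, block), block)

print(loc_hit, family_hit, outsider_hit)  # False True False
```

The model also shows why running 'zfs key -K' on a clone would end the dedup hits for newly written data: once the key differs, the ciphertext and therefore the checksum differ too.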

So no, it isn't pointless, and what's more, for some configurations it will actually work really well.  A common use case that does work well is a set of virtualisation images (maybe filesystems for local Zones, or ZVOLs shared over iSCSI for OVM or similar) that are all derived from the same original master using zfs clones and that all get patched/updated with pretty much the same set of patches/updates.  This is a case where clones+dedup work well for the unencrypted case, and one which as shown above can still work well even when encryption is enabled.

The usual deployment caveats with ZFS deduplication still apply, ie it is block based and it works best when you have lots of available DRAM and/or L2ARC for caching the DDT.  ZFS Encryption doesn't add any additional requirements to this. 

So we can happily do this type of thing, and have it "work as expected":

$ zfs create -o compression=on -o encryption=on -o dedup=on tank/builds
$ zfs create tank/builds/master
$ zfs snapshot tank/builds/master@1
$ zfs clone tank/builds/master@1 tank/builds/project-one
$ zfs clone tank/builds/master@1 tank/builds/project-two

General documentation for ZFS support of encryption is in the Oracle Solaris ZFS Administration Guide in the Encrypting ZFS File Systems section.

Assured delete with ZFS dataset encryption

Need to be assured that your data is inaccessible after a certain point in time ?

Many government agency and private sector security policies allow you to achieve that if the data is encrypted and you can show with an acceptable level of confidence that the encryption keys are no longer accessible.  The alternative is overwriting all the disk blocks that contained the data, which is time consuming, very expensive in IOPS and, in a copy-on-write filesystem like ZFS, actually very difficult to achieve.  So often this is only done on full disks as they come out of production use for recycling/repurposing, but this isn't ideal in a complex RAID layout.

In some situations (compliance or privacy are common reasons) it is desirable to have an assured delete of a subset of the data on a disk (or whole storage pool). Having the encryption policy / key management at the ZFS dataset (file system / ZVOL) level allows us to provide assured delete via key destruction at a much smaller granularity than full disks.  It also means that, unlike full disk encryption, we can do this on a subset of the data while the disk drives remain live in the system.

If the subset of data matches a ZFS file system (or ZVOL) boundary we can provide this assured delete via key destruction; remember that ZFS filesystems are relatively cheap to create.

Let's start with a simple case of two encrypted file systems, each with its own wrapping key:

$ zfs create -o encryption=on -o keysource=raw,file:///media/keys/g projects/glasgow
$ zfs create -o encryption=on -o keysource=raw,file:///media/keys/e projects/edinburgh

After some time we decide we want to make projects/glasgow completely inaccessible.  The simplest way is to destroy the wrapping key, which in this case is in /media/keys/g, and then destroy the projects/glasgow dataset.  The data on disk will still be there until ZFS starts using those blocks again, but since we have destroyed /media/keys/g (which I'm assuming here is on some separate file system) we have a high level of assurance that the encrypted data can't be recovered even by reading "below" ZFS by looking at the disk blocks directly.

I'd recommend a tiny additional step just to make sure that the last version of the data encryption keys (which are stored wrapped on disk in the ZFS pool) is not encrypted by anything the user/admin knows:

$ zfs key -c -o keysource=raw,file:///dev/random projects/glasgow
$ zfs key -u projects/glasgow
$ zfs destroy projects/glasgow

While the step of re-wrapping the keys with a key the user/admin doesn't know doesn't provide a huge amount of additional security/assurance it makes the administrative intent much clearer and at least allows the user to assert that they did not know the wrapping key at the point the dataset was destroyed.
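
The effect of re-wrapping before destroying can be sketched in a few lines of Python. This is a toy model under stated assumptions: `toy_wrap` is a plain XOR with a hash of the wrapping key and is not how ZFS wraps keys, and `admin_key` merely stands in for the contents of /media/keys/g.

```python
import hashlib
import os


def toy_wrap(wrapping_key, data_key):
    """XOR the data key with a hash of the wrapping key. NOT real key wrapping."""
    pad = hashlib.sha256(wrapping_key).digest()[:len(data_key)]
    return bytes(a ^ b for a, b in zip(data_key, pad))


toy_unwrap = toy_wrap  # XOR is its own inverse

data_key = os.urandom(16)    # the key the file system data is encrypted with
admin_key = os.urandom(32)   # stands in for the contents of /media/keys/g

# What is stored in the pool: the data key wrapped by the admin's key.
on_disk = toy_wrap(admin_key, data_key)
assert toy_unwrap(admin_key, on_disk) == data_key

# Equivalent of the 'zfs key -c' step: re-wrap under a random key that is
# never recorded anywhere, then forget it.
throwaway = os.urandom(32)
on_disk = toy_wrap(throwaway, toy_unwrap(admin_key, on_disk))
del throwaway

# Even someone who kept a copy of the admin key cannot recover the data key.
recovered = toy_unwrap(admin_key, on_disk)
print("admin key still unwraps the data key:", recovered == data_key)
```

With random 32 byte keys the final comparison is false with overwhelming probability, which mirrors the point made above: at destroy time nobody holds a key that unwraps what is left on disk.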

If we have clones this situation is slightly more complex since clones share their data encryption key with their origin - since they share data written before the clone was branched off the clone needs to be able to read the shared and unique data as if it was its own.

We can make sure that the unique data in a clone uses a different data encryption key than the origin does from the point the clone was taken:

... time passes data is placed in projects/glasgow
$ zfs snapshot projects/glasgow@1
$ zfs clone -K projects/glasgow@1 projects/mungo

By passing '-K' to 'zfs clone' we ensure that any unique data in projects/mungo uses a different data encryption key from projects/glasgow.  This means we can use the same operations as above to provide assured delete for the unique data in projects/mungo even though it is a clone.

Additionally we could also do 'zfs key -K projects/glasgow' and have any new data written to projects/glasgow after the projects/mungo clone was taken use a different data encryption key as well.  Note however that this is not atomic, so I would recommend making projects/glasgow read-only before taking the snapshot even though normally this isn't necessary.  The full sequence then becomes:

$ zfs set readonly=on projects/glasgow
$ zfs snapshot projects/glasgow@1
$ zfs clone -K projects/glasgow@1 projects/mungo
$ zfs set readonly=off projects/mungo
$ zfs key -K projects/glasgow
$ zfs set readonly=off projects/glasgow

If you don't have projects/glasgow marked as read-only then there is a risk that data could be written to projects/glasgow  after the snapshot is taken and before we get to the 'zfs key -K'.  This may be more than is necessary in some cases but it is the safest method.
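
If this sequence needs to be repeated for several datasets, the ordering can be captured in a small helper. The Python sketch below only builds the command lines shown above and prints them; the function name `clone_with_fresh_keys` is hypothetical and nothing is executed.

```python
def clone_with_fresh_keys(origin, snapshot, clone):
    """Return the zfs commands from the sequence above, in order.

    Each command is an argument list. Nothing is run here; the caller
    decides whether and how to execute them.
    """
    snap = f"{origin}@{snapshot}"
    return [
        ["zfs", "set", "readonly=on", origin],    # stop writes to the origin
        ["zfs", "snapshot", snap],
        ["zfs", "clone", "-K", snap, clone],      # new data key for the clone
        ["zfs", "set", "readonly=off", clone],
        ["zfs", "key", "-K", origin],             # new data key for the origin
        ["zfs", "set", "readonly=off", origin],   # allow writes again
    ]


commands = clone_with_fresh_keys("projects/glasgow", "1", "projects/mungo")
for argv in commands:
    print("$", " ".join(argv))
```

On a system with ZFS encryption each argument list could be handed to something like `subprocess.run(argv, check=True)`, stopping at the first failure so that the origin is never made writable again before its key has been changed.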

General documentation for ZFS support of encryption is in the Oracle Solaris ZFS Administration Guide in the Encrypting ZFS File Systems section.


Wednesday Dec 17, 2008

OpenSolaris "disk" encryption in snv_105

lofi(7D) encryption

The encryption part of the OpenSolaris lofi compression & encryption project integrated into snv_105. I initially started this as a proof of concept several years ago, but for a long time it was never a high enough priority. Casper Dik made a working version of it that was "distributed" internally for quite a few years as part of frkit. Now Dina has finished it off and got it integrated.

Finishing it off took much longer than we originally projected due to interactions with the compression code that was added to lofi and some very hard to track down bugs where lofi is used by xVM (the Xen based hypervisor) - particularly the interactions with dom0 and domU lofi use.

So what can you do with it ? It is similar to what has been available for many many years on Linux using the cryptoloop system. It isn't perfect but it is better than the nothing we had before.

Creating an encrypted UFS filesystem with lofi

   # mkfile 128m /export/lofi-backing-file
   # lofiadm -a /export/lofi-backing-file -c aes-256-cbc
   Enter passphrase: 
   Re-enter passphrase:   
   # newfs /dev/rlofi/1
   newfs: construct a new file system /dev/rlofi/1: (y/n)? y
   /dev/rlofi/1:   262036 sectors in 436 cylinders of 1 tracks, 601 sectors
        127.9MB in 28 cyl groups (16 c/g, 4.70MB/g, 2240 i/g)
   super-block backups (for fsck -F ufs -o b=#) at:
   32, 9648, 19264, 28880, 38496, 48112, 57728, 67344, 76960, 86576,
   173120, 182736, 192352, 201968, 211584, 221200, 230816, 240432, 250048, 259664
   # mount /dev/lofi/1 /mnt

Nice and simple. We can also store the key in a file, key generation can be done with pktool(1). Or we can store it in any PKCS#11 accessible keystore:

   # pktool genkey keystore=pkcs11 keytype=aes keylen=256 label=mylofikey
   Enter PIN for Sun Software PKCS#11 softtoken :
   # lofiadm -a /export/lofi-backing-file -c aes-256-cbc -T :::mylofikey 
   Enter PIN for Sun Software PKCS#11 softtoken : 

Issues with the lofi encryption

  • For lofi, compression and encryption are mutually exclusive; compressed lofi is read-only anyway. If you need both, wait for the integration of encryption support in ZFS.
  • No integrity check. Currently the lofi encryption uses CBC mode because we needed a non expanding cipher. Once the OpenSolaris crypto framework has support for XTS (or similar) mode we will likely update the lofi crypto to use that instead.
  • Lofi performance isn't great - this isn't a crypto issue, lofi performance in general just isn't great and adding crypto into the mix doesn't help much.
  • No way to detect the wrong key. We have a reserved area where we could add meta-data to determine if the correct key and algorithm params have been supplied but this hasn't been implemented yet.

I still think this is better than nothing even if we are delivering it much later than we had hoped. Ultimately ZFS encryption is the solution for OpenSolaris encrypted filesystems and volumes.



