Tuesday May 10, 2011

Encrypting /var/tmp & swap in Solaris 11 Express

As some readers might remember from previous posts, it isn't possible in Solaris 11 Express to boot from an encrypted ZFS dataset.  However it is possible to have encrypted swap space and thus (because /tmp is swap-backed tmpfs by default) an encrypted /tmp.  That still leaves /var/tmp unencrypted.

First let's look at swap space encryption.  That is as simple as putting the word "encrypted" into the mount options field of /etc/vfstab for the swap device.  If swap is a ZVOL then ZFS encryption is used; if swap is a raw disk slice or a file then an encrypted lofi device is interposed over the device or file, using a randomly generated key.  This is a fully supported solution in Solaris 11 Express, implemented by the swapadd command.
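For example, assuming the default rpool/swap ZVOL and the standard vfstab layout, the swap entry would look something like this (only the last field, the mount options, changes from the default "-"):

/dev/zvol/dsk/rpool/swap    -    -    swap    -    no    encrypted

The stock /tmp entry needs no change, because /tmp is swap-backed tmpfs and so is covered once swap itself is encrypted:

swap    -    /tmp    tmpfs    -    yes    -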

For encrypting /var/tmp we need to go beyond the provided services, and the following (unsupported) method takes its lead from what I did for swapadd and applies it to /var/tmp.  Note, however, that this assumes nothing in /var/tmp needs to be preserved across a reboot; the dataset won't even be readable from another boot environment.  So if you use this, don't put anything into /var/tmp that you want access to after a reboot.

This takes advantage of the fact that in SMF we can place dependencies on other services without modifying them.  So while the following makes some basic assumptions about the Solaris ZFS dataset layout, it doesn't require modifying any existing binaries or configuration files.

We create a new service, svc:/site/system/filesystem/tmp:default, which creates an encrypted dataset for /var/tmp using the manifest and method script that follow:


<?xml version='1.0'?>
<!DOCTYPE service_bundle SYSTEM '/usr/share/lib/xml/dtd/service_bundle.dtd.1'>
<!--

 CDDL HEADER START

 The contents of this file are subject to the terms of the
 Common Development and Distribution License (the "License").
 You may not use this file except in compliance with the License.

 You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 or http://www.opensolaris.org/os/licensing.
 See the License for the specific language governing permissions
 and limitations under the License.

 When distributing Covered Code, include this CDDL HEADER in each
 file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 If applicable, add the following below this CDDL HEADER, with the
 fields enclosed by brackets "[]" replaced with your own identifying
 information: Portions Copyright [yyyy] [name of copyright owner]

 CDDL HEADER END

 Copyright (c) 2011, Oracle and/or its affiliates. All rights reserved.

-->


<service_bundle type='manifest' name='darrenm:etmp'>

<service
        name='site/system/filesystem/tmp'
        type='service'
        version='1'>

        <create_default_instance enabled='true' />

        <single_instance />

        <dependency
                name='cryptosvc'
                grouping='require_all'
                type='service'
                restart_on='none'>
                <service_fmri value='svc:/system/cryptosvc' />
        </dependency>

        <dependent
                name='var-tmp'
                grouping='optional_all'
                restart_on='none'>
                <service_fmri value='svc:/system/filesystem/minimal' />
        </dependent>


        <exec_method
                type='method'
                name='start'
                exec='/lib/svc/method/site-etmp'
                timeout_seconds='30' />

        <exec_method
                type='method'
                name='stop'
                exec=':true'
                timeout_seconds='1' />

        <property_group name='startd' type='framework'>
                <propval name='duration' type='astring' value='transient' />
        </property_group>

</service>

</service_bundle>


#!/sbin/sh
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
# Copyright (c) 2011, Oracle and/or its affiliates. All rights reserved.
#
# /lib/svc/method/site-etmp

# Destroy any rpool/tmp left over from the previous boot; its key was never stored, so its contents are unrecoverable anyway.
zfs destroy rpool/tmp > /dev/null 2>&1
# Create a fresh encrypted dataset keyed with random data and give it the usual sticky, world-writable /var/tmp permissions.
zfs create -o mountpoint=/var/tmp -o encryption=on -o keysource=raw,file:///dev/random rpool/tmp
chmod 1777 /var/tmp

exit 0
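To put this in place, copy the method script to /lib/svc/method/site-etmp (the path the manifest refers to) and import the manifest with svccfg; the create_default_instance element means the service is enabled as soon as it is imported.  Something like the following should work - the file names etmp.xml and site-etmp are just the names I'm using here for the manifest and script above:

# cp site-etmp /lib/svc/method/site-etmp
# chmod 0555 /lib/svc/method/site-etmp
# svccfg import etmp.xml
# svcs site/system/filesystem/tmp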

On each reboot a new dataset for /var/tmp is created.  A more sophisticated method script could avoid the hardcoded dataset name of rpool/tmp, but this seems sufficient for now.  The result looks something like this:

darrenm-pc:pts/1$ cd /var/tmp
darrenm-pc:pts/1$ df -hl .
Filesystem             Size   Used  Available Capacity  Mounted on
rpool/tmp              113G    79K        77G     1%    /var/tmp
darrenm-pc:pts/1$ ls -ld .
drwxrwxrwt   5 root     root           5 May 10 14:39 ./
darrenm-pc:pts/1$ zfs get encryption,keysource,keystatus rpool/tmp
NAME       PROPERTY    VALUE                   SOURCE
rpool/tmp  encryption  on                      local
rpool/tmp  keysource   raw,file:///dev/random  local
rpool/tmp  keystatus   available               -

Saturday Nov 01, 2008

ZFS Crypto update

It has been a little while since I gave an update on the status of the ZFS Crypto project. A lot has happened recently and I've been really "heads down" writing code.

We had believed we were development complete and had even started code review, ready for integration. All our new encryption tests passed and I only had one or two small regression test issues to reconfirm/resolve. However, a set of changes (and very good and important ones for performance, I might add) was integrated into the onnv_100 build that caused the ZFS Crypto project some serious remerge work. It took me just over 2 weeks to get almost back to where we were. In doing that I discovered that the code I had written for the ZFS Intent Log (the ZIL) encryption/decryption wasn't going to work.

The ZIL encryption was "safe" from a crypto viewpoint because all the data written to the ZIL was encrypted. However, because of how the ZIL is claimed and replayed after a system failure (the only time the ZIL is actually read, since it is mostly write only), the claim happened before the pool had any encryption keys available to it. This resulted in the ZFS code thinking that none of the datasets had a ZIL needing to be claimed, so ZFS stayed consistent on disk, but anything in the ZIL that hadn't been committed in a transaction was lost - equivalent to running without a ZIL. Oops!

So I had to redesign how the ZIL encryption/decryption works. The ZIL is dealt with in two main stages: first there is the ZIL claim, which happens during pool import (before the pool starts doing any new transactions and long before userland is up and running). The second stage happens much, much later, when the filesystems (datasets really, because the ZIL applies to ZVOLs too) are mounted - this is good because we have crypto keys available by then.

This meant I needed to really learn how the ZIL gets written to disk. The ZIL has zero or more log blocks, with each log block holding one or more log records. There is a different type of log record (each of a different size) for the different types of sync operations that can come through the ZIL. Each log record has a common part at the start that says how big it is - this is good from a crypto view. So I leave the common part in the clear and encrypt the rest of the log record. Keeping the common part clear is needed so that the claim can "walk" all the log records in all the log blocks, sanity check the ZIL and do the claim IO. So far this is similar to what was done for the dnodes. Unfortunately it wasn't quite that simple (it never is really!).

All the log records other than the TX_WRITE type fell into the nice simple case described above. The TX_WRITE records are "funky" and, from a crypto viewpoint, a real pain in how they are structured. Like all the other log records there is a common section at the start that says what type it is and how big the log record is. What is different about the TX_WRITE log records is that they have a blkptr_t embedded in them. That blkptr_t needs to be in the clear because it may point off to where the data really is (especially if it is a "big" write). The blkptr_t is at the end of the log record. So no problem, just leave the common part at the start and the blkptr_t at the end in the clear, right? Well, that only deals with some of the cases. There is another TX_WRITE case where the log record has the actual write data tagged on after the blkptr inside the log record, and this is of variable size. Even in this case there is a blkptr_t embedded inside the log record; the problem is that the blkptr_t is now "inside" the area we want to encrypt. So I ended up having to have a clear text log common area, encrypted TX_WRITE content, a clear text blkptr_t and maybe encrypted data. It is a good job the OpenSolaris Cryptographic Framework supports passing in a uio_t for scatter/gather!

So with all that done, it worked, right? Sadly no, there was still more to do. It turned out I still had two other special ZIL cases to resolve - not with the log records though. Remember I said there are multiple log records in a log block? Well, at the end of the log block there is a special trailer record that says how big the sum of all the log records is (a good sanity check!), but it also has an embedded checksum for the log block. This is quite unlike how checksums are normally done in ZFS: normally the checksum is in the blkptr_t, not with the data. The reason it is done this way for the ZIL is write performance, and the risk is acceptable because the ZIL is very rarely read, and only then in a recovery situation. For ZFS Crypto we not only encrypt the data but we also have a cryptographically strong keyed authentication tag (the AES CCM MAC) that is stored in 16 bytes of the blkptr_t checksum field. The problem with the ZIL log blocks is that we don't have a blkptr_t we can put that 16 byte MAC into, because the checksum for the log block is inside the log block in the trailer record; the blkptr checksum for the ZIL is used for record sequencing. So were there any reserved or padding fields? Luckily yes, but only 8 bytes were available, not 16. That is enough though, since I could just change the parameters for CCM mode to output an 8 byte MAC instead of a 16 byte one. A little less security but still plenty sufficient - especially since we expect to never need to read the ZIL back - and it is still cryptographically sound and strong enough for the size of the data blocks we are encrypting (less than 4k at a time). With that resolved (this took a lot of time in DTrace and MDB to work out!) I could decrypt the ZIL records after a (simulated) failure.

Still one last problem to go though: the records weren't decrypting properly. I verified, using DTrace and kmdb, that the data was correct and the CCM MAC was correct. So what was wrong; why wouldn't they decrypt? That only left the key and the nonce. Verifying the key was easy, and it was correct. So what was wrong with the nonce?

We don't actually store the nonce on disk for ZFS Crypto; instead we calculate it from other things that are stored on disk. The nonce for a normal block is made up from the block birth transaction (txg: a monotonically increasing unsigned 64 bit integer), the object, the level, and the blkid. Those are manipulated (via a truncated SHA256 hash) into 12 bytes and used as the nonce. For a ZIL write the txg is always 0 because it isn't being written in a txg (that is the whole point of it!), but the blkid is actually the ZIL record sequence number, which has the same properties as the txg. The problem is that when we replay the ZIL (not when we claim it though) we do have a txg number. This meant we had a different nonce on decrypt than on encrypt. The solution? Remove the txg from the nonce for ZIL records - no big loss, since on a ZIL write it is 0 anyway and the blkid (the ZIL sequence number) has the security properties we need to keep AES CCM safe.

With all that done I managed to get the ZIL test suite to pass. I have a couple of minor follow-on issues to resolve so that zdb doesn't get its knickers in a twist (SEGV really) when it tries to display encrypted log blocks (which it can't decrypt since it is running in userland and without the keys available).


That turned out to be more than I expected to write up. So where are we with the schedule? I expect us to start code review again shortly. I've just completed the resync to build 102.

Friday Sep 05, 2008

ZFS Crypto Codereview starts today

Preliminary code review for the OpenSolaris ZFS Crypto project starts today (Friday 5th September 2008 at 1200 US/Pacific) and is scheduled to end on Friday 3rd October 2008 at 2359 US/Pacific. Comments received after this time will still be considered, but unless they are serious in nature (data corruption, security issue, regression from existing ZFS) they may have to wait until after onnv-gate integration to be addressed; however, every comment will be looked at and assessed on its own merit.

For the rest of the pointers to the review materials and how to send comments, see the project code review page.
