Wednesday Oct 25, 2006

53394 snapshots

After almost two months of running ZFS at home the configuration is stabilizing. I'm still surprised by the number of file systems and even more surprised by the number of snapshots:

: pearson TS 1 $; zfs list -t filesystem | wc -l
: pearson TS 2 $; zfs list -t snapshot | wc -l
: pearson TS 3 $; df -h /tank
Filesystem             size   used  avail capacity  Mounted on
tank                   272G    33K   194G     1%    /tank
: pearson TS 4 $;

Approximately 1334 snapshots per file system. I've used the snapshots 3 times to recover various things I have cocked up (I'm refraining from using the F word after the storm in a teacup Tim's posting caused, even if no one would notice). However, I sleep better knowing that my family's data is safe from their user error. Only I can mess them up!


Saturday Oct 21, 2006

Shared samba directories

The samba set up on the new server for users has been flawless, but the shared directories slightly less so. I had a problem where if one of the family created a directory then the rest of the family could not add to that directory. Looking on the Solaris side the problem was clear: the directory was created with mode 755. Typing this I realize just how bad that is. 755 could not possibly mean anything to anyone who was not up to their armpits in UNIX computing, and the explanation would fill pages and indeed it does.

The permissions I want to force for directories are "read, write and execute" for the group as well as the owner, i.e. mode 775. It would also be nice if I could stop one user deleting another user's work, so setting the sticky bit would also be good, giving mode 1775.
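For anyone who is not up to their armpits in UNIX, here is a minimal sketch that turns an octal mode into the permission string that ls -l would show. The function name mode_to_symbolic is made up for illustration; it is not part of Samba or Solaris, and it only understands the sticky bit, not setuid or setgid.

```shell
#!/bin/sh
# Hypothetical helper: print the ls-style permission string for an
# octal mode such as 755 or 1775. Only the sticky bit is handled;
# setuid and setgid are ignored in this sketch.
mode_to_symbolic() {
    mode=$1
    # Pad to four digits so the special-bits digit is always present.
    while [ ${#mode} -lt 4 ]; do
        mode="0$mode"
    done
    special=${mode%???}    # first digit: special bits
    rest=${mode#?}         # last three digits: owner, group, other
    out=""
    for digit in $(echo "$rest" | sed 's/./& /g'); do
        case $digit in
            0) bits="---" ;;
            1) bits="--x" ;;
            2) bits="-w-" ;;
            3) bits="-wx" ;;
            4) bits="r--" ;;
            5) bits="r-x" ;;
            6) bits="rw-" ;;
            7) bits="rwx" ;;
            *) echo "not an octal digit: $digit" >&2; return 1 ;;
        esac
        out="$out$bits"
    done
    # The sticky bit (octal 1000) is shown in place of the final
    # execute flag: t if "other" may execute, T if not.
    case $special in
        1|3|5|7)
            case $out in
                *x) out="${out%x}t" ;;
                *)  out="${out%-}T" ;;
            esac
            ;;
    esac
    echo "$out"
}
```

Calling mode_to_symbolic 755 prints rwxr-xr-x, 775 prints rwxrwxr-x and 1775 prints rwxrwxr-t. The group write permission is what lets the rest of the family add to a directory, and the trailing t is the sticky bit that stops them removing each other's files.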

Trundling through the smb.conf manual page tells me that there is an option, "force directory mode" that does exactly what it implies and what I want. I'm sure I could achieve the same with an ACL and will do that later so that SMB and NFS give the same results. However for now smb.conf serves this purpose.

So the new entry in the smb.conf for the shared area where we keep pictures looks like this:

   comment = Pictures
   path = /tank/shared/pics
   public = yes
   writable = yes
   printable = no
   write list = @staff
   force directory mode = 1775
   force create mode = 0444
   root preexec = ksh -c '/usr/sbin/zfs snapshot tank/shared/pics@smb$(/tank/local/smbdate)'

Now everyone can add to the file system but can't delete others' photos, plus I get a snapshot every time someone starts to access the file system.
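The /tank/local/smbdate helper used in the root preexec line is not shown here. As a rough sketch only, assuming it prints a timestamp in the same style as snapshot names such as smb2006-09-05-12:03, it could be as small as the smbdate function below. Both function names are stand-ins I have made up for illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for /tank/local/smbdate: print the current
# time to the minute, e.g. 2006-10-21-18:27 (format is an assumption).
smbdate() {
    date '+%Y-%m-%d-%H:%M'
}

# Build the full snapshot name from a dataset and a class prefix.
snapshot_name() {
    echo "$1@$2$(smbdate)"
}
```

With this, snapshot_name tank/shared/pics smb gives a name of the form tank/shared/pics@smb followed by the timestamp. One consequence of only going down to the minute is that two connections in the same minute would ask for the same snapshot name, and zfs will not create a snapshot that already exists, so the second attempt would simply fail.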


Friday Oct 13, 2006

Build 50@home & NAT

Build 50 of nevada hit my home server today with little fuss thanks to live upgrade. So far no unpleasant surprises, although I had to lose the zone for the web server as live upgrade in nevada, unlike live upgrade in 10, can't handle zones yet. I will however still use zones as testing grounds.

The system has been live now for a few weeks, doing NAT, firewall, email (imaps and SMTP) via exim with SpamAssassin and clamd for antivirus, Samba providing Windows server support, ntp, DNS and DHCP. I have fallen some way behind in the documentation of it though.

Getting NAT (Network Address Translation, for any non-geeks still here) working was a breeze. I simply followed the instructions on Ford's blog, substituting my network device (rtls0) in the right places and stopping before any of the zones stuff since I don't need it.

My /etc/ipf/ipnat.conf has ended up looking like this:

: pearson TS 15 $; cat /etc/ipf/ipnat.conf
map rtls0 -> 0/32 proxy port ftp ftp/tcp
map rtls0 -> 0/32 portmap tcp/udp auto
map rtls0 -> 0/32
: pearson TS 16 $;

and smf starts it without fault.


Monday Sep 25, 2006

Has ZFS just saved my data?

My new home server has had its first ZFS checksum error. The problem here is that zfs has not told me what that error was, so it is impossible for me to say how bad it is or, heaven forbid, whether it could be a false positive.

It leaves lots of questions in my mind about what ZFS does, if anything, to identify the kind of problem and narrow down where the fault is. I need to do some reading of the zfs source.

# zpool status
  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
 scrub: scrub in progress, 0.01% done, 20h7m to go

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1d0s7  ONLINE       0     0     1
            c5d0s7  ONLINE       0     0     0

errors: No known data errors

One thing I did straight away was to scrub the pool. However the scrub never completed; it just exercised the disks all weekend. Checking the OpenSolaris ZFS discussion forum showed that I was hitting this bug:

6343667 need itinerary so interrupted scrub/resilver doesn't have to start over

The scrub gets restarted whenever a snapshot is taken. Not so good if you snapshot every 10 minutes.


Wednesday Sep 13, 2006

exim and pam authentication meets privileges

For reasons that I will go into later the new home server is using exim for its mail transport rather than the standard sendmail. I wanted to be able to authenticate users sending email using their login and password from the local password and shadow files. This is a snap with exim, with the following in the exim.conf file:

driver = plaintext
public_name = PLAIN
server_condition = "${if pam{$2:$3}{1}{0}}"
server_set_id = $2

driver = plaintext
public_name = LOGIN
server_prompts = "Username:: : Password::"
server_condition = "${if pam{$1:$2}{1}{0}}"
server_set_id = $1

or so I thought. Since exim is security conscious it runs as its own user and not as root, so it is unable to read the /etc/shadow file. No matter what you enter when you try to log in, you can't. My quick solution to this was to give the exim daemon permission to read all files using privileges. So the start script now does:

ppriv -s PI+file_dac_read -e $DAEMON $EXIM_PARAMS

Which allows it to read any file on the system which is a risk but not as great a risk as having it run as root. I look forward to someone telling me a better way.


Thursday Sep 07, 2006

ddclient meets SMF

Nice quick bit of progress on the home server today and one more service off the Qube.

Downloaded ddclient and copied over the configuration from the Qube. Then I just had to write an SMF manifest:

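<?xml version="1.0"?>

<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">

<service_bundle type='manifest' name='ddclient'>

<service name='network/ddclient' type='service' version='…'>

        <create_default_instance enabled='true' />

        <dependency name='…' grouping='…' restart_on='…' type='service'>
                <service_fmri value='svc:/network/initial:default' />
        </dependency>

        <exec_method type='method' name='start'
                exec='…'
                timeout_seconds='0' />

        <exec_method type='method' name='stop'
                exec=':kill -15'
                timeout_seconds='3' />

        <stability value='Unstable'/>

        <template>
                <common_name>
                        <loctext xml:lang='C'>Dynamic DNS client</loctext>
                </common_name>
        </template>

</service>

</service_bundle>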

Then the usual 'svccfg import ddclient.xml' and then we are going:

# svcs -x ddclient
svc:/network/ddclient:default (Dynamic DNS client)
 State: online since Thu Sep 07 22:36:47 2006
   See: /var/svc/log/network-ddclient:default.log
Impact: None.


How many snapshots?

With a non-laptop system that is on all the time and running zfs with automatic snapshots, the snapshots build up at an alarming rate.

# for i in $(zfs list -Ho name -t snapshot )
do
        zfs get -pH used $i
done | nawk '{ count++;
        if ($3 > 0 ) {
                nonzero++
        }
        total += $3
}
END {
        print count, nonzero, total/(1024^2)
}'
7071 188 83.939

So after one week I have 7071 snapshots, of which only 188 currently contain data, taking just 84 megabytes between them, while the whole pool uses 42.8G.
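The counting itself can be tried without a pool to hand. This is a minimal sketch, assuming input lines in the tab-separated form that zfs get -pH used prints (name, property, value in bytes, source). The function name summarise_used and the sample data in the usage note are invented for illustration.

```shell
#!/bin/sh
# Read 'zfs get -pH used' style lines on stdin and print:
#   number of snapshots, number using any space, total space in MiB.
summarise_used() {
    awk '{
            count++                  # every line is one snapshot
            if ($3 > 0) nonzero++    # third field is the space used in bytes
            total += $3
         }
         END {
            printf "%d %d %.3f\n", count, nonzero, total / (1024 * 1024)
         }'
}
```

Feeding it three made-up snapshots, one empty, one holding 1048576 bytes and one holding 524288 bytes, prints "3 2 1.500": three snapshots, two of them holding data, 1.5 MiB in total.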

No downsides have been seen so far so while the numbers appear alarming I see no reason not to continue.


Tuesday Sep 05, 2006

Cleaning up zfs snapshots

Thank you to the anonymous comments about samba and ZFS and the clean up script.

A day's worth of samba snapshots looks like this:

tank/users/cjg@smb1157437900  37.5K      -  21.1G  -
tank/users/cjg@smb1157441840      0      -  21.1G  -
tank/users/cjg@smb1157441861      0      -  21.1G  -
tank/users/cjg@smb1157000000  40.5K      -  21.1G  -
tank/users/cjg@smb1157445557  40.5K      -  21.1G  -
tank/users/cjg@smb2006-09-05-12:03  40.5K      -  21.1G  -
tank/users/cjg@smb2006-09-05-18:27      0      -  21.1G  -
tank/users/keg@smb2006-09-05-18:29      0      -   465M  -
tank/users/rlg@smb1157441373      0      -   673M  -
tank/users/rlg@smb1157446766      0      -   675M  -
tank/users/rlg@smb1157449795    21K      -   675M  -
tank/users/rlg@smb2006-09-05-17:14      0      -   675M  -
tank/users/rlg@smb2006-09-05-17:54      0      -   675M  -
tank/users/rlg@smb2006-09-05-18:07      0      -   675M  -
tank/users/stc@smb1157437923      0      -   294M  -
tank/users/stc@smb1157446971      0      -   294M  -
tank/users/stc@smb2006-09-05-15:34      0      -   294M  -
tank/users/stc@smb2006-09-05-17:47      0      -   294M  -
tank/users/stc@smb2006-09-05-20:27      0      -   294M  -

from which you can see I experimented with naming them with the seconds from the epoch to make the clean up script simpler. However after a few minutes I realized there was a better way.

I now have a clean up script that uses the zfs file system creation time to do all the sorting. Getting this to work quickly requires a script to convert the time stamp into seconds from the epoch:

puts [clock scan $argv ]

Call the script “convert2secs” and then the rest of the script is simple:

#!/bin/ksh -p
#       Quick script to clean up the snapshots created by each samba login.
#       See:
#       It is clear that this could be much more generic. Especially if you
#       could add a property to the snapshot to say when it should be deleted.
ALL_TYPES="smb minute hour day month boot"





NUMBER_OF_SNAPSHOTS_hour=$((7 * 24 * 2))
DAYS_TO_KEEP_hour=$((7 * 24))


today=$(convert2secs $(date))

function do_fs
{
        typeset fs
        typeset -i count=0
        typeset -i seconds2keep
        typeset -i time2go
        typeset -i number_of_snapshots
        typeset type=$2
        # days2keep and number_of_snapshots should come from
        # file system properties. Until then they are fed from the
        # global entities.
        days2keep=$(eval echo \${DAYS_TO_KEEP_${type}})
        number_of_snapshots=$(eval echo \${NUMBER_OF_SNAPSHOTS_${type}})

        seconds2keep=$(( days2keep * 24 * 60 * 60 ))
        time2go=$((today - seconds2keep))

        for fs in $(zfs list -r -t snapshot -o name $1 | grep $type | sort -r -t @ -k 1)
        do
                snap_time=$(convert2secs $(/usr/sbin/zfs list -H -o creation ${fs}))

                # Only destroy a snapshot that is both beyond the minimum
                # count and older than the cut-off time.
                if (( count > number_of_snapshots )) && \
                        (( snap_time < time2go ))
                then
                        zfs destroy $fs
                else
                        : echo $fs is kept for $((snap_time - time2go)) seconds
                fi
                let count=count+1
        done
}

for type in ${ALL_TYPES}
do
        for i in $(zfs list -H -t snapshot -r $@ | sort | nawk -F '@' '/'$type'/ { print $1 }' | uniq)
        do
                do_fs $i $type
        done
done

When zfs has user defined options all the configuration can be kept in the file system but until then the configuration variables will do.

The script allows me to have different classes of snapshot: smb, minute, hour, day, month and boot. This allows the same script to clean up both the snapshots taken by samba and the ones taken via cron and boot.

The script errs on the side of not destroying snapshots, so for each class I'm keeping all snapshots less than a certain number of days old and also keeping a minimum number of snapshots.
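The keep-or-destroy rule can be pulled out and tried on its own. This is a minimal sketch of the same two-part test, with a made-up function name (may_destroy) and plain numbers in place of real snapshot creation times.

```shell
#!/bin/sh
# Succeed (exit status 0) only when a snapshot may be destroyed.
# Arguments:
#   $1  position of the snapshot in the list (0 is the first examined)
#   $2  minimum number of snapshots to keep for this class
#   $3  snapshot creation time, in seconds since the epoch
#   $4  cut-off time, in seconds since the epoch
may_destroy() {
    position=$1 minimum=$2 created=$3 cutoff=$4
    # Both conditions must hold: beyond the minimum count AND too old.
    [ "$position" -gt "$minimum" ] && [ "$created" -lt "$cutoff" ]
}
```

For example, with a minimum of 336 snapshots, may_destroy 400 336 1000 2000 succeeds, while may_destroy 10 336 1000 2000 fails because the snapshot is still within the minimum number, and may_destroy 400 336 3000 2000 fails because it is newer than the cut-off. Because the count starts at zero and the comparison is a strict "greater than", one more snapshot than the stated minimum survives, which fits with erring on the side of not destroying.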


Minimum number of snapshots

Number of days to keep snapshots

28 * 2

7 * 24 * 2

7 * 24

60 * 24


The advantage is that I can now both keep the snapshots longer and also give them more user friendly names. The new snapshot cron job script is here. I'm sure the number of snapshots generated is overkill but while I have the disk space why not?

Now if I can stop smb mangling the names all would be perfect.


Friday Sep 01, 2006

Home server progress

Progress on the new home server:

All the user accounts have been created, each with its own ZFS file system for the home directory.

I've installed my ZFS snapshot script and have crontab entries like this:

10,20,30,40,50 * * * * /tank/local/snapshot minute
0 * * * * /tank/local/snapshot hour
1 1 * * * /tank/local/snapshot day
2 1 1 * * /tank/local/snapshot month

I think there is some scope for improvement here which would mean keeping the snapshots for longer. When the proposal to have user defined properties becomes real I will define the snapshot rules in the file systems and have the script use those rather than a one size fits all.

I have samba running from SMF thanks to Trevor's manifest and script. I did the changes to the script suggested in the comments. This all worked perfectly and now the PC flies and that is before I install gigabit Ethernet in it. Already you can see the snapshot directories under .zfs in each of the samba file systems on the PC which is just about as cool as it can get.

Finally I have Solaris serving DHCP and have turned off the server on the Qube. Most uncharacteristically I used the GUI tool to configure DHCP and, apart from having to create another ZFS file system to put the data in, the GUI did it all. Very slick. Plus by some magic it managed to hand out the same IP addresses to each computer as the Qube used to. I suspect I should have done DNS before DHCP, since the DHCP server can update the DNS records, so this may have to be done again.


Thursday Aug 31, 2006

New home server arrived

The new server arrived and like a good geek I stayed up late last night putting it together and loading the Solaris Operating System on it. So far I've not got that far. A base install with mirrored root file system, plus a second boot environment for live upgrade and the rest of the disk(s) are there for real data on ZFS.

Laying out the disks was harder than it should have been due to me wanting to put all the non ZFS bits at the end of the disk not the beginning so that when we have a complete ZFS on root solution I can delete the two root mirrors and allow ZFS to grow over the whole drive. So on the disks the vtoc looks like this:

Part      Tag    Flag     Cylinders         Size            Blocks
  0 unassigned    wm   37368 - 38642        9.77GB    (1275/0/0)   20482875
  1 unassigned    wu       0                0         (0/0/0)             0
  2     backup    wm       0 - 38909      298.07GB    (38910/0/0) 625089150
  3 unassigned    wu       0                0         (0/0/0)             0
  4       root    wu   36093 - 37367        9.77GB    (1275/0/0)   20482875
  5 unassigned    wu   38643 - 38645       23.53MB    (3/0/0)         48195
  6 unassigned    wu   38646 - 38909        2.02GB    (264/0/0)     4241160
  7       home    wm       3 - 36092      276.46GB    (36090/0/0) 579785850
  8       boot    wu       0 -     0        7.84MB    (1/0/0)         16065
  9 alternates    wu       1 -     2       15.69MB    (2/0/0)         32130

Which looks even worse than it really is. On the disk starting at the lowest LBA I have:

  1. The boot blocks on slice 8

  2. Then alternates on slice 9 (format just gives you these for “free”)

  3. The Zpool on slice 7

  4. The second boot environment on slice 4

  5. The first boot environment on slice 0

  6. The metadbs on slice 5

  7. A dump device on slice 6

All the partitions, except the dump device, are mirrored onto the other disk so both drives have the same vtoc. As you can see I can grow the zpool over both boot environments and the metadbs when ZFS on root is completely here.

The next thing to do will be SAMBA.


Saturday Aug 26, 2006

New home server ordered

Finally a replacement for my Qube has been ordered. Based around this bare bones system with 2G of RAM and a pair of 300G disks.

Since one of the Qubes successfully lost all of its data due to ext2 file system mayhem this week (thankfully I have backups), I can't wait to be using better file systems and mirroring software. That Qube is now not even booting, so rather than mess with the recovery CD it is time to move into the 21st century.

The goals for the system are to:

  1. Provide Email Service

  2. Act as a Windows File server

  3. DHCP server

  4. DNS server

  5. Small Web service

  6. Single user Sun Ray server

Moving email and data over will be quick as I am naturally nervous about the data on the Qube. The other services will be staged more slowly.


Thursday May 25, 2006

Living with ZFS

A few things are coming together as I live with ZFS on my laptops and the external USB drive. The first and most surprising is that I don't like interacting with the disks on the Qube 3 anymore. No snapshot to roll back to if I make a mistake? I honestly was not expecting that I would get so used to the safety of snapping everything all the time.

The external drive, after much investigation of some issues, is working flawlessly. It claims that its write cache is off and behaves as such. As I have mentioned before it is partitioned with an EFI label into 3 partitions. Partitions 0 and 1 act as mirrors for my two laptops; I online them when I can and let them resilver. The third partition is a zpool containing file systems that hold backups of everything else, including the data on the Qube 3.

Things I have learnt about using ZFS:

  1. Don't put your file systems in the “root” of a pool. Administratively it makes life easier if you group them. So all my users' home directories are in “home/users” and not directly in “home”. It allows you to set properties that are inherited on “home/users” and have them affect all the users' home directories. This is not dissimilar to the old advice of never sharing the root of a file system via NFS. If you did, it severely limited your options to change things in the future. The NFS advice is of course no longer valid with ZFS since there is no longer the one to one connection between file systems and volumes.

  2. If you are taking lots of snapshots make sure you either have lots of disk space or have a way to clear the old snapshots. My snapping every minute does this, whereas the snapping on boot did not.

  3. If you even suspect that your pool may be imported on another system use a unique name. I'm not even sure what happens if you have two pools with the same name but just in case it is bad make your pools unique.

Things I would like:

  1. Ability to delegate creation of file systems and snapshots to users. RBAC lets me do this but leaves some nasties that require scripts. One is that the user can not create any files in the file system without running chmod or chown on it. Essentially the same as this request on the OpenSolaris ZFS forum.

  2. Booting off ZFS and live upgrade to be fully integrated. I know it is coming, but once you have your root file system on ZFS it seems a great shame to have to keep two UFS boot environments around for upgrade purposes. Especially as they are not even compressed (another thing I have just got used to).



This is the old blog of Chris Gerhard. It has mostly moved to

