Friday Feb 13, 2009

I Want My ZFS

If you want to try out ZFS, you'll be glad to learn that when you install OpenSolaris 2008.11, the user you create during installation gets a home directory on its own ZFS filesystem. Running zfs list will show you that /export, /export/home, and /export/home/<username> are each separate ZFS filesystems.

haik@opensolaris:~$ zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
rpool                   2.41G  1.50G    72K  /rpool
rpool/ROOT              2.38G  1.50G    18K  legacy
rpool/ROOT/opensolaris  2.38G  1.50G  2.26G  /
rpool/export            21.2M  1.50G    19K  /export
rpool/export/home       21.2M  1.50G    19K  /export/home
rpool/export/home/haik  21.1M  1.50G  19.2M  /export/home/haik

If you enable the Time Slider service, a cron job will run periodically and create snapshots of these filesystems, including your home directory. To see these snapshots, run zfs list -t snapshot. I've limited the output to the name column here:

haik@opensolaris:~$ zfs list -t snapshot -o name
NAME
rpool/ROOT/opensolaris@install
rpool/export/home/haik@zfs-auto-snap:daily-2009-01-14-00:00
rpool/export/home/haik@zfs-auto-snap:hourly-2009-01-14-00:00
rpool/export/home/haik@zfs-auto-snap:hourly-2009-01-14-01:00
rpool/export/home/haik@zfs-auto-snap:hourly-2009-01-14-02:00
rpool/export/home/haik@zfs-auto-snap:hourly-2009-01-14-03:00
rpool/export/home/haik@zfs-auto-snap:hourly-2009-01-14-04:00
rpool/export/home/haik@zfs-auto-snap:hourly-2009-01-14-05:00
rpool/export/home/haik@zfs-auto-snap:hourly-2009-01-14-06:00
rpool/export/home/haik@zfs-auto-snap:hourly-2009-01-14-07:00
rpool/export/home/haik@zfs-auto-snap:hourly-2009-01-14-08:00
rpool/export/home/haik@zfs-auto-snap:hourly-2009-01-14-09:00
rpool/export/home/haik@zfs-auto-snap:hourly-2009-01-14-10:00
rpool/export/home/haik@zfs-auto-snap:hourly-2009-01-14-11:00
rpool/export/home/haik@zfs-auto-snap:hourly-2009-01-14-12:00
rpool/export/home/haik@zfs-auto-snap:hourly-2009-01-14-13:00
rpool/export/home/haik@zfs-auto-snap:hourly-2009-01-14-14:00
rpool/export/home/haik@zfs-auto-snap:hourly-2009-01-14-15:00
rpool/export/home/haik@zfs-auto-snap:hourly-2009-01-14-16:00
rpool/export/home/haik@zfs-auto-snap:hourly-2009-01-14-17:00
rpool/export/home/haik@zfs-auto-snap:frequent-2009-01-14-17:00
rpool/export/home/haik@zfs-auto-snap:frequent-2009-01-14-17:15
rpool/export/home/haik@zfs-auto-snap:frequent-2009-01-14-17:30
rpool/export/home/haik@zfs-auto-snap:frequent-2009-01-14-17:45
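
As far as I can tell, these automatic snapshots are driven by a set of auto-snapshot SMF service instances, one per schedule (frequent, hourly, daily, weekly, monthly), which Time Slider enables and which in turn set up the cron job mentioned above. A quick way to confirm they're turned on:

haik@opensolaris:~$ svcs -a | grep auto-snapshot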

You can browse these snapshots in Nautilus (the GNOME graphical file browser) by navigating to your home directory and clicking the Time Slider button.
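
If you prefer the command line to Nautilus, the snapshots are also reachable through the hidden .zfs directory at the root of each filesystem. For example, to grab a file out of the daily snapshot of my home directory, I could do something like this (some-file is just a placeholder):

haik@opensolaris:~$ ls ~/.zfs/snapshot
haik@opensolaris:~$ cp ~/.zfs/snapshot/zfs-auto-snap:daily-2009-01-14-00:00/some-file ~/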


But what is it good for?

I've been using ZFS to back up my laptop home directory. I've created a ZFS filesystem "backup-powerbook" in my home directory on my OpenSolaris workstation. I use rsync over ssh to copy my laptop home directory to my OpenSolaris machine, then take a snapshot when the rsync completes. For example, on the laptop:

haik@powerbook $ sudo /bin/bash
Password:
root@powerbook # pwd
/Users/haik
root@powerbook # cd ..
root@powerbook # rsync -avz --delete -e ssh \
    haik haik@opensolaris.local:~/backup-powerbook/

And then on the workstation:

haik@opensolaris:~$ pfexec zfs snapshot \
    rpool/export/home/haik/backup-powerbook@2009-02-12
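
To avoid typing those two steps by hand, the whole thing can be wrapped up in a small script run from the laptop. This is only a rough sketch of what I have in mind; the host name and paths come from the examples above, and the zfs snapshot over ssh assumes my user has been delegated the snapshot permission as described below (otherwise it would need pfexec):

#!/bin/bash
#
# Rough sketch: back up my laptop home directory to the OpenSolaris
# box and snapshot the backup filesystem with today's date.
# Run as root on the laptop so every file is readable.

DEST_HOST=opensolaris.local
DEST_FS=rpool/export/home/haik/backup-powerbook
TODAY=$(date +%Y-%m-%d)

# Copy the home directory; bail out if the rsync fails so we
# don't snapshot a half-finished backup.
rsync -avz --delete -e ssh /Users/haik \
    haik@${DEST_HOST}:~/backup-powerbook/ || exit 1

# Take a snapshot named after today's date on the workstation.
ssh haik@${DEST_HOST} zfs snapshot ${DEST_FS}@${TODAY}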

You need administrator privileges? That ain't cool.

Note how I prefixed that zfs snapshot command with pfexec. Back in the old days of ZFS, all ZFS operations required root privileges. Since then, ZFS Delegated Administration was introduced. This feature allows the administrator to delegate ZFS privileges to certain users for certain filesystems. As I mentioned, by default OpenSolaris 2008.11 puts the initial user's home directory on a ZFS filesystem, but in order to take snapshots or create filesystems, the user will need to either a) su to the root role, b) use the administrator profile by executing pfexec, or c) have been given the necessary ZFS permissions. On a fresh 2008.11 install, only the initial user can su to the root role or use the administrator profile.

For example,

haik@opensolaris:~$ pfexec zfs create rpool/export/home/haik/backup-powerbook
haik@opensolaris:~$ 

Without pfexec, the create fails:

haik@opensolaris:~$ zfs create rpool/export/home/haik/backup-powerbook
cannot create 'rpool/export/home/haik/backup-powerbook': permission denied
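
In case you're wondering why pfexec works for my user at all: if I remember right, the 2008.11 installer gives the initial user the Primary Administrator rights profile, which is what pfexec picks up. You can check which profiles a user has with the profiles(1) command:

haik@opensolaris:~$ profiles haik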

I like the idea of letting users manage their own filesystems within their home directory, so I've used ZFS Delegated Administration to allow that. This is done with the zfs allow command. You can grant ZFS permissions to individual users with one command, or you can create permission sets, which let you easily grant the same permissions to different users on different filesystems. In my case, I create two permission sets: one for home directories and one for filesystems that are descendents of home directories. I use two different sets because I do not want users to be allowed to rename their home directory, only descendents of their home directory. Here's a script I use after adding a new user. Check out zfs(1M) for more information.

This script simply looks at the contents of the /export/home directory. For each file or directory within /export/home, it assumes that a user exists with the same username and that a filesystem exists of the form rpool/export/home/<username>. It then gives <username> various permissions on the filesystem rpool/export/home/<username>. There are better, more reliable ways to do this. You would need to execute the script with pfexec.

#!/usr/bin/bash

HOME_FS=rpool/export/home
HOME_PATH=/export/home

HOME_PERMS=create,snapshot,clone,mount,share,send,compression,promote,destroy
HOME_DESCENDENT_PERMS=$HOME_PERMS,rename

HOME_SET_NAME='@home_set'
HOME_DESCENDENT_SET_NAME='@home_descendent_set'

# Create the permission sets
zfs allow -s $HOME_SET_NAME $HOME_PERMS $HOME_FS
zfs allow -s $HOME_DESCENDENT_SET_NAME $HOME_DESCENDENT_PERMS $HOME_FS

# Assign the permission sets
for user in `ls $HOME_PATH`
do
	zfs allow -l $user $HOME_SET_NAME $HOME_FS/$user
	zfs allow -d $user $HOME_DESCENDENT_SET_NAME $HOME_FS/$user
done
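
As mentioned above, the script itself needs the privileges it is handing out, so run it via pfexec (the file name here is just whatever you saved it as):

haik@opensolaris:~$ pfexec ./delegate-home-perms.sh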

You can view the results with the zfs allow command:

haik@opensolaris:~$ zfs allow rpool/export/home/haik
-------------------------------------------------------------
Local permissions on (rpool/export/home/haik)
	user haik @home_set
Descendent permissions on (rpool/export/home/haik)
	user haik @home_descendent_set
-------------------------------------------------------------
Permission sets on (rpool/export/home)
	@home_descendent_set allow,clone,create,destroy,
                mount,promote,rename,send,share,snapshot
	@home_set allow,clone,create,destroy,mount,promote,
                send,share,snapshot
-------------------------------------------------------------

Without the script and without using permission sets, I could accomplish the same thing with the following commands for each user.

haik@opensolaris:~$ pfexec zfs allow -l haik \
    create,snapshot,clone,mount,share,send,compression,promote,destroy \
    rpool/export/home/haik
haik@opensolaris:~$ pfexec zfs allow -d haik \
    create,snapshot,clone,mount,share,send,compression,promote,destroy,rename \
    rpool/export/home/haik

Now that my user has these permissions, I can create my own filesystems:

haik@opensolaris:~$ zfs create rpool/export/home/haik/music
haik@opensolaris:~$ zfs create rpool/export/home/haik/email
haik@opensolaris:~$ zfs set compression=on rpool/export/home/haik/email
haik@opensolaris:~$ zfs list rpool/export/home/haik/email
NAME                           USED  AVAIL  REFER  MOUNTPOINT
rpool/export/home/haik/email    18K  1.53G    18K  /export/home/haik/email
haik@opensolaris:~$ zfs snapshot -r rpool/export/home/haik@2008-02-10

As expected, I'm still prevented from creating a filesystem directly under rpool/export/home.

haik@opensolaris:~$ zfs create rpool/export/home/foo
cannot create 'rpool/export/home/foo': permission denied

Wednesday Feb 11, 2009

LDoms 1.1 Released, Includes Migration Support and More

Late in December, our LDoms project finally reached version 1.1. A lot of work has gone into this release, and we've added some big new features, including Domain Migration, Virtual I/O Dynamic Reconfiguration, performance improvements, and more. I'm not going to go into detail here because Alex has already blogged about it with an excellent writeup.

LDoms Community Page
LDoms Discussion Board

Thursday Jan 03, 2008

New Year, New Post

Since my last post, in 2005, opensolaris.org has become quite a busy site. Check out the huge list of projects and communities. And, during this time, I've changed roles at Sun twice. Back in 2005, I switched to the sun4v kernel team. Then, more recently, I was moved over to the Logical Domains group (aka LDoms) in a reorganization. LDoms? LDoms is hardware virtualization for our recent sun4v-class SPARC servers. You might have heard of these servers by their CPU codenames, Niagara and Niagara 2. Anyway, you can find out more at the LDoms OpenSolaris community page if that's your thing.

The forums I keep up with are...

I use the forum RSS feeds along with Firefox's live bookmark feature, which gives me a pull-down menu for each forum/mailing list and sub-menus showing each recent thread. It makes things really easy, and I don't have to subscribe to a bunch of mailing lists if I just want to look at a forum from time to time. Here's what it looks like.

As Jason pointed out, at our last San Diego OpenSolaris User Group (SDOSUG) meeting, I gave a presentation on LDoms. At our next meeting, scheduled for January 16th, Ryan has agreed to give a presentation on Xen and OpenSolaris. You could say we have a virtualization theme going on right now.

Tuesday Jun 14, 2005

A Little Bit About cfgadm(1M)

OpenSolaris is online. If you're looking to run, build, hack on, peruse, or discuss it, you should be able to find everything you need at opensolaris.org.

If you happen to be taking a look at the commands in usr/src/cmd, you might notice "cfgadm - configuration administration." cfgadm is a command line interface through which dynamically reconfigurable hardware is manipulated on Solaris and now OpenSolaris. Dynamically reconfigurable hardware? That's fancy talk for things like hot pluggable PCI slots, hot swappable SCSI disks, USB slots, and, on the midrange and high-end SPARC servers, memory, CPUs, I/O devices, and entire system boards.

A plugin architecture is used to handle different types of hardware. In usr/src/lib/cfgadm_plugins you can find the plugin directories, which include scsi, usb, and pci. These contain directories such as i386, amd64, sparc, and sparcv9, which indicate that a version of the plugin is built for the named architecture. For example, the USB plugin is built for all of those, but the sbd (system board) plugin is only built for sparc and sparcv9 (32-bit and 64-bit SPARC, respectively). Currently, cfgadm is compiled as a 32-bit application, so it only uses the 32-bit plugins.
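
On an installed system, the plugins end up as shared objects that libcfgadm loads at runtime. If I remember the paths right, the generic plugins (scsi, pci, usb, and friends) live under /usr/lib/cfgadm, while platform-specific ones like sbd live under /usr/platform, so you can see what's available on your machine with something like:

root@x86solaris #ls /usr/lib/cfgadm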

cfgadm's interface is based on an abstraction centered around attachment points. A reconfigurable hardware device is displayed as an attachment point with a receptacle and an occupant. All the work required to display the status of or manipulate an attachment point is handled by a plugin. Running cfgadm as root without any options will list all the attachment points. Check out the cfgadm(1M) man page for all the command line options. Plugins also have man pages, for example, cfgadm_scsi(1M). In the plugin man pages, you should find documentation on how to pass plugin-specific options to cfgadm.

At home, I've got an old 300MHz x86 Celeron box which is running Solaris 10. Running cfgadm as root on this box yields:

root@x86solaris #cfgadm
Ap_Id                Type         Receptacle   Occupant     Condition
usb0/1               unknown      empty        unconfigured ok
usb0/2               unknown      empty        unconfigured ok

The cfgadm_usb(1M) man page explains that this means the computer has one USB controller with two ports, neither of which has anything plugged into it. The two ports have Ap_Ids (attachment point identifiers) usb0/1 and usb0/2.

After plugging in a USB mouse:

root@x86solaris #cfgadm
Ap_Id                Type         Receptacle   Occupant     Condition
usb0/1               usb-mouse    connected    configured   ok
usb0/2               unknown      empty        unconfigured ok

After plugging in my monitor/USB hub with keyboard and mouse attached:

root@x86solaris #cfgadm
Ap_Id                Type         Receptacle   Occupant     Condition
usb0/1               usb-hub      connected    configured   ok
usb0/1.1             usb-hub      connected    configured   ok
usb0/1.1.1           usb-mouse    connected    configured   ok
usb0/1.1.2           unknown      empty        unconfigured ok
usb0/1.1.3           usb-device   connected    configured   ok
usb0/1.2             unknown      empty        unconfigured ok
usb0/1.3             unknown      empty        unconfigured ok
usb0/1.4             unknown      empty        unconfigured ok
usb0/2               unknown      empty        unconfigured ok

OK, you get the idea. So how does it work? If you run ldd on cfgadm, you'll see it makes use of libcfgadm (libcfgadm(3LIB)) and libdevinfo (libdevinfo(3LIB)).

root@x86solaris #ldd /usr/sbin/cfgadm 
libcfgadm.so.1 =>    /usr/lib/libcfgadm.so.1
libc.so.1 =>         /lib/libc.so.1
libdevinfo.so.1 =>   /lib/libdevinfo.so.1
libnvpair.so.1 =>    /lib/libnvpair.so.1
libnsl.so.1 =>       /lib/libnsl.so.1
libmp.so.2 =>        /lib/libmp.so.2
libmd5.so.1 =>       /lib/libmd5.so.1
libscf.so.1 =>       /lib/libscf.so.1
libdoor.so.1 =>      /lib/libdoor.so.1
libuutil.so.1 =>     /lib/libuutil.so.1
libm.so.2 =>         /lib/libm.so.2

To list all the attachment points on the system, cfgadm calls into libcfgadm, then libcfgadm uses libdevinfo to get a snapshot of the device tree. It then walks the device tree and attempts to find a suitable plugin for each device node in the tree. If a suitable plugin is found, the plugin is used to get the status of the attachment point. This happens in the list_common() routine inside libcfgadm. di_init(3DEVINFO) is used to get the device tree snapshot and di_walk_minor(3DEVINFO) is used to walk the tree, calling the supplied routine do_list_common() on each minor node. Then, once control is handed off to the plugin, it can call into the kernel and talk to the driver.
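
If you want to watch this happen, the pid provider in DTrace makes it easy to count the libdevinfo calls cfgadm makes while it builds the list. This is just the sort of quick sketch I'd try, run as root:

root@x86solaris #dtrace -n 'pid$target:libdevinfo.so.1::entry { @[probefunc] = count(); }' -c /usr/sbin/cfgadm

When the command exits, you should see di_init() and the tree-walking routines show up in the aggregation alongside cfgadm's normal output.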

I've fixed some bugs in the sbd cfgadm plugin so I'll give an example of how that plugin works. First, here's the output from cfgadm on a domain on one of our high end SPARC systems.

# cfgadm
Ap_Id                Type         Receptacle   Occupant     Condition
IO5                  HPCI         connected    configured   ok
IO5_C3V0             pci-pci/hp   connected    configured   ok
IO5_C3V1             unknown      connected    unconfigured unknown
IO5_C5V0             pci-pci/hp   connected    configured   ok
IO5_C5V1             pci-pci/hp   connected    configured   ok
SB5                  V3CPU        connected    configured   ok
SB6                  V3CPU        connected    configured   ok
c0                   scsi-bus     connected    configured   unknown
c1                   scsi-bus     connected    configured   unknown
c2                   scsi-bus     connected    unconfigured unknown
c3                   scsi-bus     connected    unconfigured unknown

# cfgadm -a SB6
Ap_Id                Type         Receptacle   Occupant     Condition
SB6                  V3CPU        connected    configured   ok
SB6::cpu0            cpu          connected    configured   ok
SB6::cpu1            cpu          connected    configured   ok
SB6::cpu2            cpu          connected    configured   ok
SB6::cpu3            cpu          connected    configured   ok
SB6::memory          memory       connected    configured   ok

The SB5, SB6, and IO5 attachment points represent system boards, which are handled by the sbd plugin. Without getting into too much detail about domains and system boards, running "cfgadm -c disconnect SB6" on this system will start Solaris off on a quest to remove board SB6 from the system bus. This capability is a large part of Solaris' Dynamic Reconfiguration (DR) feature.

The sbd plugin is going to schedule a sequence of operations required to transition the board from being configured and in use by Solaris to having the board (and all its resident CPUs and memory) completely isolated from Solaris and ready to be powered down. The plugin uses an ordered list of commands, which are defined in ap.h. The function ap_seq_get() looks at the current status of the board and determines the first and last command that should be executed in sequence from the list. ap_seq_exec() is where the plugin iterates through the command list, and for many of the commands, such as CMD_DISCONNECT, ap_ioctl() is called. So this is where the plugin calls into the kernel, letting the DR driver handle the work required to get Solaris off the CPUs and memory being disconnected.
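
For completeness, the commands I'd run to take the board out and later bring it back look roughly like this (SB6 as in the listing above); the disconnect can take a while since all the memory on the board has to be drained first:

# cfgadm -c disconnect SB6
# cfgadm -c configure SB6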


Monday Jun 13, 2005

Introduction

Sup? Welcome to my blog. I'm Haik Aftandilian and I've been working at Sun for the past three years. My work here has been focused on fixing bugs and making enhancements to Solaris on our midrange and high-end SPARC systems. That includes about a year spent exclusively sustaining the Solaris Dynamic Reconfiguration feature. Back on the topic of the blog, I plan to make an occasional post about Solaris, OpenSolaris, bugs, debugging, dtracing, or anything else that I find to be interesting or cool.
