Wednesday May 14, 2008

FAQ: Using ZFS for Swap

You may have seen my earlier blog entry on myths and facts about swap space in which I mentioned that ZFS file systems cannot be used for swap files.

# cd /zpool1
# mkfile 10g swapfile
# swap -a /zpool1/swapfile
"/zpool1/swapfile" may contain holes - can't swap on it.

You can, however, use zvols to add swap space onto a ZFS pool:

#
# Add swap partition in the /export/home zfs partition
#
echo "adding zfs swap"
if [ ! -L /dev/zvol/dsk/export/swap ]
then
       echo "creating swap area"
       zfs create -V 1gb export/swap
fi
echo "/dev/zvol/dsk/export/swap -  -  swap  -  no   -" >> /etc/vfstab
/usr/sbin/swap -a /dev/zvol/dsk/export/swap
 

 Thanks to Jim Litchfield for pulling this info from the documentation for zpool.
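
If the swap script above is run more than once, it appends the same line to /etc/vfstab each time.  Here is a minimal sketch of a guard against that, assuming a POSIX shell and grep.  The function name add_swap_entry is my own invention, and it takes the file as an argument so it can be tried on a scratch copy first.

```shell
# add_swap_entry FILE DEVICE   (hypothetical helper, not part of Solaris)
# Append a vfstab-style swap line for DEVICE to FILE unless one is already there.
# On a real system FILE would be /etc/vfstab.
add_swap_entry() {
    vfstab=$1
    dev=$2
    # An existing entry starts with the device path followed by whitespace
    if grep "^${dev}[[:space:]]" "$vfstab" >/dev/null 2>&1; then
        return 0
    fi
    # Same seven fields as the echo line in the script above
    printf '%s\t-\t-\tswap\t-\tno\t-\n' "$dev" >> "$vfstab"
}
```

Calling add_swap_entry /tmp/vfstab.copy /dev/zvol/dsk/export/swap twice should leave exactly one swap line in the copy.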

 


Wednesday Feb 27, 2008

Updated: Playing with ZFS, USB memory disks and VMware Fusion

 

Update 2/28: Made some minor corrections.  Provided an English and high quality version of the German video.  Added a ZFS GUI screenshot and instructions.  Added a link to Constantin's ZFS and Virtual Box blog entry.


This week I am at "Immersion Week" in suburban Chicago.  Immersion Week is an annual training event for Sun Technical staff in the field sales and professional services organizations.  Included in our "goodie bags" was a USB hub and three USB memory sticks along with the suggestion that we use them to demonstrate the open source ZFS file system included with Solaris 10.

Being a Solaris (and Mac) propeller head and fueled by a few Coronas, I found it hard to refuse this challenge. For an advanced version of this, check out this YouTube video (high quality MP4 version) from my colleagues across the pond.  Here are the steps that I followed.

System under test:  MacBook Pro running MacOS 10.5.2, VMware Fusion 1.1.1 and Solaris 10 08/07.

 1. Enable USB device access per the VMware Fusion instructions:

  • Choose Virtual Machine > Settings or click the Settings button in the toolbar to open the virtual machine Settings sheet.
  • Select + and Add USB controller.
  • Click Apply.

2. Boot the Solaris VM. Log in. Open a Solaris terminal window.  Assume root privileges.  Disable the Volume Management service volfs.  This prevents Solaris from automounting the removable disks. The setting stays in effect across reboots until you "enable" the service again.

    svcadm disable volfs 

3. Insert the USB hub with 3 sticks into the Mac's USB port.

4. In the Fusion menus, choose Virtual Machine > USB > Connect ... for each of the 3 USB devices.  This "grabs" them away from MacOS into Solaris control.

5. Find out the device names for the three USB disks:

# rmformat
Looking for devices...
     1. Logical Node: /dev/rdsk/c0t0d0p0
        Physical Node: /pci@0,0/pci-ide@7,1/ide@1/sd@0,0
        Connected Device: NECVMWar VMware IDE CDR10 1.00
        Device Type: DVD Reader/Writer
     2. Logical Node: /dev/rdsk/c2t0d0p0
        Physical Node: /pci@0,0/pci15ad,790@11/pci15ad,770@2/storage@1/disk@0,0
        Connected Device: CBM      Flash Disk       5.00
        Device Type: Removable
     3. Logical Node: /dev/rdsk/c3t0d0p0
        Physical Node: /pci@0,0/pci15ad,790@11/pci15ad,770@2/storage@2/disk@0,0
        Connected Device: USB      Flash Disk       1100
        Device Type: Removable
     4. Logical Node: /dev/rdsk/c4t0d0p0
        Physical Node: /pci@0,0/pci15ad,790@11/pci15ad,770@2/storage@3/disk@0,0
        Connected Device: CBM      Flash Disk       5.00
        Device Type: Removable

6.  Create a zpool using RAID-Z on the three devices.

# zpool create usbdisk raidz c2t0d0p0 c3t0d0p0 c4t0d0p0
invalid vdev specification
use '-f' to override the following errors:
raidz contains devices of different sizes

Wasn't that nice of ZFS to warn us!
# zpool create -f usbdisk raidz c2t0d0p0 c3t0d0p0 c4t0d0p0
# zpool status

  pool: usbdisk
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        usbdisk       ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c3t0d0p0  ONLINE       0     0     0
            c2t0d0p0  ONLINE       0     0     0
            c4t0d0p0  ONLINE       0     0     0

errors: No known data errors

# zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
usbdisk                 360M     91K    360M     0%  ONLINE     -
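
The 360M in the SIZE column is raw space, parity included.  In a RAID-Z group every member is treated as if it were as small as the smallest one, and with single parity roughly one member's worth of space goes to redundancy.  A back-of-the-envelope sketch, assuming each stick contributes about 120 MB (my guess: 360M divided by three) and ignoring metadata overhead.  The function names are hypothetical.

```shell
# raidz_raw_mb SMALLEST_MB COUNT
# Raw size of the group, parity included: COUNT members, each counted
# at the size of the smallest member.
raidz_raw_mb() {
    echo $(( $1 * $2 ))
}

# raidz1_usable_mb SMALLEST_MB COUNT
# Rough usable space with single parity: one member's worth is spent on
# redundancy. Metadata overhead is ignored, so real numbers will be lower.
raidz1_usable_mb() {
    smallest=$1
    count=$2
    echo $(( smallest * (count - 1) ))
}
```

With those assumed numbers, raidz_raw_mb 120 3 prints 360 and raidz1_usable_mb 120 3 prints 240, so expect noticeably less than 360 MB of room for files.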


7.  Now let's have some fun...

8. Create a 5 MB file.

# cd /usbdisk
# mkfile 5m test
# ls -l
total 10245
-rw------T   1 root     root     5242880 Feb 27 23:43 test
# du -ak
5122    ./test
5124    .

Notice how du and ls agree on sizes.

9. Enable compression.

# zfs set compression=on usbdisk
# pwd
/usbdisk
# mkfile 5m testcompression
# ls -l
total 10246
-rw------T   1 root     root     5242880 Feb 27 23:43 test
-rw------T   1 root     root     5242880 Feb 27 23:48 testcompression
# du -ak
5122    ./test
0       ./testcompression
5124    .

 Notice that ls shows a 5 MB file but du -ak shows a zero-size file, because zero-filled files compress so well.
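
The ls versus du comparison is easy to wrap up for repeated use.  This is only a sketch for a POSIX shell; the three function names are made up for illustration.

```shell
# logical_bytes FILE: the size the file claims to have (what ls -l shows)
logical_bytes() {
    wc -c < "$1" | tr -d ' '
}

# allocated_kb FILE: kilobytes actually allocated on disk (what du -k shows)
allocated_kb() {
    du -k "$1" | awk '{ print $1 }'
}

# report FILE: print both numbers on one line
report() {
    printf '%s logical=%s bytes allocated=%s KB\n' \
        "$1" "$(logical_bytes "$1")" "$(allocated_kb "$1")"
}
```

Running report /usbdisk/testcompression should show the full logical size next to a tiny allocated size when compression is on.  On the ZFS side, zfs get compressratio usbdisk reports the overall ratio for the file system.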

10.  Now remove one of the USB memory sticks from the hub and attempt to create a file.

# mkfile 5m test2
# zpool status

  pool: usbdisk
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        usbdisk       ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c2t0d0p0  ONLINE       0     0     0
            c3t0d0p0  ONLINE       0   156     0
            c4t0d0p0  ONLINE       0     0     0

errors: No known data errors

zpool status reports write errors on one device (the stick that was removed), but no data errors: the data is intact.

Re-insert the removed memory stick and...

# zpool scrub usbdisk
# zpool status

  pool: usbdisk
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed with 0 errors on Thu Feb 28 00:37:03 2008
config:

        NAME          STATE     READ WRITE CKSUM
        usbdisk       ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c2t0d0p0  ONLINE       0     0     0
            c3t0d0p0  ONLINE       0   254     0
            c4t0d0p0  ONLINE       0     0     0

errors: No known data errors
# zpool clear usbdisk
# zpool status

  pool: usbdisk
 state: ONLINE
 scrub: resilver completed with 0 errors on Thu Feb 28 00:37:03 2008
config:

        NAME          STATE     READ WRITE CKSUM
        usbdisk       ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c2t0d0p0  ONLINE       0     0     0
            c3t0d0p0  ONLINE       0     0     0
            c4t0d0p0  ONLINE       0     0     0

errors: No known data errors

zpool scrub examines all data in the specified pools to verify that it checksums correctly. For replicated (mirror or raidz) devices, ZFS automatically repairs any damage discovered during the scrub.
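
In the status listings above, the only sign of trouble is the WRITE counter on c3t0d0p0.  A small filter can pick such rows out of zpool status output.  This is a sketch, assuming the five-column config layout shown above with plain integer counters; the name devices_with_errors is my own.

```shell
# devices_with_errors: read `zpool status` output on stdin and print the
# name of every config row whose READ, WRITE or CKSUM counter is non-zero.
devices_with_errors() {
    awk '
        # Data rows have five fields: NAME STATE READ WRITE CKSUM.
        # The header row is skipped because its counters are not numbers.
        NF == 5 && $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ {
            if ($3 + $4 + $5 > 0) print $1
        }'
}
```

Usage would be zpool status usbdisk | devices_with_errors, which for the first listing in this step should print c3t0d0p0 and nothing else.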

11.  Now for some real fun with export and import.

# cd /
# zpool export usbdisk
# zpool list

Note that the pool usbdisk is no longer listed.  Remove all three memory sticks.  Mix them up.  Re-insert them.

# zpool import
  pool: usbdisk
    id: 13155150575270542445
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        usbdisk       ONLINE
          raidz1      ONLINE
            c2t0d0p0  ONLINE
            c4t0d0p0  ONLINE
            c3t0d0p0  ONLINE
# zpool import usbdisk
# zpool status
 
  pool: usbdisk
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        usbdisk       ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c2t0d0p0  ONLINE       0     0     0
            c4t0d0p0  ONLINE       0     0     0
            c3t0d0p0  ONLINE       0     0     0

errors: No known data errors

Notice how politely ZFS tells you the name of the pool (even if you forgot it) and asks you to import it by name.  It doesn't matter that the actual "disks" have changed location.
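
Since zpool import with no arguments only lists what it finds, its output can be reduced to just the pool names.  A minimal sketch, assuming the "pool: name" layout shown above; importable_pools is a hypothetical helper name.

```shell
# importable_pools: read `zpool import` output on stdin and print the name
# from every "pool:" line, one per line.
importable_pools() {
    awk '$1 == "pool:" { print $2 }'
}
```

Something like zpool import | importable_pools should then print usbdisk for the listing above.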

12.  Transfer the disks to another system (in this case a MacOS system). First note the files that exist and then export the pool.

 On the Solaris system....

# ls -l
total 20473
-rw------T   1 root     root     5242880 Feb 28 00:32 test
-rw------T   1 root     root     5242880 Feb 28 00:49 testcompression
# du -a
10236   ./test
1       ./testcompression
20477   .
# cd /
# zpool export usbdisk

Shut down the virtual machine and exit VMware to avoid confusion. Remove the USB hub from the Mac.

Now, on Mac OS X 10.5, re-insert the USB hub. The MacOS X Finder produces an error: "Disk inserted was not readable by this computer."

Click "Ignore." Open the MacOS X Terminal application.

$ sudo -s
Password:
bash-3.2# zpool import
  pool: usbdisk
    id: 13155150575270542445
 state: ONLINE
status: The pool is formatted using an older on-disk version.
action: The pool can be imported using its name or numeric identifier, though
    some features will not be available without an explicit 'zpool upgrade'.
config:

    usbdisk     ONLINE
      raidz1    ONLINE
        disk4   ONLINE
        disk3   ONLINE
        disk5   ONLINE
bash-3.2# zpool import usbdisk
bash-3.2# cd /Volumes/usbdisk
bash-3.2# ls
test        testcompression
bash-3.2# du -a
10236    ./test
1    ./testcompression
10241    .

# zfs get all usbdisk
NAME     PROPERTY       VALUE                  SOURCE
usbdisk  type           filesystem             -
usbdisk  creation       Thu Feb 28  0:32 2008  -
usbdisk  used           5.14M                  -
usbdisk  available      200M                   -
usbdisk  referenced     5.03M                  -
usbdisk  compressratio  1.00x                  -
usbdisk  mounted        yes                    -
usbdisk  quota          none                   default
usbdisk  reservation    none                   default
usbdisk  recordsize     128K                   default
usbdisk  mountpoint     /Volumes/usbdisk       default
usbdisk  sharenfs       off                    default
usbdisk  checksum       on                     default
usbdisk  compression    on                     local
usbdisk  atime          on                     default
usbdisk  devices        on                     default
usbdisk  exec           on                     default
usbdisk  setuid         on                     default
usbdisk  readonly       off                    default
usbdisk  zoned          off                    default
usbdisk  snapdir        hidden                 default
usbdisk  aclmode        groupmask              default
usbdisk  aclinherit     secure                 default
usbdisk  canmount       on                     default
usbdisk  shareiscsi     off                    default
usbdisk  xattr          on                     default
usbdisk  copies         1                      default


Like magic, the USB-based ZFS array is now accessible (read-only) to MacOS X 10.5.  A future update is expected to support R/W access. The compression property is still turned on as it was in Solaris.

PS.  I tried mounting the devices in Solaris using Virtual Box by Innotek (recently acquired by Sun).  This software for MacOS X is currently in Beta test.  I received some rather nasty messages about: Failing to create proxy device for USB device.  Virtual Box also runs on Linux, Windows and OpenSolaris hosts.

 See here what Constantin has done with Virtual Box on Open Solaris with ZFS.

Using the ZFS GUI.

I used the command line but ZFS also has a fully capable browser interface.  To use it the webconsole service must be enabled:

 

# svcadm enable webconsole

Point your browser to:  https://localhost:6789.  Log in with the root username and password.

ZFS BUI Screenshot








Tuesday Nov 06, 2007

Testing MacOS X read only ZFS capability

When I first heard Jonathan Schwartz announce that MacOS 10.5 (aka Leopard) would include ZFS, I was psyched!  As a Microsoft-free user of Macs and Unix since the late 1980s, I was looking forward to seeing Sun's open source file system in MacOS and was convinced that its snapshot capability would be the basis of Time Machine, Apple's new backup facility.  Imagine my disappointment when news trickled out that the first release of Leopard would only include a basic, read-only implementation of ZFS.  What good is a read-only file system?

Leopard shipped two weeks ago and ZFS is almost impossible to find by anyone but developers and OS nuts like me.  It's completely invisible to the typical Mac user.  Then I heard a different piece of news.  Apple shipped 2 MILLION copies of Leopard in the first weekend!  Once ZFS becomes a more prominent part of MacOS, Apple will be able to reach many more people than Sun ever could with our enterprise-ready Solaris OS.  I feel confident that Apple will continue to innovate on top of ZFS.  And in typical Apple style, the end user (like my 82 year old mother who loves her Mac and has no idea that she's running Unix) may never know what ZFS is, but they will appreciate the benefits that they get.  The same will no doubt be true of their implementation of Sun's DTrace technology.

With that in mind, I set about to find a way to prove to myself that ZFS is in there and compatible with ZFS in Solaris 10.  Here's what I did using my MacBook Pro, VMware Fusion 1.1RC1 beta and Solaris 10 08/07.

  • Halt Solaris and shut down the VM
  • VM > Settings > + > Add USB controller
  • Boot Solaris
  • Plug in the USB memory stick (the VM must have focus).
    • This was actually the most time-consuming part of the whole exercise.  It did not mount reliably.
  • If you're lucky, mount shows: /rmdisk/noname on /vol/dev/dsk/c2t0d0/noname:c
  • umount /rmdisk/noname 
  • zpool create usbpool /vol/dsk/noname
  • zpool list
    NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
    usbpool                 120M     88K    120M     0%  ONLINE     -

  • zfs list
    NAME         USED  AVAIL  REFER  MOUNTPOINT
    usbpool       85K  87.9M  24.5K  /usbpool

  • zpool export usbpool
  • Suspend the VM and quit Fusion to avoid confusion
  • Re-insert the USB stick.
  • Finder complains that the disk is not readable.  Click Ignore
  • Open a terminal on the Mac.
  • sudo bash
  • zpool import
      pool: usbpool
        id: 13927799406997242219
     state: ONLINE
    status: The pool is formatted using an older on-disk version.
    action: The pool can be imported using its name or numeric identifier, though
        some features will not be available without an explicit 'zpool upgrade'.
    config:

        usbpool     ONLINE
          disk2     ONLINE
  • zpool import usbpool 
  • Mount shows:
    • usbpool on /Volumes/usbpool (zfs, local, read-only)
  • I was then able to view and copy files from the newly mounted pool
  • Woooo Hoooo! 
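
To confirm the read-only part without squinting at the whole mount listing, the line for one pool can be picked out.  A sketch only, assuming mount prints lines in the "name on /path (options)" form shown above; zfs_mount_mode is a name I made up.

```shell
# zfs_mount_mode NAME: read `mount` output on stdin and print "read-only"
# or "read-write" for the entry whose first field is NAME.
# Prints nothing if NAME is not mounted.
zfs_mount_mode() {
    awk -v n="$1" '
        $1 == n && $2 == "on" {
            if ($0 ~ /read-only/) print "read-only"
            else print "read-write"
        }'
}
```

On the Mac, mount | zfs_mount_mode usbpool should print read-only for the pool imported above.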

Why should you care?

ZFS is a truly easy to use, open source, endian independent, scalable, reliable file system.  This is the first example of it being ported to a commercial, consumer oriented product.

Things to like about ZFS:

Learn more at the ZFS learning center.

Saturday Nov 03, 2007

Using ZFS to expand my virtual Solaris disk in VMware Fusion

Here you will find my chronicles of several hours of failed attempts to add disk space to a Solaris VM disk image.  It turns out that some "newthink" was required.  If you want the correct solution, just skip to the end.

I'm running my Solaris images under VMware Fusion on a MacBook Pro.  The question has come up of how to expand the virtual disk size.

  • Download the VMware Virtual Disk manager for MacOS X. This is a GUI to command line tools provided with Fusion.  If you really like command lines, you can find it at: /Library/Application\ Support/VMware\ Fusion/vmware-vdiskmanager. Figure it out yourself.  I know you're man enough!
  • Duplicate your virtual machine.  Only work on the copy! Select it in the Finder and choose Edit > Duplicate. (Apple-D). The VM must NOT be running or even in use and suspended when you make the copy. Fusion complains about this.
  • Start Fusion
  • File > Open... your new VM copy
  • Fusion notices that the name has changed and asks you if you have copied it. 
  • Suspend the VM
  • You must discard any snapshots before expanding this disk. Virtual Machine > Discard Snapshot.
  • Start the Vdiskmanager GUI
  • Click Expand and locate the vmdk file in your VM.  Select your desired size.
  • Click Go (the GUI echoes the command line it uses at the bottom of the window for cheaters)
  • The GUI does NOT show the progress of this activity.
  • The Results Tab will open when complete with the status.

Now the real fun begins.  After the expansion, format still shows my disk at its original 10 GB size rather than the new 18 GB size.  This is where fdisk comes into play.

fdisk /dev/rdsk/c8t0d0p0 shows that my disk has one partition that is 56% of the entire disk.  This proves that the expansion worked. Now we will attempt to delete the partition and recreate it with a larger size while the OS is running (holding breath). Unfortunately, this attempt failed.  If you don't care about learning from my failures, skip to the next section.

  • fdisk /dev/rdsk/c8t0d0p0
  • Select 3 to delete the partition, select partition 1 and confirm
  • Select 1 to create a partition. Specify 100% of the Disk.
  • Select 5 to exit and pray!
  • Run Format and crash (Oh crap!  Glad it was only a copy!)
  • System reboots and Grub has no menu. All attempts to boot the kernel fail. Oops. Try again.
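
As a sanity check on the 56% that fdisk reported earlier: a 10 GB partition on an 18 GB disk works out to 55.6%, which rounds to 56.  The same arithmetic in shell, with integer rounding; percent_of is a hypothetical helper.

```shell
# percent_of PART WHOLE: PART as a percentage of WHOLE, rounded to the
# nearest whole number using integer arithmetic only.
percent_of() {
    part=$1
    whole=$2
    echo $(( (part * 100 + whole / 2) / whole ))
}
```

percent_of 10 18 prints 56, matching what fdisk showed for the old partition on the enlarged disk.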

Ok, so Solaris doesn't like me removing and recreating its fdisk partition while it's running.  How about creating a separate partition and mounting it?  This attempt also failed; if you don't care about learning from my failures, skip to the next section.  Throw away this VM and make another copy of the original.  Repeat the steps to enlarge the disk, then...

  • reboot is required for fdisk to recognize the new larger size
  • fdisk /dev/rdsk/c8t0d0p0
  • 1 to create new partition, enter size, do NOT make active

Now I'm stuck again.  I can't find a way to get format to recognize the disk in order to build slices.  newfs refuses to write a new file system with no partition table.

In SunSolve, I found this bug 6307998 which has been closed with these comments.

I have verified that fundamentally Solaris has a limitation in that 
it does not allow more than one physical Solaris partition on the same disk.

This lack of functionality goes beyond the installer, it's something lacking in
Solaris in general. Having 2 Solaris partitions on the same disk is not
supported in Solaris because the disk driver assumes there's only one
Solaris partition per disk. For example, if we reference /dev/dsk/c0d0s0, how do
we determine which Solaris partition we're intending to access on c0d0.

 ZFS to the rescue

Who needs that nasty old format and mkfs stuff when you have ZFS! 

  • reboot is required for fdisk to recognize the new larger size
  • fdisk /dev/rdsk/c8t0d0p0
  • 1 to create new partition, enter size, do NOT make active
  • zpool create mypool /dev/dsk/c8t0d0p1
  • zfs create mypool/jim

I've successfully increased my virtual storage!

 Alternative method:  Add a second disk to the image

To add a second hard disk with Fusion:

  • Solaris must be halted.
  • VM must be shut down.
  • Click the + sign, add disk and enter a size.
  • devfsadm, so that format sees the new device.  (I almost typed reboot -- -r, but that would be "old think".)

# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <DEFAULT cyl 1302 alt 2 hd 255 sec 63>
          /pci@0,0/pci1000,30@10/sd@0,0
       1. c1t1d0 <DEFAULT cyl 2557 alt 2 hd 128 sec 32>
          /pci@0,0/pci1000,30@10/sd@1,0

# zpool create mypool /dev/dsk/c1t1d0
# zfs create mypool/jim

# zpool status
  pool: mypool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        mypool      ONLINE       0     0     0
          c1t1d0    ONLINE       0     0     0

# zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
mypool                 4.97G    116K   4.97G     0%  ONLINE     -
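
To spot the newly added disk without reading the whole format menu, the disk names can be pulled out of the "AVAILABLE DISK SELECTIONS" list.  A sketch, assuming the numbered layout shown above; list_disks is my own name for it, and feeding format an empty stdin so that it prints the list and exits is an assumption worth checking on your system.

```shell
# list_disks: read `format` output on stdin and print the device name from
# every numbered selection line such as "0. c1t0d0 <DEFAULT cyl ...>".
list_disks() {
    awk '$1 ~ /^[0-9]+\.$/ { print $2 }'
}
```

Something like echo | format | list_disks should then print c1t0d0 and c1t1d0 for the listing above, and the one that was not there before is the disk to hand to zpool create.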
 

Why should you care?

I found myself guilty here of something that my customers also do frequently.  That is, deal with Solaris 10 as if it were Solaris 2.2.  The new capabilities of the open sourced ZFS are not only easier to use, they support a wider variety of options for the user.


 

Wednesday Jun 13, 2007

Cool new ZFS news and data

There are some cool new reports comparing ZFS to other file systems.  In order to save myself some time, I'll just refer you to others who have already summarized the results.

 http://blogs.sun.com/Peerapong/entry/solaris_zfs_performance
 

There's also quite a bit of discussion about ZFS for MacOS and Linux going on, including public statements by Linus Torvalds and Jonathan Schwartz.  See:

http://blogs.sun.com/jimgris/entry/zfs1 

 Full reports

 Solaris ZFS vs Linux Ext3 :
http://blogs.sun.com/Peerapong/resource/zfs_linux.pdf

 Solaris ZFS vs VERITAS VxFS:
http://blogs.sun.com/Peerapong/resource/zfs_veritas.pdf

Solaris ZFS vs W2K NTFS: 
http://blogs.sun.com/Peerapong/resource/zfs_msft.pdf

Solaris ZFS vs Linux RAID : http://unixconsult.org/zfs_vs_lvm.html

 

About

Jim Laurent is an Oracle Sales consultant based in Reston, Virginia. He supports US DoD customers as part of the North American Public Sector hardware organization. With over 17 years experience at Sun and Oracle, he specializes in Solaris and server technologies. Prior to Oracle, Jim worked 11 years for Gould Computer Systems (later known as Encore).
