Monday Nov 02, 2009

Copying directory trees, continued

As mentioned earlier, I am using rsync to copy complete directory trees:

$ cd /source_dir
$ rsync -avz . /dest_dir

It will copy all files, including hidden ones, in directory /source_dir into the destination directory /dest_dir. If /dest_dir does not already exist, rsync will create it (at least on my Nevada 72 system).

There may be cases where rsync does not work correctly, for example when there are very long file names. It's good to know that there's another easy way to do it:

$ cd /source_dir
$ find . -print -depth | cpio -pdm /dest_dir

The -depth option makes find list each directory's contents before the directory itself (so cpio can restore directory modification times after filling the directories), and -pdm with cpio means (from man cpio):
-p: reads a list of file path names from stdin and copies those files to the destination tree
-d: creates directories if necessary
-m: keeps file modification times

Destination directory /dest_dir must exist before starting the above commands.
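A quick way to see why -depth matters: with it, find emits a directory's contents before the directory itself, so cpio -m can restore directory timestamps after the files inside have been written. A minimal sketch (the tiny a/f tree is made up for the demo):

```shell
#!/bin/sh
# Sketch: -depth prints a directory's entries before the directory itself.
set -e
d=$(mktemp -d)
mkdir "$d/a"
: > "$d/a/f"
( cd "$d" && find . -depth -print )   # prints ./a/f, then ./a, then .
rm -rf "$d"
```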

Friday Jul 10, 2009

Firefox 3.5 for Solaris!

These are the direct links for downloading Firefox 3.5 for Solaris:
Version                    x86          SPARC
Firefox 3.5 OpenSolaris    pkg | tar    pkg | tar
Firefox 3.5 Solaris 10     pkg | tar    pkg | tar

You can still find the links to the latest Firefox 2 version (2.0.0.20) in this blog entry, and to Firefox 3.0 in this blog entry.

If you are visiting the Mozilla web site from a Solaris system, your system will be automatically detected, and you will see a link which shows "Download Firefox - Free" and "3.5 for SunOS". That link directs you to the Mozilla development page on OpenSolaris.org. Unfortunately, the Solaris versions are not yet mentioned on the "Other Systems and Languages" web page (actually, that's why I am maintaining the Firefox for Solaris links in my blog), but who knows - maybe we'll see them there in a while.


Firefox 3 for Solaris - latest versions as of June 2009

These are the direct links for downloading the latest versions of Firefox 3 for Solaris:
Version                     x86          SPARC
Firefox 3.0.11 OpenSolaris  pkg | tar    pkg | tar
Firefox 3.0.11 Solaris 10   pkg | tar    pkg | tar

You can still find the links to the latest Firefox 2 version (2.0.0.20) in this blog entry.

If you would like to install multiple versions of Firefox on your Solaris system, you can use the tarballs or my script for renaming a Firefox package.

Oh - and if you are running Solaris or OpenSolaris on x86, I suggest installing the recently released Adobe Reader 9.1 for Solaris x86!

Thursday Mar 26, 2009

Finally: Adobe Reader 9 for Solaris x86 is available for download!!!

Believe it or not: Adobe has finished and made available Adobe Reader 9 for Solaris x86! You can download it from here (in .bin (self-extracting executable), .tar, and .pkg formats). System requirements are mentioned here (minimum Solaris level is s10u5 or OpenSolaris 2008.11).

Monday Feb 09, 2009

Local backups with rsync (forget tar -cvf - . | (cd /dest; tar -xpf -))

Recently, I encountered the rsync man page by accident (maybe as one of the lines in the top 10 Google search results?) and was quite surprised to find rsync examples where there was no remote host in any of its arguments. Doesn't rsync stand for something like "remote synchronization"?

So here's how it works:

If you want to copy all files in a directory and all directories and files below to another directory (for example on another file system on a different disk), use the following command:

$ rsync -avz /source_dir/ /dest_dir

Note the added slash after /source_dir. This command will recursively copy all files and directories in /source_dir to directory /dest_dir (creating it if it doesn't exist). If you omit the trailing slash, it will create a new directory /dest_dir/source_dir. The rsync command will copy links as links, not the original files they point to (similar to the default behavior of Solaris or GNU tar). If the rsync command has been run at least once before, it will copy only the changed or newly added files. It will not remove destination files when files have been removed from the source directory.

Example: Copy all directories and files in directory /tmp/1 to empty directory /tmp/2:

  • Using the cp command (option P will copy links as links):
    $ cp -Ppr /tmp/1 /tmp/2
  • Using the tar command (Solaris or GNU. Solaris tar will report that a link has been created while GNU tar will only mention the file name of the link):
    $ mkdir /tmp/2
    $ cd /tmp/1
    $ tar -cvf - . | ( cd /tmp/2; tar -xpf -)
  • Using the rsync command:
    $ rsync -avz /tmp/1/ /tmp/2
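To check that the tar pipeline carries hidden files along, here is a small sketch using throwaway directories created with mktemp (the file names are made up for the demo):

```shell
#!/bin/sh
# Sketch: copy a tree, including dotfiles, with the tar pipeline shown above.
set -e
src=$(mktemp -d)
dst=$(mktemp -d)
echo hello  > "$src/file"
echo hidden > "$src/.dotfile"
# Pack the whole source directory (note the ".") and unpack it in dst:
( cd "$src" && tar -cf - . ) | ( cd "$dst" && tar -xpf - )
ls -A "$dst"   # both file and .dotfile arrive
rm -rf "$src" "$dst"
```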

Tuesday Jan 27, 2009

Using ZFS as (an iSCSI) target for Mac OS X Time Machine

Inspired by this and then this blog entry, I thought it was now time for me to get my own experience with iSCSI.

Here's the result:

  1. On my eco-friendly server running OpenSolaris 2008.11, I created a new ZFS volume (not a ZFS file system!) with iSCSI sharing switched on:
    $ zfs create -o shareiscsi=on -V 180G pool2/mac-tm
    cannot share 'pool2/mac-tm': iscsitgtd failed request to share
    filesystem successfully created, but not shared
  2. Well, that did not work well. Better search and install the iSCSI packages first:
    $ pkg search -rl iscsi | nawk '{print $NF}' | \
    nawk 'BEGIN{FS="@"}{print $1}' | sort -u
    PACKAGE
    pkg:/SUNWiscsi
    pkg:/SUNWiscsitgt
    $ pkg install SUNWiscsi SUNWiscsitgt
    DOWNLOAD                                    PKGS       FILES     XFER (MB)
    Completed                                    2/2       18/18     0.86/0.86
    
    PHASE                                        ACTIONS
    Install Phase                                  74/74
    PHASE                                          ITEMS
    Reading Existing Index                           9/9
    Indexing Packages                                2/2
    
  3. Then, I wanted to delete (destroy, in ZFS speak) and create the zvol again:
    $ zfs destroy pool2/mac-tm
    cannot destroy 'pool2/mac-tm': volume has children
    use '-r' to destroy the following datasets:
    pool2/mac-tm@zfs-auto-snap:frequent-2009-01-23-12:15#
  4. OK, I understand that an automated snapshot had already been created in the meantime. Destroy the zvol with its snapshots, and create the zvol again:
    $ zfs destroy -r pool2/mac-tm
    $ zfs create -o shareiscsi=on -V 180G pool2/mac-tm
    
  5. Check if the shareiscsi property is on for our volume:
    $ zfs get shareiscsi pool2/mac-tm
    NAME          PROPERTY    VALUE         SOURCE
    pool2/mac-tm  shareiscsi  on            local
  6. List all defined iSCSI targets:
    $ iscsitadm list target
    Target: pool2/mac-tm
        iSCSI Name: iqn.1986-03.com.sun:02:3f4f551a-41ab-4a3a-adf9-ea3ce5c2789c
        Connections: 0
  7. Looks great! On the MacBook Pro running Mac OS X 10.5.6, I installed the globalSAN iSCSI initiator software (version 3.3.0.43) from Studio Network Solutions, after downloading from this link.

  8. Then I rebooted the Mac (as required by the globalSAN iSCSI software).

  9. Next step was to mount the iSCSI drive:
    Mac OS X System Preferences
    a) Click on the globalSAN iSCSI icon to display its control panel:
    GlobalSAN iSCSI control panel #1
    b) Click on the + symbol in the lower left corner to get the following popup:
    GlobalSAN iSCSI control panel #2
    c) Enter the IP address or host name of the OpenSolaris server, leave the port number as it is, and enter the target name (the last column in the line starting with iSCSI Name: in the output of the iscsitadm list target command on your OpenSolaris server - in our case, it's iqn.1986-03.com.sun:02:3f4f551a-41ab-4a3a-adf9-ea3ce5c2789c ), and press the OK button. The iSCSI control panel will then look like:
    GlobalSAN iSCSI control panel #3
    d) Click the Connected switch at the end of the iSCSI target line (the line which starts with iqn) to get the following popup:
    GlobalSAN iSCSI control panel #4
    e) Press the Connect button to connect to that iSCSI target. As we did not specify CHAP or Kerberos authentication, the connect will work without user and password. For a walkthrough and more on CHAP authentication, click this link.
    After pressing the Connect button, the control panel will look like:
    GlobalSAN iSCSI control panel #5
    At this time, the newly created volume will show up in Disk Utility. Note that I clicked on the Persistent button to build the connection again after a reboot - I didn't try rebooting to check, but believe it will work.

  10. Then, I created a Mac OS X volume in Disk Utility.
    Disk Utility #1
    a) Click on the disk drive and then on the Erase tab, enter a new name for the volume (or leave it as it is), and press the Erase... button. The following screen will be displayed to show the progress:
    Disk Utility #2
    After the erase is completed, the new volume will show up in the left part of the Disk Utility (For this screen shot, I created the volume again after providing the name ZFS-180GB for the volume. Not sure if it's possible to rename a volume without formatting it):
    Disk Utility #3
  11. Now the volume is usable in Time Machine.
    a) Click on the Time Machine icon in System Preferences to start its control panel:
    TM control panel #1
    b) Click on Change Disk to change the destination volume for Time Machine (the lock in the lower left corner has to be unlocked first to allow for the change):
    TM control panel #2
    c) Select the new volume and press Use for Backup. Then, just start the backup (or wait 120 seconds until it starts automatically):
    TM control panel #3
    Mac OS X Time Machine has started its first backup on a ZFS volume!

However, as always in my blog entries, this is no guarantee that it will always work as described, or that the backup and restore will also work after your next Mac OS X upgrade, or that there will be no errors or problems with such a setup. What I can tell you is that a simple restore attempt worked for me just as if I had done it from a USB disk!

Up to now, I have always disconnected the USB disk drive before closing the Mac's lid so that a Time Machine backup would not be interrupted in the middle. Not sure what would happen if a Time Machine backup is running while you close the lid, so better read the docs and test it, or just always unmount Time Machine's active volume before letting your Mac sleep.

And I discovered that if an iSCSI volume is mounted before closing the lid, the MacBook Pro cannot transition into deep sleep mode with a power consumption similar to the switched-off state. It somehow sleeps, but with a spinning fan and a steady front LED. And in order to wake it up, I had to open and close the lid several times. So these are the steps to perform before closing the Mac's lid:

  1. Eject (unmount) the volume (use the eject menu item after right-clicking on the volume's icon on the desktop).
  2. Disconnect the iSCSI target (and all others) in the globalSAN iSCSI control panel in the Mac OS X System Preferences, by unmarking the tick in column Connected for all targets. A confirmation popup will be shown when unmarking the Connected tick.

After waking up your Mac next time, just tick the Connected mark in the globalSAN iSCSI control panel again and confirm the popup that will be shown. If you did not choose another destination disk for Time Machine in the meantime, Time Machine will recognize the iSCSI drive as a valid destination volume automatically and use it for its next scheduled backup.

BTW, for an interesting article on how to use ZFS iSCSI sharing with a Linux client, please click here.

Friday Jan 23, 2009

DTrace at its best!

Using DTrace's destructive actions, you can perform actions on your operating system you never thought of before, like backing up files with a ZFS snapshot right before a user deletes them, or halting any process before it exits.

With the help of the files in directory /usr/demo/dtrace, the DTrace Toolkit, and my colleagues who showed me the best probe for intervening before a process is really ended, I wrote and tested the following short DTrace script to halt certain processes before they are ended. It can be very useful if you encounter a large number of short-lived processes which you could not analyze otherwise. In the following example, we are looking only for processes running with userid 4 (username adm) and with the executable name date. I saved it as file name stopper.d.

#!/usr/sbin/dtrace -ws

syscall::rexit:entry
/(execname == "date") && uid == 4/
{
   printf ("%d(%d): %d %d %d %d, %s, >%s<: %Y", pid, ppid, uid,
     curpsinfo->pr_projid, curpsinfo->pr_zoneid,
     curpsinfo->pr_dmodel, cwd, curpsinfo->pr_psargs, walltimestamp);
/*   stack(); */
/*   ustack(); */
/*   system ("pmap -x %d", pid); */
   printf ("\nStopping Process %d ...", pid);
   stop();
   printf (" done.");
   system ("ps -eo user,pid,ppid,s,zone,projid,pri,class,nice,args | nawk '$2==\"%d\"{print}'", pid);
}

Be warned! Adapt the filter rules carefully on a test system before using the script on the system on which you want to halt processes! Use the script at your own risk - I cannot guarantee anything!

For listing the stopped processes, you can use the following command:

$ ps -eo user,pid,ppid,s,zone,projid,pri,class,nice,args | \
   nawk '$4=="T" && /date/{print}'

And for ending these stopped processes, you can use that one: 

$ ps -eo user,pid,ppid,s,zone,projid,pri,class,nice,args | \
   nawk '$4=="T" && /date/{system ("kill -9 "$2)}'
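You can dry-run the selection logic without stopping or killing anything by feeding awk a few canned ps lines (the PIDs and values below are invented; plain awk stands in for Solaris nawk):

```shell
#!/bin/sh
# Dry run of the field-4 state test: only the stopped ("T") date line matches,
# so only its PID would be passed to kill.
awk '$4=="T" && /date/{print $2}' <<'EOF'
     adm 21922  5058 T   global     3  57   IA 24 date
     adm 22005  5058 S   global     3  47   IA 24 date
    root   321     1 T   global     0  59   TS  0 /usr/sbin/cron
EOF
# prints: 21922
```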

For testing, I created a script named start-50-date-processes.ksh to start 50 date processes roughly at the same time, then started the DTrace script above as user root:

./stopper.d

and afterwards started the test script as user adm:

./start-50-date-processes.ksh

A sample output looks like:

$ ./stopper.d 
dtrace: script './stopper.d' matched 1 probe
dtrace: allowing destructive actions
CPU     ID  FUNCTION:NAME
  1   3413  rexit:entry 21922(5058): 4 3 0 1, /var/adm/bin, >date<: 2009 Jan 23 13:47:37
Stopping Process 21922 ... done.
     adm 21922  5058 T   global     3  57   IA 24 date

  1   3413  rexit:entry 22005(5058): 4 3 0 1, /var/adm/bin, >date<: 2009 Jan 23 13:47:41
Stopping Process 22005 ... done.
     adm 22005  5058 T   global     3  47   IA 24 date

  0   3413  rexit:entry 22090(22089): 4 3 0 1, /var/adm/bin, >date<: 2009 Jan 23 13:47:56
Stopping Process 22090 ... done.
     adm 22090     1 T   global     3  44   IA 20 date
... (some more lines)

For stopping the DTrace script, just press <ctrl>c in the window where you started it. Stopping the script will not let the stopped processes continue - they remain in the "T" (stopped/traced) state until they are killed.

The DTrace script will run faster if you comment out its last line (where it executes the ps command for each stopped process).

And here's the test script (for starting 50 date processes) which I executed as user adm:

$ cat start-50-date-processes.ksh
#!/bin/ksh
i=50
while [[ i -gt 0 ]]; do
   date &
   (( i = i - 1 ))
# or (( i-- )) with ksh93 on Solaris 10, or OpenSolaris
done

Wednesday Jan 21, 2009

(n)awk: print matching lines and some more

I think some of you will find the following (n)awk one-liner useful. I am using it from time to time and thought I should document it so that I do not have to think about where to place the "a++" part, for example.

It will print out lines of a file that match a certain pattern, plus some more lines that follow. In this example, I am searching for lines that contain the string "usb" in /var/adm/messages, and print that line plus 4 more (for more lines, decrement the number in the first curly brackets accordingly). A line number will also be printed out for each line:

$ nawk '/usb/{a=-5}{if (a<0){print NR, $0};a++}' /var/adm/messages
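Here is the same one-liner run against a small made-up sample file instead of /var/adm/messages (plain awk behaves like Solaris nawk here):

```shell
#!/bin/sh
# On a match, a is set to -5; while a is still negative, the current line is
# printed with its line number, so the match plus the next 4 lines appear.
f=$(mktemp)
printf 'one\ntwo usb found\nthree\nfour\nfive\nsix\nseven\n' > "$f"
awk '/usb/{a=-5}{if (a<0){print NR, $0};a++}' "$f"
# prints lines 2 through 6, each prefixed with its line number
rm -f "$f"
```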

How to collect relevant data for application core analysis

Not sure if you knew already - there is a tool available which collects all relevant data for analyzing application cores, for example by Sun's service and support engineers. It's called pkgapp (not pkgadd ;-) ) and can be directly downloaded by clicking this link.

When started (after extracting in a directory like /opt/pkgapp) using:
$ /opt/pkgapp/pkgapp -c /path_to_core_file -p /path_to_executable

it collects all libraries for that executable, several proc tool outputs, and more, and packs it into a tar file in a new directory below the current directory.

If you would like to test it, you can just create a core with the gcore command:

$ gcore PID

Where PID is the process ID of a currently running process for which you would like to create the core file. The process will not be killed; it will continue running.

A sample session looks like the following (hostname and hostid replaced by dummy entries):

$ id
uid=0(root) gid=0(root)
$ /usr/bin/sleep 30&
[1]     17825
$ gcore 17825
gcore: core.17825 dumped
$ /opt/pkgapp/pkgapp -c core.17825 -p /usr/bin/sleep
* ----------------------------------------------------------------------------------
* Sun Microsystems RSD pkgapp 3.0 Solaris                               [01/21/2009]
* ----------------------------------------------------------------------------------
* OS release                            [5.11]
* Platform                              [SUNW,Sun-Blade-1000]
* Checking [-c] is a core or pid        [using core /var/tmp/pkgapp/core.17825]
* Checking corefile for a valid pldd    [pldd is good with 4 elements]
* Process root                          [/usr/bin/sleep]
* Databin parameter [-s] checks         [reset to /var/tmp/pkgapp]
* Databin found                         [/var/tmp/pkgapp]
* Databin writable check                [success]
* Databin used/created is               [/var/tmp/pkgapp/pkgapp-012109-02]
* Creating temp area                    [/tmp/pkgapp.18627/]
* Checking if corefile is truncated     [core seems truncated - may not be useful]
* Checking if corefile cksum = filesz   [core cksum matches file size, may be fine]
* Process binary                        [sleep]
* Checking usage history                [not recently run]
* sleep binary bit version              [32]
* Checking path [-p] to binary name     [failed, path includes binary name]
* Resetting path [-p] parameter         [/usr/bin]
* Checking path [-p] is a directory     [success]
* Locating sleep                        [success]
* Checking located sleep is 32 bit      [success]
* Binary located                        [/usr/bin/sleep]
* Adding binary to pkgapp.pldd          [success]
* Grabbing pldd                         [success]
* Grabbing pstack                       [success]
* Grabbing pmap                         [success]
* Grabbing pcred                        [success]
* Grabbing pflags                       [success]
* Grabbing pargs                        [success]
* Not Including the core/gcore
* Javatools [-j] not set                [skipped]
* Grabbing /var/adm/messages            [success]
* Grabbing uname -a                     [success]
* Grabbing date/time                    [success]
* Grabbing showrev -p                   [success]
* Grabbing pkginfo -l                   [success]
* Grabbing /etc/release                 [success]
* Grabbing coreadm                      [success]
* Grabbing ulimit                       [success]
* Grabbing libs                         [success]
* Making lib paths app/                 [success]
* Making lib paths libs/                [success]
* Processing file 1 of 48
* Processing file 2 of 48
* Processing file 3 of 48
...
* Processing file 46 of 48
* Processing file 47 of 48
* Processing file 48 of 48
* Linking libraries                     [success]
* Libraries linked                      [48 ttl]
*
* Using hostid for naming .tar.gz       [12345678]
* Writing file                          [pkgapp-12345678-sol01-090121-094026.tar.gz]
*
* Done gathering files
* Writing dbxrc & opencore.sh files     [success]
* Writing manifest-090121-094026.log    [success]
* Writing pkgapp-args-090121-094026     [success]
* Creating final tarfile                [success]
* Compressing tarfile                   [success]
* End of runtime logging
* Saving history info                   [/var/tmp/pkgapp-history/history-090121-094026.log]
* Saving runtime log                    [/var/tmp/pkgapp-history/runtime-090121-094026.log]
* Removing [-r] temp area/files         [left alone]
*
* Operations Complete
----------------------------------------------------------------------------------
Upload the following file(s) to your supportfiles.sun.com Cores Directory at Sun

1) File(s) located in directory /var/tmp/pkgapp/pkgapp-012109-02

                [ pkgapp-12345678-sol01-090121-094026.tar.gz ]

2) File(s) located in directory /var/tmp/pkgapp/

                [ core.17825 ]


Note: pkgapp has not included the core.17825 with the above pkgapp tar
Please rename the file appropriately and upload to the same location

                                Thank you.
                                Sun Software Technology Service Center (STSC)
...

Thursday Jan 15, 2009

Solaris booting from ZFS, explained by Lori Alt

Lori Alt, project lead for the ZFS boot project, explains in this video how booting Solaris from ZFS works.

The first 5 minutes cover the main features of ZFS. At about 05:45, Lori starts describing the Solaris boot process (SPARC and x86) and the special considerations when using ZFS for booting, including some remarks on GRUB. A great primer if you want to know what happens when Solaris is booting. The video is also available in m4v format for viewing on an iPod, for example.

Friday Jan 09, 2009

ZFS and the uberblock, part 2

In my last blog entry, I described how to use DTrace to learn more about how and where the ZFS uberblock is written. I showed that an uberblock update causes 4 writes to 4 different areas, and only after 128 write cycles will an old uberblock be overwritten by a new one. Now, using zdb, I will show how often various zpools update their uberblocks, so we can estimate after which time a USB flash drive without wear leveling would probably be worn out.

Using zdb -uuuv, as in the following script:

#!/bin/ksh
TMPDIR=/var/tmp
LOGFILE=${TMPDIR}/ub-stats.out

echo "-----" >> ${LOGFILE}
date +"date+time: %Y-%m-%d %H:%M:%S" >> ${LOGFILE}
for ZPOOL in $(zpool list -o name -H); do
   echo ${ZPOOL} >> ${LOGFILE}
   zdb -uuuv ${ZPOOL} >> ${LOGFILE}                                        
done

This script records important information about the uberblocks of each zpool and appends it to a file (for example once per minute, via crontab). A timestamp is useful for further analysis.

After some hours or days, we can extract the data for certain zpools, import it into a spreadsheet, and create graphs showing how often new uberblocks are written to disk. Here's how we can print the data:

nawk '/date/{date=$2; time=$3}
/pool/{pool=$NF}
/txg/{printf ("%s %s\t%s\t%s\n", date, time, pool, $NF)}' ub-stats.out

A sample output might look like:

2009-01-06 22:14:00     dpool   66273
2009-01-06 22:14:00     dpool-2 58707
2009-01-06 22:14:00     rpool   33165
2009-01-06 22:15:00     dpool   66273
2009-01-06 22:15:00     dpool-2 58707
2009-01-06 22:15:00     rpool   33167
2009-01-06 22:16:00     dpool   66273
2009-01-06 22:16:00     dpool-2 58707
2009-01-06 22:16:00     rpool   33169
2009-01-06 22:17:00     dpool   66273
2009-01-06 22:17:00     dpool-2 58707
2009-01-06 22:17:00     rpool   33171

We can easily extract just one pool, using nawk '$3=="dpool"{print}', for example, read the resulting file into OpenOffice, and create a graph of type x/y to show the txg numbers over time. For rpool, I got the following picture:
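As a sketch, here is the per-pool filter applied to a few lines taken from the sample above (plain awk standing in for nawk):

```shell
#!/bin/sh
# Keep only the rows whose third column names the pool of interest.
awk '$3=="rpool"{print}' <<'EOF'
2009-01-06 22:14:00     dpool   66273
2009-01-06 22:14:00     rpool   33165
2009-01-06 22:15:00     dpool   66273
2009-01-06 22:15:00     rpool   33167
EOF
# prints only the two rpool rows
```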

txg over time (rpool)

The data (you can even see it in the above table) shows that for rpool, txg is incremented about two times per minute. Each additional manual "sync" command increments txg by one and also writes the uberblock to disk. Assuming that a USB flash drive without wear leveling wears out after data is written to the same location 10,000 times (the minimum guaranteed number of cycles for a standard USB flash drive mentioned in this document), and knowing that the same block is overwritten 2*60*24/128 = less than 23 times per day, we can calculate 10,000/22.5 to get the number of days until such a USB stick would be worn out. The result is about 444 days (= 1 year 79 days). Assuming the USB stick wears out only after 100,000 writes to the same location, the result would be 4,444 days, which is more than 12 years! At the end of this blog entry, I will explain why the USB stick might even live much longer than those 444 or 4,444 days.
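The arithmetic above can be replayed in a few lines of awk (the 10,000 and 100,000 cycle counts are the assumptions stated in the text):

```shell
#!/bin/sh
# ~2 txg syncs per minute, 128 uberblock slots in the ring:
# each slot is overwritten about 22.5 times per day.
awk 'BEGIN {
  per_day = 2 * 60 * 24 / 128
  printf "overwrites/day: %.1f\n", per_day
  printf "days @10k cycles: %d\n",  10000  / per_day
  printf "days @100k cycles: %d\n", 100000 / per_day
}'
# prints 22.5, 444 and 4444
```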

Interestingly, for another pool, named dpool, I got a completely different graph:

txg over time (dpool)

What I know is that on this pool, at about 06:00 am, a small number of files (about 10 to 20) is created, again at 07:00 am, and during other times there is little or no write activity. Using another DTrace script, we can watch in real time and see that the uberblock's txg number is indeed increased every 30 seconds but only written to disk when there is a change to the pool it belongs to or when, for example, a sync command is issued on the command line. Here's the script I used (a colleague recommended also including spa_sync, as this is the call for syncing a transaction group, so I added two more probes for watching these events):

io:genunix:bdev_strategy:start
{
   printf ("%Y %s %d %d", walltimestamp, execname, args[0]->b_blkno, args[0]->b_bcount);
}

fbt:zfs:spa_sync:entry
{
   printf ("%Y %s %d", walltimestamp, execname, args[1]);
}

fbt:zfs:spa_sync:return,
fbt:zfs:vdev_uberblock_sync:entry
{
   printf ("%Y %s", walltimestamp, execname);
}

fbt:zfs:uberblock_update:entry
{
   printf ("%Y %d %d %d", walltimestamp,
     args[0]->ub_rootbp.blk_birth, args[0]->ub_rootbp.blk_fill,
     args[2]);
}

fbt:zfs:uberblock_update:return
{
   printf ("%Y %d %d", walltimestamp, args[0], args[1]);
}

And here's a sample output. I removed certain columns, including the hour (19, i.e. 7 p.m.), from the original output; the times shown are minutes:seconds. For better readability, I have marked all entries and comments regarding writing the uberblock update for rpool to disk in green, and all actions I performed in another terminal window while the DTrace script was running in red.

           spa_sync:entry 11:09 sched 41533           <- spa_sync() rpool
      bdev_strategy:start 11:09 sched 25671451 512
      bdev_strategy:start 11:09 sched 4310260 512
      bdev_strategy:start 11:09 sched 25571476 3072
      bdev_strategy:start 11:09 sched 25671452 14848
	   spa_sync:entry 11:09 sched 69698           <- spa_sync() dpool-2
   uberblock_update:entry 11:09 67082 178 69698       <- updating dpool-2. txg != blk_birth!
  uberblock_update:return 11:09 85 0                  <- args[1]=0: no sync!
	  spa_sync:return 11:09 sched
	   spa_sync:entry 11:09 sched 76452           <- spa_sync() dpool
   uberblock_update:entry 11:09 75007 122 76452       <- updating dpool. Note that txg is 76452
                                                         whereas blk_birth is 75007: means that
						         this file system has not been changed for
						         (76452-75007)/2+1 = 723m 30s = 12h 3m 30s
						         So the last change to dpool was
						         at about 7:08 am. This is exactly the time
                                                         stamp as shown in zdb -uuuv dpool. And the
                                                         txg number shown in that output is the one
                                                         that is read from disk, so it is 75007:

(zdb output follows; dtrace output will be continued below)

---
$ zdb -uuuv dpool
Uberblock

        magic = 0000000000bab10c
        version = 13
        txg = 75007
        guid_sum = 10502025224250020770
        timestamp = 1231481321 UTC = Fri Jan  9 07:08:41 2009
        rootbp = [L0 DMU objset] 400L/200P DVA[0]=<0:fd8c54a00:200> DVA[1]=<0:1f8b7f8800:200> DVA[2]=<0:c02e0600:200> fletcher4 lzjb LE contiguous birth=75007 fill=122 cksum=a0e45aac0:42a5af596ce:e0b8e86ece60:201ef17f0b98da
---

(dtrace output continued)

  uberblock_update:return 11:09 85 0                  <- args[1]=0: no sync!
	  spa_sync:return 11:09 sched
      bdev_strategy:start 11:09 sched 4310261 27136
      bdev_strategy:start 11:09 sched 25571482 15360
.
... some more IO of various blocks and sizes
.
      bdev_strategy:start 11:09 sched 25671511 512
      bdev_strategy:start 11:09 sched 25671517 5632
   uberblock_update:entry 11:09 41533 159 41533       <- updating rpool. txg == blk_birth!
  uberblock_update:return 11:09 85 1                  <- args[1]=1: sync!
vdev_uberblock_sync:entry 11:09 sched
      bdev_strategy:start 11:09 sched 378 1024        <- write to block #378!
      bdev_strategy:start 11:09 sched 890 1024
      bdev_strategy:start 11:09 sched 39005050 1024
      bdev_strategy:start 11:09 sched 39005562 1024
          spa_sync:return 11:09 sched
           spa_sync:entry 11:39 sched 41534           <- spa_sync() rpool
   uberblock_update:entry 11:39 41533 159 41534       <- updating rpool. txg != blk_birth!
  uberblock_update:return 11:39 85 0                  <- args[1]=0: no sync! skip writing to block #380!
          spa_sync:return 11:39 sched
           spa_sync:entry 11:39 sched 76453           <- spa_sync() dpool
   uberblock_update:entry 11:39 75007 122 76453
  uberblock_update:return 11:39 85 0
          spa_sync:return 11:39 sched
           spa_sync:entry 11:39 sched 69699           <- spa_sync() dpool-2
   uberblock_update:entry 11:39 67082 178 69699
  uberblock_update:return 11:39 85 0
          spa_sync:return 11:39 sched
           spa_sync:entry 12:09 sched 69700           <- spa_sync() dpool-2
   uberblock_update:entry 12:09 67082 178 69700
  uberblock_update:return 12:09 85 0
          spa_sync:return 12:09 sched
           spa_sync:entry 12:09 sched 76454           <- spa_sync() dpool
   uberblock_update:entry 12:09 75007 122 76454
  uberblock_update:return 12:09 85 0
          spa_sync:return 12:09 sched
           spa_sync:entry 12:09 sched 41535           <- spa_sync() rpool
      bdev_strategy:start 12:09 sched 25571530 1024
      bdev_strategy:start 12:09 sched 4310376 1024
.
... some more IO of various blocks and sizes
.
      bdev_strategy:start 12:09 sched 25671620 512
      bdev_strategy:start 12:09 sched 25671626 5120
   uberblock_update:entry 12:09 41535 159 41535       <- updating rpool. txg == blk_birth!
  uberblock_update:return 12:09 85 1                  <- args[1]=1: sync!
vdev_uberblock_sync:entry 12:09 sched
      bdev_strategy:start 12:09 sched 382 1024        <- write to block #382!
      bdev_strategy:start 12:09 sched 894 1024
      bdev_strategy:start 12:09 sched 39005054 1024
      bdev_strategy:start 12:09 sched 39005566 1024
          spa_sync:return 12:09 sched
(sync command issued in another terminal window)

(end sync command)
           spa_sync:entry 12:23 sched 76455           <- spa_sync() dpool
   uberblock_update:entry 12:23 75007 122 76455
  uberblock_update:return 12:23 85 0
          spa_sync:return 12:23 sched
           spa_sync:entry 12:23 sched 69701           <- spa_sync() dpool-2
   uberblock_update:entry 12:23 67082 178 69701
  uberblock_update:return 12:23 85 0
          spa_sync:return 12:23 sched
           spa_sync:entry 12:23 sched 41536           <- spa_sync() rpool
      bdev_strategy:start 12:23 sched 25671636 1536
      bdev_strategy:start 12:23 sched 25671639 2560
.
... some more IO of various blocks and sizes
.
      bdev_strategy:start 12:23 sched 25671718 3584
      bdev_strategy:start 12:23 sched 25682884 2048
   uberblock_update:entry 12:23 41536 159 41536       <- updating rpool. txg == blk_birth!
  uberblock_update:return 12:23 85 1                  <- args[1]=1: sync!
vdev_uberblock_sync:entry 12:23 sched
      bdev_strategy:start 12:23 sched 384 1024        <- write to next block, #384!
      bdev_strategy:start 12:23 sched 896 1024
      bdev_strategy:start 12:23 sched 39005056 1024
      bdev_strategy:start 12:23 sched 39005568 1024
          spa_sync:return 12:23 sched
      bdev_strategy:start 12:53 sched 234913576 12288
      bdev_strategy:start 12:53 sched 234910656 16384
           spa_sync:entry 12:53 sched 41537           <- spa_sync() rpool
      bdev_strategy:start 12:53 sched 25671725 14848
      bdev_strategy:start 12:53 sched 25571590 1024
      bdev_strategy:start 12:53 sched 4310548 1024
      bdev_strategy:start 12:53 sched 25571588 1024
           spa_sync:entry 12:53 sched 69702           <- spa_sync() dpool-2
      bdev_strategy:start 12:53 sched 234915243 512
      bdev_strategy:start 12:53 sched 306224021 512
      bdev_strategy:start 12:53 sched 234913640 12288
      bdev_strategy:start 12:53 sched 234915250 1024
           spa_sync:entry 12:53 sched 76456           <- spa_sync() dpool
   uberblock_update:entry 12:53 75007 122 76456
  uberblock_update:return 12:53 85 0
          spa_sync:return 12:53 sched
      bdev_strategy:start 12:53 sched 234915252 1536
      bdev_strategy:start 12:53 sched 234915256 7168
.
... some more IO of various blocks and sizes
.
      bdev_strategy:start 12:53 sched 379625154 1536
      bdev_strategy:start 12:53 sched 379625160 4608
   uberblock_update:entry 12:53 41537 159 41537       <- updating rpool. txg == blk_birth!
  uberblock_update:return 12:53 85 1                  <- args[1]=1: sync!
   uberblock_update:entry 12:53 69702 178 69702
  uberblock_update:return 12:53 85 1                  <- args[1]=1: sync!
vdev_uberblock_sync:entry 12:53 sched
      bdev_strategy:start 12:53 sched 396 1024
      bdev_strategy:start 12:53 sched 908 1024
      bdev_strategy:start 12:53 sched 390704524 1024
      bdev_strategy:start 12:53 sched 390705036 1024
          spa_sync:return 12:53 sched
vdev_uberblock_sync:entry 12:54 sched
      bdev_strategy:start 12:54 sched 386 1024        <- write to next block, #386
bdev_strategy:start 12:54 sched 898 1024 bdev_strategy:start 12:54 sched 39005058 1024 bdev_strategy:start 12:54 sched 39005570 1024 spa_sync:return 12:54 sched (wrote file in dpool-2, with :w in vi) 0 6668 bdev_strategy:start 2009 Jan 9 19:13:08 sched 234913600 20480 (done writing) spa_sync:entry 13:23 sched 41538 <- spa_sync() rpool uberblock_update:entry 13:23 41537 159 41538 <- updating rpool. txg != blk_birth! uberblock_update:return 13:23 85 0 <- args[1]=0: no sync! skip writing to block #388!
spa_sync:return 13:23 sched spa_sync:entry 13:23 sched 69703 <- spa_sync() dpool-2 bdev_strategy:start 13:23 sched 234915299 512 bdev_strategy:start 13:23 sched 306224077 512 bdev_strategy:start 13:23 sched 234915300 512 bdev_strategy:start 13:23 sched 234915302 1024 spa_sync:entry 13:23 sched 76457 <- spa_sync() dpool uberblock_update:entry 13:23 75007 122 76457 uberblock_update:return 13:23 85 0 spa_sync:return 13:23 sched bdev_strategy:start 13:23 sched 234915304 1536 bdev_strategy:start 13:23 sched 234915308 7168 . ... some more IO of various blocks and sizes . bdev_strategy:start 13:23 sched 379625170 6144 bdev_strategy:start 13:23 sched 379625184 6144 uberblock_update:entry 13:23 69703 178 69703 <- updating dpool-2. txg == blk_birth! uberblock_update:return 13:23 85 1 <- args[1]=1: sync! vdev_uberblock_sync:entry 13:23 sched bdev_strategy:start 13:23 sched 398 1024 bdev_strategy:start 13:23 sched 910 1024 bdev_strategy:start 13:23 sched 390704526 1024 bdev_strategy:start 13:23 sched 390705038 1024 spa_sync:return 13:23 sched (rm file in dpool-2)

(done rm) spa_sync:entry 13:53 sched 69704 <- spa_sync() dpool-2 bdev_strategy:start 13:53 sched 234915348 512 bdev_strategy:start 13:53 sched 306224124 512 bdev_strategy:start 13:53 sched 234915364 1024 bdev_strategy:start 13:53 sched 306224140 1024 spa_sync:entry 13:53 sched 76458 <- spa_sync() dpool uberblock_update:entry 13:53 75007 122 76458 uberblock_update:return 13:53 85 0 spa_sync:return 13:53 sched spa_sync:entry 13:53 sched 41539 <- spa_sync() rpool bdev_strategy:start 13:53 sched 25571606 15360 bdev_strategy:start 13:53 sched 25571642 1024 . ... some more IO of various blocks and sizes . bdev_strategy:start 13:53 sched 379625196 6144 bdev_strategy:start 13:53 sched 379625210 6144 uberblock_update:entry 13:53 69704 178 69704 <- updating dpool-2. txg == blk_birth! uberblock_update:return 13:53 85 1 <- args[1]=1: sync! bdev_strategy:start 13:53 sched 25671818 2560 bdev_strategy:start 13:53 sched 4310632 2560 . ... some more IO of various blocks and sizes . bdev_strategy:start 13:53 sched 25671823 3584 bdev_strategy:start 13:53 sched 25682892 2048 uberblock_update:entry 13:53 41539 159 41539 <- updating rpool. txg == blk_birth! uberblock_update:return 13:53 85 1 <- args[1]=1: sync! vdev_uberblock_sync:entry 13:53 sched bdev_strategy:start 13:53 sched 400 1024 bdev_strategy:start 13:53 sched 912 1024 bdev_strategy:start 13:53 sched 390704528 1024 bdev_strategy:start 13:53 sched 390705040 1024 spa_sync:return 13:53 sched vdev_uberblock_sync:entry 13:54 sched bdev_strategy:start 13:54 sched 390 1024 <- write to block #390
bdev_strategy:start 13:54 sched 902 1024 bdev_strategy:start 13:54 sched 39005062 1024 bdev_strategy:start 13:54 sched 39005574 1024 spa_sync:return 13:54 sched spa_sync:entry 14:23 sched 41540 <- spa_sync() rpool uberblock_update:entry 14:23 41539 159 41540 <- updating rpool. txg != blk_birth! uberblock_update:return 14:23 85 0 <- args[1]=0: no sync! skip writing to block #392!
spa_sync:return 14:23 sched spa_sync:entry 14:23 sched 69705 <- spa_sync() dpool-2 uberblock_update:entry 14:23 69704 178 69705 <- updating dpool-2. txg != blk_birth! uberblock_update:return 14:23 85 0 <- args[1]=0: no sync! spa_sync:return 14:23 sched spa_sync:entry 14:23 sched 76459 <- spa_sync() dpool uberblock_update:entry 14:23 75007 122 76459 <- updating dpool. txg != blk_birth! uberblock_update:return 14:23 85 0 <- args[1]=0: no sync! spa_sync:return 14:23 sched spa_sync:entry 14:53 sched 76460 <- spa_sync() dpool uberblock_update:entry 14:53 75007 122 76460 <- updating dpool. txg != blk_birth! uberblock_update:return 14:53 85 0 <- args[1]=0: no sync! spa_sync:return 14:53 sched spa_sync:entry 14:53 sched 69706 <- spa_sync() dpool-2 uberblock_update:entry 14:53 69704 178 69706 <- updating dpool-2. txg != blk_birth! uberblock_update:return 14:53 85 0 <- args[1]=0: no sync! spa_sync:return 14:53 sched spa_sync:entry 14:53 sched 41541 <- spa_sync() rpool uberblock_update:entry 14:53 41539 159 41541 <- updating rpool. txg != blk_birth! uberblock_update:return 14:53 85 0 <- args[1]=0: no sync! skip writing to block #394!
spa_sync:return 14:53 sched spa_sync:entry 15:23 sched 41542 <- spa_sync() rpool uberblock_update:entry 15:23 41539 159 41542 <- updating rpool. txg != blk_birth! uberblock_update:return 15:23 85 0 <- args[1]=0: no sync! skip writing to block #396!
spa_sync:return 15:23 sched spa_sync:entry 15:23 sched 69707 <- spa_sync() dpool-2 uberblock_update:entry 15:23 69704 178 69707 <- updating dpool-2. txg != blk_birth! uberblock_update:return 15:23 85 0 <- args[1]=0: no sync! spa_sync:return 15:23 sched spa_sync:entry 15:23 sched 76461 <- spa_sync() dpool uberblock_update:entry 15:23 75007 122 76461 <- updating dpool. txg != blk_birth! uberblock_update:return 15:23 85 0 <- args[1]=0: no sync! spa_sync:return 15:23 sched (created new file in rpool, using the touch command)

(done creating file)
spa_sync:entry 15:53 sched 76462 <- spa_sync() dpool uberblock_update:entry 15:53 75007 122 76462 <- updating dpool. txg != blk_birth! uberblock_update:return 15:53 85 0 <- args[1]=0: no sync! spa_sync:return 15:53 sched spa_sync:entry 15:53 sched 69708 <- spa_sync() dpool-2 uberblock_update:entry 15:53 69704 178 69708 <- updating dpool-2. txg != blk_birth! uberblock_update:return 15:53 85 0 <- args[1]=0: no sync! spa_sync:return 15:53 sched spa_sync:entry 15:53 sched 41543 <- spa_sync() rpool bdev_strategy:start 15:53 sched 25571658 15360 bdev_strategy:start 15:53 sched 25571688 1024 . ... some more IO of various blocks and sizes . bdev_strategy:start 15:53 sched 25671855 4096 bdev_strategy:start 15:53 sched 25682908 2048 uberblock_update:entry 15:53 41543 159 41543 <- updating rpool. txg == blk_birth! uberblock_update:return 15:53 85 1 <- args[1]=1: sync! vdev_uberblock_sync:entry 15:53 sched bdev_strategy:start 15:53 sched 398 1024 <- write to block #398! bdev_strategy:start 15:53 sched 910 1024 bdev_strategy:start 15:53 sched 39005070 1024 bdev_strategy:start 15:53 sched 39005582 1024 spa_sync:return 15:54 sched
I think this is a very nice example of what you can find out with DTrace. You may ask: why did the rpool graph at the top of this blog entry show an update in rpool two times per minute, whereas the DTrace output shows updates much less often (which, by the way, would further extend the life of the USB flash drive)? The reason is that the output of the zdb script was written to a file in /var/tmp every minute, so rpool changed more often while the graph data was being collected than it did during the run of the DTrace script!


Tuesday Jan 06, 2009

ZFS and the uberblock

Inspired by Constantin's comment about USB sticks wearing out, posted on Matthias's blog entry about an eco-friendly home server, I tried to find out more about how, and how often, the ZFS uberblock is written.

Using DTrace, it's not that difficult:

We start by finding out which DTrace probes exist for the uberblock:

$ dtrace -l | grep -i uberblock
31726        fbt               zfs            vdev_uberblock_compare entry
31727        fbt               zfs            vdev_uberblock_compare return
31728        fbt               zfs          vdev_uberblock_load_done entry
31729        fbt               zfs          vdev_uberblock_load_done return
31730        fbt               zfs          vdev_uberblock_sync_done entry
31731        fbt               zfs          vdev_uberblock_sync_done return
31732        fbt               zfs               vdev_uberblock_sync entry
31733        fbt               zfs               vdev_uberblock_sync return
34304        fbt               zfs          vdev_uberblock_sync_list entry
34305        fbt               zfs          vdev_uberblock_sync_list return
34404        fbt               zfs                  uberblock_update entry
34405        fbt               zfs                  uberblock_update return
34408        fbt               zfs                  uberblock_verify entry
34409        fbt               zfs                  uberblock_verify return
34416        fbt               zfs               vdev_uberblock_load entry
34417        fbt               zfs               vdev_uberblock_load return

So there are two probes on uberblock_update: fbt:zfs:uberblock_update:entry and fbt:zfs:uberblock_update:return!

Now we can find out more by searching the OpenSolaris sources: when searching for the definition of uberblock_update in project onnv, we find one hit at line 49 of file uberblock.c, and when clicking on it, we see:

source extract: line 49 of file uberblock.c

Now, when searching again for the definitions of the first two arguments of uberblock_update (args[0] and args[1], which are of type uberblock and vdev), we get:

For uberblock, the following hits are shown:

When clicking on the link on the definition of struct uberblock (around line 53 in file uberblock_impl.h), we get:

For the members of struct vdev, it's not that easy. First, we get a long hit list when searching for the definition of vdev in the source browser. But if we search for "struct vdev" in that list, using the browser's search function, we get:


When clicking on the definition of struct vdev (around line 108 in file vdev_impl.h), we can see all the members of this structure.

Here are all the links, plus one more for struct blkptr (a member of struct uberblock), again in one place:

Now we are prepared to access the data via DTrace, by printing the arguments and members as in the following example:

printf ("%d %d %d", args[0]->ub_timestamp, args[1]->vdev_id, args[2]);

So here is a sample final DTrace script that prints as much information as possible on each uberblock_update event, and also prints any relevant I/O (the hope being that by showing both at the same time, we can see where and how often the uberblocks are written):

io:genunix:default_physio:start,
io:genunix:bdev_strategy:start,
io:genunix:biodone:done
{
   printf ("%d %s %d %d", timestamp, execname,
     args[0]->b_blkno, args[0]->b_bcount);
}

fbt:zfs:uberblock_update:entry
{
   printf ("%d %s, %d, %d, %d, %d", timestamp, execname,
     pid, args[0]->ub_rootbp.blk_prop, args[1]->vdev_asize, args[2]);
}

The lines for showing the I/O are derived from the DTrace scripts for I/O analysis in the DTrace Toolkit.

Although I was unable to print members of struct vdev (the second argument to uberblock_update()) with the fbt:zfs:uberblock_update:entry probe (I also tried fbt:zfs:uberblock_update:return but ran into other problems with that one), the results of running this script with
$ dtrace -s zfs-uberblock-report-02.d
are quite interesting. Here's an extract (long lines shortened):

  0  33280  uberblock_update:entry 102523281435514 sched, 0, 922..345, 0, 21005
  0   5510     bdev_strategy:start 102523490757174 sched 282 1024
  0   5510     bdev_strategy:start 102523490840779 sched 794 1024
  0   5510     bdev_strategy:start 102523490873844 sched 18493722 1024
  0   5510     bdev_strategy:start 102523490903928 sched 18494234 1024
  0   5498            biodone:done 102523491215729 sched 282 1024
  0   5498            biodone:done 102523491576878 sched 794 1024
  0   5498            biodone:done 102523491873015 sched 18493722 1024
  0   5498            biodone:done 102523492232464 sched 18494234 1024
...
  0  33280  uberblock_update:entry 102553280316974 sched, 0, 922..345, 0, 21006
  0   5510     bdev_strategy:start 102553910907205 sched 284 1024
  0   5510     bdev_strategy:start 102553910989248 sched 796 1024
  0   5510     bdev_strategy:start 102553911022603 sched 18493724 1024
  0   5510     bdev_strategy:start 102553911052733 sched 18494236 1024
  0   5498            biodone:done 102553911344640 sched 284 1024
  0   5498            biodone:done 102553911623733 sched 796 1024
  0   5498            biodone:done 102553911981236 sched 18493724 1024
  0   5498            biodone:done 102553912250614 sched 18494236 1024
...
  0  33280  uberblock_update:entry 102583279275573 sched, 0, 922..345, 0, 21007
  0   5510     bdev_strategy:start 102583540376459 sched 286 1024
  0   5510     bdev_strategy:start 102583540459265 sched 798 1024
  0   5510     bdev_strategy:start 102583540492968 sched 18493726 1024
  0   5510     bdev_strategy:start 102583540522840 sched 18494238 1024
  0   5498            biodone:done 102583540814677 sched 286 1024
  0   5498            biodone:done 102583541091636 sched 798 1024
  0   5498            biodone:done 102583541406962 sched 18493726 1024
  0   5498            biodone:done 102583541743494 sched 18494238 1024

Using the following (n)awk one-liners:

$ nawk '/uberblock/{print}' zfs-ub-report-02.d.out
$ nawk '/uberblock/{a=0}{a++;if ((a==2)){print}}' zfs-ub-report-02.d.out
$ nawk '/uberblock/{a=0}{a++;if ((a>=1)&&(a<=5)){print}}' zfs-ub-report-02.d.out
, we can print:
  • only the uberblock_update lines, or
  • just the next line after the line that matches the uberblock_update entry, or
  • the uberblock_update entry line itself plus the 4 lines that follow it.

When running the script for a while and capturing its output, we can later analyze at which block number the first block after uberblock_update() is written, and we can see that the numbers are always even, the lowest number is 256 and the highest number is 510, with a block size of 1024. Those block numbers always go from 256, 258, 260, and so forth, until they reach 510. Then, they start with 256 again. So every (510-256)/2+1 = 128th iteration (yes, it's one more, as we have to include the first element after subtracting the first from the last element), the first block is overwritten again. The same is true for blocks 768...1022, 18493696...18493950 and 18494208...18494462 (the third and fourth block ranges should be different for different zpool sizes).
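The slot arithmetic above can be double-checked with a quick shell calculation (block numbers taken from the captured output):

```shell
# Uberblock ring as observed above: 1 KB uberblocks land at even
# 512-byte block numbers from 256 to 510, so consecutive writes are
# 2 blocks apart.
first=256
last=510
step=2
slots=$(( (last - first) / step + 1 ))
echo "$slots slots per ring"   # prints "128 slots per ring"
```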

Now that we understand how and in which order the uberblocks are written, we are prepared to examine after how many days the uberblock area of a USB stick without wear leveling would probably be worn out. More on that and how we can use zdb for that, in my next blog entry.
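As a teaser for that calculation, here is a back-of-envelope sketch. The assumptions are mine, not measurements: one uberblock update per 30-second txg sync (matching the trace above), the 128-slot rotation just derived, and 10,000 erase cycles per flash block, a typical figure for cheap NAND without wear leveling:

```shell
# Worst-case wear estimate for one uberblock slot (assumed figures,
# see the lead-in; real numbers depend on the actual flash chip).
cycles=10000
sync_interval=30                               # seconds between txg syncs
updates_per_day=$(( 86400 / sync_interval ))   # 2880 updates per day
days=$(( cycles * 128 / updates_per_day ))     # each slot is hit every 128th update
echo "about $days days"                        # prints "about 444 days"
```

In practice the slots would last longer, since the trace above shows that syncs with an unchanged uberblock skip the write entirely.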


Monday Jan 05, 2009

A compact primer on RBAC

A nice and compact primer on RBAC (Role-Based Access Control), with some examples for a quick start, is available here.

Friday Jan 02, 2009

ZFS on external disks: Why free disk space might not increase when deleting files...

Interesting: After copying about 30,000 files from an internal disk (OpenSolaris 2008.11) to a new directory on an external USB disk, and later removing a similar number of files, the free disk space as reported by zpool list or in the "Available" column of df -k did not increase! And the df -k output showed that the total number of blocks of the only file system on that disk had decreased! What had happened?

Here's the explanation:

For backup purposes, I wanted to use an external 2.5" USB disk with a capacity of about 186GB. On that USB disk, I had created a zpool, using:
$ zpool create -f dpool-2 c5t0d0

For a first backup, I copied the mentioned 30,000 files, with a total size of about 75GB, from one system to a USB disk connected to another server (also running OpenSolaris 2008.11) via the network. I used a simple scp -pr, which preserves modes and times but not uids and gids. One possible procedure for preserving file ownership as well, as mentioned here, for example, would be:
cd source_dir; tar -cf - .|ssh user@targethost "cd target_dir; tar -xvf -".

Anyway, I thought it would be a good idea to export that USB disk and import it on another server (also on OpenSolaris 2008.11) to check if there were any problems. I also wanted to run a
$ zpool scrub dpool-2
to verify the data integrity. Everything went smoothly.

OK. So I exported the zpool again, connected the disk to the server on which the original data is located, imported the zpool and started copying the same data which I had copied before (via the network) locally, using
$ tar -cf - . | ( cd /dpool-2; tar -xpf - )
(run from within the source directory; note the "." that tells tar what to copy)

As this finished without errors as well, and a comparison using /usr/bin/diff or the default /usr/gnu/bin/diff only showed a problem with one of the files that had an interesting character in its file name (the problem went away after changing LANG from en_US.UTF-8 to C), I decided to remove the old 30,000 files (the ones which I had copied via the network). I had removed files on ZFS before, using rm -rf, and it always was amazingly fast, compared to PCFS or UFS. But this time, removing those files took quite a while, with some disk activity.

After all the "old" files were removed, I discovered the following:

  • Free space as in df -k or zpool list dropped from 112GB to 36GB!
  • Used space as in zpool list increased from 75GB to 150GB!
  • df -k reported a total size of the only file system on that zpool of just 108GB, compared to the 183GB reported before!

The solution was to execute the following command:
$ zfs list -t snapshot
The output showed several snapshots, taken while the disk was connected to the second server (on which I had set up Time Slider for creating automated snapshots). When I connected the disk to the final server, the snapshots were still there, and they remained valid when files were removed, so the deleted files could still have been restored from them. The easy way to reclaim the free space was therefore to remove all those snapshots, using zfs destroy, as in the following example:
$ zfs destroy dpool-2@zfs-auto-snap:frequent-2009-01-02-18:45
After the final snapshot for dpool-2 was removed, things showed up in df -k and zpool list as expected. "Problem" solved!
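When there are many auto-snapshots, destroying them one by one gets tedious; a pipeline like the following might help. This is a dry-run sketch with made-up snapshot names: the printf stands in for the output of zfs list -H -t snapshot -o name, and the echo only prints the destroy commands. Replace the printf with the real zfs list and drop the echo to run it against an actual pool:

```shell
# Dry run: list snapshots (simulated), keep only dpool-2 auto-snapshots,
# and print the zfs destroy command for each one.
printf '%s\n' \
  'dpool-2@zfs-auto-snap:frequent-2009-01-02-18:45' \
  'dpool-2@zfs-auto-snap:hourly-2009-01-02-18:00' \
  'rpool@some-manual-snap' \
| grep '^dpool-2@zfs-auto-snap' \
| while read snap; do
    echo "zfs destroy $snap"   # drop the echo to actually destroy
  done
```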

Wednesday Dec 31, 2008

A great start into the new year 2009!

I am sure you agree - at least after watching the video on Brendan's blog entry about I/O performance analysis on the Sun Storage 7000 series. The direct link is here.

Happy New Year to everyone!

Tuesday Dec 30, 2008

Multibooting OpenSolaris 2008.11, Kubuntu 8.10, and Windows XP (one disk)

A while after I published my blog entries on Multibooting Solaris and Windows XP and on Multibooting Solaris, Ubuntu, and Windows XP, I thought it was time for some instructions on multibooting OpenSolaris 2008.11, Kubuntu 8.10, and Windows XP.

On my laptop, a Fujitsu-Siemens S6120D (Pentium M 1.4 GHz, 512 MB RAM, 40 GB IDE disk), I wanted to replace my old Solaris Nevada installation by the new OpenSolaris 2008.11, and then also an old Ubuntu 6.10 by Kubuntu 8.10.

After the OpenSolaris installation, I could boot into OpenSolaris and Windows XP just fine. But when I installed Kubuntu (using ext3 as the / filesystem) with the default install options, the grub that had been installed by OpenSolaris was overwritten by Kubuntu's grub. Even after adding the old OpenSolaris entries to the new grub, booting into OpenSolaris failed. After installing OpenSolaris again, that problem was solved, but I could not boot into Kubuntu! The following message was shown on screen:

  Booting 'Ubuntu 8.10, kernel 2.6.27-7-generic'

root            (hd0,6)
 Filesystem type is ext2fs, partition type 0x83
kernel          /boot/vmlinuz-2.6.27-7-generic root=UUID=69585a5b-ba20-43da-984c-148c40f82a7e ro quiet splash

Error 2: Bad file or directory type

Press any key to continue...

Apparently, neither of the two grub installations could read the root filesystem of its "opposite" operating system. The solution for this problem in my case was to chainload the Linux grub from the Solaris grub. I used the following procedure after installing OpenSolaris:

  1. Live boot from the Kubuntu CD and install, or use the "install to disk" menu item (2nd menu item in Kubuntu's entry screen).
  2. Configure all necessary settings, as in the following screen shots:

    Start by choosing the installer's language:



    Select a city close to your desired time zone:



    Select a keyboard layout:



    In the Prepare disk space screen, a proposal like the following might be shown:



    Select the manual partitioning feature to overwrite the old Linux partition:



    The Prepare partitions screen is shown, for example:



    Customize that screen according to your needs, like the following:



    Enter data into all required fields of the Who are you? screen:



    The Ready to install screen is shown. Please DO NOT CLICK on the Install button at this point in time. Instead, click on Advanced:


    This will show up the following screen:


    Change the default Device for boot loader installation to the device name of the Linux partition as shown in the partitioning screen (in my case /dev/sda7). This is to install the boot manager not in the master boot record but in the partition in which Kubuntu is installed.
    Press OK to continue, and then Install in the screen that was shown before (Step 6 of 6). This will install Kubuntu.
  3. After the installation has finished, reboot into OpenSolaris and change file
    /rpool/boot/grub/menu.lst (yes, that's the location of menu.lst in OpenSolaris 2008.11) as follows:
    Add the following lines to boot from the OpenSolaris grub into the Ubuntu grub:
    title Kubuntu 8.10
    root (hd0,6)
    chainloader +1
    Note that (hd0,6) corresponds to partition number 7 of the first disk (=device /dev/sda7 in my case).
  4. Reboot. You should now be able to boot into Kubuntu's grub and from there into the various menu items configured during the Kubuntu installation (Kubuntu, Kubuntu failsafe/single user, memtest, and even Windows XP).
If it doesn't work (e.g. if you installed the Kubuntu grub boot manager to the master boot record), you do not have to install Solaris again. Just do the following to overwrite the Kubuntu grub by the Solaris grub:
  1. Boot from the OpenSolaris 2008.11 CD.
  2. After booting has finished, open a terminal window and log in as root (password is "opensolaris").
  3. Run the following command:
    $ format
    Notice the device name (e.g. c3t0d0). Install the Solaris grub into the master boot record, using the "-m" option of the installgrub command. For the last argument, add s0 to the device name: /usr/sbin/installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c3t0d0s0
    This will install the boot manager into the master boot record.
  4. Reboot. You should now be able to boot into Solaris or Windows.
  5. Boot into Solaris and add the Kubuntu lines as mentioned above.
  6. If the Kubuntu grub had not been installed into the Kubuntu partition before, you might be able to do this from the Kubuntu live CD or a Linux rescue CD (I haven't tried that yet). If not, install Kubuntu again using the steps described in the first part of this blog entry.

Some remarks:

This is what the two FAT partitions look like after mounting them in OpenSolaris:

/dev/dsk/c3d0p0:1        55958     34547     21411  62% /p/sda1
/dev/dsk/c3d0p0:2      8481752   7093052   1388700  84% /p/sda8

These are the /etc/vfstab entries for mounting the two FAT partitions from OpenSolaris:

/dev/dsk/c3d0p0:1  /dev/rdsk/c3d0p0:1  /p/sda1  pcfs  2  yes  -
/dev/dsk/c3d0p0:2  /dev/rdsk/c3d0p0:2  /p/sda8  pcfs  2  yes  -

This is what my partition table looks like, from a Linux fdisk /dev/sda point of view:

Disk /dev/sda: 40.0 GB, 40007761920 bytes
255 heads, 63 sectors/track, 4864 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xbb88bb88

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1           7       56196    6  FAT16
/dev/sda2               8        1224     9775552+  bf  Solaris
/dev/sda3            1225        4864    29238300    5  Extended
/dev/sda5            1225        3100    15068938+   7  HPFS/NTFS
/dev/sda6            3101        3164      514048+  82  Linux swap / Solaris
/dev/sda7            3165        3806     5156833+  83  Linux
/dev/sda8            3807        4864     8498353+   b  W95 FAT32

Please back up all your files before installing a new OS to your system! And as always in my blog, no guarantee, warranty, or similar, for anything! This is just a (hopefully error-free) documentation of how things worked for me. Maybe you'll find it useful.

Thursday Dec 25, 2008

OpenSolaris 2008.11: useful packages to install

Just a list of OpenSolaris packages I often install (using the pkg install command or the Package Manager, or the "Add More Software" icon) after OpenSolaris 2008.11 installation:

Package name              Contents
openoffice                OpenOffice
SUNWgnome-img-editor      GIMP, the image manipulation program
SUNWvncviewer             VNC Viewer
SUNWtvnc                  Tight VNC
SUNWcurl                  Curl
SUNWgnome-config-editor   The Gnome configuration editor
SUNWj6dev                 Java 6 SDK
sunstudioexpress          Sun Studio (Compiler and Tools)
SUNWgcc                   GNU Compiler
SUNWgnome-common-devel    Gnome Development Tools
SUNWapch22                Apache 2.2 web server
amp                       AMP (Apache, MySQL, PHP) development kit
amp-dev                   AMP development cluster
SUNWgnu-mc                GNU Midnight Commander
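The whole table can be installed in one go. This is a dry-run sketch (the echo only prints the command; on a real OpenSolaris 2008.11 system, drop the echo and prefix with pfexec if you are not root):

```shell
# Package names taken from the table above.
pkgs="openoffice SUNWgnome-img-editor SUNWvncviewer SUNWtvnc SUNWcurl \
SUNWgnome-config-editor SUNWj6dev sunstudioexpress SUNWgcc \
SUNWgnome-common-devel SUNWapch22 amp amp-dev SUNWgnu-mc"
echo pkg install $pkgs
```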

For unattended zones installation, some more packages should be installed if the zone locale is not "C": This is done in file /usr/lib/brand/ipkg/pkgcreatezone. I added the following lines after line 179:

#
# Get some locale and language packages:
#
pkglist="$pkglist SUNWloc SUNWlang-enUS"

Multiple versions of Firefox packages on one Solaris system

Up to now, in cases where I had to use multiple versions of Firefox on the same Solaris system, I installed the latest version with pkgadd, and the other versions from the tar archives, for example into /usr/local. This is because the Solaris contributed builds are always published under the same package name (which is probably OK for most users).

But it is also possible to install multiple versions of Firefox (3.0.5, 3.0.4, 2.0.0.19, and so forth) on the same system, after extracting and then changing each package before installation. Here's how it works:

  1. Expand the pkg.bz2 file to pkg, using bzip2 -dc.
  2. Create a temporary directory where the package content will be stored.
  3. Using the -s pkgadd option, spool (extract) the package into this directory.
  4. In the spool directory, rename the old Firefox directory to a new name (for example firefox to firefox-305).
  5. Change the pkginfo file to the new package name (e.g. MOZfirefox to MOZfirefox-305).
  6. Change the pkgmap file to match the new directory name and the new size and checksum of the pkginfo file.
  7. Install the renamed package, using pkgadd -d . MOZfirefox-305, for example.
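The steps above might be scripted roughly like the following sketch. All names and the pkgmap line format are examples, not my actual script, and the fake spooled package stands in for what pkgadd -s would produce (steps 1-3 on a real package would be bzip2 -dc firefox.pkg.bz2 > firefox.pkg; pkgadd -s /tmp/spool -d firefox.pkg MOZfirefox):

```shell
#!/bin/sh
# Sketch of steps 4-6 on an already-spooled package (hypothetical names).
set -e
SPOOL=/tmp/spool.$$
OLD=MOZfirefox
NEW=MOZfirefox-305

# Fake spooled package so the sketch runs anywhere; with a real package,
# this directory tree is what "pkgadd -s" creates.
mkdir -p "$SPOOL/$OLD/reloc/firefox"
printf 'PKG=%s\nNAME=firefox 3.0.5 for Solaris\nVERSION=3.0.5\n' "$OLD" \
  > "$SPOOL/$OLD/pkginfo"
printf '1 i pkginfo 0 0\n1 d none firefox 0755 root bin\n' \
  > "$SPOOL/$OLD/pkgmap"

cd "$SPOOL"
mv "$OLD" "$NEW"                                  # spool dir follows pkg name
mv "$NEW/reloc/firefox" "$NEW/reloc/firefox-305"  # step 4: new install dir
sed "s/^PKG=$OLD\$/PKG=$NEW/" "$NEW/pkginfo" > "$NEW/pkginfo.tmp" &&
  mv "$NEW/pkginfo.tmp" "$NEW/pkginfo"            # step 5: new package name
# Step 6: pkgmap must reference the new directory name, and its pkginfo
# entry must carry the file's new size and checksum (cksum is used here
# for portability; on Solaris, use the checksum tool pkgmap expects).
size=$(wc -c < "$NEW/pkginfo" | tr -d ' ')
csum=$(cksum "$NEW/pkginfo" | awk '{print $1}')
sed -e "s/^1 d none firefox /1 d none firefox-305 /" \
    -e "s/^1 i pkginfo .*/1 i pkginfo $size $csum/" \
    "$NEW/pkgmap" > "$NEW/pkgmap.tmp" && mv "$NEW/pkgmap.tmp" "$NEW/pkgmap"

grep '^PKG=' "$NEW/pkginfo"   # prints PKG=MOZfirefox-305
# step 7 would then be: pkgadd -d "$SPOOL" "$NEW"
```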
Here's what it looks like after the installation:
/tmp# ls -ald extr*
drwxr-xr-x 3 root root 188 2008-12-23 22:11 extracted-firefox-20081223-221103
drwxr-xr-x 3 root root 188 2008-12-23 22:15 extracted-firefox-20081223-221529
drwxr-xr-x 3 root root 190 2008-12-24 12:02 extracted-firefox-20081224-120152
/opt/sfw/lib# pkginfo | grep MOZ
application MOZfirefox-20019                firefox 2.0.0.19 for Solaris 10
application MOZfirefox-304                  firefox 3.0.4 for Solaris 11
application MOZfirefox-305                  firefox 3.0.5 for Solaris 11
/opt/sfw/lib# ls -l
drwxr-xr-x 13 root bin 46 2008-12-24 12:02 firefox-20019
drwxr-xr-x 13 root bin 47 2008-12-23 22:15 firefox-304
drwxr-xr-x 13 root bin 47 2008-12-23 22:15 firefox-305

I wrote a script which performs all these steps (except installation). I have tested it on various Solaris releases from Solaris 10 to OpenSolaris. So far, I have not seen any errors.

As always: No guarantee or similar. Please back up your files before using the script.

Monday Dec 22, 2008

OpenSolaris 2008.11: More on unattended zones installation and the sysidcfg file

After my posting about a new zone for a web server, I did some more tests with the sysidcfg file and found that the following entries work (thanks, Jerry, for sending me a simple sample sysidcfg file):

system_locale=C
works fine.

system_locale=en_US.UTF-8
works after I added the following to file /usr/lib/brand/ipkg/pkgcreatezone in the global zone (after line 179) before starting the installation of the zone:

#
# Get some locale and language packages:
#
pkglist="$pkglist SUNWloc SUNWlang-enUS"

I found out the packages by looking into /usr/share/locale and /usr/lib/locale and searching for locales using (for example):
$ pkg search en_US.UTF-8

An example of a package that contains several languages is: SUNWinstall-l10n.


root_password=(short password string from Solaris 10's /etc/shadow file)
works fine. The old password string will be put into the new /etc/shadow file of the 2008.11 zone but can later be changed from the command line (using the passwd command) and will then be replaced by a long string.


keyboard=...
is apparently not needed; the terminal and keyboard work fine for me even without that parameter.


name_service=NONE
works fine.

name_service=DNS {domain_name=domain.TLD
   name_server=DNS_IP_ADDR_1,DNS_IP_ADDR_2
   search=domain.TLD
}
works fine as well. In case the system cannot verify some of the information, it displays just one confirmation screen (as usual when configuring the same settings by hand). After choosing "No" (to not enter the information again) and pressing <esc>-2, the rest of the sysidcfg configuration remains intact and is not asked for again; shortly afterwards, the login prompt is displayed.


terminal=xterms
works fine.
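Putting the entries that worked together, a complete sysidcfg for such a zone might look like the following. Only the keywords discussed above are included; the domain name and server addresses are placeholders:

```
system_locale=en_US.UTF-8
terminal=xterms
root_password=(encrypted password string from /etc/shadow)
name_service=DNS {domain_name=domain.TLD
   name_server=DNS_IP_ADDR_1,DNS_IP_ADDR_2
   search=domain.TLD
}
```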

Sunday Dec 21, 2008

Building an energy efficient and virus-free office PC with OpenSolaris 2008.11

I chose the Intel D945GCLF board with an Intel Atom processor 230 with a TDP of just 4W. The northbridge/GPU apparently needs more power, so it is cooled with a fan whereas the CPU does not need one. Unfortunately, the fan is really loud. The following picture shows the original configuration with an additional case fan for CPU and RAM:

I looked at various northbridge coolers (for example the Zalman ZM-NBF47, mentioned in this blog entry by Matthias Pfützner) but then found a much cheaper and very elegant solution: I just removed the existing northbridge/GPU fan and installed a standard 80mm quiet fan (about 1000/min) on top of the cooler, using just one of the two cooler holes by mounting the fan with a 3mm screw and nut:

I also added a small part of a plastic packaging to the fan so more air would flow downwards towards the CPU cooler (this reduced the CPU temperature by 2 to 3 degrees Celsius):

The whole PC with 2 GB RAM and DVD-ROM drive needs just 30W which is OK, I think.

For the OpenSolaris 2008.11 installation, my first attempt was to install it on an external 200GB USB drive. But contrary to my installation experience on my Intel D201GLY2A (see my other blog entry), this time the USB ports got renamed on every cold boot, so I had to try all USB ports until the disk drive was found. That is no fun at all if your USB disk has two connectors (an additional one for extra power). The good thing was that the power supply of the case had a spare SATA power outlet. And as the D945GCLF mainboard has two SATA connectors, and my external USB enclosure contained a SATA-type disk drive, I could remove the disk from its USB case and install it inside the computer case. Since the case only has 5.25" and 3.5" drive bays, I attached an additional u-shaped sheet of metal in the 5.25"/3.5" drive bay so that the 2.5" disk is supported from below and has a solid, flat metal surface to lie on; the two mounting screws on just one side of the disk are then sufficient to keep it in place:



Installation went smoothly from the OpenSolaris 2008.11 CD. Network and sound worked right from the start. So far, I have only had a problem with printing from Firefox: the fonts were either coarse pixel fonts, or font hinting was not applied (the spacing between characters did not look correct). I have not yet analyzed that problem further.
