Wednesday Jan 28, 2009

Cardiff System Administration Mash up

A big thank you to Clive for arranging, and to Cardiff University for hosting, the System Admin Mash up today. I was particularly pleased to have 100% of the Cardiff OpenSolaris User Group present and to hear of people exploring COMSTAR on Thumpers to produce high performance OpenStorage at a very affordable price.

Lewis gave an excellent overview of the VSCAN & CIFS server with some brave demonstrations using VirtualBox to host servers and clients. If you can persuade him to repeat it then do so.

However good that was, for me it was not the highlight of the day. That goes to Gwent Police Crime Forensics Unit, who gave an excellent and informative presentation about the challenges and successes of investigating issues around computer forensics. As storage devices get bigger the problems can only increase.

Tuesday Jan 27, 2009

New version of scsi.d required for build 106

This version supports some more filters. Specifically you can now specify these new options:

  • MIN_BLOCK only report on IO to blocks greater than or equal to this value.

  • MAX_BLOCK only report on IO to blocks less than or equal to this value.

This is most useful for limiting your trace to particular block ranges, be they a file system or, as in the case that caused me to add this, the disk label, to see who is trampling on it.

In this contrived example it was format:

pfexec /usr/sbin/dtrace -o /tmp/dt.$$ -Cs  scsi.d -D MAX_BLOCK=3 
00058.529467684 glm0:-> 0x0a  WRITE(6) address 01:00, lba 0x000000, len 0x000001, control 0x00 timeout 60 CDBP 60012635560 1 format(3985) cdb(6) 0a0000000100
00058.542945891 glm0:<- 0x0a  WRITE(6) address 01:00, lba 0x000000, len 0x000001, control 0x00 timeout 60 CDBP 60012635560, reason 0x0 (COMPLETED) pkt_state 0x1f state 0x0 Success Time 13604us

While this answered my question there are neater ways of answering the question just by using the IO provider:

: TS 68 $; pfexec /usr/sbin/dtrace -n 'io:::start / args[0]->b_blkno < 3 && args[0]->b_flags & B_WRITE / { printf("%s %s %d %d", execname, args[1]->dev_statname, args[0]->b_blkno, args[0]->b_bcount) }'
dtrace: description 'io:::start ' matched 6 probes
CPU     ID                    FUNCTION:NAME
  0    629             default_physio:start format sd0 0 512
  0    629             default_physio:start format sd0 0 512
  0    629             default_physio:start format sd0 0 512
  0    629             default_physio:start format sd0 0 512
  0    629             default_physio:start format sd0 0 512
  0    629             default_physio:start format sd0 0 512

Also build 106 of Nevada has changed the structure definition for scsi_address, and in doing so breaks scsi.d, which has intimate knowledge of scsi_address structures. I have a solution that you can download, but in writing it I also filed this bug:

679803 dtrace suffers from macro recursion when including scsi_address.h

which scsi.d has to work around. When that bug is resolved the workaround may have to be revisited.

All versions of scsi.d are available here, and this specific version, 1.16, here.

Thank you to Artem Kachitchkine for bringing the changes to scsi_address.h and their effects on scsi.d to my attention.

Thursday Jan 22, 2009

Powerline network control from OpenSolaris

At last I can configure my ethernet over power devices from OpenSolaris. I do have to run Windows XP in VirtualBox in seamless mode with a virtual network adapter but you can hardly tell.

My requirement for Windows on the bare metal has just disappeared.

Wednesday Jan 07, 2009

Access hours for Sun Ray users

Having installed a Sun Ray in my daughter's bedroom I am now faced with the inevitable problem of her being online all night, not getting any sleep and then being generally grumpy. The irony here is that I was sent an email asking how I handle access control to the DTUs and I said I just trusted the children to be sensible (what was I thinking!).

So a solution was required that gave access to the systems only between certain hours. The hours would depend on the user, and the solution would have to not lose all their “work” in case this was a late night finishing-their-homework session.

After asking around no one came back to me and said how it can be done so I wrote my own script. It works by having a file that contains lines with a format


The times are specified in 24 hour format and only accurate to the minute.

# cat /etc/opt/local/access_hours             

The top line is really just for testing, not allowing access from 1900 to 1915. Then you need a user who has system admin privs and who does not have a crontab file. Since I already have a kroot role I'm overloading this. Running the script with the -c flag and the name of the user will write the crontab file. Note it also writes an entry to keep the crontab file up to date on an hourly basis.

# /usr/local/sbin/check_access_hours -c kroot
# crontab -l kroot
46 * * * * /usr/local/sbin/check_access_hours -c kroot
00 19 * * * /usr/local/sbin/check_access_hours user1
00 23 * * * /usr/local/sbin/check_access_hours user2
30 22 * * * /usr/local/sbin/check_access_hours user3
00 20 * * * /usr/local/sbin/check_access_hours user4
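
The core test such a script has to make, whether a given time falls inside a blocked window, can be sketched as follows. This is only an illustration: the function names are hypothetical, the times are assumed to be four-digit HHMM strings, and the real check_access_hours script may well do it differently.

```shell
#!/bin/sh
# Hypothetical helpers; not taken from the real check_access_hours script.

# Convert a four-digit HHMM string to minutes since midnight.
# Prefixing "1" stops a value such as 0700 being read as octal.
to_minutes() {
    hhmm=$((1$1 - 10000))
    echo $(( (hhmm / 100) * 60 + hhmm % 100 ))
}

# Succeed if time $1 lies in the blocked window starting at $2 and
# ending just before $3. A window such as 2300-0700 is treated as
# wrapping past midnight.
in_window() {
    now=$(to_minutes "$1")
    start=$(to_minutes "$2")
    end=$(to_minutes "$3")
    if [ "$start" -le "$end" ]; then
        [ "$now" -ge "$start" ] && [ "$now" -lt "$end" ]
    else
        [ "$now" -ge "$start" ] || [ "$now" -lt "$end" ]
    fi
}
```

With these, something like `in_window "$(date +%H%M)" 1900 1915 && echo blocked` would report whether the test window from the top line of the file is currently in force.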

Finally I added a line to the utaction script that is already run for every user when they connect to a Sun Ray DTU:

if ! /usr/local/sbin/check_access_hours -t 0 $1
then
        exit 1
fi

The way it disallows access is that it adds the DTU's IP address to the ipfilter, which you have to have configured, so that all traffic from the DTU is blocked. It also submits an at(1) job to run 2 minutes in the future to remove the block so that the Sun Ray can burst back into life. The effect is that the user can no longer use any Sun Ray outside of the defined hours. But after about 2 minutes the DTU is usable again by others or indeed as a photo frame.

A word of warning. Having got all this running the system has panicked twice, which is disappointing on one level (that it panics) but pleasing on another (I've found a bug that can now be fixed). The bug is:

6791062: System panic in ip_tcp_input when a rule is added to ipfilter

I look forward to the fix!

The script is here but check that that bug has been fixed before you use it.

Saturday Jan 03, 2009

Making the zfs snapshot service run faster

I've not been using Tim's auto-snapshot service on my home server as once I configured it so that it would work on my server I noticed it had a large impact on the system:

: pearson FSS 15 $; time /lib/svc/method/zfs-auto-snapshot \\

real    1m22.28s
user    0m9.88s
sys     0m33.75s
: pearson FSS 16 $;

The reason is twofold. First, reading all the properties from the pool takes time, and second, it destroys the unneeded snapshots as it takes new ones, something the service I used cheats with and does only very late at night. Looking at the script there are plenty of things that could be made faster, so I wrote a python version that could replace the cron job. The results, while an improvement, were disappointing:

: pearson FSS 16 $; time ./ \\

real    0m47.19s
user    0m9.45s
sys     0m31.54s
: pearson FSS 17 $; 

still too slow to actually use. The time was dominated by cases where the script could not use a recursive option to delete the snapshots. The problem is that there is no way to list all the snapshots of a filesystem or volume but not its descendants.

Consider this structure:

# zfs list -r -o name,com.sun:auto-snapshot tank
NAME                                  COM.SUN:AUTO-SNAPSHOT
tank                                  true
tank/backup                           false
tank/dump                             false
tank/fs                               true
tank/squid                            false
tank/tmp                              false

The problem here is that the script wants to snapshot and clean up “tank” but can't use recursion without backing up all the other file systems that have the false flag set, and set for very good reason. However, if I did not bother to snapshot “tank” then tank/fs could be managed recursively and there would be no need for special handling. The above list does not reflect all the file systems I have but you get the picture. Making this change brings the timing for the service down to:

: pearson FSS 21 $; time ./ \\

real    0m9.27s
user    0m2.43s
sys     0m4.66s
: pearson FSS 22 $; time /lib/svc/method/zfs-auto-snapshot \\

real    0m12.85s
user    0m2.10s
sys     0m5.42s
: pearson FSS 23 $; 

While the python module still gets better results than the korn shell script the korn shell script does not do so badly. However it still seems worthwhile spending the time to get the python script to be able to handle all the features of the korn shell script. More later.
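
The recursion constraint described above can be sketched as a small filter. This is a hypothetical helper, not code from either the korn shell service or the python version: it reads "dataset flag" pairs, such as `zfs list -H -r -o name,com.sun:auto-snapshot tank` would produce, and reports which flagged datasets could be the root of a recursive snapshot and which need handling one at a time.

```shell
#!/bin/sh
# Hypothetical helper, not part of the zfs-auto-snapshot service.
# Reads "<dataset> <true|false>" lines on stdin and prints, for each
# dataset that should be snapshotted, whether it can be done recursively.
plan_snapshots() {
    awk '
    { name[NR] = $1; flag[$1] = $2; clean[$1] = ($2 == "true") }
    END {
        # A dataset with the flag off makes every ancestor unusable
        # as the root of a recursive snapshot.
        for (i = 1; i <= NR; i++) {
            if (flag[name[i]] == "true") continue
            p = name[i]
            while (sub("/[^/]*$", "", p)) clean[p] = 0
        }
        for (i = 1; i <= NR; i++) {
            d = name[i]
            if (flag[d] != "true") continue
            if (!clean[d]) { print "single " d; continue }
            # Skip datasets already covered by a recursive ancestor.
            parent = d
            if (sub("/[^/]*$", "", parent) && (parent in flag) && clean[parent]) continue
            print "recursive " d
        }
    }'
}
```

Fed the listing above, it marks "tank" as needing single handling and "tank/fs" as recursive; with the flag turned off on "tank" only the recursive entry remains, which is the cheap case.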

Thursday Jan 01, 2009

Http proxy in a zone

Now that the new Crossbow networking stack is in OpenSolaris I have been able to configure a transparent proxy server for the Sun Ray users. By having a zone act as the only route from the internal network to the internet, all the http traffic can now go through the proxy and hence benefit from the cache, and all in one box.

Now all traffic from the internal network gets a default router of the squid zone's vnic0 from dhcp, and the global zone routes via an internal network that I have called dmz0 to the squid zone. The dmz0 network is not absolutely needed, as the global zone could route via the internal network, but somehow that does not seem such a good set up. I have the naming of the vnics not quite the way I want it but that is really just cosmetic.

Here are the virtual nics:

: pearson FSS 3 $; pfexec dladm show-vnic        
LINK         OVER         SPEED  MACADDRESS           MACADDRTYPE         VID
vnic0        nge0         1000   2:8:20:b2:86:2       random              0
sshnic0      rtls0        100    2:8:20:2c:d7:cf      random              0
dmzpearson0  dmz0         0      2:8:20:ce:2e:43      random              0
dmzsquid0    dmz0         0      2:8:20:20:a2:69      random              0
: pearson FSS 4 $; 

and this is the configuration for the zone:

: pearson FSS 8 $; pfexec zonecfg -z squid info net
	address not specified
	physical: vnic0
	defrouter not specified
	address not specified
	physical: rtls0
	defrouter not specified
	address not specified
	physical: dmzsquid0
	defrouter not specified
: pearson FSS 9 $; 

Then in the zone I have ipfilter configured to handle the usual NAT and also to forward web traffic to the proxy:

: pearson FSS 10 $; pfexec zlogin squid cat /etc/ipf/ipnat.conf   
# First the usual NAT entries to handle everything going out
map rtls0 ->
map rtls0 ->
# These next two lines forward traffic to port 80 to the transparent
# web proxy that is running in this zone
rdr vnic0 port 80 -> port 3128 tcp
rdr dmzsquid0 port 80 -> port 3128 tcp
: pearson FSS 11 $; 

Then remember to configure squid to accept the transparent proxy by adding the transparent line to the http_port option:

: pearson FSS 12 $; pfexec zlogin squid grep \^http_port /etc/squid/squid.conf
http_port 3128 transparent
http_port 8080
: pearson FSS 13 $;

Finally I had to remember to use routeadm(1m) to turn on routing in the zone, which was the first time I had run that command. No more messing around with files in /etc; just run "routeadm -u -e ipv4-forwarding" to enable it in the zone and I was done.

All in all the solution is pretty pleasing.

Wednesday Dec 24, 2008

Timezone aware cron finally pushed to OpenSolaris

With this “push” yesterday:

changeset:   8439:51a23ac0d2a6
user:        Chris Gerhard <>
date:        Tue Dec 23 15:44:14 2008 +0000
files:       usr/src/cmd/cron/Makefile usr/src/cmd/cron/cron.c usr/src/cmd/cron/cron.h usr/src/cmd/cron/crontab.c usr/src/cmd/cron/funcs.c
PSARC/2007/503 crontab entry environment variables
6518038 cron & crontab should support multiple timezones

OpenSolaris finally contains a version of cron that understands and correctly handles having different timezones. You can also specify a different home directory (useful when you don't want NFS to get involved in your cron job for any reason) and shell to run jobs in. It should be in build 106 of OpenSolaris & Nevada.

This brings crontab in line with at(1), which has been timezone aware for some time.

To use it, simply set the variables HOME, TZ and SHELL in your crontab file and all subsequent lines will use those values until the next HOME, TZ and SHELL lines are found:

23 * 1-9,11-26,28-29,31 2-10,12 * exec /var/tmp/cron/ 23 \* 1-9,11-26,28-29,31 2-10,12 \* Africa/Abidjan
3 0-7,9-10,12-22 1-6,8-9,11-21,24-26,28 1-7,9-10,12 * exec /var/tmp/cron/crontes 3 0-7,9-10,12-22 1-6,8-9,11-21,24-26,28 1-7,9-10,12 \* Africa/Abidjan
37 0-2,4-5,7-17,20-22 * * * exec /var/tmp/cron/ 37 0-2,4-5,7-17,20-22 \* \* \* Africa/Accra
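
As a simpler illustration of the layout, the sketch below prints a made-up crontab fragment. The job path, timezones and directories are purely hypothetical; the only point is that each variable line applies to the entries that follow it until it is set again.

```shell
#!/bin/sh
# Prints a hypothetical crontab fragment; nothing is installed.
sample_crontab() {
    cat <<'EOF'
# Entries from here on run at 07:00 London time, under ksh
TZ=Europe/London
SHELL=/bin/ksh
HOME=/var/tmp
0 7 * * * /usr/local/bin/morning_job
# Same job again, now at 07:00 New York time
TZ=America/New_York
0 7 * * * /usr/local/bin/morning_job
EOF
}

sample_crontab
```

Review the output before feeding anything like it to crontab(1), since this fragment is only an example.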

Tuesday Dec 16, 2008

The best code is code you have forgotten.

Here is some shell code I had forgotten about:

if (( $(date +%m) == 12 ))
then
        ( while  (( $(date +%d) >= 15 ))
        do
        test -f /var/run/snow-opts && snow_opts=$(</var/run/snow-opts)
        /opt/csw/bin/xsnow ${snow_opts:--santa 2}
        done ) &
fi

all for the Sun Ray picture frames. The kids, all of us, love it.

Friday Dec 12, 2008

Don't reboot to add swap on solaris.

Today I wish I had blogged this earlier after the second person who should have known better suggested that you need to reboot a system to increase the swap space when using ZFS for swap. The second one asserted it in such a way that the system I was using was rebooted!

Just as when we did not use zvols for swap, you always have the option of multiple swap devices; that option is still there if you are using zvols. You can have multiple zvols: just create them and add them to the vfstab so that when you do reboot you still have them. If, however, you don't like that you can always remove the existing swap device and add a new one of the new size.

Or if you really only want one swap volume and don't want to edit /etc/vfstab and like to live a little bit on the edge you can grow the swap device and then add the extra space as a new swap volume.

cjg@brompton:~$ swap -l
swapfile dev swaplo blocks free
/dev/zvol/dsk/rpool/swap 182,1 8 4126712 4111472
cjg@brompton:~$ pfexec zfs set volsize=4G rpool/swap
cjg@brompton:~$ pfexec zfs set reservation=4G rpool/swap
cjg@brompton:~$ pfexec env NOINUSE_CHECK=1 swap -a /dev/zvol/dsk/rpool/swap $((8+4126712))
cjg@brompton:~$ swap -l
swapfile dev swaplo blocks free
/dev/zvol/dsk/rpool/swap 182,1 8 4126712 4111472
/dev/zvol/dsk/rpool/swap 182,1 4126720 4261888 4261888

When the system is finally rebooted it will end up with just a single swap device again.
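
The numbers in that transcript can be checked with a little arithmetic, bearing in mind that swap -l reports swaplo and blocks in 512-byte units. The function names below are made up for the illustration.

```shell
#!/bin/sh
# swap -l reports swaplo and blocks in 512-byte blocks.

# First block after an existing swap area: swaplo + blocks.
next_offset() {
    echo $(( $1 + $2 ))
}

# Blocks gained when the zvol grows to $1 GiB and the existing
# swap area ended just before block $2.
extra_blocks() {
    echo $(( $1 * 1024 * 1024 * 2 - $2 ))
}

offset=$(next_offset 8 4126712)
echo "new area starts at block $offset"
echo "new area holds $(extra_blocks 4 "$offset") blocks"
```

This gives a start of 4126720 and a size of 4261888 blocks, matching the second line of the final swap -l output.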

Obviously the correct answer is that the operating system should just “do the right thing” and to that end I have filed this RFE: 6783886. The bug should turn up in the next few days/hours.

Tuesday Dec 09, 2008

Automounter verbose toggle and GNU ls

As I have mentioned before you can turn the automounter's verbose mode on and off by accessing the file "=v" as root in the root of an automount point. Typically this means "/home/=v" so I would usually do:

# ls /home/=v

and the logging would burst into life. However you need to be sure you don't use the GNU ls (which is the default on OpenSolaris), as if you do you will see this in the log file:

t4 Automountd: verbose on 
t4 Automountd: verbose off 

The reason is clear when you truss the ls:

cjg@brompton:/var/crash/brompton$ pfexec truss -o /tmp/tr ls /home/=v
ls: cannot access /home/=v: No such file or directory
cjg@brompton:/var/crash/brompton$ egrep =v /tmp/tr
stat64("/home/=v", 0x0807A20C)                  Err#2 ENOENT
lstat64("/home/=v", 0x0807A20C)                 Err#2 ENOENT

It accesses the file twice, so toggles verbose mode on and then off. I think this is a bug in the GNU ls, since if it did lstat64 first and that returned ENOENT it would not need to do the stat64 at all. Anyway the Solaris ls does the right thing:

cjg@brompton:/var/crash/brompton$ pfexec /usr/bin/ls /home/=v
/home/=v: No such file or directory
cjg@brompton:/var/crash/brompton$ tail -1 /var/svc/log/system-filesystem-autofs:default.log
t4      Automountd: verbose on

Sunday Dec 07, 2008

Goodbye portable computer. Hello Laptop

It has taken me a while to realise that my old Toshiba Tecra M2 was not a laptop but was instead a portable computer. The realisation started to happen when I first closed the lid on my new Toshiba Tecra M9 and it hibernated; each time I opened it and all was still well, the change began to dawn on me.

I'd been happy enough using OpenSolaris on the M2 (and Solaris before that) having tuned it to boot as fast as it could and in most scenarios it was fine. I contented myself that the bugs I filed improved the product and I got used to the portable computer. Great in a hotel room, ok for a presentation, less useful in an airport or on a train.

The M9 on the other hand, with OpenSolaris 2008.11 is a real laptop and it is suspend and resume that makes it so.

Also the list of things that don't work is much smaller than the list of things that do. Now some of them (like the SD slot) may well now work on the M2 as well; I've not tried recently since it never used to, so I just used the USB card reader out of habit. So far the list of things I would like to work but don't is:

  1. The volume control knob. I actually liked the M2's use of a real volume control knob so when I turned it down it was really down.

  2. Plugging headphones into the socket does not disconnect the speakers. This makes VOIP hard to use, which is a shame as the VOIP client seems to work quite well.

  3. I would also like better video support, mpeg4 & DVD but mostly that is not for work but entertainment.

That's it. Well that is it for now.

VirtualBox on home Sun Ray server

I'm after best practices for VirtualBox on a home Sun Ray server. My solution is to have a “vbox” role and create a VirtualBox machine named after each user containing the OS that they need. For most users there is no need for this, as everything they need is available natively on Solaris, but there are some apps that only work on Windows, so the users who need them get those apps this way.

The upside of this is that I get to manage the images (and since I will have to fix them that is good). Plus I can pause a VM when the user removes their card by having my utdetach script do:

su - vbox -c "pfexec VBoxManage controlvm $1 pause" > /dev/null

and then the utattach script do:

su - vbox -c "pfexec VBoxManage controlvm $1 resume" > /dev/null

So that the Virtual Machines are not burning resources when they need not be. The temptation to also do:

su - vbox -c "pfexec VBoxManage snapshot $1 take $(date '+%F-%T')"

in the detach is strong but I need to better understand the disk space implications of that and whether letting ZFS handle that would be better.

Friday Dec 05, 2008

Who forgot to update named.root ?


So the latest script added to my list of cron jobs running on my home server is:



trap "cd / && test -d $TMPDIR && rm -r $TMPDIR" EXIT
mkdir $TMPDIR || exit
cd $TMPDIR || exit

wget -q
if ! cmp -s named.root /var/named/db.cache
then
        mv named.root ~
        mailx -s "new named.root file " $LOGNAME << EOF

There is a new named.root file in ~/named.root it needs to be verified and then
installed as /var/named/db.cache.
EOF
fi


All after the internet broke in the house. My initial suspicion that one of the kids had typed google into google was wide of the mark. My old named.root file was old, really, really old. So now I'm checking once a month that my named.root file is up to date.

Thursday Dec 04, 2008

Doing more with less

As I have mentioned before I have an ancient Sun Ray 1 that drives the TV in our kitchen to look like a photo frame. The network is provided by an ethernet over mains bridge that is rated at 85Mbit/sec and the network drop from the server is 1Gbit/sec. Since the switch I have is very cheap this results in a significant packet drop to that DTU, with the result that the picture transition is less than ideal and can stutter somewhat.

So last night after reading an email on an internal list I finally got around to reading the documentation so I could set a bandwidth limit on this one DTU to see if things could be improved. With no bandwidth limit the very excellent utbw gives:

 lost      0/00% pkts     62 cpu   0% kbytes     53 0.021 Mbps 4.3(4.2) ms
 lost   1243/46% pkts   2652 cpu  11% kbytes   1614 0.631 Mbps 4.9(4.6) ms
 lost      0/00% pkts     60 cpu   3% kbytes     51 0.020 Mbps 4.1(4.3) ms
 lost      0/00% pkts     64 cpu   2% kbytes     55 0.022 Mbps 4.8(4.6) ms
 lost      0/00% pkts     64 cpu   2% kbytes     56 0.022 Mbps 4.2(4.7) ms
 lost      0/00% pkts     60 cpu   2% kbytes     51 0.020 Mbps 4.3(4.5) ms
 lost      0/00% pkts     62 cpu   3% kbytes     53 0.021 Mbps 4.9(4.7) ms
 lost    266/11% pkts   2393 cpu   6% kbytes   2314 0.904 Mbps 4.4(4.6) ms
 lost      0/00% pkts     60 cpu   2% kbytes     51 0.020 Mbps 4.4(4.5) ms
 lost      0/00% pkts     62 cpu   2% kbytes     53 0.021 Mbps 4.7(4.6) ms
 lost      0/00% pkts     64 cpu   3% kbytes     56 0.022 Mbps 4.1(4.3) ms
 lost      0/00% pkts     58 cpu   0% kbytes     48 0.019 Mbps 4.4(4.4) ms
 lost      0/00% pkts     62 cpu   2% kbytes     53 0.021 Mbps 6.0(5.2) ms
 lost    229/09% pkts   2377 cpu   8% kbytes   2320 0.907 Mbps 4.7(4.9) ms
 lost      0/00% pkts     60 cpu   2% kbytes     51 0.020 Mbps 4.1(4.5) ms
 lost      0/00% pkts     62 cpu   0% kbytes     53 0.021 Mbps 4.1(4.3) ms
 lost      0/00% pkts     64 cpu   2% kbytes     56 0.022 Mbps 4.2(4.2) ms
 lost      0/00% pkts     60 cpu   2% kbytes     51 0.020 Mbps 4.2(4.2) ms
 lost      0/00% pkts     62 cpu   0% kbytes     53 0.021 Mbps 4.4(4.3) ms
 lost    597/23% pkts   2532 cpu   9% kbytes   2123 0.830 Mbps 4.2(4.3) ms
 lost      0/00% pkts     60 cpu   2% kbytes     51 0.020 Mbps 4.1(4.2) ms
 lost      0/00% pkts     62 cpu   3% kbytes     53 0.021 Mbps 4.1(4.1) ms
 lost      0/00% pkts     64 cpu   0% kbytes     56 0.022 Mbps 5.1(4.6) ms
 lost      0/00% pkts     60 cpu   2% kbytes     51 0.020 Mbps 4.1(4.4) ms
 lost      0/00% pkts     62 cpu   2% kbytes     53 0.021 Mbps 4.3(4.4) ms
 lost      0/00% pkts   2133 cpu   8% kbytes   2322 0.907 Mbps 4.1(4.2) ms
 lost      0/00% pkts     60 cpu   0% kbytes     51 0.020 Mbps 7.3(5.7) ms
 lost      0/00% pkts     62 cpu   2% kbytes     53 0.021 Mbps 4.2(5.0) ms
 lost      0/00% pkts     64 cpu   2% kbytes     56 0.022 Mbps 4.2(4.6) ms
 lost      0/00% pkts     60 cpu   2% kbytes     51 0.020 Mbps 6.5(5.6) ms
 lost      0/00% pkts     62 cpu   0% kbytes     53 0.021 Mbps 4.1(4.9) ms
 lost    462/18% pkts   2509 cpu   9% kbytes   2251 0.879 Mbps 4.2(4.5) ms
 lost      0/00% pkts     60 cpu   2% kbytes     51 0.020 Mbps 4.2(4.3) ms

What is more the transition that is observed often jumps or will have one block of the picture updated after all the others. Having tuned the bandwidth down to 20Mb/sec:

 lost    114/04% pkts   2295 cpu   4% kbytes   2344 0.916 Mbps 4.4(4.7) ms
 lost      0/00% pkts     60 cpu   2% kbytes     51 0.020 Mbps 4.2(4.5) ms
 lost      0/00% pkts     62 cpu   2% kbytes     53 0.021 Mbps 7.2(5.8) ms
 lost      0/00% pkts     63 cpu   2% kbytes     53 0.021 Mbps 4.2(5.0) ms
 lost      0/00% pkts     60 cpu   0% kbytes     51 0.020 Mbps 4.4(4.7) ms
 lost      0/00% pkts     62 cpu   3% kbytes     53 0.021 Mbps 4.1(4.4) ms
 lost    216/09% pkts   2304 cpu   7% kbytes   2295 0.897 Mbps 4.4(4.4) ms
 lost      0/00% pkts     60 cpu   2% kbytes     51 0.020 Mbps 4.2(4.3) ms
 lost      0/00% pkts     63 cpu   0% kbytes     55 0.022 Mbps 4.4(4.4) ms
 lost      0/00% pkts     63 cpu   2% kbytes     53 0.021 Mbps 4.4(4.4) ms
 lost      0/00% pkts     60 cpu   2% kbytes     51 0.020 Mbps 4.1(4.2) ms
 lost      0/00% pkts     63 cpu   2% kbytes     55 0.022 Mbps 5.0(4.6) ms
 lost    168/07% pkts   2174 cpu   6% kbytes   2230 0.871 Mbps 7.0(5.8) ms
 lost      0/00% pkts     60 cpu   2% kbytes     51 0.020 Mbps 4.9(5.3) ms
 lost      0/00% pkts     62 cpu   2% kbytes     53 0.021 Mbps 5.9(5.6) ms
 lost      0/00% pkts     63 cpu   2% kbytes     56 0.022 Mbps 4.1(4.9) ms
 lost      0/00% pkts     60 cpu   0% kbytes     51 0.020 Mbps 4.5(4.7) ms
 lost      0/00% pkts     62 cpu   2% kbytes     53 0.021 Mbps 4.2(4.5) ms
 lost      0/00% pkts   1938 cpu   8% kbytes   2118 0.827 Mbps 4.3(4.4) ms
 lost      0/00% pkts     60 cpu   0% kbytes     50 0.020 Mbps 8.4(6.4) ms
 lost      0/00% pkts     62 cpu   2% kbytes     52 0.021 Mbps 17.7(12.1) ms
 lost      0/00% pkts     64 cpu   2% kbytes     55 0.022 Mbps 4.2(8.1) ms
 lost      0/00% pkts     60 cpu   2% kbytes     50 0.020 Mbps 6.3(7.2) ms
 lost      0/00% pkts     62 cpu   0% kbytes     52 0.021 Mbps 4.4(5.8) ms
 lost    214/09% pkts   2224 cpu   7% kbytes   2170 0.848 Mbps 4.0(4.9) ms
 lost      0/00% pkts     60 cpu   2% kbytes     50 0.020 Mbps 4.1(4.5) ms
 lost      0/00% pkts     62 cpu   2% kbytes     52 0.021 Mbps 4.6(4.6) ms
 lost      0/00% pkts     64 cpu   0% kbytes     55 0.022 Mbps 5.7(5.1) ms
 lost      0/00% pkts     60 cpu   2% kbytes     50 0.020 Mbps 4.6(4.8) ms
 lost      0/00% pkts     64 cpu   2% kbytes     55 0.022 Mbps 5.8(5.3) ms
 lost    194/08% pkts   2278 cpu   9% kbytes   2274 0.888 Mbps 4.2(4.8) ms
 lost      0/00% pkts     60 cpu   0% kbytes     50 0.020 Mbps 4.2(4.5) ms
 lost      0/00% pkts     63 cpu   2% kbytes     53 0.021 Mbps 4.2(4.3) ms
 lost      0/00% pkts     63 cpu   3% kbytes     53 0.021 Mbps 4.2(4.3) ms
 lost      0/00% pkts     60 cpu   1% kbytes     50 0.020 Mbps 4.9(4.6) ms
 lost      0/00% pkts     62 cpu   1% kbytes     52 0.021 Mbps 4.1(4.3) ms
 lost      0/00% pkts   2149 cpu   6% kbytes   2319 0.906 Mbps 4.3(4.3) ms
 lost      0/00% pkts     60 cpu   2% kbytes     50 0.020 Mbps 4.2(4.3) ms

The numbers improve greatly; however, they hide the truth somewhat. The picture frame operates such that, except for the clock that updates every second, there are no updates at all for most of the time, and then every 2 minutes a new photo is displayed. So those averages, which appear to be taken every 20 seconds, hide the very high burst of data that happens.
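
To put rough numbers on that, the sketch below reproduces the Mbps column from the kbytes column, assuming a 20 second sample and 1024 kbits to the Mbit, and then shows what the same data would look like if it all arrived within a single second. The one second figure is purely an assumption for illustration, not a measurement.

```shell
#!/bin/sh
# Average rate in Mbps for $1 kbytes transferred over $2 seconds,
# assuming 1024 kbits to the Mbit (which matches the utbw figures closely).
avg_mbps() {
    awk -v kb="$1" -v secs="$2" 'BEGIN { printf "%.3f\n", kb * 8 / secs / 1024 }'
}

echo "idle sample,  53 kbytes over 20s:   $(avg_mbps 53 20) Mbps"
echo "photo sample, 2320 kbytes over 20s: $(avg_mbps 2320 20) Mbps"
echo "same photo squeezed into 1s:        $(avg_mbps 2320 1) Mbps"
```

The first two lines come out at 0.021 and 0.906, close to what utbw printed, while the last shows how a burst some twenty times the reported average can hide inside one sample.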

You can see the difference here:



Saturday Nov 29, 2008

Working around a really, really small but irritating nwam bug

The euphoria over having a laptop that would suspend to RAM did not last long before it was shattered by a more real world situation. That is suspending while the wireless is connected, ie not when at my desk. This is bug 6766807 which is somewhat irritating and I'm sure will be resolved soon. With my work hat on I wonder if this could be one of the bugs that will be fixed in a supported update. However there is a simple work around.


function restart_nwam
{
	pfexec svcadm restart nwam
}
trap restart_nwam 35

while :
do
	sleep $((60*60*24))
done

Run that script as one of the programs started by the session and this problem is history. Obviously keep an eye on the bug so that when the fix is delivered you remove the work around. I'll update the bug with the workaround on Monday.

Friday Nov 28, 2008

New Laptop == Fresh install of OpenSolaris

I have a new laptop. A Toshiba Tecra M9. Since it, like my brompton, is owned by Sun, it is called "brompton".

The new OpenSolaris 2008.11 bits on this hardware support suspend to RAM so closing the lid with the power disconnected results in the system sleeping almost instantly and equally importantly when I press the power button it restarts from where it left off. Really something that any laptop needs to have so this is real progress.

Tim is suggesting that Sun should give up on the desktop, something I don't completely agree with, as the savings would not be that great unless you give up on the X server as well, which would leave Sun Ray high and dry, something that we should not do. The desktop experience on a modern 3D accelerated frame buffer is getting quite appealing. While most of the features are really just icing (rotating the workspaces when you hit <control><Alt><Left> & <control><Alt><Right>), at least one I've found useful already. When I press <control> & the key there is a ripple effect as if the desktop were water and a water drop has landed where the mouse is.

It allowed me to track the mouse after VirtualBox had hidden it although in the snapshot you can see the mouse. This has probably been on my old laptop but I either had not noticed it or it was not turned on as I had selected the custom options to compiz a while back. It makes me wonder what other new features are hidden in the window system that I may be missing. A VT220 emulator maybe?

One mis-feature though is that by default savecore does not get run at boot time. I recall the head in the sand arguments that were made for turning off savecore after beta in the dark (although at the time less dark than now) days of SunOS 4.0. This seems like a similar exercise in denying reality. On the upside this is not quite so bad as it was as at least there is a dedicated dump device so the dump will not get overwritten as part of swap and can be extracted later by running savecore. Indeed the first thing I would do and did do was this:

cjg@brompton:/boot/grub$ pfexec savecore
cjg@brompton:/boot/grub$ pfexec dumpadm -y
      Dump content: kernel pages
       Dump device: /dev/zvol/dsk/rpool/dump (dedicated)
Savecore directory: /var/crash/brompton
  Savecore enabled: yes


Thursday Nov 27, 2008

Adding dependencies to exim

I finally got around to adding dependencies to the smtp (mail) server I am using on my home server so that it depends on both spamassassin and the clam anti virus services. While there is probably a way to do this using individual commands, it was much quicker to export the XML, edit that and reimport it, having added these lines:

    <dependency name='spamd' grouping='require_all' restart_on='error' type='service'>
      <service_fmri value='svc:/network/spamd'/>
    </dependency>
    <dependency name='clam' grouping='require_all' restart_on='error' type='service'>
      <service_fmri value='svc:/network/clam'/>
    </dependency>

Having refreshed the service and restarted it, it now shows as depending on the other two services:

: pearson FSS 3 $; svcs -d cswexim
STATE          STIME    FMRI
online         Nov_24   svc:/network/loopback:default
online         Nov_24   svc:/milestone/name-services:default
online         Nov_24   svc:/system/filesystem/local:default
online         Nov_24   svc:/network/clam:default
online         Nov_26   svc:/network/spamd:default
: pearson FSS 4 $; 

and any failure of the services it depends on results in cswexim being restarted after the failed service restarts. Depressingly, I had found that small amounts of spam could sneak through thanks to exim not depending on spamassassin.
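
If more services need the same treatment, the stanza is regular enough to generate. The function below is a hypothetical convenience, not an SMF tool: it only prints XML in the same shape as the lines above, ready to paste into the exported manifest.

```shell
#!/bin/sh
# Hypothetical helper: print a require_all dependency stanza for
# dependency name $1 on service FMRI $2.
dependency_stanza() {
    dep_name=$1
    dep_fmri=$2
    cat <<EOF
    <dependency name='$dep_name' grouping='require_all' restart_on='error' type='service'>
      <service_fmri value='$dep_fmri'/>
    </dependency>
EOF
}

dependency_stanza spamd svc:/network/spamd
dependency_stanza clam svc:/network/clam
```

The output still has to be placed inside the service element of the manifest by hand before it is imported again.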

Tuesday Nov 25, 2008

Redirecting output to syslog

People are always asking this and often when they are not they should be. How do you redirect all the output from a script to syslog?

The obvious is:

my_script 2>&1 | logger -p local6.debug -t my_script

but how can you do that from within the script? Simply put this at the top of your script:


logger -p daemon.notice -t ${0##*/}[$$] |&

exec >&p 2>&1

Clearly this is Korn shell specific, but then who still writes Bourne shell scripts? If your script was called redirect you get messages logged thus:

Nov 25 17:40:41 enoexec redirect[17449]: [ID 702911 daemon.notice] bar
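
The tag in that log line comes from the parameter expansion passed to logger with -t, which strips the directory part of the script name and appends the process id in brackets. A small sketch of just that step, with a made-up function name and path:

```shell
#!/bin/sh
# Hypothetical helper showing how the syslog tag is built:
# ${path##*/} removes everything up to and including the last "/".
log_tag() {
    script_path=$1
    pid=$2
    echo "${script_path##*/}[$pid]"
}

log_tag /usr/local/bin/redirect 17449
```

This prints redirect[17449], the same tag that appears in the logged message above.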

Sunday Nov 23, 2008

Two pools on one drive?

Now I'm committed to ZFS root I'm left with a dilemma. I have four drives in the system and too much data, and the drives are of different sizes, so raidz2 is not an option even though it would give the greatest protection for the data; the next best solution is some form of mirroring. Initially I simply had two pools, which offers good redundancy and allows ZFS root to work but gives suboptimal performance. If I could stripe the pool that would be better, but that does not work with ZFS root.

However since I used to run with a future proof Disk Suite, UFS based root I still have the space that used to contain the two boot environments that were on UFS into which I intended to grow the pool once they were not needed. What if I did not grow the pool but instead put a second pool on that partition? Then I would have a pool, “rpool” mirrored across part of the disk and then the data pool, “tank” mirrored over the rest of the boot drives and striped across a second mirror consisting of the entire second pair of drives.

Clearly the solution is suboptimal but given the constraints of ZFS root and the hardware I have would this perform better?

I should point out that the system as is does not perform badly, but I don't want to leave performance on the table if I don't have to. I'm not going to rush into this (that is I've not already done it) since growing the pool is a one way operation there being no way to shrink it again although at the moment I am minded to do it.

Comments welcome

Saturday Nov 22, 2008

Forced to upgrade

Build 103 and ZFS root have come to the home server. While I was travelling the system hit bug 6746456, which resulted in the system panicking every time it booted. So I was forced to return to build 100 and have now upgraded to build 103. Live upgrade using UFS would not work at all, and since I have the space I've moved over to ZFS root. However the nautilus bug is still in build 103, so I'm either going to have to live with it, which is impossible, disable nautilus completely, or work to get the time slider feature disabled until it is usable. Disabling nautilus, while irritating, is effectively what I have had to do now, so could be the medium term solution.

The other news for the home server was the failure of the power supply. So it was goodbye to the small Antec case that used to house the server; since it did not really save any space, a more traditional desk side unit has replaced it, which also allows up to six internal drives. Since ZFS root will not support booting off stripes, the extra two drives I have form a second pool.

# zpool list
pool2   294G  36.8G   257G    12%  ONLINE  -
tank    556G   307G   249G    55%  ONLINE  -

The immediate effect of two pools is being able to have the Solaris image from which I upgraded on a different pair of disks from the ones being upgraded, with a dramatic performance boost. The other is that I can let the automatic snapshot service take control of the other pool rather than add it to my old snapshot policy. Early on I realised I needed to turn off snapshots on the swap volumes, which are on both pools (to get some striping):

# zfs set com.sun:auto-snapshot=false pool2/swap
# zfs set com.sun:auto-snapshot=false tank/swap

should do it.


This is the old blog of Chris Gerhard. It has mostly moved to

