Friday Jul 17, 2009

10 Steps to OpenSolaris Laptop Heaven

If you have recently come into possession of a Laptop onto which to load Solaris then here are my top tips:

  1. Install OpenSolaris. At the time of writing the release is 2009.06, install that, parts of this advice may become obsolete with later releases. Do not install Solaris 10 or even worse Nevada. You should download the live CD and burn it onto a disk boot that and let it install but before you start the install read the next tip.

  2. Before you start the install open a terminal so that you can turn on compression on the root pool once it it created. You have to keep running “zpool list” until you see the pool is created and then run (pfexec zfs set compression=on rpool). You may think that disk is big but after a few months you will be needing every block you can get. Also laptop drives are so slow that compression will probably make things faster.

  3. Before you do anything after installation take a snapshot of the system so you can always go back (pfexec beadm create opensolaris@initialinstall). I really mean this.

  4. Add the extras repository. It contains virtualbox, the flash plugin for firefox, true type fonts and more. All you need is a sun online account. See https://pkg.sun.com/register/ and http://blogs.sun.com/chrisg/entry/installing_support_certificates_in_opensolaris

  5. Decide whether you want to use the development or support repository. If in doubt choose the supported one. Sun employees get access to the support repository. Customers need to get a support contract. (http://www.opensolaris.com/learn/subscriptions/). Then update to the latest bigs (pfexec pkg image-update).

  6. Add any extra packages you need. Since I am now writing this retrospectively there may be things missing. My starting list is:

    • OpenOffice (pfexec pkg install openoffice)

    • SunStudio (pfexec pkg install sunstudioexpress)

    • Netbeans (pfexec pkg install netbeans)

    • Flash (pkfexec pkg install flash)

    • Virtualbox (pfexec pkg install virtualbox)

    • TrueType fonts (pfxec pkg install ttf-fonts-core)

  7. If you are a Sun Employee install the punchin packages so you can access SWAN. I actually rarely use this as I have a Solaris 10 virtualbox image that I use for punchin so I can be both on and off SWAN at the same time but it is good to have the option.

  8. Add you keys to firefox so that you can browse the extras and support repositories from firefox. See http://wikis.sun.com/display/OpenSolarisInfo200906/How+to+Browse+the+Support+and+Extra+Repositories.

  9. Go to Fluendo and get and install the free mp3 decoder. They also sell a complete and legal set of decoders for the major video formats, I have them and have been very happy with them. They allow me to view the videos I have cycling events.

  10. Go to Adobe and get acroread. I live in hope that at some point this will be in a repository either at Sun or one Adobe runs so that it can be installed using the standard pkg commands but until then do it by hand.

Enjoy.

Tuesday Jun 30, 2009

Using dtrace to track down memory leaks

I've been working with a customer to try and find a memory “leak” in their application. Many things have been tried, libumem, and the mdb ::findleaks command all with no success.

So I was, as I am sure others before me have, pondering if you could use dtrace to do this. Well I think you can. I have a script that puts probes into malloc et al and counts how often they are called by this thread and when they are freed often free is called.

Then in the entry probe of the target application note away how many calls there have been to the allocators and how many to free and with a bit of care realloc. Then in the return probe compare the number of calls to allocate and free with the saved values and aggregate the results. The principle is that you find the routines that are resulting in allocations that they don't clear up. This should give you a list of functions that are possible leakers which you can then investigate1.

Using the same technique I for getting dtrace to “follow fork” that I described here I ran this up on diskomizer, a program that I understand well and I'm reasonably sure does not have systemic memory leaks. The dtrace script reports three sets of results.

  1. A count of how many times each routine and it's descendents have called a memory allocator.

  2. A count of how many times each routine and it's descendents have called free or realloc with a non NULL pointer as the first argument.

  3. The difference between the two numbers above.

Then with a little bit of nawk to remove all the functions for which the counts are zero gives:

# /usr/sbin/dtrace -Z -wD TARGET_OBJ=diskomizer2 -o /tmp/out-us \\
	-s /tmp/followfork.d \\
	-Cs /tmp/allocated.d -c \\
         "/opt/SUNWstc-diskomizer/bin/sparcv9/diskomizer -f /devs -f background \\
          -o background=0 -o SECONDS_TO_RUN=1800"
dtrace: failed to compile script /tmp/allocated.d: line 20: failed to create entry probe for 'realloc': No such process
dtrace: buffer size lowered to 25m
dtrace: buffer size lowered to 25m
dtrace: buffer size lowered to 25m
dtrace: buffer size lowered to 25m
 
# nawk '$1 != 0 { print  $0 }' < /tmp/out.3081
allocations
           1 diskomizer`do_dev_control
           1 diskomizer`set_dev_state
           1 diskomizer`set_state
           3 diskomizer`report_exit_reason
           6 diskomizer`alloc_time_str
           6 diskomizer`alloc_time_str_fmt
           6 diskomizer`update_aio_read_stats
           7 diskomizer`cancel_all_io
           9 diskomizer`update_aio_write_stats
          13 diskomizer`cleanup
          15 diskomizer`update_aio_time_stats
          15 diskomizer`update_time_stats
          80 diskomizer`my_calloc
         240 diskomizer`init_read
         318 diskomizer`do_restart_stopped_devices
         318 diskomizer`start_io
         449 diskomizer`handle_write
         606 diskomizer`do_new_write
        2125 diskomizer`handle_read_then_write
        2561 diskomizer`init_buf
        2561 diskomizer`set_io_len
       58491 diskomizer`handle_read
       66255 diskomizer`handle_write_then_read
      124888 diskomizer`init_read_buf
      124897 diskomizer`do_new_read
      127460 diskomizer`expect_signal
freecount
           1 diskomizer`expect_signal
           3 diskomizer`report_exit_reason
           4 diskomizer`close_and_free_paths
           6 diskomizer`update_aio_read_stats
           9 diskomizer`update_aio_write_stats
          11 diskomizer`cancel_all_io
          15 diskomizer`update_aio_time_stats
          15 diskomizer`update_time_stats
          17 diskomizer`cleanup
         160 diskomizer`init_read
         318 diskomizer`do_restart_stopped_devices
         318 diskomizer`start_io
         442 diskomizer`handle_write
         599 diskomizer`do_new_write
        2125 diskomizer`handle_read_then_write
        2560 diskomizer`init_buf
        2560 diskomizer`set_io_len
       58491 diskomizer`handle_read
       66246 diskomizer`handle_write_then_read
      124888 diskomizer`do_new_read
      124888 diskomizer`init_read_buf
      127448 diskomizer`cancel_expected_signal
mismatch_count
     -127448 diskomizer`cancel_expected_signal
          -4 diskomizer`cancel_all_io
          -4 diskomizer`cleanup
          -4 diskomizer`close_and_free_paths
           1 diskomizer`do_dev_control
           1 diskomizer`init_buf
           1 diskomizer`set_dev_state
           1 diskomizer`set_io_len
           1 diskomizer`set_state
           6 diskomizer`alloc_time_str
           6 diskomizer`alloc_time_str_fmt
           7 diskomizer`do_new_write
           7 diskomizer`handle_write
           9 diskomizer`do_new_read
           9 diskomizer`handle_write_then_read
          80 diskomizer`init_read
          80 diskomizer`my_calloc
      127459 diskomizer`expect_signal

#

From the above you can see that there are two functions that create and free the majority of the allocations and the allocations almost match each other, which is expected as they are effectively constructor and destructor for each other. The small mismatch is not unexpected in this context.

However it is the vast number of functions that are not listed at all as they and their children make no calls to the memory allocator or have exactly matching allocation and free that are important here. Those are the functions that we have just ruled out.

From here it is easy now to drill down on the functions that are interesting you, ie the ones where there are unbalanced allocations.


I've uploaded the files allocated.d and followfork.d so you can see the details. If you find it useful then let me know.

1Unfortunately the list is longer than you want as on SPARC it includes any functions that don't have their own stack frame due to the way dtrace calculates ustackdepth, which the script makes use of.

2The script only probes particular objects, in this case the main diskomizer binary, but you can limit it to a particular library or even a particular set of entry points based on name if you edit the script.

Saturday Jun 27, 2009

Follow fork for dtrace pid provider?

There is a ongoing request to have follow fork functionality for the dtrace pid provider but so far no one has stood upto the plate for that RFE. In the mean time my best workaround for this is this:

cjg@brompton:~/lang/d$ cat followfork.d
proc:::start
/ppid == $target/
{
	stop();
	printf("fork %d\\n", pid);
	system("dtrace -qs child.d -p %d", pid);
}
cjg@brompton:~/lang/d$ cat child.d
pid$target::malloc:entry
{
	printf("%d %s:%s %d\\n", pid, probefunc, probename, ustackdepth)
}
cjg@brompton:~/lang/d$ pfexec /usr/sbin/dtrace -qws followfork.d -s child.d -p 26758
26758 malloc:entry 22
26758 malloc:entry 15
26758 malloc:entry 18
26758 malloc:entry 18
26758 malloc:entry 18
fork 27548
27548 malloc:entry 7
27548 malloc:entry 7
27548 malloc:entry 18
27548 malloc:entry 16
27548 malloc:entry 18

Clearly you can have the child script do what ever you wish.

Better solutions are welcome!

Thursday Jun 18, 2009

Diskomizer Open Sourced

I'm pleased to announce the Diskomizer test suite has been open sourced. Diskomizer started life in the dark days before ZFS when we lived in a world full1 of bit flips, phantom writes, phantom reads, misplaced writes and misplaced reads.

With a storage architecture that does not use end to end data verification the best that you can hope for was that your application will spot errors quickly and allow you to diagnose the broken part or bug quickly. Diskomizer was written to be a “simple” application that could verify all the data paths worked correctly and worked correctly under extreme load. It has been and is used by support, development and test groups for system verification.

For more details of what Diskomizer is and how to build and install read these pages:

http://www.opensolaris.org/os/community/storage/tests/Diskomizer/

You can download the source and precompiled binaries from:

http://dlc.sun.com/osol/test/downloads/current/

and can browse the source here:

http://src.opensolaris.org/source/xref/test/stcnv/usr/src/tools/diskomizer

Using Diskomizer

First remember in most cases Diskomizer will destroy all the data on any target you point it at. So extreme care is advised.

I will say that again.

Diskomizer will destroy all the data on any target that you point it at.

For the purposes of this explanation I am going to use ZFS volumes so that I can create and destroy them with confidence that I will not be destroying someone's data.

First lets create some volumes.

# i=0
# while (( i < 10 ))
do
zfs create -V 10G storage/chris/testvol$i
let i=i+1
done
#

Now write the names of the devices you wish to test into a file after the key “DEVICE=”:

# echo DEVICE= /dev/zvol/rdsk/storage/chris/testvol\* > test_opts

Now start the test. When you installed diskomizer it put the standard option files on the system and has a search path so that it can find them. I'm using the options file “background” which will make the test go into the back ground redirecting the output into a file called “stdout” and any errors into a file called “stderr”:


# /opt/SUNWstc-diskomizer/bin/diskomizer -f test_opts -f background
# 

If Diskomizer has any problems with the configuration it will report them and exit. This is to minimize the risk to your data from a typo. Also the default is to open devices and files exclusively to again reduce the danger to your data (and to reduce false positives where it detects data corruption).

Once up and running it will report it's progress for each process in the output file:

# tail -5 stdout
PID 1152: INFO /dev/zvol/rdsk/storage/chris/testvol7 (zvol0:a)2 write times (0.000,0.049,6.068) 100%
PID 1152: INFO /dev/zvol/rdsk/storage/chris/testvol1 (zvol0:a) write times (0.000,0.027,6.240) 100%
PID 1152: INFO /dev/zvol/rdsk/storage/chris/testvol7 (zvol0:a) read times (0.000,1.593,6.918) 100%
PID 1154: INFO /dev/zvol/rdsk/storage/chris/testvol9 (zvol0:a) write times (0.000,0.070,6.158)  79%
PID 1151: INFO /dev/zvol/rdsk/storage/chris/testvol0 (zvol0:a) read times (0.000,0.976,7.523) 100%
# 

meanwhile all the usual tools can be used to view the IO:

# zpool iostat 5 5                                                  
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
storage      460G  15.9T    832  4.28K  6.49M  31.2M
storage      460G  15.9T  3.22K  9.86K  25.8M  77.2M
storage      460G  15.9T  3.77K  6.04K  30.1M  46.8M
storage      460G  15.9T  2.90K  11.7K  23.2M  91.4M
storage      460G  15.9T  3.63K  5.86K  29.1M  45.7M
# 


1Full may be an exaggeration but we will never know thanks to the fact that the data loss was silent. There were enough cases reported where there was reason to doubt whether the data was good to keep me busy.

2The fact that all the zvols have the same name (zvol0:a) is bug 6851545 found with diskomizer.

Sunday Jun 07, 2009

OpenSolaris 2009.06

After a week of running 2009.06 on my Toshiba Tecra M9 having upgraded from 2008.11 I'm in a position to comment on it. I've been able to remove all the workarounds I had on the system. Nwam appears to work even in the face a suspend and resume. Removalbe media also appears to be robust without the occasional panics that would happen when I removed the SD card with 2008.11.

Feature wise the things I have noticed are the new tracker system for searching files, but it seems to be completely non functional. The big improvements are in the support for the special keys on the Toshiba and the volume control, which unlike the volume on the M2 is a logical control so requires software support. 2009.06 has this support along with support fo the number pad, brightness and mute buttons.

The downside was hitting this bug. This pretty much renders resume useless and I was about to go back to 2008.06 when the bug was updated to say it will be fixed in the first update release and in the mean time there are binary packages. So after creating a new boot enviroment so that I have an unpatched one to switch to when the fix gets into the support repository I have applied the patch. Seems to work which is very pleasing as it has not taken me long to get used to the brightness buttons working.

Tuesday May 26, 2009

Why everyone should be using ZFS

It is at times like these that I'm glad I use ZFS at home.


  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        tank           ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c20t0d0s7  ONLINE       6     0     4
            c21t0d0s7  ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c21t1d0    ONLINE       0     0     0
            c20t1d0    ONLINE       0     0     0

errors: No known data errors
: pearson FSS 14 $; 

The drive with the errors was also throwing up errors that iostat could report and from it's performance was trying heroicially to give me back data. However it had failed. It's performance was terrible and then it failed to give the right data on 4 occasions. Anyother file system would, if that was user data, just had deliviered it to the user without warning. That bad data could then propergate from there on, probably into my backups. There is certainly no good that could come from that. However ZFS detected and corrected the errors.


Now I have offlined the disk the performance of the system is better but I have no redundancy until the new disk I have just ordered arriaves. Now time to check out Seagate's warranty return system.

Friday May 08, 2009

Stopping find searching remote directories.

Grizzled UNIX users look away now.

The find command is a wonderful thing but there are some uses of it that seem to cause confusion enough that it seems worth documenting them for google. Today's is:

How can I stop find(1) searching remote file systems?

On reading the documentation the “-local” option should be just what you want and it is, but not on it's own. If you just do:

$ find . -local -print

It will indeed only report on files that are on local file systems below the current directory. However it will search the entire directory tree for those local files even if the directory tree is on NFS

To get find to stop searching when it finds a remote file system you need:


$ find . \\( ! -local -prune \\) -o -print

simple.

Friday May 01, 2009

Installing support certificates in OpenSolaris

For some reason you only get the instructions on how to install a certificate to get access to supported or extras updates on your OpenSolaris system after you have downloaded the certificate. Not a big issue as that is generally when you want the instructions. However if you already have your certificates and now want to install them on another system (that you have support for) you can't get the instructions without getting another certificate.

So here are the instructions cut'n'pasted from the support page, as much for me as for you:

How to Install this OpenSolaris 2008.11 standard support Certificate

  1. Download the provided key and certificate files, called OpenSolaris_2008.11_standard_support.key.pem andOpenSolaris_2008.11_standard_support.certificate.pem using the buttons above. Don't worry if you get logged out, or lose the files. You can come back to this site later and re-download them. We'll assume that you downloaded these files into your Desktop folder,~/Desktop/.

  2. Use the following comands to make a directory inside of /var/pkg to store the key and certificate, and copy the key and certificate into this directory. The key files are kept by reference, so if the files become inaccessible to the packaging system, you will encounter errors. Here is how to do it:

            $ pfexec mkdir -m 0755 -p /var/pkg/ssl
            $ pfexec cp -i ~/Desktop/OpenSolaris_2008.11_standard_support.key.pem /var/pkg/ssl
            $ pfexec cp -i ~/Desktop/OpenSolaris_2008.11_standard_support.certificate.pem /var/pkg/ssl 

  3. Add the publisher:

            $ pfexec pkg set-authority \\
               -k /var/pkg/ssl/OpenSolaris_2008.11_standard_support.key.pem \\
               -c /var/pkg/ssl/OpenSolaris_2008.11_standard_support.certificate.pem \\
               -O https://pkg.sun.com/opensolaris/support/ opensolaris.org 
    
  4. To see the packages supplied by this authority, try:

            $ pkg list -a 'pkg://opensolaris.org/\*' 
    

If you use the Package Manager graphical application, you will be able to locate the newly discovered packages when you restart Package Manager.

How to Install this OpenSolaris extras Certificate

  1. Download the provided key and certificate files, called OpenSolaris_extras.key.pem and OpenSolaris_extras.certificate.pem using the buttons above. Don't worry if you get logged out, or lose the files. You can come back to this site later and re-download them. We'll assume that you downloaded these files into your Desktop folder, ~/Desktop/.

  2. Use the following comands to make a directory inside of /var/pkg to store the key and certificate, and copy the key and certificate into this directory. The key files are kept by reference, so if the files become inaccessible to the packaging system, you will encounter errors. Here is how to do it:

            $ pfexec mkdir -m 0755 -p /var/pkg/ssl
            $ pfexec cp -i ~/Desktop/OpenSolaris_extras.key.pem /var/pkg/ssl
            $ pfexec cp -i ~/Desktop/OpenSolaris_extras.certificate.pem /var/pkg/ssl
                    
  3. Add the publisher:

            $ pfexec pkg set-authority \\
                -k /var/pkg/ssl/OpenSolaris_extras.key.pem \\
                -c /var/pkg/ssl/OpenSolaris_extras.certificate.pem \\
                -O https://pkg.sun.com/opensolaris/extra/ extra
            
  4. To see the packages supplied by this authority, try:

            $ pkg list -a 'pkg://extra/\*'
            

    If you use the Package Manager graphical application, you will be able to locate the newly discovered packages when you restart Package Manager.

Sunday Apr 19, 2009

User and group quotas for ZFS!

This push will be very popular amoung those who are managing servers with thousands of users:

Repository: /export/onnv-gate
Total changesets: 1

Changeset: f41cf682d0d3

Comments:
PSARC/2009/204 ZFS user/group quotas & space accounting
6501037 want user/group quotas on ZFS
6830813 zfs list -t all fails assertion
6827260 assertion failed in arc_read(): hdr == pbuf->b_hdr
6815592 panic: No such hold X on refcount Y from zfs_znode_move
6759986 zfs list shows temporary %clone when doing online zfs recv

User quotas for zfs has been the feature I have been asked about most when talking to customers. This probably relfects that most customers are simply blown away by the other features of ZFS and the only missing feature was user quotas if you have a large user base.

Tuesday Apr 14, 2009

zfs list -d

I've just pushed the changes for zfs list that give it a -d option to limit the depth to which recursive listings will go. This is of most use when you wish to list the snapshots of a given data set and only the snapshots of that data set.

PSARC 2009/171 zfs list -d and zfs get -d
6762432 zfs list --depth

Before this you could achieve this using a short pipe line which while it produced the correct results was horribly inefficient and very slow for datasets that had lots of descendents.

: v4u-1000c-gmp03.eu TS 6 $; zfs list -t snapshot rpool | grep '\^rpool@'
rpool@spam                         0      -    64K  -
rpool@two                          0      -    64K  -
: v4u-1000c-gmp03.eu TS 7 $; zfs list -d 1 -t snapshot              
NAME         USED  AVAIL  REFER  MOUNTPOINT
rpool@spam      0      -    64K  -
rpool@two       0      -    64K  -
: v4u-1000c-gmp03.eu TS 8 $; 

It will allow the zfs-snapshot service to be much more efficient when it needs to list snapshots. The change will be in build 113.

Friday Apr 10, 2009

Native solaris printing working

The solution to my HP printer not working on Solaris is to use CUPS. Since I have a full install and solaris now has the packages installed all I had to do was switch the service over:

$ pfexec /usr/sbin/print-service -s cups

then configure the printer using http://localhosts:631 in the same way as I did with ubunto. Now I don't need the virtual machine running which is a bonus. I think cups may be a better fit for me at home since I don't have a nameservice running so the benefits of the lp system are very limited.


The bug is 6826086: printer drivers from package SUNWhpijs are not update with HPLIP

Sunday Apr 05, 2009

Recovering our Windows PC

I had reason to discover if my solution for backing up the windows PC worked. Apparently the PC had not been working properly for a while but no one had mentioned that to me. The symptoms were:

  1. No menu bar at the bottom of the screen. It was almost like the screen was the wrong size but how it was changed is/was a mystery.

  2. It was claiming it needed to revalidate itself as the hardware had changed, which it catagorically had not and I had 2 days to sort it out. Apparenty this message had been around for a few days (weeks?) but was ignored.

Now I'm sure I could have had endless fun reading forums to find out how to fix these things but it was Saturday night nd I was going cycling in the morning. So time to boot solaris and restore the back up. First I took a back up of what was on the disk, just in case I get a desire to relive the issue. I just needed one script to restore it over ssh. The script is:

: pearson FSS 14 $; cat /usr/local/sbin/xp_restore 
#!/bin/ksh 

exec dd of=/dev/rdsk/c0d0p1 bs=1k
: pearson FSS 15 $; 

and the command was:

$ ssh pc pfexec /usr/local/sbin/xp_restore < backup.dd

having chosen the desired snapshot. Obviously the command was added to /etc/security/exec_attr. Then just leave that running over night. In the morning the system booted up just fine, complained about the virus definitions being out of date and various things needing updates but all working. Alas doing this before I went cycling made me late enough to miss the peleton, if it was there.

Friday Mar 20, 2009

Limiting scsi.d output to IOs that take a long time.

I have added support for only reporting on scsi packets that take more than a certain amount of time to complete to scsi.d. This is partularly useful when combined with the PERF_REPORT option to see just the IO requests that took over a time threshold and still collect statistics on all the other IO requests.

To use this you have to specify “-D REPORT_OVERTIME=X” where X is the time you wish to report on in nano seconds. Here is an example that will only print the details of IO requests that took more than 250ms:

# scsi.d -D REPORT_OVERTIME=$((250\*1000\*1000)) -D PERF_REPORT 
Only reporting IOs longer than 250000000ns
Hit Control C to interrupt
00005.388800363 glm0:-> 0x2a WRITE(10) address 00:00, lba 0x00270e67, len 0x000008, control 0x00 timeout 60 CDBP 300002df438 1 sched(0) cdb(10) 2a0000270e6700000800
00005.649494475 glm0:<- 0x2a WRITE(10) address 00:00, lba 0x00270e67, len 0x000008, control 0x00 timeout 60 CDBP 300002df438, reason 0x0 (COMPLETED) pkt_state 0x1f state 0x0 Success Time 260833us
00005.384612799 glm0:-> 0x0a  WRITE(6) address 00:00, lba 0x048564, len 0x000001, control 0x00 timeout 60 CDBP 30002541ac0 1 sched(0) cdb(6) 0a0485640100
00005.716416658 glm0:<- 0x0a  WRITE(6) address 00:00, lba 0x048564, len 0x000001, control 0x00 timeout 60 CDBP 30002541ac0, reason 0x0 (COMPLETED) pkt_state 0x1f state 0x0 Success Time 331939us
00005.385907691 glm0:-> 0x0a  WRITE(6) address 00:00, lba 0x0605b4, len 0x000001, control 0x00 timeout 60 CDBP 300035637a0 1 sched(0) cdb(6) 0a0605b40100
00005.773925990 glm0:<- 0x0a  WRITE(6) address 00:00, lba 0x0605b4, len 0x000001, control 0x00 timeout 60 CDBP 300035637a0, reason 0x0 (COMPLETED) pkt_state 0x1f state 0x0 Success Time 388153us
00005.389078533 glm0:-> 0x2a WRITE(10) address 00:00, lba 0x004b19d3, len 0x000003, control 0x00 timeout 60 CDBP 300002df0f8 1 sched(0) cdb(10) 2a00004b19d300000300
00005.824242527 glm0:<- 0x2a WRITE(10) address 00:00, lba 0x004b19d3, len 0x000003, control 0x00 timeout 60 CDBP 300002df0f8, reason 0x0 (COMPLETED) pkt_state 0x1f state 0x0 Success Time 435303us
\^C

  glm                                                       0
           value  ------------- Distribution ------------- count    
         4194304 |                                         0        
         8388608 |                                         3        
        16777216 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@        678      
        33554432 |@@@@@                                    98       
        67108864 |@                                        27       
       134217728 |                                         8        
       268435456 |                                         3        
       536870912 |                                         0        

# 

The implementation of this uses dtrace speculations and as such may require some tuning of the various settings. Specifically the number of speculation buffers I have set to 1000 which should be enough for all but the busiest systems but if you do reach that limit you can increase them using the following options:


-D NSPEC=N

Set the number of speculations to this value.

-D SPECSIZE=N

Set the size of the speculaton buffer. This should be 200 times the size of NSPEC.

-D CLEANRATE=N

Specify the clean rate.

As usual the script is available here. It is version 1.18.

Sunday Mar 08, 2009

AV codecs for Solaris

One of the big complaints about using OpenSolaris and Solaris is multi media support. The lack of codecs to allow you to play "common" if not "open" formats is irritatings.

I've been using the Fluendo mp3 codecs for a while as they have a compelling price point, being free. Fluendo also have a bundle of other codecs that they supply for OpenSolaris containing codecs for a number of other popular formats. I decided the €28 was would be worth it and have now installed the codecs on my home server. I now have gstreamer support for:

: pearson FSS 1 $; gst-inspect | grep -i flue
fluasf:  flurtspwms: Fluendo WMS RTSP Extension
fluasf:  fluasfcmdparse: Fluendo ASF Command Parser
fluasf:  fluasfdemux: Fluendo ASF Demuxer
fluwmvdec:  fluwmvdec: Fluendo WMV Decoder
fluaacdec:  fluaacdec: Fluendo AAC Decoder
flumms:  flummssrc: Fluendo MMS source
fludivx3dec:  fludivx3dec: Fluendo DivX 3.11 Decoder
flulpcm:  flulpcmdec: Fluendo LPCM decoder
fluwma:  fluwmsdec: Fluendo WMS Decoder
fluwma:  fluwmadec: Fluendo WMA Decoder (STD + PRO + LSL + WMS)
fluh264dec:  fluh264dec: Fluendo H264 Decoder
flumpeg4vdec:  flumpeg4vdec: Fluendo MPEG-4 ASP Video Decoder
flumpeg2vdec:  flumpeg2vdec: Fluendo MPEG-2 Video Decoder
flump3dec:  flump3dec: Fluendo MP3 Decoder (C build)
: pearson FSS 2 $; 

Alas on the Sun Rays the video performance is still less than great but on the console it is faultless and I am seeing the Totem is not quite as useless as I had previously thought.

It would be really cool if you could get OpenSolaris codecs via IPS repository.

The next thing I want is to be able to edit video.

Saturday Feb 21, 2009

[Open]Solaris logfiles and ZFS root

This week I had reason to want to see how often the script that controls the access hours of Sun Ray users actually did work so I went off to look in the messages files only to discover that there were only four and they only went back to January 11.

: pearson FSS 22 $; ls -l mess\*
-rw-r--r--   1 root     root       12396 Feb  8 23:58 messages
-rw-r--r--   1 root     root      134777 Feb  8 02:59 messages.0
-rw-r--r--   1 root     root       53690 Feb  1 02:06 messages.1
-rw-r--r--   1 root     root      163116 Jan 25 02:01 messages.2
-rw-r--r--   1 root     root       83470 Jan 18 00:21 messages.3
: pearson FSS 23 $; head -1 messages.3
Jan 11 05:29:14 pearson pcplusmp: [ID 444295 kern.info] pcplusmp: ide (ata) instance #1 vector 0xf ioapic 0x2 intin 0xf is bound to cpu 1
: pearson FSS 24 $; 

I am certain that the choice of only four log files was not a concious decision I have made but it did make me ponder whether logfile management should be revisted in the light of ZFS root. Since clearly if you have snapshots firing logs could go back a lot futher:

: pearson FSS 40 $; head -1 $(ls -t /.zfs/snapshot/\*/var/adm/message\*| tail -1)
Dec 14 03:15:14 pearson time-slider-cleanup: [ID 702911 daemon.notice] No more daily snapshots left
: pearson FSS 41 $; 

It did not take long for this shell function to burst into life:

function search_log
{
typeset path
if [[ ${2#/} == $2 ]]
then
        path=${PWD}/$2
else
        path=$2
fi
cat $path /.zfs/snapshot/\*$path | egrep $1 | sort -M | uniq
}

Not a generalized solution but one that works when you root filesystem contains all your logs and if you remember to escape any globbing on the command line will search all the log files:

: pearson FSS 46 $; search_log block /var/adm/messages\\\* | wc
      51     688    4759
: pearson FSS 47 $; 

There are two ways to view this. Either it it great that the logs are kept and so I have all this historical data or it is a pain as getting red of log files becomes more of a chore, indeed this is encouraging me to move all the logfiles into their own file systems so that the management of those logfiles is more granular.

At the very least it seems to me that OpenSolaris should sort out where it's log files are going and end the messages going in /var/adm and move them to /var/log which then should be it's own file system.

Sunday Feb 15, 2009

Build 108

I've managed to upgrade my home server to build 108 which is an important mile-stone for me as it has the fix for:

6763600: nautilus becomes unusable on a system with 39000 snapshots.

Which was rendering nautilus close to unusable for any users who moved out of their lofs automounted home directories. In partilcular any attempt to use it to manage the photo directories was painful.

However all was not smooth as again I hit this bug:

6795892: Sun Ray X servers (both Xsun and Xorg) suffer network problems in snv_106

but since I was expecting this I tried the workaround from bug 6799655 which is the same as the one for 6763600:


In /etc/sock2path change the following lines:

    2   2   0   tcp
    2   2   6   tcp

    26  2   0   tcp
    26  2   6   tcp

    2   1   0   udp
    2   1   17  udp

    26  1   0   udp
    26  1   17  udp

to:

    2   2   0   /dev/tcp
    2   2   6   /dev/tcp

    26  2   0   /dev/tcp6
    26  2   6   /dev/tcp6

    2   1   0   /dev/udp
    2   1   17  /dev/udp

    26   1  0   /dev/udp6
    26   1  17  /dev/udp6

While this got the Sun Rays up it also stopped named from working, spewing errors like this:

Feb 15 15:10:39 pearson named[15558]: [ID 873579 daemon.error] 71/Protocol error
Feb 15 15:10:39 pearson named[15558]: [ID 873579 daemon.error] socket.c:4315: unexpected error:

So have had to revert to some internal to Sun binaries that work around this while the underlying bug is fixed. It is slighly worring as I'm left wondering what other victims are out there. One I have already found is ntp, which is a known bug:

6796948: NTP completely broken by I_SETSIG semantics change in snv_106

I suspect that the system will have to revert to build 105 soon.

Friday Feb 13, 2009

Mobile internet on OpenSolaris

I have a new mobile phone which is less steam driven than the one it replaces and, hopefully, will not power itself off and reset to factory defaults every few days. Being modern it can act as a USB modem device and therefore allows me to have my OpenSolaris Laptop work with it.

Following the instructions from James Leggs blog and some from an Email I had been sent (James' instructions were the same as the email) it all worked just fine except: The phone is a Nokia E71 and by default the USB setting was menu->connectivity->usb was set with "USB mode" to mass storage and "Ask on connection" to Yes. It seems that that was enough for OpenSolaris to see it as a disk drive and then not plumb up the serial devices. Changing the "USB mode" to "PC Suite" resolved this.

The only addition I have made was to add these scripts into /etc/ppp to disable nwam and fix resolv.conf when the link comes up and reverse that when it is taken down.

cjg@brompton:~$ cat /etc/ppp/ip-up
#!/bin/ksh -p

/usr/sbin/svcadm disable -t svc:/network/physical:nwam

mv /etc/resolv.conf /etc/resolv.conf.preppp

cp /etc/ppp/resolv.conf /etc/resolv.conf
chmod 644 /etc/resolv.conf
cjg@brompton:~$ 

and

cjg@brompton:~$ cat /etc/ppp/ip-down
#!/bin/ksh -p

test -f /etc/resolv.conf.preppp && mv /etc/resolv.conf.preppp /etc/resolv.conf

/usr/sbin/svcadm enable nwam
cjg@brompton:~$ 

Not rocket science and when nwam knows about ppp they will surely go but until then quite satisfactory.


Finally create a launcher that will start this at the click of a button. I have cheated here and simply have it starting in a terminal so I can stoop it by quitting the terminal:

cjg@brompton:~/Desktop$ cat 3G\\ network.desktop 

[Desktop Entry]
Encoding=UTF-8
Version=1.0
Type=Application
Terminal=true
Icon[en_GB]=/usr/share/icons/gnome/24x24/actions/call-start.png
Name[en_GB]=3G network
Exec=\\spfexec pppd call vodafone\\n
Name=3G network
Icon=/usr/share/icons/gnome/24x24/actions/call-start.png
cjg@brompton:~/Desktop$ 

Saturday Jan 31, 2009

Suspend and resume really does just work

The whole rebooting thing last night and the surprise that my laptop had been "up"for 18 days left me wondering how many times I had suspended the system. I turns out this is easy to check in /var/adm/messages, which thanks to the laptop being hibernated most of the night fails to get cleaned out by cron, using a short script:

cjg@brompton:~$ nawk '/SunOS/ { if (count > max) { max=count} ; count=0 } /resume/ { count++ } END { print "current",count; print "max",max }' /var/adm/messages 
current 3
max 109
cjg@brompton:~$ 

109 sucessful suspend and resume cycles is not bad. I can't find a kernel statistic that counts them directly which I think is a pity. Also I need to keep an eye on the messages files however with ZFS root and snapshots going on the whole log file rotation and clean up could do with rethinking even in the non laptop case.

Friday Jan 30, 2009

OpenSolaris updates virtualbox

After a short delay VirtualBox for OpenSolaris has been upgraded. Update manager popped up to tell me and ask if I wanted the upgrade and created a new boot environment with the package installed. Very cool. Alas now I have to reboot to install it so my laptop won't reach three weeks uptime:

cjg@brompton:~$ uptime
 10:12pm  up 18 days  0:05,  2 users,  load average: 0.16, 0.27, 0.84
cjg@brompton:~$ beadm list
BE            Active Mountpoint Space  Policy Created          
--            ------ ---------- -----  ------ -------          
opensolaris   -      -          89.37M static 2008-11-27 12:14 
opensolaris-1 -      -          8.39M  static 2008-11-30 13:14 
opensolaris-2 -      -          13.36M static 2008-12-02 11:27 
opensolaris-3 N      /          12.70M static 2008-12-06 20:02 
opensolaris-4 R      -          9.44G  static 2009-01-30 21:42 
cjg@brompton:~$ 

Thursday Jan 29, 2009

lwp_park and lwp_unpark

When you have a program that uses the locking primitives mutex_lock() and mutex_unlock and their POSIX equivlients if you truss the process you will often see calls to lwp_park() and lwp_unpark() appering:

/1:	 1.3658	lwp_park(0x00000000, 0)				= 0
/2:	 1.3659	lwp_unpark(1)					= 0
/1:	 1.3659	lwp_park(0x00000000, 0)				= 0
/2:	 1.3660	lwp_unpark(1)					= 0
/1:	 1.3660	lwp_park(0x00000000, 0)				= 0
/1:	 1.3661	lwp_unpark(2)					= 0
/2:	 1.3661	lwp_park(0x00000000, 0)				= 0

These system calls are, as their names imply, the calls that cause the current LWP to stop (park) and allow the current LWP to allow another parked LWP to run (unpark). If we consider the noddy example code:


#include <stdlib.h>
#include <unistd.h>
#include <thread.h>
#include <synch.h>

mutex_t global_lock;
int global_count;

static void \*
locker(void \*arg) 
{
	while (1) {
		mutex_lock(&global_lock);
		global_count++;
		mutex_unlock(&global_lock);
	}
}

int
main(int argc, char \*\*argv)
{
	int count;

	if (argc == 2) {
		count = strtol(argv[1], NULL, NULL);
	}  else {
		count = 2;
	}
	while (--count > 0) {
		thr_create(0, 0, locker, NULL, NULL, NULL);
	}
	locker((void \*)time);
}

The mutex global_lock is going to be battled over by all the threads that are created if one of those threads needs to sleep, as it can't get the mutex, then it has to make a system call so that it can stop using the CPU. The system call is lwp_park(). When the other thread, the one that has the mutex, releases the mutex it signals the blocked thread using lwp_unpark(), with the LWP id of the thread to which to start. This can be seen in the truss:


/1:	 1.3658	lwp_park(0x00000000, 0)				= 0
/2:	 1.3659	lwp_unpark(1)					= 0

/1:	 1.3659	lwp_park(0x00000000, 0)				= 0

/2:	 1.3660	lwp_unpark(1)					= 0
/1:	 1.3660	lwp_park(0x00000000, 0)				= 0
/1:	 1.3661	lwp_unpark(2)					= 0
/2:	 1.3661	lwp_park(0x00000000, 0)				= 0

however the truss can be a bit misleading. You have to remember that truss only reports on system calls when the system call returns (unless they block long enough to be reported as sleeping). So for a call like lwp_park which will sleep until there is a corresponding lwp_unpark call from another thread. In the truss output above you can see this. LWP 2, on the first line in red calls lwp_unpark(1) to unpark LWP 1, at this point LWP 1 returns from what was the blocked lwp_park() call and continues on it's merry way. Alas as can be seen it does not get very far before it once again finds itself blocked but that is what happens if your code is just fighting for a lock. If this were a real application then there are many D scripts that could be used to help track down your issue not least one like this:

pfexec /usr/sbin/dtrace -n 'syscall::lwp_park:entry /execname == "mutex"/
    { @[ustack(5)] = count() }'
About

This is the old blog of Chris Gerhard. It has mostly moved to http://chrisgerhard.wordpress.com

Search

Archives
« April 2014
MonTueWedThuFriSatSun
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
    
       
Today