
An Oracle blog about Solaris

Recent Posts

How to set up NAT for zones using Oracle Solaris 11.3 PF (Packet Filter)

New in Oracle Solaris 11.3 is the BSD Packet Filter (PF) firewall. This means my previous configuration, which gives my global zone a single IP address and puts several zones (kernel or native) behind a NAT firewall, will eventually need to change to use PF rather than IPFilter (still the default firewall in 11.3 and earlier). I spent some time reading the documentation, which explains it all very well once you understand the product, but for someone starting from nowhere (I had help setting up the IPFilter rules) I was a bit stumped. I couldn't find any good blog posts on it either, so I thought it would be worth blogging my notes. These are the steps I found you need to follow, including setting up a basic zone:

Step 1: Make sure PF is installed and enabled
Step 2: Create VNICs for your private network
Step 3: Create a zone
Step 4: Create your PF rules for NAT
Step 5: Enable IP forwarding
Step 6: Checking it all works

Let's take these one at a time.

Step 1: Make sure PF is installed and enabled

    root@S11_3_SRU:~# pkg install network/firewall
               Packages to install:   1
                Services to change:   1
           Create boot environment:  No
    Create backup boot environment: Yes

    DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
    Completed                                1/1         33/33      0.4/0.4  123k/s

    PHASE                                          ITEMS
    Installing new actions                         92/92
    Updating package state database                 Done
    Updating package cache                           0/0
    Updating image state                            Done
    Creating fast lookup database                   Done
    Updating package cache                           1/1

    root@S11_3_SRU:~# svcs firewall:default
    STATE          STIME    FMRI
    disabled       14:56:38 svc:/network/firewall:default
    root@S11_3_SRU:~# svcs ipfilter
    STATE          STIME    FMRI
    disabled       14:53:06 svc:/network/ipfilter:default

You want to make sure ipfilter is disabled before you enable PF:

    root@S11_3_SRU:~# svcadm enable firewall:default
    root@S11_3_SRU:~# svcs firewall
    STATE          STIME    FMRI
    online         15:05:13 svc:/network/firewall:default

In the default configuration there are no active rules protecting the system. You can check this with the pfctl command:

    root@S11_3_SRU:~# pfctl -s rules
    root@S11_3_SRU:~#

This shows that no rules are configured. We'll fix that later.
Step 2: Create VNICs for your private network

Each zone is going to need a virtual NIC to attach to, and the global zone will need a VNIC on the same "network" as the zones to allow them to communicate. In addition, if you don't want things you do in the zone to be seen on the real network, you need to create an etherstub on which to create the VNICs:

    root@S11_3_SRU:~# dladm create-etherstub stub0
    root@S11_3_SRU:~# dladm create-vnic -l stub0 zone1
    root@S11_3_SRU:~# dladm create-vnic -l stub0 gz0
    root@S11_3_SRU:~# dladm show-link
    LINK                CLASS     MTU    STATE    OVER
    net0                phys      1500   up       --
    stub0               etherstub 9000   unknown  --
    zone1               vnic      9000   up       stub0
    gz0                 vnic      9000   up       stub0

The one I'm going to use in the global zone for NAT needs an IP address on the same private network as I'm going to use for zone1:

    root@S11_3_SRU:~# ipadm create-ip gz0
    root@S11_3_SRU:~# ipadm create-addr -a 192.160.0.1 gz0
    gz0/v4
    root@S11_3_SRU:~# ipadm show-addr
    ADDROBJ           TYPE     STATE        ADDR
    lo0/v4            static   ok           127.0.0.1/8
    net0/v4           dhcp     ok           10.0.2.15/24
    gz0/v4            static   ok           192.160.0.1/24
    lo0/v6            static   ok           ::1/128
    net0/v6           addrconf ok           fe80::a00:27ff:fe35:d606/10

OK, so we're good to go on the IP address front. Now let's create a zone.

Step 3: Create a zone

You may already have zones configured, but I'm creating a new one using our zone1 VNIC as the physical layer for the network (and setting autoboot to true):

    root@S11_3_SRU:~# zonecfg -z zone1
    Use 'create' to begin configuring a new zone.
    zonecfg:zone1> create
    create: Using system default template 'SYSdefault'
    zonecfg:zone1> remove anet 0
    zonecfg:zone1> add net
    zonecfg:zone1:net> set physical=zone1
    zonecfg:zone1:net> end
    zonecfg:zone1> set autoboot=true
    zonecfg:zone1> verify
    zonecfg:zone1> exit

Now we need to install some software in the zone:

    root@S11_3_SRU:~# zoneadm -z zone1 install
    The following ZFS file system(s) have been created:
        rpool/VARSHARE/zones/zone1
    Progress being logged to /var/log/zones/zoneadm.20171113T145423Z.zone1.install
           Image: Preparing at /system/zones/zone1/root.
     Install Log: /system/volatile/install.3974/install_log
     AI Manifest: /tmp/manifest.xml.ojc6sa
      SC Profile: /usr/share/auto_install/sc_profiles/enable_sci.xml
        Zonename: zone1
    Installation: Starting ...
    <...... Stuff Deleted.......>

Some time later we can boot it and check the installation:

    root@S11_3_SRU:~# zoneadm -z zone1 boot
    root@S11_3_SRU:~# zlogin -C zone1

We'll see the usual System Configuration Tool. This is where I will (for now) set up the IP addresses etc., but you can do this in the zone at any time (have a look at the docs here if you want to know how to reconfigure a zone to use VNICs). [Screen shot: the zone1 Network Configuration screen.] From this we can log in to the zone and ping various places, but obviously nothing outside of the box I'm using. So it's time to set up the PF firewall rules.

Step 4: Create your PF rules for NAT

The NAT rules in PF are really simple.
For our example it's just this:

    pass out on net0 from 192.160.0.0/24 to any nat-to (net0)

I add this to the end of /etc/firewall/pf.conf and refresh the SMF service with:

    root@S11_3_SRU:~# svcadm refresh firewall:default

Reading the rule from left to right, it says: allow outbound packets on net0 from anything on my private network to any IP address, but then rewrite the source address to be that of net0. The brackets around the last net0 simply tell PF to update the rule if the IP address of net0 changes. You can obviously add other firewall rules. Checking it's configured correctly:

    root@S11_3_SRU:~# pfctl -s rules
    pass out on net0 inet from 192.160.0.0/24 to any flags S/SA nat-to (net0) round-robin

This is how I expect it to look.

Step 5: Enable IP forwarding

This is the one I forgot to do, and I was scratching my head for a while. As always, I should have read the docs first:

    root@S11_3_SRU:~# ipadm show-ifprop -p forwarding net0
    IFNAME      PROPERTY        PROTO PERM CURRENT    PERSISTENT DEFAULT    POSSIBLE
    net0        forwarding      ipv4  rw   off        --         off        on,off
    net0        forwarding      ipv6  rw   off        --         off        on,off
    root@S11_3_SRU:~# ipadm show-ifprop -p forwarding gz0
    IFNAME      PROPERTY        PROTO PERM CURRENT    PERSISTENT DEFAULT    POSSIBLE
    gz0         forwarding      ipv4  rw   off        --         off        on,off
    gz0         forwarding      ipv6  rw   off        --         off        on,off

Now enable forwarding on both interfaces and check it took effect:

    root@S11_3_SRU:~# ipadm set-ifprop -p forwarding=on -m ipv4 net0
    root@S11_3_SRU:~# ipadm set-ifprop -p forwarding=on -m ipv4 gz0
    root@S11_3_SRU:~# ipadm show-ifprop -p forwarding net0
    IFNAME      PROPERTY        PROTO PERM CURRENT    PERSISTENT DEFAULT    POSSIBLE
    net0        forwarding      ipv4  rw   on         on         off        on,off
    net0        forwarding      ipv6  rw   off        --         off        on,off
    root@S11_3_SRU:~# ipadm show-ifprop -p forwarding gz0
    IFNAME      PROPERTY        PROTO PERM CURRENT    PERSISTENT DEFAULT    POSSIBLE
    gz0         forwarding      ipv4  rw   on         on         off        on,off
    gz0         forwarding      ipv6  rw   off        --         off        on,off

Step 6: Checking it all works

Ping now works, but name services don't, as I hadn't restarted the name-services service, so I simply rebooted the zone, just to make it clean.

    root@zone1:~# ssh user@somehost

This now works as expected. So all is good. I've done this very roughly, mostly to jog my memory, but hopefully someone else will find it useful.
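As an addendum to Step 4: here is what a slightly fuller /etc/firewall/pf.conf might look like once you go beyond the single NAT rule. This is a minimal sketch, not from the original post: it assumes the same net0 external interface and 192.160.0.0/24 private network, and the macros and the block/pass policy rules are illustrative examples you would adapt.

    # /etc/firewall/pf.conf (hypothetical example)
    ext_if = "net0"
    zone_net = "192.160.0.0/24"

    # NAT: rewrite outbound zone traffic to the global zone's current address
    pass out on $ext_if from $zone_net to any nat-to ($ext_if)

    # Example inbound policy: block by default, allow ssh
    # (PF is last-match-wins, so the pass rule overrides the block for port 22)
    block in on $ext_if
    pass in on $ext_if proto tcp to port 22

After editing, run svcadm refresh firewall:default and check the loaded rules again with pfctl -s rules.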


Solaris

Better ways to handle crash dumps in Solaris

For some time now we've realised that system crash dumps in Solaris are getting larger and taking longer to preserve. A number of us have been working on projects to help speed this up and reduce the size.

The first of these was "Crash Dump Restructuring", implemented by Sriman and Vlad. This went into Solaris 11.2 and allows the administrator to remove certain portions from the crash dump via the dumpadm(1M) command. Even if you do need all of the data, the less useful data (like ZFS metadata) is stored in a separate file, and you can analyse the main core file without the supplementary ones. This should allow you to send only the required data back to Oracle for analysis. For more information have a look at the documentation here and here.

Another popular one is the integration of the Autonomous Crashdump Analysis Tool (ACT) into Solaris. This again was added in 11.2 by my colleague Chris. It allows you to run an mdb dcmd on a crash dump and get a summary of the commonly needed data in text form. Again, you could send this small(er) text file to Oracle for analysis before sending the entire crash dump. Though act is part of the mdb package, you need to load the module to make it work:

    </var/crash/1> # mdb -k 1
    Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix zvpsm
    scsi_vhci iommu zfs rpcmod sata sd ip hook neti arp usba i915 stmf stmf_sbd
    sockfs md random idm cpc crypto ipc fcip fctl fcp zvmm lofs ufs logindmux
    nsmb ptm sppp nfs ]
    > ::load act
    > ::act
    collector: 8.17
    osrelease: 5.11.2
    osversion: s11.2
    arch: i86pc
    target: core
    [=== ACT Version: 8.17 <<< SUMMARY >>> ===]
    [=== ACT report for core dump ===]
    hostname: myhost
    domainname: mydomain.oracle.com
    release: SunOS 5.11.2
    architecture: i86pc
    isa_list: amd64 pentium_pro+mmx pentium_pro pentium+mmx pentium i486 i386 i86
    pagesize: 0t4096
    hostid: 0x296e9
    system booted at: 0x543e6f2e ##:2014 Oct 15 12:57:18 GMT
    system crashed at: 0x543e6ff3 ##:2014 Oct 15 13:00:35 GMT
    dump started at: 0x543e6ff8 ##:2014 Oct 15 13:00:40 GMT
    panic: forced crash dump initiated at user request
    <and copious stuff deleted>

And finally, the big project that has really eaten all my and my colleagues' time for a while now, and the motivation for this blog post: Deferred Dump. Working with Brian, Nick, and again Sriman and Vlad, we've worked to preserve the crash dump information in memory across a reboot, thus eliminating the painfully slow process of writing to a dump device and extracting it again. Vlad has written an excellent blog about how this was possible and why it needed doing; please do go and take a look here. Deferred dump is now available in Solaris 11.2.8 (SRU 8 of Solaris 11 Update 2), so we felt it was a good time to start talking about it. Vlad has highlighted some of the requirements as well.
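To make the first feature concrete, here is roughly how you inspect and adjust dump content with dumpadm. A minimal sketch: the sample output format and the -c selectors (kernel, curproc, all) are the long-standing interface; check dumpadm(1M) on your release for the exact section names that Crash Dump Restructuring adds.

    # dumpadm                          # show the current dump configuration
          Dump content: kernel pages
           Dump device: /dev/zvol/dsk/rpool/dump (dedicated)
    Savecore directory: /var/crash
      Savecore enabled: yes
       Save compressed: on

    # dumpadm -c curproc               # kernel pages plus the panicking process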


Solaris

Creating an SMF service for mercurial web server

I'm working on a project at the moment which has a number of contributors. We're managing the project gate (which is standalone) with mercurial. We want to have an easy way of seeing the changelog, so we can show management what is going on. Luckily mercurial provides a basic web server which allows you to see the changes and drill into change sets. This can be run as a daemon, but as it was running on our build server, every time that was rebooted someone needed to remember to start the process again. This is of course a classic usage of SMF.

Now I'm not an experienced person at writing SMF services, so it took me half an hour or so to figure it out the first time, but going forward I should know what I'm doing a bit better. I did reference this doc extensively. Taking a step back, the command to start the mercurial web server is:

    $ hg serve -p <port number> -d

So we somehow need to get SMF to run that command for us. In the simplest form, SMF services are really made up of two components:

The manifest
- Usually lives somewhere in /var/svc/manifest
- Can be imported from any location

The method
- Usually lives in /lib/svc/method. I simply put the script straight in that directory; not very repeatable, but it worked.
- Can take an argument of start, stop, or refresh

Let's start with the manifest. This looks pretty complex, but all it's doing is describing the service name, the dependencies, the start and stop methods, and some properties. The properties can be per instance; that is to say, I could have multiple hg serve processes handling different mercurial projects on different ports simultaneously. Here is the manifest I wrote. I stole extensively from the examples in the documentation:

    $ cat hg-serve.xml
    <?xml version="1.0"?>
    <!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
    <service_bundle type='manifest' name='hg-serve'>
    <service name='application/network/hg-serve' type='service' version='1'>
      <dependency name='network' grouping='require_all' restart_on='none'
          type='service'>
        <service_fmri value='svc:/milestone/network:default' />
      </dependency>
      <exec_method type='method' name='start'
          exec='/lib/svc/method/hg-serve %m' timeout_seconds='2' />
      <exec_method type='method' name='stop' exec=':kill'
          timeout_seconds='2' />
      <instance name='project-gate' enabled='true'>
        <method_context>
          <method_credential user='root' group='root' />
        </method_context>
        <property_group name='hg-serve' type='application'>
          <propval name='path' type='astring' value='/src/project-gate'/>
          <propval name='port' type='astring' value='9998' />
        </property_group>
      </instance>
      <stability value='Evolving' />
      <template>
        <common_name>
          <loctext xml:lang='C'>hg-serve</loctext>
        </common_name>
        <documentation>
          <manpage title='hg' section='1' />
        </documentation>
      </template>
    </service>
    </service_bundle>

So the only things I had to decide on in this are the service name "application/network/hg-serve", the start and stop methods (more of which later), and the properties. The properties are the information I need to pass to the start method script: in my case the port I want to start the web server on, "9998", and the path to the source gate, "/src/project-gate". These can be read in the start method. So now let's look at the method script:

    $ cat /lib/svc/method/hg-serve
    #!/sbin/sh
    #
    # Copyright (c) 2012, Oracle and/or its affiliates. All rights reserved.
    #
    # Standard prolog
    #
    . /lib/svc/share/smf_include.sh

    if [ -z "$SMF_FMRI" ]; then
            echo "SMF framework variables are not initialized."
            exit $SMF_EXIT_ERR
    fi

    #
    # Build the command line flags
    #
    # Get the port and directory from the SMF properties
    port=`svcprop -c -p hg-serve/port $SMF_FMRI`
    dir=`svcprop -c -p hg-serve/path $SMF_FMRI`

    echo "$1"
    case "$1" in
    'start')
            cd $dir
            /usr/bin/hg serve -d -p $port
            ;;
    *)
            echo "Usage: $0 {start|refresh|stop}"
            exit 1
            ;;
    esac
    exit $SMF_EXIT_OK

This is all pretty self-explanatory: we read the port and directory using svcprop, and use those simply to run a command in the start case. We don't need to implement a stop case, as the manifest says to use exec=':kill' for the stop method. Now all we need to do is import the manifest and start the service. But first verify the manifest:

    # svccfg verify /path/to/hg-serve.xml

If that doesn't give an error, try importing it:

    # svccfg import /path/to/hg-serve.xml

If like me you originally put the hg-serve.xml file somewhere under /var/svc/manifest, you'll get an error and be told to restart the manifest-import service:

    svccfg: Restarting svc:/system/manifest-import
    The manifest being imported is from a standard location and should be
    imported with the command : svcadm restart svc:/system/manifest-import

    # svcadm restart svc:/system/manifest-import

and you're nearly done. You can look at the service using svcs -l:

    # svcs -l hg-serve
    fmri         svc:/application/network/hg-serve:project-gate
    name         hg-serve
    enabled      false
    state        disabled
    next_state   none
    state_time   Thu May 31 16:11:47 2012
    logfile      /var/svc/log/application-network-hg-serve:project-gate.log
    restarter    svc:/system/svc/restarter:default
    contract_id  15749
    manifest     /var/svc/manifest/network/hg/hg-serve.xml
    dependency   require_all/none svc:/milestone/network:default (online)

And look at the interesting properties:

    # svcprop hg-serve
    hg-serve/path astring /src/project-gate
    hg-serve/port astring 9998
    ...stuff deleted....

Then simply enable the service, and if everything's gone right you can point your browser at http://server:9998 and get a nice graphical log of project activity.

    # svcadm enable hg-serve
    # svcs -l hg-serve
    fmri         svc:/application/network/hg-serve:project-gate
    name         hg-serve
    enabled      true
    state        online
    next_state   none
    state_time   Thu May 31 16:18:11 2012
    logfile      /var/svc/log/application-network-hg-serve:project-gate.log
    restarter    svc:/system/svc/restarter:default
    contract_id  15858
    manifest     /var/svc/manifest/network/hg/hg-serve.xml
    dependency   require_all/none svc:/milestone/network:default (online)

None of this is rocket science, but it is a bit fiddly, hence I thought I'd blog it. It might just be that you see this in Google and it clicks with you more than one of the many other blogs or how-tos about it. Plus I can always refer back to it myself in three weeks, when I want to add another project to the server and I've forgotten how to do it.
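Picking up that closing thought: since the manifest makes the properties per instance, a second project can be added as another instance of the same service. A hedged sketch using standard svccfg subcommands; the instance name "other-gate" and the path/port values are made up for illustration.

    # svccfg -s svc:/application/network/hg-serve
    svc:/application/network/hg-serve> add other-gate
    svc:/application/network/hg-serve> select other-gate
    svc:/application/network/hg-serve:other-gate> addpg hg-serve application
    svc:/application/network/hg-serve:other-gate> setprop hg-serve/path = astring: /src/other-gate
    svc:/application/network/hg-serve:other-gate> setprop hg-serve/port = astring: 9999
    svc:/application/network/hg-serve:other-gate> exit

    # svcadm refresh hg-serve:other-gate
    # svcadm enable hg-serve:other-gate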


Solaris

Email notification of FMA events

One of the projects I worked on for Solaris 11 was to record some information on system panics in FMA events. Now I want to start making it easier to gather this information and map it to known problems. So, starting internally, I plan to utilise another feature which we developed as part of the same effort: the email notifications framework. Rob Johnston described this feature in his blog here. The nice feature I want to utilise is custom message templates, so I thought I'd share how to do this. It's pretty simple, but I got burnt by a couple of slight oddities, which we can probably fix.

First off I needed to create a template. There are a number of committed expansion tokens; these will expand information from the FMA event into meaningful info in the email. The ones I care about this time are:

    %<HOSTNAME> : Hostname of the system which had the event
    %<UUID>     : UUID of the event - so you can mine more information
    %<URL>      : URL of the knowledge doc describing the problem

In addition I want to get some data that is panic-specific. As yet these are uncommitted interfaces and shouldn't be relied upon, but for my reference they can be accessed as:

    Panic string of the dump:      %<fault-list[0].panicstr>
    Stack trace to put in to MOS:  %<fault-list[0].panicstack>

These are visible in the panic event, so I don't feel bad about revealing the names, but I stress they shouldn't be relied upon. So create a template which contains the text you want. Make sure it's readable by the noaccess user (i.e. not in /root). The one I created for now looks like this:

    # cat /usr/lib/fm/notify/panic_template
    %<HOSTNAME> Panicked
    For more information log in to %<HOSTNAME> and run the command
    fmdump -Vu %<UUID>
    Please look at %<URL> for more information
    Crash dump is available on %<HOSTNAME> in %<fault-list[0].dump-dir>
    Panic String of the dump is %<fault-list[0].panicstr>
    Stack trace to put in to MOS is %<fault-list[0].panicstack>

I then need to add this to the notification for the "problem-diagnosed" event class. This is done with the svccfg command:

    # svccfg setnotify problem-diagnosed \
        \"mailto:someone@somehost?msg_template=/usr/lib/fm/notify/panic_template\"

(Note the backslashes and quotes; they're important to get the parser to recognise the "=" correctly.) It would be nice to tie it specifically to a panic event, but that needs a bit of plumbing to make it happen. You can verify it is configured correctly with the command:

    # svccfg listnotify problem-diagnosed
        Event: problem-diagnosed (source: svc:/system/fm/notify-params:default)
            Notification Type: smtp
                Active: true
                reply-to: root@localhost
                msg_template: /usr/lib/fm/notify/panic_template
                to: someone@somehost

Now when I get a panic, I get an email with some useful information I can use to start diagnosing the problem. So what next? I think I'll try to firm up the stability of the useful members of the event, maybe create a new event we can subscribe to for panics only, then make this template an "extended support" option for panic events and make it easily configurable. Please do leave comments if you have any opinions on this and where to take it next.
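As a hedged aside, not from the post: the same setnotify mechanism supports other transports besides mail. SNMP trap delivery, for instance, uses an snmp: URI; this assumes the SNMP notification daemon (svc:/system/fm/snmp-notify on Solaris 11) is enabled on your system, so check svccfg(1M) and your release's notification docs before relying on it.

    # svccfg setnotify problem-diagnosed snmp:
    # svccfg listnotify problem-diagnosed      # confirm the snmp notification shows up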


Solaris

More thoughts on ZFS compression and crash dumps

Thanks to Darren Moffat for poking holes in my previous post, or more explicitly for pointing out that I could add more useful and interesting data. Darren commented that it was a shame I hadn't included the time to take a crash dump alongside the size and space usage. The reason for this is that one reason for using vmdump-format compression from savecore is to minimize the time required to get the crash dump off the dump device and onto the file system. The motivation for this reaches back many years, to when the default for Solaris was to use swap as the dump device. When you brought the system back up, you wanted to wait until savecore completed before letting the system finish coming up to multiuser (you can tell how old this is by the fact we're not talking about SMF services).

With Oracle Solaris 11 the root file system is ZFS, and the default configuration is to dump to a dump ZVOL. As that isn't used by anything else, savecore can and does run in the background, so it isn't quite as important to make it as fast as possible. It's still interesting though; as with everything in life, it's a compromise.

One problem with the tests I wrote about yesterday is that the dumps are too small to make measuring time easy (size is one thing, but we have fast disks now, so getting 8GB off a zvol onto a file system takes very little time). So this is not a completely scientific test, but an illustration which helps me understand what the best solution for me is. My colleague Clive King wrote a driver to leak memory, creating larger kernel memory segments and artificially increasing the amount of data a crash dump contains. I told this to leak 126GB of kernel memory, set the savecore target directory to be one of "uncompressed", "gzip9 compressed", or "LZJB compressed" (in the first case setting it to use vmdump-format compressed dumps), and then took a crash dump, repeating over the 3 configurations. The idea is to time the difference in getting the dump onto the file system. This is a table of what I found:

    Size leaked (GB)  Dump size (GB)  Pool space used (GB)  Compression   Time to take dump (mm:ss)  Panic to dump available (mm:ss)
    126               8.4             8.4                   vmdump        01:48                      06:15
    126               140             2.4                   GZIP level 9  01:47                      11:39
    126               141             8.02                  LZJB          01:57                      07:05

Notice one thing: the compression ratio for gzip 9 is massive, 70x, so this is probably a side effect of the fact it's not real data, but contains some easily compressible data. The next step should be to populate the leaked memory with random data. So what does this tell us? Assuming the lack of random content isn't an issue: for a modest hit in the time taken to get the dump from the dump device (7:05 vs 6:15), we get an uncompressed dump on an LZJB-compressed ZFS file system while using a comparable amount of physical storage. This allows me to directly analyse the dump as soon as it's available. Great for development purposes. Is it of benefit to our customers? That's something I'd like feedback on. Please leave a comment if you see value in this being the default.
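For reference, a minimal sketch of how the three target configurations could be set up. The dataset names and mountpoints are made up; the zfs and dumpadm options themselves are standard, and dumpadm -z off is what makes savecore write an uncompressed vmcore rather than a vmdump.

    # zfs create rpool/crash/plain                       # no ZFS compression
    # zfs create -o compression=gzip-9 rpool/crash/gz9
    # zfs create -o compression=lzjb rpool/crash/lzjb
    # dumpadm -z off                                     # save uncompressed (vmcore) dumps
    # dumpadm -s /rpool/crash/lzjb                       # repoint savecore at the dataset under test
    # zfs get compression,compressratio rpool/crash/lzjb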


Solaris

Exploring ZFS options for storing crash dumps

Systems fail, for a variety of reasons. That's what keeps me in a job. When they do, you need to store data about the way the system failed if you want to be able to diagnose what happened. This is what we call a crash dump. But this data consumes a large amount of space on disk, so I thought I'd explore which ZFS technologies can help reduce the overhead of storing it.

The approach I took was to make a system (an x4600 M2 with 256GB memory) moderately active using the filebench benchmarking utility, available from the Solaris 11 package repository, then take a system panic using reboot -d, then repeat the process twice more, taking live crash dumps using savecore -L. This generates 3 separate crash dumps, which will have some unique and some duplicated data. There are a number of technologies available to us:

savecore compressed crash dumps
- Not a ZFS technology, but the default behavior in Oracle Solaris 11
- Have a read of Steve Sistare's great blog on the subject

ZFS deduplication
- Works at the block level
- Should store only one copy of a block if it's repeated among multiple crash dumps

ZFS snapshots
- If we are modifying a file, we should only save the changes
- To make this viable, I had to modify the savecore program to create a snapshot of a filesystem on the fly, and reopen the existing crash dump and modify the file rather than create a new one

ZFS compression
- Either in addition to or instead of savecore compression
- Multiple different levels of compression; I tried LZJB and GZIP (at level 9)

All of these can be applied to both compressed (vmdump) and non-compressed (vmcore) crash dumps. So I created multiple ZFS datasets with the relevant properties and repeated the crash dump creation, adjusting the savecore configuration using dumpadm(1M) to save to the various datasets, either using savecore compression or not. Remember also that one of the motivations for saving a crash dump compressed is to speed up the time it takes to get from the dump device to a file system, so you can send it to Oracle support for analysis.

So what do we get? Let's look at the default case: no compression on the file system, but using the level of compression achieved by savecore (which is the same as the panic process uses, either LZJB or BZIP2). In this we have three crash dumps totaling 8.86GB. If these same dumps are uncompressed we get 36.4GB of crash dumps, so we can see that savecore compression is saving us a lot of space.

Interestingly, use of dedup seems to give us no benefit. I wouldn't have expected it to do so on vmdump-format compressed dumps, as the act of compression is likely to make many more blocks unique, but I was surprised that no vmcore-format uncompressed dumps showed any benefit either. It's hard to see how dedup is behaving, because from a ZFS layer perspective the data is still full size, but zdb(1M) can show us the dedup table:

    # zdb -D space
    DDT-sha256-zap-duplicate: 37869 entries, size 329 on disk, 179 in core
    DDT-sha256-zap-unique: 574627 entries, size 323 on disk, 191 in core
    dedup = 1.03, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.03

(The extra 0.03 of dedup only came about when I started using the same pool for building Solaris kernel code.)
I believe the lack of benefit is due to the fact that dedup works at a block level, and as such even the change of a single pointer in a single data structure in a block of the crash dump would result in the block being unique and not being deduped. In light of this, the fact that my modified savecore code using snapshots didn't show any benefit is not really a surprise.

So that leaves compression, and this is where we get some real benefits. By enabling both savecore and ZFS compression you get between a 20 and 50% saving in disk space. On uncompressed dumps, you get a data size between 4.63GB and 8.03GB, i.e. comparable to using savecore compression. The table here shows the various usage:

    Name              Savecore compression  ZFS dedup  ZFS snapshot  ZFS compression  Size (GB)  % of default
    Default           Yes                   No         No            No               8.86       100
    DEDUP             Yes                   Yes        No            No               8.86       100
    Snapshot          Yes                   No         Yes           No               14.6       165
    GZ9compress       Yes                   No         No            GZIP level 9     4.46       50.3
    LZJBcompress      Yes                   No         No            LZJB             7.2        81.2
    Expanded          No                    No         No            No               36.4       410
    ExpandedDedup     No                    Yes        No            No               36.4       410
    ExpandedSnapshot  No                    No         Yes           No               37.7       425
    ExpandedGZ9       No                    No         No            GZIP level 9     4.63       52.4
    ExpandedLZJB      No                    No         No            LZJB             8.03       91

The anomaly here is the snapshot of savecore-compressed data. I can only explain that by saying that, though I repeated the same process, the crash dumps created were larger in that particular case: 5GB each instead of 5GB plus two lots of 2GB.

So what does this tell us? Fundamentally, the Oracle Solaris 11 default does a pretty good job of getting a crash dump file off the dump device and storing it in an efficient way. Block-level optimisations (dedup and snapshots) don't help in minimizing the data size, and compression does help (big surprise there, not!). If disk space is an issue for you, consider creating a compressed ZFS dataset to store your crash dumps in. If you want to analyse crash dumps in situ, then consider using uncompressed crash dumps written to a compressed ZFS dataset. Personally, as I do tend to want to look at the dumps as quickly as possible, I'll be setting my lab machines to create uncompressed crash dumps using:

    # dumpadm -z off

but create a ZFS compressed dataset and use:

    # dumpadm -s /path/to/compressed/data/set

to make sure I can analyse the crash dump, but still not waste storage space.
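A quick way to see how each dataset is doing over time: a hedged sketch, assuming the pool from the zdb example is called space and the test datasets live under space/crash.

    # zfs list -r -o name,used,compressratio space/crash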


Solaris

Package Version Numbers, why are they so important

One of the design goals of IPS has been to allow people to easily move forward to a supported "surface" of components. That is to say, when you

    # pkg update

your system, you get the latest set of components which all work together, based on the packages you already have installed. During development this has simply meant you update to the latest "build" of the components (during development, we build everything and publish everything every two weeks). Now we've released Solaris 11 using the IPS technologies, things are a bit more complicated. We need to be able to reflect all the types of Solaris release we are doing in the version scheme: for example Solaris development builds, Solaris update builds, and Support Repository Updates (the replacement for patches). So simply saying "151" as the build number isn't sufficient to articulate what you are running, or indeed what is available to update to.

In my previous blog post I talked about creating your own package, and gave an example FMRI of

    pkg://tools/mytools@1.3,0.5.11-0.0.0

But it's probably more instructive to look at the FMRI of a Solaris package. The package "core-os" contains all the common utilities and daemons you need to use Solaris.

    $ pkg info core-os
              Name: system/core-os
           Summary: Core Solaris
       Description: Operating system core utilities, daemons, and
                    configuration files.
          Category: System/Core
             State: Installed
         Publisher: solaris
           Version: 0.5.11
     Build Release: 5.11
            Branch: 0.175.0.0.0.2.1
    Packaging Date: Wed Oct 19 07:04:57 2011
              Size: 25.14 MB
              FMRI: pkg://solaris/system/core-os@0.5.11,5.11-0.175.0.0.0.2.1:20111019T070457Z

The FMRI is what we will concentrate on here. In this package "solaris" is the publisher. You can use the pkg publisher command to see where the solaris publisher gets its bits from:

    $ pkg publisher
    PUBLISHER            TYPE     STATUS   URI
    solaris              origin   online   http://pkg.oracle.com/solaris/release/

So we can see we get solaris packages from pkg.oracle.com. The package name is system/core-os; names can be of arbitrary length, to allow you to group similar packages together. Now on to the interesting bit: the version. Everything after the @ is part of the version, and IPS will only upgrade to a "higher" version.

    core-os@0.5.11,5.11-0.175.0.0.0.2.1:20111019T070457Z

    core-os          = Package name
    0.5.11           = Component version - in this case we're saying it's a SunOS 5.11 package
    ,                = separator
    5.11             = Built-on version - indicates what OS version the package was built on
    -                = another separator
    0.175.0.0.0.2.1  = Branch version
    :                = yet another separator
    20111019T070457Z = Timestamp when the package was published

From that we can see the branch version seems rather complex. It is necessarily so, to allow us to describe the hierarchy of releases we do. In this example we see the following:

    0.175 = the trunkid, incremented for each build of a new release of Solaris;
            during Solaris 11 this should not change
    0     = the update release for Solaris: 0 for FCS, 1 for Update 1, etc.
    0     = the SRU for Solaris: 0 for FCS, 1 for SRU 1, etc.
    0     = reserved for future use
    2     = build number of the SRU
    1     = nightly ID - only important for Solaris developers

Take a hypothetical example:

    core-os@0.5.11,5.11-0.175.1.5.0.4.1:<something>

This would be build 4 of SRU 5 of Update 1 of Solaris 11. This is actually documented in MOS article 1378134.1, which you can read if you have a support contract.
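As a hedged aside (standard pkg commands, not from the post): a quick way to see which Solaris 11 build or SRU a whole system is on is to look at the branch version of the "entire" incorporation, which constrains the rest of the OS packages.

    $ pkg info entire | grep Branch     # branch encodes trunkid.update.SRU.reserved.build.nightly
    $ pkg list -v entire                # full FMRI including the publication timestamp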


Solaris

Adventures in Regular Expressions

I'm one of those people who will get stuck in and solve problems, even if I don't know everything about the area a problem lies in. As such I often find I'm learning new and unexpected things. Hey, that's why I love coming to work. The project I'm working on at the moment relates to how we build the SRUs (the Support Repository Updates) which replace patches in Solaris 11. As such I'm learning a lot about IPS, the Image Packaging System, and in particular how the tools it provides help you deliver consistent and upgradeable packages.

The trick I picked up this week is about regular expressions. I had a need to change some values in the FMRI of a package. I had been doing it in a set of shell scripts until Mark Nelson of the SMF team pointed out I was rewriting pkgmogrify(1). Reading the pkgmogrify(1) man page left me feeling less than knowledgeable; I'd sort of gathered it worked with regular expressions. Now these are well known in the industry, I'd just never needed to use them. After a bit of experimenting I found I can substitute values in a string using the "edit" directive. This is the relevant portion of the man page, which makes sense now I know the answer:

    edit    Modifies an attribute of the action. Three arguments are taken.
            The first is the name of the attribute, the second is a regular
            expression matching the attribute value, and the third is the
            replacement string substituted for the portion of the value
            matched by the regular expression. Unlike the regular expression
            used to match an action, this expression is unanchored. Normal
            regular expression backreferences, of the form '\1', '\2', etc.,
            are available in the replacement string, if there are groups
            defined in the regular expression.

The last sentence is the clincher for what I need to do. I can search a string for a pattern, load the matches into the "groups", and then reference them in the replacement string. So, for example, if I want to change the package version string from "-0.175.0.0.0" to "-0.175.1.1.1", I can do it using a transform, in a way where the 175 might change and I want that change reflected in the resulting manifest, e.g.:

    $ cat trans/mytransform
    <transform set name=pkg.fmri -> edit value \
        '@*-0.([0-9])([0-9])([0-9]).+' '-0.\1\2\3.1.1.1'>

The groups are defined in the round brackets, ([0-9]) for example. Then I can simply run pkgmogrify to get the result:

    $ pkgmogrify -I trans -O <output file> <input manifest> mytransform

This performs the substitution just as I needed it to.
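To see the transform end to end, here's a hedged before/after sketch. The manifest file names and the FMRI are made up for illustration; the output assumes the transform above matches as described, with the three digit groups (175) carried through by the backreferences.

    $ grep pkg.fmri mytools.p5m
    set name=pkg.fmri value=pkg://tools/mytools@1.3,0.5.11-0.175.0.0.0
    $ pkgmogrify -I trans -O mytools.new.p5m mytools.p5m mytransform
    $ grep pkg.fmri mytools.new.p5m
    set name=pkg.fmri value=pkg://tools/mytools@1.3,0.5.11-0.175.1.1.1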


Solaris

How to create your own IPS packages

It's been ages since I blogged, so apologies for that. I'm at Oracle OpenWorld 2011 this week, and I was helping out on the Solaris booths in the DemoGrounds. One of the people I talked to asked a question which I've also heard internally at Oracle a few times recently: "How can I package up my own app as an IPS package?" I've answered it a few times, and it suddenly struck me as a good subject for a blog. Most of this information is available either in the documentation or in other blogs, but this is also for me to reference back to (rather than my notes).

With IPS, packages are published to a repository (or repo). This can either be a special pkg web server or simply a directory. For simplicity I've used the latter:

    $ export PKGREPO=/path/to/repo
    $ pkgrepo create $PKGREPO

This populates $PKGREPO with the required repository structure; you only need to do this once. Packages are described by manifests, one per package. If you have existing System V packages, you can let the tools generate the manifest for you by simply passing the System V package file to a tool called pkgsend:

    $ pkgsend generate sysvpkg > sysvpkg.mf

However, my own tools are often just added manually by hand or archived together with tar. So if I want to turn those into an IPS package, I need to create my own manifest. Again, fortunately, pkgsend can help, but you'll need to add some details. Let's assume I've put all my tools in /opt/mytools:

    $ export ROOT=/opt/mytools
    $ pkgsend generate $ROOT > mytools.mf

This manifest needs a line adding to it to describe the package:

    set name=pkg.fmri \
        value=pkg://tools/mytools@1.3,0.5.11-0.0.0

This states the publisher (tools), the package name (mytools) and the version (1.3,0.5.11-0.0.0). Once you have a working manifest (and there's a lot more detail we can add to these), it can be published to the $PKGREPO directory we created earlier:

    $ pkgsend publish -s $PKGREPO -d $ROOT mytools.mf

For each package successfully published you get a "Published" message; failure is silent, though the exit code is 1, so it can be detected in a script. You can then add the repo you've just populated to your test machine. You'll need to do this as a privileged user, such as root:

    # pkg set-publisher -O $PKGREPO tools

and add a package:

    # pkg install mytools
               Packages to install:  1
            Create boot environment: No
    DOWNLOAD                                  PKGS       FILES    XFER (MB)
    Completed                                  1/1     716/716      2.4/2.4

    PHASE                                        ACTIONS
    Install Phase                                746/746

    PHASE                                          ITEMS
    Package State Update Phase                       1/1
    Image State Update Phase                         2/2

    PHASE                                          ITEMS
    Reading Existing Index                           8/8
    Indexing Packages                                1/1

So this is now installed on my system. There are so many interesting things you can add to the manifest that, if you're interested, I'll try to blog about them in the future. However, one thing I've glossed over here is the package version. It's surprisingly important that you understand what you set that to, so I'll make that the subject of a future blog.
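Before pointing a client at the repo, it's worth confirming the package actually landed there. A hedged sketch using the standard pkgrepo subcommands:

    $ pkgrepo list -s $PKGREPO          # list packages in the repository
    $ pkgrepo info -s $PKGREPO          # per-publisher package counts and status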


Solaris

Modeling Panic event in FMA

I haven't blogged in ages, in fact not since Sun was taken over by Oracle. However, I've not been idle, far from it; I've just been working on product to get it out to market as soon as possible. With the release of Solaris 11 Express 2010.11 (yes, I've been so busy it's taken me 4 months to get round to writing this entry!) I can tell you about one thing I've been working on with members of the FMA and SMF teams. It's part of a larger effort to more tightly integrate software "troubles" into FMA. This includes modeling SMF state changes in FMA and, my favourite, modeling system panic events in FMA.

I won't go into the details, but in summary: when a system reboots after a panic, savecore is run (even if dumpadm -n is in effect) to check if a dump is present on the dump device. If there is, it raises an "information report" for FMA to process. This becomes an FMA MSGID of SUNOS-8000-KL. You should see a message on the console if you're looking, giving instructions on what to do next. A small amount of data about the crash (panic string, stack, date, etc.) is embedded in the report. Once savecore is run to extract the dump from the dump device, another information report is raised, which FMA ties to the first event, and the case is solved. One of the nice things that can then happen is that the FMA notification capabilities are open to us, so you could set up an SNMP trap or email notification for such a panic. A small thing, but it might help some sysadmins in the middle of the night.

One final thing: the small amount of information in the ireport can be accessed using fmdump with the -V flag and the UUID of the fault (as reported in the messages on the console or by fmadm faulty). For example, this was from a panic I induced by clearing the root vnode pointer.
    # fmdump -Vu b2e3080a-5a85-eda0-eabe-e5fa2359f3d0
    TIME                           UUID                                 SUNW-MSG-ID
    Jan 13 2011 13:39:17.364216000 b2e3080a-5a85-eda0-eabe-e5fa2359f3d0 SUNOS-8000-KL

      TIME                 CLASS                                         ENA
      Jan 13 13:39:17.1064 ireport.os.sunos.panic.dump_available         0x0000000000000000
      Jan 13 13:33:19.7888 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000

    nvlist version: 0
            version = 0x0
            class = list.suspect
            uuid = b2e3080a-5a85-eda0-eabe-e5fa2359f3d0
            code = SUNOS-8000-KL
            diag-time = 1294925957 157194
            de = (embedded nvlist)
            nvlist version: 0
                    version = 0x0
                    scheme = fmd
                    authority = (embedded nvlist)
                    nvlist version: 0
                            version = 0x0
                            product-id = CELSIUS-W360
                            chassis-id = YK7K081269
                            server-id = tetrad
                    (end authority)
                    mod-name = software-diagnosis
                    mod-version = 0.1
            (end de)
            fault-list-sz = 0x1
            fault-list = (array of embedded nvlists)
            (start fault-list[0])
            nvlist version: 0
                    version = 0x0
                    class = defect.sunos.kernel.panic
                    certainty = 0x64
                    asru = (embedded nvlist)
                    nvlist version: 0
                            version = 0x0
                            scheme = sw
                            object = (embedded nvlist)
                            nvlist version: 0
                                    path = /var/crash/<host>/.b2e3080a-5a85-eda0-eabe-e5fa2359f3d0
                            (end object)
                    (end asru)
                    resource = (embedded nvlist)
                    nvlist version: 0
                            version = 0x0
                            scheme = sw
                            object = (embedded nvlist)
                            nvlist version: 0
                                    path = /var/crash/<host>/.b2e3080a-5a85-eda0-eabe-e5fa2359f3d0
                            (end object)
                    (end resource)
                    savecore-succcess = 1
                    dump-dir = /var/crash/tetrad
                    dump-files = vmdump.1
                    os-instance-uuid = b2e3080a-5a85-eda0-eabe-e5fa2359f3d0
                    panicstr = BAD TRAP: type=e (#pf Page fault) rp=ffffff0015a865b0 addr=ffffff0200000000
                    panicstack = unix:die+10f () | unix:trap+1799 () | unix:cmntrap+e6 () | unix:mutex_enter+b () | genunix:lookupnameatcred+97 () | genunix:lookupname+5c () | elfexec:elf32exec+a5c () | genunix:gexec+6d7 () | genunix:exec_common+4e8 () | genunix:exece+1f () | unix:brand_sys_syscall+1f5 () |
                    crashtime = 1294925154
                    panic-time = January 13, 2011 01:25:54 PM GMT GMT
            (end fault-list[0])
            fault-status = 0x1
            severity = Major
            __ttl = 0x1
            __tod = 0x4d2f0085 0x15b57ec0

Anyway, I hope you find this feature useful. I'm hoping to use the data embedded in the event for data mining and problem resolution. If you have any ideas for other information that could realistically be added to the ireport, please let me know. However, you have to bear in mind that this information is written while the system is panicking, so what can be reliably gathered is somewhat limited.
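As a hedged aside (standard FMA commands, not from the post), you don't need to know the UUID in advance to find these events after a reboot:

    # fmadm faulty                      # list open problems, including SUNOS-8000-KL cases
    # fmdump -v                         # fault log summary lines, with UUIDs
    # fmdump -Vu <uuid>                 # then drill into one event, as shown above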


Solaris

Comparing dtrace output using meld

Comparing dtrace and other debug logs using meld: meld is a powerful open-source graphical diff viewer. It is available from the OpenSolaris IPS repositories, so it can be installed from the Package Manager in OpenSolaris or simply by typing:

    $ pfexec pkg install SUNWmeld

It is very clever at identifying the real changes within files and highlighting where the differences start and end. This example uses the output of two dtrace runs, tracing receive network activity in a development version of the nge driver, once when it works and once when it doesn't, trying to identify a bug in that development version. First off, the dtrace script is:

    $ cat rx.d
    #!/usr/sbin/dtrace -Fs

    fbt::nge_receive:entry
    {
        self->trace=1;
    }

    fbt::nge_receive:return
    /self->trace==1/
    {
        self->trace=0;
    }

    fbt:::entry
    /self->trace==1/
    {
        printf("%x",arg0);
    }

    fbt:::return
    /self->trace == 1/
    {
        printf("%x",arg0);
    }

This very simply traces all function calls from the nge_receive() function. So I ran it twice, once when the network was working and once when it wasn't, and simply ran meld over the two files:

    $ meld rx.out rx.works.out

This throws up a GUI, as seen in the screenshot (worth loading full size). What you see on the right is a large area of probes that have fired that do not exist in the output on the left. That implies a lot of code runs in the working case that is missing from the failing case. Looking at the source code of nge_receive(), you can see it essentially does two things:

  o Calls nge_recv_ring()
  o If that succeeds, calls mac_rx()

Looking at the meld screenshot you can see the big green area starts at mac_rx. So in the failing case nge_receive() doesn't call mac_rx() (that'd explain why it fails to receive a packet). Why doesn't it? Well, it implies that nge_recv_ring() returns NULL. nge_recv_ring() is supposed to return an mblk, and it hasn't. Why is that? Looking into the blue and red highlighted area in the meld window, we see another area in the working case that is missing in the failing case. Hey presto, this bit is the call to allocb(), which is used to allocate an mblk. So we know that in the failing case the nge_recv_ring() function fails to allocate an mblk. Now we just need to work out why. I found this a powerful way of viewing complex data and quickly homing in on differences.
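For completeness, a hedged sketch of how the two trace files might be captured (the script runs directly thanks to its #! line, and the file names match the meld invocation above):

    $ chmod +x rx.d
    $ ./rx.d > rx.works.out 2>&1        # while receive traffic is flowing normally
    (reproduce the failure, then)
    $ ./rx.d > rx.out 2>&1              # while the bug is reproducing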


Solaris

It's Finally Here

I've decided I really need to get back to writing a blog occasionally, and what better day to choose than June 1, 2009? Why? Well, today we release OpenSolaris 2009.06, the latest open-source release of our operating system, Solaris. I know this all sounds a bit marketing, but actually there are some really good reasons for running OpenSolaris on your own machine.

First off, it is the most secure OS I know of; no need for virus protection. Second, it just works (mostly). I've just got a new MacBook Pro, and as I always find it easier to do development work on Solaris than any other platform, I like to run OpenSolaris. It installed pretty much seamlessly (just having to change the EFI disk label using the Mac OS fdisk utility, as described here). The only thing that doesn't work out of the box is the WiFi, which is a pain; it's a Broadcom chipset, so I've got hold of a PCI3/4 Atheros card which works well. Third, all the development tools I need (and indeed anyone developing for or on Solaris needs) are available within the standard repositories. I found this page, which is how I set up my laptop as a build machine.

From a day-to-day computing perspective it does everything I need. Mail, web, and chat are all included, OpenOffice is in the repositories for free (and simple) download, and there's a new media player in Elisa (in the repo), though unfortunately you have to buy the codecs for many common video formats.

So the next question is: is it any different from 2008.11? Well, it's hard for me to say, as I've been upgrading every few weeks to the latest development builds (by using the opensolaris.com/dev repository). But I did install it fresh inside a VirtualBox VM and was impressed with the speed of the install. The auto installer is now more complete and can install SPARC machines (necessary for a good proportion of our customers). There are networking improvements, but generally the speed and usability are what you'll notice. Oh, and fast reboot makes it much quicker to shut down or reboot a machine.

Today I'm attending CommunityOne (or C1 as we call it) and much more will be discussed about OpenSolaris and all our other open-source development efforts. I'll try to remember to write a blog about it (though don't hold your breath on recent form :-)
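For reference, a hedged sketch of the dev-repository switch mentioned above, as it worked in the OpenSolaris era; the exact repository URI changed over time, so treat it as an assumption and check the release notes for your build:

    $ pfexec pkg set-publisher -O http://pkg.opensolaris.org/dev opensolaris.org
    $ pfexec pkg image-update          # pull the system up to the latest dev build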


General

Interesting view on making money in business

I had the chance to be a keynote speaker at PROMISE (an ICSE workshop) last week. The group uses data mining and AI techniques to look for patterns and make predictions about a variety of things, like where in code defects might occur, or how much effort a project might entail. I was there because a few years ago we had a project to predict which bugs might cause customer escalations, using similar techniques, and I was responsible for implementing the proactive fixing of those bugs. My talk was geared around how to put a business case together and run such a project, and ultimately why, in this case, it was wound up before major benefits were realised.

There were loads of other great talks and papers, but the other keynote speaker, Murray Cantor from IBM, had some interesting points, one of which I wanted to pull out here. He said that there are three things you can monetize: innovation, customer relationships, and cost structure. For example, you can make money by having the first product to market, or by a good close relationship with the customer, or by doing it cheaper than anyone else. He drew this as a triangle, with one of the three at each corner.

This got me thinking as to where Sun fits into the picture. First off, I'd say it's in a different place from IBM, who put huge resources into having a close relationship with the customer (Murray indicated he felt IBM was somewhere on the innovation/customer line). However, it isn't purely at the innovation point either. We provide innovative technologies to help lower costs, both for our customers (hey, free software anyone? Check out http://opensolaris.com) and by automating things like system management, thus removing cost and complexity (take a look at our xVM strategy to merge virtualization and system management at http://openxvm.org, removing some of the headaches of running a virtual data center). Oh, and did I mention our CoolThreads hardware? So I think we're probably somewhere between the cost and innovation points. I'm not saying this is a full theory of business, but I found it a useful thought experiment for seeing the different value propositions of various companies' business models.


Solaris

Debugging sparc really (and I do mean really) early boot problems

For some work I've been doing, I've had to work out how to debug the SPARC boot process before you can get to kmdb. And yes, you can do it; it's just not that easy. So I thought I'd put it on my blog, in case I lose the notes I made in a mail to myself, and because it might be of interest to some of you.

First off, get as much of the diagnostics available from the OBP as possible:

    {1} ok setenv fcode-debug? true
    fcode-debug? =          true
    {1} ok setenv diag-switch? true
    diag-switch? =          true
    {1} ok reset-all

The reset-all is important, as it saves the options to the NVRAM. Now we try to boot it up, stopping before anything is loaded. Note this requires a debug kernel, but if you're playing in this space and you're on SPARC then you probably know that already:

    {1} ok boot disk0 -F kernel/unix -H

You will see the boot fail like this:

    Rebooting with command: boot disk0 -F kernel/unix -H
    Boot device: /pci@1c,600000/scsi@2/disk@0,0  File and args: -F kernel/unix -H
    ufs-file-system Halted with -H flag.
    Warning: Fcode sequence resulted in a net stack depth change of 1
    The file just loaded does not appear to be executable.

This is expected, and it's how we get to start playing with breakpoints really early on. The unix module is not yet loaded, so we now have to load it. To do this we look at the boot Forth code and copy what it does:

    {1} ok see do-boot
    : do-boot
      parse-bootargs halt? l->n if
          " Halted with -H flag. " type cr exit
      then  get-bootdev load-pkg mount-root zflag? nested? invert and
      l->n if
          fs-name$ open-zfs-fs
      then  load-file setup-props exec-file ;

So by copying what do-boot does we can intercept the boot process:

    {1} ok get-bootdev load-pkg mount-root
    {1} ok load-file setup-props
    Loading: /platform/SUNW,Sun-Fire-V240/kernel/unix
    Loading: /platform/sun4u/kernel/unix
    {1} ok

Now we can start some more magic. A DEBUG kernel will check the stop-me property in kobj_start(). This is something we have to populate in the boot properties, which is why we've done all this messing around to get to this point:

    {1} ok cd /chosen
    {1} ok 00 0 " stop-me" property
    {1} ok .properties
    stop-me
    fs-package               ufs-file-system
    whoami                   /platform/sun4u/kernel/unix
    impl-arch-name           SUNW,Sun-Fire-V240
    elfheader-length         001c55c0
    elfheader-address        51000000
    bootfs                   fed85a80
    fstype                   ufs
    bootargs                 -F kernel/unix -H
    bootpath                 /pci@1c,600000/scsi@2/disk@0,0:a
    mmu                      fff74080
    memory                   fff74290
    stdout                   fed97b90
    stdin                    fed97ea8
    stdout-#lines            ffffffff
    name                     chosen

We can now start the boot process using exec-file. It will stop immediately because of the stop-me property (ctrace gives me the stack trace):

    {1} ok exec-file
    Type  'go' to resume
    {1} ok ctrace
    PC: 0000.0000.f004.81e4
    Last leaf: jmpl  0000.0000.f005.d274   from 0000.0000.0100.8aec client_handler+70
           0 w  %o0-%o7: (f0000000 16 f0000000 6d 73 6 fedcb441 1008aec )
    call 0000.0000.0106.bea8 p1275_sparc_cif_handler
           from 0000.0000.0106.7de8 prom_enter_mon+24
           1 w  %o0-%o7: (f005d274 fedcbda8 1839400 106af00 185fc00 f005d274 fedcb4f1 1067de8 )
    call 0000.0000.0106.7dc4 prom_enter_mon
           from 0000.0000.0101.9ed4 kobj_start+30
           2 w  %o0-%o7: (0 10bdaf0 f002d224 1 1817700 1821dd8 fedcb5c1 1019ed4 )
    call 0000.0000.0101.9ea4 kobj_start
           from 0000.0000.0100.7ac8 _start+10
           3 w  %o0-%o7: (f005d274 0 0 0 10bd800 181fc00 fedcb701 1007ac8 )

From this point we have access to the unix symbols and can start setting breakpoints. For example:

    {1} ok load_primary +bp
    {1} ok go
    0000.0000.010a.c7b0 load_primary         save        %o6, ffffffffffffff30, %o6
    {1} ok ctrace
    PC: 0000.0000.010a.c7b0 load_primary
    Last leaf: call 0000.0000.010a.c7b0 load_primary
           from 0000.0000.010a.b46c kobj_init+d8
           0 w  %o0-%o7: (1879400 0 fedcbe78 184f000 1879340 181ac00 fedcb111 10ab46c )
    call 0000.0000.010a.b394 kobj_init
           from 0000.0000.0101.9fd0 kobj_start+12c
           1 w  %o0-%o7: (f005d274 185c800 184f000 fedcbe78 184f3f8 184e400 fedcb5c1 1019fd0 )
    call 0000.0000.0101.9ea4 kobj_start
           from 0000.0000.0100.7ac8 _start+10
           2 w  %o0-%o7: (f005d274 7 0 51000040 51000000 51000040 fedcb701 1007ac8 )

I'm interested in getting some more module loading debug info out, so let's set moddebug to 0xf:

    {1} ok moddebug l?
    0                           (displays current value of a long)
    {1} ok F moddebug l!
    {1} ok moddebug l?
    f                           (set the long to F, then display it again)

Now let's see what additional info I get:

    {1} ok go
    /kernel/fs/sparcv9/specfs symbol _info multiply defined
    /kernel/fs/sparcv9/specfs symbol _init multiply defined
    Returned from _info, retval = 1
    init_stubs: couldn't find symbol in module fs/specfs
    (Can't load specfs) Program terminated

OK, that doesn't tell me much more, but you get the idea. You can access the symbols, set breakpoints, and set variables. In addition you can dump out memory with dump, single-step with step, and do loads of other things you might want to do, but this at least will act as a memory jogger for me. Let me know if you found this useful.

Chris
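To save re-deriving it from the transcripts, here is the condensed command sequence, taken entirely from the steps above (a V240 with a UFS root is assumed; a ZFS root also needs the open-zfs-fs step visible in do-boot):

    {1} ok setenv fcode-debug? true
    {1} ok setenv diag-switch? true
    {1} ok reset-all
    {1} ok boot disk0 -F kernel/unix -H      \ halts after the loader, unix not loaded
    {1} ok get-bootdev load-pkg mount-root
    {1} ok load-file setup-props
    {1} ok cd /chosen
    {1} ok 00 0 " stop-me" property          \ makes a DEBUG kernel stop in kobj_start()
    {1} ok exec-file
    {1} ok load_primary +bp                  \ now set breakpoints, then 'go'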


Solaris

Installing Indiana/OpenSolaris

For a few days recently I have been looking at the future of packaging, pkg(5) or IPS. IPS looks really powerful and quite simple. It will allow us to generate fixes and deliver them much more simply. What I've been thinking about is how and when we will generate fixes using this mechanism.

Anyway, as a result I've signed up for pkg-discuss@opensolaris.org and indiana-discuss@opensolaris.org. Both of these are very active and full of interesting discussions (and arguments) and ideas. It's not surprising there has been so much activity recently: today indiana-discuss announced the launch of the developer preview of the OpenSolaris binary distribution. So I tried it out on a couple of machines.

My laptop first, an Acer Ferrari 4005. Everything just worked. The LiveCD booted up really quickly, actually - well done to the team for getting the performance up so well. Even wireless worked, though that's probably because I've already swapped the Broadcom wireless miniPCI card for an Atheros one. Unfortunately I have no spare slices available on the laptop, so I moved on to my next machine.

This is my home PC, usually running Windows XP for the kids; it has never successfully run Solaris, for reasons that will become apparent. I have just upgraded the hard drive, so there's a 60GB partition free for me to do some damage.

Booting the LiveCD failed, or rather Xorg failed to display anything. My machine is an old Athlon XP2600 with an AGP Radeon X1600 Pro graphics card. Great for games, but unfortunately the Solaris/OpenSolaris Radeon driver doesn't support it. Fortunately Stephan Hahn blogged about how to get Xorg to use the vesa driver from the LiveCD. With that in place I got the GNOME GUI up and gave the install a go.

The installer uses dwarf-caiman, a cut-down, slimline installer which is nice and easy to navigate. The install itself was really quick - there's only a CD's worth installed. The rest should be added later over the web from the IPS repository. Unfortunately that is where my old machine creaked too much. The onboard ethernet is an nforce2 gigabit ethernet. It should work with the nge driver, but I think it's just too old. I tried adding an alias for it using

# add_drv '"pci10de,66"' nge

but even though I could plumb it there was no traffic going through it :-( I guess I'll have to find another ethernet card.

The install claimed it failed, but it did come up fine after a reboot, though I had to add a user again at single user because the useradd hadn't worked. Warning here: root is just a role that users can take on - so you can't log in as root as you might expect from a "normal" Solaris system.

I'm pretty impressed. Nice installer, lightweight LiveCD to get you started, zfs root and pkg(5) to add new stuff (or it will be, when I get a new ethernet adapter - I wonder if I can get one of my old USB wireless sticks to work :-). Plus it seems to be more responsive under OpenSolaris than Windows XP. Do give it a go, it is one vision of the future of OpenSolaris.

Chris
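A footnote on the nge problem: before adding an alias it's worth confirming what PCI id the onboard NIC actually reports. Something like the following should show it - this is only a sketch from memory of the driver tools (the id is just the one from my box, the output is trimmed, and since nge is already installed, update_drv(1m) is arguably the cleaner way to add the alias than add_drv):

# prtconf -pv | grep pci10de
    compatible: 'pci10de,66' ...
# update_drv -a -i '"pci10de,66"' nge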


Solaris

Starting out with Solaris on Xen

As you may have seen from the announcement and John's blog, we have a new set of Solaris on Xen bits available for download. A lot has changed in the (almost) year since the last drop. Certainly things are a lot easier to set up than they were back then.

The first big difference I notice is that you can install these bits straight from the DVD, which means no mucking around with bfu. Once it is installed you also have the joys of much newer Solaris builds, including improvements to networking and removable media (but that isn't the point of this post).

Of course the thing you really want to do is run multiple operating systems, so (while there are documents here) I always think it's nice to see people's use cases and find out how they got things working.

I'm going to use zfs for storage, so I made sure I had a large amount of space available for a zpool:

# zpool create guests c2d0s7

First gotcha. After install, the default boot entry in the grub menu.lst is for Solaris on metal (ie not booting under Xen). You can change that before rebooting, or select Solaris dom0 from the grub menu. Check you are running under Xen by looking at uname -i:

dominion# uname -i
i86xpv

(dominion is the name of my host). If that says i86pc then you're not booted under Xen; i86xpv is the new platform modified to run on Xen.

I found that I accidentally booted on metal the first time, and when I then booted under Xen the services weren't enabled; I had to manually enable them. (If you boot straight in to Dom0 they start.)

dominion# svcs -a | grep xctl
online 10:51:04 svc:/system/xctl/store:default
online 10:51:11 svc:/system/xctl/xend:default
online 10:51:11 svc:/system/xctl/console:default
online 10:51:16 svc:/system/xctl/domains:default

If it says anything other than online, enable them with

# svcadm enable "service name"

I use a zpool to create the disk devices for my domains. This has huge advantages, such as the ability to quickly snapshot a domain (say, after install) so you can always return to that state. Also you can clone a snapshot, so if you want to have many similar domains (say, multiple Solaris development environments) you can clone an install, and then only the changes between the domains are stored (zfs being copy-on-write).

To set this up you need to create a zvol on your zpool:

# zfs create -V 10G guests/solaris-pv

This creates a zvol of up to 10G in size. Unused space is still free for other users of the pool to allocate. You can access the device for this zvol using

/dev/zvol/dsk/guests/solaris-pv

So that's simple - how do we install a Solaris domain? First off I create an install python config file. (Soon there will be a tool to manage the install for you, but that's not really ready yet.) This python file describes some simple things about the domain, like where the disk and cdrom are:

dominion# cat /guests/configs/solaris-pv-install.py
name = "solaris-pv-install"
memory = "1024"
disk = [ 'file:/guests/isos/66-0613-nd.iso,6:cdrom,r', 'phy:/dev/zvol/dsk/guests/solaris-pv,0,w' ]
vif = [ '' ]
on_shutdown = 'destroy'
on_reboot = 'destroy'
on_crash = 'destroy'

Name is obvious, and I've copied the iso image to be a file to speed up the install. You can kick off the install just by starting the domain:

dominion# xm create -c /guests/configs/solaris-pv-install.py

This says start the domain and give me serial console access to it. You then do a normal Solaris install. Once complete you should create a second python file to boot off the zvol, but first I'm going to snapshot it so I can quickly duplicate it (though I really should sys-unconfig it first, to make me input the hostname and IP info again).

dominion# zfs snapshot guests/solaris-pv@install
dominion# cat solaris-pv.py
name = "solaris-pv"
memory = "1024"
root = "/dev/dsk/c0d0s0"
disk = [ 'phy:/dev/zvol/dsk/guests/solaris-pv,0,w' ]
vif = [ '' ]
on_shutdown = 'destroy'
on_reboot = 'destroy'
on_crash = 'destroy'

and create it with

# xm create -c solaris-pv.py

This then comes up as per a normal Solaris boot, and if you've given it an IP address during the install, or set it to use dhcp, you should be able to log in to it using ssh. The networking is effectively bridged; that is to say, you need a real IP address for each domain, on the same network as the Dom0.

So the next question I always get is "Can I run Windows as a domU?". And the answer is "maybe". What we have done up till now is use a paravirtualised domU, that is, one that has been modified to run on Xen. Anything that would trigger a privileged operation (interrupt, privileged instruction etc.) is modified to be a call to the hypervisor. This is nice and fast, but some operating systems haven't had this treatment.

However, with the advent of the Intel Core 2 Duo and Rev F Opteron/Athlon64 (socket AM2) processors, some hardware support for virtualisation has been built in to the chip. This detects these privileged operations and redirects control back to the hypervisor to do "the right thing". With Xen these are referred to as HVM domains. Russ is going to be blogging more about these, so I won't go in to too much detail, but if you want to know if your system is HVM capable, I wrote this simple program to tell you:

dominion# cat hvm-capable.c
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <stdio.h>

static const char devname[] = "/dev/cpu/self/cpuid";

/*ARGSUSED*/
int
main(int argc, char *argv[])
{
	struct {
		uint32_t r_eax, r_ebx, r_ecx, r_edx;
	} _r, *rp = &_r;
	int d;
	char *s;

	if ((d = open(devname, O_RDONLY)) == -1) {
		perror(devname);
		return (1);
	}
	/* cpuid leaf 0: the vendor string starts in EBX */
	if (pread(d, rp, sizeof (*rp), 0) != sizeof (*rp)) {
		perror(devname);
		goto fail;
	}
	s = (char *)&rp->r_ebx;
	if (strncmp(s, "Auth" "cAMD" "enti", 12) == 0) {
		if (pread(d, rp, sizeof (*rp), 0x80000001) != sizeof (*rp)) {
			(void) printf("error reading features register\n");
			goto fail;
		}
		(void) printf("processor is AMD ");
		(void) close(d);
		/*
		 * Read secure virtual machine bit
		 * (bit 2 of ECX feature ID)
		 */
		if ((rp->r_ecx >> 2) & 1) {
			(void) printf("and processor supports SVM\n");
			return (0);
		}
		(void) printf("and does not support SVM\n");
		return (1);
	} else if (strncmp(s, "Genu" "ntel" "ineI", 12) == 0) {
		if (pread(d, rp, sizeof (*rp), 0x00000001) != sizeof (*rp)) {
			(void) printf("error reading features register\n");
			goto fail;
		}
		(void) printf("processor is Intel ");
		(void) close(d);
		/*
		 * Read VMXE feature bit
		 * (bit 5 of ECX feature ID)
		 */
		if ((rp->r_ecx >> 5) & 1) {
			(void) printf("and processor supports VMX\n");
			return (0);
		}
		(void) printf("and does not support VMX\n");
		return (1);
	}
fail:
	(void) close(d);
	return (1);
}

SVM is AMD's implementation of HVM, while VMX is Intel's.

And just a teaser of what you can expect (right click - view image to see it full size). Here you see a Solaris paravirtualised vm being installed, and a Windows Vista HVM domain. In the top left corner you can see the virtual machine manager, a new management GUI that will help manage domains. Sorry, this is going to be pretty hard to see unless you view the image at its original size (1600x1200 - yes, virtualisation helps you use up those wasted resources, including screen real estate).
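The install snapshot is what makes duplicating domains cheap. As a sketch (untested here; the second config file is just a hypothetical copy of solaris-pv.py), cloning a second domain from the snapshot would look roughly like this:

dominion# zfs clone guests/solaris-pv@install guests/solaris-pv2
dominion# cp /guests/configs/solaris-pv.py /guests/configs/solaris-pv2.py
(edit the name and the phy: disk path to point at guests/solaris-pv2)
dominion# xm create -c /guests/configs/solaris-pv2.py

Because the clone is copy-on-write, only the blocks the new domain changes take up additional space in the pool.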


Solaris

Using the OpenSolaris Mercurial repository

I finally decided to have a look at how you get OpenSolaris from the Mercurial repository. I've been put off by the fact that at the office we're behind a firewall, so you have to pull it through a SOCKS proxy. It turns out it's absolutely trivial to do (the instructions are all on the OpenSolaris website). If you're behind a firewall that requires you to have a proxy for ssh, the first thing you need to do is set up ssh to use a proxy. I can't help you with that, other than to say add the following lines to your ssh config for opensolaris.org, putting in the IP address of your SOCKS proxy:

$ cat ~/.ssh/config
Host *.opensolaris.org
ProxyCommand /usr/lib/ssh/ssh-socks5-proxy-connect -h [Proxy IP address] %h %p
Compression yes

(thanks to Erik and Stephen for pointing out the compression option)

Now you can clone the repository:

$ hg clone ssh://anon@hg.opensolaris.org/hg/onnv/onnv-gate
adding changesets
adding manifests
adding file changes
added 3487 changesets with 67524 changes to 43099 files
39742 files updated, 0 files merged, 0 files removed, 0 files unresolved

It took only 37 minutes to my home machine, which is much faster than a full bringover in Teamware. Now to start playing with some of the build tools.
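Keeping the clone up to date afterwards is just as easy - this is standard Mercurial, nothing OpenSolaris-specific:

$ cd onnv-gate
$ hg pull -u    (pull new changesets from the parent and update the working copy)

hg incoming will show you what you'd get, before you commit to pulling it.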


General

Is it 2007 already?

Thanks to Patrick for tagging me and making me realise I hadn't touched my blog in ages. I've been so busy: travelling to present at the Tech Day in Prague, meeting our colleagues over in Grenoble, working on FMA, and now finally being pulled in to porting Solaris on Xen to Solaris 10. Busy times, along with getting on with having a dog, buying a new camera, Christmas and all....

Anyway, Patrick tagged me to come up with 5 things you didn't (or didn't need to) know about me.

1. Despite my accent, I was born and grew up in the North East of England, in a town called Darlington. I sometimes have a hankering to move back.
2. I went to school with England Rugby player Tony Underwood (who has recently also appeared on a programme about Easyjet).
3. I went to UCL and studied Applied Physics, at which the only courses I really enjoyed were Planetary Geology (colouring in photos of Mars) and Operating System Design (hence what I do now).
4. I'm a firm believer in rational process as defined by Kepner Tregoe. I find it helpful in defining a problem or situation I find myself in, and giving me a structure for progressing it. I enjoy facilitating people to use this, and used it recently to help establish priorities for the future of FMA.
5. Apparently if I was a Super Hero I'd be Spiderman (like many people around here...):

You are Spider-Man
Spider-Man 70%
Superman 65%
Robin 60%
Green Lantern 60%
Iron Man 55%
Supergirl 53%
Wonder Woman 43%
The Flash 40%
Catwoman 35%
Hulk 35%
Batman 35%
You are intelligent, witty, a bit geeky and have great power and responsibility.
Click here to take the Superhero Personality Test

So you may have known some of those, but I hope it was interesting. I hereby tag Gavin, Peter, and


Solaris

EuroOSCON 2006

EuroOSCON 06

This week (18th - 21st September) I've had the opportunity to attend and be involved in EuroOSCON06. This was primarily to increase my understanding of open source, but also to promote OpenSolaris. This was my first open-source conference, so first I'll make a few general observations before moving on to details about the sessions.

EuroOSCON06 was this year in Brussels, a city I'd not visited before. It's a surprisingly small city for the self-proclaimed capital of Europe. There are some very beautiful parts as well as some rather seedy parts, and being small, the seedy and the wonderful nestle uncomfortably together.

There were some other Sun employees there (Martin Man, Peter Dennis, Patrick Finch, Darren Kenney and Gary Pennington). I'd met a few of them before, but we're from very different backgrounds, so we had different reasons for wanting to attend EuroOSCON and promote OpenSolaris.

So I took the Eurostar from London (I was booking the trip just as the security scare happened last month, so thought this would be easiest). I met with Peter Dennis on the train and worked through some demos we could show to people. We had a BOF and a booth on Wednesday, so thought we'd try and show some cool stuff.

The Demos

We decided we'd show how easy it was to set up and build OpenSolaris. I had a media kit with me on the train, and by the time I was in Brussels I had installed a build environment on my laptop and was happily building code - cool! We also wanted to show some zfs features, and some zones features. There is a new facility in OpenSolaris to allow you to create a zone on one system (preferably on a zfs file system) and then take a copy of it to create a new zone. If you're using zfs it will snapshot the filesystems rather than copying data, so it is rather quick. This is done with zonecfg clone -s. You can then detach that zone from your current system (using zoneadm detach), and as the zpool was on an external disk, we moved the USB disk to another laptop, imported the pool (zpool import <pool>) and attached the zone to the new laptop (zoneadm -z <zone> attach -F -n). It then just works as it did on the old system. I was amazed, and can see how useful this is going to be (a rough transcript of the demo follows below).
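Roughly what the demo looked like end to end - a sketch from memory, with <zone> and <pool> as placeholders (exact flag spellings are in zoneadm(1m)):

# zoneadm -z <zone> detach
# zpool export <pool>
(physically move the USB disk to the second laptop)
# zpool import <pool>
# zoneadm -z <zone> attach -F -n
# zoneadm -z <zone> boot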
Observations

Some things struck me. First, everyone uses Macs there. This surprised me, as although it is based on an open-source OS, it is far from an open-source product. Second, there are a lot of people on the conference gravy train; they obviously go to a lot of these. That's fine, but it did distort the audience a bit. Finally, licensing, both of projects and data, is still a big worry for this community.

It was well worth going this year to promote OpenSolaris. People were queuing up at the booth to talk to us. So, on to some more detail.

The Booth and the BOF

Wednesday had us setting up the booth for OpenSolaris and manning it. We had intended to take turns, but we were so busy for the whole of the morning that we were all talking pretty much all morning to various interested people. We had some "OpenSolaris Starter Kits" containing the install media, the source and the compilers, along with a couple of LiveDVD images. Overall they went like hot cakes; everyone seemed excited by OpenSolaris. I even persuaded a guy from Google to take a look.

The afternoon was a little quieter, so we could go off to a few sessions, but there was still a steady stream of interested people. I gave more demos than I can remember - people were excited about how easy it was to use ZFS and how simple building OpenSolaris was. Dtrace is still a big thing, and now it's in MacOS the community can see an example of why they should be involved with OpenSolaris.

The BOF was at the same time as one from Ubuntu's Mark Shuttleworth, so we weren't expecting many, and we had a few, though not as many as I would have hoped. We talked through SMF, Zones, dtrace and ZFS, among other things, and the people seemed interested. The funny thing was that Mark Shuttleworth failed to turn up, and the Ubuntu BOF got rather heated.

The Sessions

Keynote speakers: Tim O'Reilly, Tor Norretranders

Tim O'Reilly spoke about data and licensing (as you'll see, licensing was a hotly discussed topic at the conference; it's not something that has ever interested me, but it is clearly important). Tim was commenting that if you put it on MySpace but can't take it anywhere else, is it really yours? The same goes for applications that run on the web server rather than on your own system - what is to stop the owner of the webserver doing whatever they want? I had a feeling he was blowing his own trumpet, saying he'd pointed this out in 1998 or something but only now people are taking him seriously. He was trying to say (I think) that in the Web 2.0 world, where everyone contributes online, you may not own the rights to the things you put in your blog or on someone else's collaboration site.

Tor gave a very funny speech about the motivations for participating in open source. He stated it was about:

Glow -> The nice feeling we get when interacting with someone else
Show -> Showing off and helping others gets you noticed (or laid, as he put it)
Flow -> You are constantly changing
2.0 -> We're getting back to a bartering-type economy

Certainly I enjoyed it, and the first two points are clearly right; the rest felt a little forced, but then it was well made.

Industrial-strength Email and Calendar: Florian von Kurnatowski

Without realising it, I'd wandered in to the Products and Services track - basically open-source-friendly companies promoting their products. That said, he didn't push his company Scalix too much. What he observed was that you need a true replacement for Outlook before people will be able to move away from Microsoft. It seems Outlook is very closely tied to all the other Microsoft apps, and if you remove them you lose a lot of functionality (and he said Outlook is 50% of the license fee too). Until OpenOffice can provide that, or has an equivalent, it will not be considered by many people. Also, 90% of admins have never done a migration of mail systems, so they're scared of it; there need to be good migration tools. His final point was that calendar services do not have any standards, which is why calendar infrastructure is even harder to do than email.

Channeling OpenSource in Europe: Ranga Tangachari

Back in the open-source world, I was interested by this session. I'd assumed that this would be about getting the most out of open source in Europe, but instead it was a talk about how his company made money in open source by encouraging the channel (resellers). His assertion was that communities provide innovations and companies provide products (more than just projects: fully tested and supported things). In the middle is the channel, which adds value with things like localization and training - being a pool of deployment experts. You need to encourage the channel by giving them what they care about, which is:

1) Margin
2) Professional services opportunities
3) Maintenance (recurring revenue)

Think beyond downloads - they only mean one click; find examples of happy customers.

Big Data and the Open Warehouse: Roger Magoulas

This was a disappointing presentation about what O'Reilly do about data storage and data mining. It was unfortunately simply a run-down of tricks, tips, products and techniques used by the O'Reilly guy in their data centre. There were some interesting things mentioned, though, which I will go and look at.

SecondLife and Opensource: Jim Purbrick

I'm intrigued by SecondLife. It's a game where the whole purpose is to make "stuff" and "hang out", and generally share or sell what you do. I have looked at it and it is cool, but I haven't got my head round "Why?" yet. SecondLife is not (yet) open source, but the guy from LindenLab was explaining that the big difference between Second Life and other MMORPG games is that the players create the world. LindenLab couldn't have provided enough content to keep people interested, but because it is created by the game community, they reckon they get ~6500 man-years of content development per year! (Not even EA could manage that for one game, I think.) All of this is of course up for sale or copy, depending on the desires of the community member. There are interesting aspects to the way LindenLab have architected their set up, like each new plot of land requiring a new server, so they're adding new servers at a huge rate. Overall the crowd were excited by SecondLife; I didn't see them as engaged in any other session.

Afternoon keynotes: Steve Coast, Adrian Holovaty

Steve Coast: Open Data. An interesting view of "good enough" data. He uses GPS traces recorded by volunteers as they go about their daily business to build up a plan of cities, making it available via OpenStreetMap. Most map data in the world is either government-owned (like the Ordnance Survey) or not of great quality. Creating good-enough Open Data which can be shared will be enough for some, and will consequently bring down the price of Closed Data.

Adrian Holovaty: Journalism through Programming. A fascinating and slightly scary view on providing access to the raw data used by journalists via web applications. He works for the Washington Post and is involved in a few projects, one of which is Faces of the Fallen. Another one he quoted lets you look at your MP's voting record: TheyWorkForYou. Now I can see the point of this, and will certainly be checking on what my MP is doing, but I am slightly worried that this requires balance. We're asking the public to draw conclusions from only a small amount of the data (as the records published online are incomplete), whereas journalism is all about weighing up all the information and providing a balanced summary for consumption.

Open Usability: Jan Muehlig

A talk to encourage open-source projects to include usability engineering, to make the user experience of open-source products as good as or better than closed source - something we have often complained about. His group OpenUsability is working to promote this, and it seems to boil down to publishing best practices, which no one does yet.

Making It Work: Louis Suarez-Potts

A talk about how to build successful open-source projects. He comes from OpenOffice, which is both a successful project and a really useful product, so he should know what he's doing. He talked about the two different approaches: the organic, where a few friends start up with a common goal, and the sponsored, where a corporate entity is driving towards a set of goals. Either way you need to do the following:

Pick the right license
Have a neutral environment (ie safe to contribute)
Have transparent governance and processes
Make decisions in public
Have clear decision paths
Use good communications tools (everyone likes IM these days)
Have immediate gratification (easy and fast contribution)
Market your project
Have the right product

I found this quite encouraging, as I felt OpenSolaris has it about right.

OpenSource and Freedom: Why Open Standards are crucial to protecting your Linux investment: Jim Zemlin

This talk was aimed at promoting the LSB (Linux Standard Base), to make sure applications will run on the largest number of distros. The LSB dictates the minimum set of components available within the distro, so your application can rely on them. This is to encourage growth over Microsoft. He quoted what happened in the Unix world when the standards fragmented, and he is absolutely right.

Keynote: Florian Muller, Roml Lefkowitz

Florian Muller spoke about lobbying in the European Parliament to limit the changes to patent law, which some companies are trying to tighten up to protect their IP, while open source is trying to go the other way. I was left slightly disconcerted that someone with such a one-sided view was having an effect on our laws.

Roml Lefkowitz spoke about the need to internationalize and localize the source code and languages used in open-source projects. Nice pie-in-the-sky thinking, but it misses the point that the source should not be the documentation; we need documentation before we can worry about such things.

Xgl and Compiz - New X11 features and the OpenGL Accelerated Desktop: Matthias Hopf

A fascinating talk about the future of the desktop from SuSE. At last, a talk with lots of technical details, and a neat demo at the end demonstrating the desktop mapped on to a 3D cube, running two movies and Quake 3 at the same time on different faces of the cube. All of this should soon also be possible in Solaris, and I think it's vital we do it.

The End

If you got this far then well done :) It was a lot to read.

Technorati Tags: Solaris OpenSolaris EuroOSCON EuroOSCON06


Solaris

Building OpenSolaris

This weekend I did something I've been meaning to do for a while. I've been putting it off due to lack of time to think about how to approach it without breaking anything. Anyway, I finally tried downloading and installing Solaris Express Community Edition (build 46), and downloaded all the build tools. I was amazed how easy it was. Within an hour of finishing the downloads, it was building OpenSolaris.

The main shock for me was the lack of an SCM (Source Code Management) system. Being fully entrenched in the world of ON (OS and Networking) for the last 10 years, Teamware (the SCM we use) is just *what we do*. So I had to rethink how I'll manage the build. But then using opensolaris.sh from usr/src/tools/env did a good job.

So I now have my own build, built from OpenSolaris, running on my system, without anything from within Sun. Cool. Give it a go - it was easy, and gives you the chance to play with how things work. Next step for me: build the Xen bits from outside Sun :)

==== Nightly distributed build started:   Mon Sep 11 11:27:18 BST 2006 ====
==== Nightly distributed build completed: Tue Sep 12 03:22:02 BST 2006 ====

==== Total build time ====

real    15:54:44

==== Nightly argument issues ====

Warning: the N option (do not run protocmp) is set; it probably shouldn't be

==== Build environment ====

/usr/bin/uname
SunOS osol-bld 5.11 snv_46 i86pc i386 i86pc

/opt/onbld/bin/nightly myopensolaris.sh
nightly.sh version 1.104 2006/08/29

/opt/SUNWspro/bin/dmake
dmake: Sun Distributed Make 7.7 2005/10/13
number of concurrent jobs = 4

32-bit compiler
/opt/onbld/bin/i386/cw -_cc
cw version 1.20
primary: /opt/SUNWspro/bin/cc
cc: Sun C 5.8 Patch 121016-02 2006/03/31
shadow: /usr/sfw/bin/gcc
gcc (GCC) 3.4.3 (csl-sol210-3_4-20050802)

64-bit compiler
/opt/onbld/bin/i386/cw -_cc
cw version 1.20
primary: /opt/SUNWspro/bin/cc
cc: Sun C 5.8 Patch 121016-02 2006/03/31
shadow: /usr/sfw/bin/gcc
gcc (GCC) 3.4.3 (csl-sol210-3_4-20050802)

/usr/java/bin/javac
java full version "1.5.0_08-b03"

/usr/ccs/bin/as
as: Sun Compiler Common 10 snv_46 08/03/2006

/usr/ccs/bin/ld
ld: Software Generation Utilities - Solaris Link Editors: 5.11-1.545

Build project: group.staff
Build taskid:  62

==== Build version ====

ws.opensolaris

==== Make clobber ERRORS ====

==== Make tools clobber ERRORS ====

==== Tools build errors ====

==== SCCS Noise (DEBUG) ====

==== Build errors (DEBUG) ====

==== Build warnings (DEBUG) ====

==== Elapsed build time (DEBUG) ====

real    11:42:57.3
user    4:41:00.3
sys     2:40:32.5

Technorati Tags: Solaris OpenSolaris Xen
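For completeness, the way I drive the build - a sketch, where myopensolaris.sh is my copy of the sample environment file (GATE, CODEMGR_WS and STAFFER are the usual nightly variables; log paths live under the workspace and vary a little by nightly version):

$ cp usr/src/tools/env/opensolaris.sh myopensolaris.sh
$ vi myopensolaris.sh                  (set GATE, CODEMGR_WS, STAFFER and friends)
$ /opt/onbld/bin/nightly myopensolaris.sh &
$ tail -f log/nightly.log              (watch it go)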


Music

What am I listening to

Last week I was visiting our colleagues in Prague, talking about Xen, Dtrace, FMA and other cool stuff, but also helping out with some problems they were having. Anyway, I did have a nightmare journey back, and was thankful for two things: a good book (Use of Weapons - Iain M Banks: I love the detail he puts in the worlds he creates), and an iPod.

So what did I listen to? Well, I found I had subscribed to a podcast that has some excellent music on it: IndustrialRadio. Plenty of interesting music I'd not heard before. Apparently the Detroit industrial scene is pretty active, which is more than can be said for most places.

Anyway, back in the real world, I rediscovered I like some Handel. Having bought Essential Handel, I've remembered learning violin as a child. Maybe I should take up some more. What else? Well, in complete contrast there's been Classic Euphoria, which is pretty much audio wallpaper but great for getting stuff done to.

My eldest son did a part in the school production of Oliver a couple of weeks ago. Today there was a local schools showcase of the out-of-school activities they've been doing. Naturally the eldest was the best, but one other that stood out was another school's "Rock School". They have taken "Feel Good Inc" by Gorillaz and re-recorded it using the instruments they can play (the group only sang to a backing track, which I guess is fair enough given the size of the venue and the fact they had no switch-over time - oh, and they're only 8-11). I only realised it wasn't the real track when I heard recorders on it! Pretty impressive.

And this evening, while watching the end of The Tour, they had a butchered version of One, which made me put on the original, which was much better.


Solaris

Xen for dummies - part1

I've been helping out the Xen team to try out the bits they've put on OpenSolaris over the last week or two. I've been impressed with how much they've got working so far, but as an experienced Solaris user who is new to Xen, I've found it quite hard to get my head around what Xen does and how it works.

First off, install it using the instructions on OpenSolaris, and then you need to get some domains set up. So boot up under Xen. You'll see from the grub menu (/boot/grub/menu.lst) that some things have changed. Instead of booting a Solaris kernel, you boot xen, which then loads a Solaris kernel in the module line:

#Solaris on Xen 64bit
title Solaris on Xen 64-bit
  kernel /boot/amd64/xen.gz dom0_mem=524288 console=com1 com1=9600,8n1
  module /platform/i86xen/kernel/amd64/unix /platform/i86xen/kernel/amd64/unix -k
  module /platform/i86pc/boot_archive

Note the option dom0_mem=524288. This assigns 512MB to your Dom0 at startup.

Another thing to note is that even booting on metal (ie not booting the Xen hypervisor first) we no longer use multiboot, but can boot our unix directly:

#---------- ADDED BY BOOTADM - DO NOT EDIT ----------
title Solaris Nevada snv_41 X86
  kernel /platform/i86pc/kernel/amd64/unix
  module /platform/i86pc/boot_archive
#---------------------END BOOTADM--------------------

Read Joe Bonasera's blog about why this is the case.

OK, so you've got Solaris booted on Xen. This is referred to as Dom0. You need to set up some more instances of Solaris (or whatever Xen-compatible OS you happen to want), and each of those is referred to as a DomU. Each DomU has its own full OS install, so for Solaris we set up a flar archive and use the vbdcfg script to help us convert that in to an OS instance Xen can boot as a DomU. This is all described here.

A couple of things to think about. Networking is bridged (ie it appears to be directly connected to the outside world), so you're going to need to give it a real IP address or use DHCP (this can be decided as the DomU boots up). Also, you need to give it an ethernet address using the -e flag to vbdcfg. As this is a made-up ethernet address I don't know how you're supposed to create it; Dave Edmondson suggested making the first octet 0xaa and encoding the IP address in the rest of it.

So when you've got your DomU set up using vbdcfg, what do you get? Well, in /export/xc/xvm you'll have a directory for your DomU domain:

$ ls -l /export/xc/xvm/mydomU
total 10
-rw-r--r--  1 root  root  379 Jul 12 09:57 mydomU-64.py
-rw-r--r--  1 root  root  367 Jul 12 09:57 mydomU.py
drwxr-xr-x  4 root  root  512 Jul 12 09:48 platform
-rw-r--r--  1 root  root   19 Jul 12 09:48 root.dev
drwxr-xr-x  2 root  root  512 Jul 12 09:48 vmnt

The .py files are python scripts used to start the domain, platform is the directory where the kernel to boot is, root.dev contains the name of the root device, and vmnt is where the domain can be mounted using

$ vbdcfg mountdomU domU-name

But you don't want to do that now. You want to start up your domain, so this is how you do it:

$ xm create mydomU-64.py    (this starts mydomU in 64-bit mode)
$ xm console mydomU         (puts you on the console)

or alternatively

$ xm create -c mydomU-64.py

which starts the domU and puts you directly on the console - useful sometimes for finding out why your domain isn't coming up. So at that point you're on the console of your domain, and it's coming up just as if it's a fresh install of Solaris.

In future "Xen for Dummies" instalments I'll show how to configure the network, add disks, create more cpus in the domU than you have in the real box - and I'm sure I can think of more later.

Technorati Tags: Solaris OpenSolaris Xen
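A few other xm subcommands I keep reaching for day to day (a sketch - xm has many more, see its help output):

$ xm list              (show the running domains and their state)
$ xm console mydomU    (attach to a domain's console)
$ xm shutdown mydomU   (ask the domain to shut down cleanly)
$ xm destroy mydomU    (pull the virtual power cord)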


Solaris

ISSIG() How do I use it?

As indicated in this blog entry, a fix I did for a bug showed up some interesting and rather hard to diagnose problems. These showed up as getting EACCES (permission denied) errors when trying to cd to or read an automounted directory.

Using truss and dtrace it was possible to see that the program that was getting EACCES was doing an open on the directory, which ended up in auto_wait4mount(). It is this call that returns EACCES. At around the same time the automountd process gets an interrupted system call (through a variety of calling paths, but usually from nfs4_mount() eventually calling nfs4secinfo_otw() and then nfs4_rfscall()), which returns EINTR, indicating it's getting something like a signal. This includes watchpoint activity and fork1 requests. In this case it is a fork1 request, so the thread requesting the stop is in the same process, and so ISSIG(JUSTLOOKING) returns true even if lwp_nostop is set (this was the fix for 4522909).

If we look in nfs4_rfscall() at the following section:

1334 	/*
1335 	 * If there is a current signal, then don't bother
1336 	 * even trying to send out the request because we
1337 	 * won't be able to block waiting for the response.
1338 	 * Simply assume RPC_INTR and get on with it.
1339 	 */
1340 	if (ttolwp(curthread) != NULL && ISSIG(curthread, JUSTLOOKING))
1341 		status = RPC_INTR;
1342 	else {
1343 		status = CLNT_CALL(client, which, xdrargs, argsp,
1344 		    xdrres, resp, wait);
1345 	}

Here we look to see if there is a signal pending, using ISSIG(curthread, JUSTLOOKING), to optimise out a CLNT_CALL() if it's not needed. If so, we return RPC_INTR (a little further down). The assumption is that if you have any signal-like activity you need to return to userland to handle the signal. This is not the case for fork1(): you can simply wait till you start running again and carry on. ISSIG(t, FORREAL) could be used to check if there is a real need to return to userland. The trouble is you need to drop all your locks before calling it, so you then have to reacquire the locks later. This may require you to restart the rfscall operation.

Also, if you do a forkall() (ie a normal fork() system call) you do need to return to userland with EINTR; the same goes for some /proc activity. So it's probably worth checking for that prior to calling issig(FORREAL). A good example of how to do this is in cv_wait_sig().

So an example of how to correctly check for signal delivery in a system call (say, if you are going to do something that takes a long time and don't want to waste that activity if it's going to be interrupted) would be:

	if (lwp != NULL &&
	    (ISSIG(t, JUSTLOOKING) || MUSTRETURN(p, t))) {
		/* ... drop all of your locks! ... */
		if (ISSIG(t, FORREAL) ||
		    lwp->lwp_sysabort ||
		    MUSTRETURN(p, t)) {
			lwp->lwp_sysabort = 0;
			return (set_errno(EINTR));
		}
		return (set_errno(ERESTART));
	}

I'll be applying this approach to nfs shortly.

Technorati Tags: Solaris


Music

HiFi Upgrade

Well, you've probably gathered I like music, but I've really been disappointed with my hifi for a number of years. When I graduated, with my first few months' pay cheques I bought a budget set of separates: JPW speakers, a Rotel 20W amp, and a Marantz CD52MkII CD player (second hand from a friend). I've been pretty happy with it - I still use it in the dining room.

However, about 5 years ago I started to get the desire for home cinema. This is not something that goes well with having children, as they have a tendency to think the speakers are things to be climbed on. So I bought a sub-and-sat set of speakers, the KEF KHT2005, which sounded pretty good when I auditioned them with an Arcam AVR 100, which I also bought. However, with the reality of normal living rooms, and the fact my DVD player may not be the best source for CDs, you can imagine it never quite lived up to my expectations. It was pretty good with movies - but I just stopped listening to CDs in the living room.

For some reason I had convinced myself that the CD52MkII was a good enough source, so I asked my local hifi shop if I could audition some new speakers or a CD player, to see which gave me the best improvement. I tried some KEF iQ5s and an Arcam CD73 alongside my existing CD player and the KHTs (a nice mix-and-match session). Overall I wasn't too impressed with the iQ5s - they didn't seem any improvement over the KHTs as long as you are using the sub with them for music (you have to, really) - but the difference the CD player made was stunning. So I am now the proud owner of an Arcam CD73.

Then of course I've had to rearrange the living room to keep it all hidden and subtle, but it was worth it. I've been listening to some of the 560 CDs we have. The bass is much more controlled and the detail astonishing. So what have I listened to that really impressed?

The Orb: Towers of Dub (UFOrb)
Depeche Mode: Precious (Playing the Angel)
The Sisters of Mercy: Everything (...)
New Order: World (single)
Linkin Park: Faint (Meteora)
Placebo: Bitter End (single)

Plenty more, but those stick out, especially for the detail revealed after the upgrade.

Technorati Tags: Music HiFi


Solaris

Why have I got two automountd processes?

I've been working for ages on how to resolve an issue reported as "automountd hangs when using executable automount maps". This is logged as bug 4522909. The problem is that when the automountd attempts to do a mount, it triggers a lookup on the mountpoint. This is done by another thread in the automountd. While we're waiting for that to complete, we call auto_wait4mount(), which in turn blocks all signals by calling sigintr(). This also makes the thread unstoppable, by incrementing lwp_nostop.

As this is an executable map, the "other thread" has to fork1() in order to run the map. This in turn tries to STOP all threads in the process, to get them in to a known state before forking the one thread we care about. As the mount thread is unstoppable, this never completes.

The fix was to allow a thread to be stopped, even if lwp_nostop is set, if the thread stopping it is in the same process. However, this has shown up a couple of mistaken assumptions in NFS land, which mean that more work is needed there to allow an RFS call to be restarted (6306343). This is obviously a bit of a pain, as the changes required to fix 6306343 may be quite large and require a change that is too risky for a patch, so an alternative approach is needed.

After a good amount of discussion we concluded the lowest-risk solution was to create a door server for the automountd process to talk to. This door server would handle all requests to fork the process to do an exec. Obviously, as the lookup is handled in the main automountd process, the deadlock is avoided.

Now if you read the door_call man page, it talks extensively about attaching the door server's file descriptor to a file, as there is an assumption that the fd is somewhere in the file name space. If it is, then anyone with sufficient privilege can write to it, and you can end up with all sorts of rubbish written in. However, if you fork(), a child inherits the fds of the parent, so the simple solution is to have the automountd process set up the door server itself before it becomes multithreaded, and then create a child to behave like the old automountd did, but with calls to the door server to get the fork()/exec() stuff to work. Hence you get two automountd processes.

Unfortunately this is all in the patch releases, as it isn't needed for OpenSolaris, so you'll never see my excellent code, but I still thought it was worth writing it up. I'll fill in the patch versions as they come out. So far we have:

Solaris 8
108994-56 SunOS 5.8_x86: LDAP2 client, libc, libthread and libnsl libraries patch
108993-56 SunOS 5.8: LDAP2 client, libc, libthread and libnsl libraries patch

Solaris 9
117468-12 SunOS 5.9_x86: nfs patch
113318-26 SunOS 5.9: nfs patch

Solaris 10
T118833-18 SunOS 5.10: Kernel Update
T118855-15 SunOS 5.10_x86: Kernel Update

Technorati Tag: OpenSolaris
Technorati Tag: Solaris
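So on a patched system you'd expect to see something like this (illustrative PIDs; the parent holds the door server, the child does the real automountd work):

$ pgrep -l automountd
  208 automountd
  209 automountd
$ ptree 208
208   /usr/lib/autofs/automountd
  209   /usr/lib/autofs/automountd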


General

Trouble Shooting

I was in the process of composing a blog entry about some of the things I've been up to so far this year (but in fact I'll make it several, later on) when a colleague pointed me at Chris G's entry on SGR. SGR (Sun Global Resolution) is Sun's internal development of the Kepner Tregoe Analytic Trouble Shooting technique. The idea is to provide a consistent and reproducible technique which will help solve problems. I have to be honest that when I first heard of it I was really sceptical: I fixed customer problems - how could they have a process for that?

Well, to be honest, what it does is force you to not ignore the obvious, not to jump to conclusions, and to write things down in a way that others who understand the process can understand and progress. Chris's epiphany is a great example of this.

In SGR language you have 4 processes. The first of these is Situation Appraisal (finding out what it is you need to work on). This leads in to one of three other processes, one of which is Problem Analysis (I'll talk about the others another time). To decide if you need to do some Problem Analysis, you ask yourself three questions:

1) Is there a deviation? In this case, is the system doing something it shouldn't?
2) Is the cause unknown?
3) Do I need to know the cause to take meaningful action? (eg if the espresso machine in the cafe is broken I don't need to know why; I can walk the 50m to the cafe in the other building to get the coffee to wake me up tomorrow)

So in Chris's case he is doing Situation Appraisal with a bit of crashdump analysis to find out what the concerns are. At that point there is probably a deviation, but Chris knows the cause, so he can stop troubleshooting. There is a fourth process that is unofficially recognised by most ATS/SGR programme leaders like myself. That is the JFDI process: once you know what to do, just do it.

Tags: Sun SGRT Troubleshooting


General

Neglecting my blog as usual

OK, so I'm guilty of neglecting my blog. This time I was prompted by my friend Kimberley, who has a much more interesting blog than mine. She bought a horse farm out in New Hampshire recently, and it's snowing....

I've just got back from another business trip to India, where I helped the group set its priorities for the next year. This was important for me and the team, as we're part of the support structure here at Sun, and now we're not charging for software (just for service) we're the part of the software organisation that can most directly affect whether it's worth paying Sun for support. Believe me, it is. Did you know that for supported Solaris customers you have 24x7 source code expertise down to the most obscure parts of Solaris? If we don't know the answer, then we will know how to find it out PDQ.

Anyway, I always suffer with jetlag, and with Christmas coming up I'm doubly snowed under. This weekend we have a family get-together at my parents', which will be fun, though it's a bit of a rush, as Andrea has to cram in 3 different things over this weekend (due to me being away the previous two). Still, mustn't grumble. I've just taken on a large backport of a project that's recently gone in to Solaris, so that should keep me interested for a while. As I learn more about it I'll document what I find out. That should be a useful thing for a blog.

(And by the way - mileage this week was 20 very stiff road miles.)

Technorati Tag: OpenSolaris
Technorati Tag: Solaris


Solaris

So what the heck is anonymous memory

As part of some work I've been doing I've had to talk a lot about anonymous memory, so I thought I'd write it down while I remember what it's all about.

Anonymous memory is memory that is managed by segvn but is not directly associated with a file. It's used for things like process stacks, the heap, or COW (copy-on-write) pages. A good example of this is if you fork a process. All the addresses in the second process actually map back to the same bits of physical memory (the same pages). However, if your child process was then to do something different with the memory (eg the child went off and manipulated an array in memory), the VM subsystem would copy those pages and change the mappings in the child process to point to the new pages. This new memory would be anonymous memory, and the child process would merrily make the changes to the array, unaware it now had new "physical" memory it was talking to.

In a bit more detail: you've probably heard people talking about "anon_maps" and "vnode, offsets" - or at least, if you're interested in VM and have been trawling the code, you probably have. So I'll try and put a simplistic view of what they mean. Typically when you mmap a file, the mapping is described by a vnode, offset and length. segvn manages mmapped files and stores this in its private data structure. So when you walk through the address space of the process looking for a virtual address, you'll see that the segment contains that address and be able to map it to the vnode, and to where in that file using the offset.

As mentioned above, anonymous memory is not associated with files, but with, perhaps, swap (you'd get to use anon memory if you mmapped /dev/zero, BTW). So the purpose of anonymous memory, or the anon layer, is to fake up a vnode and offset for segvn to find the data on a swap device (actually by going through swapfs, which I haven't yet looked at).

The core of the anon layer is the anon_map structure. This is stored in the segvn_data structure and points you to the anon_hdr for this segment. Each anon_hdr is linked to an array of anon structures, which are the swapfs implementation of how to find a vnode and offset on a swap file system (or it might be on a real swap device if you bypass swapfs; I think I'll need another blog entry for swapfs after this). So after some swapfs magic we can find the backing store for this anon_map.

You might have noticed a bit of handwaving in the middle there. What was that about an array of anon structures? Why do we need one of them? Well, each anon structure represents a page of memory. Our segment may be more than one page in size (try pmap $$ and you may well see some anon segments of more than 8K on sparc), so it would need more than one anon structure to describe it. So we have this array.

Technorati Tag: OpenSolaris
Technorati Tag: Solaris
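You can see all this from userland with pmap. Illustrative output from my shell (addresses, sizes and the PID are made up for the example; the point is the [ heap ], [ anon ] and [ stack ] segments, which are all anon-layer memory):

$ pmap $$
2483:   /usr/bin/ksh
00010000     192K r-x--  /usr/bin/ksh
00050000       8K rwx--  /usr/bin/ksh
00052000      40K rwx--    [ heap ]
FF380000       8K rwx--    [ anon ]
FFBEC000      80K rw---    [ stack ]
 total       328K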


General

Road Riding with Tim

Well, today was a little foolish. I've been chatting with Chris Gerhard and Tim Uglow over the last week or so, and they finally persuaded me to go for a road ride with them. Mostly by telling me that it would be an easy 10 miles, trying to keep up 20mph apart from the hills. I could probably do that, I thought. I do normally road ride on my own.

I live 17 miles from work, and the thought of riding there and back still fills me with dread (let alone riding at lunchtime as well!), so I packed up my road bike and drove to work (yes, I know it's the cheat's way...). Tim planned the route (14 miles). Anyway, Chris was tempted to come out, but the thought of wearing me out was not quite tempting enough for him to have an extra 14 miles on his regular 40. Did I mention it's about 30 centigrade out there?

So we head out and it's fun, in an "I don't normally start this fast" kind of way. I don't think I'm holding Tim up that much. It's good to ride with someone more used to riding in traffic. I went everywhere on my bike till I could drive; then it was mountain biking, about 10 years ago, that rekindled my cycling fire. I learnt a lot watching Tim handling the cars.

Where I live, I normally ride out in to the Chilterns. That means up, up, little down, up, up, big down, bigger up... etc - ie not flat. Around work it's got some flat bits before big hills. I was pleased to see I could manage a good 20+ on the flat, which the local track club have as a pre-requisite. Not that track really appeals. Up hills I need more strength, and actually I just need to be fitter.

There was one really fantastic hill (the down, anyway - no, I do enjoy the ups, but it was hot today): a 1 in 8, down at 38mph :). Don't know what the up was. I'll plot it later on the map and work it out. It was definitely less than 38mph.

In the end it was 15.6 miles, but I know my computer reads long (so it's probably half a mile less). Just over 55 mins. All this in a lunch hour.


Solaris

mutex_enter() in an interrupt handler?

I've always assumed that interrupt handlers were absolutely there to do the minimum work and get out as soon as possible. A question came to me today that challenged that view. The question was "Can I mutex_enter the pidlock while in an interrupt context?"

My instant reaction was no way: you should never block in an interrupt context. However, it seems I was wrong. Take a look at fas_intr(). You can see virtually the first thing it does is grab a mutex. This got me thinking.

There is a fundamental difference between grabbing a lock in an interrupt handler that is private to that code, and using a lock which is in general use throughout the kernel. Take this example. Something is holding the pidlock, but for whatever reason (its timeslice has run out) it has been scheduled off CPU. So we have our foo card installed in our Solaris machine, and an interrupt comes in for it. foointr() has been written to grab the pidlock mutex to make sure it has a stable view of the process table. So it tries to grab the pidlock, and it has to block till the owner frees it. That could be ages, and the interrupt wouldn't be serviced in a timely manner.

I'd like to say I dredged through the code and found all this out by reading it, but in fact I just read the manual page. This says that you can acquire a mutex in kernel or high-level interrupt context. However, there is a restriction: the mutex must have been initialised as type MUTEX_DRIVER, with the ddi_iblock_cookie (returned from ddi_get_iblock_cookie(9f)) as the arg to mutex_init(9f).

So there you have it: you can use a mutex in a high-level interrupt handler, just not ones like pidlock.

Technorati Tag: OpenSolaris
Technorati Tag: Solaris
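What that looks like in a driver's attach routine, roughly - a sketch for the hypothetical foo driver from above, with error handling trimmed (see mutex_init(9f) and ddi_get_iblock_cookie(9f) for the full story):

#include <sys/types.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>

static kmutex_t foo_lock;

static int
foo_attach(dev_info_t *dip, ddi_attach_cmd_t cmd)
{
	ddi_iblock_cookie_t ibc;

	/* Get the interrupt block cookie for interrupt 0 */
	if (ddi_get_iblock_cookie(dip, 0, &ibc) != DDI_SUCCESS)
		return (DDI_FAILURE);

	/* Initialise the mutex so it is safe to take in foointr() */
	mutex_init(&foo_lock, NULL, MUTEX_DRIVER, (void *)ibc);

	/* ... register the handler with ddi_add_intr(), etc ... */

	return (DDI_SUCCESS);
}

Inside foointr() you can then mutex_enter(&foo_lock) safely, because the mutex's interrupt priority is known to the framework.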


General

Goodwood Festival of Speed

Just a quick non-work-related blog. On Saturday I went to the Goodwood Festival of Speed. This is a fantastic event in which you get hundreds of cars doing the hill climb at Goodwood. I took the boys, and they had (to quote the youngest, who is 4) "the bestest day I've ever had".

So we got there just as some historic McLarens were going up the hill. To start with, the youngest was scared of the noise, till I put him on my shoulders and he could see them coming. We then walked around the F1 paddock (which is open to everyone). We were there as this year's McLaren was being unloaded from its lorry, and we followed them to their paddock place.

We then saw the F1 cars do their thing up the hill. I have an xD card full of empty track, but did manage to get a couple of shots of F1 cars. Nothing can prepare you for the noise. It's like having someone suddenly scream in your ear.

We then looked around the stands with my cousin and headed up to the top of the hill, to the top paddock. It was a really long walk, but the boys managed it. At the top they had a forest rally stage, which was really cool. We also got to see the F1 cars come in after their second run. We got the tractor ride back half way down, then watched one of the last runs of the day from just above the flint wall. Fantastic view.

Home then, which took a while, as the roads out were not really designed for ~30000 people trying to leave at the same time, but it was well worth going.

Tags: F1 goodwood


Solaris

Debugging automounter problems

I've been working on a fix to an automounter problem for some time. So I'm just testing the final fix and sanity testing the binaries on my laptop. Suddenly my home directory doesn't get mounted. This is of course very strange; it's been working on that laptop for ages, so I think it must be my fix. So naturally the first thing I do is take my binaries off and try again. Phew, that didn't work either, so at least my fix is good. Trouble is, I still have an automounter that doesn't work.

Some things do work, however, e.g.

$ cd /net//

But some things don't:

$ cd /work/cwb
ksh: /work/cwb: permission denied
$ cd /home/cwb
ksh: /home/cwb: not found

This was rather strange.

So, you may or may not already know this, but there is a way of seeing the diagnostic output from automountd, by running the command:

$ ls -l /net/=9
/net/=9: No such file or directory

You can actually substitute the 9 for another number corresponding to a debug level. Traditionally this sent output to the console window. So I diligently start up a console terminal window, and it stays resolutely blank.

After a little investigation it appears that in converting autofs to SMF, these messages are now sent to the SMF log:

$ svcs -l \*autofs\*
fmri         svc:/system/filesystem/autofs:default
name         Automounter
enabled      true
state        online
next_state   none
state_time   Fri 17 Jun 2005 09:45:35 BST
logfile      /var/svc/log/system-filesystem-autofs:default.log
restarter    svc:/system/svc/restarter:default
contract_id  47
dependency   require_all/none svc:/system/filesystem/local (online)
dependency   require_all/restart svc:/milestone/name-services (online)
dependency   optional_all/none svc:/network/nfs/client (online)

So I take a look in /var/svc/log/system-filesystem-autofs:default.log and lo and behold we have the answer:

t1	Automountd: trace level = 9
t1	do_lookup1: action=2 wildcard=FALSE error=2
t1	LOOKUP REPLY    : status=2
t10	LOOKUP REQUEST: Wed Jun 22 17:50:02 2005
t10	name=cwb[] map=auto_home opts=nobrowse path=/home direct=0
t10	PUSH /etc/auto_home
t10	Executable map: map=/etc/auto_home key=cwb
t10	read_execout: forking .....
t10	read_execout: map=/etc/auto_home key=cwb line=
t10	POP /etc/auto_home
t10	do_lookup1: action=2 wildcard=FALSE error=2
t10	LOOKUP REPLY    : status=2
t10	LOOKUP REQUEST: Wed Jun 22 17:50:02 2005
t10	name=cwb[] map=auto_home opts=nobrowse path=/home direct=0
t10	PUSH /etc/auto_home
t10	Executable map: map=/etc/auto_home key=cwb
t10	read_execout: forking .....
t10	read_execout: map=/etc/auto_home key=cwb line=
t10	POP /etc/auto_home
t10	do_lookup1: action=2 wildcard=FALSE error=2
t10	LOOKUP REPLY    : status=2

Note the PUSH of auto_home. In that section it appears that the map is executable. But in this case auto_home is just a normal file:

$ cat /etc/auto_home
# Home directory map for automounter
#
cwb	deadarm:/export/home/&

So clearly something has happened to make the automounter believe this is an executable map (see automount(1M) for more details of automount maps and executable ones). Check the permissions on /etc/auto_*:

$ ls -l /etc/auto_*
-rwxr-xr-x   1 root     root         339 Apr  9 18:11 /etc/auto_direct
-rwxr-xr-x   1 root     bin           66 Jun 23 14:44 /etc/auto_home
-rwxr-xr-x   1 root     bin          128 Jun 22 17:55 /etc/auto_master

So somehow I've managed to set execute permission on these files, and the automounter tries to run them; hence I get no mounts (a sketch of the check it appears to make follows at the end of this post). Clearing the exec bit:

$ chmod 644 /etc/auto_*

And it all returns to normal. Now I make sure I turn off the debugging so as not to fill up my log files :)

$ ls -l /net/=0
/net/=0: No such file or directory

Putting the debug messages in the SMF log file is great. It makes it much easier to debug remotely, and you don't have to try and capture the console output as you're testing. My laptop is now working fine, and my fix is still good. Of course I still don't know why my map files had the mode changed; I suspect I mistyped something. I'll pkgchk the system later to see if there are any other strange permissions I've set...

Technorati Tag: OpenSolaris
Technorati Tag: Solaris
Technorati Tag: automount
Technorati Tag: debugging
Technorati Tag: smf
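How does the automounter decide a map is executable in the first place? As far as I can tell it simply keys off the execute bit on the map file. Here is a minimal C sketch of that style of check; this is illustrative only, not the actual automountd source, and the is_exec_map() name is made up:

#include <sys/stat.h>
#include <stdio.h>

/*
 * Return 1 if the map file would be treated as an executable map,
 * i.e. if the owner-execute bit is set on it.
 */
static int
is_exec_map(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return (0);
	return ((st.st_mode & S_IXUSR) != 0);
}

int
main(void)
{
	const char *map = "/etc/auto_home";

	(void) printf("%s: %s map\n", map,
	    is_exec_map(map) ? "executable" : "regular");
	return (0);
}

Which is why a stray chmod is enough to turn a perfectly good data map into something the automounter tries to fork and exec.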

Solaris

UK OpenSolaris User group

On Monday we had the first meeting of the UK OpenSolaris user group. It was interesting to meet some of the other people (i.e. non-Sun people) who want to get OpenSolaris working. There was a presentation from Simon Phipps about why open source is necessary and good, which was interesting. We also had a free-form discussion about anything. The vast majority of this covered licensing, which is something I have no expertise in, or to be honest interest in. However there were two really interesting technical points.

1. How will I know exactly which source made up my [Open]Solaris system? I put Open in brackets because this also pertains to commercial Solaris, and is a problem we've had inside Sun for many years. We know what shipped as base Solaris 10, for example, but then we ask customers to apply patches (or they choose to because they have a problem). These patches are generated from what is called the "Patch Gate", which is branched off the main gate at FCS (at which point the main gate is frozen). All patches are generated from this, and then the target binaries are put in to the patch and applied to the system. Copies of the source are also taken and put in the "Source Patch", which is only available internally. Obviously we can keep a record of the SCCS versions of the code we put in to the patches so that we can go back later and find out, but these are not readily available from the binaries. This would be a good thing to be able to find out from a running system, and is probably something we should put in to OpenSolaris to allow users to identify exactly what is on their system.

2. Can we mount ext[23]fs on Solaris? The answer to this is no. At the moment we have no implementation of the ext[23] file systems. It would be useful for migrating people to Solaris, however as a company our development is targeting ZFS. That's not to say it shouldn't be done, so put it up for discussion and see who bites. Try posting to the discussion forum, or perhaps the file systems one. I know that is primarily for UFS, but there are some good file systems people in that community.

I'm now looking forward to the next meeting. Hopefully we will get even more people there and can have some targeted technical discussions.

Technorati Tag: OpenSolaris
Technorati Tag: Solaris
Technorati Tag: FileSystems
Technorati Tag: ext3fs
Technorati Tag: ext2fs

Solaris

How does my signal handler get to run

With OpenSolaris a reality, I can talk about some of the areas I've had to troubleshoot in the past. I started trying to write blogs and soon found I needed to reference the source code to illustrate the point. This is one area I've had to look at many times when tracking down problems. I've pieced together my understanding from reading the source code; however, while writing this blog entry I've used a simple program and some "D" to illustrate what's going on.

Here's the simple program to copy some memory around, receive an alarm signal and run a signal handler:

$ cat alarm.c
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>

void alarm_handler();

int
main(void)
{
	char *src, *dest;
	struct sigaction sigact;

	sigact.sa_handler = alarm_handler;
	sigemptyset(&sigact.sa_mask);
	sigaddset(&sigact.sa_mask, SIGALRM);
	sigact.sa_flags = SA_SIGINFO;
	sigaction(SIGALRM, &sigact, NULL);

	src = malloc(4096);
	dest = malloc(4096);
	memset(src, '@', 4096);
	alarm(4);
	while (1) {
		memcpy(dest, src, 4096);	/* copy src into dest */
	}
}

void
alarm_handler()
{
	char txt[] = "In alarm handler\n";

	write(1, txt, sizeof (txt) - 1);
	exit(0);
}

memset and memcpy are implemented using the block load and store instructions, which use the floating point registers. This was used to emulate a customer's problem.

We know that the signal is actually delivered by the kernel when it is safe to do so, which is usually when you transition from kernel to userland or back (i.e. entering or leaving a system call). The act of sending a signal simply marks the t_astflag in the thread structure for trap to process later, so we can write a little D to trace anything from trap_cleanup (where the astflag is checked). We also need to check whether we should be checking for a signal at this time (this was new to me as well):

$ cat trap.d
#!/usr/sbin/dtrace -Fs

fbt::trap_cleanup:entry
/curthread->_tu._ts._t_astflag == 1 &&
 curthread->_tu._ts._t_sig_check == 1 &&
 execname == "alarm"/
{
	trace(timestamp);
	self->trace = 1;
}

fbt::trap_cleanup:return
/self->trace == 1/
{
	trace(timestamp);
	self->trace = 0;
}

fbt:::entry
/self->trace == 1/
{
	trace(timestamp);
}

fbt:::return
/self->trace == 1/
{
	trace(timestamp);
}

So we run these two together so we can see what happens. I'll annotate the output in-line.

$ trap.d
dtrace: script 'trap.d' matched 46375 probes
CPU FUNCTION
  1  -> trap_cleanup                  94182830223096
  1  |  trap_cleanup:entry            94182830229763
  1  -> sigcheck                      94182830231096
  1  <- sigcheck                      94182830232763

sigcheck is actually part of the ISSIG_PENDING macro, which is used to establish whether there is a signal there and whether we are allowed to do anything with it.

  1  -> issig                         94182830234346
  1  <- issig                         94182830235763
  1  -> issig_forreal                 94182830237763
  1  -> schedctl_finish_sigblock      94182830239763
  1  <- schedctl_finish_sigblock      94182830241263
  1  -> fsig                          94182830244096
  1  -> sigdiffset                    94182830246013
  1  <- sigdiffset                    94182830247346
  1  <- fsig                          94182830249263
  1  -> fsig                          94182830250263
  1  -> sigdiffset                    94182830251096
  1  <- sigdiffset                    94182830251930
  1  -> lowbit                        94182830253430
  1  <- lowbit                        94182830254513
  1  <- fsig                          94182830255263
  1  -> sigdeq                        94182830257513
  1  <- sigdeq                        94182830259430
  1  -> isjobstop                     94182830261013
  1  <- isjobstop                     94182830262430
  1  -> undo_watch_step               94182830264346
  1  <- undo_watch_step               94182830266096
  1  <- issig_forreal                 94182830267180

All of this is issig_forreal(), a function which extracts the next pending signal and puts it in to p_cursig. We then call psig():

  1  -> psig                          94182830268763
  1  -> schedctl_finish_sigblock      94182830269680
  1  <- schedctl_finish_sigblock      94182830270596

This blocks all signals if requested by the userlevel code setting tdp->sc_shared->sc_sigblock.

  1  -> sigorset                      94182830273596
  1  <- sigorset                      94182830275013

sendsig() is platform dependent. This is the SPARC version, so we go in to sendsig32():

  1  -> sendsig32                     94182830277430
  1  -> flush_user_windows_to_stack   94182830279263
  1  <- flush_user_windows_to_stack   94182830280346

This makes sure the register windows have all been flushed to the stack so we can safely change the stack pointer. We're going to have to construct a stack frame for our signal handler, so the current registers have to have somewhere to go on the stack. That means we need to know how big a stack frame is; we work this out by incrementing minstacksz: first the floating point registers (if the FPU is enabled), then the general registers, then the extra register set.

  1  -> xregs_getsize                 94182830282596
  1  <- xregs_getsize                 94182830283930

Watch point handling:

  1  -> watch_disable_addr            94182830285846
  1  -> avl_numnodes                  94182830287096
  1  <- avl_numnodes                  94182830288263
  1  <- watch_disable_addr            94182830289346

From here we're copying the context (all the register information) in to the area on the stack we've just set up.

  1  -> kmem_alloc                    94182830290513
  1  -> kmem_cache_alloc              94182830291846
  1  <- kmem_cache_alloc              94182830293430
  1  <- kmem_alloc                    94182830294680
  1  -> savecontext32                 94182830296430
  1  -> flush_user_windows_to_stack   94182830298680
  1  <- flush_user_windows_to_stack   94182830299513
  1  -> getgregs32                    94182830301763
  1  -> mkpsr                         94182830304680
  1  <- mkpsr                         94182830306096
  1  <- getgregs32                    94182830307596
  1  -> getfpregs                     94182830309430
  1  -> lwp_getdatamodel              94182830310513
  1  <- lwp_getdatamodel              94182830311430
  1  <- getfpregs                     94182830315096
  1  -> fpuregset_nto32               94182830316513
  1  <- fpuregset_nto32               94182830318680
  1  <- savecontext32                 94182830320013
  1  -> xregs_setptr32                94182830321846
  1  <- xregs_setptr32                94182830323013
  1  -> kmem_alloc                    94182830324013
  1  -> kmem_cache_alloc              94182830325096
  1  <- kmem_cache_alloc              94182830326263
  1  <- kmem_alloc                    94182830327180
  1  -> xregs_get                     94182830328846
  1  -> xregs_getgregs                94182830331263
  1  <- xregs_getgregs                94182830333013
  1  -> xregs_getgfiller              94182830334763
  1  <- xregs_getgfiller              94182830336180
  1  <- xregs_get                     94182830337346
  1  -> xregs_getfpregs               94182830339346
  1  -> xregs_getfpfiller             94182830341513
  1  <- xregs_getfpfiller             94182830343346
  1  <- xregs_getfpregs               94182830344846
  1  -> kmem_free                     94182830347013
  1  <- kmem_free                     94182830347930
  1  -> kmem_cache_free               94182830348930
  1  <- kmem_cache_free               94182830349846
  1  -> kmem_free                     94182830352263
  1  <- kmem_free                     94182830353013
  1  -> kmem_cache_free               94182830353930
  1  <- kmem_cache_free               94182830354846
  1  -> watch_disable_addr            94182830356513
  1  -> avl_numnodes                  94182830357430
  1  <- avl_numnodes                  94182830358430
  1  <- watch_disable_addr            94182830359263
  1  -> ucopy                         94182830361013
  1  <- ucopy                         94182830362680
  1  <- sendsig32                     94182830364680

Interestingly we see nothing in the dtrace output for the crucial bit of the code that sets the registers up to run the signal handler. This is because it is done simply by manipulating the registers within sendsig32(). Here is a snippet of the code that does this:

	/*
	 * Set up user registers for execution of signal handler.
	 */
	rp->r_sp = (uintptr_t)fp;
	rp->r_pc = (uintptr_t)hdlr;
	rp->r_npc = (uintptr_t)hdlr + 4;
	/* make sure %asi is ASI_PNF */
	rp->r_tstate &= ~((uint64_t)TSTATE_ASI_MASK << TSTATE_ASI_SHIFT);
	rp->r_tstate |= ((uint64_t)ASI_PNF << TSTATE_ASI_SHIFT);
	rp->r_o0 = sig;
	rp->r_o1 = (uintptr_t)sip_addr;
	rp->r_o2 = (uintptr_t)&fp->uc;

  1  <- psig                          94182830366263
  1  <- trap_cleanup                  94182830368013

So once we return to userland, the registers we're using are the ones set at the end of sendsig32(), and as a result we start running in the signal handler. Simple really.

With dtrace it is very easy to see the flow of code through a function. Combined with the OpenSolaris source browser you can quickly learn about new areas. Essential when tracking down bugs and customer problems.

Technorati Tag: OpenSolaris
Technorati Tag: Solaris
Technorati Tag: DTrace

Music

Garbage

Last night I got to go to see Garbage at Brixton Academy. I'd forgotten what a good venue Brixton was. The place was completely packed. I think it's about a 5000-capacity venue, and it sold out within a day of the tickets going on sale. We were lucky to get downstairs tickets; a friend of mine also went but could only get upstairs. For some reason, it's not the same if you have seats and can't get down the front. Mind you, we couldn't get further forward than the mixing desk as we only got there 10 minutes before they were on. (Putting children to bed does cramp your giggin' style.)

So let's just say the sound was not the best I've heard there, especially during the first couple of songs. Shirley's voice sounded way too brittle, guitars were scratchy and there was a severe lack of bass. Either I got used to it, or after a few songs the sound guys managed to get most of it under control. There were some moments of brilliance: Why Do You Love Me?, Only Happy When It Rains, When I Grow Up, I Think I'm Paranoid and Push It really stand out. Androgyny was excellent too. I haven't listened to the new album too much so I didn't know the songs as well as I'd have liked. I'm listening to it now.

There was a clever backdrop. They had 4 plasma TVs which showed some video footage. The backcloth of the stage was then projected with probably 200x200 spotlights which showed a pixelated interpretation of the TVs' image. It worked really well.

Garbage's music lost a lot of its subtlety live. They don't use (much) sequenced stuff and there are no backing vocals. OK, there were a couple of drum loops I spotted, but generally anything sampled was triggered by one of the guys on stage. So you get a much more raw interpretation of the songs: Shirley, two guitars, bass and drums. Overall excellent.

On a human note, the couple in front were interesting to watch. She was obviously not a fan, and had probably only heard the new album. He was obviously a die-hard fan, knew everything. The look she gave him when he wolf-whistled waiting for the encore was priceless. A fantastic evening's music.

Technorati Tag: Music
Technorati Tag: Garbage

General

It's been a while. Thoughts on vast amounts of data

Now I'm back in the UK, things are back to their usual hectic nature and I've neglected my embryo of a blog. So I'm going to make a conscious effort to try and blog before going home.

So I was sent a very interesting paper today by Clay Shirky. It started me thinking about the nature of file systems. UFS now supports 16 TB of data in a single file system. OK, there are limitations on the number of files, but that's still a heck of a lot of data that needs to be organised. The trouble is our file systems and email folders and many other things we do on our computers assume that the "thing" we're looking at can be easily categorised, and in fact many times they can't. The paper describes many examples, but the paper itself is as good as any. I wanted to provide a link to the paper in this blog entry. I remembered it had been emailed to me, but my email folders tend to run rampant with things I think are going to be useful, and I never remember where I've put something. I also remembered it had "ontology" in the title, and was sent to me by Peter Harvey. These could be described as metadata for the paper. By searching for this metadata I found the link to the paper again. Where it was stored was irrelevant.

The extension that Clay Shirky makes is that if you get a large number of people to add the metadata they think is important to a document (e.g. a URL), then you get, on average, a good classification of the document which can be used to find it. The beauty is that the quality of the categorisation, and hence the ability to find the data, increases with the number of people adding the metadata (referred to as a tag).

From a technical point of view this could be done now using UFS extended attributes. What would be needed would be a simple tool to add the tags to the data and search the extended attributes for the documents. Sounds like an RFE for Nautilus (a rough sketch of the attribute plumbing follows below). The only trouble I guess is one of scale: how would it work unless we have a large number of people adding their tags to the files? I need to give this more thought, but the paper really got me thinking and I hope it does you too.
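As a rough illustration of the extended-attribute idea, here is a minimal C sketch using attropen(3C) to store and read back a "tags" attribute on a file. The file name, the attribute name and the tag values are all made up for illustration; a real tool would also need a way to search these attributes across a whole file system.

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	const char *file = (argc > 1) ? argv[1] : "paper.txt";
	const char *tags = "ontology shirky classification";
	char buf[256];
	ssize_t n;
	int fd;

	/* Create (or replace) a "tags" extended attribute on the file. */
	fd = attropen(file, "tags", O_CREAT | O_TRUNC | O_WRONLY, 0644);
	if (fd == -1) {
		perror("attropen for write");
		return (1);
	}
	(void) write(fd, tags, strlen(tags));
	(void) close(fd);

	/* Read the attribute back; a search tool would scan these. */
	fd = attropen(file, "tags", O_RDONLY);
	if (fd == -1) {
		perror("attropen for read");
		return (1);
	}
	n = read(fd, buf, sizeof (buf) - 1);
	if (n > 0) {
		buf[n] = '\0';
		(void) printf("%s: tags = %s\n", file, buf);
	}
	(void) close(fd);
	return (0);
}

From the shell, runat(1) gives you the same access to the attribute namespace, e.g. runat paper.txt cat tags.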

General

Weekend Stuff: Back home and some family time

Those that know me will know I've been in India for the last couple of weeks. We have a team of around 50-plus engineers there, so it is important that all the senior folks get to meet them and help set the tone for the coming year. I think we did that quite successfully; we have a very motivated team out there.

While I enjoyed the trip and it was productive, I'm glad to be home. The flight home was long but pleasant. I flew via Dubai (Bangalore to Dubai, Dubai to Heathrow). The first leg was fine but a bit crowded (good breakfast, mind). 3 hours in Dubai (fantastic duty free, must plan to do more shopping next time) and then 8 hours back from there. There was a moment when I looked at the sky map and saw the nearest town was Basra. It brought me up a bit short.

So today was a quiet 18-mile bike ride, not too fast as I haven't done much for two weeks being away. I think I'll set up a Bikes or Rides section for that kind of thing. Then my wife went for a 3-mile run. She's doing a 10K tomorrow, but it didn't go well today. Hopefully tomorrow will go better.

This afternoon we went to Legoland with the kids. Good fun. Our youngest went to bed as soon as we got home, and our eldest bought a Harry Potter Lego set, half price in the end-of-line stuff, and is working through the instructions.

It's going to be strange being back in the office. I've been out for over 2 weeks.
