Tuesday Jun 16, 2009

Back to Parallel Patching for Solaris 10

In my previous entry, Parallel Patching in Solaris 10, I mentioned that the patches for this would be released before the end of June. They should be available on SunSolve from tomorrow (June 17th); the feature is contained in the latest Solaris 10 patch utilities patches, 119254-66 (SPARC) and 119255-66 (x86).

This is available for use on all Solaris 10 systems. 

Simply install this patch, set the maximum number of non-global zones to be patched in parallel in the config file /etc/patch/pdo.conf, and away you go.
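For reference, the key tunable in that file is num_proc; a minimal pdo.conf might look like the sketch below. The value here is purely illustrative, and the comments in the delivered file are the authoritative guide to the syntax.

```
# /etc/patch/pdo.conf - illustrative sketch
# Patch up to 5 non-global zones in parallel.
num_proc=5
```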

Prior to this feature, each non-global zone was patched sequentially, leading to unnecessarily long patching times for zones systems.

With this feature invoked, the global zone continues to be patched first, but then the non-global zones can be patched in parallel, leading to significant performance gains in patching operations on Zones systems.

While the performance gain is dependent on a number of factors, including the number of non-global zones, the number of on-line CPUs, the speed of the system, the I/O configuration of the system, etc., a performance gain of around 300% can typically be expected when patching the non-global zones, for example on a T2000 with 5 sparse-root non-global zones.

Here's the relevant note from the patch README file:

NOTE 10: 119255-66 is the first revision of the patch utilities to deliver "zones parallel patching". 

         This new functionality allows multiple non-global zones to be patched in parallel by patchadd.   Prior to revision 66, patchadd would patch all applicable non-global zones sequentially, that is one after another. With zones parallel patching, a sysadmin can now set the number of zones to patch in parallel in a new configuration file for patchadd called /etc/patch/pdo.conf.

         The two factors that affect the number of non-global zones that can be patched in parallel are

         1. Number of on-line CPUs
         2. The value of num_proc in /etc/patch/pdo.conf

          If the value of num_proc is less than or equal to 1.5 times the number of on-line CPUs, then patchadd limits the maximum number of non-global zones that will be patched in parallel to num_proc. If the value of num_proc is greater than 1.5 times the number of on-line CPUs, then patchadd limits the maximum number of non-global zones that will be patched in parallel to 1.5 times the number of on-line CPUs.  Note that patchadd will patch all applicable non-global zones on a system; the above description outlines only how patchadd determines the maximum number of job slots to be used during parallel patching of non-global zones.

          An example of this in operation would be where:
          number of non-global zones = 5
          and number of on-line CPUs = 32 (assume a T2000 here)

          In this case the maximum usable setting for num_proc would be 48, but as the number of non-global zones is 5, at most 5 zones will be patched in parallel.

          Please see comments in /etc/patch/pdo.conf for more details on setting num_proc.
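The job-slot calculation the README describes can be sketched in a few lines of shell. This is a hypothetical helper for illustration only; patchadd does this arithmetic internally.

```shell
#!/bin/sh
# Sketch of how patchadd caps parallel zone-patching job slots:
# min(num_proc, 1.5 x on-line CPUs), and never more than the zone count.
effective_jobs() {
    num_proc=$1; ncpus=$2; nzones=$3
    cap=$((ncpus * 3 / 2))                 # 1.5 times the on-line CPUs
    jobs=$num_proc
    [ "$jobs" -gt "$cap" ] && jobs=$cap
    [ "$jobs" -gt "$nzones" ] && jobs=$nzones
    echo "$jobs"
}

# The README's T2000 example: num_proc=48, 32 on-line CPUs, 5 zones
effective_jobs 48 32 5    # prints 5
```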

Bigger than 1Tb spindles

Seems a strange title for a blog entry, but I'll explain. As many of you are aware, the current Solaris 10 installer is getting somewhat old and painful to use; in fact, it is really beyond its useful life.

This is one of the many reasons why in OpenSolaris you have not just IPS but a new installer as well, far more akin to what you see on other operating systems today.

As I discussed in my blog entitled Parallel Patching for Solaris 10, the work we do in this space is highly targeted, as it is high risk.

Solaris 10 Update 8, when it ships later this year, will have some more enhancements in this space, as I've already mentioned. We recently made some more changes in this area, specifically to allow Solaris to work with boot disks of greater than 1TB.

Now this may not sound like such a big deal, but given when the Solaris installer was designed, its significance should not be underestimated.

So what have we done and why?

The current disk labeling scheme in Solaris (VTOC inside fdisk) breaks down past 1TB for bootable disks. With Solaris 10 Update 8, not only have the OS-level changes been made to allow x86 and SPARC based systems to boot from disks up to 2TB, but install changes have also been made to allow the Solaris 10 installer to handle these disks.

Monday Jun 15, 2009

Configuring an auto install client / server setup using VirtualBox

During one of my presentations @ CommunityOne I demonstrated on my laptop an auto install setup in VirtualBox. I had an auto install server running OpenSolaris 2009.06 and used that to "jumpstart" (to use a Solaris 10 familiar term) a client.

I was asked to write this up and committed to doing so ASAP after CommunityOne, so here goes...

1. Install 2009.06 in VirtualBox

2. Create a 2nd network adapter for the virtual machine: under Settings, select Network, then Adapter 2, enable it, and where it says "Attached to" select "internal network" (you'll need to shut down the virtual machine to do this if you didn't do it before installing).

3. Inside the virtual machine install the autoinstaller package:

pkg install SUNWinstalladm-tools


You need root privileges for everything from step 3 onwards; the easiest way is to run pfexec bash

4. Now configure the network connections on the virtual machine you just installed

Adapter 1 is e1000g0 and will use DHCP by default; leave that alone

Adapter 2 is e1000g1 and will need to be configured, specifically:


So first off edit /etc/hosts, adding in:    name_of_the_machine    aiclient0    aiclient1    aiclient2    aiclient3

Add a line entry aiclient<number> for every machine you'd like to auto install. You can pick whatever name you want here for each one, I just used aiclient to make it easy to understand.

name_of_the_machine is the name of the machine you installed in step 1, by default this is "opensolaris".

Also in /etc/hosts you have the line: opensolaris opensolaris.local localhost loghost

Delete the first opensolaris entry so the line looks like: opensolaris.local localhost loghost

Now edit /etc/hostname.e1000g1: enter into this file the internal network address you chose for your guest earlier.

Now edit /etc/netmasks, adding in the line:

Now check and modify the status of your network/physical smf services:

guest@opensolaris:~# svcs -a | grep network/physical
disabled       12:36:58 svc:/network/physical:default
online         12:37:02 svc:/network/physical:nwam
guest@opensolaris:~# svcadm disable /network/physical:nwam
guest@opensolaris:~# svcadm enable /network/physical:default
guest@opensolaris:~# svcs -a | grep network/physical
disabled       13:20:57 svc:/network/physical:nwam
online         13:21:20 svc:/network/physical:default

Configure the e1000g1 interface:

ifconfig e1000g1 inet netmask broadcast
ifconfig e1000g1 up

Configure the e1000g0 interface:

ifconfig e1000g0 dhcp
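To make step 4 concrete, here is a sketch of the three files with illustrative addresses. I'm assuming a 192.168.111.0/24 internal network here; substitute whatever addresses you actually chose.

```
# /etc/hosts (illustrative addresses)
192.168.111.1     opensolaris
192.168.111.10    aiclient0
192.168.111.11    aiclient1
192.168.111.12    aiclient2
192.168.111.13    aiclient3

# /etc/hostname.e1000g1 - just the internal interface's address
192.168.111.1

# /etc/netmasks - network number and mask
192.168.111.0    255.255.255.0
```

With these example values, the ifconfig line above would read: ifconfig e1000g1 inet 192.168.111.1 netmask 255.255.255.0 broadcast 192.168.111.255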

5. Start the auto install service, run the command:

installadm create-service -n 0906x86 -i -c 4 -s /images/osol-0906-111b2-ai-x86.iso /export/aiserver/osol-0906-ai-x86


This assumes you've put the ISO image in /images and named it osol-0906-111b2-ai-x86.iso
The -c option is the number of clients you've configured in /etc/hosts; in this example it is 4
The -i option is the address of the first client you've configured

You should see:

Setting up the target image at /export/aiserver/osol-0906-ai-x86 ...
Registering the service 0906x86._OSInstall._tcp.local
Creating DHCP Server
Created DHCP configuration file.
Created dhcptab.
Added "Locale" macro to dhcptab.
Added server macro to dhcptab - opensolaris.
DHCP server started.
Unable to determine the proper default router
or gateway for the subnet. The default
router or gateway for this subnet will need to
be provided later using the following command:
   /usr/sbin/dhtadm -M -m -e  Router=<address> -g
Added network macro to dhcptab -
Created network table.
adding tftp to /etc/inetd.conf
Converting /etc/inetd.conf
copying boot file to /tftpboot/pxegrub.I86PC.OpenSolaris-1
Service discovery fallback mechanism set up

Verify the service creation has worked:

Start up Firefox and go to http://localhost:5555; it should show an 'Index of /' page

6. Now set up dhcpmgr and IPv4

run dhcpmgr

Select macros and double click on the dhcp_macro_0906x86 to bring up a macro window and add the following macros:

Router with the value of the IP address of this machine

DNSserv with the value of the IP address in /etc/resolv.conf (or one of them if more than one is listed)


DNSserv will need to be *changed* when you move the system around, to match the current value in /etc/resolv.conf
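If you'd rather script this than use the dhcpmgr GUI, the same symbols can be set with dhtadm, following the pattern printed during service creation. The macro name comes from the dhcpmgr step above; the router address is illustrative, and DNSserv should be whatever /etc/resolv.conf lists.

```
# illustrative sketch - substitute your own addresses
/usr/sbin/dhtadm -M -m dhcp_macro_0906x86 -e Router=192.168.111.1 -g
/usr/sbin/dhtadm -M -m dhcp_macro_0906x86 -e DNSserv=<address from /etc/resolv.conf> -g
```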

8. Fix up the server to allow for IPv4 forwarding:

routeadm -e ipv4-forwarding
routeadm -u

9. Create your client machine in VirtualBox

* Give it a hard disk of 16G or more
* Set the boot order to be network first (Settings->General->Advanced)
* Set the network to be internal (in the same way as you set up the e1000g1 interface earlier, using Settings->General->Network)

10. Startup the client

You'll get the PXE boot, a dhcp address, then the grub menu for 2009.06 with a single line and then you'll get the screenshot you see below:

11. When it is installed

You'll get a successful completion message, and if you look at the /tmp/install_log file the end will look like this:

To use your newly installed image, shut it down, then go and change your network boot priority or remove (deselect) it completely (Settings->General->Advanced).

Once you have done that you can boot your newly installed machine.

The install log can be found in /var/sadm/system/logs/install_log on the installed client machine.

Note: Thx goes to Pete Dennis on my team for working with me on this.

Sunday May 31, 2009

CommunityOne, distro constructor and laptop migration

I'm on the road again; actually this is my second trip since the one I wrote about from China a few weeks ago. Since then I've done a week in California and a couple of weeks @ home, and I'm now back on the VS19, this time for CommunityOne and JavaOne and the launch of OpenSolaris 2009.06 on Monday at CommunityOne.

I've spent the last couple of weeks writing slides (off and on) and setting a laptop up for a demo I'm planning to give on Tuesday at CommunityOne. The talk is entitled "Deploying OpenSolaris in your DataCentre"; as part of it I'm planning to demo at least one of the features that makes OpenSolaris so effective in a datacentre environment.

The distro constructor, put simply, allows you to build a custom image (very much like the OpenSolaris livecd) as either an ISO or a USB image. Then, with the new automated installer technology in OpenSolaris 2009.06, you can take that ISO image and "jumpstart" it (for those of you familiar with earlier versions of Solaris) onto machines across your enterprise, knowing that each one of them is installed absolutely the same.

It's a really easy process, in fact it is so easy an executive can do it...the example below is one I ran on my workstation on Friday.

cove(5.11-64i)$ pfexec pkg install SUNWdistro-const
DOWNLOAD                                    PKGS       FILES     XFER (MB)
Completed                                    1/1       75/75     0.19/0.19

PHASE                                        ACTIONS
Install Phase                                104/104
PHASE                                          ITEMS
Reading Existing Index                           8/8
Indexing Packages                                1/1
cove(5.11-64i)$ cd /usr/share/distro_const/slim_cd
cove(5.11-64i)$ distro_const build ./slim_cd_x86.xml
cove(5.11-64i)$ pfexec distro_const build ./slim_cd_x86.xml
/usr/share/distro_const/DC-manifest.defval.xml validates
/tmp/slim_cd_x86_temp_6104.xml validates
Simple Log: /rpool/dc/logs/simple-log-2009-05-29-14-41-39
Detail Log: /rpool/dc/logs/detail-log-2009-05-29-14-41-39
Build started Fri May 29 14:41:39 2009

Two hours later

==== usb: USB image creation
/dev/rlofi/2:    1723200 sectors in 2872 cylinders of 1 tracks, 600 sectors
  841.4MB in 180 cyl groups (16 c/g, 4.69MB/g, 2240 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
32, 9632, 19232, 28832, 38432, 48032, 57632, 67232, 76832, 86432,
1632032, 1641632, 1651232, 1660832, 1670432, 1680032, 1689632, 1699232,
1708832, 1718432
1435680 blocks
Build completed Fri May 29 17:04:47 2009
Build is successful.

Other stuff to know about...

This will result in two image files:


The manifest is an XML file that describes what to build and comes with two examples:


Now auto-installing it is something I'm still coming to terms with; I have a recipe, but ran out of time to set it up before I boarded the plane, so that'll be one I'll have to try when I get to San Francisco later today. Assuming all goes well, you can come and see me demo auto-installing from one VirtualBox guest to another on my Toshiba R600 running OpenSolaris 2009.06; if not, it'll be just the distro constructor. Either way, as soon as I crack auto install I'll post a blog about it.

Of course VirtualBox 2.2.4 is also now available, just released over this weekend; something else I need to do is upgrade my 2.2.2 image before Tuesday (whilst putting this blog into the editing tool and loading up the pictures I let the upgrade run in the background: 16 minutes including a 74MB download, the wonders of IPS).

Just to top it all, this last week I managed to "break" my R600 :-( Well, I bent the power socket on the laptop so that when I plugged the mains supply in it wobbled, and it was only a matter of time before the connector itself came off the motherboard, so I also had to migrate from one R600 to another.

Laptop migration, particularly when you're running multiple guests, used to be a hugely painful experience; well, in an OpenSolaris 2009.06 world it has got a lot simpler (OK, with a bit of help from VirtualBox as well :-) .

Firstly, just copying your home environment off to something and then back: well, USB sticks are slow, and Solaris interop with something like a D-Link NAS box also used to be difficult; well, not anymore. At home I have a number of NAS boxes, primarily for backing up the computers in the house which my family use, but also to serve out DVD ISO images, music, video and recorded TV, and using those has now just got a whole lot easier.

For those of you with OpenSolaris, go to Places -> Computer -> Network; it'll then highlight a Windows network (which is how these commodity NAS boxes appear) and you can then simply connect to the appropriate volume and "bingo", you have a store to copy your home environment onto.

So just for my home environment (without any virtual machines) I was looking at at least 8G of data; a copy off one laptop to the D-Link box and back to my new laptop took under 20 minutes. I'm sure having a gigabit network helped, but even so it was really "drag and drop" simple.

Then onto VirtualBox, 2.2.2 has a great feature, File -> Export Appliance and File -> Import Appliance, which basically allows you to save off a virtual machine and import it somewhere else. The great part is it takes care of all the config file stuff which used to make this painful and of course copying them between machines, which effectively backs them up at the same time, is simple as well.

All in all a painless experience and so simple my kids can do it as well.

Friday May 29, 2009

A Day in the Life: Kansas

A week ago I published a blog about my recent trip to Kansas; well, it reminded me of the first time I went, and of a video the local team sent me. Worth a view, it is such a great parody of working life in Kansas.

Wednesday May 27, 2009

Parallel Patching for Solaris 10

One of the things I said I'd try and write about are various features that you will see upcoming...and I've kind of touched on this before in the "What, no tornados?" entry, specifically Parallel Patching.

This functionality went back into build 1 of what will become Solaris 10 Update 8 later this year, so what was the problem statement and what does the project do?

When zones were introduced in Solaris 10 they stretched patching of a system to its limits. The original patching system had been designed to patch the global zone and then all non-global zones sequentially. Given all the performance overheads in patching, a system with a large number of zones could take 10+ hours to complete, and a full patching window could often stretch to 30+ hours depending on the number of zones installed on a system.

The solution was to remove the sequential nature of patching. In the solution put in place under the "Parallel Patch Project", the global zone is patched first, then the non-global zones are patched in parallel. The degree of parallelisation is determined by a new configuration file. The overall performance gains we saw in testing took patching from around 20 hours down to 6.5 hours, with a value of 14 for the number of parallel patch invocations; the value 14 was based on the 14 non-global zones the system had installed.

Now the really good news is that the project team managed to work their magic and you'll be able to get this functionality in the patchadd patch which you can apply to an existing Solaris 10 installation. It'll be available in the late June timeframe allowing you to take advantage of this now as opposed to waiting for the update release to ship towards the end of this year.

Whilst this may not seem like a big deal, the customer impact of the current functionality's limitations should not be underestimated. I remember having a discussion with one customer after they had taken a 23-hour outage and the 200+ zones they had were still not patched.

On top of this, changes like this have to go into what we call the "install" consolidation, which contains the installer and patching tools, to name but two, a lot of which dates back to SVR4 and often earlier. The code itself is fragile, and any changes have to be very carefully managed, as the risk of breakage and regressions is high. This is one of the reasons I've got a long list of changes I'd like my team to make to improve the customer experience in this space, but also why it takes us time to get them out the door. Yet another reason why OpenSolaris and Solaris.Next will have a new installer and IPS.

Solaris 8 Vintage Support

As well as providing the sustaining engineering for the newer versions of Sun's products, we also get to manage them once development really stops and they eventually go into EOL and finally EOSL. Solaris 8 recently went from what is known as phase 1 into phase 2 support; in fact it did so in the last two months, on April 1st.

When Solaris reaches this phase in its lifecycle we offer what is called a "Vintage Patch Service". I often get asked, when Solaris reaches this stage, what it all means and why.

Solaris 8 first shipped in February 2000 and was in the planning for a few years before that. Sun has released two versions of Solaris since then, and you can already see where Solaris is heading by using OpenSolaris (which, incidentally, is what I'm running on the laptop I'm typing this on, but that is another blog entry).

Like all things, Solaris 8 is starting to reach its limits, and in some cases is being pushed beyond its design limits, which results in all sorts of performance impacts in the field. It also does not support our latest generation of HW platforms, such as the recently announced Nehalem chipset from Intel, or our Niagara architecture in any form.

The hardware which Solaris 8 was designed to run on is also approaching EOL to varying degrees; plus, in these economic times, Sun has hardware like the aforementioned Niagara-based machines that bring many benefits.

It is kind of like owning a car: you keep it and keep it and you keep spending money on it, but eventually you reach a point where you need to do that major upgrade, and all of a sudden you have a new car with a manufacturer's warranty and you're saving money on running costs, etc.

Sun recognises the impact of such major upgrades and provides many things to make everyone's life easier: Solaris 8 containers, for example, and our Application Binary Guarantee, which seems to be one of our best kept secrets. In simple terms, it says that if you follow the published stable interfaces and write your application to them, then an application compiled on one release of Solaris can be picked up and run on later releases with no recompilation necessary. The legal boys will now tell you it has some caveats, and it does, but I've seen great success where people have "just done it", and I cannot think of a case where it has failed when people have followed the rules.

I've also seen a number of our large accounts use the Solaris 8 container technology to take everything from a Solaris 8 machine, pick it up, and put it onto a newer platform running Solaris 10 under the hood, in many cases with multiple machines now consolidated onto one newer platform.

The cost benefit in doing this was described to me by one customer like this: "I need to spend $ on new applications but I have an ever shrinking IT budget. By using Solaris 8 containers I was able to consolidate my existing estate (a lot of E450 type machines) onto the T series platforms, and with the savings I made on HVAC plus maintenance cost reductions I had the $ I needed to re-invest in application development and deployment." Which is great, especially as that was one of the reasons we designed the product; it does, as they say, "exactly what it says on the tin".

I also get asked why you have to pay extra to keep getting engineering support on Solaris 8 in phase 2. It is simple really: Solaris keeps marching on, and otherwise we'd move that resource onto developing the next generation of products. Again, back to the car analogy: as parts come into short supply while demand continues, the price goes up.

I've worked with a lot of customers and a lot of account teams over the last 9 months since we announced this programme; the business arguments for migrating are compelling even for an engineering guy, and even more compelling in these economic times.

Friday May 22, 2009

CommunityOne an easy way to register for free deep dives

We've had a number of requests on how to make it easy to register for the free deep dives I mentioned in my previous entry; well, here you go. We look forward to seeing everyone the week of June 1st, so go on and register, it is free.


Whilst you're signing up for this, why not also register for the OpenSolaris Ignite Newsletter.


The week of June 1st is CommunityOne, or CommunityOne West to give it its full title. It starts on Monday at Moscone and runs parallel with JavaOne on Tuesday and Wednesday. CommunityOne itself moves to the InterContinental from Tuesday, and JavaOne runs at Moscone. Just in case you could not get enough of all this great technology, the Open HA Cluster Summit starts on the Sunday at the Marriott; the registration link for this is available on the agenda page as well.

The full agenda for CommunityOne is here. The OpenSolaris deep dives are FREE: when you register for CommunityOne, use the promotional code OSDDT and you will not be charged. This code will NOT get you into the other deep dive tracks. For those of you that have already registered, to add a deep dive session to your registration please call the CommunityOne Hotline at 1-866-405-2517 (U.S. and Canada only) or +1-650-226-0831 (international).

On the Tuesday you'll see John Fowler, Executive Vice President of Systems for Sun, officially launch OpenSolaris 2009.06. I spent some time earlier this week installing the final release candidate on 3 machines on the metal (my office and home workstations and my R600 laptop, which is what I'll be using to present from at CommunityOne), plus I did a couple of installs in a virtual world using the latest version of VirtualBox running on Vista 64-bit (with a 64-bit OpenSolaris guest). Then of course I gave the CD to my children and told them to install it :-)

More to come on 2009.06 the week of CommunityOne I may even sneak in some screen shots and other points of note before then, assuming marketing of course are not reading my blog...

Thursday May 21, 2009

Virtualisation aka V12N

I seem to have a bunch of emails in my inbox this morning about V12N. First up is one about an Upcoming V12N Webinar on May 27th, entitled "New Ways to Maximize V12N Performance" (the official title uses Virtualisation as opposed to V12N, but it is the same thing). This one is free to register for and is a joint session with AMD; details can be found here. I've also got a presentation to review about the upcoming release of VirtualBox, plus I'm still in the middle of an LDoms discussion following my customer visit from last week.

Back to TechDays, St Petersburg

A couple of weeks ago I was sent an email following on from the TechDays in St Petersburg some 8 weeks ago. It included a link to the agenda with all the videos that were recorded of presentations; on the list was one of mine, specifically the OpenSolaris track keynote "What is OpenSolaris and Why Should You Care?". The page I was sent was in Russian, so with a bit of assistance from a colleague it is now linked below. We had some fun with the microphones in the keynote: I broke two radio microphones and had to resort to a handheld microphone. That is what happens when you give a software guy hardware :-)

Wednesday May 20, 2009

What, no tornados?

Last week I was in Kansas visiting customers, with a bit of a convoluted route to get to Kansas City airport (MCI): Virgin from London Heathrow to New York JFK, then Delta on to Kansas City; 2 days in Kansas, then on from Kansas City to San Francisco on Midwest, and finally home on the Friday night on Virgin to London Heathrow.

Kansas is remembered by many people for a number of things, but primarily the Wizard of Oz and being in what is known as "Tornado Alley". I'm told Kansas even has a Wizard of Oz museum. That said, there is a lot more to Kansas than both of these, although it is funny how people fixate on things. This is the second time I have been to Kansas, and it prompted the usual bunch of sarcastic comments on my Facebook page; of course I had to explain to people that Kansas City is not in Kansas, it is in Missouri.

All my daughter was interested in, of course, was the Wizard of Oz and tornados; I can confirm that I saw neither. The closest I got to a tornado was this sign at Kansas City airport. Last time I was in Kansas, like this time, I got to eat some good meat, although I have to say the first time, at "The Savoy Grill", was some of the best steak I've had in my life.

Anyway, on to more important things: why Kansas? Well, if you read my last posting you'll have noted I said I was visiting customers.

I first went to visit this customer some 8 months ago and they were not happy at the time (as you'll have seen me comment before, this is primarily the reason I get "wheeled in" to customers: to visit them when they are unhappy). As the engineering guy most on the hook for support of the customer experience, it is very much part of the job.

Like a number of customers, they have been struggling with patching, update releases and the like, and several months ago wanted to hear about what had happened and what our plans were to fix it; this meeting was the follow-up.

Well, I'm pleased to say one of the first things they said was that it has definitely improved; in fact it has got a lot better, they said on more than one occasion. So what in particular?

Well, starting with Solaris 10 Update 4 we've introduced a lot of technology in the install / patching space, as well as improved "other" materials such as BigAdmin and training materials:

- Deferred Activation Patching

- LU & Zones improvements

- BigAdmin patching centre

- "-M" improvements

- Update on Attach

- Training and education materials, such as youtube, SLX Patch Channel and Sun Online Learning Center

This is all documented on BigAdmin and in the patching blog; one particular entry here sums it up in a public presentation. If anyone reading this feels we've got some gaps in the training and education space, feel free to drop me an email.

Coming in Solaris 10 Update 8 towards the end of this year:

- Parallel patching

- Turbo Packaging

We've got other projects in the pipeline such as:

- Pre flight checks for patching

- Patch cluster install enhancements

- Changes to SunSolve to make it easier to locate patches

- Sparse file support for lu

- Re-write of the lu-copy code

Equally, if you have projects you'd like us to look at in this space, the usual caveat applies: I cannot guarantee to deliver on them, but we are always open to feedback and suggestions.

My point, and this was also the point made to me by the customer, is that it is all about progress, and one of the big things from their perspective was just that: months ago I came and said we were going to do this, and we have; we have executed on it and delivered. For those of you that experienced the infamous Solaris 10 Update 3 kernel patch 118833-36 and the consequences of that, you'll know what I mean directly.

I also often get asked when I will be done with the work in this space, to which my answer is "never". Why, people ask? Simple: this is all about improving the customer experience, and as long as Solaris 10 is around we will have work to do in this space. The fact that we have to do all of this was one of the driving factors behind the Image Packaging System in OpenSolaris.

The general theme of the meeting was continued progress and demonstration of that progress; they really felt they had a voice and that voice was being listened to. They are right on both counts, and we'd demonstrated progress as we said we would.

As well as this we talked about LDoms aka Logical Domains and particularly live migration, an upcoming feature in a future release of LDoms.

We also got into a discussion around DTrace and what to do when you hit an issue, as they did, where some fibre-based kernel structures were not defined in the Solaris 10 CTF data, with the result that DTrace errors out. So, armed with a specific example, I came back and asked my team whether this is a valid limitation of DTrace. One of them, who is currently working a particularly tough SNDR issue, came back and said:

Yes and no: if there is no CTF data available you can define the structures in the DTrace script itself. Case in point: SNDR was not built with CTF.

They would want to do something like this...

#!/usr/sbin/dtrace -s

typedef uint64_t        nsc_size_t;
typedef uint64_t        nsc_off_t;

typedef struct rdc_aio_s {
        struct rdc_aio_s *next;
        void *handle;
        void *qhandle;
        uint64_t pos;
        uint64_t qpos;
        uint64_t len;
        uint64_t orig_len;
        int     flag;
        int     iostatus;
        int     index;
        uint_t  seq;            /* sequence on async Q */
} rdc_aio_t;
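The typedefs alone don't trace anything; a probe clause using them might look something like the following sketch (the function name rdc_write is purely illustrative; substitute whichever SNDR entry point you are actually chasing):

```
fbt::rdc_write:entry
{
        /* illustrative: assumes arg0 points at an rdc_aio_t */
        this->aio = (rdc_aio_t *)arg0;
        printf("pos=%llu len=%llu flag=%d\n",
            this->aio->pos, this->aio->len, this->aio->flag);
}
```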


Thanks to Paul for his comprehensive response to my question, for which I can take no credit, as I cut n' pasted his email; as my team is fond of telling me, "you're not supposed to be doing this kind of stuff anymore".

Overall a good week, and a good and constructive dialogue, with no tornados either outside or in the meeting :-)

Now back to writing slides for CommunityOne...

Monday May 11, 2009


I've been to China, Beijing to be precise, this week (OK, last week when I posted this). It's some 5 years since I was last here, which is way too long; so many things have changed. Last time I came, Beijing was a building site; this time building is still going on, but a lot less of it. The city and the infrastructure are so different from last time as a result of all that building.

The flight was on Air China, it is a Virgin codeshare flight.

I'm starting to write this blog in terminal 3 at Beijing airport; I won't get it finished here, I'll do that on the plane and post it when I get back home this evening. It's hard to comprehend the sheer scale of what was done in 4 years here; this article on the BBC website comes close. Then you think about how long terminal 5 @ Heathrow took, enough said.

Sun has an engineering centre in Beijing along with services (inc. a call centre) and sales. As well as presenting to the engineering teams, I also went customer visiting and spent time talking to the local sales team. They were particularly excited (still) about AmberRoad and the opportunities it is opening up. Customer response around the world on this product seems to be universal: it is a NetApp beater. The sales team in Beijing were particularly interested in the ISV opportunities available with AmberRoad; these are available for all to see here.

To an extent the scale of what was done in building the new airport @ Beijing is akin to what Sun did WRT AmberRoad and about as spectacular.

Like a lot of Sun product it is available via our try and buy programme.

A few things haven't changed in China, such as ring pulls on cans; remember the ones that come off the can? Well, in China they still do, as you can see.

Anyway, back to more serious matters. On the customer visits the biggest topic here was Solaris 10: the roadmap, OpenSolaris and where Solaris is heading. Lots of excitement here about what we've just released in Update 7, where the roadmap is going and especially OpenSolaris. I covered most of this in my last blog entry so won't clutter up this one with it again.

So as expected I never got to finish this entry in China and since I forgot to save a local copy it is now Monday morning and I'm finishing this off @ LHR waiting for a flight to JFK.

The point is that the vast majority of customers I talk to want a couple of things: i) stability and ii) innovation, and we basically have that. Solaris 10 gives you the stability, with limited new features, primarily those needed for new platform support and device driver support; OpenSolaris gives you innovations such as Crossbow, which at some point will become a Solaris.Next.

We also have that class of customer that would like all the stability that goes with the likes of Solaris 10 along with some of the major innovation that is in Solaris.Next. Sadly, as those of you that have ever done software development well know, this is somewhat of a challenge to achieve. Changing 1M+ lines of code in a shipping product is high risk, to say the least, which is why we have a policy of focusing our update releases on platform support and device driver support, with very focused feature enhancements. Now, as those of you that have ever spoken to me about this will know (and before I get a bunch of comments on this blog entry), we did not get the balance right in the earlier update releases to Solaris 10, which caused some stability issues, but that has been addressed.

Back to customer feedback: overwhelmingly positive, love Solaris, love the innovation. Nice meetings to have, especially when you get an opportunity to talk about the roadmap for Solaris at the same time. I've got a different, far tougher meeting this week on the same topic: despite loving Solaris, they've had a tough time of it recently and they want to hold the engineering guy's feet to the fire; that'll be me then. More about that later this week.

Friday May 01, 2009

San Francisco, Solaris 10 Update 7 and CTO reviews

I've been off on my travels again this week. For those that follow my blog, you can see it now links into TripIt and Facebook as well as Twitter, so you would have seen my updates.

Having a global role for a US corporation (if you're interested in my bio, click here) and a team spread around the world (the challenges of global engineering are one of those things I'll blog about at another time) means that I do spend a significant portion of my time in California.

So what about this week? Well, as I tend to on these trips, I flew out on Sunday on the VS19 and fly back later today (well, at least it is today as I write this) on the VS20.

This week has really been about what are called CTO reviews, although as always my time out in CA has been fully utilised, from early morning to late at night, including preparing for at least one presentation at the upcoming CommunityOne West conference in early June.

So what are CTO reviews? These are 3 days where the Sun Systems business unit, headed up by John Fowler, reviews all the ongoing projects, be they hardware or software; it matters not. Every product in the Systems portfolio, which includes Solaris and all the other software products my team provides the sustaining engineering for, gets reviewed in depth, including product roadmaps out for the next 24 months.

As part of that there is a quality review of how the products are performing out in the world and, most importantly from my perspective, what the customer experience is and what lessons we have learnt, or need to learn, to drive up product quality and improve the customer experience. Definitely a valuable exercise, and it allows me to ensure I keep the product teams working on the latest new thing honest :-) So for those customers reading this, be assured that the feedback I get does get passed into the new development teams and up the management chain.

Anyway, on to the other part of this blog title: Solaris 10 Update 7 RR'd on April 29th this week. It's also known as Solaris 10 5/09, at least that is what our marketing colleagues label it, although most people I know refer to it as the "7th update to Solaris 10", hence Solaris 10 Update 7, which is what we call it internally. I also often get asked what / how a Solaris 10 update is; that is yet another blog opportunity, and I'm starting to build a list of these.

So what is so great about Solaris 10 Update 7?

The biggie in this is more from the Solaris and Intel collaboration. For those that read one of my previous blogs, "Nehalem a Solaris and OpenSolaris perspective", a lot of what I'm going to mention below is the Solaris 10 implementation of this.

As I've mentioned previously, Intel is the second-largest contributor (after Sun itself) to the development work taking place in the Solaris community, be it OpenSolaris or Solaris 10. OpenSolaris is where the next version of Solaris is being built, and much of the performance work for Intel processors and other hardware has been introduced into Solaris 10 updates as well, with more, although not all, to follow.

This update includes the Power Aware Dispatcher: power management with a datacenter focus. This includes support for Intel processor T-states, the ability to throttle processor speed for cooling. cpupm is on by default, plus there is more aggressive management of P-state changes, only requiring 1 second of idle. These are significant for power management in datacenters.
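For context, CPU power management on Solaris is driven by the /etc/power.conf configuration file. A minimal sketch of the relevant entries follows; the exact keywords and threshold values here are from memory, so check them against the power.conf(4) man page for your update level:

```
# Let the kernel manage processor speed (P-states) automatically
cpupm enable

# Step a CPU down to a lower power state after 1 second of idle
cpu-threshold 1s
```

After editing the file, running pmconfig re-reads it and applies the new settings.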

CPU performance counters -- low-level instrumentation of Intel processors, allowing developers to tune their applications for best performance
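On the command line these counters surface through cpustat(1M). A hedged example, since the counter (pic) event names vary by processor and should be taken from the output of `cpustat -h` on the target machine:

```
# List the performance-counter events this processor supports
cpustat -h

# Sample two counters on every CPU, once a second, five times
# (the pic0/pic1 event names here are illustrative only)
cpustat -c pic0=Instr_cnt,pic1=Cycle_cnt 1 5
```

cputrack(1) offers the same counters scoped to a single process, which is what a developer tuning an application would typically reach for.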

Sun and Intel have also integrated Xeon processor hardware diagnostic features into the Solaris Predictive Self Healing framework, to provide greater reliability and resiliency.

Large Segment Offload -- the ability for Intel 10 gigabit network cards to handle processing of large network packet segments, resulting in better network throughput and less system loading.

So what else?

Internet Protocol Security (IPsec) is a suite of protocols for securing network communications. In Solaris 10 5/09 it has been integrated with the Solaris Service Management Facility, allowing simplified management of overall security functions. It now also supports UDP, and a new suite of algorithms. It is also now usable as the interconnect for Solaris Cluster, to manage fast, secure failover of session information among nodes of a cluster.

* Secure Shell (ssh) performance enhancement on CMT systems

ssh can now use hardware crypto acceleration on "Niagara" based systems, using the PKCS#11 engine.

* Solaris Containers enhancements

Solaris Containers can now leverage the major speed and efficiency inherent in ZFS cloning of filesystems, by using it as the basis of container cloning. Also, patches to container zones can now be easily backed out.
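By way of illustration, the cloning flow looks roughly like this; the zone names and zonepath are hypothetical, and the exact syntax should be checked against the zonecfg(1M) and zoneadm(1M) man pages. When the zonepath lives on a ZFS dataset, zoneadm clone uses a ZFS snapshot and clone rather than copying files:

```
# Create the new zone's configuration as a copy of the existing one
# (zone names and zonepath below are illustrative)
zonecfg -z newzone "create -t oldzone; set zonepath=/zones/newzone"

# Clone the halted source zone; on a ZFS zonepath this takes a
# snapshot and clones it, so it is near-instant and space-efficient
zoneadm -z newzone clone oldzone
zoneadm -z newzone boot
```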

* iSCSI target reliability and interoperability enhancements

* Logical Domains (LDoms) enhancements
  * Domain Services (DS) extensions for user program API
  * Virtual disk enhancements (performance, extended VTOC large disk support)
  * vnet and vsw now support jumbo frames
  * libldom enhancements for sun4v root domains

* SSD performance support

* SunVTS updates to support diagnostics of new Sun systems

And, for those that like heading down into the real detail:

* FMA platform-independent topo enumeration for sun4v platforms
* Support for WWID-based addressing of SAS, SATA devices
* mpxio-capable disk support
* RAID / raidctl enhancement for mpxio

Anyway enough rambling from me for now, need to catch a plane.

Tuesday Apr 21, 2009

New York City Customer Visiting

So one of the things I said I'd try to do was write "blogs on a plane" (almost sounds like a movie title). I can't post them until they give us internet access; the airlines that had it pulled it when Boeing stopped the service, so flying is one of the few disconnected times we all have, although cellphones will be allowed by various airlines later this year, at least if you believe the press.

As I'm writing this (well, starting to write it) it's 10.40am-ish BST, or 5.40am EDT. I'm on my way to New York's JFK airport for customer meetings, having left the house at 6.15am this morning to catch the VS3. We are due into JFK @ 12.10pm EDT; if you like real-time updates on these things, you can see Twitter, either subscribe to it or take a look at the Twitter status in this blog.

I work an awful lot on airplanes. It's great for catching up on email (in fact I often wonder how I stayed current on email before plane flights) by using Thunderbird's offline facility and resyncing when I get back to a connection. This is a 7hr 40 flight; I reckon I'll work for 4hrs+ of it, laptop on and headphones in, connected to the iPod. I also write presentations and now blogs :-) I know this resyncing of email drives my staff nuts; I hear comments like "you can tell when Chris has landed :-)" as the resync causes a slew of emails to land in inboxes. I know at least one other member of my staff who does this regularly, my director of operations, and I'm aware of others around the world who have "caught the bug". At least those in my team have someone to "blame"; they are just following the example from the boss.

On average I go to NYC 6 times a year for customer meetings. By the very nature of my job, most of them are when a customer has had an experience they regard as sub-optimal and they want to see the engineering guy where the buck stops; that'll be me then. Although I do get the opportunity to go and evangelise Solaris as well.

I'm back on the VS2 from EWR on Wednesday night; it leaves 8.50pm local and arrives 9.10am local on Thursday. Most of my NYC trips tend to be of this kind of length, although occasionally I get to spend a whole week, and a couple of years ago I did actually manage a sightseeing trip with my wife. Thinking about it, that was 6 years ago, for our 10th wedding anniversary!

As always I'm meeting up with the local account teams who work in NYC, mostly based out of our office at 101 Park Avenue. Plus three of my colleagues are flying in, two from the West Coast and one from just up the East Coast.

The beauty of trips like this is I get to do carry-on, none of this checked baggage stuff, although by the time you add a laptop (a Toshiba R600 on this trip, to minimise weight and size), gadgets, chargers for gadgets, gym kit and everything else, you can end up pushing the limit of carry-on.

Sadly I can't talk about specific names or too much of the customer detail, but suffice to say, like a lot of FSA accounts, they have a long QA cycle and typically deploy @ most two software bundles a year. The business requirements mean that downtime is @ a premium, especially when these systems push trillions (yes, trillions) of $ a day thru' them.

In this case they want to understand how best to minimise the risks of outages from known defects (software has bugs; it's a fact of life) and what strategy Sun recommends WRT moving from Solaris 10 update release to update release, as well as patching; as the owner of that strategy, that definitely makes me the right person. In addition they want a better understanding of our QA processes and how we test Solaris releases, including our application and hardware interoperability testing, hence my colleagues joining me on this visit.

Everything is risk / reward with these guys. They know they need to be current, but their internal processes can often mean that "current" for them is 6 months behind the curve, something else we want to talk to them about; we need to jointly figure out a way that allows them to shorten the deployment curve.

On the upgrade and patching strategy, Sun actually documents this on BigAdmin, where we have a patching portal; we also have a dedicated patching blog. Well, it's actually the blog of Gerry Haskins, but it amounts to the same thing.

If anyone out there wants to have me or one of my team talk more about this we are always more than happy to.

As for the rest of my time, I've got a couple of "catch-up" meetings with customers I have met many times before. You tend to build strong relationships with executive-level IT staff in this job, and those relationships are particularly useful when they have issues or concerns, so it's always good to stay in touch. These meetings also give you an opportunity to talk about product and futures as well.

One last thing: I'm not ignoring yesterday's announcement WRT Oracle, I'm just not commenting on it. If you want to read more, then click on the link, but other than that I'm going to keep well away from that topic in this blog.


Chris is the Senior Director of Solaris Revenue Product Engineering

