Friday May 29, 2009

A Day in the Life: Kansas

A week ago I published a blog about my recent trip to Kansas. Well, it reminded me of the first time I went and of a video the local team sent me. It is worth a view, it is such a great parody of working life in Kansas.

Thursday May 21, 2009

Virtualisation aka V12N

I seem to have a bunch of emails in my inbox this morning about V12N. First up is one about an upcoming V12N webinar on May 27th, entitled "New Ways to Maximize V12N Performance" (the official title uses Virtualisation as opposed to V12N, but it is the same thing). This one is free to register and is a joint session with AMD. Details can be found here. I've also got a presentation to review about the upcoming release of VirtualBox, plus I'm still in the middle of an LDoms discussion following my customer visit from last week.

Back to TechDays, St Petersburg

A couple of weeks ago I was sent an email following on from the TechDays in St Petersburg some 8 weeks ago. It included a link to the agenda with all the videos that were recorded of presentations, and on the list was one of mine, specifically the OpenSolaris track keynote "What is OpenSolaris and Why Should You Care?". The page I was sent was in Russian, so with a bit of assistance from a colleague it is now linked below. We had some fun with the microphones in the keynote: I broke two radio microphones and had to resort to a handheld microphone. That is what happens when you give a software guy hardware :-)

Wednesday May 20, 2009

What, no tornados?

Last week I was in Kansas customer visiting, bit of a convoluted route to get to Kansas City airport (MCI). Virgin from London Heathrow to New York JFK, then Delta onto Kansas City, 2 days in Kansas and then on from Kansas City to San Francisco on Midwest and finally home on the Friday night on Virgin to London Heathrow.

Kansas is remembered by many people for a number of things, but primarily the Wizard of Oz and being in what is known as "Tornado Alley". I'm told Kansas even has a Wizard of Oz museum. That said there is a lot more to Kansas than both of these, although it is funny how people fixate on things. This is the second time I have been to Kansas and it prompted the usual bunch of sarcastic comments on my facebook page, and of course I had to explain to people that Kansas City is not in Kansas, it is in Missouri.

All my daughter of course was interested in was the Wizard of Oz and tornados, and I can confirm that I saw neither. The closest I got to a tornado was this sign at Kansas City airport. Last time I was in Kansas, like this time, I got to eat some good meat, although I have to say the first time at "The Savoy Grill" was some of the best steak I've had in my life.

Anyway onto more important things, why Kansas? Well, if you read my last posting you'll have noted I said customer visiting.

I first went to visit this customer some 8 months ago and they were not happy at the time. As you'll have seen me comment before, this is primarily the reason I get "wheeled" into customers: to visit them when they are unhappy. As the engineering guy most on the hook for support of the customer experience, it is very much part of the job.

Like a number of customers they have been struggling with patching, update releases and the like, and several months ago wanted to hear about what happened and what our plans were to fix it. This meeting was the followup.

Well I'm pleased to say one of the first words they said was, it has definitely improved, in fact it has got a lot better they said, on more than one occasion. So what in particular?

Well, starting with Solaris 10 Update 4 we've introduced a lot of technology in the install / patching space, as well as improved "other" materials such as BigAdmin and training materials:

- Deferred Activation Patching

- LU & Zones improvements

- BigAdmin patching centre

- "-M" improvements

- Update on Attach

- Training and education materials, such as youtube, SLX Patch Channel and Sun Online Learning Center

This is all documented on BigAdmin and in the patching blog; one particular entry here sums it up in a public presentation. If anyone reading this feels we've got some gaps in the training and education space, feel free to drop me an email.

Coming in Solaris 10 Update 8 towards the end of this year:

- Parallel patching

- Turbo Packaging

We've got other projects in the pipeline such as:

- Pre flight checks for patching

- Patch cluster install enhancements

- Changes to SunSolve to make it easier to locate patches

- Sparse file support for lu

- Re-write of the lu-copy code

Equally, if you have projects you'd like us to look at in this space, let us know. The usual caveat applies that I cannot guarantee to deliver on them, but we are always open to feedback and suggestions.

My point is, and this was also the point made to me by the customer, that it is all about progress, and one of the big things from their perspective was just that. Months ago I came and said we were going to do this and we have; we have executed on it and delivered. For those of you that experienced the infamous Solaris 10 Update 3 kernel patch 118833-36 and the consequences of that, you'll know directly what I mean.

I also often get asked when I will be done with the work in this space, to which my answer is "never". Why? people ask. Simple: this is all about improving the customer experience, and as long as Solaris 10 is around we will have work to do in this space. The fact that we have to do all of this was one of the driving factors behind the Image Packaging System in OpenSolaris.

The general theme of the meeting was continued progress and demonstration of that progress. They really felt they had a voice and that voice was being listened to, and they are right on both counts: we'd demonstrated progress as we said we would.

As well as this we talked about LDoms aka Logical Domains and particularly live migration, an upcoming feature in a future release of LDoms.

We also got into a discussion around DTrace and what to do when you hit an issue, as they did, where some fibre-based kernel structures were not defined in Solaris 10 (no CTF data), with the result that DTrace errors out. So armed with a specific example I came back and asked my team: is this a valid limitation of DTrace? One of them, who is currently working a particularly tough SNDR issue, came back and said:

Yes and no, if there is no ctf data available you can define the structures in the DTrace script itself.  Case in point, SNDR was not built with ctf.

They would want to do something like this...

#!/usr/sbin/dtrace -s

typedef uint64_t        nsc_size_t;
typedef uint64_t        nsc_off_t;

typedef struct rdc_aio_s {
        struct rdc_aio_s *next;
        void *handle;
        void *qhandle;
        uint64_t pos;
        uint64_t qpos;
        uint64_t len;
        uint64_t orig_len;
        int     flag;
        int     iostatus;
        int     index;
        uint_t  seq;            /* sequence on async Q */
} rdc_aio_t;
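The definitions above only declare the type; to be useful, a probe clause then has to cast a raw probe argument to it. What follows is a minimal sketch of that step and not part of Paul's email. It assumes the module is called rdc and uses a made-up function name, example_rdc_func, whose first argument is taken to be a pointer to an rdc_aio_t. Both are placeholders, and the script will not compile until a real function is substituted.

```d
#!/usr/sbin/dtrace -s

/*
 * Hand-declared because the module carries no CTF data. The layout must
 * match the kernel's own definition exactly, or the fields read below
 * will be wrong.
 */
typedef struct rdc_aio_s {
        struct rdc_aio_s *next;
        void *handle;
        void *qhandle;
        uint64_t pos;
        uint64_t qpos;
        uint64_t len;
        uint64_t orig_len;
        int     flag;
        int     iostatus;
        int     index;
        uint_t  seq;
} rdc_aio_t;

/*
 * ASSUMPTION: "rdc" and "example_rdc_func" are placeholders for
 * illustration only. Replace them with a real module:function pair
 * whose first argument is an rdc_aio_t pointer.
 */
fbt:rdc:example_rdc_func:entry
{
        /* arg0 is an untyped integer, so cast it to the hand-declared type */
        this->aio = (rdc_aio_t *)arg0;

        printf("pos=%d len=%d flag=0x%x\n",
            this->aio->pos, this->aio->len, this->aio->flag);
}
```

The cast is what stands in for the missing CTF data: DTrace has no type information for the argument, so the script supplies it.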


Thanks to Paul for his comprehensive response to my question, for which I can take no credit, as I cut n' pasted his email, and as my team is fond of telling me, you're not supposed to be doing this kind of stuff anymore.

Overall a good week, a good and constructive dialogue, with no tornados either outside or in the meeting :-)

Now back to writing slides for CommunityOne...

Monday May 11, 2009


I've been to China, Beijing to be precise, this week (OK last week when I posted this). It's some 5 years since I was here, which is way too long, and so many things have changed. Last time I came Beijing was a building site; this time building is still going on, but a lot less of it. The city and the infrastructure are so different from last time as a result of all that building.

The flight was on Air China, it is a Virgin codeshare flight.

I'm starting to write this blog in terminal 3 at Beijing airport. I won't get it finished here, I'll do that on the plane and post it when I get back home this evening. It's hard to comprehend the sheer scale of what was done in 4 years here, this article on the BBC website comes close. Then you think about how long terminal 5 @ Heathrow took, enough said.

Sun has an engineering centre in Beijing along with services (inc. a call centre) and sales. As well as presenting to the engineering teams, I also went customer visiting and spent time talking to the local sales team, they were particularly excited (still) about AmberRoad and the opportunities that is opening up. Customer response around the world on this product seems to be universal, it is a NetApp beater. The sales team in Beijing were particularly interested in the ISV opportunities available with AmberRoad, these are available for all to see here.

To an extent the scale of what was done in building the new airport @ Beijing is akin to what Sun did WRT AmberRoad and about as spectacular.

Like a lot of Sun products it is available via our try and buy programme.

A few things haven't changed in China, such as ringpulls on cans, remember the ones that come off the can? Well in China they still do as you can see.

Anyway back to more serious matters, customer visits. The biggest topic here was Solaris 10, the roadmap, OpenSolaris and where Solaris is heading. Lots of excitement here about what we've just released in Update 7, where the roadmap is going and especially OpenSolaris. I covered most of this in my last blog entry so won't clutter up this one with that again.

So as expected I never got to finish this entry in China and since I forgot to save a local copy it is now Monday morning and I'm finishing this off @ LHR waiting for a flight to JFK.

The point is that the vast majority of customers I talk to want a couple of things, i) stability and ii) innovation, and we basically have that: Solaris 10 gives you the stability, with limited new features, primarily those needed for new platform support and device driver support, and OpenSolaris gives you the innovations such as Crossbow, and at some point will become Solaris.Next.

We also have that class of customer that would like all the stability that goes with the likes of Solaris 10 along with some of the major innovation that is in Solaris.Next. Sadly as those of you that have ever done software development well know, this is somewhat of a challenge to achieve. Changing 1M+ lines of code in a shipping product is high risk, to say the least, which is why we have a policy of focusing our update releases on platform support and device driver support, with very focused feature enhancements. Now as those of you that have ever spoken to me about this will know (and before I get a bunch of comments on this blog entry), we did not get the balance right in the earlier update releases to Solaris 10, which caused some stability issues, but that has been addressed.

Back to customer feedback: overwhelmingly positive, love Solaris, love the innovation, nice meetings to have. Especially when you get an opportunity to talk about the roadmap for Solaris at the same time. Got a different, far tougher meeting this week. Same topic, but despite loving Solaris they've had a tough time of it recently and they want to hold the engineering guy's feet to the fire, that'll be me then. More about that later this week.

Friday May 01, 2009

San Francisco, Solaris 10 Update 7 and CTO reviews

I've been off on my travels again this week. For those that follow my blog, you can see it now links into Tripit and facebook as well as twitter, and you would have seen my updates.

Having a global role for a US corporation (if you're interested in my bio click here) and a team spread around the world (the challenges of global engineering are one of those things I'll blog about at another time) means that I do spend a significant portion of my time in California.

So what about this week, well as I tend to on these trips I flew out on Sunday on the VS19 and fly back later today (well at least it is as I write this) on the VS20.

This week has really been about what are called CTO reviews although as always my time out in CA has been fully utilised, from early morning to late at night, including preparing for at least one presentation at the upcoming Community One West conference in early June.

So what are CTO reviews? These are 3 days where the Sun Systems business unit, headed up by John Fowler, reviews all the ongoing projects, be they hardware or software, it matters not. Every product in the Systems portfolio, which includes Solaris and all the other software products my team provides the sustaining engineering for, gets reviewed in depth, including product roadmaps out for the next 24 months.

As part of that is a quality review of how the products are performing out in the world and, most importantly from my perspective, what the customer experience is and what lessons we have learnt or need to learn to drive up product quality and improve the customer experience. Definitely a valuable exercise, and it allows me to ensure I keep product teams working on the latest new thing honest :-) So for those customers reading this, be assured that the feedback I get does get passed into the new development teams and up the management chain.

Anyway, onto the other part of this blog title: Solaris 10 Update 7 RR'd on April 29th this week. It's also known as Solaris 10 5/09, at least that is what our marketing colleagues label it, although most people I know refer to it as the "7th update to Solaris 10", hence Solaris 10 Update 7, which is what we call it internally. I also often get asked what / how is a Solaris 10 update; that is yet another blog opportunity, I'm starting to build a list of these.

So what is so great about Solaris 10 Update 7?

The biggie in this is more from the Solaris and Intel collaboration, for those that read one of my previous blogs "Nehalem a Solaris and OpenSolaris perspective" a lot of what I'm going to mention below is the Solaris 10 implementation of this.

As I've mentioned previously, Intel is the second-largest contributor (after Sun itself) to the development work taking place in the Solaris community, be it OpenSolaris or Solaris 10. This is where the next version of Solaris is being built, and much of the performance work for Intel processors and other hardware has been introduced into Solaris 10 updates as well, with more, although not all, to follow.

This update includes the Power Aware Dispatcher -- power management with a datacenter focus. This includes support for Intel processor T-States -- the ability to throttle processor speed for cooling. cpupm is on by default, plus there is more aggressive management of P-State changes, only requiring 1 second of idle. These are significant for power management in datacenters.

CPU performance counters -- low-level instrumentation of Intel processors, allowing developers to tune their applications for best performance

Sun and Intel have also integrated Xeon processor hardware diagnostic features into the Solaris Predictive Self Healing framework, to provide greater reliability and resiliency.

Large Segment Offload -- the ability for Intel 10 gigabit network cards to handle processing of large network packet segments, resulting in better network throughput and less system loading.

So what else?

Internet Protocol Security (IPsec) is a suite of protocols for securing network communications. In Solaris 10 5/09, it has been integrated with Solaris Service Manager, allowing simplified management of overall security functions. It now also supports UDP, and a new suite of algorithms. It is also now usable as the interconnect for Solaris Cluster, to manage fast, secure failover of session information among nodes of a cluster.

* Secure Shell (ssh) performance enhancement on CMT systems

ssh can now use hardware crypto acceleration on "Niagara" based systems, using the PKCS#11 engine.

* Solaris Containers enhancements
Solaris Containers can now leverage the major speed and efficiency inherent in ZFS cloning of filesystems, by using this as the basis of container cloning. Also, patches to container zones can now be easily backed out.

* iSCSI target reliability and interoperability enhancements

* Logical Domains (LDoms) enhancements
  * Domain Services (DS) extensions for user program API
  * Virtual disk enhancements (performance, extended VTOC large disk support)
  * vnet and vsw now support jumbo frames
  * libldom enhancements for sun4v root domains

* SSD performance support

* SunVTS updates to support diagnostics of new Sun systems

And, for those that like heading down into the real detail:

* FMA Platform Independent topo enumeration for sun4v platforms
* Support WWID based addressing of SAS, SATA devices
* mpxio-capable disk support
* RAID/ Raidctl enhancement for mpxio

Anyway enough rambling from me for now, need to catch a plane.

Tuesday Apr 21, 2009

New York City Customer Visiting

So one of the things I said I'd try and do was write "blogs on a plane" (almost sounds like a movie title). I can't post them until they give us internet access; those airlines that had it pulled it when Boeing stopped the service. So it is one of the few disconnected times we all have, although cellphones will be allowed by various airlines later this year, at least if you believe the press.

As I'm writing this (well starting to write it) it's 10.40am ish BST or 5.40am EDT. I'm on my way to New York's JFK airport for Customer meetings, and left the house at 6.15am this morning to catch the VS3. We are due into JFK @ 12.10pm EDT. If you like real time updates on these things, you can see twitter, either subscribe to it or take a look at the twitter status in this blog.

I work an awful lot on airplanes. It's great for catching up on email (in fact I often wonder how I stayed current on email before plane flights), by using Thunderbird's offline facility and re-syncing when I get back to a connection. This is a 7hrs 40 flight and I reckon I'll work for 4hrs + of it, laptop on and headphones in, connected to iPod. I also write presentations and now blogs :-) I know this re-syncing of email drives my staff "nuts"; I hear comments like "you can tell when Chris has landed :-)" as the re-sync causes a slew of emails to land in inboxes. I know at least one other member of my staff that does this regularly, my director of operations. I'm also aware of others who have "caught the bug", including a number of members of my org around the world; at least those in my team have someone to "blame", they are just following the example from the boss.

On average I go to NYC 6 times a year for Customer meetings, by the very nature of my job most of them are when a Customer has had an experience they regard as sub-optimal and they want to see the engineering guy where the buck stops, that'll be me then. Although I do get the opportunity to go and evangelise Solaris as well.

I'm back on the VS2 from EWR on Wednesday night, leaves 8.50pm local arrives 9.10am local on Thursday, most of my NYC trips tend to be of this kind of length, although occasionally I get to spend a whole week and a couple of years ago I did actually manage a sightseeing trip with my wife, thinking about it that was 6 years ago for our 10th wedding anniversary!

As always I'm meeting up with the local account teams who work in NYC, mostly based out of our office on 101 Park Avenue. Plus three of my colleagues are flying in, two from the West Coast and one from just up the East Coast.

The beauty of trips like this is I get to do carry on, none of this checked baggage stuff, although by the time you add a laptop (Toshiba R600 on this trip to minimise weight and size), gadgets, chargers for gadgets, gym kit and everything else, you can end up pushing the limit of carry on.

Sadly I can't talk about the specific names or too much of the Customer detail, but suffice to say like a lot of FSA accounts they have a long QA cycle and typically deploy @ most two software bundles a year, the business requirements mean that downtime is @ a premium, especially when these systems push trillions (yes trillions) of $ a day thru' them.

In this case they want to understand how best to minimise the risks of outages from known defects (software has bugs, it's a fact of life) and what strategy Sun recommends WRT moving from Solaris 10 update release to update release as well as patching. As the owner of that strategy, that definitely makes me the right person. In addition they want a better understanding of our QA processes and how we test Solaris releases, including our application and hardware interoperability testing, hence my colleagues joining me on this visit.

Everything is risk / reward with these guys. They know they need to be current, but their internal processes can often mean that current for them is 6 months behind the curve. That is something else we want to talk to them about: we need to jointly figure out a way that allows them to shorten the deployment curve.

On the upgrade and patching strategy, Sun actually documents this on BigAdmin, on which we have a patching portal. We also have a dedicated patching blog; well, it's actually the blog of Gerry Haskins, but it amounts to the same thing.

If anyone out there wants to have me or one of my team talk more about this we are always more than happy to.

As for the rest of my time, I've got a couple of "catchup" meetings with Customers I have met many times before. You tend to build strong relationships with executive level IT staff in this job, and those relationships are particularly useful when they have issues or concerns, so it's always good to stay in touch. These meetings also give you an opportunity to talk about product and futures as well.

One last thing, I'm not ignoring yesterday's announcement WRT Oracle, I'm just not commenting on it. If you want to read more then click on the link, but other than that I'm going to keep well away from that topic in this blog.

Tuesday Apr 14, 2009

Nehalem a Solaris and OpenSolaris perspective

This morning Sun announced its latest generation of servers, built on Intel's Nehalem chip branded as Xeon 5500. For those of us within the engineering community we've had the chance to work with and watch development take place on these platforms as well as Nehalem over the last few years.

There will be many blogs from Sun this morning as well as the formal 10.30am webcast, will also be updated to include links to all the various platforms, to save you finding them, here they are:

What I'm not going to do is sit here and extol the virtues of the platforms (although they do warrant it); others will be far better at that than me. What I did want to talk about was the operating system that best exploits the advantages of Nehalem, and that of course is Solaris. Be it Solaris 10 or OpenSolaris, both are best placed to take advantage of the features in the Xeon 5500.

When Sun announced OpenSolaris 2008.11 back in November last year it was with full support for the Xeon 5500 already baked into the operating system, in fact I'd go as far as saying it was the first operating system to be optimized for and take advantage of these features. Intel themselves have been actively contributing to the OpenSolaris community, OpenSolaris has been one of the key elements in the success of the relationship with Intel and the development of the platforms based on the Xeon 5500.

So what did we deliver for the Xeon 5500:

In fact I was talking about a number of these in my keynote for the OpenSolaris track at the TechDays in St Petersburg just last week, see my previous entry.

So how does this technology translate into the real world and what does the customer see?

Unparalleled power efficiency is one: we support all p-states in the latest Intel chips and we added deep c-state power management. What the latter means is that we have the ability to drop unused cores to the lowest power states. Building on this is the Solaris unique power aware dispatcher; the picture explains it a lot better than I ever could in words.

Onto PowerTOP: it is designed to easily identify power wasting applications and therefore help you balance performance and energy use. PowerTOP leverages the DTrace technology introduced in Solaris 10 and allows us to provide a degree of observability that most vendors can only dream of. Finally, it allows developers and administrators to view TurboMode in real time.

Then of course we have Solaris FMA, which allows you to detect and automatically diagnose errors, offline failing memory and work around failed cores / CPUs to run in a degraded but reliable state, all of which helps to maximise your system uptime.

Of course none of this kind of technology is new to Solaris. For years Solaris has been optimized to take advantage of large memory and multi-core / cpu / thread systems, and especially with ZFS underneath you get enterprise data scaling with the performance of dedicated proprietary storage systems, all in an operating system you can go and download now.

So for those of you lucky enough to already have access to Xeon 5500 machines, why not give Solaris 10 or OpenSolaris 2008.11 and these latest machines a run? You won't be disappointed, as some of the benchmarks show.

As always my team will be the ones providing the sustaining engineering on all the elements of the operating system on these platforms (both Solaris and OpenSolaris).

Thursday Apr 09, 2009

Where to start?

Blogging is one of those things I keep talking about doing, and despite having created something basic well over a year ago (in fact I only got as far as the template creation), I never got any further.

Twitter and facebook I've got working, especially using facebook as a free way to post photos :-) oh and having twitter update my facebook status; SMS and JavaFX apps are wonderful things.

I'm sitting here in St Petersburg at the Sun TechDays 2009, having finally (well I will have if I post this) got round to writing my first blog.

I've not actually decided what I'm going to write about in this one, other than some ramblings, a couple of links to web pages and maybe a few photos. Like the one below of the pavilion area here in St Petersburg (click on the photo to see the rest of the photos I've uploaded so far onto facebook).

Oh and also a little bit more about how I was persuaded to start this, well pushed or made to feel a little bit guilty about not writing a blog, not least by admin Joy Marshall, who has been blogging for over a year, as well as two of my colleagues, Bob Porras and Lynn Rohrer. Bob actually said to me, you're happy to play around in the kernel but won't blog, what is that all about?

OK guys so your guilt trip worked and here I am attempting my first blog.

Once I figured out how to set up the editor to not require html, uploading photos and linking out to webpages was not too painful, even adding bookmarks on the RHS and changing some of the settings. I still have not figured out how to upload a photo of me yet, but I'm sure someone will embarrass me and tell me it is really simple (OK Bob Porras just did, which prompted the comment from Simon Ritter, who is sitting next to me in the speaker room here in SPB, of "How many executives does it take to write a blog?").

So what do I do @ Sun? Well, I run the team that provides the sustaining engineering for all the products in the Solaris portfolio, including Cluster and OpenSolaris as well as Solaris 10 and earlier versions, to name a few.

We also sustain the software and firmware stack underlying the Sun Storage 7000 series platform aka Amber Road.

I traveled some 250,000 miles last calendar year and visited many Sun customers, as well as spending time with my global team, which is spread across the world: India, Czech Republic, France, Germany, UK and many locations in the US. My plan is to spend the time I have on those long plane flights writing at least the txt for blogs, unless someone happens to know a tool where I can put together a StarOffice document and just upload that.

Based on the last 12 months' experience I'll have lots of customer experiences to write up (minus the names of course) and I'll try and include some of the other things my team is working on that I can share.

This assumes I'll do a better job than I did the first time I wrote a starting entry back in 2007 and actually post this one.


Chris is the Senior Director of Solaris Revenue Product Engineering

