Tuesday Jul 22, 2008

Video: OpenSolaris Tape Enhancements

I spoke to the June 2008 meeting of FROSUG on OpenSolaris Tape enhancements.

11:30 mins

iPod version also available

Monday Jul 14, 2008


Scoring in lacrosse is the glamour of the game, but what about the setup for a score?  That’s where the hard work is done.  The feeders and off-ball movement, whether from a midfielder or an attackman, are what set up the plays.  Sometimes the feeders get an assist, sometimes not, but the biggest part of a goal is always the setup.  With all plays, there is a primary outlet and a secondary outlet. (See video.)

You can never predict how the defense will react, so you need a secondary outlet for scoring plays in case the primary outlet is covered and cannot get open.

Setting up a SAN (Storage Area Network) is no different.  It’s mandatory to have primary and secondary paths to your disk drives to protect against path failures.  This has been a de facto standard in SANs since the late ’90s, and has become a standard component in all of the premier OSs (Operating Systems) on the market today.  In the industry it’s called multi-pathing software; in OpenSolaris it’s a component called MPxIO.  Most storage companies have a significant investment here too, as this was a differentiator in the early days, but now most storage purchasers use what comes packaged with the OS.

So that’s great for disks, but what about tape devices?  A pretty significant storage component on the SAN, right?  In most cases - not an option.  Why?  It’s very difficult to do and requires coordination among tape applications, the OS and tape devices.

Tape devices are sequential; when things are going well (on writes) there are three basic things happening: 

  • I/O is flowing without interruption to the tape device
  • The I/Os are filling the buffer on the tape device and keeping the buffer full
  • The actual tape is running at full speed with the buffer spilling onto it in the right location(s)

When things go badly, though, there’s a lot to be done:

  • Validate that you have an alternate good path to the tape device
  • Determine the last good write to the tape device and its proper location on tape
  • Set up the I/O stream on the host to start after the last good write, making sure the order is preserved
  • Reposition the tape to the correct point to restart the I/O stream
  • Restart the I/O stream, loading the buffer again
  • Start the tape movement and buffer spilling onto it

Today marks a banner day: build 93 of OpenSolaris contains multi-pathing for tape.  Generic tape multi-pathing.  That’s right, the developers created a methodology that doesn’t require a special tape application, protocol or tape drive to provide the support.  Brilliant!!

So, how does it work?  Well, let’s go back to the lineage of development: 

  • ST Logical Block Addressing – The first thing to do was to start using an absolute position instead of a relative one.  So inside the tape driver (ST – SCSI Tape),  instead of using file/block (count of files from the beginning of the tape partition and the block within a particular file – relative positioning) a conversion has been made to use logical block addressing (count of any entity recorded from the beginning of the tape partition – absolute positioning) all the time.  This was added in build 69 of OpenSolaris.
  • ST Command Error Recovery – Dependent upon the SCSI command type for the tape device (Read, Write, etc.), the tape driver keeps track of the last command and expected position.  When an error occurs, the driver asks the tape device where it is on tape (The LBA – Logical Block Addressing).  Dependent upon the command type and position, the tape driver determines whether to resend the command or re-position and re-issue.  This was added in build 80 of OpenSolaris.
  • Multi-pathing – Once the above was added, then tape devices could be added under the control of MPxIO in Solaris.  This means that upon an I/O error, the ST Command Error Recovery procedure is used, and if the error is path-related, an alternate path is used.  This was the last phase and just added in build 93 of OpenSolaris.
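The logical block addressing change in the first bullet can be illustrated with a toy position tracker. The class and names below are hypothetical, not the ST driver’s; the point is that the absolute LBA counts every entity recorded and never resets, while the relative file/block pair does.

```python
# Illustrative only: relative file/block positioning vs. the absolute
# logical block addressing the ST driver switched to in build 69.

class TapePosition:
    def __init__(self):
        self.lba = 0        # absolute: every entity recorded on the partition
        self.file = 0       # relative: filemarks from start of partition
        self.block = 0      # relative: blocks within the current file

    def write_block(self):
        self.lba += 1       # one entity recorded
        self.block += 1

    def write_filemark(self):
        self.lba += 1       # a filemark is also an entity on tape
        self.file += 1
        self.block = 0      # relative block count resets at each filemark
```

Because the absolute LBA never resets, comparing the host’s expected position against the drive’s reported position after an error is a single integer comparison, which is what makes the command error recovery in the second bullet tractable.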

But wait, there’s more:

  • The architecture of MPxIO is such that the driver is located below the tape driver (in the driver stack), and as a result, multiple paths to a tape device are not seen by tape applications.  For example, two paths to a single tape device in OpenSolaris will now show up as one tape device.  All re-routing and path management is handled behind the scenes.  This allows any tape application to use this feature.  No special handshake with the OS or tape driver required.
  • Adding tape multi-pathing eliminates the reliance upon protocol error recovery.  The retries and recovery are protocol-independent, so you don’t need Fibre Channel FCP-2 error recovery or iSCSI ERL 1 or 2 in your protocol stack to add resiliency to your tape support.
  • Supports all tape devices, provided they support:
      • MultiP bit or TPGS bit in the Inquiry command; and
      • SAN connectivity; and
      • Page 83 with type 1 support (binary WWN info); or
      • Special VID/PID in MPxIO (for legacy drives)
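As a sketch of the first requirement above, here is how the MultiP and TPGS bits could be read out of standard INQUIRY data. The byte/bit offsets follow the SPC standard INQUIRY layout (byte 5 bits 5:4 carry TPGS, byte 6 bit 4 is MultiP); the function itself is illustrative, not MPxIO code.

```python
# Check standard INQUIRY data for multipath eligibility (a sketch).
# Offsets per the SPC standard INQUIRY layout:
#   byte 5, bits 5:4 -> TPGS field (non-zero means ALUA support)
#   byte 6, bit 4    -> MultiP (device has multiple ports)

def supports_multipath(inquiry: bytes) -> bool:
    if len(inquiry) < 7:
        return False
    tpgs = (inquiry[5] >> 4) & 0x3     # 0 = no target port group support
    multip = bool(inquiry[6] & 0x10)   # multiple-port device
    return multip or tpgs != 0
```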

Wow, so what is next?  A whole new setup and set of plays with tape.  We’ve added single path asymmetrical multi-pathing support, but as we build out a portfolio on this, you can probably guess where we are headed next.  Tape support will be better in Solaris than any other OS on a SAN.  Any SAN - you pick the connectors, we’ll provide the rest.

Oh, and by the way, a little thing called “Tape Self-Identification” was added in build 78 of OpenSolaris.  This allows automatic pickup and configuration of tape drives without hand-editing .conf files or releasing patches with new tape drive additions.  A revolutionary way to support tape drives – the tape drive tells the OS how to configure it and what it is capable of.  All with standard SCSI commands.

Looks to me like the setup for tape got a great deal easier in OpenSolaris, with a lot more options.  You can bet there will be some high scores with this new set of plays.  Double hat-trick!

Friday Jun 06, 2008


In lacrosse, it’s important to be versatile.  For an attackman, it’s important to be able to move both ways, left or right, using either hand to pass and catch.  It’s normal to favor one hand or direction over the other, but the good players can use both.

For me, I was a right-handed player with little offense on my left hand.  This limited my ability to score or pass when split-second plays developed and the ball was in my off hand.  My son and I practice this a great deal in the backyard, making sure he can shoot and pass with either hand – it shows, he averages a couple of goals and assists a game.

Storage software is no different - it’s rare when the tools come together to allow ease of development and configuration for multiple products.  A recent addition to OpenSolaris (build 90) allows multiple products, protocols and device types to be supported by any Solaris server with a common framework - COMSTAR.

COMSTAR is software that allows any Solaris-based server to become a block-based storage device.  The acronym stands for COmmon Multiprotocol SCSI TARget.  The project is the world’s first open-source enterprise-class target framework.  The framework supports all SCSI device types (tape, disk, SES, etc.) connected to any transport (Fibre Channel, iSCSI, iSER, SAS, FCoE, etc.), with concurrent access to all LUNs (Logical Unit Numbers) and a single point of management.

The concepts in the COMSTAR project are not in themselves revolutionary; block-based storage is prevalent on the market today.  What is revolutionary is that the software allows Solaris to be used as microcode, with ZFS as the backing-store file system, using common off-the-shelf components to build a storage array.  A key objective of COMSTAR is to provide a simple framework for users to add transport protocols and device types to build new block storage devices.  This allows the user to quickly start adding new features to differentiate these new storage devices without spending time on the fundamental building blocks.

This allows any block storage device to be built from one common framework.  No other commercially available operating system allows this type of flexibility or coordination.  There are in fact block-based targets in other OSs, but each is built independently, which doesn’t allow for:

•    Ease of maintenance – fixing common bugs or adding RFEs in one place
•    Concurrent access – using different transports (FC, iSCSI, iSER, etc.) to access a common LUN

Additionally, COMSTAR allows different device types to be defined and provided through the plug-in architecture surrounding logical units, or LU Providers.  This allows quick block-based support for disk, tape – any SCSI device type.  In this release, direct access support is included, but feel free to invent and contribute your own!
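Conceptually, the LU Provider plug-in idea looks something like the sketch below, written in Python purely for illustration. The real COMSTAR interface is a C kernel API; every name here (LuProvider, register_lu, dispatch) is hypothetical.

```python
# A toy model of the plug-in architecture: the framework routes SCSI
# commands to whichever LU Provider registered the logical unit, and
# stays independent of the transport (FC, iSCSI, iSER, ...) the command
# arrived on. Names are illustrative, not the real COMSTAR API.

class LuProvider:
    """Base class: one provider per SCSI device type."""
    def handle(self, cdb: bytes) -> str:
        raise NotImplementedError

class DiskProvider(LuProvider):      # direct access (device type 0)
    def handle(self, cdb):
        return "disk: op 0x%02x" % cdb[0]

class TapeProvider(LuProvider):      # sequential access (device type 1)
    def handle(self, cdb):
        return "tape: op 0x%02x" % cdb[0]

# The framework keeps a registry and dispatches by LUN.
registry = {}

def register_lu(lun, provider):
    registry[lun] = provider

def dispatch(lun, cdb):
    return registry[lun].handle(cdb)
```

The design point this tries to show: because dispatch is keyed only by logical unit and device type, adding a new device type means writing one provider, not reworking each transport stack.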

Versatility - that’s what it’s all about.  Reducing cost of ownership, shortening time to market and creating multiple products.  COMSTAR is ready to support your block-based needs.  Power your next storage products with COMSTAR.

Shoot, score, hat-trick!!

Monday Jun 02, 2008


All parents are biased. You can’t help yourself. A couple of recent events really highlight this for me. My son scored a hat-trick in the last game of the season, his first. My daughter was valedictorian of her high-school class, her first. My son is also smart and my daughter also athletic, but what amazes me is the drive each has had this year in achieving their goals. I had little to do with this besides poking fun at them, making them feed the animals, wake up in the morning, etc. The normal Dad stuff.

My wife on the other hand makes sure that the kids do what they say they are going to do. She measures them, encourages the right behavior and makes sure they are always on an even playing field. When things don’t go well, she gets help in a hurry and doesn’t delay. The kids owe a great deal of their success to her.

So, in my proud moment, I realize I’m thankful for the entire entourage, each playing a part in the success of each other. Hats off to you guys.

Oh, but we can’t end without a LAX shot. Here’s one of my son:


Friday Feb 15, 2008

Open Season

The STK 5800 has open-sourced the code base to three different open source communities:


This includes all of the source code as well as the XAM VIM associated with the XAM interface that enables the standard interface for the appliance.  It doesn’t include the GUI or hardware-specific code, but the guts of the box and the client library code are included.

What can you do with this? 

You can run it, modify it, and contemplate the future of storage that is a very different paradigm from what we know today:

  • Reliable – Long-term archive, calculated to have a mean time to data loss of over 2 million years!
  • Scalable – Add boxes on the fly without losing performance; in fact, you are adding processing nodes, with each additional node sharing the overall task load by design.
  • Manageable – No system administration tasks necessary when adding new cells, no provisioning, no zoning, etc.
  • Native metadata and query capable – Built-in database allowing on-board query of metadata with user-defined schema, no separate database or compute power required.
  • Built on OpenSolaris

Try it, it’s a new play that matches well with the amount of fixed content that is coming our way.

Score and hat trick.

Saturday Feb 02, 2008

Managing the Game

Spring is almost here and it’s time to get in shape and get ready for the season.  If you don’t get your base established, you can’t manage the game when you need to.  

Speaking of managing the game, let’s talk a bit about how storage management is achieved in Solaris.  This has not been the strongest play for Sun in the past, but the mindset and software have shifted over time. Solaris is in great shape, with several projects just finished, several that are imminent, and future investments that will pay large dividends.

Now, a good management scheme is only as good as its base.  If you have complex software with many knobs, it’s very difficult to manage this complexity in management software. Management starts with development and user interface design. Similarly, there can’t be disparate management stacks to manage similar hardware components unless you are talking about fast-moving components such as disk drives, host bus adapters and the like. In other words: you have to consider the whole system experience, not just point products, but be realistic and add value when it matters.

There is also a difference between element management and distributed management. Element management means managing a single component, typically one directly attached to a host.  An example of element management in Solaris is fcinfo, which uses the FC HBA API. Distributed management is when you manage several components through one host. An example here would be Sun StorageTek Operations Manager software, which provides Storage Area Network (SAN) management.  This software discovers, visualizes, monitors, and provisions complex multi-vendor storage environments from a single console.

Standards can also play a big role in management by establishing an API for either element management or distributed management. This can pay huge dividends by offering:

•    Information Independence
•    Interoperability
•    Vendor Choice
•    Easier ISV qualification
•    Simple universal administration
•    Easier migration
•    Agnostic attach to applications, hardware, etc.
•    Quicker time to market development/enhancement

The element APIs generally start in the kernel.  These are important building blocks for larger-scale management applications.  A few of these useful APIs available in OpenSolaris are:

•    IMA – iSCSI Management API
•    FC HBA API – Fibre Channel HBA API (Soon to be SM HBA API)
•    MMA – Multipath Management API

To manage larger SANs, the prominent protocols use the Common Information Model (CIM), which allows each resource to be instrumented in a common way, yet extended to cover the complete functionality of the resource.  The CIM-XML protocol has been around for a number of years and is widely deployed, while WS-Management is another protocol just now being deployed.  Larger SAN management is achieved in many ways, but typically involves a CIMOM and, more recently, one with support for the Storage Management Initiative Specification (SMI-S) at its base.

Our management strategy is to provide component management based on standards where possible, but also to bring in the Open Pegasus CIMOM, which comes with SMI-S.  This involves adding providers to our current APIs (IMA, FC HBA and HDR), thus populating the SMI-S schema.  This will allow choice for our customers through a standard interface that many distributed applications can use,  including some of our own products…

No score yet, the season hasn’t started, but I think we have our base.  Once the season starts, I’ll talk through a couple of targeted management schemes we have in place that have provided and will provide scoring opportunities.

Tuesday Jul 10, 2007

Penalty - Pushing

OK. Flags all over the field. My son participates in Web 2.0? Without telling his dad? Does he know what I do? What company I work for?

Apparently not. And he (and his cousin) posted this a month ago. Jeesh.

Intense Whiffle Ball

But that's the way of Web 2.0, we don't know we're participating, but the world around us is changing and to keep up you have to. And guess what, it's not a requirement that you tell your dad (er coach) when you do it.

Just don't do anything stupid....

My son? 2 minutes in the box. Man down!

Sunday Jun 10, 2007


I coach lacrosse now. Well, assistant-coach lacrosse. I happen to be paired with a really good coach – Dave Devine (all-Ivy Cornell defenseman), who has been playing or coaching lacrosse for the better part of the past 40 years. Thank God – he makes up for my ineptness.

My son plays lacrosse. He is different from me: fast, agile, learning quickly. I had to really work at being on the team, I don’t remember anything coming that easily to me.

I did have one advantage: I could see and anticipate the game better than most. I wasn’t very fast, but by anticipating moves and understanding other players' skills and habits, I was able to improve my game. Life-long habit, I still do that today (and feel that I need to). It won me starting special teams (man-down) and 2nd line duties during college.

We played our last game today. Among other things, I noticed that the young players didn’t pick up on the “telegraphing” – when opposing players indicate where they are passing or moving to during the development of a play. When you begin to pick up on the “telegraphs,” your game play rises a couple of notches. This is when you can get a golden steal and run it down for a fast-break and score.

So that is what this article is all about: telegraphing.

Solaris is concentrating on becoming the next storage platform. I’ve hinted about this, and now it’s time to come clean. We’ve seriously invested, and now we’re taking it up to the next level.

Jeff Bonwick talks about Storage running on general-purpose Solaris, and he’s right. We’ve got a feature-rich environment, with more features on the way. With recent additions to OpenSolaris, the picture gets even better:

In this post we'll talk about some of the features at a high level. In future posts we'll dive deep to provide more detail.


In terms of the larger management picture, we will be replacing our current CIMOM with something more current and reflective of the open-source effort. This in turn provides a framework for vendors to plug in to, to get specific information relevant to the storage platform through SMI-S. We will also continue the effort to support CLIs and APIs relevant to storage underlying these providers, as well as creating more GUIs to help users complete their tasks.


We will continue evolving transport stacks on the host/server, using SCSI, SAS, FC, iSCSI and iSER as the primary transports. iSER will be completed in FY08. In addition there are pNFS, CIFS/NFS, Shared QFS and Honeycomb clients.


Framework in play here - Currently supporting iSCSI target mode, we are moving to a framework that will allow multiple protocols to operate across the interface. Expect Fibre Channel and iSER to be added to the interface, and that the backend will handle traditional block traffic as well as Objects in FY08.

Configuration Management

The iSNS server implements an industry standard that allows automated discovery, management and configuration of iSCSI devices. It serves a purpose very similar to that of a fibre channel switch manager. This project has been done in the open and should be ready in early FY08.

File Systems

Wow. Pick your favorite – ZFS, QFS or UFS. Expect to have these choices *in* Solaris.

Data Services

Recently added to Open Solaris, AVS will be integrated into Solaris this month. Offering replication services with a multitude of RPO and RTO settings, this is a powerful addition.

Also included here is the HSM product. Currently supplied by SAM, this will be improved to provide a file-system-agnostic interface (currently called ADM, Automatic Data Migration; the name could change). These are located at the target/server level in the stack.

Backend Storage Systems

This encompasses what you can find on the Solaris servers today – multipathing and support of all protocol stacks needed for your backend disks. Expect more enhancements in MPxIO and open-sourcing of this driver soon.

Whew!! I’d say that’s a double-hat-trick.

Wednesday May 16, 2007



Biking has become a big part of my life. I started in 1999 riding my $150 garage-sale 1982 Raleigh, and I was hooked. I’ve trained more and more over the years and bought better bikes and gadgets, but the reality is that it keeps my weight and general health in check, and it’s a great socialization function – a great many of my friends and family bike. I call it my health club membership. One thing I have found interesting is the human reaction to forcing functions – meaning that when there is an event to participate in, whether it be a non-profit fund-raiser or a real pro bike race, people tend to train for the event. Of course, when there is no event, little training occurs. I’ve been in both predicaments.

To avoid this, I sign up for at least 2 races every year: The Triple Bypass and the Bob Cook Memorial Classic. Both are difficult rides, so you really have no choice but to get ready for them. Typically, I start training in January with indoor trainer rides and then by February start getting outdoors – weather permitting. This year was especially tough since we had sooo much snow (picture of our chicken coop and barn):

But now May has hit, I’m averaging around 150 miles per week and life is good.

I’ve learned a great deal over the years. Mostly that I’m not that fast, but I love the longer, harder rides. There is a feeling of accomplishment when you are done (or maybe the endorphins are just kicking in). The longer the ride and the greater the elevation, the better - and if you do it with a buddy, it’s the greatest. I can’t for the life of me figure out why so much pain and suffering is so much fun.

I’ve also learned that I am a gadget freak. My Polar 720i watch keeps track of my heart rate, speed, distance and elevation, and even transmits this data to the computer. Both my bikes have wireless sensors mounted on them to track my every move. Here’s last year’s start on the Triple Bypass over the first pass - Squaw:

Pretty cool, huh. I mull over this data all the time.

Point of this blog – I love biking, it’s healthy, and I’m adding more data to the pile that we currently can’t keep track of – it’s good to be close to storage and great to be working for Sun.

2nd half face-off – Solaris as the Storage platform. That should be some real action, get ready...

Thursday Mar 15, 2007

Multipathing in Solaris

Long game today.

I’ve had the unique opportunity at Sun to work on items that were not in the mainstream. For several years we had a team of engineers working on multipathing (MP) drivers on non-Sun OSs to support our storage. A key to storage sales is the ability to sell your storage with Solaris, of course, but also on other host platforms – customers want storage to plug and play with any OS they may have.

The experience was great for us; we pushed all sorts of process boundaries inside of Sun several years ago to provide Sun MP drivers on Linux, Windows, HP-UX and AIX. But just when all things came together, we began to really see what the industry was doing with MP – each OS, for the most part, was including a framework for MP embedded in the OS. Windows has the DSM framework, HP-UX uses PVLinks, etc. Even Linux had mdadm, which provides some basic failover. No cost for these embedded drivers, either (at least not above and beyond the OS/HW).

But adding storage still requires intimate knowledge of the storage device. This means code changes, and typically kernel code changes, especially for asymmetric arrays – real code development and several months of test and bug-fix. A measure of success in the industry in terms of MP support is how many arrays, especially the popular ones, are supported by any given MP driver. In addition, the ability to do unique things with an array, such as load-balancing and performance improvements, is key.

A few years back, we took the Solaris multipathing driver and began an open initiative supporting third-party arrays. As we went through the grind of supporting these arrays – writing specific failover operations, testing and releasing support – it became very clear that supporting the industry this way was going to become a full-time job and tie up a significant amount of developer time.

The industry noticed this too and developed a standard called Target Port Group Support (TPGS – section 5.8). This is a wonderful specification that allows automatic pickup and support of any array (asymmetric or symmetric) for multipathing. It requires a one-time investment in your multipathing driver.

In Solaris, we’ve invested a great deal in this standard and recently have started working with vendors that are implementing it, as well as our own Sun devices (you’ll note that the iSCSI target includes this support). The specification is much more explicit about determining the actual state of each path and requires no guesswork on the host side. The array can let the host know which paths are optimal for traffic and which are not, which paths are on standby and which are unavailable. This is more information than the host has had in the past, allowing better decision-making when errors occur (fail-over) or when load-balancing decisions need to be made.
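As an illustration of the decision-making those path states enable, here is a toy path chooser using the asymmetric access state codes from the SPC specification. The path list and chooser function are hypothetical sketches, not MPxIO's actual implementation.

```python
# Asymmetric access states reported per target port group (SPC codes);
# the chooser below is an illustrative sketch, not MPxIO code.

ALUA_STATES = {
    0x0: "active/optimized",
    0x1: "active/non-optimized",
    0x2: "standby",
    0x3: "unavailable",
    0xF: "transitioning",
}

def pick_path(paths):
    """paths: list of (name, state_code) tuples.

    Prefer an active/optimized path, fall back to active/non-optimized;
    standby and unavailable paths are never chosen for I/O.
    """
    for wanted in (0x0, 0x1):
        for name, state in paths:
            if state == wanted:
                return name
    return None
```

Because the array reports these states explicitly, the host never has to guess which path to fail over to or load-balance across.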

So, you want to get your array supported quickly on Solaris? Implement TPGS and no changes are needed. Shoot. Score!!

Friday Feb 02, 2007

iSCSI target is in S10U4!!

It's baaaack!! Thanks to good old-fashioned hard work, the iSCSI target has been integrated into S10U4. What does this mean? Start thinking about Solaris as firmware - you've now got a target!! Lots of possibilities now with Solaris running as your firmware. You have the flexibility of an entire OS to bring in data services, other backend connection protocols, etc. This changes the face of Solaris - it's not just for servers anymore!! Shoot. Score!!

AVS is Open-Sourced!!

On ground-hog day? Why not!!

A hidden gem in our software portfolio, AVS provides block-based remote replication and point-in-time copy, as well as a very interesting filter-driver framework in the kernel. Check it out!

What can you use this with?

- Any Filesystem (UFS, QFS, ZFS, etc.)
- Any Volume manager (SVM, VxVM)
- Any Database (Oracle, Sybase, etc)
- Any raw device (JBOD, RAID, LOFI, etc.)
- Any block-based protocol or storage (DAS, SAN, iSCSI, etc.)
- Any Application (data, CD images, etc)
- S10 x86 or SPARC platform

Where is it used today?
- Standalone
- Bundled with the Netra HA product
- Qualified with GEOCluster

All sorts of possibilities exist - DR/hot-swap site setup, data-on-disk migration to new hardware, the introduction of filter drivers (encryption? - the sky is the limit). Coupled with the iSCSI target, this makes a nice piece of storage with fully functioning remote mirroring and point-in-time copy capabilities. Help us think of other cool combinations/additions. Suggestions welcome - invent opportunities!!

Shoot. Score!! The goalie was nowhere near that one.

Saturday Jan 27, 2007


My favorite food. Very addictive, and at certain times cravings just take over and you gotta do what you gotta do. My fav here in Denver (yes, that's where I live) is Sushi Zanmai. I do make my own, and it is amazingly simple to do, given the right fish (my brother-in-law and I get it in Denver from a place that guarantees its fish is less than 24 hours old; we buy in bulk and freeze). The rice is also key, but this is art, not science - presentation emerges as the third key element, and I don't always do well with that. At least it tastes good.

Shoot, score, hat trick.

Sun Products

So what are the products we do? Lots.

Fibre Channel - The original SAN stack in Solaris. Now very mature and with the help of the 2 top FC HBA vendors in the world, this is a world-class enterprise stack. Open Sourced January 2006.

MPxIO - The built-in failover driver in Solaris. Most of this is in closed source and we're working on getting this out very soon in a more open manner. This is top priority for us because we need the help of the open source community to help us with ongoing array support and load balancing algorithms. Stay posted for more information on this very, very soon.

iSCSI - Both the initiator and the target, as well as the iSNS server. Open-sourced, and a more recent activity is to add the target to S10U4. Working with and taking advice from the community. Great work, and more to come from these products.

Disk, Tape and SES drivers - These drivers handle all sorts of activities associated with managing disk, tape and SCSI enclosure targets. The drivers have been around for some time and were included in the original open sourcing of Solaris.

AVS - Availability Suite, a product coming soon to open source and also integrating into a Solaris release. The product performs block-based point-in-time copy as well as remote replication. It's well-established and has been in production for many years. It's on the cusp of open-sourcing; the proposal has been accepted and we are very close to posting. Be looking for this soon!!

UI for the SAM/Q filesystem and archiving products - Nice interface into a very robust product makes administration and setup easy.

UI for 58xx - Coming soon in the next release of this product, we're working on this now.

Shoot. Score!


SAN: I manage a global team at Sun doing SAN protocol stacks and interfaces for file and archiving products.

SUSHI: Love it, favorite food, especially in CA, started in '99 (with my favorite cynic - Julian Taylor, now ex-Sun) and now it's a once-a-week affair.

SPORTS: Skiing, road biking, any outdoor sport. I especially like ultra-endurance activities. More on that in another blog.

Quickstick? You guys figure it out. Nuff Said. Score!



