Tuesday Jul 22, 2008

Video: OpenSolaris Tape Enhancements

I spoke to the June 2008 meeting of FROSUG on OpenSolaris Tape enhancements.

11:30 mins

iPod version also available

Monday Jul 14, 2008


Scoring in lacrosse is the glamour of the game, but what about the setup for a score?  That’s where the hard work is done.  The feeders and off-ball movement, whether from a midfielder or an attackman, are what set up the plays.  Sometimes the feeders get an assist, sometimes not, but the biggest part of a goal is always the setup.  With all plays, there is a primary outlet and a secondary outlet. (See video.)

You can never predict how the defense will react, so you need a secondary outlet for scoring plays in case the primary outlet is covered and cannot get open.

Setting up a SAN (Storage Area Network) is no different.  It’s mandatory to have primary and secondary paths to your disk drives to protect against path failures.  This has been a de facto standard in SANs since the late ’90s, and has become a standard component in all of the premier OSs (Operating Systems) on the market today.  In the industry it’s called multi-pathing software; in OpenSolaris it’s a component called MPxIO.  Most storage companies have a significant investment here, too, as this was a differentiator in the early days, but now most storage purchasers use what comes packaged in the OS.

So that’s great for disks, but what about tape devices?  Pretty significant storage component on the SAN, right?  In most cases - not an option.  Why?  It’s very difficult to do and requires coordination with tape applications, the OS and tape devices.

Tape devices are sequential; when things are going well (on writes) there are three basic things happening: 

  • I/O is flowing without interruption to the tape device
  • The I/Os are filling the buffer on the tape device and keeping the buffer full
  • The actual tape is running at full speed with the buffer spilling onto it in the right location(s)

When things go badly, though, there’s a lot to be done:

  • Validate that you have an alternate good path to the tape device
  • Determine the last good write to the tape device and its proper location on tape
  • Set up the I/O stream on the host to start after the last good write, making sure the order is preserved
  • Reposition the tape to the correct point to restart the I/O stream
  • Restart the I/O stream, loading the buffer again
  • Start the tape movement and buffer spilling onto it
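
The recovery steps above can be sketched with a toy model.  This is only an illustration under assumed names (ToyTape, PathError, write_stream and the simulated path failure are all hypothetical); it is not the OpenSolaris tape driver code.

```python
class PathError(Exception):
    """Raised by the toy tape when the active path drops a write."""


class ToyTape:
    """Hypothetical sequential device: records land at increasing positions."""

    def __init__(self, fail_on_write=None):
        self.records = []                    # what actually reached the tape
        self.fail_on_write = fail_on_write   # 1-based write attempt that fails once
        self.attempts = 0

    def write(self, path, record):
        self.attempts += 1
        if self.attempts == self.fail_on_write:
            raise PathError(path)
        self.records.append(record)

    def position(self):
        # Number of records safely on tape, i.e. where the next write belongs.
        return len(self.records)


def write_stream(tape, paths, stream):
    """Write every record in order, switching paths after a path error."""
    active = 0
    next_index = 0
    while next_index < len(stream):
        try:
            tape.write(paths[active], stream[next_index])
            next_index += 1
        except PathError:
            if active + 1 >= len(paths):
                raise                        # no alternate good path left
            active += 1                      # switch to the alternate path
            next_index = tape.position()     # restart after the last good write
    return paths[active]
```

The point of the sketch is the last line of the error handler: the stream restarts from the position the tape reports, so record order is preserved across the path switch.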

Today marks a banner day: build 93 of OpenSolaris contains multi-pathing for tape.  Generic tape multi-pathing.  That’s right, the developers created a methodology that doesn’t require a special tape application, protocol or tape drives to provide the support.  Brilliant!!

So, how does it work?  Well, let’s go back to the lineage of development: 

  • ST Logical Block Addressing – The first thing to do was to start using an absolute position instead of a relative one.  So inside the tape driver (ST – SCSI Tape),  instead of using file/block (count of files from the beginning of the tape partition and the block within a particular file – relative positioning) a conversion has been made to use logical block addressing (count of any entity recorded from the beginning of the tape partition – absolute positioning) all the time.  This was added in build 69 of OpenSolaris.
  • ST Command Error Recovery – Depending on the SCSI command type for the tape device (Read, Write, etc.), the tape driver keeps track of the last command and the expected position.  When an error occurs, the driver asks the tape device where it is on tape (the LBA, or logical block address).  Depending on the command type and position, the tape driver determines whether to resend the command or re-position and re-issue.  This was added in build 80 of OpenSolaris.
  • Multi-pathing – Once the above was added, then tape devices could be added under the control of MPxIO in Solaris.  This means that upon an I/O error, the ST Command Error Recovery procedure is used, and if the error is path-related, an alternate path is used.  This was the last phase and just added in build 93 of OpenSolaris.
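
A minimal sketch of the two ideas in this list, absolute positioning and the resend-or-reposition decision.  The names (LastCommand, file_block_to_lba, plan_recovery) are hypothetical, and the rule that each earlier file adds one filemark to the count is an assumption made for illustration; the real driver logic is more involved.

```python
from dataclasses import dataclass


@dataclass
class LastCommand:
    kind: str        # e.g. "write" or "read"
    start_lba: int   # position the driver expected before the command ran


def file_block_to_lba(blocks_per_file, file_no, block_no):
    """Convert a relative file/block position to an absolute count.

    Assumes every earlier file contributes its blocks plus one filemark.
    """
    return sum(blocks_per_file[:file_no]) + file_no + block_no


def plan_recovery(last, reported_lba, path_error):
    """Return (action, use_alternate_path) for a failed command."""
    if reported_lba == last.start_lba:
        action = "resend"                   # tape never moved
    else:
        action = "reposition-and-reissue"   # tape moved; go back first
    return action, path_error
```

With two earlier files of 10 and 20 blocks, file 2 / block 5 becomes absolute position 37 in this model (30 blocks, 2 filemarks, 5 blocks into the current file).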

But wait, there’s more:

  • The architecture of MPxIO is such that the driver is located below the tape driver (in the driver stack), and as a result, multiple paths to a tape device are not seen by tape applications.  For example, two paths to a single tape device in OpenSolaris will now show up as one tape device.  All re-routing and path management is handled behind the scenes.  This allows any tape application to use this feature.  No special handshake with the OS or tape driver required.
  • Adding tape multi-pathing eliminates the reliance upon protocol error recovery.  The retries and recovery are protocol-independent, so you don’t need Fibre Channel FCP 2 error recovery or iSCSI ERL 1 or 2 in your protocol stack to add resiliency to your tape support.
  • Supports all tape devices, provided they support:
      • MultiP bit or TPGS bit in the Inquiry command, and
      • SAN connectivity, and
      • Page 83 with type 1 support (binary WWN info), or
      • Special VID/PID in MPxIO (for legacy drives)
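
The "two paths show up as one tape device" behavior can be pictured as grouping paths by a device identifier.  The sketch below is an assumption-laden illustration, not MPxIO code: group_paths and the path and WWN strings are made up, and it simply assumes each path reports the same identifier for the same drive.

```python
from collections import defaultdict


def group_paths(paths):
    """Collapse (path_name, wwn) pairs into one logical device per WWN."""
    devices = defaultdict(list)
    for path_name, wwn in paths:
        devices[wwn].append(path_name)
    return dict(devices)


# Two paths to the same drive collapse into a single entry.
logical = group_paths([("fc0", "wwn-1"), ("fc1", "wwn-1"), ("fc0", "wwn-2")])
```

A tape application would only ever see the keys of that mapping; the list of paths behind each key is what gets managed behind the scenes.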

Wow, so what is next?  A whole new setup and set of plays with tape.  We’ve added single path asymmetrical multi-pathing support, but as we build out a portfolio on this, you can probably guess where we are headed next.  Tape support will be better in Solaris than any other OS on a SAN.  Any SAN - you pick the connectors, we’ll provide the rest.

Oh, and by the way, a little thing called “Tape Self-Identification” was added in build 78 of OpenSolaris.  This allows automatic pickup and configuration of tape drives without hand-editing .conf files or releasing patches with new tape drive additions.  A revolutionary way to support tape drives – the tape drive tells the OS how to configure it and what it is capable of.  All with standard SCSI commands.

Looks to me like the setup for tape got a great deal easier in OpenSolaris with a lot more options.  You can bet there will be some high scores with this new set of plays.  Double hat-trick!

Friday Feb 15, 2008

Open Season

The STK 5800 has open-sourced the code base to three different open source communities:


This includes all of the source code as well as the XAM VIM associated with the XAM interface that enables the standard interface for the appliance.  It doesn’t include the GUI or hardware-specific code, but the guts of the box and the client library code are included.

What can you do with this? 

You can run it, modify it, and contemplate the future of storage that is a very different paradigm from what we know today:

  • Reliable – Long-term archive, calculated to have a mean time to data loss of over 2 million years!
  • Scalable – Add boxes on the fly without losing performance.  In fact, you are adding processing nodes, with each additional node sharing the overall task load by design.
  • Manageable – No system administration tasks necessary when adding new cells, no provisioning, no zoning, etc.
  • Native metadata and query capable – Built-in database allowing on-board query of metadata with user-defined schema, no separate database or compute power required.
  • Built on OpenSolaris

Try it, it’s a new play that matches well with the amount of fixed content that is coming our way.

Score and hat trick.

Saturday Feb 02, 2008

Managing the Game

Spring is almost here and it’s time to get in shape and get ready for the season.  If you don’t get your base established, you can’t manage the game when you need to.  

Speaking of managing the game, let’s talk a bit about how storage management is achieved in Solaris.  This has not been the strongest play for Sun in the past, but the mindset and software have shifted over time. Solaris is in great shape, with several projects just finished, several that are imminent, and future investments that will pay large dividends.

Now, a good management scheme is only as good as its base.  If you have complex software with many knobs, it’s very difficult to manage this complexity in management software. Management starts with development and user interface design. Similarly, there can’t be disparate management stacks to manage similar hardware components unless you are talking about fast-moving components such as disk drives, host bus adapters and the like. In other words: you have to consider the whole system experience, not just point products, but be realistic and add value when it matters.

There is also a difference between element management and distributed management. Element management means managing a single component, typically one directly attached to a host.  An example of element management in Solaris is fcinfo, which uses the FC HBA API. Distributed management is when you manage several components through one host. An example here would be Sun StorageTek Operations Manager Software, which provides Storage Area Network (SAN) management.  This software discovers, visualizes, monitors, and provisions complex multi-vendor storage environments from a single console.

Standards can also play a big role in management by establishing an API for either element management or distributed management. This can pay huge dividends by offering:

•    Information Independence
•    Interoperability
•    Vendor Choice
•    Easier ISV qualification
•    Simple universal administration
•    Easier migration
•    Agnostic attach to applications, hardware, etc.
•    Quicker time to market development/enhancement

The element APIs generally start in the kernel.  These are important building blocks for larger scale management applications.  A few of the useful APIs available in OpenSolaris are:

•    IMA – iSCSI Management API
•    FC HBA API – Fibre Channel HBA API (Soon to be SM HBA API)
•    MMA – Multipath Management API

To manage larger SANs, the prominent protocols use the Common Information Model (CIM), which allows each resource to be instrumented in a common way, yet extended to cover the complete functionality of the resource.  The CIM-XML protocol has been around for a number of years and is widely deployed, while WS-Management is another protocol just now being deployed.  Larger SAN management is achieved in many ways, but typically involves a CIMOM and, more recently, one with support for the Storage Management Initiative Specification (SMI-S) at its base.

Our management strategy is to provide component management based on standards where possible, but also to bring in the Open Pegasus CIMOM, which comes with SMI-S.  This involves adding providers to our current APIs (IMA, FC HBA and HDR), thus populating the SMI-S schema.  This will allow choice for our customers through a standard interface that many distributed applications can use,  including some of our own products…

No score yet, the season hasn’t started, but I think we have our base.  Once the season starts, I’ll talk through a couple of targeted management schemes we have in place that have provided, and will provide, scoring opportunities.

Sunday Jun 10, 2007


I coach lacrosse now. Well, assistant-coach lacrosse. I happen to be paired with a really good coach – Dave Devine (all-Ivy, Cornell defenseman) who has been playing or coaching lacrosse for the better part of the past 40 years. Thank god – he makes up for my ineptness.

My son plays lacrosse. He is different from me: fast, agile, learning quickly. I had to really work at being on the team; I don’t remember anything coming that easily to me.

I did have one advantage: I could see and anticipate the game better than most. I wasn’t very fast, but by anticipating moves and understanding other players' skills and habits, I was able to improve my game. Life-long habit, I still do that today (and feel that I need to). It won me starting special teams (man-down) and 2nd line duties during college.

We played our last game today. Among other things, I noticed that the young players didn’t pick up on the “telegraphing” – when opposing players indicate where they are passing or moving to during the development of a play. When you begin to pick up on the “telegraphs,” your game play rises a couple of notches. This is when you can get a golden steal and run it down for a fast-break and score.

So that is what this article is all about: telegraphing.

Solaris is concentrating on becoming the next storage platform. I’ve hinted about this, and now it’s time to come clean. We’ve seriously invested, and now we’re taking it up to the next level.

Jeff Bonwick talks about Storage running on general-purpose Solaris, and he’s right. We’ve got a feature-rich environment, with more features on the way. With recent additions to OpenSolaris, the picture gets even better:

In this post we'll talk about some of the features at a high level. In future posts we'll dive deep to provide more detail.


In terms of the larger management picture, we will be replacing our current CIMOM with something more current and reflective of the open source effort. This in turn provides a framework for vendors to plug in to, to get specific information relevant to the storage platform through SMI-S. We will also continue the effort to support CLIs and APIs relevant to storage underlying these providers, as well as creating more GUIs to help users complete their tasks.


We will continue evolving transport stacks on the host/server, using SCSI, SAS, FC, iSCSI and iSER as the primary transports. iSER will be completed in FY08. In addition there are pNFS, CIFS/NFS, Shared QFS and Honeycomb clients.


Framework in play here - Currently supporting iSCSI target mode, we are moving to a framework that will allow multiple protocols to operate across the interface. Expect Fibre Channel and iSER to be added to the interface, and that the backend will handle traditional block traffic as well as Objects in FY08.

Configuration Management

iSNS is an industry standard that allows automated discovery, management and configuration of iSCSI devices. It serves a purpose very similar to that of the Fibre Channel switch manager. The iSNS Server project has been done in the open and should be ready in early FY08.

File Systems

Wow. Pick your favorite – ZFS, QFS or UFS. Expect to have these choices *in* Solaris.

Data Services

Recently added to OpenSolaris, AVS will be integrated into Solaris this month. Offering replication services with a multitude of RPO and RTO settings, this is a powerful addition.

Also included here is the HSM product. Currently supplied by SAM, this will be improved to provide a file-system-agnostic interface, currently called ADM (Automatic Data Migration), though the name could change. These are located at the Target/Server level in the stack.

Backend Storage Systems

This encompasses what you can find on the Solaris servers today – multipathing and support of all protocol stacks needed for your backend disks. Expect more enhancements in MPxIO and open-sourcing of this driver soon.

Whew!! I’d say that’s a double-hat-trick.

Thursday Mar 15, 2007

Multipathing in Solaris

Long game today.

I’ve had the unique opportunity at Sun to work on items that were not in the mainstream. For several years we had a team of engineers working on multipathing (MP) drivers on non-Sun OSs to support our storage. A key to storage sales is the ability to sell your storage with Solaris, of course, but also on other host platforms – customers want storage to plug and play with any OS they may have.

The experience was great for us. We pushed all sorts of process boundaries inside of Sun several years ago to provide Sun MP drivers on Linux, Windows, HPUX and AIX, but just when all things came together, we began to really see what the industry was doing with MP – each OS for the most part was including a framework for MP embedded in the OS. Windows has the DSM framework, HPUX uses PVLinks, etc. Even Linux had mdadm, which provides some basic failover. No cost for these embedded drivers either (at least not above and beyond the OS/HW).

But adding storage still requires intimate knowledge of the storage device. This means code changes, typically kernel code changes, especially related to asymmetric arrays - real code development and several months of test and bug-fix. A measure of success in the industry in terms of MP support is also how many arrays, especially the popular ones, are supported by any MP driver. In addition, the ability to do unique things with an array, such as load-balancing and performance improvements, is key.

A few years back, we took the Solaris multipathing driver and began an open initiative supporting 3rd party arrays. As we went through the grind of supporting these arrays – writing specific failover operations, testing and releasing support – it became very clear that supporting the industry was going to become a full-time job and tie up a significant amount of developer time.

The industry noticed this too and developed a standard called Target Port Group access states (TPGS – section 5.8). This is a wonderful specification that allows automatic pickup and support of any array (asymmetric or symmetric) for multipathing. It requires a one-time investment in your multipathing driver.

In Solaris, we’ve invested a great deal into this standard and recently have started working with vendors that are implementing the standard as well as our own Sun devices (you’ll note that the iSCSI target includes this support). The specification is much more explicit on determining the actual state of each path and requires no guesswork on the host side. The array can let the host know which paths are optimal for traffic and which are not, which paths are on standby and which are unavailable. That is more information than the host has had in the past, and it supports better decision-making when an error occurs (fail-over) or when load-balancing decisions need to be made.
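
As a rough sketch of how a host might use the reported path states, the function below picks the most preferred usable path.  The state names follow the four categories mentioned above; the function name, the path names and the preference ordering (optimal, then non-optimal, then standby as a last resort) are assumptions for illustration, not the Solaris implementation.

```python
# Lower number = more preferred. "unavailable" paths are never chosen.
PREFERENCE = {"optimal": 0, "non-optimal": 1, "standby": 2}


def pick_path(path_states):
    """Return the name of the most preferred usable path, or None."""
    usable = [
        (PREFERENCE[state], name)
        for name, state in path_states.items()
        if state in PREFERENCE
    ]
    if not usable:
        return None
    # Ties on preference are broken by path name, so the result is stable.
    return min(usable)[1]
```

On fail-over, the host would mark the failed path unavailable and call the same function again; a standby path would typically need to be activated before it carries traffic.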

So, you want to get your array supported quickly on Solaris? Implement TPGS and no changes are needed. Shoot. Score!!

Friday Feb 02, 2007

iSCSI target is in S10U4!!

It's baaaack!! Thanks to good old-fashioned hard work, the iSCSI Target has been integrated into S10U4. What does this mean? Start thinking about Solaris as firmware - you've now got a target!! Lots of possibilities now with Solaris running as your firmware. You have the flexibility of an entire OS to bring in data services, other backend connection protocols, etc. This changes the face of Solaris - it's not just for Servers anymore!! Shoot. Score!!

AVS is Open-Sourced!!

On Groundhog Day? Why not!!

A hidden gem in our software portfolio, AVS provides block-based remote replication and point-in-time copy as well as a very interesting filter driver framework in the kernel. Check it out!

What can you use this with?

- Any Filesystem (UFS, QFS, ZFS, etc.)
- Any Volume manager (SVM, VxVM)
- Any Database (Oracle, Sybase, etc)
- Any raw device (JBOD, RAID, LOFI, etc.)
- Any block-based protocol or storage (DAS, SAN, iSCSI, etc.)
- Any Application (data, CD images, etc)
- S10 x86 or SPARC platform

Where is it used today?
- Standalone
- Bundled with the Netra HA product
- Qualified with GEOCluster

All sorts of possibilities exist - DR/Hotswap site setup, data-on-disk migration to new hardware, the introduction of a filter driver (Encryption? - The sky is the limit). Coupled with the iSCSI Target, this makes a nice piece of storage with fully functioning remote mirroring and point-in-time copy capabilities. Help us think of other cool combinations/additions. Suggestions welcome - invent opportunities!!

Shoot. Score!! The goalie was nowhere near that one.

Saturday Jan 27, 2007

Sun Products

So what are the products we do? Lots.

Fibre Channel - The original SAN stack in Solaris. Now very mature and with the help of the 2 top FC HBA vendors in the world, this is a world-class enterprise stack. Open Sourced January 2006.

MPxIO - The built-in failover driver in Solaris. Most of this is in closed source and we're working on getting this out very soon in a more open manner. This is top priority for us because we need the help of the open source community to help us with ongoing array support and load balancing algorithms. Stay posted for more information on this very, very soon.

iSCSI - Both the initiator and the target as well as iSNS server. Open-sourced, and a more recent activity is to add the target to S10U4. Working with and taking advice from the community. Great work and more to come from these products.

Disk, Tape and SES drivers - These drivers handle all sorts of activities associated with managing disk, tape and SCSI enclosure targets. The drivers have been around for some time and were included in the original open sourcing of Solaris.

AVS - Availability Suite, a product coming soon to open source and also integrating into a Solaris release. The product performs block-based Point in Time Copy as well as Remote Replication. It's well-established and has been in production for many years. It's also on the cusp of open-sourcing: the proposal has been accepted and we are very close to posting. Be looking for this soon!!

UI for the SAM/Q filesystem and archiving products - Nice interface into a very robust product makes administration and setup easy.

UI for 58xx - Coming soon in the next release of this product, we're working on this now.

Shoot. Score!


