Wednesday Mar 10, 2010

Sun Storage 7000 Hardware Provider for Microsoft VSS

The 2010.Q1 release for Sun Storage 7000 Appliances brings a wide variety of new features to the appliance family, including more ways to integrate appliances directly into storage ecosystems. In addition to the Sun Storage 7000 Management Plug-in for Oracle Enterprise Manager 10g Grid Controller 1.0, 2010.Q1 brings integration with the Microsoft Volume Shadow Copy Service (VSS) through the Sun Storage 7000 Provider for VSS Software 1.0.

For those running a Microsoft Windows operating environment who are not familiar with Microsoft VSS: it is a framework that facilitates fast, consistent snapshots of volumes by coordinating the applications performing backup-type operations (readers, also known as requestors) with the applications writing to the volumes (writers). In layman's terms, this facility built into Microsoft Windows platforms lets an application like Symantec NetBackup Advanced Client take a snapshot of a volume that is being actively used by an application like Oracle Database or Microsoft Exchange. The requestor notifies the framework that it would like to take a snapshot. The framework notifies the writers that they need to quiesce. Once the writers have quiesced, the framework tells the requestor to proceed with the snapshot. When the snapshot is complete, the framework releases the writers to continue writing. An abstract illustration of the environment is shown here:
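The handshake can be sketched in a few lines of Python. This is purely illustrative (it is not the Win32 VSS API, and all class and function names here are made up); it just models the order of operations between the requestor, the framework, and the writers:

```python
# Toy model of the VSS coordination sequence: the framework quiesces every
# writer, lets the requestor/provider take the snapshot, then releases the
# writers. Names are illustrative only.

class Writer:
    """An application writing to the volume (e.g. a database)."""
    def __init__(self, name):
        self.name = name
        self.frozen = False

    def freeze(self):          # framework asks the writer to quiesce
        self.frozen = True

    def thaw(self):            # framework releases the writer
        self.frozen = False


class Framework:
    def __init__(self, writers):
        self.writers = writers

    def snapshot(self, take_snapshot):
        """Requestor asks for a snapshot; framework coordinates."""
        for w in self.writers:      # 1. quiesce every writer
            w.freeze()
        result = take_snapshot()    # 2. tell the requestor to go
        for w in self.writers:      # 3. release the writers
            w.thaw()
        return result


writers = [Writer("database"), Writer("mail")]
fw = Framework(writers)
events = []
fw.snapshot(lambda: events.append("snapshot taken"))
```

The key property the real framework guarantees is the same one the sketch shows: the snapshot callback only runs while every writer is frozen, and the writers are always released afterward.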

The whole process of quiescing and taking a snapshot of a volume typically takes only a few seconds (longer, depending on how complex the quiesce and snapshot operations are).

For efficiency, storage appliances (like Sun Storage 7000 Appliances) supply a "Hardware Provider", also known as a "VSS Block Target Provider". The Hardware Provider takes over the snapshot and clone operations based on the GUID of a SCSI LUN (in our case, for Release 1.0, this is iSCSI only; Fibre Channel is not supported).

Installation of the Sun Storage 7000 Provider for VSS is through an InstallShield package downloaded from the Oracle Corp. download site (the site contains the Sun domain ID, if you notice those types of things). The installation process should be straightforward and results in a new folder accessible from the "Start" menu (Start -> All Programs -> Sun Microsystems, Inc. -> Sun Storage 7000 Provider for VSS). In this folder are two entries: a README.txt (required reading ... seriously) and the Sun Storage 7000 Provider for VSS ConfigUtil. The latter (the ConfigUtil) is also placed on the desktop for quick access.

To verify the hardware provider is registered with the Microsoft VSS Framework, open a command prompt and type "vssadmin list providers". As shown in the following screen capture, this prints the registered providers, which will include an entry for the Sun Storage 7000 Provider for VSS. The version number you will see in an original installation is 1.0.10.
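If you want to script that verification, the check boils down to scanning the vssadmin output for the provider name and its version. Here is a hypothetical helper (the sample text is illustrative; on a real Windows host you would feed it the actual output of "vssadmin list providers"):

```python
# Scan 'vssadmin list providers' output for the Sun provider and pull out
# its version string. The sample output below is a made-up excerpt.

def has_sun_provider(vssadmin_output):
    """Return the version string if the Sun provider is registered, else None."""
    lines = iter(vssadmin_output.splitlines())
    for line in lines:
        if "Sun Storage 7000 Provider for VSS" in line:
            # The details (type, id, version) follow the provider name line.
            for detail in lines:
                detail = detail.strip()
                if detail.startswith("Version:"):
                    return detail.split(":", 1)[1].strip()
    return None

sample = """\
Provider name: 'Sun Storage 7000 Provider for VSS'
   Provider type: Hardware
   Provider Id: {00000000-0000-0000-0000-000000000000}
   Version: 1.0.10
"""
print(has_sun_provider(sample))  # -> 1.0.10
```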

This means that backup applications, such as Symantec NetBackup Advanced Client, can set up policies that leverage the hardware provider for fast snapshots. There is one more setup operation that must be completed before such a policy will succeed. The Sun Storage 7000 Provider for VSS uses a combination of "Management Path" and "Data Path" operations to achieve snapshots and clones of iSCSI volumes.

The "Management Path" credentials for any Sun Storage 7000 Appliance supplying iSCSI LUNs to the Microsoft Windows client must be entered through the "Sun Storage 7000 Provider for VSS ConfigUtil". From the Microsoft Windows platform that is consuming the iSCSI LUNs and has the Hardware Provider installed, open the ConfigUtil (from the Start Menu or the Desktop). Using the DNS names or IP addresses of the Sun Storage 7000 Appliances (I prefer IP addresses), enter each appliance's Management Path along with the User ID and Password given to you by the storage administrator for each appliance. Tap the shoulder of the storage administrator and remind them you could use a non-root User ID, as guided by the README.txt in the download. Most access rights can be removed from that User ID, though it does need a role that facilitates snapshot and clone creation for the shares you access. Use the template in the README.txt as a starting point.

This screen capture shows the entry of credentials:

Once the credentials are entered properly, access from applications is seamless.

Look for specific documentation on using Hardware Providers with the specific applications that you use on the Microsoft Windows platform. Because VSS is a framework, you may have products that utilize the framework that we did not specifically test in our labs. The README.txt in the download contains a list of applications that we have run using the provider.


The Sun Storage 7000 Provider for VSS Software 1.0 is a Hardware Provider that plugs into the Microsoft VSS Framework on Microsoft Windows Server 2003 and 2008, in 64-bit or 32-bit variations. Using the provider, backup applications and other requestors can make snapshots and clones directly on Sun Storage 7000 Appliances for iSCSI LUNs consumed by the Microsoft Windows client on the system where the Hardware Provider is installed. The installation of the provider is quick, and you should verify that it was registered with the Microsoft VSS Framework. You must then enter User ID and Password information for each target Sun Storage 7000 Appliance. No further intervention with the Microsoft VSS Framework is necessary from that point forward; the primary remaining work is configuring your backup applications to make use of the Hardware Provider through backup policies.

For additional reading, use the following resources:

Monitoring the Sun Storage 7000 Appliance from Oracle Grid Control

Over the past few months I've blogged on various monitoring and alerting topics for Sun Storage 7000 Appliances. Besides my favorite of the blogs (Tweeting your Sun Storage 7000 Appliance Alerts), the culmination of this monitoring work is now available as the Sun Storage 7000 Management Plug-in for Oracle Enterprise Manager 10g Grid Controller 1.0, for use with the just shipped 2010.Q1 software release for the Sun Storage 7000 Appliance Family. Phew, that's a bit of a mouthful for a title but I'll just refer to it as the SS7000MPfOEMGC, does that help? Well, maybe not ;-)

Sun Storage 7000 Management Plug-in for Oracle Enterprise Manager 10g Grid Controller creates a coupling between the enterprise-wide monitoring provided by Oracle Grid Control and the monitoring and analytics provided by Sun Storage 7000 Appliances. If you are not familiar with Oracle Grid Control, there is a nice write-up within the Installation and Configuration Guide for Oracle Grid Control. In a nutshell, Oracle Grid Control aids in monitoring your vertical data center rather than simply being an aggregation of horizontal health information. The documentation describes it as software and the infrastructure it runs on but I would simply call it a "Vertical Data Center Monitoring Environment".

The goal of the plug-in to Oracle Grid Control is to facilitate a Database Administrator in their use of Sun Storage 7000 Appliances without attempting to reproduce the world-class analytics available within Sun Storage 7000 Appliances. In other words, the goal is to create a bridge between the world of Database Administration and the world of Storage Administration with just enough information so the two worlds can have dialog about the environment. Specifically, the Plug-in for Sun Storage 7000 Appliances is targeted at the following tasks:

  • Connecting Database deployments with Sun Storage 7000 resources that provide storage infrastructure to the database
  • Understanding the performance metrics of a Database from the perspective of the Appliance (which cache resources are used for a database, which network resources are involved and what performance is being delivered, and how various storage abstractions are used by the database)
  • Providing a Federated view of Sun Storage 7000 Appliances deployed in the environment (including storage profiles and capacities, network information and general accounting information about the appliances)
  • Providing detailed performance metrics for use in initial service delivery diagnostics (these metrics are used to have more detailed conversations with the Storage Administrator when additional diagnostics are required)

Let's take a look at one of the more interesting scenarios as a simple way of showing the plug-in at work rather than reproducing the entire Installation Guide in blog-form.

Download the Plug-in for Sun Storage 7000 Appliances, Unzip the downloaded file, and read the Installation Guide included with the plug-in.

Follow the instructions for installing the plug-in, deploying it to agents, and adding instances of Sun Storage 7000 Appliances to the environment for monitoring. Each instance added takes about 60 minutes to fully populate with information. This is simply the nature of a polling environment: the plug-in is set up to poll data sets that change rarely on a 60-minute interval, and data sets that change frequently on a 10-minute interval.

Once data is funneling in, all of the standard appliance-centric views of the information are available (including the individual metrics that the plug-in collects), as well as a view of some of the important high-level information presented on the home page for an instance (provided you are using Oracle Grid Control). Here is a view of a single appliance instance's home page:

Looking into the metrics collected for an appliance brings you to standard displays of single metrics (as shown below) or tables of related metrics (all standard navigation in Oracle Grid Controller for plug-in components).

Included in the plug-in for Sun Storage 7000 Appliances are 5 reports. Of these, 3 run against a single instance of a Sun Storage 7000 Appliance and are available from both the context of the single instance and the Oracle Grid Control Reports Tab, while 2 run against all monitored instances of Sun Storage 7000 Appliances and are only available from the Reports Tab. Among the 5 reports are 2 that combine information about databases deployed against NFS mount points with the Sun Storage 7000 Appliances that export those mount points. The two reports are:

  • Database to Appliance Mapping Report - Viewable from a single target instance or the Reports Tab, this report shows databases deployed against NFS shares from a single Sun Storage 7000 Target Instance
  • Federated Database to Appliance Mapping Report - Viewable only from the Reports Tab, this report shows databases deployed against NFS shares from all monitored Sun Storage 7000 Appliances

Looking at the "Master" (top-level) Database to Appliance Mapping Report (shown below), you will see a "Filter" (allowing you to scope the information in the table to a single Database SID) and a table that correlates the filtered Database SID to the network file systems shared by specific appliances, along with the Storage IP Address through which the share is accessed, the appliance's Storage Network Interface, and the name the appliance is referred to by throughout this Grid Control instance.

From the Master report, 4 additional links lead to more detailed information, filtered to the appliance abstraction used by the Database SID. The links in the columns navigate as follows:

  • Database Link - This link takes the viewer to a drill-down table that shows all of the files deployed on the shares identified in the first table. With this detail report, an administrator can see exactly which files are deployed where. The table also contains the three links identified next.
  • Network File System - Takes the viewer down to a detailed report showing metadata about the share created on the appliance, how the cache is used (ARC and L2ARC) for this share and general capacity information for the share.
  • Storage IP Address - Takes the viewer to the Metric Details that relate to the appliance configuration (serial number, model, etc...).
  • Storage Network Interface - Takes the viewer to metadata about the network interface as well as reports on the Network Interface KB/sec and NFS Operations Per Second (combined with the NFS Operations Per Second that are allocated to serving the share that the database resides on)

The detail reports for the Network File System and Storage Network Interface (both of which are not directly accessible from the Reports Tab) use a combination of current metrics and graphical time-axis data, as shown in the following report:

Wherever applicable, the Detail Reports drill further into Metric Details (that could also be accessed through an appliance instance target home page).

It is important to note that several of these reports combine a substantial amount of data into a single page. This can create rather lengthy report generation times (in worst-case scenarios, up to 5 minutes). It is always possible to view individual metrics through the monitoring home page. Because metric navigation is focused on a single metric, it always performs faster and is preferred unless the viewer needs a more complex assembly of information. With the reports, an administrator can view network performance and storage performance side by side, which may be more helpful in diagnosing service delivery issues than navigating through single metric data points.

In addition to a substantial number of collected metrics there are several alerts that are generated on various appliance thresholds that can occur throughout the operation of target appliances.


Oracle Grid Control gives a fully integrated view of the "Vertical" data center, combining software infrastructure with hardware infrastructure (including storage appliances). Sun Storage 7000 Management Plug-in for Oracle Enterprise Manager 10g Grid Controller 1.0 presents Sun Storage 7000 Appliances within the vertical context and presents metrics and reports tailored specifically towards Sun Storage 7000 Appliances as viewed by a Database Administrator. For more information on the plug-in and software discussed in this entry:

Thursday Dec 17, 2009

Tweeting your Sun Storage 7000 Appliance Alerts

Twitter, instant messages, and mobile alerts have always fascinated me. I truly believe that a Storage Administrator should not have to leave the comfort of their iPhone, Droid or Palm Pre to do 90% of their day-to-day management tasks. While scanning the morning headlines, I saw this great article on Tweeting from Command Line using Python.

So, leading into how to manage a Sun Storage 7000 Appliance using Alerts (the next article in my series), I thought I would take some time and adapt this script to tweet my received Sun Storage 7000 Appliance traps. I am going to use the AK MIB traps (to be explained in more detail in the next article) to achieve this.

Writing the Python Trap Handler

First, create the trap handler (this is based on the Python Script presented in the Blog Article: Tweeting from Command Line using Python).

Here is the Python Script:

import sys
from os import popen

def tweet(user, password, message):
    print 'Hold on there %s....Your message %s is getting posted....' % (user, message)
    url = ''  # the Twitter status update URL goes here
    curl = '/usr/dist/local/sei/tools/SunOS-sparc/curl -s -u %s:%s -d status="%s" %s' % (user, password, message, url)
    pipe = popen(curl, 'r')
    print 'Done...awesome'

if __name__ == '__main__':
    # snmptrapd feeds the trap details to this handler on stdin, one per line
    host = sys.stdin.readline()
    ip = sys.stdin.readline()
    uptime = sys.stdin.readline()
    uuid = sys.stdin.readline()
    alertclass = sys.stdin.readline()
    alertcount = sys.stdin.readline()
    alerttype = sys.stdin.readline()
    alertseverity = sys.stdin.readline()
    alertresponse = sys.stdin.readline()

    messageArray = [host, ip, alerttype]
    message = ' '.join(messageArray)
    message = message[0:140]  # Twitter's 140-character limit

    user = "yourtwitter"  # put your username inside these quotes
    password = "yourpassword"  # put your password inside these quotes
    tweet(user, password, message)

You will have to make the following changes at a minimum:

  • Re-insert the missing tabs based on Python formatting
  • Ensure the path to CURL is appropriate
  • Change the user and password variables to your Twitter account

Once that is done you should be set.

Adding the Trap Handler to SNMP

Next, set up your snmptrapd.conf to handle traps from the AK MIB by invoking the Python Script above. My /etc/sma/snmp/snmptrapd.conf looks something like this:

traphandle . /export/home/oracle/

The OID . identifies the sunAkTraps portion of the AK-MIB delivered with the Sun Storage 7000 Appliance.

Now, invoke snmptrapd using the above configuration file (there are many ways to do this, but I am doing it the way I know will pick up my config file :-)

/usr/sfw/sbin/snmptrapd -c /etc/sma/snmp/snmptrapd.conf -P
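Before wiring the handler to snmptrapd, it can be handy to dry-run the message-building logic against fake input. snmptrapd feeds a traphandle script the sending hostname, then the source address, then one line per varbind; the handler above assumes the AK MIB varbinds arrive in a fixed order. All values in this sketch are made up:

```python
# Dry run of the trap handler's input format: hostname, source address,
# then one varbind per line in the order the handler expects.
fake_trap_lines = [
    "aksystem1",             # host
    "UDP: [192.168.0.10]",   # ip
    "1234567",               # uptime
    "some-uuid",             # uuid
    "alert-class",           # alert class
    "1",                     # alert count
    "threshold violated",    # alert type
    "Minor",                 # alert severity
    "none",                  # alert response
]

# Same message-building logic as the handler: host, ip and alert type,
# clipped to Twitter's 140-character limit.
host, ip = fake_trap_lines[0], fake_trap_lines[1]
alerttype = fake_trap_lines[6]
message = ' '.join([host, ip, alerttype])[:140]
print(message)  # -> aksystem1 UDP: [192.168.0.10] threshold violated
```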

Sending Alerts from the Sun Storage 7000

Using the article I posted yesterday, ensure SNMP is enabled with a trapsink identifying the system where your trap receiver is running. Now we have to enable an alert to be sent via SNMP from your Sun Storage 7000 Appliance (this is different from the default Fault Management Traps I discussed yesterday).

For now, trust me on this (I will explain more in my next article): let's enable a simple ARC size threshold to be violated. Go into the Browser User Interface for your system (or the simulator) and go to the Configuration -> Alerts screen. Click through to "Thresholds" and add one that you know will be violated, like this one:

Each alert that is sent gets posted to the twitter account identified within the Python Script! And as my friends quickly noted, from there it can go to your Facebook account where you can quickly declare "King of the Lab"!

Thanks Sandip for your inspiring post this morning :-)

Thursday Jun 26, 2008

Fun with the FMA and SNMP

I've been working back in the Auto Service Request space for the past few weeks, doing some education and such as we build and test this product or that, so it seemed like a great opportunity to put out a quick and dirty "SNMP with the FMA" blog post. There are some excellent previous materials out there that are extremely applicable and that you could probably get by with, so first I will point those out:

With those articles and a little help from your nearby SNMP guru, you are pretty much good to go. I've extended the information in those papers a tiny, tiny bit with how to link the information back to Sun's message repository. The great thing about the FMA is that we have "special sauce" around it that can create a full service cycle (including a Remote Shell application that service can use, with your permission and oversight, to further diagnose problems from a remote location, as well as tools to deliver faults back to Sun Microsystems, including our own Common Array Manager).


This guides a user through a 2-system setup with:

  • system1 - enabled to send traps when a fault is detected by the FMA
  • system2 - enabled to receive traps from system1

Throughout this write-up, a roughly equivalent FMA command is given that can be run on the originating host so you can follow what the SNMP trap is delivering. To me, the FMA command on the originating host is definitely preferable, since it usually provides additional information and formatting that may not be available in the SNMP trap or walk.

Setup of Trap Sender

system1 must be set up to deliver traps when FMA events occur.

To do this, follow the instructions provided at A Louder Voice for the Fault Manager, summarized here:

  • Create the file /etc/sma/snmp/snmp.conf (if it doesn't exist) and add the line "mibs +ALL"
  • Add the line to /etc/sma/snmp/snmpd.conf: trap2sink system2
  • Add the additional line to support deeper information query: dlmod sunFM /usr/lib/fm/amd64/

Assuming you have made no other changes prior to this, your system should be ready to go. You do have to restart the SNMP service at this point (I always reboot...it's a Windows habit).

Setup of Trap Receiver

system2 must be set up to receive traps; this is simple for demo purposes:

  • Run /usr/sfw/sbin/snmptrapd -P
  • Watch for traps

Receiving FMA traps

When system2 receives FMA traps from system1, they will look like this (formatting appropriately rearranged):

DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1190862249) 137 days, 19:57:02.49
SNMPv2-MIB::snmpTrapOID.0 = OID: SUN-FM-MIB::sunFmProblemTrap
SUN-FM-MIB::sunFmProblemUUID."33570acb-e108-4ca8-8276-c67aeecf2043" = STRING: "33570acb-e108-4ca8-8276-c67aeecf2043"
SUN-FM-MIB::sunFmProblemCode."33570acb-e108-4ca8-8276-c67aeecf2043" = STRING: ZFS-8000-CS
SUN-FM-MIB::sunFmProblemURL."33570acb-e108-4ca8-8276-c67aeecf2043" = STRING:

Check out the URL in sunFmProblemURL. You can actually go there and get the System Administrator actions that should be taken, as well as an extended view of the problem (not contextualized to your system, but this is the general fault information you would see if you were on the system itself).

This trap is roughly equivalent to the information you would receive from running the basic fmdump command on the system with the fault. You could also run the "fmdump -v -u 33570acb-e108-4ca8-8276-c67aeecf2043" command on the trap originator to get a bit more information:

bash-3.00# fmdump -v -u 33570acb-e108-4ca8-8276-c67aeecf2043
Jun 26 10:57:45.3602 33570acb-e108-4ca8-8276-c67aeecf2043 ZFS-8000-CS
100% fault.fs.zfs.pool

Problem in: zfs://pool=data
Affects: zfs://pool=data
FRU: -
Location: -

Navigating Problems on the Remote System

Let's start at the top now. We can quickly navigate all known, unresolved problems on the remote system by walking the sunFmProblemTable:

system2:/root-> /usr/sfw/bin/snmpwalk -v2c -c public -t 20 system1 sunFmProblemTable

This results in a dump of all problems, with the UUIDs that can be used for deeper queries. The following printout shows only a single problem on the system; additional problems are listed in blocks by attribute (so all sunFmProblemUUIDs are lumped together, followed by all sunFmProblemCodes, and so on):

SUN-FM-MIB::sunFmProblemUUID."0f3dcdf3-f85b-c091-8f1f-ce2164976cda" = STRING: "0f3dcdf3-f85b-c091-8f1f-ce2164976cda"
SUN-FM-MIB::sunFmProblemCode."0f3dcdf3-f85b-c091-8f1f-ce2164976cda" = STRING: ZFS-8000-CS
SUN-FM-MIB::sunFmProblemURL."0f3dcdf3-f85b-c091-8f1f-ce2164976cda" = STRING:
SUN-FM-MIB::sunFmProblemDiagEngine."0f3dcdf3-f85b-c091-8f1f-ce2164976cda" = STRING: fmd:///module/fmd
SUN-FM-MIB::sunFmProblemDiagTime."0f3dcdf3-f85b-c091-8f1f-ce2164976cda" = STRING: 2008-6-18,10:7:4.0,-6:0
SUN-FM-MIB::sunFmProblemSuspectCount."0f3dcdf3-f85b-c091-8f1f-ce2164976cda" = Gauge32: 1

This is roughly equivalent to the fmdump command on a system, though the basic fmdump gives only the UUID, code, and MSG-ID.
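Because the walk output is blocked by attribute rather than by problem, a little parsing makes it much easier to read. This sketch regroups the lines into one record per problem UUID (the abbreviated walk text is taken from the output above):

```python
# Regroup attribute-blocked snmpwalk output into one record per problem UUID.
import re

def group_problems(walk_output):
    # Matches lines like:
    #   SUN-FM-MIB::sunFmProblemCode."<uuid>" = STRING: ZFS-8000-CS
    pattern = re.compile(r'SUN-FM-MIB::(\w+)\."([0-9a-f-]+)" = \w+: (.*)')
    problems = {}
    for line in walk_output.splitlines():
        m = pattern.match(line.strip())
        if m:
            attr, uuid, value = m.groups()
            problems.setdefault(uuid, {})[attr] = value.strip('"')
    return problems

walk = '''
SUN-FM-MIB::sunFmProblemUUID."0f3dcdf3-f85b-c091-8f1f-ce2164976cda" = STRING: "0f3dcdf3-f85b-c091-8f1f-ce2164976cda"
SUN-FM-MIB::sunFmProblemCode."0f3dcdf3-f85b-c091-8f1f-ce2164976cda" = STRING: ZFS-8000-CS
SUN-FM-MIB::sunFmProblemSuspectCount."0f3dcdf3-f85b-c091-8f1f-ce2164976cda" = Gauge32: 1
'''
grouped = group_problems(walk)
```

After grouping, `grouped[uuid]` holds that problem's code, URL, diagnosis time, and so on, side by side.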

Based on this information, we can look up the ZFS-8000-CS message and determine what our next steps should be. It indicates that running zpool status -x on the system with the fault would be useful. Going to the originating system and running it returns:

bash-3.00# zpool status -x
  pool: data
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        data         UNAVAIL      0     0     0  insufficient replicas
          mirror     UNAVAIL      0     0     0  insufficient replicas
            c2t0d0   UNAVAIL      0     0     0  cannot open
            c3t1d0   UNAVAIL      0     0     0  cannot open
          mirror     UNAVAIL      0     0     0  insufficient replicas
            c2t2d0   UNAVAIL      0     0     0  cannot open
            c3t3d0   UNAVAIL      0     0     0  cannot open
          mirror     UNAVAIL      0     0     0  insufficient replicas
            c2t4d0   UNAVAIL      0     0     0  cannot open
            c3t5d0   UNAVAIL      0     0     0  cannot open
          mirror     UNAVAIL      0     0     0  insufficient replicas
            c2t6d0   UNAVAIL      0     0     0  cannot open
            c3t7d0   UNAVAIL      0     0     0  cannot open
          mirror     UNAVAIL      0     0     0  insufficient replicas
            c2t8d0   UNAVAIL      0     0     0  cannot open
            c3t9d0   UNAVAIL      0     0     0  cannot open
          mirror     UNAVAIL      0     0     0  insufficient replicas
            c2t10d0  UNAVAIL      0     0     0  cannot open
            c3t11d0  UNAVAIL      0     0     0  cannot open
          mirror     UNAVAIL      0     0     0  insufficient replicas
            c2t12d0  UNAVAIL      0     0     0  cannot open
            c3t13d0  UNAVAIL      0     0     0  cannot open
          mirror     UNAVAIL      0     0     0  insufficient replicas
            c2t14d0  UNAVAIL      0     0     0  cannot open
            c3t15d0  UNAVAIL      0     0     0  cannot open
          mirror     UNAVAIL      0     0     0  insufficient replicas
            c2t16d0  UNAVAIL      0     0     0  cannot open
            c3t17d0  UNAVAIL      0     0     0  cannot open
          mirror     UNAVAIL      0     0     0  insufficient replicas
            c2t18d0  UNAVAIL      0     0     0  cannot open
            c3t19d0  UNAVAIL      0     0     0  cannot open
          mirror     UNAVAIL      0     0     0  insufficient replicas
            c2t20d0  UNAVAIL      0     0     0  cannot open
            c3t21d0  UNAVAIL      0     0     0  cannot open
          mirror     UNAVAIL      0     0     0  insufficient replicas
            c2t22d0  UNAVAIL      0     0     0  cannot open
            c3t23d0  UNAVAIL      0     0     0  cannot open

For some history on this particular problem, we disconnected a JBOD that had the "data" pool built, so none of the devices are available...ouch.

You can look more deeply at the events that resulted in the problem by walking the sunFmFaultEventTable:

system2:/root-> /usr/sfw/bin/snmpwalk -v2c -c public -t 20 system1 sunFmFaultEventTable
SUN-FM-MIB::sunFmFaultEventProblemUUID."0f3dcdf3-f85b-c091-8f1f-ce2164976cda".1 = STRING: "0f3dcdf3-f85b-c091-8f1f-ce2164976cda"
SUN-FM-MIB::sunFmFaultEventProblemUUID."33570acb-e108-4ca8-8276-c67aeecf2043".1 = STRING: "33570acb-e108-4ca8-8276-c67aeecf2043"
SUN-FM-MIB::sunFmFaultEventProblemUUID."3600a05e-acc1-cae2-c185-f50852156777".1 = STRING: "3600a05e-acc1-cae2-c185-f50852156777"
SUN-FM-MIB::sunFmFaultEventProblemUUID."97bfeb63-7b02-c2b6-c51f-c451a9f760c5".1 = STRING: "97bfeb63-7b02-c2b6-c51f-c451a9f760c5"
SUN-FM-MIB::sunFmFaultEventClass."0f3dcdf3-f85b-c091-8f1f-ce2164976cda".1 = STRING: fault.fs.zfs.pool
SUN-FM-MIB::sunFmFaultEventClass."33570acb-e108-4ca8-8276-c67aeecf2043".1 = STRING: fault.fs.zfs.pool
SUN-FM-MIB::sunFmFaultEventClass."3600a05e-acc1-cae2-c185-f50852156777".1 = STRING: fault.fs.zfs.pool
SUN-FM-MIB::sunFmFaultEventClass."97bfeb63-7b02-c2b6-c51f-c451a9f760c5".1 = STRING: fault.fs.zfs.pool
SUN-FM-MIB::sunFmFaultEventCertainty."0f3dcdf3-f85b-c091-8f1f-ce2164976cda".1 = Gauge32: 100
SUN-FM-MIB::sunFmFaultEventCertainty."33570acb-e108-4ca8-8276-c67aeecf2043".1 = Gauge32: 100
SUN-FM-MIB::sunFmFaultEventCertainty."3600a05e-acc1-cae2-c185-f50852156777".1 = Gauge32: 100
SUN-FM-MIB::sunFmFaultEventCertainty."97bfeb63-7b02-c2b6-c51f-c451a9f760c5".1 = Gauge32: 100
SUN-FM-MIB::sunFmFaultEventASRU."0f3dcdf3-f85b-c091-8f1f-ce2164976cda".1 = STRING: zfs://pool=1ca09fa50e7ca8c7
SUN-FM-MIB::sunFmFaultEventASRU."33570acb-e108-4ca8-8276-c67aeecf2043".1 = STRING: zfs://pool=data
SUN-FM-MIB::sunFmFaultEventASRU."3600a05e-acc1-cae2-c185-f50852156777".1 = STRING: zfs://pool=6f658a6c4b99b18b
SUN-FM-MIB::sunFmFaultEventASRU."97bfeb63-7b02-c2b6-c51f-c451a9f760c5".1 = STRING: zfs://pool=data
SUN-FM-MIB::sunFmFaultEventFRU."0f3dcdf3-f85b-c091-8f1f-ce2164976cda".1 = STRING: -
SUN-FM-MIB::sunFmFaultEventFRU."33570acb-e108-4ca8-8276-c67aeecf2043".1 = STRING: -
SUN-FM-MIB::sunFmFaultEventFRU."3600a05e-acc1-cae2-c185-f50852156777".1 = STRING: -
SUN-FM-MIB::sunFmFaultEventFRU."97bfeb63-7b02-c2b6-c51f-c451a9f760c5".1 = STRING: -
SUN-FM-MIB::sunFmFaultEventResource."0f3dcdf3-f85b-c091-8f1f-ce2164976cda".1 = STRING: zfs://pool=1ca09fa50e7ca8c7
SUN-FM-MIB::sunFmFaultEventResource."33570acb-e108-4ca8-8276-c67aeecf2043".1 = STRING: zfs://pool=data
SUN-FM-MIB::sunFmFaultEventResource."3600a05e-acc1-cae2-c185-f50852156777".1 = STRING: zfs://pool=6f658a6c4b99b18b
SUN-FM-MIB::sunFmFaultEventResource."97bfeb63-7b02-c2b6-c51f-c451a9f760c5".1 = STRING: zfs://pool=data

This is roughly equivalent to the fmdump -v command. The fmdump -V command cannot be duplicated over SNMP, though it can be useful to run on the host side. fmdump -V can provide product, chassis and server IDs as well as a more complete list of faults and ereports that resulted in the diagnosis.

You could also view the fault management information by resource, rather than by fault or event, over SNMP:

system2:/root-> /usr/sfw/bin/snmpwalk -v2c -c public -t 20 system1 sunFmResourceTable
SUN-FM-MIB::sunFmResourceFMRI.1 = STRING: zfs://pool=data
SUN-FM-MIB::sunFmResourceStatus.1 = INTEGER: faulted(5)
SUN-FM-MIB::sunFmResourceDiagnosisUUID.1 = STRING: "33570acb-e108-4ca8-8276-c67aeecf2043"

This is similar to the "fmadm faulty" command that can be run from a system. The faulty parameter results in some additional information and text, though that text can also be retrieved at the event URL identified earlier.

While this is the "SPAM" approach to inquiring about a system, you could also walk each problem as it comes in.

As was mentioned previously, there isn't a "ton" of information available within the SNMP MIB itself. The result of receiving a particular trap is often to do additional diagnosis on the system with the fault. So some ssh work may be necessary.

Additional things to do with FMA

You can dump the topology of a system!

Go to /usr/lib/fm/fmd and type "fmtopo", you should get a complete topology as it is recognized by the fault management infrastructure.

bash-3.00# ./fmtopo
Jun 26 13:20:16 ddba8792-0166-6fb4-a81b-db3de9622649

hc://:product-id=Sun-Fire-X4150:chassis-id=09999:server-id=server1:serial=555555555:part=777777777:revision=/motherboard=0/memory-controller=0/dram-channel=1/dimm=0

... (and on and on)

Artificial Fault Injection

There isn't a lot of documentation I've found for fminject, but it looks like it can be quite useful. At its easiest, it will allow you to replay a set of ereports and faults that previously occurred on a system. For example, we can replay the error log that produced the FMA trap for the ZFS pool problem above. In one window on the system that sent the trap, start up the simulator:

cd /usr/lib/fm/fmd
./fmsim -i

In another window, replay an error log

cd /usr/lib/fm/fmd
./fminject -v /var/fm/fmd/errlog

You will see the SNMP trap re-appear in the trap receiver. Note that I have found no good way to tell, in the trap receiver's window, that this is a replayed trap, so use this with caution.


While there is rarely enough information to completely diagnose a problem from an SNMP trap alone, the traps are a great opportunity for integration with higher-level management tools. More often than not, additional commands need to be run and information gathered on the system with the fault (be it logs or the output of various commands, especially in the ZFS space), but the messages themselves typically point at the next steps for diagnosis.

Thursday May 15, 2008

Sun Open Storage and the Sun Fire x4240

We announced a whole set of new servers this week, including one of my favorites, the Sun Fire X4240 Server. I only like to talk about things I've touched and played with so I figured I'd show you a little about how I'm using my x4240s.

I'm a storage guy, so what I immediately like about this particular system (and the Sun Fire x4150s I talked about before, as well as the related 8 and 16 disk offerings with SPARC chips), is the disk density I get along with the raw horsepower sitting right next to the disk drives.

The Sun Fire x4240s I'm using right now have 2 x Dual-Core AMD Opteron 2222s that run at 3.0 GHz, but you can have a couple of Quad-Cores in this thing as well! As for storage, I have a set with 146 GB, 10k RPM SAS Drives that are pumping about 66 MB/sec through a single NIC port with our Open Storage CIFS Server that integrates nicely with ZFS.

Watch me share 15 drives through CIFS so those poor Windows folks can attach to my box for their storage (and get the benefits of ZFS snapshots and capacity):

$ zpool create sharepool raidz2 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 raidz2 c1t6d0 c1t7d0 c1t8d0 c1t9d0 c1t10d0 raidz2 c1t11d0 c1t12d0 c1t13d0 c1t14d0 c1t15d0
$ zfs create sharepool/share1
$ zfs set sharesmb=on sharepool/share1
$ zfs set sharesmb=name=share1 sharepool/share1

(You also have to configure the PAM module appropriately, but this is a one-time thing; try this:

1. Add the below line to the end of /etc/pam.conf:

other password required pam_smb_passwd.so.1 nowarn

2. Set or change the password for any Solaris user that will be used to authenticate when connecting to a CIFS share, like this:

$ passwd pmonday
New Password:
Re-enter new Password:
passwd: password successfully changed for pmonday

With the SMB PAM module installed, setting a password generates credentials that can be used by Windows as well as Solaris (which is why this step is mandatory). There are, of course, differences from your normal Samba installation, but as with everything, figure out the tool that is best for you and go with it ;-) One thing you will notice with Sun's CIFS Server is that there is no anonymous access to shares (this can be a good thing)!

Sun's CIFS Server and the Sun Fire X4240 are just one example of how the Open Storage initiative pays off. You can turn this capacity/performance-rich server into an iSCSI target, NFS target, or even keep the storage to yourself and load applications directly on the box that can make use of the fast SAS spindles. Don't want to use Solaris or OpenSolaris? Heck, load Red Hat or Windows on these things, we do it all. In fact, we'll even pre-load a Linux variant for you (check the options page).

Cool stuff, nice job, Systems Group! Oh yeah, and the ILOM in the x4240 works GREAT (better than the x4150 even).

Thursday Mar 27, 2008

National Archives, PASIG, a little Vacation

My family and I took a brief vacation this weekend and made our way to Washington D.C. for a little R & R. We enjoyed 2 and a half days of sights, tours, history and we even squeezed in a little time for the pool. For those of you that have been to D.C. (or live there), you know that 2 and a half days only allowed us to scratch the surface of the United States cultural base that is alive as well as preserved in the city (and often within a few blocks of the National Mall).

There are so many thought provoking and emotional moments as you move around that after two and a half days I found myself almost completely wrung out. We saved the Congressional Gardens with the Vietnam Memorial, World War II Memorial, Lincoln Memorial, Korean Memorial, and the others for the last day. The artistry and the thought that went into these memorials is astounding and the emotions that they pull out of you put you into knots.

I won't list everything we did on the whole journey over the weekend. For my youngest son, going up the Washington Monument (and our need to start standing in line for tickets at 6:30am) will probably be the most lasting memory. For Shaun, hopefully the Vietnam Memorial and the Pederson House. For me, who knows, the Bill of Rights, the Constitution, the Declaration of Independence, the Magna Carta...simply amazing, but overall I can't name a single moment that wasn't worth its weight in gold.

Professionally though, the National Archives had to be one of the most thought provoking of our stops.

Here is this large building, with all of these physical manifestations of our history on display and in vaults around the building. The Constitution, the Declaration of Independence, the Bill of Rights, the Emancipation Proclamation and more than I could ever list here. It was "Magna Carta Days" at the National Archives as well and one of the four remaining copies of the 1297 Magna Carta from King Edward I was on display.

Here are a few thoughts that went through my head:

  • With infinite, perfect copies of digital content, what makes a digital entity "unique" and "awe inspiring"?
  • How is our country going to preserve and revere a digital creation 710 years from now?
  • How do you know what digital creations are worth preserving since you can hit a button and destroy them so easily?

The first of these seems entirely out of place with storage technology, but when you stand in front of the Declaration of Independence it makes you wonder what digital content could actually have this impact on a person and how you would embody that digital content. The record companies are struggling with it as well. In addition to digital download content, the companies are trying out releases on USB thumb drives as well as larger packages and "Deluxe" sets. Books are going to struggle through the same revolution (as magazines already have) of being bundled as bits with little or no "branding" or "artistry" about the packaging. How does one recreate that sense of uniqueness when the content is merely a bunch of bits that gets flattened into 10 songs amongst 8,000 on an iPod? Really, what "value" do those songs and books have anymore when they can be passed around at will and are part of a great "torrent" of traffic into and out of our computers? Something really has to "stick" to remain on "top of our stereo" these days.

And as for the Declaration of Independence...what a clean and simple document. The document itself hung in windows and is incredibly faded and worn down. Only after time passed did our country seek to formally preserve it for posterity. Perhaps we caught it in time to save it from deteriorating any more. But one still has to ask: is it the single original document that retains the significance, or is it the content that remains significant? If it were only the content, we wouldn't store the original in a huge underground vault and protect it as well as Vice President Cheney, would we?

Having seen the original, I would have to argue that there is something incredibly unique about it, it actually holds more reverence (for lack of a better word) than one of the many copies of it. So how does one reproduce that "reverence" in a digital world?

If that is not enough to think about, we have to think about digital preservation. The Magna Carta of 1297 has withstood time for 710 years and is in wonderful shape. What digital storage technology today do we have that can withstand decay for that length of time (of course, one could argue that some rock etchings have withstood time for 1,000s of years). Let's put this in perspective, today's disk drives and SSDs are generally spec'd for 5 years. If I want to preserve my family's pictures for 710 years, I would have to ensure the data was migrated 142 times. Hmmm, I'm not sure if my kids and their kids and their kids are up for that.

It appears that CDs and DVDs may have a lifespan of around 50-200 years if you preserve them properly. That is getting pretty reasonable...of course, they haven't been around for 50 to 200 years so they are certainly not battle tested like carving on a good rock. The National Institute of Standards and Technology appears to be looking heavily into the longevity of optical recording media. DLT appears to have a shelf-life of around 30 years if preserved properly.

Let's say, hypothetically, that you solve the problem of the storage media (perhaps a self migrating technology in a box that guarantees infinite lifespan and that, itself, produces the new disks and technology to ensure fresh DVDs are always built). Now you have two additional challenges (at least):

  • Maintaining the integrity of the data (how do I ensure that the data that is NOW on the DVD is the original data)
  • Maintaining the ability for outsiders to inspect and recall the stored information

The first of these seems obvious, but is actually quite difficult. Checksums can be overcome with time (imagine compute power in 700 years!) and we can't guarantee that the keepers of the information will not have a vested interest in changing the contents of the information. We see governments attempting to re-write history all the time, don't we?
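
To make the checksum idea concrete, here is a minimal sketch using Python's standard hashlib (the sample text is just illustrative). A digest recorded at archive time exposes later alteration, but only if the digest itself is stored somewhere the keepers of the data cannot rewrite, and, as noted above, only for as long as the hash resists the compute power of the future:

```python
import hashlib

original = b"We hold these truths to be self-evident"
tampered = b"We hold these Truths to be self-evident"  # one character changed

orig_digest = hashlib.sha256(original).hexdigest()
tamp_digest = hashlib.sha256(tampered).hexdigest()

# Even a single-character change produces a completely different digest,
# so comparing against the archive-time digest reveals the alteration.
print(orig_digest != tamp_digest)  # True
```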

Let's take a simpler example of what happens when "a" byte disappears. Recall Neil Armstrong's famous quote: "One small step for man ... ". Well, after a lot of CPU cycles and speculation and conspiracy theories, it turns out that we now believe that Neil Armstrong said: "One small step for a man...". It is a fundamentally different statement (though there is no less historical impact). This data is only 40 years old, but consider the angst in trying to prove whether or not "a" was a part of the quote. What happens when a government deliberately alters, say, the digital equivalent of an "original" 2nd Bill of Rights written in 2030?

One more thought for the day, since I really do have to work and if you have made it this far, it is my duty to you to free you of my ramblings.

We know for a fact that the English (dare I say...United States dialect) language is evolving. Even after 200 years there are phrases and semantics and constructs in the Declaration of Independence that require quite a bit of research for the common US citizen. Take the following paragraph:

He is at this time transporting large Armies of foreign Mercenaries to compleat the works of death, desolation and tyranny, already begun with circumstances of Cruelty & perfidy scarcely paralleled in the most barbarous ages, and totally unworthy the Head of a civilized nation.

There is the obvious use of the word perfidy, a word that has since all but disappeared from common speech in the United States.

Looking deeper at the paragraph we see evolution in spelling (compleat). There is also a fascinating use of capitalization throughout the Declaration of Independence. The study and usage of capitalization alone could be worth the creation of long research papers.

What does this tell us? The content and meaning of a work often lie with the context and times in which the work was created. How does one retain this context, language, and ability to read the content over 700 years? This is not a small problem at all. There are entire cultures lost, or in the process of being lost, as language and context are lost; consider the United States' own Anasazi culture as an example.

Computer dialects (protocols, standards, information models, etc.) are themselves subject to evolution and are even more fragile than spoken language. A change in capitalization in an XML model may break the ability of pre-existing programs to read and migrate information, resulting in lost information. And once you break a program from 200 years prior, how much expertise will still exist to maintain and fix that program?
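
The capitalization point is easy to demonstrate. XML element names are case-sensitive, so a reader written against one spelling silently finds nothing once the model changes (a toy sketch with Python's standard ElementTree; the element names are made up):

```python
import xml.etree.ElementTree as ET

# A hypothetical archival record whose schema capitalized "Title"
# in a later revision of the information model.
doc = ET.fromstring("<record><Title>Declaration</Title></record>")

# A reader written against the old lowercase spelling finds nothing --
# no error, just silently missing data.
print(doc.find("title"))        # None
print(doc.find("Title").text)   # Declaration
```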

Crazy things to think about. Personally, I believe we are in a fragile place in our history where we could lose decades of historical information as we transition between written works and digital works. As part of my night job I'm trying to get more involved in the Sun Preservation and Archiving Special Interest Group (PASIG) to learn more about what our customers are doing in this area. I'm also trying to reorganize my own home "infrastructure" to be more resilient for the long run to ensure that my family's history does not disappear with my computers.

There are significant challenges in the computer industry all over, but preservation of history is one that our children and our children's children will judge us with. USB thumb drives will come and go, but hopefully our generation's digital treasures will not go to the grave with us.

Friday Feb 15, 2008

Why one bit matters.

Sometimes I wonder why I'm in the field of storage. It's not glamorous. It's JBODs, RAID arrays, HBAs, expanders, spinning rust, and all of those things wrapped into enclosures with lots of fans humming. My background is varied: I wrote a file system for my Master's, I worked on one of the biggest Java Business Frameworks ever (the SanFrancisco Project at IBM), and I've danced between the application and infrastructure space more than once.

I often think about my "ideal" job; I've even pondered it here on my blog...and take note, the new Jack Johnson CD is very good and I am ripping it to 8-track real soon now. Personally, I love the field of digital preservation: XAM is a step in the right direction, and long-term digital archives are important to people-kind.

But still, this storage business, there is something to it.

I watched my friend get their eyes lasered to correct their vision this week. While I was watching, I was able to sit with one of the assistants and pepper her with questions; it is an astounding process. Basically, as I understand it, the doctors use scanners and computers to:

  • map the surface of each eye
  • analyze the surface to understand why the vision is incorrect
  • create several corrective treatments
  • the doctor looks at the corrective treatments and adds their wisdom to make the right decision (a lot goes into this, like the health of the patient, the age, their profession, whatever...)
  • the doctor may tweak the map of places that need adjustments
  • the updated map is loaded into the "laser"
  • the patient comes in, gets prepped, the doctor aims the laser and sets the program loose
  • the "laser" jumps around the eye zapping away
  • the doctor reassembles the eye
  • the patient goes home

Coolness. But then the geek in me took over; I asked what I could about the machine, backup generators, power, moving the data, mapping the eye, etc... But my head kept thinking about the storage and computer software.

What if a bit is wrong? What if the bits are stored away but due to some battery backup cache being down, it doesn't really get stored and the out of date map is actually in place? What if one tiny point "ages" and becomes rust and there is no checksumming to see it "rotted"? These are people's eyes, you know? Would you want to be the storage vendor that supplied storage that messed up someone's eye because you didn't get the signal / noise ratio on the cabling right?

I've been thinking a lot about digital photography lately as well. While it's not people's eyes, it is still an incredibly fragile process. In fact, many of the world's best photographers still do not use digital, and for very good reason. Even when you purchase photographs, you pay a premium price for pictures that have not gone through the digitization process.

Think about this, if a person takes a picture, the CCD (or whatever they are these days) takes the light and transfers it to a memory card. The memory card gets transferred to a laptop hard drive (in my case), a variety of backups are made and I move many of the pictures to SmugMug.

That's a lot of storage along the way. Now, let's say (God forbid), my house burns down. I get my pictures back from SmugMug and one of my pictures has a bit that rotted away.

Now, that is one tiny bit of imperfection to some people. To a professional, that picture is no longer an original. At that point, you have to decide to toss away your artistic integrity and photoshop the point to be like the ones near to it, or just toss the picture from your portfolio. Either way, the picture is never the same.

How would you like to be the one that sold the storage unit that allowed the bit to rot, or be stored incorrectly, or archived incorrectly, and destroyed that person's memory, that one perfect picture that was meant to be a keepsake forever?

Well, when you think about it, building storage units and management for those storage units is probably not as glamorous as owning the software or companies that specialize in photo archiving, or "lasering" people's eyes, or storing original recordings for artists, or archives of space travel. But those folks have to pick storage units from a company...and if you are the company they pick and you fulfill your moral responsibility to supply checksumming in your file systems, and well-tested storage that may occasionally be late to market to ensure that a memory is not lost or an eye doesn't get hurt...you know, that's pretty rewarding.

Cheers to all of my co-workers at Sun who believe storage is more than a spinning drive or a paycheck.

Sunday Dec 09, 2007

Metadata, CAS and the future of Digital Photos

I have my new Adobe Photoshop Elements 6.0 up in the background and I am currently working away at tagging all of my pictures. Get this, I have 8,477 digital photos since I started taking them around 1997 or so with my Kodak DC120.

Here is a picture from the Kodak, I was trying to figure out how well the camera processed different color bands.

I have a few scanned photos from before that and, in fact, I went on a scanning binge this Summer with old pictures my Mom had sent me...of course, I have lost those already and have to re-scan them.

My newer Canon Rebel XTi is very cool. I can't say enough about it. Finally a camera that I feel replaced my 35mm (Canon Rebel as well). I have a few lenses for it and am still practicing like crazy. My friend bought me a kick-butt lens for it, 67mm front opening and a fixed f/2.8 aperture. Here is one of my favorite "technical" pictures so far (not to say I don't have a growing library of pictures that I love too...more on that in a minute).

Back when I used my Kodak, vendors hadn't realized the importance of a non-proprietary format. The Kodak recorded images in a ".mix" format. These days it's hard to find software that recognizes those images; .mix has not stood the test of time. Generally, vendors have learned their lesson. Oh, wait...more advanced cameras let you take pictures in a RAW image format: an unprocessed, uncompressed image that is basically a digital negative. RAW formats generally suffer the same limitations as those early picture formats; you are often tied to the manufacturer's software to edit the RAW images, then convert them to something like a JPG.

Today was the first real sled day here in Colorado (at least in the Denver-metro area). I tromped out with my camera and kid in tow and snapped a few shots. When I say a few shots, I mean about 150. Here is one from the zoom lens.

Now, let's talk storage. I wouldn't even rate myself an "amateur photographer"; I am more like an "every few weekends" photographer. Still, I have racked up thousands of pictures, and the flood-gates have opened with my new camera. I have about 40GB of photos today and the challenge of maintaining them is growing. Here is my current set of challenges:

  • Backup

  • Search and Retrieve

  • Picture Format Compatibility

  • Printing

There are more challenges, of taking a good picture...but let's stick to the above. Reading a book like Photographing the Landscape: The Art of Seeing by John Fielder makes you realize what an art a camera really is.

So, here is how I am tackling my challenges with some technology tips for storage and folks like me...please feel free to let me know if you have additional tips, I can always use pointers.

Backup
Simply put, DVDs don't work. I just don't like fiddling around with everything. So here is my current home-grown process.

  • Download pictures to my laptop
  • Take a pass at processing the photos (initial edits, ratings, etc...)
  • Convert favorites from RAW to JPG (to share)
  • Copy all of the pictures to a second computer

Then, every couple of weeks I copy the photos again using the Areca Backup Software out on SourceForge.

Now, I haven't experimented with it yet, but Adobe Photoshop Elements 6.0 finally has an online backup facility built in. Trust me, we are all going here sooner or later. It's crazy having all of your memories on a hard drive. And, further, if you think you will have to leave your house in a hurry some day (just ask folks in San Diego about this), you better have them on your laptop or on one of the new external drives. Semi-professional and professional photographers even build their own storage networks with JBODs and systems as storage heads. The DAM Book: Digital Asset Management for Photographers is a great reference if you want to see an outline of putting together your own system for archiving.

Search and Retrieve
Search and retrieve is becoming my biggest headache. Thankfully, I was reading The DAM Book... as I started encountering this headache. Basically, your memory will fail and you will start losing track of your most treasured photos.

Here is the key: metadata, metadata, metadata.

Adobe Photoshop Elements has been worth every penny I spent. I rate the pictures and add tags to all of my pictures that will help me locate them in the future. I use top-level classes of People, Places, Events, Sports and Other. The Places are cool as there is built-in geo-tagging so that all of your pictures for a particular place can be found on a map (as long as you are Internet connected).

The DAM Book had additional guidance for tagging that is very useful (like how to effectively use the star ratings). Now, rather than searching around folders that I think were around the date that I took a picture, I simply click on the tags, put in date limits, and pick the quality of picture I'm looking for, and a nice spread of pictures is displayed.

The investment in tagging my old pictures has been heavy and I'm still not done, but metadata is the ONLY way to manage your pictures. In my opinion, the XAM initiative is going to be huge, it basically makes this type of metadata search and retrieve the "top-level" storage primitive. Today, my metadata is heavily tied to Adobe Photoshop and the application. With XAM, applications use standard APIs for querying objects rather than files. My prediction...XAM or something like it will be a part of every major operating system within 5 years, making the storage primitive that of a Content Addressable Storage (CAS) device. With CAS/XAM type APIs, applications should be able to share the metadata and content rather than having it tied to proprietary "sidecar" files. Further, a good XAM implementation should let you use the information as standard files if you are really tied to your file explorer.
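
The core idea of a content-addressable store with metadata as the primary retrieval path can be sketched in a few lines. This is a hypothetical toy, not the XAM API; the class, tag names, and data are all made up for illustration:

```python
import hashlib

class TinyCAS:
    """Toy content-addressable store: objects are addressed by a hash of
    their content, and metadata tags are the primary retrieval mechanism,
    rather than file names and folder paths."""

    def __init__(self):
        self.objects = {}   # content address -> bytes
        self.metadata = {}  # content address -> dict of tags

    def put(self, data, **tags):
        addr = hashlib.sha256(data).hexdigest()  # address derived from content
        self.objects[addr] = data
        self.metadata[addr] = tags
        return addr

    def query(self, **tags):
        """Return addresses whose metadata matches all of the given tags."""
        return [addr for addr, md in self.metadata.items()
                if all(md.get(k) == v for k, v in tags.items())]

store = TinyCAS()
store.put(b"...jpeg bytes...", place="Denver", event="sled day", rating=4)
store.put(b"...more bytes...", place="Washington DC", event="vacation", rating=5)

hits = store.query(place="Denver")
print(len(hits))  # 1
```

The point of the sketch: the application never names a file; it describes what it wants, and any application sharing the store can use the same tags.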

Picture Format Compatibility
My Canon's RAW format (.cr2) is thankfully compatible with Adobe Photoshop Elements. So I can edit my RAW files directly from the application. There is only one set of filters I'm missing which would be nice to have, but I'll survive.

But here's the thing, there is an open format from Adobe that is extremely interesting, it is the Digital Negative (DNG) format. It is sort of like a TIFF file, but open and with much more information packed in. The files end up about double the size of your original RAW file, but for a storage archive, DNG files make a lot of sense. The contents are still "RAW" but have been moved to a standard format that can withstand the lifetime of a digital camera (RAW files have no guarantee to live past the life of a particular camera or brand, thus giving you a potential headache when you switch cameras). DNG is not ubiquitous yet, so often times you cannot read the DNG file from software (other than Adobe's).

Still, my $$$ are on DNG to be ubiquitous in the coming years. I have not converted all of my files yet, but the day is coming...once I have them all tagged ;-)

Printing
Finally, printing. Let me give you a tip: use your photo printer sparingly, it is simply not worth it. When you add up the cost of the printer, ink, and the headaches, and combine it with the limited options of your home photo printer...just forget it. I can get $0.10 prints from my local grocery store now. I literally upload them, tell them to print, and I can go pick them up in an hour. It's amazing. The quality at the grocery store was "OK", not exceptional...but your local photography store allows you to do the same thing. I print at a photo store by the mall for some even better prints...they are about $0.20 a piece for a 4x6. Kodak's online site kicks butt too; those come in the mail in a couple of days and the quality is incredible.

But here's a tip I learned the other week. The image sensors on Canons (and other dSLRs) blow up at a different ratio than a pocket camera. The dSLRs print at an 8x12 ratio with no crop, whereas the pocket cameras have an 8x10 ratio. I have only found one site that does 8x12s, SmugMug. The pictures arrived after a couple of days professionally packaged and looking exceptional. The price was pretty good too...there is a fee to be a member, though, whereas Kodak is pretty much free if you order something once a year.

And that, my friends, is a blog post that is way too long but has been eating me up for a long time.


Friday Aug 03, 2007

Storage System Patterns Wiki Site

It looks like the wiki site went live, public and "beta", this morning! As a result, I can mention a little experiment I've been working on for deployment and implementation patterns for storage systems :-)

Over the past few months I've been brainstorming about how to capture the substantial community knowledge around implementing storage solutions that are centered around "off the shelf" systems coupled with "off the shelf" software. I will admit the obvious bias towards Sun Systems and using OpenSolaris as the Storage Operating System, but when a pattern is posted there is no reason you can't fork it and add other systems and software implementations.

The space is pretty sparse right now, but you can log in, go to the System Patterns Space, and start editing it, linking things, helping to organize, adding and altering the pattern I put up already (more are in the works), etc...

My plan is to "moderate" the space and hopefully guide it into usefulness, keep the space full of "positive" karma, and add patterns as my time permits me to experiment in my architecture role (this isn't my full time job).

The first pattern I posted is a reformat of the two part blog posting on ZFS Snapshot and Amazon S3 (Part 1 and Part 2) into a single editable File System Backup to Storage Utility Storage System Pattern.

I will post my partial pattern on zones and MediaWiki over the weekend. I haven't had a lot of time to do the software install in my zones, so any help would be appreciated.

I do have goals for the space, but since it's my intent to make this a community site, I'm not particularly interested in my goals if it is morphed into something more useful. Here is my original set of goals:

  • Create a set of architectural blueprints, recipes and implementations for interesting and non-intuitive storage solutions that start with servers as the primary building block
  • Anyone can plug in different implementation or morph the patterns to be more useful
  • Let individuals post requests for solutions to a particular problem that others may have seen or are interested in solving with the community
  • Try to maintain the space as "problem solution" centric rather than technology-centric
  • Make sure links are provided to patterns on the Internet
  • Ensure folks using the patterns are encouraged to put notes into the pattern or keep content fresh if things get out of date

Again, I'll be blunt, I love the idea of this being Sun-centric but I have to be honest, I know many of you have solutions that don't include Sun gear, so it would be naive of me to believe a wiki-space would be useful to you if there was some "Sun-gear Only" clause ;-) But the power of the Wiki is that when you post a pattern and recipe that uses non-Sun gear, the community can sit and ponder the solution you present and fork a section with a bill-of-goods for Sun Gear :-) heh heh (if it exists, of course).

So, chip-in if you have the time or interest, send me ideas if you don't want to edit the space directly, enjoy the openness! You can also ignore it and it will probably quietly go away some day into the lava field of interesting ideas that got stale and regurgitated as something else ;-)

Thursday Jun 07, 2007

ZFS Snapshot and Amazon S3 (Part 1 of 2)

There are a few things on my mind these days: ZFS, Amazon S3, Project Blackbox, Why the "C" on my keyboard is sticking so much, OpenSolaris, Network Attached Storage, Storage System Blueprints, Communities, and on and on. I finally decided to sit and think about the economics of utility storage vs. rolling your own as well as whether I could create a very simple BluePrint for backing up and restoring ZFS to a storage utility, like Amazon S3. Here's a hint...ZFS to Amazon S3 is insanely simple.

Needless to say I incurred my own credit card bills to bring this to you since I figured the cost of backing up some of my pictures was going to be less than filling out an expense report :-) SO, since I burned the money, you are forced to sit through a 2 part blog post that first looks at VERY HIGH LEVEL economics of storage utilities and is followed by the "recipe" for implementing a simple ZFS snapshot to Amazon S3 backup and restore process.

Before I got too involved with the whole implementation, I needed to see what the whole process would cost me, cost is king in a start-up. This is a nowhere-near completely scientific breakdown, but it is based on how I've seen some friends do this type of thing, so it is decent. I would really suggest you do your own up front planning and roll your own costs into the mix and do comparisons to colocation services along with your own server or even some of the ISPs around that give you liberal storage quotas.

Here are the simple requirements:

  • Off-site backup location
  • 40 GB/Month Backup/Archive, maxing out and staying even at 400 GB after 10 months (continuing to upload 40 GB/Month but deleting snapshots prior to that)
  • 2 times a year download of 40GB for restore

Notice I have not discussed access rates or access latency. My assumption is that "Internet time" for access rates and latency is sufficient. The choice of 40GB a month is "relatively" arbitrary, so please, please do your own pricing (using Sun Servers, of course, if you are doing a comparison).

  • Amazon S3 - Pricing is located on the site and is based on bandwidth plus storage.
  • The total price for new storage + bandwidth for my first month's requirements is about $10
  • The raw storage cost increases each month until I hit 400GB ($6, $12, ... , $60 and remaining at $60/month afterwards).
  • I planned two downloads per year of 40GB which will cost me $7.20 each, $14.40 per year
  • The TOTAL cost for my 2 years of 400GB of storage should be about $1294.80.

At this point, I will give you a very important disclaimer. The cost seems relatively reasonable for a worry-free 400GB, but using some other trajectories, you can exceed the costs of your own server and hosting. The cross-over point for my overly simple world seems to be around a year when you are using terabyte economics. Colocation of a Sun Fire X4500 can be extremely competitive or better if you are looking at backups hitting the 24TB range. For the SmugMug architecture and storage requirements (well over the TB range), Don MacAskill does see great cost savings with Amazon S3.

Consider the simple modification of assuming that the storage is capped at 400GB but that 400GB is replaced each month, increasing the bandwidth consumption. This brings the entire cost for the 24-month span up to $2428.80; add another 10x for 4TB per month and we start talking real money (for me, real money starts at anything over $30). But raw storage and bandwidth are not the only concerns; what about the system administration that goes into rolling your own server and maintaining it? That costs money and time!
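
The two totals above fall out of a tiny model. This sketch uses the 2007-era rates implied by the figures in the text (assumed: $0.15/GB-month storage, $0.10/GB upload, $0.18/GB download; two 40GB restores per year):

```python
# Per-GB rates back-derived from the dollar figures in the post (assumptions).
STORAGE, UPLOAD, DOWNLOAD = 0.15, 0.10, 0.18

def two_year_cost(monthly_upload_gb, stored_gb_by_month, download_gb_per_year=80):
    """Total 2-year cost: monthly storage charges on the amount held,
    upload bandwidth each month, and two 40GB restores per year."""
    storage = sum(STORAGE * gb for gb in stored_gb_by_month)
    upload = UPLOAD * monthly_upload_gb * len(stored_gb_by_month)
    download = DOWNLOAD * download_gb_per_year * 2
    return round(storage + upload + download, 2)

# Scenario 1: 40 GB/month, ramping to a steady 400 GB after 10 months.
ramp = [min(m, 10) * 40 for m in range(1, 25)]
print(two_year_cost(40, ramp))   # 1294.8

# Scenario 2: a full 400 GB replaced every month from day one.
flat = [400] * 24
print(two_year_cost(400, flat))  # 2428.8
```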

You really need to ponder these types of decisions long and hard. I am a HUGE advocate of storage utilities for the short-handed, but I am also a huge advocate of rolling your own to get your own economies of scale (did I mention that Sun provides some great solutions?). Every company is unique and has its own requirements.

There are many, many factors to consider for your own business. There are CLEAR advantages to storage utilities like Amazon S3, here are a few:

  • Pay for what you use
  • No system administration on your part
  • Single throat to choke when something is down
  • You reap the benefits of their efficiencies of scale

There are disadvantages too

  • You are somewhat locked into your storage utility
  • Misjudging your requirements can result in high cost to you
  • You cannot create better efficiencies of scale as you reach high amounts of data capacity (lack of hierarchical pricing in Amazon S3 based on aging is a bummer)

Still, Amazon S3 is clearly useful for startups and even for your own personal backups. SmugMug uses it liberally, as do a variety of other startups. We shouldn't rule out the other storage utilities either; there are quite a few to choose from these days.

I have about 5GB of pictures that I need to keep backed up. That backup would cost me mere dollars a month, tens of dollars for a year...that is pretty reasonable when you consider the pictures are irreplaceable and my house could be hit by a power surge at any moment.

Part 2 will be posted soon. I used the ZFS snapshot capability and a simple Java program to move data back and forth to the Storage Utility! With ZFS, integrating with Amazon S3 is a piece of cake.
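The Java mover itself will come in Part 2, but the core of any early S3 client is signing each REST request. Here is a sketch, in Python, of S3's original request-signing scheme (HMAC-SHA1 over a canonical string, Base64-encoded into an Authorization header); the bucket, object name, and credentials are invented for illustration:

```python
import base64
import hashlib
import hmac
from email.utils import formatdate

def s3_auth_header(access_key, secret_key, verb, bucket, key,
                   content_type="", content_md5="", date=None):
    """Build Date/Authorization headers for S3's original auth scheme."""
    date = date or formatdate(usegmt=True)
    # Canonical string: verb, MD5, type, date, and the resource path.
    string_to_sign = "\n".join([verb, content_md5, content_type, date,
                                f"/{bucket}/{key}"])
    sig = base64.b64encode(
        hmac.new(secret_key.encode(), string_to_sign.encode(),
                 hashlib.sha1).digest()).decode()
    return {"Date": date, "Authorization": f"AWS {access_key}:{sig}"}

# Hypothetical credentials and a hypothetical ZFS snapshot stream as the object:
hdrs = s3_auth_header("AKEXAMPLE", "secret", "PUT",
                      "my-backups", "tank-snap-20070525.zfs",
                      content_type="application/octet-stream",
                      date="Fri, 25 May 2007 12:00:00 GMT")
```

Pipe `zfs send` output into an HTTP PUT carrying those headers and you have the essence of the backup path.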

Friday May 25, 2007

Storage Remote that...

One of my many projects is to tackle the product-side architecture for Remote Monitoring of our storage systems. Remote Monitoring is a fascinating problem to solve for many, many reasons:

  • There are different ways to break the problem up, each being pursued with almost religious fanaticism, but each having its place depending on the customer's needs
  • It is a cross-organizational solution (at least within Sun)
  • It has a classic separation of responsibilities in its architecture
  • It solves real problems for customers and for our own company
  • It is conceptually simple, yet extremely difficult to get right

The problem at hand was to create a built-in remote monitoring solution for our midrange storage systems. Our NAS Product Family and anything being managed by our Common Array Manager was a good start. Our CAM software alone covers our Sun StorageTek 6130, 6140, 6540, 2530, and 2540 Arrays. Our high-end storage already has a level of remote monitoring and we already have a solution to do remote monitoring of "groups" of systems via a service appliance, so our solution was targeted directly at monitoring individual systems with a built in solution.

This remote monitoring solution is focused on providing you with a valuable service: "Auto Service Request" (ASR). The Remote Monitoring Web Site has a great definition of ASR: it uses fault telemetry to automatically initiate a service request and begin the problem resolution process as soon as a problem occurs. This focus lets us trim the information sent to Sun down to faults, and it also tightens up the service pipeline to get you what you need in a timely manner.

For example, if a serious fault occurs in your system (one that would typically involve Sun Services), we will have a case generated for you within a few minutes...typically less than 15.

The information flow with the "built in" Remote Monitoring is only towards Sun Microsystems (we heard you with security!). If you, the customer, want to work with us remotely to resolve the problem, a second solution known as Shared Shell is in place. With this solution, we work cooperatively with you so that you can collaborate with us to resolve problems.

Remember though, I'm an engineer, so let's get back to the problem...building Remote Monitoring.

The solution is a classic separation of concerns. Here are the major architectural components:

  • HTTPS protocol for connectivity
  • Security (user authentication and non-repudiation) via Public / Private Key Pairs
  • Information Producer (the product installed at the customer site)
  • Information Consumer (the service information processor that turns events into cases)
  • Routing Infrastructure

The REST-XML API gives us a common information model that abstracts away implementation details yet gives all of the organizations involved in information production and consumption a common language. The relatively tight XML Schema also gives the product an easily testable output without having to actually deliver telemetry in the early stages of implementation. Further, the back end can easily mock up messages to test its implementation without a product being involved. Early in the implementation we cranked out a set of messages that were common to some of the arrays and sent them to the programmers on the back end; the teams then worked independently on their implementations. When we brought the teams back together, things went off without much of a hiccup, though we did find places where the XML Schema was too tight or too loose for one of the parties, so you do still have to talk. The format also helps us bring teams on board quickly...give them an XSD and tell them to come back later.

Here is an example of a message (real data removed...). Keep in mind there are multiple layers of security to protect this information from prying eyes. We've kept the data to a minimum, just the data we need to help us determine if a case needs to be created and what parts we probably need to ship out:

<?xml version="1.0" encoding="UTF-8"?>
<message xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="message.xsd">
  <message-time timezone="America/Denver">2005-11-22T12:10:11</message-time>
  <product-name>Sun StorageTek 6130</product-name>
  <event-time timezone="America/Denver">2005-11-22T12:10:11</event-time>
  <summary>Critical: Controller 0 write-cache is disabled</summary>
  <description>Ctlr 0 Battery Pack Low Power</description>
</message>

Use of XML lets us be very strict with the tags and enforce particular values, like severity, across the product lines.
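A consumer can check a message like the one above for its required elements with a few lines of stdlib code. A simplified sketch (the element names follow the example message; the real back end validates against the full XSD):

```python
import xml.etree.ElementTree as ET

# Elements every telemetry message must carry, per the example above.
REQUIRED = ["message-time", "product-name", "event-time", "summary", "description"]

def parse_telemetry(xml_text):
    """Parse one telemetry message, verifying the required elements exist."""
    root = ET.fromstring(xml_text)
    missing = [tag for tag in REQUIRED if root.find(tag) is None]
    if missing:
        raise ValueError(f"message missing required elements: {missing}")
    return {tag: root.find(tag).text for tag in REQUIRED}

sample = """<message>
  <message-time timezone="America/Denver">2005-11-22T12:10:11</message-time>
  <product-name>Sun StorageTek 6130</product-name>
  <event-time timezone="America/Denver">2005-11-22T12:10:11</event-time>
  <summary>Critical: Controller 0 write-cache is disabled</summary>
  <description>Ctlr 0 Battery Pack Low Power</description>
</message>"""

event = parse_telemetry(sample)
```

This is exactly the kind of mock-message round trip that let the product and back-end teams work independently before real telemetry flowed.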

The format above is heavily influenced by our Fault Management Architecture, though an FMA implementation is not required.

What we've found is that good diagnostics on a device (and FMA helps with this) yield a quick assembly of the information we need and fewer events that do not translate directly into cases. FMA and "self healing" provide an exceptional foundation for remote monitoring with a heavy reduction in "noise".

The rest of the architecture (the services that produce, consume, secure, and transport the information) is handed off to the implementors! The product figures out how to do diagnostics and output the XML via HTTPS to services at Sun Microsystems. Another team deploys services in the data center for security and registration (there are additional XML formats, authentication capabilities, and POST headers for this part of the workflow). Another team deploys a service to receive the telemetry, check its signature for non-repudiation purposes, process it, filter it, and create a case.

There are additional steps that each product needs to go through, such as communicating across organizations the actual message-ids that a device can send and what should happen if that message-id is received.
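That cross-organizational agreement boils down to a routing table: for each message-id a device can emit, the teams agree on what should happen when it arrives. A toy sketch (the ids and dispositions here are invented for illustration):

```python
# Hypothetical message-id dispositions agreed between a product team
# and the back-end case-creation team.
DISPOSITIONS = {
    "6130.CTLR.BATTERY_LOW":  "create-case",   # parts likely need to ship
    "6130.CTLR.CACHE_SYNCED": "ignore",        # informational only
    "6130.DISK.PREDICTIVE":   "create-case",
}

def disposition(message_id):
    """Return the agreed action for a message-id; unknown ids go to triage."""
    return DISPOSITIONS.get(message_id, "triage")
```

The default-to-triage case matters: a device shipping a message-id the back end has never heard of is itself a sign the organizations need to talk.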

In the end, the centerpiece of the architecture is the information and the language that all teams communicate with. Isn't this the case with any good architecture? Choose the interfaces and the implementations will follow.

Keep in mind, this remote monitoring solution is secure end to end. Further, remote monitoring is only one piece of the broader services portfolio...I'm just particularly excited about this since I was privileged to have worked with a great, cross-organizational team to get it done! The team included Mike Monahan (who KICKS BUTT), Wayne Seltzer, Bill Masters, Todd Sherman, Mark Vetter, Jim Kremer, Pat Ryan, and many others (I hope I didn't forget anyone). There are also lots of folks who were pivotal in getting this done whom we lost along the way (Kathy MacDougall, I hope you are doing well, as well as Mike Harding!).

This post has been a long time in coming! Enjoy!

Thursday May 17, 2007

OpenSolaris = Storage Operating System

By now, you've hopefully heard about the OpenSolaris Storage Community. I am lurking in the community for the time being as I try to take care of a few jobs here. Hopefully I'll be more active in the coming weeks and months.

During my day job, I do spend quite a bit of time thinking about this and working to apply Solaris to practical storage problems. Often, the question of "What IS a Storage Operating System" comes up. A storage operating system, to me, is simply an operating system that either is, or can be, entirely focused on delivering, consuming, and/or transforming storage from one representation to another (such as block to file, or file to objects, etc...). Most general purpose operating systems can profess to do all of these things in one form or another so, in many ways, all operating systems are storage operating systems.

Still, there is something unique about OpenSolaris that makes it even "more" of a storage operating system than the others. It's not ZFS alone, though ZFS is interesting on many accounts. ZFS is truly a paradigm shift that every single company should be looking at for its availability characteristics, built-in checksums, built-in volume manager, etc...

When I look at the image showing the open source stack that relates to storage, I realize that these are tried and true, enterprise class storage technologies that are either in the open source community or being released to the open source community in the near future.

One of my favorite projects that was just added to the open source community is the Honeycomb Project. A true Content Addressable Storage API in the open source community should help advance CAS in the mainstream quicker than all of the slideshows about CAS on the planet. I have played with the Honeycomb API for about a year and, as a web application programmer (on occasion), this thing really gets me excited. A CAS API sits much closer to the application tier than a filesystem API does. CAS also has the potential to disintermediate the database from unstructured data, like photo sharing sites and such. With standards, such as XAM, in the works, perhaps we are on a 2-3 year curve, but filesystems should be watching their backs.
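The core idea that makes a CAS API feel so different from a filesystem fits in a few lines: the store hands back an address derived from the content itself, so identical content gets one address and deduplicates for free. A toy in-memory sketch (the real Honeycomb API is far richer, with metadata and queries; this is just the addressing concept):

```python
import hashlib

class ContentStore:
    """Toy content-addressable store: the content's hash IS its address."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        # The object id is computed from the bytes, not chosen by the caller.
        oid = hashlib.sha256(data).hexdigest()
        self._objects[oid] = data  # storing identical content is a no-op
        return oid

    def get(self, oid: str) -> bytes:
        return self._objects[oid]

store = ContentStore()
oid = store.put(b"family photo, 2007")
```

Notice there is no namespace to manage: no paths, no rename, no collisions. That is why CAS can sit so close to the application tier.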

I can honestly say, I don't think any other operating system community has a CAS API sitting in it. Further, what makes the OpenSolaris Storage Operating System so unique is its coherency. All of the projects are under one umbrella for now, making it simple. I like simple.

Now one could argue that OpenSolaris is too BIG to be a Storage Operating System, to which you have to answer: "then strip it down". OpenSolaris is modular enough that you don't have to take the whole thing. I was working with Belenix this week. Distributions can certainly tune OpenSolaris all they want to focus the functionality and capabilities on just what is necessary. Need your Storage Operating System to attach to a SAN and serve NFS out the top end? How about using the NFS Server, UFS, SVM, and a few of our fibre-channel drivers. No need to include iSNS or CIFS. Need your Storage Operating System to serve NFS over about 24 Terabytes of disk? How about NFS over ZFS? Piece of cake.

OpenSolaris as a Storage Operating System is going to open new doors in the storage world. A fully functioning, living, breathing, market penetrating, OPEN SOURCE operating system focused on consuming, delivering, and transforming storage. Couple the OpenSolaris storage capabilities with a particularly dynamic hardware system, perhaps focused on optimizing storage delivery itself, and you have the makings of a storage revolution.

Friday Jan 12, 2007

Wired: One Giant Screwup for Mankind

Several weeks ago I blogged about data loss and taking the long view when it comes to data retention. This month's Wired magazine has an article entitled One Giant Screwup for Mankind that illustrates the need for taking a long view of data retention policies. It also brings up an interesting point about our current trend at digitizing and chopping our digital content up into lossy data compression formats (like 128kbps MP3s).

Apparently, the grainy images of the original moon landing that we see on TV ("one small step for [a] man...") are not the original images and sound! The engineers were forced to create a smaller format for transmission from the moon to earth: 320 scan lines at 10 frames per second, transmitted at 500 kHz. This stream was received at 3 tracking stations, pushed to a central location, recorded on media, and converted to the broadcast rate of 525 scan lines at 30 frames per second, transmitted at 4.5 MHz. That is essentially 3 transmissions (camera to tracking station, tracking station to central site, central site to TV) and 2 conversions (camera to moon/earth broadcast, moon/earth broadcast to TV). Between the reception of the data and the conversion to the TV format, the quality was greatly reduced! The engineers noticed that the broadcast images were not as crisp as they SHOULD have been in the original format. In fact, they could verify this with pictures of the monitors in the conversion room. So the engineers tried to find the tape the original data was recorded on so they could recover the full quality images.

Gone, lost, disappeared.

Just as I mentioned previously though, the engineers had TWO problems to work on:
- Getting and retaining the equipment they could use to recover the original data (remember, we have gone through multiple media formats since the 60s)
- Locating the original tapes used for recording the data stream prior to conversion to the television signal format

I won't tell you how it's going; you have to read Wired to find out. But this does bring up an excellent example of
- Why a company that has record retention requirements of over 7 years must put in place a comprehensive policy to not only record the information and store it, but also retain the equipment that can read that data and write it to a new format. Some companies, instead of storing the components to read/write data, will enact a policy to migrate the data to the current media format every 7 years or less.
- Why a company should consider the effect on history of losing their data if their retention policy is less than 7 years or not explicitly stated. For example, is there a retention policy at our record companies for all of the garage band tape recordings they've received? If there isn't, how are we going to retain this valuable piece of American History and Culture? The record companies have a historical responsibility to record and maintain these.

More interesting to me for this blog post are the problems with the data conversion process itself. Recall that I'm a big vinyl fan at this point. Vinyl and analog recordings provide a warm, continuous signal, whereas digital chops that up into many slices. Further, when compressing information for our MP3s we actually lose data. Depending on the bitrate you use, the data loss can be very noticeable in certain types of music.

Many download services also do not provide lossless downloads.

In the coming year we will see 1 Terabyte desktop drives. I am convinced that we will start seeing more pervasive use of lossless compression. Still, it raises the question: will our original data remain intact? Are we losing important historical data and content quality through the conversion to digital, and then through lossy compression, because we feel the quality is "good enough"? I have every reason to believe that as we start merging technology with our bodies and brains, our senses will become more and more aware of the lossy compression techniques used in the late 90's and early 2000's. Even without computer enhancement, our brains are adapting to the saturation of media and information in a way that would astound previous generations.

The only question to our kids who will have the heightened senses through the merging of technology with our human anatomy will be "How much quality did my parents compromise and lose for the sake of their convenience...and how much of it will we be able to recover to enjoy their creativity to its fullest potential?". So, be sure to save those original recordings...especially if you are the owner of the Beatles recordings.

btw, does anyone REALLY agree with releasing a Beatles album that does not adhere to the group's original music scores but is instead a mashup? Should content created by a team of people in a specific way be rebuilt to fulfill someone else's vision? What if a future generation actually thinks these songs were originally mashed up; are we changing history? I'm fine with mashups, especially for content that is INTENDED to be mashed up, but I believe we should be very careful about taking original content and mashing it into something not intended by the author (though I do like the version of the Elvis tune at the beginning of the NBC show Las Vegas :-)

- Gotta run!

Wednesday Nov 01, 2006

Architecting Manageability

Systems that people build are inherently manageable; device builders always provide at least a simple UI or a set of CLI commands. Unfortunately, the question of "how manageable" is rarely asked except through requests for functionality from customers. I figured I would brainstorm some facets of manageability and what it means to me, as I've approached it from the bottom up (building a NAS-like appliance) as well as the top down (attempting to build a higher-level management application that devices can plug into).

Consider the following management components of a device or application:

  • Management User Interface - Does it manage one instance of a device, or many? Can new functionality be plugged into it without a forklift upgrade, or is it a monolithic software package? Is there centralized access management for all of the devices in the network? What about for all of the management user interfaces (if you require more than one); can you configure security for the network, or do you have to deal with security for each installation?
  • Management API - Is there one available for the device or application? What aspects of the device or application can be accessed for management? Here are some categories: health, capacity, security. Is it a read-only management API, or can characteristics of the device or application be modified?
  • Standard Protocol for the API - Is a standard protocol like XML over HTTPS (REST) or SOAP used, so people who want to use the API don't need your protocol libraries? How high up does your standard protocol go: to XML, or to a special comma-delimited format over XML that you use? If you are a device vendor, do you use SNMP to deliver traps so your events can be integrated with most management tools?
  • Standard Management Model - Can a vendor well-versed in standard management models for your domain interact with your device or software at a "basic" level without even picking up an API document? Can a tool that leverages standard protocols and models for your domain "automatically" manage your device or software? To access the "value added" functionality in your device or software, does someone have to learn an entirely new "theory of operations", or can they use a basic theory that is common in a standard (such as the "Recipes" and "Models" used in SNIA's SMI-S)?

I split the categories of manageability up like the above for a specific reason. The "resistance" to each of those categories is substantial in engineers and in architects. Whereas one engineer or architect may easily recognize the need for a management API, only a select few recognize the value of a standard protocol. Even fewer recognize the value of the standard management model.

A second reason for the category separation above is the barrier to entry. An engineer or architect can "create" more and spend less time learning and adopting the lower they stay on the manageability scale. It is fundamentally easier to create a user interface (the first time) if I am not negotiating a (perceived to be) bloated standards-based management model. Unfortunately, this "ease" of creation and unwillingness to research and use off-the-shelf standards is only an illusion. As a device or software component matures and becomes woven into a management ecosystem, anything that was originally done in a proprietary way slowly evolves to carry the characteristics identified above (if it's going to be truly successful, especially in enterprise datacenters).

I've blogged about the benefits of adopting standards early in development before.

In smaller businesses, those that are trying to run a start up, it may make sense to gradually move up the scale of manageability. This is especially the case if you are part of a start-up trying to navigate the waters of small business where there may be no large, deployed management ecosystem.

For larger companies that build applications and devices, having a high level of manageability for their devices shows a maturity in understanding that their devices will have to play well with others. This maturity often implies a heavy amount of work on "non datapath" engineering efforts: User Interface maturity, planned event handling and codes fired off through SNMP traps, a well-planned management API so the device or software can be plugged into management applications, an accessible CLI so those crazy script-loving (IT'S A JOKE) system admins can integrate the device and software with their existing "simplifying" scripts; the list goes on.

Increasingly we should be looking at how to ensure devices and software play well with a single access management solution. Having a proliferation of NAS devices, each with its own user IDs and passwords, can start to get unwieldy. If each software element and each hardware element in a datacenter requires separate management of credentials, separate roles, separate concepts of access rights, something is broken. There are different people managing different things in a datacenter, and they have different rights to what they manage or look at. But why should security have to be configured for every single device and software element? Access management can help create a more secure environment (through simplified management). Treat it as part of architecting manageability as well. Access management can be applied to agent tiers, management APIs, and user interfaces, as well as to the datapath.

Gotta run!

Monday Oct 02, 2006

Vinyl, Digital Content, Data Loss, The Long View

I implemented a "content recovery" project in my house on Sunday. It reminded me of a talk I saw at JavaOne 1999 on taking the long view, as well as a customer problem you hear about with lost data due to loss of devices that can read the data!



