Wednesday Mar 10, 2010

Sun Storage 7000 Hardware Provider for Microsoft VSS

The 2010.Q1 release for Sun Storage 7000 Appliances brings a wide variety of new features to the appliance family, including more ways to integrate the appliances directly into storage ecosystems. In addition to the Sun Storage 7000 Management Plug-in for Oracle Enterprise Manager 10g Grid Controller 1.0, 2010.Q1 brings integration with the Microsoft Volume Shadow Copy Service (VSS) through the Sun Storage 7000 Provider for VSS Software 1.0.

For those who have a Microsoft Windows operating environment but are not familiar with Microsoft VSS: it is basically a framework that facilitates creating fast and consistent snapshots of volumes by coordinating the applications doing backup-type operations (readers, known in VSS terms as requestors) with the applications that are writing to volumes (writers). In layman's terms, this facility built into the Microsoft Windows platforms gives an application like Symantec NetBackup Advanced Client the ability to take a snapshot of a volume that is being actively used by an application like Oracle Database or Microsoft Exchange. The requestor notifies the framework that it would like to take a snapshot. The framework notifies the writers that they need to coalesce. Once the writers complete their coalescing, the framework tells the requestor to go, and when the snapshot is complete, the framework releases the writers to continue writing.
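To make that sequence concrete, here is a minimal conceptual sketch in Python. It is purely illustrative; the class and method names are mine, not the actual VSS COM API:

#!/usr/bin/python
# Conceptual sketch of the VSS coordination sequence described above.
# Illustrative only -- this is NOT the real VSS API.

class Writer:
    """An application actively writing to the volume (e.g., a database)."""
    def freeze(self):
        print("writer: flushing state and holding new writes")
    def thaw(self):
        print("writer: resuming writes")

class Framework:
    """Stands in for the VSS framework coordinating the parties."""
    def __init__(self, writers):
        self.writers = writers
    def snapshot(self, volume):
        for w in self.writers:       # 1. writers coalesce their state
            w.freeze()
        print("framework: fast snapshot of %s taken" % volume)
        for w in self.writers:       # 3. writers continue writing
            w.thaw()

# 2. the requestor (a backup application) asks the framework for the snapshot
Framework([Writer()]).snapshot("E:")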

The whole process of coalescing and taking a snapshot of a volume typically takes only a few seconds (longer, depending on how complex the coalesce and snapshot operations are).

For efficiency, storage appliances (like the Sun Storage 7000 Appliances) supply a "Hardware Provider", also known as a "VSS Block Target Provider". The Hardware Provider takes over snapshot and clone operations based on the GUID of a SCSI LUN (for Release 1.0 this is iSCSI only; Fibre Channel is not supported).

Installation of the Sun Storage 7000 Provider for VSS is through an InstallShield package downloaded from the Oracle Corp. Download Site (the site still contains the Sun domain ID, if you notice those types of things). The installation process should be straightforward and results in a new folder accessible from the "Start" menu (Start -> All Programs -> Sun Microsystems, Inc. -> Sun Storage 7000 Provider for VSS). In this folder are two entries: a README.txt (required reading ... seriously) and the Sun Storage 7000 Provider for VSS ConfigUtil. The latter (the ConfigUtil) is also placed on the desktop for quick access.

To verify the hardware provider is registered with the Microsoft VSS Framework, open a command prompt and type "vssadmin list providers". This prints the registered providers, and the list should include an entry for the Sun Storage 7000 Provider for VSS. The version number you will see in an original installation is 1.0.10.
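If you want to script that verification (say, as part of a post-install check), a minimal sketch along these lines should work; everything beyond the vssadmin command and the provider name above is an assumption, and it needs to run in an elevated prompt on the Windows host where the provider is installed:

#!/usr/bin/python
# Sketch: confirm the provider is registered with VSS. Illustrative only.
import subprocess

output = subprocess.check_output(["vssadmin", "list", "providers"])
if b"Sun Storage 7000" in output:
    print("Sun Storage 7000 Provider for VSS is registered")
else:
    print("provider not found -- check the installation")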

This means that a backup application, such as Symantec NetBackup Advanced Client, can set up policies that leverage the hardware provider for fast snapshots. There is one more setup operation that must be completed before such a policy will succeed. The Sun Storage 7000 Provider for VSS uses a combination of "Management Path" and "Data Path" operations to achieve snapshots and clones of iSCSI volumes.

The "Management Path" credentials to any Sun Storage 7000 Appliances supplying iSCSI LUNs to the Microsoft Windows client must be entered through the "Sun Storage 7000 Provider for VSS ConfigUtil". From the Microsoft Windows platform that is consuming the iSCSI LUNs and has the Hardware Provider installed, open the ConfigUtil (form the Start Menu or the Desktop). Use the DNS names of Sun Storage 7000 Appliances or IP Addresses (I prefer IP Addresses), enter each Sun Storage 7000 Appliance Management Path along with User ID and Passwords given to you by the storage administrator for each appliance. Tap the shoulder of the Storage Administrator and remind them you could use a non-root User ID as guided by the README.txt in the download. Most access rights can be removed from the User ID though they do need to have a role that facilitates snapshot creation and clone creation for the shares that you access. Use the template in the README.txt for a starting point.

This screen capture shows the entry of credentials:

Once the credentials are entered properly, access from applications is seamless.

Look for specific documentation on using Hardware Providers with the specific applications that you use on the Microsoft Windows platform. Because VSS is a framework, you may have products that utilize the framework that we did not specifically test in our labs. The README.txt in the download contains a list of applications that we have run using the provider.

Conclusion


The Sun Storage 7000 Provider for VSS Software 1.0 is a Hardware Provider that plugs into the Microsoft VSS Framework on Microsoft Windows 2003 and 2008, in 64-bit or 32-bit variations. Using the provider, backup applications and other requestors can take snapshots and clones directly on Sun Storage 7000 Appliances for iSCSI LUNs consumed by the Windows system the Hardware Provider is installed on. Installation of the provider is quick, and you should verify that it was registered with the Microsoft VSS Framework. You must then enter User ID and Password information for each target Sun Storage 7000 Appliance. No further intervention with the Microsoft VSS Framework is necessary from that point forward; the primary work remaining is to configure your backup applications to make use of the Hardware Provider through backup policies.


Monitoring the Sun Storage 7000 Appliance from Oracle Grid Control

Over the past few months I've blogged on various monitoring and alerting topics for Sun Storage 7000 Appliances. Besides my favorite of the blogs (Tweeting your Sun Storage 7000 Appliance Alerts), the culmination of this monitoring work is now available as the Sun Storage 7000 Management Plug-in for Oracle Enterprise Manager 10g Grid Controller 1.0, for use with the just shipped 2010.Q1 software release for the Sun Storage 7000 Appliance Family. Phew, that's a bit of a mouthful for a title but I'll just refer to it as the SS7000MPfOEMGC, does that help? Well, maybe not ;-)

Sun Storage 7000 Management Plug-in for Oracle Enterprise Manager 10g Grid Controller creates a coupling between the enterprise-wide monitoring provided by Oracle Grid Control and the monitoring and analytics provided by Sun Storage 7000 Appliances. If you are not familiar with Oracle Grid Control, there is a nice write-up within the Installation and Configuration Guide for Oracle Grid Control. In a nutshell, Oracle Grid Control aids in monitoring your vertical data center rather than simply being an aggregation of horizontal health information. The documentation describes it as software and the infrastructure it runs on but I would simply call it a "Vertical Data Center Monitoring Environment".

The goal of the plug-in to Oracle Grid Control is to facilitate a Database Administrator in their use of Sun Storage 7000 Appliances without attempting to reproduce the world-class analytics available within Sun Storage 7000 Appliances. In other words, the goal is to create a bridge between the world of Database Administration and the world of Storage Administration with just enough information so the two worlds can have dialog about the environment. Specifically, the Plug-in for Sun Storage 7000 Appliances is targeted at the following tasks:


  • Connecting Database deployments with Sun Storage 7000 resources that provide storage infrastructure to the database
  • Understanding the performance metrics of a Database from the perspective of the Appliance (which cache resources are being used for the database, which network resources are in use and what performance is being delivered, and how various storage abstractions are being used by the database)
  • Providing a Federated view of Sun Storage 7000 Appliances deployed in the environment (including storage profiles and capacities, network information and general accounting information about the appliances)
  • Providing detailed performance metrics for use in initial service delivery diagnostics (these metrics are used to have more detailed conversations with the Storage Administrator when additional diagnostics are required)

Let's take a look at one of the more interesting scenarios as a simple way of showing the plug-in at work rather than reproducing the entire Installation Guide in blog-form.

Download the Plug-in for Sun Storage 7000 Appliances, Unzip the downloaded file, and read the Installation Guide included with the plug-in.

Follow the instructions for installing the plug-in, deploying it to agents, and adding instances of Sun Storage 7000 Appliances to the environment for monitoring. Each instance added takes about 60 minutes to fully populate with information; this is simply the nature of a polling environment, and the plug-in is set up to poll data sets that change infrequently every 60 minutes and data sets that change frequently every 10 minutes, as sketched below.
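Conceptually, those two cadences are just a pair of polling loops. The sketch below is purely illustrative; only the 10- and 60-minute intervals come from the plug-in, the rest is invented:

#!/usr/bin/python
# Illustrative sketch of the plug-in's two polling cadences.
import time

FAST = 10 * 60    # seconds between polls of frequently changing data sets
SLOW = 60 * 60    # seconds between polls of rarely changing data sets

last_slow = 0.0
while True:
    print("poll: performance metrics")             # changes often
    if time.time() - last_slow >= SLOW:
        print("poll: configuration and capacity")  # changes rarely
        last_slow = time.time()
    time.sleep(FAST)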

Once data is funneling in, all of the standard appliance-centric views of the information are available (including the individual metrics that the plug-in collects) as well as a view of some of the important high-level information presented on the home page for an instance (provided you are using Oracle Grid Control 10.2.0.5). Here is a view of a single appliance instance's home page:

Looking into the metrics collected for an appliance brings you to a standard display of single metrics or tables of related metrics (all standard navigation in Oracle Grid Control for plug-in components).

The plug-in for Sun Storage 7000 Appliances includes 5 reports. Of these, 3 run against a single Sun Storage 7000 Appliance instance and are available both from the context of that instance and from the Oracle Grid Control Reports Tab, while 2 run against all monitored instances of Sun Storage 7000 Appliances and are available only from the Reports Tab. Among the 5 are 2 reports that combine information about databases deployed against NFS mount points with the Sun Storage 7000 Appliances that export those mount points. The two reports are:


  • Database to Appliance Mapping Report - Viewable from a single target instance or the Reports Tab, this report shows databases deployed against NFS shares from a single Sun Storage 7000 Target Instance
  • Federated Database to Appliance Mapping Report - Viewable only from the Reports Tab, this report shows databases deployed against NFS shares from all monitored Sun Storage 7000 Appliances

Looking at the "Master" (top-level) Database to Appliance Mapping Report (shown below) you will see a "Filter" (allowing you to scope the information in the table to a single Database SID) and a table that correlates the filtered Database SID to Network File Systems shared by specific appliances along with the Storage IP Address that the share is accessed through, the appliance's Storage Network Interface and the name that the appliance is referred to as throughout this Grid Control instance.

From the Master report, 4 additional links lead to more detailed information, filtered to the appliance abstraction used by the Database SID. The links in the columns navigate as follows:


  • Database Link - This link takes the viewer to a drill-down table that shows all of the files deployed on the shares identified in the first table. With this detail report, an administrator can see exactly which files are deployed where. The table also contains the three links identified next.
  • Network File System - Takes the viewer down to a detailed report showing metadata about the share created on the appliance, how the cache is used (ARC and L2ARC) for this share and general capacity information for the share.
  • Storage IP Address - Takes the viewer to the Metric Details that relate to the appliance configuration (serial number, model, etc...).
  • Storage Network Interface - Takes the viewer to metadata about the network interface as well as reports on the Network Interface KB/sec and NFS Operations Per Second (combined with the NFS Operations Per Second allocated to serving the share that the database resides on).

The detail reports for the Network File System and Storage Network Interface (both of which are not directly accessible from the Reports Tab) use a combination of current metrics and graphical time-axis data, as shown in the following report:

Wherever applicable, the Detail Reports drill further into Metric Details (that could also be accessed through an appliance instance target home page).

It is important to note that several of these reports combine a substantial amount of data into a single page. This can lead to rather lengthy report generation times (in worst-case scenarios, up to 5 minutes). It is always possible to view individual metrics through the monitoring home page; because metric navigation is focused on a single metric, it always performs faster and is preferred unless the viewer needs a more complex assembly of information. With the reports, an administrator can view network performance and storage performance side by side, which may be more helpful in diagnosing service delivery issues than navigating through single metric data points.

In addition to the substantial number of collected metrics, there are several alerts generated on various appliance thresholds that can be crossed throughout the operation of target appliances.

Conclusion


Oracle Grid Control gives a fully integrated view of the "vertical" data center, combining software infrastructure with hardware infrastructure (including storage appliances). The Sun Storage 7000 Management Plug-in for Oracle Enterprise Manager 10g Grid Controller 1.0 presents Sun Storage 7000 Appliances within that vertical context, with metrics and reports tailored specifically to Sun Storage 7000 Appliances as viewed by a Database Administrator.

Thursday Dec 17, 2009

Tweeting your Sun Storage 7000 Appliance Alerts

Twitter, Instant Messages, and Mobile Alerts have always fascinated me. I truly believe that a Storage Administrator should not have to leave the comfort of their iPhone, Droid, or Palm Pre to do 90% of their day-to-day management tasks. As I was scanning the headlines of blogs.sun.com, I saw this great article on Tweeting from Command Line using Python.

So, leading into how to manage a Sun Storage 7000 Appliance using Alerts (the next article in my series), I thought I would take some time and adapt this script to tweet the Sun Storage 7000 Appliance traps I receive. I am going to use the AK MIB traps (to be explained in more detail in the next article) to achieve this.

Writing the Python Trap Handler

First, create the trap handler (this is based on the Python Script presented in the Blog Article: Tweeting from Command Line using Python).

Here is the Python Script:

#!/usr/bin/python
import sys

from os import popen

def tweet(user, password, message):
    print 'Hold on there %s....Your message %s is getting posted....' % (user, message)
    url = 'http://twitter.com/statuses/update.xml'
    curl = '/usr/dist/local/sei/tools/SunOS-sparc/curl -s -u %s:%s -d status="%s" %s' % (user, password, message, url)
    pipe = popen(curl, 'r')
    print 'Done...awesome'

if __name__ == '__main__':
    # snmptrapd hands the handler one field per line on stdin
    host = sys.stdin.readline().strip()
    ip = sys.stdin.readline().strip()
    uptime = sys.stdin.readline().strip()
    uuid = sys.stdin.readline().strip()
    alertclass = sys.stdin.readline().strip()
    alertcount = sys.stdin.readline().strip()
    alerttype = sys.stdin.readline().strip()
    alertseverity = sys.stdin.readline().strip()
    alertresponse = sys.stdin.readline().strip()

    # build the tweet from the most interesting fields, capped at 140 characters
    messageArray = [host, ip, alerttype]
    message = ",".join(messageArray)
    message = message[0:140]

    user = "yourtwitter" # put your username inside these quotes
    password = "yourpassword" # put your password inside these quotes
    tweet(user, password, message)

At a minimum, you will have to make the following changes:


  • Ensure the path to curl is appropriate for your system
  • Change the user and password variables to your Twitter account

Once that is done you should be set.
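If you want to dry-run the handler before wiring it into snmptrapd, you can feed it the nine fields it reads from stdin. A quick sketch, assuming you saved the script as an executable ./ak-tweet.py (the sample field values are placeholders):

#!/usr/bin/python
# Dry-run the trap handler with placeholder field values, one per line,
# in the order the handler reads them.
import subprocess

fields = ["hostname", "10.0.0.1", "uptime", "uuid-0000",
          "alert-class", "1", "alert-type", "Minor", "alert-response"]
p = subprocess.Popen(["./ak-tweet.py"], stdin=subprocess.PIPE)
p.communicate(("\n".join(fields) + "\n").encode())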

Adding the Trap Handler to SNMP

Next, set up your snmptrapd.conf to handle traps from the AK MIB by invoking the Python Script above. My /etc/sma/snmp/snmptrapd.conf looks something like this:

traphandle .1.3.6.1.4.1.42.2.225.1.3.0.1 /export/home/oracle/ak-tweet.py

The OID .1.3.6.1.4.1.42.2.225.1.3.0.1 identifies the sunAkTraps portion of the AK-MIB delivered with the Sun Storage 7000 Appliance.

Now, invoke snmptrapd using the above configuration file (there are many ways to do this, but I am doing it the way I know will pick up my config file :-)

/usr/sfw/sbin/snmptrapd -c /etc/sma/snmp/snmptrapd.conf -P

Sending Alerts from the Sun Storage 7000

Using the article I posted yesterday, ensure SNMP is enabled with a trapsink identifying the system where your trap receiver is running. Now we have to enable an alert to be sent via SNMP from your Sun Storage 7000 Appliance (this is different from the default Fault Management Traps I discussed yesterday).

For now, trust me on this (I will explain more in my next article): let's enable a simple ARC size threshold that will be violated. Go into the Browser User Interface for your system (or the simulator) and go to the Configuration -> Alerts screen. Click through to "Thresholds" and add one that you know will be violated, like this one:

Each alert that is sent gets posted to the Twitter account identified within the Python script! And as my friends quickly noted, from there it can go to your Facebook account, where you can quickly declare yourself "King of the Lab"!

Thanks Sandip for your inspiring post this morning :-)

Wednesday Dec 16, 2009

The SNMP Service on a Sun Storage 7000 Appliance

Without a doubt, SNMP rules the playground in terms of monitoring hardware assets, and many software assets, in a data center monitoring ecosystem. It is the single biggest integration technology I'm asked about and that I've encountered when discussing monitoring with customers.

Why does SNMP have such amazing staying power?


  • It's extensible (vendors can provide MIBs and extend existing MIBs)
  • It's simple (hierarchical data rules and really it boils down to GET, SET, TRAP)
  • It's ubiquitous (monitoring tools accept SNMP, systems deliver SNMP)
  • It operates on two models, real time (traps) and polling (get)
  • It has aged gracefully (the security extensions in v3 did not destroy its propagation)

To keep the SNMP support in the Sun Storage 7000 Appliances relatively succinct, I am going to tackle this in two separate posts. This first post shows how to enable SNMP and what you get "out of the box" once it's enabled. The next post discusses how to deliver more information via SNMP (alerts with more information and threshold violations).

To get more information on SNMP on the Sun Storage 7000 and to download the MIBs that will be discussed here, go to the Help Wiki on a Sun Storage 7000 Appliance (or the simulator):


  • SNMP - https://[hostname]:215/wiki/index.php/Configuration:Services:SNMP

Also, as I work at Sun Microsystems, Inc., all of my examples of walking MIBs on a Sun Storage 7000 Appliance or receiving traps will be from a Solaris-based system. There are plenty of free / open source / trial packages for other Operating System platforms so you will have to adapt this content appropriately for your platform.

One more note as I progress in this series: all of my examples are from the CLI or from scripts, so you won't find many pretty pictures in the series :-)

Enabling SNMP on the Sun Storage 7000 Appliance gives you the ability to:


  • Receive traps (delivered via Sun's Fault Manager (FM) MIB)
  • GET system information (MIB-II System, MIB-II Interfaces, Sun Enterprise MIB)
  • GET information customized to the appliance (using the Sun Storage AK MIB)

Enabling alerts (covered in the next article) extends the SNMP support by delivering targeted alerts via the AK MIB itself.

Enable SNMP


The first thing we'll want to do is log into a target Sun Storage 7000 Appliance via SSH and check if SNMP is enabled.


aie-7110j:> configuration services snmp
aie-7110j:configuration services snmp> ls
Properties:
<status> = disabled
community = public
network =
syscontact =
trapsinks =

aie-7110j:configuration services snmp>

Here you can see it is currently disabled and that we have to set up all of the SNMP parameters. The most common community string to this day is "public", and as we will not be changing system information via SNMP, we will keep it. The "network" parameter for us is 0.0.0.0/0, which allows access to the MIB from any network. Finally, I will add a single trapsink so that any traps get sent to my management host. The last step shown is to enable the service once the parameters are committed.


aie-7110j:configuration services snmp> set network=0.0.0.0/0
network = 0.0.0.0/0 (uncommitted)
aie-7110j:configuration services snmp> set syscontact="Paul Monday"
syscontact = Paul Monday (uncommitted)
aie-7110j:configuration services snmp> set trapsinks=10.9.166.33
trapsinks = 10.9.166.33 (uncommitted)
aie-7110j:configuration services snmp> commit
aie-7110j:configuration services snmp> enable
aie-7110j:configuration services snmp> show
Properties:
<status> = online
community = public
network = 0.0.0.0/0
syscontact = Paul Monday
trapsinks = 10.9.166.33

From the appliance perspective we are now up and running!

Get the MIBs and Install Them


As previously mentioned, all of the MIBs that are unique to the Sun Storage 7000 Appliance are also distributed with the appliance. Go to the Help Wiki and download them, then move them to the appropriate location for monitoring.

On the Solaris system I'm using, that location is /etc/sma/snmp/mibs. Be sure to browse the MIB for appropriate tables or continue to look at the Help Wiki as it identifies relevant OIDs that we'll be using below.

Walking and GETting Information via the MIBs


Using standard SNMP operations, you can retrieve quite a bit of information. As an example from the management station, we will retrieve a list of shares available from the system using snmpwalk:


-bash-3.00# ./snmpwalk -c public -v 2c isv-7110h sunAkShareName
SUN-AK-MIB::sunAkShareName.1 = STRING: pool-0/MMC/deleteme
SUN-AK-MIB::sunAkShareName.2 = STRING: pool-0/MMC/data
SUN-AK-MIB::sunAkShareName.3 = STRING: pool-0/TestVarious/filesystem1
SUN-AK-MIB::sunAkShareName.4 = STRING: pool-0/oracle_embench/oralog
SUN-AK-MIB::sunAkShareName.5 = STRING: pool-0/oracle_embench/oraarchive
SUN-AK-MIB::sunAkShareName.6 = STRING: pool-0/oracle_embench/oradata
SUN-AK-MIB::sunAkShareName.7 = STRING: pool-0/AnotherProject/NoCacheFileSystem
SUN-AK-MIB::sunAkShareName.8 = STRING: pool-0/AnotherProject/simpleFilesystem
SUN-AK-MIB::sunAkShareName.9 = STRING: pool-0/default/test
SUN-AK-MIB::sunAkShareName.10 = STRING: pool-0/default/test2
SUN-AK-MIB::sunAkShareName.11 = STRING: pool-0/EC/tradetest
SUN-AK-MIB::sunAkShareName.12 = STRING: pool-0/OracleWork/simpleExport

Next, I can use snmpget to obtain a mount point for the first share:

-bash-3.00# ./snmpget -c public -v 2c isv-7110h sunAkShareMountpoint.1
SUN-AK-MIB::sunAkShareMountpoint.1 = STRING: /export/deleteme
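If you would rather collect these values programmatically, here is a sketch that wraps the same Net-SNMP command-line tools from Python. The tool path, the host, and the loop bound (twelve shares, per the walk above) are assumptions for illustration:

#!/usr/bin/python
# Sketch: pair each share name with its mount point via snmpget.
import subprocess

SNMPGET = "./snmpget"   # adjust to where your Net-SNMP tools live
HOST = "isv-7110h"

def snmp_string(oid):
    out = subprocess.check_output(
        [SNMPGET, "-c", "public", "-v", "2c", HOST, oid])
    # output looks like: SUN-AK-MIB::sunAkShareName.1 = STRING: pool-0/...
    return out.decode().split("= STRING:", 1)[1].strip()

for i in range(1, 13):  # the walk above returned 12 shares
    print("%s -> %s" % (snmp_string("sunAkShareName.%d" % i),
                        snmp_string("sunAkShareMountpoint.%d" % i)))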

It is also possible to get a list of problems on the system, identified by UUID:

-bash-3.00# ./snmpwalk -c public -v 2c isv-7110h sunFmProblemUUID
SUN-FM-MIB::sunFmProblemUUID."91e97860-f1d1-40ef-8668-dc8fb85679bb" = STRING: "91e97860-f1d1-40ef-8668-dc8fb85679bb"

And then turn around and retrieve the associated knowledge article identifier:

-bash-3.00# ./snmpget -c public -v 2c isv-7110h sunFmProblemCode.\"91e97860-f1d1-40ef-8668-dc8fb85679bb\"
SUN-FM-MIB::sunFmProblemCode."91e97860-f1d1-40ef-8668-dc8fb85679bb" = STRING: AK-8000-86

The FM-MIB does not contain information on severity, but using the problem UUID I can SSH into the system and retrieve that information:

isv-7110h:> maintenance logs select fltlog select uuid="91e97860-f1d1-40ef-8668-dc8fb85679bb"
isv-7110h:maintenance logs fltlog entry-005> ls
Properties:
timestamp = 2009-12-15 05:55:37
uuid = 91e97860-f1d1-40ef-8668-dc8fb85679bb
desc = The service processor needs to be reset to ensure proper functioning.
type = Major Defect

isv-7110h:maintenance logs fltlog entry-005>

Take time to inspect the MIBs through your MIB browser to understand all of the information available. I tend to shy away from using SNMP for getting system information and instead write scripts and workflows, as much more information is available directly on the system; I'll cover this in a later article.

Receive the Traps


Trap receiving on Solaris is a piece of cake, at least for demonstration purposes. What you choose to do with the traps is a whole different process; each tool has its own trap monitoring facilities that will hand you the fields in different ways. For this example, Solaris just dumps the traps to the console.

Locate the "snmptrapd" binary on your Solaris system and start monitoring:


-bash-3.00# cd /usr/sfw/sbin
-bash-3.00# ./snmptrapd -P
2009-12-16 09:27:47 NET-SNMP version 5.0.9 Started.

From there you can wait for something to go wrong with your system, or you can provoke it yourself. Fault Management can be a bit difficult to provoke intentionally, since things one thinks would provoke a fault are actually administrator activities. Pulling a disk drive is very different from a SMART error on a disk drive. Similarly, pulling a power supply is different from tripping over a power cord and yanking it out. The former is not a fault, since it is a complex operation requiring an administrator to unseat the power supply (or disk), whereas the latter occurs out in the wild all the time.

Here are some examples of FM traps I've received this way, using various "malicious" techniques on a lab system ;-)

Here is an FM trap from when I "accidentally" tripped over a power cord in the lab. Be careful when you do this so you don't pull the system off the shelf if it is not racked properly (note that I formatted this a little bit from the raw output):


2009-11-16 12:25:34 isv-7110h [172.20.67.78]:
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1285895753) 148 days, 19:55:57.53
SNMPv2-MIB::snmpTrapOID.0 = OID: SUN-FM-MIB::sunFmProblemTrap
SUN-FM-MIB::sunFmProblemUUID."2c7ff987-6248-6f40-8dbc-f77f22ce3752" = STRING: "2c7ff987-6248-6f40-8dbc-f77f22ce3752"
SUN-FM-MIB::sunFmProblemCode."2c7ff987-6248-6f40-8dbc-f77f22ce3752" = STRING: SENSOR-8000-3T
SUN-FM-MIB::sunFmProblemURL."2c7ff987-6248-6f40-8dbc-f77f22ce3752" = STRING: http://sun.com/msg/SENSOR-8000-3T

Notice again that I have a sunFmProblemUUID that I can use to shell into the system and obtain more details (similar to what was shown in the last section). Again, the next article will contain an explanation of alerts. Using the AK MIB and alerts, we can get many more details pushed out to us via an SNMP trap, and we have finer granularity over which alerts get pushed.

Here, I purchased a very expensive fan stopper-upper device from a fellow tester. It was quite pricey; it turns out it is also known as a "Twist Tie". Do NOT do this at home, seriously: the decreased air flow through the system can cause hiccups in your system.


DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1285889746) 148 days, 19:54:57.46
SNMPv2-MIB::snmpTrapOID.0 = OID: SUN-FM-MIB::sunFmProblemTrap
SUN-FM-MIB::sunFmProblemUUID."cf480476-51b7-c53a-bd07-c4df59030284" = STRING: "cf480476-51b7-c53a-bd07-c4df59030284"
SUN-FM-MIB::sunFmProblemCode."cf480476-51b7-c53a-bd07-c4df59030284" = STRING: SENSOR-8000-26
SUN-FM-MIB::sunFmProblemURL."cf480476-51b7-c53a-bd07-c4df59030284" = STRING: http://sun.com/msg/SENSOR-8000-26

You will receive many, many other traps throughout the day, including the Enterprise MIB letting us know when the system starts up or performs other activities.

Wrap it Up


In this article, I illustrated enabling the SNMP Service on the Sun Storage 7000 Appliance via an SSH session. I also showed some basic MIB walking and traps that you'll receive once SNMP is enabled.

This is really simply the "start" of the information we can push through the SNMP pipe from a system. In the next article I'll show how to use Alerts on the system with the SNMP pipe so you can have more control over the events on a system that you wish to be notified about.

Thursday Dec 10, 2009

Monitoring the Sun Storage 7000 Appliance

Over the past several months I've been working on integrating our Sun Storage 7000 Appliances into monitoring products from other companies. The monitoring work I'm doing is a combination of software writing (via a plug-in for a data center monitoring product that will see its release in conjunction with our next Sun Storage 7000 Appliance software release) and "consulting" directly with our customers about monitoring the appliances they install after purchase.

The Sun Storage 7000 Appliance comes with a variety of mechanisms for monitoring:
- SNMP (via several different MIBs using traps or GETs)
- Email Alerts
- Remote Syslog

A variety of software and hardware faults, surfaced internally as Fault Management Architecture (FMA) events, get pushed to the monitoring environment via the above mechanisms.

As valuable as these capabilities are, customers always have more advanced monitoring needs that require customization of the environment. Some customers want to tune the information available, getting more significant digits than we surface in the CLI, or gather data from our industry-leading analytics capabilities delivered with the appliance. Some may want to integrate with an ITIL-style Configuration Management Database; others may want to create a billing system based on user capacity, accounting for levels of service (guaranteed space, thin-provisioned space, etc...).

All of these customizations can easily be achieved using simple SSH navigation of the appliance's environment or more advanced manipulation of the environment using the embedded JavaScript environment on each Sun Storage 7000 Appliance via scripts or Workflows.
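As a small taste of that scripting, here is a minimal sketch that runs an appliance CLI command over SSH. It assumes the third-party paramiko library, placeholder host and credentials, and that the appliance's login shell (the CLI) executes a command passed on the ssh command line:

#!/usr/bin/python
# Sketch: run an appliance CLI command non-interactively over SSH.
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("appliance.example.com", username="monitor", password="secret")
stdin, stdout, stderr = client.exec_command("configuration services snmp show")
print(stdout.read().decode())
client.close()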

Over the next few weeks, I'm going through my email archives (not a pretty sight, to be honest) to mine the greatest hits from information I've sent out to specific audiences on monitoring appliances and customizing the environment for specific monitoring application use cases. Other articles will focus on how I built the monitoring environment for the upcoming plug-in that will hit the download center with the next software release.

With all of that lead-in, I am going to kick off my monitoring guidance with what I tell everyone right out of the chute, "Use the Built-in Sun Storage 7000 Appliance Help Wiki to get up to speed on these topics and get the latest information". After all, this blog post will age with each release of the Sun Storage 7000 Appliance whereas the Help Wiki is updated with each release.

On a running Sun Storage 7000 Appliance, use the following URLs (substituting the address of the appliance where I put [hostname]):


  • SNMP - https://[hostname]:215/wiki/index.php/Configuration:Services:SNMP
  • Alerts - https://[hostname]:215/wiki/index.php/Configuration:Alerts
  • Scripting - https://[hostname]:215/wiki/index.php/User_Interface:CLI:Scripting
  • Workflows - https://[hostname]:215/wiki/index.php/Maintenance:Workflows

You can download the latest Sun Storage 7000 Appliance Storage Simulator and follow these instructions as well.

In case the pages have moved, be sure to use the Search feature that comes with the Help Wiki.

There are always customers who want more hardcore examples of each of the above, tailored to their environments, or a slightly different take on learning these topics. And that, my friends, is what the next few weeks will be about. I'll give more examples and approaches, similar to what I did with my Fun with the FMA and SNMP article.

Tuesday Aug 04, 2009

Sun Storage 7000 as an Administrator Development Platform

The Sun Storage 7000 Family of Appliances breaks ground in manageability and transparency through an amazing amount of analytics information provided to administrators as well as a highly customizable and extensible management environment that resides on the system. The "Workflow", delivered in the latest release of appliance software, is of particular interest to those of us responsible for "integrating" the Sun Storage 7000 into a management ecosystem, bundling pieces of management logic for use by our peers and reproducing management logic (such as configuration and environmental setup) on several systems at a time.

A workflow is a parameterized piece of logic that is uploaded to a Sun Storage 7000 where it remains resident and is then run via the BUI, CLI or remotely via a shell. The logic within the workflow is programmed in JavaScript (resident on the Sun Storage 7000) and interacts with the system's management shell via "run" commands or built-ins that interact with the current administrative context.

A workflow can do anything that an administrator could do via the CLI, but in a nicely bundled and parameterized way. Here are a few things I've done with workflows:


  • gather information about the appliance and reformat it to make it digestible by a higher-level tool
  • retrieve sets of analytics data and turn them into different-sized chunks (instead of a 1 second interval, give me a 60 second interval with the average as well as the min and max during the interval) and reformat them to be easy to digest
  • manage the lifecycle of shares (create, manage settings and delete) that are common across appliances
  • manage network settings
  • create a set of worksheets on every appliance in the network

The opportunities for automation are endless, only bounded by the needs of the administrator in their efforts to integrate the appliances within the management ecosystem.

There is substantial documentation on the appliance's Help Wiki, but for clarity, here is a very simple workflow that lists, for every share, a filesystem attribute given as input to the workflow:


  • Input: attribute name (same as the attribute in the CLI)
  • Output: CSV format: project,sharename,attribute (one line for each share)
  • Behavior Notes: a listed attribute that is not valid will return NA in the column (this could be moved to parameter verification, but it serves to illustrate exception handling). Also, some properties return empty values because the value is actually inherited from the project context.

Since this is a relatively "short" example, I will simply put the code here with comments and then add additional information afterwards. Note the use of JavaScript functions (such as printToString) as well as the most important element, the definition of the variable "workflow".

/* The printed headers, one will be added with the property name */
var headerList = new Array(
    "Project",
    "Share"
);

/* A function to print the array of rows into a string for display
   (each row is itself an array, which JavaScript renders as CSV) */
function printToString(csvToPrint) {
    var csvAsString = "";
    for (var i = 0; i < csvToPrint.length; i++) {
        csvAsString = csvAsString + csvToPrint[i];
        // do not finish with an end of line marker
        if (i != csvToPrint.length - 1) csvAsString = csvAsString + "\n";
    }
    return csvAsString;
}

/* This is a required structure for the workflow, it identifies the name,
   parameters and the function to execute when it is run */
var workflow = {
    name: 'Get Filesystem Attribute',
    origin: 'Sun Microsystems, Inc.',
    description: 'Prints a Property for all Shares',
    parameters: {
        property: {
            label: 'Filesystem Property',
            type: 'String'
        }
    },
    execute: function (params) {
        // prepare the output arrays
        var csvContents = new Array();
        var currentRow = 0;
        headerList[2] = params.property;
        csvContents[0] = headerList;
        currentRow++;

        // go to the root context to start navigation
        run('cd /');
        run('shares');

        // get a list of all of the projects on the system
        var projects = list();

        // navigate through each project
        for (var i = 0; i < projects.length; i++) {
            run('select ' + projects[i]);

            // get a list of all shares in this project
            var shares = list();

            // go into the context of each share
            for (var j = 0; j < shares.length; j++) {
                run('select ' + shares[j]);
                var filesystem = true;
                var mountPoint = "";
                try {
                    mountPoint = get('mountpoint');
                } catch (err) {
                    // "mountpoint" does not exist, so this is not a filesystem
                    filesystem = false;
                }
                if (filesystem) {
                    var currentRowContents = new Array();
                    currentRowContents[0] = projects[i];
                    currentRowContents[1] = shares[j];
                    try {
                        var propertyValue = get(params.property);
                        currentRowContents[2] = "" + propertyValue;
                    } catch (err) {
                        currentRowContents[2] = "NA";
                    }
                    csvContents[currentRow] = currentRowContents;
                    currentRow++;
                }
                run('cd ..');
            }

            run('cd ..');
        }

        var newCsvAsString = printToString(csvContents);

        return (newCsvAsString);
    }
};

While the bulk of the example is standard JavaScript, the workflow structure itself must be adhered to. Here are the important properties:


  • name - The name that the workflow will be identified by within the BUI or CLI
  • origin - The author of the workflow, can also be used to minimize name collisions
  • description - A description of the contents of the workflow, displayed in the BUI or CLI
  • parameters - A list of parameters with types (the types supported are listed in the documentation)
  • execute - The function that gets executed when the workflow is run (there are more advanced ways of identifying the execution code than are shown here)

The code itself interacts with the system to get a list of the projects on the system, then a list of the shares within each project. The mountpoint property is ONLY available on filesystems, so if there is a property error we know we do not have a filesystem and skip processing it (it is most likely an iSCSI LUN).

To upload the workflow, cut/paste the text above and put it in a file. Log into a Sun Storage 7000 Appliance with the latest software and go to Maintenance / Workflows. Click the "+" sign to add a workflow and identify the location of the file. The syntax is error checked on upload, then you will see it listed. Workflows can also be uploaded from the CLI.

Here is what a run of the workflow from the CLI looks like:


isv-7110h:maintenance workflows> ls
Properties:
showhidden = false

Workflows:

WORKFLOW        NAME                        OWNER   SETID   ORIGIN
workflow-004    Get Filesystem Attribute    root    false   Sun Microsystems, Inc.

isv-7110h:maintenance workflows> select workflow-004
isv-7110h:maintenance workflow-004> ls
Properties:
name = Get Filesystem Attribute
description = Prints a Property for all Shares
owner = root
origin = Sun Microsystems, Inc.
setid = false

isv-7110h:maintenance workflow-004> execute
isv-7110h:maintenance workflow-004 execute (uncommitted)> ls
Properties:
property = (unset)

isv-7110h:maintenance workflow-004 execute (uncommitted)> set property=space_total
property = space_total
isv-7110h:maintenance workflow-004 execute (uncommitted)> commit

Project,Share,space_total
AnotherProject,NoCacheFileSystem,53928
AnotherProject,simpleFilesystem,53928
OracleWork,simpleExport,53928
TestVarious,filesystem1,53928
default,test,448116
default,test2,5368709120
isv-7110h:maintenance workflow-004>

While the example is simple, hopefully it illustrates that this is the start of workflow capabilities, not the entirety of them. The workflow can create management structures (like new shares and worksheets), delete them, modify them, and even enable and disable services.

Workflows make the Sun Storage 7000 an Administrator Development Platform. Try it out in the Sun Unified Storage Simulator if you don't have an appliance at your fingertips!

Sunday Jul 19, 2009

Social Networks Connect, they don't Replace

July 17th, I woke up and watched my friend's facebook page turn into a memorial to the great and loving husband that he was. I watched my other friend's page turn into an outpouring of friendship, stunned sadness, and finally...offerings to join her IRL. The effectiveness of a social network for connecting is obvious, the depth of those connections is a bit more mysterious. In the background of the connecting a more human network is coalescing ... one made of phone conversations, emails to coordinate activities, and finally, traveling to build a village around my friend.

The whole episode made me think again about where social networks ... the online variety ... fit in life.

What good are 100 'friends' on a social network if even one of those friends inhibits you from being yourself and really connecting to the people that know you and want to be there for you?

Intertwined social networks become emotional nightmares as people reach out to friends in coded messages while trying to maintain professional coverings. Wrecked relationships, depression, friending and unfriending, bad days ... all of this conducted on tiptoes because peers, bosses, and coworkers are watching over your moves, your playlists, your pictures.

And now I go back to my friend who remains my social network friend but is no longer on our little blue planet.

157 friends on facebook and the only reason this day is anything but an empty mess is because I have the luck of surfing in Hawaii with my boy ... Who doesn't even have an fb account ... And an old (well, she's young ...) friend from Minnesota that delivered her phone number as soon as she connected the subtly coded status message.

So, here's the thing ... Social networks connect ... But they don't replace ... They don't even give insight into where a person's soul is unless that person is willing to risk their social standing at work or was smart enough to keep their work and life networks separate...after all, does your boss really need to know that you're depressed, let alone the reason ...

So...the time has come for me...time to tighten up the social network so my friends can be my friends and let it all hang out ...time to have a beer with a friend IRL ...and time to re-grow that LinkedIn network for the professional side of life.

And ... For my friend ... I'm sorry we didn't have more time here ... Perhaps we will have a chance somewhere else but if we don't ... Well ...

:-(

Tuesday Jul 07, 2009

Vertical and Horizontal Mobility

As far back as I can remember I've been fascinated by the ability to move from location to location to get my work done as well as being able to move from device to device. Horizontal mobility (the ability to move across geography) via virtually ubiquitous wireless Internet access is a reality.

This blog post originated in my house, continued between programming sessions at the Highlands Ranch Public Library and then over lunch at The Corner Bakery with only periodic "sleep" modes on my laptop disrupting the work. Horizontal mobility and the ability to work and communicate from anywhere in the country (and often overseas) is so effective, our customers, peers and even our management hierarchies in many industries may not even know where you are geographically located day to day. Have you ever had a conversation start with "Where are you working from?".

Horizontal Mobility shouldn't be news to anyone, I even carry a 3G modem with me so I can work in the infrequent times that I can't find a wireless network. But that is getting more and more rare, consider I had two "Facebook Friends" update their status from an airplane this week.

While horizontal mobility is in our lives to stay, what about Vertical Mobility ... what I would call the ability to move from device to device to achieve a task or handle content? I should be careful to note, when I say device to device, I mean wildly different classes of devices, such as a phone and a laptop, or an Amazon Kindle, a phone and a laptop, or a TV and a computer and a phone. My assumption several years ago, almost a decade now, was that we would move away from the desktop computers and at some point our kids would simply have purpose-built devices giving access to appropriate content for the device ... I jokingly called this a "Fully Integrated Lifestyle" to a few friends the other week but the more I think about it, that is truly what Vertical Mobility is all about.

Are we there yet?

Absolutely. Over the past few weeks I bought an Amazon Kindle and an iPhone so I could revisit the mobile lifestyle. I last visited it about 3 years ago as I architected a Sun Storage Management Portal. At that time, I was playing with re-rendering portlets from HTML / XML to WML. Basically, I would create alternate views of information based on the device being used to view the information (MVC, Model View Controller).

Today, our devices are so powerful and usability of the devices has hit such a high level that alternate views of information are often not required (though I would argue that good vertical mobility does take into account the device). The image above is a fully dynamic view of our Sun Storage 7000 Appliance taken from my iPhone VPN'd to work.

Horizontal and Vertical Mobility

What other devices can be combined? My favorite for vacation is taking TV shows from my Tivo (like Spongebob Squarepants) and moving them to the iPod so my son can watch them while we are stuck in an airport.

The Amazon Kindle has even joined the party. I purchase books on my Kindle and can read them on my iPhone via the Amazon Kindle downloadable iPhone application.

It is truly an amazing world we live in. What's the infrastructure at work here? The cloud is here, the network continues to be built out through our carriers and most of all, content providers that are becoming more aware of the opportunities that abound in content delivery to these devices and, of course, standards around content structure, authentication and authorization, and every other technology that gives us Horizontal and Vertical Mobility and the amazing ability to live a Fully Integrated Lifestyle.

Ok, times up ... back to my design document ... I'll work on it from my Mac today :-)

Friday Jul 03, 2009

The thing about trees ... and my Kindle DX

I can summarize my thoughts on trees pretty succinctly, I love a good tree. And with that in mind I finally made the leap and bought my Amazon Kindle DX.

I can safely say that I am holding onto a little bit of the future, it is incredible. The things that drove me over the top to get the Kindle are:


  • I am VERY schizophrenic in my reading enjoyment, sometimes having as many as 4 books for reading open, 2 or 3 technical documents, and several large PDFs full of APIs
  • I received The Denver Post Thursdays through Sundays and didn't read it most of those days ... but never knew when I wanted it
  • I go through spurts of travel and I ALWAYS have the wrong book with me ... my moods swing notoriously when I'm traveling and what I want to read or learn about swings with it
  • I get tired of lugging power cords everywhere

And so my great Kindle DX experiment began, just about Father's Day (my gift to myself for being a single Dad with little or no time for book enjoyment).

First off, it is very, very easy to spend money with it ... albeit there is a discount on most books that you purchase for the Kindle. Oddly, this doesn't hold true for most computer books; they seem to be full price or near full price across the board. I've already purchased and read several books, including the amazing Darwin's Radio and Darwin's Children from Greg Bear (conveniently in one "book" for the Kindle), and am now working on Cormac McCarthy's The Road and Neal Stephenson's Anathem. All bought, literally, from the comfort of my bed or while camping. I have also downloaded quite a few work PDFs and presentations to it, and I receive The Denver Post electronically each day, automatically delivered.

Here's a hint on Anathem, I bet my back doesn't hurt from lugging it around and yours does :-) ... and guess what, I have a built-in dictionary so I can just move the cursor to a word and the definition appears on the bottom of the screen.

The downside of newspapers is that they are difficult to read and don't include the two best parts ... the comics and the ads. As such, I still receive a Sunday Denver Post ... but my recycling bin has definitely gone down in size.

When folks ask me about it, I put my Kindle experience rather succinctly:


  • Book Reading (fiction / non-fiction): A+ (as easy on the eyes as paper, easy to navigate, etc...)
  • Reference Literature that you Skip Around in: C (VERY difficult to skip around a book and "leaf through" the pages)
  • Newspapers: B-
  • Ease of Use: B+
  • Ease of Shopping: B (I usually shop on a web browser rather than the built in store but if you know what you are buying, the built in store on the Kindle is easy enough)
  • Cost: C- (this has GOT to come down and I completely object to paying for blog subscriptions ... sorry Amazon and Slashdot, ain't gonna pad your profit margins)
  • Ease on the Eyes: A+ (as good as paper and my eyes don't get all wobbly like on a computer...I was literally reading under the covers so-as to not annoy my son at Scout Camp...and it's not backlit so I had my flashlight with me, it was just like when I was a kid!)
  • Portability: A+ (I have cut down the size of my backpack by half ... I still have to carry my Mac)

The power lasts FOREVER on the thing. Even with 3G on I plug it in at most once a week.

Now here is the BIGGEST complaint I have. Flying to San Francisco they SPECIFICALLY point out that your Kindle must be turned off. People, it barely uses power. Might I suggest you just ask folks to turn off the 3G on the Kindle instead?

I have even put all of our docs for the Sun Storage 7000 Appliance onto it for quick access at customer black or gray sites. Very handy to have along.

Now here is a tip for those of us who carry these around: you will have people tell you straight out that you are a heretic for abandoning "the feel of paper" and contributing to the "demise of the book" (Auntie, if you are reading this ... this is for you :-). It is SIMPLY not true. In fact, there are a few things that actually seem to work for the author:


  • There is no after-market for a book; this is frustrating to me, but as long as most books are discounted I am willing to deal with it for now. Consider, though: I can't resell a book that is on my e-reader, and I understand that this is a slippery slope to create.
  • Personally, books, PDFs, and stuff like that contribute to about 30% of the clutter in my house and frankly, I'm not one to enjoy having bookshelves of the things I've read contributing to the dust in my environment...they are pretty much g-o-n-e
  • I've already saved multiple trees through e-news delivery and Anathem

But in the end, it's not for everyone. And after the end, the demise of the paper novel is nigh, my friends. It may not be my generation, but I guarantee that my son will be at least 50% e-books in his high school days and 75% by college, and his children will not buy paper books, end of story.

I notice this post is exceeding the expected length, drop me a line if there is anything you want to know about!

Sunday Feb 08, 2009

Heroes Included at the OpenSolaris Storage Summit!!!!!!

I was just browsing the agenda for the OpenSolaris Storage Summit that is being held on February 23rd 2009 in San Francisco (city with that crooked street ya know).

It turns out one of the big talks is going to be from Don MacAskill of SmugMug ... and I kid you not and I'm not doing any schmoozing here, but it is my favorite favorite photo sharing site so I've already signed up. Also on the list are Sun's own Mike Shapiro and Randy Bias from GoGrid. What do Storage and OpenSolaris have to do with the cloud????

It turns out, everything in the cloud has to end up SOMEWHERE, and if it's not using the cloud itself as a storage utility (via the likes of Amazon S3, GoGrid and others), the cloud applications themselves use a boatload of storage. The Sun Storage 7000 family and the Sun Open Storage J4000 family coupled with OpenSolaris are definitely the way to go for infrastructure.

Doh, I'm totally geeked and off to the travel site to figure out how to book my trip and get my kiddo taken care of for the day (though as you can see...he loves San Francisco).

Thursday Feb 05, 2009

IT'S THE little THINGS

In the storage industry (and the computer industry in general) we have a term known as a "dissatisfier". A dissatisfier is basically a feature or product that people have to use but that, no matter how well you do it, will almost never earn you a "win" (or even an acknowledgment of how well you implemented it), since the customer isn't really buying the product for that feature. On the other hand, if you implement the feature poorly, you can lose deals and cause a great deal of consternation on customer sites.

Solaris 10 Update 6 (as well as the latest OpenSolaris release) fixes a huge dissatisfier for me. In previous versions of Solaris, after choosing what devices to install the system on, you were met with a prompt asking if you wanted to partition the storage by default or customize the directory allocations. At this point you would go in and choose how much storage goes into root (/), how much goes into home directories (/export/home), how much is swap, and you could add your own partitions. I know I'm bad, but since I'm an engineer I never use /export/home on my system, so I always had to jigger the directories and fix how much storage is allocated.

Along comes ZFS Boot. Now, there are a LOT of reasons to love ZFS boot (we use it on the Sun Storage 7000 and on most of our developer systems). On previous projects, we noticed how quick and easy it was to do things like LiveUpgrade.

Now, remember that ZFS is natively thin provisioned...meaning, you can create file systems that all sit over the same pool of storage (multiple drives, devices, enclosures) and each file system vies for the storage (you can add quotas and reserve space, don't worry).

Today I installed Solaris 10 U6 and blew through the installation process without EVER having to finagle a screen saying how much storage should be allocated to each directory. YAAAAYYYYYYYYYYYYY :-)

Basically, you choose to install with ZFS, select two disks (so the install can be mirrored and one device can fail) and off the install goes happily ever after.

Ahhhhh, its the little things ....

Thursday Jul 17, 2008

BitTorrent on Solaris (Nevada, Solaris 10 Update 5)

After years of trying to squirrel away a few minutes to install my own BitTorrent swarm in our lab, I've finally done it! I thought I'd put the instructions out here on the web since explicit Solaris-based instructions seem relatively sparse (they are sparse because the version I'm using is relatively easy to use and not relegated to a single platform ... thus the lack of Solaris specifics).

I started with a clean build of Solaris 10 Update 5 on one machine, and Build 94 of Nevada on another set of 4 machines. These are the Sun Fire x4150s I've discussed in other posts.

Retrieving all of the packages I needed was straightforward with Blastwave.org (an open source software repository for Solaris). In fact, with just a few commands I had a complete BitTorrent stack installed that runs with Python (don't have Python on the system...Blastwave takes care of that too).

Install BitTorrent
First, read the HOWTO on using Blastwave. I only made it through a few steps of the HOWTO before I was up and running, so don't sweat it if it seems long and you are attention-challenged (like me without a special Nespresso Level 8 or above espresso). Of course, I'm fumbling around more than I should so be sure to read the whole thing when you do have time, it contains information on directories and where stuff gets put (at least I think it does...). Installing what you need to use Blastwave takes only a few minutes. Once you have Blastwave ready and running, retrieve BitTorrent and all its dependencies (primarily Python):

/opt/csw/bin/pkg-get -i bittorrent

That's it, BitTorrent is now installed in /opt/csw/share/BitTorrent

Remember the BitTorrent Components
Here is the 2 minute review of important BitTorrent Network Components and terminology that follows the general deployment diagram here (this is the swarm I've installed in my lab so I'll refer to it throughout):


  • Torrent - A "file" prepared to be distributed by BitTorrent; "torrent" may also refer to the metadata file prepared from the contents of the file you will be distributing
  • Tracker - A "tracker" helps refer downloaders (peers) to each other; sort of an organizer of nodes
  • Swarm - A group of peers delivering content to a requester (the downloader itself often becomes part of the swarm for other requesters very early in the download process)
  • Seed - A full copy of a file that clients can obtain. This seems like a heavy load on a single server, but very early on clients lessen their load on the Seed systems and depend on other peer clients for chunks of the torrent.

In the scenario below, the Tracker and the Seed for my torrent (the OpenSolaris 08.05 ISO) are one and the same. The Seed should typically be split from the Tracker, though; the Tracker can become a bottleneck and you don't want it pounded on by peers.

Start a Tracker
We will start the tracker first. A tracker can be started as a background task. It will sit on a port and listen for requests and write output to a file.

./bttrack.py --port 6969 --dfile dstate --logfile tracker.out

Your tracker is now up and running and ready to organize clients.
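If you want the tracker to survive your login session, the usual shell idiom applies (nothing BitTorrent-specific here):

nohup ./bttrack.py --port 6969 --dfile dstate --logfile tracker.out &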

Create a Torrent
A "torrent" metadata file contains all of the information about a particular file that can be retrieved through BitTorrent. The torrent is associated with a tracker and when complete will contain a variety of information to ensure the integrity of the final torrent when a client receives it.

The README.txt in the BitTorrent download didn't have proper instructions for building a torrent (it seemed a little out of date), so be careful and don't get frustrated. Use the btmaketorrent.py command to build the torrent file. I've chosen to create a torrent from the OpenSolaris 08.05 distribution (my favorite distribution of all, of course...):

./btmaketorrent.py --comment "OpenSolaris 08.05 from Pauls Swarm" http://x4xxx-01.sun.com:6969/announce os200805.iso

Notice the use of the tracker that I set up. You can show the meta information that was generated by using the btshowmetainfo.py command:

# ./btshowmetainfo.py os200805.iso.torrent
btshowmetainfo 4.0.4 - decode BitTorrent metainfo files

metainfo file.: os200805.iso.torrent
info hash.....: fdf239d2524e44432892d01ab354e20a8b77b7e6
file name.....: os200805.iso
file size.....: 719087616 (2743 * 262144 + 26624)
announce url..: http://x4xxx-01.sun.com:6969/announce

Setting up a Seed
You need a web server available on the system that is going to be a "Seed" (it will serve the .torrent file). Nevada has Apache 2.2 installed by default (at least in my build); you simply have to turn it on (mileage may vary). You can do this by typing:

# svcadm enable apache22
# svcs | grep -i apache2
online 20:41:36 svc:/network/http:apache22

A better instruction set can be found on BigAdmin.

With a little nosing around the system, I found my httpd.conf file at /etc/apache2/2.2. Look in the file for the DocumentRoot; something like this works:

# cat httpd.conf | grep DocumentRoot
# DocumentRoot: The directory out of which you will serve your
DocumentRoot "/var/apache2/2.2/htdocs"

We will place our .torrent file (created earlier) in the DocumentRoot.
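The copy is a plain cp; adjust the source path to wherever you ran btmaketorrent.py:

# cp os200805.iso.torrent /var/apache2/2.2/htdocs/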

With most web servers, you also have to associate the MIME type "application/x-bittorrent" with the file extension "torrent"; this was already done for the pre-installed Apache 2.2 on Nevada.
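For the record, the standard Apache directive for that mapping looks like this (in httpd.conf), should your web server be missing it:

AddType application/x-bittorrent .torrent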

Finally, put the complete file on a server (the same one works, as I am doing here, but a separate server is recommended). Run the BitTorrent download, saving the file into the location where the file already exists. BitTorrent is smart enough to see that all of the chunks are there, but it spends time verifying they are correct. When complete, the computer you ran the download on is a Seed. This command works for the OpenSolaris 08.05 torrent I created earlier:

# ./btdownloadheadless.py --url http://x4xxx-01.sun.com/os200805.iso.torrent --save_as os200805.iso

This takes a while as it runs through hash checks on the existing file.

The torrent is now ready as a Seed for peer to peer access. You can add as many Seeds as you want depending on how popular you think your file is going to be.

Use another Client to Download a File
I can now go to x4xxx-02 and start the headless download, just as the previous one ran.

# ./btdownloadheadless.py --url http://x4xxx-01.sun.com/os200805.iso.torrent --save_as os200805.iso

Remember, the torrent itself points back to the tracker. There should be some brief activity by the tracker during the download as it checks whether other peers in the swarm can help.

During the download, you will become a part of the swarm from which other clients can download chunks of OpenSolaris (you have to give it, if you want to get it - The Suburbs).

As the file starts to download, you will see the transfer rate start to go up. With a single Seed, I started slowly ramping up. BitTorrent is careful to balance requests. Remember, I now have a Seed and a Peer with chunks of the file. With a single download in progress, I reached about 20 KB/sec (the upload from the Seed and the download from the client)...there is some obvious throttling going on somewhere.

What if I start a download of the torrent on x4xxx-03? The Seed remained uploading at around 20 KB/sec. After an initial dip on x4xxx-02 down to about 14 KB/sec, it quickly moved back over 20 KB/sec, while x4xxx-03 peaked near 30 KB/sec. As more chunks moved onto x4xxx-03, x4xxx-02 also sped up, since it could grab chunks from two peers. The Seed remained constant at a 20 KB/sec upload, but x4xxx-02 was now also uploading at 20 KB/sec, and x4xxx-03 could consume the aggregate bandwidth.

But wait, I had two more clients sitting idle, so I started up x4xxx-04 and x4xxx-05 with the download. Again, x4xxx-02 took an initial hit, as it was heavily relying on the original Seed while other clients were relying on it. Within about a minute, the original Seed was still uploading at 20 KB/sec, but -02, -03 and -04 were also uploading at 20 KB/sec. All clients were now downloading at 20 KB/sec with the original Seed still uploading at a constant 20 KB/sec...peer to peer amortization of upload bandwidth at its finest.

As the download moved along, more chunks flooded onto the remaining peers and the sharing became much more efficient. Each of the clients regularly found their way over 25 KB/sec and often into the 30 KB/sec range.

I decided that the headless, text-only output wasn't very fun, so I added my home system (which I VPN in with) to the party using Azureus. With Azureus, I'm able to get graphical displays of what's going on. Here is a picture of my swarm:

Note that the center circle is my client, the fully blue circle is the seed, and you can see the other peers don't yet have a lot of chunks of the file.

Here is another interesting view from Azureus:

Of particular note is the Swarm Speed: we have hit about 100 KB/sec with our 5 peers, with an obvious upload throttle on each client at around 20 KB/sec. Well, I'll obviously be looking into that...but I think I've run out of words for the evening and I've more than achieved my purpose tonight. Enjoy BitTorrent on Solaris!
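A guess on the throttling, to be verified another evening: the mainline Python client has a max_upload_rate setting, and a ceiling of roughly 20 KB/sec per client smells like a default cap. Assuming the version installed here supports the option, something like this should raise it:

./btdownloadheadless.py --max_upload_rate 100 --url http://x4xxx-01.sun.com/os200805.iso.torrent --save_as os200805.iso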

Now, I just have to figure out what other kinds of fun I can have with my swarm :-)

Thursday Jun 26, 2008

Fun with the FMA and SNMP

I've been working back in the Auto Service Request space for the past few weeks, doing some education and such as we build and test this product or that, and it seemed like a great opportunity to put out a quick and dirty "SNMP with the FMA" blog post. There are some excellent previous materials out there that are extremely applicable and that you could probably get by with, so first I will point those out:

With those articles and a little help from your nearby SNMP guru, you are pretty much good to go. I've extended the information in those papers a tiny, tiny bit by showing how to link the information back to Sun's message repository. The great thing about the FMA is that we have "special sauce" around it that can create a full service cycle (including a Remote Shell application that service can use, with your permission and oversight, to further diagnose problems from a remote location, as well as tools to deliver faults back to Sun Microsystems, including our own Common Array Manager).

Introduction


This guide walks a user through a two-system setup:

  • system1 - enabled to send traps when a fault is detected by the FMA
  • system2 - enabled to receive traps from system1

Throughout this write-up, a roughly equivalent FMA command is given that can be run on the originating host so you can follow what the SNMP trap is delivering. My preference is definitely the FMA command on the originating host, since it usually provides additional information and formatting that may not be available in the SNMP trap or walk.

Setup of Trap Sender


system1 must be set up to deliver traps when FMA events occur.

To do this, follow the instructions provided at A Louder Voice for the Fault Manager, summarized here:


  • Create the file /etc/sma/snmp/snmp.conf (if it doesn't exist) and add the line: mibs +ALL
  • Add the line to /etc/sma/snmp/snmpd.conf: trap2sink system2
  • Add one more line to /etc/sma/snmp/snmpd.conf to support deeper information queries: dlmod sunFM /usr/lib/fm/amd64/libfmd_snmp.so.1

Assuming you have made no other changes prior to this, your system should be ready to go. You do have to restart the SNMP service at this point (I always reboot...it's a Windows habit).
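If you'd rather not reboot, restarting the SMA service should pick up the changes. Here is a minimal sketch of the whole sender setup, assuming the stock Solaris 10 file locations and service name:

echo "mibs +ALL" >> /etc/sma/snmp/snmp.conf
echo "trap2sink system2" >> /etc/sma/snmp/snmpd.conf
echo "dlmod sunFM /usr/lib/fm/amd64/libfmd_snmp.so.1" >> /etc/sma/snmp/snmpd.conf
svcadm restart svc:/application/management/sma:default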

Setup of Trap Receiver


system2 must be set up to receive traps; this is simple for demo purposes:

  • Run /usr/sfw/sbin/snmptrapd -P
  • Watch for traps

Receiving FMA traps


When system2 receives FMA traps from system1, they will look like this (formatting rearranged for readability):


DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1190862249) 137 days, 19:57:02.49
SNMPv2-MIB::snmpTrapOID.0 = OID: SUN-FM-MIB::sunFmProblemTrap
SUN-FM-MIB::sunFmProblemUUID."33570acb-e108-4ca8-8276-c67aeecf2043" = STRING: "33570acb-e108-4ca8-8276-c67aeecf2043"
SUN-FM-MIB::sunFmProblemCode."33570acb-e108-4ca8-8276-c67aeecf2043" = STRING: ZFS-8000-CS
SUN-FM-MIB::sunFmProblemURL."33570acb-e108-4ca8-8276-c67aeecf2043" = STRING: http://sun.com/msg/ZFS-8000-CS

Check out that URL in the sunFmProblemURL, http://sun.com/msg/ZFS-8000-CS. You can actually go there and get the System Administrator Actions that should be taken as well as an extended view of the problem (not contextualized to your view, but this is the general fault information that you would see if you were on the system itself).

This trap is roughly equivalent to the information you would receive from running the basic fmdump command on the system with the fault. You could also run the "fmdump -v -u 33570acb-e108-4ca8-8276-c67aeecf2043" command on the trap originator to get a bit more information:

bash-3.00# fmdump -v -u 33570acb-e108-4ca8-8276-c67aeecf2043
TIME UUID SUNW-MSG-ID
Jun 26 10:57:45.3602 33570acb-e108-4ca8-8276-c67aeecf2043 ZFS-8000-CS
100% fault.fs.zfs.pool

Problem in: zfs://pool=data
Affects: zfs://pool=data
FRU: -
Location: -

Navigating Problems on the Remote System

Let's start at the top now. We can quickly navigate all known, unresolved problems on the remote system by walking the sunFmProblemTable:

system2:/root-> /usr/sfw/bin/snmpwalk -v2c -c public -t 20 system1 sunFmProblemTable

This results in a dump of all problems, with the UUIDs that can be used for deeper queries. The following printout shows only a single problem on the system; additional problems would be listed in blocks by attribute (so all sunFmProblemUUIDs are lumped together, followed by all ProblemCodes, and so on).

SUN-FM-MIB::sunFmProblemUUID."0f3dcdf3-f85b-c091-8f1f-ce2164976cda" = STRING: "0f3dcdf3-f85b-c091-8f1f-ce2164976cda"
SUN-FM-MIB::sunFmProblemCode."0f3dcdf3-f85b-c091-8f1f-ce2164976cda" = STRING: ZFS-8000-CS
SUN-FM-MIB::sunFmProblemURL."0f3dcdf3-f85b-c091-8f1f-ce2164976cda" = STRING: http://sun.com/msg/ZFS-8000-CS
SUN-FM-MIB::sunFmProblemDiagEngine."0f3dcdf3-f85b-c091-8f1f-ce2164976cda" = STRING: fmd:///module/fmd
SUN-FM-MIB::sunFmProblemDiagTime."0f3dcdf3-f85b-c091-8f1f-ce2164976cda" = STRING: 2008-6-18,10:7:4.0,-6:0
SUN-FM-MIB::sunFmProblemSuspectCount."0f3dcdf3-f85b-c091-8f1f-ce2164976cda" = Gauge32: 1

This is roughly equivalent to the fmdump command on a system, though with the basic fmdump only the UUID, Code and a MSG-ID are given.
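For reference, basic fmdump prints one line per problem under that same header; for the pool fault discussed here it would look roughly like this (a sketch assembled from the UUID and code shown earlier, not a fresh capture):

bash-3.00# fmdump
TIME                 UUID                                 SUNW-MSG-ID
Jun 26 10:57:45.3602 33570acb-e108-4ca8-8276-c67aeecf2043 ZFS-8000-CS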

Based on this information, we can look at the ZFS-8000-CS message at sun.com and determine what our next steps should be. It indicates that running zpool status -x on the system with the fault would be useful. Going to the originating system and running it returns:


bash-3.00# zpool status -x
pool: data
state: UNAVAIL
status: One or more devices could not be opened. There are insufficient
replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
see: http://www.sun.com/msg/ZFS-8000-D3
scrub: none requested
config:

NAME STATE READ WRITE CKSUM
data UNAVAIL 0 0 0 insufficient replicas
mirror UNAVAIL 0 0 0 insufficient replicas
c2t0d0 UNAVAIL 0 0 0 cannot open
c3t1d0 UNAVAIL 0 0 0 cannot open
mirror UNAVAIL 0 0 0 insufficient replicas
c2t2d0 UNAVAIL 0 0 0 cannot open
c3t3d0 UNAVAIL 0 0 0 cannot open
mirror UNAVAIL 0 0 0 insufficient replicas
c2t4d0 UNAVAIL 0 0 0 cannot open
c3t5d0 UNAVAIL 0 0 0 cannot open
mirror UNAVAIL 0 0 0 insufficient replicas
c2t6d0 UNAVAIL 0 0 0 cannot open
c3t7d0 UNAVAIL 0 0 0 cannot open
mirror UNAVAIL 0 0 0 insufficient replicas
c2t8d0 UNAVAIL 0 0 0 cannot open
c3t9d0 UNAVAIL 0 0 0 cannot open
mirror UNAVAIL 0 0 0 insufficient replicas
c2t10d0 UNAVAIL 0 0 0 cannot open
c3t11d0 UNAVAIL 0 0 0 cannot open
mirror UNAVAIL 0 0 0 insufficient replicas
c2t12d0 UNAVAIL 0 0 0 cannot open
c3t13d0 UNAVAIL 0 0 0 cannot open
mirror UNAVAIL 0 0 0 insufficient replicas
c2t14d0 UNAVAIL 0 0 0 cannot open
c3t15d0 UNAVAIL 0 0 0 cannot open
mirror UNAVAIL 0 0 0 insufficient replicas
c2t16d0 UNAVAIL 0 0 0 cannot open
c3t17d0 UNAVAIL 0 0 0 cannot open
mirror UNAVAIL 0 0 0 insufficient replicas
c2t18d0 UNAVAIL 0 0 0 cannot open
c3t19d0 UNAVAIL 0 0 0 cannot open
mirror UNAVAIL 0 0 0 insufficient replicas
c2t20d0 UNAVAIL 0 0 0 cannot open
c3t21d0 UNAVAIL 0 0 0 cannot open
mirror UNAVAIL 0 0 0 insufficient replicas
c2t22d0 UNAVAIL 0 0 0 cannot open
c3t23d0 UNAVAIL 0 0 0 cannot open
bash-3.00#

For some history on this particular problem: we disconnected the JBOD that the "data" pool was built on, so none of the devices are available...ouch.

You can look more deeply at the events that resulted in the problem by walking the sunFmFaultEventTable:

system2:/root-> /usr/sfw/bin/snmpwalk -v2c -c public -t 20 system1 sunFmFaultEventTable
SUN-FM-MIB::sunFmFaultEventProblemUUID."0f3dcdf3-f85b-c091-8f1f-ce2164976cda".1 = STRING: "0f3dcdf3-f85b-c091-8f1f-ce2164976cda"
SUN-FM-MIB::sunFmFaultEventProblemUUID."33570acb-e108-4ca8-8276-c67aeecf2043".1 = STRING: "33570acb-e108-4ca8-8276-c67aeecf2043"
SUN-FM-MIB::sunFmFaultEventProblemUUID."3600a05e-acc1-cae2-c185-f50852156777".1 = STRING: "3600a05e-acc1-cae2-c185-f50852156777"
SUN-FM-MIB::sunFmFaultEventProblemUUID."97bfeb63-7b02-c2b6-c51f-c451a9f760c5".1 = STRING: "97bfeb63-7b02-c2b6-c51f-c451a9f760c5"
SUN-FM-MIB::sunFmFaultEventClass."0f3dcdf3-f85b-c091-8f1f-ce2164976cda".1 = STRING: fault.fs.zfs.pool
SUN-FM-MIB::sunFmFaultEventClass."33570acb-e108-4ca8-8276-c67aeecf2043".1 = STRING: fault.fs.zfs.pool
SUN-FM-MIB::sunFmFaultEventClass."3600a05e-acc1-cae2-c185-f50852156777".1 = STRING: fault.fs.zfs.pool
SUN-FM-MIB::sunFmFaultEventClass."97bfeb63-7b02-c2b6-c51f-c451a9f760c5".1 = STRING: fault.fs.zfs.pool
SUN-FM-MIB::sunFmFaultEventCertainty."0f3dcdf3-f85b-c091-8f1f-ce2164976cda".1 = Gauge32: 100
SUN-FM-MIB::sunFmFaultEventCertainty."33570acb-e108-4ca8-8276-c67aeecf2043".1 = Gauge32: 100
SUN-FM-MIB::sunFmFaultEventCertainty."3600a05e-acc1-cae2-c185-f50852156777".1 = Gauge32: 100
SUN-FM-MIB::sunFmFaultEventCertainty."97bfeb63-7b02-c2b6-c51f-c451a9f760c5".1 = Gauge32: 100
SUN-FM-MIB::sunFmFaultEventASRU."0f3dcdf3-f85b-c091-8f1f-ce2164976cda".1 = STRING: zfs://pool=1ca09fa50e7ca8c7
SUN-FM-MIB::sunFmFaultEventASRU."33570acb-e108-4ca8-8276-c67aeecf2043".1 = STRING: zfs://pool=data
SUN-FM-MIB::sunFmFaultEventASRU."3600a05e-acc1-cae2-c185-f50852156777".1 = STRING: zfs://pool=6f658a6c4b99b18b
SUN-FM-MIB::sunFmFaultEventASRU."97bfeb63-7b02-c2b6-c51f-c451a9f760c5".1 = STRING: zfs://pool=data
SUN-FM-MIB::sunFmFaultEventFRU."0f3dcdf3-f85b-c091-8f1f-ce2164976cda".1 = STRING: -
SUN-FM-MIB::sunFmFaultEventFRU."33570acb-e108-4ca8-8276-c67aeecf2043".1 = STRING: -
SUN-FM-MIB::sunFmFaultEventFRU."3600a05e-acc1-cae2-c185-f50852156777".1 = STRING: -
SUN-FM-MIB::sunFmFaultEventFRU."97bfeb63-7b02-c2b6-c51f-c451a9f760c5".1 = STRING: -
SUN-FM-MIB::sunFmFaultEventResource."0f3dcdf3-f85b-c091-8f1f-ce2164976cda".1 = STRING: zfs://pool=1ca09fa50e7ca8c7
SUN-FM-MIB::sunFmFaultEventResource."33570acb-e108-4ca8-8276-c67aeecf2043".1 = STRING: zfs://pool=data
SUN-FM-MIB::sunFmFaultEventResource."3600a05e-acc1-cae2-c185-f50852156777".1 = STRING: zfs://pool=6f658a6c4b99b18b
SUN-FM-MIB::sunFmFaultEventResource."97bfeb63-7b02-c2b6-c51f-c451a9f760c5".1 = STRING: zfs://pool=data

This is roughly equivalent to the fmdump -v command. The fmdump -V command cannot be duplicated over SNMP, though it can be useful to run on the host side. fmdump -V can provide product, chassis and server IDs as well as a more complete list of faults and ereports that resulted in the diagnosis.

You could also view the fault management information by resource, rather than by fault or event, over SNMP:

system2:/root-> /usr/sfw/bin/snmpwalk -v2c -c public -t 20 system1 sunFmResourceTable
SUN-FM-MIB::sunFmResourceFMRI.1 = STRING: zfs://pool=data
SUN-FM-MIB::sunFmResourceStatus.1 = INTEGER: faulted(5)
SUN-FM-MIB::sunFmResourceDiagnosisUUID.1 = STRING: "33570acb-e108-4ca8-8276-c67aeecf2043"

This is similar to the "fmadm faulty" command that can be run on a system. The faulty parameter results in some additional information and text, though that text can also be retrieved at the event URL identified earlier.
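For comparison, here is roughly what "fmadm faulty" prints for the same pool fault; the exact layout varies by Solaris release, so treat this as illustrative only:

bash-3.00# fmadm faulty
   STATE RESOURCE / UUID
-------- ----------------------------------------------------------------------
 faulted zfs://pool=data
         33570acb-e108-4ca8-8276-c67aeecf2043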

While this is the "SPAM" approach to inquiring about a system, you could also walk each problem as they come in.

As was mentioned previously, there isn't a "ton" of information available within the SNMP MIB itself. The result of receiving a particular trap is often to do additional diagnosis on the system with the fault, so some ssh work may be necessary.

Additional things to do with FMA


You can dump the topology of a system!

Go to /usr/lib/fm/fmd and type "./fmtopo"; you should get the complete topology as it is recognized by the fault management infrastructure.


bash-3.00# ./fmtopo
TIME UUID
Jun 26 13:20:16 ddba8792-0166-6fb4-a81b-db3de9622649

hc://:product-id=Sun-Fire-X4150:chassis-id=09999:server-id=server1/motherboard=0

hc://:product-id=Sun-Fire-X4150:chassis-id=09999:server-id=server1/motherboard=0/chip=0

hc://:product-id=Sun-Fire-X4150:chassis-id=09999:server-id=server1/motherboard=0/chip=0/cpu=0

hc://:product-id=Sun-Fire-X4150:chassis-id=09999:server-id=server1/motherboard=0/chip=0/cpu=1

hc://:product-id=Sun-Fire-X4150:chassis-id=0811QAR189:server-id=server1/motherboard=0/chip=0/cpu=2

hc://:product-id=Sun-Fire-X4150:chassis-id=09999:server-id=server1/motherboard=0/chip=0/cpu=3

hc://:product-id=Sun-Fire-X4150:chassis-id=09999:server-id=server1/motherboard=0/memory-controller=0

hc://:product-id=Sun-Fire-X4150:chassis-id=09999:server-id=server1/motherboard=0/memory-controller=0/dram-channel=0

hc://:product-id=Sun-Fire-X4150:chassis-id=09999:server-id=server1:serial=518545073102016426:part=7777777:revision/motherboard=0/memory-controller=0/dram-channel=0/dimm=0

hc://:product-id=Sun-Fire-X4150:chassis-id=09999:server-id=server1:serial=518545073102016426:part=7777777:revision/motherboard=0/memory-controller=0/dram-channel=0/dimm=0/rank=0

hc://:product-id=Sun-Fire-X4150:chassis-id=09999:server-id=server1:serial=518545073102016426:part=7777777:revision/motherboard=0/memory-controller=0/dram-channel=0/dimm=0/rank=1

hc://:product-id=Sun-Fire-X4150:chassis-id=09999:server-id=server1/motherboard=0/memory-controller=0/dram-channel=1

hc://:product-id=Sun-Fire-X4150:chassis-id=09999:server-id=server1:serial=555555555:part=777777777:revision/motherboard=0/memory-controller=0/dram-channel=1/dimm=0

... (and on and on)

Artificial Fault Injection


I haven't found a lot of documentation for fminject, but it looks like it can be quite useful. At its easiest, it allows you to replay a set of ereports and faults that previously occurred on a system. For example, we can replay the error log that resulted in the FMA trap for the ZFS pool problem above. In one window on the system that sent the trap, start up the simulator:

cd /usr/lib/fm/fmd
./fmsim -i

In another window, replay an error log:

cd /usr/lib/fm/fmd
./fminject -v /var/fm/fmd/errlog

You will see the SNMP trap re-appear in the trap receiver. Note that I've found no good way to tell, in the trap receiver's window, that a trap is replayed rather than new, so use this with caution.

Summary


While there isn't enough information in an SNMP trap to completely diagnose a problem, the traps are a great opportunity for integration with higher-level management tools. More often than not, additional commands must be run on the system with the fault (to gather logs or the output of various commands, especially in the ZFS space), but the messages themselves typically point at the next steps for diagnosis.

Friday Jun 20, 2008

When the clouds disappear...not always a sunny day

I build, maintain and pay for hosting for my friend's charity web site, Play for a Heart. For the past several years I've had a host provider where I would deploy the very simple web site I constructed for her. The host had become my "cloud" in the stormy skies of Cloud Computing.

This cloud kept drifting along, the price was right, it was somewhat reliable, it was ... easy.

I went to the web site on a Saturday morning and it was gone. Somewhere over the course of the previous few months, the company had turned into a shell of the host provider it formerly was. There was no tech response over the weekend and the forums were more or less ghost towns. I made a post, and the only responses were "Leave this host as fast as you can". The outage continued through the weekend and extended to a large percentage of the host's own sales and marketing sites.

Finally, late Sunday, I couldn't take it anymore; I pulled the plug and moved to GoDaddy. The move had nothing to do with Danica Patrick. Here was my very simple logic:


  • Java Support / PHP Support / MySQL Support
  • Able to host multiple domains with a single hosting account
  • Resilient enough to support the onslaught after a Super Bowl commercial
  • Good recommendations and community
  • Great price

And I switched...

Lucky for me I made my friend's site as basic as I could.


  • Very simple HTML
  • Slideshow objects embedded from another Cloud Application (SmugMug), so the links moved right over (I only added these after she had enough sponsors that I couldn't figure out how to keep the sponsor page clean otherwise)
  • No applications embedded directly from my low-budget hosting solution
  • Domain Names purchased from Yahoo and GoDaddy (this became a HUGE win as the host records would have been locked had I purchased the domain names with this smaller company)

My web site was moved to GoDaddy and was back up and running in, literally, 1.5 hours, completing my exit from my previous host.

Still, the entire experience left me very shaken over Cloud Computing. I've come up with the following set of thoughts when it comes to attaching my digital life to the clouds:


  • Many companies literally own the information you create or have very liberal rights to that information (Always read the License Agreements)
  • Many companies have no exit strategy for your information (once your data is captured or created in the online application it cannot be extracted...this is especially the case in social networking infrastructure)
  • Because of the nature of the cloud, you have no guarantees that many companies that host the applications you depend on are even viable (check the business model and financials if you are tying your life to the cloud)
  • Several companies' entire business models are centered on analyzing the information you give them and monetizing it (in many cases "personalization" in your eyes is really "targeting and demographics" in the company's eyes)
  • Standard Platforms, Standard APIs, and Standard Information Models are at the heart of a successful cloud (not fog); they give you a better chance of having tools to import / export information and interact with that information. If you are building an application, it is critical that these standards are the core of the application, so that you can get out of your cloud as quickly as you got into it, at the first sign of turbulence
  • In lieu of standards (even de facto ones), Open APIs, Open Source, Open Architecture, and a good open license can be a huge help...especially with a robust developer community. This assures infrastructure can live beyond the life of any individual cloud should it disappear.
  • Buyer beware

Well...I'm a huge, huge advocate of Cloud Computing. Our own network.com site, Amazon EC2 and S3, SalesForce.com, they are all leading the wave...and in 10 years it will just be the way it is done. But until then, there will certainly be some growing pains, probably more in the small-company space than in the enterprise space...but they will be there.

Standards groups, de facto standards, open source cloud infrastructure, entry and exit paths for information, information licensing, information security, and transparent motivation for collecting your information will go a long way toward determining whether there is longevity in the sunny skies of cloud computing.

Thursday May 29, 2008

Preservation and Archive Special Interest Group (PASIG)

I am in San Francisco at Sun's Preservation and Archive Special Interest Group (PASIG) Face to Face meeting. Preservation and Archiving is one of the most fascinating and complex problems to face our digital world, in my humble opinion. I've chatted about this before in the blog so I won't belabor the point that taking a chisel to the walls of this beautiful location and etching in the contents of your e-book would have a longer life span than most of our media types in use today.

Still, chiseling has some pretty serious downsides


  • It doesn't scale well (oh, and you have to scale to get up the wall)
  • You have to travel to this site to view the contents (while that might be pleasant, it can be quite inefficient)
  • It's difficult to create a geographically remote backup site
  • It seriously blemishes a beautiful location

It turns out there is a community of people who think about this problem as their day job, and PASIG is a gathering point for that community. The community is made up of the "new" digital librarians, the software providers that build the foundation for digital preservation and archives, the storage vendors that provide "temporal" storage locations for the digital archive, and the system integrators that customize the entire stack and tailor it for a particular domain (astronomy, human sciences, etc...).

The community is global, but there is an obvious disparity in how this type of work is funded across countries and boundaries. Here in the U.S., the leaders of preservation and archiving efforts appear to be the education community, with a variety of colleges (Penn State, University of Pittsburgh (Pennsylvania), etc...) and individual government agencies (Library of Congress, National Archives, etc...). Abroad, it seems the repositories are often nationally funded.

While various technologies were interesting, it was also interesting to hear the oft-recurring theme of "sustainability": how can we "sustain" the rate of ingest and the cost over time of these types of solutions? Consider the cost of the initial storage of 1 GB of data today (just the storage medium): it's between $0.25 and $1.00, depending on your medium. Now add in the cost of the various transitions that data makes over 20,000 years (the life of a cave painting): the system administration, the building of new equipment, the power involved, etc... (at one migration every four years, that's on the order of 5,000 migrations). At this stage of human evolution, it is difficult to even predict what type of media we will store data on in 20 or 30 years. How can one predict the cost of sustaining these archives, when we won't know what the archive will look like, let alone the environment around it, in 20 years, 100 years, or even 1,000 years?

Which brings up another interesting thread: whereas a cave painting is required to stay on the rock it originated on, preserved (and archived) digital data must be assumed to migrate across generations of media. Consider each generation of media to be about four years (3-5 is common). Tape can technically survive multiple generations, and one can build systems (like the Sun Storage 5800 (Honeycomb) platform) that can theoretically store data for hundreds of thousands of years. But the reality of technology and media lifetimes is that you migrate approximately every 3-5 years.

This is a conceptually different model than many people have in mind when they think of an "archive" or a "storage vault". Rather than a monolithic system that weathers the years, it's important to think of data as migratory and evolutionary (much like humans themselves). To survive 30,000 years, our digital data originates...often in the physical world, though increasingly in the digital world itself...and migrates to a Tier 1 storage device. Regardless of whether it remains there, over the next 20 years that data will move across four or more devices (not counting replicas or backups) if it stays "online".

So what are the three most important attributes of an archive and/or preservation system (IMHO)?


  • The information is safe, stable, and can be verified to be "original" while it resides temporarily within the system
  • The information can be moved off the system when the time is right and can still be certified as "original" data rather than a derivation of the data
  • Information can be "moved into" a new preservation / archive system.

In the end, long-term data is migratory...and the systems and software that support this field must start treating data that way. This type of thinking can also be applied backwards into more traditional storage paradigms: how useful would it be for a photo repository to be able to migrate to bigger and bigger systems with higher capacity and lower power consumption? At some point, a trade-off will trigger that company to move, and facilitating that move is in the best interest of all companies.

What else came up at the conference? Tons of things. Have we thought enough about:


  • Access Rights (and access across time)
  • Roles and Responsibilities in a digital archiving and library maintenance system (the new librarians)
  • Geo-political boundaries as repositories move into the "cloud"

At this very minute, Sam Gustman is discussing The Shoah Foundation. The mission of The Shoah Foundation is "to overcome prejudice, intolerance and bigotry - and the suffering they cause - through the educational use of the Institute's visual history testimonies." The documentation of The Holocaust contains over 120,000 hours of online video testimony. The foundation indexes this testimony minute by minute and makes it accessible for people to teach and understand the impacts of The Holocaust. The foundation is also moving beyond The Holocaust into testimonies of other events in human history that must be recorded and understood. As George Santayana said: "Those who do not learn from history are doomed to repeat it."

Sam's presentation isn't online; I'm trying to get Art to post it. This work alone shows that digital archiving and preservation is not just an interesting segment of compute and storage...it's a moral imperative.

I found a link to a video that discusses Fedora Commons, one of the open source repository software initiatives for creating and managing digital libraries and archives. The video discusses some of the solutions being built with the Fedora Commons software.

What a great opportunity the conference has been to sit and think about the big picture (and archiving and preservation truly are the big picture).
