By rickramsey on Mar 28, 2013
-guest post by Brian Zents-
Historically, there has been a perception that tape is more difficult to manage than disk, but why is that? Fundamentally, disk and tape are different. Tape is a removable storage medium, while disk is always powered on and spinning. Because tape is removable, one piece of tape media may interact with many tape drives, so when an error occurred, customers historically wondered whether the drive or the media was at fault. A disk system has no removable media: if there is an error, you know exactly which disk was at risk and what corrective action to take.
However, times have changed. With the release of Oracle's StorageTek Tape Analytics (STA), you are no longer left wondering whether the drive or the media is at risk, because STA does the analysis for you and provides proactive recommendations and corrective actions, just like disk.
For those unfamiliar with STA, it is an intelligent monitoring application for Oracle tape libraries. STA helps users make informed decisions about future tape storage investments based on how their libraries are currently being used, and it also monitors the health of the tape library environment. It works regardless of the drive and media types in the library, and whether the libraries are in an open systems or mainframe environment.
STA uses a browser-based user interface that offers a variety of screens, each displaying tables and graphs that can be sorted or filtered. To start understanding errors, and whether drive and media errors are correlated, you would click the Drives screen to check the health of the drives in a library.
In the Drives screen, it is clear that one specific drive has many more errors than the system average.
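STA makes this comparison for you in the Drives screen. To make the idea concrete, here is a minimal Python sketch of flagging a drive whose error count stands well above the average across all drives. It does not use any STA interface; the drive IDs, the error counts, and the `factor=2.0` threshold are hypothetical values chosen for illustration.

```python
from statistics import mean


def flag_outlier_drives(errors_by_drive: dict[str, int], factor: float = 2.0) -> list[str]:
    """Return IDs of drives whose error count exceeds `factor` times the average.

    `errors_by_drive` maps a (hypothetical) drive ID to its error count.
    """
    if not errors_by_drive:
        return []
    average = mean(errors_by_drive.values())
    # Keep only drives well above the system-wide average.
    return sorted(d for d, n in errors_by_drive.items() if n > factor * average)


counts = {"drive-01": 2, "drive-02": 1, "drive-03": 40, "drive-04": 3}
print(flag_outlier_drives(counts))
```

Note that a single very bad drive also raises the average it is compared against, which is why the threshold is a multiple of the mean rather than a fixed count.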
Next, you would click the Media screen.
The Media screen helps you quickly identify problematic media. But how do you know whether there is a relationship between the two types of errors? STA tracks library exchanges, which is convenient because each exchange involves exactly one drive and one piece of media. So you can easily filter the screen results to focus on just the exchanges involving the problematic drive.
You can then sort the resulting table by whether each exchange succeeded and review the errors to see whether the problematic media and the drive are related. You may also want to review the drive's exchanges to see whether the media having issues share anything with other problem media. For example, all the cartridges from one purchased pack of media could be having similar problems.
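The filter-and-sort step described above can be sketched in a few lines of Python. This only illustrates the reasoning and is not STA's implementation or API: the `Exchange` record, its fields (including the `pack` purchase-lot identifier), and the helper names are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Exchange:
    """One library exchange: one drive, one cartridge (hypothetical record)."""
    drive: str
    media: str       # cartridge volume serial
    succeeded: bool
    pack: str        # hypothetical purchase-lot identifier


def exchanges_for_drive(exchanges, drive):
    """Keep one drive's exchanges, failed ones first (False sorts before True)."""
    return sorted((e for e in exchanges if e.drive == drive), key=lambda e: e.succeeded)


def failures_by_pack(exchanges):
    """Count failed exchanges per media pack to spot a bad batch."""
    return Counter(e.pack for e in exchanges if not e.succeeded)


log = [
    Exchange("D1", "VOL001", True, "A"),
    Exchange("D1", "VOL002", False, "B"),
    Exchange("D2", "VOL003", False, "B"),
    Exchange("D1", "VOL004", False, "B"),
]
d1 = exchanges_for_drive(log, "D1")
print([e.media for e in d1], failures_by_pack(d1))
```

In this sample log, both of drive `D1`'s failures involve cartridges from pack `B`, which would prompt a closer look at that batch of media.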
What if there doesn't appear to be a relationship between media and drive errors? Part of the ingenuity of STA is that just about everything is linked, so root causes are easy to trace. First, you can look at an individual drive's screen to see its recent behavior.
From that drive's table you can see that this particular drive was healthy until recently. The drive indicated it needed cleaning, and somebody performed the cleaning. However, just a few exchanges later, it started reporting errors. In this case, the drive clearly has an issue that goes beyond its relationship with a specific piece of media, and it should be taken offline. On the other hand, if the issue appears to be related to the media itself, you should find a way to transfer the data off the media and then replace the media.
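One simple way to express the drive-versus-media judgment above is a rule of thumb: a drive that fails with many different cartridges is the likely culprit, while a cartridge that fails in many different drives is the likely culprit. Below is a minimal Python sketch of that heuristic, assuming you already have the (drive, media) pairs of failed exchanges. It is an illustrative assumption, not STA's actual analysis, and the `min_partners` threshold is hypothetical.

```python
from collections import defaultdict


def suspects(failures, min_partners=2):
    """Split failed exchanges into suspect drives and suspect media.

    `failures` is an iterable of (drive, media) pairs from failed exchanges.
    A drive failing with at least `min_partners` distinct cartridges is a
    drive suspect; a cartridge failing in at least `min_partners` distinct
    drives is a media suspect.
    """
    media_per_drive = defaultdict(set)
    drives_per_media = defaultdict(set)
    for drive, media in failures:
        media_per_drive[drive].add(media)
        drives_per_media[media].add(drive)
    bad_drives = sorted(d for d, m in media_per_drive.items() if len(m) >= min_partners)
    bad_media = sorted(m for m, d in drives_per_media.items() if len(d) >= min_partners)
    return bad_drives, bad_media


failed = [("D3", "VOL010"), ("D3", "VOL011"), ("D3", "VOL012"),
          ("D1", "VOL020"), ("D2", "VOL020")]
print(suspects(failed))
```

Here drive `D3` fails with three different cartridges, so it looks like a drive problem to take offline, while `VOL020` fails in two different drives, so its data would be migrated and the cartridge replaced.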
- Brian Zents