Monday May 26, 2014

Validating Petabytes of Data with Regularity and Thoroughness

by Brian Zents

When former Intel CEO Andy Grove said “only the paranoid survive,” he wasn’t necessarily talking about tape storage administrators, but it’s a lesson they’ve learned well. After all, tape storage is the last line of defense to prevent data loss, so tape administrators are extra cautious in making sure their data is secure. Not surprisingly, we are often asked for ways to validate tape media and the files on them.

In the past, an administrator could validate the media, but doing so was often tedious, disruptive, or both. The debut of the Data Integrity Validation (DIV) and Library Media Validation (LMV) features in the Oracle T10000C drive helped eliminate many of these pains. Also available with the Oracle T10000D drive, these features use hardware-assisted CRC checks that not only help ensure the data is written correctly the first time, but also make the check itself much more efficient.

Traditionally, a CRC check takes at least 25 seconds per 4GB file with a 2:1 compression ratio, but the T10000C/D drives can reduce the check to a maximum of nine seconds because the entire check is contained within the drive. No data needs to be sent to a host application. A time savings of at least 64 percent is extremely beneficial over the course of checking an entire 8.5TB T10000D tape.
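To put the 64 percent figure in context, here is a back-of-the-envelope calculation in Python. It is a sketch under stated assumptions, not a measurement: it assumes the tape is filled entirely with 4 GB files, uses decimal units (8.5 TB = 8,500 GB), and ignores how compression changes the number of files that fit on a cartridge.

```python
# Back-of-the-envelope estimate, not a measurement.
HOST_CHECK_S = 25   # minimum seconds per 4 GB file, host-based CRC check
DRIVE_CHECK_S = 9   # maximum seconds per 4 GB file, check inside T10000C/D
FILE_GB = 4
TAPE_GB = 8_500     # 8.5 TB T10000D cartridge, decimal units (assumption)

def savings_percent(before_s: float, after_s: float) -> float:
    """Percentage of time saved when a check drops from before_s to after_s."""
    return (before_s - after_s) / before_s * 100

# Assumes the tape holds nothing but 4 GB files and ignores compression.
files_per_tape = TAPE_GB // FILE_GB
host_hours = files_per_tape * HOST_CHECK_S / 3600
drive_hours = files_per_tape * DRIVE_CHECK_S / 3600

print(f"time saved: at least {savings_percent(HOST_CHECK_S, DRIVE_CHECK_S):.0f}%")
print(f"{files_per_tape} files: host {host_hours:.1f} h, drive {drive_hours:.1f} h")
```

Under those assumptions, checking a full tape drops from roughly 14.8 hours to about 5.3 hours.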

While the DIV and LMV features are better than anything else out there, what storage administrators really need is a way to check petabytes of data with regularity and thoroughness. With the launch of Oracle StorageTek Tape Analytics (STA) 2.0 in April, there is finally a solution that addresses this longstanding need. STA bundles these features into one interface to automate all media validation activities across all Oracle SL3000 and SL8500 tape libraries in an environment. And best of all, the validation process can be tied to the health checks an administrator would already be doing through STA.

STA can select media for validation based on any of the following policies:

  • Random Selection – Randomly selects media for validation whenever a validation drive in the standalone library or library complex is available.
  • Media Health = Action – Selects media that have had a specified number of successive exchanges resulting in an Exchange Media Health of “Action.” You can specify from one to five exchanges.
  • Media Health = Evaluate – Selects media that have had a specified number of successive exchanges resulting in an Exchange Media Health of “Evaluate.” You can specify from one to five exchanges.
  • Media Health = Monitor – Selects media that have had a specified number of successive exchanges resulting in an Exchange Media Health of “Monitor.” You can specify from one to five exchanges.
  • Extended Period of Non-Use – Selects media that have not had an exchange for a specified number of days. You can specify from 365 to 1,095 days (one to three years).
  • Newly Entered – Selects media that have recently been entered into the library.
  • Bad MIR Detected – Selects media with an exchange resulting in a “Bad MIR Detected” error. A bad media information record (MIR) indicates degraded high-speed access on the media.
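STA is configured through its own interface, not through code, but the selection rules above can be modeled to make them concrete. The following Python sketch is hypothetical: the TapeStatus type, its field names, and the reading of "successive exchanges" as the most recent consecutive exchanges are assumptions for illustration, not STA's data model or API.

```python
# Hypothetical model of the STA selection policies; not STA's data model or API.
import random
from dataclasses import dataclass, field

HEALTH_LEVELS = ("Action", "Evaluate", "Monitor")

@dataclass
class TapeStatus:
    volser: str                                               # tape volume serial
    exchange_health: list[str] = field(default_factory=list)  # oldest first
    days_since_exchange: int = 0
    newly_entered: bool = False
    bad_mir_detected: bool = False

def successive_health(tape: TapeStatus, level: str, count: int) -> bool:
    """True if each of the last `count` exchanges (1 to 5) reported `level`."""
    if not 1 <= count <= 5:
        raise ValueError("count must be between 1 and 5")
    recent = tape.exchange_health[-count:]
    return len(recent) == count and all(h == level for h in recent)

def extended_non_use(tape: TapeStatus, days: int) -> bool:
    """True if the tape has had no exchange for at least `days` (365 to 1,095)."""
    if not 365 <= days <= 1095:
        raise ValueError("days must be between 365 and 1,095")
    return tape.days_since_exchange >= days

def matching_policies(tape: TapeStatus, health_count: int = 2,
                      idle_days: int = 365) -> list[str]:
    """Names of the policies, other than Random Selection, that select this tape."""
    matches = [f"Media Health = {level}" for level in HEALTH_LEVELS
               if successive_health(tape, level, health_count)]
    if extended_non_use(tape, idle_days):
        matches.append("Extended Period of Non-Use")
    if tape.newly_entered:
        matches.append("Newly Entered")
    if tape.bad_mir_detected:
        matches.append("Bad MIR Detected")
    return matches

def random_selection(tapes: list[TapeStatus], rng: random.Random) -> TapeStatus:
    """Random Selection: pick any tape when a validation drive is free."""
    return rng.choice(tapes)
```

For example, a tape whose last two exchanges both reported "Evaluate" matches only the Media Health = Evaluate policy under the default count of two.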

To avoid disrupting host operations, an administrator designates certain drives for media validation. If a host requests a file from media currently being validated, the host’s request takes priority. To help the administrator confirm that it is the media that is bad, as opposed to the drive, STA includes drive calibration and qualification features. In addition, validation requests can be re-prioritized or cancelled as needed. To ensure that a specific tape isn’t validated too often, STA prevents the policies described above from validating the same tape more than once within 24 hours. A tape can be validated more often if the administrator initiates the validation manually.
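The scheduling rules in this paragraph can likewise be sketched. In this hypothetical Python model (ValidationScheduler and its methods are illustrative names, not part of STA), policy requests for a tape are refused within 24 hours of its last completed validation, manual requests bypass that limit, queued requests can be re-prioritized or cancelled, and a host request abandons any validation in progress on that tape. Whether STA measures the 24 hours from the start or the end of a validation is not stated; the sketch assumes the end.

```python
# Hypothetical scheduling model; class and method names are not STA's API.
from datetime import datetime, timedelta
from typing import Optional

POLICY_COOLDOWN = timedelta(hours=24)

class ValidationScheduler:
    def __init__(self) -> None:
        self.last_validated: dict[str, datetime] = {}  # volser -> completion time
        self.queue: list[str] = []                     # waiting for a validation drive
        self.in_progress: set[str] = set()

    def request(self, volser: str, now: datetime, manual: bool = False) -> bool:
        """Queue a validation; policy requests within 24 hours are refused."""
        last = self.last_validated.get(volser)
        if not manual and last is not None and now - last < POLICY_COOLDOWN:
            return False
        if volser not in self.queue:
            self.queue.append(volser)
        return True

    def prioritize(self, volser: str) -> None:
        """Move a queued request to the front of the queue."""
        if volser in self.queue:
            self.queue.remove(volser)
            self.queue.insert(0, volser)

    def cancel(self, volser: str) -> None:
        if volser in self.queue:
            self.queue.remove(volser)

    def start_next(self) -> Optional[str]:
        """Mount the next queued tape on a free validation drive."""
        if not self.queue:
            return None
        volser = self.queue.pop(0)
        self.in_progress.add(volser)
        return volser

    def complete(self, volser: str, now: datetime) -> None:
        self.in_progress.discard(volser)
        self.last_validated[volser] = now

    def host_request(self, volser: str) -> bool:
        """A host needs this tape: abandon any validation in progress."""
        if volser in self.in_progress:
            self.in_progress.discard(volser)
            return True
        return False

# Demo: one policy-driven validation completes at 09:00 on May 26.
sched = ValidationScheduler()
t0 = datetime(2014, 5, 26, 8, 0)
sched.request("T00001", t0)
sched.complete(sched.start_next(), t0 + timedelta(hours=1))
```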

When the validations are complete, STA reports the results. Rather than a simple “good” or “bad” status, STA also reports whether media is degraded, so the administrator can migrate the data before a true failure occurs. At that point, the administrators’ paranoia is relieved, as they have the information they need to make a sound decision about the health of the tapes in their environment.

About the Photograph

Photograph taken by Rick Ramsey in Death Valley, California, May 2014

- Brian

Follow OTN Garage on:
Web | Facebook | Twitter | YouTube

Tuesday Feb 05, 2013

Do YOU Know Where Your Data Has Been?

When you get change at the grocery store, you just don’t know where it’s been. And frankly, I don’t want to know, but wherever it’s been, it’s been in different environments with different wear-and-tear. If you try to re-use those dollar bills in a vending machine, you might get your candy bar. Or you might not, if the vending machine says your money is unreadable.

You get a less icky feeling about where your transportable storage has been, that is, until the data you were expecting is as unreadable as that old dollar bill. Unfortunately, there is no native data integrity checking as data moves across storage landscapes. However, the Oracle T10000C Data Integrity Validation (DIV) feature uses hardware-assisted CRC checks that not only help ensure the data is written correctly the first time, but also make checking it much more efficient.

Data at rest is generally not an issue for any storage platform. In tape drives, data is protected with read-after-write verification as it is written, and Error Correction Code (ECC) is added to ensure data recovery once it is on the medium. In addition, a typical tape drive adds Cyclic Redundancy Code (CRC) protection as soon as a record is received, which ensures the record does not get corrupted while moving between internal memories. Checking a file’s CRC end to end on a server, though, is a time-consuming process that moves through the following steps:

  1. File pulled from disk to be stored on tape
  2. 256-bit CRC generated and stored in a catalog on a server
  3. File sent to tape drive without the CRC and written to a tape cartridge
  4. Upon recall, the file is read from tape and sent to a server via the tape drive
  5. 256-bit CRC recreated and compared to catalog in the server
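The host-side process in steps 1 through 5 can be sketched in Python as follows. The source describes a 256-bit CRC; Python's standard library has no 256-bit CRC, so this sketch swaps in SHA-256, a cryptographic hash that also produces 256 bits. The dictionaries standing in for the server catalog and the tape are assumptions. The key point is in recall_and_check: every byte of the file has to travel back to the server before anything can be verified.

```python
# Host-based check; SHA-256 stands in for the "256-bit CRC" (a swap, not the original).
import hashlib

catalog: dict[str, bytes] = {}   # step 2: digests kept in a catalog on the server
tape: dict[str, bytes] = {}      # stand-in for the tape cartridge (assumption)

def archive(name: str, data: bytes) -> None:
    """Steps 1-3: digest computed and cataloged on the server; tape gets data only."""
    catalog[name] = hashlib.sha256(data).digest()
    tape[name] = data

def recall_and_check(name: str) -> bool:
    """Steps 4-5: whole file comes back to the server, is re-hashed, and compared."""
    data = tape[name]
    return hashlib.sha256(data).digest() == catalog[name]

archive("good.dat", b"\x00\x01" * 1_000_000)
archive("bad.dat", b"\x00\x01" * 1_000_000)
tape["bad.dat"] = b"\x00\x02" + tape["bad.dat"][2:]   # simulate one corrupted byte
```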

This process takes a minimum of 25 seconds to check the CRC on a 4 GB file, assuming a 2:1 compression ratio and a reasonable server workload. If the tape drives were allowed to assist with some of this workload, the processing time could be dramatically reduced. That’s the premise of the Oracle T10000C DIV feature’s hardware-assisted CRC check. The amount of reduction depends on how much trust the user places in the tape drive itself. While a basic drive-assisted model (Model #1 below) shortens the process, the Oracle T10000C DIV process completes the check in at most nine seconds, independent of server workload, as shown in the table below.

| Step | CRC Verification Model #1 | Oracle T10000C Verification Model |
|---|---|---|
| 1 | File pulled from disk to be stored on tape | File system sends SCSI Verify command from server |
| 2 | 32-bit CRC generated and stored with each record on server | Tape drive receives command |
| 3 | File sent to tape drive; drive checks CRC | File and CRC written to tape |
| 4 | File and CRC written to tape | Upon recall, file and CRC read from tape |
| 5 | Upon recall, file and CRC read from tape | Tape drive checks the 32-bit CRC |
| 6 | File and CRC checked in tape drive | SCSI Verify command status returned to server |
| 7 | 32-bit CRC re-created and checked in hardware (Intel) | |
| Time | Minimum 14 seconds to check the CRC on a 4 GB file (2:1 compression ratio) | Maximum 9 seconds to verify the CRC on a 4 GB file (2:1 compression ratio), independent of server workload |

Obviously, built-in-the-drive, end-to-end integrity checking can be much less resource-intensive than having to read an entire file to verify that it is still good. The 32-bit CRC check is done as specified in ANSI X3.139, the same CRC used in the Fibre Channel Protocol and the Fiber Distributed Data Interface (FDDI) for optical transmissions, so the generator polynomial is readily available. While this is a standard interface CRC, it is important to note that this check can be performed outside the interface protocol. In addition, the drive can also generate and use a CRC in the Intel CRC32c format.
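Both CRC formats mentioned above are easy to compute on a host. In this Python sketch, zlib.crc32 computes the common CRC-32 with generator polynomial 0x04C11DB7, the polynomial used by FDDI and Fibre Channel; that the drive's ANSI X3.139 CRC matches zlib's bit ordering, initial value, and final XOR exactly is an assumption. CRC32C (Castagnoli), the polynomial implemented by Intel's SSE4.2 CRC32 instruction, is not in the standard library, so a small table-driven version is written out here.

```python
import zlib

def _make_crc32c_table() -> list[int]:
    poly = 0x82F63B78  # reflected form of the Castagnoli polynomial 0x1EDC6F41
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table

_CRC32C_TABLE = _make_crc32c_table()

def crc32c(data: bytes, crc: int = 0) -> int:
    """CRC32C with init and final XOR of 0xFFFFFFFF; pass a prior result to continue."""
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = (crc >> 8) ^ _CRC32C_TABLE[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF

def crc32_ansi(data: bytes) -> int:
    """CRC-32 with polynomial 0x04C11DB7, as computed by zlib."""
    return zlib.crc32(data) & 0xFFFFFFFF
```

Both functions reproduce the standard check values for the ASCII string "123456789" (0xE3069283 for CRC32C and 0xCBF43926 for CRC-32), which is a quick way to confirm that a host-side implementation matches the one on the other end.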

Supporting hardware-assisted CRC checking can be as simple as sending a specified SCSI mode select command to turn on the checking. When the Oracle T10000C drive is in its DIV mode, the last 32 bits of any record are treated as a CRC and used to check the integrity of each record. If the CRC check fails, a write error is reported to allow the application to resend the record. A bad record will never be written to tape. If the CRC is correct, that CRC is stored with the record on tape and checked every time the record is read. All of this is done with zero performance loss on the tape drive. If a deferred write error has been reported to the application, the application can determine which record was in error using multiple methods. The recovery is completed when the application resends the previously failed record and the remainder of the data records.
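The DIV write path can be simulated to show where the CRC travels. In this Python sketch, the application appends a CRC32C of each record as its last 32 bits, and a simulated drive refuses any record whose CRC does not match, so a bad record never reaches the tape. The big-endian byte order, the choice of CRC32C over the ANSI X3.139 CRC, and the SimulatedDrive, WriteError, frame, and write_all names are assumptions for illustration; enabling DIV mode with the SCSI mode select command is not modeled.

```python
import struct

def crc32c(data: bytes) -> int:
    """Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for b in data:
        crc ^= b
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF

def frame(payload: bytes) -> bytes:
    """Application side: append the CRC as the last 32 bits (big-endian assumed)."""
    return payload + struct.pack(">I", crc32c(payload))

class WriteError(Exception):
    """Hypothetical stand-in for the write error the drive reports."""

class SimulatedDrive:
    def __init__(self) -> None:
        self.tape: list[bytes] = []  # accepted records, each still carrying its CRC

    def write(self, record: bytes) -> None:
        """DIV mode: check the trailing CRC before anything reaches the tape."""
        payload, (crc,) = record[:-4], struct.unpack(">I", record[-4:])
        if crc32c(payload) != crc:
            raise WriteError(f"CRC mismatch in record {len(self.tape)}")
        self.tape.append(record)

def write_all(drive: SimulatedDrive, payloads: list[bytes],
              link=lambda r: r, max_retries: int = 3) -> int:
    """Write every payload, resending rejected records; returns transmissions."""
    sent = retries = 0
    while len(drive.tape) < len(payloads):
        # In this synchronous sketch the failed record is always the next one;
        # after a deferred write error, later records would be resent too.
        record = frame(payloads[len(drive.tape)])
        sent += 1
        try:
            drive.write(link(record))
        except WriteError:
            retries += 1
            if retries > max_retries:
                raise
    return sent

transmissions: list[bytes] = []

def flaky_link(record: bytes) -> bytes:
    """Simulated link that corrupts the first byte of the second transmission."""
    transmissions.append(record)
    if len(transmissions) == 2:
        return bytes([record[0] ^ 0xFF]) + record[1:]
    return record

drive = SimulatedDrive()
total_sent = write_all(drive, [b"alpha", b"beta", b"gamma"], link=flaky_link)
```

In the demo, the second transmission is corrupted in transit, the drive rejects it, and the application resends it, so four transmissions deliver three intact records.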

If CRC checking is enabled during a subsequent read operation, the drive appends the CRC to each record it returns. Verification of the file’s data integrity is then completed with a read verification. In other words, when a drive reads data that has a CRC stored along with each record, it outputs the CRC appended to the record. This allows the application or driver to perform its own data integrity checks to ensure, months or even years after recording, that the data has not been corrupted. The Intel CRC32c format allows very fast CRC processing and checking by the application. The user application, or driver, can use hardware-assisted CRC checks in the following combinations:

  • Write with hardware-assisted CRC checks and read with hardware-assisted CRC checks
  • Write with hardware-assisted CRC checks and read in normal mode
  • Write in normal mode and read in hardware-assisted CRC mode (Note: In this case, the CRC returned on read is generated by the drive on the fly; it was not stored on tape.)
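The read-side combinations above can be modeled the same way. In this hypothetical Python sketch, drive_read returns what the application would receive for one record in each mode, and app_verify is the application or driver check on a record read in CRC mode. As before, CRC32C and a big-endian trailing CRC are assumptions, and the function names are illustrative only.

```python
import struct

def crc32c(data: bytes) -> int:
    """Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for b in data:
        crc ^= b
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF

def drive_read(stored: bytes, written_with_crc: bool, read_with_crc: bool) -> bytes:
    """Bytes the application receives for one record (simulated drive)."""
    if written_with_crc:
        payload, crc = stored[:-4], stored[-4:]
        # The stored CRC is checked on every read, whichever read mode is used.
        if struct.pack(">I", crc32c(payload)) != crc:
            raise OSError("stored CRC does not match record")
        return stored if read_with_crc else payload
    if read_with_crc:
        # Written in normal mode: this CRC is generated on the fly, never stored on tape.
        return stored + struct.pack(">I", crc32c(stored))
    return stored

def app_verify(received: bytes) -> bytes:
    """Application or driver check on a record read in CRC mode; returns the payload."""
    payload, (crc,) = received[:-4], struct.unpack(">I", received[-4:])
    if crc32c(payload) != crc:
        raise ValueError("data integrity check failed")
    return payload

stored_record = b"written years ago" + struct.pack(">I", crc32c(b"written years ago"))
```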

Another advantage of writing a tape in hardware-assisted CRC mode is that the tape drive can use the Verify command to check an individual record, one file, multiple files, or the entire tape, without having to send the data to the application to verify its validity. This is possible because the hardware-assisted CRC is recorded on the tape with each record, and the tape drive can verify each record against that CRC. Because the drive performs the check internally and returns only a status, the host saves valuable processing resources and time. Ultimately, hardware-assisted CRC checking supports the following options:

  • Verify any record (up to 2MB)
  • Verify entire file (collection of 2MB records)
  • Verify N number of files
  • Verify N number of files of variable record size
  • Verify entire tape with one command
  • Verify mixed mode tape (hardware-assisted CRC check records and non-hardware-assisted CRC check records)
    • No hardware-assisted CRC check is made on records written without a hardware-assisted CRC
    • The drive must be in the correct DIV mode for the records it is verifying
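The Verify options above amount to the drive walking a range of records and comparing each against its stored CRC, returning only a status to the host. This Python sketch models that over a simulated tape. The Record type, the verify and verify_record names, the 2 MiB reading of "2MB", and the big-endian CRC32C trailer are assumptions, and DIV mode selection is not modeled.

```python
import struct
from dataclasses import dataclass
from typing import Optional

MAX_RECORD = 2 * 1024 * 1024  # assumption: "2MB" read as 2 MiB of payload

def crc32c(data: bytes) -> int:
    """Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for b in data:
        crc ^= b
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF

@dataclass
class Record:
    data: bytes    # payload, followed by a 4-byte CRC when has_crc is True
    has_crc: bool  # False for records written in normal (non-DIV) mode

Tape = list[list[Record]]  # a tape is a list of files; a file is a list of records

def verify_record(rec: Record) -> Optional[bool]:
    """True/False for a CRC record; None when there is no stored CRC to check."""
    if not rec.has_crc:
        return None
    payload, (crc,) = rec.data[:-4], struct.unpack(">I", rec.data[-4:])
    if len(payload) > MAX_RECORD:
        raise ValueError("record larger than the assumed 2 MiB limit")
    return crc32c(payload) == crc

def verify(tape: Tape, first_file: int = 0,
           n_files: Optional[int] = None) -> tuple[int, int]:
    """Verify n_files from first_file (rest of tape if None); returns (checked, skipped)."""
    end = len(tape) if n_files is None else first_file + n_files
    checked = skipped = 0
    for f, records in enumerate(tape[first_file:end], start=first_file):
        for r, rec in enumerate(records):
            ok = verify_record(rec)
            if ok is None:
                skipped += 1  # mixed-mode tape: nothing stored to check against
            elif ok:
                checked += 1
            else:
                raise ValueError(f"CRC mismatch in file {f}, record {r}")
    return checked, skipped

def with_crc(payload: bytes) -> Record:
    return Record(payload + struct.pack(">I", crc32c(payload)), True)

demo_tape: Tape = [
    [with_crc(b"file0 record0"), with_crc(b"file0 record1")],
    [Record(b"record written in normal mode", False), with_crc(b"file1 record1")],
]
```

Here verify(tape) corresponds to verifying the entire tape with one command, verify(tape, i, 1) to a single file, verify(tape, i, n) to N files, and verify_record to a single record; records written without a CRC are counted as skipped rather than checked.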

- Brian Zents



Rick Ramsey
Kemer Thomson
and members of the OTN community

