Wednesday Oct 30, 2013

Back Up to Tape the Way You Shop For Groceries

Imagine if this was how you shopped for groceries:

  1. From the end of the aisle sprint to the point where you reach the ketchup.
  2. Pull a bottle from the shelf and yell at the top of your lungs, “Got it!”
  3. Sprint back to the end of the aisle.
  4. Start again and sprint down the same aisle to the mustard, pull a bottle from the shelf and again yell for the whole store to hear, “Got it!”
  5. Sprint back to the end of the aisle.
  6. Repeat this procedure for every item you need in the aisle.
  7. Proceed to the next aisle and follow the same steps for the list of items you need from that aisle.

Sounds ridiculous, doesn’t it?

Not only is it horribly inefficient, it’s exhausting and can lead to wear-out failures on your grocery cart, or worse, yourself. This is essentially how NetApp and some other applications write NDMP backups to tape. In the analogy, the ketchup and mustard are the files to be written, yelling “Got it!” is the equivalent of a sync mark at the end of a file, and the sprint back to the end of an aisle is the process most commonly called a “backhitch,” where the drive has to back up on the tape before it can start writing again.

Writing to tape in this way results in very slow tape drive performance and imposes unnecessary wear on the tape drive and the media, especially when writing small files. The good news is not all tape drives behave this way when writing small files. Unlike midrange LTO drives, Oracle’s StorageTek T10000D tape drive is designed to handle this scenario efficiently.

The difference between the two drive types is that the T10000D drive gives you the ability to write files in a NetApp NDMP backup environment the way you would normally shop for groceries. With grocery shopping, you essentially stream through aisles picking up items as you go, and then after checking out, yell, “Got it!”, though you might do that last step silently. The T10000D has a feature called the Tape Application Accelerator, which prevents the drive from having to stop after each file is written to notify NetApp or another application that the write was successful.

When enabled in the T10000D tape drive, Tape Application Accelerator causes the tape drive to respond to tape mark and file sync commands differently than when disabled:

  • A tape mark received by the tape drive is treated as a buffered tape mark.
  • A file sync received by the tape drive is treated as a no-op command.

Since buffered tape marks and no-op commands do not cause the tape drive to empty the contents of its buffer to tape and backhitch, the data is written to tape in significantly less time. Oracle has emulated NetApp environments with a number of different file sizes and found the following when comparing the T10000D with the Tape Application Accelerator enabled versus LTO6 tape drives.

Notice how the T10000D is not only monumentally faster, but also remarkably consistent? In addition, the writing of the 50 GB of files is done without a single backhitch. The LTO6 drive, meanwhile, will perform as many as 3,800 backhitches! At the end of writing the entire set of files, the T10000D tape drive reports back to the application, in this case NetApp, that the write was successful via a tape mark.
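To make the difference concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: the class and function names are made up for illustration, one synchronous tape mark is assumed to cost exactly one buffer flush and one backhitch, and real drive behavior (buffer size, streaming speed, firmware logic) is not modeled at all.

```python
from dataclasses import dataclass


@dataclass
class ToyTapeDrive:
    """Toy model of a drive buffer. Hypothetical, not T10000D firmware."""
    accelerator_enabled: bool
    buffered_files: int = 0
    flushes: int = 0
    backhitches: int = 0

    def write_file(self) -> None:
        # Data lands in the drive buffer first.
        self.buffered_files += 1

    def tape_mark(self) -> None:
        if self.accelerator_enabled:
            # Treated as a buffered tape mark: nothing is forced to tape.
            return
        # Synchronous tape mark: empty the buffer, stop, and reposition.
        if self.buffered_files:
            self.buffered_files = 0
            self.flushes += 1
            self.backhitches += 1

    def finish(self) -> None:
        # End of the job: whatever is still buffered goes to tape once.
        if self.buffered_files:
            self.buffered_files = 0
            self.flushes += 1


def run_backup(n_files: int, accelerator_enabled: bool) -> ToyTapeDrive:
    """Write n_files, sending one tape mark after each file."""
    drive = ToyTapeDrive(accelerator_enabled)
    for _ in range(n_files):
        drive.write_file()
        drive.tape_mark()
    drive.finish()
    return drive


if __name__ == "__main__":
    for enabled in (False, True):
        d = run_backup(1000, enabled)
        print(enabled, d.flushes, d.backhitches)
```

In this model the backhitch count grows with the number of files when the accelerator is off and stays at zero when it is on, which is the shape of the behavior described above, not a measurement of it.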

So if the Tape Application Accelerator dramatically improves performance and reliability, why wouldn’t you always have it enabled? The reason is that tape drive buffers are meant to be just temporary data repositories, so in the event of a power loss, there could be data loss in certain environments for the files that resided in the buffer. Fortunately, we do have best practices, depending on your environment, to prevent this from happening. I highly recommend reading Maximizing Tape Performance with StorageTek T10000 Tape Drives (pdf) to decide which best practice is right for you. The white paper also digs deeper into the benefits of the Tape Application Accelerator. The white paper is free, and after downloading it you can decide for yourself whether you want to yell “Got it!” out loud or just silently to yourself.

Customer Advisory Panel

One final link: Oracle has started up a Customer Advisory Panel program to collect feedback from customers on their current experiences with Oracle products, as well as desires for future product development. If you would like to participate in the program, go to this link at oracle.com.

photo taken on Idaho's Sacajawea Historic Byway by Rick Ramsey

- Brian Zents

Follow OTN on
Blog | Facebook | Twitter | YouTube

Thursday Mar 28, 2013

Is Tape Storage Still Harder to Manage Than Disk Storage?


-guest post by Brian Zents-

Historically, there has been a perception that tape is more difficult to manage than disk, but why is that? Fundamentally, there are differences between disk and tape. Tape is a removable storage medium, whereas disk is always powered on and spinning. With removable storage, one piece of tape media can interact with many tape drives, so when there was an error, customers historically wondered whether the drive or the media was at fault. With a disk system there is no removable media; if there is an error, you know exactly which disk platter was at risk and what corrective action to take.

However, times have changed. With the release of Oracle’s StorageTek Tape Analytics (STA) you are no longer left wondering if the drive or the media is at risk, because this system does the analysis for you, leaving you with proactive recommendations and resulting corrective actions … just like disk.

For those unfamiliar with STA, it’s an intelligent monitoring application for Oracle tape libraries. Part of the purpose of STA is to allow users to make informed decisions about future tape storage investments based on current realities, but it also is used to monitor the health of your tape library environment. Its functionality can be utilized regardless of the drive and media types within the library, or whether the libraries are in an open system or mainframe environment.

STA utilizes a browser-based user interface that can display a variety of screens. To start understanding errors and whether there is a correlation between drive and media errors, you would click on the Drives screen to understand the health of drives in a library. Screens in STA display both tables and graphs that can be sorted or filtered.

In this screen, it is clear that one specific drive has many more errors relative to the system average.

Next, you would click on the Media screen:

The Media screen helps you quickly identify problematic media. But how do you know if there’s a relationship between the two different types of errors? STA tracks library exchanges, which is convenient because each exchange involves just one drive and one piece of media. So, as shown below, you can easily filter the screen results to just focus in on exchanges involving the problematic drive.

You can sort the corresponding table based on whether the exchange was successful or not. You can then review the errors to see if there is a relationship between the problematic media and drive. You may also want to review the drive’s exchanges to see if media that’s having issues has any similarities to other media that’s having problems. For example, a purchased pack of media could all be having similar problems.
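The same drive-versus-media reasoning can be sketched outside of STA. The Python below is only an illustration: the field names (`drive`, `media`, `ok`) and the sample rows are made up, and nothing here is an STA API or export format. The idea is simply that media which fails only on the suspect drive points at the drive, while media which also fails elsewhere points at the media.

```python
def failed_on(drive, exchanges):
    """Media IDs that had a failed exchange on the given drive."""
    return {e["media"] for e in exchanges
            if e["drive"] == drive and not e["ok"]}


def classify(drive, exchanges):
    """Label each media that failed on `drive` as a drive or media suspect."""
    result = {}
    for media in sorted(failed_on(drive, exchanges)):
        failed_elsewhere = any(
            e["media"] == media and e["drive"] != drive and not e["ok"]
            for e in exchanges
        )
        result[media] = "media suspect" if failed_elsewhere else "drive suspect"
    return result


# Hypothetical exchange records, one drive and one piece of media each.
sample = [
    {"drive": "D1", "media": "M1", "ok": False},
    {"drive": "D2", "media": "M1", "ok": True},
    {"drive": "D1", "media": "M2", "ok": False},
    {"drive": "D3", "media": "M2", "ok": False},
    {"drive": "D1", "media": "M3", "ok": True},
]

if __name__ == "__main__":
    print(classify("D1", sample))
```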

What if there doesn’t appear to be a relationship between media and drive errors? Part of the ingenuity of STA is that just about everything is linked, so root causes are easy to find. First, you can look at an individual drive to see its recent behavior, as shown on this screen:

From the table you can see that this particular drive was healthy until recently. The drive indicated it needed a cleaning, and somebody performed that cleaning. However, just a few exchanges later, it started reporting errors. In this case, it’s clear that the drive has an issue that goes beyond the relationship with a specific piece of media and should be taken offline. On the other hand, if the issue appears to be related to the media itself, you should identify a method to transfer the data off of the media, and replace the media.

- Brian Zents


Tuesday Feb 05, 2013

Do YOU Know Where Your Data Has Been?

When you get change at the grocery store, you just don’t know where it’s been. (Image removed from blog.) And frankly, I don’t want to know, but wherever it’s been, it’s been in different environments with different wear-and-tear. If you try to re-use those dollar bills in a vending machine, you might get your candy bar. Or you might not, if the vending machine says your money is unreadable.

You get a less icky feeling about where your transportable storage has been, that is, until data you were expecting is as unreadable as that old dollar bill. Unfortunately, there is no native data integrity checking as data moves across storage landscapes. However, the Oracle T10000C Data Integrity Validation (DIV) feature uses hardware-assisted CRC checks not only to help ensure the data is written correctly the first time, but also to do so much more efficiently.

Data at rest is generally not an issue for any storage platform. In tape drives, data is protected with read-after-write verification as it is written, and Error Correction Code (ECC) is added to ensure data recovery once it is on the medium. In addition, a typical tape drive adds Cyclic Redundancy Code (CRC) protection as soon as a record is received. This ensures the record does not get corrupted while moving between internal memories. Checking the CRC, though, is a time-consuming process that moves through the following steps:

  1. File pulled from disk to be stored on tape
  2. 256-bit CRC generated and stored in a catalog on a server
  3. File sent to tape drive without the CRC and written to a tape cartridge
  4. Upon recall, the file is called from a tape and sent to a server via the tape drive
  5. 256-bit CRC recreated and compared to catalog in the server
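The five steps above can be sketched in a few lines of Python. This is a rough illustration only: SHA-256 from the standard library stands in for the 256-bit check value, and the `catalog` and `tape` dictionaries are hypothetical stand-ins for the server catalog and the cartridge.

```python
import hashlib

catalog = {}  # server-side catalog: file name -> 256-bit check value (hex)
tape = {}     # stand-in for the tape cartridge: file name -> bytes


def store(name: str, data: bytes) -> None:
    # Step 2: generate the check value and keep it on the server.
    catalog[name] = hashlib.sha256(data).hexdigest()
    # Step 3: the file goes to tape without the check value.
    tape[name] = data


def recall(name: str) -> bytes:
    # Step 4: the whole file travels back to the server.
    data = tape[name]
    # Step 5: recreate the check value and compare it to the catalog.
    if hashlib.sha256(data).hexdigest() != catalog[name]:
        raise ValueError(f"integrity check failed for {name}")
    return data
```

Note that the server has to read every byte of the file back before it can say anything about its integrity, which is where the time goes.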

This process takes a minimum of 25 seconds to check the CRC on a 4 GB file, assuming a 2:1 compression ratio and a reasonable server workload. If the tape drives were allowed to assist in some of this workload, the processing time could be dramatically reduced. That’s the premise of the Oracle T10000C DIV feature’s hardware-assisted CRC check. The amount of reduction is simply dependent on the amount of trust the user places in the tape drive itself. While a basic model produces a slightly quicker process, the Oracle T10000C DIV process guarantees it will be done efficiently as shown in the table below.

| Step | CRC Verification Model #1 | Oracle T10000C Verification Model |
|------|---------------------------|-----------------------------------|
| 1 | File pulled from disk to be stored on tape | File system sends SCSI Verify command from server |
| 2 | 32-bit CRC generated and stored with each record on server | Tape drive receives command |
| 3 | File sent to tape drive; drive checks CRC | File and CRC written to tape |
| 4 | File and CRC written to tape | Upon recall, file and CRC called from tape to be read |
| 5 | Upon recall, file and CRC called from tape to be read | Tape drive checks the 32-bit CRC |
| 6 | File and CRC checked in tape drive | SCSI Verify command and status returned to server |
| 7 | 32-bit CRC re-created and checked in hardware (Intel) | |
| Time | MINIMUM 14 seconds to check the CRC on a 4 GB file (2:1 compression ratio) | MAXIMUM 9 seconds to verify the CRC on a 4 GB file (2:1 compression ratio), independent of server workload |

Obviously, built-in-the-drive, end-to-end integrity checking can be much less resource intensive than having to read an entire file to verify that it is still good. Any 32-bit CRC check can be done as specified in ANSI X3.139. This is the same CRC used in the Fibre Channel Protocol and the Fiber Distributed Data Interface (FDDI) for optical transmissions. As a result, the generation polynomial is readily available. While this is a standard interface CRC, it is important to note that this check can be performed outside the interface protocol. In addition, the drive also can generate and use a CRC in the Intel CRC32c format.

Supporting hardware-assisted CRC checking can be as simple as sending a specified SCSI mode select command to turn on the checking. When the Oracle T10000C drive is in its DIV mode, the last 32 bits of any record are treated as a CRC and used to check the integrity of each record. If the CRC check fails, a write error is reported to allow the application to resend the record. A bad record will never be written to tape. If the CRC is correct, that CRC is stored with the record on tape and checked every time the record is read. All of this is done with zero performance loss on the tape drive. If a deferred write error has been reported to the application, the application can determine which record was in error using multiple methods. The recovery is completed when the application resends the previously failed record and the remainder of the data records.
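As a rough picture of the "last 32 bits are the CRC" convention, here is a small Python sketch. It makes several assumptions that are not from the drive documentation: `zlib.crc32` is used as a stand-in CRC-32, the trailer is packed big-endian, and `drive_accepts` is a hypothetical function that imitates the drive-side check. The SCSI mode select command itself is not shown.

```python
import struct
import zlib


def append_crc(payload: bytes) -> bytes:
    """Application side: put a 32-bit CRC in the last 4 bytes of the record."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def drive_accepts(record: bytes) -> bool:
    """Drive side (imitated): treat the last 32 bits as a CRC and check it."""
    if len(record) < 4:
        return False
    payload, trailer = record[:-4], record[-4:]
    (expected,) = struct.unpack(">I", trailer)
    return zlib.crc32(payload) == expected


if __name__ == "__main__":
    good = append_crc(b"seismic block 0001")
    damaged = bytes([good[0] ^ 0x01]) + good[1:]  # flip one bit in transit
    print(drive_accepts(good), drive_accepts(damaged))
```

A record that fails this check is the case where the drive would report a write error so the application can resend it.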

If the drive is being utilized with CRC checks during a subsequent read operation, the CRC will be appended to the record. Verification of the file’s data integrity then is completed with a read verification. In other words, when a drive reads data having a CRC stored along with a record, it will output the CRC appended to the record. This allows the application or driver to perform its own data integrity checks to ensure, months or even years after recording, that the data has not been corrupted. The Intel CRC32c format allows very fast CRC processing and checking by the application. The user application, or driver, can use hardware-assisted CRC checks as follows:

  • Write with hardware-assisted CRC checks and read with hardware-assisted CRC checks
  • Write with hardware-assisted CRC checks and read in normal mode
  • Write in normal mode and read in hardware-assisted CRC checks mode (Note: In this case, the read CRC, which is generated by the drive on the fly, was not stored on tape.)

Another advantage of writing a tape in hardware-assisted CRC mode is the ability of the tape drive to use the Verify command to check an individual record, one file, multiple files, or the entire tape, without having to send all the data to the application to verify the validity of that data. This can be done because the hardware-assisted CRC is recorded on the tape with each record, and the tape drive has the ability to verify each record with that CRC. Because it is only 32 bits, checking only the CRC saves valuable processing resources and time. Ultimately, hardware-assisted CRC checking can have the following options:

  • Verify any record (up to 2MB)
  • Verify entire file (collection of 2MB records)
  • Verify N number of files
  • Verify N number of files of variable record size
  • Verify entire tape with one command
  • Verify mixed mode tape (hardware-assisted CRC check records and non-hardware-assisted CRC check records)
    • A hardware-assisted CRC check is not made on non-hardware-assisted CRC check records
    • The drive must be in the correct DIV mode for the records it is verifying
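The verify options listed above can be pictured with a short Python sketch. Everything in it is an assumption for illustration: `TapeRecord`, `verify_records`, and the use of `zlib.crc32` as the 32-bit CRC are hypothetical, and the real work happens inside the drive in response to a Verify command. The point is that only a status comes back, never the data itself.

```python
import zlib
from typing import List, NamedTuple, Optional


class TapeRecord(NamedTuple):
    payload: bytes
    stored_crc: Optional[int]  # None for records written in normal mode


def verify_records(records: List[TapeRecord], start: int = 0,
                   count: Optional[int] = None) -> List[int]:
    """Return the indexes of records whose stored CRC does not match."""
    end = len(records) if count is None else min(len(records), start + count)
    failed = []
    for i in range(start, end):
        rec = records[i]
        if rec.stored_crc is None:
            continue  # mixed-mode tape: no CRC check on these records
        if zlib.crc32(rec.payload) != rec.stored_crc:
            failed.append(i)
    return failed


if __name__ == "__main__":
    tape = [
        TapeRecord(b"record 0", zlib.crc32(b"record 0")),
        TapeRecord(b"record 1", None),
        TapeRecord(b"record 2 (damaged)", zlib.crc32(b"record 2")),
    ]
    print(verify_records(tape))        # whole tape
    print(verify_records(tape, 0, 2))  # first two records only
```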

- Brian Zents


Tuesday May 29, 2012

Is Linear Tape File System (LTFS) Best For Transportable Storage?

Those of us in tape storage engineering take a lot of pride in what we do, but understand that tape is the right answer to a storage problem only some of the time. And, unfortunately for a storage medium with such a long history, it has built up a few preconceived notions that are no longer valid.

When I hear customers debate whether to implement tape vs. disk, one of the common strikes against tape is its perceived lack of usability. If you could go back a few generations of corporate acquisitions, you would discover that StorageTek engineers recognized this problem and started developing a solution where a tape drive could look just like a memory stick to a user. The goal was to not have to care about where files were on the cartridge, but to simply see the list of files that were on the tape, and click on them to open them up. Eventually, our friends in tape over at IBM built upon our work at StorageTek and Sun Microsystems and released the Linear Tape File System (LTFS) feature for the current LTO5 generation of tape drives as an open specification.

LTFS is really a wonderful feature and we’re proud to have taken part in its beginnings and, as you’ll soon read, its future. Today we offer LTFS-Open Edition, which is free for you to use in your Oracle Enterprise Linux 5.5 environment, not only on your LTO5 drives, but also on your Oracle StorageTek T10000C drives. You can download it free from Oracle and try it out.

LTFS does exactly what its forefathers imagined. Now you can see immediately which files are on a cartridge. LTFS does this by splitting a cartridge into two partitions. The first holds all of the necessary metadata to create a directory structure for you to easily view the contents of the cartridge. The second partition holds all of the files themselves. When tape media is loaded onto a drive, a complete file system image is presented to the user. Adding files to a cartridge can be as simple as a drag-and-drop, just as you do today on your laptop when transferring files from your hard drive to a thumb drive, or it can be done with standard POSIX file operations.
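To picture the two-partition idea, here is a toy Python model. It is a sketch only: `ToyCartridge` and its methods are hypothetical, the index is a plain dictionary rather than the on-tape index format defined by the LTFS specification, and no real tape I/O takes place.

```python
class ToyCartridge:
    """Toy two-partition cartridge: an index plus an append-only data area."""

    def __init__(self) -> None:
        self.index = {}          # index partition: name -> (offset, length)
        self.data = bytearray()  # data partition: file contents, appended

    def add_file(self, name: str, content: bytes) -> None:
        # Append to the data partition, then record where it landed.
        self.index[name] = (len(self.data), len(content))
        self.data += content

    def list_files(self) -> list:
        # Only the index partition is consulted; no file data is read.
        return sorted(self.index)

    def read_file(self, name: str) -> bytes:
        offset, length = self.index[name]
        return bytes(self.data[offset:offset + length])


if __name__ == "__main__":
    cart = ToyCartridge()
    cart.add_file("scene_012.mov", b"video bytes")
    cart.add_file("notes.txt", b"edit notes")
    print(cart.list_files())
    print(cart.read_file("notes.txt"))
```

Listing the cartridge touches only the small index, which is why the contents can be shown right after the media is loaded.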

You may be thinking all of this sounds nice, but asking, “when will I actually use it?” As I mentioned at the beginning, tape is not the right solution all of the time. However, if you ever need to physically move data between locations, tape storage with LTFS should be your most cost-effective and reliable answer. I will give you a few example use cases in which LTFS can be utilized.

Media and Entertainment (M&E), Oil and Gas (O&G), and other industries have a strong need for their storage to be transportable. For example, an O&G company hunting for new oil deposits in remote locations takes very large underground seismic images which need to be shipped back to a central data center. M&E operations conduct similar activities when shooting video for productions. M&E companies also often transfer files to third parties for editing and other activities.

These companies have three highly flawed options for transporting data: electronic transfer, disk storage transport, or tape storage transport. The first option, electronic transfer, is impractical because of the expense of the bandwidth required to transfer multi-terabyte files reliably and efficiently. If there’s one place that has bandwidth, it’s your local post office, so many companies revert to physically shipping storage media. Typically, M&E companies rely on transporting disk storage between sites even though it, too, is expensive.

Tape storage should be the preferred format because, as IDC points out, “Tape is more suitable for physical transportation of large amounts of data as it is less vulnerable to mechanical damage during transportation compared with disk” (see note 1, below). However, tape storage has not been used in the past because of the restrictions created by proprietary formats. A tape may only be readable if both the sender and receiver have the same proprietary application used to write the file. In addition, the workflows may be slowed by the need to read the entire tape cartridge during recall.

LTFS solves both of these problems, clearing the way for tape to become the standard platform for transferring large files. LTFS is open and, as long as you’ve downloaded the free reader from our website or that of anyone in the LTO consortium, you can read the data. So if a movie studio ships a scene to a third-party partner to add, for example, sound effects or a music score, it doesn’t have to care what technology the third party has. If it’s written back to an LTFS-formatted tape cartridge, it can be read.

Some tape vendors like to claim LTFS is a “standard,” but beauty is in the eye of the beholder. It’s a specification at this point, not a standard. That said, we’re already seeing application vendors create functionality to write in an LTFS format based on the specification. And it’s my belief that both customers and the tape storage industry will see the most benefit if we all follow the same path. As such, we have volunteered to lead the way in making LTFS a standard, first with the Storage Networking Industry Association (SNIA), and eventually through standards bodies such as the American National Standards Institute (ANSI). Expect to hear good news soon about our efforts.

So, if storage transportability is one of your requirements, I recommend giving LTFS a look. It makes tape much more user-friendly and it’s free, which allows tape to maintain all of its cost advantages over disk!

Note 1 - IDC Report. April, 2011. “IDC’s Archival Storage Solutions Taxonomy, 2011”

- Brian Zents

About

Contributors:
Rick Ramsey
Kemer Thomson
and members of the OTN community
