Thursday Nov 29, 2007

Archival and Resurrection

It somehow seems fitting to resurrect this blog with an entry about data archiving and retrieval.

I had occasion to pull some files off a tape that was about nine years old. Nine years doesn't seem like it's that old. However, it was quite a bit of trouble. The tape in question was an 8mm data tape, physically equivalent to the old video-8 format. Everybody in the office these days (including me) runs Mac laptops or Linux PCs, so systems that support 8mm drives are hard to come by. I'm a pack rat, though, and I had saved an old Ultra2 (SPARC) workstation and an external 8mm tape drive for occasions just like this.

I plugged in the system, booted it, and loaded the tape into the tape drive. Didn't read. Ohhh... my tape drive supports only low-density tapes (2GB) but this was a higher-density (5GB) tape. OK, I have another Ultra2 with an internal high-density drive. Booted it. Crap; I don't have the password to log in. OK, boot from CD... darn, it's so old that the latest Solaris doesn't support it. OK, boot from an older Solaris CD. But it still won't read: the tape drive doesn't work. Arrgh.

OK. Our lab guy has a bunch of spare equipment around so I asked him. The first drive he brought was a 4mm DDS drive. Oops. He came back later with the right tape drive. I plugged it in, booted, and successfully read the tape. My first attempt didn't work though... one has to use the same (or higher) blocking factor that was used to write the tape. What did I usually use? 2048? (One megabyte.) Tried that, and it worked. I was doing "tar xb 2048" and this read a bunch of stuff but had errors; this might have been caused by the stop-start motion of the tape. Trying again with "dd bs=1024k" worked fine and resulted in a tar file that had no errors. (At least, extracting files from it didn't cause tar to complain.) So, partial success: I had retrieved the files from tape and found the ones I was looking for.
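The arithmetic behind those two commands is worth spelling out. tar counts its blocking factor in 512-byte blocks, so "b 2048" and "bs=1024k" describe the same record size. A minimal sketch (the function name is made up for illustration):

```python
TAR_BLOCK_BYTES = 512  # tar measures its blocking factor in 512-byte blocks


def record_bytes(blocking_factor: int) -> int:
    """Return the size in bytes of one tape record for a tar blocking factor."""
    return blocking_factor * TAR_BLOCK_BYTES


# A blocking factor of 2048 gives 1 MiB records, the same size as dd bs=1024k.
print(record_bytes(2048) == 1024 * 1024)
```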

Now what? Revisiting this a few days later, I decided to re-read the tape to ensure that I had gotten the right bits, then I'd archive them on other media. I tried to read the tape again, but the drive gave nothing but I/O errors. What? This used to work. Hm, that's odd, the lights on the tape drive were blinking oddly, as if to indicate some kind of error. Worse, I couldn't even eject the tape!! Rebooting, power cycling, etc. didn't work. I had left the drive powered on for several days, so I figured that the drive had overheated or something, so I powered everything off and let it cool down for a couple days. After that, I powered up the system and successfully managed to eject the tape. At that point I shut everything off and decabled the drive. I didn't try to re-read the tape for fear of getting the tape stuck again.
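Had the second read worked, checking that I'd "gotten the right bits" would have come down to comparing checksums of the two reads. A rough sketch of that check, assuming each read was saved as an ordinary file; the function names are my own, and SHA-256 is just one reasonable choice of hash:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so a multi-gigabyte image never sits in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bits(path_a, path_b):
    """Return True when both files have identical SHA-256 digests."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Two reads that hash the same are almost certainly the same bits. That says nothing about whether the tape itself had already gone bad before the first read.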

Well, now I have a probably-good read of all these files on disk. Clearly 8mm tape is not a viable archive medium. What is? How about DVD-R? They seem to be on every computer nowadays. This article seems to prefer DVD+R to DVD-R, but I had a whole spindle of DVD-R blanks and this data isn't that important so DVD-R is probably fine.

My Mac has a DVD-R drive so I copied the tar file over to burn it there. I inserted a blank DVD-R, which caused the Mac to create a "burn folder" as a staging area for what to burn. I unpacked the tar file there (using the command line), and it caused the Finder to hang. Crap. The files all seemed to be there, though, so I relaunched the Finder and went ahead with the burn. It complained that "7 items could not be found," which didn't make much sense to me, but I continued anyway. Checking the resulting disc showed that only 1.3GB out of 4GB actually made it onto the DVD. Into the trash. The problem might have been related to symlinks in the tar file. The Mac uses HFS+ by default, and symlinks showed up as "aliases" which might not have been dealt with properly.

OK, then, create a 4GB UFS filesystem image, mount it, unpack the files there, and then use Toast to burn an ISO-9660 disc from it. This seemed to work... though Toast complained about some files (rather a lot, actually) not conforming to standard file naming rules. Most of the problems were, I think, that names were too long. Sigh, but I went ahead and did the burn anyway. It worked, but "du" showed a discrepancy of about 100MB or so less on the disc than in the filesystem. This could be because of the different blocksize between UFS and ISO-9660. Or it could be errors. Hrmmmrm. Most of the files seemed to be there though.
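A "du" total can't tell block-size rounding apart from missing files. Walking both trees and comparing file by file can. Here is a sketch of that comparison, assuming the source filesystem and the burned disc are both mounted; the function name and the "missing"/"differs" labels are mine, not anything Toast or du reports:

```python
import filecmp
import os


def find_discrepancies(source_root, copy_root):
    """List regular files under source_root that are missing or different in copy_root."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            # Skip symlinks and special files; only regular files are compared.
            if os.path.islink(src) or not os.path.isfile(src):
                continue
            rel = os.path.relpath(src, source_root)
            dst = os.path.join(copy_root, rel)
            if not os.path.isfile(dst):
                problems.append((rel, "missing"))
            elif not filecmp.cmp(src, dst, shallow=False):
                # shallow=False forces a byte-by-byte comparison of the contents.
                problems.append((rel, "differs"))
    return sorted(problems)
```

One caveat: if the burning software shortened or altered names to fit ISO-9660, those files will show up as "missing" here even though their contents made it onto the disc under another name.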

As an insurance policy (hey, discs are cheap) I decided to burn the tar image as a single file to an ISO-9660 disc. Old tape lore has it that one shouldn't write a compressed archive, because if there are any errors, they'll probably ruin the entire archive instead of just a few files. I attempted to write the entire 4GB uncompressed tar image. But the resulting disc had only 2GB on it... huh? Turns out the tar file on the disc was exactly 2147483647 bytes long. (This is the largest possible 32-bit signed integer.) Crap! Toast isn't large-file aware. (Maybe this is because I have an old PPC version of Toast I'm running on my Intel Mac.) Throw that disc into the trash. OK, write a compressed file anyway. The gzip-compressed tar image (.tgz) is just a bit over 2 billion bytes (but less than 2GiB) so it wouldn't run into the large-file problem. It worked. Whew.
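The numbers are easy to check. A small sketch of the limit, assuming (as the truncated file suggests) that the tool tracks file sizes in a signed 32-bit integer; the names are made up for illustration:

```python
INT32_MAX = 2**31 - 1  # 2147483647, the largest signed 32-bit integer


def fits_in_signed_32bit(size_bytes: int) -> bool:
    """Return True if a file of this size can be described by a signed 32-bit length."""
    return size_bytes <= INT32_MAX


print(INT32_MAX)                             # 2147483647
print(fits_in_signed_32bit(4 * 1000**3))     # False: the 4GB tar image is too big
print(fits_in_signed_32bit(2 * 1000**3 + 1)) # True: just over 2 billion bytes still fits
```

Two billion bytes is about 147 million bytes short of the limit, which is why the compressed image squeaked through.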

* * *

What's the point of all this? I don't think there's anything really new that I learned from this experience. However, I was reminded of some things I knew all along and hadn't paid attention to.

1) Media goes obsolete pretty quickly. The 8mm format is basically obsolete after less than ten years.

2) Time passes pretty quickly. Did I really work on that stuff nine years ago??

3) Keeping drives around in order to read old media doesn't really help much either. You might not be able to find a system that the drive connects to, you might not find the software to boot it, or the hardware itself might rot. In fact, the hardware seems to rot faster than the media.

4) Archiving files to new media is no small task. Media choice, tools, and OS/filesystem issues all conspire to create errors.

5) Keep multiple copies of anything important, in different formats, or just keep it online. Not enough disk space? Buy another disk, they're cheap.

6) We haven't even talked about finding software to read the old files.

7) I have a box full of old 8mm tapes. I think I have some work to do.



