Thursday Jul 14, 2005

When should you clean a DLT or LTO tape drive?

The answer to the question is only when the tape drive cleaning light is on.

Yesterday I was having a discussion with a frontline engineer about cleaning tape drives. Apparently a customer had had several replacements of their DLT tape drive and wanted to know what proactive measures they could take to avoid needing to replace their drive in the future. So I had a look at the messages which they had used to justify replacement, and in each case the sense key was "media error." This set alarm bells ringing, because if a DLT or LTO drive reckons there is a problem with the media then you either have a problem with that tape, or an environmental problem which makes the tape media a carrier (like a bacterium in a way).

The technology and intelligence which is designed into the DLT and LTO families is such that you should never need to use a cleaning tape. And if you do, it is only because the drive itself has detected that it needs a clean.

I remember a performance escalation a few years back where it turned out that the customer was running cleaning tapes through their DLT drives twice a week. This was completely unnecessary and had the decidedly unwanted effect of killing the drive's read and write performance as the heads were degraded. Replacing the drives was their only option.

Other customers have configured frequency-based cleaning in NetBackup or Solstice Backup / Legato Networker. This is a complete no-no. Use the TapeAlert function of the drive and configure cleaning for "on-demand" only.

If you keep your DLT and LTO drives in a minimally-dusty environment and don't throw your tapes around you should never need to use a cleaning tape.

Data Integrity, or, raid+cache+battery backup

In the last two weeks I've had two enquiries from customers about caching on their raid arrays. In both cases the customers said (words to the effect of) "we want to get the speed benefits from the array's cache, but we don't want to pay for battery backup" (or, "we don't want to replace the battery on our T3/6x20").

In each case (and in any case like this) the answer is a resounding NO! A write-back cache acknowledges a write before the data has reached disk, so without a working battery a power failure loses whatever is still sitting in the cache. Why do you think that Sun and other hardware raid vendors design in battery-backup? We worry about the availability and integrity of your data. You should too.

Thursday Apr 21, 2005

gpart saved my laptop

On Tuesday I bfu'd to the latest nightly build of Solaris next so I could take advantage of the boot re-architecture project integration. This went quite well except that I managed to corrupt my boot-archive through not paying attention at the right time and forgetting a step.... grrr. Once I'd fixed that problem (boot cdrom -s, mount -F ufs -o rw,logging /dev/dsk/c0d0s0 /mnt ; /mnt/sbin/bootadm update-archive -R /mnt ; sync ; umount /mnt ; reboot) I felt confident enough to go to the next stage, booting the competition's OS on my laptop.

I figured I should boot it to see what it thought was going on. That was ok, but running Partition Magic was when things went downhill fast. PM decided that my partition table had errors, and would I like it to fix them? I was really stupid at this point, and clicked yes.

BAD mistake.

Not only could I not boot back to MS-Windows, but I was unable to boot Solaris either...

Fortunately my desktop Solaris box was unaffected, so with a bit of digging I was able to find the System Rescue CD iso, pull it down, burn it and boot from it. That was great, but sfdisk and cfdisk both told me I had a bodgy partition table (duh! I knew that already!) and refused to help. By this point I was getting quite frantic, and googled again and again, eventually coming up with a hit on gpart.

I am very pleased to say that gpart saved my laptop. It was included on the linux System Rescue CD as /usr/bin/gpart.

Gpart has a scan option where it looks at where your partition table should be, and tries to interpret the data which it finds. I used this first, and wrote down exactly what it produced. Fortunately for me it matched what I remembered of my disk layout, so I re-ran it with the "-W" option to write the corrected partition table to disk.

Then deep breaths, sync, sync, sync, reboot..... grub menu.... YAY!!! I'm back to life!

Of course MS-Windows still won't boot properly -- gets to a certain point and hard-hangs, or just reboots the laptop entirely.... but that's a topic for another day.

Now I'm doing another backup of my data to a workstation in the office..... because you never know.

I'm also emailing the author of gpart to thank him for his utility, and request that he enhance the list of known partition types to include Solaris2 (== 0xbf by the way) which is what Solaris 10 installations use now.
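For anyone who hasn't looked inside one, a classic PC partition table is small: four 16-byte entries starting at byte 446 of the first sector, followed by the 0x55AA signature. The sketch below decodes those entries from a 512-byte sector and labels type 0xbf as Solaris2. It is only an illustration of the on-disk layout, not how gpart itself works; the `parse_mbr` and `make_entry` names, the short type table and the sample disk layout are all made up for the example.

```python
import struct

# A few well-known partition type IDs (deliberately incomplete).
PARTITION_TYPES = {
    0x07: "NTFS/HPFS",
    0x82: "Linux swap or older Solaris x86",
    0x83: "Linux",
    0xBF: "Solaris2",
}

ENTRY_FORMAT = "<B3xB3xII"  # status, (CHS skipped), type, (CHS skipped), LBA start, sector count


def parse_mbr(sector):
    """Decode the four primary partition entries of a 512-byte MBR sector."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("missing 0x55AA boot signature")
    entries = []
    for i in range(4):
        raw = sector[446 + 16 * i:446 + 16 * (i + 1)]
        status, ptype, lba_start, num_sectors = struct.unpack(ENTRY_FORMAT, raw)
        if ptype == 0:  # type 0 means the slot is unused
            continue
        entries.append({
            "slot": i,
            "bootable": status == 0x80,
            "type": ptype,
            "type_name": PARTITION_TYPES.get(ptype, "unknown"),
            "lba_start": lba_start,
            "num_sectors": num_sectors,
        })
    return entries


def make_entry(status, ptype, lba_start, num_sectors):
    """Build one 16-byte entry (CHS fields left as zero) for the demo below."""
    return struct.pack(ENTRY_FORMAT, status, ptype, lba_start, num_sectors)


# Hypothetical dual-boot layout: a Windows partition, then a bootable Solaris2 one.
sector = bytearray(512)
sector[446:462] = make_entry(0x00, 0x07, 63, 40_000_000)
sector[462:478] = make_entry(0x80, 0xBF, 40_000_063, 77_000_000)
sector[510:512] = b"\x55\xaa"

for part in parse_mbr(bytes(sector)):
    print(part)
```

On a real machine you would read the first 512 bytes of the raw disk device instead of building the sector by hand, and you would do it read-only.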

Thursday Apr 14, 2005

Perpendicular storage.... drool city!

A colleague recently pointed me to Hitachi Global Storage Technologies' lame-as flash animation (hey, we're all geeks, right?) entitled "Get Perpendicular". It's all about how disk drive manufacturers can provide us with a phenomenal increase in data capacity by making the bits stand up rather than lie down. According to El Reg we should start to see perpendicular storage available in 2.5in form-factor drives by the end of this year. If you think about the claimed 40GB per platter capacity then it's no great stretch to imagine having a 4TB drive in your laptop. I thought I was doing well when I got my new laptop which came with a 60GB 7200rpm disk. Now imagine a rack full of something similar to these or even these beasties and you start to see why we need something like ZFS to manage it all.

Things that make you go hmmm: SCO gives OpenSolaris a licensing blessing

Alan pointed me to this article (by SJVN no less) entitled "SCO Gives Sun Blessings to Open-Source Solaris". Isn't this a bit like getting a blessing from the anti-Christ? It's well-known that SCO and Linux don't get along, and there's certainly no love from the various Linux communities towards Sun on a number of fronts. No doubt the usual people will froth at the mouth at how this allegedly "proves" that Sun is out to kill Linux. Friends, foes, citizens of the world, if you think that then you are sadly mistaken. A quick check of Sun's Operating Systems link under Products and Solutions/Software shows that Linux is even the top link in the table, with the comment "Sun brings a comprehensive systems approach to Linux. Sun provides Java technology, x86-based hardware, Red Hat Enterprise Linux, and SUSE Linux Enterprise Server along with Sun's Java Enterprise System and Sun Java Desktop System -- all supported by Sun services."

Side note: the PTS engineer on the other side of my cubicle wall runs linux (debian if I recall) and most definitely supports linux. A lot of my colleagues just here in this office have a box running linux somewhere in their system menageries too. You can't get better than that for putting your money where your mouth is.

Wednesday Apr 06, 2005

SMF entry for the CVS pserver

This is something which I stumbled over last year: with the move to SMF (Service Management Facility), just adding a line to /etc/inet/inetd.conf for a new service such as CVS doesn't work any more. This manifest will let you run CVS as a service on your Solaris 10/Express machine. Save it as /var/svc/manifest/network/cvspserver-tcp.xml and ensure you've got the line

    cvspserver 2401/tcp #cvs pserver process

in your /etc/services, edit the exec_method to suit your site, then run

    # svccfg import /var/svc/manifest/network/cvspserver-tcp.xml
    # svcadm disable svc:/network/cvspserver/tcp:default
    # svcadm enable svc:/network/cvspserver/tcp:default

and be on your merry way. You should check the manpages for inetconv(1M) and smf(5), and the entries in the Solaris 10 System Administrator Collection for more information.

If you really want to get stuck into SMF, then check out these resources at the bigadmin site:

- SMF hits on bigadmin
- Sun Microsystems - BigAdmin: Solaris Service Management Facility - Service Developer Introduction
- Sun Microsystems - BigAdmin: Solaris Service Management Facility - Quickstart Guide
- BigAdmin Feature Article: Solaris 10 OS Feature Spotlight: Predictive Self-Healing
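The /etc/services prerequisite is the step that is easiest to forget, so here is a rough sketch of how you might check for it from a script. It parses services(4)-style text and returns the port for a given name and protocol. The `find_service` function and the sample text are invented for illustration; on a real box you would pass in the contents of /etc/services.

```python
def find_service(services_text, name, proto="tcp"):
    """Return the port registered for name/proto, or None if it is absent."""
    for line in services_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue  # not a "name port/proto" line
        port, _, line_proto = fields[1].partition("/")
        names = [fields[0]] + fields[2:]  # official name plus any aliases
        if name in names and line_proto == proto and port.isdigit():
            return int(port)
    return None


# Hypothetical excerpt of a services file.
sample = """\
# Network services
ssh             22/tcp
cvspserver      2401/tcp        #cvs pserver process
"""

print(find_service(sample, "cvspserver"))
```

If the function returns None, add the cvspserver line before importing the manifest.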

Friday Mar 11, 2005

What justification is there for MS-Windows only software?

Earlier today I accepted a request for assistance from one of our product development groups. The data they've gathered is scsi bus traces, using a scsi bus analyzer from LeCroy (formerly known as Verisys and CatC). The software is available for download but that's not very helpful to me.

As I've mentioned before, I run Solaris/x64 on my laptop. Since I got it in late January I've booted it to MS-Windows a total of three times: once to install MS-Windows and the never-ending security updates, once to install SimCity4 and the scsi bus analyzer software, and once for a sanity check of my hotel room's network. I hate having to reboot in order to do stuff in MS-Windows. I'm one of those people who live and breathe Solaris -- I stopped using linux as my home desktop when I started at Sun, because the differences got too jarring.

There's only one source for the software to use with this analyzer, and that's LeCroy. There's no extant or publicly available documentation on the data format that's used in the trace files either, so even if I had the time and energy to spend writing a tool myself, I wouldn't know if I missed things -- unless I rebooted into MS-Windows. Yes, I've emailed the account requesting that they log an RFE for a java-based scsiview tracefile reader, and yes, I've also asked that they port their product to run on Solaris. I don't expect to get a response, because that's my experience with other companies which are enmeshed in the MS-Windows environment. (I'm thinking specifically of Adobe, Canon and Maxis/EA here).

So what am I really complaining about?

Firstly, that the company does not make their trace file format publicly available. Publicly-available specifications for file formats ensure that customers can access their data. You know, like if a company goes bust or gets bought out and a product line is discontinued. That sort of thing. StarOffice, for one, provides specs like that. It's a good thing [tm].

Secondly, that the company's viewing software is MS-Windows only. There's not even a version for Macs! This means that people whose OS environment of choice is not MS-Windows but Solaris or Linux or MacOS must have an MS-Windows machine in order to do any sort of collaborative work using this tool.

Thirdly, the drivers for the actual hardware and the online analysis software are, once more, MS-Windows only. Do you see a pattern here? I certainly do.

Of course, companies that refuse to provide specs, platform-independent or multi-platform software will say to you "show me the market outside of MS-Windows that really wants this product." Funnily enough, it's very difficult to come up with a business case which such a company will admit justifies their investment. What such companies fail to see is that even though the market might be quite small for MacOS, [insert-favourite linux distro name here] or Solaris, the developers who want to use company X's utility to assist with creating the next big thing (Serial-Attached SCSI anybody?) will go elsewhere for their needs. And that next big thing could be the product which keeps hundreds or thousands of people employed, could be something which changes the way we use IT or do business, or could even be what makes it possible for some rocket scientist to create a whole new way of travelling.

So would I recommend that particular scsi bus analyzer to you? Not until there's at least a platform-independent tracefile viewer. Now you'll have to wait a minute or five while I reboot to MS-Windows for the scsi trace analysis.....

Friday Feb 18, 2005

my new laptop is broken

My new amd64 laptop (well, not so new now, got it three weeks ago) runs 64-bit Solaris.

I couldn't really think of an "interesting" name to call it, but at the time I got it I was playing with some code which was decidedly broken. So that's what the laptop's called.

Of course, given that I bfu it every week or so in order to keep mostly current with new builds of Solaris, I have "warm-brickified" it several times so calling it "broken" seems appropriate.

The main error that I tend to make with bfu is forgetting to correctly merge or update the driver aliases and driver classes. Sometimes I forget the major number too.


I work at Oracle in the Solaris group. The opinions expressed here are entirely my own, and neither Oracle nor any other party necessarily agrees with them.

