A lesson in patience...

About a week ago, I bought myself two new 320GB external USB disks. They are destined to become my new home storage infrastructure, of course based on OpenSolaris and ZFS.

Solaris recognized the first disk without a problem. After running devfsadm(1M), rmformat(1) reports:

     3. Logical Node: /dev/rdsk/c7t0d0p0
        Physical Node: /pci@0,0/pci1022,7460@6/pci108e,534d@3,2/storage@1/disk@0,0
        Connected Device: WD       3200JB External  0108
        Device Type: Removable
        Bus: USB
        Size: 305.2 GB
        Label: <None>
        Access permissions: Medium is not write protected.

Fine. Then I attached the second disk, but devfsadm would not come back, at least not within a minute or so. Scratching my head, I turned off the drive, and /var/adm/messages then logged:

Apr  6 10:33:26 condorito devfsadmd[143]: [ID 937045 daemon.error] failed to lookup dev name for /pci@0,0/pci1022,7460@6/pci108e,534d@3,2/storage@2/disk@0,0

Hmm. It got worse when I tried rebooting with the drive attached: Solaris would not make it past the initial boot message. Is the disk broken? Is there a problem with the Solaris USB or disk driver support in the kernel? And why just the second disk and not the first? They are identical, aren't they?

This is what the USB subsystem knows about the drives. Drive 1, the good one, looks like this:

Apr  9 09:55:04 condorito usba: [ID 349649 kern.info]   Western Digital External HDD 5743414C3733393036383035

But the second, the bad one, looks like this:

Apr  9 09:55:44 condorito usba: [ID 349649 kern.info]   Western Digital External HDD 57442D5743414D5234313836363532

Has Western Digital changed their serial numbering scheme right in the middle of the same product? And why is that a problem for Solaris?
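
As an aside, both serial strings look like hex-encoded ASCII, two hex digits per character. The log itself does not say so, so treat this as an assumption. A minimal Python sketch that decodes them under that assumption (the function name is made up for illustration):

```python
def decode_usb_serial(hex_serial: str) -> str:
    """Decode a hex string into ASCII text, two hex digits per character.

    Assumption: the serial shown in the usba log line is hex-encoded ASCII.
    """
    return bytes.fromhex(hex_serial).decode("ascii")


# The two serial strings exactly as they appear in the log lines above.
good = decode_usb_serial("5743414C3733393036383035")
bad = decode_usb_serial("57442D5743414D5234313836363532")

print(good)  # WCAL73906805
print(bad)   # WD-WCAMR4186652
```

If that reading is right, the second drive reports a longer serial with a "WD-" prefix: 15 characters instead of 12. Whether that difference has anything to do with the delay is not something these logs can show.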

Both drives worked fine with my PowerBook and with my Ferrari 4000 laptop running Solaris, so why can't my home machine (a Sun Java Workstation W1100z) cope with this?

As a last resort, I decided to try devfsadm again, this time waiting for however long it took. Maybe it would come up with a more useful error message. And, voilà, after 14 minutes(!) it did come back and everything was fine. This time, rmformat correctly recognized the second disk as well:

     1. Logical Node: /dev/rdsk/c4t0d0p0
        Physical Node: /pci@0,0/pci1022,7460@6/pci108e,534d@3,2/storage@2/disk@0,0
        Connected Device: WD       3200JB External  0108
        Device Type: Removable
        Bus: USB
        Size: 305.2 GB
        Label: <None>
        Access permissions: Medium is not write protected.

Running devfsadm again now takes just seconds, so something has probably been updated inside the device tree. I still don't know what it was, and I'd be glad if someone could explain to me what happened. In any case, I'm glad that I can now continue migrating my current ZFS filesystems onto the new mirrored ZFS pool...
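
To put a number on a wait like this instead of guessing, a small wrapper can run a command with a generous timeout and report how long it took. This is only a sketch: the helper name and the 30-minute default are arbitrary choices, and a harmless stand-in command is used so it runs anywhere.

```python
import subprocess
import sys
import time


def run_patiently(cmd, timeout_s=1800):
    """Run cmd, wait up to timeout_s seconds, return (exit_code, elapsed_seconds).

    subprocess.run raises subprocess.TimeoutExpired if the limit is exceeded.
    """
    start = time.monotonic()
    result = subprocess.run(cmd, timeout=timeout_s)
    return result.returncode, time.monotonic() - start


# Stand-in command for illustration; on the Solaris machine this would be
# ["devfsadm"], run as root.
code, elapsed = run_patiently([sys.executable, "-c", "pass"])
print(f"exit code {code} after {elapsed:.1f}s")
```

The timeout is deliberately long: the point is to let the command finish and learn how long it really needs, not to kill it early.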

Lesson learned: If something takes longer than expected, it is not necessarily broken. Just be patient...
