Monday Dec 07, 2009

OpenSolaris Bug and Workaround: Package Manager

I wanted to use the latest build of OpenSolaris to take advantage of some new features.  I am running the June 2009 release (2009.06), so upgrading to the latest build means moving to a development build.  I figured the easiest way to switch would be to open the Package Manager application, add the publisher "http://pkg.opensolaris.org/dev/" (giving it a name like "development"), set it as the preferred publisher, and then delete the "opensolaris.org" publisher from the list, since I would be getting all the software I need from the dev publisher anyway.

Well, that's nice in theory, but it didn't work for me, and there's a bug filed for it (here, check it out).  It turns out that, for now, your preferred publisher must be named "opensolaris.org".  So the relatively simple workaround is to create a publisher called "opensolaris.org" that points to "http://pkg.opensolaris.org/dev/", then delete the original publisher called "opensolaris.org".

Worked for me, and now I'm using build 128 (b128) and it's working (mostly) like a charm.  There are other minor things going on with that build, but I've been able to work around them so far with no problems.  I'm happy.



Sunday Dec 06, 2009

The community fixed my OpenSolaris networking driver bug

I've been doing some nice upgrades to my home media server: I just put four 2TB drives in it in a raidz configuration.  This gives me over 5TB of fault-tolerant storage; if one of those disks fails, my data is still fine, and I have time to buy a replacement drive and pop it into the machine so that ZFS can heal itself.
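For anyone wondering where "over 5TB" comes from with four 2TB drives, here is a rough back-of-the-envelope sketch in Python.  It assumes single-parity raidz (one drive's worth of space goes to parity), that drive vendors count in decimal terabytes while the OS reports binary units, and it ignores ZFS metadata overhead, so the real number will be a bit lower.  The function name is just something I made up for illustration.

```python
TB = 10**12    # decimal terabyte, the way drive vendors count
TIB = 2**40    # binary tebibyte, the way most tools report sizes


def raidz_usable_bytes(disk_count, disk_bytes, parity_disks=1):
    """Rough usable capacity of a raidz vdev, ignoring metadata overhead.

    Assumes all disks are the same size and that parity_disks drives'
    worth of space is consumed by parity (1 for single-parity raidz).
    """
    if disk_count <= parity_disks:
        raise ValueError("need more disks than parity disks")
    return (disk_count - parity_disks) * disk_bytes


usable = raidz_usable_bytes(disk_count=4, disk_bytes=2 * TB)
print(usable // TB, "TB (decimal)")          # 6 TB (decimal)
print(round(usable / TIB, 2), "TiB (binary)")  # 5.46 TiB (binary)
```

So three drives' worth of data space is 6TB in vendor units, which shows up as roughly 5.46 in binary units before filesystem overhead.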

But I ran into a problem when trying to transfer my data from one machine to another.  I would run "zfs send <snapshot> | ssh <receivingHost> zfs receive <receivingFilesystem>" on a snapshot of a filesystem that is perhaps 100GB in size, but the transfer would never complete.  It seemed to go fine until maybe 10GB of content had been sent; then the transfer would stall, and the sending side would complain of a time-out and quit.  When I logged into the receiving machine (which I wrote about earlier here, so you can see the parts list), I found that it could no longer see the network.
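If you want to script that kind of transfer and see which side gave up, here is a minimal Python sketch of the same send-over-ssh pipeline.  It is only an illustration, not what I actually ran: the snapshot name, host name, and target filesystem in the usage note are hypothetical, and it assumes "zfs" and "ssh" are on the PATH and that ssh can log in without prompting.

```python
import shlex
import subprocess


def build_transfer_commands(snapshot, host, target_fs):
    """Return (send_argv, receive_argv) for: zfs send | ssh host zfs receive."""
    send_argv = ["zfs", "send", snapshot]
    # ssh hands the remote command to a shell on the other side,
    # so the filesystem name is quoted before being embedded in it.
    remote_cmd = "zfs receive " + shlex.quote(target_fs)
    receive_argv = ["ssh", host, remote_cmd]
    return send_argv, receive_argv


def run_transfer(snapshot, host, target_fs):
    """Run the pipeline and return (send_exit_code, receive_exit_code)."""
    send_argv, receive_argv = build_transfer_commands(snapshot, host, target_fs)
    sender = subprocess.Popen(send_argv, stdout=subprocess.PIPE)
    receiver = subprocess.Popen(receive_argv, stdin=sender.stdout)
    # Close our copy of the pipe so the sender sees a broken pipe
    # if the receiving side exits early.
    sender.stdout.close()
    receive_status = receiver.wait()
    send_status = sender.wait()
    return send_status, receive_status
```

Calling something like run_transfer("tank/media@xfer", "mediaserver", "tank/media") (all made-up names) would return a non-zero code for whichever side failed, which makes a stalled transfer a little easier to diagnose than a bare shell pipeline.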

I looked into it; turns out that there's an OpenSolaris bug with the RGE network driver, which is what my computer uses for its on-board Realtek Gigabit Ethernet circuitry.  The community really came through for me here.

Here's a discussion thread that talks about what is going on with the bug and includes several versions of a fix from the developer.  I tried a couple of them, and finally one version worked like a charm.  I've used it to successfully transfer over a TB of data.  I just hope this fix makes it into an OpenSolaris build soon; the developer thinks it will take about three months to get accepted, which sounds sub-optimal for an open source project that is trying to get support from a community of helpers.  But three months is better than nothing, and I'm grateful that somebody created a fix.  Nice job, masa; thank you!



About

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle. What more do you need to know, really?
