Thursday Jun 07, 2007

Patch 125720-03 on Ultra 40

I thought I should blog about this just in case, like mine, your desktop monitor (temporarily) becomes no better than a cheap terminal!

My Ultra 40 came pre-installed with S10U3, and I have been keeping it up-to-date with all of the latest security and recommended patches using Sun Connection Update Manager. A few days ago, Sun released patch 125720-03 as a security fix for the Xorg server, so I duly installed it like any good admin should ;) along with a couple of other security fixes which came out the same day.

The problem was that after rebooting, the window system did not start up, and what was worse was that I could not get a console login prompt to log in and find out why. The monitor showed the normal message indicating that the X server was being started, but other than flashing a couple of times, it neither started nor exited, so I could not find out what the cause was.

Luckily, I have my trusty laptop hanging around, so I logged in from there, figured out which patch was the likely suspect and removed it. Within a few seconds, the X server started up and we were up and running again. A quick poke around in /var/dt/Xerrors showed the following:

1214:   /usr/X11/bin/Xorg :0 -depth 24 -nobanner -auth /var/dt/A:0-vCaiJb
 fef70b47 read     (a, 8047950, ff)
 080e391d xf86SigHandler (b, 0, 8047b20) + ed
 fef7013f __sighndlr (b, 0, 8047b20, 80e3830) + f
 fef666ed call_user_handler (b, 0, 8047b20) + 22b
 fef6686d sigacthandler (b, 0, 8047b20) + bb
 --- called from signal handler with signal 11 (SIGSEGV) ---
 feab7443 _nv000949X (f9e00000, 4b00780, 2000, 1820, 842e484, cafe0001) + c3
 00000000 ???????? (0, 0, 1, 0, 0, 20)
 feb86948 ???????? ()

Fatal server error:
Caught signal 11.  Server aborting
This seems to indicate that the X server is crashing whilst trying to access the nVidia driver (that's a guess, BTW).

I logged a bug about this (6565662), which was handled very efficiently by the engineering group responsible for the nVidia drivers, and it turns out that this is an issue with the particular version of the nVidia X driver installed on my machine:

pkginfo -l NVDAgraphics
   PKGINST:  NVDAgraphics
      NAME:  NVIDIA Graphics System Software
  CATEGORY:  system,graphics
      ARCH:  i386
   VERSION:  1.0.8776,REV=2006.
   BASEDIR:  /usr
    VENDOR:  NVIDIA Corporation
      DESC:  X and OpenGL Drivers for NVIDIA Quadro graphics
    PSTAMP:  builder2920061016223323
  INSTDATE:  Feb 27 2007 09:54
   HOTLINE:  Please contact your local service provider
    STATUS:  completely installed
     FILES:      115 installed pathnames
                  26 shared pathnames
                  34 directories
                   5 executables
I had version 1.0.8776 installed, but the Xorg server in patch 125720-03 requires a facility of the nVIDIA driver which is not available until later releases, so an upgrade to the nVIDIA driver was required - to at least 1.0.9637.
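If you want to check a machine before applying the patch, the comparison is simple enough to script. The following is only a rough sketch: it assumes you have captured the `pkginfo -l NVDAgraphics` output as text, and the function names are my own invention rather than part of any Sun or nVidia tool. The minimum version is the 1.0.9637 quoted above.

```python
import re

# Minimum driver version quoted in the post; adjust if the patch README says otherwise.
REQUIRED = "1.0.9637"

def parse_version(pkginfo_text):
    """Extract the dotted version from the 'VERSION:' line of `pkginfo -l` output."""
    match = re.search(r"^\s*VERSION:\s*([0-9.]+)", pkginfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VERSION line found")
    # Drop any stray trailing dot; the ',REV=...' suffix is already excluded.
    return match.group(1).rstrip(".")

def as_tuple(version):
    """Turn '1.0.8776' into (1, 0, 8776) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

def driver_is_new_enough(pkginfo_text, required=REQUIRED):
    return as_tuple(parse_version(pkginfo_text)) >= as_tuple(required)

# Trimmed sample taken from the listing above.
sample = """\
   PKGINST:  NVDAgraphics
   VERSION:  1.0.8776,REV=2006.
"""
print(parse_version(sample), driver_is_new_enough(sample))
```

Comparing the parts as integers rather than as strings avoids surprises if a later release has a component with a different number of digits.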

If you work for Sun, you can get the updated drivers from here. Otherwise, you need to visit nVIDIA.

I have it on good authority that the README for patch 125720-03 will be updated to reflect this dependency in the patch installation instructions.

Friday Nov 24, 2006

Networking weirdness between two S10 boxes

For some months now, I've been having a problem with file transfer between my Sun SPARC workstation and an x86 box - both running Solaris 10.

The problem manifested itself as extremely poor NFS and 'scp' performance when the SPARC box was the client and the x86 box the server, but only in that direction. scp to the x86 box was seemingly fine. Strangely enough, ftp worked okay in both directions.

What I found was that when using scp to transfer a file from the x86 box to the SPARC box, it would transfer a few KB, then wait a few seconds, transfer a few KB more, wait a bit longer, ... The waits got longer and longer (roughly doubling every time), making it a pain to transfer any file larger than a few KB.
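To get a feel for how quickly doubling waits add up, here is a tiny sketch of the arithmetic. The one-second first wait and the count of six stalls are made-up illustration values, not measurements from my machines; only the "roughly doubling" behaviour comes from what I observed.

```python
def stall_times(first_wait_s, stalls):
    """Waits that double each time: w, 2w, 4w, ..."""
    return [first_wait_s * 2 ** i for i in range(stalls)]

def total_stall(first_wait_s, stalls):
    # Closed form of the geometric series; equals sum(stall_times(...)).
    return first_wait_s * (2 ** stalls - 1)

waits = stall_times(1, 6)   # hypothetical: first wait of 1 second, 6 stalls
print(waits)
print(total_stall(1, 6))
```

With those assumed numbers, six stalls already cost about a minute of dead time, which matches how painful it felt.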

At first, I thought it might be a crypto problem, since ftp appeared to be working fine, but NFS was slow when configured as V3 or V4, so I pretty much ruled that out.

In fact, I reached a dead-end on diagnosis and just tolerated it for the past 6 months - up until 2 days ago.

I happened to mention the problem to a colleague and after some discussion he suggested that maybe the Ethernet NIC on my x86 box was running in half-duplex mode, and simply couldn't keep up with ACKs from the SPARC box when transferring lots of data. Sure enough, a little experimentation identified that this was a strong possibility. A quick Google located a discussion on db forums which highlighted my 3COM card as being a little problematic. FTP apparently doesn't wait for ACKs, it just pours data down the pipe, which would explain why it apparently worked ok and scp didn't.

So, after a little careful checking of what drivers might be needed, I settled on a Linksys LNE100TX with drivers kindly provided for all by Garrett D'Amore.

The new card arrived yesterday, and after some PCI slot reorganisation and a fistful of reboots, the card sprang into life with Garrett's superb 'afe' driver. File transfers are now far, far faster than they were before, in both directions - a transfer of a 320MB file taking just 30s (it took 3.5 minutes yesterday to do the same operation: SPARC -> x86).
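For the curious, the figures above work out as follows. This is just the arithmetic on the two timings I quoted, treating MB loosely as megabytes; nothing here was measured more carefully than with a wall clock.

```python
def throughput_mb_per_s(size_mb, seconds):
    """Average transfer rate in MB/s for a file of size_mb megabytes."""
    return size_mb / seconds

before = throughput_mb_per_s(320, 3.5 * 60)   # old NIC: 3.5 minutes
after = throughput_mb_per_s(320, 30)          # new NIC: 30 seconds
print(f"before: {before:.2f} MB/s, after: {after:.2f} MB/s, "
      f"speed-up: {after / before:.1f}x")
```

So the new card is moving data about seven times faster on average for the same file.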

I'm happy now and the 3COM NIC is destined for the garbage or maybe eBay.... :)

Wednesday Oct 04, 2006

Live Upgrade ate my system

For reasons which I won't bother the reader with, I decided to use Live Upgrade to upgrade my system from Solaris 10 Update 1 to Solaris 10 6/06.

Since I had already upgraded my system from Solaris 9 to Solaris 10 using Live Upgrade, I thought this would be a doddle, and indeed it was.

First, delete the old Solaris 9 boot environment:

        # ludelete solaris_9

Next, create the new boot environment on the same partition as the old Solaris 9 partition:

        # lucreate -m /:/dev/dsk/c0t1d0s0:ufs -n solaris10_u2 -f /tmp/excludes
Note that here I tried to exclude some file systems which I would rather not be part of the new BE. Sadly, this failed, for some reason (maybe because they are part of the current boot env).

Now, do the upgrade from the Solaris 10 6/06 DVD:

        # luupgrade -u -n solaris10_u2 -s /cdrom/sol_10_305_sparc/s0

And finally, activate the new boot environment and reboot:

        # luactivate solaris10_u2
        # shutdown -y -i6 -g0

Those of you already familiar with Live Upgrade will no doubt know that the luactivate command spouts out some information about what to do if your new boot environment fails. It turns out that this was a life-saver for me - especially as I was rather rash in attempting to fix the mess which followed the reboot.

So, the system rebooted and it was immediately apparent that something had gone badly wrong with the live upgrade. A number of modules could not be loaded because of missing symbols: aggr and zfs to name but two. Then, when X-windows tried to start up, it failed, leaving a core file from 'dtgreet' in the root directory.

Unfortunately, here's where I panic'ed (no pun intended). Instead of thinking rationally about the problem, I thought it must be a kernel issue, so I rather rashly copied over /platform/sun4u/kernel/sparcv9/unix from the original boot environment into the same place on the current boot environment (without making a backup!!!!!!). Surprise, surprise, ... on attempting to reboot, the kernel panic'ed early in the boot and that was game over.

After some faffing around booting from DVD and trying to correct the fault, I finally recalled the sage words spouted by luactivate, so I managed to pull these from the handy-dandy typescript file I cunningly made whilst performing the upgrade.

luactivate says to "Mount the Current boot environment root slice". However, what you really need to do is to mount the old boot environment root slice, because it is that you need to activate. That's worth remembering!

Now comes another tricky bit: for some bizarre reason, when booting from DVD, the device tree for the disks is set up with the disks hanging off controller 1 instead of controller 0 as they are in the disk boots. So, you happily mount the old boot environment root slice and try to run the luactivate command to reactivate that BE. Needless to say, this fails with a complaint about it being on the wrong disk partition.

The fix for this is to do the following (assuming c0t1d0s4 is the old BE and c0t1d0s0 is the currently active BE):

        # cd /dev/dsk
        # ln -s c1t1d0s4 c0t1d0s4
        # ln -s c1t1d0s0 c0t1d0s0
        # cd ../rdsk
        # ln -s c1t1d0s2 c0t1d0s2
Then, mount the correct device and run luactivate. This should run through successfully.
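If you have more than a couple of slices to alias, the commands can be generated rather than typed. The sketch below only prints the `ln -s` lines for you to review; it does not touch the system. The function name and its parameters are my own, and the c1/c0 controller names are simply the ones from my machine, so treat them as assumptions for yours. It uses absolute paths instead of the `cd` steps above.

```python
def relink_commands(slices, dvd_ctrl="c1", disk_ctrl="c0", directory="/dev/dsk"):
    """Build 'ln -s' commands that alias DVD-boot device names to disk-boot names."""
    commands = []
    for name in slices:
        if not name.startswith(disk_ctrl):
            raise ValueError(f"{name} is not on controller {disk_ctrl}")
        # Same target/disk/slice, but under the controller seen when booted from DVD.
        dvd_name = dvd_ctrl + name[len(disk_ctrl):]
        commands.append(f"ln -s {directory}/{dvd_name} {directory}/{name}")
    return commands

# Block devices for the old and current boot environments, then the raw whole-disk slice.
for cmd in relink_commands(["c0t1d0s4", "c0t1d0s0"]):
    print(cmd)
for cmd in relink_commands(["c0t1d0s2"], directory="/dev/rdsk"):
    print(cmd)
```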

Finally reboot and we're back to where we were before we started.

I'm not sure if I should re-attempt the live upgrade or just give it up as a bad job for the moment, but I thought I'd share my experience (and stupidity) in case it is helpful to anyone else.

Monday Mar 06, 2006

Cachefsstat weirdness

I've just set up my Solaris 10 work system so that it mounts a directory from a Solaris 10 x86 box and, to try to improve performance, implemented a cached file system:

# cfsadmin -c /export/cache
# mount -F cachefs -obackfstype=nfs,cachedir=/export/cache,actimeo=300 xyzzy:/export/docs /docs
Now, I can reference the files in /docs just fine, and a quick snoop of the network shows that after the first read of a file from that directory, subsequent reads appear to be fulfilled from the local cache (i.e. there is no NFS read at the time of the access). However, cachefsstat insists that I have a 100% miss rate:
# cachefsstat /docs

                 cache hit rate:     0% (0 hits, 18729 misses)
             consistency checks:  25717 (25717 pass, 0 fail)
                       modifies:      0
             garbage collection:      0
I can't find anything in SunSolve which helps, and although this seems to be a reporting issue rather than a genuine 100% cache miss, I'd still like to understand why the reporting is incorrect. Any ideas are welcome !
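For anyone wanting to track the counters over time, here is a small sketch that pulls the hit and miss counts out of the output and recomputes the rate. I am assuming the percentage is simply hits divided by hits plus misses; I have not confirmed that this is how cachefsstat derives it, and the function name is my own.

```python
import re

def hit_rate(stat_text):
    """Recompute the hit rate from the 'cache hit rate' line of cachefsstat output."""
    match = re.search(r"\((\d+) hits, (\d+) misses\)", stat_text)
    if match is None:
        raise ValueError("no hit/miss counters found")
    hits, misses = int(match.group(1)), int(match.group(2))
    total = hits + misses
    # Assumed formula: hits / (hits + misses); guard against an empty cache.
    return hits / total if total else 0.0

sample = "cache hit rate:     0% (0 hits, 18729 misses)"
print(f"{hit_rate(sample):.0%}")
```

On my output this just confirms the 0%, so the oddity is in the counters themselves rather than in the rounding.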

Wednesday Mar 01, 2006

Riddle me this x86 experts?

My x86 Solaris box has a minor quirk in that if you switch it on and leave it, it will hang at the prompt which says something to the effect of "Press ESC to abort auto-boot...".

If I am not there or simply miss the time window to press ESC, the machine just hangs (forever?) at that point and I have to reset it to get it going again.

To get it to boot, I have to press ESC before the timer expires, accept all of the defaults on the following boot config screens and it then proceeds to boot up just fine.

Have I got some dodgy hardware, just a dodgy configuration, or a boog ?

Thursday Feb 23, 2006

Sun to acquire Aduva

Yesterday, Sun announced its intent to acquire Aduva.

I've been fortunate enough to have been able to take a close look at Aduva's product offerings and can safely say that their technology is very exciting and offers Sun's customers the prospect of significant gains in managing software packages and patches deployed on their networks - both for Solaris and Linux.

If you are a System Administrator (Solaris or Linux), you'll know the nightmare that is managing software dependencies for both application packages and patches or maintenance releases: your user wants package X installed, so what else do you need to make this both functional and up-to-date as far as patching is concerned ?

Well, Aduva's technology takes care of all of that for you. You tell it that you want the system to have package X installed, and the On-Stage software takes care of figuring out what works with that package, what doesn't, and what (if any) patches you might need to also install. All within seconds! No more trial-and-error. No more searching Linux/Sun patch sites for required patches and then finding that they have dependencies on other patches. It's all done for you - even the installation!!

This is quite the coolest technology I've come across in a long time, so if you're a Sys Admin or IT Manager, I suggest you have a read through the On-Stage literature and see how it could help improve the management of your systems.

P.S. I can't tell you anything about how this technology will fit into Sun's offerings until after the deal is closed, so please don't ask me. Just watch this space!

Thursday Jan 05, 2006

NetBeans 5.0 CVS integration rocks!

Once again recently, I've found myself struggling with the idiosyncrasies of NetBeans 4.1, so I downloaded the most recent stable build of NB 5.0.

It seamlessly copied over my prefs and open projects from the 4.1 installation and I was ready to go. Good so far!!

I then opened up a project I had started in NB 4.1 and which I'd closed before committing to our CVS repository. I was no less than amazed to find that NB 5.0 automatically read the CVS config files and even highlighted those files which were (already) changed between my copy and the repository. Top marks and looking rather good too!!

I added some new code, modified a few more files, then selected "CVS->Commit" from the project's context menu and it popped up a window showing which files were new, which were updated and gave me space to enter a commit comment - all in nice colours. This is just great!

Clicking on the Commit button sent off the changes and we're done. This is such a massive improvement over the earlier dev release of NB 5.0 I tried and is even better than Eclipse's CVS integration, which was good, but not this good!

I think there's still a way to go before NB's (built-in) refactoring is anywhere near as good as Eclipse's, but at last the team are making enough progress to have me in the camp which will start up NB in preference to Eclipse for general project work, and believe me: that's progress!

Friday Sep 23, 2005

Solaris 10 for x86 - I'm impressed!

I have an old PC in my home office which acts as time server, HTTP proxy and file store for the other computers (2) on my home network, and sees occasional web surfing use. It's a PIII 500 with a 13.5GB disk and a mere 320MB of RAM and has been running SUSE Linux 9.1 for about a year now.

Now, I may have had it set up wrong somehow, but the performance sucked. Big time! The Squid proxy was as slow as an old tortoise and it took an age to read and write files from the PC over the SAMBA protocol. It did, however, keep good time :)

Matters came to a head just before I went on vacation. I used a 6GB partition for all of the shared data, and unfortunately, my music managed to fill it up, so I ended up having to remove some of the home directory data backed up from the PC (with fingers crossed, I might add).

Once back from vacation, I ordered a cheap 80GB ATA133 disk and installed it in the machine last weekend, but what to do next? The machine desperately needs more RAM and a faster CPU, so I've hunted around and think I have something lined up. However, I decided to hold off due to cash-flow problems ;)

I downloaded the first couple of CDs' worth of Fedora Core 4, thinking I would try this out, but before I installed that, I thought: why not try Solaris 10 as I've got it sitting in my cupboard ?

To be honest, my expectation of any sort of success with S10 was very low. However, I stuck the first CD in and it booted first time without any problem. I then lurched through the installation process (I don't do it often enough to become good at it) and up the machine came. Unfortunately, due to my careless package selection policy, I couldn't get the window system up and running, and even services like sshd wouldn't come up properly. However, flush with the success of actually getting the machine to boot, I did a re-install, only this time, instead of starting from the minimal (core) cluster and adding packages, I started with the end-user cluster and removed a few bits of stuff I didn't really need. The install went through smoothly, and apart from forgetting to add a swap partition (d'oh!), everything went swimmingly and JDS came up with no problem at all.

I downloaded squid from Blastwave then configured it, Samba and xntpd to give me the same services as I was using on SUSE and now it works perfectly! Not only that, but Squid is running significantly faster and file reads and writes over Samba are very quick now. Whilst I don't know if this is S10 or just a function of having a faster disk, I am really impressed. I've only tried JDS briefly, but it appears to be at least on a par performance-wise with KDE on SuSE, so I'm really chuffed. Now, I can even administer the machine easily rather than spending ages with the Linux manuals trying to figure out how to do X, Y and Z!!

One small negative in the whole process is the way Solaris reports package dependencies during the install. If you manually select a package that depends on another, it will warn you, but only the description of the required package is given, and it's sometimes difficult to find the corresponding package in the long list presented. It would be really nice to just have an option to 'resolve dependencies' to automate the correction, which would have saved me a good chunk of time.
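What I have in mind for a 'resolve dependencies' option is nothing fancy: take the packages you picked and keep following their requirements until nothing new turns up. The sketch below shows the idea; the package names and the dependency table are entirely made up for illustration, and this is not how the Solaris installer works internally as far as I know.

```python
def resolve(selected, depends_on):
    """Return the selected packages plus everything they transitively require."""
    needed = set()
    stack = list(selected)
    while stack:
        pkg = stack.pop()
        if pkg in needed:
            continue  # already handled; also protects against dependency cycles
        needed.add(pkg)
        stack.extend(depends_on.get(pkg, ()))
    return needed

# Hypothetical dependency table: package -> packages it requires.
deps = {"PKGapp": ["PKGlib"], "PKGlib": ["PKGcore"], "PKGcore": []}
print(sorted(resolve(["PKGapp"], deps)))
```

Selecting the one hypothetical application package pulls in the library and the core package it sits on, which is exactly the correction I had to make by hand.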

All in all, I'm really glad I chose to try Solaris instead of Fedora or SuSE. Now, I just need to get that CPU, motherboard, and RAM upgraded. Oh, and I guess I'll need a new case too ;)

Wednesday Aug 17, 2005

NetBeans is soooooo COOL!

Bet you never thought you'd read that from me eh ?

In order to keep my CV (resumé) employer friendly, I've been looking into Web Services. Yeah, I know I'm several years late coming to this, but other stuff has conspired to keep me away from it!

I noticed a couple of weeks ago, whilst playing with Web Projects in NB 4.1, that there is a folder labelled "Web Services" created when you create a new web project, so now I can justify spending the time on it, I wondered how one went about creating a new web service.

On clicking the "Web Service" link in the new file/folder wizard, a new wizard window popped up to allow me to set some information about my new web service and (lo and behold) create the service from an existing WSDL file. Looks natty, so what does it do ?

I went off to t'Internet in search of an interesting WSDL file, and after sifting through mountains of so-called tutorials, eventually discovered a WSDL file for a real web service. Now, I have to say at this point that you would normally (apparently) use the WSDL file to create a web service client. However, I wanted to see what code NB would generate if I told it to create a service.

So, I input the URL of the WSDL file into the NB web service wizard, and 10 seconds later had a source tree fully populated with the classes needed to handle the input and return parameters for the web service and the framework for providing the service. All I would need to do is bolt in the code which actually processed the input parameters and set the return object contents.

This makes developing my own web service so much easier - once I've trained myself up on WSDL format and XMLSchema - just define the service in WSDL and use NB to generate the necessary code to support it! All I can say about this is that it is quite the coolest thing in NB I've come across so far. Sorry Eclipse, but that'll be tough to top!

Thursday Aug 11, 2005

Still struggling with NetBeans

I've been trying to do some more serious work in NetBeans over the past couple of weeks, and although there were some significant benefits in speed of development of my little web service, the annoyances are piling up:

  • After a random period (but at least once a day), the menus fail to paint, so you can't do many operations unless they're on the toolbar or you can remember the keyboard accelerator. The only way to clear this is to restart NetBeans!!!
  • CVS integration. Well, where do I start ? Okay, so I accept that I'm using a development release, so I can't expect things to be perfect, but I could not figure out how to create a new project from existing sources in a CVS repository. Sure, I could just about manage to extract files from a repository into some location on my disk, but when I went to create a new project, NB seemed to want to copy the sources to a new location for the project. I'm sure there's an easy way to do this, but the fact that it is not intuitive (to me) is a major downer!
  • Actually, this is a gripe from a colleague - that it appears to be darn near impossible to integrate a free-form project into NB in such a way that you can use all of the cool features of deployment to Tomcat/App Server etc.

That said, I also struggled with Eclipse yesterday in checking a project out of CVS. I wanted it to be a Java project so I could use all of the useful Eclipse features for Java projects, but it only offered me the option of an ANT builder. So, in the end, I checked the project out manually and then created a new Java project from the checked out sources.

All in all, both IDEs have some way to go in usability. For web projects, I think NB has the edge because of its good built-in integration with Tomcat/App Server. However, from a pure editing and refactoring perspective, Eclipse is a long way ahead in usability.

Tuesday Jun 21, 2005

I'm trying to like NetBeans - honestly!

The official mandate inside of Sun is:

Thou shalt use NetBeans IDE and not, under any circumstances, that product spawned by Satan himself.

So, I duly gave up using the other IDE and started using NetBeans.

At the risk of being seen as a heretic, I actually prefer Eclipse most of the time. Its refactoring abilities seem to be way ahead of NetBeans, it is generally faster and I can navigate around it with more ease.

That said, and just to prove that I am open-minded, there are some things about NetBeans which I do particularly like, so I thought I'd just throw them out there and hope that I don't get fired for being a nay-sayer.

Particular Likes:

  • Integration with Tomcat - top marks!
  • JSP and particularly Tag library development.
  • Ability to add all required imports in one mouse click.

Particular Dislikes:

  • I can't get NB to tell me which imports, fields and methods are unused (I like this feature of Eclipse).
  • Paging/GC. I suspend my system overnight and when I first go back to the NB screen, it takes at least 20-30s to page in. Eclipse is available for use almost straight away.
  • Don't have fine-grained control over code styling.
  • I'm sure there was something else, but it has eluded me for now.

However, the thing which is driving me completely insane at the moment is losing the menus.
At some time after starting NB, when I click on an NB menu, the menu box appears, but is just filled with blue-grey, making it unusable! I haven't found a way around this other than restarting NB, which is not a particularly elegant solution. If it weren't for the fact that I'm playing with JSPs at the moment, I would honestly drop NB like a stone and return to using Eclipse because of this.

Congratulations, Tim!

I'd just like to offer my congratulations to Tim Uglow on his promotion to Principal Engineer (PE).

I remember when Tim first joined Sun to work in the Communications support group in the UK Response Centre (as it was known then) in the late 80's and am delighted that despite a brief spell outside he continues the good work for Sun.

Well Done Tim!

Thursday May 26, 2005

I'm a bit cross with Sun today

I sold some stock options 5 weeks ago, and I still don't have the proceeds.

Because of the UK tax laws, when a UK employee sells stock options, the proceeds go from the brokerage to Sun, who pass them on to their (external) payroll agency to handle. They _should_ take care of deducting any owing income tax and then send the remainder of the proceeds directly to your bank.

In this case, there was no income tax to be deducted at source because these options were granted way before that tax law came into force, so in theory, the money should have hit my account at least 2 weeks ago.

To compound the frustration, it took two emails to the US stock plan services group to get them to tell me to contact the UK payroll department (whom they copied on the email). Because I didn't get a response, I emailed UK payroll again on Monday this week, but still no response. I appreciate that people are busy, but a quick email response to say that they're looking into it would be nice.

Now, I'm going to have to try not to get angry and resist throwing my toys out of the pram.

Friday Apr 29, 2005

Woohoo - even faster broadband!

For some reason, I missed out on an email to all Sun staff in the UK telling them that they could upgrade to 2Mbps ADSL, but fortunately, a colleague forwarded it to me when I was moaning about it.

So, now I've got my order in and have to wait up to 5 business days to get the upgrade. Can't wait!

Unfortunately, my wife says that she never noticed when we upgraded from 512kbps to 1Mbps, so I do wonder whether it'll make a difference for her. Then again, she only really does email and a little surfing of t'Internet, so maybe she wouldn't really notice it anyway.

Monday Mar 21, 2005

Sun Net Connect Remote Servicing

I've been biting my lip on this for a couple of months now, but I think it's safe to talk about it now that Sun Net Connect 3.2 is announced.

For the past couple of years, I've been working on a project to implement a secure collaborative web browsing environment which enables Sun Customers and Sun service engineers to view the same web content from the customer's site at the same time. This will allow Sun engineers to triage problems with web-based services on the customer's WAN without having to either visit the site or rely on descriptions or screenshots of the problem from the customer. Personally, I think this is a big step forward (although I am biased).

Currently, the only "supported" collaborative use is with Sun Storage Automated Diagnostic Environment (StorADE) version 2.4, although it will work happily with other web applications. It even supports well-behaved web applications which use HTML frames!

In addition to shared web browsing, Sun Net Connect 3.2 also contains an application to provide collaborative shell access to customer systems, which enables the triage of applications which are command-line rather than GUI driven. Both services open up some exciting possibilities for reducing the time it takes to resolve an issue.

If you are interested in the possibilities of this technology, then please contact your Sun Service Sales representative to discuss Sun Net Connect 3.2, which also offers extensive system monitoring capabilities.



