Monday Apr 06, 2009

Daaaaaaaad, the computer isn't booting

These were the words my 10-year-old boy yelled to me on Sunday.

I'm documenting this because I tried to imagine facing it as an end user, rather than as a Solaris kernel support engineer, and shuddered.

The machine he was talking about is the OpenSolaris box that I installed for the kids and upgraded to SRU4 on Friday (more on supported updates shortly).

The box (silver) had been sitting at the boot loading screen (those of us old enough to remember the original Battlestar Galactica would call it the Cylon screen) with the disk light solidly on and the disk rattling away, threatening to send itself off into hyperspace. It had apparently been like this for a few hours (he lost interest and went to watch TV before thinking to tell me).

He'd tried resetting it, but that didn't help.

"OK," I thought, "I've just upgraded the box; maybe there was a problem with SRU4. Let's boot into the prior boot environment." Easy enough: just reset and select the prior environment in GRUB.

No dice. Same issue.

Failsafe boot? No, it appears that we don't have one of those.

Right, a single-user boot, then. I want to see what is happening on the console, so we need to get rid of the graphics crud at the start.

This isn't too hard. Check the key options listed on the GRUB text screen to be sure, but all I did was hit 'e' (edit) in GRUB, 'd' (delete) the splash and graphics lines, 'e' (edit) the unix line to take the ",Graphics..." stuff off the end of the command, hit Enter to go back a screen, then hit 'b' (boot) and watch what happens.

I didn't have to wait long.

Let me give you a little more background on this machine. It really is scrounged together: the root pool consists solely of a 4 GB disk salvaged from an Ultra 10.

The root zpool was 100% full, and "disk full" messages scrolled past for a while.

After a few minutes we got the prompt asking for a login name and password to drop us into a single-user root shell. OK, let's go looking for where the space went.

A 'zfs list' showed me that rpool/export/home was a little larger than I expected. Unfortunately, as the pool was full, I couldn't mount it. No worries; let's poke around on / to find something to remove and free up enough space to mount things.

A good place to look for such space on a workstation is in /var/log, specifically the Xorg logs.
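If it isn't obvious where the space has gone, a quick size listing helps. The following is a minimal POSIX shell sketch, not the procedure used in this post: a hypothetical `biggest` helper that lists the five largest entries directly under a directory (defaulting to /var/log) using `du -sk` and `sort -n`. It only reads sizes and does not modify anything.

```shell
#!/bin/sh
# Hypothetical helper (not from the post): show the five largest entries
# directly under a directory, smallest first and largest last.
# 'du -sk' reports each entry's disk usage in kilobytes, 'sort -n'
# orders those numbers, and 'tail -5' keeps the biggest five.
biggest() {
    dir=${1:-/var/log}
    du -sk "$dir"/* 2>/dev/null | sort -n | tail -5
}

# Example: check /var/log, the place suggested above.
biggest /var/log
```

Because the largest entry is printed last, it stays visible right above the prompt on a small console.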

Let's remove one of those...

Bzzzzzt wrong.

Copy-on-write. ZFS never overwrites a block in place, so to unlink a file it needs to write a new block for the updated directory entry. Oops, no free blocks.

The trick is to release the space without having to rewrite the directory entry, so we truncate one of the logs instead.

# : > Xorg.0.log.old

Much better. For good measure I zapped Xorg.0.log as well.
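The effect of the truncation trick can be seen on any Unix filesystem, although a full ZFS pool is hard to reproduce safely. This minimal POSIX shell sketch fills a hypothetical stand-in file, `demo.log`, in a scratch directory, truncates it with the same `: >` redirection used above, and compares its inode number and size before and after. The inode stays the same while the size drops to zero: the file's contents are discarded, but its directory entry is left as it was.

```shell
#!/bin/sh
# Demo only: demo.log is a hypothetical stand-in for Xorg.0.log.old, and
# this runs on whatever filesystem holds the scratch directory, not ZFS.
dir=$(mktemp -d)
log="$dir/demo.log"

# Fill the stand-in log with some data.
i=0
while [ "$i" -lt 1000 ]; do
    echo "line $i of a noisy log" >> "$log"
    i=$((i + 1))
done

before_inode=$(ls -i "$log" | awk '{print $1}')
before_size=$(wc -c < "$log" | tr -d ' ')

# The trick from the post: ':' is a no-op builtin, and the '>' redirection
# opens the existing file with O_TRUNC, discarding its contents in place.
: > "$log"

after_inode=$(ls -i "$log" | awk '{print $1}')
after_size=$(wc -c < "$log" | tr -d ' ')

echo "inode: $before_inode -> $after_inode"
echo "size:  $before_size -> $after_size bytes"
rm -rf "$dir"
```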

OK, that looks much better.

Let's mount rpool/export/home and have a look.

# zfs mount rpool/export/home

Ahhh, the kids' home directories each have a largish core file in them. Remove those, then unmount /export/home. Now, because I mounted rpool/export/home and not rpool/export, a home directory got created in /export to serve as the mount point. We need to remove that, or the filesystem/local service won't start correctly (it will complain that /export is not empty).
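Finding the cores is easier with `find` than by eye. The following is a sketch rather than what was actually run here: a hypothetical `find_cores` helper that lists regular files named `core` under a directory tree (defaulting to /export/home), along with their sizes. The remaining recovery steps from this post appear only as comments, because they need root and a real ZFS pool; `<user>` is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper (not from the post): list every regular file named
# 'core' under a directory tree, using 'ls -l' so the sizes are visible.
find_cores() {
    root=${1:-/export/home}
    find "$root" -type f -name core -exec ls -l {} \;
}

# Run it against the home directories once rpool/export/home is mounted.
if [ -d /export/home ]; then
    find_cores /export/home
fi

# Remaining steps from the post, as root on the affected machine:
#   rm /export/home/<user>/core    # for each core file listed above
#   zfs unmount rpool/export/home
#   rmdir /export/home             # remove the mount point directory the
#                                  # mount created, so /export is empty again
```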

Log out of that shell and the system continues on to milestone=multi-user. We're good again, and Jake is off to make his daily moves in Kingdom of Loathing and get back to Club Penguin.

About

Solaris and Network Domain, Technical Support Centre

Alan is a kernel and performance engineer based in Australia who tends to have the nasty calls gravitate towards him.
