Exiting a reboot loop

If you're in a situation where the system panics during boot, you can enter the following at the OpenBoot (OBP) "ok" prompt to regain control of your system:
ok boot net -s

In my case, I had added some diagnostic code to a PCI driver that is used when booting from the root filesystem. The driver had a bug that was triggered on every boot, causing the system to panic:

...
000000000180b950 genunix:vfs_mountroot+60 (800, 200, 0, 185d400, 1883000, 18aec00)
  %l0-3: 0000000000001770 0000000000000640 0000000001814000 00000000000008fc
  %l4-7: 0000000001833c00 00000000018b1000 0000000000000600 0000000000000200
000000000180ba10 genunix:main+98 (18141a0, 1013800, 18362c0, 18ab800, 180e000, 1814000)
  %l0-3: 0000000070002000 0000000000000001 000000000180c000 000000000180e000
  %l4-7: 0000000000000001 0000000001074800 0000000000000060 0000000000000000

skipping system dump - no dump device configured
rebooting...
If you're connected to the console via telnet, you can send a BREAK sequence to gain control of the firmware's (OBP's) prompt. Press Ctrl-] (telnet's default escape character) to get the telnet> prompt. Once telnet has control, enter this:
telnet> send brk
You'll be presented with OBP's prompt:
ok
Then enter the following to boot into single-user mode over the network:
ok boot net -s
Note that a normal network boot under Solaris (without -s) starts the installation of whatever software the install server was last configured to provide. Here, however, we use boot net -s only as a "handle" with which to get at the Solaris prompt: the -s flag stops the boot in single-user mode. Once at that prompt, we can perform actions as root that let us back out our buggy driver (ok... MY buggy driver :-)) ...and replace it with the original, non-buggy driver. Entering the boot command produced the following output and left us at the Solaris prompt in single-user mode:
Sun Blade 1500, No Keyboard
Copyright 1998-2004 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.16.4, 1024 MB memory installed, Serial #53463393.
Ethernet address 0:3:ba:2f:c9:61, Host ID: 832fc961.

Rebooting with command: boot net -s
Boot device: /pci@1f,700000/network@2  File and args: -s
1000 Mbps FDX Link up
Timeout waiting for ARP/RARP packet
Timeout waiting for ARP/RARP packet
4000 1000 Mbps FDX Link up

Requesting Internet address for 0:3:ba:2f:c9:61
SunOS Release 5.10 Version Generic_118833-17 64-bit
Copyright 1983-2005 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Booting to milestone "milestone/single-user:default".
Configuring devices.
Using RPC Bootparams for network configuration information.
Attempting to configure interface bge0...
Configured interface bge0
Requesting System Maintenance Mode
SINGLE USER MODE
#
Our goal now is to move to the directory containing the buggy driver and replace it with the original driver (which we had saved away before ever loading our buggy driver! :-)). However, since we booted from the network, the root filesystem ("/") is NOT mounted on one of our local disks. It is mounted on an NFS filesystem exported by our install server. To verify this, enter the following command:
# mount | head -1
/ on my-server:/export/install/media/s10u2/solarisdvd.s10s_u2dvd/latest/Solaris_10/Tools/Boot remote/read/write/setuid/devices/dev=4ac0001 on Wed Dec 31 16:00:00 1969
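You may prefer to script this check. The following is a minimal sketch that assumes a POSIX shell and the Solaris mount output format shown above: the mount point, the word "on", and then the mounted resource. root_is_remote is a hypothetical helper name, not a Solaris command. It treats a resource of the form host:/path as an NFS mount and anything else (such as /dev/dsk/c0t0d0s0) as a local disk.

```shell
# Hypothetical helper (not part of Solaris): succeeds if the mount line read
# from stdin describes a remote (NFS) mount, fails otherwise.
# Assumes the Solaris format: "MOUNTPOINT on RESOURCE OPTIONS on DATE".
root_is_remote() {
    # Split the line into whitespace-separated fields; the third field
    # is the mounted resource (a device path or host:/path).
    read -r mnt_point _on resource _rest
    case "$resource" in
        *:/*) return 0 ;;  # host:/path, e.g. my-server:/export/... (NFS)
        *)    return 1 ;;  # e.g. /dev/dsk/c0t0d0s0 (local disk slice)
    esac
}
```

From the single-user shell, it could be used like this:
# mount | head -1 | root_is_remote && echo "/ is NFS-mounted"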
As a result, we have to create a temporary mount point and then mount the local disk onto that mount point:
# mkdir /tmp/mnt
# mount /dev/dsk/c0t0d0s0 /tmp/mnt
Note that your system's root filesystem is not necessarily on "c0t0d0s0". This is something that you should also have recorded before you ever loaded your... er... "my" buggy driver! :-) While the system is running normally (not after a network boot), you can find the local disk slice mounted as the root filesystem by entering:
# df -k /
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/dsk/c0t0d0s0    76703839 4035535 71901266     6%    /
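df -k / only shows the local disk while the system is running normally. After a network boot it shows the install server's NFS path instead. That makes it worth capturing this before you experiment.

The following is a minimal sketch, assuming the df column layout shown above. root_device is a hypothetical helper name. Keep the result somewhere you can still read when the machine won't boot, such as on paper or on another host, rather than only on the local disk.

```shell
# Hypothetical helper (not part of Solaris): print the device that backs
# "/" given `df -k /` output on stdin.
root_device() {
    # Line 1 is the column header; the first field of line 2 is the device.
    # (If df wraps a long device name onto its own line, that line is
    # still line 2, so the same rule applies.)
    awk 'NR == 2 { print $1; exit }'
}
```

For the output above, the following prints /dev/dsk/c0t0d0s0:
# df -k / | root_device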
To continue with our example, we can now move to the buggy driver's directory and replace the driver with the original. Note that /tmp/mnt is prefixed to the path where we would "normally" find the driver:
# cd /tmp/mnt/platform/sun4u/kernel/drv/sparcv9
# ls -l pci*
-rw-r--r--   1 root     root      288504 Dec  6 15:38 pcisch
-rw-r--r--   1 root     root      288504 Dec  6 15:38 pcisch.aar
-rwxr-xr-x   1 root     sys       211616 Jun  8  2006 pcisch.orig
# cp -p pcisch.orig pcisch
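The restore step can be scripted too, which may help if you go around the edit, build, and reboot cycle several times. The sketch below assumes, as in this example, that the known-good copy was saved next to the driver as NAME.orig. restore_driver and the .bad suffix for the set-aside buggy copy are hypothetical, illustrative names, not Solaris conventions.

The function keeps the buggy binary for later debugging and copies the original back with cp -p, which preserves mode, ownership, and timestamps. It then uses cmp to confirm that the two files match.

```shell
# Hypothetical helper (not part of Solaris): restore DIR/NAME from the
# saved copy DIR/NAME.orig, keeping the current file as DIR/NAME.bad.
restore_driver() {
    dir=$1
    name=$2
    # Refuse to continue if there is no saved original to restore.
    if [ ! -f "$dir/$name.orig" ]; then
        echo "restore_driver: no saved original $dir/$name.orig" >&2
        return 1
    fi
    # Set the current (buggy) binary aside for later debugging, if present.
    if [ -f "$dir/$name" ]; then
        cp -p "$dir/$name" "$dir/$name.bad" || return 1
    fi
    # Copy the original back, preserving mode, ownership and timestamps.
    cp -p "$dir/$name.orig" "$dir/$name" || return 1
    # Succeed only if the restored file is byte-for-byte identical.
    cmp -s "$dir/$name.orig" "$dir/$name"
}
```

With the local root mounted on /tmp/mnt, the call for this example would be:
# restore_driver /tmp/mnt/platform/sun4u/kernel/drv/sparcv9 pcisch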
We can now synchronize any in-memory filesystem data with the disk... and then reboot. The system now boots correctly, as expected:
# sync;sync
# reboot
syncing file systems... done
Sun Blade 1500, No Keyboard
Copyright 1998-2004 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.16.4, 1024 MB memory installed, Serial #xxxxxxxx.
Ethernet address 0:3:ba:2f:c9:61, Host ID: yyyyyyyy.

Rebooting with command: boot
Boot device: /pci@1e,600000/ide@d/disk@0,0:a  File and args:
SunOS Release 5.10 Version Generic_118833-17 64-bit
Copyright 1983-2005 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Hostname: my-host
NIS domain name is my-campus.Central.Sun.COM

my-host console login:
...so that's how it's done! Of course, the easier way is to never write a buggy driver... but... then... we all "have an eraser on the end of each of our pencils"... don't we? :-) "...thank you... and good night..."