How to shoot yourself in the foot with mdb

This would be funny if it were not for the poor customer who actually did this, led by the hand by an info doc that suggested the following:

#mdb -kw
>do_tcp_fusion/W 0
That is it. No information as to what to do next.
However, those of you steeped in adb history (remember, mdb has full backward compatibility) will know that if you type another address at this point, mdb repeats the previous command. So it will write 0 to the address specified.
If you were unfortunate enough not to be steeped in adb history, then you may not know how to exit from the mdb session. If you were to guess that the way to do this was to type “exit”, then mdb happily looks up “exit” in the symbol table, converts it to an address and writes 0 to that address:
# mdb -kw
Loading modules: [ unix krtld genunix specfs ufs ip sctp usba s1394 nca ipc nfs audiosup random sppp sd crypto ptm lofs ]
> do_tcp_fusion/W 0
do_tcp_fusion:  0x1             =       0x0
> exit
exit:           0x9de3bf50      =       0x0
> 
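
The repeat behaviour in the session above can be illustrated with a small toy model. This is only a sketch and not how mdb is implemented: the class name ToyDebugger, the parser and the two addresses are made up for illustration, and only the symbol names and the values 0x1 and 0x9de3bf50 come from the transcript.

```python
"""Toy model of the adb-style "bare address repeats the last command" rule.

This is NOT mdb. The parser, the class and the addresses are simplified
assumptions made purely to illustrate the behaviour seen in the transcript.
"""


class ToyDebugger:
    def __init__(self, symbols, memory):
        self.symbols = symbols    # symbol name -> address
        self.memory = memory      # address -> 32-bit value
        self.last_write = None    # value of the last /W command, if any

    def execute(self, line):
        """Run one input line and return the text the toy would print."""
        line = line.strip()
        if "/W" in line:
            # "symbol/W value": a new write command, remembered for later.
            name, value_text = line.split("/W", 1)
            name = name.strip()
            value = int(value_text.strip(), 0)
            self.last_write = value
        else:
            # A bare word is treated as an address, and the previous
            # command is repeated there. There is no "exit" keyword.
            name = line
            if self.last_write is None:
                return f"{name}: no previous command to repeat"
            value = self.last_write
        if name not in self.symbols:
            return f"{name}: unknown symbol"
        addr = self.symbols[name]
        old = self.memory[addr]
        self.memory[addr] = value
        return f"{name}: {old:#x} = {value:#x}"


dbg = ToyDebugger(
    symbols={"do_tcp_fusion": 0x1000, "exit": 0x2000},  # made-up addresses
    memory={0x1000: 0x1, 0x2000: 0x9DE3BF50},           # values from the transcript
)
print(dbg.execute("do_tcp_fusion/W 0"))  # do_tcp_fusion: 0x1 = 0x0
print(dbg.execute("exit"))               # exit: 0x9de3bf50 = 0x0
print(dbg.execute("exit/W 0x9de3bf50"))  # exit: 0x0 = 0x9de3bf50
```

The last call mirrors the only recovery available in the real session: the debugger prints the old value as it overwrites it, so that value can be written straight back to the same symbol.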

If you are quick, lucky, and realise what has happened, you can write the instruction back before the system crashes, but on a moderately busy system you have almost no time: you have to do it before the next process exits. Hitting Control-D, or exiting mdb by any other method, now results in the system crashing:


panic[cpu0]/thread=3000183c020: BAD TRAP: type=10 rp=2a10037ba50 addr=10c8a00
mmu_fsr=0

mdb: illegal instruction fault:
addr=0x10c8a00
pid=2926, pc=0x10c8a00, sp=0x2a10037b2f1, tstate=0x9900001602, context=0x115b
g1-g7: 10403ac, 58692c, 10c865c, 20, 80000305cfcc0ef8, 0, 3000183c020

000002a10037b770 unix:die+9c (10, 2a10037ba50, 10c8a00, 0, 2a10037b830, c0000000)
  %l0-3: ffffffff7f402000 0000000000000010 ffffffff7e6ebec4 0000000000000000
  %l4-7: 0000000000000000 0000000000001084 0000000000001000 000000000106b800
000002a10037b850 unix:trap+12b8 (2a10037ba50, 0, 0, 1835800, 180c000, 3000183c020)
  %l0-3: 0000000000000000 0000000000000010 0000030001832a98 0000000000000000
  %l4-7: 0000000000010008 0000000000010000 0000000000000001 000000000180c180
000002a10037b9a0 unix:ktl0+48 (1, 0, 100173000, 100173, 5, 5)
  %l0-3: 0000000000000003 0000000000001400 0000009900001602 0000000001013c74
  %l4-7: 0000030001832cc0 0000000000000000 0000000000000000 000002a10037ba50

syncing file systems... done
dumping to /dev/dsk/c0t0d0s1, offset 107806720, content: kernel


It is actually a better way to induce a panic than most of the ones documented in books like Panic.


I've changed the info doc in question to have the command specified as:

echo 'do_tcp_fusion/W 0' | mdb -kw


That way it does not lead any more customers down that path. And yes, I've trawled sunsolve for all the cases where we suggest mdb -kw and updated them in a similar way.
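
The piped form is safer because mdb reads a single command from standard input and then sees end-of-file, so there is never a prompt at which a stray word can be typed. As a rough sketch of the same idea for scripted use, the helper below builds such a one-shot invocation. The function names one_shot and run_one_shot are hypothetical, and actually running it assumes root access on a Solaris system with mdb installed.

```python
import subprocess


def one_shot(command, mdb_args=("mdb", "-kw")):
    """Return (argv, stdin_text) for running exactly one debugger command.

    Hypothetical helper: rejecting multi-line input ensures that only the
    intended command reaches the debugger before end-of-file.
    """
    if "\n" in command or not command.strip():
        raise ValueError("expected exactly one non-empty command line")
    return list(mdb_args), command.strip() + "\n"


def run_one_shot(command):
    """Run the command for real; needs root on a system with mdb installed."""
    argv, stdin_text = one_shot(command)
    return subprocess.run(argv, input=stdin_text, text=True,
                          capture_output=True, check=True)


# Dry run: show what would be executed without touching a kernel.
argv, stdin_text = one_shot("do_tcp_fusion/W 0")
print(argv, repr(stdin_text))  # ['mdb', '-kw'] 'do_tcp_fusion/W 0\n'
```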


Update: I also filed bug 6505499

Comments:

full backward compatibility ????
$ cat foo.c
int some_global = 0;
main() { exit(0); }
$
$ cc foo.c
$
$ adb -w a.out
main:b
:r
a.out: running
breakpoint      ~main:          jsr     r5,csv
some_global/o
_some_gl:       0
some_global/w0
_some_gl:       0       =       0
exit
01:             0100360
main
074:            04567
exit?i
start+01:       bpl     0177743
main?i
~main:          jsr     r5,csv
:c
a.out: running
process terminated
$
The /w command does not repeat. Fix the bug for backward compatibility with PDP-11 Unix V7 ASAP PLEASE!
Thank you
Stéphane

Posted by Stephane on December 15, 2006 at 07:27 PM GMT #

Ouch

Of course a better way to do this would have been

# echo do_tcp_fusion/W0 | mdb -kw

Perhaps the infodoc should be thus modified.

Alan.

Posted by Alan Hargreaves on December 16, 2006 at 02:29 AM GMT #

Alan,

That is what I have changed it to, as I said in this entry. Clearly the Ashes are going to your head!

Stephane,

Interesting. Perhaps we need to file a bug against adb and mdb.

Posted by Chris Gerhard on December 16, 2006 at 04:27 AM GMT #

:-)

I'm glad adb is still on Solaris, and I hope nobody will remove it, because I still use it very often, like, you know, when somebody wants to change a string in a 100 MB a.out and is complaining that loading the file in hexl mode with emacs causes a "max buffer size reached" error message ;-)

I didn't use adb when I had an account on a PDP, but I'm pretty sure /W didn't repeat when another symbol was typed in. If somebody cares, Solaris adb should be modified. If someone wants to clear a memory area, he can do symbol,size/Wvalue.

Launching a VAX emulator and loading a 4.3BSD could probably be done in a matter of a few minutes.

I remember the adb sources were written in C but Pascal-like, with something like #define BEGIN { or similar. Are the sources of adb for Solaris still available?

thanks

stephane

Posted by Stephane on December 16, 2006 at 05:02 AM GMT #

Hi again, it seems Chris posted a comment while I was writing mine.

Chris, you're thinking of filing a bug about the auto-repeat of the /w command, which probably should not happen?

Regards - Stephane - Paris - France

Posted by stephane on December 16, 2006 at 05:10 AM GMT #

There is good news and bad news about adb in Solaris. The bad news is that it is not really there any more. The good news is that it is now a link to mdb, which is backward compatible with our old adb implementation. (I'm going to check whether the old adb we had, when it was indeed a different program, would repeat write operations. If it does not, then this is a bug in mdb; otherwise it is just a dangerous feature that I think should be fixed. That will have to wait until Monday, as all the systems I have at home run Nevada, and since it is the weekend I am trying not to fire up the Sun Ray, as that leads to me doing even more work.)

The source to mdb is on the opensolaris site.

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/mdb/

but is not in a pascal style, thankfully.

Posted by Chris Gerhard on December 16, 2006 at 05:34 AM GMT #

It's not Pascal-like, I just checked. It's worse than that ;-) Example below.
endpcs()
{
	REG BKPTR	bkptr;
	IF pid
	THEN ptrace(PT_KILL,pid,0,0); pid=0; userpc=1;
	     FOR bkptr=bkpthead; bkptr; bkptr=bkptr->nxtbkpt
	     DO IF bkptr->flag
		THEN bkptr->flag=BKPTSET;
		FI
	     OD
	FI
	bpstate=BPOUT;
}
SunOS burton1 5.9 Generic_118558-26 sun4u sparc SUNW,Ultra-4 Solaris

repeats the ?W or /W command at each <CR>, if that helps.

Posted by stephane on December 16, 2006 at 09:52 PM GMT #

On SunOS 5.9, mdb and adb are the same command; we have to go back to 5.8 for the old adb.

I have filed bug 6505499 (the link will work when the bug makes it through the firewall) asking that mdb behave as adb did with respect to repeating commands.

Many thanks Stephane for your help.

Posted by Chris Gerhard on December 18, 2006 at 06:47 AM GMT #

You're welcome.
Now, we have two x4600s (16 cores) which crash after a while (a few minutes) when running an intensive CPU test program (some stuff from distributed.net).
The (Sun) French guy in charge of the case told us something like "the machine is fine! It's this stupid program which causes the crash, but you don't really need to run this code, do you?". Imagine if I had to talk about this adb bug with this guy ;-)

BTW, we fixed the second machine by installing Windows on it and finding some weird (cpu/firmware) patch somewhere so it stops crashing. It's now calculating keys or something (rules ?) just fine !

Posted by stephane on December 19, 2006 at 03:47 PM GMT #

About

This is the old blog of Chris Gerhard. It has mostly moved to http://chrisgerhard.wordpress.com
