I've been working on inline templates to improve the performance of a couple of hot routines in a customer's code. I've a couple of articles on this kind of work if you want to find out more details. There's an introductory article which covers the rules, and there's an article specifically talking about using VIS instructions.
Anyway, one of the most important things to do is to write a test harness: it's very easy to make a mistake and have the template not work for some particular situation. For these routines, one of my colleagues had already written a test harness. I ended up extending it to try a different corner case, and at that point discovered that my code no longer validated. The problem turned out to be a branch that should have been branch >= 2, which I'd coded as branch != 2. The original test cases terminated with the value 2 at this point, but the new test I added ended up with the value 1, which still should have terminated, but the inline template as written didn't handle it correctly.
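To give a flavour of what such a harness does, here's a minimal sketch in C. It is not the harness from this work: I'm assuming a memcmp-like interface for the routine under test, and the names cmp_fn, ref_cmp, pairs_only_cmp and run_harness are made up for illustration. The idea is simply to compare the candidate against a slow but obviously correct reference for every length and every mismatch position, so that the corner cases get exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical interface for the routine under test (memcmp-like). */
typedef int (*cmp_fn)(const void *a, const void *b, size_t n);

/* Reference implementation: plain byte-by-byte comparison. */
static int ref_cmp(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a, *pb = b;
    for (size_t i = 0; i < n; i++) {
        if (pa[i] != pb[i])
            return pa[i] < pb[i] ? -1 : 1;
    }
    return 0;
}

/* Deliberately flawed candidate (illustration only): it works in
   two-byte steps and forgets the odd trailing byte. */
static int pairs_only_cmp(const void *a, const void *b, size_t n)
{
    return ref_cmp(a, b, n & ~(size_t)1);
}

/* Reduce a result to -1, 0 or 1 so that implementations returning
   different magnitudes still compare equal. */
static int sign(int v)
{
    return (v > 0) - (v < 0);
}

/* Check the candidate against the reference for every length from 0 to
   max_len and every single-byte mismatch position. Returns the number
   of disagreements found. */
static int run_harness(cmp_fn candidate, size_t max_len)
{
    unsigned char a[64], b[64];
    int failures = 0;

    assert(max_len <= sizeof a);
    for (size_t len = 0; len <= max_len; len++) {
        /* diff == len means the two buffers are identical. */
        for (size_t diff = 0; diff <= len; diff++) {
            memset(a, 0x55, sizeof a);
            memset(b, 0x55, sizeof b);
            if (diff < len)
                b[diff] = 0x56;

            int expect = sign(ref_cmp(a, b, len));
            int got = sign(candidate(a, b, len));
            if (expect != got) {
                printf("FAIL len=%zu diff=%zu expected %d got %d\n",
                       len, diff, expect, got);
                failures++;
            }
        }
    }
    return failures;
}
```

Calling run_harness(memcmp, 32) from main should report no failures, while run_harness(pairs_only_cmp, 32) flags every odd length where the mismatch sits in the last byte. That is exactly the kind of corner case that a handful of hand-picked tests can miss.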
So I fired up dbx to take a look at what was going on:
$ cc -g test.c test.il
$ dbx a.out
(dbx) stop at 150
stopped in main at line 150 in file "test.c"
The stop at <line> command tells the debugger to stop at the problem line number. However, the problem actually occurred when j was equal to 1, so I really should have specified the breakpoint more precisely.
*(2) stop at "mcmp-test-all.c":150
(dbx) delete 2
(dbx) stop at 150 -if j==1
(3) stop at "mcmp-test-all.c":150 -if j == 1
(process id 14983)
That got me to the point where the problem occurred. My initial thought was to step through the execution of the inline template using the nexti command. However, this is pretty inefficient:
stopped in main at 0x00011cfc
0x00011cfc: main+0x1394: sll %l0, 1, %l1
stopped in main at 0x00011d00
0x00011d00: main+0x1398: add %l3, %l1, %l0
stopped in main at 0x00011d04
0x00011d04: main+0x139c: ld [%fp - 1044], %l1
It could take quite a large number of instructions before I actually encountered the problem code. Plus each step takes three lines on screen. However, there's a tracei command which traces the execution at the assembly code level.
(dbx) tracei next
0x00011d08: main+0x13a0: mov %l0, %o0
0x00011d0c: main+0x13a4: mov %l2, %o1
0x00011d10: main+0x13a8: mov %l1, %o2
0x00011d14: main+0x13ac: nop
The output took me through the code, and knowing the code path I had expected, I could pretty easily see the branch that caused the code to diverge.