Monday Apr 11, 2005

Improving a development tool.

One of the internal development tools we use heavily within the Solaris development organization is a script called "bfu".  Depending on who you talk to, it's either Bonwick/Faulkner Upgrade or Blindingly Fast Upgrade.

In a matter of a few minutes, it will either (a) upgrade the core OS components from a nightly build, or (b) turn the machine into an unbootable brick.
In other words -- the perfect tool for a developer and something you don't want customers to ever use.  (Of course, we'll be letting it loose as part of OpenSolaris.)

The core of bfu is the use of cpio to overwrite all the files which originate from a build of the core OS/Networking ("ON") source tree.
This gets you most of the way there, but the remaining bit is to merge various system files (especially key files like /etc/name_to_major and /etc/minor_perm) to include both the new additions in the core OS *and* entries added by non-ON packages as part of the original install.
Our packaging system includes "class action" scripts to update these files automatically, but since the dawn of BFU, developers have had to merge these files manually on each BFU; getting this wrong is the leading cause of brickification.
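
To make the merge step concrete, here's a minimal sketch in C of the kind of rule involved. This is not the bfu or class action logic: the struct, the function names (merge_majors and its helpers), and the policy of "installed entries win, new names get appended, a clashing major number is an error" are all assumptions for illustration, working on a simplified in-memory view of the "driver-name major-number" pairs in name_to_major.

```c
#include <stddef.h>
#include <string.h>

#define NAME_LEN 64

/* Hypothetical in-memory form of one name_to_major line. */
struct major_entry {
    char name[NAME_LEN];   /* driver name */
    int  major;            /* major device number */
};

/* Return the index of `name` in the table, or -1 if absent. */
static int find_name(const struct major_entry *tab, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return (int)i;
    return -1;
}

/* Return 1 if some entry already holds this major number. */
static int major_in_use(const struct major_entry *tab, size_t n, int major)
{
    for (size_t i = 0; i < n; i++)
        if (tab[i].major == major)
            return 1;
    return 0;
}

/*
 * Merge `incoming` (from the new build) into `installed` (the live file).
 * Names already installed are left alone; new names are appended.
 * Returns the new entry count, or -1 if the table is full or a new
 * driver wants a major number that an installed driver already holds.
 * On failure the table may contain a partial merge.
 */
long merge_majors(struct major_entry *installed, size_t n, size_t cap,
                  const struct major_entry *incoming, size_t m)
{
    for (size_t i = 0; i < m; i++) {
        if (find_name(installed, n, incoming[i].name) >= 0)
            continue;               /* keep the installed entry */
        if (n == cap || major_in_use(installed, n, incoming[i].major))
            return -1;              /* assumed policy: give up, don't guess */
        installed[n++] = incoming[i];
    }
    return (long)n;
}
```

Given an installed table holding sd and a third-party driver, and an incoming table holding sd plus one new driver, this keeps both installed entries and appends only the new one. Overwriting the file wholesale, cpio-style, would have dropped the third-party entry.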

As you might expect, all sorts of hacks have evolved to do this in an automated way.

A few years ago, a coworker developed a package called ACR which does it The Right Way -- using the same scripts used by packaging.
She's been too busy to integrate this into the official source tree, so I picked it up & ran with it, and just integrated it.  Folks in the OpenSolaris preview should see this in nevada build 13.

Wednesday Aug 18, 2004

Getting the details right..

A C-Net story today reports:

"The excitement began Thursday with an announcement that French computer scientist Antoine Joux had uncovered a flaw in a popular algorithm called MD5, often used with digital signatures. Then four Chinese researchers released a paper that reported a way to circumvent a second algorithm, SHA-0."

Err, um. Joux announced the SHA-0 collision, while the Chinese researchers found the MD5 collision.

The attack doesn't really "circumvent" SHA-0, and it's not as if anyone actually uses the original SHA: NIST announced that it was flawed in some unspecified way and replaced it with SHA-1, which added a rotate to the message schedule for improved mixing.

The report then goes on to mention the use of MD5 by the Solaris Fingerprint Database -- a list of MD5 hashes of officially released Solaris binaries -- without clarifying that the attacks on MD5 announced yesterday are not directly relevant to the use of MD5 by the SFPDB.

The research may well be a stepping stone to a future preimage attack on MD5, but it does not put the SFPDB at risk today; the research will likely also point toward newer hash functions which are resistant to known attacks.

And I can't even tell what Declan meant by: "To write a specific backdoor and cloak it with the same hash collision may be much more time-intensive."

Wednesday Jun 16, 2004

dtrace for reverse engineering.

Since Bryan was so kind to list me as a "Friend of DTrace"..

Lately I've been prototyping some extensions to our IPsec implementation. While shaking them out, I've twice had the experience of the system not even getting to the new code when I was expecting it to..

In the pre-dtrace era, finding the point where we went astray was often a frustrating exercise in single-stepping, psychic breakpoint insertion, or iterated debug-by-printf.

With dtrace, it's a matter of running a truss to get the relevant syscall arguments and then doing something along the lines of:


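#! /usr/sbin/dtrace -Fs

/* Start tracing this thread when so_socket() is called with first argument 29
   (the value seen in the truss output). */
syscall::so_socket:entry
/arg0 == 29/
{
  self->traceme = 1;
}

/* While the flag is set, record every kernel function entry and return. */
fbt:::
/self->traceme/
{}

/* Stop tracing once the socket call returns. */
syscall::so_socket:return
/self->traceme/
{
  self->traceme = 0;
}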

Drop the above in a file, chmod +x, run it, watch dtrace insert 45000 probes into the live system in a second or two..

.. and I quickly discovered that the reason I wasn't getting to the spdsock open routine was due to overly conservative device policy, which turned out to be easy to fix. Same deal for a new socket option -- whoops, added it to the wrong switch()..

Saturday Jun 12, 2004

Knowing your tools..

An article featured on Slashdot today, "Why Learning Assembly Language Is Still a Good Idea", completely misses one crucial reason why a reading knowledge of assembly and knowledge of a machine's low-level architecture will help a systems programmer get their job done: crash analysis. Over two decades of coding, it's been particularly obvious to me that programmers who don't know how to extract information from a core file, crash dump, or stack trace will, on average, take a whole lot longer to root-cause and fix bugs. And to do that, you need good tools and a modest ability to read assembler.

It's something like forensic analysis - the dump is the "scene of a crime" often containing many clues as to the cause of the fault.. (If you don't mind an analogy from Hollywood, it brings to mind a scene from a recent rerun of Crossing Jordan - a junior ME misses a bunch of more subtle clues at a murder scene which Jordan spots after a careful once-over; these clues give them a running start at the investigation).

On most unix systems, you generally get the register contents at the time of the fault and from that know which instruction was executing at the time. Source-level debuggers can convert that into a source line, but that loses information -- when there's ambiguity, it's often better to work backwards from the instruction rather than try to guess which part of a complex expression the processor tripped over. And if you didn't build with -g, a source-level debugger will throw up its hands and point in the vague direction of a function at fault.
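
To make that ambiguity concrete, here's a small hypothetical example in C (the struct and function names are made up for illustration). The whole expression sits on one source line, so a line number alone can't say which of the three pointer dereferences hit a bad pointer. The faulting load instruction, and the register it used as its address, usually can.

```c
#include <stddef.h>

/* Hypothetical singly linked list node. */
struct node {
    struct node *next;
    int val;
};

/*
 * Three loads through pointers on one source line: n->next, then
 * ->next again, then ->val.  If any of the three pointers is bad,
 * the debugger reports the same line; only the faulting instruction
 * distinguishes them.
 */
int third_val(const struct node *n)
{
    return n->next->next->val;
}
```

Called on a well-formed three-element list, this simply returns the third node's value. Handed a shorter list, it faults on whichever load runs into the NULL pointer first.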

So what do you do when the program crashes in production? Go back, rebuild with -g, and try to reproduce it? If it's a once-a-month, hard to reproduce race condition, you may end up waiting a while before it happens again.

Even if you can't conclusively prove anything from a crash, a careful analysis will often give you a big head start towards reproducing the problem, and knowing where to lay traps to catch a precursor to the crash.

Among the folks developing Solaris, MDB is the debugger of choice for this sort of analysis. It's an evolving, extensible debugger, with a published API you can use to add your own extensions. GNU's GDB is also good for low-level debugging.

If you have to "lie in wait" for the bug to happen again, dtrace can't be beat. (And, well, it's also great for any number of other observability tasks.. but that's for another day..).

How to learn this stuff? Nothing beats a little experimentation. Sit down with an architecture manual and a C compiler. Read enough of the front matter of the architecture manual to understand the register layout. Write some simple C code, compile it with -S, and look at the assembly output. Learn how your compiler tends to generate code. Learn how C-level identifiers appear in the low-level object symbol tables. Use your debugger to disassemble a "live" program, and compare with the -S output. Single-step through the program an instruction at a time and watch the control flow and how the contents of registers and memory change. If you can't figure out what an instruction does, look it up.
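
If you want a starting point for that exercise, something like the following is about the right size. It's just an illustrative sketch (the file name sum.c and all the identifiers are made up); it has a global, a static helper, a loop and a memory access, which is enough to see how each shows up in the generated code.

```c
/* sum.c: small enough that the -S output is readable in one sitting. */
#include <stddef.h>

/* A global: look for a symbol derived from this name in the object file. */
long total;

/* A static helper: with optimization on, it may be inlined and leave no symbol. */
static long square(long x)
{
    return x * x;
}

/* A loop over memory: watch for the load, the multiply, and the loop branch. */
long sum_squares(const long *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += square(v[i]);
    total = s;      /* a store to the global */
    return s;
}
```

Compile it with "cc -S sum.c" and read the resulting sum.s, then try again with optimization turned on and compare. "cc -c sum.c" followed by "nm sum.o" shows how total, square and sum_squares appear (or don't) in the symbol table.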

About

sommerfeld
