Friday May 13, 2011

Knuth: "Be an edge, not a node."

At a recent talk, engineering hero and consummate computer scientist Donald Knuth gave the following graph-theoretic advice for future professors: be edges, not nodes. That is, plan to have two sub-specialties in different fields and be able to act as a bridge between disciplines. Perhaps the source of this advice is related to the proof Knuth is fondest of: as edges are randomly added to a set of nodes, after initially forming a quite disconnected graph, at a certain point there are two quick phase transitions and afterward a large fraction of the nodes share membership in a single connected component.
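As a rough illustration of that phenomenon (my own sketch in Java, not Knuth's proof), a small union-find simulation shows the largest connected component jumping from a sliver of the graph to most of it as random edges accumulate:

    import java.util.Random;

    // Add edges uniformly at random to n isolated nodes and report how much
    // of the graph the largest connected component covers as edges accumulate.
    public class GiantComponent {
        public static void main(String[] args) {
            int n = 100_000;
            int[] parent = new int[n];
            int[] size = new int[n];
            for (int i = 0; i < n; i++) { parent[i] = i; size[i] = 1; }
            Random rnd = new Random();
            int largest = 1;
            for (int edges = 1; edges <= 2 * n; edges++) {
                int a = find(parent, rnd.nextInt(n));
                int b = find(parent, rnd.nextInt(n));
                if (a != b) {                       // union by size
                    if (size[a] < size[b]) { int t = a; a = b; b = t; }
                    parent[b] = a;
                    size[a] += size[b];
                    largest = Math.max(largest, size[a]);
                }
                if (edges % (n / 4) == 0)
                    System.out.printf("edges = %d, largest component = %.1f%%%n",
                                      edges, 100.0 * largest / n);
            }
        }

        private static int find(int[] parent, int i) {
            while (parent[i] != i) {
                parent[i] = parent[parent[i]];      // path halving
                i = parent[i];
            }
            return i;
        }
    }

Around n/2 random edges the largest component is still a small fraction of the nodes; not many edges later it covers well over half of them.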

I think Knuth's advice is sound for non-academics too (and a step beyond just being T-shaped). Some of the blog entries I've most enjoyed writing are those that bring together tools from different areas of computer science, such as numerical linear algebra and programming languages.

Thursday Apr 28, 2011

See you on the flip side

As has been noted by many other bloggers on blogs.sun.com, the blogs on that site are soon going to be migrated to blogs.oracle.com. As of next week, the previous content of blogs.sun.com/darcy should redirect to and be hosted under blogs.oracle.com/darcy. I will not be making any blog posts here until the migration is complete and I've made a separate backup of this blog's contents in case there is a problem with the move.

See you on the flip side.

Friday Jul 23, 2010

Understanding Inception using try-with-resources

SPOILERS for the film "Inception" as represented using the try-with-resources statement.
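The full post is elided here. Purely as a reminder of the language feature it plays with (and emphatically not the post's content), resources in a try-with-resources statement are closed in the reverse order of their declaration, so the innermost level always unwinds first:

    // Sketch: nested "levels" opened left to right are closed right to left.
    public class NestedResources {
        static AutoCloseable level(String name) {
            System.out.println("entering " + name);
            return () -> System.out.println("leaving " + name);
        }

        public static void main(String[] args) throws Exception {
            try (AutoCloseable reality = level("reality");
                 AutoCloseable dream = level("dream");
                 AutoCloseable deeper = level("dream within a dream")) {
                System.out.println("kick!");
            }
            // Output: the three "entering" lines, then "kick!", then the
            // "leaving" lines in reverse order, innermost first.
        }
    }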

Thursday Jun 03, 2010

Shaped to a T

Last week I attended a talk by John Hennessy on the "Future of Research Universities" (video). The talk ranged over many subjects, from distinctions between graduate and undergraduate learning, to the difference between training and education, to distinguishing properties of Silicon Valley.

Toward the end of the hour, Hennessy noted the importance of "T-shaped" researchers for effective collaboration: deep enough to contribute in their own field, but with enough breadth to work with colleagues in other areas. Applying the venerable computer science technique of recursion gives a T-shaped fractal all the way down, analogous to Koch's curve:

T-shaped all the way down.

Having a well-balanced combination of depth and breadth strikes me as being important in many other endeavours too. For example, I was reminded of the different stability versus progress trade-offs that developers need to be sensitive to when working on the JDK, such as the benefits of changes in a given area versus the impact on the rest of the release.

Monday Feb 09, 2009

The Three R's of Airplane Maintenance

Complementing the three grade school R's of reading, writing, and 'rithmetic, I've heard it said there are three R's of Windows system administration: retry, reboot, and reinstall. When flying recently, I experienced the Windows R's being applied to airplane maintenance too. After everyone had boarded the plane, the pilot announced there was a problem with the navigation system and they were trying repeatedly to clear the problem. After that didn't work, the pilot came back on to say the maintenance crew was going to power down the entire plane so it could be restarted; even the emergency lighting blinked off for several minutes. Finally, with the fault still showing after the plane booted back up, the airline reinstalled us passengers on a spare plane and we were on our way with just a two hour delay.

Friday Jun 13, 2008

Cheesy X-rays

When flying recently, I was slightly concerned when my carry-on bag was selected for extra scrutiny after going through the X-ray machine; the culprit: cheese. I was bringing back four bars of 1 ¾" × 1 ¾" × 6" cheese I haven't been able to find in California. My bag also had some batteries and electronics in it. The cheese packaging was swabbed and analyzed and I went on my way. I thought this was a little silly, but I moderated my view when I was catching up on back issues of Science News during the flight and read an article that said cheese and plastic explosives can look the same under X rays. The new and improved X-ray system being discussed will be able to tell apart C4 and four-cheese!

Tuesday May 27, 2008

Indiana Jones: Ants aren't like scorpions!

There have been many years and many miles since Indiana Jones and the Last Crusade, and after seeing Indiana Jones and the Kingdom of the Crystal Skull this weekend, I think the franchise is a little worse for wear. In the opening action sequence, I was surprised to learn Soviet gunpowder in the 1950's apparently included iron filings in addition to sulfur, charcoal, and potassium nitrate since the powder was magnetic! Somewhere in editing, I'm convinced the line "Ants aren't like scorpions!" must have been deleted, a line I hope will be added back for the DVD edition.

Friday May 16, 2008

A Twisty Maze of Little Molieres

I watched the film "Molière," described as the French "Shakespeare in Love," about the famous but new-to-me 17th century French playwright, and one verbal exchange stuck with me. The exchange is described in the Wikipedia article about Molière:

In Le Bourgeois Gentilhomme, the title character, M. Jourdain, composes a love note as follows: "Beautiful marchioness, your beautiful eyes make me die from love" ("Belle marquise, vos beaux yeux me font mourir d'amour"). He then asks his philosophy teacher to rephrase the sentence which he does by shuffling the words in nearly every single way ("Beautiful marchioness, from love," etc.). M. Jourdain then asks which phrasing is best and the teacher promptly replies that the first is best. The phrase "Belle marquise..." is now used to indicate that two different sentences mean the same thing.

I was immediately reminded of the Adventure game's many variants of "A twisty maze of little passages, all different." Exploring the space of combinatorial permutations has had recognized literary value for longer than I thought!

Friday Apr 25, 2008

Test where the failures are likely to be

There is an old joke about walking along one night and coming across someone looking down underneath a streetlight for lost keys. Stopping to help look, after a minute or two of searching you remark, "Your keys don't seem to be here. Where did you drop them?" "Well, I dropped them over in that alley, but it's way too dark to look there!"

While at Berkeley, one of the lessons I learned from Professor Kahan was "test where the failures are likely to be," which he stated much more mathematically as "seeking the singular points of analytic functions." Especially for numerical applications, the tricky inputs to the code can differ markedly from algorithm to algorithm. For example, this was the underlying reason the Pentium fdiv bug was not caught sooner. A new SRT divider algorithm was being used and while billions and billions of existing tests were run and looked fine, new tests targeting the new algorithm were apparently not written. After learning of the general problem, Professor Kahan was able to write a short test program that probed at likely failure points, boundaries in a lookup table, and found incorrect quotients after executing for under a minute.

I keep Professor Kahan's advice in mind when writing regression tests for my JDK work, especially on numerics. At least on occasion, this methodology has flagged a bug unrelated to the code at hand. Tests I wrote for an initially internal "getExponent" method on floating-point numbers included checking adjacent floating-point values around each transition to the next exponent; the lucky by-catch of this was a HotSpot bug which was then corrected. From a code coverage perspective testing at every exponent value is not needed since the code executed is the same, but such thoroughness helps provide robustness against other kinds of failures and didn't take much more time or code in this case.
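As a concrete example of that style of test, here is a minimal sketch (not the actual JDK regression test) that probes Math.getExponent at the values adjacent to each power of two, where the exponent changes:

    // Check getExponent just below, at, and just above every normal power of two.
    public class GetExponentBoundaryTest {
        public static void main(String[] args) {
            int failures = 0;
            for (int e = Double.MIN_EXPONENT; e <= Double.MAX_EXPONENT; e++) {
                double powerOfTwo = Math.scalb(1.0, e);              // exactly 2^e
                failures += check(Math.nextDown(powerOfTwo), e - 1); // just below the transition
                failures += check(powerOfTwo, e);                    // at the transition
                failures += check(Math.nextUp(powerOfTwo), e);       // just above it
            }
            if (failures > 0)
                throw new RuntimeException(failures + " getExponent failures");
        }

        private static int check(double d, int expectedExponent) {
            return Math.getExponent(d) == expectedExponent ? 0 : 1;
        }
    }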

While the mathematics behind certain math library tests can be quite sophisticated, in some ways the structure of their input is relatively simple compared to, say, the set of legal strings to a Java compiler. In the worst case, for a single-argument floating-point method an exhaustive test "just" has to make sure each of the 2^32 or 2^64 possible inputs has the proper value. The set of possible Java programs is much, much larger and categorizing the set of notable transition points can be challenging, but looking for likely failures is still applicable and worthwhile as one aspect of testing.
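For a float argument, such an exhaustive sweep is quite practical; a sketch (assuming a trusted reference implementation to compare against, here StrictMath standing in for one) looks like this and runs in minutes:

    // Exhaustively compare a one-float-argument method against a reference
    // implementation over all 2^32 bit patterns, including NaNs and -0.0f.
    public class ExhaustiveFloatCheck {
        public static void main(String[] args) {
            long mismatches = 0;
            int bits = Integer.MIN_VALUE;
            do {
                float x = Float.intBitsToFloat(bits);
                if (Math.getExponent(x) != StrictMath.getExponent(x))
                    mismatches++;
            } while (bits++ != Integer.MAX_VALUE);
            System.out.println(mismatches + " mismatches over all float inputs");
        }
    }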

Friday Mar 07, 2008

Bugs as good news

Often a new bug being reported is not regarded as a cause for celebration, especially late in a release! However, I strive to be dispassionate enough to genuinely welcome a new bug report if the bug is valid and thereby improves the accuracy of the bug database. Ignorance of bugs should not be confused with the absence of bugs. Only when the bug database is accurate, reflecting most of the actual defects and being largely free of spurious issues, is the correlation between the state of the database and the state of the product high enough to draw inferences about the quality of the product based on information in the database.

Thursday Aug 23, 2007

Joe's Rule of Renaming

If you think renaming something is the solution, make sure you're solving the right problem.

Tuesday Jun 12, 2007

Balance of Error

Given limited resources, optimizing quality doesn't just involve minimizing the number of errors; it also involves balancing different kinds of errors. There are two basic types of errors one can make:

  1. Doing things you should not do.

  2. Not doing things you should do.

For the theologically inclined these would be sins of commission and sins of omission, respectively. Much of statistics deals with bounding the probability of making errors in judgments. From statistics, except in very specialized circumstances, you can't simultaneously control for both type I and type II errors. Therefore, generally the worse kind of error is phrased as a type I error and then the machinery of statistics is applied to the problem in that form; p-values are a bound on the probability of making a type I error. However, that doesn't imply that the other type of error is unimportant or should be ignored. A well-publicized example of the need to balance both kinds of errors has occurred in the FDA's drug approval process. A pharmaceutical company must demonstrate safety and efficacy of a new drug before it goes to market. The FDA is primarily concerned with preventing the type I error of releasing an unsafe drug to the public. However, the type II error of keeping useful drugs off the market can also be problematic and was raised as an issue during the early AIDS crisis.
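A toy Monte Carlo sketch (my illustration, not taken from any statistics text) makes the trade-off concrete: with a fixed sample size, lowering the decision threshold to catch more real effects raises the false-positive rate, and raising the threshold does the opposite:

    import java.util.Random;

    // Flag an effect when the sample mean exceeds a threshold; measure how
    // often that wrongly flags a true mean of 0 (type I) and how often it
    // misses a true mean of 0.5 (type II) as the threshold varies.
    public class ErrorTradeoff {
        public static void main(String[] args) {
            Random rnd = new Random(42);
            int trials = 20_000, sampleSize = 30;
            for (double threshold : new double[] {0.5, 0.3, 0.1}) {
                int falsePositives = 0, misses = 0;
                for (int t = 0; t < trials; t++) {
                    if (sampleMean(rnd, 0.0, sampleSize) > threshold)
                        falsePositives++;                    // type I error
                    if (sampleMean(rnd, 0.5, sampleSize) <= threshold)
                        misses++;                            // type II error
                }
                System.out.printf("threshold %.1f: type I rate %.3f, type II rate %.3f%n",
                                  threshold,
                                  (double) falsePositives / trials,
                                  (double) misses / trials);
            }
        }

        private static double sampleMean(Random rnd, double trueMean, int n) {
            double sum = 0;
            for (int i = 0; i < n; i++)
                sum += trueMean + rnd.nextGaussian();
            return sum / n;
        }
    }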

In a software release, a type I error would be putting back a bad fix which introduces a bug and a type II error would be not fixing an issue that should be addressed. The more time that is spent verifying a fix is good in various ways (code review, testing, more code reviews, more testing), the less time that is available to address other issues. Overall, neither extreme of focusing only on reducing type I errors nor of focusing only on reducing type II errors leads to a global quality optimum for a given amount of resources. Qualitatively, for a given total number of errors the relationship between quality and the ratio of errors is a roughly bell-shaped curve:
Bell-shaped trade off between type I and type II errors

Toward the left side of the curve, type I errors are the dominant cause of reducing quality. As oversight is increased, the type I error rate is reduced and quality increases. However, the amount quality improves for each additional unit of oversight decreases as more oversight is added. Eventually, if enough time is spent reviewing each fix, the marginal change in quality is negative because those resources would have been better directed at producing other fixes. As illustrated in the right half of the graph, as fewer and fewer changes are made, while type I errors are very few, type II errors are numerous and total quality suffers.

Therefore, the mere absence of type I errors does not imply a high-quality release because the release could be fraught with type II errors from missing functionality. An added challenge is that recognizing that a type II error has been made is often much harder than recognizing that a type I error occurred since the consequences of a type I error may be seen immediately (e.g. the build breaks) while evidence for a type II error may only accumulate over time in the form of an escalation or as diffusely lowered perceived quality or utility.

While not having any defects of either kind is a laudable goal, it is usually not achievable because of the high costs involved. The Mythical Man-Month (TMMM) suggests it is nearly an order of magnitude more expensive to deliver a mature "programming systems product" compared to just a working "program." Additionally, rather than scaling linearly, the cost of software seems to go up as the amount of code raised to the 1.5 power, so larger projects cost disproportionately more; doubling the amount of code raises the cost by roughly a factor of 2^1.5 ≈ 2.8. Adding resources can certainly improve quality, but only adding resources without adjusting processes might not be a very efficient means toward that end. A well-balanced low resource project could achieve better quality than a poorly-balanced high resource project.

In the graph above, the relative impact of type I and II errors is symmetrical. However, a project could be more sensitive to one kind of error or the other. For example, a young software project may be judged as being more sensitive to type II errors from missing functionality, such as during a beta release, while a mature project will be less tolerant toward the introduction of type I problems. TMMM summarizes an OS study that found over time repairs to the system become more and more likely to introduce as large a flaw as was resolved; the probability of making type I errors increased with system age.

Differing Error Sensitivities

The green line peaks before the balanced one, corresponding to a project which is more sensitive to type II errors than type I errors. Conversely, the blue graph is more sensitive to type I errors, so it peaks after the balanced line.

Assume that to a first approximation engineers work to maximize their contribution to a software release; therefore the process costs will shape what an engineer tries to get done and, along with the error rates of the processes, will affect the overall error ratio. Balancing the processes can alter both the natural error ratio and the efficiency of engineering. Two factors which can help manage a project more effectively are:

  • Estimating the shape of a project's error sensitivity graph

  • Determining in which region of that graph the project is being run

Recognizing the different sensitivities helps shape the project's goals. Next, determining where the project is running should guide process changes to improve quality. If there are too many type I errors, more stringent processes should be instituted to catch problems earlier. If there are too many type II errors, the processes should be streamlined to allow more changes to be implemented.

While identifying that a project is operating at either extreme of the graph should be uncontroversial, finding the maximum is hard, especially since the error rates are difficult to measure. Some notions from numerical optimization may aid in this search and will be discussed in future blog entries.

Thanks to Alex for feedback on earlier drafts of this entry.

Wednesday Sep 20, 2006

Bug Tracking Blues

Last week, I attended a talk by Matt Doar about Common Problems with Bug Trackers or, phrased more directly, Bug Trackers: Do They Really All Suck?. The talk was hosted by the Silicon Valley ACCU; I'm scheduled to give their October talk about common floating-point issues and misperceptions.

Matt had a number of interesting observations about the often mundane process of working with a bug tracker. First, a poll of the audience revealed that while nearly everyone uses a bug tracker regularly, no one loves their bug tracker. In contrast, many people will passionately defend tools for other infrastructural tasks like source code management (CVS, Subversion, etc.). A likely cause for this lack of affection is the multiple parties who use a bug tracker (engineers, the quality organization, program management), the disparate ways those groups use the tool, and the resulting compromises in the tool's design. Matt identified four broad problem areas with bug trackers:

  1. Designing Workflows (What is the state transition diagram?)
  2. Meanings Of Fields (What does "priority" mean? How do priority and severity differ?)
  3. One Bug, Many Releases (How should the same problem being fixed in multiple releases be tracked?)
  4. Trusting History (Can time-based queries be performed?)

These are consistent with discussions and concerns I've had about bug trackers over the years. One observation was that a priority field should implicitly or explicitly include a notion of the party to whom the priority applies: the customer, engineering, testing, etc.

Wednesday Sep 06, 2006

Recommended reading for compiler writers

Some thoughts on this topic raised by Peter: while the Dragon book is certainly a classic and solidly covers the basics, it is now long in the tooth and doesn't cover important compilation and code generation concepts and techniques useful for more recent language environments, like the Java programming language and virtual machine. Requests for compilation references are a common question on the comp.compilers newsgroup and a number of recommendations (including the Dragon book) are listed in the group's FAQ.

For numerics related topics, Goldberg's What Every Computer Scientist Should Know About Floating-Point Arithmetic discusses why not all floating-point "optimizations" that can be applied should be applied, and Henry S. Warren's Hacker's Delight is a rich resource for low-level arithmetic tricks.
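Two tiny examples (mine, not Goldberg's) of why such "optimizations" cannot be applied blindly: replacing x + 0.0 with x changes the sign of a negative zero, and reassociating a sum changes how intermediate results round:

    public class FpOptimizations {
        public static void main(String[] args) {
            // x + 0.0 is not an identity for negative zero.
            double negZero = -0.0;
            System.out.println(negZero + 0.0);   // prints 0.0, not -0.0

            // Floating-point addition is not associative.
            double a = 1e308, b = -1e308, c = 1.0;
            System.out.println((a + b) + c);     // prints 1.0
            System.out.println(a + (b + c));     // prints 0.0; the 1.0 is lost in rounding
        }
    }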

Tuesday Jul 04, 2006

And the rockets' red glare

As seen over Stanford University, July 3, 2006.

Friday Aug 12, 2005

Hello World

I work at Sun Microsystems as a developer of the JDK, mostly on numerics as "Java Floating-Point Czar" and on annotation processing as Spec. Lead of JSR 269. So I expect most of my blog entries here to be on those topics. I also work on core reflection and the launcher.