Given limited resources, optimizing quality doesn't just involve minimizing the number of errors; it also involves balancing different kinds of errors. There are two basic types of errors one can make:
Doing things you should not do.
Not doing things you should do.
For the theologically inclined, these would be sins of commission and sins of omission, respectively. Much of statistics deals with bounding the probability of making errors in judgment. Except in very specialized circumstances, you can't simultaneously control for both type I and type II errors. Therefore, the worse kind of error is generally phrased as a type I error, and the machinery of statistics is then applied to the problem in that form; p-values bound the probability of making a type I error. However, that doesn't imply that the other type of error is unimportant or should be ignored. A well-publicized example of the need to balance both kinds of errors occurred in the FDA's drug approval process. A pharmaceutical company must demonstrate the safety and efficacy of a new drug before it goes to market. The FDA is concerned with preventing the type I error of releasing an unsafe drug to the public. However, the type II error of keeping useful drugs off the market can also be problematic and was raised as an issue during the early AIDS crisis.
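To make the asymmetry concrete, here is a small illustrative Python simulation (my own sketch, not from the original entry): a normal-approximation test of whether a coin is fair. Holding the rejection threshold at roughly α = 0.05 bounds the type I error rate by design, but the type II error rate then depends on the effect size and sample size, which the threshold alone does not control. The bias value 0.65 and the sample sizes are arbitrary choices for illustration.

```python
import random

random.seed(0)

def reject_fair_coin(flips, z_crit=1.96):
    """Normal-approximation two-sided test of 'the coin is fair'."""
    n = len(flips)
    heads = sum(flips)
    z = (heads - 0.5 * n) / (0.25 * n) ** 0.5  # std dev under H0 is sqrt(n/4)
    return abs(z) > z_crit

def rejection_rate(true_p, trials=2000, n=100):
    """Fraction of experiments in which the null hypothesis is rejected."""
    rejected = sum(
        reject_fair_coin([random.random() < true_p for _ in range(n)])
        for _ in range(trials)
    )
    return rejected / trials

type1 = rejection_rate(0.5)       # rejecting a true null (~0.05 by construction)
type2 = 1 - rejection_rate(0.65)  # failing to detect a genuinely biased coin
```

Lowering the threshold shrinks `type1` but inflates `type2`; only more data (a larger `n`) reduces both at once, which is the resource tradeoff this entry is about.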
In a software release, a type I error would be putting back a bad fix that introduces a bug, and a type II error would be not fixing an issue that should be addressed. The more time that is spent verifying a fix is good in various ways (code review, testing, more code reviews, more testing), the less time that is available to address other issues. Overall, neither extreme, whether focusing only on reducing type I errors or only on reducing type II errors, leads to a global quality optimum for a given amount of resources.
Qualitatively, for a given total number of errors the relationship between quality and the ratio of errors is a roughly bell-shaped curve:
Toward the left side of the curve, type I errors are the dominant cause of reduced quality. As oversight is increased, the type I error rate drops and quality increases. However, the amount quality improves for each additional unit of oversight decreases as more oversight is added. Eventually, if enough time is spent reviewing each fix, the marginal change in quality becomes negative because those resources would have been better directed at producing other fixes. As illustrated in the right half of the graph, as fewer and fewer changes are made, type I errors become very rare but type II errors become numerous, and total quality suffers.
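This shape can be reproduced with a toy Python model (an illustrative sketch, not from the original entry): spending a larger fraction of resources on oversight lowers the per-fix type I error rate but also lowers the number of fixes shipped, leaving more type II errors in the form of unfixed issues. All constants here are arbitrary assumptions chosen only to produce the qualitative curve.

```python
import math

def quality(f, budget=100.0):
    """Net quality as a function of oversight fraction f in [0, 1].

    Assumed toy model: each fix costs 1 unit plus 3*f units of
    review/testing, and the chance a fix is bad (a type I error)
    decays exponentially as oversight increases.
    """
    fixes_shipped = budget / (1.0 + 3.0 * f)     # fewer fixes as oversight grows
    bad_fix_rate = 0.3 * math.exp(-4.0 * f)      # per-fix type I error rate
    good = fixes_shipped * (1.0 - bad_fix_rate)  # issues actually resolved
    bad = fixes_shipped * bad_fix_rate           # bugs introduced
    return good - 3.0 * bad                      # bad fixes penalized 3x (assumed)

# Scanning oversight fractions shows quality peaks in the interior,
# not at either extreme of the curve.
best_f = max((i / 100.0 for i in range(101)), key=quality)
```

Under these assumptions the maximum sits strictly between "no oversight" and "all oversight," matching the bell-shaped curve described above.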
Therefore, the mere absence of type I errors does not imply a high-quality release, because the release could be fraught with type II errors from missing functionality. An added challenge is that recognizing that a type II error has been made is often much harder than recognizing that a type I error has occurred: the consequences of a type I error may be seen immediately (e.g. the build breaks), while evidence for a type II error may only accumulate over time in the form of an escalation or as diffusely lowered perceived quality or utility.
While not having any defects of either kind is a laudable goal, it is usually not achievable because of the high costs involved.
The Mythical Man-Month (TMMM)
suggests it is nearly an order of magnitude more expensive to deliver a mature "programming systems product" than just a working "program." Additionally, rather than scaling linearly, the cost of software seems to grow as the amount of code raised to the 1.5 power, so larger projects cost disproportionately more.
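A quick back-of-the-envelope calculation (my own illustration of the exponent, not a figure from TMMM) shows what that 1.5 power implies: doubling the amount of code multiplies the cost by 2^1.5 ≈ 2.8, and a tenfold larger codebase costs roughly 32 times as much.

```python
def relative_cost(size_ratio, exponent=1.5):
    """Cost multiplier when a codebase grows by size_ratio,
    assuming cost scales as size**exponent."""
    return size_ratio ** exponent

double = relative_cost(2)    # ~2.83x the cost for 2x the code
tenfold = relative_cost(10)  # ~31.6x the cost for 10x the code
```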
Adding resources can certainly improve quality, but only adding resources without adjusting processes might not be a very efficient means toward that end. A well-balanced low resource project could achieve better quality than a poorly-balanced high resource project.
In the graph above, the relative impact of type I and type II errors is symmetrical. However, a project could be more sensitive to one kind of error than the other. For example, a young software project may be more sensitive to type II errors from missing functionality, such as during a beta release, while a mature project will be less tolerant of the introduction of type I problems. TMMM summarizes an OS study which found that, over time, repairs to the system became more and more likely to introduce as large a flaw as was resolved; the probability of making type I errors increased with system age.
The green line peaks before the balanced one, corresponding to a project that is more sensitive to type II errors than to type I errors. Conversely, the blue line corresponds to a project more sensitive to type I errors, so it peaks after the balanced one.
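This peak shift can be reproduced with a small illustrative Python model (my own sketch, not from the original entry): oversight reduces the per-fix type I error rate but also reduces throughput, and weighting type I errors more heavily moves the quality peak toward more oversight. All constants are arbitrary assumptions.

```python
import math

def quality(f, type1_weight):
    """Toy quality score for oversight fraction f in [0, 1].
    Assumed model: fix throughput falls and the per-fix type I
    error rate decays as oversight f increases."""
    fixes = 100.0 / (1.0 + 3.0 * f)
    bad_rate = 0.3 * math.exp(-4.0 * f)
    return fixes * ((1.0 - bad_rate) - type1_weight * bad_rate)

def best_oversight(type1_weight):
    """Oversight fraction that maximizes the toy quality score."""
    return max((i / 100.0 for i in range(101)),
               key=lambda f: quality(f, type1_weight))

tolerant = best_oversight(1.0)  # project relatively tolerant of type I errors
strict = best_oversight(5.0)    # project highly sensitive to type I errors
# The stricter project's quality peaks at a higher oversight level,
# i.e. its curve's peak shifts toward fewer, more heavily vetted changes.
```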
Assume, to a first approximation, that engineers work to maximize their contribution to a software release; process costs will therefore shape what an engineer tries to get done and, along with the error rates of the processes, will affect the overall error ratio. Balancing the processes can alter both the natural error ratio and the efficiency of engineering.
Two factors which can help manage a project more effectively are:
Recognizing the project's different sensitivities, which helps shape its goals.
Determining where on the curve the project is operating, which should guide process changes to improve quality. If there are too many type I errors, more stringent processes should be instituted to catch problems earlier. If there are too many type II errors, the processes should be streamlined to allow more changes to be implemented.
While identifying that a project is operating at either extreme of the graph should be uncontroversial, finding the maximum is hard, especially since the error rates are difficult to measure. Some notions from numerical optimization may aid in this search and will be discussed in future blog entries.
Thanks to Alex for feedback on earlier drafts of this entry.