Be Careful What You Measure, It Might Improve
By Bob Hueston on Jan 18, 2007
Collecting and analyzing metrics is not something to undertake without proper preparation. When you apply a metric to any activity, the metric may "improve" at the expense of other activities, and the net effect can be negative.
A long time ago, I was on a large software team developing a new telecommunications product. The project was supposed to be a straightforward port of an existing code base to a new hardware platform and OS. The differences between the old and new hardware and OS were greatly underestimated, and the result was a huge number of bugs.
Management was concerned that (A) product quality was very low, with a high bug count, (B) testing was finding new bugs too slowly, and (C) development was fixing bugs too slowly. So they decided to employ metrics, measuring two things (sketched in code below):
- The number of bug reports filed per tester.
- The number of bugs fixed per developer.
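To make the incentive structure concrete, here is a minimal sketch of how such per-person counts might be computed from bug-tracker records. The `Bug` record and its field names are hypothetical illustrations, not anything from the actual project.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical bug-tracker record; the fields are illustrative only.
@dataclass
class Bug:
    bug_id: int
    reported_by: str                # tester who filed the report
    fixed_by: Optional[str] = None  # developer who fixed it, if anyone
    status: str = "open"            # "open" or "fixed"

def per_person_metrics(bugs: list[Bug]) -> tuple[Counter, Counter]:
    """Return (reports filed per tester, bugs fixed per developer) --
    the two numbers management chose to reward."""
    filed = Counter(b.reported_by for b in bugs)
    fixed = Counter(b.fixed_by for b in bugs if b.status == "fixed")
    return filed, fixed
```

Notice that nothing in these counts distinguishes one real defect from three duplicate reports of it, or a hard fix from an easy one. That gap is exactly what got exploited.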
At first glance, this seems like a reasonable approach -- it encouraged testers to find lots of bugs and developers to fix lots of bugs, and it encouraged both groups to work quickly so the product would ship and bonuses would be paid. The result, however, was a disaster.
Testers began filing bug reports by the truckload. Every combination and permutation of conditions resulted in a new bug report: "The wrong ring tone is used when connected to a BRI interface." "The wrong ring tone is used when connected to a Tri-BRI interface." "The wrong ring tone is used when connected to a PRI interface." The ring tone had nothing to do with the network interface; it was one software bug, but it was reported as three separate bug reports, thus boosting the individual tester's metrics.
Developers were no less shameless. They would seek out the bugs that were quick and easy to fix in order to drive up their metrics. Critical, high-priority bugs that would take a long time to diagnose and debug were avoided like the plague. The quality of code changes also dropped, and not just because of haste: if a developer fixed one bug and introduced two others, the tester got to file two more bug reports and the developer got to fix two more bugs, driving up everyone's metrics. I honestly believe no one was intentionally introducing new bugs, but the metrics discouraged developers from testing for regressions.
In the end, the metrics improved -- the rate of bug reports being filed went up, and the rate of bug fixes increased. But the goal was not achieved; the overall quality of the product actually decreased. After a few weeks, management realized what was happening and decided to measure the total number of open bugs instead of maintaining metrics on a per-person basis.
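By contrast, the replacement metric is a single team-wide number. Reusing the hypothetical `Bug` record from the earlier sketch:

```python
def open_bug_count(bugs: list[Bug]) -> int:
    """Team-wide metric: bugs still open, regardless of who filed or fixed them."""
    return sum(1 for b in bugs if b.status == "open")
```

Splitting one defect into three reports, or fixing one bug while introducing two, now makes this number worse rather than better, so the metric points in the same direction as the goal.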
When designing a system of metrics (and it is a design process), one has to consider the goal and identify measurable criteria directly related to that goal. In this example, the goal was to improve product quality quickly. Unfortunately, the selected metrics were only second-order criteria and were not directly related to the goal. The metrics improved while the goal slipped away. In other words, what you measure will improve, so be careful what you measure.