Software testing is a huge effort, especially for automation. Teams can spend a lot of time, money, and resources on testing (or not). People literally make careers out of it. That investment ought to be worthwhile – we shouldn’t test for the sake of testing.
So, therein lies the million-dollar question: How do we know that our tests add meaningful value?
Or, more bluntly: How do we know that testing isn’t a waste of time?
That’s easy: bugs!
The stock answer goes something like this: We know tests add value when they find bugs! So, let’s track the number of bugs we find.
That answer is wrong, despite its good intentions. Bug count is a terrible metric for judging the value of tests.
What do you mean bug counts aren’t good?
I know that sounds blasphemous. Let’s unpack it. Finding bugs is a good thing, and tests certainly should find bugs in the features they cover. But, the premise that the value of testing lies exclusively in the bugs found is wrong. Here’s why:
- The main value of testing is fast feedback. Testing serves two purposes: (1) validating goodness and (2) identifying badness. Passing tests are validated goodness. Failing tests, meaning uncovered bugs, are identified badness. Both types of feedback add value to the development process. Developers can proceed confidently with code changes when trustworthy tests are passing, and management can assess lower risk. Unfortunately, bug counts don’t measure that type of goodness.
- Good testing might actually reduce bug count. Testing means accountability for development. Developers must think more carefully about design. They can also run tests locally before committing changes. They could even do Test-Driven Development. Better practices could prevent many bugs from ever happening.
- Tracking bug count can drive bad behavior. Whether a high bug discovery rate looks good (or, worse, has quotas), testers will strive to post numbers. If they don’t find critical bugs, they will open bug reports for nitpicks and trivialities. The extra effort they spend to report inconsequential problems may not be of value to the business – wasting their time and the developers’ time all for the sake of metrics.
- Bugs are usually rare. Unless a team is dysfunctional, the product usually works as expected. Hundreds of test runs may not yield a single bug. That’s a wonderful thing if the tests have good coverage. Those tests still add value. Saying they don’t belittles the whole testing effort.
Then what metrics should we use?
Bugs happen arbitrarily, and unlimited testing is not possible. Metrics should focus on the return-on-investment for testing efforts. Here are a few:
- Time-to-bug-discovery. Rather than track bug counts, track the time until each bug is discovered. This metric genuinely measures the feedback loop for test results. Make sure to track the severity of each bug, too. For example, if high-severity bugs are not caught until production, then the tests don’t have enough coverage. Teams should strive for the shortest time possible – fast feedback means lower development costs. This metric also encourages teams to follow the Testing Pyramid.
- Coverage. Coverage is the degree to which tests exercise product behavior. Higher coverage means more feedback and greater chances of identifying badness. Most unit test frameworks can use code coverage tools to verify paths through code. Feature coverage requires extra process or instrumentation. Tests should avoid duplicate coverage, too.
- Test failure proportions. Tests fail for a variety of reasons. Ideally, tests should fail only when they discover bugs. However, tests may also fail for other reasons: unexpected feature changes, environment instability, or even test automation bugs. Non-bug failures disrupt the feedback loop: they force a team to fix testing problems rather than product problems, and they might cause engineers to devalue the whole testing effort. Tracking failure proportions will reveal what problems inhibit tests from delivering their top value.
How do you know if your tests are adding value? What metrics do you use? Share in the comments below 💡