Introduction to Mutation Testing

Throughout history, there have been some catastrophic bugs that cost millions and produced a lot of headaches. People make mistakes, and due to the nature of software, sometimes a single line of code is enough to take down a system and everything that depends on it.

Thankfully, companies and professional software developers are now aware that the best way to catch those errors in the early stages of development is to write automated tests, saving time and money. Still, no matter how many tests are written, it’s unrealistic to believe developers actually checked every case.

Even experienced developers make mistakes, especially when they are new to testing. The first few tests are always poorly written; that’s a fact. Then how can you trust your automated tests? 🤔

This is where mutation testing comes in, a technique that can help you and your team write better, more robust test suites by catching those common small errors we often miss.

A little bit of history

Mutation testing originated in 1971 when Richard Lipton proposed the initial concepts of mutation in a class term paper titled “Fault Diagnosis of Computer Programs”, but it wasn’t until the end of the decade that greater advances on this topic were published.

Since the late ’70s, a lot of research has taken place, but mainly for reducing the computational cost of this technique:

Increase in publications about mutation testing from its origin until 2009
Source: An Analysis and Survey of the Development of Mutation Testing

The first tools were developed around the ’80s, PIMS and Mothra for the FORTRAN language, followed later by Proteum for C and µJava for Java. From then until a few years ago, mutation testing didn’t receive enough attention outside academia. Researchers believe the industry failed to adopt mutation testing for three primary reasons:

  • Lack of economic incentives for stringent testing
  • Inability to successfully integrate unit testing into software development processes
  • Difficulties providing full and economical automated technology to support mutation analysis and testing

Today, I dare say the first two points are accepted as industry standards, and companies as well as developers find value in them. Also, thanks to open-source projects, we have modern state-of-the-art mutation testing libraries for many popular programming languages.

Perhaps that’s why mutation testing has gained attention in the last couple of years, becoming the central topic of many talks. Could this be the moment to finally adopt the technique broadly?

What is mutation testing?

Mutation testing is based on two hypotheses: the Competent Programmer hypothesis and the Coupling Effect.

The first one states that most of the flaws introduced by developers are small syntactic errors, whereas the second asserts that simple faults can cascade and produce other faults, in such a way that a test suite that detects the simple faults will also detect most of the complex ones ❌

Then how do we detect those subtle errors? Mutation testing takes a straightforward approach: it seeds faults into the source code. It does this by applying transformation rules to ordinary lines of code. After each transformation, we say that a mutant was injected into that line of code.

When we alter the original code, we also change the behavior of the system. This means that a test that verifies an expected behavior in the original code should fail when it’s run against the mutated code.

The faults injected into the code are called mutants. Every time the tests still pass against a mutated version, we say the mutant survived, because an intentional change in the behavior of the system wasn’t caught by our first line of defense. The unit tests reported that everything was fine when they should have failed and exposed the mutant. On the contrary, when a test fails while running against the mutated code, we say that the mutant was killed.

Once we detect a surviving mutant, our goal is to kill it, and the way to achieve this is by writing a unit test that covers the behavior the mutant changed. After adding the new test or altering an existing one to catch it, we run the mutation testing cycle again to ensure the mutant is killed 🔁
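To make the cycle concrete, here is a minimal sketch. It assumes a hand-written mutant and models a test suite as a plain function that returns true when every check passes; the names sum, sumMutant, firstSuite, improvedSuite, and mutantStatus are made up for illustration and don't come from any tool.

```javascript
// Original code and a hand-written mutant (the + operator replaced by -).
const sum = (a, b) => a + b;
const sumMutant = (a, b) => a - b;

// A suite is modeled as a function that returns true when all its checks pass.
const firstSuite = (fn) => fn(2, 0) === 2; // weak input: adding zero hides the change
const improvedSuite = (fn) => firstSuite(fn) && fn(2, 3) === 5; // new test added

// A mutant survives when the suite still passes against the mutated code.
function mutantStatus(suite, original, mutant) {
  if (!suite(original)) {
    throw new Error('The suite must pass on the original code first');
  }
  return suite(mutant) ? 'survived' : 'killed';
}

console.log(mutantStatus(firstSuite, sum, sumMutant));    // survived
console.log(mutantStatus(improvedSuite, sum, sumMutant)); // killed
```

The first suite only exercises an input where addition and subtraction agree, so the mutant goes unnoticed until a more telling input is added.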

Mutation operators

Mutants are created using a set of transformation rules called mutation operators. Their purpose is either to mimic typical programming errors, such as mistakenly comparing two variables or using the wrong operator, or to force the creation of valuable new tests, for example by using empty strings and null variables as inputs or by dividing by zero.

A list of 22 mutation operators for the Mothra Mutation testing tool written for FORTRAN
Source: An Analysis and Survey of the Development of Mutation Testing

Above we can see a list of mutation operators for the Mothra toolset while below we see a subset of the 30 mutators from a modern tool called Stryker:

A subset of Stryker mutation operators
Source: Subset of mutators from Stryker’s official documentation
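As a rough illustration of what a transformation rule looks like, the sketch below treats an operator as a plain text replacement and produces one mutant per match. This is a simplification: real tools work on the syntax tree rather than on raw text, and the operator names used here (BoundaryChange, ArithmeticSwap, BooleanFlip) and the generateMutants function are hypothetical.

```javascript
// Hypothetical, simplified operators: real tools mutate the syntax tree, not raw text.
const operators = [
  { name: 'BoundaryChange', from: ' > ', to: ' >= ' },
  { name: 'ArithmeticSwap', from: ' + ', to: ' - ' },
  { name: 'BooleanFlip', from: 'true', to: 'false' },
];

// Produce one mutant per matching occurrence, changing a single spot each time.
function generateMutants(source) {
  const mutants = [];
  for (const { name, from, to } of operators) {
    let index = source.indexOf(from);
    while (index !== -1) {
      const code = source.slice(0, index) + to + source.slice(index + from.length);
      mutants.push({ operator: name, code });
      index = source.indexOf(from, index + from.length);
    }
  }
  return mutants;
}

const source = 'return total + fee > limit;';
for (const mutant of generateMutants(source)) {
  console.log(`${mutant.operator}: ${mutant.code}`);
}
// BoundaryChange: return total + fee >= limit;
// ArithmeticSwap: return total - fee > limit;
```

Each mutant differs from the original in exactly one place, which keeps a surviving mutant easy to trace back to a specific missing check.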

How does the technique work?

The first works on this technique involved manual steps such as running the tests over both the mutated code and the original one, comparing their outputs, and analyzing and marking equivalent mutants.

Thanks to years of research, modern tools have automated those steps and reduced the computational cost by focusing on three main strategies.

  • Do fewer: Seek ways of running fewer mutants without incurring an intolerable information loss
  • Do faster: Focus on ways of generating and running each mutated program as quickly as possible
  • Do smarter: Distribute the computational expense over several machines, retain state information between multiple runs, or avoid complete execution by favoring partial runs

The diagram below shows a basic workflow of how mutation testing tools work today. As a very important optimization step, mutants are seeded only in code that is executed by tests, because otherwise they could never be detected 🕵️‍♂️

Mutation testing workflow in modern tools

The mutants are applied all at once, and the mutated code is created in a temporary package or folder that is deleted once the process ends.
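The workflow can be approximated in a few lines. In this sketch the program is modeled as a map of functions, coverage is a hard-coded set, and each mutant is activated one at a time on a temporary copy; all of these names and data (program, coveredFunctions, mutants, runMutationTesting) are assumptions made for illustration and don't mirror how any particular tool is implemented.

```javascript
// The program under test, modeled as a map of named functions (hypothetical names).
const program = {
  total: (price, qty) => price * qty,
  discount: (price) => price * 0.9, // no test exercises this function
};

// Tests receive the program so they can run against a mutated copy.
const tests = [(p) => p.total(5, 1) === 5];
const coveredFunctions = new Set(['total']); // would come from a coverage run

const mutants = [
  { id: 1, target: 'total', impl: (price, qty) => price / qty },
  { id: 2, target: 'total', impl: (price, qty) => price + qty },
  { id: 3, target: 'discount', impl: (price) => price / 0.9 },
];

function runMutationTesting() {
  return mutants.map(({ id, target, impl }) => {
    // Optimization: skip mutants in code that no test executes.
    if (!coveredFunctions.has(target)) return { id, status: 'no coverage' };
    // Activate one mutant on a temporary copy; the original stays untouched.
    const mutated = { ...program, [target]: impl };
    const allPass = tests.every((test) => test(mutated));
    return { id, status: allPass ? 'survived' : 'killed' };
  });
}

console.log(runMutationTesting().map((r) => `${r.id}: ${r.status}`).join(', '));
// 1: survived, 2: killed, 3: no coverage
```

Mutant 1 survives because the only test multiplies by one, where multiplication and division agree, which is exactly the kind of weak input this process is meant to surface.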

Benefits of mutation testing

The big benefit of mutation testing is that it helps to identify flaws in our design, showing us cases that we didn’t consider. Every time a mutant survives it means there is a missing test case, but that conclusion can lead to several possible decisions.

When mutation testing runs, it will find portions of behavior that aren’t tested. Upon examining the findings, the team could decide either to add a test, improve an existing test, or delete the code. Let’s see in practice the benefits of this technique with the following example:

Suppose we are building a system to manage payments and we want to protect the account balance from invalid operations. Every time a user wants to withdraw money from their account, we first check whether that’s possible by calling the canWithdraw function.

/* withdraw.js */
// Returns true only when the balance strictly exceeds the requested amount.
function canWithdraw(account, amount) {
    return account.balance() > amount;
}

module.exports = { canWithdraw };

/* withdraw.test.js */
const { canWithdraw } = require('./withdraw');

describe('canWithdraw', () => {
    it('should return true when the balance is greater than the amount', () => {
        const account = { balance: () => 50 };
        expect(canWithdraw(account, 20)).toBe(true);
    });

    it('should return false when the balance is less than the amount', () => {
        const account = { balance: () => 50 };
        expect(canWithdraw(account, 70)).toBe(false);
    });
});

The tests verify that the function returns true when the balance is greater than the amount to withdraw, and false when it is lower. What about the case when we want to withdraw the exact amount we have in our account? 🤨

Oops, we forgot to add that part, and running the unit tests won’t help us because they say everything is ok. Luckily for us, we decided to try a mutation testing tool that changes account.balance() > amount to account.balance() >= amount using an equality operator mutation, and warns us that the mutant survived. Both existing tests still pass against the mutated code because neither of them exercises the boundary where the balance equals the amount.
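The gap can be reproduced with a boundary mutant that swaps > for >=. The sketch below rewrites the two existing expectations as plain checks so it runs without a test framework, and it assumes that withdrawing the exact balance should not be allowed, which is what the original > comparison implies. The names canWithdrawMutant, existingTests, and boundaryTest are illustrative.

```javascript
// Original implementation and a boundary mutant (> replaced by >=).
const canWithdraw = (account, amount) => account.balance() > amount;
const canWithdrawMutant = (account, amount) => account.balance() >= amount;

const account = { balance: () => 50 };

// The two existing expectations, written as plain checks instead of Jest assertions.
const existingTests = (fn) => fn(account, 20) === true && fn(account, 70) === false;

// The missing boundary case. Assumption: withdrawing the exact balance is not
// allowed, which is what the original `>` comparison implies.
const boundaryTest = (fn) => fn(account, 50) === false;

console.log(existingTests(canWithdraw));       // true
console.log(existingTests(canWithdrawMutant)); // true, so the mutant survives
console.log(boundaryTest(canWithdraw));        // true
console.log(boundaryTest(canWithdrawMutant));  // false, so the mutant is killed
```

If the business rule actually allows withdrawing the full balance, the surviving mutant has exposed a bug in the original code rather than only a missing test, and the fix belongs in canWithdraw.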

Another indirect benefit appears when we use mutation testing on a daily basis: redundant and unnecessary code generates more mutants to deal with. Hence, to avoid endlessly hunting for mutants, mutation testing encourages us to write simple code ✅

Mutation score

The mutation score is calculated from the results of running a mutation testing tool over our code. Unlike code coverage, which focuses on completeness by detecting pieces of code that aren’t executed during the tests, the mutation score measures the quality of our tests by how good they are at killing mutants.

Mutation Score = (Killed mutants / Mutants covered) * 100

The metric is the percentage of mutants killed out of the total number of mutants that were successfully seeded in the source code. This excludes any mutant that raised a compilation error.
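As a small worked example, the function below computes the score from a list of mutant statuses. It assumes that "mutants covered" means killed plus survived, and the status strings are made up for this sketch; real tools report more categories and may count them differently.

```javascript
// Computes the mutation score from a list of per-mutant statuses (hypothetical labels).
function mutationScore(statuses) {
  const killed = statuses.filter((s) => s === 'killed').length;
  const survived = statuses.filter((s) => s === 'survived').length;
  // Assumption: covered mutants are the killed plus the survived ones;
  // compile errors and any other status are left out of the calculation.
  const covered = killed + survived;
  return covered === 0 ? NaN : (killed / covered) * 100;
}

const statuses = ['killed', 'killed', 'survived', 'compile error', 'killed'];
console.log(mutationScore(statuses)); // 75
```

Here three of the four valid mutants were killed, so the score is 75, and the mutant that failed to compile doesn't count against the tests.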

Mutation score badge showing 82.3

If you like the idea of using mutation scores in your projects you can add a badge to your repositories. Read the documentation of your mutation testing library of choice to know if it has support for the badge.

Problems and disadvantages of mutation testing

One of the biggest problems of mutation testing is its computational cost. For each mutation operator, the framework reads all the source code and analyzes where to insert the mutants, then runs the tests separately for each mutant so that one mutant doesn’t influence the results of the others.

Although there are many approaches for optimizing that process, it’s still at least one order of magnitude slower than running the unit tests alone. On top of that, not every surviving mutant is legitimate, so it takes brainpower to decide whether we want to kill that mutant, whether that code is really necessary, or whether we want to exclude that piece of code from mutation (e.g., loggers).

Equivalent mutants

Equivalent mutants can be thought of as “dead weight”. They require computational power to generate but don’t introduce any change in the behavior of the system. Whether or not we apply them, the mutated code and the original produce the same outputs when the tests run.

Since the original and the mutated code have the same outputs, no test can tell them apart, so these mutants can never be killed.

For example, to sort an array in JavaScript we can pass the .sort method a function that compares two elements, a and b. If that function returns:

  • A value greater than zero, sorts b before a
  • A value less than zero, sorts a before b
  • If the value returned is zero, a and b are considered equal

Now suppose we write the unit test for this function so that it only checks whether the output is greater than, equal to, or less than zero. If a mutation operator moves the returned values further from zero, as in the mutated version below, the test won’t be able to detect a change in behavior, because the sign stays the same. For the test, nothing has changed.

// original
function compare (a, b) {
   if (a > b) {
     return 1;
   } else if (a < b) {
     return -1;
   } else {
     return 0;
   }
}

// mutated
function compare (a, b) {
   if (a > b) {
     return 2; // +1: still greater than zero
   } else if (a < b) {
     return -2; // -1: still less than zero
   } else {
     return 0;
   }
}

That’s the idea behind equivalent mutants. They are nearly impossible to predict, but it’s possible to use constraints and heuristic approximations to detect them 🔍
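A quick way to see why this mutant is invisible is to run both comparators through .sort and through a sign-based check. The sketch below condenses the two functions into arrow functions; the names compareMutant and signsMatch are introduced only for this illustration.

```javascript
// Condensed versions of the original comparator and its mutant.
const compare = (a, b) => (a > b ? 1 : a < b ? -1 : 0);
const compareMutant = (a, b) => (a > b ? 2 : a < b ? -2 : 0);

const input = [5, 1, 4, 1, 3];

// Copy the array before sorting because .sort works in place.
const sortedOriginal = [...input].sort(compare);
const sortedMutant = [...input].sort(compareMutant);

console.log(sortedOriginal); // [ 1, 1, 3, 4, 5 ]
console.log(JSON.stringify(sortedOriginal) === JSON.stringify(sortedMutant)); // true

// A test that only looks at the sign of the result cannot tell them apart either.
const signsMatch = [[2, 1], [1, 2], [3, 3]].every(
  ([a, b]) => Math.sign(compare(a, b)) === Math.sign(compareMutant(a, b))
);
console.log(signsMatch); // true
```

A test asserting the exact return value (for example, that compare(2, 1) is exactly 1) would fail on the mutant, but since .sort only looks at the sign, such a test would be checking a detail that no caller depends on.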

Another thing to consider is that mutation testing frameworks may use different strategies to tackle these problems, including equivalent mutants. So it’s really important to read their documentation and evaluate their effectiveness, because they may produce different results for the same source code.

Final thoughts

Mutation testing is a powerful but computationally expensive technique, which is what has held it back from being widely adopted in practice for years.

Today, the situation seems to have changed. With more computational power at our disposal, and open-source tools following the state of the art in mutation testing, it may be worth a try to include it in real projects 🤗


About the author

Rodrigo Martínez Díaz

Rodrigo is a Software Engineer with more than 8 years of experience working at startups, medium-sized companies, and large corporations. Passionate about his career and an advocate of good practices, Rodrigo shares his experience on learning clean code, robust tests, and building modern Web Apps.
