logo logo

Self Healing to the Rescue of Flaky Test Automation

Self Healing to the Rescue of Flaky Test Automation

None of us are strangers to flaky test automation and its hurdles. Yet in our quest to building scalable automation, we need to find a way to overcome such flakiness, reduce it or hopefully even eliminate it completely, while regaining reliability in our test cases. In this article, we will discuss reasons for test failures, namely flaky tests and reasons why they happen, and we’ll explore strategies to make our tests resilient and powerful. Let’s get started! 💪🚀

Table of Contents – Rescuing Flaky Test Automation

  1. Reasons for Test Failures
  2. Flaky Tests
  3. Self Healing Tests
  4. Summary

Reasons for Test Failures

An automated test by nature means that it is a deterministic way to execute an end-user simulation. Any change in the parameters of the pre-scripted automated test during the execution of the test means the test would fail. There are 2 main reasons why automated tests fail:

1) Invalid Test

The system-under-test has evolved to a new functionality, but the test is outdated. Hence until the test is updated, it will always fail when executed against the new product functionality.

While this type of test failure is not desirable, the test(s) can be quickly updated to match expectations and the next test run should give the right results. Also, this type of test failures (due to outdated tests) indicates a problem in the team’s way of working. All roles in the team need to work together, in a collaborative fashion and ensure the tests are updated to match expected behavior before the new application is deployed in the environment.

2) Application error

When executing the automated tests, we actually find that the application functionality is not working as expected (by the predetermined script). Since the test was working correctly before and started failing on a recent execution run, the test has found a defect in the application.

This is the true value of automating tests – to automatically find regression issues/unexpected changes in the application.

If there is no update/change done in the application-under-test or the automated tests, then the above 2 types of test failures will happen consistently – i.e. you will see the same failure on every test run.

Flaky Tests

There is, however, another reason for test failures that cannot be seen consistently, but rather are seen intermittently. There is usually some weird parameter that is changing during the test execution, that causes the tests to fail randomly.

These types of tests, that fail intermittently, are typically Flaky Tests – because they do not give consistent results.

It is very easy to dismiss the test result as flaky, and just rerun the test multiple times to see if it works eventually. And if it does, typically teams just ignore that the test(s) did fail on a few occasions, and avoid the additional work to identify the root cause of the failure and fix it 😥

<Side note:>

If I had a magic wand 🧙‍♂️, I would do the following:

  • Disable copy-paste when implementing tests – this leads to crazy amounts of code duplication and hence very bad and unmaintainable code.
  • Remove the concept (and all types of implementations) of “automatic rerun of tests if they fail”.

</Side node>

It is very important to acknowledge the fact that the test failed, and then spend the time to understand and investigate why the test failed in certain circumstances.

There are many reasons why the test may be flaky – here are some of them:

  • Actual issue/defects in the application-under-test
  • Parallel execution issue
  • Test execution sequencing issue
  • Test execution machine issue
  • Unstable application-under-test/environment issue
  • Incorrect element locator strategy
  • Test Data/Application data dependencies
  • Device related issues
  • Browser related issues (browser types/versions)
  • Network issues/glitches
  • Date/Time related
  • Synchronization/timing related issues

Sometimes it may be easy to figure out the root cause of the test failure, but there could be times when you need to get various different roles involved and go for a scavenger hunt to find the reason for failure. But once you find it, you will realize how much better your application and its supporting infrastructure has become.

Self-healing Tests

The choice of toolset, the practices used in building your automation framework can help in some specific cases to overcome test authoring related challenges that cause tests to fail.

In the rest of this section, we will see for example how TestProject makes life easier by having Selenium and Appium self-healing capabilities with regards to locator changes – which is one of the main reasons for UI Functional tests failures.

Let’s see a step-by-step how this self-healing capability works 🌟👉

Step 1: Record a test

I recorded a simple web test using TestProject’s AI-powered test recorder. I have shared the full recorded test using the “Share your test” feature of TestProject. You can find it here and run it yourself: https://app.testproject.io/#/shared-tests/with-me?s_token=8VHFfv82a0-W9UBtU8Rmcg

TestProject’s AI-powered test recorder

Step 2: Run the recorded test

Now that we have our test ready – let’s go ahead and execute the test. The below video demonstrates the test executing against my local Chrome browser:

Step 3: Introducing flakiness to test self-healing

I then updated the locator of one of the steps of the test, in order for us to be able to see how the TestProject self-healing works. In reality, the locator would potentially change as part of evolving functionality in the application, and hence the test would typically fail on subsequent reruns.

TestProject Self Healing

You can see this video of how I updated the test in TestProject:

Step 4: Self-healing test

Given that I have intentionally changed one of the locators (to make the test fail), when I reran this updated test again, instead of the test failing, the test passed. It did take a little longer to execute though as TestProject’s self-healing was working to heal our test 😉 From the next execution onwards, it will be lightning fast as the test will already be self-healed.

See this video that shows the test running successfully post the locator change:

Step 5: See the test reports

To investigate further, I looked in more detail at what happened in the test execution report. Here it clearly shows in the report that the locator had changed, but TestProject found an alternate locator for the same element, and the test passed.

TestProject Test Reports

This video shows how I review the test report and see that the “self-healing feature” was applied to the test:


Solving flakiness and making your test execution intelligent can be achieved with proper thought process, thorough due-diligence and correct tool selection. In this article, we’ve seen how TestProject’s self-healing is one great option to eliminate test flakiness and build resilient test cases. There are of course more strategies to handling flaky tests that you can explore, which I discuss in a recent webinar I hosted on Dec 22nd, 2020:

Eradicating flaky tests has never been so easy 😉❄
Happy Testing!

Leave a Reply