Anti-Pattern: Assuming that more UI automation is better
There is one guaranteed way to lose trust in automation and kill automation before it’s even out of the gate. What might that be? 🤔
If you guessed that using UI automated testing for everything is the culprit, you guessed correctly!
There are very few automation anti-patterns that will kill automation programs more rapidly than the use of UI automated testing on everything.
Despite opinions to the contrary, more automation is not necessarily better. Especially for an organization that's just starting out, a smaller suite of stable automation is arguably far more valuable than a mountain of flaky tests.
Here’s some cool info:
In my consulting work with clients all over the world, I’ve seen all sorts of different types of organizations. Working with these great clients has given me real insight into this issue. Below, I’ll share some of that insight with you.
One organization ran a total of 123k automated UI tests in 7 days – however, if you take a look at this graph, you will see that only 15% of those tests actually passed.
As you can see, that's a very low passing rate. Taken at face value, it implies that a whopping 85% of the features under test contain bugs.
To break it down further, that equates to roughly 104,000 bugs logged in a seven-day period. If you're thinking this figure seems highly unlikely, maybe even impossible, you're not alone.
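As a quick sanity check on that arithmetic (the 123,000-execution and 15% figures come straight from the example above), here's the back-of-envelope calculation:

```java
public class FailureCount {
    public static void main(String[] args) {
        int totalRuns = 123_000;  // UI test executions in 7 days (from the example)
        double passRate = 0.15;   // only 15% of those runs passed

        // Everything that didn't pass gets logged as a "bug"
        long nonPassing = Math.round(totalRuns * (1 - passRate));
        System.out.println("Non-passing tests: " + nonPassing); // 104550, i.e. roughly 104k
    }
}
```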
So what is the cause of all these failures and errors?
They are what we call false positives: tests that fail not because of actual faults in the software under test, but because of a false reading.
So why are these happening? Why did the tests fail?
And are there people behind the scenes who are sorting through these false positives and negatives?
We need to look further at these ~104,000 non-passing tests and ask three main questions:
- Is there a single bug in the application that caused all of the failures?
- Are two or more bugs causing all of the problems?
- Or are there actually close to zero real bugs, with the majority of the failures instead being the result of bad automation?
For my money, I'd be willing to bet it's the third option – that there are actually very few real bugs, and that for the most part, it's bad automation that is causing the false positives.
But there’s more to consider:
How many automation engineers would it take to sort through approximately 104,000 non-passing tests in the span of a week? 🤕
When I ran a team of only four automation engineers, we could barely keep up with a small handful of non-passing automated tests per week. It seems impossible to keep up with a number as large as 104,000.
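To put that triage burden in perspective, here's a rough sketch of the staffing math. The 10-minutes-per-failure figure is purely my assumption for illustration, not data from any study:

```java
public class TriageEffort {
    public static void main(String[] args) {
        long failures = 104_000;          // non-passing tests in one week (from the example)
        double minutesPerFailure = 10.0;  // assumed triage time per failure (hypothetical)
        double hoursPerEngineerWeek = 40.0;

        double totalHours = failures * minutesPerFailure / 60.0;
        double engineersNeeded = totalHours / hoursPerEngineerWeek;
        System.out.printf("Total triage hours: %.0f%n", totalHours);            // ~17,333 hours
        System.out.printf("Engineers needed that week: %.0f%n", engineersNeeded); // ~433
    }
}
```

Even if my per-failure estimate is off by half, you'd still need hundreds of engineers doing nothing but triage – which is why, in practice, nobody looks at these failures at all.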
So one can’t help but wonder… Is anybody actually analyzing these non-passing automated tests?
So, moving on, let’s consider: What value are these automated test cases actually serving the entire organization? Do they actually help the business in question make any decisions about the quality of their software?
For instance, if manual testing had an 85% failure rate, would anyone still push the software to production? The answer is likely (and should always be) no… ❗
So, then, if we wouldn't ship software with an 85% failure rate, why is it considered acceptable for so many automated tests to run, fail, and then keep running anyway? 😧
The answer, by my estimation, is that the automation is being largely ignored. It's noise that nobody listens to, including the engineers who built it. These tests no longer serve any real purpose.
🚀👉 If you want to see these tips in code, as well as dozens of others, check out the Complete Selenium WebDriver with Java Bootcamp.
It might seem a hopeless business, but all is not lost.
There are many organizations that do automation correctly, following proper procedure and carefully analyzing results. Here’s one example:
So why was this automation suite more successful than the others?
For starters, this automation was executed over the course of an entire year, and over that year, there weren’t a terribly high number of failures.
While it’s true that this doesn’t necessarily mean that the automation was successful, it is considered more trustworthy, for two reasons:
- An automation suite that passes for months at a time is more trustworthy than one that produces a failure every couple of months;
- A suite that passes only 15% of the time is significantly less trustworthy than one with only a small number of failures over the span of a year.
Here’s where it gets interesting:
Think about a single feature, such as a Facebook login or an Amazon search, and ask yourself: based on your own experience, how often does that feature actually break?
Based on my own experience, it very rarely, if ever, happens.
So, then, if you have an automated test case for one of these features, which of the above graphs do you think looks more like a true indicator of how the feature actually behaves?
Therein lies the answer…
The results of your automated tests should look pretty much identical to the actual behavior of the feature.
If a test passes at least 99.5% of the time, then on the rare occasion it does fail, that failure is far more likely to be a real regression than a false result.
So what can you do to make your automation more valuable?
It’s pretty simple.
If your automation isn't providing a correct result at least 99.5% of the time, stop writing new automation and work on fixing your reliability first. A quality automation suite allows for no more than about five false positives out of every 1,000 test executions. Follow that guideline and don't tolerate anything worse.
You might be thinking that's impossible, but it really isn't. I've been able to run such a team successfully, and below are my execution results:
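That five-per-1,000 budget is easy to turn into a simple gate for your pipeline. Here's a minimal sketch; the class and method names are my own invention, not from any particular framework:

```java
public class ReliabilityGate {
    // Five false positives per 1,000 executions = 0.5%, i.e. a 99.5% reliability bar
    static final double MAX_FALSE_POSITIVE_RATE = 5.0 / 1000.0;

    // Returns true if the suite's false-positive rate is within budget
    static boolean meetsReliabilityBar(int falsePositives, int totalExecutions) {
        return (double) falsePositives / totalExecutions <= MAX_FALSE_POSITIVE_RATE;
    }

    public static void main(String[] args) {
        System.out.println(meetsReliabilityBar(4, 1000));   // true: within the 5-per-1,000 budget
        System.out.println(meetsReliabilityBar(150, 1000)); // false: far too flaky to trust
    }
}
```

A check like this could run after every suite execution; once the gate fails, the team's priority shifts from writing new tests to stabilizing existing ones.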
Unfortunately, I no longer have the exact passing percentage of these metrics, but with a little estimating, you can see that the pass rate of the graph is extremely high.
Look at the red dots on the graph – these signify a failure. Then note the long failure-free gap between roughly build 1450 and build 1600. That's around 150 consecutive builds with zero failures.
Therefore, I can say with confidence that every failure on this graph was a bug that was introduced into the system, rather than a false positive, a common problem in UI automation.
I know this seems unbelievable, and I’m not bragging here, just stating facts.
The truth is that 99.5% of reliability from UI automation is really, truly possible. I’ve seen it for myself.
And it gets better:
I recently read an excellent post, written by a Microsoft Engineer, in which they talk about their two years’ experience in moving tests from UI automation to the right level of automation. They spoke of the drastic improvement in automation stability during this time period. Here’s a chart illustrating their experience:
Specifically, look at the data after Sprint 116 when they introduced their new system!
This is just one success story of many in which a company that does automation at the correct system level sees real results.
In conclusion – Less UI Automation for The Win
So as you can see, the data supports the conclusion that much of what we had considered bugs was actually false positives produced by flaky UI automation. This is why doing automation at the correct system levels is the way forward.
What do you think about automated testing best practices? Do you agree with my findings? Have you encountered any of these issues yourself?
I would love to hear from you in the comments below. Have a great day! 😉