My automation is not passing!
So? So, what? It’s not your job to “get the automation to pass”. Allow me to explain.
I once worked with a partner team whose primary responsibility was to create automation in the form of test scripts, execute those scripts, report on their results, and perform maintenance on those scripts.
Don’t judge me; it was like that when I found it so that’s the structure from which I had to start.
The existing test scripts were, shall we say, flaky…in fact, calling them flaky is being charitable. During our daily stand-ups, the team would report on automation execution progress; almost every day there were failures. When these script failures occurred, the status would unfailingly be “we could not get the scripts to pass”.
I knew what they meant, but the wording bugged me. I let that go on for a few weeks because I thought maybe it was just me. Perhaps I was being too sensitive to the vocabulary; I wasn’t. I soon discovered that “getting the scripts to pass” sometimes included rerunning the scripts until they all passed; other times, “getting the scripts to pass” meant protecting against “our expected unexpected” events, i.e., making the scripts handle “questionable” events so they could keep going. “Our expected unexpected” events were those that occasionally manifested during an automated run. Protecting against them meant checking whether they had occurred and then adjusting the flow of execution to bypass the “expected unexpected”, thus continuing with the automation execution.
The most egregious example of this was when I was working for a SaaS company. It was explained to me that the scripts needed to be rerun because “sometimes they just failed for no reason”. In my new role as the automation manager and architect, I decided I should investigate that “no reason”.
During my investigation, I discovered something different from what had been explained. Different, and worse. There were places in one application’s scripts where the “F5” key, i.e., reload, was pressed. When I inquired about why this keypress was present, I was told that “sometimes” the application would respond with an “application is unavailable” page; pressing “F5” would restore the previous page and the script could pick up where it left off. Needless to say, I had to ask if this intermittent issue of “application is unavailable” had been reported. I was told, “yes, but they said it only happens in the QA environment; it doesn’t happen in production”. Pardon me while I pick myself up off the floor 🤯🤦‍♂️
The team had never considered that perhaps the intermittent “application is unavailable” issue was not the actual issue, but that it was a symptom of other application issues that could be causing the other intermittent failures, i.e. those that failed for “no reason”. By continuing to modify the scripts to get past these intermittent issues, other issues that were possibly higher impact were hidden because the scripts did, eventually, report a successful result. What was not reported were the hurdles and detours the scripts had to bypass to achieve that “successful” result.
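If a script really must work around an “expected unexpected” event, one way to keep the detour from disappearing is to record it and let it change the reported result. What follows is a minimal sketch of that idea, not the code that team had; every name in it (`RunReport`, `load_page`, `UNAVAILABLE`, the `PASSED_WITH_WORKAROUNDS` status) is hypothetical, and the `fetch` callable stands in for whatever your tool uses to load a page.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical marker for the "application is unavailable" page.
UNAVAILABLE = "application is unavailable"


@dataclass
class RunReport:
    """Collects the detours a script took during one run."""
    workarounds: List[str] = field(default_factory=list)

    def record_workaround(self, description: str) -> None:
        self.workarounds.append(description)

    def status(self, steps_passed: bool) -> str:
        # A run that needed a detour is never reported as a clean pass.
        if not steps_passed:
            return "FAILED"
        if self.workarounds:
            return "PASSED_WITH_WORKAROUNDS"
        return "PASSED"


def load_page(fetch: Callable[[], str], report: RunReport, max_reloads: int = 1) -> str:
    """Load a page, reloading on 'unavailable' but recording every reload."""
    page = fetch()
    reloads = 0
    while page == UNAVAILABLE and reloads < max_reloads:
        reloads += 1
        report.record_workaround(f"reload {reloads} after '{UNAVAILABLE}' page")
        page = fetch()
    return page


if __name__ == "__main__":
    # Simulated application: unavailable once, then fine.
    responses = iter([UNAVAILABLE, "dashboard"])
    report = RunReport()
    page = load_page(lambda: next(responses), report)
    print(page, report.status(steps_passed=(page == "dashboard")), report.workarounds)
```

The script still finishes, but the result now says how it finished, and the list of workarounds is something you can attach to the bug report instead of a bare “passed”.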
I promise I’m not trying to change people’s vocabulary regarding “can’t get the scripts to pass”; I’m trying to change the mindset. The point of automation is not to get the scripts to pass; it’s to help team members do their jobs, often by helping them do those jobs more effectively or more efficiently. Regarding passing or failing test scripts: If a test is breaking the build, you should probably fix the problem ASAP. Flaky or buggy test? Delete or rewrite it! Flaky or buggy application? Report it along with anticipated value loss or risk!
Changing your mindset does not require changing your vocabulary. That said, sometimes changing your vocabulary can help with a mindset change. Think about what works best for you 🧐