I’ve watched code I wrote go out to production plenty of times before. This time, though, was an event I had carefully orchestrated with the ops on-call team, mostly because of the magnitude of the change. We had rehearsed it in our preproduction environment, and the actual change itself was remarkably uneventful. It marked the end of over seven years of my career spent on weekly or bi-weekly regression testing that prepared us for some version of a monolithic release.
After verification, I posted in Slack:
“As the member of the QA department with the second-most tenure, it is my absolute joy to announce that: The last manual release is complete, the regression is done, and code is now CD from master.”
Working Down the Manual Regression Process
As I think about that day, a few months ago now, I’m struck by the efforts that it took to get there. My world in 2014 as a tester at Lucid was dominated by manual regression testing:
For us, manual regression testing, or “the regression,” was a pre-launch checklist the QA team used to signal readiness for a production release. It took a team of five testers an entire week to work through 670 pre-release checks, often finding release-stopping bugs along the way.
We knew that automation was one of the keys to reducing the burden of this manual effort. In my first year at Lucid, careful use of automation brought that list down to roughly 330 test cases. By 2020, it was reduced to the 120 most valuable test cases, the ones that gave us real confidence in the stability of a release candidate. A more manageable workload, but still not as efficient as we could be.
Eliminating the Manual Regression
In 2021, our efforts turned towards reducing the number of tests on the manual regression to zero. We automated everything we possibly could and challenged the rationale for retaining non-critical test cases. By the start of 2022, the regression was down to a handful of critical integrations with third parties. We aggressively worked towards removing those and trusting our production monitors to give us fast feedback on code released to production. Eventually, we felt confident that what remained should no longer be a barrier to moving away from weekly releasing and towards full continuous deployment.
The technical barriers to removing the regression fell over a similar period. Our early continuous deployment efforts started with a set of five smaller services, then eventually expanded to all but five services that either depended on database patches or made up a large deployment of frontend code. We developed an automated database patch system that holds back the relevant releases until the corresponding patches are deployed. For the frontend, we broke that large deployment target into smaller, distinct units of code, which are easier to roll back individually if issues arise in our production system.
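The core of a patch-gating system like the one described is a simple invariant: a release may not deploy until every database patch it depends on has been applied. A minimal sketch of that check, with hypothetical names (this is illustrative, not Lucid’s actual implementation):

```python
# A minimal sketch of a deployment gate: a release is held back until
# every database patch it depends on has been applied.
# All names and patch identifiers here are hypothetical examples.

def release_can_deploy(required_patches: set[str], applied_patches: set[str]) -> bool:
    """A release may go out only when all of its required patches are applied."""
    return required_patches <= applied_patches  # subset check

# Example: a release that depends on two patches, one still pending.
required = {"2022_01_add_users_index", "2022_02_split_docs_table"}
applied = {"2022_01_add_users_index"}
blocked = not release_can_deploy(required, applied)  # release is held back
```

In practice a system like this would read the applied-patch set from the database itself, but the deploy/no-deploy decision reduces to this subset test.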
Regression Lost, Efficiency Gained
It’s been a seven-year journey to take a large, time-consuming regression suite and, through a combination of automated testing, production monitoring, continuous delivery, and process improvement, finally make that manual process unnecessary.
What did we gain from this process?
I would estimate that in 2014 it took roughly 160 person-hours to complete the regression. By the start of 2022 that had been reduced, but it was still at least 40 person-hours of work. Now that we’ve eliminated it completely, we get back the equivalent of a full-time QA team member’s time to focus on other innovations and priorities. Well worth the effort!
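The back-of-the-envelope math behind that claim, using the rough figures above (illustrative, not exact accounting):

```python
# Rough savings math from the figures in the post (illustrative only).
hours_2014 = 160   # five testers working a full week
hours_2022 = 40    # the trimmed-down checklist at the start of 2022
work_week = 40     # one person's full-time work week

# Even the reduced regression consumed one full person-week per release
# cycle, so eliminating it frees roughly one full-time QA member.
fte_recovered = hours_2022 / work_week        # 1.0 full-time equivalent
reduction = 1 - hours_2022 / hours_2014       # 0.75: a 75% cut before elimination
```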