How QA Transitioned from Biweekly to Weekly Releases

I imagine it went down something like this: the VP of Engineering, director of QA, and some key engineers at Lucid were hanging out in a Star Wars-themed conference room one day when someone said:

“Hey, we keep developing really cool features for our users. We should release every week instead of every other week!”

The VP of Engineering immediately got a glimmer in his eye. Someone from DevOps groaned at this suggestion. The director of QA started thinking about how we could possibly get the regression done faster.

Now, who knows how the decision actually came to be, but one thing is for sure: Lucid moved to a more frequent release schedule, and we lived to tell the tale. Getting there required changes to processes and strategies across multiple areas of engineering over an extended period of time. Here’s what we as a QA team ended up changing and what we learned along the way.

The hurdle

The biggest obstacle was the regression, a suite of tests run after development and before deployment to production to ensure that nothing in the product had “regressed.” These tests were broken into four main focus areas: Lucidchart, Lucidpress, mobile applications, and integrations with external software. Every test case had a priority of medium (the system default), and we usually completed cases in the order they appeared in the list (or we picked the least boring ones first).

Eight manual testers would start the regression on Monday morning with a goal to finish by Wednesday night. Automated tests were also run during this time. Deployment to production was on Thursday afternoon, which gave QA and engineering a half day on Thursday morning to tie up any loose ends. This amounted to roughly 200 hours’ worth of work that had to be completed before we could release.

The changes

In our transition to weekly releases, we tried some things that didn’t work. We tried some things that did, and we kept them. Ultimately our changes fell into three main categories:

Reformat the regression
Split the teams
Share the load

Reformat the regression

Our first step toward weekly releases was shortening the time between the start of the regression and the release. We achieved this goal by cutting Wednesday out of the testing window and reducing the “tying up loose ends” time from five hours to two. With this change, we would finish the regression Tuesday afternoon and begin the release to production on Wednesday morning.

At first, we tried condensing our very lengthy regression into a more session-based testing style. Broad feature areas were included on the regression list, but each tester decided how to test them and documented what was tested. This approach didn’t work for us because there wasn’t enough consistency from regression to regression in which features were tested. By this time, we had also started bringing on more new people who were still learning the products and features, and they needed a more detailed test suite.

As a result, we started identifying and prioritizing our test cases based on risk. We knew that, with a shorter testing window, we might not always finish all of the hundreds of medium-priority test cases, so we wanted to tackle the most important and risky test cases first (why didn’t we think of this before?). Better prioritization helped us find blockers in the most critical or risky parts of our product faster. Additionally, if we didn’t finish the medium- and low-priority cases by end of business Tuesday, we knew we wouldn’t need to put in a late night to get everything done. This new format also prevented testers from picking all the easiest test cases first and leaving the “hard” but important ones until the end.
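To make risk-first ordering concrete, here is a minimal Python sketch. It is not Lucid’s actual tooling: the `RegressionCase` fields, the three-level priority scale, the per-case hour estimates, and the `plan` helper are all hypothetical. Cases are sorted by priority, ties keep their original list position, and testers work down the list until the time budget runs out, so whatever is left over is the lowest-risk work.

```python
from dataclasses import dataclass

# Hypothetical priority scale; a lower rank means higher risk, so it is tested first.
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


@dataclass
class RegressionCase:
    name: str
    priority: str        # "high", "medium", or "low" (assumed labels)
    list_position: int   # where the case sits in the original regression list
    est_hours: float     # hypothetical effort estimate


def regression_order(cases):
    """Highest-risk cases first; ties keep their original list order."""
    return sorted(cases, key=lambda c: (PRIORITY_RANK[c.priority], c.list_position))


def plan(cases, budget_hours):
    """Work down the risk-ordered list; once a case no longer fits, defer it and everything after it."""
    done, deferred = [], []
    used = 0.0
    for case in regression_order(cases):
        if not deferred and used + case.est_hours <= budget_hours:
            done.append(case)
            used += case.est_hours
        else:
            deferred.append(case)
    return done, deferred


cases = [
    RegressionCase("export to PDF", "medium", 0, 2.0),
    RegressionCase("login", "high", 1, 1.0),
    RegressionCase("theme picker", "low", 2, 1.0),
    RegressionCase("sharing permissions", "high", 3, 2.0),
]
done, deferred = plan(cases, budget_hours=4.0)
print([c.name for c in done])      # ['login', 'sharing permissions']
print([c.name for c in deferred])  # ['export to PDF', 'theme picker']
```

Stopping at the first case that doesn’t fit, rather than skipping ahead to smaller ones, mirrors testers working straight down a prioritized list: anything unfinished at the deadline is, by construction, lower risk than everything that was finished.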

Split the teams

Once we changed the regression, we split our 18 development teams into two groups, Eagle and Bear, with opposite sprint and release schedules. The week Eagle started their sprint was the same week Bear was finishing theirs, which staggered the amount of new code released each week. This approach introduces less risk into each deployment and makes it quicker and easier to identify the cause of blockers. For this reason, we as a company decided it was acceptable if the manual testers didn’t finish all the medium- and low-priority tests. If a low-priority bug did make it into production, we could ship a fix the following week, which reduced the number of emergency releases. Another benefit of this strategy was that code merged early could be released early, regardless of whether it was your team’s week to release.
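The staggered Eagle/Bear cadence boils down to two-week sprints offset by one week, so the two groups swap roles every week. The Python sketch below assumes a hypothetical reference Monday (`EPOCH`) on which Eagle was finishing its sprint; the post doesn’t say how Lucid actually tracked the schedule, and `sprint_status` is an illustrative name.

```python
import datetime

# Hypothetical anchor: a Monday on which Eagle was finishing its sprint.
EPOCH = datetime.date(2018, 1, 1)
GROUPS = ("Eagle", "Bear")


def sprint_status(day: datetime.date) -> tuple[str, str]:
    """Return (finishing_group, starting_group) for the week containing `day`.

    With two-week sprints offset by one week, the groups swap roles every week.
    """
    # Floor division also handles dates before EPOCH correctly.
    weeks_since_epoch = (day - EPOCH).days // 7
    finishing = GROUPS[weeks_since_epoch % 2]
    starting = GROUPS[(weeks_since_epoch + 1) % 2]
    return finishing, starting


print(sprint_status(datetime.date(2018, 1, 3)))   # ('Eagle', 'Bear')
print(sprint_status(datetime.date(2018, 1, 10)))  # ('Bear', 'Eagle')
```

A release still goes out every week; only the group whose sprint is ending changes, which is why each deployment carries roughly half the new code it otherwise would.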

On the subject of development teams, we had to make a paradigm shift away from merging untested or undertested code because “the regression will catch it” and toward merging only production-ready code. Developers used to merge their code to the master branch, and then QA would verify and test it. Now, QA team members are included as approvers and test before code is merged. QA adapted by not only testing the new code itself but also questioning what else the changes could affect and looking into areas they previously would have waited until the regression to test. This way, more bugs are caught early.

As we were preparing to switch to weekly releases, we had to decide whether to have the whole QA team complete the regression each week or to split the team and rotate weeks. We found it benefitted the QA team to take turns every other week to avoid regression burnout.

Share the load

To really make this transition to weekly releases work, we had to make testing a part of everyone’s job. We started to involve developers, product managers, and UX designers in regression testing, and we also began amping up and improving our automated tests.

Developers take turns attending a two-hour Monday afternoon war room, where they work with a member of QA to complete a section of the regression. UX designers also pair up with their team’s QA member for an hour every other week to complete sections of the regression. Product managers have also helped with the regression when there are time-sensitive stories that need extra eyes. Implementing these practices has helped developers and UX designers interact with areas of the product that they aren’t often exposed to and improve their product knowledge.

We had to have buy-in from the rest of development on the idea that they shouldn’t “add to the regression,” which may or may not have involved slight public shaming at team lead meetings. Developers made it a priority to write automated tests for every new feature they implemented. Writing tests is an expected part of feature development, and teams briefly explain what automation work they performed in their sprint reports to managers.

Existing automated tests were often flaky and unreliable, which lowered everyone’s expectations that they would provide useful information. The automation team spent about a year cleaning up the tests to prove that they could be helpful. We also took all the reliable tests that we had previously run only against the release candidate branch and started running them on every code change before it was merged. This surfaces test failures early and was instrumental in moving toward weekly releases. Having a stable automation suite is essential so that developers don’t spend valuable time digging into false positives from flaky tests.
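The post doesn’t say how the automation team decided which tests were reliable. One common heuristic, sketched here in Python as an assumption rather than Lucid’s method, is to flag a test as flaky if it both passed and failed on the same commit, then gate merges only on the remaining tests. The run-record format and the `find_flaky` and `premerge_suite` names are hypothetical.

```python
from collections import defaultdict

# Each run record is a hypothetical (test_name, commit_sha, passed) tuple.


def find_flaky(runs):
    """A test is flaky if it produced both a pass and a fail on the same commit."""
    outcomes = defaultdict(set)
    for test, commit, passed in runs:
        outcomes[(test, commit)].add(passed)
    return {test for (test, _commit), seen in outcomes.items() if len(seen) == 2}


def premerge_suite(runs):
    """Tests considered reliable enough to run on every change before merge."""
    all_tests = {test for test, _commit, _passed in runs}
    return sorted(all_tests - find_flaky(runs))


history = [
    ("login", "a1", True),
    ("login", "a1", True),
    ("export", "a1", True),
    ("export", "a1", False),  # same commit, different result: flaky
    ("share", "b2", False),   # failed on one commit...
    ("share", "c3", True),    # ...passed on a later one: possibly a real bug that got fixed
]
print(premerge_suite(history))  # ['login', 'share']
```

A failure on a different commit isn’t counted against the test, because it may reflect a genuine regression; that keeps the heuristic from discarding tests that are doing their job.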

Many of our test cases could be handed off to other testers, which led us to try Rainforest QA, a crowdsourcing platform. Our team identified what could be crowdsourced, wrote more specific instructions, and made some tweaks to our testing environment. The rationale for trying crowdsourcing was that it would free up time for the testing staff to focus on higher-risk testing, while the more repetitive tests could be done remotely with results coming back quickly (typically within an hour). Writing more detailed test cases for crowdsourced testing also simplified writing automated tests and proved invaluable in moving cases from manual testing to automation. Adding detail to our test cases required an up-front investment of time, but now 50% of our Lucidchart regression is crowdsourced, which has reduced our overall regression time by 20 hours.

Conclusion

It sounds clean, easy, and logical when I spell it out on paper, but this evolution represents roughly six months of trial and error, paradigm shifts, and process changes across multiple departments to get to a working solution. Though it took a lot of hard work and convincing others, we have been steadily deploying to production every week for the past six months.

Is it a perfect process? No.

Does it work for everybody? No.

Will we always do it this way? Heck no!

Now that we have weekly releases running smoothly, our next goal is to separate our monolithic application into more microservices that can be continuously deployed. Achieving this goal will involve other mindset changes for many people as well as tweaks to the process. Wish us luck!
