Making Backwards Compatible Changes

In large, complex codebases where not every piece of code is released to production at once and customers expect to be able to access their data at any time, maintaining backwards compatibility is important. Here are a few fundamental principles we follow to ensure backwards compatibility in various situations, along with some examples of how they can be applied.

Fundamental principles

[Image: Elayne's bookshelf full of computer science books. Caption: A small computer science library.]

Do not rely on release order

Let’s assume that we’re improving some software for a library (an actual library with books), and they want to have an option on the front page of their website where you can click a button to show a random book with its blurb. There is already a mechanism for searching for a specific book, but there is no endpoint for returning a random one with its associated data.

We write the back end code, creating a new endpoint to select a random book from the database. We write the front end code that submits the request to our new random book endpoint in the back end, parses the data, and displays it on the page. We run it locally and successfully test it ourselves. Our quality assurance specialist is able to load it and run it as well. Everything looks great, so we go ahead and release it.

Let’s say our front end code is released before our back end code. Because this is a super popular library, hundreds of users see the main page of the website every minute. Users see the button to see a random book, click it, and get an ugly error message. The back end is released a few minutes later, but tons of users already had a negative experience on the page.

Now let’s say our back end code is released first. The new endpoint goes out first, and nothing is accessing it. Then the front end code is released, and everything is working. Wonderful! But then someone else’s code that went out with the release ran into an unexpected error, and the back end actually needs to be rolled back. Now only the front end of our change is out, and we run into the same ugly error as before.

So, how can we keep code updates from resulting in these types of errors? One option is to split the code into multiple pull requests, and to release the second one only after the first has been released and is beyond the point where it could be rolled back. In this example, the back end endpoint would go out first. Another option is to put the front end changes behind a feature flag, so that the random book button won’t show up for the user and won’t make calls to the back end until everything is fully released and you turn the flag on.
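To make the feature flag idea concrete, here is a minimal sketch. The flag store, the flag name, and the widget names are all made up for illustration; a real system would more likely read flags from a config service or database.

```python
# Hypothetical in-memory flag store, used only for this sketch.
FEATURE_FLAGS = {"random_book_button": False}


def is_enabled(flag_name: str) -> bool:
    """Treat unknown flags as off, so a missing entry never exposes a feature."""
    return FEATURE_FLAGS.get(flag_name, False)


def front_page_widgets() -> list:
    """Return the names of the widgets the front page should render."""
    widgets = ["search_box"]
    # The button, and therefore any call to the new endpoint, only exists
    # once the flag has been turned on.
    if is_enabled("random_book_button"):
        widgets.append("random_book_button")
    return widgets
```

With the flag off, the front end can ship before, after, or alongside the back end, because nothing calls the new endpoint. If the back end is rolled back later, turning the flag off hides the button again without another release.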

Ask what will happen if a change is rolled back

In the library example above, a single service being rolled back caused clear problems. There can also be issues if an entire change is rolled back.

Let’s look at this on a small scale, for a single pull request before the code is released. Say that I moved the code for blurb processing from folder A to folder B so that it would be accessible where I needed it in the front end. I merge my changes into the master branch. My friend is working on another bit of the library’s website, where they also want to show the blurb for a book when a user adds the book to their new Favorites list. They import the blurb processing code from the new location, folder B. Someone realizes there’s an issue with my code before it’s released, and asks me to revert my changes. I revert my commits, so now the blurb processing code is back in folder A, but my friend’s code is referencing this file in folder B. Once my friend’s code is merged, the master branch is broken.

Lots of things went wrong in this scenario, some of which could have been prevented with automated tests and build checks, but it illustrates how rolling code back can be problematic. The same question applies to releases: if anything is rolled back, could there be any issues with the data that was saved while it was live? Is everything still going to work properly?
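One way to answer that question is to hand a record written by the newer code to the older code and see what happens. The sketch below assumes a hypothetical book record stored as a dictionary, with a made-up `favorite_count` field that only the newer release writes.

```python
def read_book_old_version(record: dict) -> dict:
    """Older reader: only knows about 'title' and 'blurb'."""
    return {
        "title": record["title"],
        # Fall back to an empty blurb instead of raising if the field is absent.
        "blurb": record.get("blurb", ""),
    }


# Saved while the (hypothetical) newer release was live, so it has an extra field.
record_from_new_release = {
    "title": "Example Book",
    "blurb": "An example blurb.",
    "favorite_count": 3,
}

# After a rollback, the older reader still has to cope with this record.
book = read_book_old_version(record_from_new_release)
```

Because the older reader picks out only the fields it knows and ignores the rest, a rollback leaves this record readable. A reader that rejected unknown fields would break on it instead.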

[Image: An arrow from table A to table B with a question mark over it.]

Update data in the right order

What if you want to make a change to how data is stored? Maybe you want to move information from table A to table B. You need to do two things: change where your code writes and reads its data, and move the pre-existing data from table A to table B. So how do you do it?

If you move the data first, users will keep writing to table A until you change where the code writes and reads. Anything written during that period never reaches table B, so that data will effectively be lost.

If you change where your code is writing and reading data, and then move the data, there will be a period of time where users are writing new data, but can’t read their old data.

Is it possible to move data without having any downtime or losing access to it? If you follow the right steps, yes. Here is an example of how you might make it work:

1. Original state: read from and write to table A.
2. Keep reading from table A, but write to both table A and table B.
3. Transfer the existing data from table A to table B. Because new writes are already going to both tables, all the data will now be in table B as well as table A.
4. Start reading from table B.
5. Stop writing to table A, then remove the table itself.

Following this pattern, our changes are fully backwards-compatible. No data is lost or inaccessible at any point in the process.
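The steps above can be sketched with a toy example. Two dictionaries stand in for table A and table B, and the class name and the three switches are made up for illustration. In a real system, each switch flip would be its own release.

```python
from typing import Optional


class BlurbStore:
    """Toy store whose two dicts stand in for table A and table B."""

    def __init__(self) -> None:
        self.table_a = {}
        self.table_b = {}
        self.write_to_a = True    # step 5 turns this off
        self.write_to_b = False   # step 2 turns this on
        self.read_from_b = False  # step 4 turns this on

    def write(self, book_id: int, blurb: str) -> None:
        if self.write_to_a:
            self.table_a[book_id] = blurb
        if self.write_to_b:
            self.table_b[book_id] = blurb

    def read(self, book_id: int) -> Optional[str]:
        table = self.table_b if self.read_from_b else self.table_a
        return table.get(book_id)

    def backfill(self) -> None:
        """Step 3: copy rows that exist only in A, never overwriting rows in B."""
        for book_id, blurb in self.table_a.items():
            self.table_b.setdefault(book_id, blurb)


store = BlurbStore()
store.write(1, "old blurb")   # step 1: reads and writes use table A only
store.write_to_b = True       # step 2: start writing to both tables
store.write(2, "new blurb")
store.backfill()              # step 3: copy the pre-existing rows
store.read_from_b = True      # step 4: switch reads to table B
store.write_to_a = False      # step 5: stop writing to table A...
store.table_a.clear()         # ...and remove it
```

At every point in this sequence, a read returns whatever was last written. Each step can also be rolled back on its own, because the step before it is still a working state.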

Consider customer impact

There are some cases where customers need to be directly informed about big changes. A good example of this is with API updates and deprecations. There is a cost to maintaining super old code and functionality, and sometimes it is necessary to notify customers that they should switch over to a new and better process. Another well-known example would be Microsoft asking everyone to switch from Internet Explorer to Microsoft Edge, a completely new product. Hopefully this kind of drastic change that requires customer action is a rare occurrence, but when it does happen, clear communication and a well-defined process are vital.

As another example, let’s imagine you are changing the URL for your website. Customers need to change their API calls to hit the new URL; otherwise, once the old URL is no longer supported, they will just get errors instead of the data they need. To ensure a smooth transition, you might first get the new URL working, then notify customers that they need to switch to the new URL by a certain date, and only remove the old URL after that date. If the change is communicated well and both URLs keep working long enough for customers to make the switch, end users should never notice that a change was made.
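Here is a rough sketch of what the transition period could look like in code. The paths and the cutoff date are invented for illustration. Advertising the cutoff through the `Sunset` response header (RFC 8594) is one possible choice, and it supplements notifying customers directly rather than replacing it.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Hypothetical paths and cutoff date, chosen only for this sketch.
OLD_PATH = "/api/books"
NEW_PATH = "/api/v2/books"
OLD_PATH_REMOVAL = datetime(2030, 1, 1, tzinfo=timezone.utc)


def handle_request(path: str, now: datetime):
    """Return (status code, response headers) for a request made at `now`."""
    if path == NEW_PATH:
        return 200, {}
    if path == OLD_PATH and now < OLD_PATH_REMOVAL:
        # Still served during the transition, but the response announces
        # when the old URL will stop working.
        return 200, {"Sunset": format_datetime(OLD_PATH_REMOVAL, usegmt=True)}
    # Unknown path, or the old path after the cutoff.
    return 404, {}
```

Both URLs work until the cutoff, so customers can switch whenever it suits them. The old path only starts returning errors after the announced date.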

Conclusion

Backwards compatibility problems can occur in tons of different ways. Hopefully this post gives some ideas that will help you avoid some major pitfalls. If you have any other suggestions for maintaining backwards compatibility, put them in the comments!