Packages and managers
Packages are collections of software: data files, binaries, executables, source archives, etc. These are published, resolved, downloaded, and installed with package managers.
Unfortunately, in 2017, almost every package manager has the ability, the default, or even the common practice of creating nondeterminism in the form of version ranges.
This problem is illustrated in this conversation between two developers:
Bob: It looks like the tests fail.
Alice: Really? I ran them yesterday and they work fine for me.
Bob: Yeah. Here’s the error…
Alice: Try X, Y, and Z.
Bob: Still doesn’t work.
Alice: Okay, well try…
Bob: I think it’s octo-json-parser.
Alice: What version do you have?
Bob: Oh, I have 2.0.0.
Alice: Oh, that was just released this morning.
Floating software dependencies do all of the following:
- Produce different builds.
- Affect test results.
- Change which bugs are present.
They inhibit building and using older versions of the project, complicate bug reports, and make CI results unreliable. A version number or commit hash is no longer sufficient information to describe the state—the rest of the universe must be considered as well.
In the wild
Even relatively stable Java-land is not immune to poor software dependency decisions. For years, aws-java-sdk-core depended on joda-time [2.2,). aws-java-sdk-core version 1.9.40 will evidently be compatible with whatever version of joda-time is released in 2040. It will depend on the latest version of joda-time until the end of time.
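To make the open-ended range concrete, here is a minimal Python sketch of how a range like [2.2,) behaves as new versions are published. It is not Maven's actual resolution algorithm; the function names and the version lists are made up for illustration.

```python
from typing import List, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Turn '2.9.9' into (2, 9, 9) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def resolve_open_range(lower_bound: str, published: List[str]) -> str:
    """Pick the highest published version >= lower_bound (a simplified '[x,)' range)."""
    floor = parse_version(lower_bound)
    candidates = [v for v in published if parse_version(v) >= floor]
    if not candidates:
        raise LookupError(f"no published version satisfies [{lower_bound},)")
    return max(candidates, key=parse_version)


# Hypothetical registry contents on two different days.
registry_monday = ["2.1", "2.2", "2.3"]
registry_tuesday = registry_monday + ["2.9.9", "3.0"]

print(resolve_open_range("2.2", registry_monday))   # 2.3
print(resolve_open_range("2.2", registry_tuesday))  # 3.0
```

Nothing in the project changed between the two calls, yet the build picks up a different dependency.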
The fallacy of harmless upgrades
Having established that 1.8.0 – Infinity is a bad version range, perhaps we can be more discriminating and only take the latest 1.8.x version. We’ll use 1.8.0, 1.8.1, 1.8.2, etc.
This notion is so common that some package managers dedicate syntax to this practice.
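As one example of such syntax, npm's "~1.8.0" accepts any version that is at least 1.8.0 and below 1.9.0. The Python sketch below is a simplified illustration of that rule, assuming plain three-part versions with no pre-release tags; matches_tilde and the published list are invented for the example.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse a plain 'major.minor.patch' string; pre-release tags are not handled."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def matches_tilde(range_spec: str, version: str) -> bool:
    """Simplified '~X.Y.Z' check: same major and minor, patch at least Z."""
    base = parse_version(range_spec.lstrip("~"))
    candidate = parse_version(version)
    same_minor_line = candidate[:2] == base[:2]
    return same_minor_line and candidate >= base


# Hypothetical list of published versions.
published = ["1.7.9", "1.8.0", "1.8.1", "1.8.2", "1.9.0", "2.0.0"]
accepted = [v for v in published if matches_tilde("~1.8.0", v)]
print(accepted)  # ['1.8.0', '1.8.1', '1.8.2']
```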
If everyone follows semantic versioning, it’s safe, right? Wrong.
The hypothetical story of the trailing comma bug:
- Version 1.8.0 of octo-json-parser provides a function parseJson(), which is documented to parse JSON for valid input and produce an error for invalid input.
- A bug is reported where parseJson() considers trailing commas to be okay, like ["abc",123,]. This is a bug, since the behavior differs from the documented API.
- The bug is corrected, so trailing commas produce an error.
- Since the documented API of octo-json-parser has not changed, semantic versioning classifies this as a patch release. And so the maintainers release version 1.8.1.
Wonderful. Bugs getting patched, clients getting 1.8.x updates…the machine of software humming along.
But wait: a client of octo-json-parser has been validating user input with parseJson() and persisting the input to a database. The data is retrieved on demand and reparsed and processed with octo-json-parser. Upgrading to version 1.8.1 suddenly corrupts (makes unreadable) their months of stored data. The 1.8.0 -> 1.8.1 upgrade has failed.
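The failure can be reproduced in a few lines of Python. Both functions below are hypothetical stand-ins for the two releases of octo-json-parser: the "1.8.0" one crudely strips trailing commas before parsing (the regex would also mangle such commas inside strings, which is fine for a sketch), and the "1.8.1" one is just Python's strict json.loads.

```python
import json
import re


def parse_json_1_8_0(text: str):
    """Stand-in for the buggy release: silently tolerates trailing commas."""
    cleaned = re.sub(r",\s*([\]}])", r"\1", text)
    return json.loads(cleaned)


def parse_json_1_8_1(text: str):
    """Stand-in for the patched release: strict JSON only."""
    return json.loads(text)


# A row that passed validation and was persisted while 1.8.0 was installed.
database = ['["abc",123,]']

for row in database:
    print("1.8.0 reads:", parse_json_1_8_0(row))
    try:
        parse_json_1_8_1(row)
    except json.JSONDecodeError as error:
        print("1.8.1 cannot read the stored row:", error)
```

The patch release is correct by its own documentation, and the stored data is still unreadable afterwards.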
Certainly some changes are safer than others, but there’s no such thing as a 100% innocuous change. Any dissenting opinion lacks imagination (and experience).
The fallacy of security
Some people defend a range like "~1.8.0" because it includes important security updates automatically. The reality is that no clever versioning scheme fixes security. In the event of a vulnerability, the exposure must be assessed, the fixed version must be deployed, and penetration and damage must be analyzed. Using version ranges for security is like bringing a Band-Aid onto an airplane in case there is a mechanical problem.
A note about external VCS
Some package managers (e.g., npm and Go) can depend on other VCS repos. These can have similar problems. If using Git, never depend on HEAD or a branch. Always use a tag or commit hash.
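A small check along these lines can be automated. The Python sketch below flags any Git dependency whose ref is not a full 40-character SHA-1 commit hash. It is deliberately stricter than the advice above: a tag cannot be told apart from a branch by name alone, so tags are flagged for manual review as well. The dependency names and refs are hypothetical.

```python
import re

# A full SHA-1 commit hash: exactly 40 lowercase hex characters (assumes a SHA-1 repository).
FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def is_commit_hash(ref: str) -> bool:
    """Return True only for refs that are full commit hashes."""
    return FULL_SHA1.fullmatch(ref) is not None


# Hypothetical Git dependencies: name -> ref requested by the project.
git_dependencies = {
    "octo-json-parser": "master",
    "octo-http": "0123456789abcdef0123456789abcdef01234567",
    "octo-cli": "HEAD",
}

needs_review = [name for name, ref in git_dependencies.items() if not is_commit_hash(ref)]
print(needs_review)  # ['octo-json-parser', 'octo-cli']
```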
Package management in 2017 shouldn’t be so messy. Changes should be limited to a very well-defined scope, which is almost always a version-controlled repository. A reliable build must be hermetic. It shouldn’t depend on some bits moving around half a world away. Violating this principle will leave your builds as a thing of wax, molded by the current bit of weather.
There are several ways to overcome version ranges.
Don’t allow version ranges
Some forward-thinking package managers don’t allow version ranges at all, e.g., Nix and Guix. (In fact, these two take reproducible versioning to another level by using the package hash as the version.)
Don’t use version ranges
Even if version ranges are allowed, you don’t need to use them. Naturally, this only works if your transitive dependencies play by the same rules. For Java, this is a workable strategy. For Ruby, perhaps not.
And some poorly made build tools can be non-deterministic even with fixed versions, e.g., npm.
Use version control sources
Copy the sources for dependencies into your repo, e.g., Go vendoring. This requires you to incorporate your dependencies’ build processes into your own, including the same build tools, header files, etc. Doing so can quickly get complicated. For example, on the JVM alone, projects could be using Ant, Maven, Gradle, SBT, Make, scalac, groovyc, jythonc, Clojure, etc.
Use version control outputs
Copy the binaries for dependencies into your repo. However, this bloats the size of your repo. While DVCS is awesome, hauling around 100GB of bygone binaries in the history is not. You may need Git LFS or another strategy for dealing with large repos.
This is an operational solution to the problem. Create a copy of the artifact repos, either a plain copy or something more sophisticated like Artifactory, in a way that allows you to control when updates are made available. The downside is that this solution takes place outside version control.
Use a version lock file
Bundler has Gemfile.lock; npm has shrinkwrap; Go has Glock. These solutions generate complete version info for transitive dependencies, which is put under version control. They make updates more complicated than a simple one-line change, and complete version info now involves hundreds of lines. But it’s better than non-determinism.
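The idea behind a lock file can be sketched in Python: resolve once, then record the exact version of every transitive dependency in a stable, diffable form. This is not the format of Gemfile.lock or npm shrinkwrap; the RESOLVED graph, the package octo-http, and the function names are assumptions made for the example.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical result of one resolution run: package -> (exact version, direct dependencies).
RESOLVED: Dict[str, Tuple[str, List[str]]] = {
    "my-app": ("0.1.0", ["octo-json-parser", "octo-http"]),
    "octo-http": ("3.2.1", ["octo-json-parser"]),
    "octo-json-parser": ("1.8.0", []),
}


def lock_entries(root: str, resolved: Dict[str, Tuple[str, List[str]]]) -> Dict[str, str]:
    """Walk the graph from root and record the exact version of every transitive dependency."""
    locked: Dict[str, str] = {}
    stack = list(resolved[root][1])
    while stack:
        name = stack.pop()
        if name in locked:
            continue
        version, dependencies = resolved[name]
        locked[name] = version
        stack.extend(dependencies)
    return locked


def render_lock(locked: Dict[str, str]) -> str:
    """Serialize with sorted keys so the same input always yields the same file."""
    return json.dumps(locked, indent=2, sort_keys=True) + "\n"


lock_text = render_lock(lock_entries("my-app", RESOLVED))
print(lock_text)
```

Checking lock_text into version control means an upgrade shows up as an ordinary, reviewable diff.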
Upgrade your dependencies. Security updates are important. Functional fixes are important. Performance improvements are important. But always upgrade intelligently and deliberately, under version control.
The important thing is to ensure that builds, deployments, tests, and versions are as reliable and reproducible as possible. If your package management solution hinders that, you should fix it. Just as you wouldn’t begin a Python script with from __future__ import *, don’t use version ranges in software dependencies. Software is hard enough; don’t make it any flakier than it already is.
FYI, there actually is one valid reason for using version ranges: to document/enforce compatibility. This is the intent in package managers like apt and yum. Unfortunately, these package managers choose to resolve the versions in a non-deterministic way. This could be made to work in a deterministic way (e.g., require the lower point of the version range to exist and resolve to the lowest version of the intersection), but I’m not aware of any systems that do that.
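Here is a rough Python sketch of that deterministic scheme, under my own assumptions: ranges are half-open pairs [low, high), every lower bound must already be published, and the answer is the lowest version in the intersection, which is simply the largest lower bound. The names resolve_lowest, range_a, and range_b are invented. Versions published later can never change the result, which is what separates this from resolving to the highest match.

```python
from typing import List, Optional, Set, Tuple

Version = Tuple[int, ...]
Range = Tuple[Version, Optional[Version]]  # [low, high); high=None means unbounded


def resolve_lowest(ranges: List[Range], published: Set[Version]) -> Version:
    """Resolve to the lowest version allowed by every range."""
    for low, _ in ranges:
        if low not in published:
            raise LookupError(f"lower bound {low} was never published")
    lowest = max(low for low, _ in ranges)  # lowest point of the intersection
    for _, high in ranges:
        if high is not None and lowest >= high:
            raise ValueError("the ranges do not intersect")
    return lowest


# Two hypothetical clients of the same library with overlapping ranges.
range_a: Range = ((1, 1, 1), (2, 0, 0))
range_b: Range = ((1, 1, 2), (2, 0, 0))

published_today = {(1, 1, 1), (1, 1, 2)}
published_later = published_today | {(1, 1, 3), (1, 9, 0)}

print(resolve_lowest([range_a, range_b], published_today))  # (1, 1, 2)
print(resolve_lowest([range_a, range_b], published_later))  # (1, 1, 2)
```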
If you don’t ever use version ranges, how do you deal with dependencies that transitively depend on different versions of the same library? For example, if you use library A that depends on version 1.1.1 of library C and library B that depends on version 1.1.2 of C, how do you resolve the conflict? Some languages can handle multiple versions of the same library, but most can’t.
How is having the package manager resolve to the ‘lowest version of the intersection’ any less non-deterministic than what they do now by resolving to the highest version?
I would argue that a good CI workflow is the better solution to this. Have all your dependencies update to the latest version automatically and ensure you have adequate automated testing in place to verify your core behavior. You still want to review the release notes of your dependencies periodically to verify there isn’t an area that requires deeper inspection, but I would rather get my updates as soon as possible and fail fast, so that you don’t get into the situation where you have tons of massive updates to make.
Hey Paul, does your opinion about npm being a poorly made build tool still stand for npm version 5? They use a package-lock.json file now to pin down the dependencies for reproducible and hopefully stable builds even if version ranges are used. In general I agree on not using version ranges when defining your dependencies at all.
Please note: I can speak only for the Node.js npm/Yarn ecosystem.
This article has terribly outdated information. Version ranges are great and the only way to manage your dependencies correctly without having tons of duplicates. Also they are a security guarantee, because with them you can get security fixes “automatically”.
Yet there have been problems, that’s why both Yarn and npm@5 implement a lock file which gives you all the guarantees of a range-less package management with the power of a range-based dependency tree. (It’s different from shrinkwrapping!)
If module developers following this (IMHO not lucid) article start bundling specific versions in their npm packages, everyone will suffer. It’s as stupid as it gets.
Please, do your research before calling out on practices 🙁
@tim A CI workflow is only part of the solution. If a dependency is updated, that doesn’t trigger your CI. If you test on your CI, then deploy that version later, something could have changed. Package managers need lock files, which some have.
We use Yarn, which has a lock file to ensure dependencies are the exact version we tested in development and on CI. After each sprint, we make a release, then I run a script that updates all dependencies to their latest version, one per commit, then runs them on CI. If it fails, we use git bisect to isolate the problem. If it passes, we merge it and use the new versions to develop on for the next sprint.