How Apt transports work, how we wrote one for S3, and how you can write your own.
Debs in S3
Lucid Software uses Debian packages (debs) for packaging and installation. Custom scripts download the debs from a private AWS S3 bucket. As much as we loved the quirks of homegrown scripts, we wanted to move to a proper Apt (Advanced Package Tool) repository and toolchain.
That meant getting Apt to work with S3.
People have already written custom Apt transports for S3. There’s apt-s3 in C, which is a fork of a fork of a fork of apt-transport-s3. And there’s an identically named — but completely separate — apt-transport-s3 in Python. However, neither project
- used the standard AWS credential resolution; instead they required ad-hoc credential files
- supported version 4 of AWS signatures (mandatory in eu-central-1)
- supported If-Modified-Since caching
- supported pipelining
Creating a custom transport
Given the drawbacks of current solutions, I decided to author a better Apt transport method. The Apt transport method documentation is sparse, but it gives an adequate overview.
Each protocol (http, https, ssh, etc.) is implemented as an executable, placed in /usr/lib/apt/methods in a file named for its URI scheme. apt-get invokes each of these executables. It sends messages to the process via its stdin and receives messages from the process via its stdout. Messages have an HTTP-like text format consisting of an initial status/command line followed by several RFC-822 fields, and terminated by a blank line.
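The message framing described above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the released project: the helper names read_message and format_message are hypothetical, and fields are kept as a list of pairs on the assumption that some messages repeat a field name.

```python
import io


def read_message(stream):
    """Read one message from a text stream.

    A message is a "CODE Status" line, zero or more "Name: value"
    fields, and a terminating blank line. Returns (code, status, fields),
    or None at end of input. Fields are (name, value) pairs in order.
    """
    # Skip any blank lines left between messages; stop at end of input.
    line = stream.readline()
    while line and not line.strip():
        line = stream.readline()
    if not line:
        return None

    code, _, status = line.strip().partition(" ")
    fields = []
    for line in iter(stream.readline, ""):
        line = line.rstrip("\n")
        if not line:
            break  # a blank line ends the message
        name, _, value = line.partition(":")
        fields.append((name.strip(), value.strip()))
    return int(code), status, fields


def format_message(code, status, fields):
    """Serialize a message, including the terminating blank line."""
    lines = ["%d %s" % (code, status)]
    lines += ["%s: %s" % (name, value) for name, value in fields]
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    # Hypothetical request, shaped like the messages apt sends on stdin.
    raw = "600 URI Acquire\nURI: s3://bucket/key\nFilename: /tmp/key\n\n"
    print(read_message(io.StringIO(raw)))
```

A real method would call read_message(sys.stdin) in a loop and dispatch on the numeric code.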
However, the Apt method documentation has omissions and ambiguities. For example, on the subject of pipelining, the documentation says only:
Methods should set the pipeline bit if their underlying protocol supports pipelining. The only known method that does support pipelining is http.
“Setting the pipeline bit” is rather unclear. So I proxied the “official” http method to observe the inputs and outputs.
mv /usr/lib/apt/methods/http /usr/lib/apt/methods/http-real
echo '#!/bin/sh' > /usr/lib/apt/methods/http
echo 'tee /tmp/in | /usr/lib/apt/methods/http-real "$@" | tee /tmp/out' >> /usr/lib/apt/methods/http
chmod +x /usr/lib/apt/methods/http
Now after apt-get update, incoming messages for the http transport are in /tmp/in, and outgoing messages are in /tmp/out.
100 Capabilities
Version: 1.2
Pipeline: true
Send-Config: true

600 URI Acquire
URI: http://mirrors.xmission.com/ubuntu/dists/trusty/main/i18n/Translat...
Filename: /var/lib/apt/lists/partial/mirrors.xmission.com_ubuntu_dists_...
Fail-Ignore: true
Index-File: true

201 URI Done
URI: http://mirrors.xmission.com/ubuntu/dists/trusty/main/i18n/Translat...
Filename: /var/lib/apt/lists/partial/mirrors.xmission.com_ubuntu_dists_...
Size: 762361
Last-Modified: Tue, 15 Apr 2014 16:42:29 GMT
MD5-Hash: 6d991ed7d035b51aa77883a107896db9
MD5Sum-Hash: 6d991ed7d035b51aa77883a107896db9
SHA1-Hash: 8aa7a170afdf02c587c700b63d090c6edd794a02
SHA256-Hash: ed8741c9fb597579cbbb491f1f2a3bd8851e373aae9e61deddb46913d0...
SHA512-Hash: 2004577b96a20392c6934679cb40c81486967f67927c4ff9dd1dc32da2...
So “Pipeline: true” is needed for pipelining. Some more lessons:
- Although the documentation mentions only MD5-Hash, if the method does not provide hashes for all algorithms in the package index, apt-get fails with “Failed to fetch … Hash Sum mismatch” (example). Include all standard algorithms: MD5-Hash, SHA1-Hash, SHA256-Hash, SHA512-Hash.
- Sometimes the downloaded lists can become corrupted and cause odd issues. rm -r /var/lib/apt/lists fixes that.
- Don’t forget to flush! Otherwise, apt will hang while waiting for you.
- If you support pipelining, set “Single-Instance” to “yes”. This will start a single process for your method and reuse it.
- Even for a cache hit, include the Filename from the URI Acquire request in the response.
- The standard apt configuration mechanism is /etc/apt/apt.conf.d. Prefer that over requiring ad-hoc files all over the place.
- When possible, use conditional Last-Modified/If-Modified-Since caching. This allows the client to avoid downloading megabytes of package lists on every update.
- What apt means by “pipelining” (queuing requests) is really “multiplexing” (accepting responses in arbitrary order relative to requests).
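Several of these lessons can be pulled together in one hedged sketch: announcing capabilities, reporting every hash, echoing the Filename on a cache hit, and flushing after each message. The function names are hypothetical, the IMS-Hit field name is taken from my reading of the method interface documentation rather than from the released code, and fetching from S3 is left out entirely; the sketch assumes the file has already been written to the requested Filename.

```python
import hashlib
import os
import sys
from email.utils import formatdate, parsedate_to_datetime

# Field name -> hashlib algorithm, for every hash apt may check.
HASH_FIELDS = (
    ("MD5-Hash", "md5"),
    ("SHA1-Hash", "sha1"),
    ("SHA256-Hash", "sha256"),
    ("SHA512-Hash", "sha512"),
)


def send(out, code, status, fields):
    """Write one message and flush, so apt is never left waiting."""
    out.write("%d %s\n" % (code, status))
    for name, value in fields:
        out.write("%s: %s\n" % (name, value))
    out.write("\n")
    out.flush()


def send_capabilities(out):
    """Announce pipelining and single-instance support at startup."""
    send(out, 100, "Capabilities", [
        ("Version", "1.2"),
        ("Pipeline", "true"),
        ("Send-Config", "true"),
        ("Single-Instance", "yes"),
    ])


def file_hashes(path):
    """Compute all hash fields in a single pass over the file."""
    digests = {name: hashlib.new(algo) for name, algo in HASH_FIELDS}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            for digest in digests.values():
                digest.update(chunk)
    return [(name, digests[name].hexdigest()) for name, _ in HASH_FIELDS]


def is_cache_hit(request_last_modified, remote_mtime):
    """True if the client's copy is at least as new as the remote object.

    request_last_modified is the Last-Modified value from the request
    (or None); remote_mtime is seconds since the epoch.
    """
    if not request_last_modified:
        return False
    client_time = parsedate_to_datetime(request_last_modified).timestamp()
    return client_time >= remote_mtime


def send_uri_done(out, uri, filename, remote_mtime, request_last_modified=None):
    """Report completion; Filename is included even on a cache hit."""
    fields = [
        ("URI", uri),
        ("Filename", filename),
        ("Last-Modified", formatdate(remote_mtime, usegmt=True)),
    ]
    if is_cache_hit(request_last_modified, remote_mtime):
        fields.append(("IMS-Hit", "true"))  # assumed field name
    else:
        fields.append(("Size", str(os.path.getsize(filename))))
        fields.extend(file_hashes(filename))
    send(out, 201, "URI Done", fields)


if __name__ == "__main__":
    send_capabilities(sys.stdout)
```

Hashing in one pass keeps the cost of reporting four algorithms close to the cost of reading the file once, which matters for large package lists.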
The project has been released as apt-boto-s3. The implementation is under 250 lines of Python. See it at https://github.com/lucidsoftware/apt-boto-s3/blob/v1.0/s3.py.
See the GitHub repo for more information, including how to install it from our public Bintray apt repo.