Apt transport for S3


How Apt transports work, how we wrote one for S3, and how you can write your own.


Debs in S3

Debian packages and S3

Lucid Software uses Debian packages (debs) for packaging and installation. Custom scripts download the debs from a private AWS S3 bucket. As much as we loved the quirks of homegrown scripts, we wanted to move to a proper Apt (Advanced Package Tool) repository and toolchain.

That meant getting Apt to work with S3.

Existing solutions

People have already written custom Apt transports for S3. There’s apt-s3 in C, which is a fork of a fork of a fork of apt-transport-s3. And there’s an identically named — but completely separate — apt-transport-s3 in Python. However, neither project

  • used the standard AWS credential resolution; instead they required ad-hoc credential files
  • supported version 4 of AWS signatures (mandatory in eu-central-1)
  • supported If-Modified-Since caching
  • supported pipelining

Creating a custom transport

Given the drawbacks of the existing solutions, I decided to write a better Apt transport method. The Apt transport method documentation is sparse, but it gives an adequate overview.

Each protocol (http, https, ssh, etc.) is implemented as an executable in /usr/lib/apt/methods, in a file named after its URI scheme. apt-get invokes these executables, sends messages to each process via its stdin, and receives messages from the process via its stdout. Messages have an HTTP-like text format: an initial status code and command, followed by several RFC 822-style fields, and terminated by a blank line.

Apt transport architecture
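To make the message format concrete, here is a minimal Python sketch of the plumbing every method needs: reading blank-line-terminated messages from stdin, writing replies to stdout, and dispatching each 600 URI Acquire request to a handler. It is not taken from apt-boto-s3. The function names, the placeholder `not_implemented` handler, and the `Version` value are hypothetical; the capability fields follow the captured http output and the lessons later in this post.

```python
import io
from typing import Callable, List, Optional, TextIO, Tuple

# A message is a header line such as "600 URI Acquire" plus its fields, in order.
Fields = List[Tuple[str, str]]
Message = Tuple[str, Fields]


def read_message(stream: TextIO) -> Optional[Message]:
    """Read one blank-line-terminated message; return None at end of input."""
    header: Optional[str] = None
    fields: Fields = []
    while True:
        line = stream.readline()
        if not line:  # EOF: apt has closed our stdin
            return (header, fields) if header is not None else None
        line = line.rstrip("\r\n")
        if not line:
            if header is not None:
                return header, fields
            continue  # tolerate extra blank lines between messages
        if header is None:
            header = line
        else:
            key, _, value = line.partition(":")
            fields.append((key.strip(), value.strip()))


def write_message(stream: TextIO, header: str, fields: Fields) -> None:
    """Write one message, blank-line terminated, and flush it immediately."""
    lines = [header] + [f"{key}: {value}" for key, value in fields]
    stream.write("\n".join(lines) + "\n\n")
    stream.flush()  # without this, apt can block waiting for a buffered reply


def serve(stdin: TextIO, stdout: TextIO,
          acquire: Callable[[Fields], Message]) -> None:
    """Announce capabilities, then answer each 600 URI Acquire in turn."""
    write_message(stdout, "100 Capabilities", [
        ("Version", "1.0"),  # this method's own version string (arbitrary)
        ("Single-Instance", "yes"),
        ("Pipeline", "true"),
    ])
    while (message := read_message(stdin)) is not None:
        header, fields = message
        if header.startswith("600 "):
            write_message(stdout, *acquire(fields))
        # Anything else is ignored in this sketch.


def not_implemented(fields: Fields) -> Message:
    """Hypothetical placeholder handler: fail every request."""
    uri = dict(fields).get("URI", "")
    return "400 URI Failure", [("URI", uri), ("Message", "not implemented")]


def simulate(requests: str, acquire: Callable[[Fields], Message]) -> str:
    """Run serve() over an in-memory transcript, for testing without apt."""
    out = io.StringIO()
    serve(io.StringIO(requests), out, acquire)
    return out.getvalue()
```

Installed as a method, the entry point would be `serve(sys.stdin, sys.stdout, not_implemented)`, with the placeholder swapped for a real download handler. Because requests are read and answered one at a time from a single long-lived process, queued (pipelined) requests need no extra machinery here.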

However, the Apt method documentation has omissions and ambiguities. For example, on the subject of pipelining, the documentation says only:

Methods should set the pipeline bit if their underlying protocol supports pipelining. The only known method that does support pipelining is http.


“Setting the pipeline bit” is rather unclear, so I proxied the “official” http method to observe its inputs and outputs:

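# Move the real method aside and install a logging wrapper in its place
mv /usr/lib/apt/methods/http /usr/lib/apt/methods/http-real
echo '#!/bin/sh' > /usr/lib/apt/methods/http
echo 'tee /tmp/in | /usr/lib/apt/methods/http-real "$@" | tee /tmp/out' >> /usr/lib/apt/methods/http
chmod +x /usr/lib/apt/methods/http
# To undo: mv /usr/lib/apt/methods/http-real /usr/lib/apt/methods/http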

Now, after running apt-get update, the messages apt sent to the http method are in /tmp/in and the method’s replies are in /tmp/out. Interleaved, the exchange looks like this:

100 Capabilities
Version: 1.2
Pipeline: true
Send-Config: true

600 URI Acquire
URI: http://mirrors.xmission.com/ubuntu/dists/trusty/main/i18n/Translat...
Filename: /var/lib/apt/lists/partial/mirrors.xmission.com_ubuntu_dists_...
Fail-Ignore: true
Index-File: true

201 URI Done
URI: http://mirrors.xmission.com/ubuntu/dists/trusty/main/i18n/Translat...
Filename: /var/lib/apt/lists/partial/mirrors.xmission.com_ubuntu_dists_...
Size: 762361
Last-Modified: Tue, 15 Apr 2014 16:42:29 GMT
MD5-Hash: 6d991ed7d035b51aa77883a107896db9
MD5Sum-Hash: 6d991ed7d035b51aa77883a107896db9
SHA1-Hash: 8aa7a170afdf02c587c700b63d090c6edd794a02
SHA256-Hash: ed8741c9fb597579cbbb491f1f2a3bd8851e373aae9e61deddb46913d0...
SHA512-Hash: 2004577b96a20392c6934679cb40c81486967f67927c4ff9dd1dc32da2...

So “Pipeline: true” is needed for pipelining. Some more lessons:

  • Although the documentation mentions only MD5-Hash, apt-get fails with “Failed to fetch … Hash Sum mismatch” if the method does not provide hashes for all algorithms in the package index. Include all the standard algorithms: MD5-Hash, SHA1-Hash, SHA256-Hash, and SHA512-Hash.
  • Sometimes the downloaded lists can become corrupted and cause odd issues. rm -r /var/lib/apt/lists fixes that.
  • Don’t forget to flush stdout after each message! Otherwise, apt hangs waiting for a reply that is still sitting in your process’s output buffer.
  • If you support pipelining, set “Single-Instance” to “yes”. apt then starts a single process for your method and reuses it.
  • Even for a cache hit, include the Filename from the URI Acquire request in the response.
  • The standard apt configuration mechanism is /etc/apt/apt.conf.d. Prefer it over requiring ad-hoc files all over the place; with “Send-Config: true”, apt sends its configuration to the method.
  • When possible, use conditional Last-Modified/If-Modified-Since caching. This allows the client to avoid downloading megabytes of package lists on every update.
  • What apt means by “pipelining” (queuing requests) is really “multiplexing” (accepting responses in arbitrary order relative to requests).
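The hash, Filename, and caching lessons can be combined into one reply builder. This is a hedged Python sketch, not apt-boto-s3’s code: `uri_done` and `hash_fields` are hypothetical names, the download into `filename` is assumed to have happened already, and how `not_modified` is decided (for example, S3 answering a conditional GET with 304) is left to the caller. `IMS-Hit: true` is the field we understand apt to use for a not-modified reply; treat it as an assumption and confirm it against your own /tmp/out capture.

```python
import hashlib
import os
from typing import List, Optional, Tuple

Fields = List[Tuple[str, str]]

# Field names apt expects, mapped to hashlib algorithm names.
# MD5Sum-Hash repeats MD5-Hash, as in the captured http output.
HASH_FIELDS = [
    ("MD5-Hash", "md5"),
    ("MD5Sum-Hash", "md5"),
    ("SHA1-Hash", "sha1"),
    ("SHA256-Hash", "sha256"),
    ("SHA512-Hash", "sha512"),
]


def hash_fields(path: str, chunk_size: int = 1 << 16) -> Fields:
    """Hash the file once, feeding every algorithm from the same read."""
    hashers = {name: hashlib.new(name) for _, name in HASH_FIELDS}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return [(field, hashers[name].hexdigest()) for field, name in HASH_FIELDS]


def uri_done(uri: str, filename: str, last_modified: Optional[str],
             not_modified: bool) -> Tuple[str, Fields]:
    """Build a 201 URI Done reply for the file at `filename` (hypothetical helper)."""
    # Always echo the Filename from the 600 URI Acquire request, even on a cache hit.
    fields: Fields = [("URI", uri), ("Filename", filename)]
    if not_modified:
        # Assumed field: tells apt its existing copy is still current.
        fields.append(("IMS-Hit", "true"))
        return "201 URI Done", fields
    fields.append(("Size", str(os.path.getsize(filename))))
    if last_modified:
        fields.append(("Last-Modified", last_modified))
    return "201 URI Done", fields + hash_fields(filename)
```

Reading the file once and updating every hasher from the same chunks avoids re-reading large package lists once per algorithm; the duplicate md5 entry collapses into a single hasher because the dictionary is keyed by algorithm name.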

apt-boto-s3

The project has been released as apt-boto-s3. The implementation is under 250 lines of Python. See it at https://github.com/lucidsoftware/apt-boto-s3/blob/v1.0/s3.py.

See the GitHub repo for more information, including how to install it from our public Bintray apt repo.
