Decoding and Encoding Base64 VLQs in Source Maps

If you ever need to write some code to read a source map, then you should probably start by trying to find a good library to do it for you. If for some reason you still need to write your own source map reader, then at some point you will need to know how to decode base64 variable-length quantities (base64 VLQs). In this post we will explain both how to decode and encode the base64 VLQs found in source maps.

Background

A source map is formatted as a JSON object. For example, here’s the source map for a minified “Hello, World” JavaScript application:

{
  "version": 3,
  "sources": [
    "demo/src/greeter.js",
    "demo/src/index.js"
  ],
  "names": [
    "window",
    "alert",
    "greeting",
    "greet",
    "constructor"
  ],
  "mappings": "A;aAYQA,MAAAC,MAAA,CAAaC,CCRrBC,IDEIC,QAAW,EAAW,CAElB,IAAAF,EAAA,CCJYA,cDEM,CAMLA,GAAb",
  "file": "output.min.js",
}

If you glance over the fields in this example source map, you will find most fields are straightforward except for the mappings field. This field represents a list of integer arrays; each array in the list defines a pair of corresponding file locations, one in the minified (or “generated”) code and one in the original source code.

Of course, it is hard to tell mappings is a list of integer arrays because each array has been encoded as base64 VLQs. The authors of the source map spec required this encoding for performance reasons. Integer arrays encoded as base64 VLQs tend to be more compact than normal JSON arrays, which helps limit source map file sizes.

Encoding integer sequences as base64 variable-length quantities can be interpreted as a two-step process. First, the integers are encoded as variable-length quantities, and then secondly, the variable-length quantities are encoded in base64—two distinct encodings working in tandem. (This process is depicted in figure 1.) In this post, the topic ordering mirrors the steps in this process. First we will explain variable-length quantities and then we will explain base64. At the end, we will provide some sample encoding/decoding code.


A flowchart for the encoding process of base64 VLQs
Figure 1. Encoding a sequence of integers can be interpreted as a two-step process. First, we encode the sequence as variable-length quantities. Second, we encode the variable-length quantities in base64.

VLQs

A variable-length quantity (VLQ) is a bit-level integer representation. (You might think of it as an alternative to two’s complement.) The VLQ encoding’s chief advantage—which the name not so subtly hints at—is length flexibility. The number of bits we need to represent an integer as a VLQ is roughly proportional to the integer’s number of binary digits. Theoretically, we can encode any integer of any length as a VLQ.

Before getting more specific, it’s worth noting that the VLQs used in source maps differ slightly from the VLQs described in the linked Wikipedia page above. The core ideas are largely the same—the main difference is that the source map VLQs are just easier to encode in base64, which we’ll explain more later in this post.

The following pseudocode describes how to encode some integer x as a VLQ:

function vlqEncode(x)
    // Note: xBits is a bit array 
    var xBits := getBits(x)
    var vlq := []

    // Define some helpful constants
    var SIGN_BIT := 0
    var CONTINUATION_BIT := 5

    while xBits is not empty 
        sextet := a new bit array of length 6 and filled with 0s
        if this is the first iteration
            sextet[SIGN_BIT] := 1 if x < 0 and 0 otherwise
            // Note: sextet[1..4] is the array slice of bits 1 through 4
            sextet[1..4] := the 4 least significant bits in xBits
            xBits := shift xBits right 4
        else
            sextet[0..4] := the 5 least significant bits in xBits
            xBits := shift xBits right 5
        sextet[CONTINUATION_BIT] := 1 if xBits is not empty and 0 otherwise
        push sextet onto the array vlq 

    return vlq

Notice that the return value, VLQ, is an array of sextets. Each sextet stores a slice of the binary digits. Within the VLQ array, the sextets contain x’s binary digits in little-endian order. At the bit-level, individual sextets in the VLQ array all share a common format except for the first one, which has the format shown here in figure 2:


|c_0|d_3|d_2|d_1|d_0|s|
Figure 2. Bits in a VLQ’s first sextet.

The bits are labeled like so:

  • s – The sign bit.
  • diX’s ith most significant bit.
  • c0 – The continuation bit. If another sextet follows this one, then this bit is set to 1; otherwise it is set to 0.

Any sextet following the first has the format shown here in figure 3:


|c_j|d_(i+4)|d_(i+3)|d_(i+2)|d_(i+1)|d_i|
Figure 3. Bits in a VLQ sextet that follows the first sextet.

The labels here are the same as the first sextet. Notice, though, we drop the sign bit, and store 5 of x’s binary digits instead of 4. Again, the continuation bit, cj, is 1 in every sextet except the last.

The table below shows the VLQ encoding for a few example integers. The table tries to show the correspondence between groups of binary digits and the sextet where they appear in the VLQ encoding: The group of binary digits labeled (i) appears embedded into the VLQ sextet segment of the same label. We have also color-coded and labeled the bits in the VLQ encoding: continuation bits are green and labeled with a c; sign bits are blue and labeled with an s; padding bits are red and labeled with a p; and the integer’s binary digits are black.

Table 1. Example VLQ encodings for a few integers.

VQL Sequences

Source maps contain VLQ sequences and not individual VLQs. These sequences are formed by simply concatenating VLQs together. For example, here is the sequence for the VLQs from table 1.

 1 1 0 0 0 0 1 0 0 1 0 0 1 0 1 0 0 1 1 1 0 0 1 0 1 0 0 1 0 0 1 0 1 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 1 0 0 0 1 0 1 
├───────────┴───────────┴───────────┴───────────┴───────────┴───────────┴───────────┼───────────┼───────────┴───────────┤
├───────────────────────────────────── 1227133512 ──────────────────────────────────┼──── 8 ────┼───────── -81 ─────────┤
Figure 4. Sequence of VLQs from table 1.

Concatenating the individual VLQs together like this isn’t so different than, say, regular integer arrays, where the individual integers are strung together in a block of contiguous memory. Unlike integer arrays, however, the elements in a VLQ sequence vary in length, which highlights a problem. How do you distinguish each individual VLQs in the sequence? In integer arrays, we do this by relying on the fixed size of integers; each item is predictably four bytes apart (or however long an integer is implemented in your language and on your machine). In a VLQ sequence, however, we must rely on the continuation bits. Notice that in our example VLQ sequence, every sextet with a 0 continuation bit marks the end of one VLQ and the start of a new one in the proceeding sextet. We’ve highlighted these continuation bits in figure 5. A continuation bit of 0 is essentially a sentinel value.

                                                                         |           |                       |
                                                                         v           v                       V
 1 1 0 0 0 0 1 0 0 1 0 0 1 0 1 0 0 1 1 1 0 0 1 0 1 0 0 1 0 0 1 0 1 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 1 0 0 0 1 0 1 
├───────────┴───────────┴───────────┴───────────┴───────────┴───────────┴───────────┼───────────┼───────────┴───────────┤
├───────────────────────────────────── 1227133512 ──────────────────────────────────┼──── 8 ────┼───────── -81 ─────────┤
Figure 5. Continuation bits of 0 mark denote the end of one VLQ and the start of another (if more sextets follow).

Base64

As we mentioned earlier, the VLQ sequences in the source map’s mappings field are encoded in base64. This encoding is generally used to to embed natively binary data in human-readable, text based formats like HTML or JSON. For example, one common usage is inlining a JPEG or PNG image in an image tag:

<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAXCAIAAACNlFiOAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAB3RJTUUH4wcUEh4vUW2qzAAAAB1pVFh0Q29tbWVudAAAAAAAQ3JlYXRlZCB3aXRoIEdJTVBkLmUH
AAABDElEQVQ4y+2Tu0oDQRiFv7kV0VbBVrQzlZdSEPICFnYWxsZHSOMLJPWWgpZiIVZiZ0ANiGki
2NhEUEhIEYRElmx2l7EJIdFxN0LKfNXwc878MHMO9hf1ti2V7VdgnUhG6IUUy+QvuKvzF3p4unrh
tIrfxygS0MBzk+ItjQ5GIQTJ6MI1Tx9omXLxEFlroCWTIwX/QzIzTNvQ6aGVIo7SpdZy84pXQWPT
1Y/veA80uxg5klYnrS5ehfs3jBqEwm2wEESc1zirkjFjuXQYgojtZY4u+fTJmMRXCmOyS5zscZwj
iNzdGGyILQtz7K+zuwbgh2kVPdggvznRV+idFQ63WJwfn0pyq/Tjn2oh+AbBH3HQ5G6LEQAAAABJ
RU5ErkJggg==">

Embedding binary data, such as an image file in HTML, is possible because base64 encoded data is formatted as a string of ASCII characters.

The base64 encoding process is standardized and well specified. This spec’d-out process is robust enough to encode just about any chunk of bytes. However, the VLQ sequences in source maps are encoded using a simplified, non-standardized version of this process. The following pseudocode illustrates this alternate method:

function base64Encode(vlqs)
    var BASE64_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
    var base64Vlqs = ''
    for each sextet in vlqs
        base64Vlqs += BASE64_ALPHABET[sextet]
    return base64Vlqs

This alternative method is really quite simple. The basic idea in this pseudocode (and in the spec’d-out base64 version) is to represent each possible sextet with a unique ASCII character. (Note how there are 64 unique six-bit combinations.) This is done by interpreting the sextet as an unsigned integer index into the base64 alphabet. Conveniently, source map VLQs are based on sextets, which simplifies the task.

An example implementation

Up to this point, we have used pseudocode examples to illustrate some of the concepts behind base64 encoded VLQs, and these examples have mostly focused on the encoding process. Now, let’s take a concrete look at some TypeScript code for doing both the encoding and decoding.

First, we provide the code used to encode an integer sequence:

const BASE64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

const BIT_MASKS = {
  LEAST_FOUR_BITS: 0b1111,
  LEAST_FIVE_BITS: 0b11111,
  CONTINUATION_BIT: 0b100000,
  SIGN_BIT: 0b1,
};

export function base64VlqEncode(integers: number[]): string {
  return integers
    .map(vlqEncode)
    .map(base64Encode)
    .join('');
}

function vlqEncode(x: number): number[] {
  if (x === 0) {
    return [0];
  }
  let absX = Math.abs(x);
  const sextets: number[] = [];
  while (absX > 0) {
    let sextet = 0;
    if (sextets.length === 0) {
      sextet = x < 0 ? 1 : 0; // set the sign bit
      sextet |= (absX & BIT_MASKS.LEAST_FOUR_BITS) << 1; // shift one ot make space for sign bit
      absX >>>= 4;
    } else {
      sextet = absX & BIT_MASKS.LEAST_FIVE_BITS;
      absX >>>= 5;
    }
    if (absX > 0) {
      sextet |= BIT_MASKS.CONTINUATION_BIT;
    }
    sextets.push(sextet);
  }
  return sextets;
}

function base64Encode(vlq: number[]): string {
  return vlq.map(s => BASE64_ALPHABET[s]).join('');
}

The top-level method, base64VlqEncode, takes an array of integers as inputs and outputs its base64 VLQ encoding. This method takes a functional approach, using pipeline methods to first transform each integer into a VLQ, and then encode these VLQs in base64 before finally concatenating them altogether.

The other two support methods, vlqEncode and base64Encode, follow their similarly named pseudocode equivalents.

Next, we provide the code used to decode VLQs:

const REVERSE_BASE64_ALPHABET: Map<string, number=""> = (() => {
  const characterIndexPairs = BASE64_ALPHABET.split('').map<[string, number]>(
    (c: string, i: number) => [c, i]
  );
  return new Map<string, number="">(characterIndexPairs);
})();

export function base64VlqDecode(base64Vlqs: string): number[] {
  const vlqs: number[] = base64Decode(base64Vlqs);
  return splitVlqs(vlqs).map(vlqDecode);
}

function base64Decode(base64Vlqs: string): number[] {
  return base64Vlqs.split('').map(c => {
    const sextet = REVERSE_BASE64_ALPHABET.get(c);
    if (sextet === undefined) {
      throw new Error(`${base64Vlqs} is not a valid base64 encoded VLQ`);
    }
    return sextet;
  });
}

function splitVlqs(vlqs: number[]): number[][] {
  const splitVlqs: number[][] = [];
  let vlq: number[] = [];
  vlqs.forEach((sextet: number) => {
    vlq.push(sextet);
    if ((sextet & BIT_MASKS.CONTINUATION_BIT) === 0) {
      splitVlqs.push(vlq);
      vlq = [];
    }
  });
  if (vlq.length > 0) {
    throw new Error('Malformed VLQ sequence: The last VLQ never ended.');
  }
  return splitVlqs;
}

function vlqDecode(vlq: number[]): number {
  let x = 0;
  let isNegative = false;
  vlq.reverse().forEach((sextet: number, index: number) => {
    if (index === vlq.length - 1) {
      isNegative = (sextet & BIT_MASKS.SIGN_BIT) === 1;
      sextet >>>= 1; // discard sign bit
      x <<= 4;
      x |= sextet & BIT_MASKS.LEAST_FOUR_BITS;
    } else {
      x <<= 5;
      x |= sextet & BIT_MASKS.LEAST_FIVE_BITS;
    }
  });
  return isNegative ? -x : x;
}</string,></string,>

As you might expect, the decoding process is essentially the reverse of the encoding process. In our top-level decoding method, base64VlqDecode, we first decode the base64 values, recovering the VLQ sextets. Next, we split the sextets into individual VLQs using the splitVlqs helper method. The splitVlqs method splits the VLQs by checking the continuation bits and using the 0 continuation bits as delimiters. Once we’ve split the VLQs, we reconstruct each individual integer using the vlqDecode method. This method loops over the sextets and re-concatenates the bits together.

This example implementation can also be found in this GitHub repository. (Other reference implementations include the encoder implemented by Mozilla’s popular source-map library.)

If we apply this decoding code to the source map seen at the beginning of this post, we can see the underlying integer arrays:

{
...
  "mappings": "[0];[13, 0, 12, 8, 0],[6, 0, 0, 0, 1],[6, 0, 0, 0],[1, 0, 0, 13, 1],[1, 1, -8, -21, 1],[4, -1, 2, 4, 1],[8, 0, 0, 11],[2, 0, 0, 11],[1, 0, 2, -18],[4, 0, 0, 0, -2],[2, 0, 0, 0],[1, 1, -4, 12, 0],[14, -1, 2, 6],[1, 0, 6, -5, 0],[3, 0, 0, -13]",
...
}

Once the integer arrays are decoded, we can now interpret them as location pairings between the generated code and the original source. This interpretation is a topic for a separate post. (Although, the source map spec is fairly clear in this area.)

3 Comments

  1. I’ve never had the fun of encoding before but it’s a nice thing 🙂 I’m waiting for the next article. about interpretation

  2. Great article!
    I’m waiting for jest article. 🙂

  3. At first I read the Wikipedia article, which I branched into from the Source Maps Specs (version 3). I immediately noticed that the so-called VLQ’s from the spec were different from the ones on Wikipedia (using six bits in a 7-bit blocks with a continuation bit rather than seven bits in byte blocks). I however did not understand that this was intended to simplify the Base64 conversion and I found your explanations to be very clear. I also did not understand (by reading the spec) that what they were referring to as VLQ “segments” are actually arrays of numbers, each of which was individually encoded as a VLQ number. The continuation bit allows us to discern the boundary between the different array elements. After I gathered this information I was finally able to decipher the nature of the segment array members (explained in the spec):
    1. column in the generated file (resets for each line, with the “;” separator, incremental within each line, with the “,” separator)
    2. source file in the sources array in the source map JSON object (incremental throughout the source map)
    3. line number in the source file (incremental throughout the source map)
    4. column number in the source file (resets for a new line, incremental within each line)
    5. symbol name in the names array in the source map JSON object (incremental throughout the source map)

    I think to best understand how this work, you must write a simple program (i.e. a hello world JS program), minify it using something like webpack (with a devtool of “source-maps”, I was following along the SurviveJS ebook, excellent resource) and then open the source map file and individually decode the strings in the mappings field using the author’s demo tool: http://tyglewis.com/base64-vlq-codec/ (from his GitHub repo linked in this article)

    Excellent work thank you so so much!

Your email address will not be published.