Thursday, 22 August, 2019 UTC


Summary

If you ever need to write some code to read a source map, then you should probably start by trying to find a good library to do it for you. If for some reason you still need to write your own source map reader, then at some point you will need to know how to decode base64 variable-length quantities (base64 VLQs). In this post we will explain both how to decode and encode the base64 VLQs found in source maps.
Background
A source map is formatted as a JSON object. For example, here’s the source map for a minified “Hello, World” JavaScript application:
{
  "version": 3,
  "sources": [
    "demo/src/greeter.js",
    "demo/src/index.js"
  ],
  "names": [
    "window",
    "alert",
    "greeting",
    "greet",
    "constructor"
  ],
  "mappings": "A;aAYQA,MAAAC,MAAA,CAAaC,CCRrBC,IDEIC,QAAW,EAAW,CAElB,IAAAF,EAAA,CCJYA,cDEM,CAMLA,GAAb",
  "file": "output.min.js",
}
If you glance over the fields in this example source map, you will find most fields are straightforward except for the mappings field. This field represents a list of integer arrays; each array in the list defines a pair of corresponding file locations, one in the minified (or “generated”) code and one in the original source code.
Of course, it is hard to tell mappings is a list of integer arrays because each array has been encoded as base64 VLQs. The authors of the source map spec required this encoding for performance reasons. Integer arrays encoded as base64 VLQs tend to be more compact than normal JSON arrays, which helps limit source map file sizes.
Encoding integer sequences as base64 variable-length quantities can be interpreted as a two-step process. First, the integers are encoded as variable-length quantities, and then secondly, the variable-length quantities are encoded in base64—two distinct encodings working in tandem. (This process is depicted in figure 1.) In this post, the topic ordering mirrors the steps in this process. First we will explain variable-length quantities and then we will explain base64. At the end, we will provide some sample encoding/decoding code.


Figure 1. Encoding a sequence of integers can be interpreted as a two-step process. First, we encode the sequence as variable-length quantities. Second, we encode the variable-length quantities in base64. VLQs
A variable-length quantity (VLQ) is a bit-level integer representation. (You might think of it as an alternative to two’s complement.) The VLQ encoding’s chief advantage—which the name not so subtly hints at—is length flexibility. The number of bits we need to represent an integer as a VLQ is roughly proportional to the integer’s number of binary digits. Theoretically, we can encode any integer of any length as a VLQ.
Before getting more specific, it’s worth noting that the VLQs used in source maps differ slightly from the VLQs described in the linked Wikipedia page above. The core ideas are largely the same—the main difference is that the source map VLQs are just easier to encode in base64, which we’ll explain more later in this post.
The following pseudocode describes how to encode some integer x as a VLQ:
function vlqEncode(x)
    // Note: xBits is a bit array 
    var xBits := getBits(x)
    var vlq := []

    // Define some helpful constants
    var SIGN_BIT := 0
    var CONTINUATION_BIT := 5

    while xBits is not empty 
        sextet := a new bit array of length 6 and filled with 0s
        if this is the first iteration
            sextet[SIGN_BIT] := 1 if x < 0 and 0 otherwise
            // Note: sextet[1..4] is the array slice of bits 1 through 4
            sextet[1..4] := the 4 least significant bits in xBits
            xBits := shift xBits right 4
        else
            sextet[0..4] := the 5 least significant bits in xBits
            xBits := shift xBits right 5
        sextet[CONTINUATION_BIT] := 1 if xBits is not empty and 0 otherwise
        push sextet onto the array vlq 

    return vlq
Notice that the return value, VLQ, is an array of sextets. Each sextet stores a slice of the binary digits. Within the VLQ array, the sextets contain x’s binary digits in little-endian order. At the bit-level, individual sextets in the VLQ array all share a common format except for the first one, which has the format shown here in figure 2:


Figure 2. Bits in a VLQ's first sextet.
The bits are labeled like so:
  • s - The sign bit.
  • di - X’s ith most significant bit.
  • c0 - The continuation bit. If another sextet follows this one, then this bit is set to 1; otherwise it is set to 0.
Any sextet following the first has the format shown here in figure 3:


Figure 3. Bits in a VLQ sextet that follows the first sextet.
The labels here are the same as the first sextet. Notice, though, we drop the sign bit, and store 5 of x’s binary digits instead of 4. Again, the continuation bit, cj, is 1 in every sextet except the last.
The table below shows the VLQ encoding for a few example integers. The table tries to show the correspondence between groups of binary digits and the sextet where they appear in the VLQ encoding: The group of binary digits labeled (i) appears embedded into the VLQ sextet segment of the same label. We have also color-coded and labeled the bits in the VLQ encoding: continuation bits are green and labeled with a c; sign bits are blue and labeled with an s; padding bits are red and labeled with a p; and the integer’s binary digits are black.
Table 1. Example VLQ encodings for a few integers. VQL Sequences
Source maps contain VLQ sequences and not individual VLQs. These sequences are formed by simply concatenating VLQs together. For example, here is the sequence for the VLQs from table 1.
 1 1 0 0 0 0 1 0 0 1 0 0 1 0 1 0 0 1 1 1 0 0 1 0 1 0 0 1 0 0 1 0 1 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 1 0 0 0 1 0 1 
├───────────┴───────────┴───────────┴───────────┴───────────┴───────────┴───────────┼───────────┼───────────┴───────────┤
├───────────────────────────────────── 1227133512 ──────────────────────────────────┼──── 8 ────┼───────── -81 ─────────┤
Figure 4. Sequence of VLQs from table 1.
Concatenating the individual VLQs together like this isn’t so different than, say, regular integer arrays, where the individual integers are strung together in a block of contiguous memory. Unlike integer arrays, however, the elements in a VLQ sequence vary in length, which highlights a problem. How do you distinguish each individual VLQs in the sequence? In integer arrays, we do this by relying on the fixed size of integers; each item is predictably four bytes apart (or however long an integer is implemented in your language and on your machine). In a VLQ sequence, however, we must rely on the continuation bits. Notice that in our example VLQ sequence, every sextet with a 0 continuation bit marks the end of one VLQ and the start of a new one in the proceeding sextet. We've highlighted these continuation bits in figure 5. A continuation bit of 0 is essentially a sentinel value.
                                                                         |           |                       |
                                                                         v           v                       V
 1 1 0 0 0 0 1 0 0 1 0 0 1 0 1 0 0 1 1 1 0 0 1 0 1 0 0 1 0 0 1 0 1 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 1 0 0 0 1 0 1 
├───────────┴───────────┴───────────┴───────────┴───────────┴───────────┴───────────┼───────────┼───────────┴───────────┤
├───────────────────────────────────── 1227133512 ──────────────────────────────────┼──── 8 ────┼───────── -81 ─────────┤
Figure 5. Continuation bits of 0 mark denote the end of one VLQ and the start of another (if more sextets follow). Base64
As we mentioned earlier, the VLQ sequences in the source map’s mappings field are encoded in base64. This encoding is generally used to to embed natively binary data in human-readable, text based formats like HTML or JSON. For example, one common usage is inlining a JPEG or PNG image in an image tag:
<img src="
AAABDElEQVQ4y+2Tu0oDQRiFv7kV0VbBVrQzlZdSEPICFnYWxsZHSOMLJPWWgpZiIVZiZ0ANiGki
2NhEUEhIEYRElmx2l7EJIdFxN0LKfNXwc878MHMO9hf1ti2V7VdgnUhG6IUUy+QvuKvzF3p4unrh
tIrfxygS0MBzk+ItjQ5GIQTJ6MI1Tx9omXLxEFlroCWTIwX/QzIzTNvQ6aGVIo7SpdZy84pXQWPT
1Y/veA80uxg5klYnrS5ehfs3jBqEwm2wEESc1zirkjFjuXQYgojtZY4u+fTJmMRXCmOyS5zscZwj
iNzdGGyILQtz7K+zuwbgh2kVPdggvznRV+idFQ63WJwfn0pyq/Tjn2oh+AbBH3HQ5G6LEQAAAABJ
RU5ErkJggg==">
Embedding binary data, such as an image file in HTML, is possible because base64 encoded data is formatted as a string of ASCII characters.
The base64 encoding process is standardized and well specified. This spec’d-out process is robust enough to encode just about any chunk of bytes. However, the VLQ sequences in source maps are encoded using a simplified, non-standardized version of this process. The following pseudocode illustrates this alternate method:
function base64Encode(vlqs)
    var BASE64_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
    var base64Vlqs = ''
    for each sextet in vlqs
        base64Vlqs += BASE64_ALPHABET[sextet]
    return base64Vlqs
This alternative method is really quite simple. The basic idea in this pseudocode (and in the spec’d-out base64 version) is to represent each possible sextet with a unique ASCII character. (Note how there are 64 unique six-bit combinations.) This is done by interpreting the sextet as an unsigned integer index into the base64 alphabet. Conveniently, source map VLQs are based on sextets, which simplifies the task.
An example implementation
Up to this point, we have used pseudocode examples to illustrate some of the concepts behind base64 encoded VLQs, and these examples have mostly focused on the encoding process. Now, let’s take a concrete look at some TypeScript code for doing both the encoding and decoding.
First, we provide the code used to encode an integer sequence:
const BASE64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

const BIT_MASKS = {
  LEAST_FOUR_BITS: 0b1111,
  LEAST_FIVE_BITS: 0b11111,
  CONTINUATION_BIT: 0b100000,
  SIGN_BIT: 0b1,
};

export function base64VlqEncode(integers: number[]): string {
  return integers
    .map(vlqEncode)
    .map(base64Encode)
    .join('');
}

function vlqEncode(x: number): number[] {
  if (x === 0) {
    return [0];
  }
  let absX = Math.abs(x);
  const sextets: number[] = [];
  while (absX > 0) {
    let sextet = 0;
    if (sextets.length === 0) {
      sextet = x < 0 ? 1 : 0; // set the sign bit
      sextet |= (absX & BIT_MASKS.LEAST_FOUR_BITS) << 1; // shift one ot make space for sign bit
      absX >>>= 4;
    } else {
      sextet = absX & BIT_MASKS.LEAST_FIVE_BITS;
      absX >>>= 5;
    }
    if (absX > 0) {
      sextet |= BIT_MASKS.CONTINUATION_BIT;
    }
    sextets.push(sextet);
  }
  return sextets;
}

function base64Encode(vlq: number[]): string {
  return vlq.map(s => BASE64_ALPHABET[s]).join('');
}
The top-level method, base64VlqEncode, takes an array of integers as inputs and outputs its base64 VLQ encoding. This method takes a functional approach, using pipeline methods to first transform each integer into a VLQ, and then encode these VLQs in base64 before finally concatenating them altogether.
The other two support methods, vlqEncode and base64Encode, follow their similarly named pseudocode equivalents.
Next, we provide the code used to decode VLQs:
const REVERSE_BASE64_ALPHABET: Map = (() => {
  const characterIndexPairs = BASE64_ALPHABET.split('').map(
    (c: string, i: number) => [c, i]
  );
  return new Map(characterIndexPairs);
})();

export function base64VlqDecode(base64Vlqs: string): number[] {
  const vlqs: number[] = base64Decode(base64Vlqs);
  return splitVlqs(vlqs).map(vlqDecode);
}

efunction base64Decode(base64Vlqs: string): number[] {
  return base64Vlqs.split('').map(c => {
    const sextet = REVERSE_BASE64_ALPHABET.get(c);
    if (sextet === undefined) {
      throw new Error(`${base64Vlqs} is not a valid base64 encoded VLQ`);
    }
    return sextet;
  });
}

function splitVlqs(vlqs: number[]): number[][] {
  const splitVlqs: number[][] = [];
  let vlq: number[] = [];
  vlqs.forEach((sextet: number) => {
    vlq.push(sextet);
    if ((sextet & BIT_MASKS.CONTINUATION_BIT) === 0) {
      splitVlqs.push(vlq);
      vlq = [];
    }
  });
  if (vlq.length > 0) {
    throw new Error('Malformed VLQ sequence: The last VLQ never ended.');
  }
  return splitVlqs;
}

function vlqDecode(vlq: number[]): number {
  let x = 0;
  let isNegative = false;
  vlq.reverse().forEach((sextet: number, index: number) => {
    if (index === vlq.length - 1) {
      isNegative = (sextet & BIT_MASKS.SIGN_BIT) === 1;
      sextet >>>= 1; // discard sign bit
      x <
As you might expect, the decoding process is essentially the reverse of the encoding process. In our top-level decoding method, base64VlqDecode, we first decode the base64 values, recovering the VLQ sextets. Next, we split the sextets into individual VLQs using the splitVlqs helper method. The splitVlqs method splits the VLQs by checking the continuation bits and using the 0 continuation bits as delimiters. Once we’ve split the VLQs, we reconstruct each individual integer using the vlqDecode method. This method loops over the sextets and re-concatenates the bits together.
This example implementation can also be found in this GitHub repository. (Other reference implementations include the encoder implemented by Mozilla's popular source-map library.)
If we apply this decoding code to the source map seen at the beginning of this post, we can see the underlying integer arrays:
{
...
  "mappings": "[0];[13, 0, 12, 8, 0],[6, 0, 0, 0, 1],[6, 0, 0, 0],[1, 0, 0, 13, 1],[1, 1, -8, -21, 1],[4, -1, 2, 4, 1],[8, 0, 0, 11],[2, 0, 0, 11],[1, 0, 2, -18],[4, 0, 0, 0, -2],[2, 0, 0, 0],[1, 1, -4, 12, 0],[14, -1, 2, 6],[1, 0, 6, -5, 0],[3, 0, 0, -13]",
...
}
Once the integer arrays are decoded, we can now interpret them as location pairings between the generated code and the original source. This interpretation is a topic for a separate post. (Although, the source map spec is fairly clear in this area.)
The post Decoding and Encoding Base64 VLQs in Source Maps appeared first on Lucidchart.