Base64 Encoding: A Visual Explanation

Base64 encoding appears here and there in web development. Perhaps its most familiar usage is in HTML image tags when we inline our image data (more on this later):

<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAALCAYAAABCm8wlAAAABmJLR0QA/wD/AP+gvaeTAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAB3RJTUUH4QoPAxIb88htFgAAABl0RVh0Q29tbWVudABDcmVhdGVkIHdpdGggR0lNUFeBDhcAAACxSURBVBjTdY6xasJgGEXP/RvoonvAd8hDyD84+BZBEMSxL9GtQ8Fis7i6BkGI4DP4CA4dnQON3g6WNjb2wLd8nAsHWsR3D7JXt18kALFwz2dGmPVhJt0IcenUDVsgu91eCRZ9IOMfAnBvSCz8I3QYL0yV6zfyL+VUxKWfMJuOEFd+dE3pC1Finwj0HfGBeKGmblcFTIN4U2C4m+hZAaTrASSGox6YV7k+ARAp4gIIOH0BmuY1E5TjCIUAAAAASUVORK5CYII=">
An image embedded directly into an HTML image tag

As a programmer, it is easy to accept this random-looking ASCII string as the “Base64 encoded” abstraction and move on. To go from raw bytes to the Base64 encoding, however, is a straightforward process, and this post illustrates how we get there. We’ll also discuss some of the why behind Base64 encoding and a couple places you may see it.

A visualization

The gist of the encoding process is captured in the following interactive visualization. Type in some ASCII characters in the top input and hit the “Encode” button.

If you run a few strings through this visualization, you may notice that the encoding process is simply a pair of nested loops. The outer loop iterates over the data in 24-bit increments; the spec refers to these as “input groups.” The inner loop iterates over each input group 6 bits at a time. Each 6-bit value is interpreted as an unsigned integer that is used to index an alphabet of 64 characters. The indexed alphabet value is the output. With the help of ES6 generators, this encoding process can be implemented with just a handful of functions:

/**
 * @param {Uint8Array} bytes
 * @return {string} Base64 encoded string
 */
function base64Encode(bytes) {
   let encoding = '';
   for (let group of groups24Bits(bytes)) {
      for (let value of values6Bits(group)) {
         if (value !== undefined) {
            encoding += ALPHABET[value];
         } else {
            encoding += PAD;
         }
      }
   }
   return encoding;
}

const ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
const PAD = '=';

/**
 * @param {Uint8Array} bytes
 * @return {Uint8Array} The next input group (yielded on each execution)
 */
function* groups24Bits(bytes) {
   for (let i = 0; i < bytes.length; i += 3) {
      yield bytes.slice(i, i + 3); // 3 bytes/3 octets/24 bits
   }
}

/**
 * @param {Uint8Array} group Expected to be array of 1 to 3 bytes
 * @return {number|undefined} The next 6-bit value from the 
 * input group (yielded on each execution)
 */
function* values6Bits(group) {
   const paddedGroup = Uint8Array.from([0, 0, 0]);
   paddedGroup.set(group);

   let numValues = Math.ceil((group.length * 8) / 6);
   for (let i = 0; i < numValues; i++) {
      let base64Value;
      if (i === 0) {
         base64Value = (paddedGroup[0] & 0b11111100) >> 2;
      } else if (i === 1) {
         base64Value = (paddedGroup[0] & 0b00000011) << 4;
         base64Value = base64Value | ((paddedGroup[1] & 0b11110000) >> 4);
      } else if (i === 2) {
         base64Value = (paddedGroup[1] & 0b00001111) << 2;
         base64Value = base64Value | ((paddedGroup[2] & 0b11000000) >> 6);
      } else if (i === 3) {
         base64Value = paddedGroup[2] & 0b00111111;
      }
      yield base64Value;
   }

   let numPaddingValues = 4 - numValues;
   for (let j = 0; j < numPaddingValues; j++) {
      yield undefined;
   }
}

If there is an “interesting” part to the encoding process, it is the ending conditions where we must apply padding. Each input group is required to be 24 bits long (or equivalently three 8-bit bytes). (It seems likely the spec writers chose 24-bit input groups since 24 is the least common multiple of 6 and 8.) In the implementation given above, we pad the final group with bytes of zeroes when the final input group is only 1 or 2 bytes long. As we iterate over this final input group, if the 6-bit value consists entirely of padding bits, then = is the output character, the designated padding character. If, however, the 6-bit value straddles “real” bits and padding bits—as can be seen in the input “foob”—then the alphabet is still indexed and the padding bits are taken to be zeroes.
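
The padding cases can be checked against the test vectors listed in RFC 4648. What follows is a minimal, self-contained sketch rather than the implementation above: it packs each input group into a single 24-bit integer instead of masking bytes one at a time. The names B64_ALPHABET, encodeGroup, and encodeAscii are made up for this illustration, and the sketch assumes ASCII-only input.

```javascript
// Hypothetical names for illustration; not part of the implementation above.
const B64_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

/**
 * @param {Uint8Array} group 1 to 3 bytes
 * @return {string} Always 4 characters, '=' padded
 */
function encodeGroup(group) {
   // Pack the bytes into one 24-bit integer; missing bytes count as zero.
   const packed = ((group[0] || 0) << 16) | ((group[1] || 0) << 8) | (group[2] || 0);
   // Number of 6-bit values that contain at least one real bit.
   const numValues = Math.ceil((group.length * 8) / 6);
   let out = '';
   for (let i = 0; i < 4; i++) {
      if (i < numValues) {
         out += B64_ALPHABET[(packed >> (18 - 6 * i)) & 0b111111];
      } else {
         out += '='; // value made up entirely of padding bits
      }
   }
   return out;
}

/**
 * @param {string} text Assumed to contain only ASCII characters
 * @return {string} Base64 encoded string
 */
function encodeAscii(text) {
   const bytes = Uint8Array.from(text, (ch) => ch.charCodeAt(0));
   let out = '';
   for (let i = 0; i < bytes.length; i += 3) {
      out += encodeGroup(bytes.subarray(i, i + 3));
   }
   return out;
}

console.log(encodeAscii('foo'));   // Zm9v      (no padding needed)
console.log(encodeAscii('foob'));  // Zm9vYg==  (1 byte in the final group)
console.log(encodeAscii('fooba')); // Zm9vYmE=  (2 bytes in the final group)
```

In "foob", the final "g" is the value that straddles real bits and padding bits: its top two bits come from the byte for "b" and its low four bits are zero padding.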

A couple usages

You will not find any mention of “HTML” in the Base64 spec. Instead, the authors simply mention that Base64 encoding is used in environments where, “perhaps for legacy reasons,” the “storage or transfer” of data is limited to ASCII characters. More or less, this idea sums up the browser and its heavy consumption of HTML, JSON, CSS, and JavaScript. Increasingly, this text is encoded using UTF-8, a superset of ASCII. In this text-heavy ecosystem, Base64 encoding finds various niche applications.

Data URLs

The first part of a URL is the scheme. It is the prefix string that goes before the first colon; for example, it is the https in https://example.com or the beginning ftp in ftp://ftp.funet.fi/pub/standards/RFC/rfc4648.txt. The scheme tells the client (a browser or a different network app) how to retrieve the resource and what protocol to follow. The scheme prefix also makes URLs extensible and suitable for future protocols. If a new protocol comes along, we can create a new URL scheme for it and still identify resources by URL.
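
As a quick illustration, the standard URL class (available in browsers and in Node.js) exposes the scheme through its protocol property. The helper name schemeOf and the mailto address are assumptions made up for this sketch.

```javascript
/**
 * Hypothetical helper: returns the scheme of a URL without the trailing colon.
 * @param {string} url
 * @return {string}
 */
function schemeOf(url) {
   // URL#protocol includes the colon, e.g. "https:", so drop the last character.
   return new URL(url).protocol.slice(0, -1);
}

console.log(schemeOf('https://example.com'));                              // https
console.log(schemeOf('ftp://ftp.funet.fi/pub/standards/RFC/rfc4648.txt')); // ftp
console.log(schemeOf('mailto:someone@example.com'));                       // mailto
```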

The data scheme is one such extension, which we saw in the image encoded in the introduction. This scheme tells clients, “My resource’s data is located right here in the rest of this URL string.” URLs that use the data scheme follow this format:

data:[<mediatype>][;base64],<data>
Data URL form

You can find more particulars about this format in the spec, but we will focus on the image “data URLs” we mentioned at the outset. Here are a couple examples of the big three binary image formats used in a few different contexts:

.triangle-icn {
    background-image: url(data:image/gif;base64,R0lGODlhCAAHAIABAGSV7f///yH+EUNyZWF0ZWQgd2l0aCBHSU1QACH5BAEKAAEALAAAAAAIAAcAAAINjI8BkMq41onRUHljAQA7);
    background-repeat: no-repeat;
    background-position: center;
}
Embedding a GIF directly into a CSS rule

const image = new Image();
image.src =   "data:image/jpg;base64,/9j/4AAQSkZJRgABAQEAWQBZAAD/2wBDAAMCAgMCAgMDAwMEAwMEBQgFBQQEBQoHBwYIDAoMDAsKCwsNDhIQDQ4RDgsLEBYQERMUFRUVDA8XGBYUGBIUFRT/2wBDAQMEBAUEBQkFBQkUDQsNFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBT/wgARCAAHAAgDAREAAhEBAxEB/8QAFAABAAAAAAAAAAAAAAAAAAAAB//EABUBAQEAAAAAAAAAAAAAAAAAAAUG/9oADAMBAAIQAxAAAAFXph//xAAWEAADAAAAAAAAAAAAAAAAAAABFRb/2gAIAQEAAQUCoS8//8QAGBEAAgMAAAAAAAAAAAAAAAAAABEUI0H/2gAIAQMBAT8Bk3PD/8QAGxEAAAcBAAAAAAAAAAAAAAAAAAESFBUxQuH/2gAIAQIBAT8Bjyap1fB//8QAGxAAAQQDAAAAAAAAAAAAAAAAEQABEhQzUWH/2gAIAQEABj8CmXrYxza//8QAGxAAAQQDAAAAAAAAAAAAAAAAAQARIUFRcZH/2gAIAQEAAT8hI7j3Vy86hf/aAAwDAQACAAMAAAAQf//EABoRAAEFAQAAAAAAAAAAAAAAAJEAESFRgaH/2gAIAQMBAT8QeLlnkL//xAAaEQACAgMAAAAAAAAAAAAAAAABIRFhAFHB/9oACAECAQE/EARV18QtS8//xAAYEAEBAAMAAAAAAAAAAAAAAAABEQAh8P/aAAgBAQABPxBQyNdodCLh/9k=" 

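const canvas = document.getElementById("myCanvas");
const ctx = canvas.getContext('2d');
// Images load asynchronously, even from a data URL, so draw only after the load event.
image.addEventListener('load', () => ctx.drawImage(image, 0, 0));
Drawing a Base64 encoded JPG onto a canvas
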
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAAHCAYAAAA1WQxeAAAABmJLR0QA/wD/AP+gvaeTAAAACXBIWXMAAA3XAAAN1wFCKJt4AAAAB3RJTUUH4QoQAiIwiYqPWwAAAHNJREFUCNd1zaENQjEAhOGvgAFbAwnMwhoMUFeBYBEEVXQAGIIFGARXjXkC817SEPjdXfLfBR2ptD0WeNQcGUPPDUPNcTcVs85OWGObSjtNfUilwRxDt/SuOa7SpQmjfcbx6+5eczyEVNoGL79ZznD1n+cHk7cb99sXV8cAAAAASUVORK5CYII=">
Again, a PNG embedded directly into an HTML image tag
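
To make the data URL format concrete, here is a rough sketch that assembles and takes apart such a URL. The function names toDataUrl and parseDataUrl are hypothetical, and the sketch assumes an environment with a global btoa (browsers and recent versions of Node.js). It is not a full parser for the format; it only splits the three parts shown in the form above.

```javascript
/**
 * Hypothetical helper: builds a Base64 data URL from raw bytes.
 * @param {string} mediaType e.g. "image/png"
 * @param {Uint8Array} bytes
 * @return {string}
 */
function toDataUrl(mediaType, bytes) {
   // btoa expects a "binary string" with one character per byte.
   const binary = String.fromCharCode(...bytes);
   return `data:${mediaType};base64,${btoa(binary)}`;
}

/**
 * Hypothetical helper: splits a data URL into media type, base64 flag, and data.
 * @param {string} url
 * @return {{mediaType: string, isBase64: boolean, data: string}|null}
 */
function parseDataUrl(url) {
   if (!url.startsWith('data:')) return null;
   const comma = url.indexOf(','); // the first comma ends the header
   if (comma === -1) return null;
   let mediaType = url.slice('data:'.length, comma);
   const isBase64 = mediaType.endsWith(';base64');
   if (isBase64) mediaType = mediaType.slice(0, -';base64'.length);
   return { mediaType, isBase64, data: url.slice(comma + 1) };
}

// The 8-byte signature that begins every PNG file.
const pngSignature = Uint8Array.of(0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a);
console.log(toDataUrl('image/png', pngSignature));
// data:image/png;base64,iVBORw0KGgo=
```

The output also explains a pattern in the examples above: both PNG data URLs begin with iVBORw0KGgo because that is the Base64 encoding of the PNG file signature.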

Source maps

Another common but less visible usage of Base64 encoding is in source maps. Below is a source map generated by Google’s Closure compiler:

{
    "version":3,
    "file":"",
    "lineCount":1,
    "mappings":"AAUIA,OAAAC,IAAA,CAAYC,CAGFC,IATAC,QAAQ,EAAW,CAE7B,IAAAF,EAAA,CAOsBG,cATO,CAMjBH,GAAZ;",
    "sources":["greet.js"],
    "names":["console","log","_greeting","greeter","Greeter","greeting"]
}

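Here Base64 encoding is used for the mappings field. Semicolons separate the lines of the generated file and commas separate the segments within a line; each segment is a short run of integers encoded as variable-length quantities (VLQ), with every 6-bit VLQ digit written as one character of the Base64 alphabet.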
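
For the curious, here is a minimal sketch of how one segment of the mappings field can be decoded. It assumes the Base64 VLQ layout used by version 3 source maps: in each 6-bit digit the top bit says whether more digits follow, the low 5 bits carry data (least significant digit first), and the lowest bit of the assembled number is the sign. The names VLQ_ALPHABET, decodeVlqSegment, and decodeMappings are made up for this illustration.

```javascript
// Hypothetical names for illustration.
const VLQ_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

/**
 * @param {string} segment One comma-delimited snippet, e.g. "AAUIA"
 * @return {number[]} The signed integers stored in the segment
 */
function decodeVlqSegment(segment) {
   const values = [];
   let value = 0;
   let shift = 0;
   for (const ch of segment) {
      const digit = VLQ_ALPHABET.indexOf(ch);
      if (digit === -1) throw new Error(`Invalid Base64 character: ${ch}`);
      value += (digit & 0b011111) << shift; // low 5 bits carry data
      if (digit & 0b100000) {
         shift += 5; // continuation bit set: the next digit extends this number
      } else {
         const magnitude = value >> 1; // lowest bit is the sign
         values.push(value & 1 ? -magnitude : magnitude);
         value = 0;
         shift = 0;
      }
   }
   return values;
}

/**
 * @param {string} mappings The full "mappings" string
 * @return {number[][][]} One array of decoded segments per generated line
 */
function decodeMappings(mappings) {
   return mappings
      .split(';')
      .map((line) => line.split(',').filter(Boolean).map(decodeVlqSegment));
}

console.log(decodeVlqSegment('AAUIA')); // [ 0, 0, 10, 4, 0 ]
console.log(decodeVlqSegment('CAE7B')); // [ 1, 0, 2, -29 ]
```

In the second segment, "7" has its continuation bit set, so "7B" together encode the single value -29. As I understand the format, the decoded numbers are offsets relative to the previous segment rather than absolute positions.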

Images and source maps are just a couple places Base64 encoding is used. If you know of others or any novel uses of Base64 encoding, please mention them in the comments below. It also might be worth “inspecting” page sources to find others. For example, in Chrome, if you go to chrome://dino you can find that the offline dinosaur game’s image assets (and it appears sound assets) are Base64 encoded. (Examining these assets—which are also embedded on YouTube’s homepage—is how I discovered the dinosaur can duck under the low-flying pterodactyls.)

8 Comments

  1. > You will not find any mention of “HTML” in the Base64 spec.

    Base64 was actually originally created for email, as a method of encoding binary data (like attachments) in an email. This was/is necessary because SMTP requires all content of an email to be ASCII.

  2. Walter Stucco, October 25, 2017 at 2:53 am

    @thayne

    > because SMTP requires all content of an email to be ASCII.

    Base64 was invented to transfer binaries over the wire, it’s 6 bit wide, a subset of ASCII narrow enough to be common to every other encoding

    8bit ASCII is not text safe

  3. Hello Ty,

    I like your encoding visualization and explanation. Thanks for making and sharing it.

    At the end of your article you asked for mentions of any novel uses of Base64, and I think my recently released Octology project should qualify from many angles (with hopefully many more to come). When you have time, please check it out at:

    https://GitHub.Com/pip/Octology

    Examining the screen shots, you may notice that all the command prompts contain my b64 d8 stamps (and d8s follow each b64 mapped entry in tsgr and stamp moves in ckm8 as well). If you dive into my .Hrc file or dox/2du/8.txt, you will discover far more actual and planned uses of b64.

    Base64 and segmenting almost everything by multiples of 8 has been my passion for many years now. The whole project is basically in a prototype preview or maybe an early alpha state at present, but I’m hoping to determine a set of reasonable milestones to reach by August or October 8th of next year (2018), when I may feel ready to declare a new beta-testing, feature-freezing, and bug-fixing status before some more complete and stable official release might become worthy of the https://CPAN.Org (which might require another 3 or 6 bits of time units beyond b8a).

    Thanks again for the useful article and please feel free to let me know whatever you might think about Octology, if you care to.

    Peace. =)

  4. Base64 encoding is a great way to transmit binary data in text-only fields, e.g. embedded in PowerShell scripts for rapid deployment, and those binary objects can be run from memory. Very cool stuff.

  5. Thanks for the article, it was helpful to understand base 64 more. Cool visualization!

  6. Great read. A few years ago I made an online base64 decoding/encoding tool: https://codebeautify.net/base64/decode

  7. Mike Betteridge, October 5, 2020 at 5:28 am

    Whilst at the IBM PPDC in Sindelfingen, Germany, I worked on a PC to Mainframe link. It utilized a 3270 emulator’s APIs to fill in encoded data and send it; the response came back with the same encoding. I had the idea of using a base 64 mechanism for characters that were not allowed in 3270. It was also necessary to convert these escaped characters from ASCII to EBCDIC and back. Data compression was included as well.

    The year was 1985. So far I have not found reference to any earlier usage of this technique. Would love to know if I was the first.

  8. Hi, I found your cool site today. I tried inputting some text and clicking ENCODE but it doesn’t seem to do anything. I do have javascript enabled. I’ve tried Firefox and Microsoft Edge browsers.
