Base64 Explained

Ramkumar Menon
Director, Product Strategy

Base64 is a mechanism for representing and transferring binary data over mediums that allow only printable characters. It is the most popular form of “Base Encoding”; the others in common use are Base16 and Base32.

Base64 arose from the need to attach binary content, such as images, videos or arbitrary binary files, to emails. Since SMTP [RFC 5321] allows only 7-bit US-ASCII characters within messages, these binary octet streams had to be represented using 7-bit ASCII characters.

Here is what RFC 5321 [Simple Mail Transfer Protocol] states.

“Commands and replies are composed of characters from the ASCII character set [6]. When the transport service provides an 8-bit byte (octet) transmission channel, each 7-bit character is transmitted, right justified, in an octet with the high-order bit cleared to zero. More specifically, the unextended SMTP service provides 7-bit transport only. An originating SMTP client that has not successfully negotiated an appropriate extension with a particular server (see the next paragraph) MUST NOT transmit messages with information in the high-order bit of octets. If such messages are transmitted in violation of this rule, receiving SMTP servers MAY clear the high-order bit or reject the message as invalid.”

This led to the development and popularity of Internet standards such as MIME [Multipurpose Internet Mail Extensions]. MIME provides mechanisms for writing text in character sets that require an encoding other than US-ASCII and, more importantly, for attaching one or more binary files to an e-mail.

Since the underlying medium supports only plain ASCII text, MIME defines a set of binary-to-text encodings that turn binary octet streams into printable ASCII characters suitable for mediums like SMTP. Base64 is one such encoding. MIME also defines a header named Content-Transfer-Encoding, which indicates whether a binary-to-text encoding has been applied to the message content and, if so, which one; “base64” is one of the permitted values.
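
As a rough illustration of a MIME body part that declares Base64 content, here is a minimal Java sketch built on the JDK's java.util.Base64 MIME encoder (Java 8 or later). The class name, the helper methods and the application/octet-stream content type are assumptions for illustration, not part of any mail library, and a real message would also need the surrounding multipart structure and mail headers.

```java
import java.util.Arrays;
import java.util.Base64;

// Sketch: one MIME body part whose binary content travels as Base64 text.
class MimePartSketch {
    // Builds the part; the Content-Type value is an illustrative assumption.
    static String bodyPart(byte[] content) {
        // The MIME encoder inserts CRLF so that no encoded line exceeds 76 characters.
        String encoded = Base64.getMimeEncoder().encodeToString(content);
        return "Content-Type: application/octet-stream\r\n"
                + "Content-Transfer-Encoding: base64\r\n"
                + "\r\n"
                + encoded + "\r\n";
    }

    // True if every character is 7-bit US-ASCII, as unextended SMTP requires.
    static boolean isSevenBitSafe(String s) {
        return s.chars().allMatch(c -> c < 128);
    }

    // True if no line (split on CRLF) is longer than the given limit.
    static boolean linesAtMost(String s, int limit) {
        return Arrays.stream(s.split("\r\n")).allMatch(line -> line.length() <= limit);
    }

    public static void main(String[] args) {
        // Arbitrary bytes, including ones with the high-order bit set.
        byte[] binary = {(byte) 0xFF, (byte) 0xD8, 0x00, 0x7F, (byte) 0x80};
        String part = bodyPart(binary);
        System.out.print(part);
        System.out.println("7-bit safe: " + isSevenBitSafe(part));
    }
}
```

Because every character of the encoded part is 7-bit ASCII, the part can pass through unextended SMTP unchanged, even though the original bytes use the high-order bit.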

As for the number 64: 7-bit ASCII contains 94 visible printable characters, the space character, and 33 non-printable control characters. 64 is the highest power of 2 that can be represented using only printable characters, specifically characters that are common to most character encodings in existence, most importantly ASCII. A hypothetical Base128 encoding would need 128 distinct symbols, more than the printable set offers, so it would have to use control characters and would be unsuitable for the use case at hand.
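
The counting argument can be checked with a short Java sketch. The class and method names are illustrative; it relies only on Character.isISOControl and Integer.highestOneBit from the JDK.

```java
// Counts 7-bit ASCII code points by category and finds the largest power of two
// that fits within the visible printable characters.
class AsciiCounts {
    // Control characters: 0x00-0x1F and 0x7F.
    static int controlCount() {
        int n = 0;
        for (int c = 0; c < 128; c++) {
            if (Character.isISOControl(c)) n++;
        }
        return n;
    }

    // Visible printable characters: neither a control character nor the space.
    static int visibleCount() {
        int n = 0;
        for (int c = 0; c < 128; c++) {
            if (!Character.isISOControl(c) && c != ' ') n++;
        }
        return n;
    }

    public static void main(String[] args) {
        int visible = visibleCount();
        // highestOneBit returns the largest power of two that is <= its argument.
        System.out.println(visible + " visible, " + controlCount()
                + " control, largest power of two: " + Integer.highestOneBit(visible));
        // prints "94 visible, 33 control, largest power of two: 64"
    }
}
```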

The following is the character subset of US-ASCII that is used for Base64.

  1. [A-Z] – 26 characters [values 0 to 25]
  2. [a-z] – 26 characters [values 26 to 51]
  3. [0-9] – 10 characters [values 52 to 61]
  4. [+] – 1 character [filler character, value 62]
  5. [/] – 1 character [filler character, value 63]
  6. [=] – used for padding purposes, as explained later.

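Since the letters and digits account for only 62 characters in all, Base64 uses “+” and “/” to fill the gap. The following is an excerpt from RFC 4648 showing the Base64 alphabet.

                      Table 1: The Base 64 Alphabet

     Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 A            17 R            34 i            51 z
         1 B            18 S            35 j            52 0
         2 C            19 T            36 k            53 1
         3 D            20 U            37 l            54 2
         4 E            21 V            38 m            55 3
         5 F            22 W            39 n            56 4
         6 G            23 X            40 o            57 5
         7 H            24 Y            41 p            58 6
         8 I            25 Z            42 q            59 7
         9 J            26 a            43 r            60 8
        10 K            27 b            44 s            61 9
        11 L            28 c            45 t            62 +
        12 M            29 d            46 u            63 /
        13 N            30 e            47 v
        14 O            31 f            48 w         (pad) =
        15 P            32 g            49 x
        16 Q            33 h            50 y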
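
A minimal Java sketch of the alphabet as a lookup table follows. The class and method names are hypothetical; the character order follows the RFC 4648 alphabet, so a 6-bit value is simply an index into the string.

```java
// The RFC 4648 Base64 alphabet, indexed by 6-bit value, with lookups in both directions.
class Base64Alphabet {
    static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ"   // values 0-25
          + "abcdefghijklmnopqrstuvwxyz"   // values 26-51
          + "0123456789"                   // values 52-61
          + "+/";                          // values 62 and 63
    static final char PAD = '=';

    // 6-bit value (0..63) -> alphabet character.
    static char toChar(int value) {
        if (value < 0 || value > 63) {
            throw new IllegalArgumentException("not a 6-bit value: " + value);
        }
        return ALPHABET.charAt(value);
    }

    // Alphabet character -> 6-bit value, or -1 for characters outside the alphabet (including '=').
    static int toValue(char c) {
        return ALPHABET.indexOf(c);
    }

    public static void main(String[] args) {
        System.out.println(ALPHABET.length());                // 64
        System.out.println(toChar(19) + " " + toValue('x'));  // T 49
    }
}
```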


The Encoding Process

The process to encode the input stream is fairly straightforward.

a) The octet stream is read from left to right.

b) Three consecutive 8-bit groups (octets) of the input stream are concatenated to form a 24-bit group.

c) This 24-bit group is then treated as four 6-bit groups. Six bits are used because a 6-bit group can hold any value from 0 to 2^6 − 1 = 63, which is exactly one index into the 64-character alphabet.

d) Each of these four groups is then mapped to a character using Table 1.
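
Steps (b) to (d) can be sketched in Java for a single, complete 3-octet group. This is an illustrative helper with hypothetical names, not the product's or the JDK's implementation; it shows the shift-and-mask arithmetic behind the regrouping.

```java
// Encodes exactly three input octets into four Base64 characters.
class Base64Group {
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    static String encodeGroup(byte b0, byte b1, byte b2) {
        // (b) Concatenate three octets into one 24-bit value; & 0xFF undoes sign extension.
        int group = ((b0 & 0xFF) << 16) | ((b1 & 0xFF) << 8) | (b2 & 0xFF);
        StringBuilder out = new StringBuilder(4);
        // (c) Take four 6-bit slices, most significant first.
        for (int shift = 18; shift >= 0; shift -= 6) {
            int index = (group >> shift) & 0x3F; // a value in 0..63
            // (d) Map each 6-bit value to its alphabet character.
            out.append(ALPHABET.charAt(index));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeGroup((byte) 'O', (byte) 'R', (byte) 'A')); // T1JB
    }
}
```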

The case where the input does not contain a multiple of 24 bits (that is, its length is not a multiple of three octets) is explained after the following example.

Let’s say we wish to encode the string “ORACLE” using the Base64 alphabet.

Input string        O          R          A          C          L          E
Binary (8-bit)      01001111   01010010   01000001   01000011   01001100   01000101

After regrouping into 6-bit groups [binary and decimal equivalents are shown]:

6-bit groups        010011  110101  001001  000001  010000  110100  110001  000101
Decimal values      19      53      9       1       16      52      49      5

After mapping the above eight 6-bit groups using Table 1:

Base64 characters   T       1       J       B       Q       0       x       F

Base64 encoded string : T1JBQ0xF
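
The ORACLE walk-through can be reproduced with the JDK's standard encoder, java.util.Base64 (Java 8 or later). The helper that prints the 6-bit groups is a hypothetical illustration and assumes the input length is a multiple of three octets, as it is here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Reproduces the "ORACLE" example: the 6-bit regrouping and the final encoding.
class OracleExample {
    // Concatenates the 8-bit patterns of the input and splits them into 6-bit groups.
    // Assumes the input length is a multiple of 3 octets.
    static String sixBitGroups(byte[] data) {
        StringBuilder bits = new StringBuilder();
        for (byte b : data) {
            // Left-pad each octet's binary form to 8 digits.
            bits.append(String.format("%8s", Integer.toBinaryString(b & 0xFF)).replace(' ', '0'));
        }
        StringBuilder groups = new StringBuilder();
        for (int i = 0; i < bits.length(); i += 6) {
            if (groups.length() > 0) groups.append(' ');
            groups.append(bits, i, i + 6);
        }
        return groups.toString();
    }

    // Encodes ASCII text with the JDK's standard (RFC 4648) Base64 encoder.
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        System.out.println(sixBitGroups("ORACLE".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(encode("ORACLE")); // T1JBQ0xF
    }
}
```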

In the above scenario, the input string contains 48 bits, an exact multiple of 24, so it divides evenly into 24-bit groups and then into 6-bit groups. When the number of bits in the input stream is not a multiple of 24, that is, when the input length is not a multiple of three octets, padding is used to make up the remaining bits.

The methodology for padding is as follows.

Add as many zero bits to the right of the input bit stream as are needed to make the total bit length a multiple of 24, then regroup into 6-bit groups. Any 6-bit group that consists only of padded zeroes (that is, contains no input bits at all) is replaced by the padding character “=”; a group that mixes input bits with padded zeroes is encoded normally.
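
The following from-scratch Java encoder is one way to apply this padding rule. It is a teaching sketch over the RFC 4648 alphabet with hypothetical names, and real code should prefer java.util.Base64. Instead of materializing the padding bits, it treats missing octets as zeros and emits “=” for every 6-bit group that would contain only padding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Teaching sketch of a Base64 encoder with "=" padding (RFC 4648 alphabet).
class SimpleBase64 {
    private static final char[] ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".toCharArray();

    static String encode(byte[] data) {
        StringBuilder out = new StringBuilder(((data.length + 2) / 3) * 4);
        for (int i = 0; i < data.length; i += 3) {
            int available = Math.min(3, data.length - i); // 1, 2 or 3 real octets
            // Build the 24-bit group; missing octets act as zero bits on the right.
            int group = (data[i] & 0xFF) << 16;
            if (available > 1) group |= (data[i + 1] & 0xFF) << 8;
            if (available > 2) group |= data[i + 2] & 0xFF;
            // n real octets (8n bits) spread over n + 1 six-bit groups;
            // the remaining groups contain only padding zeros and become "=".
            for (int k = 0; k < 4; k++) {
                if (k <= available) {
                    out.append(ALPHABET[(group >> (18 - 6 * k)) & 0x3F]);
                } else {
                    out.append('=');
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] menon = "MENON".getBytes(StandardCharsets.US_ASCII);
        System.out.println(encode(menon));                              // hand-rolled
        System.out.println(Base64.getEncoder().encodeToString(menon));  // JDK, for comparison
    }
}
```

Running it on “MENON” produces the same string as the JDK encoder, which is a quick way to sanity-check a hand-written implementation.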

This is illustrated in the next example.

In the table below, the input string “MENON” contains 40 bits. Hence 8 more bits need to be padded on the right to reach 48, an exact multiple of 24.

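Input string        M          E          N          O          N          [padding]
Binary (8-bit)      01001101   01000101   01001110   01001111   01001110   00000000

After regrouping into 6-bit groups [binary and decimal equivalents are shown; the padded zeroes are enclosed in square brackets, 2 in the seventh group and 6 in the eighth]:

6-bit groups        010011  010100  010101  001110  010011  110100  1110[00]  [000000]
Decimal values      19      20      21      14      19      52      56        (padding)

After mapping the above eight 6-bit groups using Table 1, the final group, which holds only padded zeroes, becomes the padding character “=”:

Base64 characters   T       U       V       O       T       0       4         =

Base64 Encoded String : TUVOT04=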
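
The padding arithmetic can be sketched in Java as follows; the method names are illustrative. For an input of n octets it computes how many zero bits are appended and how many “=” characters result, and a round trip through java.util.Base64 confirms that the padded encoding decodes back to the original input.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class PaddingMath {
    // Zero bits appended so that 8n input bits reach a multiple of 24.
    static int paddingBits(int octets) {
        return (24 - (8 * octets) % 24) % 24;
    }

    // Number of "=" characters: 6-bit groups made up entirely of padding bits.
    static int padChars(int octets) {
        return paddingBits(octets) / 6;
    }

    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.US_ASCII));
    }

    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println("padding bits for MENON: " + paddingBits(5)); // 8
        System.out.println("'=' characters: " + padChars(5));          // 1
        System.out.println("round trip: " + decode(encode("MENON")));   // MENON
    }
}
```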

As the two examples show, Base64-encoded data is always larger than the original input: every 3 input octets become 4 output characters, so the encoded form is 4/3 of the original size, roughly a third larger (plus up to two padding characters). When MIME also inserts a line break after every 76 characters, the encoded data is approximately 137% of the original size.
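
A small Java sketch can confirm both figures. encodedLength is a hypothetical helper implementing the 4 × ceil(n/3) rule, and mimeLength measures the JDK's MIME encoder, which inserts a CRLF after every 76 output characters and none at the end.

```java
import java.util.Base64;

class Base64Size {
    // Unwrapped, padded output length: 4 characters for every started 3-octet group.
    static long encodedLength(long octets) {
        return 4 * ((octets + 2) / 3);
    }

    // Measured length with the JDK MIME encoder (76-character lines, CRLF separators).
    static int mimeLength(int octets) {
        return Base64.getMimeEncoder().encode(new byte[octets]).length;
    }

    public static void main(String[] args) {
        int n = 57_000;
        System.out.printf("plain: %.1f%%%n", 100.0 * encodedLength(n) / n); // 133.3%
        System.out.printf("MIME:  %.1f%%%n", 100.0 * mimeLength(n) / n);    // 136.8%
    }
}
```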

Other Applications for Base64

Base64 has been used for other purposes as well, in addition to being used as a mechanism for content encoding within MIME.

a) Content obfuscation

For instance, it is sometimes used for simple obfuscation of data exchanged between applications. Of course, anyone can decode a Base64-encoded string to recover the original bytes, because no key is involved. Hence it is no substitute for proper encryption.
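
To underline the point, decoding needs nothing but the encoded string itself, as this Java sketch using the JDK's java.util.Base64 shows. The method names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Base64 "obfuscation" involves no key, so anyone can reverse it.
class NotEncryption {
    static String obfuscate(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // No secret is needed to undo the transformation.
    static String reveal(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String hidden = obfuscate("secret");
        System.out.println(hidden);          // c2VjcmV0
        System.out.println(reveal(hidden));  // secret
    }
}
```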

b) Binary content handling in Web Services

Base64 can also be used to send binary content to, or receive it from, Web Services. Note that this is not efficient for large payloads because of the size increase caused by the Base64 transformation. For such use cases, it is advisable to send the payload as an attachment using SOAP with Attachments or the Message Transmission Optimization Mechanism [MTOM].

Base64 and XML

XML documents can carry binary content as well. Binary data can be Base64-encoded and embedded inline within any XML 1.0 document.

Character data in XML documents is drawn from the Unicode repertoire. The following is the production for a character in an XML 1.0 document.

Char   ::=   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
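
The Char production explains why binary data cannot simply be pasted into an XML document: bytes such as 0x00 correspond to no legal character. The Java sketch below transcribes the production as a predicate and wraps Base64 text in a hypothetical <payload> element; it is an illustration, not a full well-formedness check.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class XmlCharCheck {
    // Direct transcription of the XML 1.0 Char production.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // True if every code point of the text matches the Char production.
    static boolean isLegalCharData(String text) {
        return text.codePoints().allMatch(XmlCharCheck::isXmlChar);
    }

    // Wraps bytes as Base64 text in a hypothetical <payload> element.
    static String embed(byte[] data) {
        return "<payload>" + Base64.getEncoder().encodeToString(data) + "</payload>";
    }

    public static void main(String[] args) {
        byte[] binary = {0x00, 0x01, 0x02, (byte) 0xFF};
        // Treating each byte as a character (ISO-8859-1 style) yields illegal code points.
        String raw = new String(binary, StandardCharsets.ISO_8859_1);
        System.out.println(isLegalCharData(raw));  // false
        System.out.println(embed(binary));         // <payload>AAEC/w==</payload>
    }
}
```

Since every Base64 character falls in the printable ASCII range, the encoded text is always legal character data, and because the alphabet contains neither “<” nor “&”, it never needs escaping.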

An XML document, or any entity within it, can declare its character encoding in an encoding declaration. For the document as a whole, this is usually specified as part of the XML declaration.

<?xml version="1.0" encoding="UTF-8"?>

The character encodings supported by an XML processor may vary, but conformance to XML 1.0 requires accepting documents encoded in the Unicode Transformation Formats [UTF] UTF-8 and UTF-16.

The XML 1.0 specification states:

“All XML processors MUST accept the UTF-8 and UTF-16 encodings of Unicode.”

UTF-8 and UTF-16 are character encodings, just like US-ASCII or ISO-8859-1. Unicode assigns a numeric code point to each character in its repertoire, and UTF-8 and UTF-16 define how those code points are serialized as bytes. Whereas ASCII works with a limited 128-character repertoire, Unicode is far more comprehensive, covering all major and most minor written languages of the world, and the two transformation formats can encode every Unicode character. When an application receives a UTF-encoded XML document containing Base64-encoded data, Base64 decoding must be applied to the character data obtained after the document's character encoding has been decoded, not to the raw bytes. This is because the character encoding determines how many bytes represent a single character and, for UTF-16, in what order those bytes appear [endianness]. Detailed discussion of Unicode and its transformation formats is outside the scope of this document.
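
The order of operations can be demonstrated with a Java sketch. Here the document encoding is simply passed in; in practice the XML parser determines it and performs the character decoding. The method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class DecodeAfterCharset {
    // Correct order: bytes -> characters (document encoding) -> Base64 decode.
    static byte[] decodeField(byte[] rawBytes, Charset documentEncoding) {
        String characters = new String(rawBytes, documentEncoding);
        return Base64.getDecoder().decode(characters);
    }

    // Wrong order: treats the raw bytes as if they were already Base64 characters.
    static boolean rawDecodeFails(byte[] rawBytes) {
        try {
            Base64.getDecoder().decode(rawBytes);
            return false;
        } catch (IllegalArgumentException e) {
            return true; // e.g. the 0x00 bytes of UTF-16 are not in the Base64 alphabet
        }
    }

    public static void main(String[] args) {
        // "T1JBQ0xF" as it might appear in a UTF-16LE document: 2 bytes per character.
        byte[] utf16le = "T1JBQ0xF".getBytes(StandardCharsets.UTF_16LE);
        byte[] payload = decodeField(utf16le, StandardCharsets.UTF_16LE);
        System.out.println(new String(payload, StandardCharsets.US_ASCII)); // ORACLE
        System.out.println("raw decode fails: " + rawDecodeFails(utf16le)); // true
    }
}
```

With UTF-8 the raw bytes of Base64 text happen to coincide with its ASCII characters, which is why skipping the character-decoding step can appear to work until a UTF-16 document arrives.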

Base64 and XML Schema

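The XML Schema datatype library defines a built-in datatype named “base64Binary” for Base64-encoded binary data: its lexical form is Base64 text, and its value space is the set of finite-length sequences of binary octets that such text decodes to. This makes it possible to describe binary element content.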

Note: base64Binary is the datatype used for defining opaque content within your messages in BPEL PM. You may have seen it used while modeling an adapter interaction.

Listing 1: WSDL illustrating opaque content definition for binary data


Listing 2: Audit Trail of writing opaque data out


To Base64-encode a document explicitly, XML or otherwise, the product provides a Base64Encoder utility. There are no XPath extension functions that Base64-encode documents.

The utility can be used from within a Java embedding activity to achieve the desired results.

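<bpelx:exec name="encodeMessage" language="java" version="1.3" id="BxExe0">
  <![CDATA[
   try {
     // Base64-encode the string form of the payload variable
     com.collaxa.common.util.Base64Encoder encoder = new com.collaxa.common.util.Base64Encoder();
     String encodedData = "" + encoder.encode("" + getVariableData("payloadVar"));
     // Store the result; "encodedVar" is a hypothetical string variable in the process
     setVariableData("encodedVar", encodedData);
   } catch (Throwable ex) {
     // Handle errors here
   }
  ]]>
</bpelx:exec>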



Interestingly, XML Schema does not provide a way to indicate the media type of the binary data. Jonathan Marsh from the WSDL WG has a note on this.

Quoting Jonathan Marsh:

“One aspect of XML-based messages are difficult to fully capture in XML Schema is the meaning of base64-encoded binary data. XML Schema does provide facilities to describe that element content is base64-encoded binary (through the xs:base64Binary simple type), but it does not provide simple and user-accessible facilities to indicate the format of that binary data. The WSDL WG in conjunction with the XMLP WG, developed a W3C WG Note describing schema extensions that allow the media type (or a set of related media types) to be described. Using this facility, a WSDL consumer can determine not only that a specific message should contain base64-encoded binary data, but that that binary data represents a particular media type such as image/jpeg.”
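
For illustration, the W3C note referred to above defines an attribute, xmime:contentType in the namespace http://www.w3.org/2005/05/xmlmime, that an instance document can use to label Base64 content with its media type. The Java sketch below builds and parses such an element; the <photo> element name is hypothetical, the media type is assumed to need no XML escaping, and schema-side declarations (such as xmime:expectedContentTypes) are not shown.

```java
import java.io.StringReader;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class MediaTypedBinary {
    // Namespace used by the W3C note for its xmime attributes.
    static final String XMIME_NS = "http://www.w3.org/2005/05/xmlmime";

    // Builds a hypothetical <photo> element carrying Base64 data plus its media type.
    static String describe(byte[] data, String mediaType) {
        return "<photo xmlns:xmime=\"" + XMIME_NS + "\" xmime:contentType=\"" + mediaType + "\">"
                + Base64.getEncoder().encodeToString(data) + "</photo>";
    }

    // Parses the element and reads back the declared media type.
    static String mediaTypeOf(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getAttributeNS to resolve the prefix
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return root.getAttributeNS(XMIME_NS, "contentType");
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        byte[] jpegStart = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF}; // first bytes of a JPEG file
        String xml = describe(jpegStart, "image/jpeg");
        System.out.println(xml);
        System.out.println(mediaTypeOf(xml)); // image/jpeg
    }
}
```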
