Base64 Explained

 

Base64 is a mechanism for representing and transferring binary data over media that allow only printable characters. It is the most popular of the “base encodings”; the others in use are Base16 and Base32.

Base64 arose from the need to attach binary content, such as images, videos or other arbitrary data, to emails. Since SMTP [RFC 5321] allowed only 7-bit US-ASCII characters within messages, these binary octet streams had to be represented using 7-bit ASCII characters.

Here is what RFC 5321 [Simple Mail Transfer Protocol] states.

“Commands and replies are composed of characters from the ASCII character set [6]. When the transport service provides an 8-bit byte (octet) transmission channel, each 7-bit character is transmitted, right justified, in an octet with the high-order bit cleared to zero. More specifically, the unextended SMTP service provides 7-bit transport only. An originating SMTP client that has not successfully negotiated an appropriate extension with a particular server (see the next paragraph) MUST NOT transmit messages with information in the high-order bit of octets. If such messages are transmitted in violation of this rule, receiving SMTP servers MAY clear the high-order bit or reject the message as invalid.”

This led to the evolution and popularity of Internet Standards like MIME [Multipurpose Internet Mail Extensions]. MIME provided mechanisms for things like writing text using characters from a repertoire that requires a different character encoding and, more importantly, for adding one or more binary attachments to an e-mail.

Since the underlying medium supports only plain ASCII text, MIME defined a set of binary-to-text encodings that capture binary octet streams as printable ASCII characters suitable for media like SMTP. Base64 is one such encoding. MIME also defines, among others, a header named Content-Transfer-Encoding that indicates whether a binary-to-text encoding has been applied to the message content and, if so, which one.

7-bit ASCII defines 128 characters: 95 printable ones (94 if the space is excluded) and 33 non-printable control characters. 64 is the highest power of 2 that can be represented using only printable characters that are common to most character encodings in existence, most importantly ASCII. A hypothetical Base128 encoding would need 128 distinct symbols, more than the printable set provides, and hence would be unsuitable for the use case at hand.
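The arithmetic behind the choice of 64 can be checked with a few lines of Java. This is only an illustrative sketch; the class and method names are made up for this example.

```java
class PrintableAscii {
    // Counts the ASCII graphic characters, 0x21 ('!') through 0x7E ('~'),
    // i.e. the printable characters excluding the space.
    static int countGraphic() {
        int count = 0;
        for (int c = 0; c < 128; c++) {
            if (c > 0x20 && c < 0x7F) {
                count++;
            }
        }
        return count;
    }

    // Largest power of two that does not exceed n.
    static int largestPowerOfTwoAtMost(int n) {
        return Integer.highestOneBit(n);
    }

    public static void main(String[] args) {
        int graphic = countGraphic();
        System.out.println(graphic);                          // 94
        System.out.println(largestPowerOfTwoAtMost(graphic)); // 64
    }
}
```

With 94 candidates available, 128 symbols are out of reach, and 64 is the largest alphabet size that maps onto a whole number of bits.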

The following is the character subset of US-ASCII that is used for Base64.

  1. [A-Z] - 26 characters [values 0 to 25]
  2. [a-z] - 26 characters [values 26 to 51]
  3. [0-9] - 10 characters [values 52 to 61]
  4. [+]   - 1 character [filler character, value 62]
  5. [/]   - 1 character [filler character, value 63]
  6. [=]   - Not part of the 64; used for padding purposes, as explained later.

Since the letters and digits make up only 62 characters in all, Base64 chose “+” and “/” to fill the gap. The following is an excerpt from RFC 4648 illustrating the alphabet for Base64.

Table 1: The Base64 alphabet (RFC 4648)

Value    Encoding
0-25     A-Z
26-51    a-z
52-61    0-9
62       +
63       /
(pad)    =

The Encoding Process

The process to encode the input stream is fairly straightforward.

a) The octet stream is read from left to right.

b) Three 8-bit groups within the input stream are concatenated to form a 24-bit group.

c) This 24-bit group is then treated as four 6-bit groups, each read as a number with leading zeroes. Six bits are used for the simple reason that 6 bits cover exactly the range of the 64 printable characters [0 to 2^6 - 1, i.e. 0 to 63].

d) Each of these 4 groups is then encoded using the above-mentioned chart in Table 1.
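Steps a) to d) can be sketched in Java for a single 24-bit group. This is a minimal illustration rather than a full encoder; the class name and the encodeGroup method are hypothetical, and the alphabet string is simply the RFC 4648 alphabet written out in value order.

```java
class Base64Group {
    // The Base64 alphabet; the position of a character is its 6-bit value.
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Encodes exactly three octets into four Base64 characters.
    static String encodeGroup(byte b0, byte b1, byte b2) {
        // Step b: concatenate three 8-bit values into one 24-bit group.
        int group = ((b0 & 0xFF) << 16) | ((b1 & 0xFF) << 8) | (b2 & 0xFF);
        StringBuilder out = new StringBuilder(4);
        // Steps c and d: take four 6-bit slices, left to right, and look each one up.
        for (int shift = 18; shift >= 0; shift -= 6) {
            int index = (group >> shift) & 0x3F;
            out.append(ALPHABET.charAt(index));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeGroup((byte) 'M', (byte) 'a', (byte) 'n')); // TWFu
    }
}
```

The mask with 0xFF matters because Java bytes are signed; without it, octets above 0x7F would corrupt the 24-bit group.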

The case where the input stream contains fewer than 24 bits is explained after the following example.

Let’s say we wish to encode the string “ORACLE” using the Base-64 alphabet.

Input string             O         R         A         C         L         E
Binary representation    01001111  01010010  01000001  01000011  01001100  01000101

After regrouping into 6-bit groups and mapping each of the eight groups using Table 1 [binary and decimal equivalents are shown]:

6-bit group (binary)     010011  110101  001001  000001  010000  110100  110001  000101
Decimal value            19      53      9       1       16      52      49      5
Base64 character         T       1       J       B       Q       0       x       F

Base64 encoded string : T1JBQ0xF
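The result can be reproduced with the java.util.Base64 class that ships with Java 8 and later. The wrapper class and method below are named only for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class OracleExample {
    // Encodes the ASCII bytes of the given text with the standard Base64 alphabet.
    static String encode(String text) {
        byte[] octets = text.getBytes(StandardCharsets.US_ASCII);
        return Base64.getEncoder().encodeToString(octets);
    }

    public static void main(String[] args) {
        System.out.println(encode("ORACLE")); // T1JBQ0xF
    }
}
```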

In the above scenario, the input character string contains 48 bits, an exact multiple of 24, which allows exact grouping into 6-bit groups. But if there are fewer than 24 bits, or if the number of bits in the input stream is not a multiple of 24, padding is used to make up for the remaining bits.

The methodology for padding is as follows.

Add as many “zero” bits to the right of the 6-bit grouped bit stream as are needed to make the total bit length a multiple of 24. If the padded data contains any 6-bit groups that consist only of padded zeroes, replace each of those groups with the padding character “=”.

This is illustrated in the next example.

In the table below, the input string contains 40 bits. Hence 8 more bits need to be padded on the right to make up for 48, an exact multiple of 24.

Input string             M         E         N         O         N
Binary representation    01001101  01000101  01001110  01001111  01001110

After regrouping into 6-bit groups and mapping each of the eight groups using Table 1 [binary and decimal equivalents are shown]. The last two bits of the seventh group and all six bits of the eighth group are padded zeroes [2 + 6].

6-bit group (binary)     010011  010100  010101  001110  010011  110100  111000  000000
Decimal value            19      20      21      14      19      52      56      (padding)
Base64 character         T       U       V       O       T       0       4       =

Base64 encoded string : TUVOT04=
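The padding rule can be sketched as a small hand-written encoder. This is an illustration under simple assumptions (the whole input is in memory and no line wrapping is applied); the class and method names are hypothetical, and production code would normally use java.util.Base64 instead.

```java
import java.nio.charset.StandardCharsets;

class PaddedBase64 {
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    static String encode(byte[] input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length; i += 3) {
            int remaining = Math.min(3, input.length - i);
            // Build a 24-bit group; octets missing from the last group stay zero.
            int group = 0;
            for (int j = 0; j < 3; j++) {
                group <<= 8;
                if (j < remaining) {
                    group |= input[i + j] & 0xFF;
                }
            }
            // n input octets yield n + 1 data characters; groups made up
            // entirely of padded zeroes are written as '='.
            for (int k = 0; k < 4; k++) {
                if (k <= remaining) {
                    out.append(ALPHABET.charAt((group >> (18 - 6 * k)) & 0x3F));
                } else {
                    out.append('=');
                }
            }
        }
        return out.toString();
    }

    // Convenience overload for ASCII text.
    static String encode(String asciiText) {
        return encode(asciiText.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        System.out.println(encode("Ma")); // TWE=
        System.out.println(encode("M"));  // TQ==
        System.out.println(encode("MENON"));
    }
}
```

A final group of two octets ends in one “=”, and a final group of one octet ends in two.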

As the above two examples show, Base64 encoded data is always larger than the original input: every 3 input octets become 4 output characters, so the encoded size is about 133% of the original, i.e. a third more. When the output is wrapped into 76-character lines as MIME requires, the line breaks bring this to around 137%.
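The size overhead follows directly from the 3-to-4 grouping and can be computed up front. The sketch below uses hypothetical method names, and its MIME figure assumes that every 76-character line, including the last, ends in a two-byte CRLF.

```java
class Base64Size {
    // Padded Base64 length for n input bytes: 4 characters per started group of 3.
    static long encodedLength(long n) {
        return 4 * ((n + 2) / 3);
    }

    // Length when wrapped into lines of at most 76 characters,
    // assuming each line (including the last) is followed by CRLF.
    static long mimeLength(long n) {
        long chars = encodedLength(n);
        long lines = (chars + 75) / 76;
        return chars + 2 * lines;
    }

    public static void main(String[] args) {
        System.out.println(encodedLength(6)); // 8, as for "ORACLE"
        System.out.println(encodedLength(5)); // 8, as for "MENON"
        long n = 3_000_000L;
        System.out.println((double) encodedLength(n) / n); // about 1.33
        System.out.println((double) mimeLength(n) / n);    // about 1.37
    }
}
```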

Other Applications for Base64

Base64 has been used for other purposes as well, in addition to being used as a mechanism for content encoding within MIME.

a) Content obfuscation

For instance, it is used for simple obfuscation of data exchanged between applications. Of course, any Base64 encoded string can be trivially decoded to obtain the original set of bytes. Hence it is no replacement for a good encryption mechanism.
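To see how little protection this offers, note that decoding needs no key or secret of any kind. A minimal sketch using java.util.Base64 (the wrapper class is named only for this example):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class NotEncryption {
    // Anyone holding the encoded text can recover the original bytes.
    static String decode(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded);
        return new String(raw, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(decode("T1JBQ0xF")); // ORACLE
    }
}
```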

b) Binary content handling in Web Services

Base64 can also be used to send messages with binary content to, and receive them from, Web Services. Note that this is not an efficient mechanism for large payloads because of the size increase caused by the Base64 transformation. For such use cases, it is advisable to send the payload as an attachment using SOAP with Attachments or the Message Transmission Optimization Mechanism [MTOM].

Base64 and XML

XML documents can be carriers for binary content as well. Binary data can be base-64 encoded and be specified inline within any XML 1.0 document.

Data within XML entities belong to the Unicode repertoire. The following is the production for a character in an XML 1.0 document.

Char   ::=   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

 

The XML Document, or any entity within the XML document can declare its character encoding as a part of a character encoding declaration. For the entire document, this is usually specified as a part of the XML document declaration.

<?xml version="1.0" encoding="UTF-8"?>

The character encodings supported by an XML processor may vary, but compliance with XML 1.0 requires acceptance of XML documents encoded using the Unicode transformation formats [UTF] UTF-8 and UTF-16.

The XML 1.0 specification states:

“All XML processors MUST accept the UTF-8 and UTF-16 encodings of Unicode.”

UTF-8 and UTF-16 are character encodings, just like US-ASCII or ISO-8859-1. They associate a numeric code with each character in a character repertoire. Whereas ASCII works off a limited 128-character repertoire, Unicode is much more comprehensive, covering all major and most minor written languages of the world. The two transformation formats provide character encodings for all characters in Unicode. When a processing application receives a UTF encoded XML document containing Base64 encoded data, the Base64 decoding must be performed on the character data obtained after the character encoding has been processed, not on the raw bytes of the document. This is because these character encodings determine how the bytes are ordered [endianness] and how many bytes represent a single character. Detailed discussion of Unicode and its associated transformation formats is outside the scope of this document.
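The order of operations can be sketched with the standard JAXP parser. The element name "payload", the sample document and the class itself are assumptions made for this illustration. The parser first turns the UTF-16 bytes into characters, and only then is the element text handed to the Base64 decoder.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class XmlBase64 {
    // A small UTF-16 document; every character occupies two bytes on the wire.
    static byte[] sampleDocument() {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?>"
                   + "<payload>T1JBQ0xF</payload>";
        return xml.getBytes(StandardCharsets.UTF_16);
    }

    // Parses the document, then Base64-decodes the root element's character data.
    static String extractText(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));
            String characters = doc.getDocumentElement().getTextContent().trim();
            byte[] decoded = Base64.getDecoder().decode(characters);
            return new String(decoded, StandardCharsets.US_ASCII);
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse or decode the document", ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(extractText(sampleDocument())); // ORACLE
    }
}
```

Feeding the raw UTF-16 bytes straight to a Base64 decoder would fail, since the interleaved zero bytes and byte order mark are not part of the Base64 alphabet.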

Base64 and XML Schema

The XML Schema datatype library defines a core datatype whose value space contains base64 encoded binary data. It is named “base64Binary”. This helps facilitate description of binary element content.

Note: base64Binary is the datatype used for defining opaque content within your messages in BPEL PM. You would have seen the usage of this datatype while modeling an adapter interaction.

Listing 1: WSDL illustrating opaque content definition for binary data [image: opaque]

Listing 2: Audit trail of writing opaque data out [image: opaque_audit]

For explicitly base64 encoding a document, XML or otherwise, the product provides a Base64Encoder utility. There aren’t any XPath extension functions that enable base64 encoding of documents.

The utility can be used from within a Java embedding activity to achieve the desired results.

<bpelx:exec name="encodeMessage" language="java" version="1.3" id="BxExe0">
<![CDATA[
   try {
     // Base64Encoder is the utility shipped with the product.
     com.collaxa.common.util.Base64Encoder encoder = new com.collaxa.common.util.Base64Encoder();
     // Read the payload as a string, encode it, and write the result back to the same variable.
     String encodedData = "" + encoder.encode("" + getVariableData("payloadVar"));
     setVariableData("payloadVar", encodedData);
   } catch (Throwable ex) {
     // Handle errors here
   }
]]>
</bpelx:exec>

Interestingly, XML Schema does not provide a way to indicate the media type of the binary data. Jonathan Marsh from the WSDL WG has a note on this.

Quoting Jonathan Marsh:

“One aspect of XML-based messages are difficult to fully capture in XML Schema is the meaning of base64-encoded binary data. XML Schema does provide facilities to describe that element content is base64-encoded binary (through the xs:base64Binary simple type), but it does not provide simple and user-accessible facilities to indicate the format of that binary data. The WSDL WG in conjunction with the XMLP WG, developed a W3C WG Note describing schema extensions that allow the media type (or a set of related media types) to be described. Using this facility, a WSDL consumer can determine not only that a specific message should contain base64-encoded binary data, but that that binary data represents a particular media type such as image/jpeg.”

Comments:

Ramkumar - I read your blog entries regularly and found them to be very useful. Great service! I am stuck on a problem now. Hope you can help me. I get SOAP messages from an external application into JMS queues. I am able to dequeue the messages from both BPEL and ESB along with the JMS headers with no problems. But since the message itself is a SOAP message, I am unable to proceed further since the xsd does not match with the payload. However, if I put the SOAP header elements into the xsd, JDeveloper craps out with the message "Unable to parse schema". Can you suggest anything that I can try to get this issue fixed? The client applications cannot change to send non-SOAP messages unfortunately. Thanks for the help in advance, Shanthi

Posted by Shanthi Viswanathan on January 09, 2009 at 08:37 AM PST #

You need to have the schema based off SOAP instead, which may not look like a bright idea. i.e. use SOAP.xsd and edit the body section to have your specific payload to do the trick. I shall see if I have better ideas, but this seems to be a crude way to meet your needs.

Posted by Ramkumar Menon on January 12, 2009 at 02:40 PM PST #

I guess modeling them as opaque seems to be the best choice! Ram

Posted by Ramkumar Menon on January 30, 2009 at 08:01 AM PST #

Hi Ram, liked your blog about base64 encoding. Is there an XSLT or XQuery function to do base64 decoding? Thanks in advance for your help. - Neha

Posted by guest on May 17, 2009 at 02:59 PM PDT #

Neha, You can always wrap up the Base64Encoder functionality into a custom XPath function and use it within XSLT. There are samples shipped along with the BPEL install that illustrate how to write custom xpath functions in xslt. HTH, Ram

Posted by Ramkumar Menon on May 17, 2009 at 03:23 PM PDT #

