Revision as of 07:26, 4 July 2007 by Draicone (edit summary: →External links: Rm link spam. Please see WP:NOT.); latest revision as of 07:31, 1 January 2025 by Zac67 (edit summary: Reverted 2 edits by Vinaychandranvs: Link spam)
{{Short description|Group of binary-to-text encoding schemes}}
{{Mergefrom|Radix-64|date=January 2007}}
In ], '''Base64''' is a group of ] schemes that transform ] into a sequence of ] characters, limited to a set of 64 unique characters. More specifically, the source binary data is taken 6 bits at a time, then this group of 6 bits is mapped to one of 64 unique characters.
As with all binary-to-text encoding schemes, Base64 is designed to carry data stored in binary formats across channels that only reliably support text content. Base64 is particularly prevalent on the ]<ref>{{cite web |title= Base64 encoding and decoding – Web APIs |url= https://developer.mozilla.org/en-US/docs/Web/API/WindowBase64/Base64_encoding_and_decoding |publisher= MDN Web Docs |archive-url= https://web.archive.org/web/20141111151440/https://developer.mozilla.org/en-US/docs/Web/API/WindowBase64/Base64_encoding_and_decoding |archive-date= 2014-11-11 |url-status=live}}</ref> where one of its uses is the ability to embed ] or other binary assets inside textual assets such as ] and ] files.<ref>{{cite web |title= When to base64 encode images (and when not to) |date= 28 August 2011 |url=https://www.davidbcalhoun.com/2011/when-to-base64-encode-images-and-when-not-to/ |archive-url= https://web.archive.org/web/20230829143759/https://www.davidbcalhoun.com/2011/when-to-base64-encode-images-and-when-not-to/ |archive-date=2023-08-29 |url-status=live}}</ref>
{{Table Numeral Systems}}
'''Base64''' or '''quadrosexagesimal''' is a ] using a ] of ]. It is the largest ] base that can be represented using only printable ] characters. This has led to its use as a transfer encoding for e-mail among other things. All well-known variants known by the name ''Base64'' use the characters A–Z, a–z, and 0–9 in that order for the first 62 digits, but the symbols chosen for the last two digits vary considerably between different systems. Several other encoding methods such as ] and later versions of ] use a different set of 64 characters to represent 6 binary digits, but these are never called by the name Base64.
Base64 is also widely used for sending ] attachments, because ] – in its original form – was designed to transport ] characters only. Encoding an attachment as Base64 before sending, and then decoding it when received, ensures that older SMTP servers will not interfere with the attachment.
Base64 encoding causes an overhead of 33–37% relative to the size of the original binary data (33% by the encoding itself; up to 4% more by the inserted line breaks).
{{TOC limit|3}}
==Design==
The particular set of 64 characters chosen to represent the 64-digit values for the base varies between implementations. The general strategy is to choose 64 characters that are common to most encodings and that are also printable. This combination leaves the data unlikely to be modified in transit through information systems, such as email, that were traditionally not ].<ref name="autogenerated2006">{{cite IETF |title= The Base16, Base32, and Base64 Data Encodings |rfc= 4648 |date=October 2006 |publisher=] |access-date= March 18, 2010}}</ref> For example, ]'s Base64 implementation uses <code>A</code>–<code>Z</code>, <code>a</code>–<code>z</code>, and <code>0</code>–<code>9</code> for the first 62 values. Other variations share this property but differ in the symbols chosen for the last two values; an example is ].
The earliest instances of this type of encoding were created for dial-up communication between systems running the same ] – for example, ] for ] and ] for the ] (later adapted for the ]) – and could therefore make more assumptions about what characters were safe to use. For instance, uuencode uses uppercase letters, digits, and many punctuation characters, but no lowercase.<ref name="rfc 1421">{{cite IETF |title= Privacy Enhancement for Internet Electronic Mail: Part I: Message Encryption and Authentication Procedures |rfc= 1421 |date=February 1993 |publisher=] |access-date= March 18, 2010}}</ref><ref name="rfc 2045">{{cite IETF |title= Multipurpose Internet Mail Extensions: (MIME) Part One: Format of Internet Message Bodies |rfc= 2045 |date=November 1996 |publisher=] |access-date= March 18, 2010}}</ref><ref name="rfc 3548">{{cite IETF |title= The Base16, Base32, and Base64 Data Encodings |rfc= 3548 |date=July 2003 |publisher=] |access-date= March 18, 2010}}</ref><ref name="autogenerated2006"/>
==Base64 table from RFC 4648==
<span id="Base64table">This is the Base64 alphabet defined in .</span> See also {{sectionlink||Variants summary table}}.
{|class="wikitable" style="text-align:center"
|+ Base64 alphabet defined in RFC 4648.
!scope="col"| Index !!scope="col"| Binary !!scope="col"| {{abbr|Char.|Character}}
|rowspan="17"|
!scope="col"| Index !!scope="col"| Binary !!scope="col"| {{abbr|Char.|Character}}
|rowspan="17"|
!scope="col"| Index !!scope="col"| Binary !!scope="col"| {{abbr|Char.|Character}}
|rowspan="17"|
!scope="col"| Index !!scope="col"| Binary !!scope="col"| {{abbr|Char.|Character}}
|-
| 0 || 000000 || <code>A</code> || 16 || 010000 || <code>Q</code> || 32 || 100000 || <code>g</code> || 48 || 110000 || <code>w</code>
|-
| 1 || 000001 || <code>B</code> || 17 || 010001 || <code>R</code> || 33 || 100001 || <code>h</code> || 49 || 110001 || <code>x</code>
|-
| 2 || 000010 || <code>C</code> || 18 || 010010 || <code>S</code> || 34 || 100010 || <code>i</code> || 50 || 110010 || <code>y</code>
|-
| 3 || 000011 || <code>D</code> || 19 || 010011 || <code>T</code> || 35 || 100011 || <code>j</code> || 51 || 110011 || <code>z</code>
|-
| 4 || 000100 || <code>E</code> || 20 || 010100 || <code>U</code> || 36 || 100100 || <code>k</code> || 52 || 110100 || <code>0</code>
|-
| 5 || 000101 || <code>F</code> || 21 || 010101 || <code>V</code> || 37 || 100101 || <code>l</code> || 53 || 110101 || <code>1</code>
|-
| 6 || 000110 || <code>G</code> || 22 || 010110 || <code>W</code> || 38 || 100110 || <code>m</code> || 54 || 110110 || <code>2</code>
|-
| 7 || 000111 || <code>H</code> || 23 || 010111 || <code>X</code> || 39 || 100111 || <code>n</code> || 55 || 110111 || <code>3</code>
|-
| 8 || 001000 || <code>I</code> || 24 || 011000 || <code>Y</code> || 40 || 101000 || <code>o</code> || 56 || 111000 || <code>4</code>
|-
| 9 || 001001 || <code>J</code> || 25 || 011001 || <code>Z</code> || 41 || 101001 || <code>p</code> || 57 || 111001 || <code>5</code>
|-
| 10 || 001010 || <code>K</code> || 26 || 011010 || <code>a</code> || 42 || 101010 || <code>q</code> || 58 || 111010 || <code>6</code>
|-
| 11 || 001011 || <code>L</code> || 27 || 011011 || <code>b</code> || 43 || 101011 || <code>r</code> || 59 || 111011 || <code>7</code>
|-
| 12 || 001100 || <code>M</code> || 28 || 011100 || <code>c</code> || 44 || 101100 || <code>s</code> || 60 || 111100 || <code>8</code>
|-
| 13 || 001101 || <code>N</code> || 29 || 011101 || <code>d</code> || 45 || 101101 || <code>t</code> || 61 || 111101 || <code>9</code>
|-
| 14 || 001110 || <code>O</code> || 30 || 011110 || <code>e</code> || 46 || 101110 || <code>u</code> || 62 || 111110 || <code>+</code>
|-
| 15 || 001111 || <code>P</code> || 31 || 011111 || <code>f</code> || 47 || 101111 || <code>v</code> || 63 || 111111 || <code>/</code>
|-
| colspan="12" | || colspan="2" {{n/a|Padding}} || =
|}
==Base 64 Encoding Schemes==
===Privacy-Enhanced Mail (PEM)===
The first known use of Base 64 encoding for electronic data transfer was the ] (PEM) protocol, proposed by RFC 989 in ]. PEM defines a "printable encoding" scheme that uses Base 64 encoding to transform an arbitrary sequence of ]s to a format that can be expressed in short lines of 7-bit characters, as required by transfer protocols such as ].
The current version of PEM (specified in RFC 1421) uses a 64-character alphabet consisting of upper- and lower-case ] characters (A–Z, a–z), the numerals (0–9), and the "+" and "/" symbols. The "=" symbol is also used as a special suffix code. The original specification, RFC 989, additionally used the "*" symbol to delimit encoded but unencrypted data within the output stream.
To convert data to PEM printable encoding, the first byte is placed in the ] eight bits of a 24-bit buffer, the next in the middle eight, and the third in the ] eight bits. If there are fewer than three bytes left to encode (or in total), the remaining buffer bits will be zero. The buffer is then used, six bits at a time, most significant first, as indices into the string: <code>"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"</code>, and the indicated character is output.
The process is repeated on the remaining data, three octets at a time. If fewer than three octets (24 bits) remain at the end, the input data is right-padded with zero bits to form an integral multiple of six bits.
After encoding the padded data, if two octets remained to be encoded, one "=" character is appended to the output; if one octet remained, two "=" characters are appended. This signals to the decoder that the zero bits added as padding should not be emitted in the reconstructed data. It also guarantees that the encoded output length is a multiple of 4 characters.
PEM requires that all encoded lines consist of exactly 64 printable characters, with the exception of the last line, which may contain fewer printable characters. Lines are delimited by whitespace characters according to local (platform-specific) conventions.
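The PEM procedure described above can be sketched in JavaScript. The sketch covers the 24-bit buffer, the six-bit indices read most significant first, the "=" padding, and the 64-character lines. It is a minimal illustration, not a reference implementation. <code>encodePem</code> is a hypothetical name, and the input is assumed to be an array-like of byte values (0–255). Lines are joined with <code>"\n"</code>, which stands in for the platform-specific line delimiter.

```javascript
// RFC 1421 / RFC 4648 alphabet: index 0..63 -> output character.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: PEM-style printable encoding of an array of bytes.
// lineLength defaults to PEM's 64 characters per line.
function encodePem(bytes, lineLength = 64) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = Math.min(3, bytes.length - i);
    // First byte -> most significant 8 bits, second -> middle 8,
    // third -> least significant 8; missing bytes become zero bits.
    const b0 = bytes[i];
    const b1 = remaining > 1 ? bytes[i + 1] : 0;
    const b2 = remaining > 2 ? bytes[i + 2] : 0;
    const buffer = (b0 << 16) | (b1 << 8) | b2;
    // Read the 24-bit buffer six bits at a time, most significant first.
    const chars = [
      ALPHABET[(buffer >> 18) & 0x3f],
      ALPHABET[(buffer >> 12) & 0x3f],
      ALPHABET[(buffer >> 6) & 0x3f],
      ALPHABET[buffer & 0x3f],
    ];
    // Two octets left -> one "=", one octet left -> two "=".
    if (remaining < 3) chars[3] = "=";
    if (remaining < 2) chars[2] = "=";
    out += chars.join("");
  }
  // Full lines of lineLength characters; only the last may be shorter.
  const lines = [];
  for (let i = 0; i < out.length; i += lineLength) {
    lines.push(out.slice(i, i + lineLength));
  }
  return lines.join("\n");
}
```

For example, <code>encodePem([77, 97, 110])</code> returns <code>"TWFu"</code>.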
===MIME===
The ] (Multipurpose Internet Mail Extensions) specification, defined in RFC 2045, lists "base64" as one of several ] schemes. MIME's base64 encoding is based on that of the RFC 1421 version of PEM: it uses the same 64-character alphabet and encoding mechanism as PEM, and uses the "=" symbol for output padding in the same way.
MIME does not specify a fixed length for base64-encoded lines, but it does specify a maximum length of 76 characters. Additionally, it specifies that any extra-alphabetic characters must be ignored by a compliant decoder, although most implementations use a <code>CR/LF</code> ] pair to delimit encoded lines.
Thus, the actual length of MIME-compliant base64-encoded binary data is usually about 137% of the original data length, though for very short messages the overhead can be a lot higher.
===UTF-7===
], described in RFC 2152, introduced a system called '''Modified Base64'''. This data encoding scheme is used to encode ] as ] characters for use in 7-bit transports such as ]. It is a variant of the base64 encoding used in MIME.
The "Modified Base64" alphabet consists of the MIME base64 alphabet, but does not use the "=" padding character. UTF-7 is intended for use in mail headers (defined in RFC 2047), and the "=" character is reserved in that context as the escape character for "quoted-printable" encoding. Modified Base64 simply omits the padding and ends immediately after the last Base64 digit containing useful bits (leaving 0–4 unused bits in the last Base64 digit).
===OpenPGP===
OpenPGP, described in RFC 2440, describes ] encoding, also known as "ASCII Armor". Radix-64 is identical to the "base64" encoding described by MIME, with the addition of a 24-bit ] checksum. The checksum is calculated on the input data before encoding; the checksum is then encoded with the same base64 algorithm and concatenated to the output data.
===IRCu===
In the ] used by the ] ] and compatible software, a version of Base 64 encoding is used to encode client/server numerics and binary IP addresses. Client and server numerics have fixed sizes that match up with an exact number of base64 digits, so no padding is needed. Binary IP addresses have leading zero bits added to make them fit<!--FIXME: check details of how ipv6 addresses are handled in latest ircu betas-->. The symbol set is slightly different from the MIME alphabet, using instead of +/ to avoid clashes with other parts of the protocol that use + internally as a marker to begin user modes.
===RFC 3548===
RFC 3548 (The Base16, Base32, and Base64 Data Encodings) is an informational (non-normative) memo that attempts to unify the RFC 1421 and RFC 2045 specifications of base64 encodings, alternative-alphabet encodings, and the seldom-used Base 32 and Base 16 encodings.
RFC 3548 forbids implementations from adding non-alphabetic characters unless they are written to a specification that refers to RFC 3548 and specifically requires otherwise; it also declares that decoder implementations must reject data that contains non-alphabetic characters unless they are written to a specification that refers to RFC 3548 and specifically requires otherwise.
==Examples==
The example below uses ] text for simplicity, but this is not a typical use case, as it can already be safely transferred across all systems that can handle Base64. The more typical use is to encode ] (such as an image); the resulting Base64 data will only contain 64 different ASCII characters, all of which can reliably be transferred across systems that may corrupt the raw source bytes.
Here is a well-known ] from ]:
{{Quote box
| align = none
| style = margin:1em 0;
| border = 2px
| fontsize = 800
| quote = Many hands make light work.
}}
When the quote (without trailing whitespace) is encoded into Base64, it is represented as a byte sequence of 8-bit-padded ] characters encoded in ]'s Base64 scheme as follows (newlines and white spaces may be present anywhere but are to be ignored on decoding):
{{Quote box
| align = none
| style = margin:1em 0;
| border = 2px
| fontsize = 800
| quote={{mono|1=TWFueSBoYW5kcyBtYWtlIGxpZ2h0IHdvcmsu}}
}}
In the above quote, the encoded value of ''Man'' is ''TWFu''. Encoded in ASCII, the characters ''M'', ''a'', and ''n'' are stored as the byte values <code>77</code>, <code>97</code>, and <code>110</code>, which are the 8-bit binary values <code>01001101</code>, <code>01100001</code>, and <code>01101110</code>. These three values are joined together into a 24-bit string, producing <code>010011010110000101101110</code>. Groups of 6 bits (6 bits have a maximum of 2<sup>6</sup> = 64 different binary values) are ] from start to end (in this case, there are four numbers in a 24-bit string), which are then converted into their corresponding Base64 character values.
As this example illustrates, Base64 encoding converts three ] into four encoded characters.
{| class="wikitable" style="text-align:center;"
|+ Encoding of the source string ⟨Man⟩ in Base64
|- style="font-weight:bold;"
! rowspan=2 scope="row" | Source <br/>ASCII text
! scope="row" | Character
| colspan="8" | M
| colspan="8" | a
| colspan="8" | n
|-
! scope="row" | Octets
| colspan="8" | 77 (0x4d)
| colspan="8" | 97 (0x61)
| colspan="8" | 110 (0x6e)
|-
! colspan=2 scope="row" | Bits
| 0 || 1 || 0 || 0 || 1 || 1 || 0 || 1
| 0 || 1 || 1 || 0 || 0 || 0 || 0 || 1
| 0 || 1 || 1 || 0 || 1 || 1 || 1 || 0
|-
! rowspan=3 scope="row" | Base64<br/>encoded
! scope="row" | Sextets
| colspan="6" | 19
| colspan="6" | 22
| colspan="6" | 5
| colspan="6" | 46
|- style="font-weight:bold;"
! scope="row" | Character
| colspan="6" | T
| colspan="6" | W
| colspan="6" | F
| colspan="6" | u
|-
! scope="row" | Octets
| colspan="6" | 84 (0x54)
| colspan="6" | 87 (0x57)
| colspan="6" | 70 (0x46)
| colspan="6" | 117 (0x75)
|}
<code>=</code> padding characters might be added to make the last encoded block contain four Base64 characters.
] to ] transformation is useful to convert between binary and Base64. Such conversion is available for both advanced calculators and programming languages. For example, the hexadecimal representation of the 24 bits above is 4D616E. The octal representation is 23260556. Those 8 octal digits can be split into pairs ({{nowrap|23 26 05 56}}), and each pair is converted to decimal to yield {{nowrap|19 22 05 46}}. Using those four decimal numbers as indices for the Base64 alphabet, the corresponding ASCII characters are ''TWFu''.
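A short JavaScript sketch can check both the bit regrouping and the octal shortcut for ''Man''. Each octal digit carries 3 bits, so each pair of octal digits is exactly one sextet. The sketch assumes exactly three ASCII characters as input. <code>explainGroup</code> and its result fields are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: intermediate values for one 3-octet Base64 group.
function explainGroup(text) {
  const alphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  // Octets of the ASCII input, e.g. "Man" -> [77, 97, 110].
  const octets = Array.from(text, (c) => c.charCodeAt(0));
  if (octets.length !== 3) throw new Error("expected exactly three characters");
  // Join the three octets into a single 24-bit value.
  const bits24 = (octets[0] << 16) | (octets[1] << 8) | octets[2];
  // Four 6-bit groups, most significant first.
  const sextets = [18, 12, 6, 0].map((shift) => (bits24 >> shift) & 0x3f);
  // Octal shortcut: 8 octal digits = 24 bits; each digit pair is one sextet.
  const octal = bits24.toString(8).padStart(8, "0");
  const fromOctal = octal.match(/../g).map((pair) => parseInt(pair, 8));
  return {
    hex: bits24.toString(16).toUpperCase().padStart(6, "0"),
    octal,
    sextets,
    fromOctal,
    encoded: sextets.map((s) => alphabet[s]).join(""),
  };
}
```

For ''Man'', both <code>sextets</code> and <code>fromOctal</code> come out as 19, 22, 5, 46, and <code>encoded</code> is <code>"TWFu"</code>.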
If there are only two significant input octets (e.g., 'Ma'), or when the last input group contains only two octets, all 16 bits will be captured in the first three Base64 digits (18 bits); the two ]s of the last content-bearing 6-bit block will turn out to be zero, and discarded on decoding (along with the succeeding <code>=</code> padding character):
{|class="wikitable" style="text-align:center;"
|- style="font-weight:bold;"
! rowspan=2 scope="row" | Source <br/>ASCII text
! scope="row" | Character
| colspan="8" | M
| colspan="8" | a
| colspan="8" rowspan="2" {{n/a|}}
|-
! scope="row" | Octets
| colspan="8" | 77 (0x4d)
| colspan="8" | 97 (0x61)
|-
! colspan=2 scope="row" | Bits
| 0 || 1 || 0 || 0 || 1 || 1
| 0 || 1 || 0 || 1 || 1 || 0
| 0 || 0 || 0 || 1
| style="background-color:lightblue;" | 0
| style="background-color:lightblue;" | 0
| {{n/a|{{fsp}}}}
| {{n/a|{{fsp}}}}
| {{n/a|{{fsp}}}}
| {{n/a|{{fsp}}}}
| {{n/a|{{fsp}}}}
| {{n/a|{{fsp}}}}
|-
! rowspan=3 scope="row" | Base64<br/>encoded
! scope="row" | Sextets
| colspan="6" | 19
| colspan="6" | 22
| colspan="6" | 4
| colspan="6" {{n/a|Padding}}
|- style="font-weight:bold;"
! scope="row" | Character
| colspan="6" | T
| colspan="6" | W
| colspan="6" | E
| colspan="6" | =
|-
! scope="row" | Octets
| colspan="6" | 84 (0x54)
| colspan="6" | 87 (0x57)
| colspan="6" | 69 (0x45)
| colspan="6" | 61 (0x3D)
|}
If there is only one significant input octet (e.g., 'M'), or when the last input group contains only one octet, all 8 bits will be captured in the first two Base64 digits (12 bits); the four ]s of the last content-bearing 6-bit block will turn out to be zero, and discarded on decoding (along with the succeeding two <code>=</code> padding characters):
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold;"
! rowspan=2 scope="row" | Source <br/>ASCII text
! scope="row" | Character
| colspan="8" | M
| colspan="16" rowspan="2" {{n/a|}}
|-
! scope="row" | Octets
| colspan="8" | 77 (0x4d)
|-
! colspan=2 scope="row" | Bits
| 0 || 1 || 0 || 0 || 1 || 1
| 0 || 1
| style="background-color:lightblue;" | 0
| style="background-color:lightblue;" | 0
| style="background-color:lightblue;" | 0
| style="background-color:lightblue;" | 0
| {{n/a|{{fsp}}}}
| {{n/a|{{fsp}}}}
| {{n/a|{{fsp}}}}
| {{n/a|{{fsp}}}}
| {{n/a|{{fsp}}}}
| {{n/a|{{fsp}}}}
| {{n/a|{{fsp}}}}
| {{n/a|{{fsp}}}}
| {{n/a|{{fsp}}}}
| {{n/a|{{fsp}}}}
| {{n/a|{{fsp}}}}
| {{n/a|{{fsp}}}}
|-
! rowspan=3 scope="row" | Base64 <br/>encoded
! scope="row" | Sextets
| colspan="6" | 19
| colspan="6" | 16
| colspan="6" {{n/a|Padding}}
| colspan="6" {{n/a|Padding}}
|- style="font-weight:bold;"
! scope="row" | Character
| colspan="6" | T
| colspan="6" | Q
| colspan="6" | =
| colspan="6" | =
|-
! scope="row" | Octets
| colspan="6" | 84 (0x54)
| colspan="6" | 81 (0x51)
| colspan="6" | 61 (0x3D)
| colspan="6" | 61 (0x3D)
|}
===RFC 4648===
This RFC obsoletes RFC 3548 and focuses on base 64/32/16:
: ''This document describes the commonly used base 64, base 32, and base 16 encoding schemes. It also discusses the use of line-feeds in encoded data, use of padding in encoded data, use of non-alphabet characters in encoded data, use of different encoding alphabets, and canonical encodings.''
== Example ==
A quote from ] ]:
:''Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure.''
is encoded in MIME's base64 scheme as follows:
TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlz
IHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2Yg
dGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGlu
dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRo
ZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=
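The MIME-style output above can be reproduced with a short sketch. The sketch assumes a Node.js environment, because it relies on Node's built-in <code>Buffer</code> for the Base64 step itself. <code>mimeBase64</code> is a hypothetical name. It wraps the output at MIME's 76-character maximum and joins lines with CR/LF, as most implementations do. A compliant decoder ignores those line breaks.

```javascript
// Hypothetical helper: Base64-encode ASCII text with Node.js's Buffer,
// then break the result into lines of at most maxLine characters.
function mimeBase64(text, maxLine = 76) {
  const flat = Buffer.from(text, "ascii").toString("base64");
  const lines = [];
  for (let i = 0; i < flat.length; i += maxLine) {
    lines.push(flat.slice(i, i + maxLine));
  }
  // CR/LF between lines, as most MIME implementations use.
  return lines.join("\r\n");
}
```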
In the above quote the encoded value of ''Man'' is ''TWFu''. Encoded in ], ''M'', ''a'', ''n'' are stored as the bytes <code>77</code>, <code>97</code>, <code>110</code>, which are <code>01001101</code>, <code>01100001</code>, <code>01101110</code> in base 2. These three bytes are joined together in a 24-bit buffer, producing <code>010011010110000101101110</code>. Groups of 6 bits (6 bits have a maximum of 64 different binary values) are converted into 4 numbers (24 = 6 × 4), which are then converted to their corresponding values in Base 64.
<!-- This is the encoding of **THE WHOLE** of the above passage and the ending fits in with both the above encoding and the first line of the following example, verified using
http://www.motobit.com/util/base64-decoder-encoder.asp
In the previous version, the example started with a space, which was not visible and thus quite misleading. -->
{| class="wikitable"
| Text content
| colspan="8" align="center"| '''M'''
| colspan="8" align="center"| '''a'''
| colspan="8" align="center"| '''n'''
|-
| ASCII
| colspan="8" align="center"| 77
| colspan="8" align="center"| 97
| colspan="8" align="center"| 110
|-
| Bit pattern ||0||1||0||0||1||1||0||1||0||1||1||0||0||0||0||1||0||1||1||0||1||1||1||0
|-
| Index
| colspan="6" align="center"| 19
| colspan="6" align="center"| 22
| colspan="6" align="center"| 5
| colspan="6" align="center"| 46
|-
| Base64-Encoded
| colspan="6" align="center"| '''T'''
| colspan="6" align="center"| '''W'''
| colspan="6" align="center"| '''F'''
| colspan="6" align="center"| '''u'''
|}
As this example illustrates, Base 64 encoding converts 3 unencoded bytes (in this case, ASCII characters) into 4 encoded ASCII characters.
The example below illustrates how shortening the input changes the output padding:
Input ends with: ''carnal pleasure.'' Output ends with: c3VyZS4=
Input ends with: ''carnal pleasure'' Output ends with: c3VyZQ==
Input ends with: ''carnal pleasur'' Output ends with: c3Vy
Input ends with: ''carnal pleasu'' Output ends with: c3U=
===Output padding===
Because Base64 is a six-bit encoding, and because the decoded values are divided into 8-bit octets, every four characters of Base64-encoded text (4 sextets = {{times|4|6}} = 24 bits) represents three octets of unencoded text or data (3 octets = {{times|3|8}} = 24 bits). This means that when the length of the unencoded input is not a multiple of three, the encoded output must have padding added so that its length is a multiple of four. The padding character is <code>=</code>, which indicates that no further bits are needed to fully encode the input. (This is different from <code>A</code>, which means that the remaining bits are all zeros.) The example below illustrates how truncating the input of the above quote changes the output padding:
{|class="wikitable"
! scope="col" colspan=2 | Input
! scope="col" colspan=2 | Output
! scope="col" rowspan=2 | Padding
|-
! scope="col" | Text
! scope="col" | Length
! scope="col" | Text
! scope="col" | Length
|-
| ''light {{bg|lightgrey|wor}}{{bg|#cef2e0|k.}}'' || 11
| {{mono|1=bGlnaHQg{{bg|lightgrey|d29y}}{{bg|#cef2e0|2=ay4=}}}} || 16
| 1
|-
| ''light {{bg|lightgrey|wor}}{{bg|#cef2e0|k}}'' || 10
| {{mono|1=bGlnaHQg{{bg|lightgrey|d29y}}{{bg|#cef2e0|2=aw==}}}} || 16
| 2
|-
| ''light {{bg|lightgrey|wor}}'' || 9
| {{mono|1=bGlnaHQg{{bg|lightgrey|d29y}}}} || 12
| 0
|-
| ''light {{bg|lightgrey|wo}}'' || 8
| {{mono|1=bGlnaHQg{{bg|lightgrey|2=d28=}}}} || 12
| 1
|-
| ''light {{bg|lightgrey|w}}'' || 7
| {{mono|1=bGlnaHQg{{bg|lightgrey|2=dw==}}}} || 12
| 2
|}
The padding character is not essential for decoding, since the number of missing bytes can be inferred from the length of the encoded text. In some implementations, the padding character is mandatory, while for others it is not used. An exception in which padding characters are required is when multiple Base64 encoded files have been concatenated.
===Decoding Base64 with padding===
When decoding Base64 text, four characters are typically converted back to three bytes. The only exceptions are when padding characters exist. A single <code>=</code> indicates that the four characters will decode to only two bytes, while <code>==</code> indicates that the four characters will decode to only a single byte. For example:
{| class="wikitable"
! Encoded !! Padding !! Length !! Decoded
|-
| {{mono|1=bGlnaHQg{{bg|lightgrey|2=dw==}}}}
| <code>==</code> || 1
| ''light {{bg|lightgrey|w}}''
|-
| {{mono|1=bGlnaHQg{{bg|lightgrey|2=d28=}}}}
| <code>=</code> || 2
| ''light {{bg|lightgrey|wo}}''
|-
| {{mono|1=bGlnaHQg{{bg|lightgrey|d29y}}}}
| {{CNone|None}} || 3
| ''light {{bg|lightgrey|wor}}''
|}
Another way to interpret the padding character is to consider it as an instruction to discard 2 trailing bits from the bit string each time a <code>=</code> is encountered. For example, when {{mono|1=bGlnaHQg{{bg|lightgrey|2=dw==}}}} is decoded, each character (except the trailing occurrences of <code>=</code>) is converted into its corresponding 6-bit representation. Then 2 trailing bits are discarded for the first <code>=</code> and another 2 trailing bits for the second <code>=</code>. In this instance, we would get 6 bits from the <code>d</code>, and another 6 bits from the <code>w</code> for a bit string of length 12, but since we remove 2 bits for each <code>=</code> (for a total of 4 bits), the <code>dw==</code> ends up producing 8 bits (1 byte) when decoded.
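Decoding with padding can be sketched as follows, assuming the RFC 4648 standard alphabet. Every character other than <code>=</code> contributes six bits, and a byte is emitted whenever eight bits have accumulated. The two or four zero bits left at the end are dropped, which has the same effect as discarding 2 bits per <code>=</code>. <code>decodeBase64</code> is a hypothetical name. The sketch is deliberately lenient: it skips whitespace, also accepts input whose trailing padding was omitted, and does not check that the discarded bits are zero. A strict decoder would need those extra checks.

```javascript
// Hypothetical lenient decoder for the RFC 4648 standard alphabet.
// Returns an array of byte values.
function decodeBase64(text) {
  const alphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  // Inverse lookup: character -> 6-bit value.
  const values = new Map();
  for (let i = 0; i < alphabet.length; i++) values.set(alphabet[i], i);

  // Remove whitespace (e.g. MIME line breaks) and trailing "=" padding.
  const data = text.replace(/\s+/g, "").replace(/=+$/, "");
  const bytes = [];
  let buffer = 0; // pending bits, right-aligned
  let bitCount = 0; // how many pending bits are in buffer
  for (const ch of data) {
    const v = values.get(ch);
    if (v === undefined) throw new Error(`invalid Base64 character: ${ch}`);
    buffer = (buffer << 6) | v;
    bitCount += 6;
    if (bitCount >= 8) {
      bitCount -= 8;
      bytes.push((buffer >> bitCount) & 0xff);
      buffer &= (1 << bitCount) - 1; // keep only the bits not yet emitted
    }
  }
  // Any 2 or 4 bits left over are the zero bits added during encoding;
  // they are simply dropped.
  return bytes;
}
```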
===Decoding Base64 without padding===
Without padding, after normal decoding of four characters to three bytes over and over again, fewer than four encoded characters may remain. In this situation, only two or three characters can remain. A single remaining encoded character is not possible, because a single Base64 character only contains 6 bits, and 8 bits are required to create a byte, so a minimum of two Base64 characters are required: The first character contributes 6 bits, and the second character contributes its first 2 bits. For example:
{| class="wikitable"
! Length !! Encoded !! Length !! Decoded
|-
| 2 || {{mono|1=bGlnaHQg{{bg|lightgrey|dw}}}}
| 1 || ''light {{bg|lightgrey|w}}''
|-
| 3 || {{mono|1=bGlnaHQg{{bg|lightgrey|d28}}}}
| 2 || ''light {{bg|lightgrey|wo}}''
|-
| 4 || {{mono|1=bGlnaHQg{{bg|lightgrey|d29y}}}}
| 3 || ''light {{bg|lightgrey|wor}}''
|}
Decoding without padding is not performed consistently among decoders. In addition, allowing padless decoding by definition allows multiple strings to decode into the same set of bytes, which can be a security risk.<ref>{{cite conference |last1=Chalkias |first1=Konstantinos |last2=Chatzigiannis |first2=Panagiotis |title=Base64 Malleability in Practice |conference=ASIA CCS '22: 2022 ACM on Asia Conference on Computer and Communications Security |date=30 May 2022 |pages=1219–1221 |doi=10.1145/3488932.3527284 |url=https://eprint.iacr.org/2022/361.pdf}}</ref>
==Implementation==
The traditional (MIME) base64 encoding and decoding processes are fairly simple to implement. Here, an example using JavaScript is given, including the line breaks that MIME and similar formats require at particular line lengths. It is worth noting, however, that many base64 functions (e.g. in ]) return base64-encoded strings without the line breaks. The line breaks can easily be inserted after encoding. Often, base64 encoding is wanted only to transfer data safely via ] or to insert it into a database, cases in which line breaks are known to be unnecessary and therefore undesirable. The newline inserting and removing in these functions here can easily be commented out (they are each only one line in the respective functions) if they are not needed.
An array of the base 64 characters is necessary for encoding, such as:
var base64chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'.split("");
And decoding will require the inverse list (swap the indices for the values), such as:
var base64inv = {}; for (var i = 0; i < base64chars.length; i++) { base64inv[base64chars[i]] = i; }
Note that in real implementations, it is better to explicitly list the entire array/hash for each list above; the one-liners here are given to demonstrate the idea as directly as possible, rather than being the ideal in practice.
==Implementations and history==
===Variants summary table===
Implementations may have some constraints on the alphabet used for representing some bit patterns. This notably concerns the last two characters used in the alphabet at positions 62 and 63, and the character used for padding (which may be mandatory in some protocols or removed in others). The table below summarizes these known variants and provides links to the subsections below.
{|class="wikitable" style="text-align:center"
! rowspan=2 | Encoding
! colspan=3 | Encoding characters
! colspan=3 | Separate encoding of lines
! rowspan=2 | Decoding non-encoding characters
|-
! 62nd
! 63rd
! ''pad''
! Separators
! Length
! Checksum
|-
! {{rh}} | {{nowrap|}}: Base64 for ] (deprecated)
| <code>+</code> || <code>/</code> || {{yes|{{code|{{=}}}} mandatory}}
| {{yes|CR+LF}} || {{Yes|64, or lower for the last line}} || {{No}} || {{No}}
|-
! {{rh}} | {{nowrap|}}: Base64 transfer encoding for ]
| <code>+</code> || <code>/</code> || {{yes|{{code|{{=}}}} mandatory}}
| {{yes|CR+LF}} || {{Yes|At most 76}} || {{No}} || {{partial|Discarded}}
|-
! {{rh}} | {{nowrap|}}: Base64 for ]
| <code>+</code> || <code>/</code> || {{No}}
| colspan=3 {{No}} || {{No}}
|-
! {{rh}} | {{nowrap|}}: Base64 encoding for IMAP mailbox names
| <code>+</code> || <code>,</code> || {{No}}
| colspan=3 {{No}} || {{No}}
|-
! {{rh}} | {{nowrap|}}: base64 (standard){{efn|name=common|This variant is intended to provide common features where they are not desired to be specialized by implementations, ensuring robust engineering. This is particularly in light of separate line encodings and restrictions, which have not been considered when previous standards have been co-opted for use elsewhere. Thus, the features indicated here may be overridden.}}
| <code>+</code> || <code>/</code> || {{Optional|{{code|{{=}}}} optional}}
| colspan=3 {{No}} || {{No}}
|-
! {{rh}} | {{nowrap|}}: base64url (URL- and filename-safe standard){{efn|name=common}}
| <code>-</code> || <code>_</code> || {{Optional|{{code|{{=}}}} optional}}
| colspan=3 {{No}} || {{No}}
|-
! {{rh}} | {{nowrap|}}: Radix-64 for ]
| <code>+</code> || <code>/</code> || {{yes|{{code|{{=}}}} mandatory}}
| {{yes|CR+LF}} || {{Yes|At most 76}} || {{Optional|Radix-64 encoded 24-bit ]}} || {{No}}
|-
! {{rh}} | Other variations
| colspan="7" | See {{section link||Applications not compatible with RFC 4648 Base64}}
|}
{{notelist}}
===Privacy-enhanced mail=== | |||
The first known standardized use of the encoding now called MIME Base64 was in the ] (PEM) protocol, proposed by {{IETF RFC|989}} in 1987. PEM defines a "printable encoding" scheme that uses Base64 encoding to transform an arbitrary sequence of ] to a format that can be expressed in short lines of 6-bit characters, as required by transfer protocols such as ].<ref>{{cite IETF |title=Privacy Enhancement for Internet Electronic Mail |rfc=989 |date=February 1987 |publisher=] |access-date=March 18, 2010}}</ref> | |||
The current version of PEM (specified in {{IETF RFC|1421}}) uses a 64-character alphabet consisting of upper- and lower-case ] (<code>A</code>–<code>Z</code>, <code>a</code>–<code>z</code>), the numerals (<code>0</code>–<code>9</code>), and the <code>+</code> and <code>/</code> symbols. The <code>=</code> symbol is also used as a padding suffix.<ref name="rfc 1421"/> The original specification, {{IETF RFC|989}}, additionally used the <code>*</code> symbol to delimit encoded but unencrypted data within the output stream. | |||
The base64 encoding function: | |||
function base64_encode (s) | |||
{ | |||
// the result (encoded) string, the padding string, and the pad count
var r = ""; var p = ""; var c = s.length % 3; | |||
// add a right zero pad to make this string a multiple of 3 characters | |||
if (c > 0) { for (; c < 3; c++) { p += '='; s += "\0"; } } | |||
// increment over the length of the string, three characters at a time | |||
for (c = 0; c < s.length; c += 3) { | |||
// we add newlines after every 76 output characters, according to the MIME specs | |||
if (c > 0 && (c / 3 * 4) % 76 == 0) { r += "\r\n"; } | |||
// these three 8-bit (ASCII) characters become one 24-bit number | |||
var n = (s.charCodeAt(c) << 16) + (s.charCodeAt(c+1) << 8) + s.charCodeAt(c+2); | |||
// this 24-bit number gets separated into four 6-bit numbers | |||
n = [(n >>> 18) & 63, (n >>> 12) & 63, (n >>> 6) & 63, n & 63];
// those four 6-bit numbers are used as indices into the base64 character list
r += base64chars[n[0]] + base64chars[n[1]] + base64chars[n[2]] + base64chars[n[3]];
// add the actual padding string, after removing the zero pad | |||
} return r.substr(0, r.length - p.length) + p; | |||
} | |||
To convert data to PEM printable encoding, the first byte is placed in the ] eight bits of a 24-bit ], the next in the middle eight, and the third in the ] eight bits. If there are fewer than three bytes left to encode (or in total), the remaining buffer bits will be zero. The buffer is then used, six bits at a time, most significant first, as indices into the string: "<code>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/</code>", and the indicated character is output. | |||
The process is repeated on the remaining data, three octets at a time. If fewer than three octets (24 bits) remain to be encoded, the input data is right-padded with zero bits to form an integral multiple of six bits.
The base64 decoding function: | |||
function base64_decode (s) | |||
{ | |||
// remove/ignore any characters that are neither base64 characters nor '=', particularly newlines
s = s.replace(/[^A-Za-z0-9+\/=]/g, "");
// replace any incoming padding with a zero pad (the 'A' character is zero)
var p = (s.charAt(s.length-1) == '=' ? (s.charAt(s.length-2) == '='
? 'AA' : 'A') : ""); var r = ""; s = s.substr(0, s.length - p.length) + p;
// increment over the length of this encoded string, four characters at a time
for (var c = 0; c < s.length; c += 4) {
// each of these four characters represents a 6-bit index in the base64 characters list
// which, when concatenated, will give the 24-bit number for the original 3 characters
var n = (base64inv[s.charAt(c)] << 18) + (base64inv[s.charAt(c+1)] << 12) +
(base64inv[s.charAt(c+2)] << 6) + base64inv[s.charAt(c+3)];
// split the 24-bit number into the original three 8-bit (ASCII) characters | |||
r += String.fromCharCode((n >>> 16) & 255, (n >>> 8) & 255, n & 255); | |||
// remove any zero pad that was added to make this a multiple of 24 bits | |||
} return r.substr(0, r.length - p.length); | |||
} | |||
After encoding the non-padded data, if two octets of the 24-bit buffer are padded-zeros, two <code>=</code> characters are appended to the output; if one octet of the 24-bit buffer is filled with padded-zeros, one <code>=</code> character is appended. This signals the decoder that the zero bits added due to padding should be excluded from the reconstructed data. This also guarantees that the encoded output length is a multiple of 4 bytes. | |||
The above implementation suits a language like JavaScript, which handles concatenation of arbitrary-length strings efficiently. In other languages (e.g. C), it is much more efficient to allocate a string or array of the appropriate size in advance (the output length is easily calculated from the input length at the very beginning) and then set each character by index instead of concatenating.
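The padding rule can also be sketched over raw bytes (a Uint8Array or a plain array of octets) rather than a string. In this minimal sketch, which omits line breaks and uses the hypothetical name encodeBytes, each group of up to three octets is packed into a 24-bit buffer, read out six bits at a time, and = replaces each sextet that carries only zero-fill bits:

```javascript
// Standard RFC 4648 alphabet.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encodes raw bytes as padded Base64 with no line breaks.
function encodeBytes(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = bytes.length - i; // 1, 2, or at least 3 octets left
    // Pack up to three octets into a 24-bit buffer; missing octets are zero.
    const b0 = bytes[i];
    const b1 = remaining > 1 ? bytes[i + 1] : 0;
    const b2 = remaining > 2 ? bytes[i + 2] : 0;
    const buf = (b0 << 16) | (b1 << 8) | b2;
    // Read the buffer six bits at a time, most significant first.
    out += ALPHABET[(buf >>> 18) & 63] + ALPHABET[(buf >>> 12) & 63];
    // Sextets made only of zero-fill bits are replaced by '='.
    out += remaining > 1 ? ALPHABET[(buf >>> 6) & 63] : "=";
    out += remaining > 2 ? ALPHABET[buf & 63] : "=";
  }
  return out;
}
```

For example, encodeBytes([77, 97]) yields TWE= and encodeBytes([77]) yields TQ==.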
PEM requires that all encoded lines consist of exactly 64 printable characters, with the exception of the last line, which may contain fewer printable characters. Lines are delimited by whitespace characters according to local (platform-specific) conventions. | |||
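Splitting an unbroken Base64 string into 64-character PEM lines can be sketched as follows. The helper name wrapBase64 is hypothetical, and the default "\n" separator is an assumption, since PEM leaves the line delimiter to local convention:

```javascript
// Splits a Base64 string into lines of at most `width` characters.
// PEM (RFC 1421) uses 64; the separator is a parameter because PEM
// leaves it to platform-specific convention.
function wrapBase64(b64, width = 64, eol = "\n") {
  const lines = [];
  for (let i = 0; i < b64.length; i += width) {
    lines.push(b64.slice(i, i + width));
  }
  return lines.join(eol);
}
```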
==URL Applications== | |||
Base64 encoding can be helpful when fairly lengthy identifying information is used in an HTTP environment. ], a database persistence framework for ] objects, uses Base64 encoding to encode a relatively large unique id (generally 128-bit ]s) into a string for use as an HTTP parameter in HTTP forms or HTTP GET ]s. Also, many applications need to encode binary data in a way that is convenient for inclusion in URLs, including in hidden web form fields, and Base64 is a convenient encoding to render them in not only a compact way, but in a relatively unreadable one when trying to obscure the nature of data from a casual human observer. | |||
===MIME=== | |||
Using a URL encoder on standard Base64, however, is inconvenient, as it will translate the '+' and '/' characters into special '%XX' hexadecimal sequences ('+' = '%2B' and '/' = '%2F'). When such strings are later used in database storage or passed across heterogeneous systems, those systems may in turn mishandle the '%' character generated by URL encoders (because the '%' character is also used in ANSI SQL as a wildcard).
{{Main|MIME}} | |||
The ] (Multipurpose Internet Mail Extensions) specification lists Base64 as one of two ] schemes (the other being ]).<ref name="rfc 2045"/> MIME's Base64 encoding is based on that of the {{IETF RFC|1421}} version of PEM: it uses the same 64-character alphabet and encoding mechanism as PEM and uses the <code>=</code> symbol for output padding in the same way, as described at {{IETF RFC|2045}}. | |||
For this reason, a '''modified Base64 for URL''' variant exists, where ''no'' padding '=' will be used, and the '+' and '/' characters of standard Base64 are respectively replaced by '*' and '-', so that using URL encoders/decoders is no longer necessary and has no impact on the length of the encoded value, leaving the same encoded form intact for use in relational databases, web forms, and object identifiers in general. | |||
MIME does not specify a fixed length for Base64-encoded lines, but it does specify a maximum line length of 76 characters. Additionally, it specifies that any character outside the standard set of 64 encoding characters (for example, CRLF sequences) must be ignored by a compliant decoder, although most implementations use a CR/LF ] pair to delimit encoded lines.
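A sketch of this MIME decoding rule, assuming the standard alphabet: every character outside the alphabet (line breaks, spaces, and so on) is discarded before the rest is decoded. The name decodeMime is hypothetical, and this lenient approach deliberately accepts input that a strict RFC 4648 decoder would reject. The = padding is discarded as well; leftover bits that do not fill a whole octet are dropped.

```javascript
// Standard alphabet used by MIME (RFC 2045).
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Lenient MIME-style decoding: characters outside the alphabet
// (CR, LF, spaces, and also '=') are discarded before decoding.
function decodeMime(text) {
  const clean = text.replace(/[^A-Za-z0-9+\/]/g, "");
  const bytes = [];
  let buffer = 0; // holds the not-yet-emitted bits in its low end
  let bits = 0; // number of pending bits in the buffer
  for (const ch of clean) {
    buffer = ((buffer << 6) | ALPHABET.indexOf(ch)) & 0xffff;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      bytes.push((buffer >>> bits) & 0xff);
    }
  }
  // Fewer than 8 pending bits at the end are zero fill and are dropped.
  return Uint8Array.from(bytes);
}
```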
Another variant called '''modified Base64 for regexps''' uses '!-' instead of '*-' to replace the standard Base64 '+/', because both '+' and '*' may be reserved for ] (note that '' used in the IRCu variant above would not work in that context). | |||
Thus, the actual length of MIME-compliant Base64-encoded binary data is usually about 137% of the original data length ({{fract|4|3}}×{{fract|78|76}}), though for very short messages the overhead can be much higher due to the overhead of the headers. Very roughly, the final size of Base64-encoded binary data is equal to 1.37 times the original data size + 814 bytes (for headers). The size of the decoded data can be approximated with this formula: | |||
There are other variants that use '_-' or '._' when the Base64 variant string must be used within valid identifiers for programs, or '.-' for use in ] name tokens (''Nmtoken''), or even '_:' for use in more restricted XML identifiers (''Name''). | |||
bytes = (string_length(encoded_string) − 814) / 1.37 | |||
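The two rules of thumb above translate directly into code. This sketch reproduces only the article's rough heuristic: the 814-byte header allowance is an average, not a property of any particular message, and the function names are hypothetical.

```javascript
// Very rough MIME size estimate: 1.37 x original size + 814 bytes of headers.
function estimateMimeSize(originalBytes) {
  return Math.round(originalBytes * 1.37 + 814);
}

// Inverse of the same heuristic: approximate decoded size from encoded size.
function estimateDecodedSize(encodedLength) {
  return Math.round((encodedLength - 814) / 1.37);
}
```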
===UTF-7=== | |||
==Other applications== | |||
{{Main|UTF-7}} | |||
], described first in {{IETF RFC|1642}}, which was later superseded by {{IETF RFC|2152}}, introduced a system called ''modified Base64''. This data encoding scheme is used to encode ] as ] characters for use in 7-bit transports such as ]. It is a variant of the Base64 encoding used in MIME.<ref>{{cite IETF |title=UTF-7 A Mail-Safe Transformation Format of Unicode |rfc=1642 |date=July 1994 |publisher=] |access-date=March 18, 2010}}</ref><ref>{{cite IETF |title=UTF-7 A Mail-Safe Transformation Format of Unicode |rfc=2152 |date=May 1997 |publisher=] |access-date=March 18, 2010}}</ref> | |||
The "Modified Base64" alphabet consists of the MIME Base64 alphabet, but does not use the "<code>=</code>" padding character. UTF-7 is intended for use in mail headers (defined in {{IETF RFC|2047}}), and the "<code>=</code>" character is reserved in that context as the escape character for "quoted-printable" encoding. Modified Base64 simply omits the padding and ends immediately after the last Base64 digit containing useful bits, leaving up to four unused bits in the last Base64 digit (because UTF-7 encodes 16-bit code units, the encoded bits end 0, 2, or 4 bits short of a 6-bit boundary).
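Because the alphabet is unchanged, a run of UTF-7 modified Base64 can be passed to a standard decoder once its padding is restored. A minimal sketch with hypothetical helper names:

```javascript
// Removes trailing '=' padding (padded form -> modified Base64 form).
function stripPadding(b64) {
  return b64.replace(/=+$/, "");
}

// Restores '=' padding so a standard Base64 decoder accepts the input.
function restorePadding(unpadded) {
  // A length of 1 modulo 4 can never come from whole octets.
  if (unpadded.length % 4 === 1) throw new Error("invalid length");
  return unpadded + "=".repeat((4 - (unpadded.length % 4)) % 4);
}
```

For example, RFC 2152 encodes the text "A≢Α." as A+ImIDkQ. The run ImIDkQ carries two 16-bit code units (32 bits) in six characters, so its last character has four unused bits, and restorePadding("ImIDkQ") returns ImIDkQ== for a standard decoder.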
===OpenPGP=== | |||
{{further|Pretty Good Privacy#OpenPGP}} | |||
], described in {{IETF RFC|4880}}, describes '''Radix-64''' encoding, also known as "]". Radix-64 is identical to the "Base64" encoding described by MIME, with the addition of an optional 24-bit ]. The ] is calculated on the input data before encoding; the checksum is then encoded with the same Base64 algorithm and, prefixed by the "<code>=</code>" symbol as the separator, appended to the encoded output data.<ref>{{cite IETF |title=OpenPGP Message Format |rfc=4880 |date=November 2007 |publisher=] |access-date=March 18, 2010}}</ref> | |||
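The checksum computation described in RFC 4880 can be sketched in JavaScript as below. The constants are the CRC-24 initial value and generator polynomial given in that RFC; the function names are hypothetical. Because the CRC is exactly 24 bits, its Base64 form is always four characters with no padding.

```javascript
// CRC-24 for OpenPGP armor (RFC 4880): initial value 0xB704CE,
// generator polynomial 0x1864CFB, most significant bit first.
function crc24(bytes) {
  let crc = 0xb704ce;
  for (const b of bytes) {
    crc ^= b << 16;
    for (let i = 0; i < 8; i++) {
      crc <<= 1;
      if (crc & 0x1000000) crc ^= 0x1864cfb;
    }
  }
  return crc & 0xffffff;
}

// The armor checksum is '=' followed by the Base64 form of the 24-bit CRC.
function armorChecksum(bytes) {
  const alphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  const crc = crc24(bytes);
  return "=" + [18, 12, 6, 0].map((shift) => alphabet[(crc >>> shift) & 63]).join("");
}
```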
===RFC 3548=== | |||
{{IETF RFC|3548}}, entitled ''The Base16, Base32, and Base64 Data Encodings'', is an informational (non-normative) memo that attempts to unify the {{IETF RFC|1421}} and {{IETF RFC|2045}} specifications of Base64 encodings, alternative-alphabet encodings, and the Base32 (which is seldom used) and Base16 encodings. | |||
Unless implementations are written to a specification that refers to {{IETF RFC|3548}} and specifically requires otherwise, RFC 3548 forbids implementations from generating messages containing characters outside the encoding alphabet or without padding, and it also declares that decoder implementations must reject data that contain characters outside the encoding alphabet.<ref name="rfc 3548" /> | |||
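A strict check in the spirit of RFC 3548 and RFC 4648 might look like the following sketch (the name isStrictBase64 is hypothetical). It enforces the alphabet, a length that is a multiple of four, and padding only at the end, but it does not check that the unused trailing bits are zero.

```javascript
// Whole string: groups of four alphabet characters, optionally followed
// by one final group ending in "==" or "=". No other characters allowed.
const STRICT_BASE64 =
  /^(?:[A-Za-z0-9+\/]{4})*(?:[A-Za-z0-9+\/]{2}==|[A-Za-z0-9+\/]{3}=)?$/;

function isStrictBase64(s) {
  return STRICT_BASE64.test(s);
}
```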
===RFC 4648=== | |||
{{IETF RFC|4648}} obsoletes {{IETF RFC|3548}} and focuses on Base64/32/16: | |||
: ''This document describes the commonly used Base64, Base32, and Base16 encoding schemes. It also discusses the use of line feeds in encoded data, the use of padding in encoded data, the use of non-alphabet characters in encoded data, use of different encoding alphabets, and canonical encodings.'' | |||
===URL applications=== | |||
Base64 encoding can be helpful when fairly lengthy identifying information is used in an HTTP environment. For example, a database persistence framework for ] objects might use Base64 encoding to encode a relatively large unique id (generally 128-bit ]s) into a string for use as an HTTP parameter in HTTP forms or HTTP GET ]s. Also, many applications need to encode binary data in a way that is convenient for inclusion in URLs, including in hidden web form fields, and Base64 is a convenient encoding to render them in a compact way. | |||
Using standard Base64 in ] requires encoding of '<code>+</code>', '<code>/</code>' and '<code>=</code>' characters into special ] hexadecimal sequences ('<code>+</code>' becomes '<code>%2B</code>', '<code>/</code>' becomes '<code>%2F</code>' and '<code>=</code>' becomes '<code>%3D</code>'), which makes the string unnecessarily longer. | |||
For this reason, '''modified Base64 for URL''' variants exist (such as '''base64url''' in {{IETF RFC|4648}}), where the '<code>+</code>' and '<code>/</code>' characters of standard Base64 are respectively replaced by '<code>-</code>' and '<code>_</code>', so that using ] is no longer necessary and has no effect on the length of the encoded value, leaving the same encoded form intact for use in relational databases, web forms, and object identifiers in general. A popular site to make use of such is ].<ref>{{cite web |title=Here's Why YouTube Will Practically Never Run Out of Unique Video IDs |url=https://www.mentalfloss.com/article/77598/heres-why-youtube-will-never-run-out-unique-video-ids |website=www.mentalfloss.com |access-date=27 December 2021 |language=en |date=23 March 2016}}</ref> Some variants allow or require omitting the padding '<code>=</code>' signs to avoid them being confused with field separators, or require that any such padding be percent-encoded. Some libraries {{which|date=December 2020}} will encode '<code>=</code>' to '<code>.</code>', potentially exposing applications to relative path attacks when a folder name is encoded from user data.{{Citation needed|date=June 2022}} | |||
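Converting between the two alphabets is a character substitution. The sketch below also drops padding on the way to base64url and restores it on the way back, which is one common convention rather than a requirement; the function names are hypothetical.

```javascript
// Standard Base64 -> base64url: '+' -> '-', '/' -> '_', padding removed.
function toBase64Url(b64) {
  return b64.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// base64url -> standard Base64, restoring '=' padding for standard decoders.
function fromBase64Url(b64url) {
  const b64 = b64url.replace(/-/g, "+").replace(/_/g, "/");
  return b64 + "=".repeat((4 - (b64.length % 4)) % 4);
}
```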
===JavaScript (DOM Web API)===
The <code>atob()</code> and <code>btoa()</code> JavaScript methods, defined in the HTML5 draft specification,<ref>{{cite web|title=7.3. Base64 utility methods|url=https://w3c.github.io/html/webappapis.html#atob|website=HTML 5.2 Editor's Draft|publisher=]|access-date=2 January 2018}} Introduced by , 2021-02-01.</ref> provide Base64 encoding and decoding functionality to web pages. The <code>btoa()</code> method outputs padding characters, but these are optional in the input of the <code>atob()</code> method. | |||
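<code>btoa()</code> accepts only strings whose characters are in the range U+0000 to U+00FF, so arbitrary text is usually converted to UTF-8 bytes first. A minimal sketch, assuming a runtime that provides <code>btoa()</code>, <code>atob()</code>, TextEncoder and TextDecoder (current browsers and Node.js 16 or later); the helper names are hypothetical:

```javascript
// Text -> UTF-8 bytes -> "binary string" (one char per byte) -> Base64.
function utf8ToBase64(text) {
  const bytes = new TextEncoder().encode(text);
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}

// Base64 -> "binary string" -> UTF-8 bytes -> text.
function base64ToUtf8(b64) {
  const binary = atob(b64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}
```

Calling <code>btoa("€")</code> directly throws an exception, whereas <code>utf8ToBase64("€")</code> returns 4oKs, the Base64 form of the UTF-8 bytes E2 82 AC.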
===Other applications=== | |||
] | |||
Base64 can be used in a variety of contexts:
* Base64 can be used to transmit and store text that might otherwise cause ] | |||
* ] and ] both use Base64 to obscure e-mail ] | |||
* Base64 is used to encode character strings in ] files | |||
* Base64 is often used as a quick but insecure shortcut to obscure secrets without incurring the overhead of cryptographic ] | |||
* Base64 is often used to embed binary data in an ] file, using a syntax similar to <code><nowiki><data encoding="base64">…</data></nowiki></code> e.g. ]s in ]'s exported <code>bookmarks.html</code>. | |||
* Spammers use Base64 to evade basic anti-] tools, which often do not decode Base64 and therefore cannot detect keywords in encoded messages. | |||
* Base64 is used to encode binary files such as images within scripts, to avoid depending on external files. | ||
* Base64 can be used to embed ] files in HTML pages.<ref>{{Cite web |title=Encode PDF (Portable Document Format) File (.pdf) to Base64 Data |url=https://base64.online/encoders/encode-pdf-to-base64?utm_campaign=og |access-date=2024-03-21 |website=base64.online |language=en}}</ref> | |||
* The ] can use Base64 to represent file contents. For instance, background images and fonts can be specified in a ] stylesheet file as <code>data:</code> URIs, instead of being supplied in separate files. | |||
* Base64 is also used when communicating with Fiscal Signature/Printing devices (usually, over COM or LPT ports) to minimize the delay when transferring receipt characters for signing. | |||
* Although not part of the official specification for the ] format, some viewers can interpret Base64 when used for embedded elements, such as raster images inside SVG files.<ref>{{cite web|url=http://jsfiddle.net/MxHPq/|title=Edit fiddle |website=jsfiddle.net}}</ref> | |||
* Base64 can be used to store/transmit relatively small amounts of binary data via a computer's text ] functionality, especially in cases where the information doesn't warrant being permanently saved or when information must be quickly sent between a wide variety of different, potentially incompatible programs. An example is the representation of the public keys of ] recipients as Base64 encoded text strings, which can be easily copied and pasted into users' ]. | |||
* Binary data that must be quickly verified by humans as a safety mechanism, such as ] or ], is often represented in Base64 for easy checking, sometimes with additional formattings, such as separating each group of four characters in the representation of a ] key fingerprint with a space. | |||
* ]s which contain binary data will sometimes store it encoded in Base64 rather than simply storing the raw binary data, as there is a stronger guarantee that all QR code readers will accurately decode text, as well as the fact that some devices will more readily save text from a QR code than potentially malicious binary data. | |||
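As an example of the <code>data:</code> URI use listed above, a URI can be assembled from a MIME type and raw bytes. A sketch assuming a runtime with <code>btoa()</code> (a browser or Node.js 16+); the helper name toDataUri is hypothetical:

```javascript
// Builds an RFC 2397 data: URI from a MIME type and raw bytes.
function toDataUri(mimeType, bytes) {
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b); // one char per byte
  return `data:${mimeType};base64,${btoa(binary)}`;
}
```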
=== Applications not compatible with RFC 4648 Base64 === | |||
Some applications use a Base64 alphabet that is significantly different from the alphabets used in the most common Base64 variants (see ] above). | |||
* The ''']''' alphabet includes no lowercase characters, instead using ASCII codes 32 ("<code> </code>" (space)) through 95 ("<code>_</code>"), consecutively. Uuencoding uses the alphabet <span class="nowrap">"<code> !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ^_</code>"</span>. Avoiding all lower-case letters was helpful, because many older printers only printed uppercase. Using consecutive ASCII characters saved computing power, because it was only necessary to add 32, without requiring a lookup table. Its use of most punctuation characters and the space character may limit its usefulness in some applications, such as those that use these characters as syntax.{{citation needed|date=April 2016}} | |||
* ] (HQX), which was used within the ], excludes some visually confusable characters like '<code>7</code>', '<code>O</code>', '<code>g</code>' and '<code>o</code>'. Its alphabet includes additional punctuation characters. It uses the alphabet <span class="nowrap">"<code><nowiki>!"#$%&'()*+,-012345689@ABCDEFGHIJKLMNPQRSTUVXYZ[`abcdefhijklmpqr</nowiki></code>"</span>. | |||
* A ] environment can use non-synchronized continuation bytes as base64: <code>0b10<b>xxxxxx</b></code>. See ]. | |||
* Several other applications use alphabets similar to the common variations, but in a different order: | |||
** Unix stores password hashes computed with ] in the ] using an encoding called <span id="B64">B64</span>. crypt's alphabet puts the punctuation <code>.</code> and <code>/</code> before the alphanumeric characters. crypt uses the alphabet "<code>./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz</code>" without padding. An advantage over RFC 4648 is that sorting encoded ASCII data results in the same order as sorting the plain ASCII data. | |||
** The ''']''' 5.5 standard for genealogical data interchange encodes multimedia files in its text-line hierarchical file format. GEDCOM uses the same alphabet as crypt, which is <span class="nowrap">"<code>./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz</code>"</span>.<ref>{{cite web|url=http://homepages.rootsweb.ancestry.com/~pmcbride/gedcom/55gctoc.htm |title=The GEDCOM Standard Release 5.5 |publisher=Homepages.rootsweb.ancestry.com |access-date=2012-06-21}}</ref> | |||
** ''']''' hashes are designed to be used in the same way as traditional crypt(3) hashes, but bcrypt's alphabet is in a different order than crypt's. bcrypt uses the alphabet <span class="nowrap">"<code>./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789</code>"</span>.<ref>{{cite web|url=https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/libc/crypt/bcrypt.c?rev=1.1&content-type=text/x-cvsweb-markup|title=src/lib/libc/crypt/bcrypt.c r1.1|author-link=Niels Provos|first=Niels|last=Provos|date=1997-02-13|access-date=2018-05-18}}</ref> | |||
** ''']''' uses a mostly-alphanumeric character set similar to crypt, but using <code>+</code> and <code>-</code> rather than <code>.</code> and <code>/</code>. Xxencoding uses the alphabet <span class="nowrap">"<code>+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz</code>"</span>. | |||
** '''6PACK''', used with some ]s, uses an alphabet from 0x00 to 0x3f.<ref>{{cite web|url=http://private.freepage.de/cgi-bin/feets/freepage_ext/41030x030A/rewrite/alexs/xfr/flexnet/6pack_en/6pack.htm|title=6PACK a "real time" PC to TNC protocol|access-date=2013-05-19|archive-date=2012-02-24|archive-url=https://web.archive.org/web/20120224051938/http://private.freepage.de/cgi-bin/feets/freepage_ext/41030x030A/rewrite/alexs/xfr/flexnet/6pack_en/6pack.htm|url-status=dead}}</ref> | |||
** ] supports numeric literals in Base64. Bash uses the alphabet <span class="nowrap">"<code>0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@_</code>"</span>.<ref>{{cite web |title=Shell Arithmetic |url=https://www.gnu.org/software/bash/manual/html_node/Shell-Arithmetic.html |website=Bash Reference Manual |access-date=8 April 2020 |quote=Otherwise, numbers take the form n, where the optional base is a decimal number between 2 and 64 representing the arithmetic base, and n is a number in that base.}}</ref> | |||
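The effect of swapping alphabets can be sketched with an encoder that takes the alphabet as a parameter. This is an illustration only, with hypothetical names: it omits uuencode's line framing (and the common substitution of the grave accent for space), and some crypt(3) schemes also emit bits in a different order. Because the crypt alphabet is in ascending ASCII order, sorting equal-length inputs before or after encoding gives the same order.

```javascript
// crypt(3)-style alphabet, in ascending ASCII order.
const CRYPT_ALPHABET =
  "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
// uuencode-style alphabet: ASCII 32 (space) through 95 ('_').
const UU_ALPHABET = Array.from({ length: 64 }, (_, i) =>
  String.fromCharCode(32 + i)
).join("");

// Encodes bytes with any 64-character alphabet, most significant bits
// first, without padding.
function encodeWithAlphabet(bytes, alphabet) {
  let out = "";
  let buffer = 0;
  let bits = 0;
  for (const b of bytes) {
    buffer = ((buffer << 8) | b) & 0xffff; // keep only the bits still needed
    bits += 8;
    while (bits >= 6) {
      bits -= 6;
      out += alphabet[(buffer >>> bits) & 63];
    }
  }
  // Flush leftover bits, zero-filled on the right.
  if (bits > 0) out += alphabet[(buffer << (6 - bits)) & 63];
  return out;
}
```

With UU_ALPHABET the bytes of "Man" encode as 36%N (each sextet plus 32); with CRYPT_ALPHABET they encode as HK3i.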
==See also==
* ] | |||
* ] | |||
* ] | |||
* ] | |||
* ] | |||
* ] | |||
* ]
* ] (also called Base85) | |||
* ] | |||
* ] | |||
* ] | |||
* ] | |||
* ] for a comparison of various encoding algorithms | |||
* ] | |||
* ]
==References==
{{Reflist|2}} | |||
*RFC 989 and RFC 1421 (Privacy Enhancement for Electronic Internet Mail) | |||
*RFC 2045 (Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies) | |||
{{Data Exchange}} | |||
*RFC 3548 and RFC 4648 (The Base16, Base32, and Base64 Data Encodings) | |||
* tutorial with accompanying lecture slides | |||
* and — Web based Base64 encoding/decoding tools | |||
Latest revision as of 07:31, 1 January 2025
Group of binary-to-text encoding schemes
In computer programming, Base64 is a group of binary-to-text encoding schemes that transforms binary data into a sequence of printable characters, limited to a set of 64 unique characters. More specifically, the source binary data is taken 6 bits at a time, then this group of 6 bits is mapped to one of 64 unique characters.
As with all binary-to-text encoding schemes, Base64 is designed to carry data stored in binary formats across channels that only reliably support text content. Base64 is particularly prevalent on the World Wide Web where one of its uses is the ability to embed image files or other binary assets inside textual assets such as HTML and CSS files.
Base64 is also widely used for sending e-mail attachments, because SMTP, in its original form, was designed to transport 7-bit ASCII characters only. Encoding an attachment as Base64 before sending, and then decoding it when received, ensures that older SMTP servers will not interfere with the attachment.
Base64 encoding causes an overhead of 33–37% relative to the size of the original binary data (33% by the encoding itself; up to 4% more by the inserted line breaks).
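These figures follow from simple arithmetic: every 3 input bytes become 4 characters, and MIME adds a CRLF pair between lines of at most 76 characters. A sketch with hypothetical function names:

```javascript
// Padded Base64 length for n input bytes: 4 characters per started 3-byte group.
function base64Length(n) {
  return 4 * Math.ceil(n / 3);
}

// Length with MIME-style CRLF breaks between 76-character lines
// (no break after the last line).
function mimeLength(n) {
  const chars = base64Length(n);
  const lines = Math.ceil(chars / 76);
  return chars + 2 * Math.max(lines - 1, 0);
}
```

For 57,000 input bytes, mimeLength returns 77,998, about 1.368 times the input and close to 4/3 × 78/76.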
Design
The particular set of 64 characters chosen to represent the 64 digit values of the base varies between implementations. The general strategy is to choose 64 characters that are common to most encodings and that are also printable. This combination leaves the data unlikely to be modified in transit through information systems, such as email, that were traditionally not 8-bit clean. For example, MIME's Base64 implementation uses A–Z, a–z, and 0–9 for the first 62 values. Other variations share this property but differ in the symbols chosen for the last two values; an example is UTF-7.
The earliest instances of this type of encoding were created for dial-up communication between systems running the same OS – for example, uuencode for UNIX and BinHex for the TRS-80 (later adapted for the Macintosh) – and could therefore make more assumptions about what characters were safe to use. For instance, uuencode uses uppercase letters, digits, and many punctuation characters, but no lowercase.
Base64 table from RFC 4648
This is the Base64 alphabet defined in RFC 4648 §4. See also § Variants summary table.
Index | Binary | Char. | Index | Binary | Char. | Index | Binary | Char. | Index | Binary | Char.
---|---|---|---|---|---|---|---|---|---|---|---
0 | 000000 | A | 16 | 010000 | Q | 32 | 100000 | g | 48 | 110000 | w
1 | 000001 | B | 17 | 010001 | R | 33 | 100001 | h | 49 | 110001 | x
2 | 000010 | C | 18 | 010010 | S | 34 | 100010 | i | 50 | 110010 | y
3 | 000011 | D | 19 | 010011 | T | 35 | 100011 | j | 51 | 110011 | z
4 | 000100 | E | 20 | 010100 | U | 36 | 100100 | k | 52 | 110100 | 0
5 | 000101 | F | 21 | 010101 | V | 37 | 100101 | l | 53 | 110101 | 1
6 | 000110 | G | 22 | 010110 | W | 38 | 100110 | m | 54 | 110110 | 2
7 | 000111 | H | 23 | 010111 | X | 39 | 100111 | n | 55 | 110111 | 3
8 | 001000 | I | 24 | 011000 | Y | 40 | 101000 | o | 56 | 111000 | 4
9 | 001001 | J | 25 | 011001 | Z | 41 | 101001 | p | 57 | 111001 | 5
10 | 001010 | K | 26 | 011010 | a | 42 | 101010 | q | 58 | 111010 | 6
11 | 001011 | L | 27 | 011011 | b | 43 | 101011 | r | 59 | 111011 | 7
12 | 001100 | M | 28 | 011100 | c | 44 | 101100 | s | 60 | 111100 | 8
13 | 001101 | N | 29 | 011101 | d | 45 | 101101 | t | 61 | 111101 | 9
14 | 001110 | O | 30 | 011110 | e | 46 | 101110 | u | 62 | 111110 | +
15 | 001111 | P | 31 | 011111 | f | 47 | 101111 | v | 63 | 111111 | /
Padding | | =
Examples
The example below uses ASCII text for simplicity, but this is not a typical use case, as it can already be safely transferred across all systems that can handle Base64. The more typical use is to encode binary data (such as an image); the resulting Base64 data will only contain 64 different ASCII characters, all of which can reliably be transferred across systems that may corrupt the raw source bytes.
Here is a well-known idiom from distributed computing:
Many hands make light work.
When the quote (without trailing whitespace) is encoded into Base64, it is represented as a byte sequence of 8-bit-padded ASCII characters encoded in MIME's Base64 scheme as follows (newlines and white spaces may be present anywhere but are to be ignored on decoding):
TWFueSBoYW5kcyBtYWtlIGxpZ2h0IHdvcmsu
In the above quote, the encoded value of Man is TWFu. Encoded in ASCII, the characters M, a, and n are stored as the byte values 77, 97, and 110, which are the 8-bit binary values 01001101, 01100001, and 01101110. These three values are joined together into a 24-bit string, producing 010011010110000101101110. Groups of 6 bits (6 bits have a maximum of 2⁶ = 64 different binary values) are converted into individual numbers from start to end (in this case, there are four numbers in a 24-bit string), which are then converted into their corresponding Base64 character values.
As this example illustrates, Base64 encoding converts three octets into four encoded characters.
Source ASCII text
Character | M | a | n
---|---|---|---
Octets | 77 (0x4d) | 97 (0x61) | 110 (0x6e)
Bits | 01001101 | 01100001 | 01101110
Base64 encoded
Bits | 010011 | 010110 | 000101 | 101110
---|---|---|---|---
Sextets | 19 | 22 | 5 | 46
Character | T | W | F | u
Octets | 84 (0x54) | 87 (0x57) | 70 (0x46) | 117 (0x75)
Padding characters (=) might be added to make the last encoded block contain four Base64 characters.
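The regrouping of 24 bits into four sextets shown in the table can be written as a short JavaScript sketch (names hypothetical); it reproduces the indices and characters for "Man":

```javascript
// Standard RFC 4648 alphabet.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Joins three octets into one 24-bit number, then splits it into
// four 6-bit values, most significant first.
function toSextets(b0, b1, b2) {
  const n = (b0 << 16) | (b1 << 8) | b2;
  return [(n >>> 18) & 63, (n >>> 12) & 63, (n >>> 6) & 63, n & 63];
}

const sextets = toSextets(77, 97, 110); // 'M', 'a', 'n' -> [19, 22, 5, 46]
const chars = sextets.map((i) => ALPHABET[i]).join("");
console.log(chars); // TWFu
```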
Hexadecimal to octal transformation is useful to convert between binary and Base64. Such conversion is available for both advanced calculators and programming languages. For example, the hexadecimal representation of the 24 bits above is 4D616E. The octal representation is 23260556. Those 8 octal digits can be split into pairs (23 26 05 56), and each pair is converted to decimal to yield 19 22 05 46. Using those four decimal numbers as indices for the Base64 alphabet, the corresponding ASCII characters are TWFu.
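The octal shortcut works because one octal digit is exactly three bits, so each pair of octal digits is one 6-bit index. A sketch for a single 24-bit group given in hexadecimal (the function name is hypothetical):

```javascript
// Converts one 24-bit group (6 hex digits) to its four Base64 indices
// via octal: 8 octal digits, read in pairs, each pair is one sextet.
function sextetsViaOctal(hex24) {
  const octal = parseInt(hex24, 16).toString(8).padStart(8, "0");
  const pairs = octal.match(/../g); // e.g. ["23", "26", "05", "56"]
  return pairs.map((pair) => parseInt(pair, 8));
}

console.log(sextetsViaOctal("4D616E").join(" ")); // 19 22 5 46
```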
If there are only two significant input octets (e.g., 'Ma'), or when the last input group contains only two octets, all 16 bits will be captured in the first three Base64 digits (18 bits); the two least significant bits of the last content-bearing 6-bit block will turn out to be zero, and discarded on decoding (along with the succeeding = padding character):
Source ASCII text
Character | M | a |
---|---|---|---
Octets | 77 (0x4d) | 97 (0x61) |
Bits | 01001101 | 01100001 | 00 (zero fill)
Base64 encoded
Bits | 010011 | 010110 | 000100 |
---|---|---|---|---
Sextets | 19 | 22 | 4 | Padding
Character | T | W | E | =
Octets | 84 (0x54) | 87 (0x57) | 69 (0x45) | 61 (0x3D)
If there is only one significant input octet (e.g., 'M'), or when the last input group contains only one octet, all 8 bits will be captured in the first two Base64 digits (12 bits); the four least significant bits of the last content-bearing 6-bit block will turn out to be zero, and discarded on decoding (along with the succeeding two = padding characters):
Source ASCII text
Character | M |
---|---|---
Octets | 77 (0x4d) |
Bits | 01001101 | 0000 (zero fill)
Base64 encoded
Bits | 010011 | 010000 | |
---|---|---|---|---
Sextets | 19 | 16 | Padding | Padding
Character | T | Q | = | =
Octets | 84 (0x54) | 81 (0x51) | 61 (0x3D) | 61 (0x3D)
Output padding
Because Base64 is a six-bit encoding, and because the decoded values are divided into 8-bit octets, every four characters of Base64-encoded text (4 sextets = 4 × 6 = 24 bits) represents three octets of unencoded text or data (3 octets = 3 × 8 = 24 bits). This means that when the length of the unencoded input is not a multiple of three, the encoded output must have padding added so that its length is a multiple of four. The padding character is `=`, which indicates that no further bits are needed to fully encode the input. (This is different from `A`, which means that the remaining bits are all zeros.) The example below illustrates how truncating the input of the above quote changes the output padding:
Input text | Input length | Output text | Output length | Padding |
---|---|---|---|---|
light work. | 11 | bGlnaHQgd29yay4= | 16 | 1 |
light work | 10 | bGlnaHQgd29yaw== | 16 | 2 |
light wor | 9 | bGlnaHQgd29y | 12 | 0 |
light wo | 8 | bGlnaHQgd28= | 12 | 1 |
light w | 7 | bGlnaHQgdw== | 12 | 2 |
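The lengths and padding counts in the table follow from simple arithmetic. Every started group of three input octets becomes four output characters, and the `=` count is 3 − (n mod 3) unless n is a multiple of three. A small Python check against the standard library's `base64.b64encode` might look like this. The helper names are hypothetical.

```python
import base64


def encoded_length(n: int) -> int:
    """Padded Base64 length: every started group of three octets becomes four characters."""
    return 4 * ((n + 2) // 3)


def padding_count(n: int) -> int:
    """Number of '=' characters: 0 if n is a multiple of 3, otherwise 3 - (n mod 3)."""
    return (3 - n % 3) % 3


for text in ["light work.", "light work", "light wor", "light wo", "light w"]:
    data = text.encode("ascii")
    encoded = base64.b64encode(data).decode("ascii")
    assert len(encoded) == encoded_length(len(data))
    assert encoded.count("=") == padding_count(len(data))
    print(f"{text:12} {len(data):>2}  {encoded:16} {len(encoded):>2}  {padding_count(len(data))}")
```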
The padding character is not essential for decoding, since the number of missing bytes can be inferred from the length of the encoded text. Some implementations require the padding character, while others do not use it. One case in which padding characters are required is when multiple Base64-encoded files have been concatenated: without padding, a decoder cannot tell where one encoded segment ends and the next begins.
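To illustrate the concatenation case, the following Python sketch uses a hypothetical helper, `decode_concatenated`. It decodes a stream of back-to-back padded segments one four-character group at a time, so the `=` characters mark each segment boundary. The second half shows that once padding is dropped, the same pieces run together and decode to different bytes.

```python
import base64


def decode_concatenated(stream: str) -> bytes:
    """Decode back-to-back padded Base64 segments, one four-character group at a time."""
    if len(stream) % 4:
        raise ValueError("padded Base64 is always a multiple of four characters")
    return b"".join(base64.b64decode(stream[i:i + 4]) for i in range(0, len(stream), 4))


padded = base64.b64encode(b"M").decode() + base64.b64encode(b"Man").decode()  # "TQ==TWFu"
print(decode_concatenated(padded))  # b'MMan'

# Without padding the segment boundary is lost: "TQ" + "TWFu" reads as one string.
unpadded = "TQ" + "TWFu"
print(base64.b64decode(unpadded + "=" * (-len(unpadded) % 4)) == b"MMan")  # False
```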
Decoding Base64 with padding
When decoding Base64 text, four characters are typically converted back to three bytes. The only exceptions are when padding characters exist. A single `=` indicates that the four characters will decode to only two bytes, while `==` indicates that the four characters will decode to only a single byte. For example:
Encoded | Padding | Bytes in last group | Decoded |
---|---|---|---|
bGlnaHQgdw== | == | 1 | light w |
bGlnaHQgd28= | = | 2 | light wo |
bGlnaHQgd29y | None | 3 | light wor |
Another way to interpret the padding character is to consider it as an instruction to discard 2 trailing bits from the bit string each time a `=` is encountered. For example, when `bGlnaHQgdw==` is decoded, each character (except the trailing occurrences of `=`) is converted into its corresponding 6-bit representation, and then 2 trailing bits are discarded for the first `=` and another 2 trailing bits for the other `=`. In this instance, we would get 6 bits from the `d` and another 6 bits from the `w`, for a bit string of length 12, but since we remove 2 bits for each `=` (for a total of 4 bits), the `dw==` ends up producing 8 bits (1 byte) when decoded.
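This "discard two bits per `=`" interpretation translates directly into code. The Python sketch below is illustrative only, and `decode_dropping_bits` is a hypothetical name. It assumes a single, well-formed padded string and performs no validation.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def decode_dropping_bits(encoded: str) -> bytes:
    """Decode padded Base64 by dropping two trailing bits for every '=' character."""
    data = encoded.rstrip("=")
    pad_count = len(encoded) - len(data)
    bits = "".join(format(ALPHABET.index(c), "06b") for c in data)  # 6 bits per character
    if pad_count:
        bits = bits[:-2 * pad_count]  # each '=' discards 2 trailing (zero-fill) bits
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


for s in ["bGlnaHQgdw==", "bGlnaHQgd28=", "bGlnaHQgd29y"]:
    print(s, decode_dropping_bits(s))
```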
Decoding Base64 without padding
Without padding, after normal decoding of four characters to three bytes over and over again, fewer than four encoded characters may remain. In this situation, only two or three characters can remain. A single remaining encoded character is not possible, because a single Base64 character only contains 6 bits, and 8 bits are required to create a byte, so a minimum of two Base64 characters are required: The first character contributes 6 bits, and the second character contributes its first 2 bits. For example:
Characters in last group | Encoded | Bytes in last group | Decoded |
---|---|---|---|
2 | bGlnaHQgdw | 1 | light w |
3 | bGlnaHQgd28 | 2 | light wo |
4 | bGlnaHQgd29y | 3 | light wor |
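One common way to accept unpadded input is to restore the padding implied by the length and then call an ordinary decoder. A minimal Python sketch using the standard `base64` module follows. `decode_unpadded` is a hypothetical helper. A length that leaves one character over is rejected, since 6 bits cannot form a byte.

```python
import base64


def decode_unpadded(encoded: str) -> bytes:
    """Decode Base64 whose trailing '=' padding may have been omitted."""
    if len(encoded) % 4 == 1:
        # A single leftover character carries only 6 bits, not a whole byte.
        raise ValueError("invalid length: one leftover character")
    # Restore the padding implied by the length, then use the standard decoder.
    return base64.b64decode(encoded + "=" * (-len(encoded) % 4))


for s in ["bGlnaHQgdw", "bGlnaHQgd28", "bGlnaHQgd29y"]:
    print(s, decode_unpadded(s))
```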
Decoding without padding is not performed consistently among decoders. In addition, allowing padless decoding by definition allows multiple strings to decode into the same set of bytes, which can be a security risk.
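The ambiguity comes from the unused trailing bits. In the sketch below, a hypothetical lenient decoder keeps only whole bytes and ignores the leftover bits. Consequently `bGlnaHQgdw` and `bGlnaHQgdx` decode to the same bytes, because `w` (110000) and `x` (110001) differ only in discarded bits. One defensive check, shown here as an assumption rather than a standard API, is to re-encode the result and require an exact match.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def lenient_decode(encoded: str) -> bytes:
    """Decode unpadded Base64, silently dropping leftover bits without checking they are zero."""
    bits = "".join(format(ALPHABET.index(c), "06b") for c in encoded)
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


def is_canonical(encoded: str) -> bool:
    """True only if re-encoding the decoded bytes reproduces the input exactly."""
    reencoded = base64.b64encode(lenient_decode(encoded)).decode("ascii").rstrip("=")
    return reencoded == encoded


print(lenient_decode("bGlnaHQgdw"), lenient_decode("bGlnaHQgdx"))  # b'light w' b'light w'
print(is_canonical("bGlnaHQgdw"), is_canonical("bGlnaHQgdx"))      # True False
```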
Implementations and history
Variants summary table
Implementations may have some constraints on the alphabet used for representing some bit patterns. This notably concerns the last two characters used in the alphabet at positions 62 and 63, and the character used for padding (which may be mandatory in some protocols or removed in others). The table below summarizes these known variants and provides links to the subsections below.
Encoding | 62nd character | 63rd character | Padding | Line separators | Line length | Line checksum | Decoding non-encoding characters |
---|---|---|---|---|---|---|---|
RFC 1421: Base64 for Privacy-Enhanced Mail (deprecated) | + | / | = mandatory | CR+LF | 64, or lower for the last line | No | No |
RFC 2045: Base64 transfer encoding for MIME | + | / | = mandatory | CR+LF | At most 76 | No | Discarded |
RFC 2152: Base64 for UTF-7 | + | / | No | No | | | No |
RFC 3501: Base64 encoding for IMAP mailbox names | + | , | No | No | | | No |
RFC 4648 §4: base64 (standard)[a] | + | / | = optional | No | | | No |
RFC 4648 §5: base64url (URL- and filename-safe standard) | - | _ | = optional | No | | | No |
RFC 4880: Radix-64 for OpenPGP | + | / | = mandatory | CR+LF | At most 76 | Radix-64 encoded 24-bit CRC | No |
Other variations | See § Applications not compatible with RFC 4648 Base64 | | | | | | |

[a] This variant is intended to provide common features that implementations are not expected to specialize, ensuring robust engineering. This matters particularly because separate line encodings and restrictions were not considered when earlier standards were co-opted for use elsewhere. Thus, the features indicated here may be overridden.
Privacy-enhanced mail
The first known standardized use of the encoding now called MIME Base64 was in the Privacy-enhanced Electronic Mail (PEM) protocol, proposed by RFC 989 in 1987. PEM defines a "printable encoding" scheme that uses Base64 encoding to transform an arbitrary sequence of octets to a format that can be expressed in short lines of 6-bit characters, as required by transfer protocols such as SMTP.
The current version of PEM (specified in RFC 1421) uses a 64-character alphabet consisting of upper- and lower-case Roman letters (A–Z, a–z), the numerals (0–9), and the `+` and `/` symbols. The `=` symbol is also used as a padding suffix. The original specification, RFC 989, additionally used the `*` symbol to delimit encoded but unencrypted data within the output stream.
To convert data to PEM printable encoding, the first byte is placed in the most significant eight bits of a 24-bit buffer, the next in the middle eight, and the third in the least significant eight bits. If there are fewer than three bytes left to encode (or in total), the remaining buffer bits will be zero. The buffer is then used, six bits at a time, most significant first, as indices into the string "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/", and the indicated character is output.
The process is repeated on the remaining data, three octets at a time, until fewer than three octets remain. If one or two octets (8 or 16 bits) remain, the input data is right-padded with zero bits to form an integral multiple of six bits.
After encoding the non-padded data, if two octets of the 24-bit buffer are padded zeros, two `=` characters are appended to the output; if one octet of the 24-bit buffer is filled with padded zeros, one `=` character is appended. This signals to the decoder that the zero bits added due to padding should be excluded from the reconstructed data. This also guarantees that the encoded output length is a multiple of 4 bytes.
PEM requires that all encoded lines consist of exactly 64 printable characters, with the exception of the last line, which may contain fewer printable characters. Lines are delimited by whitespace characters according to local (platform-specific) conventions.
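The buffer procedure and the 64-character line rule can be sketched together in Python. `pem_printable_encode` is a hypothetical name, and this is not a full PEM implementation. It does not handle headers or encryption. It also assumes `\n` as the line delimiter, whereas the specification leaves that to local conventions.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def pem_printable_encode(data: bytes, line_length: int = 64) -> str:
    """Encode via a 24-bit buffer, pad with '=', and wrap into fixed-length lines."""
    chars = []
    for start in range(0, len(data), 3):
        chunk = data[start:start + 3]
        # First byte goes into the most significant eight bits; missing bytes stay zero.
        buffer = int.from_bytes(chunk.ljust(3, b"\x00"), "big")
        # Read the buffer six bits at a time, most significant first.
        # Only len(chunk) + 1 sextets carry input bits.
        for k in range(len(chunk) + 1):
            chars.append(ALPHABET[(buffer >> (18 - 6 * k)) & 0x3F])
        # One '=' for each octet of the buffer that is zero fill.
        chars.append("=" * (3 - len(chunk)))
    encoded = "".join(chars)
    # Every line has exactly line_length characters except possibly the last.
    lines = [encoded[i:i + line_length] for i in range(0, len(encoded), line_length)]
    return "\n".join(lines)


print(pem_printable_encode(b"Man"), pem_printable_encode(b"Ma"), pem_printable_encode(b"M"))
```

For example, 49 zero bytes produce one full line of 64 characters followed by a short last line `AA==`.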
MIME
The MIME (Multipurpose Internet Mail Extensions) specification lists Base64 as one of two binary-to-text encoding schemes (the other being quoted-printable). MIME's Base64 encoding is based on that of the RFC 1421 version of PEM: it uses the same 64-character alphabet and encoding mechanism as PEM and uses the `=` symbol for output padding in the same way, as described in RFC 2045.
MIME does not specify a fixed length for Base64-encoded lines, but it does specify a maximum line length of 76 characters. Additionally, it specifies that any character outside the standard set of 64 encoding characters (for example, CRLF sequences) must be ignored by a compliant decoder, although most implementations use a CR/LF newline pair to delimit encoded lines.
Thus, the actual length of MIME-compliant Base64-encoded binary data is usually about 137% of the original data length (4⁄3 × 78⁄76: four output characters for every three input bytes, and 78 bytes for every 76-character line once its CR+LF is counted), though for very short messages the overhead can be much higher due to the overhead of the headers. Very roughly, the final size of Base64-encoded binary data is equal to 1.37 times the original data size + 814 bytes (for headers). The size of the decoded data can be approximated with this formula:
bytes = (string_length(encoded_string) − 814) / 1.37
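The 4⁄3 × 78⁄76 ratio can be reproduced with integer arithmetic. This is a Python sketch. The helper `mime_body_length` is hypothetical, and it assumes that every encoded line, including the last, ends with CR+LF. It ignores headers entirely, so the 814-byte estimate above is not modeled.

```python
def mime_body_length(n: int, line_length: int = 76) -> int:
    """Size of a MIME Base64 body for n input bytes, excluding headers."""
    chars = 4 * ((n + 2) // 3)                             # padded Base64 characters
    lines = (chars + line_length - 1) // line_length       # number of encoded lines
    return chars + 2 * lines                               # assume CR+LF after every line


for n in (57, 1_000, 1_000_000):
    print(n, mime_body_length(n), round(mime_body_length(n) / n, 4))
```

For large inputs the ratio settles near 1.3684, matching the 137% figure.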
UTF-7
UTF-7, first described in RFC 1642 and later superseded by RFC 2152, introduced a system called modified Base64. This data encoding scheme is used to encode UTF-16 as ASCII characters for use in 7-bit transports such as SMTP. It is a variant of the Base64 encoding used in MIME.
The "Modified Base64" alphabet consists of the MIME Base64 alphabet, but does not use the "=" padding character. UTF-7 is intended for use in mail headers (defined in RFC 2047), and the "=" character is reserved in that context as the escape character for "quoted-printable" encoding. Modified Base64 simply omits the padding and ends immediately after the last Base64 digit containing useful bits. Because the encoded data is a sequence of 16-bit UTF-16 code units, this leaves zero, two, or four unused bits in the last Base64 digit.
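The Base64 part of UTF-7 can be approximated by encoding the UTF-16 code units (big-endian) with ordinary Base64 and stripping the `=` padding. This Python sketch uses a hypothetical helper, `modified_base64`. It deliberately leaves out the rest of UTF-7, including the `+`/`-` shift characters and directly encoded ASCII.

```python
import base64


def modified_base64(text: str) -> str:
    """Base64 of the text's UTF-16BE code units, with the '=' padding removed."""
    utf16 = text.encode("utf-16-be")
    return base64.b64encode(utf16).decode("ascii").rstrip("=")


print(modified_base64("é"))   # AOk   (16 bits -> 3 digits, 2 unused bits)
print(modified_base64("日本"))  # 32 bits -> 6 digits, 4 unused bits
```

Python's built-in `utf-7` codec can serve as a cross-check. It wraps the same digits in shift characters, so `"é"` encodes to a byte string starting with `+AOk`.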
OpenPGP
RFC 4880, which specifies OpenPGP, describes Radix-64 encoding, also known as "ASCII armor". Radix-64 is identical to the "Base64" encoding described by MIME, with the addition of an optional 24-bit CRC. The checksum is calculated on the input data before encoding; the checksum is then encoded with the same Base64 algorithm and, prefixed by the "=" symbol as the separator, appended to the encoded output data.
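A Python sketch of the checksum line follows. It assumes the CRC-24 parameters from the C reference code in RFC 4880 section 6.1, which are an initial value of 0xB704CE and a generator of 0x1864CFB. The function names are hypothetical, and the rest of ASCII armor (header lines, line wrapping) is omitted.

```python
import base64

CRC24_INIT = 0xB704CE
CRC24_POLY = 0x1864CFB


def crc24(data: bytes) -> int:
    """Bitwise CRC-24 over the unencoded input, following RFC 4880 section 6.1."""
    crc = CRC24_INIT
    for byte in data:
        crc ^= byte << 16
        for _ in range(8):
            crc <<= 1
            if crc & 0x1000000:   # bit 24 set: reduce by the generator polynomial
                crc ^= CRC24_POLY
    return crc & 0xFFFFFF


def armor_checksum_line(data: bytes) -> str:
    """'=' followed by the Base64 encoding of the 3-byte big-endian CRC."""
    return "=" + base64.b64encode(crc24(data).to_bytes(3, "big")).decode("ascii")


print(armor_checksum_line(b""))  # =twTO
```

Because the CRC is always three bytes, its Base64 form is always four characters without padding.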
RFC 3548
RFC 3548, entitled The Base16, Base32, and Base64 Data Encodings, is an informational (non-normative) memo that attempts to unify the RFC 1421 and RFC 2045 specifications of Base64 encodings, alternative-alphabet encodings, and the Base32 (which is seldom used) and Base16 encodings.
Unless implementations are written to a specification that refers to RFC 3548 and specifically requires otherwise, RFC 3548 forbids implementations from generating messages containing characters outside the encoding alphabet or without padding, and it also declares that decoder implementations must reject data that contain characters outside the encoding alphabet.
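Python's standard library can express both behaviors. By default, `base64.b64decode` discards characters outside the alphabet. With `validate=True` it raises `binascii.Error` instead, which matches the RFC 3548 requirement for decoders. The wrapper names below are hypothetical. On the encoding side, `base64.b64encode` always emits padding.

```python
import base64
import binascii


def strict_decode(encoded: str) -> bytes:
    """Decode, rejecting any character outside the Base64 alphabet."""
    return base64.b64decode(encoded, validate=True)


def is_strictly_valid(encoded: str) -> bool:
    """True if the input decodes without any non-alphabet characters."""
    try:
        strict_decode(encoded)
    except binascii.Error:
        return False
    return True


print(base64.b64decode("TW Fu"))   # lenient: the space is discarded, giving b'Man'
print(is_strictly_valid("TW Fu"))  # False
```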
RFC 4648
RFC 4648 obsoletes RFC 3548 and focuses on Base64/32/16:
- This document describes the commonly used Base64, Base32, and Base16 encoding schemes. It also discusses the use of line feeds in encoded data, the use of padding in encoded data, the use of non-alphabet characters in encoded data, use of different encoding alphabets, and canonical encodings.
URL applications
Base64 encoding can be helpful when fairly lengthy identifying information is used in an HTTP environment. For example, a database persistence framework for Java objects might use Base64 encoding to encode a relatively large unique id (generally 128-bit UUIDs) into a string for use as an HTTP parameter in HTTP forms or HTTP GET URLs. Also, many applications need to encode binary data in a way that is convenient for inclusion in URLs, including in hidden web form fields, and Base64 is a convenient encoding to render them in a compact way.
Using standard Base64 in a URL requires encoding the '+', '/' and '=' characters into special percent-encoded hexadecimal sequences ('+' becomes '%2B', '/' becomes '%2F' and '=' becomes '%3D'), which makes the string unnecessarily longer.
For this reason, modified Base64 variants for URLs exist (such as base64url in RFC 4648), in which the '+' and '/' characters of standard Base64 are replaced by '-' and '_' respectively. This means URL encoders/decoders are no longer necessary and have no effect on the length of the encoded value, leaving the same encoded form intact for use in relational databases, web forms, and object identifiers in general. YouTube is a popular site that makes use of such an encoding. Some variants allow or require omitting the padding '=' signs to avoid them being confused with field separators, or require that any such padding be percent-encoded. Some libraries will encode '=' to '.', potentially exposing applications to relative path attacks when a folder name is encoded from user data.
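The following Python sketch contrasts the two approaches. It uses the standard library's `base64.urlsafe_b64encode`, `urllib.parse.quote` and `uuid`. The helper names, sample bytes and sample UUID are illustrative assumptions. The unpadded token form is one common convention, not something every base64url user requires.

```python
import base64
import urllib.parse
import uuid


def to_url_token(data: bytes) -> str:
    """base64url (RFC 4648 §5) with the '=' padding stripped."""
    return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")


def from_url_token(token: str) -> bytes:
    """Restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(token + "=" * (-len(token) % 4))


raw = b"\xfb\xff"
standard = base64.b64encode(raw).decode("ascii")        # "+/8="
percent_encoded = urllib.parse.quote(standard, safe="")  # "%2B%2F8%3D"
url_token = to_url_token(raw)                            # "-_8"

# A 128-bit UUID becomes 22 URL-safe characters instead of its 36-character text form.
uid = uuid.UUID("12345678-1234-5678-1234-567812345678")
uid_token = to_url_token(uid.bytes)
print(standard, percent_encoded, url_token, uid_token)
```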
JavaScript (DOM Web API)
The atob() and btoa() JavaScript methods, defined in the HTML5 draft specification, provide Base64 encoding and decoding functionality to web pages. The btoa() method outputs padding characters, but these are optional in the input of the atob() method.
Other applications
Base64 can be used in a variety of contexts:
- Base64 can be used to transmit and store text that might otherwise cause delimiter collision
- Base64 is used to encode character strings in LDAP Data Interchange Format files
- Base64 is often used to embed binary data in an XML file, using a syntax similar to <data encoding="base64">…</data>, e.g. favicons in Firefox's exported bookmarks.html.
- Base64 is used to encode binary files such as images within scripts, to avoid depending on external files.
- Base64 can be used to embed PDF files in HTML pages.
- The data URI scheme can use Base64 to represent file contents. For instance, background images and fonts can be specified in a CSS stylesheet file as data: URIs, instead of being supplied in separate files.
- Although not part of the official specification for the SVG format, some viewers can interpret Base64 when used for embedded elements, such as raster images inside SVG files.
- Base64 can be used to store/transmit relatively small amounts of binary data via a computer's text clipboard functionality, especially in cases where the information doesn't warrant being permanently saved or when information must be quickly sent between a wide variety of different, potentially incompatible programs. An example is the representation of the public keys of cryptocurrency recipients as Base64 encoded text strings, which can be easily copied and pasted into users' wallet software.
- Binary data that must be quickly verified by humans as a safety mechanism, such as file checksums or key fingerprints, is often represented in Base64 for easy checking, sometimes with additional formatting, such as separating each group of four characters in the representation of a PGP key fingerprint with a space.
- QR codes that contain binary data sometimes store it encoded in Base64 rather than as raw binary data. There is a stronger guarantee that all QR code readers will accurately decode text, and some devices will more readily save text from a QR code than potentially malicious binary data.
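The following Python sketch illustrates the data URI use mentioned in the list above. It builds a URI of the RFC 2397 form `data:<media type>;base64,<data>`. The helper name `make_data_uri` and the sample SVG bytes are hypothetical.

```python
import base64


def make_data_uri(data: bytes, media_type: str) -> str:
    """Build a data: URI whose payload is the Base64 encoding of the file contents."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{media_type};base64,{payload}"


svg = b'<svg xmlns="http://www.w3.org/2000/svg"/>'
uri = make_data_uri(svg, "image/svg+xml")
css_rule = f"background-image: url('{uri}');"
print(css_rule)
```

Since standard Base64 output never contains a quote character, the URI can be placed inside a quoted CSS `url()` without further escaping.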
Applications not compatible with RFC 4648 Base64
Some applications use a Base64 alphabet that is significantly different from the alphabets used in the most common Base64 variants (see Variants summary table above).
- The Uuencoding alphabet includes no lowercase characters, instead using ASCII codes 32 (" " (space)) through 95 ("_"), consecutively. Uuencoding uses the alphabet " !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_". Avoiding all lowercase letters was helpful, because many older printers only printed uppercase. Using consecutive ASCII characters saved computing power, because it was only necessary to add 32, without requiring a lookup table. Its use of most punctuation characters and the space character may limit its usefulness in some applications, such as those that use these characters as syntax.
- BinHex 4 (HQX), which was used within the classic Mac OS, excludes some visually confusable characters like '7', 'O', 'g' and 'o'. Its alphabet includes additional punctuation characters. It uses the alphabet "!"#$%&'()*+,-012345689@ABCDEFGHIJKLMNPQRSTUVXYZ[`abcdefhijklmpqr".
- A UTF-8 environment can use non-synchronized continuation bytes as base64: 0b10xxxxxx. See UTF-8 § Self-synchronization.
- Several other applications use alphabets similar to the common variations, but in a different order:
  - Unix stores password hashes computed with crypt in the /etc/passwd file using an encoding called B64. crypt's alphabet puts the punctuation . and / before the alphanumeric characters. crypt uses the alphabet "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" without padding. An advantage over RFC 4648 is that sorting encoded ASCII data results in the same order as sorting the plain ASCII data.
  - The GEDCOM 5.5 standard for genealogical data interchange encodes multimedia files in its text-line hierarchical file format. GEDCOM uses the same alphabet as crypt, which is "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".
  - bcrypt hashes are designed to be used in the same way as traditional crypt(3) hashes, but bcrypt's alphabet is in a different order than crypt's. bcrypt uses the alphabet "./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789".
  - Xxencoding uses a mostly-alphanumeric character set similar to crypt, but using + and - rather than . and /. Xxencoding uses the alphabet "+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".
  - 6PACK, used with some terminal node controllers, uses an alphabet from 0x00 to 0x3f.
  - Bash supports numeric literals in Base64. Bash uses the alphabet "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@_".
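The sorting advantage of crypt's alphabet comes from its characters being in ascending ASCII order. The Python sketch below illustrates the alphabet difference only. It maps standard Base64 output onto crypt's alphabet with `str.translate`, and the helper names are hypothetical. It does not attempt to reproduce the byte grouping or bit ordering of any particular crypt hash format. The sample inputs all have equal length.

```python
import base64

STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
CRYPT = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
TO_CRYPT = str.maketrans(STANDARD, CRYPT)


def encode_with_crypt_alphabet(data: bytes) -> str:
    """Standard big-endian Base64 bit grouping, unpadded, but spelled with crypt's alphabet."""
    return base64.b64encode(data).decode("ascii").rstrip("=").translate(TO_CRYPT)


# Equal-length samples: compare sorting by encoded form with sorting the raw bytes.
samples = [bytes([hi, lo]) for hi in range(0, 256, 17) for lo in range(0, 256, 51)]
crypt_sorted = sorted(samples, key=encode_with_crypt_alphabet)
std_sorted = sorted(samples, key=base64.b64encode)
print(crypt_sorted == sorted(samples))  # True
print(std_sorted == sorted(samples))    # False: '/' and '+' sort before 'A' in ASCII
```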
See also
- 8BITMIME
- Ascii85 (also called Base85)
- Base16
- Base32
- Base36
- Base62
- Binary-to-text encoding for a comparison of various encoding algorithms
- Binary number
- URL
References
- "Base64 encoding and decoding – Web APIs". MDN Web Docs. Archived from the original on 2014-11-11.
- "When to base64 encode images (and when not to)". 28 August 2011. Archived from the original on 2023-08-29.
- The Base16, Base32, and Base64 Data Encodings. IETF. October 2006. doi:10.17487/RFC4648. RFC 4648. Retrieved March 18, 2010.
- Privacy Enhancement for Internet Electronic Mail: Part I: Message Encryption and Authentication Procedures. IETF. February 1993. doi:10.17487/RFC1421. RFC 1421. Retrieved March 18, 2010.
- Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies. IETF. November 1996. doi:10.17487/RFC2045. RFC 2045. Retrieved March 18, 2010.
- The Base16, Base32, and Base64 Data Encodings. IETF. July 2003. doi:10.17487/RFC3548. RFC 3548. Retrieved March 18, 2010.
- Chalkias, Konstantinos; Chatzigiannis, Panagiotis (30 May 2022). Base64 Malleability in Practice (PDF). ASIA CCS '22: 2022 ACM on Asia Conference on Computer and Communications Security. pp. 1219–1221. doi:10.1145/3488932.3527284.
- Privacy Enhancement for Internet Electronic Mail. IETF. February 1987. doi:10.17487/RFC0989. RFC 989. Retrieved March 18, 2010.
- UTF-7 A Mail-Safe Transformation Format of Unicode. IETF. July 1994. doi:10.17487/RFC1642. RFC 1642. Retrieved March 18, 2010.
- UTF-7 A Mail-Safe Transformation Format of Unicode. IETF. May 1997. doi:10.17487/RFC2152. RFC 2152. Retrieved March 18, 2010.
- OpenPGP Message Format. IETF. November 2007. doi:10.17487/RFC4880. RFC 4880. Retrieved March 18, 2010.
- "Here's Why YouTube Will Practically Never Run Out of Unique Video IDs". www.mentalfloss.com. 23 March 2016. Retrieved 27 December 2021.
- "7.3. Base64 utility methods". HTML 5.2 Editor's Draft. World Wide Web Consortium. Retrieved 2 January 2018. Introduced by changeset 5814, 2021-02-01.
- <image xlink:href="data:image/jpeg;base64,JPEG contents encoded in Base64" ... />
- "Encode PDF (Portable Document Format) File (.pdf) to Base64 Data". base64.online. Retrieved 2024-03-21.
- "Edit fiddle". jsfiddle.net.
- "The GEDCOM Standard Release 5.5". Homepages.rootsweb.ancestry.com. Retrieved 2012-06-21.
- Provos, Niels (1997-02-13). "src/lib/libc/crypt/bcrypt.c r1.1". Retrieved 2018-05-18.
- "6PACK a "real time" PC to TNC protocol". Archived from the original on 2012-02-24. Retrieved 2013-05-19.
- "Shell Arithmetic". Bash Reference Manual. Retrieved 8 April 2020. "Otherwise, numbers take the form [base#]n, where the optional base is a decimal number between 2 and 64 representing the arithmetic base, and n is a number in that base."