Benefits of Base64


1.

As mentioned in yesterday's MIME note, MIME mainly uses two encoding schemes, quoted-printable and Base64, to convert 8-bit non-ASCII data into 7-bit ASCII characters.

Although the original purpose was to satisfy the rule that e-mail must not carry non-ASCII characters directly, the conversion has other important uses as well:

a) Any binary file can be converted to printable text and edited with text software;

b) Text can be lightly obfuscated. (This is not real encryption, since anyone can decode it.)

2.

First, a brief introduction to the quoted-printable encoding. It is mainly used for ASCII text that contains a small number of non-ASCII characters; it is not suitable for converting pure binary files.

It specifies that each 8-bit byte that needs encoding is converted to 3 characters.

The first character is always an "=" sign.

The next two characters are two hexadecimal digits, representing the first four bits and the last four bits of the byte.

For example, the form feed character has ASCII code 12, which is 00001100 in binary and 0C in hexadecimal, so its encoded value is "=0C". The "=" sign has ASCII value 61, which is 00111101 in binary, so its encoded value is "=3D". Every character other than printable ASCII must be converted in this manner.

All printable ASCII characters (decimal values 33 to 126) are left unchanged, except for "=" (decimal value 61).
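The rule above can be sketched in a few lines of JavaScript. This is only an illustration of the simplified rule as stated in this note: `qpEncodeBytes` is a hypothetical helper name, and the sketch ignores the special handling of spaces, tabs, line breaks, and line length that the full MIME rules also define.

```javascript
// Sketch of the simplified quoted-printable rule described above.
// qpEncodeBytes is a hypothetical name, not a standard API.
function qpEncodeBytes(bytes) {
  let out = "";
  for (const b of bytes) {
    // Printable ASCII (33..126) passes through, except "=" (61).
    const keep = b >= 33 && b <= 126 && b !== 61;
    if (keep) {
      out += String.fromCharCode(b);
    } else {
      // "=" followed by two uppercase hex digits (high nibble, low nibble).
      out += "=" + b.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(qpEncodeBytes([12]));          // form feed -> =0C
console.log(qpEncodeBytes([61]));          // "="       -> =3D
console.log(qpEncodeBytes([77, 97, 110])); // "Man"     -> Man
```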

3.

The following is a detailed description of how Base64 encoding works.

Base64 means choosing 64 characters (lowercase letters a-z, uppercase letters A-Z, digits 0-9, and the symbols "+" and "/") as a basic character set. With "=" added as the padding character, there are actually 65 characters. All other data is then converted into characters from this set.

Specifically, the conversion can be divided into four steps.

The first step is to take every three bytes as a group, 24 bits in total.

In the second step, the 24 bits are divided into four groups of 6 bits each.

The third step is to prepend two 0 bits to each group, expanding the four groups to 32 bits, or four bytes.

In the fourth step, each of the expanded bytes is looked up in the following table to get its corresponding symbol, which is the Base64-encoded value.

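Value Char    Value Char    Value Char    Value Char
  0    A       17    R       34    i       51    z
  1    B       18    S       35    j       52    0
  2    C       19    T       36    k       53    1
  3    D       20    U       37    l       54    2
  4    E       21    V       38    m       55    3
  5    F       22    W       39    n       56    4
  6    G       23    X       40    o       57    5
  7    H       24    Y       41    p       58    6
  8    I       25    Z       42    q       59    7
  9    J       26    a       43    r       60    8
 10    K       27    b       44    s       61    9
 11    L       28    c       45    t       62    +
 12    M       29    d       46    u       63    /
 13    N       30    e       47    v
 14    O       31    f       48    w
 15    P       32    g       49    x
 16    Q       33    h       50    y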

Because Base64 converts every three bytes into four, Base64-encoded text is about one-third larger than the original.
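To make the size growth concrete, here is a small arithmetic sketch. `base64Length` is a hypothetical helper name; it counts padded output characters only and ignores the line breaks that MIME inserts into long encoded bodies.

```javascript
// Every group of up to 3 input bytes produces exactly 4 output characters
// (padding included), so the padded length is 4 * ceil(n / 3).
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

for (const n of [1, 2, 3, 4, 300]) {
  console.log(n + " bytes -> " + base64Length(n) + " characters");
}
```

For 300 input bytes this gives 400 characters, which is the one-third increase mentioned above; very short inputs grow proportionally more because of padding.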

4.

Here is a concrete example of how the English word "Man" is turned into Base64.

Text content     M         a         n
ASCII            77        97        110
Bit pattern      01001101  01100001  01101110
Index            19      22      5       46
Base64-encoded   T       W       F       u

In the first step, the ASCII values of "M", "a", and "n" are 77, 97, and 110 respectively, and the corresponding binary values are 01001101, 01100001, and 01101110. Joined together, they form the 24-bit binary string 010011010110000101101110.

In the second step, the 24-bit binary string is divided into 4 groups of 6 bits each: 010011, 010110, 000101, 101110.

In the third step, two 0 bits are prepended to each group, expanding the result to 32 bits, or four bytes: 00010011, 00010110, 00000101, 00101110. Their decimal values are 19, 22, 5, and 46 respectively.

In the fourth step, each value is looked up in the table above to get its Base64 character, namely T, W, F, and u.

So the Base64 encoding of "Man" is TWFu.
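The four steps can be traced in code. The following is a minimal sketch for exactly three ASCII characters, written with bit strings for readability rather than speed; `traceGroup` and `ALPHABET` are hypothetical names introduced only for this illustration.

```javascript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Assumes `text` is exactly three ASCII characters.
function traceGroup(text) {
  // Step 1: write each character code as 8 bits and join them (24 bits).
  const bits = Array.from(text, (ch) =>
    ch.charCodeAt(0).toString(2).padStart(8, "0")
  ).join("");
  // Step 2: split the 24 bits into four 6-bit groups.
  const groups = bits.match(/.{6}/g);
  // Step 3: prepend "00" to each group and read it as a byte value.
  const indices = groups.map((g) => parseInt("00" + g, 2));
  // Step 4: look each value up in the Base64 table.
  const encoded = indices.map((i) => ALPHABET[i]).join("");
  return { bits, groups, indices, encoded };
}

const trace = traceGroup("Man");
console.log(trace.bits);    // 010011010110000101101110
console.log(trace.indices); // [ 19, 22, 5, 46 ]
console.log(trace.encoded); // TWFu
```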

5.

If fewer than three bytes remain at the end, they are handled as follows:

a) Two bytes: the 16 bits of the two bytes are split into three groups according to the rules above. The last group has only 4 bits, so besides the two 0 bits added in front, two 0 bits are also appended after it. This gives three Base64 characters, followed by one "=" sign.

For example, the string "Ma" is two bytes, which become the three groups 00010011, 00010110, 00000100. The corresponding Base64 characters are T, W, and E, followed by one "=" sign, so the Base64 encoding of "Ma" is TWE=.

b) One byte: the 8 bits of the byte are split into two groups according to the rules above. The last group has only 2 bits, so besides the two 0 bits added in front, four 0 bits are appended after it. This gives two Base64 characters, followed by two "=" signs.

For example, the letter "M" is one byte, which becomes the two groups 00010011, 00010000. The corresponding Base64 characters are T and Q, followed by two "=" signs, so the Base64 encoding of "M" is TQ==.
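The padding rules can be sketched as a small encoder over an array of byte values. This is an illustrative sketch, not the library code shown later in this note; `encodeBytes`, `asciiBytes`, and `B64` are hypothetical names.

```javascript
const B64 =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function encodeBytes(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = bytes.length - i; // 1, 2, or at least 3
    const b0 = bytes[i];
    // Missing bytes are treated as zero bits, as described above.
    const b1 = remaining > 1 ? bytes[i + 1] : 0;
    const b2 = remaining > 2 ? bytes[i + 2] : 0;
    out += B64[b0 >> 2];
    out += B64[((b0 & 0x03) << 4) | (b1 >> 4)];
    // Positions that carry no input bits at all become "=".
    out += remaining > 1 ? B64[((b1 & 0x0f) << 2) | (b2 >> 6)] : "=";
    out += remaining > 2 ? B64[b2 & 0x3f] : "=";
  }
  return out;
}

// Helper for ASCII-only test strings.
const asciiBytes = (s) => Array.from(s, (ch) => ch.charCodeAt(0));

console.log(encodeBytes(asciiBytes("Man"))); // TWFu
console.log(encodeBytes(asciiBytes("Ma")));  // TWE=
console.log(encodeBytes(asciiBytes("M")));   // TQ==
```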

6.

Here is another example, this time with Chinese: how is the Chinese character "严" (yan, "strict") converted to Base64?

Note that a Chinese character can be stored in many encodings, such as GB2312, UTF-8, and GBK, and the Base64 value differs for each of them. The following example uses UTF-8.

First, the UTF-8 encoding of "严" is E4B8A5, which in binary is the three bytes "11100100 10111000 10100101". Following the rules in section 3, this 24-bit string becomes four bytes, 32 bits in total: "00111001 00001011 00100010 00100101". Their decimal values are 57, 11, 34, and 37, and the corresponding Base64 characters are 5, L, i, and l.

Therefore, the Base64 value of the Chinese character "严" (UTF-8 encoded) is 5Lil.
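The same numbers can be checked in JavaScript. This sketch assumes an environment that provides `TextEncoder` (current browsers and recent Node.js versions); `hex`, `sixBitIndices`, and `TABLE` are hypothetical names. It also prints the UTF-16 bytes of the same character to show that a different encoding yields different bytes, and therefore a different Base64 value.

```javascript
const TABLE =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

const ch = "\u4e25"; // the character 严

// UTF-8 bytes of the character.
const utf8 = Array.from(new TextEncoder().encode(ch));
// UTF-16 (big-endian) bytes of the same character, for comparison.
const unit = ch.charCodeAt(0);
const utf16be = [unit >> 8, unit & 0xff];

const hex = (bytes) =>
  bytes.map((b) => b.toString(16).padStart(2, "0")).join("");

// Split exactly three bytes (24 bits) into four 6-bit values.
function sixBitIndices(bytes) {
  const n = (bytes[0] << 16) | (bytes[1] << 8) | bytes[2];
  return [n >> 18, (n >> 12) & 63, (n >> 6) & 63, n & 63];
}

const indices = sixBitIndices(utf8);
console.log(hex(utf8));    // e4b8a5
console.log(hex(utf16be)); // 4e25
console.log(indices);      // [ 57, 11, 34, 37 ]
console.log(indices.map((i) => TABLE[i]).join("")); // 5Lil
```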

7.

In PHP, there are two dedicated functions for Base64 conversion: base64_encode() for encoding and base64_decode() for decoding.

These functions apply the Base64 rules to the bytes they are given, regardless of the character encoding of the input text. Therefore, if you want the Base64 value that corresponds to the UTF-8 encoding, you must make sure the input text is UTF-8 encoded.

8.

This section describes how to do Base64 encoding in JavaScript.

First, assuming that the web page is encoded in UTF-8, we want PHP and JavaScript to produce the same Base64 value for the same string.

This creates a problem. Because JavaScript stores strings internally as UTF-16, we must first convert the UTF-16 value to UTF-8 before encoding, and after decoding we must convert the UTF-8 value back to UTF-16.

Some ready-made JavaScript functions for this already exist on the Web:


/* utf.js - UTF-8 <=> UTF-16 conversion
 *
 * Copyright (C) 1999 Masanao Izumo <[email protected]>
 * Version: 1.0
 * LastModified: Dec 25 1999
 * This library is free. You can redistribute it and/or modify it.
 */

/*
 * Interfaces:
 * utf8 = utf16to8(utf16);
 * utf16 = utf8to16(utf8);
 */

// Convert a UTF-16 string to a "binary string" whose character codes
// are the UTF-8 bytes (one byte per character).
function utf16to8(str) {
    var out, i, len, c;

    out = "";
    len = str.length;
    for (i = 0; i < len; i++) {
        c = str.charCodeAt(i);
        if ((c >= 0x0001) && (c <= 0x007F)) {
            // 1 byte: 0xxxxxxx
            out += str.charAt(i);
        } else if (c > 0x07FF) {
            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += String.fromCharCode(0xE0 | ((c >> 12) & 0x0F));
            out += String.fromCharCode(0x80 | ((c >> 6) & 0x3F));
            out += String.fromCharCode(0x80 | ((c >> 0) & 0x3F));
        } else {
            // 2 bytes: 110xxxxx 10xxxxxx
            out += String.fromCharCode(0xC0 | ((c >> 6) & 0x1F));
            out += String.fromCharCode(0x80 | ((c >> 0) & 0x3F));
        }
    }
    return out;
}

// Convert a UTF-8 "binary string" back to a UTF-16 string.
function utf8to16(str) {
    var out, i, len, c;
    var char2, char3;

    out = "";
    len = str.length;
    i = 0;
    while (i < len) {
        c = str.charCodeAt(i++);
        switch (c >> 4) {
        case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
            // 0xxxxxxx
            out += str.charAt(i - 1);
            break;
        case 12: case 13:
            // 110x xxxx   10xx xxxx
            char2 = str.charCodeAt(i++);
            out += String.fromCharCode(((c & 0x1F) << 6) | (char2 & 0x3F));
            break;
        case 14:
            // 1110 xxxx   10xx xxxx   10xx xxxx
            char2 = str.charCodeAt(i++);
            char3 = str.charCodeAt(i++);
            out += String.fromCharCode(((c & 0x0F) << 12) |
                                       ((char2 & 0x3F) << 6) |
                                       ((char3 & 0x3F) << 0));
            break;
        }
    }

    return out;
}

The code above defines two functions: utf16to8() converts UTF-16 to UTF-8, and utf8to16() converts UTF-8 to UTF-16.

The following functions do the actual Base64 encoding and decoding.

/* Copyright (C) 1999 Masanao Izumo <[email protected]>
 * Version: 1.0
 * LastModified: Dec 25 1999
 * This library is free. You can redistribute it and/or modify it.
 */

/*
 * Interfaces:
 * b64 = base64encode(data);
 * data = base64decode(b64);
 */

var base64EncodeChars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
// Maps an ASCII code (0..127) to its 6-bit value, or -1 if the
// character is not part of the Base64 alphabet.
var base64DecodeChars = new Array(
    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 62, -1, -1, -1, 63,
    52, 53, 54, 55, 56, 57, 58, 59, 60, 61, -1, -1, -1, -1, -1, -1,
    -1,  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14,
    15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, -1, -1, -1, -1, -1,
    -1, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
    41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, -1, -1, -1, -1, -1);

// Encode a "binary string" (one byte per character) as Base64.
function base64encode(str) {
    var out, i, len;
    var c1, c2, c3;

    len = str.length;
    i = 0;
    out = "";
    while (i < len) {
        c1 = str.charCodeAt(i++) & 0xFF;
        if (i == len) {
            // One byte left: two characters plus "==".
            out += base64EncodeChars.charAt(c1 >> 2);
            out += base64EncodeChars.charAt((c1 & 0x3) << 4);
            out += "==";
            break;
        }
        c2 = str.charCodeAt(i++);
        if (i == len) {
            // Two bytes left: three characters plus "=".
            out += base64EncodeChars.charAt(c1 >> 2);
            out += base64EncodeChars.charAt(((c1 & 0x3) << 4) | ((c2 & 0xF0) >> 4));
            out += base64EncodeChars.charAt((c2 & 0xF) << 2);
            out += "=";
            break;
        }
        c3 = str.charCodeAt(i++);
        out += base64EncodeChars.charAt(c1 >> 2);
        out += base64EncodeChars.charAt(((c1 & 0x3) << 4) | ((c2 & 0xF0) >> 4));
        out += base64EncodeChars.charAt(((c2 & 0xF) << 2) | ((c3 & 0xC0) >> 6));
        out += base64EncodeChars.charAt(c3 & 0x3F);
    }
    return out;
}

// Decode Base64 into a "binary string"; characters outside the
// alphabet are skipped, and "=" (code 61) ends the input.
function base64decode(str) {
    var c1, c2, c3, c4;
    var i, len, out;

    len = str.length;
    i = 0;
    out = "";
    while (i < len) {
        /* c1 */
        do {
            c1 = base64DecodeChars[str.charCodeAt(i++) & 0xFF];
        } while (i < len && c1 == -1);
        if (c1 == -1)
            break;

        /* c2 */
        do {
            c2 = base64DecodeChars[str.charCodeAt(i++) & 0xFF];
        } while (i < len && c2 == -1);
        if (c2 == -1)
            break;

        out += String.fromCharCode((c1 << 2) | ((c2 & 0x30) >> 4));

        /* c3 */
        do {
            c3 = str.charCodeAt(i++) & 0xFF;
            if (c3 == 61)
                return out;
            c3 = base64DecodeChars[c3];
        } while (i < len && c3 == -1);
        if (c3 == -1)
            break;

        out += String.fromCharCode(((c2 & 0xF) << 4) | ((c3 & 0x3C) >> 2));

        /* c4 */
        do {
            c4 = str.charCodeAt(i++) & 0xFF;
            if (c4 == 61)
                return out;
            c4 = base64DecodeChars[c4];
        } while (i < len && c4 == -1);
        if (c4 == -1)
            break;
        out += String.fromCharCode(((c3 & 0x03) << 6) | c4);
    }
    return out;
}

In the code above, base64encode() is used for encoding and base64decode() for decoding.

Therefore, encoding a string as UTF-8 Base64 is written like this:

sEncoded = base64encode(utf16to8(str));

And decoding is written like this:

sDecoded = utf8to16(base64decode(sEncoded));
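For comparison, the same round trip can be sketched with built-in functions. This is an alternative to the 1999 library above, not part of it, and it assumes an environment that provides `TextEncoder`, `TextDecoder`, `btoa`, and `atob` (current browsers and recent Node.js versions). The function names `encodeUtf8Base64` and `decodeUtf8Base64` are hypothetical.

```javascript
// Encode: UTF-16 string -> UTF-8 bytes -> binary string -> Base64.
function encodeUtf8Base64(str) {
  const bytes = new TextEncoder().encode(str);
  let binary = "";
  for (const b of bytes) {
    binary += String.fromCharCode(b); // one character per byte
  }
  return btoa(binary);
}

// Decode: Base64 -> binary string -> UTF-8 bytes -> UTF-16 string.
function decodeUtf8Base64(b64) {
  const binary = atob(b64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

const encoded = encodeUtf8Base64("\u4e25"); // the character 严
console.log(encoded);                             // 5Lil
console.log(decodeUtf8Base64(encoded) === "\u4e25"); // true
```

The intermediate binary string is needed because `btoa` only accepts characters with codes up to 0xFF, which is the same reason the library above converts UTF-16 to UTF-8 before encoding.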

(End)
