The following is a detailed description of how Base64 encoding works.
Base64 works by choosing 64 characters (uppercase letters A-Z, lowercase letters a-z, digits 0-9, and the symbols "+" and "/") as a basic character set; counting "=", which serves as the padding character, there are actually 65. All other data is then converted into characters from this set.
Specifically, the conversion method can be divided into four steps.
The first step is to take every three bytes as a group, 24 bits in total.
In the second step, the 24 bits are divided into four groups of 6 bits each.
The third step is to prepend two 0 bits to each group, so that the four groups expand to 32 bits in total, i.e. four bytes.
In the fourth step, each of the expanded bytes is looked up in the following table to get its corresponding symbol; these symbols are the Base64-encoded value.
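 0 A | 17 R | 34 i | 51 z |
 1 B | 18 S | 35 j | 52 0 |
 2 C | 19 T | 36 k | 53 1 |
 3 D | 20 U | 37 l | 54 2 |
 4 E | 21 V | 38 m | 55 3 |
 5 F | 22 W | 39 n | 56 4 |
 6 G | 23 X | 40 o | 57 5 |
 7 H | 24 Y | 41 p | 58 6 |
 8 I | 25 Z | 42 q | 59 7 |
 9 J | 26 a | 43 r | 60 8 |
10 K | 27 b | 44 s | 61 9 |
11 L | 28 c | 45 t | 62 + |
12 M | 29 d | 46 u | 63 / |
13 N | 30 e | 47 v |
14 O | 31 f | 48 w |
15 P | 32 g | 49 x |
16 Q | 33 h | 50 y |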
Because Base64 converts three bytes into four bytes, Base64 encoded text will be one-third or so larger than the original text.
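The size claim can be checked with a small sketch. It assumes Python and its standard base64 module, which the original text does not use; encoded_length is a hypothetical helper name introduced only for illustration, and it assumes padded output with no line breaks.

```python
import base64
import math


def encoded_length(n_bytes: int) -> int:
    """Length of padded Base64 output for n_bytes of input (hypothetical helper).

    Every group of up to three input bytes becomes four output characters.
    """
    return 4 * math.ceil(n_bytes / 3)


for n in (1, 2, 3, 100, 1000):
    # bytes(n) is n zero bytes; only the length matters here.
    actual = len(base64.b64encode(bytes(n)))
    print(n, encoded_length(n), actual)
```

For large inputs the ratio of output to input approaches 4/3, which is where the "about one-third larger" figure comes from.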
1.
Here is a concrete example: how the English word "Man" is converted to Base64.
Text content   | M        | a        | n        |
ASCII          | 77       | 97       | 110      |
Bit pattern    | 01001101 | 01100001 | 01101110 |
Index          | 19 | 22 | 5 | 46 |
Base64-encoded | T  | W  | F | u  |
In the first step, the ASCII values of "M", "a", and "n" are 77, 97, and 110; the corresponding binary values are 01001101, 01100001, and 01101110, which are joined into the 24-bit binary string 010011010110000101101110.
In the second step, the 24-bit binary string is divided into four groups of 6 bits each: 010011, 010110, 000101, 101110.
In the third step, two 0 bits are prepended to each group, expanding them to 32 bits, i.e. four bytes: 00010011, 00010110, 00000101, 00101110. Their decimal values are 19, 22, 5, and 46, respectively.
In the fourth step, each value is looked up in the table above to get the corresponding Base64 character: T, W, F, u.
So the Base64 encoding of "Man" is TWFu.
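The four steps can be sketched in a few lines of Python. This is an illustrative sketch, not a production encoder: ALPHABET and encode_three_bytes are hypothetical names, and the function assumes its input is exactly three bytes long.

```python
import base64
import string

# Index table: A-Z (0-25), a-z (26-51), 0-9 (52-61), "+" (62), "/" (63).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_three_bytes(chunk: bytes) -> str:
    """Encode exactly three bytes into four Base64 characters."""
    if len(chunk) != 3:
        raise ValueError("this sketch only handles a full three-byte group")
    # Step 1: join the three bytes into one 24-bit integer.
    bits = int.from_bytes(chunk, "big")
    # Steps 2 and 3: extract four 6-bit groups. Each result is a value
    # in 0..63, i.e. a byte whose two leading bits are 0.
    indices = [(bits >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    # Step 4: look each index up in the table.
    return "".join(ALPHABET[i] for i in indices)


print(encode_three_bytes(b"Man"))  # TWFu
```

The result can be compared against Python's standard library with base64.b64encode(b"Man"), which should give the same four characters.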
2.
If fewer than three bytes remain, they are handled as follows:
a) Two-byte case: the 16 bits of the two bytes are split into three groups according to the rules above. The last group has only 4 bits, so besides the two 0 bits prepended to it, two more 0 bits are appended. This yields a three-character Base64 code, and one "=" is then added at the end.
For example, the string "Ma" is two bytes, which are converted into the three groups 00010011, 00010110, 00000100. The corresponding Base64 characters are T, W, E, followed by one "=", so the Base64 encoding of "Ma" is TWE=.
b) One-byte case: the 8 bits of the byte are split into two groups according to the rules above. The last group has only 2 bits, so besides the two 0 bits prepended to it, four more 0 bits are appended. This yields a two-character Base64 code, and two "=" are then added at the end.
For example, the letter "M" is one byte, which is converted into the two groups 00010011, 00010000. The corresponding Base64 characters are T, Q, followed by two "=", so the Base64 encoding of "M" is TQ==.
3.
Here is another example, this time with Chinese: how is the Chinese character "严" converted to Base64?
It is important to note that a Chinese character can be represented in many encodings, such as GB2312, UTF-8, and GBK, and each encoding produces a different Base64 value. The following example uses UTF-8.
First, the UTF-8 encoding of "严" is E4B8A5, which in binary is the three bytes "11100100 10111000 10100101". Following the conversion steps described above, this 24-bit string becomes four bytes (32 bits), "00111001 00001011 00100010 00100101", whose decimal values are 57, 11, 34, and 37; the corresponding Base64 characters are 5, L, i, and l.
Therefore, the Base64 value of the Chinese character "严" (UTF-8 encoded) is 5Lil.
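To see how the character encoding changes the result, the same character can be encoded under two charsets before applying Base64. This sketch assumes Python and its standard base64 module and codecs, which are not part of the original text; b64_of is a hypothetical helper name.

```python
import base64


def b64_of(text: str, charset: str) -> str:
    """Encode text to bytes with the given charset, then Base64 those bytes."""
    return base64.b64encode(text.encode(charset)).decode("ascii")


char = "\u4e25"  # the Chinese character used in the example above

for charset in ("utf-8", "gbk"):
    raw = char.encode(charset)
    # Print the charset, the raw bytes in hex, and the Base64 result.
    print(charset, raw.hex(), b64_of(char, charset))
```

The UTF-8 line should show the bytes e4b8a5 and the value 5Lil; the GBK line shows different bytes and therefore a different Base64 value.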
4.
In PHP, there is a pair of dedicated functions for Base64 conversion: base64_encode() for encoding and base64_decode() for decoding.
These functions apply the Base64 rules to the bytes they are given, regardless of the character encoding of the input text. Therefore, if you want the Base64 value that corresponds to the UTF-8 encoding, you must make sure the input text is UTF-8 encoded.
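The same point holds in other languages. As a rough analogue (in Python rather than PHP, using the standard base64 module), the sketch below makes the charset explicit on both sides of the round trip; encode_text and decode_text are hypothetical helper names, and UTF-8 as the default is an assumption of this example.

```python
import base64


def encode_text(text: str, charset: str = "utf-8") -> str:
    """Turn text into bytes with an explicit charset, then Base64-encode."""
    return base64.b64encode(text.encode(charset)).decode("ascii")


def decode_text(b64: str, charset: str = "utf-8") -> str:
    """Base64-decode to bytes, then interpret them with the same charset."""
    return base64.b64decode(b64).decode(charset)


print(encode_text("Man"))   # TWFu
print(decode_text("TWFu"))  # Man
```

Base64 itself only ever sees bytes; the charset argument decides which bytes the text becomes before encoding and how the decoded bytes are read back.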