Introduction to base64 encoding

Source: Internet
Author: User
I bet that when you see the word base64 you will wonder where you have seen it before. If you are reading this article online, you have already used it behind the scenes. If you know a little about binary numbers, you can start reading.

Open an email and view its raw source (for example, export a received message and open it in a text editor). You will see something similar to this:

Date: Thu, 25 Dec 2003 06:33:07 +0800
From: "ESX ?!" <snaix@yeah.net>
Reply-To: snaix@yeah.net
To: "Snail IX" <snaix@126.com>
Subject:
X-mailer: Foxmail 5.0 beta2 [CN]
Mime-Version: 1.0
Content-Type: text/plain;
      charset="gb2312"
Content-Transfer-Encoding: base64

xOO6w6OsU25haVgNCg0KoaGhodXiysfSu7j2QmFzZTY0tcSy4srU08q8/qOhDQoNCkJlc3QgV2lz
Bytes
U1g/Snapshot
Bytes
Mi0yNQ0K

Did you notice the "base64" label, and the lines of seemingly garbled characters below it? Perhaps it suddenly makes sense: this is base64 encoding.

What is base64?

RFC 2045 defines base64 as the Base64 Content-Transfer-Encoding: "The Base64 Content-Transfer-Encoding is designed to represent arbitrary sequences of octets in a form that need not be humanly readable." In other words, it turns any sequence of 8-bit bytes into a form that people are not expected to read directly.

Why use base64?

I think the designers mainly considered three issues when designing this encoding:
1. Should it encrypt?
2. The complexity and efficiency of the encryption algorithm
3. How to handle transmission?

Some concealment is intended, but the goal is not to let users send truly secure email. It is the kind of protection that "guards against gentlemen, not villains": the content simply cannot be read at a glance. Strictly speaking, base64 uses no key and anyone can reverse it, so it is an encoding rather than real encryption.
For the same reason, the algorithm must be neither complex nor slow. MIME exists to send and receive email, not to secure it, so the algorithm should be simple and efficient. Otherwise sending email would consume a lot of resources and lose sight of its purpose.

However, if those two points were the only concerns, the simplest Caesar cipher would do. Why is base64 more complex than Caesar? For historical reasons, email transport was only guaranteed to carry ASCII characters, that is, the low 7 bits of each 8-bit byte. If you send an email containing non-ASCII bytes (bytes whose most significant bit is 1), it may be damaged when it passes through a gateway with this "historical problem", because the gateway may set the most significant bit to 0. To deliver email reliably this has to be taken into account, so schemes such as Caesar, which only shift letters around, will not work. For more information, see RFC 2046.
Base64 encoding was created for the reasons above.
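
The gateway problem can be illustrated with a small C sketch. The two helper functions below are hypothetical names made up for this illustration; they only simulate a gateway that clears the top bit of every byte, and do not describe any real mail software.

```c
#include <stddef.h>

/* Simulates a hypothetical 7-bit gateway: clears the most significant
   bit of every byte it forwards. */
void strip_high_bit(unsigned char *data, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        data[i] &= 0x7F;   /* keep only the low 7 bits */
    }
}

/* Returns 1 if no byte has its most significant bit set, i.e. the data
   would pass through such a gateway unchanged; returns 0 otherwise. */
int is_seven_bit_clean(const unsigned char *data, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        if (data[i] & 0x80) {
            return 0;
        }
    }
    return 1;
}
```

For example, the bytes D5 C5 33 are not 7-bit clean, and strip_high_bit turns them into 55 45 33, which is different data. Base64 output consists only of ASCII letters, digits, "+", "/" and "=", so it is always 7-bit clean.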

Algorithm details

Base64 encoding converts three 8-bit bytes (3*8 = 24 bits) into four 6-bit groups (4*6 = 24 bits), then adds two zero bits in front of each 6-bit group so that it becomes an 8-bit byte.
The conversion looks like this:
String "张3"
11010101 11000101 00110011

00110101 00011100 00010100 00110011
Table 1

Think of it this way: join the 8-bit bytes into one bit string, 110101011100010100110011.
Then take the first six bits in order and add two zeros in front of them to form a new byte. Take the next six bits, add the zeros again, and so on until all 24 bits have been used.
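
The regrouping just described can be sketched in C. This is a minimal sketch: the function name split_into_sextets is an assumption made for illustration, and it handles exactly one group of three bytes.

```c
#include <stdint.h>

/* Splits 3 input bytes into four 6-bit values. Each result is stored in
   one byte, so its two highest bits are always zero (range 0..63). */
void split_into_sextets(const unsigned char in[3], unsigned char out[4])
{
    /* Join the three bytes into one 24-bit number. */
    uint32_t bits = ((uint32_t)in[0] << 16) | ((uint32_t)in[1] << 8) | in[2];

    out[0] = (unsigned char)((bits >> 18) & 0x3F);  /* bits 23..18 */
    out[1] = (unsigned char)((bits >> 12) & 0x3F);  /* bits 17..12 */
    out[2] = (unsigned char)((bits >> 6) & 0x3F);   /* bits 11..6  */
    out[3] = (unsigned char)(bits & 0x3F);          /* bits 5..0   */
}
```

For the input bytes D5 C5 33 this yields the values 53, 28, 20 and 51.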
Let's take a look at the actual results:

String "张3"
11010101 (hex D5)    11000101 (hex C5)    00110011 (hex 33)

00110101    00011100    00010100    00110011
character '5'    character '^\'    character '^T'    character '3'
decimal 53    decimal 28    decimal 20    decimal 51
Table 2

So is the string "张3" represented by base64 as "5^\^T3"? No!
Base64 does not simply output the converted bytes. Characters such as '^\' are control characters; they cannot be displayed and cannot be used in some contexts. Base64 has its own encoding table:

Table 1: The base64 alphabet
Value Encoding    Value Encoding    Value Encoding    Value Encoding
    0 A              17 R              34 i              51 z
    1 B              18 S              35 j              52 0
    2 C              19 T              36 k              53 1
    3 D              20 U              37 l              54 2
    4 E              21 V              38 m              55 3
    5 F              22 W              39 n              56 4
    6 G              23 X              40 o              57 5
    7 H              24 Y              41 p              58 6
    8 I              25 Z              42 q              59 7
    9 J              26 a              43 r              60 8
   10 K              27 b              44 s              61 9
   11 L              28 c              45 t              62 +
   12 M              29 d              46 u              63 /
   13 N              30 e              47 v           (pad) =
   14 O              31 f              48 w
   15 P              32 g              49 x
   16 Q              33 h              50 y
Table 3

This 64-entry table is also where the name base64 comes from. The encoded result is not the raw byte with two zero bits on top and six data bits below; each 6-bit value is instead used as an index into the table above. For example, the value 0 is written as "A", whose own ASCII code (65) needs seven bits. In the table, the value column is the decimal value of the new byte. We can therefore obtain the base64 encoding from table 2:

String "张3"
11010101 (hex D5)    11000101 (hex C5)    00110011 (hex 33)

00110101    00011100    00010100    00110011
character '5'    character '^\'    character '^T'    character '3'
decimal 53    decimal 28    decimal 20    decimal 51
character '1'    character 'c'    character 'U'    character 'z'
Table 4

In this way, the string "张3" is encoded into the string "1cUz".
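
The table lookup for one complete group can be sketched in C as follows. This is a minimal sketch, not the author's code: the names BASE64_CHARS and encode_group are assumptions made for illustration, and the function only handles exactly three input bytes (no padding).

```c
/* The 64 characters of the base64 alphabet, indexed by 6-bit value. */
static const char BASE64_CHARS[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/";

/* Encodes exactly 3 bytes into 4 base64 characters plus a terminating NUL. */
void encode_group(const unsigned char in[3], char out[5])
{
    /* top 6 bits of byte 0 */
    out[0] = BASE64_CHARS[in[0] >> 2];
    /* low 2 bits of byte 0, then top 4 bits of byte 1 */
    out[1] = BASE64_CHARS[((in[0] & 0x03) << 4) | (in[1] >> 4)];
    /* low 4 bits of byte 1, then top 2 bits of byte 2 */
    out[2] = BASE64_CHARS[((in[1] & 0x0F) << 2) | (in[2] >> 6)];
    /* low 6 bits of byte 2 */
    out[3] = BASE64_CHARS[in[2] & 0x3F];
    out[4] = '\0';
}
```

Calling encode_group on the bytes D5 C5 33 produces "1cUz", and on the ASCII text "Man" it produces "TWFu".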
Base64 converts three bytes into four bytes, so the encoded size (in bytes, the same below) is about 1/3 larger than the size before encoding. It is "about" because the increase is exactly 1/3 only when the original size is an exact multiple of three. But what if it is not?
Careful readers may have noticed that the base64 alphabet ends with a (pad) = entry. This character solves that problem.
When the size is not a multiple of 3, the remainder of size / 3 is 1 or 2. During conversion, a final group that has fewer than six bits is filled with 0 bits on the right, and two zeros are then added in front of the six bits as usual. The output positions that remain empty are filled with "=". For example, suppose the two bytes left over are the character "张":

String "张"
11010101 (hex D5)    11000101 (hex C5)

00110101    00011100    00010100
decimal 53    decimal 28    decimal 20    pad
character '1'    character 'c'    character 'U'    character '='
Table 6

In this way, the last two bytes are encoded as "1cU=".
Similarly, if only one byte is left over, two "=" are added. These are the only two cases, so base64 output ends with at most two "=".
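
The padding rule and the size calculation can be put together in a short C sketch. It is only an illustration under stated assumptions: the names B64_TABLE, base64_encoded_length and base64_encode are made up here, and the output is one unbroken string without the line breaks that mail software inserts.

```c
#include <stddef.h>
#include <stdlib.h>

static const char B64_TABLE[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/";

/* Encoded length for n input bytes: 4 characters per started group of 3. */
size_t base64_encoded_length(size_t n)
{
    return 4 * ((n + 2) / 3);
}

/* Encodes n bytes. Returns a malloc'ed NUL-terminated string that the
   caller must free, or NULL if the allocation fails. */
char *base64_encode(const unsigned char *in, size_t n)
{
    size_t i, o = 0;
    char *out = malloc(base64_encoded_length(n) + 1);
    if (out == NULL) {
        return NULL;
    }
    for (i = 0; i < n; i += 3) {
        size_t left = n - i;   /* bytes available for this group, at least 1 */
        unsigned long bits = (unsigned long)in[i] << 16;
        if (left > 1) bits |= (unsigned long)in[i + 1] << 8;
        if (left > 2) bits |= (unsigned long)in[i + 2];
        /* Missing bytes stay 0, which supplies the zero fill bits. */

        out[o++] = B64_TABLE[(bits >> 18) & 0x3F];
        out[o++] = B64_TABLE[(bits >> 12) & 0x3F];
        out[o++] = (left > 1) ? B64_TABLE[(bits >> 6) & 0x3F] : '=';
        out[o++] = (left > 2) ? B64_TABLE[bits & 0x3F] : '=';
    }
    out[o] = '\0';
    return out;
}
```

With this sketch the two bytes D5 C5 encode to "1cU=" and the single byte D5 encodes to "1Q==".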
Decoding base64 is simply the inverse of encoding, and you can work it out yourself. I give a decoding algorithm at the end of the article.
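
As a rough illustration of that inverse process, here is a compact C sketch. It is not the author's implementation, which appears later in the article; the names b64_value and base64_decode_sketch are assumptions, and the sketch assumes an ASCII character set and input without line breaks.

```c
#include <stddef.h>

/* Maps one base64 character to its 6-bit value, or -1 if it is not in
   the alphabet. Relies on ASCII, where each range is contiguous. */
static int b64_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decodes len characters into out, which must hold at least len / 4 * 3
   bytes. Returns the number of bytes written, or -1 on malformed input. */
long base64_decode_sketch(const char *in, size_t len, unsigned char *out)
{
    size_t i;
    long written = 0;

    if (len % 4 != 0) return -1;
    for (i = 0; i < len; i += 4) {
        unsigned long bits = 0;
        int pad = 0, k;
        for (k = 0; k < 4; k++) {
            char c = in[i + k];
            int v = 0;
            if (c == '=') {
                /* '=' may only fill the last one or two positions
                   of the final group */
                if (i + 4 != len || k < 2) return -1;
                pad++;
            } else {
                if (pad > 0) return -1;      /* data after padding */
                v = b64_value(c);
                if (v < 0) return -1;        /* not a base64 character */
            }
            bits = (bits << 6) | (unsigned long)v;
        }
        /* 24 bits collected: emit 3 bytes minus one per pad character. */
        out[written++] = (unsigned char)((bits >> 16) & 0xFF);
        if (pad < 2) out[written++] = (unsigned char)((bits >> 8) & 0xFF);
        if (pad < 1) out[written++] = (unsigned char)(bits & 0xFF);
    }
    return written;
}
```

Decoding "1cUz" with this sketch gives the three bytes D5 C5 33, and decoding "1cU=" gives the two bytes D5 C5.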

Algorithm Implementation
At this point the details of the algorithm are basically clear. Setting aside the checks on input constraints, a program can proceed in the following steps:
Read 3 bytes of data. Use AND to take the first 6 bits of the first byte and shift them right by two bits into a new variable; take the last 2 bits of the first byte together with the first 4 bits of the second byte into the next new variable; and so on.
A decoding implementation in C:
#include <stdlib.h>

typedef unsigned char byte;

/* Returns base shifted left by (movenum - 1) bits.
   With base = 2 this is 2 to the power movenum. */
static byte lmovebit(int base, int movenum)
{
    if (movenum == 0) return 1;
    if (movenum == 1) return (byte)base;
    return (byte)(base << (movenum - 1));
}

/* Index 0..63 is the base64 alphabet; index 64 is the pad character '='. */
static const char base64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/=";

/* Decodes base64length characters (a multiple of 4, without line breaks).
   Returns a buffer allocated with calloc that the caller must free, or NULL.
   *padnum receives the number of '=' characters seen. */
byte *base64decode(const char *base64code, size_t base64length, int *padnum)
{
    int i, j, k;
    byte temp1[4], temp2;
    size_t base64a = base64length / 4;   /* number of 4-character groups */
    size_t base64b;
    byte *buffer = calloc(base64a * 3 + 1, 1);

    if (buffer == NULL) return NULL;
    *padnum = 0;
    for (base64b = 0; base64b < base64a; base64b++)
    {
        /* Look up the 6-bit value of each of the 4 characters. */
        for (i = 0; i < 4; i++)
        {
            temp1[i] = 0;
            for (j = 0; j < 65; j++)
            {
                if (base64code[base64b * 4 + i] == base64_alphabet[j])
                {
                    temp1[i] = (byte)j;
                    break;
                }
            }
        }
        i--;   /* i is now 3, the index of the last character */
        /* k = 1, 2, 3 rebuilds output bytes 2, 1, 0 of this group. */
        for (k = 1; k < 4; k++)
        {
            if (temp1[i - (k - 1)] == 64) { (*padnum)++; continue; }
            /* Drop the low bits already used by the following byte. */
            temp1[i - (k - 1)] = temp1[i - (k - 1)] / lmovebit(2, (k - 1) * 2);
            /* Take the low 2*k bits of the previous value ... */
            temp2 = temp1[i - k];
            temp2 = temp2 & (lmovebit(2, k * 2) - 1);
            /* ... and move them to the top of the output byte. */
            temp2 *= lmovebit(2, 8 - (2 * k));
            temp1[i - (k - 1)] = temp1[i - (k - 1)] + temp2;
            buffer[base64b * 3 + (3 - k)] = temp1[i - (k - 1)];
        }
    }
    return buffer;
}

According to this algorithm, the e-mail content provided at the beginning of the article can be decoded:
Hello, snaix

This is a base64 test email!

Best wishes!
ESX ?!
snaix@yeah.net
2003-12-25

If you find any problems in this article, please point them out and contact me: snaix@126.com

References:
RFC 2045
RFC 2046
Wonderful base64 encoding, Luo Cong
And other materials from the Internet

You can also
Http://popscanner.icpcn.com/download/base64.doc
