Talking about Base64 encoding

I bet the word Base64 looks familiar, because by the time you could read this article online you were already using it behind the scenes. If you know a little about binary numbers, you are ready to start.

Open an email and view its raw source (for example, save or export the message and open it in a text editor). You will see something like this:

Date: Thu, Dec 2003 06:33:07 +0800
From: "Esx?!" <[email protected]>
Reply-To: [email protected]
To: "Snaix" <[email protected]>
Subject:
X-Mailer: Foxmail 5.0 Beta2 [CN]
Mime-Version: 1.0
Content-Type: text/plain;
	charset="gb2312"
Content-Transfer-Encoding: base64

xOO6w6OsU25haVgNCg0KoaGhodXiysfSu7j2QmFzZTY0tcSy4srU08q8/qOhDQoNCkJlc3QgV2lz
aGVzIQ0KIAkJCQkNCqGhoaGhoaGhoaGhoaGhoaGhoaGhoaGhoaGhoaEgICAgICAgICAgICAgICBl
U1g/IQ0KoaGhoaGhoaGhoaGhoaGhoaGhoaGhoaGhoaGhoSAgICAgICAgICAgICAgIHNuYWl4QHll
YWgubmV0DQqhoaGhoaGhoaGhoaGhoaGhoaGhoaGhoaGhoaGhICAgICAgICAgICAgICAgMjAwMy0x
Mi0yNQ0K

Did you notice the "base64" marker? And the gibberish below the headers? Perhaps it has just dawned on you: yes, that is Base64 encoding.

What is Base64?

RFC 2045 defines Base64 as follows: the Base64 content transfer encoding is designed to represent an arbitrary sequence of 8-bit bytes in a form that need not be readable by humans. ("The Base64 Content-Transfer-Encoding is designed to represent arbitrary sequences of octets in a form that need not be humanly readable.")

Why use Base64?

I think the designers mainly considered three questions when designing this encoding:
1. Should the content be encrypted?
2. How complex and how efficient should the encryption algorithm be?
3. How should transmission be handled?

Encryption is a given, but its purpose is not to give users highly secure email. It is the kind of protection that "keeps out the honest, not the thief": it is enough that the content cannot be read at a glance.
For that purpose the algorithm must be neither too complex nor too inefficient. For much the same reason, mail protocols such as MIME address how to send and receive email, not how to send and receive it securely. The algorithm should therefore be simple and efficient; otherwise sending email would consume a lot of resources, which would defeat the purpose.

However, if those two points were all that mattered, the simplest method, the Caesar cipher, would do. Why does Base64 look more complex than a Caesar cipher? Because, for historical reasons, email was only allowed to carry ASCII characters in transit, that is, only the low 7 bits of each 8-bit byte. If you send an email containing non-ASCII characters (bytes whose highest bit is 1), it may run into trouble at a legacy gateway, which may set the highest bit to 0. That is where the problem comes from, and it has to be handled for email to arrive intact. A scheme like the Caesar cipher, which merely shifts letters, therefore will not do. See RFC 2046 for details.
For these main reasons, Base64 encoding came into being.
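The gateway problem can be illustrated with a small C sketch. The names `strip_high_bit` and `is_seven_bit_clean` are hypothetical, and the first function is only a simplified model of a legacy 7-bit gateway, not the behavior of any particular mail server.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a legacy 7-bit mail gateway: it clears the
   highest bit of every byte it forwards. */
void strip_high_bit(unsigned char *data, size_t len)
{
    size_t i;

    assert(data != NULL || len == 0);
    for (i = 0; i < len; i++) {
        data[i] &= 0x7F;
    }
}

/* Returns 1 if every byte has its highest bit clear, i.e. the data
   would pass through strip_high_bit() unchanged; returns 0 otherwise. */
int is_seven_bit_clean(const unsigned char *data, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (data[i] & 0x80) {
            return 0;
        }
    }
    return 1;
}
```

Under this model the GB2312 bytes D5 C5 would arrive as 55 45, which is a different text, while a message made only of ASCII letters, digits, "+", "/" and "=" would arrive unchanged.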

Algorithm Details

Base64 encoding converts three 8-bit bytes (3*8=24) into four 6-bit groups (4*6=24), then puts two 0 bits in front of each 6-bit group to form an 8-bit byte.
The conversion looks like this:
String "张3" (张, "Zhang", is D5 C5 in GB2312)
11010101 11000101 00110011

00110101 00011100 00010100 00110011
Table 1

Think of it this way: join the 8-bit bytes into one string, 110101011100010100110011.
Then take 6 bits at a time, in order, and put two 0 bits in front of them to make a new byte. Take the next 6 bits, add the two 0 bits, and so on, until all 24 bits have been used.
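The regrouping step can be sketched in C as follows. This is a minimal illustration, and the function name `split_sextets` is my own, not something defined by the RFC.

```c
#include <assert.h>
#include <stddef.h>

/* Splits three 8-bit bytes into four 6-bit values (each 0..63).
   Every result is stored in its own byte, so its two high bits are 0. */
void split_sextets(const unsigned char in[3], unsigned char out[4])
{
    unsigned long bits;

    assert(in != NULL && out != NULL);

    /* Join the three bytes into one 24-bit number. */
    bits = ((unsigned long)in[0] << 16)
         | ((unsigned long)in[1] << 8)
         |  (unsigned long)in[2];

    /* Take 6 bits at a time, starting from the most significant end. */
    out[0] = (unsigned char)((bits >> 18) & 0x3F);
    out[1] = (unsigned char)((bits >> 12) & 0x3F);
    out[2] = (unsigned char)((bits >> 6) & 0x3F);
    out[3] = (unsigned char)(bits & 0x3F);
}
```

For the input bytes D5 C5 33 this produces 35 1C 14 33 in hex, the four bytes 00110101 00011100 00010100 00110011.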
Let's take a look at the actual results:

String "张3"
11010101 hex:D5    11000101 hex:C5    00110011 hex:33

00110101           00011100           00010100           00110011
Character '5'      character '^\'     character '^T'     character '3'
Decimal 53         decimal 28         decimal 20         decimal 51
Table 2

So is the string "张3" Base64-encoded as "5^\^T3"? Wrong!
Base64 does not simply output the converted bytes. Characters such as '^\' are control characters that cannot be displayed and cannot be used in some contexts. Base64 has its own encoding table:

Table 1: The Base64 Alphabet
Value Encoding  Value Encoding  Value Encoding  Value Encoding
    0 A            17 R            34 i            51 z
    1 B            18 S            35 j            52 0
    2 C            19 T            36 k            53 1
    3 D            20 U            37 l            54 2
    4 E            21 V            38 m            55 3
    5 F            22 W            39 n            56 4
    6 G            23 X            40 o            57 5
    7 H            24 Y            41 p            58 6
    8 I            25 Z            42 q            59 7
    9 J            26 a            43 r            60 8
   10 K            27 b            44 s            61 9
   11 L            28 c            45 t            62 +
   12 M            29 d            46 u            63 /
   13 N            30 e            47 v
   14 O            31 f            48 w         (pad) =
   15 P            32 g            49 x
   16 Q            33 h            50 y
Table 3

This table is also where the name Base64 comes from. The output of Base64 encoding is not the bytes produced by the algorithm (two high 0 bits followed by 6 data bits) but the characters looked up in this table, so the output is no longer limited to 6-bit values: "A", for example, has ASCII code 0x41, which needs 7 bits. The value column of the table corresponds to the decimal value of each new byte. The Base64 code can therefore be read off from Table 2:

String "张3"
11010101 hex:D5    11000101 hex:C5    00110011 hex:33

00110101           00011100           00010100           00110011
Character '5'      character '^\'     character '^T'     character '3'
Decimal 53         decimal 28         decimal 20         decimal 51
Character '1'      character 'c'      character 'U'      character 'z'
Table 4

In this way, the string "张3" is encoded as the string "1cUz".
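The table lookup can be combined with the regrouping step in a few lines of C. This is only a sketch for one complete 3-byte group; `BASE64_CHARS` and `encode_triplet` are names I made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* The 64 characters of the Base64 alphabet, indexed by 6-bit value. */
static const char BASE64_CHARS[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/";

/* Encodes exactly three bytes as four Base64 characters.
   out must have room for 5 chars: 4 characters plus the terminating NUL. */
void encode_triplet(const unsigned char in[3], char out[5])
{
    assert(in != NULL && out != NULL);

    out[0] = BASE64_CHARS[in[0] >> 2];
    out[1] = BASE64_CHARS[((in[0] & 0x03) << 4) | (in[1] >> 4)];
    out[2] = BASE64_CHARS[((in[1] & 0x0F) << 2) | (in[2] >> 6)];
    out[3] = BASE64_CHARS[in[2] & 0x3F];
    out[4] = '\0';
}
```

For the bytes D5 C5 33 the four table indexes are 53, 28, 20 and 51, which select the characters '1', 'c', 'U' and 'z'.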
Base64 converts 3 bytes into 4, so the encoded size (in bytes, here and below) is about 1/3 larger than the original. "About", because the increase is exactly 1/3 only when the original size is an exact multiple of 3. But what if it is not?
Attentive readers may have noticed the last entry in the Base64 alphabet, (pad) =. This character exists to handle that case.
When the size is not a multiple of 3, the remainder of size/3 is 2 or 1. During conversion, a final group with fewer than 6 bits is filled with 0 bits on the right, and two 0 bits are then put in front as usual. Output positions that have no data at all are filled with "=". For example, suppose the last 2 remaining bytes are "张":

String "张"
11010101 hex:D5    11000101 hex:C5

00110101           00011100           00010100
Decimal 53         decimal 28         decimal 20         pad
Character '1'      character 'c'      character 'U'      character '='
Table 5

In this way, the last 2 bytes are encoded as "1cU=".
Similarly, if only one byte of the source is left, two "=" are appended. Covering both cases, a Base64 encoding ends with at most two "=".
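The padding rules can be put together into a complete encoder. The following C code is a minimal sketch under my own assumptions: the names `base64_encoded_length` and `base64_encode` are hypothetical, the output is one unbroken string (no line wrapping as in a real mail body), and the caller frees the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Encoded size in characters for n input bytes:
   4 characters for every started group of 3 bytes. */
size_t base64_encoded_length(size_t n)
{
    return 4 * ((n + 2) / 3);
}

/* Encodes len bytes and returns a malloc'ed, NUL-terminated string,
   or NULL if the allocation fails. The caller frees the result. */
char *base64_encode(const unsigned char *data, size_t len)
{
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789+/";
    size_t i = 0, o = 0;
    char *out;

    assert(data != NULL || len == 0);
    out = (char *)malloc(base64_encoded_length(len) + 1);
    if (out == NULL) {
        return NULL;
    }

    /* Every full group of 3 bytes becomes 4 characters. */
    for (; len - i >= 3; i += 3) {
        out[o++] = table[data[i] >> 2];
        out[o++] = table[((data[i] & 0x03) << 4) | (data[i + 1] >> 4)];
        out[o++] = table[((data[i + 1] & 0x0F) << 2) | (data[i + 2] >> 6)];
        out[o++] = table[data[i + 2] & 0x3F];
    }

    if (len - i == 2) {
        /* 16 bits left: the third group is filled with two 0 bits, then one '='. */
        out[o++] = table[data[i] >> 2];
        out[o++] = table[((data[i] & 0x03) << 4) | (data[i + 1] >> 4)];
        out[o++] = table[(data[i + 1] & 0x0F) << 2];
        out[o++] = '=';
    } else if (len - i == 1) {
        /* 8 bits left: the second group is filled with four 0 bits, then two '='. */
        out[o++] = table[data[i] >> 2];
        out[o++] = table[(data[i] & 0x03) << 4];
        out[o++] = '=';
        out[o++] = '=';
    }

    out[o] = '\0';
    return out;
}
```

With this sketch, the two bytes D5 C5 encode to "1cU=" and the single byte D5 encodes to "1Q==". `base64_encoded_length` also makes the size claim concrete: 3 bytes give 4 characters, while 4 bytes already give 8.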
Decoding Base64 is simply the reverse of encoding, which readers can work out for themselves. I give a decoding algorithm at the end of the article.

Algorithm Implementation
The detailed description above already covers most of it. Leaving out boundary checks, the program can be roughly divided into these steps:
Read 3 bytes of data -> use AND to take the first 6 bits and put them in a new variable -> shift right by two bits so the two high bits are 0 -> use AND to take the last 2 bits of the first byte and the first 4 bits of the second byte and put them in a new variable -> shift right by two bits, clear the high bits ... and so on.
A C-like implementation of the decoding algorithm:
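#include <stdlib.h>

typedef unsigned char BYTE;
typedef unsigned long DWORD;

/* Number of '=' characters found by the most recent Base64decode call. */
int m_padnum = 0;

/* Returns base << (movenum-1); with base 2 this is 2 to the power movenum. */
BYTE lmovebit(int base, int movenum)
{
    if (movenum == 0) return 1;
    return (BYTE)(base << (movenum - 1));
}

/* The array index is the 6-bit value; index 64 is the pad character '='. */
const char base64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/=";

/* Decodes base64length characters (a multiple of 4, without line breaks).
   Returns a malloc'ed buffer holding base64length*3/4 - m_padnum bytes,
   or NULL on invalid input. The caller frees the buffer. */
BYTE *Base64decode(const char *base64code, DWORD base64length)
{
    int i, j, k;
    BYTE temp1[4], temp2;
    BYTE *buffer;
    DWORD base64a = base64length / 4;   /* number of 4-character groups */
    DWORD base64b;

    m_padnum = 0;
    if (base64length == 0 || base64length % 4 != 0) return NULL;
    buffer = (BYTE *)malloc(base64a * 3);
    if (buffer == NULL) return NULL;

    for (base64b = 0; base64b < base64a; base64b++)
    {
        /* Map each of the 4 characters to its 6-bit value (64 = pad). */
        for (i = 0; i < 4; i++)
        {
            char c = *(base64code + (base64b * 4) + i);
            for (j = 0; j < 65; j++)
            {
                if (c == base64_alphabet[j])
                {
                    temp1[i] = (BYTE)j;
                    break;
                }
            }
            if (j == 65) { free(buffer); return NULL; }   /* not in the alphabet */
        }
        i--;   /* i == 3, the index of the last character in the group */
        /* Build output bytes 2, 1, 0 of the group from neighbouring 6-bit values. */
        for (k = 1; k < 4; k++)
        {
            if (temp1[i - (k - 1)] == 64) { m_padnum++; continue; }
            temp1[i - (k - 1)] = temp1[i - (k - 1)] / lmovebit(2, (k - 1) * 2);
            temp2 = temp1[i - k];
            temp2 = temp2 & (lmovebit(2, k * 2) - 1);   /* keep the low 2*k bits */
            temp2 *= lmovebit(2, 8 - (2 * k));          /* move them to the high end */
            temp1[i - (k - 1)] = temp1[i - (k - 1)] + temp2;
            buffer[base64b * 3 + (3 - k)] = temp1[i - (k - 1)];
        }
    }
    return buffer;
}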
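For comparison, here is a more compact way to write a decoder using shifts and a range-based lookup. It is a sketch of my own, not part of the listing above: the names `sextet_value` and `base64_decode` are hypothetical, the input is assumed to be one NUL-terminated string without line breaks, and the range checks assume an ASCII-compatible character set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Returns the 6-bit value (0..63) of a Base64 character, or -1 if the
   character is not in the alphabet. Assumes an ASCII-compatible charset. */
static int sextet_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decodes a NUL-terminated Base64 string. Returns a malloc'ed buffer and
   stores the number of decoded bytes in *out_len, or returns NULL if the
   input is malformed or allocation fails. The caller frees the buffer. */
unsigned char *base64_decode(const char *text, size_t *out_len)
{
    size_t len, pad = 0, i, o = 0;
    unsigned char *out;

    assert(text != NULL && out_len != NULL);
    len = strlen(text);
    if (len == 0 || len % 4 != 0) return NULL;

    /* At most two '=' are allowed, and only at the very end. */
    if (text[len - 1] == '=') {
        pad = 1;
        if (text[len - 2] == '=') pad = 2;
    }

    out = (unsigned char *)malloc(len / 4 * 3);
    if (out == NULL) return NULL;

    for (i = 0; i < len; i += 4) {
        unsigned long bits = 0;
        int j;

        /* Collect four 6-bit values into one 24-bit number. */
        for (j = 0; j < 4; j++) {
            char c = text[i + j];
            int v = (c == '=' && i + j >= len - pad) ? 0 : sextet_value(c);
            if (v < 0) { free(out); return NULL; }
            bits = (bits << 6) | (unsigned long)v;
        }
        /* Split the 24 bits back into three bytes. */
        out[o++] = (unsigned char)((bits >> 16) & 0xFF);
        out[o++] = (unsigned char)((bits >> 8) & 0xFF);
        out[o++] = (unsigned char)(bits & 0xFF);
    }

    *out_len = o - pad;   /* bytes that came only from padding are not data */
    return out;
}
```

Decoding "1cUz" with this sketch gives the three bytes D5 C5 33, and decoding "1cU=" gives the two bytes D5 C5. A "=" anywhere other than the end makes the function return NULL.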

With this algorithm, the body of the email at the start of the article decodes to:
Hello, Snaix.

This is a Base64 test mail!

Best wishes!
Esx?!
[email protected]
2003-12-25

If you find a problem in the article, please point it out and contact me: [email protected]

Main references:
RFC2045
RFC2046
The wonderful Base64 code, Horizont
And some other information from the Internet.

A .doc version of this article can also be downloaded from:
http://popscanner.icpcn.com/download/base64.doc
