The Base64 thing.

Source: Internet
Author: User

I. What is Base64?

Base64 is an encoding format. It regroups a stream of information (a byte stream) according to a fixed specification, producing an encoded form that looks completely unrelated to the original content.

PS. This definition is my own summary. I think a definition only needs to be concise, correct and clearly expressed; there is no need to stick rigidly to a word-for-word wording. What matters is really understanding the principle. (In truth, this is because I don't know what the standard definition is ...)

II. The origin of Base64

Anyone who knows a little about how computers store information knows that inside a computer everything is stored in binary. Working directly with individual binary digits is obviously inconvenient, and when quantities get too large we tend to introduce a bigger counting unit (interested readers can refer to the book "From 1 to Infinity"). So in the computer world (anti-theft notice: this article was first published at http://www.cnblogs.com/jilodream/) there is such a thing as the byte. One byte consists of 8 bits. This lets us measure any piece of information by how many bytes it occupies, as shown in the following table:

As everyone knows, there are 52 upper-case and lower-case English letters in total. Add the digits, special symbols and so on, and a single byte is still more than enough to represent any one character (the ASCII table is used here).

For non-English-speaking countries, however, this is a bit of a headache. Translating every message into English before sending it and back into the native language afterwards would obviously be far too costly. How can this be solved? By using multiple bytes to represent one character. That is how UTF-8, GBK, Unicode and other encodings came into being (interested readers can look up how they differ). These encodings basically solve the problem of storing and transmitting the world's known languages in computers.

But then another problem arises: sometimes the devices we use don't support these complex international encodings at all.

For example, I sent an email to my boss from home, saying that I had been feeling unwell lately and would like to take a day off to go and see the world outside.

Here's what I looked like:

My boss opened the email, found a screen full of garbled text, and then found that I had not come in that day.

My boss's expression was this:

When I came back from my break in a great mood, my boss smiled and asked me where I had been for the past two days.

Here's what I looked like:

Then my boss said to me: the phone couldn't be reached, the email was garbled, and the person was missing.

Here's what I looked like:

I explained what had happened: blah blah blah.

My expression was this: "That's how it is. You may not believe it, but it really was the trash can that made the first move."

After listening to my explanation, my boss's expression was this:

So why did a message that was fine upstream (me) become unintelligible by the time it got downstream (my boss)?

This is for historical reasons. Early email was only allowed to transmit ASCII characters (I don't know whether this has been improved since), that is, only the low 7 bits of each byte, which gives 128 possible values. So when I send a message containing international characters, some bytes inevitably have their highest bit set to 1. For the sake of robustness and fault tolerance, some network devices reset that highest bit to 0, so what was originally a sequence of non-ASCII bytes is forcibly turned into ASCII codes. By the time the recipient's mail system takes the received data and tries to interpret it as non-ASCII text again, it is too late. (I remember that years ago, when teachers from my university went to study in Europe, in Germany, emails to them could only be written in pinyin.)
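The damage described above can be reproduced in a few lines. This is only a sketch: `strip_high_bit` is a hypothetical helper that imitates a 7-bit device, and the sample text is an assumption chosen for illustration.

```python
import base64

def strip_high_bit(data: bytes) -> bytes:
    """Imitate a 7-bit-only device: force the highest bit of every byte to 0."""
    return bytes(b & 0x7F for b in data)

original = "你好".encode("utf-8")      # six bytes, all with the highest bit set
damaged = strip_high_bit(original)     # now plain ASCII, but meaningless
print(original.hex(), "->", damaged)

# Base64 output only uses ASCII characters, so the same device leaves it intact.
encoded = base64.b64encode(original)
survived = strip_high_bit(encoded)
print(base64.b64decode(survived).decode("utf-8"))
```

The raw UTF-8 bytes cannot be recovered after the high bits are lost, while the Base64 text passes through unchanged and decodes back to the original string.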

In fact, apart from gateways, routers and other hardware devices can show the same lack of support. (For the non-printable values 128 to 255, different routing devices handle what they receive differently, which is why information is so often transcoded before sending rather than transmitted directly as raw single bytes; see the answer by Guo Wuxin at http://www.zhihu.com/question/36306744/answer/71626823.) Besides the mail system, many other systems also fail to support certain special characters, leaving the sender and the receiver with inconsistent information.

Here is a simple example. When we chat over the Internet, the data is first sent to a server. Suppose the server finds a special character in the message that the server itself does not support. It then substitutes a default character from its font. When the receiving party gets the message, the special character is not displayed; some default character such as '◇' or '-' appears instead. (PS: as I recall, QQ Games does not support the special character '.'.)

Base64 was born to solve this kind of problem.

III. How does Base64 handle string encoding?

Let's go straight to the point and explain Base64 encoding.

First, be clear that all information exists in the form of bytes. Taking the length in bytes modulo 3 then gives three cases:

(1) The length divided by 3 leaves a remainder of 0: len % 3 = 0;

(2) The length divided by 3 leaves a remainder of 1: len % 3 = 1;

(3) The length divided by 3 leaves a remainder of 2: len % 3 = 2;
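The three cases can be told apart with a single division. A minimal sketch; the function name `classify` and its return shape are assumptions made for illustration.

```python
def classify(data: bytes):
    """Return (case_number, full_groups, leftover_bytes) for the three cases above."""
    full_groups, leftover = divmod(len(data), 3)
    case_number = leftover + 1  # remainder 0 -> case (1), 1 -> case (2), 2 -> case (3)
    return case_number, full_groups, leftover

for sample in (b"abc", b"abcd", b"abcde"):
    print(sample, classify(sample))
```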

Case (1):

We take every 3 bytes as one group and split each group evenly into 4 units. Each unit then has 6 bits.

3 * 8 bits = 24 bits = 4 * 6 bits

Then add 00 to the front of each unit, which makes each unit 8 bits long, i.e. one byte.

As the table shows, only 6 bits of each resulting byte actually carry information. 2^6 = 8 * 8 = 64, so each byte can take 64 distinct values. This is where the name Base64 comes from.
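The regrouping of 3 bytes into 4 six-bit units can be sketched with shifts and masks. The helper name `split_group` and the sample input `b"Man"` are assumptions for illustration.

```python
def split_group(group: bytes):
    """Split exactly 3 bytes (24 bits) into four 6-bit values."""
    if len(group) != 3:
        raise ValueError("expected exactly 3 bytes")
    n = int.from_bytes(group, "big")            # the 24 bits as one integer
    return [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]

units = split_group(b"Man")
print(units)
# Printing each unit as 8 binary digits shows the leading 00 described above.
print([format(u, "08b") for u in units])
```

Every unit is below 64, so the top two bits of each resulting byte are always 0.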

Case (2):

We take every 3 bytes as one group and split each group evenly into 4 units. The 1 byte left over at the end is split in the form 6 + 2, that is, 8 bits = 6 bits + 2 bits, as shown in the following table:

There are two scenarios involved here.

1) For a unit with fewer than 6 bits: pad it with 0s at the end until it has 6 bits. Here that applies to the last 2 remaining bits.

2) For units that receive no bits at all, use "=" directly in that position. Here 2 "=" are appended at the end.

As in case (1), each unit is then prefixed with '00'.
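For a single leftover byte, the 6 + 2 split looks like this. A sketch only; `split_one_byte` is a hypothetical helper and the byte 0x41 (the letter "A") is an arbitrary sample.

```python
def split_one_byte(b: int):
    """Split one leftover byte into two 6-bit units (the 6 + 2 form)."""
    first = b >> 2               # the top 6 bits
    second = (b & 0b11) << 4     # the last 2 bits, padded with four 0 bits at the end
    return [first, second]       # the two unassigned positions become "="

units = split_one_byte(0x41)
print(units, [format(u, "08b") for u in units], "+ two '='")
```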

Case (3):

We take every 3 bytes as one group and split each group evenly into 4 units. The 2 bytes left over at the end are split in the form 6 + 6 + 4, that is, 16 bits = 2 * 6 bits + 4 bits, as shown in the following table:

This is similar to case (2), and involves the same two scenarios.

1) For a unit with fewer than 6 bits: pad it with 0s at the end until it has 6 bits. Here that applies to the last 4 remaining bits.

2) For units that receive no bits at all, use "=" directly in that position. Here 1 "=" is appended at the end.
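For two leftover bytes, the 6 + 6 + 4 split can be sketched the same way. `split_two_bytes` is a hypothetical helper and the bytes 0x41, 0x42 ("AB") are an arbitrary sample.

```python
def split_two_bytes(b1: int, b2: int):
    """Split two leftover bytes (16 bits) into three 6-bit units (the 6 + 6 + 4 form)."""
    n = (b1 << 8) | b2
    first = n >> 10              # bits 15..10
    second = (n >> 4) & 0x3F     # bits 9..4
    third = (n & 0xF) << 2       # the last 4 bits, padded with two 0 bits at the end
    return [first, second, third]  # the one unassigned position becomes "="

units = split_two_bytes(0x41, 0x42)
print(units, "+ one '='")
```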

Thus, whatever the length of the information, it can be regrouped into a new byte (character) stream by this rule (the Base64 encoding specification).

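Base64 also has a code conversion table, which maps each 6-bit value (0 to 63) to a printable character:

Value 0 to 25: A to Z
Value 26 to 51: a to z
Value 52 to 61: 0 to 9
Value 62: +
Value 63: /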

Thus, any byte stream at all can be represented using the characters in the table plus '='. This is the Base64 encoding process. With Base64 encoding, we can solve the problems raised earlier in this article.
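Putting the three cases and the conversion table together gives a complete encoder. The following is a minimal sketch, not a production implementation; the name `b64_encode` is an assumption, the alphabet is the standard one, and Python's built-in `base64` module is used only to cross-check the result.

```python
import base64

# Standard alphabet: index 0..63 maps to one printable character.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64_encode(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)                      # 0, 1 or 2 bytes short
        n = int.from_bytes(chunk + b"\x00" * missing, "big")
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        if missing:
            # Units that received no real bits are written as "=".
            chars[4 - missing:] = ["="] * missing
        out.append("".join(chars))
    return "".join(out)

for sample in (b"Man", b"A", b"AB"):
    print(sample, b64_encode(sample), base64.b64encode(sample).decode("ascii"))
```

Padding the short final chunk with zero bytes has the same effect as padding the last unit with 0 bits, which keeps the loop body identical for all three cases.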

Base64 decoding works on the same principle, simply in reverse, so I won't go through it again here.
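For readers who want to see the reverse direction anyway, here is a rough sketch. It assumes well-formed input whose length is a multiple of 4 with "=" only at the end; the name `b64_decode` is hypothetical.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}   # character -> 6-bit value

def b64_decode(text: str) -> bytes:
    out = bytearray()
    for i in range(0, len(text), 4):
        group = text[i:i + 4]
        pad = group.count("=")                     # 0, 1 or 2 padding characters
        n = 0
        for ch in group:
            n = (n << 6) | (0 if ch == "=" else INDEX[ch])
        # 4 units of 6 bits give 3 bytes; drop the bytes that were only padding.
        out += n.to_bytes(3, "big")[:3 - pad]
    return bytes(out)

print(b64_decode("TWFu"), b64_decode("QQ=="), b64_decode("QUI="))
```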

IV. Applications of Base64 encoding

(1) Data transmission: making sure, as far as possible, that data travels across the network intact and is not corrupted by hardware or software incompatibilities.

(2) Simple "encryption" of data, such as URLs or address fields (for example Baidu Cloud addresses and peer-to-peer links). Plain Base64 processing is enough to stop the naked eye from directly spotting how the information was processed, while the result can still be transmitted effectively over the Internet. Note that Base64 uses no key and anyone can reverse it, so this is obfuscation rather than real encryption.
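Both points can be tried with Python's standard `base64` module. This is an illustrative sketch: the link below is a made-up placeholder, and the URL-safe variant (which swaps "+" and "/" for "-" and "_") is something the text above does not mention.

```python
import base64

raw = b"\xfb\xff"
standard = base64.b64encode(raw)            # may contain "+" and "/"
url_safe = base64.urlsafe_b64encode(raw)    # uses "-" and "_" instead
print(standard, url_safe)

link = "http://example.com/some/file"       # hypothetical address, for illustration only
hidden = base64.urlsafe_b64encode(link.encode("utf-8")).decode("ascii")
print(hidden)                               # unreadable at a glance ...

recovered = base64.urlsafe_b64decode(hidden).decode("utf-8")
print(recovered)                            # ... but trivially reversible, with no key
```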
