Encoding principles: Base64 encoding

Source: Internet
Author: User


Base64 is an encoding method typically used to represent binary data as printable text. The encoding is reversible. The encoded data is a string drawn from 64 characters: A-Z, a-z, 0-9, + and /, that is, 26 + 26 + 10 + 1 + 1 = 64.

Note: strictly speaking there are 65 characters, because "=" is used as the padding character.

Developers are certainly familiar with Base64, but not all of them have a clear understanding of how it works. In fact Base64 could hardly be simpler, so there is no reason for it to remain half-understood. Take a few minutes to understand it thoroughly. The original article came with a Base64 encoder/decoder attached so that readers could experiment while reading.

  I. Origin of Base64 encoding

Why does Base64 exist? Because some network transport channels cannot carry every possible byte value. Traditional email, for example, only supports printable characters, so bytes such as the ASCII control characters cannot be sent by mail. This is a serious limitation, because the bytes of an arbitrary binary stream are not all printable characters. The best solution is an extension that lets binary files be delivered without changing the existing protocol: if non-printable bytes can be represented by printable characters, the problem is solved. That is how Base64 came about. It is a way of representing binary data using 64 printable characters.

  II. How Base64 encoding works

Start with the Base64 index table. It contains 64 printable characters, A-Z, a-z, 0-9, + and /, numbered in that order: A-Z are 0-25, a-z are 26-51, 0-9 are 52-61, + is 62 and / is 63. The numeric value is the index of the character. This mapping is fixed by the Base64 standard and cannot be changed. 64 values fit in exactly 6 bits (2^6 = 64), but a byte has 8 bits, so when each Base64 character occupies one byte the remaining two bits are wasted and some space has to be sacrificed. The point to grasp here is that a Base64 character takes up 8 bits, but its index value only uses the 6 low-order (rightmost) bits; the two high-order (leftmost) bits are always 0.
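The index table described above can be rebuilt in a few lines of Python. This is only a sketch for experimenting; the names BASE64_ALPHABET and index_table are made up for this illustration and are not part of any library.

```python
import string

# Standard Base64 alphabet: A-Z (0-25), a-z (26-51), 0-9 (52-61), '+' (62), '/' (63).
BASE64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def index_table():
    """Return one (index, 6-bit binary string, character) row per alphabet entry."""
    return [(i, format(i, "06b"), ch) for i, ch in enumerate(BASE64_ALPHABET)]


if __name__ == "__main__":
    # Print the table: every index fits in 6 bits.
    for i, bits, ch in index_table():
        print(f"{i:2d}  {bits}  {ch}")
```

Every row shows an index that fits in 6 bits, which is why 64 characters are enough and why two bits of each output byte carry no information.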

So how can characters with only 6 effective bits represent ordinary 8-bit bytes? The least common multiple of 8 and 6 is 24, so 3 ordinary bytes can be represented by 4 Base64 characters with the same number of significant bits (3 x 8 = 4 x 6 = 24). Because each Base64 character carries only 6 effective bits, the encoded output is 1/3 larger than the input. You could also represent one ordinary byte with two Base64 characters, but grouping by the least common multiple wastes the least space.

An example makes this easier to follow. "Man" is three characters, 24 bits in total (01001101 01100001 01101110), so it takes 4 Base64 characters to hold those 24 bits. Split the bits into four 6-bit groups (010011 010110 000101 101110), convert each group to its index value (19, 22, 5, 46), and look each index up in the Base64 table: the Base64 string for "Man" is "TWFu". As you may have noticed, the smallest unit of conversion is three bytes: a string is converted three bytes at a time, and each group of three bytes becomes four Base64 characters. That is essentially all there is to it.
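The three-bytes-to-four-characters step can be sketched in Python as follows. The helper name encode_group is hypothetical and only handles a complete 3-byte group; it is meant to mirror the bit-splitting described above, not to replace the standard library.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_group(three_bytes: bytes) -> str:
    """Encode exactly 3 bytes (24 bits) as 4 Base64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Pack the three bytes into one 24-bit integer.
    n = int.from_bytes(three_bytes, "big")
    # Slice out four 6-bit indices, most significant group first.
    indices = [(n >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    # Look each index up in the Base64 table.
    return "".join(ALPHABET[i] for i in indices)


print(encode_group(b"Man"))  # TWFu
```

The shifts 18, 12, 6 and 0 pick out the four 6-bit groups in order, so the indices for "Man" come out as 19, 22, 5 and 46.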

But what if fewer than three bytes remain at the end? The same idea still applies: two Base64 characters can represent one byte, and three Base64 characters can represent two bytes. Take "A" (01000001): the first Base64 character takes the first six bits (010000), and the second gets only the remaining two bits (01), which are followed by four 0 bits (010000). So the Base64 characters for "A" are "QQ". As already said, Base64 output comes in groups of four characters, so when there are only two characters, two "=" are appended, giving "QQ==". In fact decoding works fine without the "="; the likely reason for using it is so that several Base64 strings joined together do not become ambiguous. It follows that a Base64 string can only end with one or two "=", and "=" can never appear in the middle. The two characters "BC" are encoded in the same way: their 16 bits fill two Base64 characters plus four bits of a third, which is completed with two 0 bits, and a single "=" is appended, giving "QkM=".
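Here is a minimal sketch of a complete encoder with padding, plus a helper showing that the "=" can be restored before decoding. The function names b64_encode and decode_unpadded are made up for this example; base64.b64decode is the standard library function, which by default expects correctly padded input.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_encode(data: bytes) -> str:
    """Encode bytes as Base64, padding the last group with '='."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)  # 0, 1 or 2 bytes short of a full group
        # Fill the missing bytes with zero bits so we always have 24 bits.
        n = int.from_bytes(chunk + b"\x00" * missing, "big")
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Characters built purely from filler bits are replaced by '='.
        chars = chars[:4 - missing] + ["="] * missing
        out.append("".join(chars))
    return "".join(out)


def decode_unpadded(text: str) -> bytes:
    """Restore the stripped '=' padding, then decode with the standard library."""
    padded = text + "=" * (-len(text) % 4)
    return base64.b64decode(padded)


print(b64_encode(b"A"))        # QQ==
print(b64_encode(b"BC"))       # QkM=
print(decode_unpadded("QQ"))   # b'A'
```

Because the number of missing "=" can be worked out from the length of the string alone, the padding carries no extra information for a single string.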

  III. Summary

Calling Base64 an "encoding" can seem a little odd, because with most encodings, converting characters to binary is called encoding and converting binary back to characters is called decoding. Base64 is exactly the reverse: going from binary to characters is called encoding, and going from characters to binary is called decoding.
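The two directions can be seen side by side in Python, where str.encode turns characters into bytes while base64.b64encode turns bytes into printable ASCII. This is just an illustration using the standard library; the variable names are arbitrary.

```python
import base64

text = "Man"
raw = text.encode("ascii")       # character encoding: characters -> bytes
b64 = base64.b64encode(raw)      # Base64 encoding: bytes -> printable ASCII
back = base64.b64decode(b64)     # Base64 decoding: printable ASCII -> bytes

print(b64)                       # b'TWFu'
print(back.decode("ascii"))      # Man
```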

Base64 is mainly used for transmitting, storing and representing binary data. It is also sometimes used as a form of "encryption", but a very weak one: it only stops someone from recognizing the content at a glance, and anyone can decode it. You can also customize the Base64 character sequence to make the output harder to recognize.
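One way to customize the character sequence is to translate the standard output into a different ordering of the same 64 characters. In this sketch the CUSTOM alphabet (the standard one rotated by 13 positions) and the helper names are invented for illustration. This is obfuscation only and offers no real security.

```python
import base64

STANDARD = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
# Hypothetical custom alphabet: the standard one rotated by 13 positions.
CUSTOM = STANDARD[13:] + STANDARD[:13]

_TO_CUSTOM = bytes.maketrans(STANDARD, CUSTOM)
_FROM_CUSTOM = bytes.maketrans(CUSTOM, STANDARD)


def encode_custom(data: bytes) -> bytes:
    """Standard Base64, then swap each character for its custom counterpart."""
    return base64.b64encode(data).translate(_TO_CUSTOM)


def decode_custom(token: bytes) -> bytes:
    """Map custom characters back to the standard alphabet, then decode."""
    return base64.b64decode(token.translate(_FROM_CUSTOM))


secret = encode_custom(b"Man")
print(secret)                  # differs from the standard b'TWFu'
print(decode_custom(secret))   # b'Man'
```

The "=" padding character is not part of either alphabet, so it passes through the translation unchanged.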

Base64 encoding goes from binary to characters, so the result depends on how the text was turned into binary in the first place. The same Chinese characters produce different bytes under different character encodings, and therefore different Base64 strings. For example, "上网" ("go online") is "5LiK572R" in Base64 when encoded as UTF-8, and "yc/N+A==" when encoded as GB2312.
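The dependence on the character encoding is easy to check in Python. This sketch assumes the example word is "上网" (U+4E0A U+7F51), which is what the two Base64 strings in the example decode to; the variable names are arbitrary.

```python
import base64

word = "\u4e0a\u7f51"  # the two Chinese characters "上网" (assumed example word)

# Same characters, two different byte sequences, two different Base64 strings.
utf8_b64 = base64.b64encode(word.encode("utf-8")).decode("ascii")
gb_b64 = base64.b64encode(word.encode("gb2312")).decode("ascii")

print(utf8_b64)  # 5LiK572R
print(gb_b64)    # yc/N+A==
```

UTF-8 uses three bytes per character here (six bytes, no padding needed), while GB2312 uses two (four bytes, so the last group is padded with "==").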
