The following is reproduced from: http://blog.csdn.net/wodeyuer125/article/details/45150223
Most developers have come across Base64 encoding, but not all of them understand it clearly. Base64 is about as simple as an encoding can get, so there is no good reason to leave it half understood. A few minutes is enough to understand it thoroughly. The original post comes with a Base64 encoder/decoder, so you can experiment while you read.
I. Origin of Base64 encoding
Why does Base64 exist? Some transport channels cannot carry arbitrary bytes. Traditional e-mail, for example, only supports printable characters, so ASCII control characters cannot be sent through it. That is a severe limitation: the bytes of a binary stream such as an image are not all printable characters, so the image cannot be sent as is. The best solution is an extension that supports binary files without changing the existing protocol. If non-printable bytes can be represented with printable characters, the problem is solved. This is where Base64 comes from: it is a way of representing binary data using 64 printable characters.
II. How Base64 encoding works
Start with the Base64 index table. The 64 printable characters chosen are A-Z, a-z, 0-9, "+" and "/". Each character has a numeric index from 0 to 63; this mapping is fixed by the Base64 standard and cannot be changed. Six bits are enough to represent all 64 values, but a byte has 8 bits, so two bits of every output byte are wasted and some space has to be sacrificed. The point to keep in mind is that a Base64 character still occupies 8 bits, but only carries 6 bits of data: in the index value, the low 6 bits are significant and the top 2 bits are always 0.
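As a small illustration of the index table, the sketch below builds the standard alphabet in Python and prints a few indexes in binary. The names `ALPHABET` and `index_of` are chosen for this example only.

```python
import string

# Standard Base64 alphabet: 0-25 = A-Z, 26-51 = a-z, 52-61 = 0-9, 62 = '+', 63 = '/'
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def index_of(ch: str) -> int:
    """Return the 6-bit index of a Base64 character (hypothetical helper)."""
    return ALPHABET.index(ch)


for ch in "AZaz09+/":
    i = index_of(ch)
    # Printed as 8 bits: the two leftmost bits are always 0.
    print(f"{ch!r}: index {i:2d} = {i:08b}")
```

Every index is below 64, which is why the two leading bits in the 8-bit view are always zero.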
So how can characters with 6 significant bits represent ordinary 8-bit bytes? The least common multiple of 8 and 6 is 24, so 3 ordinary bytes can be represented by exactly 4 Base64 characters with the same number of significant bits. Because each Base64 character carries only 6 bits, the encoded output is one third larger than the input. Two Base64 characters (12 bits) could also represent a single byte, but grouping by the least common multiple wastes the least space.
A worked example makes this easier to follow. "Man" is three characters, 24 significant bits in total, so it takes 4 Base64 characters to hold them. The bytes are 01001101 01100001 01101110; regrouped into 6-bit units they become 010011 010110 000101 101110, that is, the indexes 19, 22, 5 and 46. Looking each index up in the Base64 table gives "TWFu", the Base64 encoding of "Man". As you may have noticed, the smallest unit of conversion is three bytes: a byte string is processed three bytes at a time, and each group of three bytes becomes four Base64 characters. That is essentially all there is to it.
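The 3-bytes-to-4-characters step can be sketched in a few lines of Python. This is a minimal illustration that handles exactly one full group and no padding; `encode_group` is a name invented for this example.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_group(three_bytes: bytes) -> str:
    """Encode exactly 3 bytes (24 bits) as 4 Base64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("this sketch only handles a full 3-byte group")
    # Join the three bytes into one 24-bit integer.
    n = int.from_bytes(three_bytes, "big")
    # Cut the 24 bits into four 6-bit indexes, most significant first.
    indexes = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indexes)


print(encode_group(b"Man"))  # TWFu
```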
But what happens if fewer than three bytes are left at the end of the input? This is where the earlier observation pays off: two Base64 characters can represent one byte, and three Base64 characters can represent two bytes. Take "A" (01000001): the first Base64 character takes the first six bits, the second one gets only the two remaining bits, and the four missing bits are filled with 0. The groups are 010000 and 010000, so "A" becomes "QQ". Since Base64 output always comes in groups of four characters, the two real characters are followed by two "=" signs, giving "QQ==". In principle a decoder can work without the "="; the padding was probably introduced so that several Base64 strings joined together cannot be confused. It also follows that "=" can only appear at the end of a Base64 string, once or twice, and never in the middle. The two-character input "BC" is handled the same way: it needs three Base64 characters plus one "=", giving "QkM=".
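Putting the grouping and padding rules together, a complete encoder might look like the sketch below. The function name `b64_encode` is made up for this example; the result is compared against Python's standard `base64.b64encode` as a sanity check.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def b64_encode(data: bytes) -> str:
    """Encode bytes as Base64 with '=' padding (illustrative sketch)."""
    out = []
    for start in range(0, len(data), 3):
        chunk = data[start:start + 3]
        missing = 3 - len(chunk)  # 0, 1 or 2 bytes short of a full group
        # Fill the missing bytes with zero bits so there are always 24 bits.
        n = int.from_bytes(chunk + b"\x00" * missing, "big")
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        if missing:
            # Characters built only from filler bits are replaced by '='.
            chars[-missing:] = ["="] * missing
        out.extend(chars)
    return "".join(out)


for sample in (b"A", b"BC", b"Man"):
    mine = b64_encode(sample)
    reference = base64.b64encode(sample).decode("ascii")
    print(sample, mine, mine == reference)
```

One missing byte produces a single "=", two missing bytes produce "==", so padding can only ever appear at the end of the output.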
III. Summary
The terminology of Base64 can seem a little odd. With most encodings, converting characters to binary is called encoding, and converting binary back to characters is called decoding. Base64 is exactly the other way round: binary to characters is encoding, and characters to binary is decoding.
Base64 is mainly used for transmitting, storing and representing binary data. It is sometimes also used as a kind of "encryption", but a very weak one: it only keeps the content from being recognized at a glance, and anyone can decode it. You can also use a custom ordering of the Base64 character table to make the output harder to read.
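The custom character table idea can be sketched as follows. This is only an illustration: the shuffled alphabet, the seed value and the helper names `encode_custom` and `decode_custom` are all invented here, and the result is obfuscation, not real encryption.

```python
import base64
import random
import string

STANDARD = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# Hypothetical custom table: the standard characters in a shuffled order.
# The seed is arbitrary; both sides must use the same table.
rng = random.Random(2015)
CUSTOM = "".join(rng.sample(list(STANDARD), len(STANDARD)))

TO_CUSTOM = str.maketrans(STANDARD, CUSTOM)
TO_STANDARD = str.maketrans(CUSTOM, STANDARD)


def encode_custom(data: bytes) -> str:
    """Standard Base64, then swap each character for its custom counterpart."""
    return base64.b64encode(data).decode("ascii").translate(TO_CUSTOM)


def decode_custom(text: str) -> bytes:
    """Map custom characters back to the standard table, then decode."""
    return base64.b64decode(text.translate(TO_STANDARD))


secret = encode_custom(b"Man")
print(secret, decode_custom(secret))
```

The "=" padding is not part of the table, so it passes through unchanged.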
Base64 encoding turns binary into characters, so its result depends on the bytes it is given. The same Chinese text produces different bytes under different character encodings, and therefore different Base64 strings. For example, the Chinese word 上网 (rendered "Internet" in translation) is "5LiK572R" in Base64 when the text is UTF-8, and "yc/N+A==" when it is GB2312.
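The dependence on the character encoding is easy to check in Python. The sample text 上网 used below is an assumption: it is what the article's UTF-8 example decodes to once the letter case of the Base64 string is restored ("5LiK572R"). `b64_of` is a helper name made up for this sketch.

```python
import base64


def b64_of(text: str, codec: str) -> str:
    """Encode text to bytes with the given codec, then Base64 the bytes."""
    raw = text.encode(codec)
    return base64.b64encode(raw).decode("ascii")


text = "上网"  # assumed sample text, see the note above
for codec in ("utf-8", "gb2312"):
    # Same characters, different bytes, different Base64 output.
    print(codec, text.encode(codec).hex(), b64_of(text, codec))
```

UTF-8 uses three bytes per character here and GB2312 uses two, so the two Base64 results differ in both content and padding.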