An Efficient Base64 Encoding/Decoding Algorithm

Source: Internet
Author: User

What is Base64?

RFC 2045 defines Base64 as the Base64 Content-Transfer-Encoding. In its words: "The Base64 Content-Transfer-Encoding is designed to represent arbitrary sequences of octets in a form that need not be humanly readable."

Why use Base64?

I think the designers of this encoding mainly considered three questions:
1. Should the content be encrypted?
2. How complex and how efficient should the encryption algorithm be?
3. How should transmission be handled?

Some encryption is wanted, but its purpose is not to let users send highly secure mail. It only "guards against the gentleman, not the villain": the content cannot be read at a glance, and that is all.
For this purpose the algorithm must be neither too complex nor too slow. As with the previous point, MIME and the other protocols used for mail solve the problem of how to send and receive mail, not how to send and receive it securely. The algorithm should therefore be simple and efficient. Otherwise sending mail would consume a large share of resources, which would defeat the purpose.

If those two points were the only concern, the simplest Caesar cipher would do. Why does Base64 look more complex than Caesar? Because, for historical reasons, mail in transit may only carry ASCII characters, that is, the low 7 bits of each 8-bit byte. If you send mail containing a non-ASCII byte (one whose highest bit is 1), it may run into trouble at a gateway with this historical limitation: the gateway may set the highest bit to 0, and the data is corrupted. To deliver mail reliably, this has to be taken into account, so schemes like Caesar, which only shift letters around, will not work. For more information, see RFC 2046.
Base64 encoding came about for these reasons.
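To make the gateway problem concrete, here is a minimal C++ sketch. It assumes a hypothetical gateway that clears the highest bit of every byte; the names strip_high_bit and survives_gateway are made up for this illustration and do not come from any mail software.

```cpp
#include <cstdint>

// Hypothetical 7-bit gateway (an assumption for illustration):
// it clears the highest bit of a byte and passes the rest through.
std::uint8_t strip_high_bit(std::uint8_t byte) {
    return static_cast<std::uint8_t>(byte & 0x7F);
}

// True if the byte would pass through such a gateway unchanged.
bool survives_gateway(std::uint8_t byte) {
    return strip_high_bit(byte) == byte;
}
```

With this model, strip_high_bit(0xD5) gives 0x55, the letter 'U', so the original byte is lost. Characters such as 'A', 'z', '9', '+', '/' and '=' all have their highest bit clear and pass through unchanged.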

Algorithm details

Base64 encoding converts every three 8-bit bytes (3*8 = 24 bits) into four 6-bit groups (4*6 = 24 bits), then prepends two zero bits to each 6-bit group so that it forms an 8-bit byte.
The conversion looks like this:
String "张3" (GB2312 bytes D5 C5 33)
11010101 11000101 00110011

00110101 00011100 00010100 00110011
Table 1

Think of it this way: first concatenate the 8-bit bytes into one bit string, 110101011100010100110011.
Then take the first six bits and prepend two zero bits to form a new byte. Take the next six bits, prepend two zeros again, and so on until all 24 bits have been used.
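The regrouping just described can be sketched in C++ as follows. This is only an illustration of the step, not the implementation given at the end of the article; the function name split_into_sextets is an assumption made for this example.

```cpp
#include <array>
#include <cstdint>

// Splits three 8-bit bytes into four 6-bit values (each in 0-63).
// Storing each value in its own byte supplies the two leading zero bits.
std::array<std::uint8_t, 4> split_into_sextets(std::uint8_t b0,
                                               std::uint8_t b1,
                                               std::uint8_t b2) {
    // Concatenate the three bytes into one 24-bit number.
    std::uint32_t bits = (static_cast<std::uint32_t>(b0) << 16) |
                         (static_cast<std::uint32_t>(b1) << 8) |
                         static_cast<std::uint32_t>(b2);
    // Take six bits at a time, starting from the most significant end.
    return {{
        static_cast<std::uint8_t>((bits >> 18) & 0x3F),
        static_cast<std::uint8_t>((bits >> 12) & 0x3F),
        static_cast<std::uint8_t>((bits >> 6) & 0x3F),
        static_cast<std::uint8_t>(bits & 0x3F),
    }};
}
```

For the bytes D5 C5 33 this yields the values 53, 28, 20 and 51.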
Let's take a look at the actual results:

String "张3"
11010101 (hex D5)   11000101 (hex C5)   00110011 (hex 33)

00110101     00011100     00010100     00110011
Char '5'     Char ^\      Char ^T      Char '3'
Decimal 53   Decimal 28   Decimal 20   Decimal 51
Table 2

So is the string "张3" represented in Base64 by the four characters '5', ^\, ^T and '3'? No.
Base64 does not simply output the converted bytes as they are. Characters such as ^\ are control characters: they cannot be displayed and cannot be used in some contexts. Base64 has its own encoding table:

The Base64 alphabet (Table 1 in RFC 2045)
Value Encoding   Value Encoding   Value Encoding   Value Encoding
    0 A             17 R             34 i             51 z
    1 B             18 S             35 j             52 0
    2 C             19 T             36 k             53 1
    3 D             20 U             37 l             54 2
    4 E             21 V             38 m             55 3
    5 F             22 W             39 n             56 4
    6 G             23 X             40 o             57 5
    7 H             24 Y             41 p             58 6
    8 I             25 Z             42 q             59 7
    9 J             26 a             43 r             60 8
   10 K             27 b             44 s             61 9
   11 L             28 c             45 t             62 +
   12 M             29 d             46 u             63 /
   13 N             30 e             47 v
   14 O             31 f             48 w          (pad) =
   15 P             32 g             49 x
   16 Q             33 h             50 y
Table 3
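Because the alphabet consists of a few contiguous runs (A-Z, a-z, 0-9, then '+' and '/'), the lookup can also be written without a stored table. The following C++ sketch shows one way to do it, assuming an ASCII-compatible character set; the name value_to_char and the '?' result for out-of-range input are choices made for this example only.

```cpp
// Maps a 6-bit value (0-63) to its Base64 character by using the
// contiguous ranges of the alphabet. Assumes an ASCII-compatible
// character set. Returns '?' for values above 63 (arbitrary choice).
char value_to_char(unsigned value) {
    if (value < 26) return static_cast<char>('A' + value);         // 0-25
    if (value < 52) return static_cast<char>('a' + (value - 26));  // 26-51
    if (value < 62) return static_cast<char>('0' + (value - 52));  // 52-61
    if (value == 62) return '+';
    if (value == 63) return '/';
    return '?';
}
```

The table has exactly 64 value entries, plus the separate padding character '='.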

The 64 entries are also the origin of the name Base64. The encoded output is not the new bytes themselves (two zero bits followed by six data bits) but the characters that the table assigns to them. For example, the value 0 is written as 'A', whose ASCII code 65 needs seven bits rather than six. In the table, each value is the decimal value of the new byte. We can therefore read the Base64 encoding off Table 2:

String "张3"
11010101 (hex D5)   11000101 (hex C5)   00110011 (hex 33)

00110101     00011100     00010100     00110011
Char '5'     Char ^\      Char ^T      Char '3'
Decimal 53   Decimal 28   Decimal 20   Decimal 51
Char '1'     Char 'c'     Char 'U'     Char 'z'
Table 4

In this way, the string "张3" is encoded as the string "1cUz".
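Putting the regrouping and the table lookup together, one full three-byte group can be encoded as in the C++ sketch below. The function name encode_group is an assumption for this example; the masks and shifts are the same ones used in the implementation at the end of the article.

```cpp
#include <cstdint>
#include <string>

// Encodes exactly three bytes as four Base64 characters.
std::string encode_group(std::uint8_t b0, std::uint8_t b1, std::uint8_t b2) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out += alphabet[b0 >> 2];                          // top 6 bits of b0
    out += alphabet[((b0 & 0x03) << 4) | (b1 >> 4)];   // low 2 of b0, top 4 of b1
    out += alphabet[((b1 & 0x0F) << 2) | (b2 >> 6)];   // low 4 of b1, top 2 of b2
    out += alphabet[b2 & 0x3F];                        // low 6 bits of b2
    return out;
}
```

Calling encode_group(0xD5, 0xC5, 0x33) returns "1cUz", matching Table 4.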
Base64 converts every three bytes into four, so the encoded data is about 1/3 larger (measured in bytes, here and below) than the original. It is "about" because the increase is exactly 1/3 only when the original length is a multiple of 3. What if it is not?
Careful readers may have noticed that the Base64 alphabet ends with a (pad) = entry. That character is there to solve this problem.
When the length is not a multiple of 3, the remainder of length/3 is 1 or 2. During conversion, a final group with fewer than six bits is filled with zero bits on the right, and two zero bits are prepended as usual. The output positions that are still empty after that are filled with "=". For example, if the leftover is the two bytes of "张":

String "张"
11010101 (hex D5)   11000101 (hex C5)

00110101     00011100     00010100
Decimal 53   Decimal 28   Decimal 20   pad
Char '1'     Char 'c'     Char 'U'     Char '='
Table 5

In this way, the last two bytes are encoded as "1cU=".
Similarly, if only one byte is left over, two "=" are appended. These are the only two cases, so a Base64 string ends with at most two "=".
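The two padding cases can be sketched in C++ like this. The names encode_tail and encoded_length are assumptions for this example, and the sketch expects the caller to pass exactly one or two leftover bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Encodes the final 1 or 2 leftover bytes (count must be 1 or 2).
std::string encode_tail(const std::uint8_t* tail, std::size_t count) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::uint8_t b0 = tail[0];
    // A missing second byte contributes only zero bits.
    std::uint8_t b1 = (count == 2) ? tail[1] : 0;
    std::string out;
    out += alphabet[b0 >> 2];
    out += alphabet[((b0 & 0x03) << 4) | (b1 >> 4)];
    // With two bytes, the third character carries the low 4 bits of b1
    // followed by two zero bits; with one byte it is padding.
    out += (count == 2) ? alphabet[(b1 & 0x0F) << 2] : '=';
    out += '=';
    return out;
}

// Length of the Base64 text for n input bytes, padding included.
std::size_t encoded_length(std::size_t n) {
    return (n + 2) / 3 * 4;
}
```

For the two bytes D5 C5 this gives "1cU=", and for the single byte D5 it gives "1Q==". The length helper rounds the input up to whole three-byte groups, so 1, 2 and 3 input bytes all produce 4 characters.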
Decoding Base64 is simply the inverse of encoding. You can work it out yourself; the decoding algorithm is given at the end of the article.
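As a small illustration of the inverse process, the C++ sketch below decodes a single four-character group. It is not the article's implementation; the names char_to_value and decode_group are assumptions, and returning an empty vector for malformed input is a choice made for this example.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Returns the 6-bit value of a Base64 character, or -1 if the character
// is not in the alphabet. Assumes an ASCII-compatible character set.
int char_to_value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Decodes one 4-character group into 1 to 3 bytes.
// Returns an empty vector if the group is malformed.
std::vector<std::uint8_t> decode_group(const std::string& group) {
    if (group.size() != 4) return {};
    int pad = 0;
    if (group[3] == '=') pad = (group[2] == '=') ? 2 : 1;
    std::uint32_t bits = 0;
    for (int i = 0; i < 4; ++i) {
        // Padding positions contribute zero bits.
        int v = (i >= 4 - pad) ? 0 : char_to_value(group[i]);
        if (v < 0) return {};
        bits = (bits << 6) | static_cast<std::uint32_t>(v);
    }
    // Split the 24 bits back into bytes, dropping the padded ones.
    std::vector<std::uint8_t> out;
    out.push_back(static_cast<std::uint8_t>(bits >> 16));
    if (pad < 2) out.push_back(static_cast<std::uint8_t>((bits >> 8) & 0xFF));
    if (pad < 1) out.push_back(static_cast<std::uint8_t>(bits & 0xFF));
    return out;
}
```

Decoding "1cUz" gives back the bytes D5 C5 33, and "1cU=" gives D5 C5.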

References: RFC 2045, RFC 2046, "Fantastic base64 encoding" by Luo Cong, and other material from the Internet.

Algorithm Implementation (independently implemented)

#include <cstring>
#include <iostream>
using namespace std;

// Indexes 0-63 are the Base64 alphabet; index 64 is the padding character '='.
char encodetab[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";
// Maps a character code to its 6-bit value. Sized 256 so that any byte is a valid index.
unsigned char decodetab[256] = {0};

//------------------------------------------------------------------------------
// Fills decodetab. Must be called once before decode().
void Init()
{
    int i = 0;
    int j = 0;
    for (i = 'A'; i <= 'Z'; i++)
        decodetab[i] = j++;
    for (i = 'a'; i <= 'z'; i++)
        decodetab[i] = j++;
    for (i = '0'; i <= '9'; i++)
        decodetab[i] = j++;
    decodetab['+'] = j++;
    decodetab['/'] = j++;
    decodetab['='] = 0;   // padding contributes only zero bits
}

//------------------------------------------------------------------------------
// Encodes buffsize bytes from buff. Returns a NUL-terminated string allocated
// with new[]; the caller must release it with delete[].
char* encode(const void* buff, int buffsize)
{
    const unsigned char* a = (const unsigned char*)buff;
    int alen = buffsize;
    int blen = alen / 3 * 4;
    if (alen % 3)
        blen += 4;
    char* b = new char[blen + 1];
    memset(b, '\0', blen + 1);
    int i = 0;   // declared outside the loop because the tail handling needs it
    int j = 0;
    //-----------------------------------
    // Full 3-byte groups: store the 6-bit values (0-63) in b first.
    // A shift-only form such as (a[i] << 6) >> 2 does not work as written:
    // a[i] is promoted to int, so the left shift discards no bits.
    // Masking before shifting avoids the problem.
    for (i = 0; i <= alen - 3; i += 3)
    {
        b[j]     = a[i] >> 2;
        b[j + 1] = ((a[i] & 0x3) << 4) | (a[i + 1] >> 4);
        b[j + 2] = ((a[i + 1] & 0xf) << 2) | (a[i + 2] >> 6);
        b[j + 3] = a[i + 2] & 0x3f;
        j += 4;
    }

    // Leftover bytes: 64 is the index of '=' in encodetab.
    int rem = alen % 3;
    if (rem == 1)
    {
        b[j]     = a[i] >> 2;
        b[j + 1] = (a[i] & 0x3) << 4;
        b[j + 2] = 64;
        b[j + 3] = 64;
    }
    else if (rem == 2)
    {
        b[j]     = a[i] >> 2;
        b[j + 1] = ((a[i] & 0x3) << 4) | (a[i + 1] >> 4);
        b[j + 2] = (a[i + 1] & 0xf) << 2;
        b[j + 3] = 64;
    }

    //-----------------------------------
    // Replace each 6-bit value with its character from the alphabet.
    for (j = 0; j < blen; j++)
    {
        b[j] = encodetab[(unsigned char)b[j]];
    }
    return b;
}

//------------------------------------------------------------------------------
// Decodes the NUL-terminated Base64 string s. Returns a buffer allocated with
// new[] (release it with delete[]) and stores its length in bufflen.
// Returns NULL if the length of s is zero or not a multiple of 4.
// Characters outside the alphabet are not rejected; they decode as 0.
void* decode(const char* s, int& bufflen)
{
    int chrlen = (int)strlen(s);
    if (chrlen <= 0 || chrlen % 4)
        return NULL;
    bufflen = chrlen / 4 * 3;
    unsigned char* chr = new unsigned char[chrlen];
    unsigned char* buff = new unsigned char[bufflen];

    int i = 0;
    int j = 0;
    // Translate every character to its 6-bit value.
    for (i = 0; i < chrlen; i++)
        chr[i] = decodetab[(unsigned char)s[i]];

    // Reassemble each group of four 6-bit values into three bytes.
    for (i = 0; i < chrlen; i += 4)
    {
        buff[j]     = (chr[i] << 2) | (chr[i + 1] >> 4);
        buff[j + 1] = (chr[i + 1] << 4) | (chr[i + 2] >> 2);
        buff[j + 2] = (chr[i + 2] << 6) | chr[i + 3];
        j += 3;
    }
    delete[] chr;

    // Each trailing '=' means one byte fewer in the output.
    if (s[chrlen - 1] == '=')
        bufflen--;
    if (s[chrlen - 2] == '=')
        bufflen--;
    return buff;
}

//------------------------------------------------------------------------------
int main()
{
    Init();
    const char* c = "This code segment is more efficient, but due to compiler optimization, you must adjust the compiler options in VC to run the code";
    char* d = encode(c, (int)strlen(c));
    cout << d << endl;

    int len = 0;
    unsigned char* org = (unsigned char*)decode(d, len);
    for (int i = 0; i < len; i++)
        cout << (char)org[i];
    cout << endl;

    delete[] d;
    delete[] org;
    return 0;
}

 

 
