1. The origin of Base64
Base64 was originally devised to solve a problem with e-mail transmission. For historical reasons, early e-mail gateways could carry only ASCII characters (binary 00000000-01111111). If a non-ASCII character passed through such a gateway, its bits could be altered, for example 10000001 becoming 00000001. Base64 encoding was created so that non-ASCII data could pass through these gateways intact.
2. Principle
Base64, as the name implies, is an encoding scheme built on 64 characters.
The character mapping table for Base64 is shown below; see RFC 2045 for details.
Value | Encoding | Value | Encoding | Value | Encoding | Value | Encoding
0 | A | 17 | R | 34 | i | 51 | z
1 | B | 18 | S | 35 | j | 52 | 0
2 | C | 19 | T | 36 | k | 53 | 1
3 | D | 20 | U | 37 | l | 54 | 2
4 | E | 21 | V | 38 | m | 55 | 3
5 | F | 22 | W | 39 | n | 56 | 4
6 | G | 23 | X | 40 | o | 57 | 5
7 | H | 24 | Y | 41 | p | 58 | 6
8 | I | 25 | Z | 42 | q | 59 | 7
9 | J | 26 | a | 43 | r | 60 | 8
10 | K | 27 | b | 44 | s | 61 | 9
11 | L | 28 | c | 45 | t | 62 | +
12 | M | 29 | d | 46 | u | 63 | /
13 | N | 30 | e | 47 | v | |
14 | O | 31 | f | 48 | w | (pad) | =
15 | P | 32 | g | 49 | x | |
16 | Q | 33 | h | 50 | y | |
Here, Value is the decimal value and Encoding is the character that represents it.
Because Base64 uses 64 characters, 6 bits are enough to represent one Base64 character (*1<<6 = 64*). A computer's basic processing unit, however, is the byte, so character encodings represent characters in whole bytes. For example, the ASCII encoding of 'A' is 01000001, the UTF-8 encoding of the Chinese character '你' is 11100100 10111101 10100000, and its GBK encoding is 11000100 11100011. Base64 encoding therefore proceeds as follows:
Step 1: Encode the string in some character encoding, such as UTF-8, to obtain a byte array.
Step 2: Split the byte array into groups of three bytes (24 bits).
Step 3: Split each group into 6-bit pieces, padding a final piece shorter than 6 bits with zeros on the low end. This yields N (*2<=N<=4*) 6-bit binary codes per group (referred to below as Base64 encoding units).
Step 4: Convert each Base64 encoding unit to its decimal value and look up the corresponding character in the table above. A group with fewer than 4 encoding units is filled out with '='.
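The four steps can be sketched in plain Java as follows. This is a minimal illustration rather than a production codec; the class name Base64Sketch and its encode method are hypothetical names introduced here, and the caller is assumed to have already performed Step 1 by converting the string to bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of any library mentioned in this article.
class Base64Sketch {
    // The 64 characters of the mapping table, indexed by value 0..63.
    private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    static String encode(byte[] data) {
        StringBuilder out = new StringBuilder();
        // Step 2: walk the byte array in groups of three bytes.
        for (int i = 0; i < data.length; i += 3) {
            int remaining = Math.min(3, data.length - i);
            // Pack the group into 24 bits; bytes missing from a short group stay 0,
            // which supplies the low-end zero padding of Step 3.
            int group = 0;
            for (int j = 0; j < 3; j++) {
                group <<= 8;
                if (j < remaining) {
                    group |= data[i + j] & 0xFF;
                }
            }
            // Step 3: 1 byte gives 2 units, 2 bytes give 3 units, 3 bytes give 4 units.
            int units = remaining + 1;
            for (int k = 0; k < 4; k++) {
                if (k < units) {
                    // Step 4: map each 6-bit value to its character.
                    int value = (group >> (18 - 6 * k)) & 0x3F;
                    out.append(ALPHABET.charAt(value));
                } else {
                    // Fill a short group out to 4 characters with '='.
                    out.append('=');
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Step 1 happens here: the string is turned into bytes using UTF-8.
        System.out.println(encode("Man".getBytes(StandardCharsets.UTF_8))); // TWFu
        System.out.println(encode("Ma".getBytes(StandardCharsets.UTF_8)));  // TWE=
        System.out.println(encode("M".getBytes(StandardCharsets.UTF_8)));   // TQ==
    }
}
```

The three sample inputs cover a full group, a group with one padding character, and a group with two.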
Base64 encoding process example
Step | Result
Raw string | 我x
UTF-8 encoding, grouped | 0xE6 0x88 0x91; 0x78
Binary representation | 11100110 10001000 10010001; 01111000
Split into encoding units | 111001, 101000, 100010, 010001; 011110, 000000 (low bits zero-padded)
Value | 57, 40, 34, 17; 30, 0
Corresponding characters | 5, o, i, R; e, A, =, = (padding)
Final result | 5oiReA==
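The worked example can be cross-checked against the JDK's own encoder. The sketch below assumes Java 8 or later, where java.util.Base64 is available (a class this article does not otherwise use); WorkedExampleCheck, toHex and encodeUtf8 are hypothetical names. Base64 output is case-sensitive, so the letters of the result must match exactly.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper class for checking the worked example.
class WorkedExampleCheck {
    // Space-separated hex dump of the bytes.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // Encode the text as UTF-8, then as standard Base64.
    static String encodeUtf8(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String raw = "\u6211x"; // the character 我 (U+6211) followed by 'x'
        System.out.println(toHex(raw.getBytes(StandardCharsets.UTF_8))); // e6 88 91 78
        System.out.println(encodeUtf8(raw));                             // 5oiReA==
    }
}
```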
3. Use
1) Storing binary data, such as keys.
2) Transmitting binary data in a URL over HTTP.
Replacing the Base64 characters that are not valid in a URL with valid ones, and removing the line breaks, gives the UrlBase64 encoding. With UrlBase64, binary parameters can be placed directly in a URL.
Both tasks can also be handled with hex encoding, but Base64 output is shorter: every 3 bytes become 4 characters in Base64 and 6 characters in hex.
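To make the two points concrete, the sketch below compares the standard and URL-safe alphabets and the lengths of hex and Base64 output. It assumes Java 8 or later and uses the JDK's java.util.Base64 as a stand-in, not one of the libraries covered later; the class and method names are hypothetical, and the all-zero 32-byte array merely stands in for a 256-bit key.

```java
import java.util.Base64;

// Hypothetical illustration class.
class UrlSafeAndHexComparison {
    // Hex string without separators: two characters per byte.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    static String standard(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    static String urlSafe(byte[] data) {
        return Base64.getUrlEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        // These two bytes map to the values 62, 63, 60, so both special characters appear.
        byte[] sample = {(byte) 0xFB, (byte) 0xFF};
        System.out.println(standard(sample)); // +/8=
        System.out.println(urlSafe(sample));  // -_8=

        byte[] key = new byte[32]; // placeholder for a 256-bit key
        System.out.println(toHex(key).length());    // 64
        System.out.println(standard(key).length()); // 44
    }
}
```

Note that the JDK's URL-safe encoder keeps the '=' padding unless withoutPadding() is called on it, so its output is not byte-for-byte identical to every UrlBase64 variant.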
4. Implementation
The JDK classes sun.misc.BASE64Encoder and sun.misc.BASE64Decoder provide a Base64 codec, but packages that begin with sun are Sun's internal implementation, and the code in them is not guaranteed to be compatible with other Java platforms (see "Why Developers Should Not Write Programs That Call 'sun' Packages"). You therefore need a third-party implementation, or you can copy the two classes into your own code.
The Maven dependencies used in the author's test environment are as follows:
<!-- bouncycastle dependency -->
<dependency>
    <groupId>org.bouncycastle</groupId>
    <artifactId>bcprov-jdk16</artifactId>
    <version>1.46</version>
</dependency>
<!-- commons-codec -->
<dependency>
    <groupId>commons-codec</groupId>
    <artifactId>commons-codec</artifactId>
    <version>1.10</version>
</dependency>
4.1 Bouncycastle Implementation
The package org.bouncycastle.util.encoders provides the utility classes Base64, UrlBase64, and Hex for Base64, UrlBase64, and hex encoding and decoding.
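import org.bouncycastle.util.encoders.Base64;
import org.bouncycastle.util.encoders.UrlBase64;
import org.junit.Assert; // assumed: the source does not say which Assert class is used

// Place inside a method that declares "throws UnsupportedEncodingException".
final String charset = "UTF-8";
String input = "Base64 encoding / decoding";
// Encode: the encoders return ASCII bytes, which are converted to a String.
String encoded = new String(Base64.encode(input.getBytes(charset)), "ASCII");
String urlSafeEncoded = new String(UrlBase64.encode(input.getBytes(charset)), "ASCII");
// Decode: each decoder must restore the original input.
Assert.assertEquals(input, new String(Base64.decode(encoded), charset));
Assert.assertEquals(input, new String(UrlBase64.decode(urlSafeEncoded), charset));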
4.2 Commons-codec Implementation
The utility class org.apache.commons.codec.binary.Base64 provides encoding and decoding methods for both Base64 and UrlBase64.
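import org.apache.commons.codec.binary.Base64;
import org.junit.Assert; // assumed: the source does not say which Assert class is used

// Place inside a method that declares "throws UnsupportedEncodingException".
final String charset = "UTF-8";
String input = "Base64 encoding / decoding";
// Encode
String encoded = Base64.encodeBase64String(input.getBytes(charset));
String urlSafeEncoded = Base64.encodeBase64URLSafeString(input.getBytes(charset));
// Decode: decodeBase64 accepts both the standard and the URL-safe alphabet.
Assert.assertEquals(input, new String(Base64.decodeBase64(encoded), charset));
Assert.assertEquals(input, new String(Base64.decodeBase64(urlSafeEncoded), charset));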
4.3 Differences between Bouncycastle and Commons-codec
In the standard form, a Base64 encoded string is broken into lines of 76 characters (chunked output), and each line ends with a carriage return and line feed, "\r\n". Bouncycastle does not do this. Commons-codec supports the standard chunked form but does not use it by default.
Feature | Bouncycastle | Commons-codec
Standard (chunked) encoding | Not supported | Base64.encodeBase64Chunked or Base64.encodeBase64(binaryData, true)
UrlBase64 | No line breaks; '+' becomes '-', '/' becomes '_', '=' becomes '.' | Same as Bouncycastle, but with the padding removed
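The effect of chunking is easiest to see by counting lines. The sketch below is only an illustration and swaps in the JDK's java.util.Base64 (Java 8 or later) so that it runs without either third-party library: getMimeEncoder() produces 76-character lines separated by "\r\n", while getEncoder() produces a single unbroken line. The class and method names are hypothetical.

```java
import java.util.Base64;

// Hypothetical illustration class; uses the JDK encoder instead of the two libraries compared above.
class ChunkedEncodingSketch {
    // MIME-style output: lines of at most 76 characters separated by "\r\n".
    static String chunked(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    // Plain output: one line, no separators.
    static String unchunked(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] data = new byte[100]; // 100 bytes encode to 136 Base64 characters
        String[] lines = chunked(data).split("\r\n");
        System.out.println(lines.length);                     // 2
        System.out.println(lines[0].length());                // 76
        System.out.println(lines[1].length());                // 60
        System.out.println(unchunked(data).contains("\r\n")); // false
    }
}
```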
Java encryption and decryption (ii) BASE64 encoding