This article introduces how to use Python's base64 module for Base64 encoding and decoding. The sample code is based on Python 2.x; readers who need it can use it as a reference.
Base64 is a method that uses 64 characters to represent arbitrary binary data.
If you open an exe, jpg, or pdf file with Notepad, you will see a lot of garbled text, because binary files contain many characters that cannot be displayed or printed. For text-processing software such as Notepad to handle binary data, you need a way to convert binary data to a string. Base64 is one of the most common such encodings.
The principle of Base64 is simple. First, prepare an array containing 64 characters:
['A', 'B', 'C', ... 'a', 'b', 'c', ... '0', '1', ... '+', '/']
Then process the binary data in groups of 3 bytes. Each group holds 3 x 8 = 24 bits, which are divided into 4 chunks of exactly 6 bits each:
This gives 4 numbers to use as indexes. Looking each one up in the table yields the corresponding 4 characters, which form the encoded string.
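As a rough illustration of the table look-up described above, the following sketch encodes a single 3-byte group by hand and compares the result with the standard library. The names `ALPHABET` and `encode_group` are hypothetical helpers introduced here, not part of the base64 module, and the comparison line assumes Python 3 (where `b64encode` returns bytes).

```python
import base64
import string

# The standard 64-character table: A-Z, a-z, 0-9, '+', '/'
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_group(group):
    """Encode exactly 3 bytes into 4 Base64 characters (hypothetical helper)."""
    group = bytearray(group)  # indexing a bytearray always yields integers
    if len(group) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Pack the 3 bytes into one 24-bit integer.
    n = (group[0] << 16) | (group[1] << 8) | group[2]
    # Slice it into four 6-bit indexes, most significant first.
    indexes = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    # Look each index up in the table.
    return "".join(ALPHABET[i] for i in indexes)

print(encode_group(b"Man"))  # TWFu
print(encode_group(b"Man") == base64.b64encode(b"Man").decode("ascii"))  # True
```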
Base64 therefore encodes every 3 bytes of binary data as 4 bytes of text, so the length grows by about 33%. The advantage is that the encoded text can be displayed directly in an email body, a web page, and so on.
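The size increase can be checked with a little arithmetic. The sketch below uses a hypothetical helper, `encoded_length`, that computes the padded Base64 length for n input bytes as 4 * ceil(n / 3), and prints it next to the length the base64 module actually produces.

```python
import base64

def encoded_length(n):
    """Length of the padded Base64 text for n input bytes (hypothetical helper)."""
    return 4 * ((n + 2) // 3)

for n in (0, 1, 2, 3, 300):
    actual = len(base64.b64encode(b"x" * n))
    print(n, encoded_length(n), actual)
```

For 300 input bytes both columns show 400, which is the one-third increase mentioned above.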
What if the length of the binary data is not a multiple of 3, so that 1 or 2 bytes are left over at the end? Base64 pads the end with \x00 bytes and then appends 1 or 2 = signs to the encoded text to indicate how many padding bytes were added; they are removed automatically on decoding.
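To make the padding rule concrete, this small sketch encodes inputs of 3, 2, and 1 bytes and counts the trailing = signs. It is written for Python 3, where the functions take and return bytes.

```python
import base64

for data in (b"abc", b"ab", b"a"):
    encoded = base64.b64encode(data)
    # Count the trailing '=' signs that were added as padding.
    padding = len(encoded) - len(encoded.rstrip(b"="))
    print(data, encoded, padding)
    # Decoding drops the padding and restores the original bytes.
    assert base64.b64decode(encoded) == data
```

A full 3-byte group needs no padding, 2 leftover bytes produce one =, and 1 leftover byte produces two.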
Python's built-in base64 module can encode and decode Base64 directly:
>>> import base64
>>> base64.b64encode('binary\x00string')
'YmluYXJ5AHN0cmluZw=='
>>> base64.b64decode('YmluYXJ5AHN0cmluZw==')
'binary\x00string'
Because standard Base64 output may contain the characters + and /, which cannot be used directly in a URL parameter, there is also a "URL-safe" Base64 encoding, which simply replaces + and / with - and _ respectively:
>>> base64.b64encode('i\xb7\x1d\xfb\xef\xff')
'abcd++//'
>>> base64.urlsafe_b64encode('i\xb7\x1d\xfb\xef\xff')
'abcd--__'
>>> base64.urlsafe_b64decode('abcd--__')
'i\xb7\x1d\xfb\xef\xff'
You can also define your own ordering of the 64 characters to create a custom Base64 encoding, but this is usually completely unnecessary.
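For illustration only, one way to get a custom character order is to encode with the standard table and then translate the result character by character. The table `CUSTOM` and the functions `custom_b64encode` and `custom_b64decode` below are hypothetical names, the ordering chosen is arbitrary, and the sketch assumes Python 3 (`bytes.maketrans`).

```python
import base64
import string

STANDARD = (string.ascii_uppercase + string.ascii_lowercase
            + string.digits + "+/").encode("ascii")
# Hypothetical custom table: the same 64 characters, lowercase letters first.
CUSTOM = (string.ascii_lowercase + string.ascii_uppercase
          + string.digits + "+/").encode("ascii")

ENCODE_TABLE = bytes.maketrans(STANDARD, CUSTOM)
DECODE_TABLE = bytes.maketrans(CUSTOM, STANDARD)

def custom_b64encode(data):
    # Encode normally, then map each character to the custom table.
    return base64.b64encode(data).translate(ENCODE_TABLE)

def custom_b64decode(text):
    # Map back to the standard table, then decode normally.
    return base64.b64decode(text.translate(DECODE_TABLE))

encoded = custom_b64encode(b"abcd")
print(encoded)
print(custom_b64decode(encoded) == b"abcd")  # True
```

The = padding is not part of either table, so it passes through the translation unchanged.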
Base64 is a table look-up encoding and cannot be used for encryption, even with a custom table.
Base64 is suitable for encoding small pieces of content, such as digital certificate signatures and cookie values.
Because the = character can also appear in Base64 output, and = is ambiguous inside URLs and cookies, many Base64 implementations strip the = after encoding:
# Standard Base64:
'abcd' -> 'YWJjZA=='
# With = removed automatically:
'abcd' -> 'YWJjZA'
How do you decode once the = has been removed? Because Base64 turns every 3 bytes into 4, the length of a Base64 string is always a multiple of 4. So you append = until the length is a multiple of 4 again, and then decode normally.
Try writing a Base64 decoding function that can handle input with the = removed:
>>> base64.b64decode('YWJjZA==')
'abcd'
>>> base64.b64decode('YWJjZA')
Traceback (most recent call last):
  ...
TypeError: Incorrect padding
>>> safe_b64decode('YWJjZA')
'abcd'
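One possible answer to the exercise is sketched below. It is a minimal version that assumes Python 3 (so the result is a bytes object rather than the Python 2 string shown in the session above) and accepts either text or bytes. Only the name `safe_b64decode` comes from the exercise; the body is an assumption about how it could be written.

```python
import base64

def safe_b64decode(s):
    """Decode Base64 text whose trailing '=' padding may have been stripped."""
    if isinstance(s, str):
        s = s.encode("ascii")
    # Work out how many '=' are needed to reach a multiple of 4.
    missing = -len(s) % 4
    return base64.b64decode(s + b"=" * missing)

print(safe_b64decode("YWJjZA"))    # b'abcd'
print(safe_b64decode("YWJjZA=="))  # b'abcd'
```

Input whose length is 1 more than a multiple of 4 is not valid Base64 even after padding, and this sketch makes no attempt to repair it.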
Summary
Base64 is a method for encoding arbitrary binary data as a text string. It is often used to transmit small amounts of binary data in URLs, cookies, and web pages.