Excerpt from: https://www.liaoxuefeng.com/wiki/0014316089557264a6b348958f449949df42a6d3a2e542c000/001431954588961d6b6f51000ca4279a3415ce14ed9d709000
Base64 is a method of representing arbitrary binary data using 64 printable characters.
When we open exe, jpg, or pdf files with Notepad, we see a lot of garbled characters, because binary files contain many bytes that cannot be displayed or printed. If you want text-processing software such as Notepad to handle binary data, you need a binary-to-string conversion method. Base64 is one of the most common binary encoding methods.
The principle behind Base64 is simple. First, prepare an array of 64 characters:
['A', 'B', 'C', ... 'a', 'b', 'c', ... '0', '1', ... '+', '/']
Then process the binary data in groups of 3 bytes. Each group holds 3 x 8 = 24 bits, which are split into 4 pieces of exactly 6 bits each.
This gives 4 numbers to use as indexes. Looking each one up in the table yields the corresponding 4 characters, which form the encoded string.
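The grouping and table lookup described above can be sketched in a few lines of Python. This is only an illustration for a single 3-byte group; `ALPHABET` and `encode_group` are hypothetical names introduced here, not part of the `base64` module.

```python
import base64

# Standard Base64 table: A-Z, a-z, 0-9, '+', '/'
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def encode_group(three_bytes: bytes) -> str:
    """Encode exactly 3 bytes into 4 Base64 characters (illustrative helper)."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Pack the 3 bytes into one 24-bit integer.
    n = int.from_bytes(three_bytes, "big")
    # Split it into four 6-bit indexes, most significant first.
    indexes = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    # Look each index up in the table.
    return "".join(ALPHABET[i] for i in indexes)

print(encode_group(b"Man"))                      # TWFu
print(base64.b64encode(b"Man").decode("ascii"))  # TWFu
```

The helper handles only complete 3-byte groups; inputs whose length is not a multiple of 3 need the padding step.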
Base64 therefore encodes every 3 bytes of binary data as 4 bytes of text, so the length grows by about 33%. The advantage is that the encoded text can be placed directly in an email body, a web page, and so on.
What if the length of the binary data is not a multiple of 3, leaving 1 or 2 bytes at the end? Base64 pads the end with \x00 bytes, then appends 1 or 2 = signs to the encoded output to indicate how many bytes were padded. They are removed automatically during decoding.
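A small sketch of how the padding and the encoded length behave for inputs of 3, 2, and 1 bytes. `encoded_length` is a hypothetical helper added here only to make the arithmetic explicit.

```python
import base64

def encoded_length(n: int) -> int:
    """Length of the padded Base64 encoding of n bytes (illustrative helper)."""
    # Every started group of 3 bytes produces 4 output characters.
    return 4 * ((n + 2) // 3)

for data in (b"abc", b"ab", b"a"):
    encoded = base64.b64encode(data)
    # Show the input length, the encoding, and the number of '=' signs.
    print(len(data), encoded, encoded.count(b"="))
# 3 b'YWJj' 0
# 2 b'YWI=' 1
# 1 b'YQ==' 2
```

The number of = signs equals the number of \x00 bytes that had to be added to complete the last group.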
Python's built-in base64 module can encode and decode Base64 directly:
>>> import base64
>>> base64.b64encode(b'binary\x00string')
b'YmluYXJ5AHN0cmluZw=='
>>> base64.b64decode(b'YmluYXJ5AHN0cmluZw==')
b'binary\x00string'
Because standard Base64 output may contain the characters + and /, which cannot be used directly in URL parameters, there is also a "URL-safe" Base64 encoding. It simply replaces + and / with - and _ respectively:
>>> base64.b64encode(b'i\xb7\x1d\xfb\xef\xff')
b'abcd++//'
>>> base64.urlsafe_b64encode(b'i\xb7\x1d\xfb\xef\xff')
b'abcd--__'
>>> base64.urlsafe_b64decode('abcd--__')
b'i\xb7\x1d\xfb\xef\xff'
You can also define your own ordering of the 64 characters to get a custom Base64 encoding, but this is almost never necessary.
Base64 is an encoding based on table lookup and cannot be used for encryption, even with a custom encoding table.
Base64 is suitable for encoding small pieces of content, such as digital certificate signatures and cookie contents.
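As a rough illustration of a custom table, the sketch below remaps the standard output through a reordered alphabet with `bytes.maketrans`. The reversed table `CUSTOM` and the two helper functions are assumptions made up for this example.

```python
import base64

STANDARD = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
# Hypothetical custom table: the same 64 characters in reverse order.
CUSTOM = STANDARD[::-1]

TO_CUSTOM = bytes.maketrans(STANDARD, CUSTOM)
FROM_CUSTOM = bytes.maketrans(CUSTOM, STANDARD)

def custom_b64encode(data: bytes) -> bytes:
    # Encode normally, then substitute each character ('=' is left alone).
    return base64.b64encode(data).translate(TO_CUSTOM)

def custom_b64decode(text: bytes) -> bytes:
    # Map back to the standard table, then decode normally.
    return base64.b64decode(text.translate(FROM_CUSTOM))

encoded = custom_b64encode(b"binary\x00string")
print(encoded)
print(custom_b64decode(encoded))  # b'binary\x00string'
```

Since this is a fixed one-to-one character substitution, anyone who knows or guesses the table can reverse it, which is why it provides no real secrecy.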
The = character may also appear in Base64 output, but = is ambiguous in URLs and cookies, so many Base64 implementations strip the = after encoding:
# Standard Base64:
'abcd' -> 'YWJjZA=='
# With = removed automatically:
'abcd' -> 'YWJjZA'
How do you decode once the = has been removed? Because Base64 turns every 3 bytes into 4, the length of a Base64 encoding is always a multiple of 4. You only need to append = to the string until its length is a multiple of 4, and then it decodes normally.
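One possible way to write such a decoder is shown below. `b64decode_unpadded` is a hypothetical name, and the sketch assumes the input is otherwise valid standard Base64 text.

```python
import base64

def b64decode_unpadded(text: str) -> bytes:
    """Decode Base64 text whose trailing '=' may have been stripped."""
    # -len(text) % 4 is the number of '=' needed to reach a multiple of 4.
    padding = "=" * (-len(text) % 4)
    return base64.b64decode(text + padding)

print(b64decode_unpadded("YWJjZA"))    # b'abcd'
print(b64decode_unpadded("YWJjZA=="))  # b'abcd'
```

Input that already carries its padding passes through unchanged, so the same function can be used for both forms.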
Summary
Base64 is a method for encoding arbitrary binary data as a text string. It is commonly used to transmit small amounts of binary data in URLs, cookies, and web pages.
Python Learning Notes (34) - Built-in Modules (3): Base64