[Python Module Learning] Using the base64 module to encode binary data

Source: Internet
Author: User
Base64 Module Preface

Yesterday a colleague on my team came to ask about the POP3 protocol, so today I looked into the format of the POP3 protocol and the poplib module in Python. Part of the data returned by a POP server has to be decoded with Base64, so along the way I also looked at Python's base64 module.

The base64 module provides functions for encoding and decoding Base16, Base32, Base64, Base85 and Ascii85, so it is covered first. The write-up on the poplib module will follow later. Well, I have dug myself another hole; at this rate I will never finish filling them all in ...

The following excerpt is from http://bbs.chinaunix.net/thread-1150250-1-1.html and explains in detail why the returned data is Base64-encoded in the first place:

For historical reasons, some mail systems on the Internet only support 7-bit character transfer, whereas Chinese characters are encoded with 8 bits. When you send Chinese text by email and it passes through one of these 7-bit-only systems, the eighth (highest) bit of each byte is changed from 1 to 0.
Take the word "Chinese" (中文) as an example: in hex it is A4 A4 A4 E5, and once the highest bits are cleared it becomes 24 24 24 65, which is "$$$e". Telnet has the same problem.
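
The effect described above can be reproduced in a few lines. This is only an illustrative sketch: it assumes the excerpt's hex values come from the Big5 encoding (which does yield A4 A4 A4 E5 for this word), and it simulates a 7-bit relay by masking each byte.

```python
import base64

# Assumption: the excerpt's example uses the Big5 encoding.
raw = "中文".encode("big5")
print(raw.hex())  # a4a4a4e5

# Simulate a 7-bit-only mail system: clear the highest bit of every byte.
stripped = bytes(byte & 0x7F for byte in raw)
print(stripped)  # b'$$$e'

# Base64 output only uses 7-bit ASCII characters, so it passes through intact.
encoded = base64.b64encode(raw)
print(all(byte < 0x80 for byte in encoded))  # True
```
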
Besides Chinese mail, the same problem occurs when pictures, programs, compressed files and so on are sent by email. Email therefore relies on various encodings to solve it: the 8-bit data is encoded in some way so that it can pass through mail systems that only support 7-bit characters.
Common email encodings are UU and MIME. MIME (Multipurpose Internet Mail Extensions) is usually translated as "multimedia mail transfer mode"; as the name implies, it is a standard for sending multimedia files, and it allows files of various types to be attached to a single mail.
MIME defines two encodings: Base64 and QP (Quoted-Printable). They differ in when they are used. QP leaves 7-bit data as it is and only converts 8-bit data into 7 bits, so it is suited to non-US-ASCII text content such as our Chinese files. Base64 re-encodes the entire file into 7 bits and is normally used for sending binary files. The choice of encoding affects the size of the file after encoding. Some lazier software simply uses Base64 for everything.

Base64

The base64 module provides 6 functions for Base64 encoding and decoding, which fall into three groups.

base64.b64encode(s, altchars=None)
base64.b64decode(s, altchars=None, validate=False)

The parameter s is the data to be encoded or decoded. For b64encode, s must be bytes (a bytes-like object). For b64decode, s can be either bytes or an ASCII string (str).

Base64-encoded data may contain the two symbols '+' and '/', which can cause bugs if the encoded data is used in a URL or a file system path. The base64 module therefore provides a way to replace '+' and '/' in the encoded data.

The parameter altchars must be a bytes object of length 2, which is used to replace '+' and '/' in the encoded data. It defaults to None.
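
As a small illustration of altchars, the sketch below encodes three bytes chosen so that the standard output consists only of '+' and '/'. The replacement pair b"._" is an arbitrary choice for the example, not something the module prescribes.

```python
import base64

data = b"\xfb\xff\xfe"  # chosen so the standard encoding contains '+' and '/'

standard = base64.b64encode(data)
custom = base64.b64encode(data, altchars=b"._")  # '+' -> '.', '/' -> '_'

print(standard)  # b'+//+'
print(custom)    # b'.__.'

# The same altchars value has to be passed again when decoding.
print(base64.b64decode(custom, altchars=b"._") == data)  # True
```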

The parameter validate defaults to False, in which case characters outside the Base64 alphabet are discarded before decoding. If it is True, the module first checks whether s contains any character that is not in the Base64 alphabet and, if so, raises binascii.Error (for example "Non-base64 digit found"; the exact message varies between Python versions).

If the length of the data is incorrect, binascii.Error: Incorrect padding is raised.

>>> import base64
>>> x = base64.b64encode(b'test')
>>> x
b'dGVzdA=='
>>> base64.b64decode(x)
b'test'
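
The two error cases can be triggered as follows. This is a minimal sketch; the input with a stray '!' is made up for the example, and the exception text is printed rather than asserted because it differs between Python versions.

```python
import base64
import binascii

dirty = b"dGVz!dA=="  # Base64 for b'test' with a stray '!' inserted

# validate=False (the default): the '!' is silently discarded.
print(base64.b64decode(dirty))  # b'test'

# validate=True: the same input is rejected.
try:
    base64.b64decode(dirty, validate=True)
except binascii.Error as exc:
    print("rejected:", exc)

# Dropping the '=' padding leaves a length that is not a multiple of 4.
try:
    base64.b64decode(b"dGVzdA")
except binascii.Error as exc:
    print("bad length:", exc)
```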

base64.standard_b64encode(s)
base64.standard_b64decode(s)

These two functions pass the parameter s straight to the previous pair of functions.

base64.urlsafe_b64encode(s)
base64.urlsafe_b64decode(s)

These two functions are also built on the first pair, but when encoding they replace '+' and '/' in the output with '-' and '_'. When decoding, '-' and '_' in the data are replaced with '+' and '/' first.
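
A short comparison of the standard and URL-safe variants, as a sketch on the same kind of hand-picked input (three bytes whose standard encoding is all '+' and '/'):

```python
import base64

data = b"\xfb\xff\xfe"

print(base64.standard_b64encode(data))  # b'+//+'
print(base64.urlsafe_b64encode(data))   # b'-__-'

# urlsafe_b64decode maps '-' and '_' back before decoding.
print(base64.urlsafe_b64decode(b"-__-") == data)  # True

# Only '+' and '/' are substituted; '=' padding is left in place.
print(base64.urlsafe_b64encode(b"test"))  # b'dGVzdA=='
```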

In addition, Base64 encoding also produces the symbol '=', which is used to pad the length of the encoded data to a multiple of 4.

Base32

base64.b32encode(s)
base64.b32decode(s, casefold=False, map01=None)

The parameter s is handled in the same way as for Base64.

The Base32 alphabet is [A-Z2-7]; lowercase letters are not accepted. However, when the parameter casefold is True, Base32 accepts lowercase input when decoding. For security reasons, this parameter defaults to False.
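
The decoding options of b32decode can be tried out as below. The sketch exercises casefold and also map01, which the next paragraph describes; the mistyped input with a digit 0 is invented for the example.

```python
import base64
import binascii

encoded = base64.b32encode(b"test")
print(encoded)  # b'ORSXG5A='

# Lowercase input is rejected by default ...
try:
    base64.b32decode(encoded.lower())
except binascii.Error as exc:
    print("rejected:", exc)

# ... and accepted with casefold=True.
print(base64.b32decode(encoded.lower(), casefold=True))  # b'test'

# map01: the digit 0 is read as the letter O (and 1 as the given letter).
typed = b"0RSXG5A="  # hypothetical hand-typed input with 0 instead of O
print(base64.b32decode(typed, map01=b"L"))  # b'test'
```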

Base32 decoding can also map the digit 0 to the uppercase letter O, and the digit 1 to the uppercase letter I or L. The parameter map01 specifies which letter the digit 1 is mapped to (the source code does not restrict it to I or L); when this parameter is not None, the digit 0 is always mapped to the letter O. Again for security reasons, this parameter defaults to None.

Base16

base64.b16encode(s)
base64.b16decode(s, casefold=False)

The Base16 alphabet is [0-9A-F].
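
A quick check of the Base16 alphabet and of the casefold flag from the signature above, as a minimal sketch:

```python
import base64
import binascii

encoded = base64.b16encode(b"\xde\xad\xbe\xef")
print(encoded)  # b'DEADBEEF'

# Lowercase hex digits are outside the alphabet unless casefold=True.
try:
    base64.b16decode(b"deadbeef")
except binascii.Error as exc:
    print("rejected:", exc)

print(base64.b16decode(b"deadbeef", casefold=True) == b"\xde\xad\xbe\xef")  # True
```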

The parameters s and casefold behave in the same way as for Base32.

Base85

base64.b85encode(b, pad=False)
base64.b85decode(b)

Parameter b is the data to be encoded or decoded; the type requirements are the same as for the Base64 parameter s.

When the parameter pad is True, the data is padded with b'\0' to a multiple of 4 bytes before encoding. This padding is not removed when decoding.
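
The effect of pad is easiest to see on input whose length is not a multiple of 4. A minimal sketch with a 5-byte sample value:

```python
import base64

data = b"hello"  # 5 bytes, not a multiple of 4

plain = base64.b85encode(data)
padded = base64.b85encode(data, pad=True)
print(len(plain), len(padded))  # 7 10

# Without pad, the round trip returns the original bytes.
print(base64.b85decode(plain) == data)  # True

# With pad=True, the b'\0' padding is still there after decoding.
print(base64.b85decode(padded))  # b'hello\x00\x00\x00'
```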

These functions were added in Python 3.4.

Ascii85

base64.a85encode(b, *, foldspaces=False, wrapcol=0, pad=False, adobe=False)

Parameter b is the data to be encoded; its type must be bytes (a bytes-like object).

When the parameter foldspaces is True, b'y' is used to represent 4 consecutive spaces.

Parameter wrapcol is an integer. When it is not 0, it sets the maximum number of characters per line of the encoded output, and newline characters b'\n' are inserted accordingly.

When the parameter pad is True, the data is padded with b'\0' to a multiple of 4 bytes before encoding. This padding is not removed when decoding.

Parameter adobe specifies whether the output uses the Adobe format. Adobe Ascii85 encoded data is framed by <~ and ~>; if this parameter is True, the returned data is wrapped in these markers.
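
The keyword arguments of a85encode can be observed one at a time. This is only a sketch; the inputs (four spaces, a run of 40 'x' bytes) are arbitrary sample values.

```python
import base64

# foldspaces: a 4-byte group of spaces collapses to the single character 'y'.
print(base64.a85encode(b"    ", foldspaces=True))  # b'y'
print(len(base64.a85encode(b"    ")))              # 5

# adobe: the output is framed by <~ and ~>.
framed = base64.a85encode(b"test", adobe=True)
print(framed.startswith(b"<~"), framed.endswith(b"~>"))  # True True

# wrapcol: lines are limited to the given number of characters.
wrapped = base64.a85encode(b"x" * 40, wrapcol=20)
print([len(line) for line in wrapped.split(b"\n")])  # [20, 20, 10]
```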

base64.a85decode(b, *, foldspaces=False, adobe=False, ignorechars=b' \t\n\r\v')

Parameter b is the data to be decoded; its type can be bytes or str.

When the parameter foldspaces is True, b'y' is accepted as shorthand for 4 consecutive spaces.

Parameter adobe specifies whether the input is in the Adobe format. Adobe Ascii85 encoded data is framed by <~ and ~>; if this parameter is True, the module removes these markers before decoding.

The parameter ignorechars specifies the characters to be ignored when decoding. By default it contains all the whitespace characters in ASCII.
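
The decoding side can be checked the same way. A minimal sketch: the whitespace inserted into the encoded text is made up for the example, and ValueError is caught because binascii.Error is a subclass of it.

```python
import base64

# adobe=True strips the <~ ... ~> frame before decoding.
framed = base64.a85encode(b"test", adobe=True)
print(base64.a85decode(framed, adobe=True))  # b'test'

# Whitespace inside the encoded text is ignored by default.
plain = base64.a85encode(b"test")
spaced = plain[:2] + b"\n  " + plain[2:]
print(base64.a85decode(spaced))  # b'test'

# With an empty ignorechars, the same whitespace is an error.
try:
    base64.a85decode(spaced, ignorechars=b"")
except ValueError as exc:
    print("rejected:", exc)

# foldspaces=True expands 'y' back into four spaces.
print(base64.a85decode(b"y", foldspaces=True))  # b'    '
```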

These functions were added in Python 3.4.
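
With all five encodings introduced, their output sizes can be compared directly. The sketch below uses 60 arbitrary bytes, a length chosen so that none of the encodings needs padding.

```python
import base64

data = bytes(range(60))  # 60 bytes: a multiple of 3, 4 and 5, so no padding

sizes = {
    "Base16": len(base64.b16encode(data)),
    "Base32": len(base64.b32encode(data)),
    "Base64": len(base64.b64encode(data)),
    "Ascii85": len(base64.a85encode(data)),
    "Base85": len(base64.b85encode(data)),
}
for name, size in sizes.items():
    print(f"{name}: {size} characters for {len(data)} bytes")
# Base16: 120, Base32: 96, Base64: 80, Ascii85: 75, Base85: 75
```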

The official documentation for the base64 module notes that Base85 and Ascii85 use 5 characters to encode 4 bytes, while Base64 uses 6 characters to encode 4 bytes (in practice, 4 characters for every 3 bytes), so they are more space-efficient than Base64 when space is tight.

Old API

The base64 module still keeps part of the old API for some special purposes.

base64.encode(input, output)
base64.decode(input, output)

These two functions read from a binary file object as the data source and write the encoded/decoded data to another binary file object.
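
A round trip through the file-based functions might look like this. As a sketch, io.BytesIO objects stand in for real binary files, and the payload is an arbitrary sample.

```python
import base64
import io

payload = b"binary \x00\xff payload"

# Encode from one binary file object into another.
source = io.BytesIO(payload)
encoded = io.BytesIO()
base64.encode(source, encoded)
print(encoded.getvalue())

# Rewind and decode back into a third file object.
encoded.seek(0)
decoded = io.BytesIO()
base64.decode(encoded, decoded)
print(decoded.getvalue() == payload)  # True
```

With real files, both would need to be opened in binary mode ('rb' and 'wb').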

base64.encodebytes(s)
base64.decodebytes(s)

Internally, encodebytes and b64encode both call b2a_base64 from the binascii module. The difference is that encodebytes leaves the newline parameter of b2a_base64 at its default value True. In other words, encodebytes adds a newline character b'\n' after every 76 characters of output (and at the end of the output).

decodebytes behaves essentially like b64decode with default parameters. Only the type check on the parameter differs: decodebytes only supports bytes-like data, not str.
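
The differences between the two pairs show up on input longer than one output line. A minimal sketch using 100 zero bytes as sample data:

```python
import base64

data = bytes(100)

one_line = base64.b64encode(data)
wrapped = base64.encodebytes(data)

print(b"\n" in one_line)  # False
print([len(line) for line in wrapped.splitlines()])  # [76, 60]
print(wrapped.endswith(b"\n"))  # True

# Both decoders accept the wrapped form.
print(base64.decodebytes(wrapped) == data)  # True
print(base64.b64decode(wrapped) == data)    # True

# decodebytes rejects str, while b64decode accepts it.
try:
    base64.decodebytes(one_line.decode("ascii"))
except TypeError as exc:
    print("rejected:", exc)
```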

base64.encodestring(s)
base64.decodestring(s)

These two functions have been deprecated since Python 3.1 and simply call the previous pair (encodebytes and decodebytes) directly. They were removed entirely in Python 3.9.

Summary

The base64 module provides interfaces for encoding binary data, covering the standard Base64, Base32 and Base16 as well as the de facto standards Ascii85 and Base85. Studying this module was also a chance to learn many of the details of binary data encoding, and it left a deep impression on me. Sometimes we think we understand computers and the Internet, but what each of us actually sees is only a drop in the bucket. Much of this field is still unknown to me and waiting to be explored, and I will not stop exploring.

