How to Use the base64 module to process character encoding in Python

Source: Internet
Author: User

This article describes how to use the base64 module to process character encoding in Python. The sample code is based on Python 2.x. For more information, see

Base64 is a method that uses 64 characters to represent any binary data.

When we open files such as exe, jpg, or pdf in Notepad, we see a lot of garbled characters, because binary files contain many bytes that cannot be displayed or printed. For text-processing software such as Notepad to handle binary data, we need a way to convert binary data into a string. Base64 is the most common binary-to-text encoding method.

The Base64 principle is very simple. First, prepare an array containing 64 characters:

['A', 'B', 'C', ... 'a', 'b', 'c', ... '0', '1', ... '+', '/']

Then the binary data is processed in groups of 3 bytes. Each group holds 3 x 8 = 24 bits, which are split into 4 chunks of exactly 6 bits each.

This gives four numbers, each between 0 and 63. We use them as indexes into the table to look up the corresponding 4 characters, which form the encoded string.

Therefore, Base64 encoding turns every 3 bytes of binary data into 4 bytes of text data, which makes the output about 33% longer. The advantage is that the encoded text can be displayed directly in a mail body, a webpage, and so on.
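The grouping and table lookup described above can be sketched in a few lines. This is only an illustration, not how the base64 module is implemented; the helper name encode_group is made up for this example, and the code is written for Python 3, where Base64 functions work on bytes (the article's own samples use Python 2.x strings).

```python
import base64
import string

# The standard Base64 table: A-Z, a-z, 0-9, '+', '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_group(three_bytes):
    """Encode exactly 3 bytes into 4 Base64 characters (illustrative helper)."""
    b0, b1, b2 = bytearray(three_bytes)
    # Join the three bytes into one 24-bit integer.
    n = (b0 << 16) | (b1 << 8) | b2
    # Split it into four 6-bit indexes, most significant first.
    indexes = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    # Look each index up in the table.
    return "".join(ALPHABET[i] for i in indexes)

print(encode_group(b"Man"))                    # TWFu
print(base64.b64encode(b"Man").decode("ascii"))  # TWFu
```

The bytes of "Man" are 0x4d, 0x61, 0x6e, which split into the indexes 19, 22, 5, and 46, giving the characters T, W, F, and u.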

What if the length of the binary data is not a multiple of 3, so that one or two bytes are left over at the end? In that case, Base64 pads the data with \x00 bytes and appends one or two equal signs to the encoded text to show how many bytes were added. The padding is removed automatically during decoding.

Python's built-in base64 module can be used directly for Base64 encoding and decoding:
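To make the padding rule concrete, here is a rough sketch of a complete encoder that pads with \x00 bytes and then marks the padded positions with equal signs. The function name b64encode_sketch is hypothetical, and the code assumes Python 3 (bytes in, str out); in practice you would simply call base64.b64encode.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def b64encode_sketch(data):
    """Illustrative Base64 encoder; returns the encoded text as a str."""
    data = bytearray(data)
    # Number of \x00 bytes needed to reach a multiple of 3: 0, 1, or 2.
    pad = (3 - len(data) % 3) % 3
    data += b"\x00" * pad
    chars = []
    for i in range(0, len(data), 3):
        n = (data[i] << 16) | (data[i + 1] << 8) | data[i + 2]
        chars.extend(ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0))
    # Characters that came only from padding bytes are replaced by '='.
    if pad:
        chars[-pad:] = "=" * pad
    return "".join(chars)

for sample in (b"a", b"ab", b"abc"):
    print(len(sample), b64encode_sketch(sample))
# 1 YQ==
# 2 YWI=
# 3 YWJj
```

One leftover byte produces two equal signs, two leftover bytes produce one, and a full group of three produces none.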

>>> import base64
>>> base64.b64encode('binary\x00string')
'YmluYXJ5AHN0cmluZw=='
>>> base64.b64decode('YmluYXJ5AHN0cmluZw==')
'binary\x00string'

Because the characters + and / may appear in standard Base64 output, the result cannot be used directly as a URL parameter. For this reason there is a "URL-safe" Base64 encoding, which simply replaces the characters + and / with - and _:

>>> base64.b64encode('i\xb7\x1d\xfb\xef\xff')
'abcd++//'
>>> base64.urlsafe_b64encode('i\xb7\x1d\xfb\xef\xff')
'abcd--__'
>>> base64.urlsafe_b64decode('abcd--__')
'i\xb7\x1d\xfb\xef\xff'
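As a quick check of the claim that the URL-safe variant differs only in those two characters, the sketch below compares urlsafe_b64encode with a manual replacement on the standard output. It assumes Python 3, where these functions take and return bytes.

```python
import base64

data = b"i\xb7\x1d\xfb\xef\xff"

standard = base64.b64encode(data)          # contains '+' and '/'
url_safe = base64.urlsafe_b64encode(data)  # uses '-' and '_' instead

# Replacing the two characters by hand gives the same result.
translated = standard.replace(b"+", b"-").replace(b"/", b"_")

print(standard)                # b'abcd++//'
print(url_safe)                # b'abcd--__'
print(translated == url_safe)  # True
```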

You can also define your own Base64 variant by changing the order of the 64 characters. However, this is generally unnecessary.
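One possible way to experiment with a custom table is to encode with the standard table and then translate the characters one to one. The table CUSTOM and the two helper functions below are invented for this illustration, and the code assumes Python 3 (bytes.maketrans). If you only need to replace + and /, the altchars argument of base64.b64encode is enough.

```python
import base64

STANDARD = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
# Hypothetical custom table: the same 64 characters, lowercase letters first.
CUSTOM = b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+/"

TO_CUSTOM = bytes.maketrans(STANDARD, CUSTOM)
TO_STANDARD = bytes.maketrans(CUSTOM, STANDARD)

def custom_b64encode(data):
    """Encode with the standard table, then map each character to the custom one."""
    return base64.b64encode(data).translate(TO_CUSTOM)

def custom_b64decode(text):
    """Map characters back to the standard table, then decode as usual."""
    return base64.b64decode(text.translate(TO_STANDARD))

encoded = custom_b64encode(b"abcd")
print(encoded)                    # b'ywjJza=='
print(custom_b64decode(encoded))  # b'abcd'
```

The = padding character is not part of either table, so it passes through the translation unchanged.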

Base64 is a table-based encoding method. It cannot be used for encryption, even if a custom encoding table is used.

Base64 is suited to encoding small pieces of data, such as digital certificate signatures and cookie values.

The = character may also appear in Base64 output, but it can cause ambiguity in URLs and cookies, so many Base64 implementations remove the = characters after encoding:

# Standard Base64:
'abcd' -> 'YWJjZA=='
# Automatically remove =:
'abcd' -> 'YWJjZA'

How can we decode after the = characters are removed? Because Base64 converts every three bytes into four, the length of a Base64 string is always a multiple of four. Therefore, we only need to append = characters until the length of the string is a multiple of four again, and then it decodes properly.

As an exercise, write a Base64 decoding function that can handle input whose = characters were removed:

>>> base64.b64decode('YWJjZA==')
'abcd'
>>> base64.b64decode('YWJjZA')
Traceback (most recent call last):
  ...
TypeError: Incorrect padding
>>> safe_b64decode('YWJjZA')
'abcd'
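One possible solution is sketched below. It restores the missing = characters before calling the standard decoder. The name safe_b64decode comes from the session above, but the body is only one way to write it, and it assumes Python 3, where the result is a bytes object.

```python
import base64

def safe_b64decode(s):
    """Decode Base64 text whose trailing '=' padding may have been removed."""
    if isinstance(s, str):
        s = s.encode("ascii")
    # Number of '=' characters needed to reach a multiple of four.
    missing = -len(s) % 4
    return base64.b64decode(s + b"=" * missing)

print(safe_b64decode("YWJjZA"))    # b'abcd'
print(safe_b64decode("YWJjZA=="))  # b'abcd'
```

A string whose length is one more than a multiple of four can never be valid Base64, so this sketch leaves that case to the standard decoder, which raises an error.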

Summary

Base64 is a method for encoding arbitrary binary data as a text string. It is often used to transmit small amounts of binary data in URLs, cookies, and webpages.
