Tutorial on using the Base64 module to handle character encoding in Python

Last Update:2016-06-06 Source: Internet

Author: User

Base64 is a method that uses 64 characters to represent arbitrary binary data.

When we open an EXE, JPG, or PDF file in Notepad, we see a lot of garbled characters, because binary files contain many characters that cannot be displayed or printed. For text-processing software such as Notepad to handle binary data, a binary-to-string conversion method is required. Base64 is one of the most common binary encoding methods.

The principle behind Base64 is simple. First, prepare an array of 64 characters:


['A', 'B', 'C', ... 'a', 'b', 'c', ... '0', '1', ... '+', '/']

Then process the binary data in groups of 3 bytes. Each group holds 3 x 8 = 24 bits, which are divided into 4 chunks of exactly 6 bits each.

This gives 4 numbers to use as indexes. Looking each one up in the table yields the corresponding 4 characters, which form the encoded string.
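The table lookup described above can be sketched in a few lines. This is a minimal illustration for a single complete 3-byte group, assuming Python 3; the names ALPHABET and encode_group are made up for this example and are not part of the base64 module.

```python
import string

# The standard 64-character table: A-Z, a-z, 0-9, '+', '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_group(three_bytes):
    """Encode exactly 3 bytes into 4 Base64 characters by table lookup.

    Hypothetical helper for illustration only; it does not handle padding.
    """
    if len(three_bytes) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Pack the 3 bytes into one 24-bit integer.
    n = int.from_bytes(three_bytes, "big")
    # Split the 24 bits into four 6-bit indexes, most significant first.
    indexes = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    # Look each index up in the table.
    return "".join(ALPHABET[i] for i in indexes)


print(encode_group(b"Man"))  # TWFu
```

Each 6-bit index ranges from 0 to 63, which is why exactly 64 characters are needed in the table.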

Base64 therefore encodes every 3 bytes of binary data as 4 bytes of text, increasing the length by about 33%. The advantage is that the encoded text can be displayed directly in message bodies, web pages, and so on.

What if the length of the binary data is not a multiple of 3, leaving 1 or 2 bytes at the end? Base64 pads the final group with \x00 bytes, then appends 1 or 2 = characters to the encoded output to indicate how many padding bytes were added. The padding is removed automatically on decoding.
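To see the padding rule in practice, the following sketch encodes inputs of 1, 2, and 3 bytes and counts the trailing = characters. It assumes Python 3; encoded_length is a hypothetical helper added here to show the size arithmetic, not a function from the base64 module.

```python
import base64


def encoded_length(n_bytes):
    """Length of padded Base64 output: 4 characters per 3-byte group, rounded up."""
    return 4 * ((n_bytes + 2) // 3)


for data in (b"a", b"ab", b"abc"):
    encoded = base64.b64encode(data)
    # Print the input length, the encoded form, and the number of '=' added.
    print(len(data), encoded, encoded.count(b"="))
    assert len(encoded) == encoded_length(len(data))
```

An input with 1 leftover byte ends in two = characters, one with 2 leftover bytes ends in a single =, and an input whose length is a multiple of 3 needs no padding.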

Python's built-in base64 module can encode and decode Base64 directly. The examples here use Python 3, where the data to encode must be bytes:



>>> import base64
>>> base64.b64encode(b'binary\x00string')
b'YmluYXJ5AHN0cmluZw=='
>>> base64.b64decode(b'YmluYXJ5AHN0cmluZw==')
b'binary\x00string'

Because standard Base64 output may contain the characters + and /, which cannot be used directly in URL parameters, there is also a "URL-safe" Base64 encoding. It simply replaces + and / with - and _ respectively:



>>> base64.b64encode(b'i\xb7\x1d\xfb\xef\xff')
b'abcd++//'
>>> base64.urlsafe_b64encode(b'i\xb7\x1d\xfb\xef\xff')
b'abcd--__'
>>> base64.urlsafe_b64decode(b'abcd--__')
b'i\xb7\x1d\xfb\xef\xff'

You can also define your own ordering of the 64 characters to create a custom Base64 encoding, but this is rarely necessary.
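One way to experiment with a custom character order is to encode with the standard table and then translate the output character by character. The sketch below assumes Python 3; the reversed table and the names CUSTOM, custom_b64encode, and custom_b64decode are arbitrary choices for illustration, not an established scheme.

```python
import base64
import string

STANDARD = (string.ascii_uppercase + string.ascii_lowercase
            + string.digits + "+/").encode("ascii")
# Hypothetical custom table: the standard table in reverse order.
CUSTOM = STANDARD[::-1]

# Translation tables between the two orderings; '=' is left untouched.
TO_CUSTOM = bytes.maketrans(STANDARD, CUSTOM)
TO_STANDARD = bytes.maketrans(CUSTOM, STANDARD)


def custom_b64encode(data):
    """Encode with the standard table, then map each character to the custom table."""
    return base64.b64encode(data).translate(TO_CUSTOM)


def custom_b64decode(encoded):
    """Map characters back to the standard table, then decode normally."""
    return base64.b64decode(encoded.translate(TO_STANDARD))


original = b"binary\x00string"
assert custom_b64decode(custom_b64encode(original)) == original
```

Because the custom table is only a fixed substitution of characters, anyone who sees enough output can recover it, so this adds no secrecy.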

Base64 is a table-lookup encoding and cannot be used for encryption, even with a custom encoding table.

Base64 is suitable for encoding small pieces of content, such as digital certificate signatures and cookie content.

The = character can also appear in Base64 output, but = is ambiguous when used in URLs and cookies, so many Base64 implementations strip the = after encoding:


# Standard Base64:
'abcd' -> 'YWJjZA=='
# With = removed automatically:
'abcd' -> 'YWJjZA'

How do you decode after the = has been removed? Because Base64 turns every 3 bytes into 4 characters, the length of a Base64 string is always a multiple of 4. Append = characters until the length is a multiple of 4, and the string decodes normally.

As an exercise, write a Base64 decoding function, safe_b64decode, that can handle input whose = has been removed. In Python 3 the session looks like this:



>>> base64.b64decode('YWJjZA==')
b'abcd'
>>> base64.b64decode('YWJjZA')
Traceback (most recent call last):
 ...
binascii.Error: Incorrect padding
>>> safe_b64decode('YWJjZA')
b'abcd'
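The article leaves safe_b64decode undefined. One possible implementation, following the rule of appending = until the length is a multiple of 4, is sketched below. It assumes Python 3 and returns bytes; other solutions are equally valid.

```python
import base64


def safe_b64decode(s):
    """Decode Base64 whose trailing '=' padding may have been stripped."""
    # Accept both str and bytes input.
    if isinstance(s, str):
        s = s.encode("ascii")
    # Append as many '=' as needed to make the length a multiple of 4.
    s += b"=" * (-len(s) % 4)
    return base64.b64decode(s)


print(safe_b64decode("YWJjZA"))    # b'abcd'
print(safe_b64decode("YWJjZA=="))  # b'abcd'
```

Input that already carries its padding passes through unchanged, because -len(s) % 4 is 0 when the length is already a multiple of 4.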

Summary

Base64 encodes arbitrary binary data as a text string. It is commonly used to transmit small amounts of binary data in URLs, cookies, and web pages.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
