Encoding and decoding in Python

Last Update:2018-07-09 Source: Internet

Author: User


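Encoding introduction:

1. ASCII: English letters, digits, and special characters. A 7-bit code, stored as 8 bits (1 byte) per character.
2. GBK: Chinese characters take 16 bits (2 bytes). Compatible with ASCII.
3. Unicode: the universal character set. In a fixed-width form such as UTF-32, every character takes 32 bits (4 bytes). Its first 128 code points match ASCII.
4. UTF-8: a variable-length encoding of Unicode. English: 8 bits (1 byte); European characters: 16 bits (2 bytes); Chinese: 24 bits (3 bytes); characters outside the Basic Multilingual Plane: 32 bits (4 bytes).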
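The byte counts in the list above can be checked directly. This is a minimal sketch; the sample characters and variable names are chosen here for illustration, and UTF-32 (big-endian, no byte order mark) stands in for the fixed-width 4-byte form.

```python
# Measure how many bytes one character takes in each encoding.
samples = ["A", "é", "中"]  # English, European, Chinese

lengths = {}
for ch in samples:
    utf8_len = len(ch.encode("utf-8"))       # variable length
    utf32_len = len(ch.encode("utf-32-be"))  # fixed width, no BOM
    lengths[ch] = (utf8_len, utf32_len)
    print(ch, "UTF-8:", utf8_len, "UTF-32:", utf32_len)

ascii_len = len("A".encode("ascii"))  # ASCII stores one character per byte
gbk_len = len("中".encode("gbk"))      # GBK stores a Chinese character in 2 bytes
print("ASCII:", ascii_len, "GBK:", gbk_len)
```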

Python 2 uses ASCII as its default encoding.
In Python 3, strings are Unicode, and the default source encoding is UTF-8.
In memory, text is held as Unicode. On disk and over the network, it is stored and sent as UTF-8 or GBK.
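To illustrate the difference between text in memory and text on disk, the sketch below writes the same string to two temporary files, one per encoding, and compares the file sizes. The file names and the temporary directory are assumptions made for the example.

```python
import os
import tempfile

text = "你好"  # a str holds Unicode code points in memory

# Write the same text to disk twice, once per encoding.
tmp_dir = tempfile.mkdtemp()
sizes = {}
for codec in ("utf-8", "gbk"):
    path = os.path.join(tmp_dir, "demo_" + codec + ".txt")  # hypothetical file name
    with open(path, "w", encoding=codec) as f:
        f.write(text)
    sizes[codec] = os.path.getsize(path)

print(sizes)  # {'utf-8': 6, 'gbk': 4}
```

The string has two characters in both cases; only the number of bytes on disk differs.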

Python 3's encode() and decode()
In Python 3, strings in memory are Unicode while the program runs.
Unicode is a universal code, so it can represent any content. A fixed-width Unicode form, however, wastes space and resources in data transfer and storage compared with UTF-8 or GBK.
Unicode text therefore has to be converted to UTF-8 or GBK for storage. How is it converted?
In Python, you encode the text with encode(), and the encoded content can then be transferred.
The encoded data is of type bytes. The content is the same as the original; only its representation has changed.

How bytes are displayed
1. English: b'nihao'. For English text, the bytes form looks the same as the string.
2. Chinese: b'\xc4\xe3\xba\xc3'. This is the GBK bytes form of 你好 ("hello"). In UTF-8 the same text is b'\xe4\xbd\xa0\xe5\xa5\xbd'.
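The two display forms can be reproduced as follows. This is a small sketch with variable names chosen for illustration. It also shows that the bytes b'\xc4\xe3\xba\xc3' are what GBK produces for 你好, while UTF-8 produces a different, longer sequence.

```python
english = "nihao".encode("utf-8")
chinese_gbk = "你好".encode("gbk")
chinese_utf8 = "你好".encode("utf-8")

print(english)       # b'nihao'
print(chinese_gbk)   # b'\xc4\xe3\xba\xc3'
print(chinese_utf8)  # b'\xe4\xbd\xa0\xe5\xa5\xbd'

# Each result is a bytes object, and indexing it yields an integer.
print(type(english))  # <class 'bytes'>
print(english[0])     # 110, the ASCII code of "n"
```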

A string is converted to bytes with encode(character set) before it is transmitted.
For English text, the encoded result looks the same as the source string.
For Chinese text, the encoded result depends on the encoding used.
As noted, a Chinese character takes 3 bytes in UTF-8 and 2 bytes in GBK.
The encoded result is of type bytes. Python uses the bytes type when data is sent over the network or stored, so the receiver also gets data of type bytes.
The receiver can call decode() to restore the bytes to a familiar string.
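The receiver has to decode with the same character set the sender used. The sketch below, with illustrative variable names, shows a successful round trip and what happens when GBK bytes are decoded as UTF-8.

```python
data = "我是文字".encode("gbk")  # what the sender transmits

# Decoding with the matching character set restores the original string.
restored = data.decode("gbk")
print(restored)

# Decoding with a different character set fails for these bytes.
try:
    data.decode("utf-8")
    wrong_codec_failed = False
except UnicodeDecodeError as exc:
    wrong_codec_failed = True
    print("cannot decode as UTF-8:", exc.reason)
```

A mismatch does not always raise an error. Some byte sequences are valid in both encodings and decode to the wrong characters instead, so the encoding should be agreed on in advance rather than guessed.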

# Example: bytes must be decoded with the same encoding that was used to encode them.

s = "我是文字"
bs = s.encode("GBK")  # encode the text as GBK
print(bs)  # GBK result: b'\xce\xd2\xca\xc7\xce\xc4\xd7\xd6'

# How to convert GBK into UTF-8
# First, decode the GBK bytes back to a Unicode string.
bs = b'\xce\xd2\xca\xc7\xce\xc4\xd7\xd6'
s = bs.decode("GBK")
print(s)  # 我是文字
# Then re-encode the string as UTF-8.
bss = s.encode("UTF-8")
print(bss)  # UTF-8 result: b'\xe6\x88\x91\xe6\x98\xaf\xe6\x96\x87\xe5\xad\x97'
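The two-step conversion can be wrapped in a small helper. The function name transcode and its parameters are hypothetical, introduced only for this sketch; it relies on nothing beyond bytes.decode() and str.encode().

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes from one encoding to another by way of a Unicode string."""
    text = data.decode(source)   # bytes -> str (Unicode)
    return text.encode(target)   # str -> bytes in the target encoding


gbk_bytes = b'\xce\xd2\xca\xc7\xce\xc4\xd7\xd6'
utf8_bytes = transcode(gbk_bytes, "gbk", "utf-8")
print(utf8_bytes)  # b'\xe6\x88\x91\xe6\x98\xaf\xe6\x96\x87\xe5\xad\x97'

# Converting back yields the original GBK bytes.
print(transcode(utf8_bytes, "utf-8", "gbk") == gbk_bytes)  # True
```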


