Encoding and decoding in Python

Source: Internet
Author: User

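Encoding introduction:

1. ASCII: English letters, digits, and special characters. 7 bits per character, stored in 1 byte (8 bits).
2. GBK: Chinese characters take 16 bits (2 bytes). Compatible with ASCII, whose characters still take 1 byte.
3. Unicode: the universal character set. Its fixed-width encoding, UTF-32, uses 32 bits (4 bytes) per character. The first 128 code points match ASCII.
4. UTF-8: a variable-length encoding of Unicode. English: 8 bits (1 byte). European characters: 16 bits (2 bytes). Chinese: 24 bits (3 bytes). Some rarer characters, such as emoji, take 4 bytes.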
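The byte counts in the list above can be checked directly. The following is a minimal sketch; the helper name `encoded_length` and the sample characters are chosen here for illustration and are not part of the original article.

```python
# Count how many bytes one character occupies in each encoding.
samples = ["A", "é", "中"]
encodings = ["ascii", "gbk", "utf-8", "utf-32-le"]


def encoded_length(ch, encoding):
    """Return the byte length of ch in encoding, or None if it cannot be encoded."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


for ch in samples:
    for enc in encodings:
        print(ch, enc, encoded_length(ch, enc))
```

`utf-32-le` is used instead of `utf-32` so that the result does not include the 4-byte byte order mark that the plain `utf-32` codec prepends.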

Python 2 uses ASCII as its default encoding.
Python 3 strings (str) are Unicode, and the default source-file encoding is UTF-8.
In memory, text is held as Unicode. On disk and over the network it is stored or sent as UTF-8 or GBK.
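As a rough illustration of the split between memory and disk, the sketch below writes the same in-memory string to two temporary files, one as UTF-8 and one as GBK, and compares their sizes. The file names and sample text are arbitrary choices made for this example.

```python
import os
import tempfile

text = "你好, Python"  # a str: Unicode in memory

# Write the same text to disk in two encodings (temporary files, arbitrary names).
tmp_dir = tempfile.mkdtemp()
utf8_path = os.path.join(tmp_dir, "demo_utf8.txt")
gbk_path = os.path.join(tmp_dir, "demo_gbk.txt")

with open(utf8_path, "w", encoding="utf-8") as f:
    f.write(text)
with open(gbk_path, "w", encoding="gbk") as f:
    f.write(text)

utf8_size = os.path.getsize(utf8_path)  # 2 Chinese chars * 3 bytes + 8 ASCII bytes
gbk_size = os.path.getsize(gbk_path)    # 2 Chinese chars * 2 bytes + 8 ASCII bytes
print(utf8_size, gbk_size)

# Reading back requires naming the encoding the file was written with.
with open(gbk_path, "r", encoding="gbk") as f:
    restored = f.read()
print(restored == text)
```

Passing `encoding=` explicitly avoids depending on the platform default, which can differ between systems.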

encode() and decode() in Python 3
While a Python 3 program is running, strings in memory are Unicode.
Unicode is a universal character set, so it can represent any text. A fixed-width Unicode representation, however, wastes space and bandwidth when data is stored or transferred.
For storage and transfer, the Unicode text therefore needs to be converted to UTF-8 or GBK. How is that done?
In Python, you encode the text with encode(). The encoded result can then be stored or transferred.
The encoded data has type bytes. The content is the same as the original; only its representation has changed.

How bytes are displayed
1. English: b'nihao'. For English text, the bytes form looks the same as the string.
2. Chinese: b'\xc4\xe3\xba\xc3'. This is the GBK bytes form of 你好 ("hello"). In UTF-8 the same text is b'\xe4\xbd\xa0\xe5\xa5\xbd'.
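The two display forms can be reproduced in an interpreter. This is a small sketch with variable names chosen for illustration. Note that b'\xc4\xe3\xba\xc3' is what the GBK codec produces for 你好; the UTF-8 codec gives a different, longer sequence.

```python
english = "nihao".encode("utf-8")
chinese_gbk = "你好".encode("gbk")
chinese_utf8 = "你好".encode("utf-8")

print(type(english))      # <class 'bytes'>
print(english)            # b'nihao'
print(chinese_gbk)        # b'\xc4\xe3\xba\xc3'
print(chinese_utf8)       # b'\xe4\xbd\xa0\xe5\xa5\xbd'

# A bytes object is a sequence of integers in the range 0..255;
# the \x.. escapes are only how non-ASCII byte values are printed.
print(list(chinese_gbk))  # [196, 227, 186, 195]
```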

A string is converted to bytes for transmission with encode(character set).
For English text, the encoded result looks the same as the source string.
For Chinese text, the result depends on the encoding: different encodings give different bytes.
As noted, a Chinese character takes 3 bytes in UTF-8 and 2 bytes in GBK.
The encoded result has type bytes. Python uses bytes when data is sent over the network or written to storage, so the receiving side also gets bytes.
The receiver can call decode() with the same character set to turn the bytes back into a familiar string.
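The round trip described above, and what can go wrong when sender and receiver disagree on the character set, might look like the following sketch. The variable names are illustrative, and no real network is involved: the bytes are simply passed from one variable to another.

```python
message = "你好"

payload = message.encode("utf-8")   # sender side: str -> bytes
received = payload.decode("utf-8")  # receiver side: bytes -> str
print(received == message)          # True

# Decoding with a different codec than the sender used does not restore the text.
# errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
wrong = payload.decode("gbk", errors="replace")
print(wrong == message)             # False

# In the other direction, GBK bytes are not valid UTF-8, so strict decoding raises.
gbk_payload = message.encode("gbk")
try:
    gbk_payload.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print(decode_failed)                # True
```

Because a mismatched decode can either raise or silently produce wrong characters, the character set usually has to be agreed on or sent alongside the data.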

# Example: decoding must use the same encoding that was used for encoding.
#
# s = "我是文字"
# bs = s.encode("GBK")  # get the GBK bytes of the text
# print(bs)  # GBK result: b'\xce\xd2\xca\xc7\xce\xc4\xd7\xd6'

# How to convert GBK to UTF-8
# First decode the GBK bytes to a Unicode string.
bs = b'\xce\xd2\xca\xc7\xce\xc4\xd7\xd6'
s = bs.decode("GBK")
print(s)  # 我是文字
# Then re-encode the string as UTF-8.
bss = s.encode("UTF-8")
print(bss)  # UTF-8: b'\xe6\x88\x91\xe6\x98\xaf\xe6\x96\x87\xe5\xad\x97'
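The two steps above (decode to a Unicode string, then re-encode) can be wrapped in a small helper. This is only a sketch; the function name `transcode` is hypothetical and is not something the article or the standard library defines.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes from one encoding to another by going through str (Unicode)."""
    return data.decode(source).encode(target)


gbk_bytes = b"\xce\xd2\xca\xc7\xce\xc4\xd7\xd6"  # 我是文字 in GBK
utf8_bytes = transcode(gbk_bytes, "gbk", "utf-8")
print(utf8_bytes)  # b'\xe6\x88\x91\xe6\x98\xaf\xe6\x96\x87\xe5\xad\x97'

# Converting back gives the original GBK bytes.
print(transcode(utf8_bytes, "utf-8", "gbk") == gbk_bytes)  # True
```

There is no direct GBK-to-UTF-8 step in this approach: every conversion passes through the Unicode string in between.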
