Introduction to character encodings:
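1. ASCII: English letters, digits, and special characters. A 7-bit code, stored in 8 bits (1 byte) per character.
2. GBK: the Chinese national standard. A Chinese character takes 16 bits (2 bytes). Compatible with ASCII.
3. Unicode: the universal character set. Stored at fixed width (UTF-32), every character takes 32 bits (4 bytes). Its first 128 code points match ASCII.
4. UTF-8: a variable-length encoding of Unicode. English: 8 bits (1 byte); European: 16 bits (2 bytes); Chinese: 24 bits (3 bytes).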
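The byte widths listed above can be checked directly. This is a minimal sketch; the helper name encoded_length and the sample characters are chosen here for illustration only, and "utf-32-le" is used so that no byte-order mark is counted.

```python
def encoded_length(text, encoding):
    """Return how many bytes `text` occupies in the given encoding."""
    return len(text.encode(encoding))


# One English, one European, and one Chinese character.
utf8_lengths = [encoded_length(ch, "utf-8") for ch in ("a", "é", "中")]
gbk_lengths = [encoded_length(ch, "gbk") for ch in ("a", "中")]
utf32_lengths = [encoded_length(ch, "utf-32-le") for ch in ("a", "中")]

print(utf8_lengths)   # [1, 2, 3]
print(gbk_lengths)    # [1, 2]
print(utf32_lengths)  # [4, 4]
```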
Python 2 uses ASCII as its default encoding.
Python 3 uses Unicode: every str is a Unicode string by default.
In memory, Unicode is used. On disk and over the network, data is stored and sent as UTF-8 or GBK.
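As a rough illustration of the memory versus disk distinction, the sketch below writes a string to a temporary file as GBK and then reads the raw bytes back. The file name demo.txt and the variable names are assumptions made for this example.

```python
import os
import tempfile

text = "中文"  # a str: Unicode while the program runs

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")  # hypothetical file name

    # The encoding argument decides which bytes end up on disk.
    with open(path, "w", encoding="gbk") as f:
        f.write(text)

    # Binary mode shows the stored bytes without decoding them.
    with open(path, "rb") as f:
        raw = f.read()

    # Reading in text mode with the same encoding restores the str.
    with open(path, encoding="gbk") as f:
        read_back = f.read()

print(raw)       # b'\xd6\xd0\xce\xc4'
print(len(raw))  # 4
```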
Python 3's encode() and decode()
In Python 3, strings held in memory are Unicode while the program runs.
Unicode is a universal code, so it can represent any content. But fixed-width Unicode takes more space than UTF-8 or GBK, so using it for storage and data transfer wastes space and bandwidth.
Unicode therefore has to be converted to UTF-8 or GBK for storage. How is that done?
In Python, you encode the text. The encoded content can then be transmitted.
The encoded data is of type bytes. The underlying content is unchanged; only its representation differs after encoding.
How bytes are displayed:
1. English: b'nihao'. English text looks the same as bytes as it does as a string.
2. Chinese: b'\xc4\xe3\xba\xc3'. This is the GBK bytes form of the Chinese '你好' ('hello'), 2 bytes per character.
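The two display forms can be reproduced with a short sketch. The variable names are chosen for illustration; the UTF-8 line is added only for comparison with the GBK bytes.

```python
english = "nihao".encode("utf-8")
chinese_gbk = "你好".encode("gbk")
chinese_utf8 = "你好".encode("utf-8")

# ASCII bytes print as readable characters.
print(english)       # b'nihao'
# Bytes outside the ASCII range print as \xNN escapes.
print(chinese_gbk)   # b'\xc4\xe3\xba\xc3'
print(chinese_utf8)  # b'\xe4\xbd\xa0\xe5\xa5\xbd'
print(type(english).__name__)  # bytes
```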
For transmission, a string is converted to bytes with encode(character set).
For English, the encoded result looks the same as the source string.
For Chinese, the encoded result differs depending on which encoding is used.
As noted, a Chinese character takes 3 bytes in UTF-8 and 2 bytes in GBK.
The encoded result is of type bytes. Python uses the bytes type when sending data over the network and when storing it, so the receiver also gets data of type bytes.
We can use decode() to restore the bytes data to the familiar string.
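A minimal sketch of the round trip, assuming both sides agree on the character set. It also shows what can happen when they do not: decoding GBK bytes as UTF-8 raises UnicodeDecodeError here. The variable names are illustrative.

```python
text = "你好"

payload = text.encode("utf-8")      # str -> bytes before sending or saving
restored = payload.decode("utf-8")  # bytes -> str on the receiving side
print(restored == text)             # True

# Decoding with a different character set than the one used to encode
# can fail outright.
gbk_payload = text.encode("gbk")
try:
    gbk_payload.decode("utf-8")
    outcome = "decoded"
except UnicodeDecodeError:
    outcome = "UnicodeDecodeError"
print(outcome)  # UnicodeDecodeError
```

Not every mismatch raises an error. Some byte sequences are valid in both encodings and decode silently to the wrong characters, so the character set should be agreed on explicitly.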
# Example: encoding and decoding must use the same character set.
#
# s = "我是文字"  # "I am text"
# bs = s.encode("GBK")  # encode the text as GBK bytes
# print(bs)  # GBK result: b'\xce\xd2\xca\xc7\xce\xc4\xd7\xd6'
# How to convert GBK into UTF-8:
# First decode the GBK bytes into a Unicode string.
bs = b'\xce\xd2\xca\xc7\xce\xc4\xd7\xd6'
s = bs.decode("GBK")
print(s)  # 我是文字
# Then re-encode the string as UTF-8.
bss = s.encode("UTF-8")
print(bss)  # UTF-8 result: b'\xe6\x88\x91\xe6\x98\xaf\xe6\x96\x87\xe5\xad\x97'
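The decode-then-encode steps can be wrapped in a small helper. This is only a sketch: transcode is a hypothetical name introduced here, not a built-in function.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes from one character set to another via a Unicode str."""
    return data.decode(source).encode(target)


gbk_bytes = b"\xce\xd2\xca\xc7\xce\xc4\xd7\xd6"
utf8_bytes = transcode(gbk_bytes, "gbk", "utf-8")
print(utf8_bytes)  # b'\xe6\x88\x91\xe6\x98\xaf\xe6\x96\x87\xe5\xad\x97'

# Converting back gives the original GBK bytes.
print(transcode(utf8_bytes, "utf-8", "gbk") == gbk_bytes)  # True
```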
Encoding and decoding in Python