Python Basics: Encoding Issues
In this section:
- The origin of the string encoding problem
- String encoding solution
1. The origin of the string encoding problem
Character encodings evolved from ASCII to Unicode and its encoding forms (UTF-8, UTF-16, UTF-32, etc.), alongside national encodings such as China's GBK. Because these encodings are incompatible with each other, software that runs across languages and platforms can end up displaying garbled characters.
Some background:
- In Python 2 the default encoding is ASCII; in Python 3 it is UTF-8 (source files are read as UTF-8 by default, and strings are Unicode by default)
- Unicode has several encoding forms: UTF-32 (4 bytes per character), UTF-16 (2 or 4 bytes), and UTF-8 (1 to 4 bytes), so UTF-8 is one way of encoding Unicode
- In Python 3, encode() converts a string to the bytes type as it encodes, and decode() turns bytes back into a string
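The points above can be checked with a short Python 3 sketch. The sample text is an assumption chosen for illustration (two CJK characters plus one ASCII letter); the "-le" codec variants are used so that no byte order mark is added to the byte counts.

```python
text = "中文A"  # hypothetical sample: two CJK characters and one ASCII letter

# encode() turns a str into bytes; the byte count depends on the encoding.
utf8_bytes = text.encode("utf-8")       # 3 + 3 + 1 bytes
utf16_bytes = text.encode("utf-16-le")  # 2 bytes per character here
utf32_bytes = text.encode("utf-32-le")  # always 4 bytes per character

print(type(text).__name__, len(text))              # str 3
print(type(utf8_bytes).__name__, len(utf8_bytes))  # bytes 7
print(len(utf16_bytes), len(utf32_bytes))          # 6 12

# decode() turns the bytes back into a str.
round_trip = utf8_bytes.decode("utf-8")
print(type(round_trip).__name__, round_trip == text)  # str True
```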
2. String Encoding Solution
First, understand that Unicode can represent the characters of all the other encodings, so it acts as the bridge between them. To convert from one encoding to another, for example from ASCII to GBK, the data must first be decoded into Unicode and then re-encoded as GBK. Converting from another encoding to Unicode is called decoding (decode), and converting from Unicode to another encoding is called encoding (encode). PS: UTF-8 bytes are not compatible with GBK bytes; they must first be decoded to Unicode and then encoded as GBK.
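As a minimal Python 3 sketch of this decode-then-encode bridge: the helper name `transcode` and the sample text are assumptions made for illustration, not part of any library. The last part shows what happens when the bridge is skipped and GBK bytes are read directly as UTF-8.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Hypothetical helper: re-encode bytes by going through a Unicode str."""
    text = data.decode(source)   # source bytes -> Unicode (decoding)
    return text.encode(target)   # Unicode -> target bytes (encoding)


gbk_bytes = "中文".encode("gbk")  # hypothetical sample text, stored as GBK
utf8_bytes = transcode(gbk_bytes, "gbk", "utf-8")
print(len(gbk_bytes), len(utf8_bytes))  # 4 6

# Skipping the bridge: these GBK bytes are not valid UTF-8.
try:
    gbk_bytes.decode("utf-8")
    direct_decode_ok = True
except UnicodeDecodeError:
    direct_decode_ok = False
print(direct_decode_ok)  # False
```

Not every mismatch raises an error; some byte sequences decode "successfully" into the wrong characters, which is the garbled output described earlier.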
The encoding problem involves the following aspects:
- Encoding format of the file
- Encoding format for strings
- Terminal encoding format for output string
The encoding of the file, the encoding of the string, and the encoding of the terminal must all be consistent for the desired string to be output correctly.
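A rough Python 3 sketch of checking these aspects follows. The file name and sample text are assumptions for illustration; the idea is to state the file encoding explicitly on both write and read, and to inspect which encoding the terminal stream reports.

```python
import os
import sys
import tempfile

text = "中文"  # hypothetical sample text

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")  # hypothetical file name

    # File encoding: state it explicitly when writing and when reading.
    with open(path, "w", encoding="gbk") as f:
        f.write(text)
    with open(path, "r", encoding="gbk") as f:
        read_back = f.read()

    # The bytes on disk are GBK, not the in-memory Unicode string.
    with open(path, "rb") as f:
        raw = f.read()

# Terminal encoding: what print() uses when writing to standard output.
terminal_encoding = sys.stdout.encoding
print("terminal encoding:", terminal_encoding)
print("file round trip ok:", read_back == text)
```

If the terminal encoding cannot represent a character in the string, printing it may raise an error or show a replacement character, even though the file and the string themselves are correct.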
Python provides two methods for transcoding: encode() for encoding and decode() for decoding. decode() takes the source encoding of the data being decoded, and encode() takes the target encoding that the string should be converted to. The test code is as follows; the original data is a UTF-8 encoded string:
    # Python 3: str has no decode(), so start from UTF-8 encoded bytes
    s = "Tesla".encode("utf-8")
    s_to_unicode = s.decode("utf-8")                  # decode into a Unicode string
    print(s)
    print(s_to_unicode)
    unicode_to_gbk = s_to_unicode.encode("gbk")       # encode into GBK bytes
    print(unicode_to_gbk)
    gbk_to_unicode = unicode_to_gbk.decode("gbk")     # decode back into a Unicode string
    print(gbk_to_unicode)
    unicode_to_utf8 = gbk_to_unicode.encode("utf-8")  # encode into UTF-8 bytes
    print(unicode_to_utf8)