Python 2.x: Fixing Garbled Chinese Characters
Garbled characters are a common headache in Python.
Python 3 supports Chinese fully. In Python 2.x, however, you have to configure the encoding yourself; otherwise, Chinese text comes out garbled.
【Cause】
In Python 2.x the problem comes down to character encoding. Python 2 uses ASCII as its default encoding. ASCII represents letters, punctuation marks, and other characters in a single byte each, but one byte is not enough to represent Chinese characters.
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
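The snippet below shows the one-byte limit concretely. It is a minimal sketch in Python 3, where sys.getdefaultencoding() returns 'utf-8' rather than 'ascii', so it calls the ASCII codec explicitly. The helper name ascii_bytes is hypothetical. English text encodes at one byte per character, while Chinese cannot be encoded in ASCII at all.

```python
# Python 3 sketch: ASCII stores one byte per character and has no codes for Chinese.
def ascii_bytes(text):
    """Encode text as ASCII, returning None if it contains non-ASCII characters."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError:
        return None

print(ascii_bytes("abc"))   # b'abc' (3 characters, 3 bytes)
print(ascii_bytes("中文"))  # None: ASCII has no code for these characters
```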
To represent Chinese characters, Chinese encodings use two bytes per character. When bytes in one encoding are decoded as another (for example, as ASCII, or UTF-8 bytes read as GBK), the result is a decoding error or garbled text. The Windows command prompt (CMD) uses GBK by default, which is a common cause of garbled Chinese output.
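The mismatch can be reproduced in a few lines. This is a minimal Python 3 sketch. It assumes the script's text was written as UTF-8 and that the console interprets bytes as GBK, as CMD does by default. It uses errors="replace" so that undecodable byte pairs do not stop the demonstration.

```python
# Python 3 sketch: the same bytes read with the wrong encoding become garbled text.
original = "中文"
utf8_bytes = original.encode("utf-8")  # 3 bytes per Chinese character here

# A console that assumes GBK would interpret those UTF-8 bytes as GBK.
# errors="replace" substitutes U+FFFD for any byte sequence GBK cannot decode.
garbled = utf8_bytes.decode("gbk", errors="replace")

# Decoding with the encoding the bytes were actually written in restores the text.
restored = utf8_bytes.decode("utf-8")

print(garbled)   # mojibake, not the original text
print(restored)  # 中文
```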
Common two-byte Chinese encoding standards include GB2312, GBK, and Big5.
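The byte counts can be checked directly. The sketch below is written in Python 3 and uses only the standard codecs gb2312, gbk, big5, and utf-8; the helper name byte_lengths is hypothetical. It shows that each of the three Chinese standards stores one character in two bytes, whereas UTF-8 uses three bytes for common Chinese characters.

```python
# Python 3 sketch: how many bytes one Chinese character takes in each encoding.
def byte_lengths(text, encodings=("gb2312", "gbk", "big5", "utf-8")):
    """Map each encoding name to the number of bytes needed to encode text."""
    return {enc: len(text.encode(enc)) for enc in encodings}

print(byte_lengths("中"))  # {'gb2312': 2, 'gbk': 2, 'big5': 2, 'utf-8': 3}
```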
【Solution】
To bring all languages into one unified character set and meet the needs of international information exchange, the Unicode character set was developed. It aims to include every character in the world and gives each one a unique code. Processing cross-language text as Unicode avoids garbled characters.
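The "unique code" idea is easy to inspect. The sketch below is a minimal Python 3 example. Each character has exactly one Unicode code point. Different encodings store that code point as different bytes, yet both decode back to the same Unicode string.

```python
# Python 3 sketch: Unicode gives each character one code point, independent of encoding.
text = "中文"
print([hex(ord(ch)) for ch in text])  # ['0x4e2d', '0x6587']

# Different encodings store those code points as different bytes...
utf8 = text.encode("utf-8")
gbk = text.encode("gbk")
print(utf8 != gbk)  # True

# ...but decoding each with its own encoding yields the same Unicode string.
print(utf8.decode("utf-8") == gbk.decode("gbk") == text)  # True
```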
I) Interactive mode: Chinese usually displays correctly, so no setup is needed.
II) In a .py script file: you must declare the file's encoding. Otherwise, Python 2 rejects non-ASCII characters in the source with a SyntaxError, or the output comes out garbled.
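- First, add one of the following encoding declarations on the first (or second) line of the file. The keyword must be lowercase "coding" followed immediately by ":" or "=":
# coding=utf-8
# or
# coding: utf-8
# or
# -*- coding: utf-8 -*-
- Second, save the file in UTF-8 format.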
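To see why the exact spelling matters, the sketch below checks candidate lines against the declaration pattern given in PEP 263. It is a minimal Python 3 example. The helper find_coding_cookie is hypothetical and only checks the pattern. The real interpreter additionally requires the line to be the first or second line of the file.

```python
import re

# Regular expression for the encoding declaration, as given in PEP 263.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

def find_coding_cookie(line):
    """Return the declared encoding name, or None if the line is not a valid declaration."""
    match = CODING_RE.match(line)
    return match.group(1) if match else None

for line in ("# coding=utf-8", "# coding: utf-8", "# -*- coding: utf-8 -*-",
             "# Coding = UTF-8", "# coding = UTF-8"):
    # The last two fail: "Coding" is capitalized, and a space separates "coding" from "=".
    print(repr(line), "->", find_coding_cookie(line))
```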
The declaration only tells the Python interpreter how the source file is encoded, so that the file may contain non-ASCII characters; it does not convert the file.
To actually store the file as UTF-8, you must choose UTF-8 as the encoding when saving it:
If you open it with Notepad: [Save As] --> choose UTF-8 as the encoding.
If you open it with IDLE: [Options] --> [Configure IDLE] --> [General], and set the default source encoding to UTF-8.
With these settings, a script run from IDLE with F5 outputs Chinese characters normally.
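The point that the declaration converts nothing can be checked by saving the same source two ways. The sketch below is written in Python 3. It writes a hypothetical script to a temporary file and reads back the declared encoding with the standard tokenize.detect_encoding. It then tests whether the bytes on disk really are UTF-8. The helper save_and_check and the sample SOURCE are illustrative assumptions.

```python
import os
import tempfile
import tokenize

# Hypothetical script content: a coding declaration plus a Chinese string literal.
SOURCE = "# -*- coding: utf-8 -*-\nprint('中文')\n"

def save_and_check(source, file_encoding):
    """Save source with file_encoding; return (declared encoding, bytes are valid UTF-8)."""
    fd, path = tempfile.mkstemp(suffix=".py")
    os.close(fd)
    try:
        # The bytes on disk depend only on the encoding chosen when saving.
        with open(path, "w", encoding=file_encoding) as f:
            f.write(source)
        with open(path, "rb") as f:
            declared, _ = tokenize.detect_encoding(f.readline)
            f.seek(0)
            raw = f.read()
        try:
            raw.decode("utf-8")
            valid = True
        except UnicodeDecodeError:
            valid = False
        return declared, valid
    finally:
        os.remove(path)

print(save_and_check(SOURCE, "utf-8"))  # ('utf-8', True)
print(save_and_check(SOURCE, "gbk"))    # ('utf-8', False): declared UTF-8, saved as GBK
```

The second call is the failure case the text warns about. The declaration says UTF-8, but the editor saved GBK bytes.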
【Encoding and Decoding】
Even after adding # -*- coding: utf-8 -*- at the top and saving the file as UTF-8, Chinese output is still not guaranteed to display correctly.
Different editors and consoles, such as Vim, IDLE, Eclipse, and CMD, use different output encodings.
So Chinese text that displays correctly in one place may be garbled in another, which is why you need to encode and decode explicitly.
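Before converting anything, it helps to know which encoding the current output expects. The sketch below is a minimal Python 3 example using sys.stdout.encoding and locale.getpreferredencoding; the helper name output_encodings is hypothetical. The values differ between environments. On a Chinese-locale Windows system, the locale encoding is typically the GBK code page cp936.

```python
import locale
import sys

def output_encodings():
    """Report the encodings this process uses when printing text."""
    return {
        # Encoding of the current standard output stream (None if it has no encoding).
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding preferred by the current locale settings.
        "locale": locale.getpreferredencoding(False),
    }

print(output_encodings())
```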
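Encode: convert a Unicode string into bytes in a particular encoding.
Decode: convert bytes in a particular encoding back into a Unicode string.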
Encoding and decoding must use the same encoding. For example, bytes encoded as UTF-8 must be decoded as UTF-8.
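The two directions and the "same encoding on both sides" rule look like this in a minimal Python 3 sketch. In Python 3, encode produces bytes and decode produces str. The helper decodes_as is hypothetical.

```python
# Python 3 sketch: encode goes str -> bytes, decode goes bytes -> str.
text = "中文"
data = text.encode("utf-8")          # Unicode string -> UTF-8 bytes
print(type(data).__name__, data)     # bytes b'\xe4\xb8\xad\xe6\x96\x87'
print(data.decode("utf-8") == text)  # True: same encoding on both sides

def decodes_as(data, encoding):
    """Return True if the bytes decode cleanly with the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

# GBK bytes are not valid UTF-8, so decoding them as UTF-8 fails outright.
print(decodes_as(text.encode("gbk"), "utf-8"))  # False
```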
The fix, then, is to decode the text using its original encoding and re-encode it in the console's encoding. For example, CMD uses GBK by default.
In Python 2, this means calling .decode('utf-8') on the UTF-8 byte string and then .encode('gbk') on the result before printing it in CMD.
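The decode-then-re-encode step can be sketched as a small function. This is a minimal Python 3 version; the name to_console_bytes is hypothetical. It assumes the source bytes are UTF-8 and the console expects GBK. It also uses errors="replace" so that characters GBK cannot represent become "?" instead of raising an error.

```python
# Python 3 sketch of "decode with the original encoding, re-encode for the console".
def to_console_bytes(data, source_encoding="utf-8", console_encoding="gbk"):
    """Convert bytes from source_encoding into console_encoding."""
    text = data.decode(source_encoding)  # step 1: bytes -> Unicode, using the ORIGINAL encoding
    # step 2: Unicode -> console bytes; characters the console cannot show become '?'
    return text.encode(console_encoding, errors="replace")

utf8_bytes = "中文".encode("utf-8")
gbk_bytes = to_console_bytes(utf8_bytes)
print(gbk_bytes.decode("gbk"))  # 中文
```

In Python 3, print normally performs step 2 itself using sys.stdout.encoding. Raw bytes such as gbk_bytes would only be written directly, via sys.stdout.buffer.write, when you need to bypass that.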
【Other Notes】
1. Python 3 supports Chinese comprehensively. Source files are treated as UTF-8 by default, so you can use Chinese not only in string literals but also in identifiers such as variable names, for example:
>>> 中文 = '中文'
>>> print(中文)
中文
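A slightly longer Python 3 sketch shows the same feature in a function definition. It relies on UTF-8 as the default source encoding (PEP 3120) and on non-ASCII identifiers (PEP 3131). The function name 问候 ("greet") and its parameter 名字 ("name") are hypothetical.

```python
# Python 3 sketch: Chinese function and variable names, no coding declaration needed.
def 问候(名字):
    """Return a greeting for the given name."""
    return "你好，" + 名字

中文 = "中文"
print(问候(中文))             # 你好，中文
print("中文".isidentifier())  # True
```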
2. In Python 3, strings (str) are Unicode, so back-and-forth encoding and decoding is rarely needed. str has only an encode method and bytes has only a decode method, so a string cannot be decoded twice by mistake.
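The split between the two types can be confirmed with a minimal Python 3 sketch. Mixing str and bytes raises a TypeError, unlike Python 2's implicit ASCII conversion between str and unicode. The helper mixes_cleanly is hypothetical.

```python
# Python 3 sketch: str (Unicode text) only encodes, bytes only decodes.
print(hasattr(str, "encode"), hasattr(str, "decode"))      # True False
print(hasattr(bytes, "decode"), hasattr(bytes, "encode"))  # True False

def mixes_cleanly(a, b):
    """Return True if a + b succeeds, False if the types cannot be mixed."""
    try:
        a + b
        return True
    except TypeError:
        return False

print(mixes_cleanly("中", "文"))            # True: both are str
print(mixes_cleanly("中", "文".encode()))  # False: str + bytes is a TypeError
```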