This article analyzes why Chinese text comes out garbled in Python 2. It creates a text file, reads it from a Python script, and shows how to fix the output for each file encoding, with a code example at every step. I hope you find it useful.
Create a file named test.txt in ANSI format containing the text abc中文, and read it in Python:
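```python
# coding=gbk
# Python 2: read() returns the file's raw GBK bytes, which print correctly
# on a Chinese Windows console because its code page is also GBK.
print open("test.txt").read()
```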
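On a Chinese-locale Windows system, "ANSI" means code page 936, which is GBK. In Python 3 the encoding is passed to `open()` explicitly instead of relying on the console. The sketch below assumes the file is GBK-encoded; the temporary file is created only for the demonstration and stands in for test.txt.

```python
import os
import tempfile

# Demo file holding "abc中文" encoded as GBK, standing in for a file
# saved as "ANSI" on Chinese Windows.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "wb") as f:
    f.write("abc中文".encode("gbk"))

# Python 3: name the encoding when opening the file in text mode.
with open(path, encoding="gbk") as f:
    text = f.read()

print(text)  # abc中文
```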
Result: abc中文. Now save the file as UTF-8 instead; the bytes that are read must then be decoded from UTF-8:
```python
# coding=gbk
# Decode the UTF-8 bytes into a unicode string before printing.
print open("test.txt").read().decode("utf-8")
```
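The explicit decode matters because a UTF-8 file and a GBK file store 中文 as different byte sequences, and decoding bytes with the wrong codec is what produces garbled text. A small Python 3 sketch using in-memory bytes instead of a file:

```python
text = "abc中文"

utf8_bytes = text.encode("utf-8")  # 3 bytes per Chinese character
gbk_bytes = text.encode("gbk")     # 2 bytes per Chinese character
print(len(utf8_bytes), len(gbk_bytes))  # 9 7

# Decoding with the matching codec round-trips cleanly.
assert utf8_bytes.decode("utf-8") == text

# Decoding UTF-8 bytes as GBK yields garbled text; errors="replace"
# substitutes U+FFFD for any byte pair GBK cannot map instead of raising.
garbled = utf8_bytes.decode("gbk", errors="replace")
print(garbled == text)  # False
```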
Result: abc中文. That test.txt had been saved as UTF-8 with EditPlus. When the same file is saved as UTF-8 with Windows Notepad instead, the script reports an error:
```
Traceback (most recent call last):
  File "ChineseTest.py", line 3, in <module>
    print open("test.txt").read().decode("utf-8")
UnicodeEncodeError: 'gbk' codec can't encode character u'\ufeff' in position 0: illegal multibyte sequence
```
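The failure can be reproduced in Python 3 without a console or a file. This minimal sketch assumes the file starts with the three bytes Notepad writes at the start of a UTF-8 file (`codecs.BOM_UTF8`); the `encode("gbk")` call stands in for what printing to a GBK console does.

```python
import codecs

# Hypothetical file contents as Notepad saves them: the UTF-8 BOM
# followed by the UTF-8 bytes of the text.
raw = codecs.BOM_UTF8 + "abc中文".encode("utf-8")
print(raw[:3])  # b'\xef\xbb\xbf'

# Plain "utf-8" keeps the BOM as the first character, U+FEFF.
decoded = raw.decode("utf-8")
print(repr(decoded[0]))  # '\ufeff'

# Encoding for a GBK console fails on that character.
gbk_failed = False
try:
    decoded.encode("gbk")
except UnicodeEncodeError as exc:
    gbk_failed = True
    print(exc)
```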
The cause: some programs, such as Notepad, insert three invisible bytes, 0xEF 0xBB 0xBF (the UTF-8 byte order mark, or BOM), at the beginning of a file when saving it as UTF-8. After decoding, these bytes become the character u'\ufeff', which the GBK console encoding cannot represent. The BOM therefore has to be removed when reading, and Python's codecs module defines it as the constant codecs.BOM_UTF8:
```python
# coding=gbk
import codecs

data = open("test.txt").read()
# Notepad prepends the 3-byte UTF-8 BOM; strip it before decoding.
if data[:3] == codecs.BOM_UTF8:
    data = data[3:]
print data.decode("utf-8")
```
Result: abc中文
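Python also ships a codec named "utf-8-sig" that strips a leading BOM when decoding and otherwise behaves like plain "utf-8"; it is available in Python 2.5 and later as well, so `data.decode("utf-8-sig")` can replace the manual check above. A Python 3 sketch follows; the helper name `read_utf8_text` is hypothetical, and the two temporary files are created only to show that files with and without a BOM read the same way.

```python
import codecs
import os
import tempfile


def read_utf8_text(path):
    """Hypothetical helper: read a UTF-8 file, dropping a leading BOM if present."""
    # "utf-8-sig" removes a BOM at the start of the stream and decodes
    # BOM-less UTF-8 exactly like "utf-8", so it covers both cases.
    with open(path, encoding="utf-8-sig") as f:
        return f.read()


demo_dir = tempfile.mkdtemp()
with_bom = os.path.join(demo_dir, "with_bom.txt")        # as Notepad saves it
without_bom = os.path.join(demo_dir, "without_bom.txt")  # no BOM
with open(with_bom, "wb") as f:
    f.write(codecs.BOM_UTF8 + "abc中文".encode("utf-8"))
with open(without_bom, "wb") as f:
    f.write("abc中文".encode("utf-8"))

print(read_utf8_text(with_bom) == read_utf8_text(without_bom))  # True
```

Using the codec instead of slicing keeps the BOM handling in one place and avoids accidentally stripping three bytes from a file that does not start with a BOM.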