For example, I download some information from the Internet, or write a mail program that downloads messages, and save the result on the local computer as a Notepad (TXT) file.
Why do I see only the English text, with everything else garbled? How can this be fixed?
Cause of the garbled output:
The file is declared as utf-8, so the source file should also be saved in utf-8 encoding. However, the default local encoding on Chinese-language Windows is cp936 (that is, GBK), so printing a utf-8 string directly to the console produces garbled text.
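The mismatch described above can be reproduced without a Windows console. The following is a minimal sketch, written for Python 3 (where byte strings are a separate bytes type) rather than the Python 2 used in this article; the sample text and variable names are assumptions chosen for illustration.

```python
# Python 3 sketch: the same bytes read under two different encodings.
text = "中文"                        # sample text (an assumption for this example)
utf8_bytes = text.encode("utf-8")    # what a UTF-8 source file stores

# A console that assumes GBK interprets those bytes with the wrong table.
# errors="replace" keeps the demo from raising on undecodable byte pairs.
garbled = utf8_bytes.decode("gbk", errors="replace")

# Decoding with the encoding that produced the bytes recovers the text.
recovered = utf8_bytes.decode("utf-8")

print(ascii(garbled))      # escaped form, safe to print on any console
print(ascii(recovered))
```

The bytes never change; only the table used to interpret them does, which is why the fix is always to decode with the encoding that actually produced the data.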
Workaround:
Transcode the string at the point where it is printed to the console:
print myname.decode('utf-8').encode('gbk')
A more general approach is:
import sys
encoding = sys.getfilesystemencoding()
print myname.decode('utf-8').encode(encoding)
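The decode-then-encode step above can be wrapped in a small helper. This is a sketch in Python 3, not the article's Python 2 code; transcode and console_encoding are hypothetical names, and using locale.getpreferredencoding to guess the local encoding is an assumption, not the only option.

```python
import locale

def transcode(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes from `source`, then re-encode them as `target`."""
    return data.decode(source).encode(target)

def console_encoding() -> str:
    """Best guess at the local encoding; falls back to UTF-8."""
    return locale.getpreferredencoding(False) or "utf-8"

myname = "中文".encode("utf-8")               # UTF-8 bytes, as in the article
as_gbk = transcode(myname, "utf-8", "gbk")    # bytes a GBK console expects
as_local = transcode(myname, "utf-8", "utf-8")  # no-op when encodings match
```

In real use the target would be console_encoding() rather than a hard-coded name, with the caveat that text containing characters the local encoding cannot represent will raise UnicodeEncodeError.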
Here is a collection of common fixes for garbled Chinese text.
Method One:
Add an encoding declaration at the beginning of the file:
# coding=gbk
s = 'Google'
print s
Output: Google
Method Two:
Convert the encoding at output time:
# coding=utf-8
s = 'Google'
print unicode(s, 'gbk')
Output: Google
Handling garbled Chinese in TXT files
Some software, such as Notepad, inserts three invisible bytes (0xEF 0xBB 0xBF, the BOM) at the beginning of a file when saving it in UTF-8. These bytes need to be removed when the file is read; the codecs module in Python defines a constant for them:
# coding=gbk
import codecs
data = open("Test.txt").read()
# Drop the UTF-8 BOM if the file starts with one
if data[:3] == codecs.BOM_UTF8:
    data = data[3:]
print data.decode("utf-8")
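For comparison, the same BOM handling in Python 3 might look like the sketch below. strip_utf8_bom is a hypothetical helper name, and the input bytes are simulated rather than read from Test.txt.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

raw = codecs.BOM_UTF8 + "abc".encode("utf-8")   # simulated Notepad output
text = strip_utf8_bom(raw).decode("utf-8")

# The "utf-8-sig" codec performs the same stripping while decoding.
same_text = raw.decode("utf-8-sig")
```

The utf-8-sig codec also accepts input without a BOM, so it is a convenient default when a file may or may not have been saved by Notepad.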
Both the unicode() function and the decode() method convert a str to unicode. Why is the argument passed to them "GBK" in the examples above?
The first guess is that it is because the coding declaration says GBK (# coding=gbk), but is that really the reason?
Modify the source file:
# coding=utf-8
s = "中文"
print unicode(s, "utf-8")
Running it raises an error:
Traceback (most recent call last):
  File "chinesetest.py", line 3, in <module>
    s = unicode(s, "utf-8")
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: invalid data
Clearly, if the earlier example worked because both the declaration and the conversion used GBK, then using utf-8 on both sides should work as well, rather than raise an error.
As a further example, keep the utf-8 declaration but still convert with GBK:
# coding=utf-8
s = "中文"
print unicode(s, "gbk")
Result: 中文
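One plausible reading of these two results is that the bytes of the string literal are in whatever encoding the editor used to save the file, regardless of the coding declaration. The Python 3 sketch below illustrates that reading under the assumption that the file was actually saved as GBK; try_decode is a hypothetical helper name.

```python
def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or None if the bytes do not fit `encoding`."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

on_disk = "中文".encode("gbk")   # assumption: the editor saved the file as GBK

as_utf8 = try_decode(on_disk, "utf-8")   # None: the bytes are not valid UTF-8
as_gbk = try_decode(on_disk, "gbk")      # decodes: matches the real encoding
```

Under this assumption the argument to unicode() has to name the encoding of the bytes themselves, which explains why "gbk" works and "utf-8" fails even when the declaration says utf-8.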