Fixing garbled Chinese text in Python

Source: Internet
Author: User

Question: I download some information from the Internet, or write an e-mail program that downloads messages, and save the result to a local text (TXT) file.
When I open the saved file, only the English is readable and the Chinese is garbled. Why does this happen, and how do I fix it?


Cause of the garbling:
The source file declares utf-8, so it should also be saved in UTF-8, and its string literals are then UTF-8 bytes. However, the default local encoding on Simplified Chinese Windows is cp936, that is, GBK, so printing a UTF-8 byte string directly to the console produces garbled output.
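The mismatch described above can be reproduced without a Windows console. The following is a minimal sketch, assuming Python 3 (the snippets in this article are Python 2, where a plain str holds bytes); the variable names are illustrative only.

```python
# Python 3 sketch: the same text encoded two ways, then read with the wrong codec.
text = "中文"

utf8_bytes = text.encode("utf-8")   # 6 bytes: e4 b8 ad e6 96 87
gbk_bytes = text.encode("gbk")      # 4 bytes: d6 d0 ce c4

# A GBK console interprets the UTF-8 bytes as GBK and shows the wrong characters.
# errors="replace" keeps byte pairs that are undefined in GBK from raising.
garbled = utf8_bytes.decode("gbk", errors="replace")

# Decoding with the codec that produced the bytes gives the original text back.
restored = utf8_bytes.decode("utf-8")

print(ascii(garbled))        # escaped form, safe to print on any console
print(restored == text)
```

The bytes never change; only the codec used to interpret them does, which is why the fix is a transcoding step rather than a change to the data.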

Workaround:
Transcode the string at the point where it is printed to the console:

print myname.decode('utf-8').encode('gbk')

A more general approach is to ask the system for its encoding instead of hard-coding GBK:

import sys
encoding = sys.getfilesystemencoding()
print myname.decode('utf-8').encode(encoding)
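The same decode-then-encode step can be wrapped in a small helper. This is a sketch under stated assumptions: it targets Python 3, the function name transcode is hypothetical, and defaulting to sys.stdout.encoding (with sys.getfilesystemencoding() as a fallback) is my choice, since on recent Python 3 versions for Windows the filesystem encoding is no longer the console code page.

```python
import sys

def transcode(raw, source_encoding="utf-8", target_encoding=None):
    """Re-encode bytes from source_encoding into target_encoding.

    When target_encoding is None, use the console encoding and fall back to
    the filesystem encoding if stdout does not report one.
    """
    if target_encoding is None:
        target_encoding = (getattr(sys.stdout, "encoding", None)
                           or sys.getfilesystemencoding())
    text = raw.decode(source_encoding)
    # errors="replace" avoids a crash when the target codec lacks a character.
    return text.encode(target_encoding, errors="replace")

utf8_name = "中文".encode("utf-8")
gbk_name = transcode(utf8_name, "utf-8", "gbk")
print(gbk_name)
```

In Python 3, print() already encodes str objects for the console, so a helper like this is mainly useful when bytes have to be written to a file or socket in a specific encoding.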

The following are the common fixes for garbled Chinese text.

Method One:

Add an encoding declaration at the beginning of the file (write it without spaces around the equals sign, otherwise the interpreter does not recognize it):

#coding=gbk
s = '谷歌'
print s

Output: 谷歌
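Whether a declaration line is actually recognized can be checked with the standard library. The sketch below assumes Python 3, where tokenize.detect_encoding applies the declaration rules; the helper name declared_encoding is hypothetical.

```python
import io
import tokenize

def declared_encoding(source_bytes):
    """Return the encoding the interpreter would use for this source text."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding

# Recognized: no space between "coding" and "=".
print(declared_encoding(b"#coding=gbk\ns = 1\n"))

# Not recognized: the space before "=" breaks the pattern, so the
# Python 3 default (utf-8) is reported instead.
print(declared_encoding(b"# coding = gbk\ns = 1\n"))
```

The declaration only tells the interpreter how to read the file; the editor still has to save the file in that same encoding.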


Method Two:

Convert the string to unicode at output time:

#coding=utf-8
s = '谷歌'
print unicode(s, 'gbk')

Output: 谷歌

Handling garbled Chinese in TXT files

Some programs, such as Notepad, insert three invisible bytes (0xEF 0xBB 0xBF, the BOM) at the start of a file when saving it as UTF-8. These bytes have to be removed when the file is read. The codecs module in Python defines them as the constant codecs.BOM_UTF8:

# coding=gbk
import codecs

# Read the raw bytes of the file
data = open("Test.txt").read()

# Drop the three BOM bytes if the file starts with them
if data[:3] == codecs.BOM_UTF8:
    data = data[3:]

print data.decode("utf-8")
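For comparison, here is a sketch of the same BOM handling in Python 3. The function names and the temporary sample file are hypothetical; codecs.BOM_UTF8 and the "utf-8-sig" codec are part of the standard library.

```python
import codecs
import os
import tempfile

def read_utf8_text(path):
    """Read a UTF-8 file as text, dropping a leading BOM if present."""
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.decode("utf-8")

def read_utf8_sig(path):
    """Same result using the "utf-8-sig" codec, which skips the BOM itself."""
    with open(path, encoding="utf-8-sig") as f:
        return f.read()

# Build a sample file the way Notepad would save it: BOM first, then the text.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(codecs.BOM_UTF8 + "中文".encode("utf-8"))
    sample_path = tmp.name

print(read_utf8_text(sample_path) == read_utf8_sig(sample_path))
os.remove(sample_path)
```

Both functions also work on files without a BOM, so they can be used without knowing which program saved the file.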


Both the unicode() function and the decode() method convert a str to unicode. But why is "gbk" the argument that works in these conversions?
The first guess is that it is because the coding declaration says GBK (# coding=gbk). But is that really the reason?
To find out, modify the source file:

# coding=utf-8
s = "中文"
print unicode(s, "utf-8")


Running it raises an error:

Traceback (most recent call last):
  File "chinesetest.py", line 3, in <module>
    s = unicode(s, "utf-8")
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: invalid data


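If the earlier example worked only because the declaration and the conversion both used GBK, then making both of them UTF-8 should work just as well. It does not; it raises an error.
Going one step further, keep the utf-8 declaration but convert with GBK again:
# coding=utf-8
s = "中文"
print unicode(s, "gbk")
Result: 中文

So the argument has to match the encoding the file was actually saved in, which here is GBK, and not the coding declaration. The declaration only tells the interpreter how to read the source; it does not change the bytes the editor writes to disk.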
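The experiment can be replayed on raw bytes. This is a sketch assuming Python 3; gbk_saved stands in for the bytes of a string literal in a source file that the editor saved as GBK, and decode_with_fallback is a hypothetical helper, not something from the article.

```python
# Bytes of "中文" as an editor saving in GBK would write them: d6 d0 ce c4
gbk_saved = "中文".encode("gbk")

try:
    gbk_saved.decode("utf-8")          # mirrors unicode(s, "utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    print("utf-8 failed:", exc.reason)

as_gbk = gbk_saved.decode("gbk")       # mirrors unicode(s, "gbk")
print(utf8_ok, as_gbk == "中文")

def decode_with_fallback(raw, encodings=("utf-8", "gbk")):
    """Try each codec in order and return (text, codec) for the first success.

    UTF-8 goes first because it rejects most non-UTF-8 input, while GBK
    accepts many arbitrary byte pairs.
    """
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")

print(decode_with_fallback(gbk_saved)[1])
```

A fallback like this is only a heuristic for files of unknown origin; when the saving encoding is known, passing it explicitly is more reliable.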
