SyntaxError: Non-ASCII character: handling Chinese in Python

Last Update: 2018-07-24 Source: Internet

Author: User

Tags windows ssh client

Handling Chinese text in Python has always been a headache for beginners, and this article explains the basics. Of course, a future Python release will very likely solve the problem completely, and then it won't bother us so much.

Let's look at the Python version:

>>> import sys
>>> sys.version
'2.5.1 (r251:54863, Apr 2007, 08:51:08) [MSC v.1310 bit (Intel)]'

(i) Create a file chinesetest.py with Notepad, saved in the default ANSI encoding:

s = "Chinese"
Print S

Run it:

E:\project\python\test>python chinesetest.py
  File "chinesetest.py", line 1
SyntaxError: Non-ASCII character '\xd6' in file chinesetest.py on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

Next, quietly switch the file's encoding to UTF-8:

E:\project\python\test>python chinesetest.py
  File "chinesetest.py", line 1
SyntaxError: Non-ASCII character '\xe4' in file chinesetest.py on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

No luck...
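The two error messages name different bytes because the file's bytes really differ. A quick check, written in Python 3 (where `bytes` plays the role of Python 2's `str`), shows the first byte of 中 in each encoding. On Simplified Chinese Windows, "ANSI" means code page 936, i.e. GBK:

```python
text = "中文"

gbk_bytes = text.encode("gbk")     # what an ANSI save writes on Simplified Chinese Windows
utf8_bytes = text.encode("utf-8")  # what a UTF-8 save (without BOM) writes

print(hex(gbk_bytes[0]))   # 0xd6, the byte named in the first SyntaxError
print(hex(utf8_bytes[0]))  # 0xe4, the byte named in the second SyntaxError
```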
Since the message points to a web page, let's look at it. A quick read shows that if a file contains non-ASCII characters, you must put an encoding declaration on the first or second line. Change chinesetest.py back to ANSI and add the declaration:
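The declaration rule can be sketched as a small Python 3 check. The function names `declared_encoding` and `needs_declaration` are hypothetical, and the regular expression is modeled on the pattern given in PEP 263; this simplified version only looks at the first two lines and skips the interpreter's other special cases. Note that the match is case-sensitive, so the comment must say `coding`, not `CODING`:

```python
import re

# Modeled on the PEP 263 pattern; "coding" must be lowercase.
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

def declared_encoding(source: bytes):
    """Return the encoding named on line 1 or 2, or None if there is none."""
    for line in source.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return None

def needs_declaration(source: bytes) -> bool:
    """True in the Python 2 SyntaxError case: non-ASCII bytes but no declaration."""
    has_non_ascii = any(byte > 0x7F for byte in source)
    return has_non_ascii and declared_encoding(source) is None

plain = 's = "中文"\nprint s\n'.encode("gbk")
declared = ('# coding=gbk\n' + 's = "中文"\nprint s\n').encode("gbk")

print(needs_declaration(plain))     # True
print(needs_declaration(declared))  # False
print(declared_encoding(declared))  # gbk
```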

# coding=gbk
s = "中文"
print s

Try again:

E:\project\python\test>python chinesetest.py
中文

It works :)

(ii) Now look at its length:

# coding=gbk
s = "中文"
print len(s)

Result: 4.
Here s is of type str, i.e. a byte string, and in GBK each Chinese character takes two bytes, so the length is 4.
Now write it this way:

# coding=gbk
s = "中文"
s1 = u"中文"
s2 = unicode(s, "gbk")  # without the encoding argument, Python's default ASCII codec would be used (and fail on these bytes)
s3 = s.decode("gbk")    # converting str to unicode is decoding; decode() has the same effect as unicode()
print len(s1)
print len(s2)
print len(s3)

Results:
2
2
2
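The same distinction exists in Python 3, where Python 2's `str` became `bytes` and `unicode` became `str`. A short sketch of the lengths involved:

```python
text = "中文"                     # Python 3 str: Unicode code points, like u"中文" in Python 2
gbk_bytes = text.encode("gbk")    # like the Python 2 str in a file declared "# coding=gbk"
utf8_bytes = text.encode("utf-8")

print(len(text))        # 2 characters
print(len(gbk_bytes))   # 4: two bytes per character in GBK
print(len(utf8_bytes))  # 6: three bytes per character in UTF-8

# Decoding goes bytes -> text, the counterpart of unicode(s, "gbk") or s.decode("gbk")
print(gbk_bytes.decode("gbk") == text)  # True
```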
(iii) Next look at the handling of the document punishment: Build a file Test.txt, file pattern ANSI, content: ABC Chinese, read in Python

# coding=gbk
print open("Test.txt").read()

Results: ABC Chinese
Change the pattern of documents into UTF-8:
Results: ABC Juan PO
Clearly, there is a need to decode:

# coding=gbk
import codecs
print open("Test.txt").read().decode("utf-8")

Results: ABC Chinese
I edited the Test.txt above with EditPlus, but when I saved it as UTF-8 with Windows Notepad instead, running the same script raised an error:

# coding=gbk
import codecs
print open("Test.txt").read().decode("utf-8")

It turns out that some programs, such as Notepad, insert three invisible bytes (0xEF 0xBB 0xBF, the BOM) at the start of a file when saving it as UTF-8.
Python's codecs module defines this sequence as the constant codecs.BOM_UTF8, and we need to remove these bytes ourselves after reading:

# coding=gbk
import codecs
data = open("Test.txt").read()
if data[:3] == codecs.BOM_UTF8:  # strip the UTF-8 BOM that Notepad adds
    data = data[3:]
print data.decode("utf-8")

Results: ABC Chinese

(iv) A few remaining questions
In part (ii), we converted str to unicode with the unicode() function and the decode() method. Why is the argument to both "gbk"?
My first thought was that it is because our encoding declaration says GBK (# coding=gbk), but is that really so?
Modify the source file, changing only the declaration (the file itself is still saved as ANSI):

# coding=utf-8
s = "中文"
print unicode(s, "utf-8")

Run it, and it fails:

Traceback (most recent call last):
  File "chinesetest.py", line 3, in <module>
    print unicode(s, "utf-8")
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: invalid data

Clearly, if the earlier version worked because both sides used GBK, then this one, where I kept both sides consistently UTF-8, should work too instead of raising an error.
Going a step further, what if the conversion here still uses GBK:

# coding=utf-8
s = "中文"
print unicode(s, "gbk")

Result: 中文
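What decides the correct argument is the encoding of the bytes actually stored in `s`: the coding declaration tells the parser how to read the file, but it leaves the bytes of a plain (non-`u`) string literal as they are in the file. With only the declaration changed, the literal still holds the GBK bytes of an ANSI-saved file. A Python 3 sketch of the two conversions (the helper `try_decode` is hypothetical):

```python
def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or None if the bytes are not valid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

# The literal "中文" from a file saved as ANSI (GBK) on Simplified Chinese Windows
# holds these bytes, whatever the coding declaration says.
s = "中文".encode("gbk")

print(try_decode(s, "utf-8"))         # None: like unicode(s, "utf-8") failing
print(try_decode(s, "gbk") == "中文")  # True: like unicode(s, "gbk")
```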
I found some English material that roughly explains how print works in Python:
When Python executes a print statement, it simply passes the output to the operating system (using fwrite() or something like it), and some other program is responsible for actually displaying that output on the screen. For example, on Windows, it might be the Windows console subsystem that displays the result. Or if you're using Windows and running Python on a Unix box somewhere else, your Windows SSH client is actually responsible for displaying the data. If you are running Python in an xterm on Unix, then xterm and your X server handle the display.

To print data reliably, you must know the encoding that this display program expects.

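Simply put, print in Python hands the string straight to the program that displays it, so the text must reach that program in the encoding it expects. The Chinese Windows console uses CP936 (almost identical to GBK). The decoding step is a separate matter: unicode() and decode() must be given the encoding of the bytes actually stored in s. Since the file is still saved as ANSI, which on Simplified Chinese Windows is CP936, decoding with GBK (or CP936) works here while UTF-8 does not.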
Final Test:

# coding=utf-8
s = "中文"
print unicode(s, "cp936")

Result: 中文
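The output side can be sketched in Python 3 as well. When text is printed, it is encoded with the stream's encoding (for the real console, `sys.stdout.encoding`) before the bytes reach the display program. The hypothetical helper below simulates a console that expects CP936 by wrapping an in-memory byte buffer; Python treats `cp936` as an alias of its `gbk` codec:

```python
import io

def console_bytes(text: str, console_encoding: str) -> bytes:
    """Return the bytes print() would hand to a console expecting console_encoding."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=console_encoding, newline="")
    print(text, end="", file=stream)
    stream.flush()  # push the encoded bytes into the buffer
    return raw.getvalue()

cp936_output = console_bytes("中文", "cp936")
print(cp936_output)                                    # b'\xd6\xd0\xce\xc4'
print(cp936_output == "中文".encode("gbk"))             # True: CP936 and GBK agree here
print(console_bytes("中文", "utf-8") == cp936_output)   # False: a UTF-8 console needs other bytes
```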

Original link: http://www.byywee.com/page/M0/S739/739904.html

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness, ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
