Python encoding problems: from dull ache to root-cause fix

Last Update:2017-11-11 Source: Internet

Author: User

Reference links

Why is Python encoding so painful?

Python 2.7 manual: the str function

Python's default source-file encoding and its internal default encoding

1. Python 2 treats a source file as ASCII by default. If you do not declare which encoding the code was written in, Python parses it as ASCII, and any UTF-8 encoded non-ASCII character in the file causes an error, because ASCII cannot represent those bytes.

File test.py, saved as UTF-8:

    a='a'
    b='好'

Running it produces:

    SyntaxError: Non-ASCII character '\xe5' in file test.py on line 2, but no encoding declared; see http: for details

The error says that the character '\xe5' is non-ASCII; '\xe5' is the first byte of the UTF-8 byte string for '好'.

As shown here, printing '好' in a terminal whose encoding is UTF-8:

    >>> a='好'
    >>> a
    '\xe5\xa5\xbd'
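The failure can be reproduced in miniature. This is only a sketch, written in Python 3 syntax, where the bytes type stands in for the Python 2 byte string; the variable names are illustrative, and decoding with the ascii codec only roughly approximates what the Python 2 parser does with an undeclared source file.

```python
# Python 3 sketch: bytes plays the role of the Python 2 byte string.
text = "\u597d"                      # the character 好
utf8_bytes = text.encode("utf-8")    # the bytes a UTF-8 editor writes to disk

print(repr(utf8_bytes))              # b'\xe5\xa5\xbd'

try:
    utf8_bytes.decode("ascii")       # roughly what an ASCII-only reader attempts
    ascii_ok = True
except UnicodeDecodeError as exc:
    ascii_ok = False
    # exc.start is the index of the first byte the codec could not decode
    print("ASCII cannot decode byte", hex(utf8_bytes[exc.start]))
```

The first offending byte is the same 0xe5 that the SyntaxError message complains about.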

So whatever encoding the source file is edited in, be sure to declare it. For example:

    # coding=utf-8  # declare the encoding here first
    # the program code goes here
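To see how such a declaration is picked up, one option is the standard library's tokenize.detect_encoding, shown here as a Python 3 sketch. The helper name declared_encoding and the two sample sources are made up for illustration. Note one difference from the Python 2 behaviour described in this article: when no declaration is present, Python 3 falls back to UTF-8 rather than ASCII.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding Python 3's tokenizer detects for raw source bytes."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


with_cookie = b"# coding=gbk\na = 'a'\n"   # hypothetical file with a declaration
without_cookie = b"a = 'a'\n"              # hypothetical file without one

print(declared_encoding(with_cookie))      # gbk
print(declared_encoding(without_cookie))   # utf-8 (the Python 3 default)
```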

2. Note that when you type code at the interactive prompt in a terminal, you do not need to declare the encoding of the code. My guess is that in interactive mode Python reads the encoding of the current system directly.

For example, in Windows cmd, whose encoding is GBK, look at the character '好':

    >>> a='好'
    >>> a
    '\xba\xc3'

You can see that in cmd '好' is two bytes, unlike the three bytes in the UTF-8 terminal above.
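The two byte strings can be compared side by side without switching terminals. A small Python 3 sketch, encoding the same character explicitly with each codec (variable names are illustrative):

```python
text = "\u597d"                     # the character 好

utf8_bytes = text.encode("utf-8")   # what a UTF-8 terminal hands to Python 2
gbk_bytes = text.encode("gbk")      # what a GBK terminal such as cmd hands over

print(len(utf8_bytes), repr(utf8_bytes))   # 3 b'\xe5\xa5\xbd'
print(len(gbk_bytes), repr(gbk_bytes))     # 2 b'\xba\xc3'
```

The character is the same; only the encoding applied by the terminal differs.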

3. The internal default encoding is ASCII, which matters when using certain functions such as str and unicode.

A common case: we usually save a script as UTF-8, but when it runs in Windows cmd, printed Chinese text comes out garbled. The default encoding of cmd is GBK, so it interprets the UTF-8 byte string as GBK, and the result is naturally a mess.
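The garbling itself is easy to simulate. In this Python 3 sketch the sample string 你好 is an illustrative choice, not taken from the article, and errors="replace" is used only so the sketch cannot raise if a byte pair has no GBK mapping.

```python
text = "\u4f60\u597d"               # 你好, an illustrative two-character string
utf8_bytes = text.encode("utf-8")   # what a UTF-8 script writes to the terminal

# A GBK terminal interprets those same bytes with the wrong codec.
garbled = utf8_bytes.decode("gbk", errors="replace")

print(text)
print(garbled)                      # different characters: this is the mojibake
```

No bytes were changed; the only thing that went wrong is the codec used to read them.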

So how do you get output that is not garbled, whatever the terminal's encoding?

1. Change the terminal's default encoding.

2. Make the script cater to the terminal. Plan A: save the script as GBK. Plan B: convert the encoding wherever text is displayed on the terminal. Here is plan B:

    # coding=utf-8
    import sys
    a = "好"  # this file is saved as UTF-8; to display correctly in cmd it must be converted to GBK
    aunicode = a.decode("utf-8")  # decode to unicode first, telling Python that a is a UTF-8 byte string, not ASCII
    agbk = aunicode.encode("gbk")  # re-encode the unicode object as GBK
    print agbk
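The same decode-then-encode step can be wrapped in a helper. This is a Python 3 sketch of plan B, with bytes standing in for the Python 2 byte string; the function name utf8_to_gbk is hypothetical.

```python
def utf8_to_gbk(utf8_bytes):
    """Transcode a UTF-8 byte string to GBK by going through text."""
    text = utf8_bytes.decode("utf-8")   # bytes -> text, naming the real source encoding
    return text.encode("gbk")           # text -> bytes in the terminal's encoding


a = b"\xe5\xa5\xbd"                     # UTF-8 bytes of 好
agbk = utf8_to_gbk(a)
print(repr(agbk))                       # b'\xba\xc3'
```

The key point is that the intermediate value is text, so neither step has to guess an encoding.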

A side note on the script above: can I just call a.encode('gbk') directly? No, because encode operates on unicode objects. Calling it directly on a non-ASCII byte string, as in '我'.encode('gbk'), raises an error.

As follows:

    >>> a = "好"
    >>> a.encode("gbk")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xba in position 0: ordinal not in range(128)
    >>>

Notice that Python raised a Unicode decode exception, and that the message mentions 'ascii'. Why?

Because encode is only defined for unicode strings. When it is called on a byte string, Python first decodes the byte string with the default encoding, that is:

    a.encode('gbk') == a.decode(<default encoding>).encode('gbk')

The Unicode decode exception above is raised during that decoding step. Python assumes a is encoded in the default encoding, ASCII, but a actually holds the encoded bytes of '好' (GBK in this cmd session, hence byte 0xba), and '好' does not exist in ASCII, so an error is raised.
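The implicit decode can be made explicit. The following Python 3 sketch emulates the Python 2 behaviour described above; py2_style_encode is a hypothetical helper written for illustration, not a real Python API.

```python
def py2_style_encode(byte_string, target, default_encoding="ascii"):
    """Emulate Python 2's str.encode: implicit decode with the default encoding, then encode."""
    return byte_string.decode(default_encoding).encode(target)


gbk_bytes = b"\xba\xc3"                 # 好 as typed in a GBK terminal

try:
    py2_style_encode(gbk_bytes, "gbk")
    failed_step = None
except UnicodeDecodeError as exc:
    failed_step = "decode"              # the hidden decode fails before encode ever runs
    print(exc)

# Pure-ASCII byte strings survive the hidden decode, so the call appears to work.
print(py2_style_encode(b"abc", "gbk"))  # b'abc'
```

This is why the problem only shows up once non-ASCII data reaches the call.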

3. Make the text display correctly whatever the terminal's encoding is.

Use a unicode string directly and let Python print it according to the system's current encoding.

    # coding=utf-8
    print u'我'

When a unicode string is printed, Python encodes it to the system's encoding before writing it out, so it is not garbled.

A small related question: if I just want to see what the unicode value of a variable looks like, I can use the repr function (which returns the string form of an object).

    >>> a = u'好'
    >>> a
    u'\u597d'
    >>> print repr(a)
    u'\u597d'
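For readers on Python 3, where repr no longer escapes non-ASCII characters, the built-in ascii function gives the escaped form. This sketch also prints sys.stdout.encoding, the encoding print uses when writing text; its value depends on the environment, so no output is shown for it.

```python
import sys

a = "\u597d"                 # 好; every Python 3 str is a unicode string

print(ascii(a))              # '\u597d'  (escaped form, like Python 2's repr without the u prefix)
print(hex(ord(a)))           # 0x597d    (the code point itself)

# The encoding print applies when writing text to this terminal.
print(sys.stdout.encoding)
```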

About the str function

The str function returns a representation of an object as a string (as I understand it, the human-readable representation).

str behaves differently for different kinds of objects. For a string, it returns the string itself.

For a function, str returns a string that includes the function's address in memory.

When applying str to a string, note that if it is a unicode string, the current default encoding matters.

In the example below I am in cmd, whose encoding is GBK, while Python's default encoding is ASCII:

    >>> a = u"好"
    >>> str(a)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u597d' in position 0: ordinal not in range(128)
    >>>

When str is applied to a unicode object as above, the conversion involves the default encoding; it first does unicodestr.encode(defaultencoding).

If the default encoding cannot encode the characters in the string, an exception is raised.

So set the default encoding, as follows:

    >>> import sys
    >>> reload(sys)
    <module 'sys' (built-in)>
    >>> sys.setdefaultencoding('gbk')  # set the default encoding to GBK
    >>> a = u"啦"
    >>> str(a)  # no error this time
    '\xc0\xb2'
    >>> str(a) == a
    True
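The effect of the default encoding on str can be emulated without touching interpreter settings. In this Python 3 sketch, py2_style_str is a hypothetical helper that performs the encode step described above with an explicit default_encoding parameter. Python 3 has no sys.setdefaultencoding, so passing the encoding explicitly is the closest equivalent.

```python
def py2_style_str(text, default_encoding="ascii"):
    """Emulate Python 2's str(unicode_obj): encode with the default encoding."""
    return text.encode(default_encoding)


a = "\u597d"                                    # 好

try:
    py2_style_str(a)                            # default encoding ASCII
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc                           # ASCII has no byte for this character
    print(exc)

print(py2_style_str(a, default_encoding="gbk")) # b'\xba\xc3'
```

Encoding explicitly at the point of output, as here, avoids depending on a process-wide default at all.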
