Summary of encoding knowledge in Python

Last Update: 2017-05-14 Source: Internet

Author: User


This article organizes and summarizes Python encoding knowledge.

Problem

In my day-to-day work, I ran into the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte

It is a common error that almost everyone has run into, so I decided to organize what I know and study Python encoding properly.

Basic knowledge

In Python 2.x there are two string data types, unicode and str, and both are subclasses of basestring.

>>> a = '中'
>>> type(a)
<type 'str'>
>>> isinstance(a, basestring)
True
>>> a = u'中'
>>> type(a)
<type 'unicode'>
>>> isinstance(a, basestring)
True

In short, str is a byte string made up of already-encoded bytes (like bytes in Python 3.x), while unicode is a true string made up of characters.

>>> a = '中文'
>>> len(a)
6
>>> print repr(a)
'\xe4\xb8\xad\xe6\x96\x87'
>>> b = u'中文'
>>> len(b)
2
>>> print repr(b)
u'\u4e2d\u6587'
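
The same distinction can be checked in Python 3, where the roles are renamed: Python 2's str corresponds to bytes, and Python 2's unicode corresponds to str. The following is a minimal sketch, assuming UTF-8 as the encoding; the variable names are only illustrative.

```python
# Python 3 sketch: bytes plays the role of Python 2's str,
# and str plays the role of Python 2's unicode.
text = '\u4e2d\u6587'            # the two characters 中文
data = text.encode('utf-8')      # six bytes, three per character in UTF-8

char_count = len(text)
byte_count = len(data)

print(char_count, byte_count)                # 2 6
print(data == b'\xe4\xb8\xad\xe6\x96\x87')   # True
print(data.decode('utf-8') == text)          # True
```

The length differs only because one value counts characters and the other counts encoded bytes.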

Console and script

Running the following in the Python console on Linux gives a different result from running the same code as a script.

>>> a = u'中文'
>>> print repr(a)
u'\xe4\xb8\xad\xe6\x96\x87'
>>> b = unicode('中文', 'utf-8')
>>> print repr(b)
u'\u4e2d\u6587'

We can see that the object a created by u'中文' is not what we expected. What is the reason?
Think of Python as a pipe whose interior works in unicode. Data is decoded to unicode at the entrance and encoded to the target encoding at the exit (unless the processing logic itself needs a specific encoding).
Entering a = u'中文' in the console can be read as a = '中文'.decode(encoding), which produces a unicode object. So what is encoding here? For the console, it is the encoding of standard input, sys.stdin.encoding.

>>> import sys
>>> sys.stdin.encoding
'ISO-8859-1'

My console's default encoding is ISO-8859-1, so a = u'中文' is equivalent to a = '中文'.decode('ISO-8859-1').
The '中文' that the console receives is a byte string, and its bytes depend on the terminal's encoding. On a UTF-8 terminal, '中文' == '\xe4\xb8\xad\xe6\x96\x87'.

>>> a = '中文'.decode('ISO-8859-1')
>>> print repr(a)
u'\xe4\xb8\xad\xe6\x96\x87'
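
The effect can be reproduced in Python 3 without touching the console settings. The sketch below assumes the terminal sent UTF-8 bytes and the interpreter decoded them as ISO-8859-1; the names wrong, right, and repaired are only illustrative.

```python
# Python 3 sketch of the console behaviour: a UTF-8 terminal sends six bytes,
# and an interpreter that assumes ISO-8859-1 maps each byte to one character.
utf8_bytes = '\u4e2d\u6587'.encode('utf-8')   # the UTF-8 bytes for 中文

wrong = utf8_bytes.decode('iso-8859-1')       # what the console produced
right = utf8_bytes.decode('utf-8')            # what was intended

print(ascii(wrong))   # '\xe4\xb8\xad\xe6\x96\x87'
print(ascii(right))   # '\u4e2d\u6587'

# ISO-8859-1 maps every byte value to a code point, so the mistake is reversible.
repaired = wrong.encode('iso-8859-1').decode('utf-8')
print(repaired == right)   # True
```

The wrongly decoded value has six characters, one per byte, which is why its repr shows \x escapes instead of \u escapes.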

So how can this encoding be changed? On Linux, set the following environment variable:

export PYTHONIOENCODING=UTF-8
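
One way to confirm that the variable takes effect is to start a child interpreter with it set and ask for its standard input encoding. This is only a sketch for Python 3; child_stdin_encoding is a hypothetical helper, and codecs.lookup is used to normalize spellings such as 'UTF-8' and 'utf8' to one canonical name.

```python
import codecs
import os
import subprocess
import sys

def child_stdin_encoding(io_encoding):
    """Start a child interpreter with PYTHONIOENCODING set and report its stdin encoding."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    out = subprocess.check_output(
        [sys.executable, '-c', 'import sys; print(sys.stdin.encoding)'],
        env=env,
        stdin=subprocess.DEVNULL,
    )
    # Normalize the reported name to the codec's canonical spelling.
    return codecs.lookup(out.decode('ascii').strip()).name

print(child_stdin_encoding('UTF-8'))
```

The variable is read when the interpreter starts, so it has to be set before Python is launched, as the export line does.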

Summary

Back to the original problem: the error occurs because the difference between unicode and str was not understood and the two were mixed.

>>> a = '中文'
>>> a.encode('gbk')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)

The object a above is actually a str, that is, a byte string. If the terminal uses UTF-8, a holds UTF-8 encoded bytes. a.encode('gbk') is equivalent to a.decode(encoding).encode('gbk'): the bytes are first decoded into a unicode object, which is then encoded back into bytes, with the unicode object acting as the intermediate form. So what is encoding here?

>>> import sys
>>> sys.getdefaultencoding()
'ascii'

The default is ascii, which is why the error says the ascii codec cannot decode the byte.
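
The hidden decode step can be spelled out explicitly. Python 3 has no implicit conversion of this kind, so the sketch below imitates it with a hypothetical helper, py2_style_encode, whose default_encoding parameter stands in for sys.getdefaultencoding().

```python
def py2_style_encode(data, target, default_encoding='ascii'):
    """Mimic Python 2's str.encode(): decode implicitly, then encode.

    `data` is a byte string; `default_encoding` stands in for
    sys.getdefaultencoding(). This helper is hypothetical.
    """
    return data.decode(default_encoding).encode(target)

utf8_bytes = b'\xe4\xb8\xad\xe6\x96\x87'   # the UTF-8 bytes for 中文

try:
    py2_style_encode(utf8_bytes, 'gbk')
except UnicodeDecodeError as exc:
    # The failure comes from the hidden decode step, not from the GBK encode.
    print(exc.encoding, hex(utf8_bytes[exc.start]))   # ascii 0xe4

# Naming the real source encoding makes the round trip succeed.
print(py2_style_encode(utf8_bytes, 'gbk', default_encoding='utf-8'))
```

The exception is a UnicodeDecodeError even though the call was an encode, which matches the traceback shown earlier.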

>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.setdefaultencoding('utf-8')
>>> a = '中文'
>>> print repr(a)
'\xe4\xb8\xad\xe6\x96\x87'
>>> a.encode('gbk')
'\xd6\xd0\xce\xc4'

Once the default encoding is changed to UTF-8, the call succeeds. Even so, calling encode on a str is discouraged, because the str is decoded implicitly first. Apply decode only to str and encode only to unicode, and always specify the encoding explicitly in every decode and encode call.
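
The same rule applies to file input and output: decode at the entrance, work on unicode in the middle, and encode at the exit. The sketch below uses io.open with an explicit encoding argument; convert_file and the file names are hypothetical, and the UTF-8 to GBK conversion is only an example.

```python
import io
import os
import tempfile

def convert_file(src_path, dst_path, src_encoding, dst_encoding):
    """Decode on the way in, work on text, encode on the way out."""
    with io.open(src_path, 'r', encoding=src_encoding) as src:
        text = src.read()                 # text (unicode) from here on
    with io.open(dst_path, 'w', encoding=dst_encoding) as dst:
        dst.write(text)                   # encoded only at the exit
    return text

workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, 'utf8.txt')
dst_path = os.path.join(workdir, 'gbk.txt')

with open(src_path, 'wb') as f:
    f.write(b'\xe4\xb8\xad\xe6\x96\x87')  # the UTF-8 bytes for 中文

text = convert_file(src_path, dst_path, 'utf-8', 'gbk')

with open(dst_path, 'rb') as f:
    gbk_bytes = f.read()

print(len(text), gbk_bytes == b'\xd6\xd0\xce\xc4')   # 2 True
```

Because both encodings are named at the boundaries, the result does not depend on the default encoding of the interpreter or the terminal.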

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
