Summary of encoding knowledge in Python

Source: Internet
Author: User
This article organizes and summarizes knowledge about Python encoding.

Problem

During normal work, I encountered the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte

This error is common, and almost everyone has run into it, so I decided to organize what I know and study Python encoding properly.

Basic knowledge

In Python 2.x there are two string data types, unicode and str, both of which are subclasses of basestring.

>>> a = '中'
>>> type(a)
<type 'str'>
>>> isinstance(a, basestring)
True
>>> a = u'中'
>>> type(a)
<type 'unicode'>
>>> isinstance(a, basestring)
True

In short, str is a byte string made up of encoded bytes (like bytes in Python 3.x), while unicode is a true string made up of characters.

>>> a = '中文'
>>> len(a)
6
>>> repr(a)
"'\xe4\xb8\xad\xe6\x96\x87'"
>>> b = u'中文'
>>> len(b)
2
>>> repr(b)
"u'\u4e2d\u6587'"
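The session above is Python 2. As a rough Python 3 analogue (an assumption made for illustration: bytes plays the role of Python 2 str, and str plays the role of Python 2 unicode), the same difference in length might be sketched like this:

```python
# Python 3 sketch: bytes corresponds to Python 2 str, str to Python 2 unicode.
text = '\u4e2d\u6587'        # the two characters used in the session above
raw = text.encode('utf-8')   # the bytes a UTF-8 terminal would hand to Python

char_count = len(text)       # counts characters
byte_count = len(raw)        # counts encoded bytes

print(char_count, byte_count)
print(repr(raw))
```

The character count and the byte count differ because each of these characters takes three bytes in UTF-8.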

Console and script

Run the following commands in the Python console on Linux. The result differs from what the same code produces when run as a script.

>>> a = u'中文'
>>> repr(a)
"u'\xe4\xb8\xad\xe6\x96\x87'"
>>> b = unicode('中文', 'utf-8')
>>> repr(b)
"u'\u4e2d\u6587'"

We can see that the object a created with u'中文' is not what we expected. What is the reason?
Think of Python as a pipe in which all intermediate processing is done in unicode. Input is converted to unicode at the entrance, and output is converted to the target encoding at the exit (unless the processing logic itself requires a specific encoding).
Entering a = u'中文' on the console can be read as a = '中文'.decode(encoding), which yields a unicode object. So what is encoding here? For the console, it is the encoding of standard input, sys.stdin.encoding.

>>> import sys
>>> sys.stdin.encoding
'ISO-8859-1'

The default encoding of my console is ISO-8859-1, so a = u'中文' <=> a = '中文'.decode('ISO-8859-1').
Here '中文' is the byte string the console receives, and those bytes depend on the terminal's encoding. On a UTF-8 terminal, '中文' = '\xe4\xb8\xad\xe6\x96\x87'.

>>> a = '中文'.decode('ISO-8859-1')
>>> repr(a)
"u'\xe4\xb8\xad\xe6\x96\x87'"
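The mismatch can be reproduced without a misconfigured console. The following is a minimal Python 3 sketch (not the author's Python 2 session) that decodes the same UTF-8 bytes once with the wrong codec and once with the right one:

```python
# UTF-8 bytes for the two characters U+4E2D U+6587.
raw = '\u4e2d\u6587'.encode('utf-8')

# ISO-8859-1 maps every byte to exactly one character, so decoding never
# fails, but it produces six unrelated characters instead of two.
wrong = raw.decode('iso-8859-1')

# Decoding with the encoding the bytes were actually written in recovers
# the original two characters.
right = raw.decode('utf-8')

print(len(wrong), ascii(wrong))
print(len(right), ascii(right))
```

Because ISO-8859-1 accepts any byte value, the wrong decode raises no error, which is why this kind of mistake tends to surface only later as garbled text.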

So how can this encoding be changed? On Linux, you can set an environment variable as follows:

export PYTHONIOENCODING=UTF-8
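One way to check the effect of this variable is to start a child interpreter with it set and ask the child which encoding its standard output uses. The sketch below assumes Python 3 and a working `sys.executable`; the helper name `child_stdout_encoding` is hypothetical and exists only for this illustration.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Run a child interpreter with PYTHONIOENCODING set and return the
    normalized name of the encoding its sys.stdout reports."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    out = subprocess.check_output(
        [sys.executable, '-c', 'import sys; print(sys.stdout.encoding)'],
        env=env,
    )
    reported = out.decode('ascii').strip()
    # codecs.lookup normalizes spellings such as 'UTF-8' and 'utf8'.
    return codecs.lookup(reported).name


print(child_stdout_encoding('UTF-8'))
```

The name is normalized through `codecs.lookup` because interpreters may report the same encoding under different spellings.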

Summary

Back to the original problem: the error occurs because the difference between unicode and str was not understood and the two were mixed.

>>> a = '中文'
>>> a.encode('gbk')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)

The object a above is actually a str, that is, a byte string. If the terminal uses UTF-8, a holds UTF-8 encoded bytes. a.encode('gbk') is equivalent to a.decode(encoding).encode('gbk'): the byte string is first decoded into a unicode object, which is then encoded into a byte string again. The unicode object serves as the intermediate form. So what is encoding here?

>>> import sys
>>> sys.getdefaultencoding()
'ascii'

The default is ascii, and the UTF-8 bytes cannot be decoded as ASCII, which is why the error is raised.
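Python 3 no longer performs this implicit decode, but the hidden step can be written out explicitly. A minimal sketch, assuming Python 3 and the same UTF-8 bytes as in the session:

```python
# The UTF-8 bytes that a UTF-8 terminal produces for the two characters.
raw = b'\xe4\xb8\xad\xe6\x96\x87'

# This is the step Python 2 performs silently with the default encoding.
try:
    raw.decode('ascii')
    failed = False
    message = ''
except UnicodeDecodeError as exc:
    failed = True
    message = str(exc)

# Naming the real encoding of the bytes makes the same step succeed.
text = raw.decode('utf-8')

print(failed, message)
print(ascii(text))
```

The first byte, 0xe4, is outside the ASCII range of 0 to 127, so the ascii codec stops at position 0.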

>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.setdefaultencoding('utf-8')
>>> a = '中文'
>>> repr(a)
"'\xe4\xb8\xad\xe6\x96\x87'"
>>> a.encode('gbk')
'\xd6\xd0\xce\xc4'

After the default encoding is changed to UTF-8, the call succeeds. Even so, calling encode on a str is discouraged, because it triggers an implicit decode. Use decode only on str and encode only on unicode, and always specify the encoding explicitly in every decode and encode call.
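The advice to name the encoding in every call can be followed without touching the default encoding at all. The following is a sketch in Python 3; the helper `transcode` is a hypothetical name introduced for this example, not part of any library.

```python
def transcode(data, source_encoding, target_encoding):
    """Convert encoded bytes from one encoding to another.

    Both encodings are named explicitly; nothing depends on the
    interpreter's default encoding.
    """
    if not isinstance(data, bytes):
        raise TypeError('expected encoded bytes, got %s' % type(data).__name__)
    text = data.decode(source_encoding)   # entrance: bytes -> text
    return text.encode(target_encoding)   # exit: text -> bytes


utf8_bytes = b'\xe4\xb8\xad\xe6\x96\x87'
gbk_bytes = transcode(utf8_bytes, 'utf-8', 'gbk')
print(repr(gbk_bytes))
```

Rejecting non-bytes input up front keeps the helper from silently accepting text that has already been decoded.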
