A summary of unicode and str in Python 2 and Python 3

Source: Internet
Author: User

Encoding problems have probably plagued everyone who writes Python code.
Python 2 and Python 3 use different default encodings, so it is worth understanding them properly; otherwise you end up trying one answer after another from the internet, which wastes a lot of time.
First of all, the Python 2.x str

s = "I'm not garbled"

s is a string, and what it actually stores is a sequence of bytes.
So which encoding are these bytes in?
If the code is typed into the interactive interpreter, s is encoded in the encoding of the interpreter's console, for example GBK in the cmd console on Chinese-language Windows.
If the code is saved to a file before it is executed, for example as UTF-8, then s is initialized with UTF-8 encoded bytes when the interpreter loads the program.
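
How much the stored bytes depend on the encoding can be seen in a small sketch. It is written for Python 3 (an assumption made so that it runs on a current interpreter), where the bytes have to be produced explicitly with encode; the sample text and the variable names are illustrative only.

```python
# Python 3 sketch: the same text becomes different bytes under different encodings.
text = "\u4e2d\u6587"  # two Chinese characters, written as escapes for portability

gbk_bytes = text.encode("gbk")     # the bytes a GBK console would hand to Python 2
utf8_bytes = text.encode("utf-8")  # the bytes a UTF-8 source file would contain

print(gbk_bytes)   # b'\xd6\xd0\xce\xc4'
print(utf8_bytes)  # b'\xe4\xb8\xad\xe6\x96\x87'
print(len(gbk_bytes), len(utf8_bytes))  # 4 6
```

Both byte strings represent the same two characters, which is why a Python 2 str is only meaningful together with the knowledge of which encoding produced it.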
Python 2.x Unicode
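Unicode is a character encoding standard; the concrete encodings that implement it include UTF-8, UTF-16 and UTF-32 (GBK, by contrast, is a separate Chinese encoding and not an implementation of Unicode).
Python 2 stores a unicode object internally using two bytes per code unit (four bytes on "wide" builds), and the advantage of using unicode objects instead of str is that they make cross-platform work easier.
You can define a unicode object in the following two ways: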

s1 = u"I am not garbled"
s2 = unicode("I am not garbled", "utf-8")

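The two types can be converted into each other with encode and decode: encode turns a unicode object into a str, and decode turns a str into a unicode object.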
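
As a minimal sketch of that round trip, written in Python 3 syntax (where the text type is called str and the byte type is called bytes), encode goes from text to bytes and decode goes back again. The variable names are illustrative.

```python
# Python 3 sketch: encode turns text into bytes, decode turns bytes into text.
text = "\u4e2d\u6587"            # two Chinese characters, written as escapes

data = text.encode("utf-8")      # text -> bytes
restored = data.decode("utf-8")  # bytes -> text, using the same encoding

print(type(data).__name__, type(restored).__name__)  # bytes str
print(restored == text)                              # True
```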

However, Python 2 silently hides the conversion from bytes to unicode, which makes ASCII data easier to process. The price is that it fails when it has to deal with non-ASCII data.

An error we often run into is UnicodeDecodeError: 'ascii' codec can't decode byte ...
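
The error can be reproduced by doing explicitly what Python 2 does implicitly, namely decoding non-ASCII bytes with the ascii codec. The sketch below uses Python 3 syntax; the byte values are simply the UTF-8 encoding of one Chinese character.

```python
# Decoding non-ASCII bytes as ASCII fails; Python 2 attempts this implicitly.
data = b"\xe4\xb8\xad"  # UTF-8 bytes of one Chinese character

try:
    data.decode("ascii")
    message = None
except UnicodeDecodeError as err:
    message = str(err)

print(message)
# 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
```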

A common solution:
Set defaultencoding

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

If you encode or decode in Python 2 without specifying the encoding, Python uses defaultencoding.

For example, when the str from the earlier example is encoded into another format, defaultencoding is used for the hidden decoding step.

s.encode("utf-8") is equivalent to s.decode(defaultencoding).encode("utf-8")
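
The hidden decoding step can be emulated in Python 3, where nothing is decoded automatically. In the sketch below, py2_style_encode is a hypothetical helper written only for this illustration, not a real Python API; its default_encoding parameter stands in for Python 2's defaultencoding.

```python
def py2_style_encode(data: bytes, target: str, default_encoding: str = "ascii") -> bytes:
    """Hypothetical helper mimicking Python 2's str.encode: decode first, then encode."""
    return data.decode(default_encoding).encode(target)


ascii_bytes = b"Life is short"
utf8_bytes = "\u4e2d\u6587".encode("utf-8")  # non-ASCII text stored as UTF-8 bytes

# Pure ASCII data passes through the hidden ascii decode without trouble.
print(py2_style_encode(ascii_bytes, "utf-8"))  # b'Life is short'

# Non-ASCII data fails in the hidden decode step when the default is ascii.
try:
    py2_style_encode(utf8_bytes, "utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True
print(failed)  # True

# It succeeds once the assumed default matches the real encoding of the bytes.
print(py2_style_encode(utf8_bytes, "gbk", default_encoding="utf-8"))  # b'\xd6\xd0\xce\xc4'
```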

Likewise, if you create a unicode object from a str without specifying the encoding of the str, the program also uses defaultencoding.

u = unicode("Life is short") is equivalent to u = unicode("Life is short", defaultencoding)

The default value of defaultencoding is ascii, which is the cause of many errors, so setting defaultencoding early is a good habit.
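
The current value can be inspected with sys.getdefaultencoding(). The sketch below assumes a Python 3 interpreter, where the default is utf-8 and sys.setdefaultencoding no longer exists, so the reload approach applies to Python 2 only.

```python
import sys

default = sys.getdefaultencoding()
print(default)  # utf-8 on Python 3 (ascii on an unmodified Python 2)

# Python 3 removed the setter, so the default cannot be changed this way.
print(hasattr(sys, "setdefaultencoding"))  # False on Python 3
```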

The role of the file header encoding declaration

The declaration at the top of a file, # -*- coding: utf-8 -*-, currently appears to have three roles. First, it is required if the code contains Chinese (or other non-ASCII) comments. Second, more advanced editors, such as my Emacs, use the header declaration to decide the encoding of the code file. Third, the program uses the header declaration to decode and initialize unicode objects such as u"Life is short" (so the header declaration and the encoding in which the file is actually stored must be consistent).
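
Why the declaration and the storage encoding have to agree can be sketched with the standard-library function tokenize.detect_encoding, which reads the declaration from the first lines of a source file. The example is written for Python 3; the in-memory "file" and the variable names are assumptions made for illustration.

```python
import io
import tokenize

# A tiny "source file" stored as GBK bytes, with a matching header declaration.
source_text = '# -*- coding: gbk -*-\ns = u"\u4e2d\u6587"\n'
source_bytes = source_text.encode("gbk")

# detect_encoding returns the declared encoding and the lines it had to read.
declared, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
print(declared)  # gbk

# Decoding with the declared encoding recovers the original text.
print(source_bytes.decode(declared) == source_text)  # True

# Decoding with an encoding other than the one used for storage fails here.
try:
    source_bytes.decode("utf-8")
    mismatch_fails = False
except UnicodeDecodeError:
    mismatch_fails = True
print(mismatch_fails)  # True
```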

Python 3
Python 3 also has two string types: str (Unicode text) and bytes (byte strings). They correspond to the two Python 2 types but have different names; you can check which one a variable holds with type(variable).
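
A short Python 3 sketch of the two types follows; the variable names are illustrative.

```python
text = "abc"   # str: Unicode text
data = b"abc"  # bytes: raw byte values

print(type(text))  # <class 'str'>
print(type(data))  # <class 'bytes'>

# Indexing shows the difference: str yields characters, bytes yields integers.
print(text[0])  # a
print(data[0])  # 97
```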

The biggest change to Unicode support in Python 3 is that byte strings are no longer decoded automatically. If you try to concatenate a byte string with a str, you get an error, regardless of what they contain.
This can be simply understood as:
unicode in Python 2 -> str in Python 3
str in Python 2 -> bytes in Python 3
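
The missing automatic decoding can be observed directly. In this Python 3 sketch the sample strings are illustrative; the point is that even pure ASCII bytes are not mixed with text implicitly.

```python
text = "Life is short"
data = b", use Python"

# Mixing the two types raises TypeError, even though the bytes are pure ASCII.
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)  # False

# The conversion has to be spelled out explicitly.
combined = text + data.decode("ascii")
print(combined)  # Life is short, use Python
```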

Personally, I think the following article illustrates the relationships between the various string types well and is worth reading carefully.
http://pycoders-weekly-chinese.readthedocs.io/en/latest/issue5/unipain.html

Reference:
http://python.jobbole.com/81244/
