About unicode and str in Python 2 and str and bytes in Python 3

Source: Internet
Author: User

Python 3 has two types that represent sequences of characters: bytes and str. An instance of bytes contains raw 8-bit values, and an instance of str contains Unicode characters.

Python 2 also has two types that represent sequences of characters: str and unicode. Unlike Python 3, an instance of str contains raw 8-bit values, while an instance of unicode contains Unicode characters.

I did not understand these two statements at first, so the rest of this article works through them.

See a few examples:

    # In Python 2
    >>> type('x'.decode('utf-8'))
    <type 'unicode'>
    # Why can a str be decoded when it is not marked as binary? How does the decoding work?

    # In Python 3
    >>> type('x'.decode('utf-8'))
    # This is the expected behavior!
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'str' object has no attribute 'decode'
    # So how is a string decoded here?

First, this confusion comes from the Python language itself. In Python 2, the default str is not really a string as we usually understand it, but a byte array, or a string that can be interpreted as plain ASCII characters. It corresponds to the bytes type in Python 3. The real general-purpose string in Python 2 is the unicode type, which corresponds to the str type in Python 3. It looks messy, but Python 2 kept this design to stay compatible with earlier programs.
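To make the difference between the two Python 3 types concrete, here is a minimal sketch. The sample word and the helper name describe are illustrative assumptions, not part of the original article.

```python
# Python 3 only. The word "cafe" with an accented e is an illustrative choice.
text = "caf\u00e9"            # str: four Unicode code points
raw = text.encode("utf-8")    # bytes: five 8-bit values (the accented e takes two)


def describe(value):
    """Return the type name and length of a str or bytes value (hypothetical helper)."""
    return type(value).__name__, len(value)


print(describe(text))  # ('str', 4)
print(describe(raw))   # ('bytes', 5)
print(raw[0])          # 99: indexing bytes gives an integer
print(text[0])         # c: indexing str gives a one-character str
```

The lengths differ because str counts characters while bytes counts encoded 8-bit values.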

In Python 2, the two character-sequence types, str and unicode, are converted into each other as follows.

str --decode()--> unicode --encode()--> str

The same conversion exists in Python 3. Set against the diagram above, it may be a little easier to understand.

bytes --decode()--> str --encode()--> bytes
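The Python 3 chain can be sketched as a round trip. The helper names to_bytes and to_str are hypothetical wrappers around the built-in str.encode and bytes.decode methods, and UTF-8 is assumed as the encoding.

```python
# Python 3 only. to_bytes and to_str are hypothetical helper names.
def to_bytes(text, encoding="utf-8"):
    """str -> bytes (encoding)."""
    return text.encode(encoding)


def to_str(raw, encoding="utf-8"):
    """bytes -> str (decoding)."""
    return raw.decode(encoding)


original = "\u55b5"              # one CJK character, U+55B5
raw = to_bytes(original)
print(raw)                       # b'\xe5\x96\xb5'
print(to_str(raw) == original)   # True: the round trip is lossless

# Decoding with an encoding that cannot represent the data fails loudly.
try:
    to_str(raw, "ascii")
except UnicodeDecodeError:
    print("ascii cannot decode these bytes")
```

Both directions need the same encoding; the round trip is only guaranteed when encode and decode agree.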

    # In Python 2
    >>> type('x')
    <type 'str'>
    >>> type('x'.decode('utf-8'))
    <type 'unicode'>
    >>> type(u'x'.encode('utf-8'))
    <type 'str'>

    # In Python 3
    >>> type('x')
    <class 'str'>
    >>> type(b'x')
    <class 'bytes'>
    >>> type(b'x'.decode('utf-8'))
    <class 'str'>
    >>> type('x'.encode('utf-8'))
    <class 'bytes'>

There is also implicit conversion. In Python 2, when a unicode string is concatenated with a str string, the str string is automatically decoded to unicode first and the two are then joined. The encoding used for this step is the system default encoding. The default is ASCII in Python 2 and UTF-8 in Python 3, although Python 3 never converts between str and bytes implicitly.
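For comparison, here is a small Python 3 sketch of what happens when the two types are mixed. In Python 3 the concatenation is rejected instead of being converted silently, so the decode step has to be written out. The function name try_concat is a hypothetical helper.

```python
# Python 3 only. try_concat is a hypothetical helper name.
import sys


def try_concat(text, raw):
    """Attempt str + bytes and report the failure instead of crashing."""
    try:
        return text + raw
    except TypeError as exc:
        return "TypeError: " + str(exc)


print(sys.getdefaultencoding())      # utf-8
result = try_concat("abc", b"def")
print(result.startswith("TypeError"))  # True: no implicit conversion happens

# The conversion must be explicit.
explicit = "abc" + b"def".decode("ascii")
print(explicit)                      # abcdef
```

Requiring the explicit decode means the encoding is always stated in the code rather than taken from a hidden default.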

    # In Python 2
    >>> x = u'喵'
    >>> x
    u'\u55b5'
    >>> type(x)
    <type 'unicode'>

    # In Python 3
    >>> x = u'喵'
    >>> x
    '喵'
    >>> type(x)
    <class 'str'>
    # Why are the results different?
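One way to see why the displayed results differ: the value is the same character, U+55B5, in both versions, and only the displayed representation changes. The sketch below assumes Python 3 and uses the built-in ascii(), which the Python documentation describes as producing output similar to Python 2's repr().

```python
# Python 3 only.
x = "\u55b5"               # the character U+55B5

shown_by_py3 = repr(x)     # Python 3 keeps printable non-ASCII characters as-is
shown_escaped = ascii(x)   # escapes non-ASCII characters, similar to Python 2's repr()

print(shown_escaped)       # '\u55b5'
print(len(shown_by_py3))   # 3: the character plus two quote marks
print(len(x))              # 1: a single code point either way
```

So the Python 2 prompt shows an escaped form of the same one-character string that Python 3 displays directly.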
