Python 3 has two types that represent sequences of characters: bytes and str. An instance of bytes contains raw 8-bit values, and an instance of str contains Unicode characters.
Python 2 also has two types that represent sequences of characters: str and unicode. Unlike Python 3, an instance of str contains raw 8-bit values, while an instance of unicode contains Unicode characters.
I did not really understand these two statements at first, so the rest of this article works through them.
See a few examples:
# In Python 2
>>> type('x'.decode('utf-8'))
<type 'unicode'>
# This is not binary data, so why can a string be decoded? How is this explained?

# In Python 3
>>> type('x'.decode('utf-8'))
# This is the expected behavior!
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'
# So how is a string handled here?
First, this is a quirk of the Python language itself. In Python 2, the default str is not really a string in the sense we usually mean. It is a byte array, or a string that can only be interpreted as plain ASCII characters, and it corresponds to the bytes type in Python 3. The real, general-purpose string in Python 2 is the unicode type, which corresponds to the str type in Python 3. It is messy, but Python 2 kept this design to stay compatible with earlier programs.
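To make the difference concrete, here is a minimal Python 3 sketch comparing a str with its UTF-8 encoded bytes. The variable names are illustrative only, and the character U+55B5 is just a convenient non-ASCII example.

```python
# A str holds Unicode code points; a bytes object holds raw 8-bit values.
text = "\u55b5"               # one character, code point U+55B5
raw = text.encode("utf-8")    # the same character as UTF-8 bytes

text_length = len(text)       # counts characters
raw_length = len(raw)         # counts bytes; this character needs three in UTF-8
first_byte = raw[0]           # indexing bytes yields an int in the range 0-255
first_char = text[0]          # indexing str yields a 1-character str

print(text_length, raw_length, first_byte)
```

The same text has length 1 as a str and length 3 as bytes, which is why the two types cannot be treated as interchangeable.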
In Python 2, the two character-sequence types, str and unicode, often need to be converted into each other. The conversion works like this:
str --decode()--> unicode --encode()--> str
Python 3 has a corresponding conversion, and the diagram may make it a little easier to understand:
bytes --decode()--> str --encode()--> bytes
# In Python 2
>>> type('x')
<type 'str'>
>>> type('x'.decode('utf-8'))
<type 'unicode'>
>>> type(u'x'.encode('utf-8'))
<type 'str'>

# In Python 3
>>> type('x')
<class 'str'>
>>> type(b'x')
<class 'bytes'>
>>> type(b'x'.decode('utf-8'))
<class 'str'>
>>> type('x'.encode('utf-8'))
<class 'bytes'>
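The conversion chain for Python 3 can be wrapped in two small helpers so that calling code does not need to care which type it received. This is only a sketch: to_str and to_bytes are hypothetical names, not part of the standard library, and UTF-8 is assumed as the encoding.

```python
def to_str(value, encoding="utf-8"):
    """Return value as str, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding it first if it is str."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value


# bytes --decode()--> str --encode()--> bytes
round_trip = to_bytes(to_str(b"x"))
print(type(to_str(b"x")), type(round_trip))
```

A common approach is to call helpers like these at the edges of a program (when reading or writing files and sockets) and to work with str everywhere else.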
There is also implicit conversion: in Python 2, when a unicode string is concatenated with a str string, the str string is automatically decoded into unicode first and then joined, and the encoding used for this is the system default encoding. The default encoding is ASCII in Python 2 and UTF-8 in Python 3.
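For comparison, Python 3 does not convert implicitly when str and bytes are mixed. It raises a TypeError instead, so the conversion has to be written out. A minimal sketch, with illustrative variable names:

```python
import sys

# The default encoding reported by the interpreter (UTF-8 on Python 3).
default_encoding = sys.getdefaultencoding()

try:
    combined = "x" + b"y"          # mixing str and bytes is not allowed
except TypeError as exc:
    combined = None
    error_message = str(exc)

# The conversion must be explicit: decode the bytes, then concatenate.
explicit = "x" + b"y".decode("utf-8")

print(default_encoding, combined, explicit)
```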
# In Python 2
>>> x = u'喵'
>>> x
u'\u55b5'
>>> type(x)
<type 'unicode'>

# In Python 3
>>> x = u'喵'
>>> x
'喵'
>>> type(x)
<class 'str'>
# Why are the results different?
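One way to see why the two sessions display differently: in Python 3, repr() keeps printable non-ASCII characters as they are, while the built-in ascii() escapes them much like the Python 2 output. The sketch below uses U+55B5, the code point that appears in the Python 2 output above. The variable names are illustrative.

```python
x = "\u55b5"                       # a one-character str, code point U+55B5

shown_by_repr = repr(x)            # keeps the character itself inside the quotes
shown_by_ascii = ascii(x)          # escapes it as a backslash-u sequence

# In Python 3 the u prefix is accepted but produces an ordinary str.
same_type = type(u"\u55b5") is str

print(shown_by_ascii, same_type)
```

The stored value is the same in both cases. Only the way the interactive prompt chooses to display it differs.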
About unicode and str in Python 2, and str and bytes in Python 3