String encoding differences between Python 2 and Python 3

Source: Internet
Author: User
Tags: stdin

This article uses an interpreter experiment to show in detail how Python 2 and Python 3 differ in their handling of string encoding.

In Python 2, string literals are sequences of 8-bit characters, that is, byte strings holding encoded data. An important limitation of these strings is that they cannot fully support international character sets and Unicode. To address this limitation, Python 2 uses a separate string type for Unicode data. To write a Unicode string literal, put the prefix u before the opening quotation mark, as in u'text'.
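The gap between an 8-bit byte string and a Unicode string can be sketched as follows. This is only an illustration written for Python 3, where bytes plays the role of the Python 2 byte string and str the role of the Python 2 unicode type. The sample text (the two characters U+5F20 and U+4FCA, 张俊) and the variable names are assumptions chosen for the example.

```python
# Python 3 sketch: bytes ~ Python 2 str, str ~ Python 2 unicode.
text = '\u5f20\u4fca'          # two Unicode code points
data = text.encode('utf-8')    # the same text as UTF-8 encoded bytes

print(len(text))                        # 2 characters
print(len(data))                        # 6 bytes
print([hex(ord(ch)) for ch in text])    # ['0x5f20', '0x4fca']
print(list(data))                       # [229, 188, 160, 228, 191, 138]
```

The byte string has no notion of characters: its length and its items are bytes, which is why it cannot represent international text on its own without knowing the encoding.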

Python 2 also accepts byte literals, written with a b prefix, which denote strings that have already been encoded. In Python 2 there is no difference between a byte literal and an ordinary string, because an ordinary Python 2 string is itself an encoded byte string, not Unicode.
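One way to check this claim is a small snippet that runs unchanged on both interpreters. It is a sketch rather than part of the original experiment, and the names literal and same_type are illustrative.

```python
import sys

# On Python 2, bytes is an alias of str, so a b-literal is an ordinary string.
# On Python 3, bytes and str are two distinct types.
literal = b'abc'
same_type = type(literal) is str

print('Python %d: byte literal is an ordinary str: %s'
      % (sys.version_info[0], same_type))
```

Under Python 2 this reports True, and under Python 3 it reports False.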

In Python 3, the u prefix is unnecessary because all strings are Unicode by default. In Python 3.0 through 3.2, including the 3.2.2 interpreter used in the session below, writing the prefix is a syntax error; Python 3.3 and later accept it again for compatibility. If you run the Python 2 interpreter with the -U option, it simulates this behavior (that is, all string literals are treated as Unicode and the u prefix can be omitted). In Python 3, a byte literal has a different type (bytes) from an ordinary string (str).
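The strict separation of the two types in Python 3 can be illustrated with a short sketch. The variable names are assumptions for the example; the snippet is intended for Python 3 only.

```python
text = 'abc'     # str: Unicode text
data = b'abc'    # bytes: encoded data

# The two types never compare equal, even for pure ASCII content.
equal = (text == data)
print(equal)     # False

# Mixing them in one expression is rejected rather than silently converted.
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)  # False

# Conversion has to be explicit, naming the encoding.
print(data.decode('ascii') == text)   # True
print(text.encode('ascii') == data)   # True
```

Python 2 would have converted implicitly using its default ASCII codec; Python 3 requires the program to state the conversion.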

~/download/firefox $ python2
Python 2.7.2 (default, June, 11:17:09)
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> '张俊'  # Python 2 automatically stores the string as an encoded byte string, here UTF-8
'\xe5\xbc\xa0\xe4\xbf\x8a'
>>> u'张俊'  # the u prefix makes this a unicode string: it is not encoded and holds each character's Unicode code point
u'\u5f20\u4fca'
>>> '张俊'.encode('utf-8')  # the string is already UTF-8 bytes; to encode it again, Python 2 first decodes it implicitly as ASCII, which fails
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128)
>>> '张俊'.decode('utf-8')  # decoding works and returns an unencoded unicode string
u'\u5f20\u4fca'
>>> b'张俊'  # '张俊' is already UTF-8 encoded in Python 2, so the byte literal is the same byte string
'\xe5\xbc\xa0\xe4\xbf\x8a'
>>> print '张俊'
张俊
>>> print u'张俊'
张俊
>>> print b'张俊'
张俊
>>>
~/download/firefox $ python3
Python 3.2.2 (default, Sep 5, 04:33:58)
[GCC 4.6.1 20110819 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> '张俊'  # Python 3 strings are Unicode by default (not encoded)
'张俊'
>>> u'张俊'  # strings are Unicode by default, so the Python 2 style prefix is not needed; here it is a syntax error
  File "<stdin>", line 1
    u'张俊'
         ^
SyntaxError: invalid syntax
>>> type('张俊')  # Python 3 strictly separates text strings from byte strings; the default is a Unicode text string
<class 'str'>
>>> '张俊'.decode('utf-8')  # a text string is already Unicode, so it has no decode method
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'
>>> '张俊'.encode('utf-8')  # encoding the text string produces an encoded byte string
b'\xe5\xbc\xa0\xe4\xbf\x8a'
>>> type('张俊'.encode('utf-8'))
<class 'bytes'>
>>> print('张俊'.encode('utf-8'))  # an encoded byte string no longer has many of the attributes and methods of a text string
b'\xe5\xbc\xa0\xe4\xbf\x8a'
>>> print('张俊'.encode('utf-8').decode('utf-8'))  # the byte string must be decoded before it prints as text
张俊
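The UnicodeDecodeError raised by the Python 2 encode call in the session comes from an implicit step: before encoding a byte string, Python 2 first decodes it with its default ASCII codec. The sketch below reproduces that behavior in Python 3. The helper py2_style_encode is hypothetical, written only for this illustration and not part of any library; the bytes are the ones shown in the session.

```python
RAW = b'\xe5\xbc\xa0\xe4\xbf\x8a'   # the UTF-8 bytes from the session


def py2_style_encode(byte_string, target='utf-8', default='ascii'):
    """Hypothetical helper: mimic Python 2's str.encode on a byte string.

    Python 2 implicitly decodes with the default codec (ASCII) and only
    then encodes to the requested target encoding.
    """
    return byte_string.decode(default).encode(target)


# Non-ASCII bytes fail in the hidden decode step, as in the session.
try:
    py2_style_encode(RAW)
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print(exc)

# Pure ASCII data survives the implicit round trip unchanged.
print(py2_style_encode(b'abc'))

# Decoding explicitly with the right codec recovers the two code points.
decoded = RAW.decode('utf-8')
print([hex(ord(ch)) for ch in decoded])
```

This also explains why the error message names the 'ascii' codec and a decode failure even though the call in the session was encode('utf-8').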
