Things to know about processing Python strings

Source: Internet
Author: User
This chapter describes how to write and work with Python strings. Now that we have sorted out the vexing problem of character encoding, we can study Python's strings.

In Python 3, strings are Unicode, which means Python strings can hold text in multiple languages. For example, a str can contain Chinese:

>>> print('包含中文的str')
包含中文的str
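A str is a sequence of Unicode characters, so len() and indexing count characters, not bytes. The following is a minimal sketch that uses the same sample string:

```python
# A str holds Unicode characters, so Chinese and ASCII can be mixed freely.
s = '包含中文的str'

# len() counts characters (code points), not bytes.
print(len(s))          # 8

# Indexing and slicing also work character by character.
print(s[0])            # 包
print(list(s[-3:]))    # ['s', 't', 'r']
```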

For single characters, Python provides two functions. The ord() function returns the integer code point of a character, and the chr() function converts a code point back to the corresponding character:

>>> ord('A')
65
>>> ord('中')
20013
>>> chr(66)
'B'
>>> chr(25991)
'文'
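The sketch below shows that ord() and chr() are inverses. It also prints each code point in the usual U+XXXX hexadecimal form. The helper name `describe_char` is hypothetical and exists only for this illustration:

```python
def describe_char(ch: str) -> str:
    """Hypothetical helper: describe one character's code point."""
    code = ord(ch)  # integer code point of the character
    return f"{ch!r} -> {code} (U+{code:04X})"

for ch in ['A', '中', '文']:
    print(describe_char(ch))
# 'A' -> 65 (U+0041)
# '中' -> 20013 (U+4E2D)
# '文' -> 25991 (U+6587)

# chr() undoes ord(): converting back yields the original character.
assert all(chr(ord(c)) == c for c in 'A中文')
```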

If you know a character's integer code point, you can also write the str with hexadecimal escapes:

>>> '\u4e2d\u6587'
'中文'
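To go the other way, you can build the escape form from an existing string. This is a minimal sketch: the helper `to_u_escapes` is hypothetical, and it assumes every character is at or below U+FFFF, because higher code points need the 8-digit \U form:

```python
def to_u_escapes(s: str) -> str:
    """Hypothetical helper: spell out each character as a backslash-u escape."""
    # Each character becomes \uXXXX with four lowercase hex digits.
    # Assumes no character is above U+FFFF (those need the \UXXXXXXXX form).
    return ''.join(f'\\u{ord(c):04x}' for c in s)

escaped = to_u_escapes('中文')
print(escaped)                     # \u4e2d\u6587

# Escapes inside a literal produce exactly the same str as the characters.
print('\u4e2d\u6587' == '中文')    # True
```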

Writing the characters directly and writing their escapes are completely equivalent.

Python's string type is str, which is represented in memory as Unicode, so a single character may correspond to several bytes. To transmit a string over the network or save it to disk, you need to convert the str into bytes.

Python writes bytes literals with single or double quotes prefixed by b:

x = b'ABC'

Note the difference between 'ABC' and b'ABC'. The former is a str. The latter looks the same but is bytes, and each character of a bytes object occupies exactly one byte.
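The following sketch makes the str/bytes distinction concrete. It shows that a str and its UTF-8 bytes have different lengths and that indexing bytes returns integers:

```python
text = '中文'
data = text.encode('utf-8')

print(type(text), len(text))   # <class 'str'> 2    (two characters)
print(type(data), len(data))   # <class 'bytes'> 6  (three bytes per character here)

# Indexing bytes yields an int in the range 0 to 255, not a one-character string.
print(b'ABC'[0])               # 65
print('ABC'[0])                # A

# A str and a bytes object never compare equal, even with the same visible content.
print('ABC' == b'ABC')         # False
```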

A str can be encoded into bytes in a specified encoding by calling its encode() method, for example:

>>> 'abc'.encode('ascii')
b'abc'
>>> '中文'.encode('utf-8')
b'\xe4\xb8\xad\xe6\x96\x87'
>>> '中文'.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

A str containing only English text can be ASCII-encoded into bytes with the same content. A str containing Chinese can be encoded into bytes with UTF-8. A str containing Chinese cannot be ASCII-encoded: Chinese code points lie outside the ASCII range of 0 to 127, so Python raises an error.
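Here is a minimal sketch of reacting to the ASCII limit instead of crashing. The helper `encode_prefer_ascii` and its try-ASCII-then-UTF-8 policy are hypothetical choices made for illustration. The `errors=` argument of str.encode() is standard:

```python
from typing import Tuple


def encode_prefer_ascii(text: str) -> Tuple[str, bytes]:
    """Hypothetical helper: encode as ASCII if possible, otherwise as UTF-8.

    Returns the name of the encoding that was used, together with the bytes.
    """
    try:
        return 'ascii', text.encode('ascii')
    except UnicodeEncodeError:
        # Code points above 127 (such as Chinese characters) are not ASCII.
        return 'utf-8', text.encode('utf-8')


print(encode_prefer_ascii('abc'))    # ('ascii', b'abc')
print(encode_prefer_ascii('中文'))   # ('utf-8', b'\xe4\xb8\xad\xe6\x96\x87')

# str.encode() also accepts errors=; 'replace' substitutes '?' instead of raising.
print('中文'.encode('ascii', errors='replace'))   # b'??'
```

UTF-8 encodes characters 0 to 127 exactly as ASCII does, so encoding 'abc' directly with UTF-8 would give the same b'abc'. The fallback only makes the ASCII limit visible.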

In bytes, any byte that cannot be displayed as a printable ASCII character is shown as \x##.
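The sketch below looks at the same bytes in three ways: the \x## display, plain integers, and a hex string via bytes.hex():

```python
raw = '中'.encode('utf-8')

# The repr of bytes shows non-ASCII bytes as \x followed by two hex digits.
print(raw)                    # b'\xe4\xb8\xad'

# The same three bytes viewed as integers and as a hex string.
print(list(raw))              # [228, 184, 173]
print(raw.hex())              # e4b8ad

# Printable ASCII bytes appear as characters; others such as 0 and 255 appear as \x##
# (tab, newline and carriage return are shown as \t, \n and \r instead).
print(bytes([65, 0, 255]))    # b'A\x00\xff'
```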

Conversely, when we read a byte stream from the network or disk, the data we get is bytes. To turn bytes into a str, use the decode() method:

>>> b'abc'.decode('ascii')
'abc'
>>> b'\xe4\xb8\xad\xe6\x96\x87'.decode('utf-8')
'中文'
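The sketch below simulates reading from the network or disk. It assumes an in-memory io.BytesIO stream is a fair stand-in for a socket or a file opened in 'rb' mode, since reading from any of these yields bytes that must be decoded:

```python
import io

# Stand-in for a byte stream from a socket or a file opened in binary mode.
stream = io.BytesIO('中文 and ASCII'.encode('utf-8'))

data = stream.read()          # reading a binary stream yields bytes
print(type(data))             # <class 'bytes'>

text = data.decode('utf-8')   # decode with the same encoding that was used to encode
print(text)                   # 中文 and ASCII

# encode() and decode() with the same codec are inverses of each other.
assert text.encode('utf-8') == data
```

When reading files, opening them in text mode with open(path, encoding='utf-8') performs this decoding step for you.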

If the bytes contain a sequence that cannot be decoded, decode() raises an error:

>>> b'\xe4\xb8\xad\xff'.decode('utf-8')
Traceback (most recent call last):
  ...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 3: invalid start byte
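The error above does not have to be fatal. This sketch inspects where decoding failed through the exception's start, end and reason attributes. It then uses the standard `errors=` argument of bytes.decode() to substitute or drop the bad byte instead of raising:

```python
bad = b'\xe4\xb8\xad\xff'

try:
    bad.decode('utf-8')
except UnicodeDecodeError as exc:
    # The exception records the failing byte range and the reason.
    failure = (exc.start, exc.end, exc.reason)
    print(*failure)                              # 3 4 invalid start byte

# errors='replace' inserts U+FFFD (the replacement character) for the bad byte.
print(repr(bad.decode('utf-8', errors='replace')))   # '中\ufffd'
# errors='ignore' silently drops the bad byte.
print(bad.decode('utf-8', errors='ignore'))          # 中
```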

That covers the essentials of handling strings in Python.
