Analysis of the entire parsing process of the Python parser

Last Update:2013-12-17 Source: Internet

Author: User

First, let's look at the full process the Python parser goes through. We write the source code in an editor and save it as a file. If the source code contains an encoding declaration and the editor supports this syntax, the file is stored on disk in the declared encoding.

Note: the encoding declaration and the actual encoding of the source file are not necessarily the same. You can declare UTF-8 in the encoding declaration but save the source file as GB2312. Of course, nobody makes this mistake on purpose, and a good IDE keeps the two consistent. However, if you write code in Notepad, EditPlus, or a similar editor, the mismatch can happen by accident.
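The mismatch described in this note can be reproduced at the codec level. The following is a minimal sketch in Python 3 syntax (the article itself describes Python 2). The sample text and the helper name `decodes_cleanly` are assumptions made for illustration and do not come from the original article.

```python
# Hypothetical helper: reports whether raw bytes can be decoded with the
# encoding named in an encoding declaration.
def decodes_cleanly(raw_bytes, declared_encoding):
    try:
        raw_bytes.decode(declared_encoding)
        return True
    except UnicodeDecodeError:
        return False


# Simulate an editor that saved the text to disk as GB2312.
saved_bytes = "中国".encode("gb2312")

# The declaration matches the real encoding: decoding succeeds.
declaration_matches = decodes_cleanly(saved_bytes, "gb2312")

# The declaration says UTF-8 but the bytes are GB2312: decoding fails.
declaration_mismatch = decodes_cleanly(saved_bytes, "utf-8")

print(declaration_matches, declaration_mismatch)
```

A real interpreter would report this kind of failure when it reads the file, which is why keeping the declaration and the saved encoding in step matters.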

Once we have a .py file, we can run it. This hands the code to the Python parser. When the parser reads the file, it first parses the encoding declaration in the file. If the file is encoded as GB2312, the content is first converted from GB2312 to Unicode, and the Unicode is then converted to a UTF-8 byte string.
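The read, decode, and re-encode steps can be sketched as follows. This is only an illustration in Python 3: it uses the standard library's `tokenize.detect_encoding` as a stand-in for the parser's own handling of the declaration, and the two-line sample source is an assumption.

```python
import io
import tokenize

# Sample source text, assumed for illustration, with a GB2312 declaration.
source_text = "# -*- coding: gb2312 -*-\ns = '中国'\n"

# What the editor would write to disk under that declaration.
gb2312_bytes = source_text.encode("gb2312")

# Step 1: read the encoding declaration from the raw bytes.
declared_encoding, _ = tokenize.detect_encoding(io.BytesIO(gb2312_bytes).readline)

# Step 2: convert the file content from the declared encoding to Unicode.
unicode_text = gb2312_bytes.decode(declared_encoding)

# Step 3: convert the Unicode text to a UTF-8 byte string for parsing.
utf8_bytes = unicode_text.encode("utf-8")

print(declared_encoding, len(gb2312_bytes), len(utf8_bytes))
```

The UTF-8 form is longer than the GB2312 form because each of the two Chinese characters takes three bytes in UTF-8 and two bytes in GB2312.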

(Note: this refers only to converting the encoding of the script's source code.) After this step is completed, the parser tokenizes and parses the UTF-8 byte string. If it encounters a Unicode string (for example, u'China, I love you'), it uses the corresponding UTF-8 byte string to create a Unicode string object.

If the program uses a general string (a normal, non-Unicode byte string), the parser first converts the UTF-8 byte string, by way of Unicode, back to a byte string in the source encoding (GB2312 here) and uses it to create a general string object. In other words, Unicode strings and general strings are stored differently in memory: the former is built from the UTF-8 data and held as Unicode characters (CPython 2 stores these as fixed-width UCS-2 or UCS-4 units rather than as UTF-8), while the latter holds GB2312 bytes.
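The two ways of building a string object from the parser's UTF-8 data can be sketched like this. The code is Python 3, where `str` plays the role of the Unicode string and `bytes` the role of the general string. The function names are hypothetical and only mirror the description above.

```python
# UTF-8 bytes for a literal, as the parser holds them after the conversion step.
utf8_literal = "中国".encode("utf-8")


def make_unicode_string(utf8_data):
    # A Unicode string is created by decoding the UTF-8 bytes.
    return utf8_data.decode("utf-8")


def make_general_string(utf8_data, source_encoding):
    # A general string goes through Unicode and back to the source encoding.
    return utf8_data.decode("utf-8").encode(source_encoding)


unicode_obj = make_unicode_string(utf8_literal)
general_obj = make_general_string(utf8_literal, "gb2312")

print(type(unicode_obj).__name__, type(general_obj).__name__)
```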

Now we know how strings are stored in memory. Next we need to understand how print works. In fact, print is only responsible for handing the corresponding bytes in memory to the operating system, so that the relevant program, such as the cmd window, can display them. There are two cases:

1. If the string is a general string, print simply pushes the corresponding byte string in memory to the operating system (code 1 in the example).
2. If the string is a Unicode string, print encodes it before pushing. We can specify the encoding explicitly by calling the Unicode string's encode method (code 2 in the example).

Otherwise, Python uses the default encoding, which is ASCII (code 3 in the example). ASCII cannot encode Chinese characters, so Python reports an error. This resolves the first and third problems. As for the second problem, Python has two types of strings, general strings and Unicode strings, and each has its own character-processing methods.
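The difference between an explicit encoding and the ASCII fallback can be illustrated at the level of `encode`. This is a sketch in Python 3 syntax with an assumed sample string. It does not reproduce the article's original code 1 to 3, which are not included in this text.

```python
text = "中国"

# Explicit encoding chosen by the program: the bytes can be handed on as-is.
explicit_bytes = text.encode("gb2312")

# Falling back to ASCII: Chinese characters cannot be represented.
try:
    text.encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

print(len(explicit_bytes), type(ascii_error).__name__)
```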

For the former, the methods work in bytes, and in GB2312 each Chinese character occupies two bytes, so the result is 5. For the latter, the Unicode string, every character is treated uniformly as a single character, whatever its width in bytes.
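The byte-based and character-based views give different lengths for the same text. The sketch below uses Python 3 and a sample string picked for illustration (one ASCII letter followed by two Chinese characters); it is not the article's original example.

```python
sample = "a中国"

# Character view (Unicode string): each character counts once.
character_count = len(sample)

# Byte view (general string in GB2312): one byte for "a", two per Chinese character.
gb2312_byte_count = len(sample.encode("gb2312"))

# Byte view in UTF-8, for comparison: three bytes per Chinese character.
utf8_byte_count = len(sample.encode("utf-8"))

print(character_count, gb2312_byte_count, utf8_byte_count)
```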

Although the discussion above concerns Chinese characters in console programs, the problems with Chinese characters in file reading and writing and in network transmission follow the same principles. Unicode goes a long way toward solving the internationalization of software, and Python provides excellent support for it. Therefore, when writing a Python program, use Unicode strings throughout.

Use UTF-8 when saving the file. "How to Use UTF-8 with Python" describes this in detail and is worth consulting. Python still has many issues with Chinese text, for example in file reading and writing and in network data transmission. I hope readers will share their experience so that we can solve these problems together.
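As one way to follow this advice for file reading and writing, the sketch below keeps text as Unicode inside the program and converts to UTF-8 only at the file boundary. It uses `io.open`, which accepts an `encoding` argument. The helper name `save_and_load` and the sample text are assumptions made for illustration.

```python
import io
import os
import tempfile


def save_and_load(text, encoding="utf-8"):
    # Hypothetical helper: write text with an explicit encoding, then read it back.
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with io.open(path, "w", encoding=encoding) as handle:
            handle.write(text)
        with io.open(path, "rb") as handle:
            raw = handle.read()        # the bytes actually stored on disk
        with io.open(path, "r", encoding=encoding) as handle:
            loaded = handle.read()     # decoded back to Unicode text
    finally:
        os.remove(path)
    return raw, loaded


raw_bytes, loaded_text = save_and_load("中国")
print(len(raw_bytes), loaded_text == "中国")
```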

To review the process the Python parser goes through: first, write the source code in an editor and save it as a file. If the source code contains an encoding declaration and the editor supports this syntax, the file is stored on disk in the declared encoding. Note that the encoding declaration and the actual encoding of the source file are not necessarily the same: you can declare UTF-8 in the encoding declaration but save the source file as GB2312.

Of course, this is asking for trouble, and a good IDE should keep the two consistent. However, the mismatch can occur if you write code in Notepad or EditPlus. Once you have a .py file, you can run it, which hands the code to the Python parser. When the parser reads the file, it first parses the encoding declaration in the file.

Assuming the file is encoded as GB2312, the parser first converts the content from GB2312 to Unicode and then converts the Unicode to a UTF-8 byte string. After this step, the parser tokenizes and parses the UTF-8 byte string. If the program uses a Unicode string, the Unicode string object is created from the corresponding UTF-8 byte string.

If the program uses a general string, the parser first converts the UTF-8 byte string back to a byte string in the source encoding (GB2312 here) and creates a general string object from it. In other words, Unicode strings and general strings are stored differently in memory: the former is built from the UTF-8 data and held as Unicode characters, while the latter holds GB2312 bytes.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
