Analysis of the full process by which the Python parser handles source code

Source: Internet
Author: User

First, let's look at the full process the Python parser goes through. We write the source code in an editor and save it as a file. If the source code contains an encoding declaration and the editor supports this syntax, the file is stored on disk in the corresponding encoding.

Note: the encoding declaration and the actual encoding of the source file are not necessarily the same. You can declare UTF-8 in the encoding declaration but save the source file as GB2312. Of course, nobody would make this mistake on purpose, and a good IDE keeps the two consistent. However, if we write code in Notepad, EditPlus, or a similar editor, this problem can occur by accident.
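A minimal sketch of this mismatch, written for Python 3 as an assumption (the article describes older behavior). The source text and variable names below are hypothetical. It saves text that declares UTF-8 as GB2312 bytes, then shows that reading those bytes as UTF-8 fails.

```python
# Hypothetical source text: the declaration says UTF-8 ...
source_text = "# -*- coding: utf-8 -*-\ns = '中国'\n"

# ... but the editor saves the file as GB2312.
saved_bytes = source_text.encode("gb2312")

# Reading the bytes according to the declaration fails, because the
# GB2312 bytes for the Chinese characters are not valid UTF-8.
try:
    saved_bytes.decode("utf-8")
    mismatch_detected = False
except UnicodeDecodeError:
    mismatch_detected = True

print(mismatch_detected)
```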

Once we have a .py file, we can run it, which hands the code to the Python parser. When the parser reads the file, it first parses the encoding declaration in the file. If the file is encoded as GB2312, the content is first converted from GB2312 to Unicode, and the Unicode is then converted to a byte string in UTF-8 format.
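The two conversions can be imitated by hand. This is only an illustration in Python 3, not the parser's actual implementation. The file content is hypothetical, and `tokenize.detect_encoding` from the standard library is used here to read the declaration.

```python
import io
import tokenize

# Hypothetical file content as stored on disk, in GB2312.
file_bytes = "# -*- coding: gb2312 -*-\ns = '中国'\n".encode("gb2312")

# Read the encoding declaration from the first lines of the file.
declared, _ = tokenize.detect_encoding(io.BytesIO(file_bytes).readline)

# Step 1: convert from the declared encoding to Unicode.
unicode_text = file_bytes.decode(declared)

# Step 2: convert the Unicode text to a UTF-8 byte string.
utf8_bytes = unicode_text.encode("utf-8")

print(declared)
print(len(file_bytes), len(utf8_bytes))
```

The UTF-8 form is longer here because each of the two Chinese characters takes three bytes in UTF-8 but two in GB2312.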

(Note: this refers only to the conversion of the script's own source code.) After this step is completed, the parser tokenizes and parses these UTF-8 byte strings. If it encounters a Unicode string literal (for example, u'China I Love You'), it uses the corresponding UTF-8 byte string to create a Unicode string.

If the program uses a general string (a normal, non-Unicode byte string), the parser first converts the UTF-8 byte string, by way of Unicode, into a byte string in the declared encoding (here GB2312) and uses it to create a general string object. In other words, Unicode strings and general strings are stored differently in memory: the former is built from the UTF-8 bytes and held as Unicode, while the latter holds GB2312-encoded bytes.
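A rough Python 3 analogue of these two paths, assuming `str` stands in for the Unicode string and `bytes` for the general string. That mapping is an assumption made for illustration, since the article describes older Python behavior.

```python
# UTF-8 bytes of the two characters 中国, as the parser would hold them
# after converting the source file.
utf8_bytes = b"\xe4\xb8\xad\xe5\x9b\xbd"

# Unicode string literal: created directly from the UTF-8 bytes.
unicode_string = utf8_bytes.decode("utf-8")

# General string literal: converted via Unicode back to the declared
# encoding (GB2312 here) and stored as bytes.
general_string = unicode_string.encode("gb2312")

print(repr(unicode_string))
print(repr(general_string))
```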

Now that we know how strings are stored in memory, we need to understand how print works. In fact, print is only responsible for handing the corresponding bytes in memory to the operating system, so that the relevant program of the operating system, such as the cmd window, can display them. There are two cases:

1. If the string is a general string, print only needs to push the corresponding byte string in memory to the operating system (see code example 1).
2. If the string is a Unicode string, print first encodes it before pushing. We can specify the encoding explicitly by calling the Unicode string's encode method (see code example 2).

Otherwise, Python uses the default encoding, which is ASCII (see code example 3). Of course, ASCII cannot encode Chinese characters, so Python reports an error. This resolves the first and third problems. As for the second problem: Python has two types of strings, general strings and Unicode strings, and each has its own way of processing characters.
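The difference between an explicit encoding and the ASCII fallback can be sketched as follows. This is Python 3 code written for illustration, not the article's missing code examples, and the sample text is hypothetical.

```python
text = "中国"  # hypothetical Unicode string

# Explicit encoding: succeeds, because GB2312 covers these characters.
gb_bytes = text.encode("gb2312")

# ASCII cannot represent Chinese characters, so encoding fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(gb_bytes, ascii_ok)
```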

For the former, operations work on bytes, and in GB2312 each Chinese character occupies two bytes, so the result is 5. For the latter, the Unicode string, every character counts as one unit regardless of how many bytes it would need when encoded.
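To make the byte count concrete, here is a small Python 3 sketch. The sample string is hypothetical (two Chinese characters plus one ASCII letter) and was chosen only because it yields five bytes in GB2312; it is not taken from the original example.

```python
sample = "中文a"  # hypothetical: two Chinese characters and one ASCII letter

# General string view: GB2312 bytes, two per Chinese character.
general_string = sample.encode("gb2312")
byte_length = len(general_string)  # 2 + 2 + 1

# Unicode string view: each character counts once.
char_length = len(sample)

print(byte_length, char_length)  # → 5 3
```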

Although the discussion above concerns Chinese characters in console programs, the problems with Chinese characters in file reading and writing and in network transmission are similar in principle. Unicode goes a long way toward solving the internationalization problem for software, and Python provides excellent support for it. Therefore, when writing Python programs, use Unicode strings throughout.

When saving files, use UTF-8 encoding. The article "How to Use UTF-8 with Python" describes this in detail and is worth consulting. Python still has many issues with Chinese text, for example in file reading and writing and in network data transmission. I hope readers will share their experience so that these problems can be solved together.
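A minimal sketch of this advice for file reading and writing, in Python 3 and under the assumption that Unicode is used inside the program and UTF-8 only at the file boundary. The temporary file and sample text are hypothetical.

```python
import os
import tempfile

text = "中国"  # Unicode string inside the program

# Create a temporary file path for the demonstration.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

try:
    # Encode to UTF-8 only when writing to disk.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # The raw bytes on disk are UTF-8.
    with open(path, "rb") as f:
        raw = f.read()

    # Decode back to Unicode when reading.
    with open(path, "r", encoding="utf-8") as f:
        round_trip = f.read()
finally:
    os.remove(path)

print(raw, round_trip == text)
```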

To review the process: first, we write the source code in an editor and save it as a file. If the source code contains an encoding declaration and the editor supports this syntax, the file is stored on disk in the corresponding encoding. The encoding declaration and the actual file encoding are not necessarily the same: you can declare UTF-8 but save the source file as GB2312. A good IDE keeps the two consistent, but the mismatch can occur when using Notepad, EditPlus, or a similar editor.

After obtaining a .py file, we run it, which hands the code to the Python parser. When the parser reads the file, it first parses the encoding declaration. Assuming the file encoding is GB2312, the parser first converts the content from GB2312 to Unicode and then converts the Unicode to a byte string in UTF-8 format. It then tokenizes and parses these UTF-8 byte strings.

If the program uses a Unicode string, the Unicode string is created from the corresponding UTF-8 byte string. If the program uses a general string, the parser first converts the UTF-8 byte string to a byte string in the declared encoding (here GB2312) and creates a general string object from it. In other words, the two kinds of string are stored differently in memory: the Unicode string is built from the UTF-8 bytes and held as Unicode, while the general string holds GB2312-encoded bytes.
