A Detailed Introduction to the First Step of Python Program Execution


Python program execution can be divided into five steps. This article introduces the first of those steps: lexical analysis.

Python source code analysis 3: the lexical analyzer PyTokenizer
Introduction
The execution of a Python program can be divided into five steps:

The tokenizer performs lexical analysis, splitting the source program into tokens.

The parser builds a concrete syntax tree (CST) from the tokens.

The CST is converted to an abstract syntax tree (AST).

The AST is compiled into bytecode.

The bytecode is executed.

This article describes the first step of Python program execution, that is, lexical analysis.

In simple terms, lexical analysis groups the characters of the source program into tokens.

For example, sum = 0 can be divided into three tokens: 'sum', '=', and '0'. Whitespace usually serves only as a separator and is not itself a token, so it does not appear in the token list. In Python, however, indentation is part of the syntax, so tabs and spaces at the start of a line must be analyzed to determine the program's block structure. Whitespace handling in Python is therefore slightly more complex than in a C/C++ compiler.

In Python, lexical analysis is implemented in tokenizer.h and tokenizer.c under the Parser directory. Other parts of Python call the functions declared in tokenizer.h, as follows:

 
 
  extern struct tok_state *PyTokenizer_FromString(const char *);
  extern struct tok_state *PyTokenizer_FromFile(FILE *, char *, char *);
  extern void PyTokenizer_Free(struct tok_state *);
  extern int PyTokenizer_Get(struct tok_state *, char **, char **);

All these functions start with the PyTokenizer prefix. This is a convention in the Python source code. Although Python is implemented in C, its implementation borrows many object-oriented ideas. For lexical analysis, these four functions can be regarded as the member functions of a PyTokenizer class.

The first two functions, PyTokenizer_FromXXXX, can be regarded as constructors that return a PyTokenizer instance. The internal state of the PyTokenizer object, that is, its member variables, is stored in a tok_state struct. PyTokenizer_Free can be regarded as the destructor, releasing the memory occupied by the PyTokenizer, namely the tok_state.

PyTokenizer_Get is a member function of PyTokenizer that obtains the next token in the token stream. All of these functions take a tok_state pointer, which parallels the this pointer implicitly passed to member functions in C++. This shows that object orientation is a design idea independent of any particular language: even a structured language like C can be used to write object-oriented programs.

The above is an introduction to the first step of Python program execution, lexical analysis. I hope you have gained something from it.
