Lexical analysis and syntax analysis in Python

Lexical analysis: checks whether the words (tokens) formed from characters are legal and, if there is no problem, produces a token stream.

Parsing (syntactic analysis): checks whether the sentences formed from tokens are legal and, if there is no problem, produces a syntax tree.
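The two stages can be observed from Python itself. The following is a minimal sketch, assuming a Python 3 interpreter and using the standard library modules tokenize and ast; the sample source line and the helper name is_legal are made up for illustration.

```python
import ast
import io
import tokenize

# Hypothetical one-line program used only for illustration.
source = "total = price * 2\n"

# Lexical analysis: characters -> token stream.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

# Syntactic analysis: source -> syntax tree.
tree = ast.parse(source)


def is_legal(text):
    """Return True if text is accepted by the parser, False otherwise."""
    try:
        ast.parse(text)
    except SyntaxError:
        return False
    return True


for kind, text in tokens:
    print(kind, repr(text))
print(type(tree.body[0]).__name__)
```

Here every word in "total = = 2" is a legal token, yet is_legal rejects the line because the sentence as a whole does not follow the grammar.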

When the lexical analyzer reads the source text, two concepts need to be clarified:

1. Physical line: a sequence of characters terminated by an end-of-line sequence (CR LF on Windows, LF on Unix).

2. Logical line: consists of one or more physical lines. You can explicitly use a backslash (\) to join several physical lines into one logical line. Expressions inside parentheses, square brackets, or curly braces can also span several physical lines and are still treated as one logical line.

The lexical analyzer works in units of logical lines. For the lexical analyzer, only a logical line counts as a line, and it produces a NEWLINE token only at the end of a logical line.
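The difference between physical and logical lines can be checked by counting NEWLINE tokens. This is a small sketch, assuming Python 3 and the standard tokenize module; the sample source and the helper name count_newline_tokens are hypothetical.

```python
import io
import tokenize

# Four physical lines, but only two logical lines:
# the first pair is joined by parentheses, the second by a backslash.
source = (
    "total = (1 +\n"
    "         2)\n"
    "name = 'a' + \\\n"
    "       'b'\n"
)


def count_newline_tokens(text):
    """Count NEWLINE tokens, i.e. the number of logical lines in text."""
    toks = tokenize.generate_tokens(io.StringIO(text).readline)
    return sum(1 for tok in toks if tok.type == tokenize.NEWLINE)


physical_lines = len(source.splitlines())
logical_lines = count_newline_tokens(source)
print(physical_lines, logical_lines)
```

The tokenize module reports the line break inside the parentheses as a separate NL token, which is why it is not counted as the end of a logical line.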

For each increase in indentation, no matter how many whitespace characters it contains, the lexical analyzer produces only one INDENT token, which represents one indentation level, and it produces one DEDENT token whenever an indentation level is exited. Note that DEDENT does not correspond to any character or group of characters; it is a purely logical concept.
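The INDENT and DEDENT tokens can be made visible with the tokenize module. The sketch below assumes Python 3; the sample source is invented, and its second level is deliberately indented by eight extra spaces to show that the width does not matter.

```python
import io
import tokenize

# Two nested blocks; the inner one is indented by eight more spaces.
source = (
    "if a:\n"
    "    if b:\n"
    "            c = 1\n"
    "d = 2\n"
)

toks = list(tokenize.generate_tokens(io.StringIO(source).readline))

# One INDENT per level entered, one DEDENT per level exited.
indent_count = sum(1 for tok in toks if tok.type == tokenize.INDENT)
dedent_count = sum(1 for tok in toks if tok.type == tokenize.DEDENT)

# DEDENT carries no characters of its own.
dedent_strings = [tok.string for tok in toks if tok.type == tokenize.DEDENT]

print(indent_count, dedent_count, dedent_strings)
```

Both DEDENT tokens are produced when the line "d = 2" is reached, since that single line closes two indentation levels at once.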

Python uses a slightly modified BNF (Backus-Naur Form) to describe its lexical and grammatical rules.

Here are some important points to keep in mind to avoid simple but hard-to-spot programming errors.

1. Identifiers are case-sensitive, but even so, do not use two variables whose names differ only in case.

2. Do not use identifiers that are keywords in other programming languages as variable names, because they are likely to be keywords in Python as well.

3. Do not use identifiers to which Python assigns a special meaning as variable names (such identifiers typically begin with an underscore) unless you explicitly want that special meaning.

4. Do not use $ and ? in Python (except in strings); they are not valid characters.

5. Do not mix tabs and spaces for indentation; use only one of them, and make it a habit.

6. Integer literals have decimal, hexadecimal, octal, and binary representations. For hexadecimal, write 0xa or 0Xa; for octal, write 0o7 or 0O7; for binary, write 0b1 or 0B1. In short, do not omit the prefix letter; we recommend using only lowercase letters.

7. In Python 2, adding an l or L after an integer literal makes it a long integer. It is recommended to use only the uppercase L, because the lowercase l looks like the digit 1.

8. Plain integers use a 32-bit representation, but long integers are not limited to a fixed number of bits; memory permitting, they can represent an integer of any size, which is different from the C language.

9. If the value of an integer literal exceeds the range of a 32-bit integer, Python automatically promotes it to a long integer. This may not be the case in earlier Python implementations, so if you expect a large integer value, make sure to use a long integer.

10. Floating-point literals can only be written in decimal (base 10).
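Several of the points above can be checked mechanically. The following is a rough sketch, assuming a Python 3 interpreter (where the L suffix no longer exists and every integer has unlimited size); the helper name is_safe_name is hypothetical and only covers points 2 and 4.

```python
import keyword


def is_safe_name(name):
    """Return True if name is a valid identifier and not a Python keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


# Point 2: "class" is a keyword in many languages, including Python.
# Point 4: "$" is not a valid character in an identifier.
candidates = ["total", "class", "price$"]
safe = [name for name in candidates if is_safe_name(name)]

# Point 6: the same value written in decimal, hexadecimal, octal, and binary.
same_value = [10, 0xa, 0o12, 0b1010]

# Points 8 and 9: a value far outside the 32-bit range is still an exact integer.
big = 2 ** 64

# Point 10: floating-point literals are decimal, optionally with an exponent.
half = 5e-1

print(safe, same_value, big, half)
```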
