Lexical analysis: checks whether each word formed from the characters is valid. If there are no errors, a token stream is generated.
Syntactic analysis: checks whether each sentence formed from the tokens is legal. If there are no errors, a syntax tree is generated.
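The two stages can be observed from Python itself. The following is a minimal sketch, assuming Python 3 and the standard-library `tokenize` and `ast` modules; the sample statement and the names `source`, `tokens`, and `tree` are made up for illustration. The `tokenize` module is a pure-Python tokenizer, so it approximates what the interpreter's own lexical analyzer does rather than exposing it directly.

```python
import ast
import io
import tokenize

# Hypothetical one-line program used only for illustration.
source = "total = price * 2\n"

# Lexical analysis: characters -> token stream.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

# Syntactic analysis: source text -> syntax tree.
tree = ast.parse(source)

for name, text in tokens:
    print(name, repr(text))
print(ast.dump(tree))
```

The token list ends with a NEWLINE token and an ENDMARKER token, and the root of the tree holds a single assignment statement.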
When the lexical analyzer processes source code text, two concepts must be kept clearly distinct:
1. Physical line: a physical line is a string of characters terminated by an end-of-line sequence (CR LF on Windows, LF on Unix).
2. Logical line: a logical line is composed of one or more physical lines. A backslash (\) at the end of a physical line joins it to the next one, and an expression in brackets can span several physical lines; in both cases the result is treated as a single logical line.
The lexical analyzer is oriented to logical lines: for the lexical analyzer, only a logical line counts as a line. It generates a NEWLINE token only at the end of a logical line.
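The difference between physical and logical lines can be checked by counting NEWLINE tokens. This is a rough sketch, assuming Python 3 and the standard-library `tokenize` module; the helper `count_newline_tokens` and the sample text are hypothetical names introduced here.

```python
import io
import tokenize

# Five physical lines: one backslash continuation and one bracketed list.
source = (
    "total = 1 + \\\n"
    "        2\n"
    "items = [1,\n"
    "         2,\n"
    "         3]\n"
)

def count_newline_tokens(text):
    """Count NEWLINE tokens, i.e. the number of logical lines in text."""
    readline = io.StringIO(text).readline
    return sum(
        1
        for tok in tokenize.generate_tokens(readline)
        if tok.type == tokenize.NEWLINE
    )

physical_lines = len(source.splitlines())
logical_lines = count_newline_tokens(source)
print(physical_lines, logical_lines)  # prints: 5 2
```

Line breaks inside the brackets are reported by `tokenize` as NL tokens, which are not NEWLINE tokens and do not end a logical line.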
For each new level of indentation, no matter how many whitespace characters it contains, the lexical analyzer generates exactly one INDENT token, representing one level of indentation. Whenever one level of indentation is closed, the lexical analyzer generates a DEDENT token. Note that a DEDENT token does not correspond to any characters in the source; it is a purely logical concept.
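A short experiment illustrates both points. The sketch below assumes Python 3 and the standard-library `tokenize` module; the sample program and the variable names are invented for the example. The inner block is deliberately indented by eight extra spaces to show that the width of the indentation does not change the number of INDENT tokens.

```python
import io
import tokenize

# Two nested blocks; the inner one uses a wider indent on purpose.
source = (
    "if a:\n"
    "    if b:\n"
    "            run()\n"
    "done()\n"
)

tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

indents = [tok for tok in tokens if tok.type == tokenize.INDENT]
dedents = [tok for tok in tokens if tok.type == tokenize.DEDENT]

print(len(indents), len(dedents))        # prints: 2 2
print([tok.string for tok in dedents])   # prints: ['', '']
```

Each deeper level yields one INDENT token regardless of its width, and returning to the top level from two levels deep yields two DEDENT tokens, each with an empty string as its text.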
Python uses a slightly modified BNF (Backus-Naur Form) to describe its lexical and syntax rules.
Next we focus on some important points that help avoid low-level but well-hidden programming errors.
1. Identifiers are case-sensitive, but do not rely on case alone to distinguish two variables.
2. Do not use identifiers that are keywords in other programming languages as variable names, because they may well become Python keywords in the future.
3. Do not use names to which Python assigns a special meaning as variable names (such names generally start with an underscore) unless you know exactly which special meaning you intend to use.
4. Do not use $ or ? in Python code (except inside strings); they are not valid characters.
5. Do not mix tabs and spaces for indentation. Use only one of them and stick with it.
6. Integer literals can be written in base 10, 16, 8, or 2. For hexadecimal, write 0xa or 0Xa; for octal, write 0o7 or 0O7; for binary, write 0b1 or 0B1. In short, do not omit the letter that marks the base. We recommend using only lowercase letters.
7. Append l or L to an integer literal to form a long integer. We recommend using only L, because a lowercase l looks like the digit 1.
8. A plain integer is represented with 32 bits, whereas a long integer is not limited to a fixed number of bits. As long as memory permits, it can represent an integer of any size, which differs from the C language.
9. If the value of an integer literal exceeds the 32-bit integer range, Python automatically promotes it to a long integer. This may not be the case in earlier Python implementations, so if you expect a large integer, write it as a long integer explicitly.
10. Floating-point literals can be written only in decimal form.
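The rules for numeric literals in items 6 and 10 can be tried out directly. This is a small sketch, assuming Python 3, where the long-integer suffix described in items 7 to 9 no longer exists because there is a single integer type of unlimited size. The helper `is_valid_literal` is a hypothetical name introduced for this example; `ast.literal_eval` is the standard-library function used to evaluate each literal.

```python
import ast

# The value ten written with each base prefix, in both letter cases.
literals = ["10", "0xa", "0XA", "0o12", "0O12", "0b1010", "0B1010"]
values = [ast.literal_eval(text) for text in literals]
print(values)

def is_valid_literal(text):
    """Return True if text parses as a Python literal, False otherwise."""
    try:
        ast.literal_eval(text)
    except (SyntaxError, ValueError):
        return False
    return True

# Floating-point literals are decimal only, optionally with an exponent.
print(is_valid_literal("1.5e-3"))  # prints: True
# A base prefix cannot be combined with a fractional part.
print(is_valid_literal("0x1.8"))   # prints: False
```

Every entry in `values` is the integer 10, which shows that the prefix only changes how the literal is written, not the value it denotes.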