Chapter 2. Syntax and code conventions
This chapter describes the syntax and code conventions of Python programs. Topics include line structure, statement grouping, reserved words, literals, operators, and tokens. It also describes in detail how to use Unicode strings.
1.1. Line structure and indentation
Each statement in a program ends with a line break. A long statement can be split across several lines with the line-continuation character (\), as shown in the following example:

    import math
    a = math.cos(3*(x-n)) + \
        math.sin(3*(y-n))
When you define a triple-quoted string, list, tuple, or dictionary, you do not need the line-continuation character. More generally, anything enclosed in parentheses (...), square brackets [...], curly braces {...}, or triple quotes can span multiple lines without it.
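The two continuation styles described above can be compared side by side. This is a minimal sketch; the values assigned to x, y, and n are hypothetical and serve only to make the expression runnable.

```python
import math

x, y, n = 2.0, 3.0, 1.0   # hypothetical sample values

# Explicit continuation with a trailing backslash.
a = math.cos(3*(x-n)) + \
    math.sin(3*(y-n))

# Implicit continuation: the open parenthesis keeps the statement going.
b = (math.cos(3*(x-n)) +
     math.sin(3*(y-n)))

# Lists, dictionaries, and triple-quoted strings may also span lines.
primes = [2, 3,
          5, 7]
limits = {'low': 0,
          'high': 10}
text = """first line
second line"""
```

Both forms evaluate the same expression, so a and b are equal; the bracketed form is usually easier to edit because no character has to sit at the very end of the line.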
Indentation is used to delimit code blocks, such as the body of a function, a conditional, a loop, or a class definition. The amount of indentation (spaces or tabs) is arbitrary, but it must be consistent throughout the block:
    if a:
        statement1     # consistent indentation, correct!
        statement2
    else:
        statement3
          statement4   # inconsistent indentation, error!
If a block contains only a few statements, you can place them on the same line as the header:

    if a: statement1
    else: statement2

To indicate an empty block or an empty body, use the pass statement:

    if a:
        pass
    else:
        statements
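The indentation rules above can be checked without running any of the blocks by handing source text to the built-in compile() function. This is a small sketch; the helper name compiles and the sample sources are illustrative assumptions, not part of the original text.

```python
def compiles(source):
    """Return True if the source text parses, False on an indentation error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except IndentationError:
        return False

# Every statement in a block uses the same indentation.
consistent = "if True:\n    x = 1\n    y = 2\nelse:\n    z = 3\n"

# The second statement is indented further than the first one.
inconsistent = "if True:\n    x = 1\n      y = 2\n"

# Different blocks may use different amounts of indentation.
mixed_widths = "if True:\n  x = 1\nelse:\n        y = 2\n"

# Short bodies on the header line, and an empty body written with pass.
one_line = "if True: x = 1\nelse: x = 2\n"
empty_body = "if True:\n    pass\nelse:\n    x = 2\n"

results = [compiles(s) for s in
           (consistent, inconsistent, mixed_widths, one_line, empty_body)]
```

Only the second sample is rejected; the others, including the one whose two blocks use different widths, are accepted.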
Although tabs may be used for indentation, doing so is a bad habit. Never mix tabs and spaces for indentation, because this leads to unexpected problems. We recommend using a single tab, or two or four spaces, for each indentation level. If Python is run with the -t option, it prints a warning when it finds tabs and spaces mixed inconsistently. If the -tt option is used, it raises a TabError exception instead.
A semicolon (;) can be used to place multiple statements on the same line. A line that contains only one statement may also end with a semicolon.
The # character starts a comment that extends to the end of the line. A # inside a string does not start a comment.
The interpreter ignores all blank lines (in non-interactive mode).
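The semicolon and comment rules above fit in a few lines. This is only an illustrative sketch, and the variable names are arbitrary.

```python
a = 1; b = 2; c = a + b      # three statements on one line
d = 4;                       # a single statement may end with a semicolon

label = "# not a comment"    # the # inside the string is ordinary text

length = len(label)
```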
1.2. Identifiers and reserved words
An identifier is a name used to identify variables, functions, classes, modules, and other objects. An identifier can contain letters, digits, and underscores (_), but must start with a non-numeric character. Letters are limited to A-Z and a-z in the ISO-Latin character set. Identifiers are case sensitive, so Foo and foo are two different names. Special symbols such as $, %, and @ cannot be used in identifiers. In addition, words such as if, else, and for are reserved and cannot be used as identifiers. The following table lists all reserved words:
    and        elif       global     or
    assert     else       if         pass
    break      except     import     print
    class      exec       in         raise
    continue   finally    is         return
    def        for        lambda     try
    del        from       not        while
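The identifier rules above can be probed with the standard keyword module and the built-in compile() function. This is a sketch under assumptions: the helper is_valid_name is hypothetical, and only the words if, else, and for are tested, because the exact reserved-word list depends on the Python version in use.

```python
import keyword

# Identifiers are case sensitive: these are two different names.
Foo = 1
foo = 2

# Reserved words are reported by the standard keyword module.
reserved = [word for word in ("if", "else", "for") if keyword.iskeyword(word)]

def is_valid_name(text):
    """Return True if text can be used as the target of an assignment."""
    try:
        compile(text + " = 0", "<example>", "exec")
        return True
    except SyntaxError:
        return False

checks = {
    "_count2": is_valid_name("_count2"),   # letters, digits, underscore
    "2count": is_valid_name("2count"),     # starts with a digit
    "price$": is_valid_name("price$"),     # contains a special symbol
    "for": is_valid_name("for"),           # reserved word
}
```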
Identifiers that begin or end with underscores usually have a special meaning. For example, an identifier that starts with a single underscore (such as _foo) is not imported by the from module import * statement. Identifiers with two leading and two trailing underscores, such as __init__, are reserved for special methods. Identifiers with two leading underscores, such as __bar, are used to implement private class attributes, which are discussed in Chapter 7, "Classes and Object-Oriented Programming". In general, avoid using similar identifiers for other purposes.
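The underscore conventions above can be observed directly. In this sketch the module name underscore_demo and the class Account are hypothetical; the module is built in memory so that the example needs no extra file.

```python
import sys
import types

# A hypothetical module with one public and one underscore-prefixed name.
demo = types.ModuleType("underscore_demo")
demo.public_name = 1
demo._private_name = 2
sys.modules["underscore_demo"] = demo

# "from module import *" skips names that start with an underscore.
namespace = {}
exec("from underscore_demo import *", namespace)
imported = sorted(name for name in namespace if not name.startswith("__"))

class Account(object):
    def __init__(self):            # special method: two underscores each side
        self.__balance = 10        # stored under the name _Account__balance

acct = Account()
mangled_value = acct._Account__balance
plain_lookup_works = hasattr(acct, "__balance")

del sys.modules["underscore_demo"]  # leave the module table as it was
```

The attribute written as __balance inside the class is reachable from outside only under its mangled name, which is how the private-attribute convention is implemented.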
1.3. Literals
Python has four built-in numeric types: integer, long integer, floating point, and complex.
A number such as 1234 is interpreted as a decimal integer. To specify an octal or hexadecimal integer, precede the digits with 0 or 0x, respectively (for example, 0644 and 0x100fea8). An integer followed by the letter l or L is treated as a long integer (for example, 1234567890L). Unlike ordinary integers, which are limited by the machine word size, long integers can be of any length (limited only by available memory). Numbers such as 123.34 and 1.2334e+02 are interpreted as floating-point numbers. An integer or floating-point number with the suffix j or J is the imaginary part of a complex number. You can create a complex number by adding a real number to an imaginary part, as in 1.2 + 12.34j.
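The numeric forms above can be checked as follows. This is a sketch with one deliberate substitution: the octal value is parsed with int(text, base) rather than written as 0644, and no L suffix is used, so that the example does not depend on the literal syntax of a particular Python version.

```python
# Octal text parsed with an explicit base; the text writes this value as 0644.
octal_value = int("644", 8)

# Hexadecimal literal with the 0x prefix.
hex_value = 0x100fea8

# Large results are not limited by the machine word size.
big = 2 ** 100

# Plain and exponent notation describe the same floating-point number.
plain = 123.34
exponent = 1.2334e+02

# A real number plus an imaginary part forms a complex number.
z = 1.2 + 12.34j
```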
Python currently supports two types of strings:
8-bit character data (ASCII)
16-bit wide-character data (Unicode)
ASCII strings are the most common, because every character in that character set fits in a single byte. ASCII strings are usually enclosed in single quotes ('), double quotes ("), or triple quotes (''' or """). The opening and closing quotes must be of the same type. The backslash (\) is used to escape special characters, such as line breaks, the backslash itself, quotes, and other non-printable characters. Table 2.1 lists the accepted escape codes. Unrecognized escape sequences are left in the string unchanged (including the leading backslash). In addition, strings can contain embedded null bytes and binary data. A triple-quoted string can contain line breaks and quotes that do not need to be escaped.
Table 2.1 Standard character escape codes
    Character     Description
    \<newline>    Line continuation (the backslash and the line break are ignored)
    \\            Backslash
    \'            Single quote
    \"            Double quote
    \a            Bell
    \b            Backspace
    \e            Escape (standard Python does not recognize this code; \x1b gives the same character)
    \0            Null
    \n            Line feed, equivalent to \x0a (Ctrl-J)
    \v            Vertical tab, equivalent to \x0b (Ctrl-K)
    \t            Horizontal tab, equivalent to \x09 (Ctrl-I)
    \r            Carriage return, equivalent to \x0d (Ctrl-M)
    \f            Form feed, equivalent to \x0c (Ctrl-L)
    \ooo          Octal value (\000 to \377)
    \xhh          Hexadecimal value (\x00 to \xff)
    \uxxxx        Unicode character value; xxxx is four hexadecimal digits
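A few of the escape codes in Table 2.1 can be verified by looking at string lengths and character ordinals. This is a minimal sketch; the variable names are arbitrary.

```python
tabbed = "a\tb"            # three characters: a, horizontal tab, b
hex_a = "\x41"             # hexadecimal escape for the letter A
octal_a = "\101"           # octal escape for the same letter
line_feed = ord("\n")      # ordinal of the line-feed character
quoted = 'It\'s'           # escaped single quote inside single quotes
backslash = "\\"           # one backslash character

# A triple-quoted string may hold line breaks and unescaped quotes.
multi = """line one
line "two" and more"""
```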
Unicode strings represent a multibyte international character set that allows 65,536 characters. A Unicode string literal is written with the u or U prefix, for example a = u"hello". In the Unicode character set, each character is represented by a 16-bit integer. By convention, a Unicode character is written as U+XXXX, where XXXX is a four-digit hexadecimal number. (This notation is only a convention for describing Unicode characters; it is not Python syntax.) For example, U+0068 is the Unicode character for the letter h. The first 256 characters of the Unicode character set have exactly the same codes as the corresponding Latin-1 characters.

When a Unicode string literal is defined, ordinary characters and escape codes are mapped directly to Unicode ordinals in the range [U+0000, U+00FF]. For example, the string "hello\n" is mapped to the ASCII values 0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x0a, whereas the Unicode string u"hello\n" is mapped to U+0068, U+0065, U+006C, U+006C, U+006F, U+000A. Any Unicode character can be written as \uxxxx, which is valid only inside a Unicode string literal. For example:

    s = u"\u0068\u0065\u006c\u006c\u006f\u000a"

In earlier versions of Python, the \xXXXX sequence was used to define Unicode characters (this is related to the way the system recognizes Unicode characters). Although this is still allowed, the newer \uxxxx notation is recommended, because the old notation may be removed at any time. In addition, the octal escape \ooo can be used to define Unicode characters in the range [U+0000, U+01FF].

Unicode characters cannot be defined by writing the raw byte sequence of a UTF-8 or UTF-16 encoding. For example, the string u'M\303\274ller', written with UTF-8 bytes, produces the seven characters U+004D, U+00C3, U+00BC, U+006C, U+006C, U+0065, U+0072, which is not the result you want.
This is because in UTF-8 the multibyte sequence \303\274 represents the single character U+00FC, not the two characters U+00C3 and U+00BC. For more information about Unicode encodings, see Chapter 3, "Types and Objects", Chapter 4, "Operators and Expressions", and Chapter 9, "Input and Output".

You can add the prefix r or R to a string, for example r'\n\"'. Such strings are called raw strings, because almost all backslash sequences in them are left intact. However, a raw string cannot end with a single backslash (for example, r"\"). If a raw string is defined with the ur or UR prefix, \uxxxx is still interpreted as a Unicode character. If you do not want this, put another backslash in front of it (for example, ur"\\u1234"), which defines a string containing seven characters. Note that in a raw Unicode string literal, the r must come after the u.

Adjacent string literals (separated by whitespace or a line break), such as "hello" 'world', are automatically concatenated by Python into the single string "helloworld". Ordinary, Unicode, and raw strings are all concatenated this way. If any one of the strings is a Unicode string, the result is also a Unicode string. For example, "s1" u"s2" produces u"s1s2". For details about this process, see Chapter 4 and Appendix A (the Python library). If Python is run with the -U command-line option, all string literals are interpreted as Unicode.

Square brackets [...] define a list, parentheses (...) define a tuple, and curly braces {...} define a dictionary:

    a = [1, 3.4, 'hello']      # a list
    b = (10, 20, 30)           # a tuple
    c = {'a': 3, 'b': 42}      # a dictionary
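The Unicode escape, raw-string, and concatenation rules described above can be checked with a short sketch. It assumes an interpreter that accepts both the u and b string prefixes; the ur prefix is left out on purpose because not every Python version accepts it.

```python
# Six \uxxxx escapes spell "hello" followed by a line feed.
s = u"\u0068\u0065\u006c\u006c\u006f\u000a"
ordinals = [ord(ch) for ch in s]

# Octal escapes in a Unicode literal name code points, not UTF-8 bytes,
# so this literal holds seven characters (U+00C3 and U+00BC in the middle).
wrong = u"M\303\274ller"

# Decoding the UTF-8 byte sequence explicitly gives the intended six
# characters, with U+00FC in second position.
right = b"M\303\274ller".decode("utf-8")

# A raw string keeps the backslash: two characters, not a line feed.
raw = r"\n"

# Adjacent literals are joined into one string.
joined = "hello" 'world'
```

Decoding a byte string, as in the third assignment, is one way to get the single character U+00FC when the data really is UTF-8.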
1.4. Operators, delimiters, and special symbols
Python currently supports the following operators:
    +    -    *    **   //   /    %    <<   >>   &    |    ^
    +=   -=   *=   **=  //=  /=   %=   <<=  >>=  &=   |=   ^=
    ~    <    >    <=   >=   ==   !=   <>
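A handful of the operators listed above, applied to small integers. This sketch deliberately sticks to operators whose results do not depend on the Python version, so / and <> are not used.

```python
floor_div = 7 // 2        # floor division
power = 2 ** 10           # exponentiation
remainder = 7 % 3         # modulo
shifted = 1 << 4          # left shift
shifted_back = 256 >> 2   # right shift
bit_and = 6 & 3
bit_or = 6 | 3
bit_xor = 6 ^ 3
inverted = ~5             # bitwise inversion

# Augmented assignment combines an operator with assignment.
total = 10
total += 5                # 15
total *= 2                # 30
total //= 4               # 7
total **= 2               # 49
total <<= 1               # 98
```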
The following tokens serve as delimiters for expressions, lists, dictionaries, and various parts of a statement:
( ) [ ] { } , : . ` = ;
For example, the equal sign (=) separates a name from the value assigned to it; the comma (,) separates function arguments and the elements of a list or tuple; the period (.) is used in floating-point numbers, and the ellipsis (...) is used in extended slice operations.
The following special symbols are also used in statements:
' " # \ @
Note: The @ symbol was added in Python 2.4 to introduce function decorators. (Wei Zhong)
The characters $ and ? cannot appear in program statements, but they can appear inside strings.
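The role of the @ symbol noted above can be illustrated with a tiny decorator. This is only a sketch; the names announce, add, and calls are hypothetical.

```python
calls = []

def announce(func):
    """Wrap func so that each call records the function's name."""
    def wrapper(*args):
        calls.append(func.__name__)
        return func(*args)
    return wrapper

@announce                  # same effect as: add = announce(add)
def add(a, b):
    return a + b

result = add(2, 3)

# $ and ? are not valid in statements, but they are fine inside a string.
price = "cost: $5?"
```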
1.5. Documentation strings
If the first statement of a module, class, or function is a string that is not assigned to a name, that string automatically becomes the documentation string (docstring) of the object. For example:
    def fact(n):
        "This function computes a factorial"
        if (n <= 1): return 1
        else: return n*fact(n-1)
Documentation strings are often used by code-browsing and documentation-generation tools. You can retrieve the documentation string through the object's __doc__ attribute:

    >>> print fact.__doc__
    This function computes a factorial
    >>>

The __doc__ attribute is writable. (Weizhong)
The indentation of the documentation string must be consistent with the other statements in the definition. In addition, unnamed strings that appear on separate lines are not automatically concatenated into one string, even if they are next to each other. (Note: only the first string becomes the documentation string, which differs from the automatic concatenation of adjacent strings described earlier.)
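The docstring behavior described above can be confirmed with a short sketch. The second string inside the function is an illustrative addition that shows that only the first string is kept.

```python
def fact(n):
    "This function computes a factorial"
    "A second string on its own line is not part of the docstring"
    if n <= 1:
        return 1
    return n * fact(n - 1)

# Only the first unnamed string becomes the documentation string.
first_doc = fact.__doc__

# The __doc__ attribute can be reassigned.
fact.__doc__ = "Computes n! recursively"
updated_doc = fact.__doc__

value = fact(5)
```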