I understand the principle of re.compile(...).findall(), but not its output (mainly once parenthesized groups are added to the regular expression)
At first I did not understand how parentheses group and capture. I looked at an online example (below), but the rule for the parentheses was still unclear to me (either I did not find a good explanation or I failed to understand it), so I tried more cases myself (the second large picture) and finally worked out the rule.
The experiments were meant to work out the rules for grouping parentheses. Here is a summary.
Let's start with the last match.
Analysis: First work out the order of matching. When analyzing one pair of parentheses, temporarily remove the others so the pattern is easier to read.
Step 1: match the pattern as a whole. Remove all the parentheses first (easier to read), that is, first find in s the first substring that matches "\w+\w+\s+\w+\s+\w+" (the pattern with the parentheses removed). Because no pair of parentheses encloses the whole pattern, the overall match is not captured on its own (that is, it does not appear in the output). The first large string matched is "QEW rty UiO".
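Step 1 can be checked with a small sketch. The sample text s below is an assumption taken from the walkthrough (the original test string is not shown here); the pattern is the one quoted above with all parentheses removed.

```python
import re

# Assumed sample text, taken from the strings quoted in the walkthrough.
s = "QEW rty UiO"

# The pattern with every parenthesis removed: it has no capture groups.
plain = re.compile(r"\w+\w+\s+\w+\s+\w+")

# The overall match ("the first large string").
whole = plain.search(s).group(0)
print(whole)  # QEW rty UiO

# With no groups in the pattern, findall() returns the whole matches.
print(plain.findall(s))  # ['QEW rty UiO']
```

Once groups are added to a pattern, findall() lists only the groups, which is why the overall match no longer shows up in the output.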
The correspondence can be drawn as follows:
\w+   \w+   \s+   \w+   \s+   \w+
 |     |     |     |     |     |
 QE    W    " "   rty   " "   UiO
Correspondence diagram
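The diagram can be reproduced with a rough sketch that wraps every token in its own pair of parentheses. This six-group pattern is only a helper for illustration, not the pattern analyzed in this post, and s is again the assumed sample text.

```python
import re

# Assumed sample text from the walkthrough.
s = "QEW rty UiO"

# Helper pattern: each token gets its own group so each piece is visible.
token_pattern = re.compile(r"(\w+)(\w+)(\s+)(\w+)(\s+)(\w+)")

pieces = token_pattern.search(s).groups()
print(pieces)  # ('QE', 'W', ' ', 'rty', ' ', 'UiO')
```

The first \w+ is greedy, but it has to give back one character so that the second \w+ can match, which is why the split is "QE" and "W".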
Step 2: within the matched string, look at what is captured, that is, the output. Go through the left parentheses one by one, from left to right. The first left parenthesis opens "(\w+\w+\s+\w+)" (with the parentheses nested inside it temporarily removed, for readability). Within the string matched above ("QEW rty UiO") it matches "QEW rty" (compare with the correspondence diagram above). Because this part is inside parentheses, it is captured (that is, output).
Step 3: the second left parenthesis, "\w+(\w+)\s+\w+" (other parentheses temporarily removed), is matched within the string captured by the previous parentheses ("QEW rty"). The leading \w+ is greedy but must leave at least one character for the group, so the group matches "W" (compare with the correspondence diagram above). Because it is inside parentheses, it is captured (that is, output).
Step 4: the third left parenthesis, "\w+\w+\s+\w+(\s+\w+)" (other parentheses temporarily removed), is matched within the string from step 1 and captures " UiO". The whitespace matched by \s+ is inside the parentheses, so the capture includes the leading space.
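Putting the steps together gives the sketch below. The full pattern is reconstructed from the three parenthesized fragments quoted in the steps, and s is the assumed sample text, so treat both as assumptions rather than the exact original experiment.

```python
import re

# Assumed sample text from the walkthrough.
s = "QEW rty UiO"

# Pattern reconstructed from steps 2 to 4:
#   group 1: (\w+(\w+)\s+\w+)   group 2: (\w+)   group 3: (\s+\w+)
pattern = re.compile(r"(\w+(\w+)\s+\w+)(\s+\w+)")

# findall() returns one tuple per match, one item per group,
# ordered by the position of each left parenthesis.
result = pattern.findall(s)
print(result)  # [('QEW rty', 'W', ' UiO')]

# The overall match from step 1 is still available through a match object.
m = pattern.search(s)
print(m.group(0))  # QEW rty UiO
```

Note that the tuple does not contain the overall match itself; it would appear only if one more pair of parentheses enclosed the whole pattern.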
Summary:
1. First match the pattern with the parentheses ignored and draw the correspondence diagram so that everything is clear; then look at what each pair of parentheses captures for the output; then continue in the text (s) to find the next large matching string, and keep going like this...
2. Analyze the parentheses carefully; the main point is to start from the first left parenthesis.
3. For nested parentheses such as (((a)b)(c)d), where each group is named here after the letter just before its closing parenthesis: first match the outermost parentheses (group d), then the b group inside d, then the a group inside b, and then the c group. In the output, the groups are ordered by the position of their left parentheses, from left to right: (d, b, a, c).
If there are any mistakes, please point them out. Thank you!
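The nested case can be tried directly. In this sketch the letters a, b, c, d are literal characters and the test text "abcd abcd" is made up for illustration; it also shows findall() moving on to the next match, as described in point 1.

```python
import re

# Nested groups, numbered by the position of their left parentheses:
#   group 1 = (((a)b)(c)d)  -> "d" group
#   group 2 = ((a)b)        -> "b" group
#   group 3 = (a)           -> "a" group
#   group 4 = (c)           -> "c" group
nested = re.compile(r"(((a)b)(c)d)")

# Made-up text containing two matches.
text = "abcd abcd"

matches = nested.findall(text)
for groups in matches:
    print(groups)  # ('abcd', 'ab', 'a', 'c') for each match
```

Each tuple lists the captures in the order d, b, a, c, and there is one tuple for every match found in the text.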
Python crawler notes: re.compile().findall()