My understanding of regular expressions: lazy matching and how the engine performs it
Problem description
Link: http://www.hcoding.com/?p=130
When you are new to regular expressions, you are likely to run into this question. Suppose you need to match the characters between the first pair of "_" in the string "_abc_123_". When I first learned regular expressions, I wrote "/_\w*_/", and the text matched between the underscores was "abc_123" instead of "abc" (note that \w also matches "_"). After adding a question mark, "/_\w*?_/", the text matched between the underscores is "abc".
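The difference is easy to reproduce. The following is a minimal sketch in Python's re module (my choice for illustration; the article itself writes patterns in the /.../ style), assuming the target string is "_abc_123_". The capture group is added only to show the text between the underscores.

```python
import re

target = "_abc_123_"  # assumed target string

# Greedy: \w* takes as much as it can (\w also matches "_"), then gives back
# one character at a time until the closing "_" can match.
greedy = re.search(r"_(\w*)_", target)

# Lazy: \w*? takes as little as it can and grows only when forced to.
lazy = re.search(r"_(\w*?)_", target)

print(greedy.group(0), greedy.group(1))  # _abc_123_ abc_123
print(lazy.group(0), lazy.group(1))      # _abc_ abc
```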
We know that '?' used on its own means "repeat zero or one time". When '?' follows another repetition quantifier, it makes that quantifier lazy, that is, it matches as few characters as possible. The lazy quantifiers are:
- *? : repeat any number of times, but as few as possible
- +? : repeat one or more times, but as few as possible
- ?? : repeat 0 or 1 times, but as few as possible
- {n,m}? : repeat n to m times, but as few as possible
- {n,}? : repeat n or more times, but as few as possible
Right: "as few repetitions as possible" is a rough and straightforward explanation of lazy matching.
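To see the lazy quantifiers listed above side by side with their greedy counterparts, here is a small sketch in Python. The sample input "aaaaa" and the concrete bounds 2 and 4 are hypothetical values chosen for illustration. Nothing follows the quantifier in these patterns, so each lazy form stops at its minimum.

```python
import re

text = "aaaaa"  # hypothetical sample input

# (greedy form, lazy form) for each quantifier in the list above
pairs = [
    (r"a*", r"a*?"),
    (r"a+", r"a+?"),
    (r"a?", r"a??"),
    (r"a{2,4}", r"a{2,4}?"),
    (r"a{2,}", r"a{2,}?"),
]

results = {}
for greedy, lazy in pairs:
    greedy_hit = re.match(greedy, text).group()
    lazy_hit = re.match(lazy, text).group()
    results[lazy] = (greedy_hit, lazy_hit)
    print(f"{greedy} -> {greedy_hit!r}    {lazy} -> {lazy_hit!r}")
```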
So how does the engine achieve "as few repetitions as possible"? We can explain this with the notion of ignore-priority quantifiers in regular expressions.
Ignore-priority quantifiers
The quantifiers "*?", "+?", "??", "{n,m}?" and "{n,}?" are all ignore-priority quantifiers (lazy quantifiers). They are written by adding '?' after the ordinary quantifiers ?, +, *, and {}. For example, if 'ab??' is matched against 'abb', the result is 'a' rather than 'ab'. After the engine matches a successfully, it first chooses not to match b, because the quantifier gives priority to ignoring. It then checks the expression, finds that the expression has been fully consumed, and directly reports a successful match. The following example goes into more detail.
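The 'ab??' case can be checked directly. This is a minimal sketch in Python. The third pattern, 'ab??c' against 'abc', is an extra case I added to show that the skipped branch is still taken when the rest of the expression cannot match otherwise.

```python
import re

greedy = re.search(r"ab?", "abb").group()    # b? prefers to match b
lazy = re.search(r"ab??", "abb").group()     # b?? prefers to skip b
forced = re.search(r"ab??c", "abc").group()  # skipping b fails at c, so b is matched after all

print(greedy)  # ab
print(lazy)    # a
print(forced)  # abc
```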
Example
Take the example above again: use "/_\w*?_/" to match the characters between the first pair of "_" in "_abc_123_".
After the first '_' is matched, '\w*?' first decides to match no characters, because it is an ignore-priority quantifier. The second '_' of the expression /_\w*?_/ (the one that follows '\w*?') is then compared with the 'a' in the target string '_abc_123_', and the match fails. Only when this fails does '\w*?' try the branch it skipped (use \w to match a, which succeeds).
At the next step, should the engine try to match or ignore? Because '\w*?' is an ignore-priority quantifier, it ignores first, and the previous step is repeated: '_' fails to match b, so '\w*?' tries the skipped branch and now matches ab. After these steps have been repeated three times in total (until the '_' after '\w*?' matches the second '_' of the target string), 'abc' has been matched.
Process (after matching the first '_'):
- The second '_' of the expression /_\w*?_/ is compared with the 'a' in the target string '_abc_123_', and the match fails. '\w*?' then tries to match the 'a' in the target string. The match is successful.
- The second '_' of the expression /_\w*?_/ is compared with the 'b' in the target string '_abc_123_', and the match fails. '\w*?' then tries to match 'ab' in the target string. The match is successful.
- The second '_' of the expression /_\w*?_/ is compared with the 'c' in the target string '_abc_123_', and the match fails. '\w*?' then tries to match 'abc' in the target string. The match is successful.
- The second '_' of the expression /_\w*?_/ is compared with the second '_' in the target string '_abc_123_' (the one after 'abc'). The match is successful and the matching ends. The text between the underscores is abc.
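The steps above can be sketched as a small hand-written trace. This is not how any real engine is implemented. It is a simplified illustration that assumes the pattern is fixed to _\w*?_ and that the attempt starts at index 0 of the target string. The function name trace_lazy and the wording of the step messages are my own.

```python
import re

def trace_lazy(target):
    """Trace a simplified lazy match of _\\w*?_ starting at index 0 (illustrative only)."""
    steps = []
    if not target.startswith("_"):
        return None, steps
    pos = 1  # position just after the first "_"
    while pos < len(target):
        ch = target[pos]
        # Ignore first: try the closing "_" of the pattern before extending \w*?.
        if ch == "_":
            steps.append(f"'_' vs {ch!r}: success, match ends")
            return target[1:pos], steps
        steps.append(f"'_' vs {ch!r}: fail")
        # Only after that failure does \w*? take one more character.
        if re.fullmatch(r"\w", ch) is None:
            steps.append(f"\\w vs {ch!r}: fail, no match from this start")
            return None, steps
        pos += 1
        steps.append(f"\\w*? now covers {target[1:pos]!r}")
    return None, steps

captured, steps = trace_lazy("_abc_123_")
for step in steps:
    print(step)
print("between the underscores:", captured)  # abc
```

The trace prints three failed comparisons of '_' against a, b and c, each followed by '\w*?' growing by one character, and then the successful comparison against the second '_'.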
The above is my understanding after reading the section on ignore-priority quantifiers in "Mastering Regular Expressions". If anything here is wrong, corrections are welcome. Thank you!
Original article. For more information, see JC & hcoding.com.