Key Concepts Behind Advanced Regular Expression Techniques
The original English text was published by Smashing Magazine. Translated by Woole. Please credit the source when reprinting.
Regular expressions (regex for short) are powerful and can be used to find the information you need in a large body of text. They work by describing patterns of characters with a conventional syntax. Unfortunately, simple regular expressions are not powerful enough for some advanced applications. When the filtering logic becomes more complex, you may need advanced regular expression techniques.
This article introduces advanced regular expression techniques. We have selected eight commonly used concepts and worked through an example for each; every one is a simple way to satisfy a complex requirement. If you are not familiar with the basics of regular expressions, please read this article, this tutorial, or the Wikipedia entry first.
The regular expression syntax used here is PHP's, which is compatible with Perl.
1. Greed/laziness
All quantifiers that allow a pattern to repeat more than once are greedy. They consume as much of the target string as possible, so the match is as long as possible. Unfortunately, this is not always what we want, so a "lazy" qualifier exists to solve the problem: adding "?" after a greedy operator makes the expression match the shortest possible string. In addition, the "U" modifier makes repeating quantifiers lazy by default. Understanding the difference between greedy and lazy matching is the basis for using advanced regular expressions.
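The following is a minimal sketch of how "?" and the "U" modifier interact. The sample string "aaaa" and the variable names are illustrative assumptions, not taken from the original article. In PHP's PCRE syntax, "U" inverts greediness, so a quantifier followed by "?" becomes greedy again under that modifier.

```php
<?php
// Hypothetical sample string, chosen only for illustration.
$str = "aaaa";

// Default behaviour: + is greedy and takes as many characters as it can.
preg_match("/a+/", $str, $greedy);

// A trailing ? makes the quantifier lazy: it takes as few as it can.
preg_match("/a+?/", $str, $lazy);

// The U modifier inverts the defaults, so a plain + becomes lazy...
preg_match("/a+/U", $str, $ungreedy);

// ...and +? becomes greedy again.
preg_match("/a+?/U", $str, $inverted);

echo $greedy[0], " ", $lazy[0], " ", $ungreedy[0], " ", $inverted[0], "\n";
// prints: aaaa a a aaaa
```

Because "U" silently changes the meaning of every quantifier in the pattern, writing an explicit "?" where laziness is wanted is often easier to read.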
Greedy operator
The * operator matches the preceding expression zero or more times. It is a greedy operator. Take a look at the following example:
Preg_match ("/
The period (.) matches any character except a line break. The regular expression above matches the h1 tag and everything inside it, using a period (.) and an asterisk (*) to match all of the content. The results are as follows:
The entire string is returned. The * operator keeps matching everything, even the closing h1 tag in the middle. Because it is greedy, matching the entire string fits its principle of making the match as long as possible.
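The greedy behaviour described above can be sketched as follows. The input string and the variable name $html are assumptions made for illustration; any line containing two h1 elements behaves the same way.

```php
<?php
// Hypothetical input: two h1 elements on a single line.
$html = "<h1>This is a title.</h1><h1>This is another one.</h1>";

// Greedy: .* first runs to the end of the line, then backtracks
// only as far as needed, which is the LAST closing </h1>.
preg_match("/<h1>.*<\/h1>/", $html, $matches);

echo $matches[0], "\n";
// prints: <h1>This is a title.</h1><h1>This is another one.</h1>
```

The match spans both headings because nothing in the pattern tells .* to stop at the first closing tag.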
Lazy operator
Make the expression lazy by slightly modifying the pattern above and adding a question mark (?) after the asterisk:
/
Now the expression considers its work done as soon as it reaches the first closing h1 tag.
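A minimal sketch of the lazy version, again using an assumed sample string with two h1 elements. The second call uses preg_match_all with a capture group to show one practical consequence: with a lazy quantifier, each heading can be collected separately.

```php
<?php
// Hypothetical input: two h1 elements on a single line.
$html = "<h1>This is a title.</h1><h1>This is another one.</h1>";

// Lazy: .*? expands one character at a time and stops at the FIRST </h1>.
preg_match("/<h1>.*?<\/h1>/", $html, $matches);
echo $matches[0], "\n";
// prints: <h1>This is a title.</h1>

// With a capture group, preg_match_all gathers the text of every heading.
preg_match_all("/<h1>(.*?)<\/h1>/", $html, $all);
print_r($all[1]); // the two heading texts, one array entry each
```

This is only a demonstration of quantifier behaviour; for real HTML documents a proper parser is usually the more robust choice.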
Another greedy operator with similar behavior is {n,}. It matches the preceding pattern n or more times. Without a question mark, it looks for as many repetitions as possible; with one, it matches as few as possible (and exactly n repetitions is, of course, the fewest).
# Define a string
$str = "hihihi oops hi";
# Match with the greedy {n,} operator
preg_match("/(hi){2,}/", $str, $matches); # $matches[0] will be "hihihi"
# Match with the lazy {n,}? operator
preg_match("/(hi){2,}?/", $str, $matches); # $matches[0] will be "hihi"