2. Backreferencing
What is it for? The term "back referencing" has been translated in several ways, such as "reverse reference" and "backward reference"; I personally think "backreference" is the most appropriate. A backreference refers, inside a regular expression, to content that the same expression captured earlier. For example, the following simple example aims to match the content inside quotation marks:
# Create an array to hold the matches
$matches = array();
# Create a string
$str = "\"This is a 'string'\"";
# Capture content using a regular expression
preg_match("/(\"|').*?(\"|')/", $str, $matches);
# Output the entire match
echo $matches[0];
It will output:
"This is a '
Obviously, this is not what we want.
The expression starts matching at the opening double quote and stops, incorrectly, at the first single quote it encounters. This is because the expression says ("|'), that is, either a double quote (") or a single quote ('), and the closing group has no way of knowing which kind of quote opened the match. To fix this problem, you can use backreferences. The expressions \1, \2, ..., \9 refer, by number, to the subgroups captured earlier in the expression and act as "pointers" to those groups. In this example, the first quotation mark matched is \1.
How do we use it? Replace the closing quote group in the example above with \1:
preg_match('/(\"|\').*?\1/', $str, $matches);
This correctly returns the string:
"This is a 'string'"
Note: what if the quotes are Chinese quotation marks, where the opening and closing marks are different characters?
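To compare the naive pattern and the backreference pattern from this section side by side, here is a small self-contained script. The helper firstQuoted is hypothetical, written only for this illustration:

```php
<?php
// Hypothetical helper: returns the first span matched by $pattern, or null.
function firstQuoted(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

$str = "\"This is a 'string'\"";

// Naive version: the closing group accepts either kind of quote.
$naive = firstQuoted("/(\"|').*?(\"|')/", $str);

// Backreference version: \1 must repeat whatever the first group captured.
$fixed = firstQuoted('/(\"|\').*?\1/', $str);

echo $naive, PHP_EOL; // "This is a '
echo $fixed, PHP_EOL; // "This is a 'string'"
```

The lazy .*? alone is not enough: it stops at the first quote of either kind. Only the backreference forces the closing quote to be the same character as the opening one.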
Remember the PHP function preg_replace? It supports backreferences too. There, $1 ... $9 ... $n (n can go up to 99) is the preferred form of the pointer rather than \1 ... \9. For example, to turn every paragraph tag <p> into text that is displayed literally instead of being rendered:
# The / in </p> is escaped because / is also the pattern delimiter
$text = preg_replace('/<p>(.*?)<\/p>/', '&lt;p&gt;$1&lt;/p&gt;', $html);
The $1 backreference stands for the text inside the paragraph tag and is inserted into the replacement text. This simple, easy-to-use syntax gives us an easy way to reuse matched text, even while replacing it.
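A runnable sketch of this kind of replacement. The $html sample is made up, and the delimiter is switched to # (a choice made here, not in the original) so that the / in </p> needs no escaping:

```php
<?php
$html = '<p>First</p><p>Second</p>';

// $1 in the replacement is whatever (.*?) captured inside each <p>...</p>.
// Single quotes keep PHP from treating "$1" as anything special.
$text = preg_replace('#<p>(.*?)</p>#', '&lt;p&gt;$1&lt;/p&gt;', $html);

echo $text, PHP_EOL;
// &lt;p&gt;First&lt;/p&gt;&lt;p&gt;Second&lt;/p&gt;
```

If a digit has to follow a backreference directly in the replacement, write ${1} instead of $1, so that PHP does not read it as a different group number.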
3. Named Groups
When backreferences are used several times in one expression, things get confusing easily, and working out which number (1 ... 9) refers to which subgroup is a real hassle. An alternative to numbered backreferences is the named capture group (called a "named group" below). A named group is written (?P<name>pattern), where name is the group name and pattern is the regular expression the named group matches. See the following example:
/(?P<quote>"|').*?(?P=quote)/
In the expression above, quote is the group name, and "|' is the pattern the group matches. The (?P=quote) that follows refers back to the group named quote. This has the same effect as the backreference example above, but it is written with a named group. Isn't it easier to read and understand?
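Running the named-group pattern against the string from the backreference section (reconstructed here as "\"This is a 'string'\"") shows that it finds the whole quoted span and that the captured quote is available by name:

```php
<?php
$str = "\"This is a 'string'\"";

// (?P<quote>...) names the group; (?P=quote) must repeat the same text.
$pattern = '/(?P<quote>"|\').*?(?P=quote)/';

$found = preg_match($pattern, $str, $m);

if ($found === 1) {
    echo $m[0], PHP_EOL;       // the whole quoted span
    echo $m['quote'], PHP_EOL; // the opening quote character: "
}
```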
Named groups also help when working with the array of matched content: the name assigned to a group doubles as the key under which its match is stored in the array.
preg_match('/(?P<quote>"|\')/', "'String'", $matches);
# The following statement outputs "'" (without the double quotes)
echo $matches[1];
# Using the group name as the index also outputs "'"
echo $matches['quote'];
So named groups not only make code easier to write, they also help organize it.
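As a sketch of the "organizing" side (the key=value input below is a made-up example, not from the article), preg_match_all with PREG_SET_ORDER returns one array per match, and each of those arrays can be read by group name:

```php
<?php
// Hypothetical input: simple key=value pairs separated by semicolons.
$config = 'host=localhost;port=8080;debug=on';

preg_match_all('/(?P<key>\w+)=(?P<value>\w+)/', $config, $sets, PREG_SET_ORDER);

$settings = [];
foreach ($sets as $set) {
    // Each $set holds numeric keys (0, 1, 2) plus the named keys 'key' and 'value'.
    $settings[$set['key']] = $set['value'];
}

print_r($settings);
```

PHP stores each named group twice, once under its name and once under its number, so existing code that uses numeric indexes keeps working.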
4. Word Boundaries
A word boundary is the position between a word character (a letter, digit, or underscore; depending on the engine's Unicode settings, Chinese characters may count as well) and a non-word character. It does not match an actual character; its length is zero. \b matches every word boundary.
Unfortunately, word boundaries are often overlooked, and most people never think about their practical value. For example, suppose you want to match the word "import":
/import/
Careful! Regular expressions can be mischievous. The expression above also matches the following string:
important
You might think that adding spaces before and after import keeps it from matching anything but the standalone word:
/ import /
But what about this case:
The trader voted for the import
When the word import is at the start or end of the string, the modified expression still fails. So you have to account for each case:
/(^import | import | import$)/i
Don't worry, it's not over yet. What about punctuation? To match the word, your regular expression might need to look like this:
/(^import(:|;|,)? | import(:|;|,)? | import(\.|\?|\!)?$)/i
That is a lot of trouble for matching a single word, which is why word boundaries matter. To cover all of the above requirements, and many other variations, all we need with word boundaries is:
/\bimport\b/
All of the cases above are handled. The flexibility of \b comes from it being a zero-length match: it matches only the imagined position between two actual characters. It checks whether one of the two adjacent characters is a word character and the other is a non-word character; if so, it matches. At the start or end of the string, \b treats the missing neighbor as a non-word character. Because the i in import is a word character, import is matched.
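A small PHP sketch comparing /import/ with the \b version on strings like the ones above. The i modifier is added here (an assumption beyond the article's pattern) so that the capitalized sample also counts:

```php
<?php
$samples = [
    'important',                       // "import" is only part of a longer word
    'The trader voted for the import', // the word sits at the end of the string
    'Import: tariffs rose',            // capitalized, followed by punctuation
];

$results = [];
foreach ($samples as $s) {
    $plain   = preg_match('/import/i', $s);     // no boundaries
    $bounded = preg_match('/\bimport\b/i', $s); // whole word only
    $results[$s] = [$plain, $bounded];
    printf("%-32s plain=%d bounded=%d\n", $s, $plain, $bounded);
}
```

Only the bounded pattern rejects "important", and it needs no special cases for the start or end of the string or for trailing punctuation.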
Note: \b has a counterpart, \B, which matches the position between two word characters or between two non-word characters. So if you want to match "hi" inside a word, you can use:
\Bhi\B
"This" and "within" return a match, while "hi there" does not.
5. Atomic Groups
An atomic group is a special, non-capturing regular expression group. It is usually used to improve regex performance, and it can also rule out specific matches. An atomic group is written (?>pattern), where pattern is the expression to match.
/(?>his|this)/
When the regex engine leaves an atomic group, it throws away every backtracking position saved inside the group, so it can never come back and try a different alternative of that group. This only matters after the group has matched: while the group is still being matched, a failed "his" is followed by an attempt at "this" as usual. Take the word "smashing" as an example: the engine looks for "his" and then "this", finds neither, and reports no match. The atomic group makes no difference here, because the group never matches and so there is nothing to discard.
This example is not very practical anyway: /t?his/ matches the same words. Let's look at a more useful example:
/\b(engineer|engrave|end)\b/
When matching "engineering", the regex engine first matches "engineer", but then reaches the word boundary \b, which fails because the next character is "i". The engine then tries the next alternative, engrave: it matches "eng" and then fails. Finally it tries "end", which also fails. If you look closely, you will see that once engineer has matched and the word boundary has failed, "engrave" and "end" cannot possibly succeed at that position, because the text there is already known to begin with "engi". The regex engine should not waste time on those attempts:
/\b(?>engineer|engrave|end)\b/
Writing the group atomically, as above, saves the regex engine this wasted matching time and makes the expression more efficient.
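The speed gain is hard to observe on one short word, but the behavioral effect of an atomic group is easy to demonstrate. In the sketch below, the alternatives end and endless are made up for illustration: an ordinary group can backtrack from end to endless, while an atomic group cannot, so inside (?>...) the order of the alternatives matters.

```php
<?php
$word = 'endless';

// Ordinary (non-capturing) group: "end" matches, \b then fails because "l"
// follows, so the engine backtracks into the group and tries "endless".
$normal = preg_match('/\b(?:end|endless)\b/', $word);

// Atomic group: once "end" has matched and the group is left, its
// backtracking positions are gone, so "endless" is never tried.
$atomic = preg_match('/\b(?>end|endless)\b/', $word);

// The article's pattern: no alternative can match where an earlier one
// already matched, so making it atomic does not change any result.
$engineering = preg_match('/\b(?>engineer|engrave|end)\b/', 'engineering');
$engineer    = preg_match('/\b(?>engineer|engrave|end)\b/', 'the engineer');

printf("normal=%d atomic=%d engineering=%d engineer=%d\n",
    $normal, $atomic, $engineering, $engineer);
```

Putting the longer alternative first, (?>endless|end), would make the atomic version match "endless" again.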
6. Recursion
Recursion is used to match nested structures, such as nested parentheses, (this (that)), and nested HTML tags, <div><div></div></div>. (?R) stands for the whole pattern, called again recursively. The following example matches nested parentheses:
/\(((?>[^()]+)|(?R))*\)/
The outermost layer uses the escaped parenthesis \( to match the start of the nested structure. Next comes an alternation, ( ... | ... ), repeated with *: each time it either matches a run of characters other than parentheses, (?>[^()]+), or the subpattern (?R), which matches the entire expression again. Finally, \) matches the closing parenthesis. Note that the * makes this match as many nested levels as possible.
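A small sketch running the parentheses pattern over a made-up string with preg_match_all; each top-level balanced group comes back as one match, with its nested groups included:

```php
<?php
$text = 'call(a, (b + c), f(g(h))) and (x)';

// \( ... \) around a repeated choice between a run of non-parentheses
// (inside an atomic group) and a recursive call to the whole pattern, (?R).
$pattern = '/\(((?>[^()]+)|(?R))*\)/';

preg_match_all($pattern, $text, $m);
print_r($m[0]);
```

The atomic group around [^()]+ also keeps the engine from trying every way of splitting a long run of plain characters between iterations of the *, which could otherwise make failing matches very slow.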
Another example of recursion is as follows:
/<([\w]+).*?>((?>[^<>]+)|((?R)))*<\/\1>/
The expression above combines capturing groups, a lazy quantifier, a backreference, and an atomic group to match nested tags. The first parenthesized group, ([\w]+), captures the tag name for later use. Once an opening tag in this angle-bracket style is found, the engine tries to match the rest of the tag's content. The next parenthesized subexpression is very similar to the previous example: it either matches a run of characters that contains no angle brackets, (?>[^<>]+), or recursively matches the entire expression, (?R). The final <\/\1> matches the closing tag.
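A quick check of the tag pattern on a small, well-formed snippet (the sample string is made up). Keep in mind that this is a demonstration of recursion, not a replacement for an HTML parser: it is not meant to cope with comments, void elements such as <br>, or attributes that contain ">".

```php
<?php
$html = '<div>a<div>b</div></div>';

// Group 1 captures the tag name; <\/\1> requires the matching closing tag.
$pattern = '/<([\w]+).*?>((?>[^<>]+)|((?R)))*<\/\1>/';

$found = preg_match($pattern, $html, $m);

if ($found === 1) {
    echo $m[0], PHP_EOL; // the outer element, including the nested <div>
    echo $m[1], PHP_EOL; // div
}
```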
7. Callbacks
Sometimes specific parts of a match need special modification. When you need to apply several complex modifications, regular expression callbacks are useful. Callbacks are used with the function preg_replace_callback: you pass it a function as a parameter. That function receives the array of matches as its argument and returns the replacement string built from it.
For example, suppose we want to capitalize the first letter of every word in a string. Unfortunately, PHP has no regex operator that directly converts between upper and lower case. To accomplish this task, you can use a regex callback. First, the expression must match every letter that needs to be capitalized:
/\b\w/
It uses both a word boundary and a character class. This pattern alone is not enough; we also need a callback function:
function upper_case($matches) {
    return strtoupper($matches[0]);
}
The function upper_case receives the array of matches and converts the entire match to uppercase. In this example, $matches[0] holds the letter to be capitalized. Then we use preg_replace_callback to apply the callback:
# preg_replace_callback returns the new string; it does not change $str in place
$str = preg_replace_callback('/\b\w/', 'upper_case', $str);
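A self-contained version of the callback example (the sample string is made up), with an anonymous function shown as an alternative to the named upper_case function:

```php
<?php
// Named callback, as in the text: receives the match array,
// returns the replacement string.
function upper_case(array $matches): string
{
    return strtoupper($matches[0]);
}

$str = 'regular expressions are fun';
$titled = preg_replace_callback('/\b\w/', 'upper_case', $str);

// The same thing with an anonymous function instead of a named one.
$titled2 = preg_replace_callback('/\b\w/', function (array $m): string {
    return strtoupper($m[0]);
}, $str);

echo $titled, PHP_EOL; // Regular Expressions Are Fun
```

One caveat of /\b\w/: a word boundary also sits right after an apostrophe, so "it's" would come out as "It'S".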
That is how much power a simple callback has.
8. Commenting
Comments do not match anything in a string, yet they are arguably the most important part of a regular expression. The deeper a regular expression goes, the more complex it is to write and the harder it becomes to work out exactly what it matches. Adding comments to a regular expression is the best way to minimize future confusion.
To add a comment to a regular expression, use the (?#comment) format, replacing "comment" with your comment text:
/(?#number)\d/
If you plan to make your code public, adding comments to your regular expressions is especially important. It makes it easier for others to understand and modify your code. As with comments elsewhere, it also helps you when you revisit programs you wrote earlier.
Consider using the "x" modifier, or the inline "(?x)", to format comments. This modifier makes the regex engine ignore whitespace in the pattern and treat everything from # to the end of the line as a comment. Whitespace that you actually want to match can still be written as [ ], \s, or \ (a backslash followed by a space).
/
\d #digit
[ ] #space
\w+ #word
/x
The pattern above does the same job as the following expression:
/\d(?#digit)[ ](?#space)\w+(?#word)/
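To check that the two forms are interchangeable, this sketch (the sample text is made up) runs both patterns in PHP. In the x-mode version the comments must not contain the / delimiter, because PHP looks for the closing delimiter before PCRE ever sees the comments.

```php
<?php
// x modifier: whitespace in the pattern is ignored and # starts a comment.
$verbose = '/
    \d      # digit
    [ ]     # a literal space, kept by the character class
    \w+     # word
/x';

// Inline (?#...) comments: no modifier needed, whitespace stays significant.
$inline = '/\d(?#digit)[ ](?#space)\w+(?#word)/';

preg_match($verbose, 'I ate 5 apples', $a);
preg_match($inline, 'I ate 5 apples', $b);

echo $a[0], PHP_EOL; // 5 apples
```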
Always pay attention to the readability of your code.
More resources
- Regular-Expressions.info: a comprehensive website on regular expressions
- Cheat Sheet: an informative regular expressions cheat sheet
- Regex Generator: a JavaScript regular expression generator
About the author
Karthik Viswanathan is a high school student who enjoys programming and web development. You can see his work on his blog, Lateral Code. You can also take a look at his online Twitter application.