For example, suppose the pattern /Fred.+Barney/ is used to match against the string Fred and Barney went bowling last night. We know the regular expression will match, but it is worth looking at how the engine gets there.
Here is the process in detail. First, the subpattern Fred matches the identical literal string. The next part of the pattern is .+, which matches any character except a newline, one or more times. Because the plus quantifier is greedy, it matches as much as it can, so it immediately swallows the entire rest of the string, all the way through night.
Now the subpattern Barney tries to match, but it cannot, because we are at the end of the string. Since .+ can still succeed with one character fewer, it gives back the letter t at the end of the string. (It is greedy, but it wants the whole pattern to succeed even more than it wants to match as much as possible.) The subpattern Barney tries again and still fails, so .+ gives back the letter h and lets it try again. One character after another, .+ gives back what it matched until it has finally given up all of the letters of Barney. At that point the subpattern Barney matches, and the overall match succeeds.
As this example shows, regular expression engines do a lot of backtracking like this, because a greedy quantifier initially matches far too much of the string.
For this reason, every greedy quantifier has a non-greedy counterpart. Instead of the plus sign (+), we can use the non-greedy quantifier +?, which matches one or more times (just as the plus does) but prefers to match as few times as possible rather than as many as possible. Let's see how the pattern /Fred.+?Barney/ works:
Once again, Fred matches first. The next part of the pattern is now .+?, which prefers to match no more than one character, so it matches just the space after Fred. The next subpattern is Barney, which cannot match here (because the string at the current position begins with and Barney...). So .+? reluctantly matches the a and lets the rest of the pattern try again. Once again Barney cannot match, so .+? takes the n, and so on. Once .+? has matched five characters, Barney can match, and the pattern succeeds.
There is still some backtracking here, but since the engine has to go back and retry only a few times, this version is much faster. That speedup, however, depends on Barney being found near Fred. If Fred were near the beginning of the string and Barney only at the end, the greedy quantifier would be the faster choice. In the end, the speed of a regular expression depends on the data.
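The difference in work can be made concrete with a rough model. The sketch below does not inspect the real regex engine; it only counts how many candidate lengths for .+ or .+? would have to be tried before Barney fits immediately afterwards, assuming the greedy form tries the longest length first and the non-greedy form the shortest. The helper name tries and the variable names are made up for this illustration.

```perl
use strict;
use warnings;

my $text       = "Fred and Barney went bowling last night";
my $rest_start = length("Fred");               # position right after "Fred"
my $max        = length($text) - $rest_start;  # most characters .+ could take

# Hypothetical helper: count how many candidate lengths are tried
# before "Barney" is found directly after the characters taken by .+
sub tries {
    my (@lengths) = @_;
    my $count = 0;
    for my $len (@lengths) {
        $count++;
        return $count if substr($text, $rest_start + $len, 6) eq "Barney";
    }
    return;    # no length worked
}

my $greedy = tries(reverse 1 .. $max);   # longest first, like .+
my $lazy   = tries(1 .. $max);           # shortest first, like .+?

print "greedy: $greedy tries, non-greedy: $lazy tries\n";
```

In this simplified model the non-greedy order succeeds on its fifth try, matching the five characters described above, while the greedy order has to work its way back from the end of the string. A real engine does more bookkeeping than this, so the counts are only indicative.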
Non-greedy quantifiers are not only about efficiency. Although a non-greedy quantifier always matches (or fails to match) the same strings as its greedy counterpart, the part of the string it matches may be different. For example, suppose you have some HTML-like text and you want to remove the <bold> and </bold> tags while keeping their contents. Here is the text:
I'm talking about the cartoon with Fred and <bold>Wilma</bold>!
Here is a substitution to remove those tags. What is wrong with it?
s#<bold>(.*)</bold>#$1#g;
The problem is that the star is greedy. What if the text had said this instead?
I thought you said Fred and <bold>Velma</bold>, not <bold>Wilma</bold>
In that case, the pattern matches from the first <bold> to the last </bold>, leaving the tags in the middle of the line intact. Oops! What we need is a non-greedy quantifier. The non-greedy form of the star is *?, so the substitution should look like this:
s#<bold>(.*?)</bold>#$1#g;
Now it does the right thing.
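To see the difference side by side, here is a small sketch that applies both the greedy and the non-greedy substitution to a copy of the second sample line. The variable names are chosen only for this illustration.

```perl
use strict;
use warnings;

my $text = "I thought you said Fred and <bold>Velma</bold>, not <bold>Wilma</bold>";

# Copy the text, then run each substitution on its own copy.
(my $greedy = $text) =~ s#<bold>(.*)</bold>#$1#g;    # star grabs as much as it can
(my $lazy   = $text) =~ s#<bold>(.*?)</bold>#$1#g;   # star-question grabs as little as it can

print "greedy:     $greedy\n";
print "non-greedy: $lazy\n";
```

The greedy version removes only the outermost pair of tags and leaves the inner </bold> and <bold> in place, while the non-greedy version removes each pair separately.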
Since the non-greedy form of the plus is +? and the non-greedy form of the star is *?, you have probably realized that the other two quantifiers look similar. The non-greedy form of any curly-brace quantifier looks the same, but with a question mark after the closing brace, as in {5,10}? or {8,}?. Even the question mark quantifier has a non-greedy form: ??. It matches either once or not at all, but prefers not to match anything.
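The remaining non-greedy forms can be tried out in the same way. The following sketch uses made-up sample strings (a run of digits and a run of the letter a) to show which part each quantifier captures.

```perl
use strict;
use warnings;

my $digits = "1234567890123";

# Curly-brace quantifier: between 5 and 10 digits.
my ($greedy_range) = $digits =~ /(\d{5,10})/;    # takes as many as allowed
my ($lazy_range)   = $digits =~ /(\d{5,10}?)/;   # stops at the minimum

# Question mark quantifier: zero or one "a".
my ($greedy_opt) = "aaa" =~ /(a?)/;     # prefers to match one "a"
my ($lazy_opt)   = "aaa" =~ /(a??)/;    # prefers to match nothing

print "{5,10}  captured ", length($greedy_range), " digits\n";
print "{5,10}? captured ", length($lazy_range),   " digits\n";
print "?  captured '", $greedy_opt, "'\n";
print "?? captured '", $lazy_opt,   "'\n";
```

Because nothing follows the quantified part in these patterns, each non-greedy form can stop at its minimum straight away. With more pattern after it, it would extend one repetition at a time until the rest could match.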