The Greedy and Non-Greedy Modes of Regular Expressions (Overview)

Source: Internet
Author: User
1 Overview
Greedy and non-greedy modes affect the matching behavior of the subexpression that a quantifier modifies. In greedy mode, the subexpression matches as much as possible, provided the whole expression still matches successfully; in non-greedy mode, it matches as little as possible under the same condition. Non-greedy mode is supported only by some NFA engines.

Quantifiers that use greedy mode are also called match-preferring quantifiers. They include:

"{m,n}", "{m,}", "?", "*" and "+".

In some languages that use NFA engines, appending "?" to a match-preferring quantifier turns it into a quantifier that uses non-greedy mode, also called a match-ignoring quantifier. These include:

"{m,n}?", "{m,}?", "??", "*?" and "+?".

From the syntax point of view, a subexpression modified by a match-preferring quantifier uses greedy mode, as in "(Expression)+", and a subexpression modified by a match-ignoring quantifier uses non-greedy mode, as in "(Expression)+?".

The name "greedy mode" is used fairly consistently across documentation, but non-greedy mode goes by several names, such as lazy mode or reluctant mode. What it is called does not matter much, as long as you understand the principle and how to use it. I habitually use the terms greedy and non-greedy, so this article uses them.

2 Matching principles of greedy and non-greedy modes
Greedy and non-greedy modes can be understood from two angles: application and matching principle. To truly master them, however, you need to understand them from the matching principle.

First, from the application perspective, let's answer the question "what are greedy and non-greedy modes?"

2.1 Greedy and non-greedy modes from the application perspective
2.1.1 What are greedy and non-greedy modes
Let's start with an example.

Example:

Source string: aa<div>test1</div>bb<div>test2</div>cc

Regular expression one: <div>.*</div>

Match result one: <div>test1</div>bb<div>test2</div>

Regular expression two: <div>.*?</div>

Match result two: <div>test1</div> (this is the result of a single match, so <div>test2</div> is not included)

Let's use the example above to analyze what greedy and non-greedy modes are, based on their matching behavior.

Regular expression one uses greedy mode. The whole expression could already match successfully when it reaches the first "</div>", but because of greedy mode it keeps trying to the right to see whether a longer string can match. After the second "</div>" there is no longer substring that can match, so the match ends with the result "<div>test1</div>bb<div>test2</div>". Of course, the actual matching process does not work quite like this; it is described in detail in the section on the matching principle.

From the application point of view alone, you can think of greedy mode as matching as much as possible, provided the whole expression matches successfully. That is what "greedy" means: informally, it grabs everything it wants and stops only when there is nothing more it can take.

Regular expression two uses non-greedy mode. When it reaches the first "</div>", the whole expression already matches, and because it is non-greedy, the match ends there without trying further to the right. The result is "<div>test1</div>".

From the application point of view alone, you can think of non-greedy mode as matching as little as possible, provided the whole expression matches successfully. That is what "non-greedy" means: informally, it takes the first thing it wants and stops, without caring whether more is available.
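The two results above can be reproduced with any backtracking regex engine. As a minimal sketch, the snippet below assumes Python's re module as a stand-in for the NFA engines discussed here; the source string and both expressions are the ones from this section.

```python
import re

source = "aa<div>test1</div>bb<div>test2</div>cc"

# Regular expression one: greedy ".*" runs as far as it can, then gives back
# only as much as the final "</div>" needs.
greedy = re.search(r"<div>.*</div>", source).group()

# Regular expression two: lazy ".*?" stops at the first "</div>" that works.
lazy = re.search(r"<div>.*?</div>", source).group()

print(greedy)  # <div>test1</div>bb<div>test2</div>
print(lazy)    # <div>test1</div>
```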

2.1.2 On the precondition
The analysis above keeps mentioning the precondition "provided the whole expression matches successfully". Why emphasize it? Consider the following example.

Regular expression three: <div>.*</div>bb

Match result three: <div>test1</div>bb

The quantifier modifying "." is still the match-preferring "*", so this is still greedy mode. The leading "<div>.*</div>" could still match "<div>test1</div>bb<div>test2</div>", but then the trailing "bb" cannot match, so "<div>.*</div>" must give up "bb<div>test2</div>" to let the whole expression match. The whole expression matches "<div>test1</div>bb", of which "<div>.*</div>" matches "<div>test1</div>". So greedy mode determines the subexpression's matching behavior only on the premise that the whole expression matches. If the whole expression fails, greedy mode affects only the matching process, and there is no match result to discuss.

Non-greedy mode has the same issue; see the following example.

Regular expression four: <div>.*?</div>cc

Match result four: <div>test1</div>bb<div>test2</div>cc

This is non-greedy mode. The leading "<div>.*?</div>" first matches "<div>test1</div>", but then "cc" cannot match, so "<div>.*?</div>" must keep trying to the right. Only when it has matched "<div>test1</div>bb<div>test2</div>" can the following "cc" match, and the whole expression succeeds with "<div>test1</div>bb<div>test2</div>cc", of which "<div>.*?</div>" matches "<div>test1</div>bb<div>test2</div>". So non-greedy mode determines the subexpression's matching behavior only on the premise that the whole expression matches; if the whole expression fails, non-greedy mode cannot affect what the subexpression ends up matching.
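Expressions three and four can be checked the same way. This sketch again assumes Python's re module; the capture group around the quantified part is our addition, used only to show what that subexpression ended up matching.

```python
import re

source = "aa<div>test1</div>bb<div>test2</div>cc"

# Expression three: greedy ".*" must give text back so that "bb" can match.
m3 = re.search(r"(<div>.*</div>)bb", source)
# Expression four: lazy ".*?" must keep extending so that "cc" can match.
m4 = re.search(r"(<div>.*?</div>)cc", source)

print(m3.group(0), "|", m3.group(1))
# <div>test1</div>bb | <div>test1</div>
print(m4.group(0), "|", m4.group(1))
# <div>test1</div>bb<div>test2</div>cc | <div>test1</div>bb<div>test2</div>
```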

2.1.3 Greedy or non-greedy: choosing by application
The application-level analysis has given us a basic understanding of how greedy and non-greedy modes behave. In practice, whether to choose greedy or non-greedy mode depends on the requirement.

For some simple requirements, such as getting the div tag from the source string "aa<div>test1</div>bb", greedy and non-greedy mode both give the desired result, so the choice makes little difference.

In the case from 2.1.1, however, practical applications usually need only one pair of div tags, which is what non-greedy mode matches; what greedy mode matches is usually not what we need.

So why does greedy mode exist at all? It is hard to give a satisfying answer from the application point of view; for that, we need to analyze greedy and non-greedy modes from the matching principle.

2.2 Greedy and non-greedy modes from the matching-principle perspective
To really understand what greedy and non-greedy modes are, when to use each, and how efficient each is, the application perspective is not enough; you need a thorough understanding of how each mode matches.

2.2.1 Starting from the basic matching principle
For the basic matching principle of the NFA engine, see "Regex Basics: NFA Engine Matching Principle".

This article focuses on the parts of the matching principle that involve greedy and non-greedy modes. First, let's look at a simple matching process in greedy mode.

Source string: "Regex" (the double quotes are part of the string)

Regular expression: ".*" (the double quotes are part of the pattern)


Figure 2-1

Note: to make the matching process easier to see, the figure spaces the characters widely; the actual source string is "Regex" with its quotes. The same applies below.

Let's walk through the matching process. First, the pattern's opening quote gains control and matches the quote at position 0. The match succeeds, and control passes to ".*".

Once ".*" has control, because "*" is a match-preferring quantifier, it prefers to try matching whenever it may either match or not. It matches "R" at position 1 and succeeds, continues to the right and matches "e" at position 2, and so on, until it matches the quote at the end of the string. Having reached the end of the string, ".*" stops matching and passes control to the pattern's closing quote.

The pattern's closing quote now tries to match, but fails because it is at the end of the string. The engine looks for the most recent backtracking state and returns control to ".*", which gives up one character, the quote at the end of the string. Control passes to the pattern's closing quote again, which matches that quote and succeeds.

At this point the whole regular expression has matched. ".*" matched the five characters Regex, and one backtrack was made during the process.

Next, look at a simple matching process in non-greedy mode.

Source string: "Regex" (the double quotes are part of the string)

Regular expression: ".*?" (the double quotes are part of the pattern)




Figure 2-2

Now the matching process in non-greedy mode. First, the pattern's opening quote gains control and matches the quote at position 0. The match succeeds, and control passes to ".*?".

Once ".*?" has control, because "*?" is a match-ignoring quantifier, it prefers not to match whenever it may either match or not. Since "*" is equivalent to "{0,}", the lazy form is allowed to match nothing at all. So at position 1 it first matches nothing, records a backtracking state, and passes control to the pattern's closing quote.

The closing quote tries to match at position 1 and fails against "R". The engine returns to the most recent backtracking state and gives control back to ".*?", which consumes one character, the "R" at position 1, and passes control to the closing quote again.

The closing quote now tries position 2 and fails against "e", so the engine backtracks again. This repeats until ".*?" has consumed "x", and control passes to the closing quote once more.

The closing quote tries position 6, matches the last quote in the string, and succeeds.

At this point the whole regular expression has matched. ".*?" matched the five characters Regex, and five backtracks were made during the process.
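The backtrack counts above (one for greedy, five for non-greedy) can be reproduced with a toy model. The function quoted_star below is a hypothetical teaching aid, not how any real engine is implemented: it models only the pattern shape quote, star, quote anchored at position 0, and counts one backtrack each time the closing quote fails and the star has to resume.

```python
def quoted_star(subject, lazy):
    """Toy model of '"' .* '"' (lazy=False) or '"' .*? '"' (lazy=True).

    Greedy: the star first takes everything it can, then gives back one
    character per backtrack. Lazy: the star first takes nothing, then takes
    one more character per backtrack. Returns (text matched by the star,
    number of backtracks). Only this single pattern shape is modeled.
    """
    if not subject.startswith('"'):
        raise ValueError("opening quote must be at position 0")
    body = 1
    most = len(subject) - body          # "." may take every remaining char
    tries = range(0, most + 1) if lazy else range(most, -1, -1)
    backtracks = 0
    for taken in tries:
        end = body + taken
        if end < len(subject) and subject[end] == '"':   # closing quote
            return subject[body:end], backtracks
        backtracks += 1                 # closing quote failed: resume the star
    raise ValueError("no match")


print(quoted_star('"Regex"', lazy=False))  # ('Regex', 1)
print(quoted_star('"Regex"', lazy=True))   # ('Regex', 5)
```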

2.2.2 Greedy or non-greedy: choosing by matching efficiency
The analysis of the matching principle shows that, when the match succeeds, greedy mode backtracks less in this example. Backtracking involves handing control back and forth, giving up matched text or matching text that was skipped, and trying again, all of which reduces matching efficiency considerably. So in this case greedy mode has the advantage over non-greedy mode in matching efficiency.

But the example in 2.2.1 is only a simple case. Readers may wonder at this point: is greedy mode always more efficient than non-greedy mode? The answer is no.

Example:

Requirement: get each substring enclosed in a pair of double quotes; the substring itself must not contain a double quote.

Regular expression one: ".*"

Regular expression two: ".*?"

Case one: when greedy mode matches a lot of unwanted content, it may backtrack more than non-greedy mode. For example, the source string is: The word "Regex" means regular expression.

Case two: greedy mode cannot meet the requirement. For example, the source string is: The phrase "regular expression" is called "Regex" for short.

In case one, with the greedy regular expression one, ".*" matches all the way to the end of the string and passes control to the closing quote, which fails, so the engine backtracks. Because the extra content it matched, " means regular expression.", is far longer than the content actually needed, regular expression one is less efficient here than the non-greedy regular expression two.

In case two, regular expression one matches the whole span "regular expression" is called "Regex", which does not meet the requirement, so the question of efficiency does not even arise.
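Both cases can be seen directly with a backtracking engine. This sketch assumes Python's re module; findall returns every non-overlapping match, which makes the failure in case two easy to spot.

```python
import re

case_one = 'The word "Regex" means regular expression.'
case_two = 'The phrase "regular expression" is called "Regex" for short.'

greedy = re.compile(r'".*"')    # regular expression one
lazy = re.compile(r'".*?"')     # regular expression two

# Case one: same result; greedy only pays with extra backtracking.
print(greedy.findall(case_one), lazy.findall(case_one))
# Case two: greedy runs from the first quote to the last one.
print(greedy.findall(case_two), lazy.findall(case_two))
```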

Both situations are common. Does that mean non-greedy mode is the only way to meet the requirement while keeping efficiency? Of course not. Depending on the situation, changing the subexpression that the match-preferring quantifier modifies can both meet the requirement and improve matching efficiency.

Source string: "Regex" (the double quotes are part of the string)

Here is regular expression three: "[^"]*"

Look at the matching process for regular expression three.


Figure 2-3

First, the pattern's opening quote gains control and matches the quote at position 0. The match succeeds, and control passes to "[^"]*".

Once "[^"]*" has control, because "*" is a match-preferring quantifier, it first tries to match whenever it may either match or not. It matches "R" at position 1 and succeeds, continues to the right with "e" at position 2, and so on up to "x". Next it meets the quote at position 6, which "[^"]" cannot match, so "[^"]*" stops and passes control to the pattern's closing quote.

The closing quote matches the quote at position 6, and the match succeeds.

The whole regular expression has matched. "[^"]*" matched the five characters Regex, and no backtracking was made during the process.
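A quick check that regular expression three meets the requirement, again assuming Python's re module as the engine. On the case-two string its results coincide with those of the non-greedy regular expression two.

```python
import re

case_two = 'The phrase "regular expression" is called "Regex" for short.'

negated = re.compile(r'"[^"]*"')   # regular expression three: greedy, narrower class
lazy = re.compile(r'".*?"')        # regular expression two, for comparison

print(negated.findall(case_two))   # ['"regular expression"', '"Regex"']
print(negated.findall('"Regex"'))  # ['"Regex"']
```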

Replacing the broad "." in the quantified subexpression with the negated character class "[^"]" keeps greedy mode and neatly solves both the requirement and the efficiency problem. Moreover, since this matching process never backtracks, there is no need to record backtracking states, so an atomic group or a possessive quantifier can be used to optimize it further.

Here is regular expression four, written with an atomic group: "(?>[^"]*)"

Atomic groups are not supported in every language. .NET supports them, and so does Java, which also offers the simpler possessive quantifier, giving "[^"]*+".
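Python's re module is assumed here as well; it accepts atomic groups and possessive quantifiers only from Python 3.11 on, so the sketch adds those two variants conditionally. Since "[^"]*" never needs to give anything back in this task, all variants should return the same matches.

```python
import re
import sys

text = 'The phrase "regular expression" is called "Regex" for short.'

patterns = {"plain": r'"[^"]*"'}
# Atomic groups and possessive quantifiers arrived in Python's re in 3.11;
# older versions reject these patterns, so they are added conditionally.
if sys.version_info >= (3, 11):
    patterns["atomic"] = r'"(?>[^"]*)"'
    patterns["possessive"] = r'"[^"]*+"'

results = {name: re.findall(p, text) for name, p in patterns.items()}
for name, found in results.items():
    print(name, found)
```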

3 Greedy or non-greedy: on matching efficiency
In general, when greedy and non-greedy quantifiers modify the same subexpression, such as ".*" and ".*?", they serve different use cases, so their efficiency is generally not comparable.

Changing the quantified subexpression to meet the requirement, for example replacing ".*" with "[^"]*", produces a different subexpression, so there is no direct comparison either. But where the same subexpression meets the requirement in both modes, such as "[^"]*" and "[^"]*?", greedy mode is usually more efficient.

There is also this fact: anything non-greedy mode can do, greedy mode can also do by optimizing the quantified subexpression, whereas some optimizations available to greedy mode cannot be achieved in non-greedy mode.

Greedy mode has one more advantage: when a match fails, it can report the failure faster, which improves matching efficiency. The following sections review the matching efficiency of greedy and non-greedy modes comprehensively.

3.1 Improving efficiency: the evolution
Now that we understand how greedy and non-greedy modes match, let's revisit, step by step, how the efficiency of a regular expression can be improved.

Requirement: get each substring enclosed in a pair of double quotes; the substring itself must not contain a double quote.

Source string: The phrase "regular expression" is called "Regex" for short.

Regular expression one: ".*"

Regular expression one matches the whole span "regular expression" is called "Regex", which does not meet the requirement.

Next comes regular expression two: ".*?"

First, the pattern's opening quote gains control and tries to match from position 0, failing until it succeeds at position 11; control then passes to ".*?". The rest of the process is the same as the non-greedy matching process in 2.2.1: ".*?" matches regular expression (the text inside the first pair of quotes), backtracking once for each of the 18 characters it takes.

How can the efficiency loss caused by backtracking be eliminated? Use a subexpression with a smaller range and keep greedy mode, which gives regular expression three: "[^"]*"

First, the pattern's opening quote gains control and tries to match from position 0, failing until it succeeds at position 11; control then passes to "[^"]*". The rest of the process is the same as the matching process of regular expression three in 2.2.2: "[^"]*" matches regular expression without any backtracking.

3.2 Improving efficiency: reporting failure faster
The discussion above concerns the evolution when the match succeeds. When a regular expression fails to match, reporting that failure as quickly as possible also improves efficiency, and this is probably the point most easily overlooked when designing a regular expression. If the source string is very large or the regular expression is complex, the ability to report failure quickly has a direct impact on matching efficiency.

Below, we build regular expressions that are bound to fail and analyze their matching processes.

In the following analyses, the source string is always: The phrase "regular expression" is called "Regex" for short.

3.2.1 Analysis of the failure process in non-greedy mode

Figure 3-1

Construct a non-greedy regular expression that is bound to fail: ".*?"@

Because of the trailing "@", this regular expression must ultimately fail. Let's look at the matching process.

First, the pattern's opening quote gains control and tries to match from position 0, failing until it succeeds at the position marked A in the figure; control then passes to ".*?".

Once ".*?" has control, it tries to match at the position after A. Being non-greedy, it first skips matching, records a backtracking state, and passes control to the pattern's closing quote. The closing quote tries the position after A and fails against "r", so the engine returns to the backtracking state and gives control back to ".*?", which consumes "r". This repeats until ".*?" has consumed the "n" before B; the closing quote then matches the quote at B and passes control to "@". "@" fails against the space that follows, so the engine backtracks again and ".*?" consumes one more character. The same process continues until ".*?" reaches the end of the string and passes control to the closing quote, which fails because it is at the end of the string. The engine then reports that the whole expression failed at position 11, and this round of attempts ends.

The engine's transmission then advances the regex to the next starting position, and the next round of attempts begins. Subsequent rounds proceed much like the first; see Figure 3-1.

The process shows that when non-greedy mode fails, almost every step involves backtracking, which greatly reduces matching efficiency.

3.2.2 Analysis of the failure process in greedy mode: a broad subexpression


Figure 3-2

PS: The diagrams in this analysis are adapted from the relevant chapter of "Mastering Regular Expressions".

Construct a greedy regular expression that is bound to fail: ".*"@

The quantified subexpression is ".", which has a broad matching range, and because of the trailing "@" this regular expression is also bound to fail. Let's look at the matching process.

First, the pattern's opening quote gains control and tries to match from position 0, failing until it succeeds at the position marked A in the figure; control then passes to ".*".

Once ".*" has control, it tries to match at the position after A. Being greedy, it first tries to match and runs all the way to the end of the string, then passes control to the pattern's closing quote. The closing quote fails because it is at the end of the string, so the engine looks for a backtracking state and gives control back to ".*", which gives up the last matched character ".". This repeats until the closing quote matches the quote at D and passes control to "@". "@" fails against the space after D, so the engine backtracks again and ".*" gives up more of its text. The same process continues until ".*" has given back all its text up to I and passes control to the closing quote. The closing quote fails, and since there are no backtracking states left, the engine reports that the whole expression failed at position 11, ending this round of attempts.

The engine's transmission then advances the regex to the next starting position, and the next round of attempts begins. Subsequent rounds proceed much like the first; see Figure 3-2.

The process shows that when greedy mode with a broad subexpression fails, it is overall no different from non-greedy mode: the number of backtracks is essentially the same, and the impact on matching efficiency is still large.

3.2.3 Analysis of the failure process in greedy mode: an improved subexpression

Figure 3-3

Construct a greedy regular expression that is bound to fail: "[^"]*"@

The quantified subexpression is changed to the negated character class "[^"]", which matches a smaller range. Because of the trailing "@", this regular expression is also bound to fail. Let's look at the matching process.

First, the pattern's opening quote gains control and tries to match from position 0, failing until it succeeds at the position marked A in the figure; control then passes to "[^"]*".

Once "[^"]*" has control, it tries to match at the position after A. Being greedy, it first tries to match, matches up to B, and passes control to the pattern's closing quote, which matches the next character, a quote, and passes control to "@". "@" fails against the space that follows, so the engine looks for a backtracking state and returns control to "[^"]*", which gives up matched text. This process repeats until "[^"]*" has given back all its matched text up to C and passes control to the closing quote. The closing quote fails, and since there are no backtracking states left, the engine reports that the whole expression failed at position 11, ending this round of attempts.

The engine's transmission then advances the regex to the next starting position, and the next round of attempts begins. Subsequent rounds proceed much like the first; see Figure 3-3.

The process shows that with a negated character class, greedy mode makes fewer backtracks per round when the match fails, which effectively improves matching efficiency.

3.2.4 Analysis of the failure process in greedy mode: atomic grouping
The analysis in 3.2.3 shows that because "[^"]*" uses a negated character class, none of the characters it matches between A and B in Figure 3-3 can be a quote. The backtracking between B and C is therefore pointless, and those backtracking states need not be recorded at all. In .NET this can be achieved with an atomic group, and in Java with a possessive quantifier.


Figure 3-4

First, the pattern's opening quote gains control and tries to match from position 0, failing until it succeeds at the position marked A in the figure; control then passes to "(?>[^"]*)".

Once "(?>[^"]*)" has control, it tries to match at the position after A. Being greedy, it first tries to match, matches up to B, and passes control to the pattern's closing quote; during this process it records no backtracking states. The closing quote matches the next character, a quote, and passes control to "@". "@" fails against the space that follows, and the engine looks for a backtracking state; since there is none, it reports that the whole expression failed at position 11, ending this round of attempts.

The engine's transmission then advances the regex to the next starting position, and the next round of attempts begins. Subsequent rounds proceed much like the first; see Figure 3-4.

The process shows that when greedy mode with an atomic group fails, no backtracking is involved at all, which maximizes matching efficiency.
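The four failure processes in 3.2.1 to 3.2.4 can be compared with a toy model. The function failing_attempts below is a hypothetical teaching aid that models only the shape quote, star, quote, "@": it tries every start position, as the engine's transmission does, and counts one backtrack each time the star resumes a saved state. Real engines apply optimizations that this model ignores, so only the relative counts are meaningful.

```python
def failing_attempts(subject, mode):
    """Toy model of '"' X* '"@' tried at every start position.

    mode is one of:
      "lazy-dot"     ->  ".*?"@
      "greedy-dot"   ->  ".*"@
      "greedy-class" ->  "[^"]*"@
      "atomic-class" ->  "(?>[^"]*)"@
    Returns (matched, backtracks); a backtrack is one resumption of a state
    saved by the star.
    """
    allowed = (lambda c: True) if mode.endswith("dot") else (lambda c: c != '"')
    backtracks = 0
    for start, ch in enumerate(subject):
        if ch != '"':
            continue                          # opening quote fails: bump along
        body = start + 1
        most = 0
        while body + most < len(subject) and allowed(subject[body + most]):
            most += 1
        if mode == "lazy-dot":
            tries = range(0, most + 1)        # take more only when forced
        elif mode == "atomic-class":
            tries = [most]                    # no states saved: a single try
        else:
            tries = range(most, -1, -1)       # take all, then give back
        for i, taken in enumerate(tries):
            if i > 0:
                backtracks += 1
            if subject.startswith('"@', body + taken):
                return True, backtracks
    return False, backtracks


text = 'The phrase "regular expression" is called "Regex" for short.'
for mode in ("lazy-dot", "greedy-dot", "greedy-class", "atomic-class"):
    print(mode, failing_attempts(text, mode))
```

On the article's source string, this model reports 105 backtracks for both ".*?"@ and ".*"@, 45 for "[^"]*"@ and none for the atomic version, which mirrors the conclusions of 3.2.1 to 3.2.4.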

3.3 Converting non-greedy mode to greedy mode
When a broad subexpression is used, greedy mode cannot match what non-greedy mode matches; by optimizing the subexpression, however, greedy mode can match anything non-greedy mode can.

For example, a common practical task is to match the contents of an img tag.

Example:

Requirement: get the image address from an img tag; the value after src= is always enclosed in double quotes.

SOURCE string:

Regular expression One:

In the match result, capture group 1 holds the image address. This example uses non-greedy mode throughout; following the analysis in the previous section, the last two non-greedy quantifiers can be converted to greedy mode by using negated character classes.

Regular Expression II: ]*>

Note: the character ">" could also appear in attribute values between "src=" and the ">" that ends the tag, but that is an extreme case and is not discussed here.

The last two non-greedy quantifiers can be converted to greedy mode with negated character classes to improve efficiency. The non-greedy quantifier before "src=", however, cannot use a negated character class, because what must be excluded is a sequence of characters rather than one or a few individual characters. That is not a dead end: lookahead can achieve the same effect.

Regular expression three: ]*>

"(?!src=)." matches a single character at a position where the text to the right does not start with the sequence "src=", and "(?:(?!src=).)*" matches zero or more such characters. This excludes a character sequence, with the same effect as a negated character class; the difference is that a negated class excludes individual characters, whereas this construct excludes one or more ordered character sequences.

However, excluding a character sequence with lookahead requires extra checks at every character, so whether it improves or reduces efficiency compared with non-greedy mode depends on the situation. For simple regular expressions or short source strings, non-greedy mode is generally more efficient; for large source strings or complex regular expressions, greedy mode is generally more efficient.

For the img example above, regular expression two is basically sufficient; complex applications such as balancing groups need greedy mode combined with lookahead.
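The three-step conversion can be tried out as follows. The patterns below are our own illustration, assembled from the descriptions in this section (three lazy quantifiers in the first; negated classes for the last two in the second; lookahead-based sequence exclusion for the first one in the third), the sample tag is hypothetical, and Python's re module is assumed as the engine.

```python
import re

# Hypothetical sample tag (not from the article).
tag = '<img alt="logo" src="/images/logo.png" width="80">'

# Illustrative patterns built from the descriptions in 3.3.
one = r'<img\b.*?src="(.*?)".*?>'                    # three lazy quantifiers
two = r'<img\b.*?src="([^"]*)"[^>]*>'                # last two made greedy
three = r'<img\b(?:(?!src=).)*src="([^"]*)"[^>]*>'   # first one too, via lookahead

urls = [re.search(p, tag).group(1) for p in (one, two, three)]
print(urls)  # ['/images/logo.png', '/images/logo.png', '/images/logo.png']
```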

Take a balancing group that matches nested div tags as an example:

using System.Text.RegularExpressions;

Regex reg = new Regex(@"(?isx)          # options: ignore case; '.' matches any character; ignore pattern whitespace
    <div[^>]*>                          # opening tag '<div ...>'
        (?>                             # atomic group; also limits the scope of the quantifier '*'
            <div[^>]*>  (?<Open>)       # named capture group: opening tag found, push, Open count + 1
          |                             # alternation
            </div>  (?<-Open>)          # balancing group: closing tag found, pop, Open count - 1
          |                             # alternation
            (?:(?!</?div\b).)*          # any characters that do not begin an opening or closing div tag
        )*                              # the substrings above occur zero or more times
        (?(Open)(?!))                   # if 'Open' is still set, the tags are unpaired: match nothing
    </div>                              # closing tag '</div>'
");

"(?:(?!</?div\b).)*" combines greedy mode with lookahead. Although every character requires several checks, the checks are character-based and fast. With non-greedy mode, by contrast, every step would go through the alternation "|", which has a strong impact on matching efficiency and costs far more than checking individual characters. Another reason is that greedy mode can be combined with an atomic group to improve efficiency further, whereas an atomic group is meaningless with non-greedy mode.
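Python's re module (assumed here) has no balancing groups, so the full .NET expression cannot be ported. The sketch below exercises only the building block "(?:(?!</?div\b).)*" on a hypothetical string, showing that it stops right before a div tag and that "\b" keeps a tag such as "<divider>" from being mistaken for one.

```python
import re

# Only the building block is shown: Python's re has no balancing groups.
token = re.compile(r"(?:(?!</?div\b).)*", re.IGNORECASE | re.DOTALL)

html = "intro <divider> text<div>inner</div>tail"   # hypothetical input

first = token.match(html).group()                # stops right before "<div>"
inner_start = html.index("inner")
inner = token.match(html, inner_start).group()   # stops right before "</div>"

print(repr(first))   # 'intro <divider> text'
print(repr(inner))   # 'inner'
```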

4 Greedy and non-greedy: a final review
4.1 Revisiting an example through the matching principle
Let's return to the example from 2.1.1. The earlier analysis was from the application perspective, but after the discussion of the matching principle it is clear that the actual process is not that simple. Here is the matching process analyzed from the matching-principle perspective.


Figure 4-1

First, "<" gains control and tries to match at position 0. It fails against the character "a", and the first round ends. The second round starts at position 1 and fails the same way. The third round starts at position 2, where "<" matches the character "<", and control passes to "d".

"d" matches the character "d", and control passes to "i". This repeats until ">" matches the character ">", and control passes to ".*".

".*" is greedy: starting from the character "t" after B, it matches all the way to E, the end of the string, and passes control to "<".

"<" tries to match at the end of the string and fails. The engine looks for the most recent backtracking state and returns control to ".*", which gives up the character "c"; control passes to "<", which tries again and fails, and the engine backtracks again. This repeats until ".*" gives up the character "<", which means it has given back the substring "</div>cc". Now "<" matches the character "<", and control passes to "/".

Next, "/", "d", "i", "v" and ">" each match their corresponding characters, and the whole regular expression match is complete.
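The walkthrough can be cross-checked by looking at where the match starts and what ".*" had to give back. A sketch assuming Python's re module; the variable names are ours.

```python
import re

source = "aa<div>test1</div>bb<div>test2</div>cc"
m = re.search(r"<div>.*</div>", source)

start = m.start()                                   # rounds at 0 and 1 fail
star_text = m.group()[len("<div>"):-len("</div>")]  # what ".*" kept in the end
grabbed = source[start + len("<div>"):]             # what ".*" first ran over
given_back = grabbed[len(star_text):]               # what it had to give up

print(start)       # 2
print(given_back)  # </div>cc
```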

4.2 Greedy and non-greedy: quantifier details
4.2.1 Non-greedy mode with interval quantifiers
The non-greedy examples so far have all used "*?", without touching other interval quantifiers. Most people who have used regular expressions understand non-greedy quantifiers such as "*?" and "+?", but the non-greedy form of interval quantifiers, such as "{m,n}?", is something many have either never seen or do not understand, mainly because it is rarely needed and therefore easily overlooked.

First, be clear that "{m,n}" is a match-preferring quantifier: although it has an upper limit, until that limit is reached it matches as much as possible whenever it may either match or not. "{m,n}?" is the corresponding match-ignoring quantifier: whenever it may either match or not, it matches as little as possible.

The next example illustrates the application of this non-greedy pattern.

Example (limiting the length while matching minimally):

Requirement: match from the start of the string up to the first occurrence of abc, with the length limited to 100.

csdn.{1,100}abc gives the longest match (1 to 100 characters), but I need the shortest.

For example, for csdnfddabckjdsfjabc the result should be: csdnfddabc

Regular expression: csdn.{1,100}?abc
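The example can be verified with a sketch that assumes Python's re module, using re.match so that the match is anchored at the start of the string as the requirement asks. The last case (a hypothetical string with abc more than 100 characters away) shows the upper bound at work.

```python
import re

text = "csdnfddabckjdsfjabc"

shortest = re.match(r"csdn.{1,100}?abc", text).group()   # lazy interval
longest = re.match(r"csdn.{1,100}abc", text).group()     # greedy interval

# Hypothetical input where "abc" lies beyond the 100-character limit.
too_far = re.match(r"csdn.{1,100}?abc", "csdn" + "x" * 150 + "abc")

print(shortest)   # csdnfddabc
print(longest)    # csdnfddabckjdsfjabc
print(too_far)    # None
```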

There may be some people who do not understand this example, but think, in fact, "*" is equivalent to "{0,}", "+" is equivalent to "{1,}", "*?" That is, "{0,}?", the abstract is "{m,}?", that is, the upper limit is infinity. If the upper limit is a fixed value, that is "{m,n}?", which should be understandable.

"{m}" is not classed as a matching-priority quantifier, and likewise "{m}?", although some languages accept it, is not classed as an ignoring-priority quantifier. The reason is that the two behave identically: the subexpression they modify must match exactly m times for the match to succeed, and no backtracking state is recorded, so the question of matching priority versus ignoring priority does not arise. They are not covered further in this article; discussing them would add nothing beyond the fact that their matching behavior is the same.
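Python's `re` module is one engine that accepts "{m}?". The sketch below shows that it finds exactly the same matches as "{m}", since both require exactly three repetitions.

```python
import re

text = "aaaaaaa"  # seven 'a's

# "{3}" and "{3}?" both require exactly three repetitions, so there is
# nothing to be greedy or lazy about and the results are identical.
exact = re.findall(r"a{3}", text)
exact_lazy = re.findall(r"a{3}?", text)
print(exact, exact_lazy)  # ['aaa', 'aaa'] ['aaa', 'aaa']
```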

4.2.2 The matching lower bound of ignoring-priority quantifiers
The matching lower bounds of matching-priority quantifiers are easy to understand. "?" is equivalent to "{0,1}": the subexpression it modifies matches at least 0 times and at most once. "*" is equivalent to "{0,}": its subexpression matches at least 0 times and at most infinitely many times. "+" is equivalent to "{1,}": its subexpression matches at least once and at most infinitely many times.

The matching lower bounds of ignoring-priority quantifiers are just as easy to understand.

"??" is also an ignoring-priority quantifier, and the subexpression it modifies uses non-greedy mode: it matches at least 0 times and at most once. During matching it follows the non-greedy principle: it first does not match, that is, it matches 0 times and records a backtracking state, and it only tries to match when it has to.
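A minimal sketch of "??" in Python's `re` module: the optional "b" is skipped when skipping it still lets the match succeed, and is consumed only when the rest of the pattern forces it. The greedy "?" is shown for comparison.

```python
import re

# "b??" first tries matching zero times.
zero = re.match(r"ab??", "ab").group()     # zero "b" is enough here
forced = re.match(r"ab??c", "abc").group()  # "c" fails, so it backtracks and takes "b"
greedy = re.match(r"ab?", "ab").group()     # greedy "?" takes "b" right away

print(repr(zero), repr(forced), repr(greedy))  # 'a' 'abc' 'ab'
```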

The subexpression modified by "*?" matches at least 0 times and at most infinitely many times. The subexpression modified by "+?" matches at least once and at most infinitely many times. Note that although "+?" uses non-greedy mode, during matching it must first match once, and only after that does it start preferring not to match; this also needs attention.
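The lower bound of "+?" can change the result, not just the match length. In this sketch (the input "<>a>" is chosen for illustration), "*?" may match nothing, so the first ">" ends the match, while "+?" must consume one character first, so it swallows that ">" and stops at the next one.

```python
import re

text = "<>a>"

# "*?" may match zero characters, so the first ">" closes the match at once.
lazy_star = re.match(r"<.*?>", text).group()

# "+?" must match at least once before holding back, so it consumes the
# first ">" and then expands until the next ">" can match.
lazy_plus = re.match(r"<.+?>", text).group()

print(lazy_star, lazy_plus)  # <> <>a>
```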

4.3 Summary of greedy and non-greedy patterns
Greedy and non-greedy from the point of view of syntax

Subexpressions modified by matching-priority quantifiers use greedy mode; subexpressions modified by ignoring-priority quantifiers use non-greedy mode.

Matching-priority quantifiers include: "{m,n}", "{m,}", "?", "*" and "+".

Ignoring-priority quantifiers include: "{m,n}?", "{m,}?", "??", "*?" and "+?".
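The two lists pair up one to one. This sketch runs each matching-priority quantifier and its ignoring-priority counterpart against the same input, "aaaa", using Python's `re` module; the input and the single-character subexpression "a" are illustrative choices.

```python
import re

# Each matching-priority quantifier next to its ignoring-priority counterpart.
pairs = [
    (r"a{1,3}", r"a{1,3}?"),
    (r"a{2,}", r"a{2,}?"),
    (r"a?", r"a??"),
    (r"a*", r"a*?"),
    (r"a+", r"a+?"),
]

text = "aaaa"
results = {}
for greedy, lazy in pairs:
    # re.match anchors at the start, so only the quantifier's choice differs.
    results[greedy] = re.match(greedy, text).group()
    results[lazy] = re.match(lazy, text).group()

for greedy, lazy in pairs:
    print(f"{greedy:8} {results[greedy]!r:8} {lazy:9} {results[lazy]!r}")
```

The greedy forms match "aaa", "aaaa", "a", "aaaa" and "aaaa"; the lazy forms match "a", "aa", "", "" and "a", each stopping at its lower bound.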

Greedy and non-greedy from the point of view of application

Greedy and non-greedy modes affect how a subexpression modified by a quantifier matches: provided the whole expression still matches successfully, greedy mode matches as much as possible and non-greedy mode matches as little as possible. Non-greedy mode is supported only by some NFA engines.

Greedy and non-greedy from the point of view of the matching principle

When a greedy pattern and a non-greedy pattern achieve the same matching result, the greedy pattern is usually the more efficient of the two.

All non-greedy patterns can be converted into greedy patterns by changing the subexpression that the quantifier modifies.
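A common instance of this conversion is replacing `"(.*?)"` with `"([^"]*)"`: narrowing the subexpression to "anything but a quote" lets the greedy form stop at the right place by itself. The sketch below, with an invented sample string, checks that both give the same matches and prints a rough timing comparison; the timings are machine- and version-dependent, so no particular numbers are claimed. The conversion as written also assumes the quoted values contain no newlines, since "." does not match "\n" but `[^"]` does.

```python
import re
import timeit

# Invented sample input: many quoted attribute values, no newlines.
text = 'name="alpha" id="b" title="a longer value with spaces" ' * 200

lazy = re.compile(r'"(.*?)"')      # non-greedy: expands one character at a time
greedy = re.compile(r'"([^"]*)"')  # greedy, with the subexpression narrowed to [^"]

# Same matches either way.
assert lazy.findall(text) == greedy.findall(text)

# Rough timing comparison; absolute numbers depend on machine and Python version.
for name, pattern in (("lazy", lazy), ("greedy", greedy)):
    seconds = timeit.timeit(lambda: pattern.findall(text), number=200)
    print(f"{name:6} {seconds:.4f}s")
```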

Greedy mode can be combined with atomic groups to improve matching efficiency, but non-greedy mode cannot.
