Greedy and non-greedy patterns of regular expressions


1 Overview
Greedy and non-greedy modes determine how a sub-expression modified by a quantifier behaves during matching. In greedy mode the sub-expression matches as much as possible, provided the whole expression can still match. In non-greedy mode it matches as little as possible, again provided the whole expression can still match. Non-greedy mode is supported only by some NFA engines.

The quantifiers that belong to greedy mode, also called match-first quantifiers, are:

"{m,n}", "{m,}", "?", "*", and "+".

In some languages that use an NFA engine, appending "?" to a match-first quantifier turns it into a non-greedy quantifier, also called an ignore-first quantifier:

"{m,n}?", "{m,}?", "??", "*?" and "+?".

Syntactically, a sub-expression modified by a match-first quantifier uses greedy mode, as in "(Expression)+"; a sub-expression modified by an ignore-first quantifier uses non-greedy mode, as in "(Expression)+?".

Documents largely agree on the name "greedy mode", but non-greedy mode is also called lazy mode or reluctant mode. The name does not matter as long as you understand the principle and the usage. I am used to saying greedy and non-greedy, so those are the terms used in this article.

2 Matching principle of greedy and non-greedy modes
Greedy and non-greedy modes can be understood from two angles, application and principle, but to really master them you have to understand the matching principle.

Analysis from the application point of view answers the question "What are greedy and non-greedy modes?"

2.1 Analysis of greedy and non-greedy patterns from the perspective of application
2.1.1 What is greedy and non-greedy mode
Let's look at an example.

Example:

Source string: aa<div>test1</div>bb<div>test2</div>cc

Regular expression one: <div>.*</div>

Match result one: <div>test1</div>bb<div>test2</div>

Regular expression two: <div>.*?</div>

Match result two: <div>test1</div> (this refers to a single match, so <div>test2</div> is not included)

Based on this example, let us analyze the matching behavior to see what greedy and non-greedy modes are.

Regular expression one uses greedy mode. When matching reaches the first "</div>", the whole expression could already match, but because the mode is greedy it keeps trying further to the right to see whether a longer string can match. After it reaches the second "</div>", no longer substring to the right can match, so matching ends with the result "<div>test1</div>bb<div>test2</div>". The actual matching process does not work quite like this; it is described in detail under the matching principle below.

From the application point of view alone, greedy mode can be thought of as matching as much as possible, provided the whole expression matches. This is the "greed": put plainly, take as much as you can of whatever you want, until nothing you want is left.

Regular expression two uses non-greedy mode. When matching reaches the first "</div>", the whole expression matches, and because the mode is non-greedy, matching ends there without trying further to the right. The result is "<div>test1</div>".

From the application point of view alone, non-greedy mode can be thought of as matching as little as possible, provided the whole expression matches. This is the "non-greed": put plainly, pick up one thing you want and stop, whether or not more is lying around.
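The two results above can be reproduced with a short script. This is only a minimal sketch using Python's re module, chosen for illustration; the article is not tied to any one language, and the variable names are my own.

```python
import re

source = "aa<div>test1</div>bb<div>test2</div>cc"

# Greedy: ".*" runs as far right as it can while the whole pattern still matches.
greedy = re.search(r"<div>.*</div>", source).group()

# Non-greedy: ".*?" stops at the first point where the whole pattern matches.
lazy = re.search(r"<div>.*?</div>", source).group()

# findall shows that the non-greedy pattern yields one match per div pair.
lazy_all = re.findall(r"<div>.*?</div>", source)

print(greedy)    # <div>test1</div>bb<div>test2</div>
print(lazy)      # <div>test1</div>
print(lazy_all)  # ['<div>test1</div>', '<div>test2</div>']
```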

2.1.2 Description of Prerequisites
The analysis above always mentioned one premise: "provided the whole expression matches". The following example shows why this premise matters.

Regular expression three: <div>.*</div>bb

Match result three: <div>test1</div>bb

The quantifier that modifies "." is still the match-first quantifier "*", so this is still greedy mode. The leading "<div>.*</div>" could still match "<div>test1</div>bb<div>test2</div>", but then the trailing "bb" would fail to match. For the whole expression to succeed, "<div>.*</div>" must give back "bb<div>test2</div>". The whole expression then matches "<div>test1</div>bb", of which "<div>.*</div>" matches "<div>test1</div>". So greedy mode determines the matching behavior of a sub-expression only under the premise that the whole expression matches. If the whole expression fails, greedy mode affects only the matching process, and there is no match result to speak of.

The same holds for non-greedy mode, as the following example shows.

Regular expression four: <div>.*?</div>cc

Match result four: <div>test1</div>bb<div>test2</div>cc

In non-greedy mode, the leading "<div>.*?</div>" at first still matches "<div>test1</div>", but the trailing "cc" then fails to match. This forces "<div>.*?</div>" to keep matching to the right until it has matched "<div>test1</div>bb<div>test2</div>", at which point "cc" matches and the whole expression succeeds. The overall match is "<div>test1</div>bb<div>test2</div>cc", of which "<div>.*?</div>" matches "<div>test1</div>bb<div>test2</div>". As with greedy mode, non-greedy mode determines the matching behavior of a sub-expression only under the premise that the whole expression matches; if the whole expression fails, it cannot affect any result.
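The premise can be checked with a small sketch, again in Python's re purely for illustration. The capturing parentheses are my addition (they are not in the expressions above) so that the text matched by the quantified part can be inspected separately from the whole match.

```python
import re

source = "aa<div>test1</div>bb<div>test2</div>cc"

def show(pattern):
    """Return (whole match, text matched by the parenthesized part)."""
    m = re.search(pattern, source)
    return m.group(0), m.group(1)

# Greedy ".*" must give text back so that the trailing "bb" can match.
whole3, inner3 = show(r"(<div>.*</div>)bb")

# Non-greedy ".*?" must take more text so that the trailing "cc" can match.
whole4, inner4 = show(r"(<div>.*?</div>)cc")

print(whole3, "|", inner3)  # <div>test1</div>bb | <div>test1</div>
print(whole4, "|", inner4)
```

In both cases the quantifier's preference is overridden by what the rest of the expression needs in order to match.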

2.1.3 Greed or non-greed--the choice of application
The analysis from the application point of view has given a basic picture of greedy and non-greedy modes. In practice, whether to choose greedy or non-greedy mode depends on the requirement.

For some simple requirements, for example extracting the div tag from the source string "aa<div>test1</div>bb", both modes give the desired result, and it makes little difference which one is used.

In the example of section 2.1.1, however, the practical need is usually to get one paired div tag at a time, which is what non-greedy mode matches; what greedy mode matches is usually not what we want.

Then why does greedy mode exist at all? That is hard to answer satisfactorily from the application point of view; the two modes have to be analyzed from the point of view of the matching principle.

2.2 Analysis of greedy and non-greedy modes from the point of view of matching principle
To truly understand what greedy and non-greedy modes are, when each should be used, and how efficient each is, the application point of view is not enough; you need a full understanding of how each mode matches.

2.2.1 Starting from the basic matching principle
For the basic matching principle of the NFA engine, see: Regex basics, the matching principle of the NFA engine.

This section covers only the parts of the matching principle that concern greedy and non-greedy modes. First, a simple matching process in greedy mode.

Source string: "Regex"

Regular expression: ".*" (the quotation marks are part of both the source string and the expression)




Figure 2-1

Note: To show the matching process clearly, the figure spaces the characters widely apart; the actual source string is "Regex". The same applies below.

Let us look at the matching process. First the leading quote of the expression takes control and matches the quote at position 0. The match succeeds, and control passes to ".*".

".*" takes control. Because "*" is a match-first quantifier, whenever it may either match or not match, it tries to match first. It starts with "R" at position 1 and succeeds, continues to the right with "e" at position 2 and succeeds, and keeps going until it has matched the quote at the end. It has now reached the end of the string, so ".*" stops and hands control to the trailing quote of the expression.

The trailing quote takes control, but the position is already the end of the string, so its match fails. The engine looks back for a state it can backtrack to and returns control to ".*", which gives back one character, the quote at the end of the string, and hands control to the trailing quote again. The trailing quote now matches the quote at the end of the string.

At this point the whole expression has matched. ".*" matched the text Regex, and one backtrack occurred.

Next, a simple matching process in non-greedy mode.

Source string: "Regex"

Regular expression: ".*?"


Figure 2-2

First the leading quote of the expression takes control and matches the quote at position 0. The match succeeds, and control passes to ".*?".

".*?" takes control. Because "*?" is an ignore-first quantifier, whenever it may either match or not match, it first tries not to match. Since "*" is equivalent to "{0,}", it is allowed to match nothing at all. So at position 1 it skips matching, that is, matches nothing, and hands control to the trailing quote of the expression.

The trailing quote takes control and tries to match at position 1. It is tested against "R" and fails. The engine looks back for a state it can backtrack to and returns control to ".*?", which consumes one character, the "R" at position 1, and hands control to the trailing quote again.

The trailing quote takes control and tries to match at position 2. It is tested against "e" and fails, and the engine backtracks again. This repeats until ".*?" has matched "x" and hands control to the trailing quote once more.

The trailing quote takes control and tries to match at position 6, where it matches the quote at the end of the string.

At this point the whole expression has matched. ".*?" matched the text Regex, and five backtracks occurred.
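The two walk-throughs can be mimicked with a toy model. This is a rough sketch of the idea, not how any real engine is implemented: the function match_quoted and its way of counting are my own simplification, it handles only a quote, a quantified dot and a closing quote anchored at the start of the text, and it ignores the fact that "." normally does not match a newline.

```python
def match_quoted(text, lazy):
    """Simulate '".*"' (lazy=False) or '".*?"' (lazy=True) anchored at text[0].

    Returns (matched_body, backtracks). A backtrack is counted each time the
    closing quote fails and control returns to the quantified dot.
    """
    if not text.startswith('"'):
        return None, 0
    first, last = 1, len(text)        # the dot may stop anywhere in first..last
    # Greedy tries the longest run first, lazy tries the empty run first.
    stops = range(first, last + 1) if lazy else range(last, first - 1, -1)
    backtracks = 0
    for attempt, stop in enumerate(stops):
        if attempt > 0:
            backtracks += 1           # previous stop failed, return to saved state
        if stop < len(text) and text[stop] == '"':
            return text[first:stop], backtracks
    return None, backtracks

greedy_body, greedy_backtracks = match_quoted('"Regex"', lazy=False)
lazy_body, lazy_backtracks = match_quoted('"Regex"', lazy=True)
print(greedy_body, greedy_backtracks)  # Regex 1
print(lazy_body, lazy_backtracks)      # Regex 5
```

Both orders reach the same result here; they differ only in which stop positions are tried first, and therefore in how many failed attempts precede the successful one.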

2.2.2 Greed or non-greed--the choice of matching efficiency
The analysis of the matching principle shows that, when the match succeeds, greedy mode backtracks less. Backtracking means handing control back, giving up matched content or matching content that was skipped, and then trying again, all of which lowers matching efficiency considerably. In this respect greedy mode has an efficiency advantage over non-greedy mode.

But the example in 2.2.1 is only a simple case. Readers may wonder at this point: is greedy mode always more efficient than non-greedy mode? The answer is no.

Example:

Requirement: extract a substring enclosed in two double quotes; the substring itself must not contain a double quote.

Regular expression one: ".*"

Regular expression two: ".*?"

Scenario one: when the greedy pattern matches a lot of unwanted content, it may backtrack more than the non-greedy pattern. For example, the source string is: The word "Regex" means regular expression.

Scenario two: greedy mode cannot meet the requirement. For example, the source string is: The phrase "regular expression" is called "Regex" for short.

In scenario one, the greedy ".*" of regular expression one always matches to the end of the string before handing control to the trailing quote; when that fails, it backtracks. The extra content it matched, " means regular expression.", is far longer than the content that actually needs to be matched, so regular expression one is less efficient here than the non-greedy regular expression two.

In scenario two, regular expression one matches "regular expression" is called "Regex". It does not even meet the requirement, so there is no point discussing its efficiency.

Both scenarios are common. Does this mean that only non-greedy mode can meet the requirement while keeping efficiency in mind? Of course not. Depending on the situation, changing the sub-expression that the match-first quantifier modifies can both meet the requirement and improve efficiency.

Source string: "Regex"

This leads to regular expression three: "[^"]*"

Let us look at the matching process of regular expression three.


Figure 2-3

First the leading quote of the expression takes control and matches the quote at position 0. The match succeeds, and control passes to "[^"]*".

"[^"]*" takes control. Because "*" is a match-first quantifier, whenever it may either match or not match, it tries to match first. It starts with "R" at position 1 and succeeds, continues with "e" at position 2 and succeeds, and keeps going until it has matched "x". The next character is the quote at the end of the string, which [^"] cannot match, so "[^"]*" stops and hands control to the trailing quote of the expression.

The trailing quote takes control and matches the quote at the end of the string.

At this point the whole expression has matched. "[^"]*" matched the text Regex, and no backtracking occurred.

Here the sub-expression modified by the quantifier was changed from ".", which has a broad matching range, to the negated character class "[^"]". The mode is still greedy, yet both the requirement and the efficiency problem are solved. Moreover, since this matching process never backtracks, there is no need to record backtracking states, so the expression can be optimized further with an atomic group.
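As a quick check of the requirement side of this discussion, the sketch below runs all three expressions over the scenario-two source string with Python's re (an illustrative choice on my part). It shows only what each expression matches; it says nothing about the amount of backtracking.

```python
import re

text = 'The phrase "regular expression" is called "Regex" for short.'

greedy_dot = re.findall(r'".*"', text)    # regular expression one
lazy_dot = re.findall(r'".*?"', text)     # regular expression two
negated = re.findall(r'"[^"]*"', text)    # regular expression three

print(greedy_dot)  # ['"regular expression" is called "Regex"']
print(lazy_dot)    # ['"regular expression"', '"Regex"']
print(negated)     # ['"regular expression"', '"Regex"']
```

The greedy negated class returns the same matches as the non-greedy dot, which is the point of the substitution.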

This leads to regular expression four: "(?>[^"]*)"

Atomic grouping is not supported by every language. .NET supports it, and so does Java; Java additionally offers the more concise possessive quantifier, which can be used instead: "[^"]*+".

3 Greedy or non-greedy mode: on matching efficiency
In general, when a greedy and a non-greedy pattern modify the same sub-expression, as in ".*" and ".*?", their application scenarios usually differ, so their efficiency is generally not comparable.

When the quantified sub-expression is changed to meet the requirement, for example replacing ".*?" with "[^"]*", the modified sub-expressions differ, so again no direct comparison is possible. But when the sub-expression is the same and both modes meet the requirement, as with "[^"]*" and "[^"]*?", greedy mode is usually more efficient.

There is also this fact: whatever non-greedy mode can achieve, greedy mode can also achieve by optimizing the quantified sub-expression, whereas some optimizations available in greedy mode are not necessarily available in non-greedy mode.

Greedy mode has one more advantage: when the match fails, it can report the failure sooner, which improves matching efficiency. The matching efficiency of both modes is examined more fully below.

3.1 Efficiency improvement--evolution process
Now that the basic principles of greedy and non-greedy modes are clear, let us look at how a regular expression evolves toward better efficiency.

Requirement: extract a substring enclosed in two double quotes; the substring itself must not contain a double quote.

Source string: The phrase "regular expression" is called "Regex" for short.

Regular expression one: ".*"

Regular expression one matches "regular expression" is called "Regex", which does not meet the requirement.

This leads to regular expression two: ".*?"

First the leading quote takes control and tries to match from position 0 onward, succeeding at position 11. Control then passes to ".*?", and matching proceeds as in the non-greedy process of section 2.2.1. For the first match, ".*?" matches the text regular expression, backtracking once for every character it consumes.

To eliminate the efficiency loss caused by backtracking, use a sub-expression with a narrower range together with greedy mode. This leads to regular expression three: "[^"]*"

First the leading quote takes control and tries to match from position 0 onward, succeeding at position 11. Control then passes to "[^"]*", and matching proceeds as in the "[^"]*" process of section 2.2.2. For the first match, "[^"]*" matches the text regular expression, and no backtracking occurs.

3.2 Efficiency improvement: reporting failure faster
The discussion above concerned successful matches. For a failing match, reporting the failure as quickly as possible also improves efficiency, and this is probably the aspect most easily overlooked when designing a regular expression. If the source string is very large or the regular expression is complex, how quickly a failure is reported directly affects matching efficiency.

Below, regular expressions that are bound to fail are constructed so that the matching process can be analyzed.

In all of the following analyses the source string is: The phrase "regular expression" is called "Regex" for short.

3.2.1 Analysis of a failed match in non-greedy mode


Figure 3-1

Construct a non-greedy regular expression that fails to match: ".*?"@

Because of the trailing "@", this expression is bound to fail. Let us look at the matching process.

First the leading quote takes control and tries to match from position 0 onward, failing until it succeeds at A in the figure. Control then passes to ".*?".

".*?" takes control and tries to match from the position after A. Being non-greedy, it first skips matching and hands control to the trailing quote, recording a backtracking state as it does so. The trailing quote takes control and tries to match at the position after A; tested against the character "r", it fails. The engine looks for a state to backtrack to and returns control to ".*?", which matches the character "r". This repeats until ".*?" has matched the character "n" just before B, the trailing quote matches the quote at B, and control passes to "@". "@" is tested against the following space and fails; the engine backtracks, control returns to ".*?", and ".*?" extends its match by one more character. This repeats until ".*?" has matched to the end of the string and hands control to the trailing quote. The trailing quote fails at the end of the string, no backtracking state remains, so the attempt that started at position 11 is reported as failed and the round ends.

The regex engine then moves the starting position forward and begins the next round of attempts. The later rounds are essentially the same as the first; see Figure 3-1.

The process shows that when a non-greedy pattern fails to match, almost every step involves backtracking, which has a large impact on matching efficiency.

3.2.2 Analysis of a failed match in greedy mode: broad sub-expression


Figure 3-2

PS: The process diagrams above follow the relevant chapters of the book "Mastering Regular Expressions".

Construct a greedy regular expression that fails to match: ".*"@

The sub-expression modified by the quantifier is ".", which has a broad matching range. Because of the trailing "@", the expression is bound to fail. Let us look at the matching process.

First the leading quote takes control and tries to match from position 0 onward, failing until it succeeds at A in the figure. Control then passes to ".*".

".*" takes control and tries to match from the position after A. Being greedy, it tries to match first and matches all the way to the end of the string, then hands control to the trailing quote. The trailing quote is already at the end of the string, so it fails; the engine looks for a state to backtrack to and returns control to ".*", which gives back the matched character ".". This repeats until the trailing quote matches the quote at C and hands control to "@". "@" is tested against the space at D and fails; the engine backtracks, control returns to ".*", and ".*" gives back more of its matched text. This repeats until ".*" has given back all of its matched text, down to I, and hands control to the trailing quote. The trailing quote fails, and since no backtracking state remains, the attempt that started at position 11 is reported as failed and the round ends.

The regex engine then moves the starting position forward and begins the next round of attempts. The later rounds are essentially the same as the first; see Figure 3-2.

The process shows that a failed match in greedy mode with a broad sub-expression is, overall, no different from non-greedy mode: the total number of backtracks is about the same, and the impact on matching efficiency is still large.

3.2.3 Analysis of a failed match in greedy mode: improved sub-expression

Figure 3-3

Construct a greedy regular expression that fails to match: "[^"]*"@

The sub-expression modified by the quantifier is the negated character class "[^"]", which has a narrower matching range. Because of the trailing "@", the expression is bound to fail. Let us look at the matching process.

First the leading quote takes control and tries to match from position 0 onward, failing until it succeeds at A in the figure. Control then passes to "[^"]*".

"[^"]*" takes control and tries to match from the position after A. Being greedy, it tries to match first and matches all the way to B, then hands control to the trailing quote. The trailing quote matches the next character, a quote, and hands control to "@". "@" is tested against the following space and fails; the engine looks for a state to backtrack to, control returns to "[^"]*", and "[^"]*" gives back matched text. This repeats until "[^"]*" has given back all of its matched text, down to C, and hands control to the trailing quote. The trailing quote fails, and since no backtracking state remains, the attempt that started at position 11 is reported as failed and the round ends.

The regex engine then moves the starting position forward and begins the next round of attempts. The later rounds are essentially the same as the first; see Figure 3-3.

The process shows that a failed match in greedy mode with a negated character class involves far fewer backtracks per round, which effectively improves matching efficiency.

3.2.4 Analysis of a failed match in greedy mode: atomic group
The analysis in section 3.2.3 shows that, because "[^"]*" uses a negated character class, none of the characters between A and B in Figure 3-3 can be a double quote. The backtracking between B and C is therefore redundant, and the backtracking states in that stretch need not be recorded at all. In .NET this can be achieved with an atomic group, and in Java with a possessive quantifier.


Figure 3-4

First the leading quote takes control and tries to match from position 0 onward, failing until it succeeds at A in the figure. Control then passes to "(?>[^"]*)".

"(?>[^"]*)" takes control and tries to match from the position after A. Being greedy, it tries to match first and matches all the way to B, then hands control to the trailing quote; no backtracking state is recorded during this process. The trailing quote matches the next character, a quote, and hands control to "@". "@" is tested against the following space and fails. The engine looks for a state to backtrack to, finds none, reports that the attempt that started at position 11 has failed, and the round ends.

The regex engine then moves the starting position forward and begins the next round of attempts. The later rounds are essentially the same as the first; see Figure 3-4.

The process shows that a failed match in greedy mode with an atomic group involves no backtracking at all, which gives the best matching efficiency.
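The four failure analyses can be compared with a rough counting model. This is only a sketch under my own simplifying assumptions: failed_attempts is a hypothetical helper that counts how many stop positions the quantified part is made to try before every round has failed, and it does not reproduce the real bookkeeping of any engine. The three calls to re.search merely confirm that the corresponding expressions really fail on the source string.

```python
import re

text = 'The phrase "regular expression" is called "Regex" for short.'

def failed_attempts(text, negated, atomic=False):
    """Count the stop positions tried for the quantified part of '"X*"@'.

    negated=False models X as "."; negated=True models X as [^"].
    atomic=True models (?>X*): only the longest run is tried, nothing is given back.
    Greedy and lazy orders visit the same stops when the match fails, so the
    order is not modelled.
    """
    tried = 0
    for start, ch in enumerate(text):
        if ch != '"':
            continue                      # the leading quote fails here
        end = start + 1
        while end < len(text) and (not negated or text[end] != '"'):
            end += 1                      # longest run X* can consume
        stops = [end] if atomic else range(end, start, -1)
        for stop in stops:
            tried += 1
            if text[stop:stop + 2] == '"@':
                return None               # would be a match; not expected here
    return tried

dot_count = failed_attempts(text, negated=False)
negated_count = failed_attempts(text, negated=True)
atomic_count = failed_attempts(text, negated=True, atomic=True)

# All three real expressions fail on this source string.
all_fail = all(re.search(p, text) is None
               for p in (r'".*?"@', r'".*"@', r'"[^"]*"@'))

print(dot_count, negated_count, atomic_count, all_fail)
```

In this model the dot-based expressions try the most positions, the negated character class tries fewer, and the atomic variant tries exactly one per round, which mirrors the ordering described in sections 3.2.1 to 3.2.4.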

3.3 Non-greedy mode conversion to greedy mode
When a sub-expression with a broad matching range is used, greedy and non-greedy modes match different content. By optimizing the sub-expression, however, whatever non-greedy mode can match, greedy mode can match as well.

Take matching the content of an img tag, a common practical task, as an example.

Example:

Requirement: get the image address from an img tag; "src=" is always followed by a double quote.

SOURCE string:

Regular expression One:

In the match result, capture group 1 holds the image address. This example uses non-greedy mode throughout. Following the analysis in the previous sections, the latter two non-greedy quantifiers can be converted to greedy mode by using negated character classes.

Regular expression two: ]*>

Note: the character ">" could also appear inside an attribute between "src=" and the closing ">" of the tag, but that is an extreme case and is not discussed here.

The latter two non-greedy quantifiers can be converted to greedy mode with negated character classes, which improves matching efficiency. The non-greedy quantifier before "src=" is different: what has to be excluded there is the character sequence "src=", not one or a few individual characters, so a negated character class cannot be used. That does not mean nothing can be done; a lookahead achieves the same effect.

Regular expression three: ]*>

"(?!src=)." denotes a character such that the text starting at that character is not the character sequence "src=", and "(?:(?!src=).)*" denotes zero or more characters that satisfy this rule. This excludes a character sequence in the same way a negated character class excludes characters; the difference is that a negated character class excludes one or more individual characters, while this lookaround construct excludes one or more ordered character sequences.
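The construct can be tried out in a small sketch. Both the input tag and the complete pattern below are hypothetical stand-ins written for this illustration; they are not the article's own source string or expressions, and only the "(?:(?!src=).)*" part is taken from the text above.

```python
import re

# Hypothetical input, made up for this illustration.
html = '<p><img alt="logo" src="/images/logo.png" width="80"></p>'

# (?:(?!src=).)* consumes characters one at a time, but only while the text
# to the right of the current position does not start with "src=".
pattern = re.compile(r'<img\b(?:(?!src=).)*src="([^"]*)"[^>]*>')

address = pattern.search(html).group(1)

# On its own, the construct stops right before the excluded sequence.
skipped = re.match(r'(?:(?!src=).)*', 'alt="logo" src="x"').group()

print(address)        # /images/logo.png
print(repr(skipped))  # 'alt="logo" '
```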

However, excluding a character sequence this way requires an extra check before every character, so whether it is more or less efficient than non-greedy mode has to be judged case by case. For simple regular expressions or short source strings, non-greedy mode is generally more efficient; for large source strings or complex regular expressions, greedy mode is generally more efficient.

For the requirement above, getting the image address from an img tag, regular expression two is basically sufficient. For complex applications such as balancing groups, the greedy form with lookaround is needed.

Take a balancing group that matches nested div tags as an example:

Regex reg = new Regex(@"(?isx)                      # modes: ignore case, '.' matches any character, ignore pattern whitespace
                      <div[^>]*>                    # opening tag '<div ...>'
                          (?>                       # atomic group, delimits what the quantifier '*' applies to
                              <div[^>]*>  (?<Open>) # named capture group: on an opening tag, push onto the stack, Open count plus 1
                          |                         # alternation
                              </div>  (?<-Open>)    # balancing group: on a closing tag, pop the stack, Open count minus 1
                          |                         # alternation
                              (?:(?!</?div\b).)*    # any characters not followed on the right by an opening or closing div tag
                          )*                        # the above repeated zero or more times
                          (?(Open)(?!))             # if any 'Open' remains, tags are unpaired, so match nothing
                      </div>                        # closing tag '</div>'
                     ");

"(?:(?!</?div\b).)*" here is greedy mode combined with a lookahead. Although a check has to be made before every character, that check is character-based and fast. If non-greedy mode were used instead, every step would have to go through the alternation "|", and alternation has a significant impact on matching efficiency, costing far more than a character check. Another reason is that greedy mode can be combined with an atomic group to improve efficiency, whereas applying an atomic group to non-greedy mode is pointless.
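Balancing groups are a .NET feature, and Python's re module has nothing equivalent. Purely to illustrate the push and pop bookkeeping that the Open group performs, here is a sketch that swaps in a different technique: a simple regular expression finds the div tags and an explicit depth counter pairs them up. The function outer_divs is my own hypothetical helper, and it does not reproduce the backtracking behavior of the .NET expression on unbalanced input.

```python
import re

# Matches an opening or closing div tag; group 1 is "/" for a closing tag.
TAG = re.compile(r'<(/?)div\b[^>]*>', re.IGNORECASE)

def outer_divs(html):
    """Return each outermost, properly closed <div>...</div> block in html."""
    blocks, depth, start = [], 0, None
    for m in TAG.finditer(html):
        if m.group(1) == '':          # opening tag: "push", depth plus 1
            if depth == 0:
                start = m.start()
            depth += 1
        elif depth > 0:               # closing tag with an open partner: "pop"
            depth -= 1
            if depth == 0:
                blocks.append(html[start:m.end()])
    return blocks

sample = 'x<div id="a">1<div>2</div>3</div>y<div>4</div>'
print(outer_divs(sample))
# ['<div id="a">1<div>2</div>3</div>', '<div>4</div>']
```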

4 Greedy and non-greedy: a final review
4.1 Reviewing an example from the matching principle
Look back at the example of section 2.1.1. It was analyzed earlier from the application point of view, but after the discussion of the matching principle it is clear that the actual process is not that simple. The process is analyzed below from the matching principle.


Figure 4-1

First "<" takes control and tries to match at position 0. The character there is "a", so the match fails and the first round ends. The second round starts at position 1 and fails in the same way. The third round starts at position 2, where "<" matches the character "<", and control passes to "d".

"d" is tested against the character "d" and succeeds, and control passes to "i". This repeats until ">" has matched the character ">", and control passes to ".*".

".*" is greedy, so starting from the character "t" after B it matches all the way to E, the end of the string, and hands control to "<".

"<" tries to match at the end of the string and fails. The engine looks back for a state to backtrack to and returns control to ".*", which gives back one character, "c", and hands control to "<" again; that attempt fails too, and the engine backtracks again. This repeats until ".*" gives back the character "<", by which point it has given back the substring "</div>cc". "<" then matches the character "<", and control passes to "/".

Next "/", "d", "i", "v" and ">" each match their characters, and the whole expression has matched.

4.2 Greedy and non-greedy: details of quantifiers
4.2.1 Non-greedy mode of interval quantifiers
The non-greedy examples so far all used "*?" and no interval quantifier. Most people who have worked with regular expressions understand "*?" and "+?", but the non-greedy form of an interval quantifier, such as "{m,n}?", is often either unfamiliar or misunderstood, mainly because it is rarely needed and so gets overlooked.

First, to be clear, "{m,n}" is a match-first quantifier: although it has an upper limit, until that limit is reached it still matches whenever it can, that is, as much as possible. "{m,n}?" is the corresponding ignore-first quantifier: whenever it may either match or not match, it matches as little as possible.

The following example illustrates an application of this non-greedy form.

Example (limiting the length while matching as little as possible):

Requirement: match from the start up to the first occurrence of abc, with the length in between limited to 100 characters.

csdn.{1,100}abc written this way gives the longest match (of the possible runs of 1 to 100 characters, I need the shortest).

For example, for csdnfddabckjdsfjabc the match result should be: csdnfddabc

Regular expression: csdn.{1,100}?abc
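The difference between the two interval quantifiers can be seen directly in a short sketch, written in Python's re for illustration with the lowercase example string.

```python
import re

text = "csdnfddabckjdsfjabc"

# Match-first interval: takes as many characters as it can, up to 100,
# and only gives back enough for the final "abc" to match.
greedy = re.search(r"csdn.{1,100}abc", text).group()

# Ignore-first interval: takes one character, then only as many more as
# are needed for "abc" to match.
lazy = re.search(r"csdn.{1,100}?abc", text).group()

print(greedy)  # csdnfddabckjdsfjabc
print(lazy)    # csdnfddabc
```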

Some people may find this example puzzling, but consider: "*" is equivalent to "{0,}" and "+" is equivalent to "{1,}", so "*?" is "{0,}?", which is the form "{m,}?" whose upper limit is infinity. If the upper limit is a fixed value instead, the form is "{m,n}?". Seen this way it should be easy to understand.

"{m}" is not counted among the match-first quantifiers, and likewise "{m}?", although some languages support it, is not counted among the ignore-first quantifiers. The reason is that the two behave identically: the modified sub-expression must match exactly m times for the match to succeed, and no state is left for giving anything back, so the question of match-first or ignore-first does not arise and they are outside the scope of this discussion. Discussing them would add nothing anyway; it is enough to know that their matching behavior is the same.

4.2.2 The lower bound of ignore-priority quantifiers
The lower bound of match-priority quantifiers is easy to understand. "?" is equivalent to "{0,1}": the sub-expression it modifies matches at least 0 times and at most 1 time. "*" is equivalent to "{0,}": the sub-expression it modifies matches at least 0 times and has no upper limit. "+" is equivalent to "{1,}": the sub-expression it modifies matches at least 1 time and has no upper limit.

The lower bound of ignore-priority quantifiers, by contrast, is easily overlooked.

"??" is also an ignore-priority quantifier, so the sub-expression it modifies uses non-greedy mode. That sub-expression matches at least 0 times and at most 1 time. During matching it follows the non-greedy principle: it first declines to match (matches 0 times) and records a backtracking state, and it attempts a match only when the rest of the expression forces it to.

The sub-expression modified by "*?" matches at least 0 times and has no upper limit. The sub-expression modified by "+?" matches at least 1 time and has no upper limit. Note that although "+?" is non-greedy, it must first match once to satisfy its lower bound, and only after that does it start declining to match.
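The lower bounds described above are easy to see when each lazy quantifier is the last element of the pattern, so that nothing forces it past its minimum. The following Python sketch is only an illustration; the sample strings are arbitrary.

```python
import re

text = "aaa"

# With nothing after the quantifier, each lazy form stops at its lower bound.
q_lazy = re.match(r"a??", text).group()     # lower bound 0
star_lazy = re.match(r"a*?", text).group()  # lower bound 0
plus_lazy = re.match(r"a+?", text).group()  # lower bound 1

print(repr(q_lazy), repr(star_lazy), repr(plus_lazy))  # '' '' 'a'

# When the rest of the pattern cannot match otherwise, the lazy quantifier
# is forced to match more than its lower bound.
forced_q = re.match(r"a??b", "ab").group()
forced_plus = re.match(r"a+?b", "aaab").group()

print(forced_q, forced_plus)  # ab aaab
```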

4.3 Summary of greedy and non-greedy patterns
• Greedy and non-greedy from the point of view of syntax

A sub-expression modified by a match-priority quantifier uses greedy mode; a sub-expression modified by an ignore-priority quantifier uses non-greedy mode.

Match-priority quantifiers include: "{m,n}", "{m,}", "?", "*", and "+".

Ignore-priority quantifiers include: "{m,n}?", "{m,}?", "??", "*?" and "+?".

• Greedy and non-greedy from the point of view of application

Greedy and non-greedy modes affect the matching behavior of sub-expressions modified by quantifiers. Greedy mode matches as much as possible provided the whole expression can still match; non-greedy mode matches as little as possible provided the whole expression can still match. Non-greedy mode is supported only by some NFA engines.

• Greedy and non-greedy from the point of view of matching principle

When a greedy pattern and a non-greedy pattern can achieve the same result, the greedy pattern is usually the more efficient of the two.

Every non-greedy pattern can be converted into a greedy one by changing the sub-expression that the quantifier modifies.

Greedy mode can be combined with atomic groups to improve matching efficiency; non-greedy mode cannot.
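The conversion from a non-greedy pattern to a greedy one can be sketched in Python as follows. The two rewritten patterns are illustrative choices, not necessarily the ones the author had in mind: the negated character class is equivalent only under the assumption that the `<div>` content contains no "<", while the lookahead form is the general rewrite.

```python
import re

source = "aa<div>test1</div>bb<div>test2</div>cc"

# Non-greedy original.
lazy = re.findall(r"<div>.*?</div>", source)

# Greedy rewrite 1: restrict what the quantified sub-expression may consume.
# Assumption: the content between the tags never contains "<".
greedy_class = re.findall(r"<div>[^<]*</div>", source)

# Greedy rewrite 2: consume any character that does not start "</div>".
greedy_lookahead = re.findall(r"<div>(?:(?!</div>).)*</div>", source)

print(lazy)              # ['<div>test1</div>', '<div>test2</div>']
print(greedy_class)      # ['<div>test1</div>', '<div>test2</div>']
print(greedy_lookahead)  # ['<div>test1</div>', '<div>test2</div>']
```

In both rewrites the quantifier stays greedy; what changes is the sub-expression it modifies, so that it can no longer run past the first closing tag.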
