1 Overview
Greedy and non-greedy modes affect the matching behavior of sub-expressions modified by quantifiers. Provided the whole expression can match, greedy mode matches as much as possible, while non-greedy mode matches as little as possible. Non-greedy mode is supported only by some NFA engines.
The greedy quantifiers, also known as match-priority quantifiers, are:
"{m,n}", "{m,}", "?", "*", and "+".
In some languages that use an NFA engine, appending "?" to a match-priority quantifier turns it into a non-greedy quantifier, also called an ignore-priority quantifier. These are:
"{m,n}?", "{m,}?", "??", "*?" and "+?".
In terms of regex syntax, a sub-expression modified by a match-priority quantifier uses greedy mode, for example "(expression)+"; a sub-expression modified by an ignore-priority quantifier uses non-greedy mode, for example "(expression)+?".
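As a minimal sketch of the two quantifier families, the following uses Python's built-in re module, a backtracking engine that supports both forms. The sample text "aaaa" and the variable names are illustrative assumptions, not part of the original article.

```python
import re

text = "aaaa"  # illustrative sample text (assumption)

# Each pair is (greedy quantifier, its non-greedy counterpart).
pairs = [
    (r"a{2,3}", r"a{2,3}?"),
    (r"a{2,}", r"a{2,}?"),
    (r"a?", r"a??"),
    (r"a*", r"a*?"),
    (r"a+", r"a+?"),
]

results = {}
for greedy, lazy in pairs:
    # re.match anchors at the start of the text, so both forms start at index 0.
    results[greedy] = re.match(greedy, text).group()
    results[lazy] = re.match(lazy, text).group()
    print(greedy, repr(results[greedy]), "|", lazy, repr(results[lazy]))
```

In each pair the greedy form takes the longest run the quantifier allows, and the non-greedy form takes the shortest, which for "*?" and "??" is the empty string.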
Documentation is largely consistent in using the name "greedy mode", but non-greedy mode goes by several names: some call it lazy mode, others reluctant mode. The name does not matter as long as you understand the principle and the usage well enough to apply it. I am used to the terms greedy and non-greedy, so this article uses them.
2 Matching principles of greedy and non-greedy modes
Greedy and non-greedy modes can be understood from two angles, application and matching principle, but to truly master them you need to understand the matching principle.
The application perspective answers the question "What are greedy and non-greedy modes?"
2.1 Analysis of greedy and non-greedy patterns from the perspective of application
2.1.1 What is greedy and non-greedy mode
Let's look at an example.
Example:
Source string: aa<div>test1</div>bb<div>test2</div>cc
Regular expression one: <div>.*</div>
Match result one: <div>test1</div>bb<div>test2</div>
Regular expression two: <div>.*?</div>
Match result two: <div>test1</div> (this is the result of a single match, so <div>test2</div> is not included)
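The example above can be reproduced with a short sketch in Python's re module. It assumes the lowercase form of the source string, matching the case of the two expressions; the variable names are illustrative.

```python
import re

# Source string from the example, in lowercase to match the expressions.
source = "aa<div>test1</div>bb<div>test2</div>cc"

# Regular expression one: greedy ".*"
greedy = re.search(r"<div>.*</div>", source).group()

# Regular expression two: non-greedy ".*?"
lazy = re.search(r"<div>.*?</div>", source).group()

print(greedy)  # <div>test1</div>bb<div>test2</div>
print(lazy)    # <div>test1</div>
```

re.search returns only the first match, which is why the non-greedy result does not include the second div.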
Based on the example above, the matching behavior shows what greedy and non-greedy modes are.
Regular expression one uses greedy mode. Matching up to the first "</div>" would already let the whole expression match, but because the mode is greedy, the engine keeps trying to the right to see whether a longer substring can also match. It reaches the second "</div>", finds nothing further to the right that can match, and the match ends with the result "<div>test1</div>bb<div>test2</div>". The actual matching process does not work this way, of course; the analysis of the matching principle in 2.2 covers it.
From the application perspective alone, greedy mode can be described as matching as much as possible, provided the whole expression matches. This is the so-called "greed": put plainly, take as much as you can of what you want until there is nothing more to take.
Regular expression two uses non-greedy mode. Matching up to the first "</div>" lets the whole expression match, and because the mode is non-greedy, the match ends there without trying further to the right. The match result is "<div>test1</div>".
From the application perspective alone, non-greedy mode can be described as matching as little as possible, provided the whole expression matches. This is the so-called "non-greed": put plainly, once you have found one thing you want, take it and stop, regardless of whether more remains.
2.1.2 Description of Prerequisites
The application-level analysis above repeatedly mentions one precondition: "the whole expression matches". The following example shows why this premise matters.
Regular expression three: <div>.*</div>bb
Match result three: <div>test1</div>bb
The quantifier modifying "." is still the match-priority quantifier "*", so this is still greedy mode. On its own, the leading "<div>.*</div>" could still match "<div>test1</div>bb<div>test2</div>", but the "bb" that follows would then fail to match, so "<div>.*</div>" must give up "bb<div>test2</div>" for the whole expression to match. The whole expression then matches "<div>test1</div>bb", and "<div>.*</div>" matches "<div>test1</div>". So greedy mode shapes what a sub-expression matches only on the premise that the whole expression matches; if the whole expression fails, greedy mode affects only the matching process, and there is no match result for it to influence.
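A hedged sketch of regular expression three in Python's re module follows. The capture group is added only to expose what the "<div>.*</div>" part matched, and the "dd" suffix is a hypothetical pattern chosen so that the whole expression fails; neither is from the original article.

```python
import re

source = "aa<div>test1</div>bb<div>test2</div>cc"

# Parentheses are added around the greedy part purely to inspect it.
m = re.search(r"(<div>.*</div>)bb", source)
whole = m.group(0)  # what the whole expression matched
sub = m.group(1)    # what the greedy sub-expression was left with

print(whole)  # <div>test1</div>bb
print(sub)    # <div>test1</div>

# Hypothetical failing case: nothing in the source is followed by "dd",
# so there is no match result for greedy mode to influence.
no_match = re.search(r"<div>.*</div>dd", source)
print(no_match)  # None
```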
The same problem exists in non-greedy mode, as seen in the following example.
Regular expression four: <div>.*?</div>cc
Match result four: <div>test1</div>bb<div>test2</div>cc
Non-greedy mode is used here. The leading "<div>.*?</div>" still matches "<div>test1</div>" first, but the "cc" that follows fails to match, so "<div>.*?</div>" must keep matching to the right until it has matched "<div>test1</div>bb<div>test2</div>". The "cc" then matches, and the whole expression succeeds with the result "<div>test1</div>bb<div>test2</div>cc", of which "<div>.*?</div>" matches "<div>test1</div>bb<div>test2</div>". So non-greedy mode likewise shapes what a sub-expression matches only on the premise that the whole expression matches; if the whole expression fails, non-greedy mode cannot influence the result.
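The same kind of sketch for regular expression four, again using Python's re module with the lowercase source string. The capture group is an addition for inspection only, not part of the original expression.

```python
import re

source = "aa<div>test1</div>bb<div>test2</div>cc"

# Parentheses are added around the non-greedy part purely to inspect it.
m = re.search(r"(<div>.*?</div>)cc", source)
whole = m.group(0)  # what the whole expression matched
sub = m.group(1)    # what the non-greedy sub-expression had to consume

print(whole)  # <div>test1</div>bb<div>test2</div>cc
print(sub)    # <div>test1</div>bb<div>test2</div>
```

Even though ".*?" prefers the shortest match, it is forced past the first "</div>" because only the second one is followed by "cc".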
2.1.3 Greedy or non-greedy: the choice in practice
The application-level analysis gives a basic picture of how greedy and non-greedy modes behave. Which one to use in practice depends on the requirement.
For some simple cases, such as extracting the div tag from the source string "aa<div>test1</div>bb", greedy and non-greedy modes both give the desired result, so either can be used.
In a case like the one in 2.1.1, however, what is usually wanted is one paired div tag at a time, which is what non-greedy mode matches; what greedy mode matches is usually not what is needed.
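To illustrate the choice, here is a small sketch using re.findall from Python's re module on both source strings. The variable names are illustrative assumptions.

```python
import re

simple = "aa<div>test1</div>bb"
source = "aa<div>test1</div>bb<div>test2</div>cc"

# With a single div pair, both modes return the same thing.
simple_greedy = re.findall(r"<div>.*</div>", simple)
simple_lazy = re.findall(r"<div>.*?</div>", simple)

# With two div pairs, only non-greedy mode yields one pair per match.
greedy_all = re.findall(r"<div>.*</div>", source)
lazy_all = re.findall(r"<div>.*?</div>", source)

print(simple_greedy, simple_lazy)
print(greedy_all)
print(lazy_all)
```

On the two-pair source, the greedy pattern returns a single item spanning both tags, while the non-greedy pattern returns the two div pairs separately.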
Why, then, does greedy mode exist at all? The application perspective can hardly give a satisfactory answer; greedy and non-greedy modes need to be analyzed from the perspective of the matching principle.
2.2 Analysis of greedy and non-greedy modes from the point of view of matching principle
To truly understand what greedy and non-greedy modes are, when each should be used, and how efficient each is, the application perspective is not enough; you need a thorough understanding of how greedy and non-greedy matching works.
Article reproduced from http://www.jb51.net/article/31491.htm
Greedy and non-greedy patterns in regular expressions