Why does the regular expression "/(.*?)d/" match "abcd" instead of "d"?

Source: Internet
Author: User
Using "(.*?)" in a regular expression to match "abcd" only gets "a".

And another question:
Why does "/(.*?)d/" match "abcd" instead of "d"?

PHP code:

About the "?" metacharacter

/* Zhihu's first question */

Reply content:

For "/(.*?)d/", why does the regex match "abcd" instead of "d"? This is still a question about non-greedy matching. I can only sigh along with the five of you.

First, the kind of regex matching you describe generally means a partial match, i.e. a search (in JavaScript that would be exec, though I have not checked). This search policy is independent of the regular expression itself: under it, the regex engine tries start positions from left to right and returns the first position where a feasible solution exists. Non-greedy only means that, from a given start position, a result is returned as soon as the conditions are met.
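A minimal sketch of this search policy, using Python's re module as a stand-in for the PHP and JavaScript engines mentioned in the question (they try start positions in the same left-to-right order for a pattern this simple, but treat that as an assumption). Both the lazy and the greedy pattern already succeed at position 0, so both return "abcd"; only when the search is told to begin at position 3 does the result shrink to "d".

```python
import re

s = "abcd"
lazy = re.compile(r"(.*?)d")
greedy = re.compile(r"(.*)d")

# search() tries start positions 0, 1, 2, ... and stops at the first success.
m_lazy = lazy.search(s)
m_greedy = greedy.search(s)
print(m_lazy.group(0), m_lazy.group(1))      # abcd abc
print(m_greedy.group(0), m_greedy.group(1))  # abcd abc

# Only when the search is told to begin at position 3 does the match become "d".
m_late = lazy.search(s, 3)
print(repr(m_late.group(0)), repr(m_late.group(1)))  # 'd' ''
```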

"*" means zero or more, so "a(.*?)" is in effect equivalent to:

a       (* = 0)
a.      (* = 1)
a..     (* = 2)
a...    (* = 3)
…       …
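To make that equivalence concrete, the sketch below expands "a(.*?)" into the fixed alternatives a, a., a.., a... and tries them in that order, stopping at the first one that matches at position 0, which is how a lazy quantifier is explored from a fixed start. The helper name first_alternative is hypothetical; its results are checked against Python's own lazy quantifier.

```python
import re
from typing import Optional

def first_alternative(s: str, suffix: str = "") -> Optional[str]:
    """Hypothetical helper: try 'a' + k dots + suffix for k = 0, 1, 2, ...

    Returning the first alternative that matches at position 0 mimics how a
    lazy 'a(.*?)' followed by suffix is explored.
    """
    for k in range(len(s) + 1):
        m = re.match("a" + "." * k + re.escape(suffix), s)
        if m:
            return m.group(0)
    return None

s = "abcd"
print(first_alternative(s))       # a     (k = 0 already succeeds)
print(first_alternative(s, "d"))  # abcd  (k = 2, i.e. 'a..d')
# Python's own lazy quantifier agrees:
print(re.match(r"a(.*?)", s).group(0))   # a
print(re.match(r"a(.*?)d", s).group(0))  # abcd
```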
No matter how your regular expression is written, the earlier a match starts in the source string, the higher the priority of that answer. Whether the quantifier is greedy or not has nothing to do with it!! @Tim Shen!!
A very important premise is that the regular expression matches successfully; that is, the engine has to find a feasible solution!

/(.*?)d/
Regular expressions scan the string character by character, from front to back:

a: yes
b: yes
c: yes
d: bingo

First: in "a..d", the two dots match the "b" and "c" of abcd.
(.*?): no need to look at this one.
Third: in "...d", the three dots before "d" match the "a", "b" and "c" of abcd. The actual effect of this match is to find where "." can first start matching and then the first "d" after it, so in this example greedy and non-greedy behave the same. The reason is that regular expressions match from front to back, not from back to front! I saw a very good post that explains this problem in terms of the NFA engine mechanism.
It is quoted below for your reference; the content most relevant to this question was underlined.
(There are two images that make it clear once you have read them.)
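The front-to-back scan described above can be traced with a hand-written matcher for the single pattern (.*?)d. This is a simplified sketch (the function name trace_lazy_d is hypothetical and it handles only this one pattern), but it follows the same order as a backtracking engine: try start positions from left to right, and at each start let (.*?) take 0, 1, 2, ... characters until the literal d matches.

```python
from typing import List, Optional, Tuple

def trace_lazy_d(s: str) -> Tuple[Optional[Tuple[int, str]], List[str]]:
    """Leftmost search for (.*?)d, recording every attempt made."""
    log: List[str] = []
    for start in range(len(s)):
        k = 0  # number of characters currently taken by (.*?)
        while start + k < len(s):
            ch = s[start + k]
            if ch == "d":
                log.append(f"start={start} k={k}: '{ch}' matches d, bingo")
                return (start, s[start:start + k]), log
            if ch == "\n":  # '.' cannot consume a newline
                log.append(f"start={start} k={k}: newline, give up this start")
                break
            log.append(f"start={start} k={k}: '{ch}' is not d, (.*?) takes it")
            k += 1
    return None, log

result, log = trace_lazy_d("abcd")
print("\n".join(log))
print(result)  # (0, 'abc')
```

The match is found at start position 0 after (.*?) has taken "abc", so later start positions such as 3 (which would give just "d") are never tried.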


Greedy and non-greedy modes can be understood from two perspectives, application and principle, but to really master them you need to understand the matching principle.

Let's first answer, from the application perspective, "What are greedy and non-greedy modes?"

2.1 Greedy and non-greedy modes from the application perspective
2.1.1 What are greedy and non-greedy modes?
Let's look at an example.

Example:

Source string: aa<p>test1</p>bb<p>test2</p>cc

Regular expression 1: <p>.*</p>

Matching result 1: <p>test1</p>bb<p>test2</p>

Regular expression 2: <p>.*?</p>

Matching result 2: <p>test1</p> (this is only one matching result, so <p>test2</p> is not included)
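The two results above can be reproduced with a short sketch in Python (the original article shows no code, so the choice of Python's re module is an assumption). re.search returns only the first match, which is the "one matching result" referred to in the parenthesis.

```python
import re

source = "aa<p>test1</p>bb<p>test2</p>cc"

greedy = re.search(r"<p>.*</p>", source)  # Regular expression 1
lazy = re.search(r"<p>.*?</p>", source)   # Regular expression 2

print(greedy.group(0))  # <p>test1</p>bb<p>test2</p>
print(lazy.group(0))    # <p>test1</p>
```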

From the example above, we can analyze what greedy and non-greedy modes mean in terms of matching behavior.

The first regular expression uses greedy mode. When it reaches the first "</p>", the entire expression can already match successfully. However, because greedy mode is used, the engine still tries to match further to the right to see whether a longer substring can also match. After it reaches the second "</p>", no longer substring to the right can match, so matching ends and the result is "<p>test1</p>bb<p>test2</p>". Of course, the actual matching process does not work quite like this; the matching principle is explained in detail later.

From the application perspective alone, greedy mode means matching as much as possible, on the premise that the entire expression matches successfully; that is what "greedy" refers to. In plain terms: grab as much of what you want as there is, and stop only when nothing you want is left.

Regular expression 2 uses non-greedy mode. Once it matches the first "</p>", the entire expression can match successfully, and because non-greedy mode is used, matching ends there without trying further to the right. The matching result is "<p>test1</p>".

From the application perspective alone, non-greedy mode means matching as little as possible, on the premise that the entire expression matches successfully; that is what "non-greedy" refers to. In plain terms: pick up the first thing you want and stop, without caring whether anything is left unpicked.

2.1.2 Prerequisites
When analyzing greedy and non-greedy modes from the application perspective, one prerequisite has been mentioned throughout: "the entire expression matches successfully". Why emphasize this premise? Consider the following example.

Regular expression 3: <p>.*</p>bb

Matching result 3: <p>test1</p>bb

Here "." is still modified by the greedy quantifier "*", so this is still greedy mode. The leading "<p>.*</p>" could still match "<p>test1</p>bb<p>test2</p>", but because the following "bb" would then fail to match, "<p>.*</p>" must give up the "bb<p>test2</p>" it had matched so that the entire expression can match successfully. The matching result of the entire expression is then "<p>test1</p>bb", and the content matched by "<p>.*</p>" is "<p>test1</p>". So it is only on the premise that "the entire expression matches successfully" that greedy mode truly affects the matching behavior of a subexpression. If the entire expression fails to match, greedy mode affects only the matching process, and its effect on the matching result cannot be discussed.
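Regular expression 3 can be checked with a short Python sketch. The capture group around "<p>.*</p>" is added here only so that the subexpression's share of the match can be printed; it does not change what the whole pattern matches. The last line shows a failing case (the "zz" suffix is made up for illustration), where there is no result to discuss.

```python
import re

source = "aa<p>test1</p>bb<p>test2</p>cc"

# Regular expression 3, with the greedy subexpression captured for inspection.
m = re.search(r"(<p>.*</p>)bb", source)
print(m.group(0))  # <p>test1</p>bb
print(m.group(1))  # <p>test1</p>

# When the whole expression cannot match, there is no result at all.
print(re.search(r"(<p>.*</p>)zz", source))  # None
```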

Non-greedy mode is subject to the same premise. Consider the example below.

Regular expression 4: <p>.*?</p>cc

Matching result 4: <p>test1</p>bb<p>test2</p>cc

Non-greedy mode is used here. "<p>.*?</p>" still first matches "<p>test1</p>", but then the following "cc" cannot match, so "<p>.*?</p>" must keep trying to the right until it has matched "<p>test1</p>bb<p>test2</p>". Now the following "cc" can match, the entire expression matches successfully, and the matched content is "<p>test1</p>bb<p>test2</p>cc", of which "<p>.*?</p>" matched "<p>test1</p>bb<p>test2</p>". So it is only on the premise that "the entire expression matches successfully" that non-greedy mode truly affects the matching behavior of a subexpression. If the entire expression fails to match, non-greedy mode cannot affect the matching behavior of subexpressions.
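The same kind of Python sketch for regular expression 4; again the capture group is added only to expose what the non-greedy subexpression ends up matching once "cc" forces it to extend.

```python
import re

source = "aa<p>test1</p>bb<p>test2</p>cc"

# Regular expression 4, with the non-greedy subexpression captured for inspection.
m = re.search(r"(<p>.*?</p>)cc", source)
print(m.group(0))  # <p>test1</p>bb<p>test2</p>cc
print(m.group(1))  # <p>test1</p>bb<p>test2</p>
```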

2.1.3 Greedy or non-greedy: choosing in practice
From the application analysis, we now basically understand the features of greedy and non-greedy modes. Whether to choose greedy or non-greedy mode in practice should be determined by the requirements.

For some simple requirements, such as the source string "aa<p>test1</p>bb", the p tag can be obtained with either greedy or non-greedy mode; which one you use makes little difference.

However, for the example in 2.1.1, what is usually needed in practice is one pair of p tags at a time, which is exactly what the non-greedy pattern matches; what the greedy pattern matches is generally not what we need.
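A sketch of both situations, again assuming Python's re module: on the short string the two modes agree, while on the 2.1.1 source string only the non-greedy pattern yields one pair of p tags per match, which is easy to see with re.findall.

```python
import re

short = "aa<p>test1</p>bb"
source = "aa<p>test1</p>bb<p>test2</p>cc"

# Simple requirement: either mode finds the single p tag.
print(re.search(r"<p>.*</p>", short).group(0))   # <p>test1</p>
print(re.search(r"<p>.*?</p>", short).group(0))  # <p>test1</p>

# One pair of tags per match: only the non-greedy pattern gives that.
print(re.findall(r"<p>.*</p>", source))   # ['<p>test1</p>bb<p>test2</p>']
print(re.findall(r"<p>.*?</p>", source))  # ['<p>test1</p>', '<p>test2</p>']
```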

Why, then, does greedy mode exist? It is hard to give a satisfactory answer from the application perspective alone, so we need to analyze greedy and non-greedy modes from the perspective of matching principles.

2.2 Greedy and non-greedy modes from the perspective of matching principles
If you really want to know what greedy and non-greedy modes are, when to use each, and how efficient each is, analyzing them from the application perspective is not enough; you need to fully understand how each of them is matched.

2.2.1 Starting from the basic matching principle
For the basic principles of NFA engine matching, see the article "NFA Engine Matching Principles".

This article mainly covers the matching principles involved in greedy and non-greedy modes. First, let's look at a simple matching process in greedy mode.

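Source string: "Regex" (the double quotes are part of the string)

Regular expression: ".*" (the double quotes are literal characters in the pattern)

Note: to show the matching process clearly, the characters in the figure were spaced widely; the actual source string is "Regex" including its double quotes. The same applies below.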

Let's take a look at the matching process.

First, the opening double quote of the regex gets control and matches the double quote at position 0. The match succeeds, and control passes to ".*".

Once ".*" gets control, because "*" is a greedy quantifier, it prefers matching whenever it has the choice between matching and not matching. It starts at "R" at position 1 and matches successfully, continues right to "e" at position 2 and matches successfully, and keeps going right until it has matched the double quote at the end of the string. Having reached the end of the string, ".*" stops and passes control to the closing double quote of the regex.

The closing double quote gets control, but because it is already at the end of the string, it fails to match. The engine looks back for a saved state available for backtracking and returns control to ".*", which gives up one character, the double quote at the end of the string, and passes control back to the closing double quote of the regex. That double quote now matches the double quote at the end of the string, and the match succeeds.

At this point the entire regular expression has matched successfully; ".*" matched "Regex", and one backtracking step was performed during matching.
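The greedy walk-through can be mirrored by a small hand-written matcher for the single pattern ".*" anchored at position 0. It is a simplified model (the name match_greedy is hypothetical and it does not handle general regexes), but it counts backtracking steps the way the text does: ".*" first runs to the end of the string, then gives back one character at a time until the closing double quote can match.

```python
import re
from typing import Optional, Tuple

def match_greedy(s: str) -> Tuple[Optional[str], int]:
    """Model of '".*"' at position 0: returns (text matched by .*, backtracks)."""
    if not s or s[0] != '"':
        return None, 0
    end = 1
    while end < len(s) and s[end] != "\n":  # .* takes every non-newline char
        end += 1
    backtracks = 0
    while end >= 1:
        if end < len(s) and s[end] == '"':  # closing quote of the regex matches
            return s[1:end], backtracks
        end -= 1         # .* gives back one character
        backtracks += 1
    return None, backtracks

inner, steps = match_greedy('"Regex"')
print(inner, steps)                             # Regex 1
print(re.match(r'"(.*)"', '"Regex"').group(1))  # Regex
```

This is only a model of the textbook process; production engines such as PCRE apply optimizations that may skip some of these attempts.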

Next, let's look at a simple matching process in non-greedy mode.

Source string: "Regex"

Regular expression: ".*?"

Let's look at the non-greedy matching process.

First, the opening double quote of the regex gets control and matches the double quote at position 0. The match succeeds, and control passes to ".*?".

Once ".*?" gets control, it is a lazy (non-greedy) quantifier: whenever it has the choice between matching and not matching, it first tries not matching. Because "*" is equivalent to "{0,}", it may match nothing at all. Starting at position 1, it first skips matching, taking no content, and passes control to the closing double quote of the regex.

The closing double quote gets control and tries to match at position 1, against "R". That fails, so the engine looks back for a saved state and returns control to ".*?", which takes one character, "R" at position 1, and passes control back to the closing double quote of the regex.

The closing double quote gets control and tries to match at position 2, against "e". That fails, so the engine again looks back for a saved state, and the process repeats until ".*?" has taken "x" and passes control to the closing double quote of the regex.

The closing double quote gets control and tries to match at position 6, against the last double quote of the string. The match succeeds.

At this point the entire regular expression has matched successfully; ".*?" matched "Regex", and five backtracking steps were performed during matching.
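The same kind of hand-written model (again hypothetical and limited to this one pattern) reproduces the non-greedy count: ".*?" starts empty, and each failure of the closing double quote sends control back so that ".*?" takes one more character. For the source string "Regex" with its quotes, that happens five times, once each for R, e, g, e and x.

```python
import re
from typing import Optional, Tuple

def match_lazy(s: str) -> Tuple[Optional[str], int]:
    """Model of '".*?"' at position 0: returns (text matched by .*?, backtracks)."""
    if not s or s[0] != '"':
        return None, 0
    end = 1          # .*? starts by taking nothing
    backtracks = 0
    while end < len(s):
        if s[end] == '"':   # closing quote of the regex matches
            return s[1:end], backtracks
        if s[end] == "\n":  # .*? cannot take a newline, so the match fails
            return None, backtracks
        end += 1            # backtrack into .*? and let it take one more char
        backtracks += 1
    return None, backtracks

inner, steps = match_lazy('"Regex"')
print(inner, steps)                              # Regex 5
print(re.match(r'"(.*?)"', '"Regex"').group(1))  # Regex
```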

Full text link:
Regular Expression basics-

Thanks to the original author

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.
