Backtracking in regular expressions

Source: Internet
Author: User
This is my first encounter with "backtracking", and I do not know much about it yet. I am writing down what I have learned so far as notes for later reference.

Regular expression matching rests on two basic rules: the leftmost (earliest) match wins, and the standard quantifiers (*, +, ?, and {m,n}) are greedy, that is, match-first.

"The leftmost match wins" means that matching is attempted from the start of the string onward, and the first position at which a match succeeds is the one reported. Regex engines themselves fall into two types: the nondeterministic finite automaton (NFA), also called "expression-directed", and the deterministic finite automaton (DFA), also called "text-directed". The regular expressions we use in JavaScript are expression-directed. The difference between the two is awkward to explain in the abstract, so an example should make it clearer.

The code is as follows:

// Use a regular expression to match text
var reg = /to(nite|knight|night)/;
var str = 'doing tonight';
reg.test(str);

In the example above, the engine starts with the first element [t] and retries it at successive positions until it finds a 't' in the target string. It then checks whether the next character is matched by [o], and if so, moves on to the next element, (nite|knight|night), which means "nite" or "knight" or "night". The engine tries these three alternatives in turn. To try [nite], it tries [n] first, then [i], then [t], and finally [e]. If that attempt fails, the engine tries the next alternative, and so on until the match succeeds or failure is reported. Control passes from one element of the expression to the next, which is why this approach is called expression-directed.
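As a quick check of the walkthrough above, here is a small sketch that runs the same pattern and prints where the match starts and which alternative succeeded. The second pattern, /a|ab/, is my own addition (not part of the original example) to show that alternatives are tried in the order they are written.

```javascript
// The pattern and string from the example above.
const reg = /to(nite|knight|night)/;
const str = 'doing tonight';

const m = reg.exec(str);
console.log(m.index); // 6: the first 't' in the string
console.log(m[0]);    // 'tonight'
console.log(m[1]);    // 'night': 'nite' and 'knight' were tried first and failed

// Illustrative extra pattern: the engine takes the first alternative that
// works, even though the second one would match more text.
const ordered = /a|ab/.exec('ab');
console.log(ordered[0]); // 'a'
```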

Now take the same example with a text-directed engine. As it scans the string, it keeps track of every match that is currently still possible. When the engine reaches the t, it adds one potential match to those it is tracking:

Position in the string        Position in the regular expression
... doing t↑onight            Possible matches: /t↑o(nite|knight|night)/

Each character scanned after that updates the list of possible matches. After a few more characters have been scanned, the situation is:

Position in the string        Position in the regular expression
... doing toni↑ght            Possible matches: /to(ni↑te|knight|ni↑ght)/

The number of possible matches has dropped to two (knight has been eliminated). Once the g is scanned, only one possible match remains. When the h and t have been matched as well, the engine sees that the match is complete and reports success. This approach is called text-directed because each character scanned from the string controls what the engine does.
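To make the text-directed idea more concrete, here is a rough sketch that scans the string once and keeps every candidate that is still alive. It is only an illustration under heavy assumptions: scanTextDirected is a hypothetical helper of mine, it handles a fixed list of literal words rather than real regular expressions, and it simply returns the first candidate to complete. A real DFA engine is built quite differently.

```javascript
// Hypothetical helper: scan `text` one character at a time and track every
// candidate word that could still match, without ever going back in the text.
function scanTextDirected(text, words) {
  let active = []; // each entry: { word, pos }, where pos = characters matched so far
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    // Existing candidates continue; a fresh attempt may also start here.
    const candidates = active.concat(words.map(word => ({ word, pos: 0 })));
    active = [];
    for (const { word, pos } of candidates) {
      if (word[pos] !== ch) continue;       // this possibility is eliminated
      if (pos + 1 === word.length) {
        return { word, end: i + 1 };        // a candidate has been fully matched
      }
      active.push({ word, pos: pos + 1 });  // still possible, keep tracking it
    }
  }
  return null; // nothing matched
}

// The three expansions of /to(nite|knight|night)/ written out as plain words.
const result = scanTextDirected('doing tonight', ['tonite', 'toknight', 'tonight']);
console.log(result); // { word: 'tonight', end: 13 }
```

Each character of the string is looked at exactly once, which is the point of the text-directed approach: the text drives the loop, and the candidates are updated to follow it.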

To understand how an expression-directed engine works, we need today's topic: backtracking. Backtracking is like leaving a mark at every fork in the road. If you reach a dead end, you walk back until you reach a mark you left earlier, one that flags a path you have not tried yet. If that path is a dead end too, you go back again to the next mark, and repeat until you find a way out or have exhausted every untried path.

In many situations the regex engine must choose between two (or more) options. When it meets /...x?.../, the engine must decide whether or not to try to match x. With /...x+.../ there is no doubt that x must be tried at least once, because the plus sign requires at least one match. After the first x has matched, that requirement is met, and the engine must decide whether to try another x. If it does, it must then decide whether to try a third x, a fourth x, and so on. Each such choice leaves a mark recording that another option exists and is held in reserve. Backtracking raises two questions: which branch should be tried first, and which of the previously saved branches should be used when the engine has to backtrack?
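The effect of those reserved options can be observed from outside the engine. In the small sketch below (the patterns are my own illustration), x+ first takes every x it can, but the pattern still needs one more x afterwards, so the engine has to fall back on a saved option and let x+ give one character back.

```javascript
// x+ would like all three characters, but then the trailing x has nothing
// left to match. The overall match still succeeds because x+ gives one back.
const whole = /x+x/.exec('xxx');
console.log(whole[0]); // 'xxx'

// Capturing groups show how the characters ended up being divided.
const parts = /(x+)(x)/.exec('xxx');
console.log(parts[1]); // 'xx'
console.log(parts[2]); // 'x'
```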

The first question is answered by this important principle:

When the choice is between "make an attempt" and "skip the attempt", the engine makes the attempt for greedy (match-first) quantifiers and skips it for lazy (ignore-first) quantifiers.
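This principle is easy to see in JavaScript, where adding a ? after a quantifier makes it lazy. The strings and patterns below are just my own small illustration:

```javascript
// Greedy quantifiers make the attempt first, so they take as much as they can.
const greedyPlus = 'xxx'.match(/x+/)[0];
console.log(greedyPlus); // 'xxx'

// Lazy quantifiers skip the attempt first, so they take as little as they can.
const lazyPlus = 'xxx'.match(/x+?/)[0];
console.log(lazyPlus); // 'x'

// The same holds for the optional quantifier.
const greedyOpt = 'abc'.match(/ab?/)[0];
const lazyOpt = 'abc'.match(/ab??/)[0];
console.log(greedyOpt); // 'ab'
console.log(lazyOpt);   // 'a'
```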

The second question is answered by this principle:

When a local failure forces backtracking, the engine returns to the most recently saved option. The principle is LIFO (last in, first out).
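To convince myself how the two principles fit together, I wrote a toy matcher that keeps its saved states on an explicit stack. This is only a sketch under assumptions, not how a real engine is implemented: parse and matchAt are hypothetical names, the pattern may contain only literal characters and the greedy ? quantifier, and the match is attempted at a single start position.

```javascript
// Hypothetical parser: turns 'ab?c' into a list of { ch, optional } items.
function parse(pattern) {
  const items = [];
  for (let i = 0; i < pattern.length; i++) {
    const optional = pattern[i + 1] === '?';
    items.push({ ch: pattern[i], optional });
    if (optional) i++; // skip over the '?'
  }
  return items;
}

// Try to match `pattern` at position `start` of `text`.
function matchAt(pattern, text, start = 0) {
  const items = parse(pattern);
  const saved = [];   // stack of saved states: { ti, pi }
  let ti = start;     // position in the text
  let pi = 0;         // position in the pattern
  let backtracks = 0; // how many times a saved state was restored
  while (true) {
    if (pi === items.length) return { matched: true, end: ti, backtracks };
    const { ch, optional } = items[pi];
    if (optional) {
      // Greedy: make the attempt, but first leave a mark for "skip this item".
      saved.push({ ti, pi: pi + 1 });
    }
    if (text[ti] === ch) {
      ti++;
      pi++;
      continue;
    }
    // Local failure: restore the most recently saved state (LIFO), if any.
    if (saved.length === 0) return { matched: false, end: -1, backtracks };
    ({ ti, pi } = saved.pop());
    backtracks++;
  }
}

console.log(matchAt('ab?c', 'abc')); // { matched: true, end: 3, backtracks: 0 }
console.log(matchAt('ab?c', 'ac'));  // { matched: true, end: 2, backtracks: 1 }
console.log(matchAt('ab?c', 'abx')); // { matched: false, end: -1, backtracks: 1 }
```

Using push and pop on an array is what gives the LIFO behavior: the state restored is always the one that was saved last.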

Let's look at a few examples of leaving marks along the road:

1. A match without backtracking

Match "abc" with [ab?c]. After [a] has matched, the current state of the match is:

"a↑bc"    a↑b?c

Now it is the turn of [b?], and the engine must decide whether to try [b] or skip it. Because [?] is greedy, it tries the match. But so that it can recover if the attempt eventually fails, the engine adds:

"a↑bc"    ab?↑c
to its list of saved states. This means the engine can later resume the match just after [b?] in the regular expression and just before the b in the string (that is, at the current position). In effect this skips [b], which the question mark allows. Having left this mark, the engine goes on to check [b]. In this example it matches, so the new current state is:
"ab↑c"    ab?↑c

The final [c] matches as well, so the overall match is complete. The saved state is no longer needed and is discarded.

2. A match with backtracking

Now the text to match is "ac". Everything proceeds as before until [b] is tried. This time [b] obviously cannot match, so the path taken by attempting [b?] is a dead end. Because a saved state is available, this local failure does not make the overall match fail. The engine backtracks, that is, it makes the most recently saved state the current state:

"a↑c"    ab?↑c

This is the option that was left untried before [b] was attempted. Now [c] matches the c, so the overall match succeeds.

3. A failed match

Now the text to match is "abx". Before [b] is tried, the question mark causes this state to be saved:

"a↑bx"    ab?↑c

[b] matches, but that path turns out to be a dead end because [c] cannot match x. The engine goes back to the saved state, which gives the b back so that [c] is tried against it. That attempt obviously fails too. If other saved states existed, backtracking would continue, but there are none, so the match attempt that started at the current position in the string fails.
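The three cases above can be checked against the real JavaScript engine. The sketch below only shows the outcomes; the saved states and the backtracking itself happen inside the engine and are not visible from the outside.

```javascript
const pattern = /ab?c/;

// Cases 1 and 2 succeed (with and without backtracking); case 3 fails.
const results = ['abc', 'ac', 'abx'].map(s => pattern.test(s));
console.log(results); // [ true, true, false ]

// In case 2 the match skips the optional b entirely.
const skipped = pattern.exec('ac');
console.log(skipped[0]); // 'ac'
```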

That is as much as I understand about backtracking in regular expressions for now. I will add more as I learn.
