Regular Expression Quantifiers

Last Update:2014-07-26 Source: Internet

Author: User

Tags expression engine

Quantifiers, as the name implies, specify how many times something is matched. Metacharacters such as \w, \d, and [0-9] each match only one character. To match several characters you would have to write the metacharacter several times, which is tedious. Regular expressions therefore provide quantifiers, which control how many times the preceding metacharacter is matched, from zero times up to an unlimited number of times.

Regular expressions have three types of quantifiers: greedy quantifiers (also called match-first or standard quantifiers), lazy quantifiers (also called ignore-first quantifiers), and possessive quantifiers. Greedy quantifiers are supported by all regular expression tools, while lazy and possessive quantifiers are supported only by some. This article covers greedy quantifiers first and lazy quantifiers second. It does not cover possessive quantifiers, which are related to atomic grouping. Greedy and lazy quantifiers correspond to what are commonly called the greedy and non-greedy modes of a regular expression.

We start with the greedy quantifiers *, +, ?, and {num,num}. Greedy means that the quantifier matches as many characters as possible. Every quantifier has a lower limit and an upper limit.

The asterisk (*) has a lower limit of [0] and an upper limit of [unlimited]: it may match no characters at all, or any number of characters.

The plus sign (+) has a lower limit of [1] and an upper limit of [unlimited]: it must match at least one character and may match any number of characters.

The question mark (?) has a lower limit of [0] and an upper limit of [1].
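The lower and upper limits listed above can be checked with a short Python sketch. It uses only the standard re module. The helper name accepts and the sample strings are made up for illustration and do not come from the original article.

```python
import re


def accepts(pattern: str, text: str) -> bool:
    """Return True when the pattern matches the whole text."""
    return re.fullmatch(pattern, text) is not None


# * : lower limit 0, upper limit unlimited
star_empty = accepts(r"\d*", "")        # zero digits is allowed
star_many = accepts(r"\d*", "2233")     # many digits are allowed

# + : lower limit 1, upper limit unlimited
plus_empty = accepts(r"\d+", "")        # zero digits is rejected
plus_many = accepts(r"\d+", "2233")     # many digits are allowed

# ? : lower limit 0, upper limit 1
opt_empty = accepts(r"\d?", "")         # zero digits is allowed
opt_one = accepts(r"\d?", "2")          # one digit is allowed
opt_two = accepts(r"\d?", "22")         # two digits exceed the upper limit

print(star_empty, star_many)            # True True
print(plus_empty, plus_many)            # False True
print(opt_empty, opt_one, opt_two)      # True True False
```

re.fullmatch is used here so that the whole string has to be consumed by the quantified \d, which makes the limits visible directly.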
With these limits, the question mark (?) may match no character at all, or at most one character.

The {num,num} structure can be called a counting quantifier. Its lower and upper limits are given in the expression itself. For example, {5,10} means that the lower limit is [5] and the upper limit is [10]: at least five and at most ten characters must be matched.

What a quantifier counts is the metacharacter immediately to its left. Take the regular expression \d+. The metacharacter \d stands for a digit, so the expression must match at least one digit and may match any number of consecutive digits. If the text is [1], the match succeeds and the matched content is [1]. If the text is [22334455], the match also succeeds and the matched content is [22334455]. If the text is a date whose year, month, and day are separated by non-digit characters, for example [1992-8-4], the expression matches three times, and the matched contents are [1992], [8], and [4].

Here is an example of a counting quantifier. Regular expression: \d{5,10}. If the text is [19920804], the match succeeds and the matched text is [19920804]. If the text is the same date written with separators, as in [1992-8-4], the match fails, because the expression needs at least five consecutive digits and the text contains no such run. If the text is [18722334455], the match succeeds and the matched content is [1872233445]. The final [5] is not included because the upper limit of the quantifier is 10, so at most ten consecutive digits can be matched.

Next comes a slightly harder example, which touches on how a regular expression engine gives back characters it has already matched. Regular expression: \d*3. This expression says that however many digits are matched, the last character must be 3. If the text is [11224455], the match fails.
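The \d+ and counting-quantifier examples above can be reproduced with Python's standard re module. This sketch rests on two assumptions: the counting example is taken to be \d{5,10}, and the date text is written as 1992-8-4, since the article's matches [1992], [8], [4] imply digits separated by non-digit characters. The helper first_match is a hypothetical name used only here.

```python
import re


def first_match(pattern: str, text: str):
    """Return the first substring matched by pattern, or None if there is none."""
    found = re.search(pattern, text)
    return found.group() if found else None


# \d+ : every run of one or more consecutive digits
runs_single = re.findall(r"\d+", "1")
runs_long = re.findall(r"\d+", "22334455")
runs_date = re.findall(r"\d+", "1992-8-4")     # assumed spelling of the date

# \d{5,10} : between five and ten consecutive digits
count_eight = first_match(r"\d{5,10}", "19920804")
count_date = first_match(r"\d{5,10}", "1992-8-4")       # no run of 5 digits
count_eleven = first_match(r"\d{5,10}", "18722334455")  # stops at 10 digits

print(runs_single)    # ['1']
print(runs_long)      # ['22334455']
print(runs_date)      # ['1992', '8', '4']
print(count_eight)    # 19920804
print(count_date)     # None
print(count_eleven)   # 1872233445
```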
For \d*3 and the text [11224455], \d* can match every digit, yet the overall match still fails at the final 3, because the text contains no 3. If the text is [123456], the match succeeds and the matched text is [123].

So how does this expression arrive at that match? This involves the concept of giving back. During a match of the complete regular expression, some parts are optional and some are required. A quantified part is optional beyond its lower limit: + can match one character or two, so matching the second character is not essential. When a required part that follows cannot match, the optional part is forced to give back characters it has already matched, so that the required part gets a chance to match.

In \d*3, the optional part is \d* and the required part is 3. Against the text [123456], \d* first matches all the characters because it is greedy, so it holds [123456]. The engine then moves on to 3, but there is nothing left for it to match, so \d* is forced to give back its last character, [6]. The 3 is tried against [6] and fails. \d* then gives back [5], which fails as well, and then [4], which also fails. Finally \d* gives back [3], which the 3 can match. The complete expression therefore matches the text [123]. That is the whole matching process.

Lazy quantifiers are the opposite of greedy quantifiers: they match as few characters as possible, staying as close to the lower limit as they can. The lazy quantifiers are *?, +?, ??, and {num,num}?. Their upper and lower limits are the same as those of the greedy quantifiers. The only difference in form is the [?] appended to the greedy quantifier. The following example uses the regular expression ab??.
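The give-back steps described above for \d*3 can be traced with a small hand-written simulation. This is a simplified sketch for this one pattern, anchored at the start of the text and limited to ASCII digits. It is not how any real engine is implemented, and the function name trace_digits_then_three is hypothetical. The last lines compare the result with Python's re.match.

```python
import re


def trace_digits_then_three(text: str):
    r"""Simulate a greedy \d* followed by a literal 3, starting at position 0.

    Returns (matched_text_or_None, characters_given_back_in_order).
    """
    # Greedy step: \d* takes as many leading digits as it can.
    taken = 0
    while taken < len(text) and text[taken] in "0123456789":
        taken += 1

    given_back = []
    while True:
        # Required step: the literal 3 must match right after what \d* holds.
        if taken < len(text) and text[taken] == "3":
            return text[:taken + 1], given_back
        if taken == 0:
            return None, given_back      # nothing left to give back
        taken -= 1                       # \d* gives back its last character
        given_back.append(text[taken])


matched, returned = trace_digits_then_three("123456")
failed, _ = trace_digits_then_three("11224455")

print(matched)     # 123
print(returned)    # ['6', '5', '4', '3']
print(failed)      # None

# The real engine reaches the same results.
print(re.match(r"\d*3", "123456").group())   # 123
print(re.match(r"\d*3", "11224455"))         # None
```

The list of returned characters mirrors the order in the prose: [6], [5], and [4] are given back and rejected, then [3] is given back and matched by the required 3.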
For the regular expression ab?? and the text [ab], the matched content is [a]. The b?? is lazy, so when it can either match [b] or skip it, it skips it first. If the regular expression is ab? instead, the matched text is [ab], because b? is greedy and chooses to match first. This example illustrates the difference between lazy and greedy quantifiers.

Now a more complex example, which involves the forced return of the kind described earlier. The regular expression is ab??c and the text is [abc]. The matching result is [abc]. The process is as follows. First, a matches the character [a]. Next, b?? is lazy, so it is skipped and matches nothing for the moment. Then c is tried against the character [b] in the text (nothing was matched in the previous step, so the position in the text is still at [b]), and c cannot match [b]. The match does not fail at this point. Instead, the engine goes back to b?? and has it match the [b] in the text. After that, c is tried against the character [c] and matches. The matching result is therefore [abc].

This example introduces a new concept, backtracking, which exists only in NFA engines. It will be explained later.
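The lazy examples above can be reproduced in Python, whose standard re module supports both greedy and lazy quantifiers. The patterns are assumed to be the lowercase ab??, ab?, and ab??c. The variable names and the extra \d+? and \d*? lines are illustrative additions.

```python
import re

# ab?? : the lazy b?? skips [b] first, so only [a] is matched
lazy_ab = re.match(r"ab??", "ab").group()

# ab? : the greedy b? matches [b] first
greedy_ab = re.match(r"ab?", "ab").group()

# ab??c : c cannot match [b], so the engine goes back and lets b?? match [b]
forced_abc = re.match(r"ab??c", "abc").group()

# Lazy quantifiers stay at their lower limit when nothing forces them further
lazy_plus = re.match(r"\d+?", "22334455").group()   # lower limit 1
lazy_star = re.match(r"\d*?", "22334455").group()   # lower limit 0

print(lazy_ab)           # a
print(greedy_ab)         # ab
print(forced_abc)        # abc
print(lazy_plus)         # 2
print(repr(lazy_star))   # ''
```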

This article is an English version of an article originally written in Chinese on aliyun.com.
