Regular Expressions: Quantifiers


As the name implies, quantifiers specify quantity. A metacharacter such as \w or \d, or a character class such as [0-9], matches exactly one character. To match several characters you would have to repeat the metacharacter several times, which is tedious. Regular expressions therefore use quantifiers to state how many times the preceding item should match, anywhere from zero times to an unlimited number of times.

There are three kinds of quantifiers in regular expressions: greedy quantifiers (the standard quantifiers), lazy quantifiers, and possessive quantifiers. Greedy quantifiers are supported by virtually every regular expression tool, while lazy and possessive quantifiers are supported only by some. This article covers greedy quantifiers first and lazy quantifiers second. It does not cover possessive quantifiers, which are closely related to atomic groups. Greedy and lazy quantifiers correspond to what are commonly called the greedy and non-greedy modes of a regular expression.

Let's start with the greedy quantifiers *, +, ?, and {min,max}. Greedy means that each of them matches as much as possible. Every quantifier has a lower limit and an upper limit.

The asterisk * has a lower limit of 0 and no upper limit: it may match nothing at all, or any number of characters.

The plus sign + has a lower limit of 1 and no upper limit: it must match at least one character and may match any number of them.

The question mark ? has a lower limit of 0 and an upper limit of 1: it matches either nothing or exactly one character.
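A quick way to see these lower and upper limits is to check which strings each quantifier accepts in full. The sketch below uses Python's re module, which is an assumption since the article is not tied to any one tool. It relies on re.fullmatch and a hypothetical helper named accepted, and the single character a stands in for any one-character item.

```python
import re


def accepted(pattern, candidates=("", "a", "aa", "aaa")):
    """Return the candidates that `pattern` matches in full (hypothetical helper)."""
    return [s for s in candidates if re.fullmatch(pattern, s)]


# Each greedy quantifier applied to the single character "a".
for pattern in ("a*", "a+", "a?"):
    print(pattern, accepted(pattern))
# a* ['', 'a', 'aa', 'aaa']
# a+ ['a', 'aa', 'aaa']
# a? ['', 'a']
```

a+ rejects the empty string because its lower limit is 1. a? rejects aa and aaa because its upper limit is 1.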
The {min,max} form is called an interval (counting) quantifier: its lower and upper limits are written out explicitly. For example, {5,10} has a lower limit of 5 and an upper limit of 10, so it must match at least 5 times and at most 10 times.

A quantifier applies to the item immediately to its left. In the regular expression \d+, for example, \d stands for a digit, so the expression matches at least one digit and as many consecutive digits as are available. If the text is 1, the match succeeds and the matched content is 1. If the text is 22334455, the match also succeeds, and the matched content is 22334455. If the text is the date 1992/8/4, searching the whole text finds three matches: 1992, 8, and 4.

An example with an interval quantifier is \d{5,10}. If the text is 19920804, the match succeeds and the matched text is 19920804. If the text is 1992/8/4, the match fails, because the expression needs at least five consecutive digits and the text contains no such run. If the text is 18722334455, the match succeeds and the matched content is 1872233445. The final 5 is not matched because the upper limit is 10, so at most 10 consecutive digits can be matched.

Next comes a slightly harder example, which touches on how a regex engine gives characters back while matching. Consider the regular expression \d*3: however many digits \d* matches, they must be followed by a 3. If the text is 11224455, the match fails.
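The \d+, \d{5,10}, and \d*3 examples can be checked with Python's re module, again assumed here as a convenient tool. re.findall returns every non-overlapping match, which corresponds to the lists of matched content in the text. The date string 1992/8/4 is an assumed stand-in for the article's date text, chosen so that \d+ yields 1992, 8, and 4 as described.

```python
import re

# Greedy \d+: one or more consecutive digits. re.findall scans the whole
# text and returns every non-overlapping match.
print(re.findall(r"\d+", "1"))         # ['1']
print(re.findall(r"\d+", "22334455"))  # ['22334455']
print(re.findall(r"\d+", "1992/8/4"))  # ['1992', '8', '4']

# Interval quantifier \d{5,10}: 5 to 10 consecutive digits, as many as possible.
print(re.findall(r"\d{5,10}", "19920804"))     # ['19920804']
print(re.findall(r"\d{5,10}", "1992/8/4"))     # [] since no run of 5 digits exists
print(re.findall(r"\d{5,10}", "18722334455"))  # ['1872233445'], capped at 10 digits

# \d*3: any number of digits followed by a 3. The text contains no 3,
# so the search fails at every starting position.
print(re.search(r"\d*3", "11224455"))  # None
```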
Even though \d* can match all of the digits, the match fails because the text contains no 3 for the final 3 in the pattern to match. If the text is 123456, the match succeeds and the matched text is 123. How does the engine arrive at that result? The answer involves the concept of giving back.

While matching a complete regular expression, some parts of the pattern are optional. With +, for example, the first repetition is required, but any further repetition is optional. When a later part of the pattern must match and cannot, the engine forces an optional part to give back characters it has already consumed.

In \d*3, the optional part is \d* and the required part is 3. When matching the text 123456, \d* is greedy, so it first consumes all six characters, 123456. The engine then moves on to 3, but there is no text left for it to match, so \d* is forced to give back its last character, 6. The 3 cannot match 6, so \d* gives back 5, which also fails, and then 4, which fails too. When \d* gives back the character 3, the pattern's 3 matches it, so the complete expression matches 123. That is the entire matching process.

Lazy quantifiers are the opposite of greedy quantifiers: they match as little as possible, staying as close to their lower limit as they can. The lazy quantifiers are *?, +?, ??, and {min,max}?. Their lower and upper limits are the same as those of the corresponding greedy quantifiers; syntactically, each is simply a greedy quantifier followed by an extra ?. For example, ab?? is the lazy counterpart of ab?.
Given the text ab, the regular expression ab?? matches only a. Because b?? is lazy, when it has to decide whether to match the b, it first chooses to skip it. With the regular expression ab?, the match is ab instead, because b? is greedy and first chooses to match. This example shows the difference between lazy and greedy quantifiers.

Now for a more complex example that involves forced giving back: the regular expression ab??c applied to the text abc. The result is abc, and the matching proceeds as follows. First, a in the pattern matches the character a. Next, the lazy b?? initially chooses not to match anything. The pattern's c is then compared with the character b; b?? consumed nothing, so the current position in the text is still at b. Since c cannot match b, the comparison fails. The overall match does not fail at this point: the engine goes back to b?? and tries its other option, matching the b. Then c is compared with the character c, which matches, so the result is abc.

This example introduces a new concept, backtracking, which only NFA engines use. It will be explained later.
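Both walkthroughs in this article can be reproduced with a toy backtracking matcher: \d*3 giving back digits, and the lazy b?? being forced to take the b in ab??c. The sketch below is written in Python for illustration. backtrack_match, lit, and digit are hypothetical helpers, not part of any library. The matcher handles only a sequence of single-character items, each with a greedy or lazy quantifier, and it matches at the start of the text like re.match. Python's re module is used only as a cross-check; this is not how re is implemented.

```python
import re


def lit(ch, lo=1, hi=1, lazy=False):
    """A literal character with quantifier limits {lo,hi}; hi=None means unlimited."""
    return (ch, lambda c: c == ch, lo, hi, lazy)


def digit(lo=1, hi=1, lazy=False):
    r"""\d restricted to ASCII 0-9 (Python's re also accepts other Unicode digits)."""
    return (r"\d", lambda c: "0" <= c <= "9", lo, hi, lazy)


def backtrack_match(tokens, text):
    """Match `tokens` at the start of `text` by backtracking.

    Returns (matched text or None, trace), where trace lists every
    (item, count) the matcher tried, in order.
    """
    trace = []

    def step(i, pos):
        if i == len(tokens):
            return pos  # every item matched; pos is the end of the match
        label, test, lo, hi, lazy = tokens[i]
        # Count how many characters this item could take starting at pos.
        avail = 0
        while (pos + avail < len(text)
               and (hi is None or avail < hi)
               and test(text[pos + avail])):
            avail += 1
        if avail < lo:
            return None  # the required minimum is not there
        # Greedy: take as many as possible, then give back one at a time.
        # Lazy: take as few as possible, then accept one more at a time.
        counts = range(lo, avail + 1) if lazy else range(avail, lo - 1, -1)
        for n in counts:
            trace.append((label, n))
            end = step(i + 1, pos + n)
            if end is not None:
                return end
        return None  # no count worked, so the caller must backtrack

    end = step(0, 0)
    return (text[:end] if end is not None else None), trace


CASES = [
    (r"\d*3", [digit(0, None), lit("3")], "123456"),
    (r"\d*?3", [digit(0, None, lazy=True), lit("3")], "123456"),
    (r"\d*3", [digit(0, None), lit("3")], "11224455"),
    (r"ab?", [lit("a"), lit("b", 0, 1)], "ab"),
    (r"ab??", [lit("a"), lit("b", 0, 1, lazy=True)], "ab"),
    (r"ab??c", [lit("a"), lit("b", 0, 1, lazy=True), lit("c")], "abc"),
]

for pattern, tokens, text in CASES:
    found, trace = backtrack_match(tokens, text)
    m = re.match(pattern, text)
    assert found == (m.group() if m else None)  # agrees with Python's re
    print(f"{pattern:6} on {text:9} -> {found!r:6} tried {trace}")
```

For \d*3 on 123456, the trace shows \d* holding 6, 5, 4, 3, and finally 2 digits before the 3 matches. This is the give-back sequence described earlier. For ab??c on abc, the trace shows b?? first taking 0 characters, then being backtracked into taking 1. Greedy attempts count down from the most an item can take; lazy attempts count up from its lower limit.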
