Common quantifiers of Regular Expressions

Source: Internet
Author: User

Besides {m,n}, regular expressions have three other common quantifiers: +, ?, and *. They look different from {m,n}, but they serve the same function (you can think of them as shorthand forms of {m,n}). For details, see Table 2-2.

 

Table 2-2 Common quantifiers

Quantifier   Equivalent {m,n} form   Description
*            {0,}                    May or may not appear; no upper limit on the number of occurrences
+            {1,}                    Appears at least once; no upper limit on the number of occurrences
?            {0,1}                   Appears at most once, or not at all

 

In practice, these three meanings are often all that needs to be expressed, so the common quantifiers are used more frequently than {m,n}.
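The equivalences in Table 2-2 can be checked directly. The following is a minimal sketch in Python; the helper name same_behavior and the sample strings are made up for illustration and do not come from the book.

```python
import re

# Each shorthand quantifier paired with its {m,n} spelling (from Table 2-2).
PAIRS = [("a*", "a{0,}"), ("a+", "a{1,}"), ("a?", "a{0,1}")]

# Illustrative inputs: zero, one, two, three occurrences, and a non-match.
SAMPLES = ["", "a", "aa", "aaa", "b"]


def same_behavior(short, long_form, samples=SAMPLES):
    """Return True if both patterns accept exactly the same sample strings."""
    for s in samples:
        short_ok = re.fullmatch(short, s) is not None
        long_ok = re.fullmatch(long_form, s) is not None
        if short_ok != long_ok:
            return False
    return True


for short, long_form in PAIRS:
    print(short, long_form, same_behavior(short, long_form))
```

Agreement on a handful of samples is not a proof, but it shows that each shorthand and its {m,n} form accept and reject the same inputs here.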

As we all know, some words are spelled differently in American and British English, such as traveler and traveller. If the second l must "appear at most once, or not at all", that is exactly what the ? quantifier expresses: travell?er, as shown in Example 2-4.

 

Example 2-4 Using the ? quantifier

import re

re.search(r"^travell?er$", "traveler") != None
#=> True

re.search(r"^travell?er$", "traveller") != None
#=> True

 

In fact, there are many such cases, such as favor and favour, or color and colour. There are many other application scenarios as well. For example, http and https are two different concepts, but both are protocol names and can be matched with https?. Likewise, a string indicating a price may be written as 100 or as ¥100, and both can be matched with ¥?100.
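The optional-character cases above can be tried out as follows. This is only a sketch: https? and ¥?100 come from the text, while favou?r and colou?r are assumed by analogy with travell?er, and the helper all_match is a hypothetical name.

```python
import re

# (pattern, strings that should all match). Only https? and ¥?100 appear
# in the text; favou?r and colou?r are assumptions made by analogy.
CASES = [
    (r"^favou?r$", ["favor", "favour"]),
    (r"^colou?r$", ["color", "colour"]),
    (r"^https?$", ["http", "https"]),
    (r"^¥?100$", ["100", "¥100"]),
]


def all_match(pattern, samples):
    """Return True if every sample is matched by the pattern."""
    return all(re.search(pattern, s) is not None for s in samples)


for pattern, samples in CASES:
    print(pattern, samples, all_match(pattern, samples))
```

Because ? allows at most one occurrence, a string such as "httpss" is rejected by ^https?$.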

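Quantifiers are also widely used when parsing HTML code. HTML is a "tag language" that contains various tags. Every tag has the same form: it starts with <, ends with >, and has "several characters" in between, where "several" means at least one character, none of which is >. To match any tag with a single regular expression, use < to match the opening <, > to match the closing >, and [^>]+ to match the "several characters" in the middle, so the entire regular expression is <[^>]+>, as shown in Example 2-5.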

 

Example 2-5 Using the + quantifier

import re

re.search(r"^<[^>]+>$", "<bold>") != None
#=> True

re.search(r"^<[^>]+>$", "</table>") != None
#=> True

re.search(r"^<[^>]+>$", "<>") != None
#=> False
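Example 2-5 tests one tag at a time. As a rough sketch of how the same expression might be applied to a longer piece of text, the following uses re.findall to collect every tag; the sample HTML string is made up for illustration.

```python
import re

# The tag expression from the text, without the ^ and $ anchors so that
# it can match anywhere inside a longer string.
TAG = re.compile(r"<[^>]+>")

html = "<p>Hello, <b>world</b>!<br/></p>"  # made-up sample input

tags = TAG.findall(html)
print(tags)
# ['<p>', '<b>', '</b>', '<br/>', '</p>']

# An empty pair of angle brackets is not a tag, so nothing is found here.
print(TAG.findall("a <> b"))
# []
```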

 

Similarly, you can use a regular expression to match a double-quoted string. The difference is that a double-quoted string may have no characters at all between the two quotation marks: "" is a perfectly legal double-quoted string. The * quantifier is therefore needed, and the whole regular expression becomes "[^"]*". See Example 2-6 for the program.

 

Example 2-6 Using the * quantifier

import re

re.search(r"^\"[^\"]*\"$", "\"some\"") != None
#=> True

re.search(r"^\"[^\"]*\"$", "\"\"") != None
#=> True

Note: Double quotation marks inside a string literal must be escaped and written as \". This is required by string escaping, not by the regular expression.

 

There is a lot to learn about using quantifiers. Let's look at several tag-matching examples. Tags can be roughly divided into open tags, close tags, and self-closing tags such as <br/>. Now let's look at the regular expressions that match these three types of tags.

An open tag starts with <, is followed by "several characters" (which cannot start with /), and ends with >, so the corresponding regular expression is <[^/][^>]*>. Note: because [^/] must match one character, the rest of the "several characters" must be written as [^>]*. Otherwise the expression cannot match tags with a single-character name, such as <b>.

A close tag starts with <, followed by /, then "several characters" (which cannot start with /), and finally >, so the corresponding regular expression is </[^>]+>.

A self-closing tag starts with <, contains "several characters" in the middle, and ends with />, so the corresponding regular expression is <[^>]+/>. Note: this is not <[^>/]+/>. The negated character group excludes only >, not /, because you only need to confirm that / appears immediately before the closing >. If it were written as <[^>/]+/>, it could not match a tag that contains / elsewhere, such as one ending in src="http://somehost/picture"/>.

Table 2-3 lists the expressions matching tags.

 

Table 2-3 Expressions for matching various tags

Expression matching all tags   Tag type           Expression matching this tag type
<[^>]+>                        Open tag           <[^/>][^>]*>
                               Close tag          </[^>]+>
                               Self-closing tag   <[^>]+/>
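As a sketch of how the three expressions might be used together, the following classifies a single tag. It uses the open-tag and close-tag expressions from Table 2-3 and the self-closing expression <[^>]+/> recommended in the text. The function name classify, the returned labels, and the img tag in the usage lines are assumptions made for illustration.

```python
import re

OPEN_TAG = re.compile(r"<[^/>][^>]*>")
CLOSE_TAG = re.compile(r"</[^>]+>")
SELF_CLOSING_TAG = re.compile(r"<[^>]+/>")


def classify(tag):
    """Return "self-closing", "close", "open", or None for a single tag."""
    # Test self-closing first: the open-tag expression also accepts <br/>,
    # so the order of these checks matters.
    if SELF_CLOSING_TAG.fullmatch(tag):
        return "self-closing"
    if CLOSE_TAG.fullmatch(tag):
        return "close"
    if OPEN_TAG.fullmatch(tag):
        return "open"
    return None


for tag in ["<b>", "</table>", "<br/>", "<>"]:
    print(tag, classify(tag))

# A / inside an attribute value is fine for <[^>]+/> ...
print(classify('<img src="http://somehost/picture"/>'))
# ... but the stricter <[^>/]+/> rejects the same tag.
print(re.fullmatch(r"<[^>/]+/>", '<img src="http://somehost/picture"/>'))
```

Ordering the checks is one way to work around the overlap between the open-tag and self-closing expressions; it does not make the open-tag expression itself any more precise.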

 

Comparing the "expression matching all tags" with the "expression matching this tag type" entries in the table, we can see that their patterns are similar but the details differ. In other words, by changing character groups and quantifiers, you can precisely control the range of strings that a regular expression matches, to suit different purposes. This is in fact a fundamental rule of using regular expressions: use appropriate structures (including character groups and quantifiers) to express your intent precisely and define the text to be matched.

On closer inspection, you may notice that the expression for open tags can also match self-closing tags: <[^/][^>]*> matches <br/>, because [^>]* does not exclude /. So we change the expression to <[^/][^>]*[^/]> to ensure that a matched open tag does not end with />.

However, this creates a new problem: in <[^/][^>]*[^/]>, two negated character groups [^/] appear between < and >. As mentioned in the previous chapter, a negated character group means "match one character that is not listed, at the current position". The string inside the tag must therefore contain at least two characters, so <u> can no longer be matched.
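The trade-off between the two open-tag expressions can be observed directly. This is a small sketch; the names LOOSE, STRICT, and matches are made up for illustration.

```python
import re

LOOSE = re.compile(r"^<[^/][^>]*>$")         # original open-tag expression
STRICT = re.compile(r"^<[^/][^>]*[^/]>$")    # revised: must not end with />


def matches(pattern, text):
    """Return True if the compiled pattern matches the whole text."""
    return pattern.search(text) is not None


for tag in ["<br/>", "<table>", "<u>"]:
    print(tag, matches(LOOSE, tag), matches(STRICT, tag))
# <br/> True False     (the revised expression rejects the self-closing tag)
# <table> True True
# <u> True False       (but it also rejects a valid one-character tag)
```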

Think about it. What we really want to express is that the string inside the tag can neither start with / nor end with /. But if that string contains only one character, that character is both the beginning and the end, so using two negated character groups is clearly inappropriate. It may seem that there is no way to solve this problem. In fact, the knowledge covered so far is simply not enough to solve it; the book gives a detailed solution to this problem elsewhere.

 

 

This article is excerpted from the book "Regular Expression Guide" by Yu Sheng

Book details: http://blog.csdn.net/broadview2006/article/details/7569554
