Regular expression basics: capturing groups

Source: Internet
Author: User

1 Overview

1.1 What is a capturing group

A capturing group saves the text matched by a subexpression of a regular expression into a group in memory. Each group is numbered, and may also be given an explicit name, so that it can be referenced later. The reference can occur either inside the regular expression or outside it.

Capturing groups come in two forms: normal capturing groups and named capturing groups. The term "capturing group" on its own usually means a normal capturing group. The syntax is as follows:

Normal capturing group: (expression)

Named capturing group: (?<name>expression)

Normal capturing groups are available in most languages and tools that support regular expressions, whereas named capturing groups are available only in some of them, such as .NET, PHP, and Python; Java added support in version 7. The named capturing group syntax given above is the .NET syntax; in .NET, (?'name'expression) is equivalent to (?<name>expression). In PHP and Python, the named capturing group syntax is (?P<name>expression).

Note also that, apart from the normal and named forms described above, other constructs of the form (?...) are not capturing groups.
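The difference between capturing and non-capturing constructs can be checked by counting groups. The following is a minimal sketch using Python's re module; the choice of Python and the sample pattern are assumptions made for illustration and are not part of the original article. The pattern contains a non-capturing group (?:...) and a lookahead (?=...), neither of which receives a group number.

```python
import re

# (?:...) groups without capturing; (?=...) is a lookahead.
# Only (\d{2}) is a capturing group.
pattern = re.compile(r"(?:\d{4})-(\d{2})(?=-)")

match = pattern.search("2008-12-31")

group_count = pattern.groups   # number of capturing groups in the pattern
captured = match.groups()      # tuple of captured substrings
whole = match.group(0)         # text consumed by the whole expression

print(group_count, captured, whole)
```

Only one group is reported, and the text checked by the lookahead is not included in the overall match.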

1.2 Capturing group numbering

Numbering here means the rules by which capturing groups are assigned numbers. The rules are straightforward when a regular expression contains only normal capturing groups or only named capturing groups, and slightly more complex when it mixes the two.

Before going into detail, note that group number 0 always refers to the text matched by the entire regular expression; this holds in essentially every language that supports capturing groups. The numbering of the other groups is discussed below.

1.2.1 Numbering of normal capturing groups

If capturing groups are not explicitly named, that is, only normal capturing groups are used, every group must be accessed by number. In that case the groups are numbered from left to right in the order of their opening parentheses, starting from 1.

Regular expression: (\d{4})-(\d{2}-(\d\d))

This regular expression matches a date in the format yyyy-MM-dd. To make the groups easy to tell apart in the table below, the month is written as \d{2} and the day as \d\d.

Matching the string 2008-12-31 against this regular expression gives:

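| Number | Name | Capturing group | Matched content |
|--------|------|-----------------|-----------------|
| 0 | | (\d{4})-(\d{2}-(\d\d)) | 2008-12-31 |
| 1 | | (\d{4}) | 2008 |
| 2 | | (\d{2}-(\d\d)) | 12-31 |
| 3 | | (\d\d) | 31 |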
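The numbering above can be reproduced with a short script. This is a sketch in Python's re module (Python is an assumption here; the article itself is written against .NET), using the same pattern and input string as the table.

```python
import re

pattern = re.compile(r"(\d{4})-(\d{2}-(\d\d))")
match = pattern.fullmatch("2008-12-31")

# Group 0 is the whole match; groups 1..3 follow the order
# of their opening parentheses, from left to right.
numbered = {n: match.group(n) for n in range(pattern.groups + 1)}

for number, text in numbered.items():
    print(number, text)
```

The nested group (\d\d) receives number 3 because its opening parenthesis comes after that of the group that encloses it.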

1.2.2 Numbering of named capturing groups

Named capturing groups carry an explicit name, so a group can be accessed directly by name without counting group numbers. This also means that references are unaffected when groups are added to or removed from the regular expression as it evolves.

It is easy to overlook, however, that named capturing groups are numbered as well. When a regular expression contains only named capturing groups, they are numbered from left to right in the order of their opening parentheses, starting from 1.
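Both points can be illustrated with a small sketch. It is written in Python, so it uses the (?P<name>...) syntax; the patterns, the group names era, year and month, and the sample strings are hypothetical and chosen only for illustration. Extending the pattern with a new leading group shifts the number of the year group, while access by name keeps working unchanged.

```python
import re

# Original pattern: 'year' is group 1, 'month' is group 2.
before = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})")
# Extended pattern: a new optional group in front shifts every number by one.
after = re.compile(r"(?P<era>AD )?(?P<year>\d{4})-(?P<month>\d{2})")

m_before = before.search("2008-12")
m_after = after.search("AD 2008-12")

# Access by name is stable across both versions of the pattern.
year_by_name = (m_before.group("year"), m_after.group("year"))
# Access by number silently changes meaning.
group_one = (m_before.group(1), m_after.group(1))
# Named groups still have numbers; groupindex maps name -> number.
year_numbers = (before.groupindex["year"], after.groupindex["year"])

print(year_by_name, group_one, year_numbers)
```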

Regular expression: (?<year>\d{4})-(?<date>\d{2}-(?<day>\d\d))

Matching the string 2008-12-31 against this regular expression gives:

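| Number | Name | Capturing group | Matched content |
|--------|------|-----------------|-----------------|
| 0 | | (?<year>\d{4})-(?<date>\d{2}-(?<day>\d\d)) | 2008-12-31 |
| 1 | year | (?<year>\d{4}) | 2008 |
| 2 | date | (?<date>\d{2}-(?<day>\d\d)) | 12-31 |
| 3 | day | (?<day>\d\d) | 31 |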
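As a quick check of this table, the sketch below runs the same pattern in Python, rewritten in Python's (?P<name>...) syntax (using Python rather than .NET is an assumption of this example). The groupindex attribute exposes the number assigned to each name.

```python
import re

pattern = re.compile(r"(?P<year>\d{4})-(?P<date>\d{2}-(?P<day>\d\d))")
match = pattern.fullmatch("2008-12-31")

name_to_number = dict(pattern.groupindex)  # number assigned to each name
by_name = match.groupdict()                # captured text keyed by name
by_number = match.groups()                 # captured text for groups 1..3

print(name_to_number)
print(by_name)
print(by_number)
```

Each named group can be read either by its name or by its number, and both routes return the same text.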

1.2.3 Numbering when normal and named capturing groups are mixed

When a regular expression mixes normal and named capturing groups, the numbering is slightly more complex. A named capturing group can always be accessed by name, whereas a normal capturing group can only be accessed once its number has been determined.

In .NET, whose syntax this article uses, mixed groups are numbered in two passes. First, the normal capturing groups are numbered from left to right, starting from 1. Once all of them are numbered, the named capturing groups are numbered from left to right, continuing from the last number given to a normal capturing group. This is the .NET rule; other engines may number mixed groups differently.

In other words, named capturing groups are skipped on the first pass and are numbered only after every normal capturing group has received its number.

Regular expression: (\d{4})-(?<date>\d{2}-(\d\d))

Matching the string 2008-12-31 against this regular expression gives:

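| Number | Name | Capturing group | Matched content |
|--------|------|-----------------|-----------------|
| 0 | | (\d{4})-(?<date>\d{2}-(\d\d)) | 2008-12-31 |
| 1 | | (\d{4}) | 2008 |
| 3 | date | (?<date>\d{2}-(\d\d)) | 12-31 |
| 2 | | (\d\d) | 31 |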
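The numbering in this table, with normal groups first and named groups afterwards, is the .NET behavior. Other engines can differ, so it is worth checking the engine actually in use. As a hedged point of comparison that is not part of the original article, Python's re module numbers every capturing group, named or not, strictly by the position of its opening parenthesis, so the same pattern (in Python syntax) yields a different order:

```python
import re

pattern = re.compile(r"(\d{4})-(?P<date>\d{2}-(\d\d))")
match = pattern.fullmatch("2008-12-31")

# In Python the named group 'date' is simply the second group from the left.
date_number = pattern.groupindex["date"]
by_number = match.groups()   # captured text for groups 1, 2, 3 in that order

print(date_number, by_number)
```

In Python the group named date is number 2 and the innermost (\d\d) is number 3, the reverse of the .NET result shown in the table. Accessing the group by name avoids depending on either rule.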

2 Referencing capturing groups

References to capturing groups generally fall into the following kinds:

1) inside the regular expression, a reference to the content captured by an earlier capturing group, known as a backreference;

2) inside the regular expression, the conditional construct (?(name)yes|no);

3) in program code, a reference to the content captured by a capturing group.
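The conditional construct in item 2 can be sketched as follows. The example is in Python, whose re module accepts (?(name)yes|no), and is adapted from the e-mail example in the Python re documentation; the group name open and the sample addresses are hypothetical. If the optional group open captured "<", a closing ">" is required; otherwise the address must run to the end of the string.

```python
import re

# (?(open)>|$): if group 'open' took part in the match, require ">",
# otherwise require the end of the string.
pattern = re.compile(r"(?P<open><)?(\w+@\w+(?:\.\w+)+)(?(open)>|$)")

samples = ["<user@host.com>", "user@host.com", "<user@host.com", "user@host.com>"]

# re.match anchors the attempt at the start of each sample.
results = {s: pattern.match(s) is not None for s in samples}

for sample, ok in results.items():
    print(sample, ok)
```

Only the first two samples match, because the angle brackets must either both be present or both be absent.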

2.1 Backreferences

The content captured by a capturing group can be referenced not only by the program outside the regular expression but also inside the regular expression itself. The latter kind of reference is called a backreference.

Backreferences are typically used to find or constrain repeated content, or to require that paired markers match each other, and so on.

The backreference syntax for normal and named capturing groups is as follows:

Normal capturing group backreference: \k<number>, usually abbreviated to \number

Named capturing group backreference: \k<name> or \k'name'

In a normal capturing group backreference, number is the decimal number of the capturing group; in a named capturing group backreference, name is the name of the named capturing group.
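The two typical uses mentioned above, finding repeated content and requiring paired markers to agree, can be sketched in Python. Note that Python's re module writes the named backreference as (?P=name) rather than the .NET form \k<name>; the numbered form \number is the same. The sample text is made up for illustration.

```python
import re

text = "this is is a test test of <b>bold</b> and <i>broken</b> tags"

# \1 must repeat exactly what group 1 captured: finds doubled words.
doubled = re.findall(r"\b(\w+)\s+\1\b", text)

# Named form: (?P=word) refers back to the group named 'word'.
doubled_named = [
    m.group("word")
    for m in re.finditer(r"\b(?P<word>\w+)\s+(?P=word)\b", text)
]

# Pairing: the closing tag must carry the same name as the opening tag.
paired = [m.group(0) for m in re.finditer(r"<(\w+)>.*?</\1>", text)]

print(doubled, doubled_named, paired)
```

The mismatched pair <i>...</b> is rejected because \1 must equal the tag name captured by the opening tag.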

Backreferences involve considerably more material, which is covered separately.

Reference: http://blog.csdn.net/lxcnn/article/details/4146148
