Introduction to capturing and non-capturing groups in regular expressions

Source: Internet
Author: User

Capture Group
Syntax:

(pattern)
  Matches pattern and captures the result. The group number is assigned automatically.
  Example: (abc)+d matches "abcd" or "abcabcd".

(?<name>pattern) or (?'name'pattern)
  Matches pattern, captures the result, and uses name as the group name.

\num
  Backreference to a numbered capture group; num is a positive integer.
  Example: (\w)(\w)\2\1 matches "abba".

\k<name> or \k'name'
  Backreference to a named capture group; name is the name of that group.
  Example: (?<group>\w)abc\k<group> matches "xabcx".
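The table's examples can be checked with a short C# sketch that uses the same System.Text.RegularExpressions classes as the programs in this article. The class CaptureGroupTableDemo and its methods FirstMatch and NamedGroup are hypothetical helpers written only for this illustration: they return the whole match, or the value of one named group, and return null when there is no match.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class for this sketch; not part of the original article.
static class CaptureGroupTableDemo
{
    // Returns the text matched by the whole pattern (group 0), or null when nothing matches.
    public static string FirstMatch(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : null;
    }

    // Returns the text captured by the named group, or null when nothing matches.
    public static string NamedGroup(string pattern, string input, string groupName)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[groupName].Value : null;
    }
}
```

For example, FirstMatch(@"(\w)(\w)\2\1", "abba") returns "abba", and NamedGroup(@"(?<group>\w)abc\k<group>", "xabcx", "group") returns "x", the character that the backreference \k<group> had to repeat.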

After a subexpression is enclosed in parentheses, the text it matches (that is, the content captured by the group) can be referenced later in the expression or processed further by a program. By default, every capturing group is numbered automatically: groups are counted by their opening parentheses from left to right, so the first group is 1, the second is 2, and so on.
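For example, in the pattern below the digit under each parenthesis is the number of the group it opens or closes:
(\d{4})-(\d{2}-(\d{2}))
1     1 2      3     32
Group 1 is (\d{4}), group 2 is (\d{2}-(\d{2})), and group 3 is the inner (\d{2}).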
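A minimal C# sketch of the numbering rule, applied to the pattern (\d{4})-(\d{2}-(\d{2})). The input 2008-12-31 is a made-up sample, and GroupNumberingDemo and GroupValues are hypothetical names used only here. GroupValues copies the value of every entry in Match.Groups, where index 0 holds the whole match.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class for this sketch; not part of the original article.
static class GroupNumberingDemo
{
    // Groups are numbered by the position of their opening parenthesis.
    const string Pattern = @"(\d{4})-(\d{2}-(\d{2}))";

    // Returns the values of groups 0..n for the input, or null when it does not match.
    public static string[] GroupValues(string input)
    {
        Match m = Regex.Match(input, Pattern);
        if (!m.Success) return null;

        var values = new string[m.Groups.Count];
        for (int i = 0; i < m.Groups.Count; i++)
        {
            values[i] = m.Groups[i].Value;
        }
        return values;
    }
}
```

For "2008-12-31" this yields "2008-12-31", "2008", "12-31" and "31" for groups 0 to 3: the outer group (\d{2}-(\d{2})) opens before the inner one, so it gets the lower number.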
The following program processes capture groups in code: it parses a URL and prints every capture group. The output shows that the groups are numbered in order.
Regex.Match method
The code is as follows:
using System;
using System.Text.RegularExpressions;

namespace Wuhong.Test
{
    class Program
    {
        static void Main(string[] args)
        {
            // Target string
            string source = "http://reg-test-server:8080/download/file1.html#";
            // Regular expression: scheme, server, optional port, path
            string regex = @"(\w+):\/\/([^/:]+)(:\d+)?([^#:]*)";
            Regex regUrl = new Regex(regex);
            // Match the regular expression
            Match m = regUrl.Match(source);
            Console.WriteLine(m.Success);
            if (m.Success)
            {
                // Captured groups are stored in Match.Groups. Index 0 is the whole match; numbered groups start at 1.
                // Display each group as "group number:captured content"
                for (int i = 0; i < m.Groups.Count; i++)
                {
                    Console.WriteLine(string.Format("{0}:{1}", i, m.Groups[i]));
                }
            }
            Console.ReadLine();
        }
    }
}

You can also give a subexpression a group name. The group can then be referenced by name, directly in the expression or in a program, and it can still be referenced by number. However, when a regular expression contains both ordinary (unnamed) capturing groups and named capturing groups, pay close attention to the numbering: the unnamed groups are numbered first, from left to right, and the named groups are numbered after them.
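For example:
(\d{4})-(?<date>\d{2}-(\d{2}))
1     1 3             2     23
Group 1 is (\d{4}) and group 2 is the inner (\d{2}); the named group date, although it opens earlier, is numbered 3.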
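The mixed numbering rule can be observed with Regex.GroupNumberFromName, which returns the number .NET assigned to a named group. In this sketch the class and method names (MixedNumberingDemo, DateGroupNumber, GroupByNumber) are hypothetical, and 2008-12-31 is a made-up sample input. The named group date reports number 3, while the inner unnamed (\d{2}) is group 2.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class for this sketch; not part of the original article.
static class MixedNumberingDemo
{
    // Two unnamed groups and one named group, "date".
    static readonly Regex DatePattern = new Regex(@"(\d{4})-(?<date>\d{2}-(\d{2}))");

    // Number that .NET assigned to the named group "date".
    public static int DateGroupNumber() => DatePattern.GroupNumberFromName("date");

    // Text captured by the group with the given number, or null when nothing matches.
    public static string GroupByNumber(string input, int number)
    {
        Match m = DatePattern.Match(input);
        return m.Success ? m.Groups[number].Value : null;
    }
}
```

With "2008-12-31", GroupByNumber returns "2008" for 1, "31" for 2 and "12-31" for 3, the same text that Groups["date"] holds.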

The following code works with named capture groups: it prints the group numbers produced by this mixed numbering rule and then uses the captured content to replace parts of the source string.
The output shows that the unnamed groups are numbered first, followed by the named groups.
Regex.Replace method
The code is as follows:
using System;
using System.Text.RegularExpressions;

namespace Wuhong.Test
{
    class Program
    {
        static void Main(string[] args)
        {
            // Target string
            string source = "http://reg-test-server:8080/download/file1.html#";
            // Regular expression with two named groups: server and port
            string regex = @"(\w+):\/\/(?<server>[^/:]+)(?<port>:\d+)?([^#:]*)";
            Regex regUrl = new Regex(regex);
            // Match the regular expression
            Match m = regUrl.Match(source);
            Console.WriteLine(m.Success);
            if (m.Success)
            {
                // Captured groups are stored in Match.Groups. Index 0 is the whole match; numbered groups start at 1.
                // Display each group as "group number:captured content"
                for (int i = 0; i < m.Groups.Count; i++)
                {
                    Console.WriteLine(string.Format("{0}:{1}", i, m.Groups[i]));
                }
            }
            // Replace the string
            // "$number" inserts the content of the capture group with that number.
            // A "$number" reference cannot be directly followed by a digit, because the digit would be read
            // as part of the group number; in that case use a named group in the form "${name}".
            string replacement = string.Format("$1://{0}{1}$2", "new-reg-test-server", "");
            string result = regUrl.Replace(source, replacement);
            Console.WriteLine(result);
            Console.ReadLine();
        }
    }
}
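The last comment in the program above concerns a pitfall with "$number" substitutions. The sketch below, with a hypothetical class NamedSubstitutionDemo, reuses the URL pattern from that program and appends the digit 2 directly after the server name. Written as "$32", the reference would be parsed as group number 32 rather than group 3 followed by a literal 2, so the named form "${server}" is used instead.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class for this sketch; not part of the original article.
static class NamedSubstitutionDemo
{
    // Same URL pattern as the program above: groups 1 and 2 are unnamed,
    // "server" (number 3) and "port" (number 4) are named.
    static readonly Regex UrlPattern =
        new Regex(@"(\w+):\/\/(?<server>[^/:]+)(?<port>:\d+)?([^#:]*)");

    // Appends a literal "2" right after the server name.
    // "${server}2" is unambiguous, whereas "$32" would refer to a group 32.
    public static string AppendDigitToServer(string url) =>
        UrlPattern.Replace(url, "$1://${server}2${port}$2");
}
```

For "http://reg-test-server:8080/download/file1.html#" it returns "http://reg-test-server2:8080/download/file1.html#". The trailing # survives because [^#:]* stops before it, and text outside the match is copied unchanged.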


Non-capturing Group
Syntax:

(?:pattern)
  Matches pattern but does not capture the result.
  Example: industr(?:y|ies) matches "industry" or "industries".

(?=pattern)
  Zero-width positive lookahead; the result is not captured.
  Example: Windows(?=95|98|NT|2000) matches "Windows" in "Windows2000" but not "Windows" in "Windows3.1".

(?!pattern)
  Zero-width negative lookahead; the result is not captured.
  Example: Windows(?!95|98|NT|2000) matches "Windows" in "Windows3.1" but not "Windows" in "Windows2000".

(?<=pattern)
  Zero-width positive lookbehind; the result is not captured.
  Example: (?<=Office|Word|Excel)2000 matches "2000" in "Office2000" but not "2000" in "Windows2000".

(?<!pattern)
  Zero-width negative lookbehind; the result is not captured.
  Example: (?<!Office|Word|Excel)2000 matches "2000" in "Windows2000" but not "2000" in "Office2000".
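A short C# sketch that runs the table's examples through .NET's Regex class. NonCapturingDemo, GroupCount and MatchValue are hypothetical helpers: GroupCount uses Regex.GetGroupNumbers to count the groups a pattern defines (including group 0), and MatchValue returns the first match or null. Comparing industr(?:y|ies) with industr(y|ies) shows that only the plain parentheses add a group.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class for this sketch; not part of the original article.
static class NonCapturingDemo
{
    // Number of groups the pattern defines, counting group 0 (the whole match).
    public static int GroupCount(string pattern) =>
        new Regex(pattern).GetGroupNumbers().Length;

    // First match of the pattern in the input, or null when there is none.
    public static string MatchValue(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : null;
    }
}
```

For instance, MatchValue(@"(?<=Office|Word|Excel)2000", "Office2000") returns "2000": the lookbehind checks "Office" but does not include it in the match.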


A non-capturing group matches text but does not capture it and is not assigned a group number, so its content cannot be referenced later in the expression or processed by a program.
The first form, (?:pattern), differs from (pattern) only in that it does not capture the result.
The other four forms are zero-width assertions: they match the position before (or after) text that matches (or does not match) pattern, and the match result does not include the text matched by pattern.
For example:
(?<=<(\w+)>).*(?=<\/\1>) matches the content of a simple HTML tag that has no attributes. For "<div>hello</div>", the match is "hello"; the prefix <div> and the suffix </div> are not part of the result.
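The HTML example can be tried directly in C#. This sketch (TagContentDemo, InnerText and TagName are hypothetical names) is only meant for a single simple tag without attributes or nesting, as the text says. Note that the (\w+) inside the lookbehind is an ordinary capturing group: .NET captures it as group 1, which is what lets \1 in the lookahead refer to the tag name.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class for this sketch; not part of the original article.
static class TagContentDemo
{
    // Lookbehind for "<tag>", lookahead for the matching "</tag>".
    // Only suitable for one simple tag without attributes or nesting.
    const string Pattern = @"(?<=<(\w+)>).*(?=<\/\1>)";

    // Text between the tags, without the tags themselves, or null when nothing matches.
    public static string InnerText(string html)
    {
        Match m = Regex.Match(html, Pattern);
        return m.Success ? m.Value : null;
    }

    // The tag name captured by (\w+) inside the lookbehind (group 1).
    public static string TagName(string html)
    {
        Match m = Regex.Match(html, Pattern);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

InnerText("<div>hello</div>") returns "hello" and TagName returns "div".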
The following program uses non-capturing groups to extract zip codes.
The output shows that neither the negative lookbehind nor the negative lookahead is captured.
Regex.Matches method
The code is as follows:
using System;
using System.Text.RegularExpressions;

namespace Wuhong.Test
{
    class Program
    {
        static void Main(string[] args)
        {
            // Target string
            string source = "there are 6 groups of numbers: 010001,100,210, and. Pick out the zip code. ";
            // Regular expression: six digits starting with 1-9, not preceded or followed by another digit
            string regex = @"(?<!\d)([1-9]\d{5})(?!\d)";
            Regex regUrl = new Regex(regex);
            // Obtain all matches
            MatchCollection mList = regUrl.Matches(source);
            for (int j = 0; j < mList.Count; j++)
            {
                // Display every group of each match. Each match has only group 0 and group 1;
                // the negative lookbehind and the negative lookahead are not captured.
                for (int i = 0; i < mList[j].Groups.Count; i++)
                {
                    Console.WriteLine(string.Format("{0}:{1}:{2}", j, i, mList[j].Groups[i]));
                }
            }
            Console.ReadLine();
        }
    }
}
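As printed above, the sample string contains no number the pattern accepts (010001 starts with 0), so the sketch below uses its own input, "100000, 1234567, 021000, 510000", which is invented for illustration. ZipCodeDemo and Extract are hypothetical names. The lookarounds (?<!\d) and (?!\d) reject six digits that are part of a longer number, and [1-9] rejects a leading zero.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class for this sketch; not part of the original article.
static class ZipCodeDemo
{
    // Six digits, first digit 1-9, not preceded or followed by another digit.
    // The lookarounds are zero-width, so each match has only groups 0 and 1.
    static readonly Regex ZipPattern = new Regex(@"(?<!\d)([1-9]\d{5})(?!\d)");

    // Collects the text captured by group 1 of every match.
    public static List<string> Extract(string text)
    {
        var result = new List<string>();
        foreach (Match m in ZipPattern.Matches(text))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }
}
```

For the sample input, Extract returns "100000" and "510000": 1234567 has seven digits, and 021000 starts with 0.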


Comment
Syntax:

(?#comment)
  The text comment is ignored; it has no effect on how the regular expression is processed.
  Example: 2[0-4]\d(?#200-249)|25[0-5](?#250-255)|1?\d?\d(?#0-199) matches an integer from 0 to 255.

This construct needs no further explanation.
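A sketch that applies the comment example in C#. The ^(?: ... )$ wrapper is an addition for this illustration so that the whole input must be the number; the alternation as written is unanchored and would, for example, also find "25" inside "256". InlineCommentDemo and IsByteValue are hypothetical names.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class for this sketch; not part of the original article.
static class InlineCommentDemo
{
    // The comment example, wrapped in ^(?: ... )$ (an assumption of this sketch)
    // so that the entire input must be a number from 0 to 255.
    // The (?#...) parts are ignored by the regex engine.
    static readonly Regex ByteRange = new Regex(
        @"^(?:2[0-4]\d(?#200-249)|25[0-5](?#250-255)|1?\d?\d(?#0-199))$");

    public static bool IsByteValue(string s) => ByteRange.IsMatch(s);
}
```

IsByteValue("255") returns true and IsByteValue("256") returns false.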
