C# regular expressions: grouping and matching modes

Source: Internet
Author: User

I. Classification of groups
Groups in regular expressions are either capture groups or non-capturing groups, and capture groups are further divided into ordinary capture groups and named capture groups. The syntax is:
Capture group: (exp)
Named capture group: (?<name>exp)
Non-capturing group: (?:exp)

II. Roles of groups
1. Role of a capture group

A capture group saves the content matched by the sub-expression exp into a group for later use.

For example, given the string:
<a href="http://anmo.ymxxyoga.com/" title="">csdn</a>
I want to get the URL, and the rule it must satisfy is that it sits inside an <a...> tag. This can be done as follows:

C# code
// Requires: using System.Text.RegularExpressions; (and System.Windows.Forms for MessageBox)
string test = "<a href=\"http://anmo.ymxxyoga.com/\" title=\"\">csdn</a>";
Match m = Regex.Match(test, @"<a\s*href=""([^""]*)""[^>]*>", RegexOptions.IgnoreCase);
if (m.Success)
    MessageBox.Show(m.Groups[1].Value);

The regular expression above matches <a href="http://anmo.ymxxyoga.com/" title="">, but all we want is the URL. The rest of the expression only ensures that the URL is inside an <a...> tag. The capture group saves the matched URL, and m.Groups[1].Value then retrieves the content matched by that group.
m.Groups[1].Value is one way to reference a capture; another way, m.Result("$1"), has the same effect.

An ordinary capture group is referenced by a natural number such as 1, 2, 3...
A named capture group can also be referenced directly by its name instead of by its number.

C# code
string test = "<a href=\"http://anmo.ymxxyoga.com/\" title=\"\">csdn</a>";
// Group names are case-sensitive, so the name in the pattern must match the name used in Groups[...].
Match m = Regex.Match(test, @"<a\s*href=""(?<url>[^""]*)""[^>]*>", RegexOptions.IgnoreCase);
if (m.Success)
    MessageBox.Show(m.Groups["url"].Value);


The naming and numbering rules for capture groups are described later.
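As a minimal sketch of the reference styles described above, the console program below reads the same group by number, by name, and through Match.Result. The names CaptureGroupDemo and ExtractHref are illustrative choices of mine, not part of the original example, and Console.WriteLine stands in for MessageBox.Show so the sketch can run outside a WinForms project.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureGroupDemo
{
    private const string Pattern = @"<a\s*href=""(?<url>[^""]*)""[^>]*>";

    // Hypothetical helper: returns the href of the first <a ...> tag, or "" if none is found.
    public static string ExtractHref(string html)
    {
        Match m = Regex.Match(html, Pattern, RegexOptions.IgnoreCase);
        return m.Success ? m.Groups["url"].Value : "";
    }

    public static void Main()
    {
        string test = "<a href=\"http://anmo.ymxxyoga.com/\" title=\"\">csdn</a>";
        Match m = Regex.Match(test, Pattern, RegexOptions.IgnoreCase);
        if (m.Success)
        {
            // "url" is the only capture group in this pattern, so it is also group number 1.
            Console.WriteLine(m.Groups[1].Value);
            Console.WriteLine(m.Groups["url"].Value);
            Console.WriteLine(m.Result("$1"));
            Console.WriteLine(m.Result("${url}"));
        }
    }
}
```

All four lines print the same URL, which is the point: the number, the name, and the two Result forms are just different handles on one captured substring.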

2. Role of a non-capturing group
Non-capturing groups have two functions. The first is common; the second is rarely needed.

(1) Saving system resources and improving efficiency
When "|" is used to express "or", slightly complex cases need () to limit the scope of "|"; otherwise everything to the left and everything to the right of "|" are treated as the two alternatives. This is a side topic and is not described in detail here. When {num} is used to specify a repetition count, () is sometimes needed to limit its scope as well.
When () is used to limit scope, the matched content is also saved to a capture group by default. In most cases we do not need that content saved, so this is a side effect that wastes system resources and reduces efficiency.
One role of a non-capturing group is to eliminate this side effect: (?:exp) matches the rule expressed by exp, but the matched content is not saved to a capture group.

For example, matching a time in the form HH:mm:ss

C# code
MessageBox.Show(Regex.IsMatch("18:23:55", "^(?:[01][0-9]|2[0-3])(?::[0-5][0-9]){2}$").ToString());

(?:[01][0-9]|2[0-3]) verifies that the hour part meets the rule, but the matched content is not saved to a capture group.
(?::[0-5][0-9]){2} verifies the minute and second parts, each preceded by a colon, and likewise does not save the matched content to a capture group.
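To make the difference visible, here is a small sketch that checks a few time strings and counts the groups a match carries. TimeCheckDemo, IsValidTime and GroupCount are names I made up for illustration, and the capturing variant of the pattern is my own addition for comparison only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TimeCheckDemo
{
    // Same pattern as in the example above: both groups are non-capturing.
    private const string NonCapturing = "^(?:[01][0-9]|2[0-3])(?::[0-5][0-9]){2}$";
    // Hypothetical variant with ordinary capture groups, for comparison only.
    private const string Capturing = "^([01][0-9]|2[0-3])(:[0-5][0-9]){2}$";

    public static bool IsValidTime(string s)
    {
        return Regex.IsMatch(s, NonCapturing);
    }

    // Number of groups in the match, including group 0 (the whole match).
    public static int GroupCount(string s, bool capturing)
    {
        return Regex.Match(s, capturing ? Capturing : NonCapturing).Groups.Count;
    }

    public static void Main()
    {
        Console.WriteLine(IsValidTime("18:23:55"));        // True
        Console.WriteLine(IsValidTime("24:00:00"));        // False
        Console.WriteLine(IsValidTime("18:60:00"));        // False
        Console.WriteLine(GroupCount("18:23:55", false));  // 1
        Console.WriteLine(GroupCount("18:23:55", true));   // 3
    }
}
```

Both patterns accept and reject the same strings; the non-capturing form simply leaves nothing behind except group 0.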

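(2) With the Regex.Split method, text matched by a capture group in the separator pattern is included in the result array. A non-capturing group avoids this, which is the same effect as passing RegexOptions.ExplicitCapture. This is not used much, so it is enough to be aware of it.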
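The Regex.Split behaviour can be sketched as follows. The input "1-2+3" and the names SplitDemo, SplitCapturing, SplitNonCapturing and SplitExplicit are assumptions made for this illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Ordinary capture group: the matched separators are kept in the result.
    public static string[] SplitCapturing(string s)
    {
        return Regex.Split(s, @"(-|\+)");
    }

    // Non-capturing group: the separators are dropped.
    public static string[] SplitNonCapturing(string s)
    {
        return Regex.Split(s, @"(?:-|\+)");
    }

    // ExplicitCapture turns unnamed groups into non-capturing ones.
    public static string[] SplitExplicit(string s)
    {
        return Regex.Split(s, @"(-|\+)", RegexOptions.ExplicitCapture);
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" | ", SplitCapturing("1-2+3")));    // 1 | - | 2 | + | 3
        Console.WriteLine(string.Join(" | ", SplitNonCapturing("1-2+3"))); // 1 | 2 | 3
        Console.WriteLine(string.Join(" | ", SplitExplicit("1-2+3")));     // 1 | 2 | 3
    }
}
```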

III. Capture group naming and numbering
Ordinary capture groups are numbered 1, 2, 3... according to the order in which their opening "(" appears, from left to right.
The name of a named capture group is the name in (?<name>exp).

Note that whenever the expression matches successfully, $0 denotes the content matched by the entire expression. m.Groups[0].Value also denotes the content matched by the entire expression and can be abbreviated as m.Value.

In addition, a named capture group can be referenced by name and also by number. The numbering rule is: first number the ordinary capture groups from left to right, then continue the numbering with the named capture groups, again from left to right. For example:

<a\s*href="(?<url>[^"]*)"\s*title="([^"]*)"[^>]*>(?<text>[\s\S]*?)</a>
Group numbers, in order of appearance: url = 2, the unnamed title group = 1, text = 3

C# code
string test = "<a href=\"http://anmo.ymxxyoga.com/\" title=\"\">csdn</a>";
Match m = Regex.Match(test, @"<a\s*href=""(?<url>[^""]*)""\s*title=""([^""]*)""[^>]*>(?<text>[\s\S]*?)</a>", RegexOptions.IgnoreCase);
if (m.Success)
{
    richTextBox1.Text += m.Groups[0].Value + "\n";      // <a href="http://anmo.ymxxyoga.com/" title="">csdn</a>
    richTextBox1.Text += m.Groups[1].Value + "\n";      // (empty string: the title attribute is empty here)
    richTextBox1.Text += m.Groups[2].Value + "\n";      // http://anmo.ymxxyoga.com/
    richTextBox1.Text += m.Groups["url"].Value + "\n";  // http://anmo.ymxxyoga.com/
    richTextBox1.Text += m.Groups[3].Value + "\n";      // csdn
    richTextBox1.Text += m.Groups["text"].Value + "\n"; // csdn
}
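The numbering rule can also be checked directly against the Regex object. The sketch below uses Regex.GroupNumberFromName and Regex.GetGroupNames on the same pattern; the class name GroupNumberingDemo and the field name Link are illustrative names of my own.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupNumberingDemo
{
    // Same pattern as above: two named groups (url, text) and one unnamed group (title).
    public static readonly Regex Link = new Regex(
        @"<a\s*href=""(?<url>[^""]*)""\s*title=""([^""]*)""[^>]*>(?<text>[\s\S]*?)</a>",
        RegexOptions.IgnoreCase);

    public static void Main()
    {
        Console.WriteLine(Link.GroupNumberFromName("url"));         // 2
        Console.WriteLine(Link.GroupNumberFromName("text"));        // 3
        Console.WriteLine(string.Join(",", Link.GetGroupNames()));  // 0,1,url,text
    }
}
```

The unnamed group takes number 1 even though it appears after url in the pattern, and the named groups follow in their own left-to-right order.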


IV. Another way to reference groups
Besides m.Groups[1].Value and m.Result("$1"), which reference groups when processing a match result, there is another reference form that is used in replacement strings. For example:

Keep only the URL and the link text, and remove the other, unneeded information in the <a...> tag.

C# code
string test = "<a href=\"http://anmo.ymxxyoga.com/\" title=\"\">csdn</a>";
string result = Regex.Replace(test, @"<a\s*href=""([^""]*)""\s*title=""([^""]*)""[^>]*>(?<text>[\s\S]*?)</a>", @"<a href=""$1"">${text}</a>", RegexOptions.IgnoreCase);
MessageBox.Show(result); // <a href="http://anmo.ymxxyoga.com/">csdn</a>


In a replacement string, an ordinary capture group is referenced as $number and a named capture group as ${name}.

Pre-search (lookaround)
(?=exp)
(?!exp)
(?<=exp)
(?<!exp)

The usual descriptions below can be dizzying, so afterwards I will explain their function and usage in another way.
(?=exp) matches the position before exp
(?<=exp) matches the position after exp
(?!exp) matches a position that is not followed by exp
(?<!exp) matches a position that is not preceded by exp

Some documents call these "zero-width assertions"; I am used to calling them "pre-search", and they are also widely known as lookaround. (?=exp) and (?!exp) are forward pre-searches (lookahead), while (?<=exp) and (?<!exp) are reverse pre-searches (lookbehind). The other translations mean the same thing, so you only need to recognize them.

These four constructs resemble non-capturing groups in that they do not save what they match to a capture group. The difference is that the content matched by a non-capturing group, although not saved to a capture group, is still part of the overall result $0, whereas the content checked by these four constructs is not included in $0. That is why they are called zero-width.

A better way to understand them is as additional conditions rather than as components of the regular expression.

To explain this, first consider the concept of a "gap". A gap is zero-width: it is only a position in a string, not an actual character. In the string "AB", for example, there is a gap before "A", one between "A" and "B", and one after "B", so the whole string has three gaps.

(?=exp) adds a condition after the gap: what follows the gap must match exp.
(?!exp) adds a condition after the gap: what follows the gap must not match exp.
(?<=exp) adds a condition before the gap: what precedes the gap must match exp.
(?<!exp) adds a condition before the gap: what precedes the gap must not match exp.
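The "gap" idea can be tried out on the string "AB" itself. In the sketch below, GapDemo and GapIndexes are names I chose for illustration; the helper lists the index of every zero-width match, so index 0 is the gap before "A", 1 is the gap between "A" and "B", and 2 is the gap after "B".

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class GapDemo
{
    // Returns the start indexes of all matches of pattern in input, joined with commas.
    public static string GapIndexes(string input, string pattern)
    {
        return string.Join(",",
            Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Index.ToString()));
    }

    public static void Main()
    {
        Console.WriteLine(GapIndexes("AB", ""));        // 0,1,2  (every gap)
        Console.WriteLine(GapIndexes("AB", "(?=B)"));   // 1      (gap followed by B)
        Console.WriteLine(GapIndexes("AB", "(?!B)"));   // 0,2    (gaps not followed by B)
        Console.WriteLine(GapIndexes("AB", "(?<=A)"));  // 1      (gap preceded by A)
        Console.WriteLine(GapIndexes("AB", "(?<!A)"));  // 0,2    (gaps not preceded by A)
    }
}
```

Every match here has length zero; the patterns only select which of the three gaps satisfy the added condition.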

Example:
<[^>]*> matches any HTML tag.

Add a condition:
<(?!img)[^>]*>
This matches every tag except the <img...> tag. Let's look at a concrete example.

C# code
// The attributes of the <img> tag are not shown; a bare <img /> is assumed.
string test = "<p><a href=\"http://anmo.ymxxyoga.com/\" title=\"bed waiting for you\"><img /></a></p>";
MatchCollection mc = Regex.Matches(test, @"<(?!img)[^>]*>", RegexOptions.IgnoreCase);
foreach (Match m in mc)
{
    richTextBox1.Text += m.Value + "\n";
}

Output result:
<p>
<a href="http://anmo.ymxxyoga.com/" title="bed waiting for you">
</a>
</p>

Now let's look at the regular expression <(?!img)[^>]*>
(?!img) sits in the gap between "<" and the first character after it, and requires that this gap is not followed by "img". The whole expression therefore does not match the <img...> tag.

Similarly, <(?=img)[^>]*> matches only the <img...> tag.

C# code
// The attributes of the <img> tag are not shown; a bare <img /> is assumed.
string test = "<p><a href=\"http://anmo.ymxxyoga.com/\" title=\"bed waiting for you\"><img /></a></p>";
MatchCollection mc = Regex.Matches(test, @"<(?=img)[^>]*>", RegexOptions.IgnoreCase);
foreach (Match m in mc)
{
    MessageBox.Show(m.Value);
}

Output:
<img />

(?<=<a[^>]*>)<img[^>]*> matches only an <img...> tag that is immediately preceded by an <a...> tag.

C# code
// The attributes of the <img> tag are not shown; a bare <img /> is assumed.
string test = "<p><a href=\"http://anmo.ymxxyoga.com/\" title=\"bed waiting for you\"><img /></a></p>";
MatchCollection mc = Regex.Matches(test, @"(?<=<a[^>]*>)<img[^>]*>", RegexOptions.IgnoreCase);
foreach (Match m in mc)
{
    MessageBox.Show(m.Value);
}

Output:
<img />

(?<!<a[^>]*>)<img[^>]*> matches only an <img...> tag that is not immediately preceded by an <a...> tag.

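C# code
// The attributes of the <img> tag are not shown; a bare <img /> is assumed.
string test = "<p><a href=\"http://anmo.ymxxyoga.com/\" title=\"bed waiting for you\"><img /></a></p>";
MatchCollection mc = Regex.Matches(test, @"(?<!<a[^>]*>)<img[^>]*>", RegexOptions.IgnoreCase);
foreach (Match m in mc)
{
    MessageBox.Show(m.Value);
}

Output:
(nothing is shown for this string, because its only <img...> tag directly follows the <a...> tag)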

In the (?<=<a[^>]*>)<img[^>]*> example above, (?<=<a[^>]*>) checked the text <a href="http://anmo.ymxxyoga.com/" title="bed waiting for you">, but that text does not appear in the result m.Value. It is therefore zero-width and exists only as an additional condition.
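To see both lookbehind forms produce a result side by side, here is a rough sketch on a made-up string that contains two images, one directly after an <a...> tag and one after the closing </a>. The markup, the file names in.png and out.png, the example.com URL, and the names LookbehindDemo and Find are all hypothetical and chosen only for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Hypothetical markup: the first image follows <a ...>, the second follows </a>.
    public const string Html =
        "<p><a href=\"http://example.com/\"><img src=\"in.png\" /></a><img src=\"out.png\" /></p>";

    // Returns the text of every match of pattern in html.
    public static string[] Find(string html, string pattern)
    {
        return Regex.Matches(html, pattern, RegexOptions.IgnoreCase)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        // Only the image immediately preceded by an <a ...> tag.
        Console.WriteLine(string.Join("\n", Find(Html, @"(?<=<a[^>]*>)<img[^>]*>")));
        // prints: <img src="in.png" />

        // Only the image that is not immediately preceded by an <a ...> tag.
        Console.WriteLine(string.Join("\n", Find(Html, @"(?<!<a[^>]*>)<img[^>]*>")));
        // prints: <img src="out.png" />
    }
}
```

In both cases the returned value is just the <img...> tag; the <a...> text that the lookbehind inspected never becomes part of m.Value.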
