Previous articles:
Advanced techniques for using regular expressions in .NET (1)
Advanced techniques for using regular expressions in .NET (2)
Backreferences
A backreference refers back, from elsewhere in the same expression, to the text that a group has already matched. For example, when matching an HTML tag, once we have matched <a> we need to refer to the captured "a" in order to find the closing </a>. This is the situation backreferences are for.
Syntax
a. Backreference by group number. Syntax: \number
b. Backreference by group name. Syntax: \k<name>
Example
a. Matching HTML tags
@"<(?<tag>[^\s>]+)[^>]*>.*</\k<tag>>"
b. Matching two identical adjacent characters
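// Requires: using System; using System.Text.RegularExpressions;
public static void Main()
{
    string s = "aabbc11asd";
    // (\w) captures one word character; \1 requires the same character again.
    Regex reg = new Regex(@"(\w)\1");
    MatchCollection matches = reg.Matches(s);
    foreach (Match m in matches)
        Console.WriteLine(m.Value);
    Console.ReadLine();
}
The returned results are aa, bb and 11.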
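Example a gives only the pattern, so here is a minimal sketch of how the named backreference might be exercised. The class name, the helper `FindTagName` and the sample inputs are assumptions made for illustration; they are not part of the original article.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackreferenceDemo
{
    // Pattern from example a: \k<tag> must repeat whatever the group "tag" captured.
    private static readonly Regex TagPair =
        new Regex(@"<(?<tag>[^\s>]+)[^>]*>.*</\k<tag>>");

    // Hypothetical helper: returns the captured tag name, or "" when no pair is found.
    public static string FindTagName(string html)
    {
        Match m = TagPair.Match(html);
        return m.Success ? m.Groups["tag"].Value : "";
    }

    public static void Main()
    {
        // Opening and closing tag names agree, so the pattern matches.
        Console.WriteLine(FindTagName("<div class=\"x\">hello</div>")); // div
        // The closing tag differs from the opening tag, so nothing matches.
        Console.WriteLine(FindTagName("<b>text</i>") == "");            // True
    }
}
```

The second call fails to match because `\k<tag>` insists on the exact text captured by the opening tag, which is the point of the backreference.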
Auxiliary matching groups (lookaround assertions)
The pattern inside the parentheses is tested, but it is not saved as part of the match result.
1. Positive lookahead (?=)
Meaning: The pattern in the parentheses must appear immediately to the right of the assertion, but it does not become part of the match.
// Requires: using System; using System.Text.RegularExpressions;
public static void Main()
{
    string s = "C#.net, VB.net, PHP, Java, JScript.net";
    // Match a run of word characters or '#', but only when ".net" follows.
    Regex reg = new Regex(@"[\w\#]+(?=\.net)", RegexOptions.Compiled);
    MatchCollection mc = reg.Matches(s);
    foreach (Match m in mc)
        Console.WriteLine(m.Value);
    Console.ReadLine();
    // Output: C# VB JScript
}
We can see that the matching engine requires ".net" to be present, but does not include ".net" in the match result.
2. Negative lookahead (?!)
Meaning: The pattern in the parentheses must not appear immediately to the right of the assertion.
The following example shows how to obtain the entire content of an <a> tag pair, even if it contains other HTML tags.
// Requires: using System; using System.Text.RegularExpressions;
public static void Main()
{
    string newsContent = @"url:<a href=""1.html"">test<span style=""color:red;"">Regex</span></a>.";
    // Inside the link, accept any non-'<' character, or a '<' that does not start "</a".
    Regex regEnd = new Regex(@"<\s*a[^>]*>([^<]|<(?!/a))*<\s*/a\s*>", RegexOptions.Multiline);
    Console.WriteLine(regEnd.Match(newsContent).Value);
    // Result: <a href="1.html">test<span style="color:red;">Regex</span></a>
    Console.ReadLine();
}
3. Positive lookbehind (?<=)
Meaning: The pattern in the parentheses must appear immediately to the left of the assertion, but it does not become part of the match.
4. Negative lookbehind (?<!)
Meaning: The pattern in the parentheses must not appear immediately to the left of the assertion.
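The two lookbehind forms are described above without an example, so here is a minimal sketch of both. The sample string, the class name and the helper `FindAll` are assumptions chosen for illustration only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Hypothetical helper: collects the text of every match of a pattern.
    public static string[] FindAll(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string s = "cost $25, qty 10, fee $7";

        // (?<=\$): a "$" must sit directly to the left; it is not part of the match.
        Console.WriteLine(string.Join(" ", FindAll(s, @"(?<=\$)\d+")));   // 25 7

        // (?<!\$): a "$" must NOT sit directly to the left.
        // \b keeps the match from starting in the middle of a number such as "25".
        Console.WriteLine(string.Join(" ", FindAll(s, @"(?<!\$)\b\d+"))); // 10
    }
}
```

In both cases the "$" is only tested, never consumed, which mirrors how the lookahead examples treat ".net" and "/a".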
Non-backtracking (atomic) groups
Syntax: (?>)
Meaning: Once the group has matched, the characters it matched cannot be given back through backtracking for the rest of the expression to use. Admittedly, that sentence is hard to understand, and I spent a long time working out how to put it. Let's explain it through an example:
"www.csdn.net" can be matched by @"\w+\.(.*)\.\w+", but not by @"\w+\.(?>.*)\.\w+". Why?
The reason is that regular expression matching is greedy: it matches as many characters as it can. In both expressions, therefore, .* initially matches "csdn.net". At that point the first expression finds that \.\w+ has no characters left to match, so it backtracks: .* gives back characters one at a time until the returned characters, ".net", allow \.\w+ to match, and the whole expression reports a successful match. The second expression uses a non-backtracking group, so once .* has finished matching, the engine is not allowed to backtrack into it to satisfy \.\w+. The whole expression therefore fails to match.
Please note that backtracking is expensive. Try not to write regular expressions that can only succeed by backtracking. In the example above, you can replace the expression with @"\w+\.([^\.]+\.)+\w+".
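The behaviour just described can be checked with a short sketch. The two patterns and the input come from the text; the class and method names are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AtomicGroupDemo
{
    // Ordinary greedy group: .* may give characters back when the rest fails.
    public const string Backtracking = @"\w+\.(.*)\.\w+";
    // Atomic group: whatever .* consumed stays consumed.
    public const string Atomic = @"\w+\.(?>.*)\.\w+";

    // Hypothetical helper: text captured by group 1, or "" when there is no match.
    public static string MiddlePart(string input)
    {
        Match m = Regex.Match(input, Backtracking);
        return m.Success ? m.Groups[1].Value : "";
    }

    public static void Main()
    {
        string host = "www.csdn.net";
        // .* first takes "csdn.net", then backs off until \.\w+ can match ".net".
        Console.WriteLine(MiddlePart(host));            // csdn
        // The atomic group keeps "csdn.net", so \.\w+ has nothing left to match.
        Console.WriteLine(Regex.IsMatch(host, Atomic)); // False
    }
}
```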
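As a rough check of that advice, the sketch below runs the suggested replacement against the same input. The atomic variant of the pattern is an assumption added here, not something from the article: it wraps the repeated part in (?>...) to illustrate that this pattern does not need to take characters back from the group in order to succeed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RewrittenPatternDemo
{
    // Replacement suggested in the text: each segment stops at the next dot.
    public const string Rewritten = @"\w+\.([^\.]+\.)+\w+";
    // Assumed variant for illustration: the same segment as an atomic group.
    public const string RewrittenAtomic = @"\w+\.(?>[^\.]+\.)+\w+";

    public static void Main()
    {
        string host = "www.csdn.net";
        Console.WriteLine(Regex.IsMatch(host, Rewritten));       // True
        // Still matches: the group never has to give characters back.
        Console.WriteLine(Regex.IsMatch(host, RewrittenAtomic)); // True
    }
}
```

Because [^\.]+ cannot run past a dot, each segment ends exactly where the following \. needs it to, unlike .* in the earlier expressions.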
Next article: Advanced techniques for using regular expressions in .NET (4)