.NET regular expressions: the advanced technique of backreferences

Source: Internet
Author: User
A backreference refers, from another part of the same expression, to the text matched by an earlier group. For example, when matching HTML tags, after matching an opening <a> we want to reuse the matched tag name to find the closing </a>. This is where a backreference is used.





Syntax





A. Backreference to a numbered group: the syntax is \number





B. Backreference to a named group: the syntax is \k<name>
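As a minimal sketch of the two syntaxes, the following program finds a doubled word once with \1 and once with \k<word>. The class name, method names, group name "word", and sample text are assumptions made up for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackreferenceSyntaxDemo
{
    // Numbered form: \1 must repeat whatever group 1 captured.
    public static string FindRepeatedWordNumbered(string input)
    {
        Match m = Regex.Match(input, @"\b(\w+) \1\b");
        return m.Success ? m.Value : "";
    }

    // Named form: \k<word> must repeat whatever the group "word" captured.
    // Returns only the captured word rather than the whole match.
    public static string FindRepeatedWordNamed(string input)
    {
        Match m = Regex.Match(input, @"\b(?<word>\w+) \k<word>\b");
        return m.Success ? m.Groups["word"].Value : "";
    }

    public static void Main()
    {
        string text = "this is is a test";
        Console.WriteLine(FindRepeatedWordNumbered(text)); // is is
        Console.WriteLine(FindRepeatedWordNamed(text));    // is
    }
}
```

Both patterns accept exactly the same input; the named form is mainly easier to read when an expression contains several groups.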





Examples





A. Matching paired HTML tags





@"<(?<tag>[^\s>]+)[^>]*>.*</\k<tag>>"
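A short sketch of how such a paired-tag pattern behaves, assuming the pattern begins with a literal < before the named group. The class name, method name, and sample HTML are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PairedTagDemo
{
    // The closing tag must repeat the name captured by the group "tag".
    private static readonly Regex PairedTag =
        new Regex(@"<(?<tag>[^\s>]+)[^>]*>.*</\k<tag>>");

    // Returns every substring that forms an opening/closing pair.
    public static List<string> FindPairedTags(string html)
    {
        var result = new List<string>();
        foreach (Match m in PairedTag.Matches(html))
            result.Add(m.Value);
        return result;
    }

    public static void Main()
    {
        // <i> is closed by </u>, so the backreference rejects it.
        string html = "<b>bold</b> and <i>broken</u>";
        foreach (string s in FindPairedTags(html))
            Console.WriteLine(s); // <b>bold</b>
    }
}
```

Because .* is greedy, a line containing several pairs of the same tag is returned as one long match from the first opening tag to the last matching closing tag.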





B. Matching two consecutive identical characters





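// Requires: using System; using System.Text.RegularExpressions;
public static void Main()
{
    string s = "AABBC11ASD";
    // (\w) captures one word character; \1 requires the same character again.
    Regex reg = new Regex(@"(\w)\1");
    MatchCollection matches = reg.Matches(s);
    foreach (Match m in matches)
        Console.WriteLine(m.Value);
    Console.ReadLine();
}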


The result is AA, BB, 11.





Auxiliary matching groups





In the following group constructs, the pattern in parentheses is not saved as part of the match result.





1. Positive lookahead assertion (?=)





Meaning: the pattern in parentheses must appear to the right of the assertion, but it is not included as part of the match.





// Requires: using System; using System.Text.RegularExpressions;
public static void Main()
{
    string s = "c#.net,vb.net,php,java,jscript.net";
    // Match a name only when it is immediately followed by ".net".
    Regex reg = new Regex(@"[\w\#]+(?=\.net)", RegexOptions.Compiled);
    MatchCollection mc = reg.Matches(s);
    foreach (Match m in mc)
        Console.WriteLine(m.Value);
    // Output: c# vb jscript (one per line)
    Console.ReadLine();
}


As you can see, the regex engine requires .net to follow the match, but does not include .net in the match result.





2. Negative lookahead assertion (?!)





Meaning: the pattern in parentheses must not appear to the right of the assertion.





The following example shows how to get the entire contents of an <a> tag pair, even if it contains other HTML tags.





// Requires: using System; using System.Text.RegularExpressions;
public static void Main()
{
    string newsContent = @"url:<a href=""1.html"">test<span style=""color:red;"">regex</span></a>.";
    // ([^<]|<(?!/a))* consumes any character, but a "<" only when it does not start "</a".
    Regex regEnd = new Regex(@"<\s*a[^>]*>([^<]|<(?!/a))*<\s*/a\s*>", RegexOptions.Multiline);
    Console.WriteLine(regEnd.Match(newsContent).Value);
    // Result: <a href="1.html">test<span style="color:red;">regex</span></a>
    Console.ReadLine();
}


3. Positive lookbehind assertion (?<=)





Meaning: the pattern in parentheses must appear to the left of the assertion, but it is not included as part of the match.





4. Negative lookbehind assertion (?<!)





Meaning: the pattern in parentheses must not appear to the left of the assertion.
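The two assertions that look to the left can be sketched as follows. This is only an illustration: the class name, method names, patterns, and sample string are assumptions, chosen so that numbers with and without a leading $ can be told apart.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // (?<=\$) requires a "$" immediately to the left; the "$" is not part of the match.
    public static string NumbersAfterDollar(string input)
    {
        MatchCollection mc = Regex.Matches(input, @"(?<=\$)\d+");
        return string.Join(",", mc.Cast<Match>().Select(m => m.Value));
    }

    // (?<!\$) requires that no "$" is immediately to the left.
    // \b keeps the engine from starting in the middle of a number such as "25".
    public static string NumbersWithoutDollar(string input)
    {
        MatchCollection mc = Regex.Matches(input, @"(?<!\$)\b\d+\b");
        return string.Join(",", mc.Cast<Match>().Select(m => m.Value));
    }

    public static void Main()
    {
        string s = "cost $25, qty 3, tax $4, id 17";
        Console.WriteLine(NumbersAfterDollar(s));   // 25,4
        Console.WriteLine(NumbersWithoutDollar(s)); // 3,17
    }
}
```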





Non-backtracking matching





Syntax: (?>)





Meaning: once the group has matched, the characters it matched cannot be given back through backtracking to help the rest of the expression match. This sentence is hard to understand on its own (it took me a long time to understand it too), so here is an example:


"www.csdn.net" can be matched by @"\w+\.(.*)\.\w+", but not by @"\w+\.(?>.*)\.\w+". Why?





The reason is that regular expression matching is greedy: it matches as many characters as possible. In both of the expressions above, .* therefore first matches csdn.net. At this point the first expression finds that there are no characters left for \.\w+ to match, so it backtracks. Backtracking means that .* gives back characters it has already matched, one at a time, and the released characters are tried against \.\w+ until \.\w+ matches and the whole expression returns a successful match. The second expression uses a non-backtracking group, so .* is not allowed to give characters back for \.\w+, and the whole expression fails to match.
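The difference can be checked directly with a small sketch. The class and member names are assumptions for this illustration; the two patterns and the input are the ones discussed above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AtomicGroupDemo
{
    public const string Input = "www.csdn.net";

    // Ordinary capturing group: .* may give characters back.
    public const string Backtracking = @"\w+\.(.*)\.\w+";

    // Non-backtracking group: .* keeps everything it consumed.
    public const string NonBacktracking = @"\w+\.(?>.*)\.\w+";

    public static void Main()
    {
        Match m = Regex.Match(Input, Backtracking);
        Console.WriteLine(m.Success);         // True
        Console.WriteLine(m.Groups[1].Value); // csdn  (.* gave back ".net")

        Console.WriteLine(Regex.IsMatch(Input, NonBacktracking)); // False
    }
}
```

Group 1 ends up as csdn rather than csdn.net, which shows the characters that .* had to release before \.\w+ could match.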





Note that backtracking is a very resource-intensive way to match, so try to avoid regular expressions that can only succeed by backtracking. The previous example can be changed to @"\w+\.([^\.]+\.)+\w+".
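As a quick check, and only as a sketch, the rewritten pattern still matches the same input. The class and method names are hypothetical, and the program verifies only the match result, not how much work the engine does.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RewrittenPatternDemo
{
    // [^\.]+ cannot run past a dot, so each label is consumed up to its own dot.
    public const string Rewritten = @"\w+\.([^\.]+\.)+\w+";

    // Returns the matched text, or an empty string when nothing matches.
    public static string MatchHost(string input)
    {
        Match m = Regex.Match(input, Rewritten);
        return m.Success ? m.Value : "";
    }

    public static void Main()
    {
        Console.WriteLine(MatchHost("www.csdn.net")); // www.csdn.net
    }
}
```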




