The first two articles covered the basics of regular expressions and some simple examples. This one goes a little deeper into grouping. In .NET, a match and the groups inside it are represented by the Match and Group classes.
First, look at a piece of code:
// Requires: using System; using System.Text.RegularExpressions;
/// <summary>
/// Shows the structure of multiple groups within a Match.
/// </summary>
public void ShowStructure()
{
    // The string to match
    string text = "1A 2B 3C 4D 5E 6F 7G 8H 9I 10J 11Q 12J 13K 14L 15M 16N ffee80 #800080";
    // The regular expression
    string pattern = @"((\d+)([a-z]))\s+";
    // RegexOptions.IgnoreCase makes the match case-insensitive
    Regex r = new Regex(pattern, RegexOptions.IgnoreCase);
    // Match the string against the regular expression; this returns only one match
    Match m = r.Match(text);
    while (m.Success)
    {
        // Display the matched value
        Console.WriteLine("match=[" + m + "]");
        CaptureCollection cc = m.Captures;
        foreach (Capture c in cc)
        {
            Console.WriteLine("\tcapture=[" + c + "]");
        }
        for (int i = 0; i < m.Groups.Count; i++)
        {
            Group group = m.Groups[i];
            Console.WriteLine("\t\tgroups[{0}]=[{1}]", i, group);
            for (int j = 0; j < group.Captures.Count; j++)
            {
                Capture capture = group.Captures[j];
                Console.WriteLine("\t\t\tcaptures[{0}]=[{1}]", j, capture);
            }
        }
        // Move on to the next match
        m = m.NextMatch();
    }
}
Running this code produces the following output:
match=[1A ]
    capture=[1A ]
        groups[0]=[1A ]
            captures[0]=[1A ]
        groups[1]=[1A]
            captures[0]=[1A]
        groups[2]=[1]
            captures[0]=[1]
        groups[3]=[A]
            captures[0]=[A]
match=[2B ]
    capture=[2B ]
        groups[0]=[2B ]
            captures[0]=[2B ]
        groups[1]=[2B]
            captures[0]=[2B]
        groups[2]=[2]
            captures[0]=[2]
        groups[3]=[B]
            captures[0]=[B]
.................. (some results omitted)
match=[16N ]
    capture=[16N ]
        groups[0]=[16N ]
            captures[0]=[16N ]
        groups[1]=[16N]
            captures[0]=[16N]
        groups[2]=[16]
            captures[0]=[16]
        groups[3]=[N]
            captures[0]=[N]
Analyzing the output above, we can conclude that the regular expression ((\d+)([a-z]))\s+ contains four groups in total. Groups are numbered from left to right, in the order of their opening parentheses. Groups[0] represents the entire match (including the whitespace matched by \s+), and the others are sub-groups: Groups[1] corresponds to ((\d+)([a-z])), Groups[2] to (\d+), and Groups[3] to ([a-z]).
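The numbering can be checked on a single match. The following is a minimal sketch rather than part of the original sample; the class name GroupNumberingSketch and the method GroupValues are made up for illustration, and the pattern is the one used above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class GroupNumberingSketch
{
    // Three pairs of parentheses give groups 1..3; group 0 is the whole match.
    private static readonly Regex Pattern =
        new Regex(@"((\d+)([a-z]))\s+", RegexOptions.IgnoreCase);

    // Returns the value of every group of the first match, indexed by group number.
    // Returns an empty array when nothing matches.
    public static string[] GroupValues(string input)
    {
        Match m = Pattern.Match(input);
        var values = new List<string>();
        if (m.Success)
        {
            for (int i = 0; i < m.Groups.Count; i++)
            {
                values.Add(m.Groups[i].Value);
            }
        }
        return values.ToArray();
    }
}
```

Calling GroupNumberingSketch.GroupValues("15M 16N ") should return four values: "15M " (with the trailing space), "15M", "15", and "M".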
The code above uses the Match() method of the Regex class, which returns a single Match. To process every match in the string, the while loop calls the NextMatch() method of the Match class to get the next match; whether that match succeeded can be checked through the Success property of the Match class. The code above can also be written in the following form:
// Requires: using System; using System.Text.RegularExpressions;
/// <summary>
/// Uses the Matches method of the Regex class to get all the matches.
/// </summary>
public void Matches()
{
    // The string to match
    string text = "1A 2B 3C 4D 5E 6F 7G 8H 9I 10J 11Q 12J 13K 14L 15M 16N ffee80 #800080";
    // The regular expression
    string pattern = @"((\d+)([a-z]))\s+";
    // RegexOptions.IgnoreCase makes the match case-insensitive
    Regex r = new Regex(pattern, RegexOptions.IgnoreCase);
    // Match the string against the regular expression and return all the matches
    MatchCollection matchCollection = r.Matches(text);
    foreach (Match m in matchCollection)
    {
        // Display the matched value
        Console.WriteLine("match=[" + m + "]");
        CaptureCollection cc = m.Captures;
        foreach (Capture c in cc)
        {
            Console.WriteLine("\tcapture=[" + c + "]");
        }
        for (int i = 0; i < m.Groups.Count; i++)
        {
            Group group = m.Groups[i];
            Console.WriteLine("\t\tgroups[{0}]=[{1}]", i, group);
            for (int j = 0; j < group.Captures.Count; j++)
            {
                Capture capture = group.Captures[j];
                Console.WriteLine("\t\t\tcaptures[{0}]=[{1}]", j, capture);
            }
        }
    }
}
This code has the same effect as iterating over all the matches with the while loop. In practice, you may not need to search the whole string and may want to start matching from a given position instead. Both Match() and Matches() have overloads for this: simply change MatchCollection matchCollection = r.Matches(text); in the sample code to MatchCollection matchCollection = r.Matches(text, 48);.
The output is as follows:
match=[5M ]
    capture=[5M ]
        groups[0]=[5M ]
            captures[0]=[5M ]
        groups[1]=[5M]
            captures[0]=[5M]
        groups[2]=[5]
            captures[0]=[5]
        groups[3]=[M]
            captures[0]=[M]
match=[16N ]
    capture=[16N ]
        groups[0]=[16N ]
            captures[0]=[16N ]
        groups[1]=[16N]
            captures[0]=[16N]
        groups[2]=[16]
            captures[0]=[16]
        groups[3]=[N]
            captures[0]=[N]
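The effect of the start position can also be checked in isolation. The sketch below is an illustration under assumptions: the class name StartAtSketch and the method FirstMatchFrom are hypothetical, and it uses the Match(string, int) overload instead of Matches so that only the first hit is reported. It returns the Index and Value of Groups[1] for the first match found from a given position.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class StartAtSketch
{
    // The same input string as in the article's sample.
    public const string Text =
        "1A 2B 3C 4D 5E 6F 7G 8H 9I 10J 11Q 12J 13K 14L 15M 16N ffee80 #800080";

    private static readonly Regex Pattern =
        new Regex(@"((\d+)([a-z]))\s+", RegexOptions.IgnoreCase);

    // Starts searching at startAt and returns "index:value" for Groups[1]
    // of the first match, or "no match" if nothing is found.
    // Index is always relative to the start of the whole string.
    public static string FirstMatchFrom(int startAt)
    {
        Match m = Pattern.Match(Text, startAt);
        return m.Success
            ? m.Groups[1].Index + ":" + m.Groups[1].Value
            : "no match";
    }
}
```

With this input, StartAtSketch.FirstMatchFrom(47) should return "47:15M", while StartAtSketch.FirstMatchFrom(48) should return "48:5M", because the search begins after the 1 of 15M.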
Note that r.Matches(text, 48) above means that matching starts at position 48 of the text string. Position 0 is before the first character of the string, and position 1 is between the first and second characters (note that there is a space between "1A" and "2B" in the string).
Position 48 in text is exactly the 5 in 15M, so the first match returned is 5M rather than 15M. It is also worth recalling the class diagram from the first article.
In that diagram, the Capture, Group, and Match classes form an inheritance chain: Group derives from Capture, and Match derives from Group. The Index, Length, and Value properties are defined in the Capture class, at the top of the chain. Index is the position in the original string where the first character of the captured substring was found, Length is the length of that substring, and Value is the substring captured from the original string. These properties can be used to build more complex applications.

For example, many forums still do not use a WYSIWYG online editor and use a UBB-code editor instead. A WYSIWYG editor carries a certain security risk: JavaScript or other malicious code can be embedded in the submitted source, which creates security problems for visitors. UBB code avoids this because it offers only a limited set of tags, which does not hinder normal use, and an editor that supports UBB code does not allow HTML to appear directly in the text, which prevents malicious script attacks. Text entered in such an editor is stored in the database as UBB code and has to be converted into HTML for display. For example, the following is UBB code:
[url]http://zhoufoxcn.blog.51cto.com[/url][url=http://blog.csdn.net/zhoufoxcn]Zhou Gong's column[/url]
The following example shows how to convert UBB hyperlink code into HTML code:
// Requires: using System; using System.Text.RegularExpressions;
/// <summary>
/// Replaces the UBB hyperlink code in the text with HTML hyperlink code.
/// </summary>
public void UbbDemo()
{
    string text = "[url=http://zhoufoxcn.blog.51cto.com][/url][url=http://blog.csdn.net/zhoufoxcn]Zhou Gong's column[/url]";
    Console.WriteLine("Original UBB code:" + text);
    // Groups[2] captures the URL, Groups[3] the (possibly empty) link text
    Regex regex = new Regex(@"(\[url=([ \S\t]*?)\])([^[]*)(\[/url\])", RegexOptions.IgnoreCase);
    MatchCollection matchCollection = regex.Matches(text);
    foreach (Match match in matchCollection)
    {
        string linkText = string.Empty;
        // If link text is present (as in the second UBB tag), use it directly
        if (!string.IsNullOrEmpty(match.Groups[3].Value))
        {
            linkText = match.Groups[3].Value;
        }
        else // Otherwise use the URL itself as the link text
        {
            linkText = match.Groups[2].Value;
        }
        text = text.Replace(match.Groups[0].Value, "<a href=\"" + match.Groups[2].Value + "\" target=\"_blank\">" + linkText + "</a>");
    }
    Console.WriteLine("Replaced code:" + text);
}
The results of the program execution are as follows:
Original UBB code:[url=http://zhoufoxcn.blog.51cto.com][/url][url=http://blog.csdn.net/zhoufoxcn]Zhou Gong's column[/url]
Replaced code:<a href="http://zhoufoxcn.blog.51cto.com" target="_blank">http://zhoufoxcn.blog.51cto.com</a><a href="http://blog.csdn.net/zhoufoxcn" target="_blank">Zhou Gong's column</a>
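The same conversion can probably be done in one pass with Regex.Replace and a MatchEvaluator delegate, instead of looping over the matches and calling string.Replace. The sketch below is an alternative, not the method used above. The class name UbbReplaceSketch and the method ToHtml are hypothetical, and the pattern is a simplified variant with only two capture groups (URL and link text).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class UbbReplaceSketch
{
    // Simplified pattern (an assumption, not the article's original):
    // group 1 is the URL, group 2 is the optional link text.
    private static readonly Regex UrlTag =
        new Regex(@"\[url=([^\]]*)\]([^\[]*)\[/url\]", RegexOptions.IgnoreCase);

    // Replaces every [url=...]...[/url] tag with an HTML anchor in a single pass.
    public static string ToHtml(string ubb)
    {
        return UrlTag.Replace(ubb, m =>
        {
            string url = m.Groups[1].Value;
            // Fall back to the URL itself when no link text is given.
            string linkText = m.Groups[2].Value.Length > 0 ? m.Groups[2].Value : url;
            return "<a href=\"" + url + "\" target=\"_blank\">" + linkText + "</a>";
        });
    }
}
```

Because the evaluator runs once per match, identical tags that occur several times in the text are each handled exactly once. Like the example above, this sketch does not validate or HTML-encode the URL and link text, which a real converter would most likely need to do.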
The example above is a little more complicated and may be hard to follow for readers who are new to regular expressions, but that is fine; later articles will continue to cover regular expressions. In practice, accessing a group through an expression such as match.Groups[0].Value is not very convenient. It is like writing string name = dataTable.Rows[i][j] when accessing a DataTable: as soon as the layout changes, index-based access becomes very error-prone. In fact, a group can also be accessed by name instead of by index, which will be covered in a later article.
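As a brief preview, here is a minimal sketch of name-based access, using the .NET syntax (?<name>...) for named groups. The class name NamedGroupSketch, the method Describe, and the group names number and letter are chosen only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class NamedGroupSketch
{
    // (?<name>...) gives a group a name, so it can be read as Groups["name"].
    private static readonly Regex Pattern =
        new Regex(@"(?<number>\d+)(?<letter>[a-z])\s+", RegexOptions.IgnoreCase);

    // Returns "number/letter" for the first match, or "no match".
    public static string Describe(string input)
    {
        Match m = Pattern.Match(input);
        if (!m.Success)
        {
            return "no match";
        }
        return m.Groups["number"].Value + "/" + m.Groups["letter"].Value;
    }
}
```

For example, NamedGroupSketch.Describe("16N ffee80") should return "16/N", and the calling code keeps working even if more groups are later added to the pattern.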
C# Regular Expression Programming (III): Usage of the Match Class and the Group Class