Contents
- 1. Overview
- 2. Basic applications
- 3. Extended applications
1. Overview
As a beginner with regular expressions, you may not be familiar with the Regex class, and when a problem comes up you may not know which method to use. Based on some typical application scenarios, this article introduces the basic uses of the Regex class. The focus is on the .NET classes; the regular expressions themselves are not discussed in depth.
Regular expressions are used for pattern matching. By purpose, their applications fall into four types: validation, extraction, replacement, and splitting. With the control, classes, and methods provided by .NET, all of them are easy to implement.
The following sections describe the commonly used .NET classes, methods, and properties, organized by typical application scenario. The article is meant as a guide to basic usage and is suitable for beginners using regular expressions on the .NET platform.
2. Basic applications
2.1 Validation
The purpose of validation is to determine whether the input source string conforms to a given rule. Depending on the requirement, the whole source string may be validated, or only one of its substrings.
In .NET, validation is applied in two ways: with the RegularExpressionValidator control, or in program code.
2.1.1 Validation control -- RegularExpressionValidator
RegularExpressionValidator is a client-side validation control that ships with ASP.NET. With a few simple settings, it validates the value entered in another control.
The basic syntax is as follows:
<asp:RegularExpressionValidator
    ID="RegularExpressionValidator1"
    runat="server"
    ControlToValidate="TextBox1"
    ErrorMessage="RegularExpressionValidator"
    ValidationExpression="^([1-9][0-9]*|0)(\.[0-9]{2})?$">
</asp:RegularExpressionValidator>
The RegularExpressionValidator control itself is not described in detail here; these are only the points that need attention when using it:
1. RegularExpressionValidator performs client-side validation;
2. RegularExpressionValidator uses JavaScript regex syntax;
3. RegularExpressionValidator cannot check whether the input is empty.
Because client-side validation is easily bypassed, server-side validation is still required when RegularExpressionValidator is used.
RegularExpressionValidator generates client-side JavaScript code, so the regular expression it uses must follow JavaScript syntax rules. Compared with .NET, the differences are:
1. Lookbehind, i.e. (?<=expression) and (?<!expression), is not supported (JavaScript only gained lookbehind in ECMAScript 2018, and older browsers lack it);
2. Metacharacters are ASCII-only: \w is equivalent to [a-zA-Z0-9_], and \d is equivalent to [0-9].
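On the .NET side, \d and \w are Unicode-aware by default, and RegexOptions.ECMAScript restricts them to the JavaScript (ASCII) definitions. A minimal console sketch of the difference, assuming C# 9+ top-level statements; the sample characters are my own choice, not from the article:

```csharp
using System;
using System.Text.RegularExpressions;

// "\u0661\u0662\u0663" is 123 written with Arabic-Indic digits (Unicode category Nd).
string arabicDigits = "\u0661\u0662\u0663";
// "\u00e9" is the accented letter e (é).
string accented = "\u00e9";

// Default .NET semantics: \d and \w are Unicode-aware.
bool digitDefault = Regex.IsMatch(arabicDigits, @"^\d+$");
bool wordDefault = Regex.IsMatch(accented, @"^\w$");

// RegexOptions.ECMAScript: \d means [0-9] and \w means [a-zA-Z0-9_], as in JavaScript.
bool digitEcma = Regex.IsMatch(arabicDigits, @"^\d+$", RegexOptions.ECMAScript);
bool wordEcma = Regex.IsMatch(accented, @"^\w$", RegexOptions.ECMAScript);

Console.WriteLine($"\\d  default: {digitDefault}  ECMAScript: {digitEcma}");
Console.WriteLine($"\\w  default: {wordDefault}  ECMAScript: {wordEcma}");
```

So a pattern that is tested on the server with default options can accept input that the same pattern rejects in the browser.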
The RegularExpressionValidator control is generally used to check whether the entire string entered in a control conforms to a rule, so "^" and "$" are indispensable. When "|" is used, "()" must be used to limit the scope of "|"; for example, an integer from 0 to 100 can be written as "^([1-9]?[0-9]|100)$".
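The effect of the parentheses can be checked with .NET's Regex, whose alternation behaves the same as JavaScript's for this pattern. A small console sketch (C# 9+ top-level statements; the test inputs are illustrative):

```csharp
using System;
using System.Text.RegularExpressions;

// Without parentheses, ^ binds only to the first alternative and $ only to the second.
string unscoped = @"^[1-9]?[0-9]|100$";
// With parentheses, both anchors apply to the whole alternation.
string scoped = @"^([1-9]?[0-9]|100)$";

string[] inputs = { "0", "57", "100", "5abc", "x100", "101" };
foreach (string s in inputs)
{
    Console.WriteLine($"{s,-5} unscoped: {Regex.IsMatch(s, unscoped),-5} scoped: {Regex.IsMatch(s, scoped)}");
}
```

The unscoped version accepts "5abc", "x100" and "101", because each of them satisfies one anchored half of the alternation.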
RegularExpressionValidator cannot check whether the input is empty; to do that, use the RequiredFieldValidator control.
RegularExpressionValidator is one of the validation controls that ASP.NET packages for convenient client-side validation. Because its support for regex syntax is limited, it can only perform limited format validation; more complex checks can be implemented with your own JavaScript code, which is also quite simple.
2.1.2 Validation in code -- IsMatch()
In code, validation is done mainly with the IsMatch() method. The target of validation may be the whole source string or only a substring of it.
Validating the whole source string against a rule is basically the same requirement as with RegularExpressionValidator, except that .NET regex syntax is available, which is much more powerful than JavaScript's. A typical case is checking whether the string entered in a text box conforms to a rule.
Example 1: Validate the content of textBox1. The integer part must be 0 or a positive integer; the decimal part is optional, but if present it must have exactly two digits.
Regex reg = new Regex(@"^(?:[1-9][0-9]*|0)(?:\.[0-9]{2})?$");
if (reg.IsMatch(textBox1.Text))
{
    richTextBox2.Text = "The input format is correct!";
}
else
{
    richTextBox2.Text = "The input format is incorrect!";
}
Because the whole source string is validated, "^" and "$" are indispensable; otherwise the result may be wrong. For example, the regular expression "(?:[1-9][0-9]*|0)(?:\.[0-9]{2})?" successfully matches the input "0.123", and the match is "0.12". In that case the regex only finds a match; it does not validate anything.
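The difference can be reproduced outside a form. A minimal console sketch (C# 9+ top-level statements, no WinForms controls) that runs both patterns against "0.123":

```csharp
using System;
using System.Text.RegularExpressions;

var anchored = new Regex(@"^(?:[1-9][0-9]*|0)(?:\.[0-9]{2})?$");  // validates the whole string
var unanchored = new Regex(@"(?:[1-9][0-9]*|0)(?:\.[0-9]{2})?");   // only looks for a matching substring

string input = "0.123";
bool anchoredOk = anchored.IsMatch(input);
bool unanchoredOk = unanchored.IsMatch(input);
string matched = unanchored.Match(input).Value;

Console.WriteLine($"anchored: {anchoredOk}, unanchored: {unanchoredOk} (matched \"{matched}\")");
```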
Validating part of the source string means checking its substrings. This is usually used to determine whether the source string contains, or does not contain, a substring that conforms to a rule, similar to IndexOf() in the String class.
Example 2 (two related regular expressions):
Data:
1985aaa1985bb
bcae1958fiefadf1955fef
atijc1944cvkd
df2564isdjfef2564d
abc1234def5678ghi5678jkl
Requirement 1: Check whether any run of four consecutive digits, at any position in the string, occurs again in the string. Return true if there is such a repetition and false if there is not.
For the data above, the strings for which requirement 1 returns true are:
1985aaa1985bb
df2564isdjfef2564d
abc1234def5678ghi5678jkl
Because the requirement asks whether a repeated four-digit run exists at any position, every position in the source string has to be tried until a repetition is found, so the start anchor "^" must not be used. And since it is enough to match up to the point where the repetition occurs (only when no repetition exists does the attempt run to the end), the end anchor "$" is not needed either. This is therefore a typical case of validating a substring of the source string.
Code implementation:
string[] test = new string[] { "1985aaa1985bb", "bcae1958fiefadf1955fef", "atijc1944cvkd", "df2564isdjfef2564d", "abc1234def5678ghi5678jkl" };
Regex reg = new Regex(@"(\d{4})(?:(?!\1).)*\1");
foreach (string s in test)
{
    richTextBox2.Text += "Source string: " + s.PadRight(25, ' ') + "Validation result: " + reg.IsMatch(s) + "\n";
}
/* Output
Source string: 1985aaa1985bb            Validation result: True
Source string: bcae1958fiefadf1955fef   Validation result: False
Source string: atijc1944cvkd            Validation result: False
Source string: df2564isdjfef2564d       Validation result: True
Source string: abc1234def5678ghi5678jkl Validation result: True
*/
Because repetition is involved, a backreference (\1) is used here. For details on backreferences, see the companion article on regex backreferences.
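To see what the backreference actually captured, inspect the Match object instead of only calling IsMatch(). A console sketch (C# 9+ top-level statements) using strings from the data above:

```csharp
using System;
using System.Text.RegularExpressions;

var reg = new Regex(@"(\d{4})(?:(?!\1).)*\1");

foreach (string s in new[] { "1985aaa1985bb", "abc1234def5678ghi5678jkl", "bcae1958fiefadf1955fef" })
{
    Match m = reg.Match(s);
    // Groups[1] holds the four digits that \1 had to find again later in the string.
    Console.WriteLine(m.Success
        ? $"{s}: repeated \"{m.Groups[1].Value}\", match \"{m.Value}\" at index {m.Index}"
        : $"{s}: no repeated four-digit run");
}

Match first = reg.Match("1985aaa1985bb");
Match second = reg.Match("abc1234def5678ghi5678jkl");
```

For the last data string the attempt at "1234" fails, so the engine moves on and succeeds at "5678".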
Requirement 2: Check whether the first run of four consecutive digits in the string occurs again. Return true if it does and false if it does not.
For the data above, the strings for which requirement 2 returns true are:
1985aaa1985bb
df2564isdjfef2564d
Because the requirement concerns the first four-digit run, the start anchor "^" is needed to guarantee that the run examined is the first one in the string: the leading "(?:(?!\d{4}).)*" cannot skip past any four-digit run, so "(\d{4})" can only capture the first one. As before, it is enough to match up to the point where the repetition occurs, so the end anchor "$" is not needed. This is still substring validation, but with one more restriction than requirement 1.
Code implementation:
string[] test = new string[] { "1985aaa1985bb", "bcae1958fiefadf1955fef", "atijc1944cvkd", "df2564isdjfef2564d", "abc1234def5678ghi5678jkl" };
Regex reg = new Regex(@"^(?:(?!\d{4}).)*(\d{4})(?:(?!\1).)*\1");
foreach (string s in test)
{
    richTextBox2.Text += "Source string: " + s.PadRight(25, ' ') + "Validation result: " + reg.IsMatch(s) + "\n";
}
/* Output
Source string: 1985aaa1985bb            Validation result: True
Source string: bcae1958fiefadf1955fef   Validation result: False
Source string: atijc1944cvkd            Validation result: False
Source string: df2564isdjfef2564d       Validation result: True
Source string: abc1234def5678ghi5678jkl Validation result: False
*/
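Both requirements can be checked side by side without a form. A console sketch (C# 9+ top-level statements, LINQ used only to collect the results) that filters the five data strings with each pattern:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string[] data = { "1985aaa1985bb", "bcae1958fiefadf1955fef", "atijc1944cvkd", "df2564isdjfef2564d", "abc1234def5678ghi5678jkl" };

var anyRepeat = new Regex(@"(\d{4})(?:(?!\1).)*\1");                    // requirement 1
var firstRepeat = new Regex(@"^(?:(?!\d{4}).)*(\d{4})(?:(?!\1).)*\1");  // requirement 2

string[] req1 = data.Where(s => anyRepeat.IsMatch(s)).ToArray();
string[] req2 = data.Where(s => firstRepeat.IsMatch(s)).ToArray();

Console.WriteLine("Requirement 1: " + string.Join(", ", req1));
Console.WriteLine("Requirement 2: " + string.Join(", ", req2));
```

The only string that differs is "abc1234def5678ghi5678jkl": it has a repeated run ("5678"), but not a repeated first run ("1234").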
2.2 Extraction -- Match(), Matches()
Extraction means obtaining one or more substrings that conform to a rule from the source string. It is widely used in everyday string processing. Extraction mainly uses the Match() and Matches() methods, together with members of the Match and MatchCollection classes when processing the results, and occasionally the Capture class.
2.2.1 Extracting a single match -- Match()
Use Match() when you need only one result, or only the first successful match. Match() stops as soon as the match succeeds at some position and returns a Match object.
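The Match object returned by Match() can also resume the search with NextMatch(). A minimal console sketch (C# 9+ top-level statements); the pattern "(?<=: )\w+" is my own illustration, not the article's:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string text = "Name: Zhang San, Gender: male, Age: 25";
var reg = new Regex(@"(?<=: )\w+");

// Match() returns only the first successful match...
string first = reg.Match(text).Value;

// ...but the returned Match can continue the search with NextMatch().
var all = new List<string>();
for (Match m = reg.Match(text); m.Success; m = m.NextMatch())
{
    all.Add(m.Value);
}

Console.WriteLine(first);
Console.WriteLine(string.Join(" | ", all));
```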
Example: extract the name
Source string: Name: Zhang San, Gender: male, Age: 25
Code implementation:
string test = "Name: Zhang San, Gender: male, Age: 25";
Regex reg = new Regex(@"(?<=Name: )[^,]+");
Match m = reg.Match(test);
if (m.Success) // check whether the match succeeded
{
    richTextBox2.Text = m.Value;
}
/* Output
Zhang San
*/
Although Match() returns only one match, several specified substrings can still be obtained from it through capture groups, for example the link and the text of the first <a…> tag.
string test = "abc<a href=\"www.test1.com\">Test 1</a>def<a href=\"www.test2.com\">Test 2</a>ghi";
Regex reg = new Regex(@"(?is)<a(?:(?!href=).)*href=(['""]?)(?<url>[^""\s>]*)\1[^>]*>(?<text>(?:(?!</?a\b).)*)</a>");
Match m = reg.Match(test);
if (m.Success)
{
    richTextBox2.Text += m.Groups["url"].Value + "\n";  // link
    richTextBox2.Text += m.Groups["text"].Value + "\n"; // text
}
/* Output
www.test1.com
Test 1
*/
Alternatively, the results of the capture groups can be referenced through the Result() method.
string test = "abc<a href=\"www.test1.com\">Test 1</a>def<a href=\"www.test2.com\">Test 2</a>ghi";
Regex reg = new Regex(@"(?is)<a(?:(?!href=).)*href=(['""]?)(?<url>[^""\s>]*)\1[^>]*>(?<text>(?:(?!</?a\b).)*)</a>");
Match m = reg.Match(test);
if (m.Success)
{
    richTextBox2.Text += m.Result("${url}") + "\n";  // link
    richTextBox2.Text += m.Result("${text}") + "\n"; // text
}
/* Output
www.test1.com
Test 1
*/
Both approaches give the same result; which one you use is a matter of habit.
2.2.2 Extracting multiple matches -- Matches()
Use Matches() when you need to extract every substring that conforms to a rule. Matches() tries to match at every position of the source string and returns the results as a MatchCollection object.
For the link-and-text example in section 2.2.1, if you want all links and texts instead of only the first, use Matches():
string test = "abc<a href=\"www.test1.com\">Test 1</a>def<a href=\"www.test2.com\">Test 2</a>ghi";
Regex reg = new Regex(@"(?is)<a(?:(?!href=).)*href=(['""]?)(?<url>[^""\s>]*)\1[^>]*>(?<text>(?:(?!</?a\b).)*)</a>");
MatchCollection mc = reg.Matches(test);
foreach (Match m in mc)
{
    richTextBox2.Text += m.Groups["url"].Value + "\n";  // link
    richTextBox2.Text += m.Groups["text"].Value + "\n"; // text
}
/* Output
www.test1.com
Test 1
www.test2.com
Test 2
*/
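Outside a WinForms form, the MatchCollection is often projected straight into a list. A console sketch (C# 9+ top-level statements); the tuple shape and the use of LINQ are my own choices, not the article's:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string test = "abc<a href=\"www.test1.com\">Test 1</a>def<a href=\"www.test2.com\">Test 2</a>ghi";
var reg = new Regex(@"(?is)<a(?:(?!href=).)*href=(['""]?)(?<url>[^""\s>]*)\1[^>]*>(?<text>(?:(?!</?a\b).)*)</a>");

// Cast<Match>() works whether or not MatchCollection implements IEnumerable<Match> on your runtime.
var links = reg.Matches(test)
    .Cast<Match>()
    .Select(m => (Url: m.Groups["url"].Value, Text: m.Groups["text"].Value))
    .ToList();

foreach (var link in links)
{
    Console.WriteLine($"{link.Url} -> {link.Text}");
}
```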
With Matches(), you can in some scenarios use the Count property to count how many times a substring matching a rule occurs, for example how many times a standalone "3" appears in the string.
string test = "137,1,33,4,3,6,21,3,35,93,2,98";
Regex reg = new Regex(@"\b3\b");
int count = reg.Matches(test).Count; // 2
Here only the number of successful matches matters, not their content, so the regex should be as simple as possible while still achieving the goal; this speeds up matching and reduces resource usage. For example, to count the <a…> tags in the source string from the link-extraction example above, the following code is generally sufficient.
string test = "abc<a href=\"www.test1.com\">Test 1</a>def<a href=\"www.test2.com\">Test 2</a>ghi";
Regex reg = new Regex(@"(?i)<a\b");
int count = reg.Matches(test).Count; // 2
2.2.3 The set of captures made by a group during matching -- Capture
In some cases, during a single match of a regular expression, one capture group may match several times.
Example:
Source string: <region name=oldslist col=1 row=2 order=asc>abcsadf</region> jfdsajf <region name=newslist class=list col=4 row=10 order=desc>abcsadf</region>
Requirement: extract the attribute names and values of each region, grouped by region.
One approach is to extract each region tag first and then extract its attributes and values, but that is more complicated; instead, consider doing it with a single regular expression. Because the number of attributes is not fixed, a fixed-count quantifier cannot be used to match the attribute pairs. The regular expression can be written as:
(?is)<region\s+(?:(?<key>[^\s=]+)=(?<value>[^\s>]+)\s*)+>
In this case, reading the result through Groups gives only the substring from the last successful match, because a group keeps only its last match. To get all of them, use the Captures property.
string test = "<region name=oldslist col=1 row=2 order=asc>abcsadf</region> jfdsajf <region name=newslist class=list col=4 row=10 order=desc>abcsadf</region>";
MatchCollection mc = Regex.Matches(test, @"(?is)<region\s+(?:(?<key>[^\s=]+)=(?<value>[^\s>]+)\s*)+>");
for (int i = 0; i < mc.Count; i++)
{
    richTextBox2.Text += "Attributes of region " + (i + 1) + ":\n";
    for (int j = 0; j < mc[i].Groups["key"].Captures.Count; j++)
    {
        richTextBox2.Text += "Attribute: " + mc[i].Groups["key"].Captures[j].Value.PadRight(10, ' ') + " Value: " + mc[i].Groups["value"].Captures[j].Value + "\n";
    }
    richTextBox2.Text += "\n";
}
/* Output
Attributes of region 1:
Attribute: name       Value: oldslist
Attribute: col        Value: 1
Attribute: row        Value: 2
Attribute: order      Value: asc

Attributes of region 2:
Attribute: name       Value: newslist
Attribute: class      Value: list
Attribute: col        Value: 4
Attribute: row        Value: 10
Attribute: order      Value: desc
*/
A Group is in fact a collection of Capture objects. When a capture group matches only one substring, the collection has a single element. When the group matches several substrings in succession, Groups[i].Value keeps only the last one, while the Captures collection records every substring matched along the way.
Captures is not needed often. The example above could also be done without it by matching several times, but in some complex expressions multiple matching is hard to arrange, and that is where Captures is useful.
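The Group versus Captures behavior is easiest to see on a tiny input. A console sketch (C# 9+ top-level statements) with a repeated single-character group; the input "abc" is only an illustration:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// One match of (\w)+ against "abc": group 1 succeeds three times.
Match m = Regex.Match("abc", @"(\w)+");
Group g = m.Groups[1];

string lastOnly = g.Value;            // Group.Value keeps only the last capture
int captureCount = g.Captures.Count;  // every capture is kept in Captures
string everyCapture = string.Join(",", g.Captures.Cast<Capture>().Select(c => c.Value));

Console.WriteLine($"Group value: {lastOnly}, captures: {captureCount} ({everyCapture})");
```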
2.3 Replacement
Replacement means replacing the substrings of the source string that conform to a rule with other content. Like extraction, it is widely used in string processing. It is done mainly with the Replace() method; in scenarios with complicated replacement rules, a delegate may also be used.
2.3.1 General replacement
The purpose of replacement is clear: work out the rule that describes the substrings to be replaced, then replace them with the target content.
Example 1:
Source string: ABC
Requirement: replace relative addresses with absolute addresses; addresses that are already absolute are left unchanged.
string test = " ABC ";
Regex reg = new Regex(@"(?i)(?<=src=(['""]?))(?!http://)(?=[^'""\s>]+\1)");
string result = reg.Replace(test, "http://www.test.com");
/* Output
ABC
*/
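To see the pattern do something, it needs input that actually contains src attributes. A console sketch (C# 9+ top-level statements) using a hypothetical HTML fragment of my own, with one relative src in double quotes, one absolute src in single quotes, and one unquoted relative src:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input, not from the article.
string test = "<img src=\"/imgs/a.jpg\"> abc <img src='http://www.test.com/imgs/b.gif'> <img src=/imgs/c.png>";

// Zero-width match right after src= and its optional quote (captured as group 1),
// provided the value does not start with http:// and runs up to the same closing quote.
var reg = new Regex(@"(?i)(?<=src=(['""]?))(?!http://)(?=[^'""\s>]+\1)");

// Replacing an empty match inserts the prefix in front of each relative path.
string result = reg.Replace(test, "http://www.test.com");
Console.WriteLine(result);
```

Because the match is zero-width, "replacing" it simply inserts the prefix; the absolute address is skipped by the (?!http://) lookahead.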
Note that .NET does not provide separate methods such as Java's replaceAll() and replaceFirst(); Replace() replaces every match by default. To replace only the first substring that matches the rule, either build that restriction into the regular expression, as in the next example, or use the instance overload Replace(input, replacement, count) with count set to 1.
Example 2:
Source string: abc123def123ghi
Requirement: replace the first "123" with an empty string and leave the other occurrences unchanged.
string test = "abc123def123ghi";
Regex reg = new Regex(@"(?s)^((?:(?!123).)*)123");
string result = reg.Replace(test, "$1");
/* Output
abcdef123ghi
*/
Here "^" ensures that only the first occurrence is replaced: with a leading "^", most regex engines stop trying other positions once the match at position 0 has succeeded or failed.
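For a simple case like this one, the count overload of the instance Replace() method gives the same result with a plain pattern (the static Regex.Replace() has no count parameter). A console sketch, C# 9+ top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

var reg = new Regex("123");
string source = "abc123def123ghi";

string firstOnly = reg.Replace(source, "", 1);  // count = 1: only the first match is replaced
string everyMatch = reg.Replace(source, "");    // default: every match is replaced

Console.WriteLine(firstOnly);
Console.WriteLine(everyMatch);
```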
2.3.2 Using delegates in replacement
For some complicated replacement rules, a delegate may be needed. Because this is a typical application, it will be covered separately in a later article.
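As a brief preview, the delegate type is MatchEvaluator: it receives each Match and returns the replacement text. A minimal console sketch (C# 9+ top-level statements); the doubling and upper-casing rules are arbitrary illustrations:

```csharp
using System;
using System.Text.RegularExpressions;

// The evaluator is called once per match; its return value replaces that match.
string doubled = Regex.Replace("a1b22c333", @"\d+", m => (int.Parse(m.Value) * 2).ToString());

// The same idea with an explicitly typed MatchEvaluator delegate.
MatchEvaluator toUpper = m => m.Value.ToUpperInvariant();
string shouted = Regex.Replace("abc-def", @"[a-z]+", toUpper);

Console.WriteLine(doubled);
Console.WriteLine(shouted);
```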
2.4 Splitting
Splitting means breaking the source string into an array, using the substrings that conform to a rule as separators; it is done mainly with the Split() method. Regex.Split() has no equivalent of the StringSplitOptions.RemoveEmptyEntries parameter of String.Split(), so if a separator appears at the beginning or end of the string and you do not want the resulting empty strings, you have to handle that in the regular expression.
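When handling it in the regex is awkward, filtering the array afterwards is an alternative. This is a plain LINQ post-processing step, not a Regex option. A console sketch (C# 9+ top-level statements) with an illustrative input:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Separators at the start, the end, and twice in a row all produce empty strings.
string[] raw = Regex.Split(",a,,b,", ",");

// Filtering afterwards plays the role of StringSplitOptions.RemoveEmptyEntries.
string[] nonEmpty = raw.Where(s => s.Length > 0).ToArray();

Console.WriteLine($"raw: {raw.Length} items [{string.Join("|", raw)}]");
Console.WriteLine($"filtered: {string.Join("|", nonEmpty)}");
```

Note that this also drops empty strings produced in the middle (",,"), which may or may not be what you want; the regex-based approach below only suppresses them at the ends.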
Example 1:
Source string:Chinese character 123 text English
Requirement: Separate by English words and non-English words (English words include substrings consisting of a-Z, A-Z, 0-9 ).
String STR = "Chinese Character 123 text English ";
String [] result = RegEx. Split (STR ,@"(? <! ^) \ B (?! $) ", Regexoptions. ecmascript );
Foreach (string s in result)
{
Richtextbox2.text + = S + "\ n ";
}
/* Output
Chinese characters
123
Text
English
*/
Here we use"(? <! ^)And(?! $)To limit the substrings that do not start or end with a separator, and no unnecessary empty strings will appear in the result.
Some applications are better regarded as regex tricks.
Example 2:
Source string: left(name,1),left(name,1),left(name,1)
Requirement: split on ",", except for the "," inside "()".
string test = "left(name,1),left(name,1),left(name,1)";
Regex reg = new Regex(@"(?<!\([^)]*),(?![^(]*\))");
string[] sArray = reg.Split(test);
foreach (string s in sArray)
{
    richTextBox2.Text += s + "\n";
}
/* Output
left(name,1)
left(name,1)
left(name,1)
*/
When using Regex.Split(), note that if the regular expression contains capture groups, the text matched by those groups is also included in the result array.
The following examples need no further explanation; the output speaks for itself.
string test = "aa11<bbb>cc22<ddd>ee";
string[] temp = Regex.Split(test, @"[0-9]+(<[^>]*>)");
foreach (string s in temp)
{
    richTextBox2.Text += s + "\n";
}
/* Output
aa
<bbb>
cc
<ddd>
ee
*/
If you do not want the text matched by a group in the result, use a non-capturing group instead.
string test = "aa11<bbb>cc22<ddd>ee";
string[] temp = Regex.Split(test, @"[0-9]+(?:<[^>]*>)");
foreach (string s in temp)
{
    richTextBox2.Text += s + "\n";
}
/* Output
aa
cc
ee
*/
3. Extended applications
This section introduces some extended regex applications that you may run into in .NET.
3.1 Escaping when a regular expression is built dynamically -- Escape()
Sometimes a regular expression has to be built dynamically from variables. If a variable contains regex metacharacters, they are parsed as metacharacters, which may make the pattern fail to compile and throw an exception; to prevent this, escape the variable. Regex.Escape() escapes a minimal set of characters (\, *, +, ?, |, {, [, (, ), ^, $, ., #, and white space) by replacing them with their escape codes.
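A compact console sketch of the failure and the fix (C# 9+ top-level statements). The user input "abc(" is a hypothetical value, and the attribute pattern is a simplified version of the div example below:

```csharp
using System;
using System.Text.RegularExpressions;

string userId = "abc(";  // hypothetical user input containing a metacharacter

// Used raw, "(" opens a group that is never closed, so the pattern cannot be parsed.
bool rawFails = false;
try
{
    _ = new Regex("id=\"" + userId + "\"");
}
catch (ArgumentException)
{
    rawFails = true;
}

// Escaped, "(" becomes "\(" and is matched literally.
string escaped = Regex.Escape(userId);
bool found = Regex.IsMatch("<div id=\"abc(\">x</div>", "id=\"" + escaped + "\"");

Console.WriteLine($"raw pattern throws: {rawFails}; escaped: {escaped}; found: {found}");
```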
For example, suppose a div tag is retrieved by an ID that the user enters. As long as the ID contains no metacharacters, the correct result is obtained.
string test = "<div id=\"test1\">abc</div><div id=\"test2\">def</div>";
string[] ids = new string[] { "test1", "test2" };
foreach (string id in ids)
{
    Regex reg = new Regex(@"(?is)<div\s+id=""" + id + @"""[^>]*>(?:(?!</?div\b).)*</div>");
    MatchCollection mc = reg.Matches(test);
    foreach (Match m in mc)
    {
        richTextBox2.Text += m.Value + "\n";
    }
}
/* Output
<div id="test1">abc</div>
<div id="test2">def</div>
*/
However, if the ID entered contains an unescaped metacharacter, such as "abc(", an exception similar to the following is thrown:
parsing "(?is)<div\s+id="abc("[^>]*>(?:(?!</?div\b).)*</div>" - Not enough )'s.
In this case, escape the input variable with the Escape() method.
string test = "<div id=\"test1\">abc</div><div id=\"test2\">def</div>";
string[] ids = new string[] { "test1", "test2", "abc(" };
foreach (string id in ids)
{
    Regex reg = new Regex(@"(?is)<div\s+id=""" + Regex.Escape(id) + @"""[^>]*>(?:(?!</?div\b).)*</div>");
    MatchCollection mc = reg.Matches(test);
    foreach (Match m in mc)
    {
        richTextBox2.Text += m.Value + "\n";
    }
}
/* Output
<div id="test1">abc</div>
<div id="test2">def</div>
*/
After escaping with Escape(), the correct result is obtained and no exception is thrown.
3.2 Static methods
.NET provides static counterparts for the commonly used Regex methods. You can call them directly without explicitly declaring a Regex object, which is more convenient to write and makes the code shorter and easier to read.
For example, replacing the last octet of an IP address with "*" takes only one line of code:
string result = Regex.Replace("10.27.123.12", @"\d+$", "*"); // 10.27.123.*
A static call still has to parse the pattern into a Regex object internally. Since .NET Framework 2.0, the static methods keep a small cache of recently used patterns (its size is Regex.CacheSize, 15 by default), so a pattern still in the cache is not parsed again; but when many different patterns are used, or in loops and frequently called code, explicitly declaring and reusing a Regex object is the more reliable and efficient choice.
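A console sketch of both styles side by side (C# 9+ top-level statements); the extra IP addresses are made-up sample values. The Regex object is declared once and reused inside the loop, while the static call handles a one-off replacement:

```csharp
using System;
using System.Text.RegularExpressions;

// Size of the internal cache used by the static Regex methods.
Console.WriteLine($"Regex.CacheSize = {Regex.CacheSize}");

string[] ips = { "10.27.123.12", "192.168.0.1", "172.16.5.200" };

// One explicitly declared Regex object, reused for every iteration of the loop.
var lastOctet = new Regex(@"\d+$");
var masked = new string[ips.Length];
for (int i = 0; i < ips.Length; i++)
{
    masked[i] = lastOctet.Replace(ips[i], "*");
}

// The static call gives the same result for a one-off replacement.
string oneOff = Regex.Replace("10.27.123.12", @"\d+$", "*");

Console.WriteLine(string.Join(", ", masked));
Console.WriteLine(oneOff);
```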
However, when only a single call is needed or execution efficiency is not a concern, static methods are a good choice.