Document directory
- Introduction
- History of regular expressions
- Simple expressions
- Quantifiers
- Metacharacters
- Character classes
- Predefined set metacharacters
- Expression examples
- ASP.NET validation
- Regular Expression API
- Free tools
- Advanced topics
Introduction
The Microsoft .NET Framework has first-class support for regular expressions, and even Microsoft ASP.NET includes controls that depend on the regular expression language. This article covers the basics of regular expressions and recommends resources for further study.
This article is intended for beginners who have little or no experience with regular expressions but who are familiar with ASP.NET and comfortable programming in .NET. We also hope that, together with the regular expression cheat sheet, it serves as a reference or refresher for developers who have used regular expressions before. The article covers the following topics:
- History of regular expressions
- Simple expressions
- Quantifiers
- Metacharacters
- Character classes
- Predefined set metacharacters
- Expression examples in detail
- ASP.NET validation
- Regular Expression API
- Free tools
- Advanced topics overview
- Summary and other resources
If you have questions about this article or about regular expressions in general, visit http://www.aspadvice.com/ and ask on the regex mailing list. At the time of writing, the list had more than 350 subscribers.
History of regular expressions
Regular expressions date back to the 1950s. They were first used to describe "regular sets", patterns studied by neurophysiologists. Regular expressions were first proposed by the mathematician Stephen Kleene and were eventually adopted by Ken Thompson in two popular text utilities, qed and grep. Jeffrey Friedl covers this in detail in his book Mastering Regular Expressions (2nd edition), which is recommended for anyone who wants to learn more about the theory and history of regular expressions.
Over the last fifty years, regular expressions have gradually evolved from an obscure mathematical concept into a core feature of many tools and software packages. Although many UNIX tools have supported regular expressions for decades, they found their way into most Windows developers' toolkits only within the last decade. In Microsoft Visual Basic 6 and Microsoft VBScript, regular expressions were awkward to use at best. With the .NET Framework, however, regular expression support is first-class and available to all Microsoft developers in every .NET language.
So what is a regular expression? A regular expression is a language for precisely describing patterns in text strings. Beyond simply describing these patterns, a regular expression engine can typically be used to iterate through the matches, to split a string into substrings using the pattern as a delimiter, or to replace or reformat text intelligently. Regular expressions provide an effective and simple way to solve many common text-processing tasks.
Regular expressions are usually discussed in terms of the text that does (or does not) match them. This article, like the classes in the System.Text.RegularExpressions namespace, refers to three participants in a regular expression operation: the regular expression "pattern", the "input" string, and any "matches" the pattern finds within that string.
Simple expressions
The simplest regular expression is one everyone already knows: a literal string. A specific string can be described by its own text, so the pattern foo matches the input string foo exactly. The same pattern also matches the input "The food was quite tasty", which may not be what you want if you are looking for an exact match.
Of course, matching a string that is exactly equal to the pattern is a trivial use of regular expressions and does not show their real power. What if, instead of searching for foo, you want to find all words that start with the letter f, or all three-letter words? That is beyond what a literal string can reasonably describe, so we need to look at regular expressions in more depth. Below is a literal expression and some inputs it matches.
Pattern | Input (match)
foo | foo, food, foot, "There's evil afoot."
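As a rough illustration of literal matching, the following C# sketch applies the pattern foo to an input using Regex.IsMatch and Regex.Match from System.Text.RegularExpressions. The class and method names (LiteralPatternDemo, ContainsFoo, FirstFooIndex) are invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralPatternDemo
{
    // True when the literal pattern "foo" occurs anywhere in the input.
    // Matching is case-sensitive unless an option says otherwise.
    public static bool ContainsFoo(string input)
    {
        return Regex.IsMatch(input, "foo");
    }

    // Zero-based position of the first "foo" in the input, or -1 if none.
    public static int FirstFooIndex(string input)
    {
        Match m = Regex.Match(input, "foo");
        return m.Success ? m.Index : -1;
    }
}
```

With these helpers, ContainsFoo("The food was quite tasty") returns true and FirstFooIndex reports position 4, while ContainsFoo("FOOD") returns false because the default comparison is case-sensitive.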
Quantifiers
Quantifiers provide a simple way to specify how many times a particular character or set of characters may repeat in a pattern. There are three non-explicit quantifiers:
- *, meaning "0 or more occurrences".
- +, meaning "1 or more occurrences".
- ?, meaning "0 or 1 occurrence".
A quantifier always applies to the pattern immediately before it (to its left), which is normally a single character unless parentheses are used to create a pattern group. Below are some example patterns and the inputs they match.
Pattern | Input (match)
fo* | foo, foe, food, fooot, "forget it", funny, puffy
fo+ | foo, foe, food, foot, "forget it"
fo? | foo, foe, food, foot, "forget it", funny, puffy
Besides specifying that a pattern may occur 0 or 1 time, the ? character can also follow another quantifier to force the preceding pattern or subpattern to match as few characters as possible when it could match several in the input string.
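The difference between the ordinary (greedy) quantifiers and their minimal-match forms is easiest to see by looking at the text each one actually captures. The sketch below is only an illustration; QuantifierDemo and FirstMatch are names made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Returns the text of the first match, or null when nothing matches.
    public static string FirstMatch(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : null;
    }
}
```

For the input "fooot", FirstMatch returns "fooo" for fo*, "fo" for fo+? and just "f" for fo*?. For "funny", fo* still matches (returning "f"), whereas fo+ does not match at all.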
In addition to the non-explicit quantifiers (usually just called quantifiers), there are explicit quantifiers. Where the ordinary quantifiers are rather vague about how many times a pattern may occur, an explicit quantifier lets you specify an exact number, a range, or a minimum. An explicit quantifier is placed after the pattern it applies to, just like an ordinary quantifier, and uses braces {} containing numbers that give the lower and upper bounds on the number of occurrences. For example, x{5} matches exactly five x characters (xxxxx). When only one number is given, it is the exact count, unless it is followed by a comma, as in x{5,}, which matches five or more x characters. In the .NET Framework the lower bound cannot be omitted: to express "at most two", write {0,2}, because {,2} is treated as literal text. Below are some example patterns and the inputs they match.
Pattern | Input (match)
ab{2}c | abbc, aaabbccc
ab{0,2}c | ac, abc, abbc, aabbcc
ab{2,3}c | abbc, abbbc, aabbcc, aabbbcc
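The explicit quantifiers can be checked against a handful of inputs with a small filter. This is a minimal sketch; ExplicitQuantifierDemo and Matching are hypothetical names, not part of the .NET API.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ExplicitQuantifierDemo
{
    // Returns the inputs in which the pattern finds at least one match.
    public static List<string> Matching(string pattern, params string[] inputs)
    {
        List<string> hits = new List<string>();
        foreach (string input in inputs)
        {
            if (Regex.IsMatch(input, pattern))
            {
                hits.Add(input);
            }
        }
        return hits;
    }
}
```

Calling Matching("ab{0,2}c", "ac", "abc", "abbc", "abbbc") keeps the first three inputs and drops "abbbc". Calling Matching("ab{,2}c", "abbc", "ab{,2}c") keeps only the second input, which shows that the .NET engine reads {,2} as literal characters rather than as a quantifier.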
Metacharacters
Regular expressions include a special construct called the metacharacter. You have already met several metacharacters: *, ?, +, and { }. Several other characters also have special meaning in the regular expression language: $ ^ . [ ( | ) ] and \.
The . (period) metacharacter is one of the simplest and most commonly used. It matches any single character (by default, any character except a newline). This is useful when a pattern may contain any combination of characters but must fall within a certain length range. So far, every expression has matched the pattern anywhere within a longer string. What if you want only an exact match? This is common in validation scenarios, such as making sure a user has entered a ZIP code or phone number in the correct format. The ^ metacharacter specifies the start of a string (or line), and the $ metacharacter specifies the end of a string (or line). By adding these characters to the start and end of a pattern, you force it to match only input strings that match it exactly. The ^ metacharacter also has a special meaning when it is the first character of a character class, which is written in square brackets [ ]; this is covered under character classes.
The \ (backslash) metacharacter is used to escape characters that would otherwise have a special meaning, and to introduce the predefined set metacharacters, which are covered later. To include a metacharacter literally in a regular expression, you must escape it with a backslash. For example, to match strings that start with "c:\", use ^c:\\. Here the ^ metacharacter requires the string to start with the pattern, and the literal backslash is escaped with another backslash.
The | (pipe) metacharacter specifies alternation, that is, "this or that" within a pattern. For example, a|b matches any input that contains "a" or "b", much like the character class [ab].
Finally, parentheses ( ) group patterns. Grouping lets a quantifier apply to a whole pattern so that it can occur multiple times, and it can also be used for readability or to match specific parts of the input separately, for example so that they can be parsed or reformatted.
Some examples of metacharacter usage are listed below.
Pattern | Input (match)
. | a, b, c, 1, 2, 3
.* | Abc, 123, any string, even one with no characters
^c:\\ | c:\windows, c:\\\\\, c:\foo.txt, c:\ followed by anything else
abc$ | abc, 123abc, any string ending with abc
(abc){2,3} | abcabc, abcabcabc
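The anchors, the escaped backslash, and a quantified group from the table can be tried out as follows. This is a sketch only; AnchorDemo and its method names are assumptions made for illustration. The patterns are written as C# verbatim strings (@"...") so that each backslash reaches the regular expression engine unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // ^ anchors the pattern to the start of the input;
    // \\ in the pattern is one escaped, literal backslash.
    public static bool StartsWithCDrive(string input)
    {
        return Regex.IsMatch(input, @"^c:\\");
    }

    // Anchored at both ends: the input must consist of "abc"
    // repeated two or three times. Note that $ also tolerates
    // a single trailing newline.
    public static bool IsRepeatedAbc(string input)
    {
        return Regex.IsMatch(input, @"^(abc){2,3}$");
    }
}
```

StartsWithCDrive(@"c:\windows") returns true, while StartsWithCDrive(@"d:\c:\") returns false because the pattern is tied to the start of the string. IsRepeatedAbc accepts "abcabc" and "abcabcabc" but rejects "abc" and "xabcabc".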
Character classes
Character classes are a "mini" language within regular expressions, written inside square brackets [ ]. The simplest character class is just a list of characters inside the brackets, such as [aeiou]. When a character class is used in an expression, any one of its characters may appear at that position in the pattern (but only one, unless a quantifier is used). Note that a character class cannot define a word or a pattern, only a single character.
To match any digit, you could use the character class [0123456789]. Because that quickly becomes tedious, you can define a range of characters inside the brackets with a hyphen, -. The hyphen has special meaning only inside a character class (not elsewhere in a regular expression, so it is not strictly a regular expression metacharacter), and only when it is not the first character. To match any digit with a range, use [0-9]. Likewise, use [a-z] for any lowercase letter and [A-Z] for any uppercase letter. The range a hyphen defines depends on the character set in use, so the order in which characters appear in (for example) the ASCII or Unicode table determines which characters the range includes. To include a literal hyphen in a class, put it first. For example, [-.? ] matches any one of four characters (note that the last one is a space). Most regular expression metacharacters are not treated specially inside a character class and do not need to be escaped there. Character classes are best thought of as a language separate from the rest of the regular expression language, with their own rules and syntax.
If ^ is the first character of a character class, the class is negated and matches any character that is not a member of the class. So, to match any non-vowel character, use the character class [^aAeEiIoOuU]. Note that to negate a hyphen, you put the hyphen second in the class, as in [^-]. Remember that ^ has a completely different meaning inside a character class than it has elsewhere in a regular expression pattern.
Some character classes in action are listed below.
Pattern | Input (match)
^b[aeiou]t$ | bat, bet, bit, bot, but
^[0-9]{5}$ | 11111, 12345, 99999
^c:\\ | c:\windows, c:\\\\\, c:\foo.txt, c:\ followed by anything else
abc$ | abc, 123abc, any string ending with abc
(abc){2,3} | abcabc, abcabcabc
^[^-][0-9]$ | 10, 42, a5 (any single non-hyphen character followed by one digit; does not match -0, -1, -2, and so on, nor a lone digit such as 0)
The next version of the .NET Framework, code-named "Whidbey", adds a new character class feature called character class subtraction. It lets you subtract one character class from another, which gives a more readable way of describing some patterns. Its specification is published separately. The syntax looks like [a-z-[aeiou]], which matches all lowercase consonants.
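A few of the character classes above, including character class subtraction, can be exercised as shown here. The sketch assumes a .NET version that supports subtraction ("Whidbey", that is .NET Framework 2.0, or later); CharacterClassDemo and its methods are illustrative names only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharacterClassDemo
{
    // b, then exactly one lowercase vowel, then t.
    public static bool IsBVowelT(string input)
    {
        return Regex.IsMatch(input, "^b[aeiou]t$");
    }

    // One character that is not a hyphen, followed by one digit.
    public static bool IsNonHyphenThenDigit(string input)
    {
        return Regex.IsMatch(input, "^[^-][0-9]$");
    }

    // Character class subtraction: a-z minus the vowels,
    // in other words the lowercase consonants, which are removed.
    public static string RemoveLowercaseConsonants(string input)
    {
        return Regex.Replace(input, "[a-z-[aeiou]]", "");
    }
}
```

IsBVowelT accepts "bit" but not "boot", because a character class stands for a single character. IsNonHyphenThenDigit accepts "10" and rejects both "-0" and "0". RemoveLowercaseConsonants("regular") returns "eua".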
Predefined set metacharacters
You can accomplish a great deal with the tools covered so far. However, writing [0-9] for every digit in a pattern, or (worse) [0-9a-zA-Z] for every alphanumeric character, soon becomes tedious. To ease the pain of these common but lengthy patterns, a set of predefined metacharacters is available. Different regular expression implementations define different predefined metacharacters; the ones described here are supported by the System.Text.RegularExpressions API in the .NET Framework. The standard syntax for a predefined metacharacter is a backslash \ followed by one or more characters. Most are just one character long, which makes them easy to use and an ideal replacement for lengthy character classes. Two examples are \d, which matches any digit, and \w, which matches any word character (letters, digits, and underscore). The exceptions are the escapes that match a specific character code, where you must supply the code of the character to match, as in \u000D, which matches the Unicode carriage return character. The most common character classes and their metacharacter equivalents are listed below.
Metacharacter | Equivalent character class
\a | Matches a bell (alarm) character, \u0007.
\b | Matches a word boundary, except inside a character class, where it matches a backspace character, \u0008.
\t | Matches a tab, \u0009.
\r | Matches a carriage return, \u000D.
\v | Matches a vertical tab, \u000B.
\f | Matches a form feed, \u000C.
\n | Matches a new line, \u000A.
\e | Matches an escape character, \u001B.
\040 | Matches an ASCII character given as a three-digit octal number. \040 represents a space (decimal 32).
\x20 | Matches an ASCII character given as a two-digit hexadecimal number. Here, \x20 represents a space.
\cC | Matches an ASCII control character, in this case Ctrl-C.
\u0020 | Matches a Unicode character given as a four-digit hexadecimal number. Here, \u0020 is a space.
\* | Any character that does not represent a predefined character class is treated simply as that character. So \* is the same as \x2A (a literal *, not the * metacharacter).
\p{name} | Matches any character in the named character class "name". Supported names are Unicode groups and block ranges, for example Ll, Nd, Z, IsGreek, IsBoxDrawing, and Sc (currency).
\P{name} | Matches any character not included in the named character class "name".
\w | Matches any word character. For non-Unicode and ECMAScript implementations, this is equivalent to [a-zA-Z_0-9]. In Unicode categories, it is equivalent to [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}].
\W | The negation of \w, equivalent to the ECMAScript-compatible set [^a-zA-Z_0-9] or the Unicode character categories [^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}].
\s | Matches any white-space character. Equivalent to the Unicode character classes [ \f\n\r\t\v\x85\p{Z}]. If ECMAScript-compatible behavior is specified with the ECMAScript option, \s is equivalent to [ \f\n\r\t\v] (note the leading space).
\S | Matches any non-white-space character. Equivalent to the Unicode character categories [^ \f\n\r\t\v\x85\p{Z}]. If ECMAScript-compatible behavior is specified with the ECMAScript option, \S is equivalent to [^ \f\n\r\t\v] (note the space after ^).
\d | Matches any decimal digit. Equivalent to [\p{Nd}] for Unicode and to [0-9] for non-Unicode, ECMAScript behavior.
\D | Matches any non-digit. Equivalent to [\P{Nd}] for Unicode and to [^0-9] for non-Unicode, ECMAScript behavior.
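The difference between the Unicode and ECMAScript meanings of \d, and the way \w groups characters into words, can be observed with a short test. This is a minimal sketch; PredefinedClassDemo, IsDigit, and Words are names chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PredefinedClassDemo
{
    // True when the input is a single digit. With ecmaScript set,
    // \d is restricted to [0-9]; otherwise any Unicode decimal digit counts.
    public static bool IsDigit(string input, bool ecmaScript)
    {
        RegexOptions options = ecmaScript ? RegexOptions.ECMAScript : RegexOptions.None;
        return Regex.IsMatch(input, @"^\d$", options);
    }

    // Returns each run of word characters (letters, digits, underscore).
    public static List<string> Words(string input)
    {
        List<string> words = new List<string>();
        foreach (Match m in Regex.Matches(input, @"\w+"))
        {
            words.Add(m.Value);
        }
        return words;
    }
}
```

IsDigit("7", false) and IsDigit("\u0663", false) both return true (U+0663 is the Arabic-Indic digit three), but IsDigit("\u0663", true) returns false. Words("foo_bar 42!") yields "foo_bar" and "42".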
Expression examples
Many people like to learn by example, so here are a few sample expressions. For more examples, visit the online regular expression library at http://www.regexlib.com/.
Pattern | Description
^\d{5}$ | 5 digits, such as a US ZIP code.
^(\d{5}|\d{5}-\d{4})$ | 5 digits, or 5 digits, a dash, and 4 digits. Matches US ZIP codes in either the 5-digit format or the 5-digit + 4-digit format. The alternation is grouped so that ^ and $ apply to both alternatives.
^\d{5}(-\d{4})?$ | Same as the previous pattern, but more efficient. The ? makes the dash and the final 4 digits optional, instead of comparing two separate alternatives.
^[+-]?\d+(\.\d+)?$ | Matches any real number with an optional sign.
^[+-]?\d*\.?\d*$ | Same as the previous pattern, but also matches the empty string (and inputs such as a lone sign or a lone period).
^(20|21|22|23|[01]\d)[0-5]\d$ | Matches time values in 24-hour format (HHMM).
/\*.*\*/ | Matches C-style comments /* ... */ (on a single line).
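Two of the patterns above are wrapped in helper methods below, together with a check that shows why the alternation in the ZIP pattern needs to be grouped. This is a sketch for experimentation; ExpressionExamples and its member names are invented here.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExpressionExamples
{
    // 5 digits, optionally followed by a dash and 4 more digits.
    public static bool IsZip(string input)
    {
        return Regex.IsMatch(input, @"^\d{5}(-\d{4})?$");
    }

    // 24-hour time written as HHMM, from 0000 to 2359.
    public static bool IsTime24(string input)
    {
        return Regex.IsMatch(input, @"^(20|21|22|23|[01]\d)[0-5]\d$");
    }

    // Ungrouped alternation: ^ binds only to the first alternative and
    // $ only to the second, so over-long inputs slip through.
    public static bool IsZipLoose(string input)
    {
        return Regex.IsMatch(input, @"^(\d{5})|(\d{5}-\d{4})$");
    }
}
```

IsZip accepts "12345" and "12345-6789" and rejects "123456", whereas IsZipLoose accepts "123456" because its first alternative only requires five digits at the start. IsTime24 accepts "2359" and rejects "2400" and "0960".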
ASP.NET validation
ASP.NET provides a suite of validation controls that make validating input on Web forms much easier than it was in legacy (or classic, if you prefer) ASP. One of the most powerful validators is the RegularExpressionValidator which, as you might expect, lets you validate input by supplying a regular expression that the input must match. You specify the regular expression pattern by setting the ValidationExpression property. The following validator checks a ZIP code field:
<asp:RegularExpressionValidator runat="server" id="ZipCodeValidator"
    ControlToValidate="ZipCodeTextBox"
    ErrorMessage="Invalid ZIP code format; format should be either 12345 or 12345-6789."
    ValidationExpression="\d{5}(-\d{4})?" />
Keep the following points in mind when using the RegularExpressionValidator:
- It is never triggered by an empty string in the control it validates. Only the RequiredFieldValidator catches empty strings.
- You do not need to specify the beginning and end of the match (^ and $); they are assumed. Adding them does no harm, but it is unnecessary.
- As with all validation controls, validation is performed on the client as well as on the server. If the regular expression is not ECMAScript compatible, client-side validation fails. To avoid this, either make sure the expression is ECMAScript compatible or set the control to validate only on the server.
Regular Expression API
Outside the ASP.NET validation controls, regular expressions in .NET are used mostly through the classes in the System.Text.RegularExpressions namespace. The main classes to become familiar with are Regex, Match, and MatchCollection.
Incidentally, there is some dispute over whether the abbreviation regex is pronounced /reg-eks/ or /rej-eks/. I prefer the latter, but experts endorse both, so choose whichever you like.
The Regex class has a rich set of methods and properties, which can be daunting if you have not used it before. The most commonly used methods are summarized below:
Method | Description
Escape / Unescape | Escapes (or unescapes) metacharacters in a string so that they can be used as literals in an expression.
IsMatch | Returns true if the regular expression finds a match in the input string.
Match | Returns a Match object for the first match in the input string; check its Success property to see whether a match was found.
Matches | Returns a MatchCollection object containing all matches found in the input string.
Replace | Replaces matches in the input string with a given replacement string.
Split | Returns an array of strings produced by splitting the input string at the positions matched by the regular expression.
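The Escape, Split, and Replace methods from the table can be tried with a few one-line helpers. This is only a sketch; the RegexMethodsDemo class and its method names are made up for illustration, while Regex.Escape, Regex.Split, and Regex.Replace are the static methods described above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexMethodsDemo
{
    // Escapes metacharacters so the text can be used as a literal pattern.
    public static string EscapeLiteral(string text)
    {
        return Regex.Escape(text);
    }

    // Splits on commas, discarding white space around each comma.
    public static string[] SplitOnCommas(string input)
    {
        return Regex.Split(input, @"\s*,\s*");
    }

    // Replaces every run of white space with a single space.
    public static string CollapseWhitespace(string input)
    {
        return Regex.Replace(input, @"\s+", " ");
    }
}
```

EscapeLiteral("1+1=2") returns 1\+1=2, which matches the text "1+1=2" literally when used as a pattern. SplitOnCommas("a, b ,c") returns the three elements a, b, and c.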
In addition to these methods, a number of options can be passed to the Regex constructor. The options are bit flags, so several of them can be combined (for example, you can specify both Multiline and Singleline).
Option | Description
Compiled | Use this option when many match operations will be performed in a loop. It saves the step of parsing the expression on each iteration.
Multiline | Has nothing to do with how many lines the input string contains. It simply changes ^ and $ so that they match the beginning and end of each line (EOL), rather than the beginning and end of the whole input string.
IgnoreCase | Makes the pattern ignore case when matching the search string.
IgnorePatternWhitespace | Allows as much white space in the pattern as you like, and lets you annotate the pattern with comments: an unescaped # starts a comment that runs to the end of the line, in addition to the inline (?# comment) syntax.
Singleline | Has nothing to do with how many lines the input string contains. It makes the . (period) metacharacter match any character, instead of any character except \n (the default).
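The effect of Multiline and Singleline is easiest to see on an input that contains newline characters. The following is a minimal sketch; OptionsDemo and its method names are hypothetical, and the inputs use "\n" as the line separator.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionsDemo
{
    // Counts matches of ^\w+$. With Multiline, ^ and $ match at each
    // line; without it, they match only at the ends of the whole input.
    public static int CountWordLines(string input, bool multiline)
    {
        RegexOptions options = multiline ? RegexOptions.Multiline : RegexOptions.None;
        return Regex.Matches(input, @"^\w+$", options).Count;
    }

    // Tests whether the period in a.b can match a newline.
    public static bool DotMatchesNewline(bool singleline)
    {
        RegexOptions options = singleline ? RegexOptions.Singleline : RegexOptions.None;
        return Regex.IsMatch("a\nb", "a.b", options);
    }

    // Options are bit flags and can be combined with the | operator.
    public static bool MatchesIgnoringCaseAcrossLines(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern,
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }
}
```

CountWordLines("one\ntwo\nthree", true) returns 3, and the same call with false returns 0. DotMatchesNewline returns true only when Singleline is set.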
Common things to do with regular expressions include validating, matching, and replacing. In most cases these can be done with the static methods of the Regex class, without instantiating Regex itself. To validate input, all you need to do is create or find the right expression and apply it to the input string with the IsMatch() method of the Regex class. For example, the following function demonstrates how to use a regular expression to validate a ZIP code:
private void ValidateZipButton_Click(object sender, System.EventArgs e)
{
    String ZipRegex = @"^\d{5}$";
    if (Regex.IsMatch(ZipTextBox.Text, ZipRegex))
    {
        ResultLabel.Text = "ZIP is valid!";
    }
    else
    {
        ResultLabel.Text = "ZIP is invalid!";
    }
}
Similarly, you can use the static Replace() method to replace matches with a specific string, as shown below:
String newText = Regex.Replace(inputString, pattern, replacementText);
Finally, you can iterate through the collection of matches in an input string with code like the following:
private void MatchButton_Click(object sender, System.EventArgs e)
{
    MatchCollection matches = Regex.Matches(SearchStringTextBox.Text,
        MatchExpressionTextBox.Text);
    MatchCountLabel.Text = matches.Count.ToString();
    MatchesLabel.Text = "";
    foreach (Match match in matches)
    {
        // Report each match and the position at which it was found.
        MatchesLabel.Text += "Found " + match.ToString() + " at position "
            + match.Index + ".<br>";
    }
}
You typically need to instantiate the Regex class when you want anything other than the default behavior, in particular when setting options. For example, to create a Regex instance that ignores case and pattern white space and then retrieve the set of matches for that expression, use code like the following:
Regex re = new Regex(pattern,
RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
MatchCollection mc = re.Matches(inputString);
The download that accompanies this article includes fully working versions of these examples as simple ASP.NET pages.
Free tools
- The Regulator (http://royo.is-a-geek.com/iserializable/regulator/): a client-side regular expression testing tool that integrates tightly with RegexLib through Web services and supports Match, Split, and Replace operations. It includes performance analysis and syntax highlighting.
- RegexDesigner.NET (http://www.sellsbrothers.com/tools/): a powerful visual tool for building and testing regular expressions. It can generate C# and/or VB.NET code and compiled assemblies to help you integrate expressions into your applications.
- Regular Expression Workbench (v2.0) (http://www.gotdotnet.com/Community/UserSamples/Details.aspx?SampleGuid=C712F2DF-B026-4D58-8961-4EE2729D7322): a tool by Eric Gunnerson for creating, testing, and studying regular expressions. Its "Examine-o-matic" feature lets you hover the mouse over a regular expression to decode its meaning.
Advanced topics
Two regular expression features that deserve mention are named groups and lookaround processing. Because they are used less often, this article describes them only briefly.
With named groups, you can name individual matching groups and then refer to those groups programmatically or within the expression. This is especially powerful combined with the Replace method as a way of reformatting an input string by rearranging the order and placement of its elements. For example, suppose dates arrive as strings in MM/DD/YYYY format and you want them in DD-MM-YYYY format. You could write an expression that captures the first format, iterate through its matches, parse each string, and build the replacement with string operations. That takes a fair amount of code and processing. With named groups, the same task can be done in a single call:
String MDYToDMY(String input)
{
    // Capture month, day, and year by name, then reorder them in the replacement.
    return Regex.Replace(input,
        @"\b(?<month>\d{1,2})/(?<day>\d{1,2})/(?<year>\d{4})\b",
        "${day}-${month}-${year}");
}
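Named groups can also be read from code and referred to inside the expression itself. The sketch below shows both uses under the same MM/DD/YYYY assumption: GetYear reads a group through the Groups collection of a Match, and DoubledLetters refers back to a named group with \k<name>. The class name, method names, and the group name "letter" are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Returns the year of the first MM/DD/YYYY date in the input, or -1.
    public static int GetYear(string input)
    {
        Match m = Regex.Match(input,
            @"\b(?<month>\d{1,2})/(?<day>\d{1,2})/(?<year>\d{4})\b");
        return m.Success ? int.Parse(m.Groups["year"].Value) : -1;
    }

    // Finds each lowercase letter that is immediately repeated.
    // \k<letter> must match the same text the "letter" group captured.
    public static List<string> DoubledLetters(string input)
    {
        List<string> pairs = new List<string>();
        foreach (Match m in Regex.Matches(input, @"(?<letter>[a-z])\k<letter>"))
        {
            pairs.Add(m.Value);
        }
        return pairs;
    }
}
```

GetYear("Shipped 12/25/2004") returns 2004, and DoubledLetters("bookkeeper") returns oo, kk, and ee.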
Groups can be referenced by number as well as by name. Either way, such a reference is called a backreference. Another common use of backreferences is within the matching expression itself. For example, the following expression finds repeated letters: ([a-z])\1. It matches "aa", "bb", and "cc", which makes it different from [a-z]{2} or [a-z][a-z]. Those two are equivalent to each other and also match "ab", "ac", or any other two-letter combination. Backreferences let an expression remember parts of the input string that it has already parsed and matched.
"Lookaround processing" refers to the positive and negative lookahead and lookbehind capabilities that many regular expression engines support. Not all engines support every form of lookaround. These constructs do not consume characters, even though they may match them. Some patterns are impossible to describe without lookaround, especially ones in which the existence of one part of the pattern depends on a separate part. The syntax for each kind of lookaround is shown below.
Syntax | Description
(?=...) | Positive lookahead
(?!...) | Negative lookahead
(?<=...) | Positive lookbehind
(?<!...) | Negative lookbehind
Password validation is a good example of when lookaround is needed. Suppose a password must be between 4 and 8 characters long and must contain at least one digit. You could do this by testing for \d in a match and then checking the length with string operations. To do it all in one regular expression, however, you need lookahead, specifically positive lookahead, as in this expression: ^(?=.*\d).{4,8}$
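The password rule can be wrapped in a small helper to see how the lookahead and the length check work together. This is a sketch under the stated rule (4 to 8 characters, at least one digit); PasswordDemo and IsValidPassword are illustrative names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PasswordDemo
{
    // (?=.*\d) looks ahead from the start for a digit without consuming
    // any characters; .{4,8} then checks the overall length.
    public static bool IsValidPassword(string candidate)
    {
        return Regex.IsMatch(candidate, @"^(?=.*\d).{4,8}$");
    }
}
```

IsValidPassword accepts "abc1" and "12345678". It rejects "abcd" (no digit), "a1" (too short), and "abcdefgh1" (too long).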