.NET regular expression basics: simple expressions
The simplest regular expression is one you already know: the literal string. A particular string can be described by its own text, so the regular expression pattern foo matches the input string foo exactly. It also matches the input "The food was quite tasty", which may not be what you want if you are looking for an exact match.
Of course, matching exact strings against themselves is a trivial use of regular expressions and does not show their real power. What if, instead of searching for foo, you wanted to find all words that start with the letter f, or all three-letter words? That is beyond what literal strings can reasonably do, so we need to look at regular expressions in more depth. Below is a sample literal expression and some inputs it matches.
Pattern | Inputs (matches)
foo | foo, food, foot, "There's evil afoot."
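The difference between a substring match and an exact match can be sketched in a few lines. This is a minimal illustration in Python, whose `re` module is assumed here to treat a literal pattern the same way .NET does; the variable names are made up for the example.

```python
import re

pattern = re.compile(r"foo")

inputs = ["foo", "food", "foot", "There's evil afoot.",
          "The food was quite tasty", "bar"]

# search() looks for the pattern anywhere in the input,
# so "food" and "afoot" both count as hits.
substring_hits = [s for s in inputs if pattern.search(s)]

# fullmatch() succeeds only when the whole input equals the pattern.
exact_hits = [s for s in inputs if pattern.fullmatch(s)]

print(substring_hits)
print(exact_hits)
```

Only the last input, "bar", is rejected by the substring search, while the exact match accepts nothing but "foo" itself.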
.NET regular expression basics: quantifiers
A quantifier provides a simple way to specify how many times a particular character or set of characters may repeat within a pattern. There are three non-explicit quantifiers:
*, which means "zero or more occurrences".
+, which means "one or more occurrences".
?, which means "zero or one occurrence".
A quantifier always applies to the pattern immediately to its left. That is normally a single character, unless parentheses are used to create a pattern group. Below are some sample patterns and inputs they match.
Pattern | Inputs (matches)
fo* | foo, foe, food, fooot, "forget it", funny, puffy
fo+ | foo, foe, food, foot, "forget it"
fo? | foo, foe, food, foot, "forget it", funny, puffy
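To see what each quantifier actually consumes, it helps to print the matched substring rather than just a yes or no. The sketch below uses Python's `re` module as a stand-in for the .NET engine (an assumption; the three quantifiers behave the same way in both), and `matched_text` is a hypothetical helper written for this example.

```python
import re

words = ["foo", "foe", "food", "fooot", "forget it", "funny", "puffy"]

def matched_text(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# f followed by zero or more o's, one or more o's, and zero or one o.
star = {w: matched_text(r"fo*", w) for w in words}
plus = {w: matched_text(r"fo+", w) for w in words}
opt = {w: matched_text(r"fo?", w) for w in words}

for w in words:
    print(w, star[w], plus[w], opt[w])
```

Note that fo+ rejects "funny" and "puffy" because at least one o is required, while fo* and fo? still match the bare f.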
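In addition to specifying that a pattern may occur zero or one time, the ? character can also force a pattern or subpattern to match the minimum number of characters when it could match several in the input string. Placed directly after another quantifier, as in .*? or o+?, it makes that quantifier lazy.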
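The minimal-match behavior of ? is easiest to see side by side with the default greedy behavior. A small sketch, again in Python as an assumed stand-in for .NET; the sample text is invented for illustration.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ takes as many characters as it can, so the match
# runs from the first "<" to the last ">".
greedy = re.search(r"<.+>", text).group(0)

# Lazy: .+? takes as few characters as it can, so the match
# stops at the first ">".
lazy = re.search(r"<.+?>", text).group(0)

print(greedy)
print(lazy)
```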
The quantifiers above are generally just called quantifiers; they are called non-explicit quantifiers here to distinguish them from the next group. Non-explicit quantifiers are fairly vague about how many times a pattern may occur. Explicit quantifiers let you specify an exact number, a range, or a minimum. Like a regular quantifier, an explicit quantifier follows the pattern it applies to. It uses curly braces {} containing numeric values for the lower and upper limits on the number of occurrences. For example, x{5} matches exactly five x characters (xxxxx). A single number is an exact count. A number followed by a comma, as in x{5,}, is a minimum, so x{5,} matches any run of x characters longer than four. Two numbers, as in x{2,3}, give the lower and upper limits. Below are some sample patterns and inputs they match.
Pattern | Inputs (matches)
ab{2}c | abbc, aaabbccc
ab{0,2}c | ac, abc, abbc, aabbcc
ab{2,3}c | abbc, abbbc, aabbcc, aabbbcc
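The explicit quantifiers can be checked the same way. The sketch below is in Python, assumed to agree with .NET on the {n}, {n,} and {n,m} forms. The upper-bound-only case is written as {0,2} with an explicit lower limit, which is a deliberate choice for this example because engines differ in how they read a quantifier with the lower bound omitted. `hit` is a hypothetical helper.

```python
import re

def hit(pattern, text):
    """Return the first matched substring, or None when nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# {2}: exactly two b's between a and c.
exact = hit(r"ab{2}c", "aaabbccc")

# {5,}: five or more x's; four are not enough.
at_least = hit(r"x{5,}", "xxxxxxx")
too_few = hit(r"x{5,}", "xxxx")

# {2,3}: two or three b's.
ranged = [hit(r"ab{2,3}c", s) for s in ["abc", "abbc", "abbbc", "abbbbc"]]

# {0,2}: zero, one or two b's.
up_to = [hit(r"ab{0,2}c", s) for s in ["ac", "abc", "abbc", "abbbc"]]

print(exact, at_least, too_few, ranged, up_to)
```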
.NET regular expression basics: metacharacters
Constructs that have a special meaning within a regular expression are called metacharacters. You have already seen several of them, such as the *, ?, +, and {} characters. Several other characters also have special meaning in the regular expression language: $ ^ . [ ( | ) ] and \.
The . (period, or dot) metacharacter is one of the simplest and most frequently used. It matches any single character. Combined with quantifiers, it is useful for specifying that a pattern may contain any combination of characters but must fall within a certain length range. We have also seen that an expression matches any occurrence of its pattern inside a longer string. What if you want to match the pattern exactly and nothing else? This is common in validation scenarios, for example when checking that a postal code or telephone number entered by the user is in the correct format. The ^ metacharacter designates the start of a string (or line), and the $ metacharacter designates the end of a string (or line). By adding these characters to the start and end of a pattern, you force it to match only input strings that match the pattern exactly. The ^ metacharacter also has a special meaning when it is the first character of a character class, which is written in square brackets []. Character classes are covered in their own section.
The \ (backslash) metacharacter is used to escape characters from their special meaning, and also to introduce the predefined set metacharacters, which are covered in their own section. To include a literal metacharacter in a regular expression, you must escape it with a backslash. For example, to match strings that begin with "c:\", you can use ^c:\\. Here the ^ metacharacter indicates that the string must start with this pattern, and the literal backslash is escaped with a backslash metacharacter.
The | (pipe) metacharacter is used for alternation, that is, to specify "this or that" within a pattern. For example, a|b matches any input that contains an "a" or a "b", much like the character class [ab].
Finally, parentheses () are used to group patterns. Grouping lets a quantifier repeat a complete pattern, can improve readability, and lets specific parts of the input be matched separately, for example so that they can be parsed or reformatted.
Some examples of metacharacter usage are listed below.
Pattern | Inputs (matches)
. | a, b, c, 1, 2, 3
.* | abc, 123, any string; also matches when there are no characters
^c:\\ | c:\windows, c:\\\\\, c:\foo.txt, c:\ followed by anything else
abc$ | abc, 123abc, any string ending with abc
(abc){2,3} | abcabc, abcabcabc
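The metacharacters in this section can be exercised together in one short script. This is a sketch in Python, assumed to match .NET for these constructs; the sample inputs are invented, and raw strings (the r prefix) are used so that each backslash reaches the regular expression engine unchanged.

```python
import re

# . matches exactly one character, whatever it is.
dot = [bool(re.fullmatch(r".", s)) for s in ["a", "1", "ab", ""]]

# ^ and $ anchor the pattern, turning a substring search
# into a check of the whole input.
unanchored = bool(re.search(r"abc", "123abc456"))
anchored = bool(re.search(r"^abc$", "123abc456"))

# A literal backslash must itself be escaped: ^c:\\ means "starts with c:\".
paths = [r"c:\windows", r"c:\foo.txt", r"d:\windows", "c:/windows"]
drive = [bool(re.search(r"^c:\\", p)) for p in paths]

# | is alternation: either side may match.
alternation = re.findall(r"a|b", "cab")

# Parentheses group a sub-pattern so the quantifier applies to all of it.
grouped = re.search(r"(abc){2,3}", "abcabcabcabc").group(0)

print(dot, unanchored, anchored, drive, alternation, grouped)
```

The grouped pattern stops after three repetitions even though the input contains four, because {2,3} caps the count at three.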
.NET regular expression basics: character classes
Character classes are a mini-language within regular expressions, delimited by square brackets []. The simplest character class is just a list of characters inside the brackets, such as [aeiou]. When a character class is used in an expression, any one of its characters may appear at that position in the pattern (but only one, unless a quantifier is used). Note that a character class cannot define a word or a pattern; it defines a single character.
To specify any numeric digit, you could use the character class [0123456789]. Because listing characters this way quickly becomes cumbersome, you can define a range of characters inside the brackets with a hyphen, -. The hyphen has a special meaning inside a character class but not elsewhere in a regular expression, so strictly speaking it is not a regular expression metacharacter, and even inside a character class it is special only when it is not the first character. To specify any numeric digit with a hyphen, use [0-9]. Likewise, use [a-z] for any lowercase letter and [A-Z] for any uppercase letter. The range defined by a hyphen depends on the character set in use, so the order in which characters appear in, for example, the ASCII or Unicode table determines which characters the range includes. If you need to include a hyphen in a class, specify it as the first character. For example, [-.? ] matches any one of those four characters (note that the last character is a space). Also note that most regular expression metacharacters, such as . and ?, are not treated specially inside a character class, so they do not need to be escaped there. Think of character classes as a language separate from the rest of the regular expression language, with their own rules and syntax.
If the caret ^ is the first character of a character class, the class is negated: it matches any character that is not a member of the class. So, to match any non-vowel character, you can use the character class [^aAeEiIoOuU]. To negate a hyphen, make the hyphen the second character of the class, as in [^-]. Remember that ^ means something completely different inside a character class than it does at the regular expression pattern level.
Some examples of character classes in action are listed below.
Pattern | Inputs (matches)
^b[aeiou]t$ | bat, bet, bit, bot, but
^[0-9]{5}$ | 11111, 12345, 99999
^c:\\ | c:\windows, c:\\\\\, c:\foo.txt, c:\ followed by anything else
abc$ | abc, 123abc, any string ending with abc
(abc){2,3} | abcabc, abcabcabc
^[^-][0-9]$ | 10, 42, a5, or any other character except a hyphen followed by one digit (does not match -0, -1, -2, etc.)
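The character class rules (one character per class, ranges, a leading hyphen, and negation with ^) can be tried out as follows. This is a sketch in Python, assumed to treat these classes the same way .NET does; the candidate lists are invented for illustration.

```python
import re

# One vowel between b and t, anchored at both ends.
candidates = ["bat", "bet", "bit", "bot", "but", "bt", "boot", "brt"]
vowel_words = [w for w in candidates if re.search(r"^b[aeiou]t$", w)]

# Exactly five digits.
zip_like = [s for s in ["11111", "12345", "1234", "123456", "1234a"]
            if re.search(r"^[0-9]{5}$", s)]

# A leading ^ negates the class: keep everything that is not a vowel.
non_vowels = re.findall(r"[^aAeEiIoOuU]", "Regex")

# A hyphen placed first in the class is a literal hyphen, not a range.
punctuation = re.findall(r"[-.? ]", "a-b.c?d e")

# [^-] consumes one character of its own, so this pattern needs two
# characters: anything except a hyphen, followed by a digit.
no_hyphen = [s for s in ["10", "a5", "-0", "-1", "7"]
             if re.search(r"^[^-][0-9]$", s)]

print(vowel_words, zip_like, non_vowels, punctuation, no_hyphen)
```

The last check shows that a single digit such as "7" is rejected: the negated class still has to match a character before the digit.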
In the next version of the .NET Framework, code-named "Whidbey", a new feature called character class subtraction is being added to character classes. It lets you subtract one character class from another, which can give a more readable way to describe certain patterns. The syntax looks like [a-z-[aeiou]], which matches all lowercase consonants.
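To make the readability argument concrete, here is what "lowercase letters minus vowels" looks like when written by hand as ordinary ranges. The sketch is in Python, whose `re` module does not offer the subtraction syntax, so it only demonstrates the longhand class that subtraction is meant to replace; `consonant_class` is a name chosen for this example.

```python
import re
import string

# Hand-written equivalent of "a-z minus aeiou": five ranges
# that skip over each vowel.
consonant_class = r"[b-df-hj-np-tv-z]"

# Reference answer computed without regular expressions.
expected = [c for c in string.ascii_lowercase if c not in "aeiou"]

# Every letter the hand-written class accepts, in alphabetical order.
matched = re.findall(consonant_class, string.ascii_lowercase)

print(matched == expected)
```

The hand-written ranges are correct but easy to get wrong by one letter, which is the kind of pattern the subtraction syntax is intended to simplify.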
.NET regular expression basics: predefined set metacharacters
A great deal can be done with the tools covered so far. However, it is still long-winded to write [0-9] for every numeric digit in a pattern, or worse, [0-9a-zA-Z] for every letter or digit. To ease the pain of dealing with these common but lengthy patterns, a set of predefined metacharacters has been defined. Different regular expression implementations define different sets of predefined metacharacters; the ones described here are supported by the System.Text.RegularExpressions API in the .NET Framework. The standard syntax for a predefined metacharacter is a backslash \ followed by one or more characters. Most predefined metacharacters are a single character, which makes them easy to use and an ideal replacement for lengthy character classes. Two examples: \d matches any numeric digit, and \w matches any word character (letters, digits, and the underscore). The exceptions are matches on specific character codes, where you must give the address of the character to match, as in \u000D, which matches the Unicode carriage return. Some of the most common character classes and their metacharacter equivalents are listed below.
Metacharacter | Equivalent character class
\a | Matches a bell (alarm) character, \u0007.
\b | Matches a word boundary, except inside a character class, where it matches a backspace character, \u0008.
\t | Matches a tab, \u0009.
\r | Matches a carriage return, \u000D.
\v | Matches a vertical tab, \u000B.
\f | Matches a form feed, \u000C.
\n | Matches a new line, \u000A.
\e | Matches an escape character, \u001B.
\040 | Matches an ASCII character given as a three-digit octal number. \040 is a space (decimal 32).
\x20 | Matches an ASCII character given as a two-digit hexadecimal number. In this example, \x20 is a space.
\cC | Matches an ASCII control character. In this example, it is Ctrl-C.
\u0020 | Matches a Unicode character given as exactly four hexadecimal digits. In this example, \u0020 is a space.
\* | A backslash followed by a character that does not form a predefined character class is treated as that literal character. Thus \* is the same as \x2A (a literal *, not the * metacharacter).
\p{name} | Matches any character in the named character class "name". Supported names are Unicode groups and block ranges, for example Ll, Nd, Z, IsGreek, IsBoxDrawing, and Sc (currency).
\P{name} | Matches text not included in the named character class "name".
\w | Matches any word character. For non-Unicode and ECMAScript implementations, this is the same as [a-zA-Z_0-9]. In Unicode categories, this is the same as [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}].
\W | The negation of \w. Equivalent to the ECMAScript-compliant set [^a-zA-Z_0-9] or the Unicode character categories [^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}].
\s | Matches any white-space character. Equivalent to the Unicode character classes [\f\n\r\t\v\x85\p{Z}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \s is equivalent to [ \f\n\r\t\v] (note the leading space).
\S | Matches any non-white-space character. Equivalent to the Unicode character categories [^\f\n\r\t\v\x85\p{Z}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \S is equivalent to [^ \f\n\r\t\v] (note the space after ^).
\d | Matches any decimal digit. Equivalent to [\p{Nd}] for Unicode and [0-9] for non-Unicode, ECMAScript behavior.
\D | Matches any non-digit character. Equivalent to [\P{Nd}] for Unicode and [^0-9] for non-Unicode, ECMAScript behavior.
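A few of the predefined metacharacters can be tried directly. The sketch below uses Python's `re` module, which shares \d, \w, \s, \b and the \x, \u and octal escapes with .NET but does not support every entry in the table (\e, \cC and \p{name} are left out for that reason). The `re.ASCII` flag is Python's own option and is used here only as a rough analogue of the ECMAScript option described above; the sample text is invented.

```python
import re

text = "Order 66\tshipped: id_42 ok"

digits = re.findall(r"\d+", text)   # runs of decimal digits
words = re.findall(r"\w+", text)    # runs of letters, digits and underscore
parts = re.split(r"\s+", text)      # split on spaces and the tab

# \b is a word boundary outside a character class.
whole_word = re.findall(r"\bok\b", "ok okay book ok")

# Three spellings of a space: hexadecimal, Unicode and octal escapes.
space_forms = [bool(re.fullmatch(p, " "))
               for p in (r"\x20", r"\u0020", r"\040")]

# An escaped metacharacter is a plain literal: \* matches an asterisk.
stars = re.findall(r"\*", "2*3*4")

# Unicode versus ASCII-only digits, using ARABIC-INDIC DIGIT THREE (category Nd).
arabic_indic_three = "\u0663"
unicode_digit = bool(re.fullmatch(r"\d", arabic_indic_three))
ascii_digit = bool(re.fullmatch(r"\d", arabic_indic_three, re.ASCII))

print(digits, words, parts, whole_word, space_forms, stars)
print(unicode_digit, ascii_digit)
```

The last two lines show why the Unicode and [0-9] definitions of \d are not interchangeable: the same character is a digit under one and not under the other.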
.NET regular expression basics: sample expressions
Many people learn best by example, so here are a few sample expressions.
Pattern | Description
^\d{5}$ | Five numeric digits, such as a US ZIP code.
^(\d{5}|\d{5}-\d{4})$ | Five digits, or five digits, a dash, and four digits. Matches a US ZIP code in either the five-digit format or the ZIP+4 format.
^(\d{5})(-\d{4})?$ | Same as the previous one, but more efficient. It uses ? to make the dash-and-four-digits part optional, instead of comparing two separate patterns through alternation.
^[+-]?\d+(\.\d+)?$ | Matches any real number with an optional sign.
^[+-]?\d*\.?\d*$ | Same as the previous one, but also matches an empty string.
^(20|21|22|23|[01]\d)[0-5]\d$ | Matches a time value in 24-hour format, written as four digits with no separator (for example, 0930 or 2359).
/\*.*\*/ | Matches a C-style comment /* ... */.
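The sample expressions can be checked against a handful of inputs. This is a sketch in Python, assumed to agree with .NET on these patterns; the ZIP code pattern is written in its optional-group form with balanced parentheses, and `accepts`, the constant names and the candidate lists are all invented for this example.

```python
import re

ZIP = re.compile(r"^(\d{5})(-\d{4})?$")
REAL = re.compile(r"^[+-]?\d+(\.\d+)?$")
LOOSE_REAL = re.compile(r"^[+-]?\d*\.?\d*$")
TIME_24H = re.compile(r"^(20|21|22|23|[01]\d)[0-5]\d$")
C_COMMENT = re.compile(r"/\*.*\*/")

def accepts(regex, candidates):
    """Return the candidates that the compiled pattern matches."""
    return [c for c in candidates if regex.search(c)]

zips = accepts(ZIP, ["12345", "12345-6789", "1234", "12345-678", "123456"])
reals = accepts(REAL, ["42", "-3.14", "+0.5", ".5", "1.", ""])
loose = accepts(LOOSE_REAL, [".5", "1.", "", "+", "."])
times = accepts(TIME_24H, ["0000", "0930", "2359", "2400", "1260", "930"])
comment = C_COMMENT.search("x = 1; /* note */ y = 2;").group(0)

print(zips, reals, loose, times, comment)
```

The looser real-number pattern accepts not only the empty string but also a lone sign or a lone dot, which is worth knowing before using it for validation.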