A regular expression usually consists of literal text and metacharacters.
Literal text is ordinary text such as "abcde", which matches any string that contains "abcde".
Metacharacters are more flexible: they describe a general pattern, so a single expression can match every string that conforms to it.
C# regular expression syntax 1. Matching a single character
[] -- Matches any one of the characters listed inside the brackets
Supported types: word characters ([ae]), non-word characters ([!?,;@#$*]), letter ranges ([A-Z]), digit ranges ([0-9])
Eg.
Regular expression    Matches
[ae]ffect             affect, effect
(In this example, "[ae]" is a metacharacter and "ffect" is literal text.)
Note: 1. To match a hyphen inside a character class, make the hyphen the first character in the class.
2. A single regular expression can contain multiple character classes.
Eg. [01][0-9]:[0-5][0-9] [ap]m matches times written in a format such as 12:59 pm.
^ -- Excludes characters (this is its meaning as the first character inside []; outside brackets it marks the start of a string)
Eg.
Regular expression    Matches              Does not match
m[^a]t                met, mit, m&t ...    mat
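The character-class examples above can be checked with a short console sketch. It assumes .NET 5 or later (C# 9 top-level statements) and uses the standard `Regex.IsMatch` method; the `^` and `$` anchors are added here so that each call tests the whole string rather than a substring, and the variable names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

// [ae] matches exactly one character: either 'a' or 'e'.
bool affect = Regex.IsMatch("affect", "^[ae]ffect$");
bool effect = Regex.IsMatch("effect", "^[ae]ffect$");
bool iffect = Regex.IsMatch("iffect", "^[ae]ffect$");

// Several character classes in one pattern: a time such as "12:59 pm".
var time = new Regex(@"^[01][0-9]:[0-5][0-9] [ap]m$");
bool validTime = time.IsMatch("12:59 pm");
bool badMinutes = time.IsMatch("12:75 pm"); // '7' is outside [0-5]

// [^a] matches any single character except 'a'.
var notA = new Regex("^m[^a]t$");
bool met = notA.IsMatch("met");
bool mAmpT = notA.IsMatch("m&t");
bool mat = notA.IsMatch("mat");

// A hyphen placed first in the class is a literal '-', not a range operator.
bool hyphen = Regex.IsMatch("-", "^[-az]$");

Console.WriteLine($"{affect} {effect} {iffect}");   // True True False
Console.WriteLine($"{validTime} {badMinutes}");     // True False
Console.WriteLine($"{met} {mAmpT} {mat} {hyphen}"); // True True False True
```

Note that [01][0-9] only checks the shape of the hour, so a value such as 19:30 pm would also be accepted.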
C# regular expression syntax 2. Matching special characters
Special characters you can use:
\t -- matches a tab
\r -- matches a carriage return
\f -- matches a form feed (page break)
\n -- matches a line feed (newline)
Metacharacters that represent character classes:
. -- matches any character except \n (or any character at all in singleline mode)
\w -- matches any word character (letters, decimal digits, and connector punctuation such as the underscore)
\W -- matches any non-word character
\s -- matches any whitespace character (spaces, line breaks, tabs, and so on)
\S -- matches any non-whitespace character
\d -- matches any decimal digit (0-9; by default .NET also accepts other Unicode decimal digits)
\D -- matches any character that is not a decimal digit
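A small sketch of the special characters and shorthand classes, assuming .NET 5 or later for top-level statements. The sample string `text` is made up for illustration; deleting the complement class (\W or \D) with `Regex.Replace` is just a compact way to show which characters the positive class keeps.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input: letters, an underscore, a digit, a colon, a tab and a newline.
string text = "id_7:\tok\n";

// \t finds the tab character.
int tabIndex = Regex.Match(text, @"\t").Index;

// Deleting every non-word character (\W) leaves only word characters (\w).
string wordChars = Regex.Replace(text, @"\W", "");

// \s matches whitespace: here, the tab and the newline.
int whitespaceCount = Regex.Matches(text, @"\s").Count;

// Deleting every non-digit (\D) leaves only the digits (\d).
string digits = Regex.Replace(text, @"\D", "");

// '.' does not match \n by default, but does with RegexOptions.Singleline.
bool dotDefault = Regex.IsMatch("\n", "^.$");
bool dotSingleline = Regex.IsMatch("\n", "^.$", RegexOptions.Singleline);

Console.WriteLine($"{tabIndex} {wordChars} {whitespaceCount} {digits}"); // 5 id_7ok 2 7
Console.WriteLine($"{dotDefault} {dotSingleline}");                     // False True
```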
Character positions within the string:
^ -- matches the start of the string (or the start of each line in multiline mode).
$ -- matches the end of the string, the position just before a "\n" that ends the string, or the end of each line in multiline mode.
\A -- matches the start of the string (ignores multiline mode).
\Z -- matches the end of the string or the position just before a "\n" that ends the string (ignores multiline mode).
\z -- matches only the very end of the string.
\G -- matches the position where the previous match ended (for the first match, the position where the search started).
\b -- matches a word boundary.
\B -- matches a position that is not a word boundary.
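The anchors above differ mainly in how they treat line breaks. This sketch, assuming .NET 5 or later for top-level statements, contrasts ^ with and without `RegexOptions.Multiline`, the three end-of-string anchors, \G, and \B on made-up sample strings.

```csharp
using System;
using System.Text.RegularExpressions;

string lines = "first\nsecond\n";

// ^ matches only at the very start unless Multiline is on.
int startsDefault = Regex.Matches(lines, @"^\w+").Count;
int startsMultiline = Regex.Matches(lines, @"^\w+", RegexOptions.Multiline).Count;

// \A ignores Multiline: it still matches only the start of the whole string.
int startsA = Regex.Matches(lines, @"\A\w+", RegexOptions.Multiline).Count;

// $ and \Z also match just before a final \n; \z matches only the absolute end.
bool dollar = Regex.IsMatch(lines, @"second$");
bool bigZ = Regex.IsMatch(lines, @"second\Z");
bool smallZ = Regex.IsMatch(lines, @"second\z");

// \G: each match must begin exactly where the previous one ended.
int contiguous = Regex.Matches("aaba", @"\Ga").Count; // stops at the 'b'
int anywhere = Regex.Matches("aaba", "a").Count;

// \B: "let" preceded by a word character (inside "hamlet"), not at a word start.
int inside = Regex.Matches("hamlet letter", @"\Blet").Count;

Console.WriteLine($"{startsDefault} {startsMultiline} {startsA}"); // 1 2 1
Console.WriteLine($"{dollar} {bigZ} {smallZ}");                    // True True False
Console.WriteLine($"{contiguous} {anywhere} {inside}");            // 2 3 1
```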
Note:
1. The period (.) is particularly useful: it stands for any single character.
Eg.
Regular expression    Matches
01.17.84              01/17/84, 01-17-84, 01.17.84
2. Use \b to match a word boundary.
Eg.
Regular expression    Matches    Does not match
\blet\b               let        letter, hamlet
3. \A and \z are very useful when you need to make sure that a string contains the expression and nothing else.
Eg. To check that a text box contains exactly the word "sophia", with no extra characters, line breaks, or spaces:
\Asophia\z
4. The period (.) has a special meaning. To match a literal period, escape it with a backslash: \.
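The four notes above can be tried together in one sketch, assuming .NET 5 or later for top-level statements. The `^`/`$` anchors on the date patterns are an addition so that the whole input is tested; the sample strings are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// '.' matches any single character, so one pattern accepts several separators.
var date = new Regex("^01.17.84$");
bool slash = date.IsMatch("01/17/84");
bool dash = date.IsMatch("01-17-84");
bool noSeparators = date.IsMatch("011784"); // each '.' needs exactly one character

// \b keeps "let" from matching inside longer words.
int lets = Regex.Matches("let letter hamlet", @"\blet\b").Count;

// \A...\z accepts "sophia" and nothing else, not even a trailing newline.
bool exact = Regex.IsMatch("sophia", @"\Asophia\z");
bool trailingNewline = Regex.IsMatch("sophia\n", @"\Asophia\z");
bool trailingWithBigZ = Regex.IsMatch("sophia\n", @"\Asophia\Z"); // \Z allows one final \n

// An escaped period matches only a literal '.'.
bool literalDot = Regex.IsMatch("01.17.84", @"^01\.17\.84$");
bool notADot = Regex.IsMatch("01/17/84", @"^01\.17\.84$");

Console.WriteLine($"{slash} {dash} {noSeparators} {lets}");            // True True False 1
Console.WriteLine($"{exact} {trailingNewline} {trailingWithBigZ}");    // True False True
Console.WriteLine($"{literalDot} {notADot}");                          // True False
```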
C# regular expression syntax 3. Matching alternative character sequences
| -- matches either of the alternatives
Eg.
Regular expression    Matches
col(o|ou)r            color, colour
Note: \b(bill|ted) and \bbill|ted are different.
The latter also matches "malted", because the \b metacharacter applies only to "bill".
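A sketch of alternation and of the grouping pitfall in the note, assuming .NET 5 or later for top-level statements. The point is that | has the lowest precedence, so without parentheses the \b binds only to the first alternative.

```csharp
using System;
using System.Text.RegularExpressions;

var colour = new Regex(@"^col(o|ou)r$");
bool american = colour.IsMatch("color");
bool british = colour.IsMatch("colour");

// \b(bill|ted): the boundary applies to both alternatives.
bool grouped = Regex.IsMatch("malted", @"\b(bill|ted)");

// \bbill|ted means (\bbill) or (ted), so "ted" inside "malted" matches.
bool ungrouped = Regex.IsMatch("malted", @"\bbill|ted");

Console.WriteLine($"{american} {british} {grouped} {ungrouped}"); // True True False True
```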
C# regular expression syntax 4. Matching with quantifiers
* -- matches zero or more times
+ -- matches one or more times
? -- matches zero times or once
{n} -- matches exactly n times
{n,} -- matches at least n times
{n,m} -- matches at least n times, but no more than m times
Eg.
Regular expression    Matches
brothers?             brother, brothers
Eg.
Regular expression    Matches
\bp\d{3,5}\b          a word that starts with p followed by 3 to 5 digits
Note: You can also combine a quantifier with () to apply it to an entire sequence of characters.
Eg.
Regular expression             Matches
(the )?school is beautiful.    "school is beautiful.", "the school is beautiful."
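The quantifier examples above as a runnable sketch, assuming .NET 5 or later for top-level statements. The final period is escaped (\.) in the last pattern so that it matches only a literal '.', and `^`/`$` anchors are added to test whole strings; both are small additions to the table's patterns.

```csharp
using System;
using System.Text.RegularExpressions;

// ? makes the final 's' optional.
var brother = new Regex("^brothers?$");
bool singular = brother.IsMatch("brother");
bool plural = brother.IsMatch("brothers");

// {3,5}: three to five digits after 'p', with \b on both sides.
var code = new Regex(@"\bp\d{3,5}\b");
bool threeDigits = code.IsMatch("p123");
bool twoDigits = code.IsMatch("p12");         // too few digits
bool sixDigits = code.IsMatch("p123456");     // \b rules out stopping after five digits

// A quantifier after () applies to the whole group: "the " is optional as a unit.
var school = new Regex(@"^(the )?school is beautiful\.$");
bool withThe = school.IsMatch("the school is beautiful.");
bool withoutThe = school.IsMatch("school is beautiful.");
bool partialThe = school.IsMatch("he school is beautiful.");

Console.WriteLine($"{singular} {plural}");                          // True True
Console.WriteLine($"{threeDigits} {twoDigits} {sixDigits}");        // True False False
Console.WriteLine($"{withThe} {withoutThe} {partialThe}");          // True True False
```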
C# regular expression syntax 5. Greedy and non-greedy quantifiers
Some quantifiers are greedy: they match as many characters as possible.
For example, the quantifier * matches zero or more characters. Suppose you want to match each HTML tag in a string. You might use the following regular expression:
<.*>
Given the string: a <i>quantifier</i> can be <big>greedy</big>
Result: <.*> matches the entire span <i>quantifier</i> can be <big>greedy</big> as one match instead of matching each tag separately.
To solve this problem, append the non-greedy modifier "?" to the quantifier. The expression then becomes:
<.*?>
Now <i>, </i>, <big>, and </big> are each matched separately, as intended.
Appending ? forces a quantifier to match as few characters as possible. It can be appended to any of the following quantifiers:
*? -- non-greedy *
+? -- non-greedy +
?? -- non-greedy ?
{n}? -- non-greedy {n} (behaves like {n}, since exactly n repetitions are required either way)
{n,}? -- non-greedy {n,}
{n,m}? -- non-greedy {n,m}
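The greedy and non-greedy behavior described above, run against the same sample string. This is a sketch assuming .NET 5 or later for top-level statements; the extra `a{2,}` lines are an added illustration of a bounded quantifier with and without the ? modifier.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string html = "a <i>quantifier</i> can be <big>greedy</big>";

// Greedy: .* runs to the last '>' in the string, producing a single match.
string greedy = Regex.Match(html, "<.*>").Value;

// Non-greedy: .*? stops at the first '>', so each tag is a separate match.
var tags = new List<string>();
foreach (Match m in Regex.Matches(html, "<.*?>"))
{
    tags.Add(m.Value);
}

// Bounded quantifiers work the same way: {2,} takes as much as it can, {2,}? as little as allowed.
string atLeastTwoGreedy = Regex.Match("aaaa", "a{2,}").Value;
string atLeastTwoLazy = Regex.Match("aaaa", "a{2,}?").Value;

Console.WriteLine(greedy);                                 // <i>quantifier</i> can be <big>greedy</big>
Console.WriteLine(string.Join(", ", tags));                // <i>, </i>, <big>, </big>
Console.WriteLine($"{atLeastTwoGreedy} {atLeastTwoLazy}"); // aaaa aa
```

Note that .*? is fine for simple tags like these, but a regular expression is not a general HTML parser.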
C# regular expression syntax 6. Capture groups and backreferences
A capture group works like a variable inside a regular expression. It captures the text matched by part of the pattern so that you can refer to that text later by the group's number or name, either elsewhere in the same expression (a backreference) or in a replacement string.
() -- captures the substring matched by the enclosed pattern
\number -- backreference by number (\1, \2, ...)
Eg.
Regular expression    Matches
(\w)(\w)\2\1          abba
Note: 1. Backreferences are very useful for matching HTML tags. For example, <(\w+)></\1> matches a pair of tags with the same name, such as <table></table>.
2. By default, parentheses capture the text they enclose. You can turn off this default with the n option (see section 7), or by adding ?: inside the opening parenthesis. Eg. In (?:sophia) or (?n:sophia), "sophia" is not captured.
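A sketch of numbered backreferences and non-capturing groups, assuming .NET 5 or later for top-level statements. `Match.Groups.Count` includes group 0 (the whole match), which is why a pattern with one capturing group reports 2. `RegexOptions.ExplicitCapture` is the .NET option that corresponds to the inline n.

```csharp
using System;
using System.Text.RegularExpressions;

// (\w)(\w)\2\1: two word characters followed by the same two in reverse order.
bool abba = Regex.IsMatch("abba", @"^(\w)(\w)\2\1$");
bool abab = Regex.IsMatch("abab", @"^(\w)(\w)\2\1$");

// \1 forces the closing tag to repeat the name captured from the opening tag.
var tagPair = new Regex(@"<(\w+)></\1>");
bool samePair = tagPair.IsMatch("<table></table>");
bool mismatched = tagPair.IsMatch("<table></tr>");

// (?:...) groups without capturing, so only (\d+) gets a number here.
Match m = Regex.Match("sophia42", @"(?:sophia)(\d+)");
int groupCount = m.Groups.Count;   // group 0 plus group 1
string digits = m.Groups[1].Value;

// ExplicitCapture (inline n) turns plain (...) groups into non-capturing ones.
Match n = Regex.Match("sophia42", @"(sophia)(\d+)", RegexOptions.ExplicitCapture);
int explicitCount = n.Groups.Count; // only group 0 remains

Console.WriteLine($"{abba} {abab} {samePair} {mismatched}"); // True False True False
Console.WriteLine($"{groupCount} {digits} {explicitCount}"); // 2 42 1
```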
(?<name>...) and \k<name> -- capture a group under a name and reference it by that name
Eg.
Regular expression            Matches
(?<sophia>\w)abc\k<sophia>    xabcx
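A sketch of named groups, assuming .NET 5 or later for top-level statements. It shows the \k<name> backreference from the table, reading a named group through `Match.Groups["name"]`, and the $n / ${name} replacement syntax. The group names `first` and `last` and the sample strings are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// (?<sophia>\w) captures one word character under the name "sophia";
// \k<sophia> then requires that same character to appear again.
var named = new Regex(@"^(?<sophia>\w)abc\k<sophia>$");
bool same = named.IsMatch("xabcx");
bool different = named.IsMatch("xabcy");

// The captured text can also be read by name from code.
string captured = named.Match("xabcx").Groups["sophia"].Value;

// In a replacement string, ${name} refers to a named group and $1, $2 to numbered ones.
string byName = Regex.Replace("john smith", @"(?<first>\w+) (?<last>\w+)", "${last}, ${first}");
string byNumber = Regex.Replace("john smith", @"(\w+) (\w+)", "$2, $1");

Console.WriteLine($"{same} {different} {captured}"); // True False x
Console.WriteLine($"{byName} | {byNumber}");         // smith, john | smith, john
```

Named and unnamed groups are best not mixed in one pattern: .NET numbers the unnamed groups first and the named ones after them, which makes $n references easy to get wrong.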
Note: In a replacement pattern, capture groups are referenced slightly differently: use $1, $2, and so on for numbered groups, and ${sophia} for a named group.
Eg.
using System;
using System.Text.RegularExpressions;
string str = "<h4>sophia</h4>";
Regex objRegex = new Regex(@"<h(\d)>(.*?)</h\1>"); // \1 forces the closing tag to use the same heading level
Console.WriteLine(objRegex.Replace(str, "<font size=$1>$2</font>")); // prints <font size=4>sophia</font>
C# regular expression syntax 7. Setting regular expression options
i -- performs case-insensitive matching (RegexOptions.IgnoreCase in .NET)
m -- multiline mode (RegexOptions.Multiline)
n -- captures only explicitly named or numbered groups (RegexOptions.ExplicitCapture)
c -- compiles the regular expression, which makes matching faster but slows startup (RegexOptions.Compiled; .NET has no inline letter for this option, so it can only be set through RegexOptions)
s -- singleline mode (RegexOptions.Singleline)
x -- ignores unescaped whitespace in the pattern and allows # comments (RegexOptions.IgnorePatternWhitespace)
r -- searches from right to left (RegexOptions.RightToLeft; like c, it has no inline letter and can only be set through RegexOptions)
- -- placed before option letters, turns those options off.
Eg. (?im-s:sophia) matches sophia case-insensitively, with multiline mode on and singleline mode off.
Note: 1. m changes how the start (^) and end ($) metacharacters are interpreted. By default, ^ and $ match only at the beginning and end of the whole string, even if the string contains multiple lines of text. With m enabled, they also match at the beginning and end of each line.
2. s changes how the period (.) is interpreted. Normally a period matches every character except \n. In singleline mode, a period matches \n as well.
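The options above can be set inline inside the pattern or passed as `RegexOptions` flags. This sketch, assuming .NET 5 or later for top-level statements, shows each one on a small made-up input. Compiled and RightToLeft are passed as flags because .NET accepts only i, m, n, s, and x as inline letters.

```csharp
using System;
using System.Text.RegularExpressions;

// Inline i: case-insensitive for the rest of the pattern.
bool inlineI = Regex.IsMatch("SOPHIA", "(?i)sophia");

// Scoped group: i and m on, s off, only inside (?im-s:...).
bool scoped = Regex.IsMatch("x\nSophia", "(?im-s:^sophia$)");

// m via RegexOptions: ^ matches at the start of every line.
int lineStarts = Regex.Matches("a\nb\nc", @"^\w", RegexOptions.Multiline).Count;

// s via RegexOptions: '.' also matches '\n'.
bool dotDefault = Regex.IsMatch("a\nb", "^a.b$");
bool dotSingleline = Regex.IsMatch("a\nb", "^a.b$", RegexOptions.Singleline);

// x: unescaped spaces and # comments in the pattern are ignored.
bool spaced = Regex.IsMatch("abc", "^a b c$  # the letters a, b, c", RegexOptions.IgnorePatternWhitespace);

// n: unnamed groups stop capturing; group 0 and named groups remain.
int groups = Regex.Match("ab", "(a)(?<second>b)", RegexOptions.ExplicitCapture).Groups.Count;

// RightToLeft and Compiled have no inline letter, so they are passed as RegexOptions.
string lastWord = Regex.Match("one two three", @"\w+", RegexOptions.RightToLeft).Value;
var compiled = new Regex(@"\d+", RegexOptions.Compiled);
string firstNumber = compiled.Match("abc 123 456").Value;

Console.WriteLine($"{inlineI} {scoped} {lineStarts} {dotDefault} {dotSingleline} {spaced} {groups}"); // True True 3 False True True 2
Console.WriteLine($"{lastWord} {firstNumber}");                                                      // three 123
```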