C# Regular Expression Basics
Metacharacters
Character | Description |
^ | Matches the start of the line (the start of the whole input unless RegexOptions.Multiline is set) |
$ | Matches the end of the line (the end of the whole input unless RegexOptions.Multiline is set) |
\b | Matches a word boundary, that is, the start or end of a word |
. | Matches any character other than a newline |
\w | Matches a word character (letters, digits, underscores, and Chinese characters) |
\W | Matches any non-word character (anything other than letters, digits, underscores, and Chinese characters) |
\s | Matches any white-space character, such as a space, tab, or line break |
\S | Matches any non-white-space character |
\d | Matches any digit |
\D | Matches any character that is not a digit |
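The metacharacters above can be tried out with a short sketch like the following. The sample string is hypothetical, and the top-level statement style assumes .NET 6 or later; in older projects the same calls would go inside a Main method.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input, chosen only for illustration.
string input = "Order 66 shipped";

// \b\w+\b matches each whole word; \d+ matches a run of digits.
MatchCollection words = Regex.Matches(input, @"\b\w+\b");
string digits = Regex.Match(input, @"\d+").Value;

// ^ and $ anchor the pattern to the start and end of the input.
bool startsWithOrder = Regex.IsMatch(input, @"^Order");
bool endsWithDigit = Regex.IsMatch(input, @"\d$");

Console.WriteLine(words.Count);      // 3
Console.WriteLine(digits);           // 66
Console.WriteLine(startsWithOrder);  // True
Console.WriteLine(endsWithDigit);    // False
```

Verbatim strings (@"...") keep the backslashes intact, so the pattern in the source code reads exactly as it does in the table.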
Common character sets:
Character | Description |
. | Matches any character other than a newline |
\w | Matches a word character (letters, digits, underscores, and Chinese characters) |
\W | Matches any non-word character (anything other than letters, digits, underscores, and Chinese characters) |
\s | Matches any white-space character, such as a space, tab, or line break |
\S | Matches any non-white-space character |
\d | Matches any digit |
\D | Matches any character that is not a digit |
[abcd] | Matches any single character in the set |
[^abcd] | Matches any single character that is not in the set |
[0-9a-zA-Z_] | Matches any digit, ASCII letter (uppercase or lowercase), or underscore; similar to \w, which additionally matches non-ASCII letters such as Chinese characters |
[^0-9a-zA-Z_] | Matches any character other than a digit, ASCII letter (uppercase or lowercase), or underscore; similar to \W |
\p{name} | Matches any character in the named character class specified by {name} |
\P{name} | Matches any character outside the named character class specified by {name} |
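As a rough illustration of how the explicit set [0-9a-zA-Z_] differs from \w, and of how \p{name} is used, consider the sketch below. The sample text is hypothetical; IsCJKUnifiedIdeographs and Lu are a named block and a Unicode category that .NET supports, picked here only as examples.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample: ASCII word characters followed by two Chinese characters.
string text = "abc_123 \u4E2D\u6587";

// The explicit set covers ASCII only; \w in .NET also accepts other Unicode letters.
int asciiWordChars = Regex.Matches(text, @"[0-9a-zA-Z_]").Count;
int wordChars = Regex.Matches(text, @"\w").Count;

// \p{IsCJKUnifiedIdeographs} names a Unicode block; \p{Lu} names the uppercase-letter category.
int cjkChars = Regex.Matches(text, @"\p{IsCJKUnifiedIdeographs}").Count;
bool hasUppercase = Regex.IsMatch(text, @"\p{Lu}");

Console.WriteLine(asciiWordChars);  // 7
Console.WriteLine(wordChars);       // 9
Console.WriteLine(cjkChars);        // 2
Console.WriteLine(hasUppercase);    // False
```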
Common escape characters:
Character | Description |
\a | Bell (alarm) character, \u0007 |
\b | In a regular expression, a word boundary; inside a character class, the backspace character \u0008 |
\t | Tab, \u0009 |
\r | Carriage return, \u000D |
\v | Vertical tab, \u000B |
\f | Form feed, \u000C |
\n | Line feed, \u000A |
\e | Escape character, \u001B |
\040 | Matches an ASCII character given in octal notation |
\x20 | Matches an ASCII character given in hexadecimal notation |
\cC | Matches an ASCII control character, such as Ctrl+C |
\u0020 | Matches a Unicode character given in hexadecimal notation |
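The escapes in this table can be checked against small inputs, as in the following sketch. The sample strings are hypothetical; \040, \x20, and \u0020 are three spellings of the same space character.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample containing a space and a tab.
string sample = "a b\tc";

// Octal, hexadecimal, and Unicode escapes for the space character (code point 32).
bool octalSpace = Regex.IsMatch(sample, @"a\040b");
bool hexSpace = Regex.IsMatch(sample, @"a\x20b");
bool unicodeSpace = Regex.IsMatch(sample, @"a\u0020b");

// \t matches the tab between "b" and "c".
bool tab = Regex.IsMatch(sample, @"b\tc");

// \cC matches the Ctrl+C control character, \u0003.
bool ctrlC = Regex.IsMatch("\u0003", @"\cC");

// Inside a character class, \b means backspace (\u0008) rather than a word boundary.
bool backspace = Regex.IsMatch("\u0008", @"[\b]");

Console.WriteLine(octalSpace && hexSpace && unicodeSpace);  // True
Console.WriteLine(tab);                                     // True
Console.WriteLine(ctrlC);                                   // True
Console.WriteLine(backspace);                               // True
```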
Common quantifiers:
Character | Description |
{n} | Repeat exactly n times |
{n,} | Repeat at least n times |
{n,m} | Repeat at least n times and at most m times |
* | Repeat zero or more times, equivalent to {0,} |
+ | Repeat one or more times, equivalent to {1,} |
? | Repeat zero or one time, equivalent to {0,1} |
*? | Repeat as few times as possible (lazy match) |
+? | Repeat as few times as possible, but at least once |
?? | Use zero repetitions if possible, otherwise one |
{n}? | Equivalent to {n} |
{n,}? | Repeat as few times as possible, but at least n times |
{n,m}? | Repeat between n and m times, using as few repetitions as possible |
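The difference between the greedy quantifiers and their lazy counterparts (the forms ending in ?) is easiest to see side by side. The following is a minimal sketch with hypothetical input strings.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample with two adjacent tag pairs.
string html = "<b>one</b><b>two</b>";

// Greedy .* runs to the last </b>; lazy .*? stops at the first one.
string greedy = Regex.Match(html, @"<b>.*</b>").Value;
string lazy = Regex.Match(html, @"<b>.*?</b>").Value;

// {2,} takes as many as it can; {2,}? takes only the required minimum.
string atLeastTwo = Regex.Match("aaaa", @"a{2,}").Value;
string lazyAtLeastTwo = Regex.Match("aaaa", @"a{2,}?").Value;

Console.WriteLine(greedy);          // <b>one</b><b>two</b>
Console.WriteLine(lazy);            // <b>one</b>
Console.WriteLine(atLeastTwo);      // aaaa
Console.WriteLine(lazyAtLeastTwo);  // aa
```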
Alternation character: |
Example: three landline telephone number formats used in some parts of China
0\d{2}-\d{8}|0\d{3}-\d{7}|0\d{3}-\d{8}
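One way to use the telephone pattern for validation is sketched below. The helper name IsLandline and the sample numbers are hypothetical; the ^(?:...)$ wrapper is an addition so that the whole input, not just a substring, has to fit one of the three alternatives.

```csharp
using System;
using System.Text.RegularExpressions;

// The three alternatives from the example, wrapped in a non-capturing group and anchored.
string pattern = @"^(?:0\d{2}-\d{8}|0\d{3}-\d{7}|0\d{3}-\d{8})$";

// Hypothetical helper: true when the whole input matches one of the alternatives.
bool IsLandline(string s) => Regex.IsMatch(s, pattern);

Console.WriteLine(IsLandline("010-12345678"));   // True  (3-digit area code, 8 digits)
Console.WriteLine(IsLandline("0755-1234567"));   // True  (4-digit area code, 7 digits)
Console.WriteLine(IsLandline("0755-12345678"));  // True  (4-digit area code, 8 digits)
Console.WriteLine(IsLandline("010-1234567"));    // False (3-digit area code needs 8 digits)
```

Without the anchors, the second alternative would also match the first seven digits of an eight-digit number, so anchoring matters when the goal is validation rather than searching.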
Grouping characters: ()
Example: match a simple IP address
(\d{1,3}\.){3}\d{1,3}
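The sketch below applies the IP address pattern to a hypothetical log line and inspects what the group captured. It also shows the sense in which the pattern is "simple": it checks the shape of the address, not the 0 to 255 range of each part.

```csharp
using System;
using System.Text.RegularExpressions;

string ipPattern = @"(\d{1,3}\.){3}\d{1,3}";

// Hypothetical sample text containing an address.
Match m = Regex.Match("Server at 192.168.0.1 is up", ipPattern);

// Group 1 is repeated three times; Groups[1].Value holds only the last repetition,
// while Captures keeps all of them.
string address = m.Value;
string lastGroupValue = m.Groups[1].Value;
int repetitions = m.Groups[1].Captures.Count;

// The pattern does not range-check the numbers, so this also matches.
bool acceptsOutOfRange = Regex.IsMatch("999.999.999.999", ipPattern);

Console.WriteLine(address);            // 192.168.0.1
Console.WriteLine(lastGroupValue);     // 0.
Console.WriteLine(repetitions);        // 3
Console.WriteLine(acceptsOutOfRange);  // True
```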