Overview
A regular expression describes a particular kind of text (a pattern). The regular expression engine is responsible for finding text that matches this pattern in a given string.
This article lists the commonly used regular expression symbols, grouped by category. It is meant as a quick introduction to the relevant metacharacters and as a memo to consult when reading more complex expressions later. I will keep updating it with more regular-expression material. The examples are in C#.
Overview
Ordinary characters
Character sets
Character set shorthands
Specifying the number of repetitions
Position-matching characters
Alternation characters
Matching special characters
Groups, backreferences, and non-capturing groups
Greedy and non-greedy matching
Backtracking and non-backtracking
Lookahead and lookbehind
Finally
Ordinary characters
The simplest way to describe text is to give the content to match directly. For example, to find "pattern" in "Generic specialization, the decorator pattern, chains of responsibility, and extensible software.", the regular expression is simply pattern.
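A minimal C# sketch of a literal match follows. It is written as top-level statements, so C# 9 or later is assumed. The pattern contains no metacharacters, so Regex.Match simply searches for that exact character sequence.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Generic specialization, the decorator pattern, chains of responsibility, and extensible software.";

// No metacharacters: the engine searches for the exact sequence "pattern".
Match m = Regex.Match(input, "pattern");

Console.WriteLine(m.Success); // True
Console.WriteLine(m.Value);   // pattern
Console.WriteLine(m.Index);   // 38, the zero-based position of the match
```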
Character sets
Placing characters inside square brackets creates a character set. A character set tells the engine to match one character from the set, and it always matches exactly one character.
Character | Meaning | Example
[...] | Any one of the characters inside the brackets | [abc] matches a single a, b, or c, but no other character
[^...] | Any one character that is not inside the brackets | [^abc] matches any single character except a, b, or c, such as d, e, or f
For example, the color is spelled "grey" in British English and "gray" in American English. To match either spelling in a text, use gr[ae]y. To find "me" or "my", use m[ey].
Inside a character set, a hyphen (-) denotes a range: [0-9] matches any digit from 0 to 9, [A-Za-z] matches any English letter, and [0-9A-Za-z] matches any digit or letter.
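The C# sketch below exercises character sets, negated sets, and ranges. It uses top-level statements (C# 9+ assumed), and the input strings are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// gr[ae]y: "gr", then exactly one character from {a, e}, then "y".
string[] spellings = Regex.Matches("gray grey groy", "gr[ae]y")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
Console.WriteLine(string.Join(", ", spellings)); // gray, grey

// [^0-9]: one character that is NOT a digit; "2024" contains none.
bool hasNonDigit = Regex.IsMatch("2024", "[^0-9]");
Console.WriteLine(hasNonDigit); // False

// Range [0-9A-Za-z]: the first digit or English letter in the input.
string firstAlnum = Regex.Match("#_x7", "[0-9A-Za-z]").Value;
Console.WriteLine(firstAlnum); // x
```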
Character set shorthands
We often want to match a digit, a letter, or a whitespace character. Ordinary character sets can express these, but not conveniently, so regular expressions provide shorthand characters for some common character sets.
Character | Meaning | Example
\d | Any digit, such as 0 to 9 | \d\d matches 72, but not me or 7a
\D | Any non-digit character | \D\D matches me, but not 7a or 72
\w | Any word character, such as a-z, A-Z, 0-9, and the underscore | \w\w\w\w matches ab_2, but not a four-character string that contains a non-word character such as @
\W | Any non-word character | \W matches @, but not a
\s | Any whitespace character, including space, tab, line feed, carriage return, form feed, and vertical tab | Matches all the usual whitespace characters
\S | Any non-whitespace character | \S matches any non-whitespace character, such as ~, @, #, or &
. | Any single character except the line feed \n (unless RegexOptions.Singleline is set) | Matches any character except a line break
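Here is a short C# sketch of the shorthands, using top-level statements (C# 9+ assumed) and made-up inputs. Note that in .NET, \d and \w are Unicode-aware by default, so they also match non-ASCII digits and letters. RegexOptions.ECMAScript restricts them to ASCII.

```csharp
using System;
using System.Text.RegularExpressions;

bool digits   = Regex.IsMatch("72", @"\d\d");       // True: two digits in a row
bool digitsNo = Regex.IsMatch("7a", @"\d\d");       // False
bool nonDigit = Regex.IsMatch("me", @"\D\D");       // True: two non-digits
bool word4    = Regex.IsMatch("ab_2", @"\w\w\w\w"); // True: letters, underscore, digit
bool word4No  = Regex.IsMatch("ab@2", @"\w\w\w\w"); // False: '@' is not a word character
bool nonWord  = Regex.IsMatch("@", @"\W");          // True

// \s matches a whitespace character, so splitting on it separates the words.
string[] parts = Regex.Split("one\ttwo three", @"\s");
Console.WriteLine(string.Join("|", parts)); // one|two|three

// '.' matches any character except the line feed '\n'.
bool dotLetter  = Regex.IsMatch("a", ".");  // True
bool dotNewline = Regex.IsMatch("\n", "."); // False
```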
Specifying the number of repetitions
A quantifier specifies how many times the preceding character is repeated. It describes a repetition count, not content. For example, to find 11-digit mobile phone numbers that start with 158 in a list of phone numbers, the pattern without quantifiers would be 158\d\d\d\d\d\d\d\d. With quantifiers it is simply 158\d{8}.
Character | Meaning | Example
{n} | Matches the preceding character exactly n times | x{2} matches xx; in xxx it matches only the first two x's
{n,} | Matches the preceding character n or more times | x{2,} matches two or more x's, such as xx, xxx, xxxx, or xxxxxx
{n,m} | Matches the preceding character at least n and at most m times; n may be 0 | x{2,4} matches xx, xxx, or xxxx, but not a single x; in xxxxx it matches only the first four x's
? | Matches the preceding character 0 or 1 times; equivalent to {0,1} | x? matches x or the empty string
+ | Matches the preceding character 1 or more times; equivalent to {1,} | x+ matches x, xx, xxx, and so on
* | Matches the preceding character 0 or more times; equivalent to {0,} | x* matches zero or more x's
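The following C# sketch applies the quantifiers from the table, including the 158\d{8} phone-number pattern. It uses top-level statements (C# 9+ assumed), and the phone numbers are hypothetical sample data.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical list of phone numbers.
string numbers = "15812345678, 13900001111, 15800002222";

// 158\d{8}: "158" followed by exactly eight digits (11 digits in total).
string[] mobiles = Regex.Matches(numbers, @"158\d{8}")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
Console.WriteLine(string.Join(", ", mobiles)); // 15812345678, 15800002222

// x{2,4} consumes at most four x's per match.
string run = Regex.Match("xxxxx", "x{2,4}").Value;
Console.WriteLine(run); // xxxx

// ? makes the preceding character optional.
bool us = Regex.IsMatch("color", "colou?r");  // True
bool uk = Regex.IsMatch("colour", "colou?r"); // True

// + needs at least one x; * also accepts none.
bool plus = Regex.IsMatch("ab", "ax+b"); // False
bool star = Regex.IsMatch("ab", "ax*b"); // True
```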
Position-matching characters
We can now match most text using character sets and their shorthands. But what about the following cases?
The first word of the text must be Google
The text must end with bye
The first word of each line must be a number
A word must start with Hel
Each of these requirements concerns a position, whereas everything covered so far matches content. Regular expressions therefore also provide special characters that match positions without consuming any characters. For example, ^ matches the start of the text, $ matches the end of the text, and \b matches a word boundary.
Character | Meaning
^ | The following pattern must be at the start of the string. With RegexOptions.Multiline set, it can also be at the start of any line
$ | The preceding pattern must be at the end of the string. With RegexOptions.Multiline set, it can also be at the end of any line
\b | Matches a word boundary (the start or end of a word)
\B | Matches a non-word boundary, that is, a position that is not at the start or end of a word
\A | The following pattern must be at the start of the string; the Multiline option is ignored
\z | The preceding pattern must be at the very end of the string; the Multiline option is ignored
\Z | The preceding pattern must be at the end of the string or just before a final line break; the Multiline option is ignored
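The C# sketch below maps the four requirements listed above onto anchors and also contrasts \A with ^, and \Z with \z. It uses top-level statements (C# 9+ assumed), and all inputs are illustrative. RegexOptions.IgnoreCase is used in the \b example so that only the word boundary decides the result.

```csharp
using System;
using System.Text.RegularExpressions;

// ^ and $ anchor to the start and end of the string by default.
bool startsWithGoogle = Regex.IsMatch("Google is a search engine", @"^Google\b"); // True
bool endsWithBye      = Regex.IsMatch("Say goodbye", @"bye$");                  // True

// With RegexOptions.Multiline, ^ matches at the start of every line,
// while \A still matches only at the start of the whole string.
string lines = "1 apple\nbanana\n3 cherry";
int numberedLines = Regex.Matches(lines, @"^\d", RegexOptions.Multiline).Count;  // 2
int stringStarts  = Regex.Matches(lines, @"\A\w", RegexOptions.Multiline).Count; // 1

// \b: "hel" must begin a word.
bool hello = Regex.IsMatch("say hello", @"\bHel", RegexOptions.IgnoreCase); // True
bool shell = Regex.IsMatch("a shell", @"\bHel", RegexOptions.IgnoreCase);   // False

// \Z also allows a final line break; \z does not.
bool bigZ   = Regex.IsMatch("bye\n", @"bye\Z"); // True
bool smallZ = Regex.IsMatch("bye\n", @"bye\z"); // False
```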
Alternation characters
In a character set, the brackets list several characters, and the text matches if it matches any one of them. Alternation provides a similar mechanism for whole patterns: one regular expression can contain several patterns, and it matches if any one of them matches. This also lets you split a complex regular expression into relatively simple sub-expressions. It is similar to the logical OR operator.
Character | Meaning | Example
| | Matches either the pattern before it or the pattern after it | cat|mouse matches cat or mouse
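Here is a minimal C# sketch of alternation, using top-level statements (C# 9+ assumed) and an invented sentence. The last line uses a group, which is covered in a later section, to limit the alternatives to part of the pattern.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "The cat saw a mouse and a dog.";

// cat|mouse: succeeds wherever either alternative matches.
string[] animals = Regex.Matches(text, "cat|mouse")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
Console.WriteLine(string.Join(", ", animals)); // cat, mouse

// | has the lowest precedence, so it splits the whole pattern unless grouped.
// Parentheses limit the alternatives: gr(a|e)y behaves like gr[ae]y.
bool grey = Regex.IsMatch("grey", "gr(a|e)y"); // True
```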
Matching special characters
So far we have covered character sets, character set shorthands, position characters, repetition counts, and alternation. Each of these symbols has a special meaning in a regular expression, so how do we match the characters themselves? Put a backslash (\) in front of the special character. The following table lists the escapes for common special characters.
Character | Meaning
\\ | Matches the character \
\. | Matches the character .
\* | Matches the character *
\+ | Matches the character +
\? | Matches the character ?
\| | Matches the character |
\( | Matches the character (
\) | Matches the character )
\{ | Matches the character {
\} | Matches the character }
\^ | Matches the character ^
\$ | Matches the character $
\n | Matches a line feed (newline)
\r | Matches a carriage return
\t | Matches a tab
\f | Matches a form feed
\nnn | Matches the ASCII character given by a three-digit octal number; for example, \103 matches uppercase C
\xnn | Matches the ASCII character given by a two-digit hexadecimal number; for example, \x43 matches C
\unnnn | Matches the Unicode character given by a four-digit hexadecimal number; for example, \u0043 matches C
\cX | Matches the control character Ctrl+X; for example, \cV matches Ctrl+V
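A short C# sketch of escaping follows. It uses top-level statements (C# 9+ assumed) and sample strings chosen for illustration. C# verbatim strings (@"...") keep backslashes as written, so the regex escapes reach the engine unchanged. Regex.Escape is the built-in helper that adds the backslashes for you.

```csharp
using System;
using System.Text.RegularExpressions;

// Unescaped '.' matches any character; "\." matches only a literal dot.
bool anyChar    = Regex.IsMatch("1x5", @"^1.5$");  // True
bool literalDot = Regex.IsMatch("1x5", @"^1\.5$"); // False
bool version    = Regex.IsMatch("1.5", @"^1\.5$"); // True

// Regex.Escape inserts the backslashes, which is useful for user-supplied text.
string pattern = Regex.Escape("(1+1)*2?");
Console.WriteLine(pattern); // \(1\+1\)\*2\?
bool found = Regex.IsMatch("Is (1+1)*2? true", pattern); // True

// \x43 (hexadecimal) and \u0043 (Unicode) both denote the character 'C'.
bool hexC     = Regex.IsMatch("C", @"\x43");   // True
bool unicodeC = Regex.IsMatch("C", @"\u0043"); // True

// \t in the pattern matches a tab character in the input.
bool tab = Regex.IsMatch("a\tb", @"a\tb"); // True
```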
Groups, backreferences, and non-capturing groups
Parentheses mark part of a regular expression as a group, which can then be treated as a unit. Quantifiers and alternation can be applied to a whole group.
1 Example: public void Set, public void SetValue
The pattern Set(Value)? contains the group (Value). The quantifier ? applies to the whole group, so the pattern matches both Set and SetValue.
2 Example: out of sight, out of mind
Pattern: (out of) sight, \1 mind
The regex engine stores the text matched by each pair of parentheses as a group, which can be referenced by index. Inside the expression, \1 is a backreference to the first group. In C#, the captured text is also available through the Match.Groups collection. Note that Groups[0] is the entire match; the groups themselves start at index 1.
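The C# sketch below runs examples 1 and 2 and reads the captured text through Match.Groups. It uses top-level statements (C# 9+ assumed).

```csharp
using System;
using System.Text.RegularExpressions;

// Set(Value)?: the ? applies to the whole group (Value).
string setOnly  = Regex.Match("public void Set", @"Set(Value)?").Value;      // Set
string setValue = Regex.Match("public void SetValue", @"Set(Value)?").Value; // SetValue

// \1 must repeat exactly the text captured by group 1.
Match m = Regex.Match("out of sight, out of mind", @"(out of) sight, \1 mind");
Console.WriteLine(m.Groups[0].Value); // out of sight, out of mind (entire match)
Console.WriteLine(m.Groups[1].Value); // out of (first group)
```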
3 Groups can also be referenced by name. Name a group with the syntax (?<groupname>...).
Pattern: (?<group1>out of) sight, \1 mind (inside the pattern, the named group can also be referenced as \k<group1>)
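Here is a minimal C# sketch of a named group, using top-level statements (C# 9+ assumed). It uses the \k<group1> form for the backreference and reads the capture by name through Groups["group1"].

```csharp
using System;
using System.Text.RegularExpressions;

// (?<group1>...) names the group; \k<group1> refers back to it by name.
Match m = Regex.Match("out of sight, out of mind", @"(?<group1>out of) sight, \k<group1> mind");

Console.WriteLine(m.Success);                // True
Console.WriteLine(m.Groups["group1"].Value); // out of
```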
4 Outside the expression, for example in a replacement string, refer to a group by index with $index, or by name with ${groupname}.
Example: Out of of sight, out of mind
Pattern: (?<group1>[a-z]+) \1
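The following C# sketch uses the pattern above with Regex.Replace to collapse the doubled word. It uses top-level statements (C# 9+ assumed). In .NET, a named group also receives a number. Because this pattern has no unnamed groups, group1 is group 1, which is why \1 works alongside ${group1}.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Out of of sight, out of mind";

// A lowercase word, a space, then the same word again; keep one copy by name.
string byName = Regex.Replace(input, @"(?<group1>[a-z]+) \1", "${group1}");

// The same replacement with an unnamed group and $1.
string byIndex = Regex.Replace(input, @"([a-z]+) \1", "$1");

Console.WriteLine(byName);  // Out of sight, out of mind
Console.WriteLine(byIndex); // Out of sight, out of mind
```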
5 Non-capturing groups: put ?: right after the opening parenthesis. Some groups exist only to apply alternation or a quantifier. When their contents are not needed, a non-capturing group avoids wasting storage on a capture.
(?:out of) sight
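The C# sketch below compares a non-capturing group with a capturing one by counting entries in Match.Groups. It uses top-level statements (C# 9+ assumed).

```csharp
using System;
using System.Text.RegularExpressions;

// (?:out of) groups the words but stores no capture.
Match nonCapturing = Regex.Match("out of sight", @"(?:out of) sight");
Match capturing    = Regex.Match("out of sight", @"(out of) sight");

// Groups always contains Groups[0], the entire match.
Console.WriteLine(nonCapturing.Groups.Count); // 1
Console.WriteLine(capturing.Groups.Count);    // 2
```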
Character | Meaning
(?<name>exp) | Matches exp and captures the text into a group named name
(?:exp) | Matches exp without capturing the text or assigning the group a number
Greedy and non-greedy matching
By default, the regex engine is greedy: as long as the pattern allows, it matches as many characters as possible. Adding ? after a quantifier (*, +, and so on) makes that quantifier non-greedy, so it matches as few characters as possible. Greediness is therefore closely tied to the repetition quantifiers.
Character | Meaning
? | When it follows a quantifier (a character that specifies the number of repetitions), the quantifier becomes non-greedy: *?, +?, ??, {n,}?, and {n,m}?
Example: out of sight, out of mind
Greedy pattern: .*of outputs "out of sight, out of"
Non-greedy pattern: .*?of outputs "out of"
Another example:
Input: The title of Cnblog is
Target: match the HTML tags
Pattern 1: <.+>
Pattern 1 output: a single match running from the first < to the last >, because .+ is greedy
Pattern 2: <.+?>
Pattern 2 output: each tag as a separate match, because .+? stops at the first >
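Here is a C# sketch comparing greedy and lazy quantifiers, using top-level statements (C# 9+ assumed). The HTML fragment is a hypothetical input made up for the tag example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "out of sight, out of mind";
string greedy = Regex.Match(text, ".*of").Value;  // out of sight, out of
string lazy   = Regex.Match(text, ".*?of").Value; // out of

// Hypothetical HTML fragment used only for illustration.
string html = "<b>The title of Cnblog is</b>";

// Greedy: one match from the first '<' to the last '>'.
string greedyTag = Regex.Match(html, "<.+>").Value;

// Lazy: each tag separately.
string[] lazyTags = Regex.Matches(html, "<.+?>")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(greedy);
Console.WriteLine(lazy);
Console.WriteLine(greedyTag);
Console.WriteLine(string.Join(", ", lazyTags)); // <b>, </b>
```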
Backtracking and non-backtracking
Use (?>...) to declare a non-backtracking (atomic) group. Because the regex engine is greedy, it sometimes has to backtrack to find a match. Consider the following example:
Example: Live for nothing, die for something
Pattern with backtracking (the default): .*thing, outputs "Live for nothing,". Because .* is greedy, it first matches all the way to the end of the string. The following "thing," then fails to match, so the engine backtracks, giving characters back one at a time, until "thing," matches at "nothing,".
Pattern without backtracking: (?>.*)thing, matches nothing. The atomic group consumes the whole string and may not give any of it back, so the entire expression fails.
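The C# sketch below reproduces the two patterns above, using top-level statements (C# 9+ assumed).

```csharp
using System;
using System.Text.RegularExpressions;

string text = "Live for nothing, die for something";

// Default: .* grabs the whole string, then gives characters back (backtracks)
// until "thing," can match.
Match normal = Regex.Match(text, ".*thing,");
Console.WriteLine(normal.Value); // Live for nothing,

// Atomic group: (?>.*) keeps everything it matched and never gives it back,
// so "thing," has nothing left to match against.
Match atomic = Regex.Match(text, "(?>.*)thing,");
Console.WriteLine(atomic.Success); // False
```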
Character | Meaning
(?>...) | Once the expression inside the group has matched, the engine does not backtrack into it
Lookahead and lookbehind
These assertions require that a pattern be followed or preceded by certain content (or not), without including that content in the match. Like the position characters, they match a position rather than text.
Character | Meaning
(?=exp) | Lookahead: the pattern to the left must be followed by exp; exp is not part of the match
(?!exp) | Negative lookahead: the pattern to the left must not be followed by exp; exp is not part of the match
(?<=exp) | Lookbehind: the pattern to the right must be preceded by exp; exp is not part of the match
(?<!exp) | Negative lookbehind: the pattern to the right must not be preceded by exp; exp is not part of the match
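Here is a C# sketch of all four lookaround forms on one made-up string, using top-level statements (C# 9+ assumed). The helper All is a hypothetical convenience function, not part of the Regex API. The \b anchors in the negative forms stop the engine from matching part of a number. Without them, \d+(?!%) would still match the "5" of "50%".

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: all match values as an array.
static string[] All(string input, string pattern) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

string text = "50% off, 20 items, $30 each, 45 left";

string[] lookahead  = All(text, @"\d+(?=%)");       // 50: followed by '%'
string[] negAhead   = All(text, @"\b\d+\b(?!%)");   // 20, 30, 45: not followed by '%'
string[] lookbehind = All(text, @"(?<=\$)\d+");     // 30: preceded by '$'
string[] negBehind  = All(text, @"(?<!\$)\b\d+\b"); // 50, 20, 45: not preceded by '$'

Console.WriteLine(string.Join(", ", lookahead));
Console.WriteLine(string.Join(", ", negAhead));
Console.WriteLine(string.Join(", ", lookbehind));
Console.WriteLine(string.Join(", ", negBehind));
```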
Finally
References:
Regular Expressions 30-Minute Introductory Tutorial
Regular Expressions Tutorial
.NET Framework Regular Expressions
.NET Advanced Series: C# Regular Expression Memo
C# Regular Expressions