C# Regular Expressions

Last Update:2015-09-14 Source: Internet

Author: User

Overview

Regular expressions are mainly used to describe a particular kind of text (a pattern). The regular expression engine is responsible for finding text that fits the pattern in a given string.

This article lists the commonly used regular expression symbols by category. It is a quick reference to the metacharacters, intended as a memo for understanding more complex expressions later, and it will be updated as more material is added. The sample language is C#.

Contents

Normal characters

Character sets

Shorthand for character sets

Specifying the number of repetitions

Matching positions

Alternation

Matching special characters

Groups, backreferences, and non-capturing groups

Greedy and non-greedy matching

Backtracking and non-backtracking

Lookahead and lookbehind

References

Normal characters

The simplest way to describe text is to give the content to match directly. To find pattern in "Generic specialization, the decorator pattern, chains of responsibilities, and extensible software.", the regular expression is simply "pattern".
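
A minimal sketch of a literal match follows. The class name LiteralMatchDemo and the helper FindFirst are hypothetical names chosen for illustration; only Regex.Match comes from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralMatchDemo
{
    // Returns the index of the first occurrence of the pattern, or -1 if it is absent.
    public static int FindFirst(string text, string pattern)
    {
        Match m = Regex.Match(text, pattern);
        return m.Success ? m.Index : -1;
    }

    public static void Main()
    {
        string text = "Generic specialization, the decorator pattern, " +
                      "chains of responsibilities, and extensible software.";
        // The pattern contains no metacharacters, so it is matched literally.
        Console.WriteLine(FindFirst(text, "pattern"));
    }
}
```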

Character sets

Characters placed inside square brackets form a character set. A character set tells the engine to match exactly one character, which must be one of the characters in the set.

Character	Matched characters	Example
[...]	Matches any one of the characters inside the brackets	[abc] matches a single character a, b, or c, but no other character
[^...]	Matches any one character that is not inside the brackets	[^abc] matches any single character except a, b, and c, such as d, e, or f

For example, the same word is spelled gray (American English) or grey (British English). To match either gray or grey in a text, use the regular expression gr[ae]y. To find me and my in a text, the regular expression is m[ey].

A hyphen inside a character set denotes a range: [0-9] matches one digit from 0 to 9; [A-Za-z] matches one English letter; [0-9A-Za-z] matches one digit or letter.
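
The character sets above can be tried with a short sketch like the one below. CharacterSetDemo and its two helpers are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharacterSetDemo
{
    // True when the text contains "gray" or "grey".
    public static bool ContainsGrayOrGrey(string text) => Regex.IsMatch(text, "gr[ae]y");

    // Counts the characters that are ASCII digits or English letters.
    public static int CountAlphanumerics(string text) =>
        Regex.Matches(text, "[0-9A-Za-z]").Count;

    public static void Main()
    {
        Console.WriteLine(ContainsGrayOrGrey("a grey sky"));  // True
        Console.WriteLine(ContainsGrayOrGrey("a groy sky"));  // False
        Console.WriteLine(CountAlphanumerics("a-1 B!"));      // 3
    }
}
```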

Shorthand for character sets

We often need to match a digit, a letter, or a whitespace character. Ordinary character sets can express these, but they are inconvenient, so regular expressions provide shorthand for some common character sets.

Character	Matched characters	Example
\d	Any digit from 0 to 9	\d\d matches 72, but not me or 7a
\D	Any non-digit character	\D\D matches me, but not 7a or 72
\w	Any word character, such as a-z, A-Z, 0-9, and the underscore	\w\w\w\w matches ab_2, but fails where one of the four characters is @
\W	Any non-word character	\W matches @, but not a
\s	Any whitespace character, including spaces, tabs, line breaks, carriage returns, form feeds, and vertical tabs	Matches all traditional whitespace characters
\S	Any non-whitespace character	\S matches any non-whitespace character, such as ~, @, #, or &
.	Any single character	Matches any character except a line break
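
The rows of the table can be checked with a small sketch such as the following. ShorthandDemo.Check is a hypothetical helper that simply wraps Regex.IsMatch. One caveat, stated as background rather than something from the table: by default .NET lets \d and \w match Unicode digits and letters too, not only ASCII ones.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ShorthandDemo
{
    // True when the pattern matches somewhere in the input.
    public static bool Check(string pattern, string input) => Regex.IsMatch(input, pattern);

    public static void Main()
    {
        Console.WriteLine(Check(@"\d\d", "72"));        // True
        Console.WriteLine(Check(@"\d\d", "7a"));        // False
        Console.WriteLine(Check(@"\D\D", "me"));        // True
        Console.WriteLine(Check(@"\w\w\w\w", "ab_2"));  // True
        Console.WriteLine(Check(@"\W", "@"));           // True
        Console.WriteLine(Check(@"\s", "a\tb"));        // True
        Console.WriteLine(Check(@".", "\n"));           // False: . skips line breaks
    }
}
```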

Specifying the number of repetitions

A quantifier specifies how many times the preceding character is repeated; it matches a number of repetitions, not content. For example, to find 11-digit mobile phone numbers beginning with 158 in a list of phone numbers, the regular expression without quantifiers is 158\d\d\d\d\d\d\d\d; with a quantifier it becomes 158\d{8}.

Character	Matched characters	Example
{n}	Matches the preceding character exactly n times	x{2} matches xx, but not xxx
{n,}	Matches the preceding character n or more times	x{2,} matches 2 or more x, such as xx, xxx, xxxx, or xxxxxx
{n,m}	Matches the preceding character at least n and at most m times. To allow zero occurrences, write n as 0	x{2,4} matches xx, xxx, and xxxx, but not x or xxxxx
?	Matches the preceding character 0 or 1 times, equivalent to {0,1}	x? matches x or the empty string
+	Matches the preceding character 1 or more times, equivalent to {1,}	x+ matches x, xx, or xxx
*	Matches the preceding character 0 or more times, equivalent to {0,}	x* matches 0 or more x
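
As a sketch of the phone number example, the code below extracts every run that fits 158\d{8}. QuantifierDemo, FindMobileNumbers, and the sample numbers are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Returns every substring made of "158" followed by exactly eight more digits.
    public static string[] FindMobileNumbers(string text) =>
        Regex.Matches(text, @"158\d{8}")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        // Only the second number starts with 158 and has 11 digits.
        string numbers = "13912345678 15812345678 1581234";
        Console.WriteLine(string.Join(", ", FindMobileNumbers(numbers)));
    }
}
```

The pattern is not anchored, so it would also match inside a longer run of digits; the position characters described in the next section address that.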

Matching positions

With character sets and their shorthand we can already match most text. But what about the following requirements?

The first word of the text must be Google

The text must end with bye

The first word of each line must be a number

A word must start with Hel

Each of these requirements describes a position rather than content. Regular expressions provide special characters that match positions and consume no text: ^ matches the start of the text, $ matches the end of the text, and \b matches a word boundary.

Character	Matched position
^	The pattern that follows must be at the beginning of the string. When the Multiline option is set, it may also be at the beginning of any line of a multi-line string
$	The pattern that precedes must be at the end of the string. When the Multiline option is set, it may also be at the end of any line
\b	Matches a word boundary, that is, the beginning or end of a word
\B	Matches a non-word boundary, that is, a position that is not the beginning or end of a word
\A	The pattern that follows must be at the beginning of the string; the Multiline option is ignored
\z	The pattern that precedes must be at the very end of the string; the Multiline option is ignored
\Z	The pattern that precedes must be at the end of the string, or just before a final line break
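
A short sketch of three of these position characters follows. PositionDemo, its helpers, and the sample strings are hypothetical; RegexOptions.Multiline is the .NET option that the table calls the Multiline option.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PositionDemo
{
    // Counts the lines that start with a digit; Multiline lets ^ match at every line start.
    public static int CountLinesStartingWithDigit(string text) =>
        Regex.Matches(text, @"^\d", RegexOptions.Multiline).Count;

    // Returns the words that begin with "Hel"; \b demands a word boundary before the H.
    public static string[] WordsStartingWithHel(string text) =>
        Regex.Matches(text, @"\bHel\w*").Cast<Match>().Select(m => m.Value).ToArray();

    // True when the text ends with "bye"; \z is the absolute end of the string.
    public static bool EndsWithBye(string text) => Regex.IsMatch(text, @"bye\z");

    public static void Main()
    {
        Console.WriteLine(CountLinesStartingWithDigit("1 apple\nbanana\n2 cherry")); // 2
        Console.WriteLine(string.Join(", ", WordsStartingWithHel("Hello AntiHelix Helmet")));
        Console.WriteLine(EndsWithBye("good bye")); // True
    }
}
```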

Alternation

With a character set, brackets let the pattern list several alternative characters, and the text matches if any one of them is present. Alternation extends the same idea to whole patterns: several patterns are written in one regular expression, separated by |, and the text matches if it satisfies any one of them. This lets a complex regular expression be divided into simpler sub-patterns. The | symbol plays the role of a logical OR.

Character	Matched characters	Example
|	Matches either the pattern before it or the pattern after it	cat|mouse matches cat or mouse
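
The cat|mouse row can be tried with a sketch like this one. AlternationDemo, FindAnimals, and the sample sentence are illustrative assumptions.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // Returns every occurrence of "cat" or "mouse", in order of appearance.
    public static string[] FindAnimals(string text) =>
        Regex.Matches(text, "cat|mouse").Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        string sentence = "The cat chased the mouse, not the dog.";
        Console.WriteLine(string.Join(", ", FindAnimals(sentence))); // cat, mouse
    }
}
```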

Matching special characters

So far we have covered character sets, their shorthand, position characters, quantifiers, and alternation. The symbols they use all have special meanings in regular expressions. So how do we match those characters themselves? Precede the special character with \. The following table lists the escapes for common special characters.

Character	Matched characters
\\	Matches the character \
\.	Matches the character .
\*	Matches the character *
\+	Matches the character +
\?	Matches the character ?
\|	Matches the character |
\(	Matches the character (
\)	Matches the character )
\{	Matches the character {
\}	Matches the character }
\^	Matches the character ^
\$	Matches the character $
\n	Matches a line feed (newline)
\r	Matches a carriage return
\t	Matches a tab
\f	Matches a form feed
\nnn	Matches the ASCII character specified by a three-digit octal number, for example \103 matches an uppercase C
\xnn	Matches the ASCII character specified by a two-digit hexadecimal number, for example \x43 matches C
\unnnn	Matches the Unicode character specified by a four-digit hexadecimal number
\cV	Matches a control character, for example \cV matches Ctrl+V
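
The sketch below contrasts an escaped + with an unescaped one. EscapeDemo and its helpers are hypothetical; Regex.Escape is the .NET method that adds the backslashes for you, and it is mentioned here as a convenience that the table itself does not cover.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // \+ matches a literal plus sign.
    public static bool MatchesEscaped(string input) => Regex.IsMatch(input, @"1\+1=2");

    // Unescaped, + is a quantifier: "one or more 1, then 1=2".
    public static bool MatchesUnescaped(string input) => Regex.IsMatch(input, "1+1=2");

    public static void Main()
    {
        Console.WriteLine(MatchesEscaped("1+1=2"));    // True
        Console.WriteLine(MatchesUnescaped("1+1=2"));  // False
        Console.WriteLine(MatchesUnescaped("11=2"));   // True
        Console.WriteLine(Regex.Escape("1+1=2"));      // 1\+1=2
        Console.WriteLine(Regex.IsMatch("C", @"\x43")); // True
    }
}
```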

Groups, backreferences, and non-capturing groups

The part of a regular expression enclosed in parentheses is called a group, and the engine treats it as a single unit. Quantifiers and alternation can be applied to a whole group.

1 Example: public void Set, public void SetValue

Regular expression: Set(Value)?, where (Value) is a group and the quantifier ? applies to the whole group, so the expression matches either Set or SetValue

2 Example: out of sight, out of mind

Regular expression: "(out of) sight, \1 mind"

The regular expression engine stores the text matched inside "()" as a group, which can be referenced by index. "\1" in an expression is a backreference to the text captured by the first group. In C#, the captured groups are also available through the Groups property of a Match. Note that Groups[0] is the entire matched string, and the captured groups start at index 1.
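
A minimal sketch of example 2, showing the backreference and the Groups collection. BackreferenceDemo and MatchPhrase are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackreferenceDemo
{
    // \1 must repeat exactly the text captured by the first group, "out of".
    public static Match MatchPhrase(string input) =>
        Regex.Match(input, @"(out of) sight, \1 mind");

    public static void Main()
    {
        Match m = MatchPhrase("out of sight, out of mind");
        Console.WriteLine(m.Groups[0].Value); // out of sight, out of mind
        Console.WriteLine(m.Groups[1].Value); // out of
    }
}
```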

3 Groups can also be referenced by name. Use the format (?<groupname>...) to name a group. Inside the expression, \k<groupname> refers back to a named group

Regular expression: "(?<group1>out of) sight, \1 mind"
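
The following sketch reads a named group from C#. It uses the \k<group1> form of the backreference; NamedGroupDemo and CapturedPrefix are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Returns the text captured by the group named group1, or "" when nothing matches.
    public static string CapturedPrefix(string input)
    {
        Match m = Regex.Match(input, @"(?<group1>out of) sight, \k<group1> mind");
        return m.Success ? m.Groups["group1"].Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(CapturedPrefix("out of sight, out of mind")); // out of
    }
}
```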

4 Outside the expression, for example in a replacement string, reference a group with $index, or with ${groupname} for a named group

Example: out of of sight, out of mind

Regular expression: "(?<group1>[a-z]+) \1"
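
One plausible use of this expression is to collapse the doubled word with Regex.Replace, as sketched below. ReplaceDemo and RemoveDoubledWords are hypothetical names, and the choice of "${group1}" as the replacement is an assumption about what the example was meant to show.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceDemo
{
    // Replaces "word word" with a single "word" by reusing the captured group.
    public static string RemoveDoubledWords(string input) =>
        Regex.Replace(input, @"(?<group1>[a-z]+) \1", "${group1}");

    public static void Main()
    {
        Console.WriteLine(RemoveDoubledWords("out of of sight, out of mind"));
        // out of sight, out of mind
    }
}
```

The pattern has no word boundaries, so on other inputs it can also fire inside words (for instance the "is is" in "this is"); adding \b on both sides would be a reasonable refinement.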

5 Non-capturing groups: add ?: at the start of the group. Some groups exist only to apply alternation or a quantifier; when the captured text is not needed, use a non-capturing group to avoid the cost of storing it

"(?:out of) sight"

Character	Matched characters
(?<groupname>exp)	Matches exp and captures the text into a group named groupname
(?:exp)	Matches exp, does not capture the matched text, and does not assign a group number to the group
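
The difference between a capturing and a non-capturing group shows up in the size of the Groups collection, as in this sketch. GroupCountDemo and GroupCount are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupCountDemo
{
    // Number of groups in the match, counting Groups[0] (the whole match).
    public static int GroupCount(string pattern, string input) =>
        Regex.Match(input, pattern).Groups.Count;

    public static void Main()
    {
        Console.WriteLine(GroupCount(@"(out of) sight", "out of sight"));   // 2
        Console.WriteLine(GroupCount(@"(?:out of) sight", "out of sight")); // 1
    }
}
```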

Greedy and non-greedy matching

The regular expression engine is greedy by default: as long as the pattern allows, it matches as many characters as possible. Adding "?" after a quantifier (*, +, and so on) makes that quantifier non-greedy, so it matches as few characters as possible. Greedy and non-greedy matching therefore apply only to quantifiers.

Character	Meaning
?	When it follows a quantifier (a character that specifies the number of repetitions), the quantifier matches in non-greedy mode

Example: out of sight, out of mind

Greedy regular expression: .*of    Output: out of sight, out of

Non-greedy regular expression: .*?of    Output: out of
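
The two outputs above can be reproduced with a sketch like this one. GreedyDemo and FirstMatch are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // Returns the text of the first match, or "" when there is none.
    public static string FirstMatch(string input, string pattern) =>
        Regex.Match(input, pattern).Value;

    public static void Main()
    {
        string text = "out of sight, out of mind";
        Console.WriteLine(FirstMatch(text, ".*of"));  // out of sight, out of
        Console.WriteLine(FirstMatch(text, ".*?of")); // out of
    }
}
```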

An additional example

Input: The title of Cnblog is

Target: Match HTML tags

Regular expression 1: <.+>

Output of regular expression 1:

Regular expression 2: <.+?>

Output of regular expression 2:
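
As an illustration with a made-up input (the string "<b>The title</b>" is an assumption, not the input used above), the sketch below lists what each of the two expressions matches. TagMatchDemo and AllMatches are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TagMatchDemo
{
    // Returns the text of every match of the pattern in the input.
    public static string[] AllMatches(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        string html = "<b>The title</b>"; // hypothetical sample input
        // Greedy: .+ runs to the last '>', so the whole string is one match.
        Console.WriteLine(string.Join(" | ", AllMatches(html, "<.+>")));
        // Non-greedy: .+? stops at the first '>', so each tag is its own match.
        Console.WriteLine(string.Join(" | ", AllMatches(html, "<.+?>")));
    }
}
```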

Backtracking and non-backtracking

Use "(?>...)" to declare a non-backtracking group. Because the regular expression engine is greedy, it sometimes has to backtrack to find a match. Consider the following example:

Example: live for nothing, die for something

Regular expression (default, backtracking): ".*thing," Output: live for nothing,. Because ".*" is greedy, it first matches to the end of the string. The engine then backtracks and finds "thing" at the end of "something", but the "," that must follow fails there, so it keeps backtracking and finally succeeds at "nothing,"

Regular expression (non-backtracking): "(?>.*)thing," No match. Because backtracking is forbidden inside the group, ".*" cannot give back any characters and the entire expression fails
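
Both behaviours can be observed with the sketch below. AtomicGroupDemo and its two helpers are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AtomicGroupDemo
{
    // Default behaviour: .* may give characters back, so a match is found.
    public static Match WithBacktracking(string input) =>
        Regex.Match(input, ".*thing,");

    // (?>.*) keeps everything it consumed, so "thing," can never match after it.
    public static Match WithoutBacktracking(string input) =>
        Regex.Match(input, "(?>.*)thing,");

    public static void Main()
    {
        string text = "live for nothing, die for something";
        Console.WriteLine(WithBacktracking(text).Value);      // live for nothing,
        Console.WriteLine(WithoutBacktracking(text).Success); // False
    }
}
```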

Character	Meaning
(?>...)	Do not backtrack when matching the expression inside the group

Lookahead and lookbehind

Lookahead and lookbehind match a specific pattern while asserting what must, or must not, come after or before it. Like the position characters, these assertions consume no text.

Character	Meaning
(?=exp)	Positive lookahead: the pattern on the left must be followed by exp; the assertion itself is not part of the match result
(?!exp)	Negative lookahead: the pattern on the left must not be followed by exp; the assertion itself is not part of the match result
(?<=exp)	Positive lookbehind: the pattern on the right must be preceded by exp; the assertion itself is not part of the match result
(?<!exp)	Negative lookbehind: the pattern on the right must not be preceded by exp; the assertion itself is not part of the match result
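
Each of the four assertions is exercised once in the sketch below. LookaroundDemo, First, and the sample strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Returns the text of the first match, or "" when there is none.
    public static string First(string input, string pattern) =>
        Regex.Match(input, pattern).Value;

    public static void Main()
    {
        string prices = "100 dollars, 200 euros";
        // Digits followed by " dollars"; the lookahead text is not in the result.
        Console.WriteLine(First(prices, @"\d+(?= dollars)"));      // 100
        // A whole number that is not followed by " dollars".
        Console.WriteLine(First(prices, @"\b\d+\b(?! dollars)"));  // 200

        string cost = "cost $42 and 17";
        // Digits preceded by a dollar sign.
        Console.WriteLine(First(cost, @"(?<=\$)\d+"));             // 42
        // A number that is not preceded by a dollar sign.
        Console.WriteLine(First(cost, @"(?<!\$)\b\d+"));           // 17
    }
}
```

The \b in the negative cases keeps the engine from settling for part of a number, such as the 10 in 100.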

References

Regular Expressions 30-minute introductory tutorial

Regular Expressions Tutorial

.NET Framework Regular Expressions

.NET Advanced Series: C# regular expression memo

This article is an English version of an article originally published in Chinese on aliyun.com.