Advanced Techniques for .NET Regular Expressions

Lazy Quantifiers

Syntax: ??, *?, +?, {n}?, {n,m}?

Meaning: Simply put, a trailing ? (the lazy modifier) tells the regular expression engine to make the quantifier in front of it match as few repetitions as possible while still letting the overall match succeed. For example, ? matches 0 or 1 item, so ?? prefers the shortest option and matches 0 items whenever it can. Likewise, *? tries 0 repetitions first, +? tries 1, {n}? matches exactly n, and {n,m}? tries n first.

When @"\w*?" is matched against "abcd", it succeeds five times, and every match is an empty string. Why five? The engine tries the pattern at each position in turn: before each of the four characters and once more at the end of the string. At each of those five positions, \w*? succeeds immediately with zero characters. After an empty match, the engine moves forward one position and tries again.
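The following is a minimal C# sketch of the \w*? experiment. It assumes C# 9 top-level statements (.NET 5 or later), and the extra patterns \w+, \w+? and \w{2,4}? are illustrative additions. The sketch prints where each empty match occurs and contrasts greedy quantifiers with their lazy counterparts.

```csharp
using System;
using System.Text.RegularExpressions;

// \w*? prefers zero repetitions, so every match it produces is an empty string.
MatchCollection lazyMatches = Regex.Matches("abcd", @"\w*?");
foreach (Match m in lazyMatches)
{
    // Index shows the position at which each empty match was found.
    Console.WriteLine($"index {m.Index}: \"{m.Value}\"");
}
Console.WriteLine($"\\w*? matched {lazyMatches.Count} times");

// Greedy \w+ takes every character it can; lazy \w+? stops after the one it needs.
string greedy = Regex.Match("abcd", @"\w+").Value;
string lazy = Regex.Match("abcd", @"\w+?").Value;
Console.WriteLine($"greedy: {greedy}, lazy: {lazy}");

// {n,m}? stops at the lower bound n when nothing after it forces more repetitions.
string bounded = Regex.Match("abcd", @"\w{2,4}?").Value;
Console.WriteLine($"\\w{{2,4}}? matched: {bounded}");
```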

Conditional Expressions

Syntax:

1. A|B. This is the most basic form: it matches A or B. Strictly speaking, it is an alternation rather than a condition.

2. (?(expression)yes-expression|no-expression), where no-expression is optional. If expression matches at the current position, yes-expression must be matched; otherwise, no-expression must be matched.

3. (?(group-name)yes-expression|no-expression), where no-expression is optional. If the group named group-name has captured a match, yes-expression must be matched; otherwise, no-expression must be matched.
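Form 3 can be illustrated with a small C# sketch (top-level statements, .NET 5 or later assumed). The scenario is hypothetical: a number may be wrapped in parentheses, but only as a balanced pair. The group name "open" is an illustrative choice. The closing parenthesis is required only if that group captured an opening one.

```csharp
using System;
using System.Text.RegularExpressions;

// (?<open>\()? optionally captures "(" into the group named "open";
// (?(open)\)) then requires ")" only if that group captured something.
// The no-expression is omitted, so nothing extra is required otherwise.
var balanced = new Regex(@"^(?<open>\()?\d+(?(open)\))$");

string[] inputs = { "(123)", "123", "(123", "123)" };
foreach (string input in inputs)
{
    Console.WriteLine($"{input,-6} -> {balanced.IsMatch(input)}");
}
```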

These constructs are easy to understand. The only thing to note is that @"(?(A)A|B)" cannot match "AA". Why not? And how should the pattern be written so that it does? Think about it first...

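The pattern should be @"(?(A)AA|B)". The condition in parentheses is a zero-width test: it checks whether A matches at the current position but consumes nothing. The text it tests is therefore not part of the yes-expression or no-expression, and those expressions must match it again themselves.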
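A short C# sketch (top-level statements, .NET 5 or later assumed) confirms this. Because none of these patterns defines a group named A, .NET treats (A) as an expression test rather than a group test. The anchors ^ and $ are added here so the patterns must cover the whole input.

```csharp
using System;
using System.Text.RegularExpressions;

// The condition (A) only looks ahead: it checks for "A" without consuming it,
// so the yes-expression has to match the whole "AA" by itself.
bool shortYes = Regex.IsMatch("AA", @"^(?(A)A|B)$");  // yes-expression consumes one "A"
bool longYes = Regex.IsMatch("AA", @"^(?(A)AA|B)$");  // yes-expression consumes "AA"

// Unanchored, the first pattern still finds a match, but only the first "A".
string partial = Regex.Match("AA", @"(?(A)A|B)").Value;

// When the condition fails, the no-expression is used instead.
string fallback = Regex.Match("BB", @"(?(A)AA|B)").Value;

Console.WriteLine($"{shortYes} {longYes} \"{partial}\" \"{fallback}\"");
```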

Features of the .NET Regular Expression Engine

The .NET regex engine mostly works the way we intuitively expect, but there are several points to note:

1. By default, the .NET Framework regular expression engine matches as many characters as possible (it is greedy). Because of this, do not use a pattern such as @"<.*>(.*)</.*>" to extract all the inner text from an HTML document. (I saw someone online writing exactly that kind of pattern, which is what prompted me to write this article on advanced regular expression techniques.)

2. The .NET Framework regular expression engine is a backtracking matcher that incorporates a traditional nondeterministic finite automaton (NFA) engine, such as the one used by Perl and Python. This distinguishes it from faster but more limited pure deterministic finite automaton (DFA) engines. The .NET Framework engine keeps trying until it finds a successful match. For example, when @"\w+\.(.*)\.\w+" is applied to "www.csdn.net", (.*) first consumes all of "csdn.net", which leaves nothing for the trailing \.\w+ to match. The engine therefore backtracks, giving characters back until \.\w+ matches ".net" and (.*) captures "csdn".
 
The .NET Framework regular expression engine also provides a complete set of constructs that let programmers control the backtracking engine, including:

Lazy quantifiers: ??, *?, +?, {n,m}?. These tell the backtracking engine to try the smallest number of repetitions first. In contrast, ordinary greedy quantifiers try the largest number of repetitions first.

Right-to-left matching (RegexOptions.RightToLeft). This is useful when searching from right to left instead of from left to right, or when it is more efficient to begin matching at the right part of the pattern than at the left.

3. In an alternation (expression1|expression2|expression3), the .NET Framework regular expression engine always tries expression1 first, followed by expression2 and then expression3.

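using System;
using System.Text.RegularExpressions;

public static class Program
{
    public static void Main()
    {
        // \w{2} is listed first, so it wins whenever at least two word characters are available
        string s = "THIN is a asp.net developer.";
        Regex reg = new Regex(@"(\w{2}|\w{3}|\w{4})", RegexOptions.Compiled | RegexOptions.IgnoreCase);
        MatchCollection mc = reg.Matches(s);
        foreach (Match m in mc)
            Console.WriteLine(m.Value);
        Console.ReadLine();
    }
}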

The output is: TH IN is as ne de ve lo pe. \w{2} is tried first and succeeds whenever at least two word characters remain, so \w{3} and \w{4} are never used.
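The three engine behaviors above can be checked together in one C# sketch (top-level statements, .NET 5 or later assumed). The behaviors are greedy matching, backtracking with its controls (lazy quantifiers and RightToLeft), and left-to-right alternation order. The inputs "<b>one</b><i>two</i>", "www.example.co.uk" and "a1b22c333" and the helper Values are hypothetical illustrations, not part of the original article.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: collects the value of every match in the order the engine returns them.
static List<string> Values(string input, string pattern, RegexOptions options = RegexOptions.None)
{
    var values = new List<string>();
    foreach (Match match in Regex.Matches(input, pattern, options))
        values.Add(match.Value);
    return values;
}

// Point 1, greedy vs. lazy: the greedy pattern makes a single match spanning both
// elements and captures only "two"; the lazy pattern makes one match per element.
string html = "<b>one</b><i>two</i>";
Match greedyHtml = Regex.Match(html, @"<.*>(.*)</.*>");
var lazyInner = new List<string>();
foreach (Match m in Regex.Matches(html, @"<.*?>(.*?)</.*?>"))
    lazyInner.Add(m.Groups[1].Value);
Console.WriteLine($"greedy group: {greedyHtml.Groups[1].Value}; lazy groups: {string.Join(", ", lazyInner)}");

// Point 2, backtracking: (.*) first runs to the end of the input, then gives
// characters back until \.\w+ can match. With more dots, greedy and lazy differ.
string csdnGroup = Regex.Match("www.csdn.net", @"\w+\.(.*)\.\w+").Groups[1].Value;
string greedyHost = Regex.Match("www.example.co.uk", @"\w+\.(.*)\.\w+").Groups[1].Value;
string lazyHost = Regex.Match("www.example.co.uk", @"\w+\.(.*?)\.\w+").Groups[1].Value;
Console.WriteLine($"csdn: {csdnGroup}; greedy: {greedyHost}; lazy: {lazyHost}");

// Point 2, right-to-left matching: scanning starts at the end of the input,
// so the digit runs come back from right to left.
List<string> rtl = Values("a1b22c333", @"\d+", RegexOptions.RightToLeft);
Console.WriteLine($"right-to-left: {string.Join(" ", rtl)}");

// Point 3, alternation order: the first alternative that succeeds wins,
// so putting the longest alternative first changes the result.
string s = "THIN is a asp.net developer.";
List<string> shortFirst = Values(s, @"(\w{2}|\w{3}|\w{4})");
List<string> longFirst = Values(s, @"(\w{4}|\w{3}|\w{2})");
Console.WriteLine(string.Join(" ", shortFirst));
Console.WriteLine(string.Join(" ", longFirst));
```

Reordering the alternatives from longest to shortest is one way to make \w{4} and \w{3} reachable. The sketch only shows that order matters; it does not claim this is the right pattern for any particular task.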

Appendix

Character escape  Description
Ordinary characters  All characters except . $ ^ { [ ( | ) * + ? \ match themselves.
\a  Matches a bell (alarm) character, \u0007.
\b  In a regular expression, \b denotes a word boundary (between \w and \W characters), except inside a [] character class, where \b denotes the backspace character, \u0008. In a replacement pattern, \b always denotes a backspace.
\t  Matches a tab, \u0009.
\r  Matches a carriage return, \u000D.
\v  Matches a vertical tab, \u000B.
\f  Matches a form feed, \u000C.
\n  Matches a new line (line feed), \u000A.
\e  Matches an escape character, \u001B.
\040  Matches an ASCII character given in octal (up to three digits). Numbers with no leading zero are treated as backreferences if they have only one digit or if they correspond to a capturing group number. For example, \040 is a space.
\x20  Matches an ASCII character given in hexadecimal (exactly two digits).
\cC  Matches an ASCII control character; for example, \cC is Ctrl-C.
\u0020  Matches a Unicode character given in hexadecimal (exactly four digits).
\  When followed by a character that is not recognized as an escape, matches that character. For example, \* is the same as \x2A.
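The escapes in this table can be cross-checked with a short C# sketch (top-level statements, .NET 5 or later assumed). Each pattern is anchored with ^ and $ so it must match the whole test string. The test strings are arbitrary examples.

```csharp
using System;
using System.Text.RegularExpressions;

// Three spellings of the space character: octal, two-digit hex, four-digit Unicode.
bool octal = Regex.IsMatch(" ", @"^\040$");
bool hex = Regex.IsMatch(" ", @"^\x20$");
bool unicode = Regex.IsMatch(" ", @"^\u0020$");

// \* escapes the quantifier metacharacter, just like \x2A.
bool star = Regex.IsMatch("*", @"^\*$") && Regex.IsMatch("*", @"^\x2A$");

// \cC is Ctrl-C, i.e. the control character U+0003.
bool ctrlC = Regex.IsMatch("\u0003", @"^\cC$");

// \t, \e and \a match tab (U+0009), escape (U+001B) and bell (U+0007).
bool controls = Regex.IsMatch("\t\u001B\u0007", @"^\t\e\a$");

Console.WriteLine($"{octal} {hex} {unicode} {star} {ctrlC} {controls}");
```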
Character class  Description
.  Matches any character except \n. If the Singleline option is set, the period matches any character.
[aeiou]  Matches any single character in the specified set.
[^aeiou]  Matches any single character not in the specified set.
[0-9a-fA-F]  A hyphen (-) specifies a range of contiguous characters.
\p{name}  Matches any character in the named character class specified by {name}. Supported names are Unicode categories and named block ranges, for example Ll, Nd, Z, IsGreek, and IsBoxDrawing. You can use the GetUnicodeCategory method to find the Unicode category a character belongs to.
\P{name}  Matches any character not in the Unicode category or block range specified by {name}.
\w  Matches any word character. Equivalent to the Unicode character categories [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}\p{Lm}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \w is equivalent to [a-zA-Z_0-9].
\W  Matches any non-word character. Equivalent to the Unicode character categories [^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}\p{Lm}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \W is equivalent to [^a-zA-Z_0-9].
\s  Matches any white-space character. Equivalent to the Unicode character categories [\f\n\r\t\v\x85\p{Z}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \s is equivalent to [ \f\n\r\t\v].
\S  Matches any non-white-space character. Equivalent to the Unicode character categories [^\f\n\r\t\v\x85\p{Z}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \S is equivalent to [^ \f\n\r\t\v].
\d  Matches any decimal digit. Equivalent to \p{Nd} for Unicode behavior and to [0-9] for non-Unicode, ECMAScript behavior.
\D  Matches any non-digit. Equivalent to \P{Nd} for Unicode behavior and to [^0-9] for non-Unicode, ECMAScript behavior.
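The difference between the default Unicode behavior and RegexOptions.ECMAScript can be seen in a small C# sketch (top-level statements, .NET 5 or later assumed). The test strings are arbitrary examples. "caf\u00E9" ends in é (category Ll), \u0663 is ARABIC-INDIC DIGIT THREE (category Nd), and \u03B1\u03B2\u03B3 are Greek letters. They are written as escapes to avoid source-encoding issues.

```csharp
using System;
using System.Text.RegularExpressions;

string word = "caf\u00E9";       // "café"
string arabicDigit = "\u0663";   // ARABIC-INDIC DIGIT THREE

// Default (Unicode) behavior: \w and \d follow the Unicode categories.
bool unicodeWord = Regex.IsMatch(word, @"^\w+$");
bool unicodeDigit = Regex.IsMatch(arabicDigit, @"^\d$");

// ECMAScript behavior: \w is [a-zA-Z_0-9] and \d is [0-9].
bool ecmaWord = Regex.IsMatch(word, @"^\w+$", RegexOptions.ECMAScript);
bool ecmaDigit = Regex.IsMatch(arabicDigit, @"^\d$", RegexOptions.ECMAScript);

// \p{name} with a named block: IsGreek covers the Greek block.
bool greek = Regex.IsMatch("\u03B1\u03B2\u03B3", @"^\p{IsGreek}+$");

Console.WriteLine($"Unicode: {unicodeWord} {unicodeDigit}; ECMAScript: {ecmaWord} {ecmaDigit}; Greek: {greek}");
```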
Assertion  Description
^  Specifies that the match must occur at the beginning of the string or, with the Multiline option, at the beginning of a line.
$  Specifies that the match must occur at the end of the string, before a \n at the end of the string, or, with the Multiline option, at the end of a line.
\A  Specifies that the match must occur at the beginning of the string (ignores the Multiline option).
\Z  Specifies that the match must occur at the end of the string or before a \n at the end of the string (ignores the Multiline option).
\z  Specifies that the match must occur at the end of the string (ignores the Multiline option).
\G  Specifies that the match must occur at the point where the previous match ended. When used with Match.NextMatch(), this ensures that all matches are contiguous.
\b  Specifies that the match must occur on a boundary between \w (alphanumeric) and \W (non-alphanumeric) characters. The match must occur on a word boundary, that is, at the first or last character of a word separated by non-alphanumeric characters.
\B  Specifies that the match must not occur on a \b boundary.
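The assertions can be exercised the same way. The following is a C# sketch (top-level statements, .NET 5 or later assumed), and all inputs are illustrative. It contrasts ^ with \A under Multiline, \Z with \z, \G with NextMatch, and \b with \B.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "first\nsecond";

// With Multiline, ^ also matches after each \n; \A only matches at the very start.
int caretCount = Regex.Matches(text, @"^\w+", RegexOptions.Multiline).Count;
int anchorCount = Regex.Matches(text, @"\A\w+", RegexOptions.Multiline).Count;

// \Z allows a trailing \n; \z requires the absolute end of the input.
bool bigZ = Regex.IsMatch("end\n", @"end\Z");
bool smallZ = Regex.IsMatch("end\n", @"end\z");

// \G forces each match to start where the previous one ended,
// so NextMatch stops at the first gap (the space).
var contiguous = new Regex(@"\G\d");
int digitsBeforeGap = 0;
for (Match m = contiguous.Match("123 45"); m.Success; m = m.NextMatch())
    digitsBeforeGap++;

// \b vs \B: "cat" as a whole word, versus "cat" inside another word.
bool wholeWord = Regex.IsMatch("a cat sat", @"\bcat\b");
bool insideWord = Regex.IsMatch("concatenate", @"\Bcat\B");

Console.WriteLine($"{caretCount} {anchorCount} {bigZ} {smallZ} {digitsBeforeGap} {wholeWord} {insideWord}");
```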
Quantifier  Description
*  Specifies zero or more matches, for example \w* or (abc)*. Equivalent to {0,}.
+  Specifies one or more matches, for example \w+ or (abc)+. Equivalent to {1,}.
?  Specifies zero or one match, for example \w? or (abc)?. Equivalent to {0,1}.
{n}  Specifies exactly n matches, for example (pizza){2}.
{n,}  Specifies at least n matches, for example (abc){2,}.
{n,m}  Specifies at least n but no more than m matches.
*?  Specifies the first match that consumes as few repeats as possible (lazy *).
+?  Specifies as few repeats as possible, but at least one (lazy +).
??  Specifies zero repeats if possible, otherwise one (lazy ?).
{n}?  Equivalent to {n} (lazy {n}).
{n,}?  Specifies as few repeats as possible, but at least n (lazy {n,}).
{n,m}?  Specifies as few repeats as possible between n and m (lazy {n,m}).

-- End --
