A detailed explanation of C# regular expression syntax rules

Source: Internet
Author: User


Regular expressions usually contain literal text and metacharacters.
Literal text is ordinary text such as "abcde", which matches any string containing "abcde".
Metacharacters are more flexible: they form a general pattern that matches every string conforming to it.
C# regular expression syntax, part one

Matching a single character

[] -- matches any one character from the set inside the brackets

The brackets can contain word characters ([ae]), non-word characters ([!?,;@#$*]), letter ranges ([A-Z]), and digit ranges ([0-9]).

eg. regular expression [ae]ffect
Can match strings affect, effect

(In this case "[ae]" is the metacharacter part and "ffect" is the literal text.)

Attention:
1. To match a hyphen in a character class, list the hyphen as the first character.

2. You can include multiple character classes in a single regular expression.

eg. [01][0-9]:[0-5][0-9][ap]m matches times written in a format such as 12:59pm

^ -- excludes characters (inside [] it means "not"; outside [] it marks the beginning of the string)

eg. regular expression m[^a]t
Can match strings met, mit, m&t, ...
Cannot match string mat
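
The character-class examples above can be tried out with a minimal C# sketch like the following. The class and method names are hypothetical, and the ^ and $ anchors are an added assumption so that each test covers the whole input rather than a substring.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharacterClassDemo
{
    // [ae] accepts exactly one of the listed characters before "ffect".
    public static bool IsAffectOrEffect(string s) => Regex.IsMatch(s, "^[ae]ffect$");

    // [^a] accepts any single character except 'a' between 'm' and 't'.
    public static bool IsMxtButNotMat(string s) => Regex.IsMatch(s, "^m[^a]t$");

    // Several character classes in a row describe a 12-hour clock time.
    public static bool LooksLikeTime(string s) =>
        Regex.IsMatch(s, "^[01][0-9]:[0-5][0-9][ap]m$");

    public static void Main()
    {
        Console.WriteLine(IsAffectOrEffect("effect"));  // True
        Console.WriteLine(IsMxtButNotMat("m&t"));       // True
        Console.WriteLine(IsMxtButNotMat("mat"));       // False
        Console.WriteLine(LooksLikeTime("12:59pm"));    // True
    }
}
```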

C# regular expression syntax, part two
Matching special characters

Escapes that match special characters:

\t -- matches a tab
\r -- matches a carriage return
\f -- matches a form feed
\n -- matches a newline

Metacharacters that represent character classes:

. -- matches any character other than \n (or any character at all in single-line mode)
\w -- matches any word character (a letter, digit, or underscore)
\W -- matches any non-word character
\s -- matches any white-space character (space, newline, tab, and so on)
\S -- matches any non-white-space character
\d -- matches any digit character (such as 0-9)
\D -- matches any non-digit character

Metacharacters that represent a position in the string:

^ -- matches the beginning of the string (or the beginning of a line in multiline mode)
$ -- matches the end of the string, the position before a "\n" at the end of the string, or the end of a line in multiline mode
\A -- matches the beginning of the string (ignores multiline mode)
\Z -- matches the end of the string or the position before a "\n" at the end of the string (ignores multiline mode)
\z -- matches the end of the string only
\G -- matches the position where the current search begins
\b -- matches a word boundary
\B -- matches a position that is not a word boundary
Attention:

1. The period character (.) is particularly useful. It stands for any single character.

eg. regular expression 01.17.84
Can match strings 01/17/84, 01-17-84, 01 17 84, 01.17.84

2. You can use \b to match a word boundary.

eg. regular expression \blet\b
Can match string let
Cannot match strings letter, hamlet

3. \A and \z are useful for making sure a string contains exactly the expression and nothing else.

eg. to check that a text control contains the word "Sophia" with no extra characters, line breaks, or white space:

\ASophia\z

4. The period character (.) has a special meaning. To match a literal period, precede it with a backslash: \.
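
As a rough illustration of the four notes above, the following C# sketch checks each pattern with Regex.IsMatch. The class and method names are hypothetical, and the \A and \z anchors around the date patterns are an added assumption so that the whole input is tested.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // \A and \z pin the pattern to the very start and very end of the input.
    public static bool IsExactlySophia(string s) => Regex.IsMatch(s, @"\ASophia\z");

    // \b on both sides requires "let" to stand alone as a word.
    public static bool ContainsWordLet(string s) => Regex.IsMatch(s, @"\blet\b");

    // An unescaped period matches any one character except \n.
    public static bool IsLooseDate(string s) => Regex.IsMatch(s, @"\A01.17.84\z");

    // An escaped period matches only a literal '.'.
    public static bool IsDottedDate(string s) => Regex.IsMatch(s, @"\A01\.17\.84\z");

    public static void Main()
    {
        Console.WriteLine(IsExactlySophia("Sophia"));    // True
        Console.WriteLine(IsExactlySophia("Sophia\n"));  // False
        Console.WriteLine(ContainsWordLet("let it be")); // True
        Console.WriteLine(ContainsWordLet("hamlet"));    // False
        Console.WriteLine(IsLooseDate("01/17/84"));      // True
        Console.WriteLine(IsDottedDate("01/17/84"));     // False
    }
}
```

The second call shows why \z is used instead of $: in .NET, $ would still accept a trailing newline.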

C# regular expression syntax, part three
Matching one of two alternative character sequences

| -- matches either of the two alternatives

eg. regular expression col(o|ou)r
Can match strings color, colour

Note: \b(bill|ted) and \bbill|ted are different.

The latter can also match "malted", because the \b metacharacter applies only to "bill".
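
The difference between the grouped and ungrouped alternation can be checked with a short C# sketch such as this one. The class and method names are hypothetical, and the ^ and $ around the color pattern are an added assumption.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // The word boundary applies to both alternatives inside the group.
    public static bool GroupedMatch(string s) => Regex.IsMatch(s, @"\b(bill|ted)");

    // Without the group the pattern reads as (\bbill) or (ted).
    public static bool UngroupedMatch(string s) => Regex.IsMatch(s, @"\bbill|ted");

    // The group limits the alternation to the "o" / "ou" part.
    public static bool IsColor(string s) => Regex.IsMatch(s, "^col(o|ou)r$");

    public static void Main()
    {
        Console.WriteLine(GroupedMatch("malted"));   // False
        Console.WriteLine(UngroupedMatch("malted")); // True
        Console.WriteLine(IsColor("colour"));        // True
    }
}
```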

C# regular expression syntax, part four
Matching with quantifiers

* -- matches 0 or more times
+ -- matches 1 or more times
? -- matches 0 or 1 times
{n} -- matches exactly n times
{n,} -- matches at least n times
{n,m} -- matches at least n times but no more than m times

eg. regular expression brothers?
Can match strings brother, brothers

eg. regular expression \bp\d{3,5}
Matches a word boundary, then a p, then 3 to 5 digits

Note: You can also use quantifiers with () to apply the quantifier to an entire sequence of characters.

eg. regular expression (the)?schoolisbeautiful
Can match strings schoolisbeautiful, theschoolisbeautiful
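
A small C# sketch of the quantifier examples above follows. The class and method names are hypothetical, and the ^ and $ anchors on the whole-string checks are an added assumption.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // s? makes the final 's' optional.
    public static bool IsBrotherOrBrothers(string s) => Regex.IsMatch(s, "^brothers?$");

    // \d{3,5} takes between three and five digits after the 'p'.
    // Returns the matched text, or an empty string when nothing matches.
    public static string FindPCode(string s) => Regex.Match(s, @"\bp\d{3,5}").Value;

    // (the)? applies the quantifier to the whole group, not just one letter.
    public static bool IsSchoolSentence(string s) =>
        Regex.IsMatch(s, "^(the)?schoolisbeautiful$");

    public static void Main()
    {
        Console.WriteLine(IsBrotherOrBrothers("brothers"));       // True
        Console.WriteLine(FindPCode("see p1234567 now"));         // p12345
        Console.WriteLine(IsSchoolSentence("theschoolisbeautiful")); // True
    }
}
```

Because {3,5} is greedy, the second call takes five digits even though more are available.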

C# regular expression syntax, part five
Regular expressions and greediness

Some quantifiers are greedy: they match as many characters as possible.

For example, the quantifier * matches 0 or more characters. Suppose you want to match every HTML tag in a string. You might use the following regular expression:

<.*>

Given the string a<i>quantifier</i>canbe<big>greedy</big>

<.*> matches all of <i>quantifier</i>canbe<big>greedy</big> as a single match.

To work around this problem, add the special non-greedy character ? after the quantifier, so the expression becomes:

<.*?>

This correctly matches <i>, </i>, <big>, and </big>.

? forces a quantifier to match as few characters as possible. It can be used with the following quantifiers:

*? -- non-greedy *
+? -- non-greedy +
?? -- non-greedy ?
{n}? -- non-greedy {n}
{n,}? -- non-greedy {n,}
{n,m}? -- non-greedy {n,m}
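
The greedy and non-greedy behavior described above can be compared side by side with a C# sketch like this one. The class name and the AllMatches helper are hypothetical; the sample string is the one used in the text.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    public const string Sample = "a<i>quantifier</i>canbe<big>greedy</big>";

    // Returns the text of every match of the pattern in the input, in order.
    public static string[] AllMatches(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        // Greedy: a single match running from the first '<' to the last '>'.
        Console.WriteLine(string.Join(" | ", AllMatches(Sample, "<.*>")));

        // Non-greedy: each tag is matched separately.
        Console.WriteLine(string.Join(" | ", AllMatches(Sample, "<.*?>")));
    }
}
```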
C# regular expression syntax, part six
Capturing and backreferences

A capture group is like a variable in a regular expression.
A capture group captures a character pattern within the regular expression, and that pattern can be referred to later in the expression by number or by name.

() -- captures the string it matches

\number -- backreference by number

eg.

Regular expression (\w)(\w)\2\1
Can match string abba

Attention:
1. Backreferences are very effective for matching HTML tags: <(\w+)></\1> matches tags of the form <table></table>.

2. By default, any parentheses capture the characters they contain. The n option disables this default behavior (detailed in part seven), as does adding ?: inside the parentheses. eg. (?:sophia) or (?n:sophia) does not capture sophia.

(?<name>) \k<name> -- backreference by name

eg.

Regular expression (?<sophia>\w)abc\k<sophia>
Can match string xabcx

Note: Replacement patterns use a slightly different format for capture groups: $1, $2, and so on refer to groups by number, and ${sophia} refers to a group by name.
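
The following C# sketch illustrates numbered backreferences, named backreferences, and group references in a replacement pattern. All class, method, and group names (tag, first) are hypothetical, and the ^ and $ anchors are an added assumption.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // \2\1 repeats the two captured characters in reverse order.
    public static bool IsAbbaShaped(string s) => Regex.IsMatch(s, @"^(\w)(\w)\2\1$");

    // \k<tag> must repeat whatever the named group "tag" captured.
    public static bool IsEmptyElement(string s) =>
        Regex.IsMatch(s, @"^<(?<tag>\w+)></\k<tag>>$");

    // In a replacement pattern, $1 and $2 refer to groups by number.
    public static string SwapWords(string s) =>
        Regex.Replace(s, @"^(\w+) (\w+)$", "$2 $1");

    // ${first} refers to a group by name.
    public static string BracketFirstWord(string s) =>
        Regex.Replace(s, @"^(?<first>\w+)", "[${first}]");

    public static void Main()
    {
        Console.WriteLine(IsAbbaShaped("abba"));                // True
        Console.WriteLine(IsEmptyElement("<table></table>"));   // True
        Console.WriteLine(IsEmptyElement("<table></tr>"));      // False
        Console.WriteLine(SwapWords("hello world"));            // world hello
        Console.WriteLine(BracketFirstWord("hello world"));     // [hello] world
    }
}
```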

C# regular expression syntax, part seven
Setting options for regular expressions

eg.

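string str = "...";   // the string literal is missing from the source
Regex objRegex = new Regex("...");   // the pattern is missing from the source
Response.Write(objRegex.Replace(str, "<font size=$1>$2</font>"));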

i -- matching is case-insensitive (IgnoreCase in .NET)
m -- specifies multiline mode (Multiline in .NET)
n -- captures only explicitly named or numbered groups (ExplicitCapture in .NET)
c -- compiles the regular expression, which makes matching faster but slows startup (Compiled in .NET)
s -- specifies single-line mode (Singleline in .NET)
x -- ignores unescaped white space in the pattern and allows comments (IgnorePatternWhitespace in .NET)
r -- searches from right to left (RightToLeft in .NET)
- -- disables the options that follow it

In .NET only i, m, n, s, and x can be written inline in a pattern; Compiled and RightToLeft can only be set through the RegexOptions enumeration.

eg. (?im-s:sophia) matches sophia case-insensitively in multiline mode, with single-line mode disabled.

Attention:
1. m affects how the start metacharacter (^) and end metacharacter ($) are interpreted.
By default, ^ and $ match only the beginning and end of the entire string, even if the string contains more than one line of text. If m is enabled, they match the beginning and end of each line of text.

2. s affects how the period metacharacter (.) is interpreted. A period normally matches every character except a newline. In single-line mode, a period also matches a newline character.
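
The effect of these options can be observed with a C# sketch along the following lines. The class and method names are hypothetical, and the sample strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionsDemo
{
    // Counts matches of ^\w+; Multiline lets ^ match at the start of each line.
    public static int CountLineStarts(string text, RegexOptions options) =>
        Regex.Matches(text, @"^\w+", options).Count;

    // Singleline lets the period match a newline as well.
    public static bool DotMatchesAcrossLines(RegexOptions options) =>
        Regex.IsMatch("a\nb", "a.b", options);

    // (?i) is the inline form of RegexOptions.IgnoreCase.
    public static bool InlineIgnoreCase(string s) => Regex.IsMatch(s, "(?i)sophia");

    // RightToLeft starts at the end of the input, so the last occurrence is found first.
    public static int IndexFromRight(string text, string word) =>
        Regex.Match(text, Regex.Escape(word), RegexOptions.RightToLeft).Index;

    public static void Main()
    {
        Console.WriteLine(CountLineStarts("first\nsecond", RegexOptions.None));      // 1
        Console.WriteLine(CountLineStarts("first\nsecond", RegexOptions.Multiline)); // 2
        Console.WriteLine(DotMatchesAcrossLines(RegexOptions.None));                 // False
        Console.WriteLine(DotMatchesAcrossLines(RegexOptions.Singleline));           // True
        Console.WriteLine(InlineIgnoreCase("SOPHIA"));                               // True
        Console.WriteLine(IndexFromRight("abc abc", "abc"));                         // 4
    }
}
```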

From:http://greatverve.cnblogs.com/archive/2011/06/27/csharp-reg.html
Library:

Common C# regular expressions:

"^\d+$" // non-negative integer (positive integer + 0)
"^[0-9]*[1-9][0-9]*$" // positive integer
"^((-\d+)|(0+))$" // non-positive integer (negative integer + 0)
"^-[0-9]*[1-9][0-9]*$" // negative integer
"^-?\d+$" // integer
"^\d+(\.\d+)?$" // non-negative floating-point number (positive floating-point number + 0)
"^(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*))$" // positive floating-point number
"^((-\d+(\.\d+)?)|(0+(\.0+)?))$" // non-positive floating-point number (negative floating-point number + 0)
"^(-(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*)))$" // negative floating-point number
"^(-?\d+)(\.\d+)?$" // floating-point number
"^[A-Za-z]+$" // a string made up of the 26 English letters
"^[A-Z]+$" // a string made up of the 26 uppercase English letters
"^[a-z]+$" // a string made up of the 26 lowercase English letters
"^[A-Za-z0-9]+$" // a string made up of digits and the 26 English letters
"^\w+$" // a string made up of digits, the 26 English letters, or underscores
"^[\w-]+(\.[\w-]+)*@[\w-]+(\.[\w-]+)+$" // email address
"^[A-Za-z]+://(\w+(-\w+)*)(\.(\w+(-\w+)*))*(\?\S*)?$" // URL
/^(\d{2}|\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$/ // year-month-day
/^(0[1-9]|1[0-2])\/(0[1-9]|[12][0-9]|3[01])\/(\d{2}|\d{4})$/ // month/day/year
"^([\w.-]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$" // email
"(\d+-)?(\d{4}-?\d{7}|\d{3}-?\d{8}|^\d{7,8})(-\d+)?" // phone number
"^(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])$" // IP address

yyyy-mm-dd, taking leap years and February into account:
^((((1[6-9]|[2-9]\d)\d{2})-(0?[13578]|1[02])-(0?[1-9]|[12]\d|3[01]))|(((1[6-9]|[2-9]\d)\d{2})-(0?[13456789]|1[012])-(0?[1-9]|[12]\d|30))|(((1[6-9]|[2-9]\d)\d{2})-0?2-(0?[1-9]|1\d|2[0-8]))|(((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00))-0?2-29))$
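
Validation patterns such as the integer, floating-point, and IP address expressions listed above are typically applied with Regex.IsMatch. The sketch below is one possible way to wrap them; the class, method, and constant names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ValidationDemo
{
    // One dotted part of an IP address: 0-99, 100-199, 200-249, or 250-255.
    private const string Part = @"(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])";

    public static bool IsInteger(string s) => Regex.IsMatch(s, @"^-?\d+$");

    public static bool IsFloat(string s) => Regex.IsMatch(s, @"^(-?\d+)(\.\d+)?$");

    public static bool IsIpAddress(string s) =>
        Regex.IsMatch(s, "^" + Part + @"\." + Part + @"\." + Part + @"\." + Part + "$");

    public static void Main()
    {
        Console.WriteLine(IsInteger("-42"));            // True
        Console.WriteLine(IsFloat("3.14"));             // True
        Console.WriteLine(IsIpAddress("192.168.0.1"));  // True
        Console.WriteLine(IsIpAddress("256.1.1.1"));    // False
    }
}
```

Note that in .NET $ also matches just before a final newline, so "12\n" passes IsInteger; replacing $ with \z gives a stricter check.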


Other C# regular expressions
Picture: src[^>]*[^/].(?:jpg|bmp|gif)(?:\"|\')
Chinese: ^([\u4e00-\u9fa5]+|[a-zA-Z0-9]+)$
Web site: "\<a.+?href=['""](?!http\:\/\/)(?!mailto\:)(?<foundAnchor>[^'"">]+?)[^>]*?\>"

Regular expression matching Chinese characters: [\u4e00-\u9fa5]

Regular expression matching double-byte characters (including Chinese characters): [^\x00-\xff]

Regular expression matching a blank line: \n[\s| ]*\r

Regular expression matching HTML tags: /<(.*)>.*<\/\1>|<(.*) \/>/

Regular expression matching leading and trailing white space: (^\s*)|(\s*$) (like the Trim function in VBScript)

Regular expression matching an email address: \w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*

Regular expression matching a URL: http://([\w-]+\.)+[\w-]+(/[\w./?%&=-]*)?
---------------------------------------------------------------------------
Here are some examples:

Using regular expressions to restrict what can be entered in a text box of a Web page form:

Allow only Chinese characters: onkeyup="value=value.replace(/[^\u4e00-\u9fa5]/g,'')" onbeforepaste="clipboardData.setData('text',clipboardData.getData('text').replace(/[^\u4e00-\u9fa5]/g,''))"

1. Allow only full-width characters: onkeyup="value=value.replace(/[^\uFF00-\uFFFF]/g,'')" onbeforepaste="clipboardData.setData('text',clipboardData.getData('text').replace(/[^\uFF00-\uFFFF]/g,''))"

2. Allow only digits: onkeyup="value=value.replace(/[^\d]/g,'')" onbeforepaste="clipboardData.setData('text',clipboardData.getData('text').replace(/[^\d]/g,''))"

3. Allow only digits and English letters: onkeyup="value=value.replace(/[\W]/g,'')" onbeforepaste="clipboardData.setData('text',clipboardData.getData('text').replace(/[\W]/g,''))"

4. Calculate the length of a string (a double-byte character counts as 2, an ASCII character as 1):

String.prototype.len = function () { return this.replace(/[^\x00-\xff]/g, "aa").length; }

5. Older JavaScript engines have no built-in trim function like the one in VBScript; it can be implemented with this expression:

String.prototype.trim = function ()
{
    return this.replace(/(^\s*)|(\s*$)/g, "");
}

Decomposing and converting an IP address with a regular expression:

6. The following JavaScript function uses a regular expression to match an IP address and converts it to the corresponding numeric value:

function IP2V(ip)
{
    // Regular expression matching an IP address; no g flag, so repeated calls do not depend on lastIndex
    var m = /(\d+)\.(\d+)\.(\d+)\.(\d+)/.exec(ip);
    if (m)
    {
        // Each part is one base-256 digit of the result
        return m[1] * Math.pow(256, 3) + m[2] * Math.pow(256, 2) + m[3] * 256 + m[4] * 1;
    }
}
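
For comparison, the same conversion might look like this in C#. This is only a sketch: the class and method names are hypothetical, each part is treated as one base-256 digit, the pattern is anchored with \A and \z as an added assumption, and the parts are not range-checked against 255.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IpValueDemo
{
    // Returns the numeric value of a dotted IP address, or -1 if the text does not match.
    public static long IpToValue(string ip)
    {
        Match m = Regex.Match(ip, @"\A([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\z");
        if (!m.Success) return -1;

        long value = 0;
        for (int i = 1; i <= 4; i++)
        {
            // Shift the running value by one base-256 digit and add the next part.
            value = value * 256 + long.Parse(m.Groups[i].Value);
        }
        return value;
    }

    public static void Main()
    {
        Console.WriteLine(IpToValue("192.168.0.1")); // 3232235521
        Console.WriteLine(IpToValue("not an ip"));   // -1
    }
}
```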
