C# regular expression syntax rules

    This article describes the rules of C# regular expression syntax, covering both metacharacters and literal text. I hope it helps you understand C# regular expression syntax.

    A regular expression usually contains literal text and metacharacters.

    Literal text is ordinary text such as "ABCDE", which matches any string that contains "ABCDE".

    Metacharacters are more flexible: they let one general pattern match every string that conforms to the regular expression.
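    Here is a minimal sketch of that difference. It is written as a C# 9+ top-level program using the standard System.Text.RegularExpressions.Regex class. The sample strings are invented for illustration and do not come from the original article.

```csharp
using System;
using System.Text.RegularExpressions;

// Literal text: "ABCDE" matches any string that contains exactly that sequence.
bool literalHit = Regex.IsMatch("xxABCDEyy", "ABCDE");   // True
bool literalMiss = Regex.IsMatch("xxABCyy", "ABCDE");    // False

// A metacharacter ("." = any single character except \n) lets one pattern
// match a whole family of strings.
bool metaHit = Regex.IsMatch("xxABXDEyy", "AB.DE");      // True

Console.WriteLine($"{literalHit} {literalMiss} {metaHit}"); // True False True
```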

    C# regular expression syntax 1. Matching a single character

    [] -- Matches any one of the characters inside the brackets

    Supported forms: word characters ([ae]), non-word characters ([!?,;@#$*]), letter ranges ([A-Z]), and digit ranges ([0-9])

    Eg. Regular expression    Matches
        [ae]ffect             affect, effect

    (In this example, "[ae]" is a metacharacter and "ffect" is literal text.)

    Note: 1. To match a hyphen in a character class, make the hyphen the first character in the class, as in [-a-z].

    2. A single regular expression can contain multiple character classes.

    Eg. [01][0-9]:[0-5][0-9][ap]m can be used to match times written in a 12-hour format such as 10:35am or 07:15pm.

    ^ -- Excludes the listed characters when it is the first character inside [] (outside brackets, ^ matches the start of a string instead)

    Eg. Regular expression    Matches                Does not match
        m[^a]t                met, mit, m&t, ...     mat
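    The character-class examples above can be checked with a short sketch like the one below. It assumes C# 9+ top-level statements and the standard Regex class. As an assumption of this sketch, the ^ and $ anchors (covered in section 2) are added so that each pattern must match the whole test string rather than only a substring.

```csharp
using System;
using System.Text.RegularExpressions;

// [ae] matches exactly one character: either 'a' or 'e'.
bool affect = Regex.IsMatch("affect", "^[ae]ffect$");
bool effect = Regex.IsMatch("effect", "^[ae]ffect$");

// Several classes in one pattern: a 12-hour time such as 10:35am.
var time = new Regex("^[01][0-9]:[0-5][0-9][ap]m$");
bool timeOk = time.IsMatch("10:35am");
bool timeBad = time.IsMatch("10:75pm"); // the first minute digit must be 0-5

// [^a] matches any single character except 'a'.
var notA = new Regex("^m[^a]t$");
bool met = notA.IsMatch("met");
bool mAmpT = notA.IsMatch("m&t");
bool mat = notA.IsMatch("mat");

// A hyphen placed first in a class is treated as a literal character.
bool hyphen = Regex.IsMatch("-", "^[-a-z]$");

// Prints: True True True False True True False True
Console.WriteLine($"{affect} {effect} {timeOk} {timeBad} {met} {mAmpT} {mat} {hyphen}");
```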

    C# regular expression syntax 2. Matching special characters

    The following escape sequences match special characters:

    \t -- Matches a tab

    \r -- Matches a carriage return

    \f -- Matches a form feed

    \n -- Matches a line feed (newline)

    Metacharacters that represent character classes:

    . -- Matches any character except \n (or any character at all in Singleline mode)

    \w -- Matches any word character (letters, digits, and the underscore)

    \W -- Matches any non-word character (anything except letters, digits, and the underscore)

    \s -- Matches any white-space character (including spaces, line breaks, and tabs)

    \S -- Matches any non-white-space character

    \d -- Matches any decimal digit (0-9; in .NET this also includes other Unicode decimal digits)

    \D -- Matches any character that is not a decimal digit
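    Here is a small sketch of the shorthand classes in action. It assumes C# 9+ and the standard Regex class. The input string and the Collect helper are hypothetical and were made up for this example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "Order #42\tshipped_on day 7";

// Hypothetical helper: concatenate every character that a pattern matches in input.
string Collect(string pattern) =>
    string.Concat(Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));

string digits = Collect(@"\d");    // "427"
string spaces = Collect(@"\s");    // space, tab, space, space
string nonWord = Collect(@"\W");   // space, '#', tab, space, space ('_' counts as a word character)

bool underscoreIsWord = Regex.IsMatch("_", @"^\w$");   // True
bool dotMatchesNewline = Regex.IsMatch("\n", "^.$");   // False: "." excludes \n by default

Console.WriteLine(digits); // 427
```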

    Metacharacters that match a position in the string:

    ^ -- Matches the start of the string (or the start of each line in Multiline mode)

    $ -- Matches the end of the string or the position just before a final \n (or the end of each line in Multiline mode)

    \A -- Matches the start of the string only (ignores Multiline mode)

    \Z -- Matches the end of the string or the position just before a final \n (ignores Multiline mode)

    \z -- Matches only the end of the string

    \G -- Matches at the position where the previous match ended (the start position of the current search)

    \b -- Matches a word boundary

    \B -- Matches a position that is not a word boundary
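    The following sketch contrasts the position metacharacters. It assumes C# 9+ and the standard Regex and RegexOptions types. The sample text is invented.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "one\ntwo\n";

// ^ matches only at the start of the string unless Multiline is set.
int starts = Regex.Matches(text, @"^\w+").Count;                              // 1
int startsMulti = Regex.Matches(text, @"^\w+", RegexOptions.Multiline).Count; // 2

// \A ignores Multiline: it still means the very start of the string.
int startsA = Regex.Matches(text, @"\A\w+", RegexOptions.Multiline).Count;    // 1

// $ and \Z may match just before a final \n; \z matches only the true end.
bool dollar = Regex.IsMatch(text, @"two$");   // True
bool bigZ = Regex.IsMatch(text, @"two\Z");    // True
bool smallZ = Regex.IsMatch(text, @"two\z");  // False: a "\n" still follows

// \b versus \B: "cat" as a whole word versus inside a longer word.
bool wordCat = Regex.IsMatch("the cat sat", @"\bcat\b");   // True
bool innerCat = Regex.IsMatch("concatenate", @"\Bcat\B");  // True

Console.WriteLine($"{starts} {startsMulti} {startsA} {dollar} {bigZ} {smallZ}"); // 1 2 1 True True False
```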

    Note:

    1. The period (.) is especially useful, because it can stand for any character.

    Eg. Regular expression    Matches                          Does not match
        01.17.84              01/17/84, 01-17-84, 01.17.84     011784

    2. You can use \b to match a word boundary.

    Eg. Regular expression    Matches    Does not match
        \blet\b               let        letter, Hamlet

    3. \A and \z are very useful when you need to make sure that a string contains exactly the expression and nothing else.

    Eg. To determine whether a text box contains only the word "sophia", with no additional characters, line breaks, or spaces:

    \Asophia\z

    4. The period (.) has a special meaning. To match a literal period, put a backslash in front of it: \.
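    The four notes above can be tried out with a sketch like this one. It assumes C# 9+ and the standard Regex class. The ^ and $ anchors on the date patterns are an addition of this sketch, used to force whole-string matches.

```csharp
using System;
using System.Text.RegularExpressions;

// 1. "." stands for any single character, so one pattern accepts several date separators.
var date = new Regex(@"^01.17.84$");
bool slash = date.IsMatch("01/17/84");  // True
bool dash = date.IsMatch("01-17-84");   // True
bool noSep = date.IsMatch("011784");    // False: each "." must consume one character

// 2. \blet\b finds "let" only as a whole word.
bool wholeLet = Regex.IsMatch("let it be", @"\blet\b");   // True
bool letter = Regex.IsMatch("letter", @"\blet\b");        // False
bool hamlet = Regex.IsMatch("Hamlet", @"\blet\b");        // False

// 3. \A ... \z: the input must be exactly "sophia", with no extra spaces or line breaks.
bool exact = Regex.IsMatch("sophia", @"\Asophia\z");       // True
bool trailing = Regex.IsMatch("sophia\n", @"\Asophia\z");  // False

// 4. \. matches only a literal period.
bool literalDotOk = Regex.IsMatch("01.17.84", @"^01\.17\.84$");  // True
bool literalDotBad = Regex.IsMatch("01x17x84", @"^01\.17\.84$"); // False

Console.WriteLine($"{slash} {dash} {noSep} {wholeLet} {letter} {hamlet} {exact} {trailing}");
```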

    C# regular expression syntax 3. Matching one of several character sequences

    | -- Matches either the expression on its left or the expression on its right

    Eg. Regular expression    Matches
        col(o|ou)r            color, colour

    Note: \b(bill|ted) and \bbill|ted are different.

    The latter can also match "malted", because the \b metacharacter applies only to "bill".
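    Here is a minimal sketch of alternation and of the grouping pitfall described in the note. It assumes C# 9+ and the standard Regex class.

```csharp
using System;
using System.Text.RegularExpressions;

// col(o|ou)r: the group limits the alternation to "o" or "ou".
var colour = new Regex(@"^col(o|ou)r$");
bool us = colour.IsMatch("color");    // True
bool uk = colour.IsMatch("colour");   // True

// \b(bill|ted): the word boundary applies to both alternatives.
bool grouped = Regex.IsMatch("malted", @"\b(bill|ted)");   // False
// \bbill|ted is read as (\bbill) or (ted), so the "ted" inside "malted" matches.
bool ungrouped = Regex.IsMatch("malted", @"\bbill|ted");   // True

Console.WriteLine($"{us} {uk} {grouped} {ungrouped}"); // True True False True
```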

    C# regular expression syntax 4. Matching with quantifiers

    * -- Matches 0 or more times

    + -- Matches 1 or more times

    ? -- Matches 0 or 1 time

    {n} -- Matches exactly n times

    {n,} -- Matches at least n times

    {n,m} -- Matches at least n times, but no more than m times

    Eg. Regular expression    Matches
        brothers?             brother, brothers

    Eg. Regular expression    Matches
        \bp\d{3,5}\b          a word that starts with p followed by 3 to 5 digits, such as p123 or p12345

    Note: You can also combine a quantifier with () to apply it to an entire sequence of characters.

    Eg. Regular expression            Matches
        (the )?school is beautiful    school is beautiful, the school is beautiful
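    The quantifier examples above can be reproduced with the sketch below. It assumes C# 9+ and the standard Regex class. The codes string is a hypothetical sample, and the ^ and $ anchors are added to force whole-string matches.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// ? makes the final "s" optional.
bool brother = Regex.IsMatch("brother", @"^brothers?$");    // True
bool brothers = Regex.IsMatch("brothers", @"^brothers?$");  // True

// \bp\d{3,5}\b: a whole word made of "p" followed by 3 to 5 digits.
string codes = "p12 p123 p12345 p123456";
string[] found = Regex.Matches(codes, @"\bp\d{3,5}\b")
    .Cast<Match>().Select(m => m.Value).ToArray();          // p123, p12345

// ()? applies the quantifier to the whole group "the ".
var school = new Regex(@"^(the )?school is beautiful$");
bool bare = school.IsMatch("school is beautiful");          // True
bool withThe = school.IsMatch("the school is beautiful");   // True

Console.WriteLine(string.Join(",", found)); // p123,p12345
```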

    C# regular expression syntax 5. Greedy and non-greedy quantifiers

    By default, quantifiers are greedy: they match as many characters as possible.

    For example, the quantifier * matches the preceding element 0 or more times. Suppose you want to match each HTML tag in a string. You might use the following regular expression:

    <.*>

    Given the string A <i>quantifier</i> can be <big>greedy</big>

    the pattern <.*> produces a single match, <i>quantifier</i> can be <big>greedy</big>, instead of matching each tag separately.

    To solve this problem, put the special non-greedy character "?" after the quantifier. The expression then becomes:

    <.*?>

    This matches <i>, </i>, <big>, and </big> separately, as intended.

    ? forces a quantifier to match as few characters as possible. It can be appended to any of the following quantifiers:

    *? -- Non-greedy version of *

    +? -- Non-greedy version of +

    ?? -- Non-greedy version of ?

    {n}? -- Non-greedy version of {n}

    {n,}? -- Non-greedy version of {n,}

    {n,m}? -- Non-greedy version of {n,m}
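    Here is a minimal sketch comparing the greedy and non-greedy patterns on the sample string above. It assumes C# 9+ and the standard Regex class.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string html = "A <i>quantifier</i> can be <big>greedy</big>";

// Greedy: .* runs as far as possible, so one match spans from the first < to the last >.
string[] greedy = Regex.Matches(html, "<.*>").Cast<Match>().Select(m => m.Value).ToArray();

// Non-greedy: .*? stops at the first > it can, so each tag is matched separately.
string[] lazy = Regex.Matches(html, "<.*?>").Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(greedy.Length);          // 1
Console.WriteLine(string.Join(" ", lazy)); // <i> </i> <big> </big>
```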

    6. Capturing groups and backreferences

    A capture group works like a variable in a regular expression. It captures the text matched by part of the pattern. That text can then be referenced by number or by name, later in the same regular expression or in a replacement pattern.

    () -- Captures the matched substring

    \number -- Backreference by number (for example, \1 for the first group)

    Eg. Regular expression    Matches
        (\w)(\w)\2\1          abba

    Note: 1. Backreferences are very useful for matching HTML tags. For example, <(\w+)></\1> matches pairs of tags such as <table></table>.

    2. By default, parentheses capture the characters they contain. You can turn this behavior off with the n option (described in section 7), or by adding ?: right after the opening parenthesis. Eg. In (?:sophia) or (?n:sophia), "sophia" is not captured.
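    The backreference and non-capturing examples can be checked with a sketch like the following. It assumes C# 9+ and the standard Regex class. The sample strings, including the mismatched "<table></tr>", are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// (\w)(\w)\2\1: two characters followed by the same two in reverse order.
bool abba = Regex.IsMatch("abba", @"^(\w)(\w)\2\1$");   // True
bool abab = Regex.IsMatch("abab", @"^(\w)(\w)\2\1$");   // False

// <(\w+)></\1>: the closing tag must repeat the name captured by group 1.
bool table = Regex.IsMatch("<table></table>", @"<(\w+)></\1>");  // True
bool mismatch = Regex.IsMatch("<table></tr>", @"<(\w+)></\1>");  // False

// (?:...) groups without capturing, so only group 0 (the whole match) exists.
Match m = Regex.Match("sophia", "(?:sophia)");
int groupCount = m.Groups.Count;                                  // 1

// The inline n option (ExplicitCapture) stops the unnamed inner group from capturing.
int inlineN = Regex.Match("sophia", "(?n:(sophia))").Groups.Count; // 1

Console.WriteLine($"{abba} {abab} {table} {mismatch} {groupCount} {inlineN}");
```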

    (?<name>...) and \k<name> -- Define a named capture group and reference it by name

    Eg. Regular expression             Matches
        (?<sophia>\w)abc\k<sophia>     xabcx

    Note: In a replacement pattern, the syntax is slightly different: refer to numbered groups with $1, $2, and so on, and to named groups with ${sophia}.
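    Here is a minimal sketch of a named group and of the $1 / ${name} replacement syntax. It assumes C# 9+ and the standard Regex class. The input "Smith, John" and the group name "first" are hypothetical. Note that .NET numbers unnamed groups before named ones, which is why $1 below refers to the surname.

```csharp
using System;
using System.Text.RegularExpressions;

// (?<sophia>\w) names the group; \k<sophia> must repeat the same character.
Match named = Regex.Match("xabcx", @"(?<sophia>\w)abc\k<sophia>");
string captured = named.Groups["sophia"].Value;   // "x"

// In a replacement pattern, numbered groups are $1, $2, ... and named groups are ${name}.
string swapped = Regex.Replace("Smith, John", @"(\w+), (?<first>\w+)", "${first} $1");

Console.WriteLine($"{named.Success} {captured} {swapped}"); // True x John Smith
```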

    7. Setting regular expression options

    Eg.

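    using System.Text.RegularExpressions;

    string str = "<h4>Sophia</h4>";
    // (\d) captures the heading level; \1 requires the closing tag to use the same level.
    Regex objRegex = new Regex(@"<h(\d)>(.*?)</h\1>");
    // In an ASP.NET page, this writes <font size=4>Sophia</font>
    Response.Write(objRegex.Replace(str, "<font size=$1>$2</font>"));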

    i -- Performs case-insensitive matching (RegexOptions.IgnoreCase in .NET)

    m -- Specifies Multiline mode (RegexOptions.Multiline)

    n -- Captures only explicitly named or numbered groups of the form (?<name>...) (RegexOptions.ExplicitCapture)

    c -- Compiles the regular expression, which makes matching faster but startup slower (RegexOptions.Compiled; not available as an inline option)

    s -- Specifies Singleline mode (RegexOptions.Singleline)

    x -- Removes unescaped white space from the pattern and allows comments (RegexOptions.IgnorePatternWhitespace)

    r -- Searches from right to left (RegexOptions.RightToLeft; not available as an inline option)

    - -- Placed before an option letter, turns that option off.

    Eg. (?im-s:sophia) matches "sophia" case-insensitively in Multiline mode, with Singleline mode turned off.

    Note: 1. m affects how the start metacharacter (^) and the end metacharacter ($) are interpreted. By default, ^ and $ match only at the beginning and end of the entire string, even if the string contains multiple lines of text. With m enabled, they also match at the beginning and end of each line.

    2. s affects how the period (.) is interpreted. Normally, a period matches any character except \n. In Singleline mode, a period matches \n as well.
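    The options above can be exercised with the sketch below. It shows both the inline form and the RegexOptions flags. It assumes C# 9+ and the standard Regex and RegexOptions types. The sample strings are invented.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "Sophia\nSOPHIA";

// Inline i and m: case-insensitive, and ^/$ match at every line.
int inlineCount = Regex.Matches(text, "(?im)^sophia$").Count;               // 2

// The same behavior through RegexOptions flags.
int flagCount = Regex.Matches(text, "^sophia$",
    RegexOptions.IgnoreCase | RegexOptions.Multiline).Count;                // 2

// Without m, ^ and $ refer to the whole string, so neither line matches on its own.
int noMulti = Regex.Matches(text, "(?i)^sophia$").Count;                    // 0

// s (Singleline) lets "." match \n as well.
bool dotDefault = Regex.IsMatch("a\nb", "^a.b$");                           // False
bool dotSingle = Regex.IsMatch("a\nb", "^a.b$", RegexOptions.Singleline);   // True

// "-" turns an option off inside a group: here case sensitivity is restored.
bool mixed = Regex.IsMatch("SOPHIA smith", "(?i)sophia (?-i:smith)");       // True
bool mixedFail = Regex.IsMatch("SOPHIA SMITH", "(?i)sophia (?-i:smith)");   // False

// Right-to-left exists only as a RegexOptions flag; the scan starts from the end.
string lastDigit = Regex.Match("a1b2", @"\d", RegexOptions.RightToLeft).Value; // "2"

Console.WriteLine($"{inlineCount} {flagCount} {noMulti} {dotDefault} {dotSingle} {lastDigit}");
```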
