C# Regular Expressions

Source: Internet
Author: User
What is a regular expression?

Regular expressions are a powerful tool for testing and manipulating strings. In the simplest terms, a regular expression is a special string that describes a pattern other strings can be checked against. They are commonly used to validate the format of user input; for example, a pattern along the lines of \w{1,}@\w{1,}\.\w{1,} checks whether the input looks like an email address. Validation is not their only use, though: regular expressions are useful almost anywhere strings are searched, split, or replaced.

Basic classes involved

The name comes from the English term "regular expression", and .NET accordingly places the related classes in the System.Text.RegularExpressions namespace.

The namespace contains eight basic classes: Capture, CaptureCollection, Group, GroupCollection, Match, MatchCollection, Regex, and RegexCompilationInfo.

Figure 1: The regular expression namespace in the MSDN Library

 

Class                  Description
Capture                The result of a single subexpression capture
CaptureCollection      The sequence of captures made by a single capturing group
Group                  The results of a single capturing group
GroupCollection        The collection of captured groups in a single match
Match                  The result of a single regular expression match
MatchCollection        The set of matches found by repeatedly applying a regular expression to a string
Regex                  An immutable regular expression
RegexCompilationInfo   Information the compiler needs to compile a regular expression into a standalone assembly

[Note]

This article is an introduction for beginners. Advanced grouping constructs and their related syntax are not covered here.

Basic knowledge of Regular Expressions

  • Basic syntax

Regular expressions have their own syntax rules. The common ones cover character matching, repetition, anchors (positioning), and escapes, plus more advanced syntax (grouping, substitution, and conditional constructs).

Character matching syntax:

Syntax   Meaning                                                Example
\d       Matches a digit (0-9)                                  \d matches 8; it does not match 12 as a whole
\D       Matches a non-digit                                    \D matches c; it does not match 3
\w       Matches a word character (letter, digit, underscore)   \w\w matches A3; it does not match @3
\W       Matches a non-word character                           \W matches @; it does not match c
\s       Matches a whitespace character                         \d\s\d matches "3 4"; it does not match abc
\S       Matches a non-whitespace character                     \S\S\S matches a#4; it does not match "3 d"
.        Matches any character except a line feed               ... matches a$5; it does not match a line feed
[…]      Matches any one character listed in the brackets       [b-d] matches b, c, and d; it does not match e
[^…]     Matches any one character not listed in the brackets   [^b-z] matches a; it does not match b through z
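As a rough illustration of the character classes above, the following sketch checks a handful of inputs with the static Regex.IsMatch method. The class name CharClassDemo and the helper FullMatch are made up for this example; the helper wraps each pattern in \A(?:...)\z so that the whole input has to match, not just part of it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharClassDemo
{
    // Hypothetical helper: true only when the entire input matches the pattern.
    public static bool FullMatch(string input, string pattern)
    {
        return Regex.IsMatch(input, @"\A(?:" + pattern + @")\z");
    }

    public static void Main()
    {
        Console.WriteLine(FullMatch("8", @"\d"));        // True: one digit
        Console.WriteLine(FullMatch("12", @"\d"));       // False: two digits, the pattern allows one
        Console.WriteLine(FullMatch("A3", @"\w\w"));     // True: two word characters
        Console.WriteLine(FullMatch("@3", @"\w\w"));     // False: @ is not a word character
        Console.WriteLine(FullMatch("@", @"\W"));        // True: @ is a non-word character
        Console.WriteLine(FullMatch("3 4", @"\d\s\d"));  // True: digit, space, digit
        Console.WriteLine(FullMatch("a$5", "..."));      // True: any three characters
        Console.WriteLine(FullMatch("c", "[b-d]"));      // True: c is in the range b-d
        Console.WriteLine(FullMatch("e", "[b-d]"));      // False: e is outside the range
        Console.WriteLine(FullMatch("a", "[^b-z]"));     // True: a is not in b-z
    }
}
```

Without the \A and \z wrapper, Regex.IsMatch would report True for "12" against \d as well, because it only looks for a match somewhere in the input.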

Repetition syntax:

Syntax   Meaning                                    Example
{n}      Matches exactly n times                    \d{3} is equivalent to \d\d\d; it is not equivalent to \d\d or \d\d\d\d
{n,}     Matches n or more times                    \w{2,} matches \w\w, \w\w\w, and longer runs; it does not match a single \w
{n,m}    Matches at least n and at most m times     \s{1,3} matches \s, \s\s, and \s\s\s; it does not match \s\s\s\s as a whole
?        Matches 0 or 1 time                        5? matches 5 or nothing; it does not match 55 as a whole
+        Matches 1 or more times                    \s+ matches one or more \s; it does not match zero \s
*        Matches 0 or more times                    \w* matches zero or more \w
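To make the repetition rules more concrete, here is a small sketch that prints the text consumed by the first match for several quantifiers. QuantifierDemo and FirstMatch are hypothetical names introduced for this illustration; quantifiers are greedy by default, so each one takes as many characters as it is allowed to.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Hypothetical helper: the text of the first match, or null when nothing matches.
    public static string FirstMatch(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(FirstMatch("12345", @"\d{3}"));             // 123
        Console.WriteLine(FirstMatch("ab12", @"\w{2,}"));             // ab12
        Console.WriteLine(FirstMatch("a", @"\w{2,}") == null);        // True: one character is too few
        Console.WriteLine(FirstMatch("x     y", @"\s{1,3}").Length);  // 3: at most three of the five spaces
        Console.WriteLine(FirstMatch("555", "5?"));                   // 5
        Console.WriteLine(FirstMatch("abc", @"\d+") == null);         // True: + needs at least one digit
        Console.WriteLine(FirstMatch("abc", @"\d*") == "");           // True: * accepts zero digits
    }
}
```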

Anchor (positioning) syntax:

Syntax   Meaning
^        The pattern that follows must appear at the start of the string (or of a line, in multiline mode)
$        The pattern that precedes must appear at the end of the string (or of a line, in multiline mode)
\A       The pattern that follows must appear at the start of the string
\z       The pattern that precedes must appear at the very end of the string
\Z       The pattern that precedes must appear at the end of the string or before a final line feed
\b       Matches a word boundary
\B       Matches a non-word boundary
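The anchors are easiest to tell apart by trying them. The sketch below is only an illustration: AnchorDemo and ContainsWholeWord are hypothetical names, and the sample strings are invented.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Hypothetical helper: true when word occurs in text as a whole word.
    public static bool ContainsWholeWord(string text, string word)
    {
        return Regex.IsMatch(text, @"\b" + Regex.Escape(word) + @"\b");
    }

    public static void Main()
    {
        Console.WriteLine(Regex.IsMatch("hello world", "^hello"));    // True: at the start
        Console.WriteLine(Regex.IsMatch("hello world", "^world"));    // False: not at the start
        Console.WriteLine(Regex.IsMatch("hello world", "world$"));    // True: at the end
        Console.WriteLine(Regex.IsMatch("line\n", @"line\Z"));        // True: \Z tolerates a final line feed
        Console.WriteLine(Regex.IsMatch("line\n", @"line\z"));        // False: \z requires the very end
        Console.WriteLine(ContainsWholeWord("a cat sat", "cat"));     // True
        Console.WriteLine(ContainsWholeWord("concatenate", "cat"));   // False: cat is inside a longer word
        Console.WriteLine(Regex.IsMatch("concatenate", @"\Bcat\B"));  // True: no boundary on either side
    }
}
```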

Escape syntax:

Syntax          Meaning                                                               Example
\ + character   Matches one of the special characters \ . * + ? | ( ) { } ^ $         \\ matches the character "\"
\n              Matches a line feed
\r              Matches a carriage return
\t              Matches a horizontal tab
\v              Matches a vertical tab
\f              Matches a form feed
\nnn            Matches the ASCII character with octal code nnn
\xnn            Matches the ASCII character with hexadecimal code nn
\unnnn          Matches the Unicode character with four-digit hexadecimal code nnnn
\cX             Matches the control character Ctrl+X, where X is an uppercase letter   \cS matches Ctrl+S

  • Constructing Regular Expression

Building regular expressions in code centers on the Regex class, whose main methods include IsMatch(), Replace(), Split(), and Match().

(1) The IsMatch() method

IsMatch() returns a bool: true if the test string matches the regular expression, false otherwise.

Example 1: check whether a Chengdu telephone number is valid.

Analysis: a Chengdu telephone number has the form 028********, that is, the fixed area code 028 followed by 8 digits.

Regular expression: 028\d{8} (the area code 028 is fixed and is followed by 8 digits, \d{8}).

The program code is shown in Figure 2:

 

Figure 2: Example 1, using the IsMatch method
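The following is a minimal sketch of how Example 1 might be written; it is not the article's original listing. PhoneCheck and IsChengduNumber are hypothetical names, and the ^ and $ anchors are an added assumption so that longer strings that merely contain such a number are rejected.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneCheck
{
    // Area code 028 followed by exactly eight digits (anchors are an assumption).
    public static bool IsChengduNumber(string input)
    {
        return Regex.IsMatch(input, @"^028\d{8}$");
    }

    public static void Main()
    {
        Console.WriteLine(IsChengduNumber("02812345678"));   // True
        Console.WriteLine(IsChengduNumber("0281234567"));    // False: only seven digits after 028
        Console.WriteLine(IsChengduNumber("01012345678"));   // False: different area code
    }
}
```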

(2) The Replace() method

Replace() replaces the text matched by the regular expression with a replacement string.

Example 2: when publishing an article that contains email addresses, replace "@" with "at" to deter spam harvesters.

Analysis: first find the email addresses in the article, then replace the "@" in each of them.

Regular expression: the email pattern \w{1,}@\w{1,}\. (written as "\\w{1,}@\\w{1,}\\." in an ordinary C# string literal).

The program code is shown in Figure 3:

 

Figure 3: Example 2, using the Replace method
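One possible way to write Example 2 is sketched below. It is an illustration rather than the original listing: EmailMasker and HideAtSign are hypothetical names, and the two capturing groups are an addition so that the text on either side of "@" can be put back with $1 and $2.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailMasker
{
    // Replaces the "@" of anything shaped like word@word. with " at ".
    public static string HideAtSign(string text)
    {
        return Regex.Replace(text, @"(\w{1,})@(\w{1,}\.)", "$1 at $2");
    }

    public static void Main()
    {
        Console.WriteLine(HideAtSign("Contact: user@example.com"));
        // Contact: user at example.com
    }
}
```

Text that contains "@" without matching the email shape, such as a lone "@" between spaces, is left unchanged.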

(3) The Split() method

Split() splits a string at each match of the regular expression and returns the pieces in a string array.

Example 3: extract every address from a group email address list.

Analysis: group mail uses ";" as the separator, so the string must be split on ";".

The program code is shown in Figure 4:

 

Figure 4: Example 3, using the Split method
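Example 3 could look roughly like the sketch below, which is an assumption about the missing listing. GroupMail and SplitRecipients are hypothetical names, and the addresses are invented.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupMail
{
    // Splits a ";"-separated recipient list into individual addresses.
    public static string[] SplitRecipients(string groupAddress)
    {
        return Regex.Split(groupAddress, ";");
    }

    public static void Main()
    {
        string[] recipients = SplitRecipients("a@example.com;b@example.com;c@example.com");
        foreach (string recipient in recipients)
        {
            Console.WriteLine(recipient);   // one address per line
        }
    }
}
```

Note that a trailing ";" in the input produces an empty string as the last array element, so real input may need filtering.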

Basic ways to construct a Regex

The Regex class has several constructor overloads. The two used most often both take the pattern, and the second also takes options:

  • Basic form: Regex(string pattern);
  • With options: Regex(string pattern, RegexOptions options);

Supplement: RegexOptions is an enumeration whose values include IgnoreCase (case-insensitive matching), RightToLeft (search from right to left), None (the default), CultureInvariant (ignore cultural differences in the language), Multiline (multi-line mode), and Singleline (single-line mode).
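The effect of a few of these options can be seen in a short sketch. OptionsDemo and CountLineStarts are hypothetical names, and the sample inputs are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionsDemo
{
    // Counts the lines (or just the string start) that begin with a word character.
    public static int CountLineStarts(string text, RegexOptions options)
    {
        return Regex.Matches(text, @"^\w", options).Count;
    }

    public static void Main()
    {
        // IgnoreCase: letter case is not significant.
        Console.WriteLine(new Regex("chengdu", RegexOptions.IgnoreCase).IsMatch("ChengDu"));     // True

        // Multiline: ^ and $ also match at line breaks, not only at the string ends.
        Console.WriteLine(CountLineStarts("a1\nb2", RegexOptions.None));                         // 1
        Console.WriteLine(CountLineStarts("a1\nb2", RegexOptions.Multiline));                    // 2

        // Singleline: the dot also matches a line feed.
        Console.WriteLine(Regex.IsMatch("a\nb", "a.b"));                                         // False
        Console.WriteLine(Regex.IsMatch("a\nb", "a.b", RegexOptions.Singleline));                // True

        // RightToLeft: the search starts from the end of the input.
        Console.WriteLine(Regex.Match("abc123def456", @"\d+", RegexOptions.RightToLeft).Value);  // 456
    }
}
```

Options can be combined with the | operator, for example RegexOptions.IgnoreCase | RegexOptions.Multiline.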

Example 4: build a validator for the ISBN format.

Analysis: an ISBN has the form X-XXXXX-XXX-X.

Regular expression: \d-\d{5}-\d{3}-\d

Construct the Regex object with the single-argument constructor, leaving the options unspecified: Regex isbnRegex = new Regex(pattern);

The code is shown in Figure 5:

 

Figure 5: Example 4, validating with a constructed Regex
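A sketch of Example 4 under stated assumptions: IsbnCheck and IsValidFormat are hypothetical names, and the ^ and $ anchors are added so that the whole string must have the X-XXXXX-XXX-X shape. It checks only that shape, not the ISBN check digit.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IsbnCheck
{
    // Built once with the single-argument constructor; equivalent to passing RegexOptions.None.
    private static readonly Regex IsbnRegex = new Regex(@"^\d-\d{5}-\d{3}-\d$");

    public static bool IsValidFormat(string candidate)
    {
        return IsbnRegex.IsMatch(candidate);
    }

    public static void Main()
    {
        Console.WriteLine(IsValidFormat("7-30204-815-1"));   // True
        Console.WriteLine(IsValidFormat("7-3020-815-1"));    // False: second group has four digits
        Console.WriteLine(IsValidFormat("7302048151"));      // False: hyphens are required
    }
}
```

Constructing the Regex once and reusing the instance avoids re-parsing the pattern on every call.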

Writing a validation program

To help you learn regular expressions and quickly check whether a pattern is correct, the steps below build a small validator around the IsMatch() method.

  1. Open Visual Studio .NET, choose Windows Application under Visual C# Projects in the New Project dialog, and name the project "regex_tools".
  2. Lay out the form as shown in Figure 6.
    Figure 6: Regular expression IsMatch validator
  3. Add the namespace declaration using System.Text.RegularExpressions; to the using directives at the top of the form's code file.
  4. Write the following code:
    • Write a private method that validates the input parameters, as shown in Figure 7.
      Figure 7: Private parameter validation method
    • Write the handler for the Check button, as shown in Figure 8.
      Figure 8: IsMatch validation button handler
    • Write the handler for the Clear button, which empties all the text boxes.
    • Build the program. This produces a simple regular expression validator.
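Because Figures 7 and 8 are not reproduced here, the sketch below shows only what the logic behind the form might look like, without any Windows Forms code. Everything in it is an assumption: RegexTools, InputsAreValid, Check, and the message texts are hypothetical, and in the real form the two strings would come from the text boxes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexTools
{
    // Hypothetical stand-in for the parameter check: both boxes must be filled in.
    public static bool InputsAreValid(string pattern, string input)
    {
        return !string.IsNullOrEmpty(pattern) && !string.IsNullOrEmpty(input);
    }

    // Hypothetical stand-in for the Check button handler: returns the message to display.
    public static string Check(string pattern, string input)
    {
        if (!InputsAreValid(pattern, input))
        {
            return "Please enter both a pattern and a test string.";
        }
        try
        {
            return Regex.IsMatch(input, pattern) ? "Match" : "No match";
        }
        catch (ArgumentException ex)
        {
            // Regex throws ArgumentException (or a subclass) for a malformed pattern.
            return "Invalid pattern: " + ex.Message;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Check(@"028\d{8}", "02812345678"));   // Match
        Console.WriteLine(Check(@"^\d+$", "abc"));              // No match
        Console.WriteLine(Check("(", "abc"));                   // Invalid pattern: ...
    }
}
```

Catching the exception matters in a tool like this, since the user is expected to type patterns that are still being debugged.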

     

     

     

    A Comprehensive Look at C# Regular Expressions:

    Many programming languages and tools now support regular expressions, and .NET is no exception: the .NET base class library contains a namespace and a set of classes that bring the full power of regular expressions to bear.
    Regular expressions are probably one of the topics programmers find most daunting. If you have no background in them, we recommend starting with the basics. See Regular expression syntax.

    The following describes regular expressions in C#. They live in a namespace of the .NET base class library, System.Text.RegularExpressions. The namespace contains eight classes, one enumeration, and one delegate:
    Capture: contains the result of a single capture;
    CaptureCollection: a sequence of Capture objects;
    Group: the result of a capturing group; derives from Capture;
    GroupCollection: a collection of capture groups;
    Match: the result of matching an expression; derives from Group;
    MatchCollection: a sequence of Match objects;
    MatchEvaluator: the delegate used when performing a replacement operation;
    Regex: an instance of a compiled expression;
    RegexCompilationInfo: provides the information the compiler uses to compile a regular expression into a standalone assembly;
    RegexOptions: provides the enumerated values used to set regular expression options.
    The Regex class also contains some static methods:
    Escape: escapes the metacharacters in a string so that they are treated as literals;
    IsMatch: returns a Boolean that indicates whether the expression matches the string;
    Match: returns a Match instance;
    Matches: returns a collection of Match objects;
    Replace: replaces the text matched by the expression with a replacement string;
    Split: returns an array of strings split at the positions determined by the expression;
    Unescape: converts the escaped characters in a string back to their unescaped form.

    The following describes their usage:
    Let's start with a simple matching example that uses the Regex and Match classes: Match m = Regex.Match("abracadabra", "(a|b|r)+"); This gives us an instance of the Match class that can be tested, for example: if (m.Success) { }. To use the matched text, convert the match to a string: MessageBox.Show("Match=" + m.ToString()); In this example the output is Match=abra, which is the matched string.
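The same example can be run as a console program. The sketch below is an illustration only: MatchWalk and AllMatches are hypothetical names, and the loop over NextMatch goes one step beyond the text above to show the later matches in the same input.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchWalk
{
    // Collects the text of every match of (a|b|r)+ in the input, in order.
    public static List<string> AllMatches(string input)
    {
        List<string> values = new List<string>();
        Match m = Regex.Match(input, "(a|b|r)+");
        while (m.Success)
        {
            values.Add(m.Value);
            m = m.NextMatch();
        }
        return values;
    }

    public static void Main()
    {
        Match first = Regex.Match("abracadabra", "(a|b|r)+");
        Console.WriteLine("Match=" + first.Value);     // Match=abra
        Console.WriteLine("Index=" + first.Index);     // Index=0
        Console.WriteLine("Length=" + first.Length);   // Length=4

        // The letters c and d interrupt the runs, giving three matches in total.
        Console.WriteLine(string.Join(",", AllMatches("abracadabra")));   // abra,a,abra
    }
}
```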

    The Regex class represents an immutable (read-only) regular expression. It also provides static methods (introduced one by one in the examples below) that make it possible to use the other regular expression classes without explicitly creating instances of them.

    The following code creates an instance of the Regex class and defines a simple regular expression when the object is initialized. Declare a Regex variable: Regex objAlphaPatt; then create the instance and define its pattern: objAlphaPatt = new Regex("[^a-zA-Z]");

    The IsMatch method indicates whether the regular expression specified in the Regex constructor finds a match in the input string. It is one of the most frequently used methods when working with regular expressions in C#. Because the pattern [^a-zA-Z] matches any character that is not a letter, IsMatch returns false for a string made up only of letters, so the negated test below reports success for such strings:
    if (!objAlphaPatt.IsMatch("testIsMatchMethod"))
        lblMsg.Text = "matched successfully";
    else
        lblMsg.Text = "match failed";
    This code produces "matched successfully", because the string contains only letters.
    if (!objAlphaPatt.IsMatch("testIsMatchMethod7654298"))
        lblMsg.Text = "matched successfully";
    else
        lblMsg.Text = "match failed";
    This code produces "match failed", because the string contains digits.

    The Escape method escapes a minimal set of metacharacters (\, *, +, ?, |, {, [, (, ), ^, $, ., #, and white space) by replacing them with their escape codes, so that they are treated as literal characters. The Replace method replaces every match of the pattern defined by the regular expression with the specified replacement string. The following example uses the Regex object defined above: objAlphaPatt.Replace("This [test] ** replace and escape", Regex.Escape("()")); Every non-letter character is replaced with \(\), so the result is: This\(\)\(\)test\(\)\(\)\(\)\(\)\(\)replace\(\)and\(\)escape. Without Escape, the result is: This()()test()()()()()replace()and()escape. Unescape reverses the transformation performed by Escape; however, Escape cannot perfectly reverse Unescape.
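Escape is most often applied to text that should be searched for literally. The sketch below illustrates that use; EscapeDemo and ContainsLiteral are hypothetical names, and the sample strings are invented.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Hypothetical helper: searches for the literal text, with metacharacters neutralized.
    public static bool ContainsLiteral(string input, string literal)
    {
        return Regex.IsMatch(input, Regex.Escape(literal));
    }

    public static void Main()
    {
        Console.WriteLine(Regex.Escape("3.14"));                // 3\.14
        Console.WriteLine(Regex.Unescape(@"3\.14"));            // 3.14

        // Unescaped, the dot matches any character, so "3x14" is accepted.
        Console.WriteLine(Regex.IsMatch("3x14", "3.14"));       // True
        Console.WriteLine(ContainsLiteral("3x14", "3.14"));     // False
        Console.WriteLine(ContainsLiteral("pi is 3.14", "3.14"));   // True
    }
}
```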

    The Split method splits the input string into an array of substrings at the positions defined by the regular expression matches. For example:
    Regex r = new Regex("-"); // Split on hyphens.
    string[] s = r.Split("first-second-third");
    for (int i = 0; i < s.Length; i++)
    {
        Response.Write(s[i] + "<br>");
    }

    The output is:
    first
    second
    third

    This method looks similar to String.Split, except that Regex.Split splits the string at a delimiter determined by a regular expression instead of a set of characters.
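The difference shows up as soon as delimiters repeat or vary. In the sketch below, SplitCompare, ByChars, and ByRegex are hypothetical names and the input is invented; String.Split treats every delimiter character separately, while the pattern [,;\s]+ treats a whole run of delimiters as one.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitCompare
{
    // String.Split: each delimiter character separates entries, so runs give empty strings.
    public static string[] ByChars(string input)
    {
        return input.Split(new[] { ',', ';', ' ' });
    }

    // Regex.Split: one or more delimiters in a row count as a single separator.
    public static string[] ByRegex(string input)
    {
        return Regex.Split(input, @"[,;\s]+");
    }

    public static void Main()
    {
        string input = "one, two;three   four";
        Console.WriteLine(ByChars(input).Length);              // 7: includes three empty entries
        Console.WriteLine(ByRegex(input).Length);              // 4
        Console.WriteLine(string.Join("|", ByRegex(input)));   // one|two|three|four
    }
}
```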

    The Match method searches the input string for an occurrence of the regular expression, and the Match method of the Regex class returns a Match object. The Match class represents the result of a regular expression matching operation. The following example demonstrates the Match method and uses the Groups property of the Match object to retrieve Group objects:

    string text = @"public string testMatchObj string s string match";
    string pat = @"(\w+)\s+(string)";
    // Compile the regular expression.
    Regex r = new Regex(pat, RegexOptions.IgnoreCase);
    // Match the regular expression pattern against a text string.
    Match m = r.Match(text);
    int matchCount = 0;
    while (m.Success)
    {
        Response.Write("Match" + (++matchCount) + "<br>");
        // Groups[1] is the word before "string"; Groups[2] is "string" itself.
        for (int i = 1; i <= 2; i++)
        {
            Group g = m.Groups[i];
            Response.Write("Group" + i + "='" + g + "'" + "<br>");
            CaptureCollection cc = g.Captures;
            for (int j = 0; j < cc.Count; j++)
            {
                Capture c = cc[j];
                Response.Write("Capture" + j + "='" + c + "', Position=" + c.Index + "<br>");
            }
        }
        m = m.NextMatch();
    }

    The output of this example is:
    Match1
    Group1='public'
    Capture0='public', Position=0
    Group2='string'
    Capture0='string', Position=7
    Match2
    Group1='testMatchObj'
    Capture0='testMatchObj', Position=14
    Group2='string'
    Capture0='string', Position=27
    Match3
    Group1='s'
    Capture0='s', Position=34
    Group2='string'
    Capture0='string', Position=36

    The MatchCollection class represents a read-only collection of successful, non-overlapping matches. MatchCollection instances are returned by the Regex.Matches method. The following example finds every match of the Regex pattern in the input string and fills a MatchCollection with them.

    MatchCollection mc;
    Regex r = new Regex("match");
    mc = r.Matches("matchcollectionregexmatchs");
    for (int i = 0; i < mc.Count; i++)
    {
        Response.Write(mc[i].Value + " POS:" + mc[i].Index.ToString() + "<br>");
    }
    The output is:
    match POS:0
    match POS:20
