.NET Framework 4 Regular Expressions

Source: Internet
Author: User
Tags: processing, text, expression, engine

Source reference from: MSDN http://msdn.microsoft.com/zh-cn/library/hs600312

Regular expressions provide a powerful, flexible, and efficient way to process text. The extensive pattern-matching notation of regular expressions lets you quickly parse large amounts of text to find specific character patterns; validate text to ensure that it matches a predefined pattern (such as an email address); extract, edit, replace, or delete text substrings; and add the extracted strings to a collection to generate a report. For many applications that deal with strings or parse large blocks of text, regular expressions are an indispensable tool.

How regular expressions work

The centerpiece of text processing with regular expressions is the regular expression engine, which is represented by the System.Text.RegularExpressions.Regex object in the .NET Framework. At a minimum, processing text with regular expressions requires that the regular expression engine be given the following two items of information:

  1. The regular expression pattern to identify in the text.
    In the .NET Framework, regular expression patterns are defined by a special syntax or language that is compatible with Perl 5 regular expressions and adds some features, such as right-to-left matching. For more information, see Regular Expression Language Elements.
  2. The text to parse for the regular expression pattern.
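The two inputs listed above, together with the right-to-left matching feature, can be sketched as follows. This is a minimal illustration, not part of the original documentation: the class name, the `\d+` pattern, and the sample input are assumptions chosen for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EngineInputsSketch
{
    // Scans in the default direction (left to right) and returns the first match.
    public static string FirstNumber(string input)
    {
        return new Regex(@"\d+").Match(input).Value;
    }

    // Scans from the end of the input toward the start and returns the first match found.
    public static string LastNumber(string input)
    {
        return new Regex(@"\d+", RegexOptions.RightToLeft).Match(input).Value;
    }

    public static void Main()
    {
        string input = "Order 66 shipped in 3 boxes";   // the text to parse
        Console.WriteLine(FirstNumber(input));          // 66
        Console.WriteLine(LastNumber(input));           // 3
    }
}
```

In both calls the engine receives the same two things, a pattern and a text; only the RegexOptions.RightToLeft option changes where the search begins.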

The Regex class allows you to perform the following operations:

  1. Determine whether the regular expression pattern occurs in the input text by calling the IsMatch method. For an example that uses the IsMatch method to validate text, see How to: Verify That Strings Are in Valid Email Format.
  2. Retrieve one or all occurrences of text that matches the regular expression pattern by calling the Match or Matches method. The former returns a Match object that provides information about the matching text. The latter returns a MatchCollection object that contains one Match object for each match found in the parsed text.
  3. Replace text that matches the regular expression pattern by calling the Replace method. For examples that use the Replace method to change date formats and remove invalid characters from a string, see How to: Strip Invalid Characters from a String and Example: Changing Date Formats.
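The three operations above can be tried side by side in a short sketch. The pattern (three digits, a hyphen, four digits), the helper names, and the sample text are hypothetical and serve only to show the static IsMatch, Match, Matches, and Replace methods together.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexOperationsSketch
{
    // Hypothetical pattern: three digits, a hyphen, and four digits.
    private const string Pattern = @"\d{3}-\d{4}";

    // IsMatch: does the pattern occur anywhere in the input?
    public static bool ContainsNumber(string input)
    {
        return Regex.IsMatch(input, Pattern);
    }

    // Match: information about the first occurrence (null here if there is none).
    public static string FirstNumber(string input)
    {
        Match match = Regex.Match(input, Pattern);
        return match.Success ? match.Value : null;
    }

    // Matches: a MatchCollection with one Match object per occurrence.
    public static int CountNumbers(string input)
    {
        return Regex.Matches(input, Pattern).Count;
    }

    // Replace: substitute every occurrence with a replacement string.
    public static string MaskNumbers(string input)
    {
        return Regex.Replace(input, Pattern, "###-####");
    }

    public static void Main()
    {
        string input = "Call 555-0100 or 555-0199.";
        Console.WriteLine(ContainsNumber(input));  // True
        Console.WriteLine(FirstNumber(input));     // 555-0100
        Console.WriteLine(CountNumbers(input));    // 2
        Console.WriteLine(MaskNumbers(input));     // Call ###-#### or ###-####.
    }
}
```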

For an overview of the regular expression object model, see The Regular Expression Object Model.

Regular expression examples

The String class includes a number of search and replacement methods that you can use when you want to locate a literal string in a larger string. Regular expressions are most useful either when you want to locate one of several substrings in a larger string, or when you want to identify patterns in a string, as the following examples illustrate.
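The distinction can be sketched briefly. In this hypothetical example (the class name, the day names, and the sample sentence are assumptions), String.IndexOf handles a single literal, while a regular expression alternation finds whichever of several candidates occurs first.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralVersusPatternSketch
{
    // A single literal substring: a String method is sufficient.
    public static int FindLiteral(string text, string literal)
    {
        return text.IndexOf(literal, StringComparison.Ordinal);
    }

    // One of several substrings: an alternation returns whichever occurs first.
    public static string FindFirstDay(string text)
    {
        Match match = Regex.Match(text, @"\b(Monday|Wednesday|Friday)\b");
        return match.Success ? match.Value : null;
    }

    public static void Main()
    {
        string text = "The review moved from Friday to Monday.";
        Console.WriteLine(FindLiteral(text, "Monday"));  // 32
        Console.WriteLine(FindFirstDay(text));           // Friday
    }
}
```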

Example 1: replace a substring

Assume that a mailing list contains names that sometimes include a title (Mr., Mrs., Miss, or Ms.) along with a first and last name. If you do not want to include the titles when you generate envelope labels from the list, you can use a regular expression to remove them, as the following example illustrates.

    using System;
    using System.Text.RegularExpressions;

    public class Example
    {
        public static void Main()
        {
            // Each period is escaped so that it matches a literal "." only.
            string pattern = @"(Mr\. |Mrs\. |Miss |Ms\. )";
            string[] names = { "Mr. Henry Hunt", "Ms. Sara Samuels",
                               "Abraham Adams", "Ms. Nicole Norris" };

            // Remove the title, if any, from each name.
            foreach (string name in names)
                Console.WriteLine(Regex.Replace(name, pattern, String.Empty));
        }
    }
    // The example displays the following output:
    //    Henry Hunt
    //    Sara Samuels
    //    Abraham Adams
    //    Nicole Norris

The regular expression pattern (Mr\. |Mrs\. |Miss |Ms\. ) matches any occurrence of "Mr. ", "Mrs. ", "Miss ", or "Ms. ". Each period is escaped with a backslash so that it matches a literal period rather than any character. The call to the Regex.Replace method replaces the matched string with String.Empty; in other words, it removes the matched string from the original string.
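One detail of such patterns is worth a small illustration: in a regular expression, a period that is not escaped matches any character, so a literal period is written as \. in the pattern. The sketch below uses hypothetical helper names and sample names to show the difference.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapedPeriodSketch
{
    public static string StripUnescaped(string name)
    {
        // "." matches any character, so "Mr. " also matches "Mrs ".
        return Regex.Replace(name, "Mr. ", String.Empty);
    }

    public static string StripEscaped(string name)
    {
        // "\." matches only a literal period.
        return Regex.Replace(name, @"Mr\. ", String.Empty);
    }

    public static void Main()
    {
        Console.WriteLine(StripUnescaped("Mrs Jones"));  // Jones
        Console.WriteLine(StripEscaped("Mrs Jones"));    // Mrs Jones
        Console.WriteLine(StripEscaped("Mr. Jones"));    // Jones
    }
}
```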

Example 2: identify duplicate words

Accidentally duplicating words is a common error that writers make. You can use a regular expression to identify duplicated words, as the following example shows.

    using System;
    using System.Text.RegularExpressions;

    public class Class1
    {
        public static void Main()
        {
            string pattern = @"\b(\w+?)\s\1\b";
            string input = "This this is a nice day. What about this? This tastes good. I saw a a dog.";

            // Report each duplicated word, the captured word, and where the match starts.
            foreach (Match match in Regex.Matches(input, pattern, RegexOptions.IgnoreCase))
                Console.WriteLine("{0} (duplicates '{1}') at position {2}",
                                  match.Value, match.Groups[1].Value, match.Index);
        }
    }
    // The example displays the following output:
    //       This this (duplicates 'This') at position 0
    //       a a (duplicates 'a') at position 66

The regular expression pattern \b(\w+?)\s\1\b can be interpreted as follows:

\b: Start at a word boundary.

(\w+?): Match one or more word characters, but as few as possible. Together, they form a group that can be referred to as \1.

\s: Match a white-space character.

\1: Match the substring that is equal to the group named \1.

\b: Match a word boundary.

The Regex.Matches method is called with the regular expression options set to RegexOptions.IgnoreCase. Therefore, the match operation is case-insensitive, and the example identifies the substring "This this" as a duplication.

Note that the input string also contains the substring "this? This". However, because of the intervening punctuation mark, it is not identified as a duplication.
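If repetitions separated by punctuation should also be reported, one possible variant (an assumption for illustration, not part of the original example) replaces \s with \W+, which accepts any run of non-word characters between the two occurrences. The class and method names below are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PunctuationTolerantDuplicates
{
    // \W+ accepts any run of non-word characters (spaces, punctuation)
    // between the two occurrences, instead of a single white-space character.
    private const string Pattern = @"\b(\w+)\W+\1\b";

    public static List<string> Find(string input)
    {
        var found = new List<string>();
        foreach (Match match in Regex.Matches(input, Pattern, RegexOptions.IgnoreCase))
            found.Add(match.Value);
        return found;
    }

    public static void Main()
    {
        string input = "This this is a nice day. What about this? This tastes good. I saw a a dog.";
        // Lists "This this", "this? This", and "a a".
        foreach (string duplicate in Find(input))
            Console.WriteLine(duplicate);
    }
}
```

This variant also reports a word that ends one sentence and begins the next, which is often legitimate writing, so whether it is preferable depends on the use case.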

Example 3: dynamically build a culture-sensitive regular expression

The following example illustrates how the power of regular expressions can be combined with the flexibility offered by the globalization features of the .NET Framework. It uses the NumberFormatInfo object to determine the format of currency values in the system's current culture, and then uses that information to dynamically construct a regular expression that extracts currency values from the text. For each match, it extracts the subgroup that contains only the numeric string, converts it to a Decimal value, and calculates a running total.

    using System;
    using System.Collections.Generic;
    using System.Globalization;
    using System.Text.RegularExpressions;

    public class Example
    {
        public static void Main()
        {
            // Define text to be parsed.
            string input = "Office expenses on 2/13/2008:\n" +
                           "Paper (500 sheets)                      $3.95\n" +
                           "Pencils (box of 10)                     $1.00\n" +
                           "Pens (box of 10)                        $4.49\n" +
                           "Erasers                                 $2.19\n" +
                           "Ink jet printer                        $69.95\n\n" +
                           "Total Expenses                        $ 81.58\n";

            // Get the current culture's NumberFormatInfo object.
            NumberFormatInfo nfi = CultureInfo.CurrentCulture.NumberFormat;

            // Assign needed property values to variables.
            string currencySymbol = nfi.CurrencySymbol;
            bool symbolPrecedesIfPositive = nfi.CurrencyPositivePattern % 2 == 0;
            string groupSeparator = nfi.CurrencyGroupSeparator;
            string decimalSeparator = nfi.CurrencyDecimalSeparator;

            // Form the regular expression pattern. Every culture-specific string
            // is escaped so that it is matched literally.
            string pattern = Regex.Escape(symbolPrecedesIfPositive ? currencySymbol : "") +
                             @"\s*[-+]?" + "([0-9]{0,3}(" + Regex.Escape(groupSeparator) +
                             "[0-9]{3})*(" + Regex.Escape(decimalSeparator) + "[0-9]+)?)" +
                             Regex.Escape(!symbolPrecedesIfPositive ? currencySymbol : "");
            Console.WriteLine("The regular expression pattern is:");
            Console.WriteLine("   " + pattern);

            // Get text that matches the regular expression pattern.
            MatchCollection matches = Regex.Matches(input, pattern,
                                                    RegexOptions.IgnorePatternWhitespace);
            Console.WriteLine("Found {0} matches.", matches.Count);

            // Get each numeric string, convert it to a value, and add it to the list.
            List<decimal> expenses = new List<decimal>();
            foreach (Match match in matches)
                expenses.Add(Decimal.Parse(match.Groups[1].Value));

            // Determine whether a total is present and, if so, whether it is correct.
            decimal total = 0;
            foreach (decimal value in expenses)
                total += value;

            if (total / 2 == expenses[expenses.Count - 1])
                Console.WriteLine("The expenses total {0:C2}.", expenses[expenses.Count - 1]);
            else
                Console.WriteLine("The expenses total {0:C2}.", total);
        }
    }
    // When the current culture is en-US, the example displays the following output:
    //       The regular expression pattern is:
    //          \$\s*[-+]?([0-9]{0,3}(,[0-9]{3})*(\.[0-9]+)?)
    //       Found 6 matches.
    //       The expenses total $81.58.

On a computer whose current culture is English - United States (en-US), the example dynamically builds the regular expression \$\s*[-+]?([0-9]{0,3}(,[0-9]{3})*(\.[0-9]+)?). This regular expression pattern can be interpreted as follows:

\$: Look for a single occurrence of the dollar sign ($) in the input string. The regular expression pattern string includes a backslash to indicate that the dollar sign is to be interpreted literally rather than as a regular expression anchor. (A $ symbol by itself would indicate that the regular expression engine should try to match at the end of the string.) To ensure that the current culture's currency symbol is not misinterpreted as a regular expression symbol, the example calls the Escape method to escape the character.

\s*: Look for zero or more occurrences of a white-space character.

[-+]?: Look for zero or one occurrence of either a positive sign or a negative sign.

([0-9]{0,3}(,[0-9]{3})*(\.[0-9]+)?): The outer parentheses around this expression define it as a capturing group or subexpression. If a match is found, information about this part of the matching string can be retrieved from the second Group object in the GroupCollection object returned by the Match.Groups property. (The first element in the collection represents the entire match.)

[0-9]{0,3}: Look for zero to three occurrences of the decimal digits 0 through 9.

(,[0-9]{3})*: Look for zero or more occurrences of a group separator followed by three decimal digits.

\.: Look for a single occurrence of the decimal separator.

[0-9]+: Look for one or more decimal digits.

(\.[0-9]+)?: Look for zero or one occurrence of the decimal separator followed by at least one decimal digit.

If each of these subpatterns is found in the input string, the match succeeds, and a Match object that contains information about the match is added to the MatchCollection object.
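To see how the generated pattern changes with the currency format, the same construction can be applied to an explicitly supplied NumberFormatInfo object instead of the current culture. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, the format (symbol "EUR" after the number, "." as group separator, "," as decimal separator) is made up for the example, and the pattern additionally allows white space before a trailing symbol.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class CurrencyPatternSketch
{
    // Builds a currency pattern from the supplied format information.
    public static string BuildPattern(NumberFormatInfo nfi)
    {
        // Positive patterns 0 and 2 place the symbol before the number.
        bool symbolFirst = nfi.CurrencyPositivePattern % 2 == 0;
        string symbol = Regex.Escape(nfi.CurrencySymbol);
        string number = "([0-9]{0,3}(" + Regex.Escape(nfi.CurrencyGroupSeparator) +
                        "[0-9]{3})*(" + Regex.Escape(nfi.CurrencyDecimalSeparator) +
                        "[0-9]+)?)";
        return symbolFirst ? symbol + @"\s*[-+]?" + number
                           : "[-+]?" + number + @"\s*" + symbol;
    }

    // Returns the numeric part (capturing group 1) of every currency value found.
    public static List<string> ExtractAmounts(string input, NumberFormatInfo nfi)
    {
        var amounts = new List<string>();
        foreach (Match match in Regex.Matches(input, BuildPattern(nfi)))
            amounts.Add(match.Groups[1].Value);   // Groups[0] would be the entire match
        return amounts;
    }

    public static void Main()
    {
        // Hypothetical format: symbol after the number, "." for groups, "," for decimals.
        var nfi = new NumberFormatInfo
        {
            CurrencySymbol = "EUR",
            CurrencyPositivePattern = 3,
            CurrencyGroupSeparator = ".",
            CurrencyDecimalSeparator = ","
        };
        Console.WriteLine(BuildPattern(nfi));
        foreach (string amount in ExtractAmounts("Rent 1.200,50 EUR, fee 3,95 EUR", nfi))
            Console.WriteLine(amount);   // 1.200,50 then 3,95
    }
}
```

Passing the format information as a parameter keeps the pattern builder independent of the machine's culture settings, which makes its behavior easier to check.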

Related Topics

Regular Expression Language Elements: Provides information about the set of characters, operators, and constructs that you can use to define regular expressions.

The Regular Expression Object Model: Provides information and code examples that illustrate how to use the regular expression classes.

Details of Regular Expression Behavior: Provides information about the capabilities and behavior of .NET Framework regular expressions.

Regular Expression Examples: Provides code examples that illustrate typical uses of regular expressions.

Reference

System.Text.RegularExpressions

System.Text.RegularExpressions.Regex
