Introduction to Regular Expressions
Translation: Northtibet
Some beginners are not very familiar with regular expressions, so a brief review is in order. If you are already a regular expression master, you can skip this part.
A regular expression is a string that describes a set of strings. For example, the regular expression "Mic.*" describes all strings that contain "Mic" followed by zero or more characters: Mickey, Microsoft, Michelangelo, and Mic itself are examples. The period "." matches any character and "*" means zero or more repetitions. "+" is like "*" but requires at least one repetition, so "Mic.+" matches all of the strings above except "Mic". [a-z] specifies a range of characters, so [a-zA-Z_0-9] matches a letter, a digit, or an underscore. Regex calls this a word character, and it can be written "\w". So "\w+" matches a sequence of one or more word characters; in other words, a C token. Well, almost: a C token cannot begin with a digit, so the correct regular expression is "^[a-zA-Z_]\w*$". The special character "^" means "begins with" (unless it appears inside a range, where it means "not"), and "$" means "ends with", so "^[a-zA-Z_]\w*$" describes a string of letters, digits, or underscores that begins with a letter or an underscore.
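The patterns above can be tried out directly. The following is a minimal sketch that uses Python's standard `re` module as a stand-in for the .NET Regex class (an assumption made only so the example is easy to run; the pattern syntax is the same for these constructs). The helper name `is_identifier` is hypothetical.

```python
import re

names = ["Mickey", "Microsoft", "Michelangelo", "Mic"]

# "Mic.*": "Mic" followed by zero or more of any character.
star_matches = [n for n in names if re.search(r"Mic.*", n)]

# "Mic.+": "Mic" followed by at least one more character,
# so the bare string "Mic" no longer qualifies.
plus_matches = [n for n in names if re.search(r"Mic.+", n)]

# "^[a-zA-Z_]\w*$": word characters only, not starting with a digit.
identifier = re.compile(r"^[a-zA-Z_]\w*$")


def is_identifier(text):
    """Return True if text has the shape of a C token under this pattern."""
    return identifier.match(text) is not None


print(star_matches)
print(plus_matches)
print(is_identifier("_foo1"), is_identifier("7up"))
```

Only "Mic" drops out when "*" is changed to "+", and "7up" is rejected because it begins with a digit.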
Regular expressions are useful for validating input. \d matches a digit and {n} means repeat n times, so ^5\d{15}$ matches a 16-digit number that begins with 5, which is the form of a MasterCard credit card number. Likewise, ^[45]\d{15}$ also admits Visa numbers, which begin with 4. You can group expressions with parentheses, so here is a test. What does this expression describe?
^\d{5}(-\d{4}){0,1}$
Hint: {0,1} means repeat zero or one time (it can be abbreviated as a question mark, ?). Have you figured it out yet? The expression means five digits, followed by zero or one occurrence of (a dash followed by four digits). It matches 02142 and 98007-4235, but not 3245 or 2345-98761. In other words, it describes a United States ZIP code. The parentheses group the ZIP+4 part so that the {0,1} quantifier applies to the whole group.
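As a quick check of the validation patterns discussed above, here is a small sketch in Python's `re` module (used here in place of .NET Regex purely for illustration). The constant names and the `matches` helper are hypothetical, and the patterns only test the shape of the input, not whether a card number or ZIP code actually exists.

```python
import re

MASTERCARD = re.compile(r"^5\d{15}$")       # 16 digits, first digit 5
CARD_4_OR_5 = re.compile(r"^[45]\d{15}$")   # 16 digits, first digit 4 or 5
ZIP = re.compile(r"^\d{5}(-\d{4}){0,1}$")   # ZIP, optionally followed by +4


def matches(pattern, text):
    """Return True if the compiled pattern accepts the text."""
    return pattern.match(text) is not None


for candidate in ["02142", "98007-4235", "3245", "2345-98761"]:
    print(candidate, matches(ZIP, candidate))
```

The first two candidates are accepted and the last two are rejected, in line with the explanation of the ZIP code expression.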
I have given only a taste of what regular expressions can do. I have not mentioned replacement, and I do not dare describe what happens with Unicode because I lack specific information. But you can already sense how powerful regular expressions are. For years they have been a mainstay of UNIX, and they have flourished with Web programming and languages such as Perl, where manipulating HTML is almost entirely a matter of text processing. Regular expressions were underused in Windows until the .NET Framework arrived and made them an official member of the Windows family.
The Framework Regex Class
The .NET Framework implements regular expressions with the Regex class, which has three supporting classes: Match, Group, and Capture (see Figure A). Typically, you create a Regex and call Regex::Match with the input string to get the first match, or Regex::Matches to get all the matches:
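// Assumes: using namespace System; using namespace System::Text::RegularExpressions;
// Backslashes are doubled because "\b" in a C++ string literal is a backspace.
Regex* r = new Regex("\\b\\w+\\b");
MatchCollection* mc = r->Matches("abc, _foo,<& mumble7");
for (int i = 0; i < mc->Count; i++) {
    Match* m = mc->Item[i];          // i-th match in the collection
    Console::WriteLine(m->Value);    // print the matched text
}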
This displays "abc", "_foo", and "mumble7", one match per line. The example introduces the special character \b, which is called an anchor or an atomic zero-width assertion, just like ^ (start) and $ (end). \b specifies a word boundary, so "\b\w+\b" means one or more word characters delimited by word boundaries.
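For readers without a Managed C++ compiler at hand, the same word-boundary pattern can be exercised with Python's `re` module. This is a sketch by analogy, not the Framework API: `re.finditer` plays the role of Regex::Matches here, and each Python match object loosely corresponds to a Match.

```python
import re

text = "abc, _foo,<& mumble7"

# Collect every run of word characters delimited by word boundaries.
words = [m.group(0) for m in re.finditer(r"\b\w+\b", text)]

for w in words:
    print(w)
```

Because the underscore counts as a word character, "_foo" is reported with its leading underscore.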
Figure A Regex Class
Each parenthesized expression in a regular expression forms a Group. Match::Groups returns the Groups as a collection, which is never empty because the entire regular expression is itself a group. Groups are important because they let you do logical OR matching, such as "(Ying|yong)", they let you apply quantifiers to subexpressions, and they let you extract individual parts of a match. Figure 1 in the body shows my RegexTest program displaying the groups after a run with the ZIP code example.
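The idea of groups can be illustrated with a short sketch, again using Python's `re` module as an assumed stand-in for the .NET classes. The helper `describe_groups` is hypothetical. One difference to keep in mind: Python reports an unmatched optional group as `None`, which is not necessarily how the Framework's Group class represents it.

```python
import re

zip_pattern = re.compile(r"^\d{5}(-\d{4}){0,1}$")


def describe_groups(text):
    """Return [whole match, group 1, ...], or None if there is no match."""
    m = zip_pattern.match(text)
    if m is None:
        return None
    # Group 0 is always the entire match; numbered groups follow.
    return [m.group(0)] + list(m.groups())


print(describe_groups("98007-4235"))
print(describe_groups("02142"))

# Alternation inside a group: either branch may match.
either = re.search(r"(Ying|yong)", "sing yong")
print(either.group(1))
```

Group 0 is present whenever the pattern matches at all, which mirrors the remark that the collection of groups is never empty.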
The most powerful function of all is Regex::Replace, which is what makes regular expressions so remarkably useful. Like many developers, I have often had to manually convert "\n" to "\r\n" before passing a string to a multiline edit control, but with Regex::Replace it is a breeze.
s = Regex::Replace(s, "\n", "\r\n");
Regex::Match and Regex::Replace have static overloads, so you can use regular expressions on the fly without creating objects. One of my favorite Regex::Replace overloads takes a delegate parameter that lets you compute the replacement text dynamically with procedural code; see the interesting example in the body.
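Both styles of replacement can be sketched in Python, where `re.sub` stands in for Regex::Replace and a plain function stands in for the delegate. This is an analogy under that assumption, not the Framework API. The function names are hypothetical, and the upper-casing example is invented for illustration; it is not the example from the body of the article.

```python
import re


def to_crlf(s):
    """Replace every newline with a carriage return plus newline."""
    return re.sub("\n", "\r\n", s)


def shout_words(s):
    """Compute each replacement in code: upper-case every word."""
    return re.sub(r"\b\w+\b", lambda m: m.group(0).upper(), s)


print(repr(to_crlf("line one\nline two")))
print(shout_words("abc, _foo"))
```

The second function receives each match in turn and returns the text to substitute, which is the same division of labor the delegate overload offers.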
Some caveats: every regular expression implementation is a little different. For example, in Perl, {,1} is shorthand for {0,1}, but the folks at Microsoft do not support that form. Watch out for small differences like this. For the authoritative word on .NET Regex, see "Regular Expressions as a Language" in the MSDN library (http://www.vckbase.com/library/en-us/cpguide/html/cpconregularexpressionsaslanguage.asp).