I. C# regular expression notation
Characters | Description |
\ | Escape character: turns a character with special meaning into a literal character, or vice versa |
^ | Matches the start position of the input string |
$ | Matches the end position of the input string |
* | Matches the preceding subexpression zero or more times |
+ | Matches the preceding subexpression one or more times |
? | Matches the preceding subexpression zero or one time |
{n} | n is a non-negative integer. Matches the preceding subexpression exactly n times |
{n,} | n is a non-negative integer. Matches the preceding subexpression at least n times |
{n,m} | m and n are non-negative integers with n <= m. Matches the preceding subexpression at least n and at most m times |
? | When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is non-greedy: it consumes as little of the searched string as possible |
. | Matches any single character except "\n" |
(pattern) | Matches pattern and captures the match |
(?:pattern) | Matches pattern but does not capture the match |
(?=pattern) | Positive lookahead: matches at a position where the text that follows matches pattern, without consuming that text |
(?!pattern) | Negative lookahead: matches at a position where the text that follows does not match pattern |
x|y | Matches x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food" |
[xyz] | Character set. Matches any one of the contained characters. For example, '[abc]' matches the 'a' in "plain" |
[^xyz] | Negated character set. Matches any character that is not contained. For example, '[^abc]' matches the 'p' in "plain" |
[a-z] | Character range. Matches any character within the specified range. For example, '[a-z]' matches any lowercase letter from 'a' to 'z' |
[^a-z] | Negated character range. Matches any character that is not in the specified range. For example, '[^a-z]' matches any character that is not in the range 'a' to 'z' |
\b | Matches a word boundary, that is, the position between a word character and a non-word character such as a space |
\B | Matches a non-word boundary |
\d | Matches a digit character. Equivalent to [0-9] for ASCII input |
\D | Matches a non-digit character. Equivalent to [^0-9] for ASCII input |
\f | Matches a form feed (page break) |
\n | Matches a line feed (newline) |
\r | Matches a carriage return |
\s | Matches any whitespace character, including spaces, tabs, form feeds, and so on |
\S | Matches any non-whitespace character |
\t | Matches a tab |
\v | Matches a vertical tab. Equivalent to \x0b and \cK |
\w | Matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]' for ASCII input |
\W | Matches any non-word character. Equivalent to '[^A-Za-z0-9_]' for ASCII input |
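A few of the entries above are easier to see in action. The following is a minimal sketch, not a definitive reference: the class name NotationDemo, its method names, and the sample strings are made up for illustration. It contrasts a greedy quantifier with its non-greedy form and shows a positive lookahead.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names and inputs are illustrative only.
public static class NotationDemo
{
    // Greedy: .+ takes as much as it can, so the match runs to the last '>'.
    public static string Greedy(string input) => Regex.Match(input, "<.+>").Value;

    // Non-greedy: .+? stops at the first '>' that lets the match succeed.
    public static string Lazy(string input) => Regex.Match(input, "<.+?>").Value;

    // Positive lookahead: "Windows" matches only when "95" or "98" follows,
    // and the digits themselves are not part of the match.
    public static string Lookahead(string input) =>
        Regex.Match(input, "Windows(?=95|98)").Value;

    public static void Main()
    {
        Console.WriteLine(Greedy("<a><b>"));        // <a><b>
        Console.WriteLine(Lazy("<a><b>"));          // <a>
        Console.WriteLine(Lookahead("Windows98"));  // Windows
        // A failed match has an empty Value, so this prints an empty line.
        Console.WriteLine(Lookahead("Windows2000"));
    }
}
```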
Notes
Because the characters \, ?, *, ^, $, +, (, ), |, { and [ already have a special meaning in regular expressions, they must be escaped when they are meant literally. For example, to require at least one "\" in the string, the regular expression should be written as \\+.
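As a small sketch of the escaping rule, the code below checks for a literal backslash and uses the static Regex.Escape method to escape metacharacters in arbitrary text. The class name EscapeDemo, its method names, and the sample inputs are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names and inputs are illustrative only.
public static class EscapeDemo
{
    // In a verbatim string, @"\\+" is the regex \\+ : one or more literal backslashes.
    public static bool HasBackslash(string input) => Regex.IsMatch(input, @"\\+");

    // Regex.Escape inserts the backslashes so the text is matched literally.
    public static string Literal(string text) => Regex.Escape(text);

    public static void Main()
    {
        Console.WriteLine(HasBackslash(@"C:\temp"));   // True
        Console.WriteLine(HasBackslash("no slash"));   // False
        Console.WriteLine(Literal("1+1=2?"));          // 1\+1=2\?
    }
}
```

Building a pattern from user-supplied text is the usual reason to call Regex.Escape instead of escaping by hand.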
II. In C#, to use the regular expression classes, add the following statement at the beginning of the source file: using System.Text.RegularExpressions;
III. Commonly used methods of the Regex class
1. Static Match method
Using the static Match method, you can get the first continuous substring of the input that matches the pattern.
The static Match method has two commonly used overloads:
Regex.Match(string input, string pattern);
Regex.Match(string input, string pattern, RegexOptions options);
Parameters of the first overload: the input string and the pattern.
Parameters of the second overload: the input string, the pattern, and a bitwise OR combination of RegexOptions enumeration values. The values of the RegexOptions enumeration are:
Compiled: the pattern is compiled;
CultureInvariant: cultural differences are ignored;
ECMAScript: the expression follows ECMAScript behavior; this value can only be combined with IgnoreCase, Multiline, and Compiled;
ExplicitCapture: only explicitly named groups are captured;
IgnoreCase: matching is case-insensitive;
IgnorePatternWhitespace: unescaped whitespace in the pattern is removed, and comments introduced by # are enabled;
Multiline: multiline mode, which changes the meaning of the metacharacters ^ and $ so that they match the beginning and end of each line;
None: no options are set;
RightToLeft: the input is scanned and matched from right to left; in this case the static Match method returns the first match found from the right;
Singleline: single-line mode, which changes the meaning of the metacharacter . so that it also matches newline characters.
Note: Singleline and Multiline are not mutually exclusive and can be used together, but Singleline cannot be combined with ECMAScript.
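The effect of the options is easiest to see by running the same pattern with and without them. This is a minimal sketch under stated assumptions: MatchOptionsDemo, FirstMatch, FirstIndex, and the two-line sample text are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names and inputs are illustrative only.
public static class MatchOptionsDemo
{
    // Value of the first match, or "" when nothing matches.
    public static string FirstMatch(string input, string pattern, RegexOptions options) =>
        Regex.Match(input, pattern, options).Value;

    // Start index of the first match, or -1 when nothing matches.
    public static int FirstIndex(string input, string pattern, RegexOptions options)
    {
        Match m = Regex.Match(input, pattern, options);
        return m.Success ? m.Index : -1;
    }

    public static void Main()
    {
        string text = "first line\nSECOND line";

        // Without Multiline, ^ matches only at the start of the input: no match.
        Console.WriteLine(FirstMatch(text, "^second", RegexOptions.IgnoreCase) == "");
        // Options are combined with bitwise OR.
        Console.WriteLine(FirstMatch(text, "^second",
            RegexOptions.IgnoreCase | RegexOptions.Multiline));   // SECOND

        // Singleline lets . match the newline between the two lines.
        Console.WriteLine(FirstMatch(text, "line.SECOND", RegexOptions.None) == "");
        Console.WriteLine(FirstMatch(text, "line.SECOND", RegexOptions.Singleline).Length); // 11

        // RightToLeft returns the rightmost occurrence first.
        Console.WriteLine(FirstIndex(text, "line", RegexOptions.None));        // 6
        Console.WriteLine(FirstIndex(text, "line", RegexOptions.RightToLeft)); // 18
    }
}
```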
2. Static Matches method
The overloads of this method are the same as those of the static Match method. It returns a MatchCollection that contains all matches of the pattern in the input.
3. Static IsMatch method
This method returns a bool and has the same overloads as the static Matches method. It returns true if the input matches the pattern, and false otherwise. It can be understood as follows: IsMatch reports whether the collection returned by the Matches method is non-empty.
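The relationship between Matches and IsMatch can be sketched as follows. The class MatchesDemo, its method names, and the sample record are hypothetical and serve only as an illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo class; names and inputs are illustrative only.
public static class MatchesDemo
{
    // Collects every run of digits found in the input.
    public static List<string> AllNumbers(string input)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(input, @"\d+"))
        {
            result.Add(m.Value);
        }
        return result;
    }

    // True when at least one run of digits exists.
    public static bool ContainsNumber(string input) => Regex.IsMatch(input, @"\d+");

    public static void Main()
    {
        string record = "addr=1234;phone=6789";
        Console.WriteLine(string.Join(",", AllNumbers(record)));  // 1234,6789
        // IsMatch agrees with "Matches returned a non-empty collection".
        Console.WriteLine(ContainsNumber(record) == (AllNumbers(record).Count > 0)); // True
        Console.WriteLine(ContainsNumber("no digits"));           // False
    }
}
```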
IV. Examples using the Regex class
1. String substitution
For example, I want to change the name value in the following record to wang:
string line = "addr=1234;name=zhang;phone=6789";
Regex reg = new Regex("name=(.+);");
string modified = reg.Replace(line, "name=wang;");
The modified string is addr=1234;name=wang;phone=6789
2. String matching
For example, I want to extract the name value from that record:
Regex reg = new Regex("name=(.+);");
Match match = reg.Match(line);
string value = match.Groups[1].Value;
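One caveat with the pattern name=(.+); is that .+ is greedy: if the record contains another ';' after the name field, for example a trailing one, the capture extends to the last ';'. The sketch below is one possible variant, not the original author's code. It uses [^;]* to stop at the next semicolon, and the names RecordDemo, GetField, and SetField are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class; names and inputs are illustrative only.
public static class RecordDemo
{
    // Returns the value of "key=value" in a ';'-separated record, or "" if absent.
    public static string GetField(string record, string key)
    {
        // (?<=^|;) makes sure the key starts a field, so "name" does not match "nickname".
        Match m = Regex.Match(record, "(?<=^|;)" + Regex.Escape(key) + "=([^;]*)");
        return m.Success ? m.Groups[1].Value : "";
    }

    // Replaces the value of the given key and leaves the other fields untouched.
    public static string SetField(string record, string key, string value)
    {
        // A MatchEvaluator avoids '$' in the new value being read as a substitution.
        return Regex.Replace(record,
            "(?<=^|;)(" + Regex.Escape(key) + "=)[^;]*",
            m => m.Groups[1].Value + value);
    }

    public static void Main()
    {
        string line = "addr=1234;name=zhang;phone=6789;";
        // The greedy pattern captures too much when a later ';' exists.
        Console.WriteLine(Regex.Match(line, "name=(.+);").Groups[1].Value); // zhang;phone=6789
        Console.WriteLine(GetField(line, "name"));          // zhang
        Console.WriteLine(SetField(line, "name", "wang"));  // addr=1234;name=wang;phone=6789;
    }
}
```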
3. Matching a value with an optional unit
The text contains "speed=30.3mph" and you need to extract the speed value. The unit may be metric or imperial (mph, km/h, and m/s are all possible), and there may be spaces around the parts.
string line = "lane=1;speed=30.3mph;acceleration=2.5mph/s";
Regex reg = new Regex(@"speed\s*=\s*([\d\.]+)\s*(mph|km/h|m/s)*");
Match match = reg.Match(line);
In the returned result, match.Groups[1].Value contains the numeric value and match.Groups[2].Value contains the unit.
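The captured text still has to be converted to a number. The following sketch wraps the same idea in a helper that parses the value with the invariant culture. SpeedDemo and TryParseSpeed are hypothetical names, and the unit group is made optional with ? here instead of *, which is an assumption about the intent.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper class; names and inputs are illustrative only.
public static class SpeedDemo
{
    // Extracts the speed value and unit; unit is "" when none is present.
    public static bool TryParseSpeed(string line, out double value, out string unit)
    {
        value = 0;
        unit = "";
        Match m = Regex.Match(line, @"speed\s*=\s*([\d.]+)\s*(mph|km/h|m/s)?");
        if (!m.Success)
        {
            return false;
        }
        unit = m.Groups[2].Value;
        // InvariantCulture makes '.' the decimal separator on every machine.
        return double.TryParse(m.Groups[1].Value, NumberStyles.Float,
            CultureInfo.InvariantCulture, out value);
    }

    public static void Main()
    {
        double v;
        string u;
        if (TryParseSpeed("lane=1;speed=30.3mph;acceleration=2.5mph/s", out v, out u))
        {
            Console.WriteLine(v.ToString(CultureInfo.InvariantCulture) + " " + u); // 30.3 mph
        }
        if (TryParseSpeed("speed = 12 km/h", out v, out u))
        {
            Console.WriteLine(v.ToString(CultureInfo.InvariantCulture) + " " + u); // 12 km/h
        }
    }
}
```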
4. For example, to decode a GPS GPRMC string, a single expression is enough:
Regex reg = new Regex(@"^\$GPRMC,[\d\.]*,[AV],(-?[0-9]*\.?[0-9]+),([NS]*),(-?[0-9]*\.?[0-9]+),([EW]*),.*");
With it you can get the latitude and longitude values, which previously required dozens of lines of code.
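As a sketch of how the groups of such an expression might be read, the code below applies it to a sample sentence. The class GprmcDemo, the method TryParsePosition, and the sample sentence are assumptions for illustration. The fields are returned as raw text and no conversion to decimal degrees is attempted.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class; names and the sample sentence are illustrative only.
public static class GprmcDemo
{
    private static readonly Regex Gprmc = new Regex(
        @"^\$GPRMC,[\d\.]*,[AV],(-?[0-9]*\.?[0-9]+),([NS]*),(-?[0-9]*\.?[0-9]+),([EW]*),.*");

    // Groups 1-4 hold latitude, N/S, longitude, E/W as they appear in the sentence.
    public static bool TryParsePosition(string sentence,
        out string lat, out string ns, out string lon, out string ew)
    {
        lat = ns = lon = ew = "";
        Match m = Gprmc.Match(sentence);
        if (!m.Success)
        {
            return false;
        }
        lat = m.Groups[1].Value;
        ns = m.Groups[2].Value;
        lon = m.Groups[3].Value;
        ew = m.Groups[4].Value;
        return true;
    }

    public static void Main()
    {
        string sample = "$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W*6A";
        string lat, ns, lon, ew;
        if (TryParsePosition(sample, out lat, out ns, out lon, out ew))
        {
            Console.WriteLine(lat + " " + ns + ", " + lon + " " + ew); // 4807.038 N, 01131.000 E
        }
    }
}
```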
V. Description of the System.Text.RegularExpressions namespace
The namespace contains 8 classes, 1 enumeration, and 1 delegate. They are:
Capture: contains the result of a single match;
CaptureCollection: a sequence of Capture objects;
Group: the result of a capturing group, inherits from Capture;
GroupCollection: represents a collection of capturing groups;
Match: the result of matching a single expression, inherits from Group;
MatchCollection: a sequence of Match objects;
MatchEvaluator: the delegate used when performing a replace operation;
Regex: an instance of a compiled expression;
RegexCompilationInfo: provides information that the compiler uses to compile a regular expression into a stand-alone assembly;
RegexOptions: provides enumeration values for configuring regular expressions.
The Regex class also contains some static methods:
Escape: escapes the regex metacharacters in a string;
IsMatch: returns a Boolean value that indicates whether the expression matches in the string;
Match: returns a Match instance;
Matches: returns a collection of Match objects;
Replace: replaces the matched text with a replacement string;
Split: returns an array of strings, split at the positions determined by the expression;
Unescape: converts escaped characters in a string back to their unescaped form.
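The static methods not shown earlier can be sketched briefly. The class StaticMethodsDemo, its method names, and the inputs below are hypothetical and only illustrate Split, Replace with a MatchEvaluator, and Unescape.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names and inputs are illustrative only.
public static class StaticMethodsDemo
{
    // Split: cuts the record at each ';', ignoring whitespace around it.
    public static string[] SplitFields(string record) => Regex.Split(record, @"\s*;\s*");

    // Replace with a MatchEvaluator: the lambda computes each replacement.
    public static string DoubleNumbers(string input) =>
        Regex.Replace(input, @"\d+", m => (int.Parse(m.Value) * 2).ToString());

    public static void Main()
    {
        Console.WriteLine(string.Join("|", SplitFields("a=1 ; b=2;c=3"))); // a=1|b=2|c=3
        Console.WriteLine(DoubleNumbers("x=2;y=10"));                      // x=4;y=20
        // Unescape reverses Escape.
        Console.WriteLine(Regex.Unescape(Regex.Escape("1+1=2?")));         // 1+1=2?
    }
}
```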