First, let's look at several requirements.
Requirement 1: Determine whether a string is an email address. It must contain @ and ., must not start or end with @ or ., and the @ must come before the last . in the string.
Requirement 2: Extract all email addresses from a piece of text, for example: "I have all 33 m photos, send me the email: me@wo.com. I also want you@you.com, 123456@163.com, the landlord good person: 888888@qq.cn."
Requirement 3: Extract all images and hyperlinks from a web page.
Do requirements like these leave you feeling helpless? Once you have mastered regular expressions, they all become simple.
1. What is a regular expression, and what can it do?
Regular expressions are used for text processing and are language independent: almost every language supports them, including JavaScript.
A regular expression is a text pattern made up of ordinary characters and special characters (called metacharacters). The pattern describes one or more strings to match when searching a body of text. A regular expression acts as a template that is matched against the searched string.
Regular expressions can be used for:
String matching, string extraction, and string replacement
The first step in learning regular expressions: metacharacters
. : matches any single character except \n. For example, the regular expression "b.g" matches "big", "bug", and "b g", but does not match "buug"; "b..g" does match "buug".
[] : matches any one of the characters inside the brackets. For example, the regular expression "b[aui]g" matches "bug", "big", and "bag", but does not match "beg" or "baug".
You can use a hyphen "-" inside the brackets to specify a character range and shorten the expression. For example, [0-9] matches any digit, so the regular expression "a[0-9]c" is equivalent to "a[0123456789]c" and matches strings such as "a0c", "a1c", and "a2c". You can also specify several ranges: [a-zA-Z] matches any uppercase or lowercase letter, and [A-Za-z0-9] matches any uppercase or lowercase letter or digit.
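The examples above can be checked with a short C# sketch. The class and method names here are made up for illustration, and the ^ and $ anchors (covered further down) are added only so that the whole string has to match.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class CharClassDemo
{
    // "." stands for exactly one character (anything except \n).
    public static bool MatchesBDotG(string s) => Regex.IsMatch(s, "^b.g$");

    // "[aui]" stands for exactly one of the listed characters.
    public static bool MatchesBVowelG(string s) => Regex.IsMatch(s, "^b[aui]g$");

    // "[0-9]" is a range: any single digit from 0 to 9.
    public static bool MatchesADigitC(string s) => Regex.IsMatch(s, "^a[0-9]c$");
}
```

With these definitions, CharClassDemo.MatchesBDotG("big") is true and CharClassDemo.MatchesBDotG("buug") is false, because "." never stands for more than one character.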
() : defines the expression enclosed in () as a "group" and saves the characters it matched to a temporary area. This metacharacter is very useful for string extraction. It also lets you treat several characters as a single unit.
It has two functions: 1. changing the priority; 2. defining an extraction group.
| : performs a logical "or" between two matching conditions. "z|food" matches "z" or "food". "(z|f)ood" matches "zood" or "food".
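Both roles of parentheses, changing the priority and capturing a group, can be seen in a small C# sketch. The class and method names are hypothetical; Regex.Match and Match.Groups are the standard .NET calls.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class GroupDemo
{
    // Without parentheses, | splits the whole pattern: "z" or "food".
    public static bool IsZOrFood(string s) => Regex.IsMatch(s, "^z$|^food$");

    // With parentheses, | applies only to the first letter: "zood" or "food".
    public static bool IsZoodOrFood(string s) => Regex.IsMatch(s, "^(z|f)ood$");

    // The group also saves what it matched; Groups[1] is the first group.
    public static string FirstLetter(string s)
    {
        Match m = Regex.Match(s, "^(z|f)ood$");
        return m.Success ? m.Groups[1].Value : "";
    }
}
```

For example, GroupDemo.FirstLetter("food") returns "f", the text saved by the group (z|f).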
* : matches the preceding subexpression zero or more times. It is unrelated to the * wildcard. For example, the regular expression "zo*" matches "z", "zo", and "zoo". Therefore ".*" matches any string of characters (other than \n).
"z(b|c)*" → zb, zbc, zcb, zccc, zbbbccc. "z(ab)*" matches z, zab, and zabab (the parentheses change the priority).
+ : matches the preceding subexpression one or more times; compare it with * (zero or more times). For example, the regular expression "9+" matches 9, 99, and 999. "zo+" matches "zo" and "zoo", but does not match "z".
? : matches the preceding subexpression zero or one time. For example, "do(es)?" matches "do" or "does". It is generally used for optional parts.
{n} : matches exactly n times. "zo{2}" → zoo. For example, "e{2}" cannot match the single "e" in "bed", but matches the two "e"s in "seed".
{n,} : matches at least n times. For example, "e{2,}" cannot match the "e" in "bed", but matches all the "e"s in "seeeeeeeed".
{n,m} : matches at least n times and at most m times. For example, "e{1,3}" matches the first three "e"s in "seeeeeeeed".
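To see what each quantifier actually consumes, it helps to look at the matched text rather than just a yes or no. The following is a minimal sketch; FirstMatch is a made-up helper name, not part of .NET.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class QuantifierDemo
{
    // Returns the first substring of input matched by pattern,
    // or an empty string if nothing matches.
    public static string FirstMatch(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : "";
    }
}
```

For example, QuantifierDemo.FirstMatch("zoo", "zo*") returns "zoo", FirstMatch("z", "zo+") returns an empty string, FirstMatch("seed", "e{2}") returns "ee", and FirstMatch("seeeeeeeed", "e{1,3}") returns "eee". Quantifiers are greedy by default, which is why "e{2,}" takes the whole run of "e"s in "seeeeeeeed".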
^ (Shift + 6): matches the beginning of a line. For example, the regular expression "^regex" matches the start of the string "regex I will use", but does not match "I will use regex".
^ has another meaning: "not". Inside brackets it negates the set, as in [^0-9].
$ : matches the end of a line. For example, the regular expression "floating cloud$" matches the end of the string "Everything is floating cloud", but does not match a string in which "floating cloud" is followed by more text.
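The two anchors, and the second meaning of ^ inside brackets, can be tried out with a sketch like the following. The class and method names are invented for the example.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class AnchorDemo
{
    // ^ outside brackets: the match must begin at the start of the input.
    public static bool StartsWithRegex(string s) => Regex.IsMatch(s, "^regex");

    // $ : the match must finish at the end of the input.
    public static bool EndsWithFloatingCloud(string s) => Regex.IsMatch(s, "floating cloud$");

    // ^ inside brackets means "not": the first character is not a digit.
    public static bool StartsWithNonDigit(string s) => Regex.IsMatch(s, "^[^0-9]");
}
```

AnchorDemo.StartsWithRegex("regex I will use") is true, while AnchorDemo.StartsWithRegex("I will use regex") is false, even though both strings contain "regex".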
Note that the shorthand expressions below are written without considering string escaping. Here \ means the literal character \ as the regular expression engine sees it, not the \ of a C# string literal. In C# code you need either an @ verbatim string or a doubled \\. Keep escaping at the C# level separate from escaping at the regular expression level: C# and regular expressions both use \ as their escape character, and the regular expression escaping is applied after the C# escaping (the layers are handled one at a time). It can help to imagine that the C# escape character were % instead. To C#, @"\-" is an ordinary string, but it has a special meaning to the regular expression engine. So write "\\d" or @"\d".
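The two layers of escaping are easier to see side by side. This sketch (with hypothetical names) shows that the doubled form and the verbatim form produce the same two-character pattern by the time the regular expression engine sees it.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class EscapeDemo
{
    // Regular string literal: C# turns \\ into a single \,
    // so the regex engine receives the two characters \d.
    public const string Doubled = "\\d";

    // Verbatim string literal: @ switches off C# escaping,
    // so \d is passed through unchanged.
    public const string Verbatim = @"\d";

    // Both literals are the same string once C# has processed them.
    public static bool SamePattern() => Doubled == Verbatim;

    // True when s consists of exactly one digit.
    public static bool IsSingleDigit(string s) => Regex.IsMatch(s, @"^\d$");
}
```

EscapeDemo.SamePattern() returns true, and both constants have a length of 2. The verbatim form is usually easier to read when a pattern contains several backslashes.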
\d : a digit, equivalent to [0-9] for ordinary ASCII text
\D : a non-digit, equivalent to [^0-9]
\s : whitespace characters such as spaces, line breaks, and tabs
\S : non-whitespace characters
\w : a word character, that is, a letter, digit, underscore, or Chinese character
\W : not \w, equivalent to [^\w]
d: digit; s: space; w: word. The uppercase form means "not".
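One quick way to get a feel for these shorthand classes is to count how many matches each one finds in the same input. Count below is a made-up helper; Regex.Matches is the standard .NET call.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class ShorthandDemo
{
    // Counts the separate matches of pattern found in input.
    public static int Count(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Count;
    }
}
```

For the input "a1 b22", ShorthandDemo.Count gives 3 for @"\d" (1, 2, 2), 3 for @"\D" (a, the space, b), 1 for @"\s", 5 for @"\S", and 2 for @"\w+" (the words "a1" and "b22").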
Regular expressions in .NET
In .NET, regular expressions are written as strings. The format of such a string is special, but however special it is, to the C# language it is just an ordinary string; what it actually means is worked out by the Regex class.
The main class for regular expressions: Regex
Three common cases (C# syntax):
1. Determine whether a match exists: Regex.IsMatch("string", "regular expression")
2. String extraction: Regex.Match("string", "regular expression for the text to extract"); repeated extraction (in a loop): Regex.Matches()
3. String replacement: Regex.Replace("string", "regular expression", "replacement text");
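Here is a rough sketch of the three cases applied to the email requirements from the start of the article. The class name, the method names, and above all the pattern are assumptions made for this example: \w+@\w+\.\w+ is deliberately simple and rejects many valid addresses (for instance ones with a dot before the @), so treat it as a starting point rather than a real validator.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class EmailDemo
{
    // Simplified pattern (an assumption): word characters, @,
    // word characters, a literal dot, word characters.
    private const string EmailPattern = @"\w+@\w+\.\w+";

    // Case 1: does the whole string look like an email address?
    public static bool IsEmail(string s)
    {
        return Regex.IsMatch(s, "^" + EmailPattern + "$");
    }

    // Case 2a: extract the first address, or "" if there is none.
    public static string FirstEmail(string text)
    {
        Match m = Regex.Match(text, EmailPattern);
        return m.Success ? m.Value : "";
    }

    // Case 2b: extract every address by looping over Regex.Matches.
    public static List<string> AllEmails(string text)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(text, EmailPattern))
        {
            result.Add(m.Value);
        }
        return result;
    }

    // Case 3: replace every address with a placeholder.
    public static string MaskEmails(string text)
    {
        return Regex.Replace(text, EmailPattern, "***");
    }
}
```

Run against the sample text from Requirement 2, EmailDemo.AllEmails returns four addresses: me@wo.com, you@you.com, 123456@163.com, and 888888@qq.cn. EmailDemo.IsEmail("@wo.com") is false because at least one word character is required before the @.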
Applications of regular expressions:
A handy trick: copy common regular expressions from the RegularExpressionValidator control in ASP.NET, or look for ready-made regular expressions on the Internet, for example at http://www.regexlib.com/search.