.NET Advanced Technology (CLASS0515)

Last Update:2014-07-10 Source: Internet

Author: User



Some of the material in this course is adapted to the way beginners learn and is not rigorous. For example, many statements are wrong under multi-AppDomain conditions, but a rigorous treatment would leave everyone dizzy, so the course keeps to the non-rigorous style. Many interview questions come from this stage of the course. .NET advanced technology is high-level content; decide how deep to go based on your own foundation. Reference material: "C# Advanced Programming", "Illustrated C#", "CLR via C#".

The prelude to Regular Expressions: Hell

Requirement 1: The string "192.168.10.5[port=8080]" indicates that port 8080 of the server with IP address 192.168.10.5 is open. Write a program that parses this string and prints "Port *** of the server with IP address *** is open".

Requirement 2: The string "192.168.10.5[port=21,type=ftp]" indicates that port 21 of the server with IP address 192.168.10.5 provides the FTP service; if the ",type=ftp" part is omitted, the service defaults to HTTP. Write a program that parses this string and prints "Port *** of the server with IP address *** provides the *** service".

Requirement 3: Determine whether a string is an email address. It must contain "@" and ".", must not start or end with "@" or ".", and "@" must come before the last ".".

Requirement 4: Extract all email addresses from a piece of text such as: "I have all the photos, 333M; if you want them, send me an email: [email protected]. I want them too: [email protected], [email protected]. Thanks, landlord: [email protected]."

Requirement 5: Extract all the images and hyperlinks from a web page.

Getting Started with regular expressions: Paradise

Regular expressions are a text-processing technique. They are language-independent and implemented in almost every language; JavaScript uses them as well.

A regular expression is a text pattern made up of ordinary characters and special characters (called metacharacters). The pattern describes one or more strings to match when searching a body of text; it acts as a template that is matched against the string being searched. Like the wildcard patterns "*.jpg" and "%ab%", it is a special string used to match other strings.

Regular expressions are very complex, so do not try to master them all at once. Understand what they can do (string matching, string extraction, string replacement) and learn the common usages; the rest can be picked up when needed. Knowing them is a plus when looking for a job, and they also come up later in the project for filtering sensitive words, validators and so on.

Metacharacters 1

Understanding metacharacters is the hurdle that must be cleared in order to learn regular expressions. Do not try to memorize them by rote.

.: Matches any single character (except a newline). For example, the regular expression "b.g" matches "big", "bug" and "b g", but does not match "buug"; "b..g" matches "buug".

[]: Matches any one of the characters inside the brackets. For example, the regular expression "b[aui]g" matches "bug", "big" and "bag", but does not match "beg" or "baug". A hyphen "-" inside the brackets specifies a range of characters, which shortens the notation: [0-9] matches any digit, so "a[0-9]c" is equivalent to "a[0123456789]c" and matches strings such as "a0c", "a1c" and "a2c". Several ranges can be combined: "[A-Za-z]" matches any uppercase or lowercase letter, and "[A-Za-z0-9]" matches any uppercase or lowercase letter or digit.

(): An expression enclosed in () is defined as a "group", and the characters matched by that expression are saved to a temporary area, which is useful when extracting strings. Parentheses also make several characters behave as a whole. They therefore play two roles: changing precedence and defining extraction groups.

|: A logical OR of two patterns. "z|food" matches "z" or "food"; "(z|f)ood" matches "zood" or "food".
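The four metacharacters above can be tried out with a short sketch. The helper name Whole is hypothetical (it is not part of .NET); it wraps the pattern in ^(?:...)$ so that the whole input has to match, using a non-capturing group that this course has not introduced.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MetaChars1Demo
{
    // Hypothetical helper: true only when the whole input matches the pattern.
    // (?:...) is a non-capturing group that keeps ^ and $ outside any "|".
    public static bool Whole(string input, string pattern) =>
        Regex.IsMatch(input, "^(?:" + pattern + ")$");

    public static void Main()
    {
        // "." stands for exactly one character.
        Console.WriteLine(Whole("bug", "b.g"));        // True
        Console.WriteLine(Whole("buug", "b.g"));       // False
        Console.WriteLine(Whole("buug", "b..g"));      // True

        // [] stands for one character out of a set or range.
        Console.WriteLine(Whole("bag", "b[aui]g"));    // True
        Console.WriteLine(Whole("beg", "b[aui]g"));    // False
        Console.WriteLine(Whole("a7c", "a[0-9]c"));    // True

        // | chooses between alternatives; () limits its scope.
        Console.WriteLine(Whole("z", "z|food"));       // True
        Console.WriteLine(Whole("zood", "(z|f)ood"));  // True

        // A group also remembers the text it matched.
        Match m = Regex.Match("zood", "(z|f)ood");
        Console.WriteLine(m.Groups[1].Value);          // z
    }
}
```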

Metacharacters 2

*: Matches the preceding subexpression zero or more times. It has nothing to do with the wildcard *. For example, the regular expression "zo*" matches "z", "zo" and "zoo", so ".*" matches any string. "z(b|c)*" matches zb, zbc, zcb, zccc and zbbbccc. "z(ab)*" matches z, zab and zabab (the parentheses change precedence).

+: Matches the preceding subexpression one or more times (compare with *, which is zero or more). For example, the regular expression "9+" matches 9, 99, 999 and so on. "zo+" matches "zo" and "zoo" but cannot match "z".

?: Matches the preceding subexpression zero times or once. For example, "do(es)?" matches "do" or "does". It is typically used for an optional part.

{n}: Matches exactly n times. "zo{2}" matches zoo. For example, "e{2}" cannot match the single "e" in "bed", but matches the two "e" in "seed".

{n,}: Matches at least n times. For example, "e{2,}" cannot match the "e" in "bed", but matches all the "e" in "seeeeeeeed".

{n,m}: Matches at least n times and at most m times. "e{1,3}" matches the first three "e" in "seeeeeeeed".
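The quantifiers can be checked in the same way. This is a minimal sketch; Whole and First are hypothetical helper names, and the test string with eight "e" characters is built in code so that the count is unambiguous.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Hypothetical helper: true only when the whole input matches the pattern.
    public static bool Whole(string input, string pattern) =>
        Regex.IsMatch(input, "^(?:" + pattern + ")$");

    // Hypothetical helper: text of the first match, or "" when there is none.
    public static string First(string input, string pattern) =>
        Regex.Match(input, pattern).Value;

    public static void Main()
    {
        Console.WriteLine(Whole("z", "zo*"));            // True  (zero "o")
        Console.WriteLine(Whole("zoo", "zo*"));          // True
        Console.WriteLine(Whole("zbbbccc", "z(b|c)*"));  // True
        Console.WriteLine(Whole("zabab", "z(ab)*"));     // True
        Console.WriteLine(Whole("z", "zo+"));            // False (needs at least one "o")
        Console.WriteLine(Whole("do", "do(es)?"));       // True
        Console.WriteLine(Whole("does", "do(es)?"));     // True

        Console.WriteLine(Regex.IsMatch("bed", "e{2}")); // False
        Console.WriteLine(First("seed", "e{2}"));        // ee

        string s = "s" + new string('e', 8) + "d";
        Console.WriteLine(First(s, "e{2,}").Length);     // 8
        Console.WriteLine(First(s, "e{1,3}"));           // eee
    }
}
```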

Metacharacters 3

^ (shift+6): Matches the start of a line. For example, the regular expression "^regex" matches the beginning of the string "regex I will use", but does not match "I will use regex".

^ has another meaning: negation (there is no need to understand it for now).

$: Matches the end of a line. For example, the regular expression "cloud$" matches the end of the string "everything is a cloud", but does not match the string "floating clouds ah".
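A small sketch of the two anchors, plus one line on the "other meaning" of ^ (negation inside brackets). The method names are hypothetical and the sample strings follow the ones used above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // True when the input begins with "regex".
    public static bool StartsWithRegex(string input) => Regex.IsMatch(input, "^regex");

    // True when the input ends with "cloud".
    public static bool EndsWithCloud(string input) => Regex.IsMatch(input, "cloud$");

    public static void Main()
    {
        Console.WriteLine(StartsWithRegex("regex I will use"));    // True
        Console.WriteLine(StartsWithRegex("I will use regex"));    // False
        Console.WriteLine(EndsWithCloud("everything is a cloud")); // True
        Console.WriteLine(EndsWithCloud("floating clouds ah"));    // False

        // Inside [], a leading ^ means "any character except these".
        Console.WriteLine(Regex.IsMatch("ab", "^a[^0-9]$"));       // True
        Console.WriteLine(Regex.IsMatch("a1", "^a[^0-9]$"));       // False
    }
}
```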

Shorthand expressions

Note that the shorthand expressions below are written without C# string escaping: \ here means the character \ at the regular-expression level, not a C# string-level escape, so C# code needs either an @ verbatim string or a doubled backslash. Keep C#-level escaping and regex-level escaping apart: the regex escape character happens to be \, the same character C# uses, and regex escaping is applied after C# escaping (two layers). It may help to imagine that the C# escape character were %. To C#, @"\-" is just the ordinary string \-; only the regular expression engine gives it a special meaning. Write "\\d" or @"\d".

\d: A digit, equivalent to [0-9]

\D: A non-digit, equivalent to [^0-9]

\s: A whitespace character such as a line break or tab

\S: A non-whitespace character

\w: A letter, digit, underscore or Chinese character, that is, a character that can form a word

\W: Not \w, equivalent to [^\w]

d: digit; s: space; w: word. The uppercase form means "not".
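The two layers of escaping and the shorthand classes can be seen in a short sketch. IsWord is a hypothetical helper name; the comments assume the default .NET behaviour, where \w is Unicode-aware (no RegexOptions.ECMAScript).

```csharp
using System;
using System.Text.RegularExpressions;

public static class ShorthandDemo
{
    // Hypothetical helper: true when the input consists only of \w characters.
    public static bool IsWord(string input) => Regex.IsMatch(input, @"^\w+$");

    public static void Main()
    {
        // Both literals produce the same two-character string: \ followed by d.
        Console.WriteLine("\\d" == @"\d");                 // True

        Console.WriteLine(Regex.IsMatch("7", @"^\d$"));    // True
        Console.WriteLine(Regex.IsMatch("x", @"^\D$"));    // True
        Console.WriteLine(Regex.IsMatch("\t", @"^\s$"));   // True
        Console.WriteLine(Regex.IsMatch("a b", @"^\S+$")); // False (contains a space)
        Console.WriteLine(Regex.IsMatch("-", @"^\W$"));    // True

        Console.WriteLine(IsWord("user_01"));              // True
        Console.WriteLine(IsWord("汉字"));                  // True
        Console.WriteLine(IsWord("a-b"));                  // False
    }
}
```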

.NET Regular Expressions 1

Reference: "C# Advanced Programming", section 7.3. (Hand out the metacharacter explanation slides.)

In .NET a regular expression is represented as a string. The format of this string is special, but however special it is, to the C# language it is just an ordinary string; what it means is decided by the syntax analysis inside the Regex class. The main class for regular expressions is Regex.

3 common cases (C# syntax):

Determine whether a string matches: Regex.IsMatch("string", "regular expression");

String extraction: Regex.Match("string", "regular expression for the text to extract");

String extraction (loop to fetch all): Regex.Matches()

String replacement: Regex.Replace("string", "regular expression", "replacement content");
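As an illustration of these calls, the sketch below applies Regex.Match with groups to Requirement 2 from the prelude, then shows Regex.Matches and Regex.Replace on a made-up string. The method name Describe, the exact output sentence and the sample string "a1b22c333" are assumptions for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommonCasesDemo
{
    // Parses strings such as "192.168.10.5[port=21,type=ftp]".
    // \[ and \] are escaped because [ ] are metacharacters.
    public static string Describe(string input)
    {
        Match m = Regex.Match(input, @"^(.+)\[port=(\d+)(,type=(.+))?\]$");
        if (!m.Success) return "no match";
        // Group 4 is only set when the optional ",type=..." part is present.
        string service = m.Groups[4].Success ? m.Groups[4].Value : "http";
        return "Port " + m.Groups[2].Value + " of the server with IP address "
            + m.Groups[1].Value + " provides the " + service + " service";
    }

    public static void Main()
    {
        Console.WriteLine(Regex.IsMatch("bbbbg", "^b.*g$"));        // True
        Console.WriteLine(Describe("192.168.10.5[port=21,type=ftp]"));
        Console.WriteLine(Describe("192.168.10.5[port=8080]"));

        // Matches returns every match; loop over the collection.
        foreach (Match m in Regex.Matches("a1b22c333", @"\d+"))
        {
            Console.WriteLine(m.Value);                             // 1, then 22, then 333
        }

        Console.WriteLine(Regex.Replace("a1b22c333", @"\d+", "#")); // a#b#c#
    }
}
```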

.NET Regular Expressions 2

The Regex.IsMatch method determines whether a string matches a regular expression.

Examples of string matching:

Regex.IsMatch("bbbbg", "^b.*g$");

Regex.IsMatch("bg", "^b.*g$");

Regex.IsMatch("gege", "^b.*g$");

Do not forget ^ and $; without them, "yesbagit" also matches.
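The effect of leaving out ^ and $ can be checked directly. In this sketch Loose and Strict are hypothetical names for the unanchored and anchored versions of the same pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchoredMatchDemo
{
    // Without anchors, IsMatch succeeds if the pattern is found anywhere in the input.
    public static bool Loose(string input) => Regex.IsMatch(input, "b.*g");

    // With ^ and $, the whole input has to fit the pattern.
    public static bool Strict(string input) => Regex.IsMatch(input, "^b.*g$");

    public static void Main()
    {
        Console.WriteLine(Strict("bbbbg"));    // True
        Console.WriteLine(Strict("bg"));       // True
        Console.WriteLine(Strict("gege"));     // False
        Console.WriteLine(Loose("yesbagit"));  // True  ("bag" is found inside)
        Console.WriteLine(Strict("yesbagit")); // False
    }
}
```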

String matching Case 1

Exercise 1: Determine whether a string is a valid ZIP code (6 digits).

Regex.IsMatch("100830", "^[0-9]{6}$");

Regex.IsMatch("119", @"^\d{6}$");

Explanation: From the metacharacter definitions, "[0-9]" stands for any character from 0 to 9 and "{6}" means the preceding item is matched 6 times, so "[0-9]{6}" matches a digit 6 times. From the shorthand expressions, "[0-9]" can be replaced by "\d", so the second form, "\d{6}", is also correct.
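The two forms can be wrapped in small methods for testing; the names IsZip and IsZipAscii are hypothetical. One caveat the sketch illustrates: in .NET, without RegexOptions.ECMAScript, \d also accepts digits from other scripts (for example full-width digits), whereas [0-9] accepts ASCII digits only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ZipDemo
{
    // Six digits, using the shorthand class.
    public static bool IsZip(string input) => Regex.IsMatch(input, @"^\d{6}$");

    // Six ASCII digits only.
    public static bool IsZipAscii(string input) => Regex.IsMatch(input, "^[0-9]{6}$");

    public static void Main()
    {
        Console.WriteLine(IsZip("100830"));           // True
        Console.WriteLine(IsZip("119"));              // False (too short)
        Console.WriteLine(IsZip("1008300"));          // False (too long)
        Console.WriteLine(IsZip("１２３４５６"));      // True  (full-width digits)
        Console.WriteLine(IsZipAscii("１２３４５６")); // False
    }
}
```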

String matching case 2

Determine whether a string is an ID number, that is, 15 or 18 digits.

Wrong: Regex.IsMatch("123456789123456789", @"^\d{15}|\d{18}$"). This matches "starts with 15 digits" or "ends with 18 digits", because | has the lowest precedence and is applied last.

Correct: Console.WriteLine(Regex.IsMatch("0111111111111111", @"^\d{15}$|^\d{18}$")); or use @"^(\d{15}|\d{18})$".
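The difference between the wrong and the correct pattern shows up with a 16-digit input. A minimal sketch, with hypothetical method names and inputs built in code so the digit counts are explicit:

```csharp
using System;
using System.Text.RegularExpressions;

public static class IdNumberDemo
{
    // Wrong: reads as (^\d{15}) OR (\d{18}$).
    public static bool Wrong(string input) => Regex.IsMatch(input, @"^\d{15}|\d{18}$");

    // Correct: the parentheses make ^ and $ apply to both alternatives.
    public static bool Correct(string input) => Regex.IsMatch(input, @"^(\d{15}|\d{18})$");

    public static void Main()
    {
        string d15 = new string('1', 15);
        string d16 = new string('1', 16);
        string d18 = new string('1', 18);

        Console.WriteLine(Wrong(d16));   // True  (its first 15 digits satisfy ^\d{15})
        Console.WriteLine(Correct(d16)); // False
        Console.WriteLine(Correct(d15)); // True
        Console.WriteLine(Correct(d18)); // True
    }
}
```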

String matching case 3

Determine whether a string is a valid domestic phone number, ignoring extensions.

010-8888888 or 010-88888880 or 010xxxxxxx

0335-8888888 or 0335-88888888 (area code-phone number)

10086, 10010, 95595, 95599, 95588 (5 digits)

13888888888 (11 digits)

Regex.IsMatch(phoneNumber, @"^((\d{3,4}\-?\d{7,8})|(\d{5})|(\d{11}))$");

Write one pattern for each requirement and join them with |. Note: the number is sometimes written 010-xxxxxxx and sometimes 010xxxxxxx, so the "-" is optional and needs a ?. The hyphen is written as \- here; it only denotes a range inside [], so outside brackets the escape is optional but harmless. Finally, do not forget to put a pair of () around all the | alternatives so that ^ and $ apply to every one of them.

while (true)
{
    string phoneNumber = Console.ReadLine();
    bool b = Regex.IsMatch(phoneNumber, @"^((\d{3,4}\-?\d{7,8})|(\d{5})|(\d{11}))$");
    Console.WriteLine(b);
}
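The same pattern can be exercised without console input. This sketch wraps it in a hypothetical IsPhone method and runs it against the sample numbers listed above plus two invalid ones chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneDemo
{
    // Area code (3-4 digits), optional hyphen, 7-8 digits; or 5 digits; or 11 digits.
    public static bool IsPhone(string input) =>
        Regex.IsMatch(input, @"^((\d{3,4}\-?\d{7,8})|(\d{5})|(\d{11}))$");

    public static void Main()
    {
        string[] samples =
        {
            "010-8888888", "0335-88888888", "0108888888",
            "10086", "13888888888", "123456", "010-88"
        };
        foreach (string s in samples)
        {
            // The last two samples are expected to print False.
            Console.WriteLine(s + " -> " + IsPhone(s));
        }
    }
}
```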

String matching case 4

Determine whether a string is a valid email address. An email address consists of a character sequence, followed by the "@" sign, followed by another character sequence, followed by ".", and finally one more character sequence: Regex.IsMatch("[email protected]", @"^\[email protected]\w+\.\w+$");

[] matches any one of the characters in the brackets; \w matches letters, digits and underscores; + means one or more. Because "." has a special meaning in regular expressions, a literal "." must be escaped as "\.".
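The pattern in the text above was damaged by the page's email obfuscation. The sketch below uses @"^\w+@\w+\.\w+$", which is an assumption reconstructed from the prose description (sequence, "@", sequence, ".", sequence), not a confirmed copy of the original. The method name and sample addresses are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailDemo
{
    // Assumed simple pattern: word characters, "@", word characters, ".", word characters.
    public static bool IsSimpleEmail(string input) =>
        Regex.IsMatch(input, @"^\w+@\w+\.\w+$");

    public static void Main()
    {
        Console.WriteLine(IsSimpleEmail("someone@example.com"));    // True
        Console.WriteLine(IsSimpleEmail("@example.com"));           // False
        Console.WriteLine(IsSimpleEmail("someone@example"));        // False
        // \w does not include "." or "-", so this simple pattern rejects such names.
        Console.WriteLine(IsSimpleEmail("first.last@example.com")); // False
    }
}
```

The last line shows why real projects tend to use a longer, ready-made expression rather than this teaching pattern.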

Email: ^(([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5}){1})+([;.](([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5}){1})+)*$

String Matching Exercises

1. Match an IP address: four segments separated by ".", each segment at most three digits. Assume that both 192.168.54.77 and 333.333.333.333 are correct.

2. Judge whether a string is a valid date in the format "2008-08-08": four digits-two digits-two digits.

3. Judge whether a string is a legitimate URL, such as http://www.test.com/a.htm or ftp://127.0.0.1/1.txt. The shape is "character sequence://character sequence": @"^\w+://.+$". This is a simplified check; in a real project, search for "URL regular expression". Use .+ rather than \w+ after the ://, otherwise the "?" in "?id=1" will not match, as in http://www.test.com/a.aspx?id=1

Metacharacters must be escaped if they are to be matched literally: \. \+ \? \- \* and so on.
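One possible set of answers to the three exercises is sketched below. The IP and date patterns are my own simplified suggestions built from the metacharacters covered so far (they check shape only, not value ranges); the URL pattern is the one given in the text. All method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExercisesDemo
{
    // Four groups of 1-3 digits separated by literal dots (no range check).
    public static bool LooksLikeIp(string s) => Regex.IsMatch(s, @"^(\d{1,3}\.){3}\d{1,3}$");

    // Four digits-two digits-two digits (no calendar check).
    public static bool LooksLikeDate(string s) => Regex.IsMatch(s, @"^\d{4}-\d{2}-\d{2}$");

    // scheme://anything, as in the text.
    public static bool LooksLikeUrl(string s) => Regex.IsMatch(s, @"^\w+://.+$");

    public static void Main()
    {
        Console.WriteLine(LooksLikeIp("192.168.54.77"));                   // True
        Console.WriteLine(LooksLikeIp("333.333.333.333"));                 // True
        Console.WriteLine(LooksLikeIp("1.2.3"));                           // False
        Console.WriteLine(LooksLikeDate("2008-08-08"));                    // True
        Console.WriteLine(LooksLikeDate("2008-8-8"));                      // False
        Console.WriteLine(LooksLikeUrl("http://www.test.com/a.aspx?id=1")); // True
        Console.WriteLine(LooksLikeUrl("ftp://127.0.0.1/1.txt"));          // True
        // With \w+ after :// the dots, slashes and "?" would not match.
        Console.WriteLine(Regex.IsMatch("http://www.test.com/a.aspx?id=1", @"^\w+://\w+$")); // False
    }
}
```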

=================================================

Shortcut: copy common regular expressions from the ASP.NET RegularExpressionValidator. At work, ready-made expressions are usually found on the Internet, or go to http://www.regexlib.com/search.

IP address regular expression: ^(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])$

URL match: ^(http|https|ftp)\://([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2}))(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*$

Month match: (0[1-9])|(1[012])

String extraction: extract all hyperlinks

Regex.Match() returns only the first match. To extract every string in a document that matches the regular expression, call Regex.Matches() and loop over the result; the Success property of each Match indicates whether the match succeeded. There is no need for ^ and $ when extracting strings.

string str = "<a href=\"www.baidu.com\">baidu</a>fdsfdsfdsfdfd<a href=\"www.google.com\">google</a>Mess everything has ffdsf<a href=\"www.163.com\">163</a>s Counseling book Frewrewre<a href=\"www.sohu.com\">sohu</a>";
MatchCollection matches = Regex.Matches(str, "<a href=\".*?\">.*?</a>");
foreach (Match item in matches)
{
    if (item.Success)
    {
        Console.WriteLine(item.Value);
    }
}

A more tolerant pattern is @"<a(\s*?)(.+?)(\s*?)>.+?</a>". Note that "." does not match a newline, so use \s where whitespace such as carriage returns and line feeds may appear. The .*? and .+? forms rely on non-greedy mode. (Sample file: test1.htm)
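When only the link target is wanted rather than the whole tag, groups can be added to the same pattern. This is a minimal sketch on a shortened sample string; ExtractHrefs is a hypothetical helper, and the pattern assumes the exact form <a href="...">...</a> with no other attributes.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HyperlinkDemo
{
    // Returns the href value of every <a href="...">...</a> found in the input.
    public static List<string> ExtractHrefs(string html)
    {
        var hrefs = new List<string>();
        // Group 1 is the href value, group 2 the link text; both are non-greedy.
        foreach (Match m in Regex.Matches(html, "<a href=\"(.*?)\">(.*?)</a>"))
        {
            hrefs.Add(m.Groups[1].Value);
        }
        return hrefs;
    }

    public static void Main()
    {
        string html = "<a href=\"www.baidu.com\">baidu</a>fdsfds<a href=\"www.google.com\">google</a>";
        foreach (string href in ExtractHrefs(html))
        {
            Console.WriteLine(href); // www.baidu.com, then www.google.com
        }
    }
}
```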

Greedy mode and non-greedy mode

Extract the name from the text:

Match match = Regex.Match("Hello everyone. I am S.H.E. I am 22 years old. I am sick, whining.", @"I am (.+)\. "); // no ^ and $ needed; the period plus space marks the end of a sentence, so the periods inside S.H.E do not count

Look at the result. The quantifiers +, *, {n}, {n,} and {n,m} are greedy by default: they match as much as possible, giving characters back only as far as is needed for the rest of the pattern to match.

Adding ? after + or * switches to non-greedy mode (this is another use of ?): the quantifier matches as little as possible and stops as soon as the rest of the pattern can match. Change the pattern to @"I am (.+?)\. ".

In everyday development there is no need to write non-greedy patterns deliberately; switch to non-greedy mode only when a bug turns out to be caused by greedy matching.
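The difference is easiest to see by running both modes on the same text. In this sketch the sentence terminator is written as a period followed by a space, an adaptation for the English sample text so that the periods inside "S.H.E" do not end the match; the method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // Greedy: (.+) grabs as much as possible, up to the last ". " in the text.
    public static string Greedy(string text) =>
        Regex.Match(text, @"I am (.+)\. ").Groups[1].Value;

    // Non-greedy: (.+?) stops at the first ". " that lets the pattern succeed.
    public static string Lazy(string text) =>
        Regex.Match(text, @"I am (.+?)\. ").Groups[1].Value;

    public static void Main()
    {
        string text = "Hello everyone. I am S.H.E. I am 22 years old. I am sick, whining.";
        Console.WriteLine(Greedy(text)); // S.H.E. I am 22 years old
        Console.WriteLine(Lazy(text));   // S.H.E
    }
}
```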

Extract all the URLs in a web page: fetch every http://... in the page, not only those inside <a></a>.

MatchCollection matches = Regex.Matches(str, @"http://[a-zA-Z0-9\-_\?&=\.]+");
foreach (Match match in matches)
{
    Console.WriteLine(match.Value);
}
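A runnable sketch of the same extraction follows. The character class in the pattern above contains no "/", so each match stops at the first slash after the host name; the second method adds "/" to the class, which is my own variation rather than part of the original notes. Method names and the sample text are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UrlExtractDemo
{
    // Collects every match of the given pattern in the text.
    public static List<string> Extract(string text, string pattern)
    {
        var urls = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            urls.Add(m.Value);
        }
        return urls;
    }

    // Pattern as given in the notes: no "/" in the class.
    public static List<string> HostsOnly(string text) =>
        Extract(text, @"http://[a-zA-Z0-9\-_\?&=\.]+");

    // Variation with "/" added so that paths are included.
    public static List<string> WithPaths(string text) =>
        Extract(text, @"http://[a-zA-Z0-9\-_\?&=\./]+");

    public static void Main()
    {
        string text = "see http://www.test.com/a.aspx?id=1 and http://www.163.com now";
        Console.WriteLine(string.Join(" | ", HostsOnly(text)));
        // http://www.test.com | http://www.163.com
        Console.WriteLine(string.Join(" | ", WithPaths(text)));
        // http://www.test.com/a.aspx?id=1 | http://www.163.com
    }
}
```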

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
