I had made up my mind to learn regular expressions (RE), but I am rather lazy and kept hoping there was some way to learn them quickly, so I turned to the almighty Google. With its help I found an article by Jim Hollenhorst on the Internet. After reading it I thought it was really good, so I wrote up this short report to share with friends at move-to.net, hoping it gives you a little help in learning RE. The URL of Jim Hollenhorst's article is given below for anyone who wants to go straight to the original.
The 30 Minute Regex Tutorial by Jim Hollenhorst
http://www.codeproject.com/useritems/regextutorial.asp
What is RE?
You have probably used the "*" character when searching for files. For example, to find all the Word files in the Windows directory, you might search for "*.doc", because "*" stands for any string of characters. RE does something similar, but is far more powerful.
When writing a program, you often need to check whether a string matches a particular pattern. The most important job of RE is to describe such a pattern, so you can think of an RE as a description of a pattern. For example, "\w+" stands for a non-empty string of letters and digits (non-null string). The .NET Framework provides a very powerful class library that makes it easy to use RE to find and replace text, decode complex headers, and validate text.
The best way to learn RE is to work through examples yourself. Jim Hollenhorst also provides a tool, Expresso (as in a cup of coffee), to help us learn RE. The download URL is http://www.codeproject.com/useritems/regextutorial/Expressosetup2_1c.zip.
Next, let's try some examples.
A few simple examples
Suppose you want to find the string elvis followed by alive in a piece of text. Using RE, you might go through the following steps. The text in parentheses explains what each RE means:
1. elvis (find elvis)
The sequence of characters to look for is elvis. In .NET you can set an option to ignore case, so "Elvis", "ELVIS", and "eLvIs" all match RE 1. But because RE 1 only asks for these characters in this order, pelvis also matches it. RE 2 improves on this.
2. \belvis\b (find elvis as a whole word, such as elvis, or Elvis when case is ignored)
"\b" has a special meaning in RE: it marks a word boundary. So \belvis\b uses \b to pin down the boundaries before and after elvis, which means elvis must appear as a whole word.
Suppose you want to find elvis followed by alive on the same line. For this you need two more special characters, "." and "*". "." stands for any character other than a newline, and "*" repeats the preceding item any number of times. So ".*" means any number of characters other than a newline. To find elvis followed by alive on the same line, you can write RE 3.
3. \belvis\b.*\balive\b (find elvis followed by alive on the same line, such as Elvis is alive)
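The three steps above can be tried out in a few lines. This is only a minimal sketch: it assumes Python's re module as a stand-in for the .NET classes the article talks about, and the sample sentence is made up for illustration.

```python
import re

# Made-up sample text; note that "pelvis" contains "elvis".
text = "Elvis is alive, and so is his pelvis."

# RE 1: a plain character sequence with case ignored.
# It also hits the "elvis" inside "pelvis".
plain = re.findall(r"elvis", text, re.IGNORECASE)

# RE 2: \b pins both ends to word boundaries, so "pelvis" no longer matches.
whole_word = re.findall(r"\belvis\b", text, re.IGNORECASE)

# RE 3: "elvis", then any characters on the same line, then "alive".
followed = re.search(r"\belvis\b.*\balive\b", text, re.IGNORECASE)

print(plain)             # ['Elvis', 'elvis']
print(whole_word)        # ['Elvis']
print(followed.group())  # Elvis is alive
```

re.IGNORECASE plays the role of the ignore-case option mentioned for .NET.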
A powerful RE can be built from simple special characters, but you will also notice that the more special characters you use, the harder the RE is to read.
Let's look at another example.
Validating a phone number
Suppose you want to collect seven-digit phone numbers in the format xxx-xxxx, where x is a digit, from a Web page. The RE might be written like this.
4. \b\d\d\d-\d\d\d\d (find a seven-digit phone number, such as 123-1234)
Each \d stands for one digit, and "-" is an ordinary hyphen. To avoid repeating \d so many times, the RE can be rewritten as RE 5.
5. \b\d{3}-\d{4} (a better way to find a seven-digit phone number, such as 123-1234)
The "{3}" after \d means the preceding item is repeated three times, so \d{3} is the same as \d\d\d.
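To check that RE 4 and RE 5 really describe the same thing, here is a small sketch. It again assumes Python's re module rather than .NET, and the sample text is invented.

```python
import re

# Invented sample text with two valid numbers and one malformed one.
page = "Call 123-1234 or 555-0199, not 12-34567."

# RE 4: every digit written out as its own \d.
long_form = re.findall(r"\b\d\d\d-\d\d\d\d", page)

# RE 5: the same pattern written with {3} and {4} repetition counts.
short_form = re.findall(r"\b\d{3}-\d{4}", page)

print(long_form)                # ['123-1234', '555-0199']
print(long_form == short_form)  # True
```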
Expresso, a tool for learning and testing RE
Because REs are hard to read and easy to get wrong, Jim developed a tool called Expresso to help users learn and test them. Besides the URL given above, it can also be downloaded from the Ultrapico Web site (http://www.ultrapico.com). After installing Expresso, you will find that Jim has built the examples from his article into the Expression Library, so you can test them as you read. You can also modify the example REs and see the results immediately, which I found very handy. Do give it a try.
Basic concepts of RE in .NET
Special characters
Some characters have a special meaning, such as the "\b", ".", "*", and "\d" seen earlier. "\s" stands for any whitespace character, such as a space, tab, or newline. "\w" stands for any letter or digit.
Let's see some more examples.
6. \ba\w*\b (look for words beginning with a, such as able)
This RE describes the starting boundary of a word (\b), then the letter "a", then any number of alphanumeric characters (\w*), and finally the ending boundary of the word (\b).
7. \d+ (find a string of digits)
"+" is very similar to "*", except that "+" repeats the preceding item at least once. In other words, there must be at least one digit.
8. \b\w{6}\b (find six alphanumeric characters, such as ab123c)
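REs 6 to 8 can be tried side by side. This is a rough sketch that assumes Python's re module in place of .NET, with a made-up sample string.

```python
import re

# Made-up sample text.
text = "an able apple has 42 seeds and code ab123c"

# RE 6: whole words that begin with "a".
a_words = re.findall(r"\ba\w*\b", text)

# RE 7: runs of one or more digits, wherever they occur.
numbers = re.findall(r"\d+", text)

# RE 8: whole words of exactly six alphanumeric characters.
six_chars = re.findall(r"\b\w{6}\b", text)

print(a_words)    # ['an', 'able', 'apple', 'and', 'ab123c']
print(numbers)    # ['42', '123']
print(six_chars)  # ['ab123c']
```

Note that \d+ has no \b around it, so it also picks up the "123" inside "ab123c".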
The following table lists the common special characters in RE.
.    any character other than a newline
\w   any alphanumeric character
\s   any whitespace character
\d   any digit
\b   a word boundary
^    the beginning of the text; for example, "^the" matches "the" only at the beginning of the text
$    the end of the text; for example, "end$" matches "end" only at the end of the text
The special characters "^" and "$" are used when something must appear at the beginning or the end of the text. This is especially useful for validating input against a pattern. For example, to validate a seven-digit phone number, you might use RE 9.
9. ^\d{3}-\d{4}$ (validate a seven-digit phone number)
This is the same as RE 5, except that no other characters may appear before or after it; that is, the entire string must consist of just the seven-digit phone number. If you set the Multiline option in .NET, "^" and "$" are matched against each line, so it is enough for one complete line, rather than the entire text, to match the RE.
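The effect of "^", "$", and the multiline option can be seen in a short sketch. It assumes Python's re module, where re.MULTILINE is the counterpart of the .NET Multiline option; the input strings are invented.

```python
import re

# RE 9: the whole input must be a seven-digit phone number.
phone = re.compile(r"^\d{3}-\d{4}$")

ok = bool(phone.search("123-1234"))            # nothing before or after
bad = bool(phone.search("call 123-1234 now"))  # extra text on both sides

# Three lines of invented input; only the first and last are phone numbers.
lines = "123-1234\nnot a number\n555-0199"

# Without the multiline flag, ^ and $ refer to the whole text.
single = re.findall(r"^\d{3}-\d{4}$", lines)

# With the multiline flag, ^ and $ are tested against every line.
multi = re.findall(r"^\d{3}-\d{4}$", lines, re.MULTILINE)

print(ok, bad)  # True False
print(single)   # []
print(multi)    # ['123-1234', '555-0199']
```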
Escaped characters
Sometimes you need "^" or "$" in their literal meaning rather than as special characters. The "\" character removes the special meaning of a special character, so "\^", "\.", and "\\" stand for the literal "^", ".", and "\".
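A quick sketch of escaping, assuming Python's re module instead of .NET. The sample string is made up, and Python raw strings (r"...") are used so that each backslash reaches the RE engine unchanged.

```python
import re

# Made-up sample text containing one "^", one ".", and one "\".
text = r"price^2 costs $3.50 in C:\temp"

carets = re.findall(r"\^", text)       # literal "^"
dots = re.findall(r"\.", text)         # literal "."
backslashes = re.findall(r"\\", text)  # literal "\"

# For contrast, an unescaped "." matches every character, not just dots.
any_char = re.findall(r".", "a.b")

print(carets, dots, len(backslashes))  # ['^'] ['.'] 1
print(any_char)                        # ['a', '.', 'b']
```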
Repetition
"{3}" and "*" repeat the preceding character; later we will see how to repeat an entire subexpression with the same syntax. The following table lists the ways to repeat the preceding item.
*       repeat any number of times
+       repeat at least once
?       repeat zero times or once
{n}     repeat exactly n times
{n,m}   repeat at least n times, but not more than m times
{n,}    repeat at least n times
Let's try some more examples.
10. \b\w{5,6}\b (find words of five or six alphanumeric characters, such as as25d, D58SDF, etc.)
11. \b\d{3}\s\d{3}-\d{4} (find a ten-digit phone number, such as 800 123-1234)
12. \d{3}-\d{2}-\d{4} (find a social security number, such as 123-45-6789)
13. ^\w* (the first word of each line or of the whole text)
In Expresso you can try out the difference between turning the Multiline option on and off.
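If Expresso is not at hand, the repetition examples above can also be checked with a short script. This sketch assumes Python's re module rather than .NET; the sample strings are invented, and the patterns are written out in full in the code.

```python
import re

# Invented sample text.
text = "as25d D58SDF toolongword 800 123-1234 and SSN 123-45-6789"

# Words of five or six alphanumeric characters.
five_or_six = re.findall(r"\b\w{5,6}\b", text)

# Ten-digit phone number: three digits, whitespace, three digits, hyphen, four digits.
ten_digit = re.findall(r"\b\d{3}\s\d{3}-\d{4}", text)

# Social security number pattern.
ssn = re.findall(r"\d{3}-\d{2}-\d{4}", text)

# First word of the whole text versus first word of each line.
two_lines = "first word\nsecond line"
first_only = re.findall(r"^\w*", two_lines)
first_each = re.findall(r"^\w*", two_lines, re.MULTILINE)

print(five_or_six)  # ['as25d', 'D58SDF']
print(ten_digit)    # ['800 123-1234']
print(ssn)          # ['123-45-6789']
print(first_only)   # ['first']
print(first_each)   # ['first', 'second']
```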
Match a range of characters
Sometimes you need to look for a specific set of characters. This is where the brackets "[]" come in handy. [aeiou] looks for the vowels "a", "e", "i", "o", and "u", and [.?!] looks for the symbols ".", "?", and "!". Inside brackets, special characters lose their special meaning and are interpreted literally. You can also specify a range of characters: [a-z0-9] means any lowercase letter or any digit.
Next, let's look at a first, more complex RE for finding phone numbers.
14. \(?\d{3}[) ]\s?\d{3}[- ]\d{4} (find a ten-digit phone number, such as (080) 333-1234)
An RE like this finds phone numbers in more than one format, such as (080) 123-4567 and 511 254 6654. "\(?" stands for zero or one left parenthesis "(", "[) ]" means a right parenthesis ")" or a space, and "\s?" means zero or one whitespace character. Unfortunately, such an RE also finds numbers whose parentheses are unbalanced, such as "080) 333-1234". Alternatives, covered below, can solve this problem.
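Here is a small sketch of character classes and of the phone-number RE, written out as \(?\d{3}[) ]\s?\d{3}[- ]\d{4}. It assumes Python's re module in place of .NET, and the sample strings are illustrative only.

```python
import re

# Simple character classes.
vowels = re.findall(r"[aeiou]", "regular")
marks = re.findall(r"[.?!]", "Really? Yes!")  # special characters are literal inside []

# Phone-number RE: optional "(", three digits, ")" or space,
# optional whitespace, three digits, "-" or space, four digits.
phone = re.compile(r"\(?\d{3}[) ]\s?\d{3}[- ]\d{4}")

# The last sample has a closing parenthesis without an opening one.
samples = ["(080) 123-4567", "511 254 6654", "080) 333-1234"]
found = [bool(phone.fullmatch(s)) for s in samples]

print(vowels)  # ['e', 'u', 'a']
print(marks)   # ['?', '!']
print(found)   # [True, True, True]
```

The third result shows the weakness described above: the unbalanced form is accepted as well.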
Characters not in a particular group (negation)
Sometimes you need to find characters that are not in a particular set. The following table shows how to describe this.
\W        any character that is not alphanumeric
\S        any character that is not whitespace
\D        any character that is not a digit
\B        any position that is not a word boundary
[^x]      any character that is not x
[^aeiou]  any character that is not a, e, i, o, or u
15. \S+ (a string that contains no whitespace)
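The negated forms (uppercase \W, \S, \D, \B and the [^...] class) can be compared in a brief sketch. It assumes Python's re module as a stand-in for .NET, and the sample strings are made up.

```python
import re

# Made-up sample text.
text = "room 42, floor-3"

non_word = re.findall(r"\W", text)     # characters that are not alphanumeric
non_digit = re.findall(r"\D+", text)   # runs of characters that are not digits
non_space = re.findall(r"\S+", text)   # runs of characters that are not whitespace

# [^aeiou] matches any single character that is not one of the listed vowels.
non_vowels = re.findall(r"[^aeiou]", "audio")

# \B requires that the position is NOT a word boundary,
# so only an "o" strictly inside a word matches.
inner_o = re.findall(r"\Bo\B", "room to go")

print(non_word)    # [' ', ',', ' ', '-']
print(non_digit)   # ['room ', ', floor-']
print(non_space)   # ['room', '42,', 'floor-3']
print(non_vowels)  # ['d']
print(inner_o)     # ['o', 'o']
```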
Alternatives
Sometimes you need to look for one of several specific choices, and this is where the special character "|" comes in handy. For example, suppose you want to find both five-digit and nine-digit (with a "-") ZIP codes.
16. \b\d{5}-\d{4}\b|\b\d{5}\b (find five-digit and nine-digit (with "-") ZIP codes)
When using alternatives, pay attention to their order, because the RE prefers the leftmost alternative that matches. In RE 16, if the five-digit alternative were placed first, the RE would only ever find five-digit ZIP codes. Having learned alternatives, you can also write a better version of RE 14.
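The effect of the order can be seen by swapping the two alternatives. This is a minimal sketch that assumes Python's re module rather than .NET and uses an invented sample string. The alternation character "|" is written out explicitly in both patterns.

```python
import re

# Invented sample text with one nine-digit and one five-digit ZIP code.
text = "ZIP 12345-6789 and 54321"

# Nine-digit alternative first: both forms are found in full.
nine_first = re.findall(r"\b\d{5}-\d{4}\b|\b\d{5}\b", text)

# Five-digit alternative first: it wins on the leading "12345",
# so the nine-digit form is never reported.
five_first = re.findall(r"\b\d{5}\b|\b\d{5}-\d{4}\b", text)

print(nine_first)  # ['12345-6789', '54321']
print(five_first)  # ['12345', '54321']
```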