Regular Expressions (Section 1)

Source: Internet
Author: User

What is a regular expression?

When writing a program or web page that processes strings, you often need to find strings that follow some complex rule. Regular expressions are the tool for describing such rules. In other words, a regular expression is code that records a text rule.

You have probably used wildcards for file searches in Windows/DOS, namely * and ?. To find all the Word documents in a directory, you would search for *.doc. Here, * is interpreted as any string. Like wildcards, regular expressions are a tool for text matching, but they can describe what you need far more precisely, at the cost of being more complex. For example, you can write a regular expression that finds every string that starts with 0, is followed by 2 or 3 digits, then a hyphen "-", and finally 7 or 8 digits (such as 010-12345678 or 0376-7654321).


Simple Example:

Suppose we want to search for hi. If the search is case-insensitive, it may find hi, HI, Hi, and hI. Many other words also contain the two consecutive characters hi, such as him and history. To find exactly the word hi, we should use \bhi\b.

\b is a special code defined by regular expressions (some people call it a metacharacter). It stands for the start or end of a word, that is, a word boundary. Although English words are usually separated by spaces, punctuation, or line breaks, \b does not match any of these separator characters. It matches only a position.
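As a rough illustration of the difference \b makes, here is a minimal sketch using Python's re module (the tutorial itself is not tied to Python, and the sample sentence is made up for this example):

```python
import re

# Hypothetical sample sentence, chosen only for illustration.
text = "hi, this is his history; say HI to him"

# Without \b, the two letters are also found inside other words.
loose = re.findall(r"hi", text)

# With \b on both sides, only the standalone word matches.
strict = re.findall(r"\bhi\b", text)

# A case-insensitive search also accepts HI, Hi, and hI.
strict_any_case = re.findall(r"\bhi\b", text, flags=re.IGNORECASE)

print(len(loose), strict, strict_any_case)
```

The raw-string prefix r"..." keeps Python from treating \b as a backspace character before the regex engine sees it.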

If you are looking for hi followed, not far behind, by Lucy, you should use \bhi\b.*\bLucy\b.

The dot (.) here is another metacharacter; it matches any character except a line break. * is also a metacharacter, but it represents neither a character nor a position. It represents a quantity: it specifies that whatever precedes * may be repeated any number of times in a row for the whole expression to match. Therefore, .* together means any number of characters that are not line breaks. Now the meaning of \bhi\b.*\bLucy\b is clear: first the word hi, then any number of characters (but no line breaks), and finally the word Lucy.
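The behaviour of .* can be checked with a short sketch in Python's re module. The two sample strings are invented for this example; the second one puts Lucy on a new line to show that the dot stops at a line break by default:

```python
import re

pattern = re.compile(r"\bhi\b.*\bLucy\b")

# Hypothetical inputs for illustration.
same_line = "hi there, have you met Lucy yet?"
two_lines = "hi there\nLucy is here"

found = pattern.search(same_line)
matched_text = found.group(0) if found else None

# The dot does not cross the line break, so nothing is found here.
across_lines = pattern.search(two_lines)

print(matched_text)
print(across_lines)
```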

Consider 0\d\d-\d\d\d\d\d\d\d\d. Here \d is a new metacharacter that matches one digit (0, or 1, or 2, ...). The - is not a metacharacter; it matches only itself, the hyphen or minus sign.
To avoid so much tedious repetition, we can also write this expression as 0\d{2}-\d{8}. Here the {2} ({8}) after \d means that the preceding \d must be repeated exactly two (eight) times.
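A small sketch in Python's re module suggests that the long form 0\d\d-\d\d\d\d\d\d\d\d and the short form 0\d{2}-\d{8} accept the same inputs. The sample numbers are only illustrative, and fullmatch is used here so that the whole string has to fit the pattern:

```python
import re

long_form = re.compile(r"0\d\d-\d\d\d\d\d\d\d\d")
short_form = re.compile(r"0\d{2}-\d{8}")

# Illustrative inputs: one valid number and three that break a rule.
samples = ["010-12345678", "010-1234567", "110-12345678", "0376-7654321"]

results = {
    s: (bool(long_form.fullmatch(s)), bool(short_form.fullmatch(s)))
    for s in samples
}

for sample, outcome in results.items():
    print(sample, outcome)
```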

Note:

You can use the RegEx Tester tool for .NET. First make sure that .NET Framework 2.0 is installed, then download RegEx Tester. After downloading, open the archive and run regextester.exe directly.

Metacharacters

Now you know several useful metacharacters: \b, ., *, and \d. Regular expressions have more of them. For example, \s matches any whitespace character, including spaces, tabs, line breaks, and Chinese full-width spaces, and \w matches letters, digits, underscores, and Chinese characters.

\ba\w*\b matches a word that starts with the letter a: first the start of a word (\b), then the letter a, then any number of letters or digits (\w*), and finally the end of the word (\b).

\d+ matches one or more consecutive digits. The + here is similar to *; the difference is that * matches any number of repetitions (possibly zero), while + matches one or more.
\b\w{6}\b matches a word of exactly 6 letters, digits, or underscores. [Note: \w on its own stands for letters, digits, underscores, and Chinese characters, and \b marks the start or end of a word, so this pattern matches any whole six-character word made of those characters.]
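The three patterns above, \ba\w*\b, \d+, and \b\w{6}\b, can be tried on one sentence. This is a minimal sketch in Python's re module; the sentence is made up for the example:

```python
import re

# Hypothetical sample text for illustration.
text = "an apple and 42 bananas cost 1500 yuan, orange is pricier"

# Words that start with the letter a.
a_words = re.findall(r"\ba\w*\b", text)

# Runs of one or more digits.
numbers = re.findall(r"\d+", text)

# Words that are exactly six word characters long.
six_chars = re.findall(r"\b\w{6}\b", text)

print(a_words)
print(numbers)
print(six_chars)
```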

The metacharacters ^ (the symbol on the same key as the digit 6) and $ each match a position, somewhat like \b. ^ matches the start of the string being searched, and $ matches its end. These two codes are very useful for validating input. For example, if a website requires the QQ number you enter to be 5 to 12 digits, you can use ^\d{5,12}$. Depending on the option settings, ^ and $ can also match the start and end of a line.
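Input validation with ^ and $ might look like the following sketch in Python's re module. The function name is_valid_qq and the sample inputs are hypothetical, introduced only for this example:

```python
import re

QQ_NUMBER = re.compile(r"^\d{5,12}$")


def is_valid_qq(value: str) -> bool:
    """Return True when the input consists of 5 to 12 digits (hypothetical helper)."""
    return QQ_NUMBER.search(value) is not None


# Illustrative inputs: two valid numbers and three invalid ones.
for candidate in ["12345", "123456789012", "1234", "1234567890123", "12345abc"]:
    print(candidate, is_valid_qq(candidate))
```

One Python-specific detail: $ also matches just before a single trailing line break, so "12345\n" passes this check. Where that matters, re.fullmatch with the pattern \d{5,12} is the stricter choice.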

Table 1. Common metacharacters

Code    Description
.       Matches any character except a line break
\w      Matches a letter, digit, underscore, or Chinese character
\s      Matches any whitespace character (space, tab, line break, Chinese full-width space)
\d      Matches a digit
\b      Matches the start or end of a word (a position)
^       Matches the start of the string
$       Matches the end of the string

Note: in .*, the * stands for neither a character nor a position, but a quantity.

Character escape
If you want to find a metacharacter itself, for example . or *, you run into a problem: you cannot write it directly, because it would be interpreted with its special meaning. In that case, use \ to cancel the special meaning. So you should write \. and \*. And to find \ itself, write \\.
For example, unibetter\.com matches unibetter.com. Without the escape, the . would match any character other than a line break.

C:\\Windows matches C:\Windows.
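The effect of escaping can be sketched in Python's re module as follows. The string unibetterXcom is a made-up input that shows what an unescaped dot lets through, and re.escape is a standard-library helper that adds the backslashes for you:

```python
import re

escaped = re.compile(r"unibetter\.com")
unescaped = re.compile(r"unibetter.com")

# The escaped dot matches only a literal dot.
escaped_ok = bool(escaped.fullmatch("unibetter.com"))
escaped_x = bool(escaped.fullmatch("unibetterXcom"))

# The unescaped dot matches any character, including X.
unescaped_x = bool(unescaped.fullmatch("unibetterXcom"))

# A literal backslash in the text needs \\ in the pattern.
path_found = bool(re.search(r"C:\\Windows", r"C:\Windows\System32"))

# re.escape builds the escaped form of a literal string.
auto_escaped = re.escape("unibetter.com")

print(escaped_ok, escaped_x, unescaped_x, path_found, auto_escaped)
```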

Repetition

You have already seen the repetition forms *, +, {2}, and {5,12}. The following are all the quantifiers in regular expressions (codes that specify a count, such as * and {5,12}):
Table 2. Common quantifiers

Code/syntax   Description
*             Repeats zero or more times
+             Repeats one or more times
?             Repeats zero or one time
{n}           Repeats exactly n times
{n,}          Repeats n or more times
{n,m}         Repeats n to m times

Here are some examples of repetition:
Windows\d+ matches Windows followed by one or more digits.
^\w+ matches the first word of a line (or the first word of the whole string; which one depends on the option settings).
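A short sketch in Python's re module shows both examples, including how an option changes what ^ means. The re.MULTILINE flag is Python's name for that option; other tools name it differently, and the sample strings are invented:

```python
import re

# Windows followed by at least one digit; bare "Windows" and "WindowsXP" are skipped.
versions = re.findall(r"Windows\d+", "Windows98, Windows2000, Windows, WindowsXP")

text = "first line\nsecond line"

# By default ^ matches only at the start of the whole string.
first_of_string = re.findall(r"^\w+", text)

# With MULTILINE, ^ matches at the start of every line.
first_of_each_line = re.findall(r"^\w+", text, flags=re.MULTILINE)

# ? and {n,m} in action.
optional_u = [bool(re.fullmatch(r"colou?r", w)) for w in ("color", "colour", "colouur")]
two_to_four = [bool(re.fullmatch(r"\d{2,4}", s)) for s in ("1", "12", "1234", "12345")]

print(versions, first_of_string, first_of_each_line, optional_u, two_to_four)
```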


Character class
What if you want to match a set of characters that has no predefined metacharacter (such as the vowels a, e, i, o, u)?
Just list them inside square brackets. For example, [aeiou] matches any English vowel, and [.?!] matches a punctuation mark (. or ? or !).

[0-9] means exactly the same as \d: one digit. Similarly, [a-z0-9A-Z_] is completely equivalent to \w (if only English is considered).
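Character classes can be explored with the sketch below, written with Python's re module. In Python 3, \w also matches non-English letters by default, so the re.ASCII flag is used to show the "English only" equivalence with [a-z0-9A-Z_]. The sample strings are made up:

```python
import re

vowels = re.findall(r"[aeiou]", "regular expression")
enders = re.findall(r"[.?!]", "Really? Yes. Great!")

# [0-9] and \d agree on ordinary ASCII digits.
digits_class = re.findall(r"[0-9]", "a1b22")
digits_meta = re.findall(r"\d", "a1b22")

# Hypothetical text mixing English and Chinese.
mixed = "user_01 and 中文"
class_words = re.findall(r"[a-z0-9A-Z_]+", mixed)
ascii_words = re.findall(r"\w+", mixed, flags=re.ASCII)
unicode_words = re.findall(r"\w+", mixed)

print(vowels, enders, class_words, ascii_words, unicode_words)
```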

Here is a more complex expression: \(?0\d{2}[) -]?\d{8}
This expression matches phone numbers in several formats, such as (010)88886666, 022-22334455, or 02912345678. Let's analyze it. First comes an escaped \(, which may appear zero times or once (?). Then comes a 0 followed by two digits (\d{2}). Next is one of ), -, or a space, which appears once or not at all (?). Finally come eight digits (\d{8}).
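To see which formats the expression \(?0\d{2}[) -]?\d{8} accepts, here is a minimal sketch in Python's re module. The sample numbers are illustrative, and fullmatch requires the whole string to fit:

```python
import re

phone = re.compile(r"\(?0\d{2}[) -]?\d{8}")

# Illustrative inputs; the last one lacks the leading 0 and area code.
samples = ["(010)88886666", "022-22334455", "02912345678", "029 12345678", "12345678"]

accepted = [s for s in samples if phone.fullmatch(s)]

print(accepted)
```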

Branch Condition

Unfortunately, that expression also matches "incorrect" formats such as 010)12345678 or (022-87654321. To solve this problem, we need branch conditions. A branch condition in a regular expression means there are several rules, and satisfying any one of them counts as a match. The way to write it is to separate the rules with |. Not clear yet? Never mind, let's look at some examples:

0\d{2}-\d{8}|0\d{3}-\d{7} matches two kinds of hyphen-separated phone numbers: a three-digit area code with an eight-digit local number (such as 010-12345678), or a four-digit area code with a seven-digit local number (such as 0376-2233445).
\(0\d{2}\)[- ]?\d{8}|0\d{2}[- ]?\d{8} matches phone numbers with a three-digit area code. The area code may or may not be enclosed in parentheses, and it may be separated from the local number by a hyphen, a space, or nothing at all. As an exercise, try using a branch condition to extend this expression to four-digit area codes as well.
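The sketch below, in Python's re module, compares the single-rule expression \(?0\d{2}[) -]?\d{8} with the two-branch version. The names loose and strict are labels chosen for this example only:

```python
import re

loose = re.compile(r"\(?0\d{2}[) -]?\d{8}")
strict = re.compile(r"\(0\d{2}\)[- ]?\d{8}|0\d{2}[- ]?\d{8}")

# Two well-formed numbers followed by two with unbalanced parentheses.
samples = ["(010)88886666", "022-22334455", "010)12345678", "(022-87654321"]

loose_ok = [bool(loose.fullmatch(s)) for s in samples]
strict_ok = [bool(strict.fullmatch(s)) for s in samples]

for sample, a, b in zip(samples, loose_ok, strict_ok):
    print(sample, "loose:", a, "strict:", b)
```

Each branch spells out one complete, consistent format, which is why the unbalanced parentheses no longer slip through.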
\d{5}-\d{4}|\d{5} matches United States ZIP codes, which are either five digits or nine digits with a hyphen after the fifth. This example is given because it illustrates a point: pay attention to the order of the branches. If you change it to \d{5}|\d{5}-\d{4}, it will match only five-digit ZIP codes (and the first five digits of nine-digit ones). The reason is that the branches are tested from left to right, and once one branch matches, the others are not tried.
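The effect of branch order can be observed with a short sketch in Python's re module. The variable names and the sample ZIP code are made up for this example:

```python
import re

long_first = re.compile(r"\d{5}-\d{4}|\d{5}")
short_first = re.compile(r"\d{5}|\d{5}-\d{4}")

# Hypothetical nine-digit ZIP code.
text = "12345-6789"

# The longer branch is tried first, so the whole code is matched.
with_long_first = long_first.search(text).group(0)

# The five-digit branch succeeds first, so the rest is never examined.
with_short_first = short_first.search(text).group(0)

print(with_long_first)
print(with_short_first)
```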


Group

We have already seen how to repeat a single character (just add a quantifier after it). But what if you want to repeat several characters? You can use parentheses to mark a subexpression (also called a group). You can then specify a repetition count for the subexpression, and also perform other operations on it (introduced later).
(\d{1,3}\.){3}\d{1,3} is a simple expression for matching IP addresses. To understand it, analyze it in this order: \d{1,3} matches one to three digits; (\d{1,3}\.){3} matches one to three digits followed by a period (the group is treated as a unit), repeated three times; and the final \d{1,3} adds one to three more digits.

Unfortunately, it also matches impossible IP addresses such as 256.300.888.999. If arithmetic comparison were available, this would be easy to fix, but regular expressions provide no mathematical functions. So you have to use lengthy grouping, branch conditions, and character classes to describe a correct IP address: ((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)
The key to this expression is understanding 2[0-4]\d|25[0-5]|[01]?\d\d?. It is not explained in detail here; you should be able to work out its meaning yourself.
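As a sketch of how the simple and the range-checked IP expressions differ, the following Python example builds the longer pattern from one octet fragment. The names simple, octet, and strict are labels chosen for this example, and the octet fragment assumes the form 2[0-4]\d|25[0-5]|[01]?\d\d?:

```python
import re

simple = re.compile(r"(\d{1,3}\.){3}\d{1,3}")

# One number from 0 to 255: 200-249, 250-255, or 0-199 (one to three digits).
octet = r"(2[0-4]\d|25[0-5]|[01]?\d\d?)"

# Three octets each followed by a period, then a final octet.
strict = re.compile("(" + octet + r"\.){3}" + octet)

# Illustrative addresses; the third is out of range.
samples = ["192.168.0.1", "255.255.255.255", "256.300.888.999", "1.2.3"]

simple_ok = [bool(simple.fullmatch(s)) for s in samples]
strict_ok = [bool(strict.fullmatch(s)) for s in samples]

for sample, a, b in zip(samples, simple_ok, strict_ok):
    print(sample, "simple:", a, "strict:", b)
```

Reading the octet fragment branch by branch: 2[0-4]\d covers 200 to 249, 25[0-5] covers 250 to 255, and [01]?\d\d? covers everything from 0 to 199.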
