Getting started with regular expressions in 30 minutes


Objectives
In 30 minutes you will understand what regular expressions are and know enough of the basics to use them in your own programs or web pages.

How to use this tutorial

The most important thing is: please give me 30 minutes. If you have no experience with regular expressions, please do not try to get started in 30 seconds, unless you are Superman.

Don't be intimidated by the complex expressions below. If you follow me step by step, you will find that regular expressions are not as difficult as you think. Of course, if after reading this tutorial you find that you understood a lot but can remember almost none of it, that is perfectly normal: I think it is very unlikely that someone who has never touched regular expressions will remember more than 80% of the syntax mentioned after one reading. The goal here is only to let you understand the basic principles; to really master regular expressions you will need to practice and use them a lot.

Besides being an introduction, this article also tries to serve as a regular expression syntax reference that can be used in daily work. In the author's own experience, this goal has been achieved fairly well. After all, I can't remember everything myself, can I?


Side notes: this article contains some side notes, mainly to provide related information or to explain basic concepts to readers without a programming background. They can be skipped.

What is a regular expression?

A character is the most basic unit that computer software uses to process text; it may be a letter, a digit, a punctuation mark, a space, a line break, a Chinese character, and so on. A string is a sequence of 0 or more characters. Text is simply words, that is, a string. Saying that a string matches a regular expression usually means that one part (or several parts) of the string satisfies the conditions given by the expression.

When writing a program or webpage that processes strings, it is often necessary to find strings that meet certain complex rules. Regular Expressions are tools used to describe these rules. In other words, a regular expression is the code that records text rules.

You may have used wildcards for file searches in Windows/DOS, that is, * and ?. If you want to find all Word documents in a directory, you search for *.doc. Here, * is interpreted as any string. Like wildcards, regular expressions are a tool for text matching, but they can describe your needs more precisely, at the cost of being more complex. For example, you can write a regular expression to find all strings that start with 0, followed by 2 or 3 digits, then a hyphen "-", and finally 7 or 8 digits (such as 010-12345678 or 0376-7654321).

Getting started

The best way to learn regular expressions is to start with examples: understand an example, then modify it and experiment with it. Below are some simple examples with detailed explanations.

If you search for hi in an English novel, you can use the regular expression hi.

This is almost the simplest regular expression. It precisely matches a string of two characters, the first being h and the second being i. Regular expression tools usually provide a case-insensitive option; if it is selected, the expression matches any of hi, HI, Hi, and hI.

Unfortunately, many words contain the two consecutive characters hi, such as him, history, and high. If you search with hi, the hi inside those words is found as well. To find exactly the word hi, we should use \bhi\b.

\b is a special code defined by regular expressions (well, some people call it a metacharacter). It represents the start or end of a word, that is, a word boundary. Although English words are usually separated by spaces, punctuation marks, or line breaks, \b does not match any of these separator characters. It only matches a position.

To be more precise, \b matches a position where the character before and the character after are not both \w (one is and the other is not, or does not exist).

If you are looking for hi followed not far behind by Lucy, you should use \bhi\b.*\bLucy\b.

Here, . is another metacharacter, which matches any character except a line break. * is also a metacharacter, but it represents neither a character nor a position; it represents a quantity: it specifies that the content before * may be repeated any number of times in a row for the whole expression to match. Therefore, .* together means any number of characters that are not line breaks. Now the meaning of \bhi\b.*\bLucy\b is obvious: first the word hi, then any number of arbitrary characters (but not line breaks), and finally the word Lucy.

The line break is '\n', the character whose ASCII code is 10 (hexadecimal 0x0A).
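The expressions above can be tried out in a few lines. This is only a sketch that assumes Python's standard re module rather than the .NET engine the tutorial targets; the sample sentence is made up for illustration.

```python
import re

# Hypothetical sample text, not taken from the tutorial.
text = "him said hi to history, then hi, Lucy waved"

# Plain hi also hits the hi inside him and history.
plain_hits = re.findall(r"hi", text)

# \b restricts the match to the standalone word hi.
word_hits = re.findall(r"\bhi\b", text)

# The word hi, then any characters except line breaks, then the word Lucy.
hi_then_lucy = re.search(r"\bhi\b.*\bLucy\b", text)

# . does not match a line break, so this search finds nothing.
across_lines = re.search(r"\bhi\b.*\bLucy\b", "hi\nLucy")

print(len(plain_hits), len(word_hits))
print(hi_then_lucy.group(0))
print(across_lines)
```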

If other metacharacters are used at the same time, we can construct a more powerful regular expression. For example:

0\d\d-\d\d\d\d\d\d\d\d matches a string that starts with 0, followed by two digits, then a hyphen "-", and finally eight digits (that is, a Chinese phone number; of course, this example only matches numbers with a three-digit area code).

Here \d is a new metacharacter that matches one digit (0, or 1, or 2, and so on). - is not a metacharacter; it only matches itself, a hyphen.

To avoid so many annoying repetitions, we can also write this expression as 0\d{2}-\d{8}. Here the {2} ({8}) after \d means that the preceding \d must be repeated exactly 2 times (8 times).
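A quick check that the long and the short spelling behave the same, again sketched with Python's re module as an assumption (the sample strings are illustrative; fullmatch requires the whole string to match):

```python
import re

LONG_FORM = re.compile(r"0\d\d-\d\d\d\d\d\d\d\d")
SHORT_FORM = re.compile(r"0\d{2}-\d{8}")

samples = ["010-12345678", "0376-7654321", "010-1234567", "110-12345678"]

# Map each sample to (long form matches, short form matches).
results = {
    s: (LONG_FORM.fullmatch(s) is not None, SHORT_FORM.fullmatch(s) is not None)
    for s in samples
}

for s, (long_ok, short_ok) in results.items():
    print(s, long_ok, short_ok)
```

Only the first sample has a three-digit area code starting with 0 followed by exactly eight digits, so it is the only one either form accepts.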

Testing regular expressions

Other available testing tools:

RegexBuddy
JavaScript regular expression online testing tool

If you do not find regular expressions hard to read and write, you are either a genius or not from Earth. Regular expression syntax gives headaches even to people who use it often. Because expressions are hard to read and write and easy to get wrong, it is necessary to find a tool for testing them.

The details of regular expressions differ between environments. This tutorial describes the behavior of regular expressions in Microsoft .NET Framework 4.0, and the .NET tool Regular Expression Tester can be used to follow along. Install and run the software according to the instructions on its page.

Metacharacters

Now you know several useful metacharacters, such as \b, ., *, and \d. Regular expressions have more metacharacters, such as \s, which matches any whitespace character, including spaces, tabs, line breaks, and Chinese full-width spaces. \w matches letters, digits, underscores, and Chinese characters.

The special handling of Chinese characters is provided by the regular expression engine in .NET. For other environments, see the relevant documentation.

Here are more examples:

\ba\w*\b matches a word that starts with the letter a: first the start of a word (\b), then the letter a, then any number of letters or digits (\w*), and finally the end of the word (\b).

Well, now let's talk about what "word" means in regular expressions: it is at least one consecutive \w. Yes, it really has little to do with the thousands of things of the same name that you memorize when learning English.

\d+ matches one or more consecutive digits. Here + is a metacharacter similar to *; the difference is that * matches any number of repetitions (possibly 0), while + matches 1 or more.

\b\w{6}\b matches a word of exactly 6 characters.
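The three examples can be run against a sample sentence. A minimal sketch, assuming Python's re module; the sentence is invented for illustration.

```python
import re

# Hypothetical sample text.
text = "an apple and 42 bananas cost 1999 yuan, almost"

# Words that start with the letter a.
a_words = re.findall(r"\ba\w*\b", text)

# Runs of one or more digits.
numbers = re.findall(r"\d+", text)

# Words of exactly six characters.
six_char_words = re.findall(r"\b\w{6}\b", text)

print(a_words)
print(numbers)
print(six_char_words)
```

Note that bananas is not in the first list: its a characters are inside the word, so no \b precedes them.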

Table 1. Common metacharacters
Code    Description
.       Matches any character except a line break
\w      Matches a letter, digit, underscore, or Chinese character
\s      Matches any whitespace character
\d      Matches a digit
\b      Matches the start or end of a word
^       Matches the start of the string
$       Matches the end of the string

A regular expression engine usually provides a method to test whether a given string matches a regular expression, such as the RegExp.test() method in JavaScript or the Regex.IsMatch() method in .NET. Matching here means that some part of the string conforms to the rules of the expression. If ^ and $ are not used, then for \d{5,12} such a method can only guarantee that the string contains 5 to 12 consecutive digits, not that the whole string is a 5-to-12-digit number.

The metacharacters ^ (the symbol on the same key as the digit 6) and $ both match a position, which is somewhat similar to \b. ^ matches the start of the string you are searching, and $ matches its end. These two codes are very useful for validating input. For example, if a website requires that the QQ number you enter be 5 to 12 digits, you can use ^\d{5,12}$.

The {5,12} here is similar to the {2} introduced earlier, except that {2} means exactly 2 repetitions, while {5,12} means no fewer than 5 and no more than 12 repetitions; otherwise there is no match.

Because ^ and $ are used, the entire input string must be matched by \d{5,12}, that is, the entire input must be 5 to 12 digits. So if the entered QQ number matches this regular expression, it meets the requirement.

Like the case-insensitive option, some regular expression tools also have an option for processing multiple lines. If it is selected, ^ and $ change their meaning to the start and end of a line.
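The difference between "contains 5 to 12 digits" and "is 5 to 12 digits" can be sketched as follows. This assumes Python's re module; the helper names contains_digit_run and is_qq_number are hypothetical, and re.MULTILINE is Python's counterpart of the multi-line option.

```python
import re

def contains_digit_run(s: str) -> bool:
    # Without ^ and $: true if some part of s is a run of at least 5 digits.
    return re.search(r"\d{5,12}", s) is not None

def is_qq_number(s: str) -> bool:
    # With ^ and $: the whole input must be 5 to 12 digits.
    # (In Python, $ also matches just before a single trailing line break.)
    return re.search(r"^\d{5,12}$", s) is not None

# With the multi-line option, ^ and $ match at the start and end of each line.
whole_string = re.findall(r"^\d+$", "123\nabc\n456")
per_line = re.findall(r"^\d+$", "123\nabc\n456", re.MULTILINE)

print(contains_digit_run("abc1234567xyz"), is_qq_number("abc1234567xyz"))
print(is_qq_number("1234567"), is_qq_number("1234"))
print(whole_string, per_line)
```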

Character escape

If you want to find the metacharacters themselves, for example . or *, you run into a problem: you cannot specify them directly, because they would be interpreted with their special meaning. In this case, you must use \ to cancel the special meaning of these characters, so you should use \. and \*. Of course, to find \ itself, you also have to write \\.

For example, deerchao\.net matches deerchao.net, and C:\\Windows matches C:\Windows.
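A small sketch of why the escape matters, assuming Python's re module (raw strings such as r"..." keep the backslashes intact). re.escape is a Python helper that escapes literal text for you; it is not part of the tutorial's .NET material.

```python
import re

# An unescaped . matches any character, so this is too permissive.
loose = re.compile(r"deerchao.net")
# \. matches only a literal period.
strict = re.compile(r"deerchao\.net")
# \\ in the pattern matches one literal backslash.
windows_dir = re.compile(r"C:\\Windows")

# re.escape builds a pattern that matches arbitrary text literally.
literal = re.compile(re.escape("1+1=2?"))

print(loose.fullmatch("deerchaoXnet") is not None)
print(strict.fullmatch("deerchaoXnet") is not None)
print(windows_dir.fullmatch("C:\\Windows") is not None)
print(literal.fullmatch("1+1=2?") is not None)
```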

Repetition


You have already seen the repetition forms *, +, {2}, and {5,12}. Below are all the quantifiers (also called qualifiers) in regular expressions, that is, the codes that specify a quantity, such as * and {5,12}:

Table 2. Common quantifiers
Code/syntax    Description
*              Repeat zero or more times
+              Repeat one or more times
?              Repeat zero or one time
{n}            Repeat n times
{n,}           Repeat n or more times
{n,m}          Repeat n to m times

Here are some examples that use repetition:

Windows\d+ matches Windows followed by one or more digits.

^\w+ matches the first word of a line (or the first word of the entire string; which of the two depends on the option settings).
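The quantifiers can be compared side by side. This is a sketch assuming Python's re module; the sample strings, including the colou?r pattern, are made up for illustration.

```python
import re

# Hypothetical two-line sample text.
text = "Windows 95, Windows98 and Windows2000\nLinux too"

# + needs at least one digit directly after Windows.
versions = re.findall(r"Windows\d+", text)

# ^ is the start of the whole string by default...
first_word = re.findall(r"^\w+", text)
# ...and the start of every line with the multi-line option.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)

# ? allows zero or one u.
optional_u = re.findall(r"colou?r", "color colour colouur")

# {3,} means three or more digits.
long_numbers = re.findall(r"\d{3,}", "12 123 12345")

print(versions, first_word, line_starts, optional_u, long_numbers)
```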

Character class

Finding digits, letters or digits, and whitespace is easy, because there are metacharacters for those character sets. But what if you want to match a character set that has no predefined metacharacter (such as the vowels a, e, i, o, u)?

You just list them in square brackets. For example, [aeiou] matches any English vowel, and [.?!] matches a punctuation mark (. or ? or !).

We can also easily specify a character range. For example, [0-9] means exactly the same as \d: one digit. Similarly, [a-z0-9A-Z_] is exactly equivalent to \w (if only English is considered).

Here is a more complex expression: \(?0\d{2}[) -]?\d{8}.

"(" and ")" are also metacharacters, which will be covered later in the grouping section, so they need to be escaped here.

This expression matches phone numbers in several formats, such as (010)88886666, 022-22334455, or 02912345678. Let's analyze it. First comes an escaped \(, which may appear 0 or 1 times (?), then a 0 followed by two digits (\d{2}), then one of ), -, or a space, which appears once or not at all (?), and finally eight digits (\d{8}).
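The character classes above can be checked as follows. A minimal sketch, assuming Python's re module; the sample sentences are illustrative.

```python
import re

vowels = re.findall(r"[aeiou]", "regular")
marks = re.findall(r"[.?!]", "Hi! How are you? Fine.")

# Optional (, area code, optional ) or - or space, then eight digits.
PHONE = re.compile(r"\(?0\d{2}[) -]?\d{8}")

formats = ["(010)88886666", "022-22334455", "02912345678"]
accepted = [s for s in formats if PHONE.fullmatch(s)]

print(vowels)
print(marks)
print(accepted)
```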

Branch conditions

Unfortunately, the expression just given also matches "incorrect" formats such as 010)12345678 or (022-87654321. To solve this problem, we need branch conditions (alternation). A branch condition in a regular expression means there are several rules, and satisfying any one of them counts as a match. The way to write it is to separate the rules with |. Don't understand? No problem, look at the examples:

0\d{2}-\d{8}|0\d{3}-\d{7} matches two kinds of hyphen-separated phone numbers: one with a three-digit area code and an eight-digit local number (such as 010-12345678), and one with a four-digit area code and a seven-digit local number (such as 0376-2233445).

\(0\d{2}\)[- ]?\d{8}|0\d{2}[- ]?\d{8} matches phone numbers with a three-digit area code. The area code may or may not be enclosed in parentheses, and the area code and local number may be separated by a hyphen or a space, or not separated at all. You can try to use branch conditions to extend this expression to four-digit area codes as well.

\d{5}-\d{4}|\d{5} matches United States ZIP codes, which are either five digits or nine digits with a hyphen after the fifth. This example is given because it illustrates a problem: when using branch conditions, pay attention to the order of the branches. If you change it to \d{5}|\d{5}-\d{4}, it will only match five-digit ZIP codes (and the first five digits of nine-digit ZIP codes). The reason is that the branches are tested from left to right, and once one branch is satisfied, the others are not tried.
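Both points, balanced parentheses through alternation and the importance of branch order, can be sketched like this. Python's re module is an assumption here, and the sample inputs are illustrative.

```python
import re

# One branch with parentheses around the area code, one branch without.
PHONE = re.compile(r"\(0\d{2}\)[- ]?\d{8}|0\d{2}[- ]?\d{8}")
phone_ok = {
    s: PHONE.fullmatch(s) is not None
    for s in ["(010)88886666", "022-22334455", "010)12345678", "(022-87654321"]
}

# Branch order matters: the first branch that succeeds wins.
ZIP_GOOD = re.compile(r"\d{5}-\d{4}|\d{5}")
ZIP_BAD = re.compile(r"\d{5}|\d{5}-\d{4}")

good_hit = ZIP_GOOD.search("ZIP 12345-6789").group(0)
bad_hit = ZIP_BAD.search("ZIP 12345-6789").group(0)

print(phone_ok)
print(good_hit, bad_hit)
```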

Grouping

We have already seen how to repeat a single character (simply put a quantifier after it); but what if you want to repeat several characters? You can use parentheses to specify a subexpression (also called a group), and then you can specify the number of repetitions of this subexpression or perform other operations on it (introduced later).

(\d{1,3}\.){3}\d{1,3} is a simple expression for matching IP addresses. To understand it, analyze it in this order: \d{1,3} matches a number of 1 to 3 digits; (\d{1,3}\.){3} matches a number of 1 to 3 digits followed by a period (the group as a whole), repeated 3 times; finally comes another number of 1 to 3 digits (\d{1,3}).

No number in an IP address can exceed 255. Don't be fooled by the writers of season 3 of "24"...

Unfortunately, it also matches impossible IP addresses such as 256.300.888.999. If arithmetic comparison were available, this would be easy to solve, but regular expressions provide no mathematical functions, so you can only use lengthy grouping, alternation, and character classes to describe a correct IP address: ((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?).

The key to understanding this expression is understanding 2[0-4]\d|25[0-5]|[01]?\d\d?. I will not elaborate on it here; you should be able to work out its meaning yourself.
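The two IP expressions can be compared directly. A sketch assuming Python's re module; the constant names SIMPLE, OCTET, and STRICT are hypothetical.

```python
import re

SIMPLE = re.compile(r"(\d{1,3}\.){3}\d{1,3}")

# One number from 0 to 255: 200-249, 250-255, or 0-199.
OCTET = r"(2[0-4]\d|25[0-5]|[01]?\d\d?)"
STRICT = re.compile("(" + OCTET + r"\.){3}" + OCTET)

addresses = ["192.168.0.1", "255.255.255.255", "256.300.888.999"]

simple_ok = [a for a in addresses if SIMPLE.fullmatch(a)]
strict_ok = [a for a in addresses if STRICT.fullmatch(a)]

print(simple_ok)
print(strict_ok)
```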

Negation

Sometimes you need to find characters that do not belong to some easily defined character class. For example, to find any character except a digit, you need negation:

Table 3. Common negation codes
Code/syntax    Description
\W             Matches any character that is not a letter, digit, underscore, or Chinese character
\S             Matches any character that is not whitespace
\D             Matches any character that is not a digit
\B             Matches a position that is not the start or end of a word
[^x]           Matches any character except x
[^aeiou]       Matches any character except the letters a, e, i, o, u

Example: \S+ matches strings that contain no whitespace characters.

<a[^>]+> matches a string enclosed in angle brackets that starts with a.
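The negated codes in action, sketched with Python's re module as an assumption; the sample text is made up.

```python
import re

# Hypothetical sample text.
text = "tel: 010-1234 <a href='x'> ok"

# Runs of non-whitespace characters.
non_space_runs = re.findall(r"\S+", text)

# Delete every non-digit, keeping only the digits.
digits_only = re.sub(r"\D", "", text)

# An angle-bracketed string that starts with a.
anchor_tags = re.findall(r"<a[^>]+>", text)

# Every character that is not a vowel.
non_vowels = re.findall(r"[^aeiou]", "regular")

print(non_space_runs)
print(digits_only)
print(anchor_tags)
print(non_vowels)
```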

Backreferences

After a subexpression is specified with parentheses, the text that matches it (that is, the content captured by the group) can be processed further in the expression or in other programs. By default, each group automatically gets a group number. The rule is: from left to right, ordered by each group's left parenthesis, the first group is numbered 1, the second 2, and so on.

Er... actually, group numbering is not quite as simple as I just said:
Group 0 corresponds to the entire regular expression.
Group numbers are actually assigned in two left-to-right passes: the first pass numbers only the unnamed groups, and the second pass numbers only the named groups, so all named groups have larger numbers than the unnamed ones.
You can use (?:exp) to deprive a group of its right to take part in group numbering.

A backreference is used to search again for text matched by an earlier group. For example, \1 stands for the text matched by group 1. Hard to understand? See the example:

\b(\w+)\b\s+\1\b can be used to match repeated words, such as go go or kitty kitty. The expression first matches a word, that is, one or more letters or digits between the start and end of a word (\b(\w+)\b); this word is captured into group 1. Then come one or more whitespace characters (\s+), and finally the content captured in group 1, that is, the word matched before (\1).

You can also give a subexpression a group name yourself. To do so, use the syntax (?<Word>\w+) (or replace the angle brackets with single quotes: (?'Word'\w+)), which names the group of \w+ as Word. To backreference the content captured by this group, use \k<Word>, so the previous example can also be written as \b(?<Word>\w+)\b\s+\k<Word>\b.
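A sketch of both forms, assuming Python's re module. Note that Python spells named groups differently from the .NET syntax described above: (?P<Word>...) defines the group and (?P=Word) refers back to it. The sample sentence is illustrative.

```python
import re

# Numbered backreference, as in the tutorial.
DUP = re.compile(r"\b(\w+)\b\s+\1\b")

# Named group in Python's spelling; the .NET form is (?<Word>...) with \k<Word>.
DUP_NAMED = re.compile(r"\b(?P<Word>\w+)\b\s+(?P=Word)\b")

text = "go go kitty kitty is not gone go"

repeated = [m.group(0) for m in DUP.finditer(text)]
repeated_words = [m.group("Word") for m in DUP_NAMED.finditer(text)]

print(repeated)
print(repeated_words)
```

The trailing "gone go" is not reported: \1 must repeat the whole captured word, and the \b after the group prevents capturing only the go inside gone.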

Parentheses are used in many special-purpose syntaxes. The most common ones are listed below:

Table 4. Common grouping syntax
Category                 Code/syntax     Description
Capturing                (exp)           Matches exp and captures the text into an automatically numbered group
                         (?<name>exp)    Matches exp and captures the text into the group named name; can also be written (?'name'exp)
                         (?:exp)         Matches exp, but does not capture the matched text or assign a group number to this group
Zero-width assertions    (?=exp)         Matches the position before exp
                         (?<=exp)        Matches the position after exp
                         (?!exp)         Matches a position that is not followed by exp
                         (?<!exp)        Matches a position that is not preceded by exp
Comments                 (?#comment)     Has no effect on how the expression is processed; provides a comment for human readers

We have already discussed the first two syntaxes. The third, (?:exp), does not change how the regular expression is processed; it is just that the content matched by such a group is not captured into a group as with the first two, and it does not get a group number. "Why would I want that?" Well, why do you think?

Zero-width assertions

Earthlings, do you find these term names too complicated and too hard to remember? So do I. It is enough to know that such a thing exists; whatever it is called, let it go! A person without a name can concentrate on practicing the sword; a thing without a name can be taken or left at will...

The next four syntaxes are used to find things before or after certain content (but not including that content). That is, like \b, ^, and $, they specify a position, and that position must satisfy a certain condition (the assertion), so they are also called zero-width assertions. They are best explained with examples:

An assertion declares a fact that should be true. In a regular expression, matching continues only when the assertion is true.

(?=exp), also called a zero-width positive lookahead assertion, asserts that what follows the position where it appears matches the expression exp. For example, \b\w+(?=ing\b) matches the front part of words that end in ing (everything except the ing); when searching I'm singing while you're dancing. it matches sing and danc.

(?<=exp), also called a zero-width positive lookbehind assertion, asserts that what precedes the position where it appears matches the expression exp. For example, (?<=\bre)\w+\b matches the second half of words that start with re (everything except the re); when searching reading a book it matches ading.

If you want to add a comma after every three digits of a long number (counting from the right, of course), you can find the part that needs commas like this: ((?<=\d)\d{3})+\b. When it is used to search 1234567890, the result is 234567890.

The following example uses both kinds of assertion: (?<=\s)\d+(?=\s) matches numbers separated by whitespace (once again, not including the whitespace).
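The lookahead and lookbehind examples can be reproduced as follows. This is a sketch assuming Python's re module, which accepts these particular patterns because each lookbehind has a fixed width; the last sample string is made up.

```python
import re

# Front part of words ending in ing.
stems = re.findall(r"\b\w+(?=ing\b)", "I'm singing while you're dancing.")

# Rest of words starting with re.
tails = re.findall(r"(?<=\bre)\w+\b", "reading a book")

# The digit groups that need commas; group(0) is the whole match.
needs_commas = re.search(r"((?<=\d)\d{3})+\b", "1234567890").group(0)

# Numbers with whitespace on both sides; the whitespace is not part of the match.
spaced_numbers = re.findall(r"(?<=\s)\d+(?=\s)", "a 12 b 345 c6 7")

print(stems, tails, needs_commas, spaced_numbers)
```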

Negative zero-width assertions

We have already seen how to find characters that are not a particular character or not in a particular character class (negation). But what if we only want to make sure a certain character does not appear, without matching anything in its place? For example, suppose we want to find words that contain the letter q where the q is not followed by the letter u. We can try this:

\b\w*q[^u]\w*\b matches a word that contains the letter q followed by a character other than u. But if you test more (or if you are sharp enough to see it directly), you will find that when q appears at the end of a word, as in Iraq or Benq, this expression goes wrong. This is because [^u] always matches one character, so if q is the last character of a word, the following [^u] matches the word separator after q (a space, a period, or something else), and the following \w*\b then matches the next word, so \b\w*q[^u]\w*\b can match the whole of Iraq fighting. A negative zero-width assertion solves this problem, because it only matches a position and does not consume any characters. Now we can solve the problem like this: \b\w*q(?!u)\w*\b.

The zero-width negative lookahead assertion (?!exp) asserts that the position is not followed by something that matches exp. For example, \d{3}(?!\d) matches three digits that are not followed by another digit; \b((?!abc)\w)+\b matches words that do not contain the consecutive string abc.

Similarly, we can use (?<!exp), the zero-width negative lookbehind assertion, to assert that what precedes the position does not match exp: (?<![a-z])\d{7} matches seven digits that are not preceded by a lowercase letter.
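The Iraq problem and its fix, plus the other negative assertions, in a short sketch. Python's re module is an assumption, and the sample strings are invented.

```python
import re

# Hypothetical sample text.
text = "Iraq fighting, Benq quits"

# [^u] consumes the space after q, so the match runs into the next word.
naive = re.findall(r"\b\w*q[^u]\w*\b", text)

# (?!u) only checks the position after q and consumes nothing.
fixed = re.findall(r"\b\w*q(?!u)\w*\b", text)

# Three digits not followed by another digit.
last_three = re.findall(r"\d{3}(?!\d)", "1234 567")

# Words that do not contain abc; group(0) is the whole word.
no_abc = [m.group(0) for m in re.finditer(r"\b((?!abc)\w)+\b", "xabcx hello abc")]

# Seven digits not preceded by a lowercase letter.
seven = re.findall(r"(?<![a-z])\d{7}", "x1234567 12345678")

print(naive, fixed, last_three, no_abc, seven)
```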

Please analyze the expression (?<=<(\w+)>).*(?=<\/\1>) in detail. It best demonstrates what zero-width assertions are really for.

A more complex example: (?<=<(\w+)>).*(?=<\/\1>) matches the content inside a simple HTML tag without attributes. (?<=<(\w+)>) specifies the prefix: a word enclosed in angle brackets (for example, <b>), then comes .* (any string), and finally the suffix (?=<\/\1>). Note the \/ in the suffix, which uses the character escaping mentioned earlier; \1 is a backreference to the first captured group, the content matched by the earlier (\w+), so if the prefix is actually <b>, the suffix is </b>. The whole expression matches the content between <b> and </b> (once again, not including the prefix and suffix themselves).

Comments

Another use of parentheses is to include comments with the syntax (?#comment). For example: 2[0-4]\d(?#200-249)|25[0-5](?#250-255)|[01]?\d\d?(?#0-199).

To include comments, it is best to enable the "ignore whitespace in the pattern" option, so that you can freely add spaces, tabs, and line breaks when writing the expression; they are ignored in actual use. With this option enabled, everything from # to the end of the line is ignored as a comment. For example, we can write the previous expression like this:

    (?<=    # assert the prefix of the text to be matched
    <(\w+)> # find letters or digits enclosed in angle brackets (that is, an HTML/XML tag)
    )       # end of prefix
    .*      # match any text
    (?=     # assert the suffix of the text to be matched
    <\/\1>  # find angle-bracketed content: a "/" followed by the previously captured tag
    )       # end of suffix
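Both comment styles can be sketched with the 0 to 255 expression. This assumes Python's re module, where the "ignore whitespace in the pattern" option is called re.VERBOSE. The HTML expression itself is not used here because, to my knowledge, Python's re rejects a variable-width lookbehind such as (?<=<(\w+)>).

```python
import re

# Inline (?#...) comments.
INLINE = re.compile(r"2[0-4]\d(?#200-249)|25[0-5](?#250-255)|[01]?\d\d?(?#0-199)")

# The same expression with whitespace ignored and # comments enabled.
VERBOSE = re.compile(
    r"""
      2[0-4]\d      # 200-249
    | 25[0-5]       # 250-255
    | [01]?\d\d?    # 0-199
    """,
    re.VERBOSE,
)

samples = ["0", "7", "42", "199", "200", "249", "250", "255", "256", "300"]

inline_ok = [s for s in samples if INLINE.fullmatch(s)]
verbose_ok = [s for s in samples if VERBOSE.fullmatch(s)]

print(inline_ok)
print(verbose_ok)
```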

Greed and laziness

When a regular expression contains a quantifier that accepts repetition, the usual behavior is to match as many characters as possible (while still allowing the whole expression to match). Take a.*b: it matches the longest string that starts with a and ends with b. If you use it to search aabab, it matches the entire string aabab. This is called greedy matching.

Sometimes we need lazy matching instead, that is, matching as few characters as possible. Any of the quantifiers given above can be turned into a lazy one by adding a question mark ? after it. Thus .*? means: match any number of repetitions, but use as few repetitions as possible while still letting the whole match succeed. Now let's look at the lazy version of the example:

a.*?b matches the shortest string that starts with a and ends with b. Applied to aabab, it matches aab (the first to third characters) and ab (the fourth to fifth characters).

Why is the first match aab (the first to third characters) rather than ab (the second to third characters)? Simply put, regular expressions have another rule that takes priority over the lazy/greedy rule: the match that begins earliest wins.
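Greedy and lazy matching on the same input, as a minimal sketch assuming Python's re module; the digit example is illustrative.

```python
import re

greedy = re.findall(r"a.*b", "aabab")
lazy = re.findall(r"a.*?b", "aabab")

# The same idea applies to {n,m}: greedy takes up to 4 digits, lazy stops at 2.
greedy_digits = re.search(r"\d{2,4}", "12345").group(0)
lazy_digits = re.search(r"\d{2,4}?", "12345").group(0)

print(greedy, lazy)
print(greedy_digits, lazy_digits)
```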

Table 5. Lazy quantifiers
Code/syntax    Description
*?             Repeat any number of times, but as few as possible
+?             Repeat one or more times, but as few as possible
??             Repeat zero or one time, but as few as possible
{n,m}?         Repeat n to m times, but as few as possible
{n,}?          Repeat n or more times, but as few as possible

Processing options

In C#, you can use the Regex(String, RegexOptions) constructor to set the processing options of a regular expression, for example: Regex regex = new Regex(@"\ba\w{6}\b", RegexOptions.IgnoreCase);

Several options have been mentioned above, such as ignoring case and processing multiple lines. These options change the way regular expressions are processed. Below are the regular expression options commonly used in .NET:

Table 6. Common processing options
Name                                           Description
IgnoreCase (ignore case)                       Matching is case-insensitive.
Multiline (multi-line mode)                    Changes the meaning of ^ and $ so that they match at the start and end of every line, not just at the start and end of the entire string. (In this mode, the exact meaning of $ is: match the position before \n and the position before the end of the string.)
Singleline (single-line mode)                  Changes the meaning of . so that it matches every character (including the line break \n).
IgnorePatternWhitespace (ignore whitespace)    Ignores unescaped whitespace in the expression and enables comments marked with #.
ExplicitCapture (explicit capture)             Captures only groups that are explicitly named.

A frequently asked question is: can multi-line mode and single-line mode only be used one at a time? The answer is: no. The two options have nothing to do with each other, except that their names are (confusingly) similar.
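The options have close counterparts in other engines. The sketch below assumes Python's re module, where IgnoreCase, Multiline, Singleline, and IgnorePatternWhitespace correspond to re.IGNORECASE, re.MULTILINE, re.DOTALL, and re.VERBOSE; as far as I know, ExplicitCapture has no direct Python counterpart. The sample strings are made up.

```python
import re

# Hypothetical sample text.
text = "Android already apple"

case_sensitive = re.findall(r"\ba\w{6}\b", text)
ignore_case = re.findall(r"\ba\w{6}\b", text, re.IGNORECASE)

# Single-line mode: . also matches the line break.
dot_default = re.search(r"a.b", "a\nb")
dot_all = re.search(r"a.b", "a\nb", re.DOTALL)

# Multi-line and single-line mode are independent and can be combined.
lines_only = re.findall(r"^.+$", "one\ntwo", re.MULTILINE)
both = re.findall(r"^.+$", "one\ntwo", re.MULTILINE | re.DOTALL)

print(case_sensitive, ignore_case)
print(dot_default, dot_all is not None)
print(lines_only, both)
```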

Balancing groups/recursive matching

The balancing group syntax described here is the one supported by .NET Framework. Other languages and libraries do not necessarily support this feature, or support it with a different syntax.

Sometimes we need to match a nested hierarchical structure like ( 100 * ( 50 + 15 ) ). Simply using \(.+\) would only match the content between the leftmost left parenthesis and the rightmost right parenthesis (we are discussing greedy mode here; lazy mode has a similar problem). If the left and right parentheses in the source string do not occur the same number of times, as in ( 5 / ( 3 + 2 ) ) ), then the two counts in our match will not be equal either. Is there a way to match the longest balanced pair of parentheses in such a string?

To keep ( and \( from confusing your brain completely, let's use angle brackets instead of parentheses. Now our question becomes: how can we capture the content inside the longest balanced angle brackets in a string like xx <aa <bbb> aa> yy?

The following syntax constructs are needed:

(?'group') names the captured content group and pushes it onto the stack.
(?'-group') pops from the stack the capture named group that was pushed last; if the stack is empty, this group fails to match.
(?(group)yes|no) continues with the yes part if a capture named group exists on the stack; otherwise it continues with the no part.
(?!) is a zero-width negative lookahead assertion; because it has no subexpression, any attempt to match it always fails.

If you are not a programmer (or you call yourself one but do not know what a stack is), you can understand the first three syntaxes like this: the first writes a "group" on the blackboard, the second erases a "group" from the blackboard, and the third checks whether "group" is still written on the blackboard; if it is, matching continues with the yes part, otherwise with the no part.

What we need to do is push an "Open" each time we encounter a left bracket and pop one each time we encounter a right bracket, and at the end check whether the stack is empty. If it is not empty, there are more left brackets than right brackets, and the match should fail. The regular expression engine then backtracks (giving up some of the first or last characters) and tries to match the entire expression again.

    <                         # the outermost left bracket
        [^<>]*                # content after the outermost left bracket that is not a bracket
        (
            (
                (?'Open'<)    # on a left bracket, write an "Open" on the blackboard
                [^<>]*        # match content after the left bracket that is not a bracket
            )+
            (
                (?'-Open'>)   # on a right bracket, erase one "Open"
                [^<>]*        # match content after the right bracket that is not a bracket
            )+
        )*
        (?(Open)(?!))         # before the outermost right bracket, check whether any "Open" is still on the blackboard; if so, the match fails
    >                         # the outermost right bracket

One of the most common applications of balancing groups is matching HTML. The following example matches nested <div> tags: <div[^>]*>[^<>]*(((?'Open'<div[^>]*>)[^<>]*)+((?'-Open'</div>)[^<>]*)+)*(?(Open)(?!))</div>.
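The blackboard idea does not depend on regular expressions. Python's re module has no balancing groups, so the sketch below swaps in an explicit stack written in plain Python; it is not the .NET mechanism itself, and the function name longest_balanced is hypothetical.

```python
def longest_balanced(text: str, open_ch: str = "<", close_ch: str = ">") -> str:
    """Return the longest substring enclosed by a balanced bracket pair."""
    stack = []   # positions of left brackets that are still open
    best = ""
    for i, ch in enumerate(text):
        if ch == open_ch:
            stack.append(i)          # write an "Open" on the blackboard
        elif ch == close_ch and stack:
            start = stack.pop()      # erase the most recent "Open"
            candidate = text[start:i + 1]
            if len(candidate) > len(best):
                best = candidate
        # a right bracket with an empty stack has no partner and is skipped
    return best


print(longest_balanced("xx <aa <bbb> <bbb> aa> yy"))
print(longest_balanced("( 5 / ( 3 + 2 ) ) )", "(", ")"))
```

Left brackets that are never closed simply stay on the stack, so they never contribute a candidate.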

Other topics not covered

A large number of elements for constructing regular expressions have been described above, but many things remain unmentioned. Below is a list of those elements, with their syntax and a brief description. You can find more detailed references on the Internet to learn about them when you need them. If you have installed the MSDN Library, you can also find detailed documentation on regular expressions in .NET there.

The introduction here is very brief. If you need more detail and do not have the MSDN Library installed on your computer, you can consult the MSDN online documentation on regular expression language elements.

Table 7. Syntax not discussed in detail
Code/syntax        Description
\a                 Alarm (bell) character; printing it makes the computer beep
\b                 Usually a word boundary, but inside a character class it means backspace
\t                 Tab
\r                 Carriage return
\v                 Vertical tab
\f                 Form feed
\n                 Line feed
\e                 Escape
\0nn               The ASCII character whose octal code is nn
\xnn               The ASCII character whose hexadecimal code is nn
\unnnn             The Unicode character whose hexadecimal code is nnnn
\cN                ASCII control character; for example, \cC stands for Ctrl+C
\A                 Start of the string (like ^, but not affected by the multi-line option)
\Z                 End of the string or end of the line (not affected by the multi-line option)
\z                 End of the string (like $, but not affected by the multi-line option)
\G                 Start of the current search
\p{name}           The Unicode character class named name, for example \p{IsGreek}
(?>exp)            Greedy subexpression
(?<x-y>exp)        Balancing group
(?im-nsx:exp)      Changes the processing options within the subexpression exp
(?im-nsx)          Changes the processing options for the rest of the expression
(?(exp)yes|no)     Treats exp as a zero-width positive assertion; if it matches at this position, yes is used as the expression for this group, otherwise no
(?(exp)yes)        Same as above, with an empty expression as no
(?(name)yes|no)    If the group named name has captured content, yes is used as the expression, otherwise no
(?(name)yes)       Same as above, with an empty expression as no

Contact author

Okay, I admit it, I lied to you: reading this takes more than 30 minutes. Believe me, that is my fault, not because you are too stupid. I said "30 minutes" to give you the confidence and patience to keep going. Now that you have read this far, my plot has succeeded. Feels good to be fooled, doesn't it?

If you want to complain to me, or think I could have pulled this off more cleverly, or have any other questions, please come to my blog and let me know.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
