Regular Expressions in 30 Minutes: An Introductory Tutorial


Objective of this article
In 30 minutes you will understand what regular expressions are and pick up enough basic knowledge to use them in your own programs or web pages.


How to use this tutorial

Most importantly, please give me 30 minutes. If you have no experience with regular expressions, don't try to get started in 30 seconds, unless you're Superman :)

Don't be intimidated by the complex expressions below. As long as you follow me step by step, you will find that regular expressions are not as difficult as you think. Of course, it is also normal to finish this tutorial feeling that you understood a lot yet can remember almost none of it. I believe that someone who has never worked with regular expressions has practically no chance of remembering more than 80% of the syntax covered here after a single reading. This tutorial only aims to give you the basic principles; to master regular expressions you need plenty of practice and use.

Besides being an introductory tutorial, this article also tries to serve as a regular expression syntax reference that you can use in your daily work. In the author's own experience it does this job well; after all, I can't keep everything memorized myself, can I?

Text formatting conventions: technical terms; metacharacters and syntax; regular expressions; parts of a regular expression (used for analysis); source strings to be matched; descriptions of a regular expression or of part of it.

Side notes: this article includes side notes, which mainly provide related information or explain basic concepts to readers without a programming background. They can usually be skipped.


What exactly is a regular expression?

A character is the most basic unit of text processed by computer software; it may be a letter, a digit, a punctuation mark, a space, a line break, a Chinese character, and so on. A string is a sequence of 0 or more characters. Text simply means characters, that is, a string. Saying that a string matches a regular expression usually means that some part of the string (or several parts) satisfies the condition given by the expression.

When writing programs or web pages that handle strings, you often need to find strings that follow certain complex rules. Regular expressions are the tool for describing these rules. In other words, a regular expression is code that records a text rule.

You have probably used the wildcards * and ? to look for files under Windows/DOS. If you want to find all the Word documents in a directory, you search for *.doc, where * is interpreted as any string. Like wildcards, regular expressions are a tool for matching text, but they describe what you need far more precisely, at the price of being more complicated. For example, you can write a regular expression that finds every string that starts with 0, followed by 2 or 3 digits, then a hyphen "-", and finally 7 or 8 digits (such as 010-12345678 or 0376-7654321).

Getting started

The best way to learn regular expressions is to start from examples: understand them, then modify them and experiment yourself. A few simple examples follow, each explained in detail.

Suppose you are looking for hi in an English novel. You can use the regular expression hi.

This is almost the simplest regular expression there is. It matches exactly this string: two characters, the first being h and the second i. Tools that handle regular expressions usually provide an option to ignore case; if it is selected, the expression matches any of hi, HI, Hi and hI.

Unfortunately, many words contain the two consecutive characters hi, such as him, history and high. If you search with hi, the hi inside these words is found too. To find the word hi precisely, we should use \bhi\b.

\b is a special code defined by regular expressions (some people call it a metacharacter) that stands for the beginning or end of a word, that is, a word boundary. Although English words are usually separated by spaces, punctuation marks or line breaks, \b does not match any of these separator characters; it matches only a position.

To put it more precisely, \b matches a position where the character before and the character after are not both \w: one is and the other is not, or one of them does not exist.
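
The difference between hi and \bhi\b can be checked with a few lines of code. The sketch below uses Python's re module, which is an assumption of this example (the article itself describes .NET); the sample sentence is made up.

```python
import re

text = "hi there, him and history say hi"

# Plain "hi" also finds the hi inside "him" and "history".
loose = re.findall(r"hi", text)

# \bhi\b only finds hi standing alone as a word.
strict = re.findall(r"\bhi\b", text)

# The spans cover only the two letters: \b itself consumes no characters.
spans = [m.span() for m in re.finditer(r"\bhi\b", text)]

print(len(loose), len(strict), spans)
```

The two strict matches sit at the very start and the very end of the text, where one side of the boundary simply does not exist.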

If you are looking for hi followed, not far behind, by Lucy, you should use \bhi\b.*\bLucy\b.

Here, . is another metacharacter that matches any character except a newline. * is also a metacharacter, but it stands for neither a character nor a position; it stands for a quantity: it specifies that whatever comes just before the * may be repeated any number of times for the whole expression to match. So .* together means any number of characters that are not newlines. Now the meaning of \bhi\b.*\bLucy\b is obvious: first the word hi, then any number of arbitrary characters (but no newline), and finally the word Lucy.

The newline character is '\n', the character whose ASCII code is 10 (hexadecimal 0x0A).
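
A small sketch, again assuming Python's re module rather than .NET and using invented sample sentences, shows that .* bridges the gap between hi and Lucy only when no newline is in the way.

```python
import re

pattern = re.compile(r"\bhi\b.*\bLucy\b")

# hi and Lucy on the same line: .* covers everything in between.
same_line = pattern.search("she said hi to my friend Lucy today")

# . does not match a newline, so hi and Lucy on different lines do not match.
split_lines = pattern.search("she said hi\nto Lucy")

print(same_line.group() if same_line else None, split_lines)
```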

By combining other metacharacters, we can construct more powerful regular expressions. For example:

0\d\d-\d\d\d\d\d\d\d\d matches a string that starts with 0, followed by two digits, then a hyphen "-", and finally 8 digits (that is, a Chinese phone number). Of course, this example only matches numbers with a 3-digit area code.

Here \d is a new metacharacter that matches one digit (0, or 1, or 2, and so on). - is not a metacharacter; it only matches itself, the hyphen (or minus sign, or dash, or whatever you call it).

To avoid all this annoying repetition, we can also write the expression as 0\d{2}-\d{8}. The {2} ({8}) after \d means that the preceding \d must be repeated exactly 2 times (8 times) in a row.
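
To see that the long and the short spelling behave the same, one can run both against a few inputs. This is a minimal sketch in Python's re module (an assumption; the article targets .NET), and the third sample number is made up to show a failure.

```python
import re

long_form = re.compile(r"0\d\d-\d\d\d\d\d\d\d\d")
short_form = re.compile(r"0\d{2}-\d{8}")

samples = ["010-12345678", "0376-7654321", "010-1234567"]

# fullmatch requires the whole sample to fit the pattern.
results = [
    (s, bool(long_form.fullmatch(s)), bool(short_form.fullmatch(s)))
    for s in samples
]

for row in results:
    print(row)
```

Only the first sample has a 3-digit area code and an 8-digit local number, so it is the only one both patterns accept.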

Testing Regular Expressions

If you don't find regular expressions hard to read and write, you are either a genius or not from Earth. Their syntax is a headache even for people who use them often. Because they are hard to read and write, mistakes are easy to make, so it is well worth finding a tool for testing regular expressions.

Some details of regular expressions differ between environments. This tutorial describes the behavior of regular expressions in the Microsoft .NET Framework 4.0, so I recommend Regular Expression Tester, a .NET tool that I wrote. Please refer to that tool's page for how to install and run it.

Other test tools available:

RegexBuddy
JavaScript regular expression online test tool


Metacharacters

You now know several very useful metacharacters, such as \b, ., * and \d. Regular expressions have more of them: for example, \s matches any whitespace character, including spaces, tabs, line breaks, Chinese full-width spaces and so on, and \w matches a letter, a digit, an underscore or a Chinese character.

The special handling of Chinese characters is supported by the regular expression engine that .NET provides. For other environments, see the relevant documentation.

Here are some more examples:

\ba\w*\b matches words that begin with the letter a: first the beginning of a word (\b), then the letter a, then any number of letters or digits (\w*), and finally the end of the word (\b).

Now is a good time to say what "word" means in regular expressions: one or more consecutive \w characters. Yes, it has little to do with the tens of thousands of things of the same name that you had to memorize when learning English :)

\d+ matches 1 or more consecutive digits. + is a metacharacter similar to *; the difference is that * matches any number of repetitions (possibly 0), while + matches 1 or more repetitions.

\b\w{6}\b matches a word of exactly 6 characters.
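
The three examples above can be tried on one sentence. The sketch assumes Python's re module instead of .NET, and the sentence is invented for illustration.

```python
import re

text = "an apple and 42 banana crates cost 1300 yuan"

a_words = re.findall(r"\ba\w*\b", text)      # words that begin with a
numbers = re.findall(r"\d+", text)            # runs of 1 or more digits
six_chars = re.findall(r"\b\w{6}\b", text)    # words of exactly 6 characters

print(a_words, numbers, six_chars)
```

Note that the a inside banana is not found by the first pattern, because no word boundary precedes it.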

Table 1. Commonly used metacharacters

Code    Description
.       Matches any character except a newline
\w      Matches a letter, digit, underscore or Chinese character
\s      Matches any whitespace character
\d      Matches a digit
\b      Matches the beginning or end of a word
^       Matches the beginning of the string
$       Matches the end of the string

A regular expression engine usually provides a method for testing whether a given string matches a regular expression, such as RegExp.test() in JavaScript or Regex.IsMatch() in .NET. Matching here means that some part of the string conforms to the rule of the expression. Without ^ and $, such a method applied to \d{5,12} only guarantees that the string contains 5 to 12 consecutive digits, not that the entire string is 5 to 12 digits.

The metacharacters ^ (on the same key as the digit 6) and $ both match a position, somewhat like \b. ^ matches the beginning of the string being searched and $ matches its end. These two codes are very useful for validating input. For example, if a website requires the QQ number you fill in to be 5 to 12 digits, it can use ^\d{5,12}$.

The {5,12} here is similar to the {2} introduced earlier, except that {2} means exactly 2 repetitions, no more and no fewer, while {5,12} means no fewer than 5 and no more than 12 repetitions; otherwise there is no match.

Because ^ and $ are used, the entire input string has to match \d{5,12}; in other words, the whole input must be 5 to 12 digits. So if the QQ number entered matches this regular expression, it meets the requirement.

Similar to the ignore-case option, some regular expression tools have an option for handling multiple lines. If it is selected, ^ and $ match the beginning and end of each line instead.
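
The difference between "contains" and "is entirely" is easy to demonstrate. The following is a sketch in Python's re module (an assumption; .NET's Regex.IsMatch behaves analogously), and both function names are hypothetical helpers invented for this example.

```python
import re

def is_valid_qq(number: str) -> bool:
    """Hypothetical helper: anchored check for 5 to 12 digits."""
    # Note: $ also matches just before a single trailing newline.
    return re.search(r"^\d{5,12}$", number) is not None

def contains_digit_run(text: str) -> bool:
    """Hypothetical helper: unanchored check, true if 5 to 12 digits occur anywhere."""
    return re.search(r"\d{5,12}", text) is not None

for candidate in ["12345", "1234", "1234567890123", "abc12345"]:
    print(candidate, is_valid_qq(candidate), contains_digit_run(candidate))
```

The 13-digit input passes the unanchored check because it contains a 12-digit run, which is exactly the trap the anchors avoid.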


Character escapes

If you want to look for a metacharacter itself, such as . or *, there is a problem: you cannot specify it directly, because it will be interpreted as something else. You have to use \ to cancel the special meaning of the character. So you should use \. and \*. And, of course, to find \ itself you have to use \\.

For example, deerchao\.net matches deerchao.net, and C:\\Windows matches C:\Windows.
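
A short sketch, assuming Python's re module rather than .NET, shows what goes wrong without the escape. The input deerchaoXnet is made up, and re.escape is a Python convenience that the article does not mention.

```python
import re

unescaped = re.compile(r"deerchao.net")   # . matches any character
escaped = re.compile(r"deerchao\.net")    # \. matches only a literal period

wrong_hit = unescaped.fullmatch("deerchaoXnet") is not None
right_miss = escaped.fullmatch("deerchaoXnet") is None
right_hit = escaped.fullmatch("deerchao.net") is not None

# A literal backslash in the text needs \\ in the pattern.
path = re.search(r"C:\\Windows", r"C:\Windows\System32").group()

# Python can add the escapes for you.
auto = re.escape("a.b*c")

print(wrong_hit, right_miss, right_hit, path, auto)
```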

Repetition

You have already seen *, +, {2} and {5,12} as ways of matching repetition. Here are all the quantifiers in regular expressions (codes that specify a quantity, such as * and {5,12}):

Table 2. Commonly used quantifiers

Code/syntax    Description
*              Repeats 0 or more times
+              Repeats 1 or more times
?              Repeats 0 or 1 times
{n}            Repeats exactly n times
{n,}           Repeats n or more times
{n,m}          Repeats n to m times

Here are some examples that use repetition:

Windows\d+ matches Windows followed by 1 or more digits.

^\w+ matches the first word of a line (or the first word of the whole string; which one depends on the option settings).
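
The quantifiers in Table 2 can be compared side by side on one string. This is a sketch assuming Python's re module (not .NET), with a made-up sample text.

```python
import re

text = "Windows98 Windows2000 WindowsXP"

one_or_more = re.findall(r"Windows\d+", text)      # + : at least one digit
exactly_two = re.findall(r"Windows\d{2}", text)    # {2} : exactly two digits
three_plus = re.findall(r"Windows\d{3,}", text)    # {3,} : three or more digits
optional = re.findall(r"Windows\d?", text)         # ? : zero or one digit
first_word = re.search(r"^\w+", text).group()      # first word of the string

print(one_or_more, exactly_two, three_plus, optional, first_word)
```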

Character classes

Finding digits, letters or whitespace is simple, because there are metacharacters for those sets of characters. But what if you want to match a set of characters that has no predefined metacharacter, such as the vowels a, e, i, o, u?

It's easy: just list them inside square brackets. [aeiou] matches any one English vowel, and [.?!] matches one punctuation mark (. or ? or !).

We can also easily specify a range of characters: [0-9] means exactly the same as \d, one digit; likewise, [a-z0-9A-Z_] is exactly equivalent to \w (if only English is considered).

Here is a more complex expression: \(?0\d{2}[) -]?\d{8}.

"(" and ")" are also metacharacters, covered later in the section on grouping, so they have to be escaped here.

This expression matches phone numbers in several formats, such as (010)88886666, 022-22334455 or 02912345678. Let's analyze it: first comes an escaped \(, which may occur 0 or 1 times (?); then a 0, followed by 2 digits (\d{2}); then one of ), - or a space, which occurs once or not at all (?); and finally 8 digits (\d{8}).
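
The analysis above can be checked mechanically. The sketch below assumes Python's re module in place of .NET and runs the expression against the three formats just listed, plus one malformed input to show how lenient the pattern is.

```python
import re

phone = re.compile(r"\(?0\d{2}[) -]?\d{8}")

well_formed = ["(010)88886666", "022-22334455", "02912345678"]
accepted = [s for s in well_formed if phone.fullmatch(s)]

# The opening parenthesis and the separator are independent of each other,
# so a number with an unclosed parenthesis is accepted as well.
lenient = phone.fullmatch("(022-87654321") is not None

print(accepted, lenient)
```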

Branching conditions

Unfortunately, that expression also matches "incorrect" formats such as 010)12345678 or (022-87654321. To solve this problem we need branching conditions. A branching condition in a regular expression means there are several rules, and a match should succeed if any one of them is satisfied; the way to write it is to separate the rules with |. Still unclear? Look at the examples:

0\d{2}-\d{8}|0\d{3}-\d{7} matches two kinds of hyphenated phone numbers: a 3-digit area code with an 8-digit local number (such as 010-12345678), or a 4-digit area code with a 7-digit local number (such as 0376-2233445).

\(0\d{2}\)[- ]?\d{8}|0\d{2}[- ]?\d{8} matches phone numbers with a 3-digit area code, where the area code may or may not be enclosed in parentheses, and the area code and local number may be separated by a hyphen or a space, or not separated at all. As an exercise, try using branching conditions to extend this expression to support 4-digit area codes too.

\d{5}-\d{4}|\d{5} matches United States postal codes. The rule for a U.S. ZIP code is 5 digits, or 9 digits with a hyphen after the fifth. This example is here because it illustrates a problem: when using branching conditions, pay attention to the order of the conditions. If you change the expression to \d{5}|\d{5}-\d{4}, it will only match 5-digit ZIP codes (and the first 5 digits of 9-digit ones). The reason is that branches are tested from left to right, and once one branch is satisfied the remaining ones are not tried.
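
The effect of branch order shows up immediately on a 9-digit ZIP code. This is a minimal sketch assuming Python's re module, whose alternation is also tried left to right; the ZIP code is made up.

```python
import re

zip_code = "12345-6789"

# Longer branch first: the full 9-digit form is found.
long_first = re.search(r"\d{5}-\d{4}|\d{5}", zip_code).group()

# Shorter branch first: it succeeds at once and the second branch is never tried.
short_first = re.search(r"\d{5}|\d{5}-\d{4}", zip_code).group()

print(long_first, short_first)
```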

Grouping

We have seen how to repeat a single character (just put a quantifier after it), but what if you want to repeat several characters? You can use parentheses to specify a subexpression (also called a group); you can then specify how many times the subexpression repeats, and do other things with it as well (described later).

(\d{1,3}\.){3}\d{1,3} is a simple expression for matching IP addresses. To understand it, analyze it in this order: \d{1,3} matches a number of 1 to 3 digits; (\d{1,3}\.){3} matches a number of 1 to 3 digits followed by a period (together these form the group), repeated 3 times; and finally comes one more number of 1 to 3 digits (\d{1,3}).

No number in an IP address can be greater than 255. Don't let the writers of season 3 of "24" fool you...

Unfortunately, it also matches impossible IP addresses such as 256.300.888.999. If arithmetic comparisons were available the problem would be easy to solve, but regular expressions provide no mathematical functions, so we have to use lengthy grouping, alternatives and character classes to describe a correct IP address: ((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?).

The key to understanding this expression is understanding 2[0-4]\d|25[0-5]|[01]?\d\d?. I will not go into detail here; you should be able to work out its meaning yourself.
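
Both IP expressions can be compared directly. The sketch assumes Python's re module instead of .NET; the variable name octet is introduced here only to avoid typing the sub-expression twice.

```python
import re

loose_ip = re.compile(r"(\d{1,3}\.){3}\d{1,3}")

# One number from 0 to 255: 200-249, or 250-255, or 0-199.
octet = r"(2[0-4]\d|25[0-5]|[01]?\d\d?)"
strict_ip = re.compile("(" + octet + r"\.){3}" + octet)

samples = ["192.168.1.1", "255.255.255.255", "0.0.0.0", "256.300.888.999"]
loose_ok = [s for s in samples if loose_ip.fullmatch(s)]
strict_ok = [s for s in samples if strict_ip.fullmatch(s)]

print(loose_ok)
print(strict_ok)
```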

Negation

Sometimes you need to find characters that do not belong to some easily defined character class. For example, to find any character other than a digit, you need negation:

Table 3. Commonly used negation codes

Code/syntax    Description
\W             Matches any character that is not a letter, digit, underscore or Chinese character
\S             Matches any character that is not whitespace
\D             Matches any character that is not a digit
\B             Matches a position that is not the beginning or end of a word
[^x]           Matches any character other than x
[^aeiou]       Matches any character other than the letters a, e, i, o, u

Example: \S+ matches a string that contains no whitespace characters.

<a[^>]+> matches a string enclosed in angle brackets that starts with a.
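
The negated codes can be tried on a small piece of markup. This is a sketch assuming Python's re module (not .NET); the HTML fragment and the phone string are invented.

```python
import re

html = 'click <a href="x.html">here</a> or <b>there</b>'

# \S+ : maximal runs of non-whitespace characters.
chunks = re.findall(r"\S+", html)

# <a[^>]+> : an opening a tag, up to the first closing angle bracket.
a_tags = re.findall(r"<a[^>]+>", html)

# \D+ : runs of characters that are not digits.
non_digits = re.findall(r"\D+", "tel 010-12345678")

print(chunks)
print(a_tags)
print(non_digits)
```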

Backreferences

After you use parentheses to specify a subexpression, the text matched by that subexpression (that is, what the group captures) can be processed further, either in the expression itself or in other code. By default each group automatically gets a group number: counting from left to right by each group's opening parenthesis, the first group is 1, the second is 2, and so on.

Uh... actually, group numbers are not assigned quite as simply as I just said:
Group 0 corresponds to the entire regular expression.
Group numbers are actually assigned in two left-to-right scans: the first pass numbers only the unnamed groups and the second pass numbers only the named groups, so every named group has a larger number than every unnamed group.
You can use the (?:exp) syntax to deny a group a group number altogether.

A backreference is used to search again for text matched by an earlier group. For example, \1 stands for the text matched by group 1. Hard to understand? Look at the examples:

\b(\w+)\b\s+\1\b can be used to match repeated words, like go go or kitty kitty. The expression starts with a word, that is, one or more letters or digits between the beginning and the end of a word (\b(\w+)\b); this word is captured into group number 1. Then come 1 or more whitespace characters (\s+), and finally the content captured by group 1, that is, the word matched before (\1).

You can also give a subexpression a group name yourself, with this syntax: (?<Word>\w+) (the angle brackets may be replaced by single quotes: (?'Word'\w+)). This names the \w+ group Word. To backreference what this group captured, use \k<Word>, so the previous example can also be written as \b(?<Word>\w+)\b\s+\k<Word>\b.
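
Here is a sketch of the repeated-word search, assuming Python's re module. Be aware that Python spells named groups differently from .NET: (?P<word>...) defines the group and (?P=word) refers back to it, instead of (?<Word>...) and \k<Word>. The sample sentence is made up.

```python
import re

text = "go go to the the kitty kitty shop"

# Numbered backreference: \1 repeats whatever group 1 captured.
numbered = re.compile(r"\b(\w+)\b\s+\1\b")
repeated = [m.group() for m in numbered.finditer(text)]

# Named backreference, in Python's own syntax.
named = re.compile(r"\b(?P<word>\w+)\b\s+(?P=word)\b")
words = [m.group("word") for m in named.finditer(text)]

print(repeated, words)
```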

Parentheses are involved in many special-purpose syntaxes. Some of the most commonly used are listed below:

Table 4. Common grouping syntax

Classification          Code/syntax     Description
Capture                 (exp)           Matches exp and captures the text into an automatically named group
                        (?<name>exp)    Matches exp and captures the text into a group named name; can also be written (?'name'exp)
                        (?:exp)         Matches exp without capturing the matched text and without assigning a group number
Zero-width assertion    (?=exp)         Matches the position before exp
                        (?<=exp)        Matches the position after exp
                        (?!exp)         Matches a position that is not followed by exp
                        (?<!exp)        Matches a position that is not preceded by exp
Comment                 (?#comment)     Has no effect on how the regular expression is processed; provides a comment for human readers

We have already discussed the first two. The third, (?:exp), does not change how the regular expression is processed; it is just that what the group matches is not captured into a group as with the first two, and the group gets no group number. "Why would I want that?" Good question. What do you think?

Zero-width assertions

Earthlings, do you find these terms too complicated and too hard to remember? So do I. It is enough to know that such a thing exists; as for what it is called, let it go. A man without a name can devote himself to practicing his sword; a thing without a name can be picked up or put down freely...

The next four are used to find something before or after certain content without including that content. In other words, like \b, ^ and $, they specify a position, one that must satisfy a certain condition (the assertion), which is why they are called zero-width assertions. Examples explain this best:

An assertion declares a fact that should be true. A regular expression continues matching only if the assertion is true.

(?=exp), also called a zero-width positive lookahead assertion, asserts that what follows its position matches the expression exp. For example, \b\w+(?=ing\b) matches the front part of words ending in ing (everything but the ing): searching I'm singing while you're dancing. finds sing and danc.

(?<=exp), also called a zero-width positive lookbehind assertion, asserts that what precedes its position matches the expression exp. For example, (?<=\bre)\w+\b matches the latter part of words beginning with re (everything but the re): searching reading a book finds ading.

Suppose you want to add a comma after every three digits of a very long number (counting from the right, of course). You can find the parts that need commas in front of and inside them like this: ((?<=\d)\d{3})+\b. Searching 1234567890 with it gives 234567890.

The following example uses both kinds of assertion: (?<=\s)\d+(?=\s) matches numbers delimited by whitespace (once again, not including the whitespace).
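
The four lookaround examples above can be run as written. The sketch assumes Python's re module rather than .NET; Python only accepts fixed-width lookbehind, which all of these happen to be. The last sample string is invented.

```python
import re

# Lookahead: the stem of words ending in ing, without the ing.
stems = re.findall(r"\b\w+(?=ing\b)", "I'm singing while you're dancing.")

# Lookbehind: the rest of words beginning with re, without the re.
rests = re.findall(r"(?<=\bre)\w+\b", "reading a book")

# Groups of three digits counted from the right, each preceded by a digit.
grouped = re.search(r"((?<=\d)\d{3})+\b", "1234567890").group()

# Both kinds together: numbers with whitespace on each side.
isolated = re.findall(r"(?<=\s)\d+(?=\s)", "a 12 b 345 c6 7")

print(stems, rests, grouped, isolated)
```

The final 7 is not found because nothing, not even whitespace, follows it.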

Negative zero-width assertions

Earlier we saw how to find a character that is not a certain character, or not in a certain character class (negation). But what if we only want to make sure a certain character does not appear, without matching anything in its place? For example, to find words that contain the letter q not followed by the letter u, we might try this:

\b\w*q[^u]\w*\b matches words containing a q that is followed by something other than u. But if you test more (or are sharp enough to see it directly), you will find that the expression goes wrong when q is the last letter of a word, as in Iraq or Benq. That is because [^u] always matches one character: if q is the last character of the word, [^u] matches the word separator after it (a space, a period, or whatever), and the following \w*\b then matches the next word, so \b\w*q[^u]\w*\b ends up matching the whole of Iraq fighting. A negative zero-width assertion solves this problem, because it only matches a position and consumes no characters. We can now solve the problem like this: \b\w*q(?!u)\w*\b.

The zero-width negative lookahead assertion (?!exp) asserts that what follows its position does not match the expression exp. For example, \d{3}(?!\d) matches three digits that are not followed by another digit, and \b((?!abc)\w)+\b matches words that do not contain the consecutive string abc.

Similarly, we can use (?<!exp), the zero-width negative lookbehind assertion, to assert that what precedes its position does not match the expression exp: (?<![a-z])\d{7} matches seven digits that are not preceded by a lowercase letter.
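
The Iraq fighting problem and its fix can be reproduced in a few lines. This is a sketch assuming Python's re module in place of .NET; apart from Iraq fighting, the sample strings are made up.

```python
import re

text = "Iraq fighting, a quick question"

# [^u] must consume a character, so it swallows the space after Iraq.
naive = re.findall(r"\b\w*q[^u]\w*\b", text)

# (?!u) only inspects the next position and consumes nothing.
fixed = re.findall(r"\b\w*q(?!u)\w*\b", text)

# Three digits not followed by another digit.
triples = re.findall(r"\d{3}(?!\d)", "1234 567")

# Words that do not contain abc (finditer, because the pattern has a group).
no_abc = [m.group() for m in re.finditer(r"\b((?!abc)\w)+\b", "cab abcde xabcx table")]

# Seven digits not preceded by a lowercase letter.
sevens = re.findall(r"(?<![a-z])\d{7}", "a1234567 B7654321")

print(naive, fixed, triples, no_abc, sevens)
```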

Please analyze the expression (?<=<(\w+)>).*(?=<\/\1>) carefully; it shows the real purpose of zero-width assertions best.

A more complex example: (?<=<(\w+)>).*(?=<\/\1>) matches the content inside a simple HTML tag that has no attributes. (?<=<(\w+)>) specifies the prefix: a word enclosed in angle brackets (for example, <b>). Then comes .* (any string), and finally the suffix (?=<\/\1>). Note the \/ in the suffix, which uses the character escaping mentioned earlier; \1 is a backreference to the first captured group, the content matched by the earlier (\w+), so if the prefix is actually <b>, the suffix is </b>. The whole expression matches the content between <b> and </b> (once again, not including the prefix and suffix themselves).

Comments

Another use of parentheses is to include comments with the (?#comment) syntax. For example: 2[0-4]\d(?#200-249)|25[0-5](?#250-255)|[01]?\d\d?(?#0-199).

If you want to include comments, it is best to enable the "ignore whitespace in pattern" option, so that you can add spaces, tabs and line breaks freely when writing the expression; they are ignored when it is actually used. With this option enabled, everything from a # to the end of its line is also ignored as a comment. For example, we can write one of the earlier expressions like this:

(?<=    # assert the prefix of the text to be matched
<(\w+)> # find letters or digits enclosed in angle brackets (that is, an HTML/XML tag)
)       # end of prefix
.*      # match any text
(?=     # assert the suffix of the text to be matched
<\/\1>  # find the bracketed content: a "/" followed by the previously captured tag
)       # end of suffix
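
Both commenting styles can be compared on the 0 to 255 expression. The sketch assumes Python's re module, where the "ignore whitespace in pattern" option is called re.VERBOSE. The tag-content expression itself is not used here, because Python does not accept its variable-width lookbehind.

```python
import re

# Inline (?#...) comments work without any option.
inline = re.compile(r"2[0-4]\d(?#200-249)|25[0-5](?#250-255)|[01]?\d\d?(?#0-199)")

# With re.VERBOSE, whitespace is ignored and # starts a comment.
verbose = re.compile(r"""
      2[0-4]\d      # 200-249
    | 25[0-5]       # 250-255
    | [01]?\d\d?    # 0-199
""", re.VERBOSE)

accepted_inline = [n for n in range(1000) if inline.fullmatch(str(n))]
accepted_verbose = [n for n in range(1000) if verbose.fullmatch(str(n))]

print(accepted_inline == accepted_verbose, max(accepted_verbose))
```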

Greed and laziness

When a regular expression contains a quantifier that accepts repetition, the usual behavior is to match as many characters as possible (provided the whole expression can still match). Take a.*b: it matches the longest string that starts with a and ends with b. If you search aabab with it, it matches the entire string aabab. This is called greedy matching.

Sometimes we need lazy matching instead, that is, matching as few characters as possible. Any of the quantifiers given earlier can be turned into its lazy form by adding a question mark after it. Thus .*? means: match any number of repetitions, but use as few as possible while still letting the whole match succeed. Now look at the lazy version of the example:

a.*?b matches the shortest string that starts with a and ends with b. Applied to aabab, it matches aab (the first to third characters) and ab (the fourth to fifth characters).

Why is the first match aab (first to third characters) rather than ab (second to third characters)? Put simply, regular expressions have another rule that takes precedence over lazy/greedy: the match that begins earliest wins.
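
The aabab example runs as follows in Python's re module (an assumption of this sketch; the article describes .NET, which gives the same results for these two patterns).

```python
import re

text = "aabab"

greedy = re.findall(r"a.*b", text)    # as many characters as possible
lazy = re.findall(r"a.*?b", text)     # as few characters as possible

# Start positions confirm that the earliest possible start wins.
lazy_spans = [m.span() for m in re.finditer(r"a.*?b", text)]

print(greedy, lazy, lazy_spans)
```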

Table 5. Lazy quantifiers

Code/syntax    Description
*?             Repeats any number of times, but as few as possible
+?             Repeats 1 or more times, but as few as possible
??             Repeats 0 or 1 times, but as few as possible
{n,m}?         Repeats n to m times, but as few as possible
{n,}?          Repeats n or more times, but as few as possible

Processing options

In C#, you can use the Regex(String, RegexOptions) constructor to set the processing options for a regular expression, for example: Regex regex = new Regex(@"\ba\w{6}\b", RegexOptions.IgnoreCase);

These options change how a regular expression is processed, such as ignoring case or handling multiple lines. Here are the regular expression options commonly used in .NET:

Table 6. Common processing options

Name                                           Description
IgnoreCase (ignore case)                       Matching is case-insensitive.
Multiline (multi-line mode)                    Changes the meaning of ^ and $ so that they match at the beginning and end of every line, not just of the whole string. (In this mode, the exact meaning of $ is: match the position before \n, as well as the position before the end of the string.)
Singleline (single-line mode)                  Changes the meaning of . so that it matches every character (including the newline \n).
IgnorePatternWhitespace (ignore whitespace)    Ignores unescaped whitespace in the expression and enables comments marked with #.
ExplicitCapture (explicit capture)             Captures only groups that have been explicitly named.

A frequently asked question is: can only one of multi-line mode and single-line mode be used at a time? The answer is no. The two options have nothing to do with each other, apart from their similar (and therefore confusing) names.
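
For comparison, here is a sketch of the corresponding options in Python's re module, which is an assumption of this example since the table describes .NET. Python's rough counterparts are re.IGNORECASE for IgnoreCase, re.MULTILINE for Multiline and re.DOTALL for Singleline. The sample strings are made up.

```python
import re

# Ignore case: the same pattern as the C# line above.
seven_letter_a = re.findall(r"\ba\w{6}\b", "Android amazing ABSOLUTE", re.IGNORECASE)

lines = "one two\nthree four"
first_of_string = re.findall(r"^\w+", lines)               # ^ only at the very start
first_of_lines = re.findall(r"^\w+", lines, re.MULTILINE)  # ^ at every line start

# Dot-all ("single-line") mode lets . match the newline as well.
without_dotall = re.search(r"a.*b", "a\nb")
with_dotall = re.search(r"a.*b", "a\nb", re.DOTALL)

# The two modes are independent and can be combined.
both = re.search(r"^b.c$", "a\nb\nc\nd", re.MULTILINE | re.DOTALL)

print(seven_letter_a, first_of_string, first_of_lines)
print(without_dotall, with_dotall.group() == "a\nb", both.group() == "b\nc")
```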

Balancing groups / recursive matching

The balancing group syntax described here is supported by the .NET Framework; other languages and libraries may not support this feature, or may support it with a different syntax.

Sometimes we need to match a nestable hierarchical structure such as (100 * (50 + 15)). Simply using \(.+\) would only match the content between the leftmost opening parenthesis and the rightmost closing parenthesis (this is greedy mode; lazy mode has a problem of its own). And if the opening and closing parentheses in the source string are not equal in number, as in (5 / (3 + 2))), they will not be equal in our match either. Is there a way to match the longest properly paired content between parentheses in such a string?

To keep ( and \( from completely confusing your brain, let's use angle brackets instead of parentheses. Our question now becomes: in a string such as xx <aa <bbb> <bbb> aa> yy, how do we capture the longest properly paired content within angle brackets?

This requires the following syntax constructs:

(?'group') names the captured content group and pushes it onto the stack
(?'-group') pops the most recently pushed capture named group off the stack; if the stack is empty, matching this group fails
(?(group)yes|no) if a capture named group exists on the stack, continues matching with the yes part, otherwise with the no part
(?!) a zero-width negative lookahead assertion; since it has no sub-expression, any attempt to match it always fails
If you are not a programmer (or you call yourself one but don't know what a stack is), read the first three constructs above like this: the first writes a "group" on the blackboard, the second erases a "group" from the blackboard, and the third checks whether any "group" is still on the blackboard; if so, matching continues with the yes part, otherwise with the no part.

What we need to do is push an "Open" every time we meet an opening bracket and pop one every time we meet a closing bracket. At the end we check whether the stack is empty: if it is not, there are more opening brackets than closing ones and the match should fail. The regular expression engine will then backtrack (giving up some characters at the front or the end) and try to match the whole expression again.

<                         # the outermost opening bracket
    [^<>]*                # content after the outermost opening bracket that is not a bracket
    (
        (
            (?'Open'<)    # met an opening bracket: write an "Open" on the blackboard
            [^<>]*        # match content after the opening bracket that is not a bracket
        )+
        (
            (?'-Open'>)   # met a closing bracket: erase an "Open"
            [^<>]*        # match content after the closing bracket that is not a bracket
        )+
    )*
    (?(Open)(?!))         # before the outermost closing bracket, check whether any "Open" remains on the blackboard; if so, the match fails
>                         # the outermost closing bracket

One of the most common uses of balancing groups is matching HTML. The following example matches nested <div> tags: <div[^>]*>[^<>]*(((?'Open'<div[^>]*>)[^<>]*)+((?'-Open'</div>)[^<>]*)+)*(?(Open)(?!))</div>.
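
Python's re module has no balancing groups, so the expressions above cannot be run there. The blackboard idea itself can still be sketched by hand. The function below is a hypothetical illustration, not part of any library: it replaces the stack with a simple depth counter and, like the regex engine, gives up on a starting bracket that never gets closed and tries the next one.

```python
from typing import Optional

def first_balanced(text: str, open_ch: str = "<", close_ch: str = ">") -> Optional[str]:
    """Hypothetical helper: return the first properly paired bracketed span, or None."""
    for start, ch in enumerate(text):
        if ch != open_ch:
            continue
        depth = 0
        for end in range(start, len(text)):
            if text[end] == open_ch:
                depth += 1      # write one "Open" on the blackboard
            elif text[end] == close_ch:
                depth -= 1      # erase one "Open"
                if depth == 0:  # blackboard is clean: the span is balanced
                    return text[start:end + 1]
        # This opening bracket was never closed; try a later starting point.
    return None

print(first_balanced("xx <aa <bbb> <bbb> aa> yy"))
print(first_balanced("(5 / (3 + 2)))", "(", ")"))
```

This only mirrors the counting logic; it does not reproduce everything the .NET expression does, such as matching multi-character delimiters like <div> and </div>.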

Other things not covered

A large number of elements for constructing regular expressions have been described above, but many others have not been mentioned. The table below lists them, with their syntax and a brief description. You can find more detailed references online when you need them. If you have the MSDN Library installed, you can also find detailed documentation on regular expressions in .NET there.

The descriptions here are brief. If you need more detail and do not have the MSDN Library installed on your computer, you can consult the MSDN online documentation on regular expression language elements.

Table 7. Syntax not discussed in detail

Code/syntax        Description
\a                 Alarm character (printing it makes the computer beep)
\b                 Usually a word boundary, but stands for backspace when used inside a character class
\t                 Tab
\r                 Carriage return
\v                 Vertical tab
\f                 Form feed
\n                 Newline
\e                 Escape
\0nn               The ASCII character with octal code nn
\xnn               The ASCII character with hexadecimal code nn
\unnnn             The Unicode character with hexadecimal code nnnn
\cN                An ASCII control character; for example, \cC stands for Ctrl+C
\A                 Beginning of the string (like ^, but not affected by the multi-line option)
\Z                 End of the string or end of a line (not affected by the multi-line option)
\z                 End of the string (like $, but not affected by the multi-line option)
\G                 Beginning of the current search
\p{name}           The Unicode character class named name, for example \p{IsGreek}
(?>exp)            Greedy subexpression
(?<x>-<y>exp)      Balancing group
(?im-nsx:exp)      Changes the processing options within the subexpression exp
(?im-nsx)          Changes the processing options for the rest of the expression
(?(exp)yes|no)     Treats exp as a zero-width positive lookahead assertion; if it matches at this position, yes is used as the expression for this group, otherwise no
(?(exp)yes)        Same as above, with an empty expression as no
(?(name)yes|no)    If the group named name has captured something, yes is used as the expression, otherwise no
(?(name)yes)       Same as above, with an empty expression as no

Contact author

Well, I admit it: I lied to you. You have certainly spent more than 30 minutes getting this far. Believe me, that is my fault, not because you are slow. I said "30 minutes" to give you the confidence and patience to keep going. Since you have read this far, my plot succeeded. Feels great to be fooled, doesn't it?

To complain about me, to tell me I could do better, or to ask any other question, you are welcome to leave me a message on my blog.
