Regular Expression Notes

Last Update: 2018-12-07 Source: Internet

Author: User


My real understanding of regular expressions (RE) begins with this article. The subject runs deep, so I have sorted out and reviewed what I have learned.

A regular expression is code that describes rules for text.

As stated above, an RE describes a rule for how characters are arranged. It has two elements:

1, Expression form:

The form consists of subjects and the way they are combined.

A) Subject: the characters themselves. A character in a regular expression may stand for one specific character or for a whole class of characters. Either way, the subjects are visible in the match.

B) Combination method: that is, position. The same characters placed in different positions, or arranged in a different order, describe different strings.

Therefore, this part of an expression can be read directly from the matching results.

.	Any character other than the line feed "\n"
\b	The start or end of a word, that is, a word boundary [\b matches a position where the characters before and after it are not both \w]. A word in RE is a run of one or more consecutive \w characters
\d	One digit (0, 1, 2, ..., 9)
\s	Any whitespace character: space, tab, line break, Chinese fullwidth space
\w	A letter, digit, underscore, or Chinese character
^	Start of the string
$	End of the string
+	Repeat the preceding item 1 or more times
*	Asterisk: repeat the preceding item any number of times, that is, 0 or more times
?	Repeat the preceding item 0 or 1 time
{n}	Repeat n times
{n,}	Repeat n or more times
{n,m}	Repeat n to m times
\	Escape: cancels the special meaning of a symbol so that it matches literally: \ is written \\, . is written \., ( is written \(, ) is written \)
[aeiou]	Matches one character. The candidates are a, e, i, o, u.
[0-9]	Matches one digit. The candidates are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
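
The entries in the table above can be tried out quickly. The following is a minimal sketch that assumes Python's built-in `re` module; the sample strings and variable names are made up for illustration, and other regex engines may differ in details.

```python
import re

text = "Call 555 1234 or 42 today"

# \d+ matches each run of one or more digits.
digit_runs = re.findall(r"\d+", text)

# \b\w{5}\b matches whole words of exactly five word characters.
five_char_words = re.findall(r"\b\w{5}\b", text)

# ^ and $ anchor the pattern to the start and end of the string.
is_all_digits = re.search(r"^\d+$", "20181207") is not None

# [aeiou] matches one character taken from the set.
vowels = re.findall(r"[aeiou]", "regex")

# An escaped dot matches a literal ".", an unescaped dot matches any character.
literal_dots = re.findall(r"\.", "a.b")
any_chars = re.findall(r".", "a.b")

print(digit_runs, five_char_words, is_all_digits, vowels, literal_dots, any_chars)
```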

Negation: an uppercase class means the opposite of its lowercase counterpart

\W	Any character that is not a letter, digit, underscore, or Chinese character
\S	Any non-whitespace character
\D	Any non-digit character
\B	A position that is not the start or end of a word
[^x]	Any character except x
[^aeiou]	Any character except a, e, i, o, u
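
A short sketch of the negated classes, again assuming Python's `re` module and an invented sample string:

```python
import re

sample = "id_42: ok!"

# \W: characters that are not letters, digits, or underscores.
non_word_chars = re.findall(r"\W", sample)

# \D+: runs of characters that are not digits.
non_digit_runs = re.findall(r"\D+", sample)

# [^aeiou]: any single character that is not one of a, e, i, o, u.
non_vowels = re.findall(r"[^aeiou]", "regex")

print(non_word_chars, non_digit_runs, non_vowels)
```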

2, Expression method:

The expression method is about how to describe rules reasonably, accurately, and concisely. It covers branches, groups, backreferences, zero-width assertions, greedy and lazy matching, recursive matching, and so on.

Some people may say that the items just listed are also reflected directly in the results. My point is that the expression method is about using these techniques to make an expression more concise and accurate. In fact, some of them are indispensable: the backreference is a clear example.

Branch Condition

Use "|" to separate alternative rules.
Eg.
> Domestic fixed telephone: 0\d{2}-\d{8}|0\d{3}-\d{7} A three-digit or four-digit area code, separated from the number by "-".
> China's fixed telephone with a three- or four-digit area code. The area code may be wrapped in parentheses, and it may be separated from the number by a hyphen (-), a space, or nothing:
\(?0\d{2}\)?[- ]?\d{8}|\(?0\d{3}\)?[- ]?\d{7}
Bug: for 01234567890 the expression cannot tell whether the area code has three or four digits, and malformed input with an unbalanced parenthesis, such as (012-34567890, also matches.
((\d{11})|^((\d{7,8})|(\d{4}|\d{3})-(\d{7,8})|(\d{4}|\d{3})-(\d{7,8})-(\d{4}|\d{3}|\d{2}|\d{1})|(\d{7,8})-(\d{4}|\d{3}|\d{2}|\d{1}))$)

Supports mobile phone numbers, 3- or 4-digit area codes, 7- or 8-digit direct-dial numbers, and 1- to 4-digit extension numbers.
> U.S. postal code: \d{5}-\d{4}|\d{5} Nine digits with a "-" after the first five, or five digits only.
(Note the order of the branches. If it is \d{5}|\d{5}-\d{4}, only a five-digit zip code, or the first five digits of a nine-digit one, is matched: branches are tried from left to right, and once one branch matches, the remaining branches are not tried.)
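
The branch examples can be checked with a small sketch. It assumes Python's `re` module; the phone numbers are invented test strings, and `fullmatch` is used so that the whole string has to fit the pattern.

```python
import re

# Area code optionally in parentheses, then an optional "-" or space.
phone = re.compile(r"\(?0\d{2}\)?[- ]?\d{8}|\(?0\d{3}\)?[- ]?\d{7}")

candidates = ["010-12345678", "(0123) 1234567", "01012345678", "12345"]
accepted = [s for s in candidates if phone.fullmatch(s)]

# Each parenthesis is optional on its own, so unbalanced input also passes.
accepts_unbalanced = phone.fullmatch("(010-12345678") is not None

# Branch order matters: the first branch that matches wins.
zip_long_first = re.search(r"\d{5}-\d{4}|\d{5}", "12345-6789").group()
zip_short_first = re.search(r"\d{5}|\d{5}-\d{4}", "12345-6789").group()

print(accepted, accepts_unbalanced, zip_long_first, zip_short_first)
```

Putting the longer branch first is the simplest way to keep the short branch from cutting the match off early.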

Group

To repeat a group of characters, wrap them in () to form a subexpression (group).
Eg.
> IP Address:
((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)
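
A minimal sketch of the IP address pattern, assuming Python's `re` module. The names `OCTET` and `is_ipv4` are hypothetical helpers introduced here, and `fullmatch` is used because an unanchored search would happily match the tail "56.1.1.1" of "256.1.1.1".

```python
import re

# One part of the address: 200-249, 250-255, or 0-199.
OCTET = r"(2[0-4]\d|25[0-5]|[01]?\d\d?)"

# Three "part followed by a dot" groups, then a final part.
ipv4 = re.compile(r"(" + OCTET + r"\.){3}" + OCTET)


def is_ipv4(text):
    """Return True if the whole string looks like a dotted IPv4 address."""
    return ipv4.fullmatch(text) is not None


samples = ["192.168.1.1", "255.255.255.0", "256.1.1.1", "1.2.3"]
results = {s: is_ipv4(s) for s in samples}
print(results)
```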

Backward reference

After a subexpression (group) is defined, it can be referenced by its number. Groups are numbered from 1 by default, and \1 stands for the text matched by group 1. Group 0 is the text matched by the entire regular expression.
Eg.
Repeated words, such as "go go"
\b(\w+)\b\s+\1\b
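
The repeated-word pattern can be sketched as follows, assuming Python's `re` module and an invented sentence:

```python
import re

# \1 must match the same text that group 1 captured.
repeated = re.compile(r"\b(\w+)\b\s+\1\b")

text = "Let's go go to the the park"
doubled_words = [m.group(1) for m in repeated.finditer(text)]
full_matches = [m.group(0) for m in repeated.finditer(text)]

print(doubled_words, full_matches)
```

Here `group(0)` is the whole match and `group(1)` is the captured word, which mirrors the group numbering described above.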
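You can also name a group yourself: (?<GroupName>expr) or (?'GroupName'expr). A named group is referenced with \k<GroupName>.
[Group numbers are assigned in two passes: 1. unnamed groups are numbered first; 2. named groups are numbered afterwards]
Eg.
> IP Address:
((?<IP>2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}\k<IP>
Note that \k<IP> repeats the text that the IP group captured last, not its pattern, so this expression only matches addresses whose last two parts are identical, such as 192.168.1.1.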
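
Named groups can be tried in Python as well, with one caveat: Python's `re` module spells them `(?P<name>expr)` and `(?P=name)` rather than `(?<name>expr)` and `\k<name>`. The sketch below uses that Python spelling and invented test strings, and it shows that a backreference repeats the captured text rather than re-running the subpattern.

```python
import re

# Python spells a named group (?P<name>...) and its backreference (?P=name).
doubled = re.compile(r"(?P<word>\w+)-(?P=word)")

same = doubled.fullmatch("ab-ab")
different = doubled.fullmatch("ab-cd")

# The named IP example, rewritten in Python syntax.
ip_backref = re.compile(r"((?P<IP>2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(?P=IP)")

# The backreference only accepts a repeat of the last captured part.
repeats_last_part = ip_backref.fullmatch("10.0.7.7") is not None
new_last_part = ip_backref.fullmatch("10.0.7.8") is not None

print(same.group("word"), different, repeats_last_part, new_last_part)
```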

Zero-Width Assertion

"Zero width" means that the construct does not consume any character of the string being matched.
"Assertion" means a condition that must hold; if it does not, the match fails at that position.
A positive assertion makes sure that certain characters appear next to the matched string.
(?=exp) Zero-width positive lookahead assertion: the expression exp must match immediately after the position of the assertion.
That is, the matched string is followed by exp.
Eg.
> The part before "ing" of each word that ends with "ing":
\b\w+(?=ing\b)
In "Rolling in the deep" it matches "Roll".
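
A minimal sketch of the lookahead example, assuming Python's `re` module:

```python
import re

# \w+ is matched; "ing" is only checked, not consumed.
stems = re.findall(r"\b\w+(?=ing\b)", "Rolling in the deep")

# For comparison, without the lookahead the suffix is part of the match.
whole_words = re.findall(r"\b\w+ing\b", "Rolling in the deep")

print(stems, whole_words)
```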
(?<=exp) Zero-width positive lookbehind assertion: the expression exp must match immediately before the position of the assertion.
That is, the matched string is preceded by exp.
Eg.
> The part after "re" of each word that starts with "re":
(?<=\bre)\w+\b
In "reading a book" it matches "ading".
> Numbers delimited by whitespace (excluding the whitespace itself):
(?<=\s)\d+(?=\s)
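
The two lookbehind examples can be sketched like this, assuming Python's `re` module and made-up sample strings:

```python
import re

# The "re" prefix is required before the match but is not part of it.
tails = re.findall(r"(?<=\bre)\w+\b", "reading a book, then rewrite")

# Whitespace is required on both sides but is never consumed, so
# neighbouring numbers can share the single space between them.
numbers = re.findall(r"(?<=\s)\d+(?=\s)", "x 12 345 y")

# At the very start or end of the string there is no whitespace to see.
edges = re.findall(r"(?<=\s)\d+(?=\s)", "10 20 30")

print(tails, numbers, edges)
```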

Negative Zero-Width Assertion

A negative assertion makes sure that certain characters do not appear next to the matched string.
(?!exp) Zero-width negative lookahead assertion: the expression exp must not match immediately after this position.
That is, the matched string cannot be followed by exp.
Eg.
> Words that contain a q which is not followed by the letter u:
\b\w*q(?!u)\w*\b
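
A short sketch of the negative lookahead, assuming Python's `re` module and an invented word list:

```python
import re

# Keep words that contain a q which is not immediately followed by u.
q_without_u = re.findall(r"\b\w*q(?!u)\w*\b", "Iraq quit qatar queue")

print(q_without_u)
```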
(?<!exp) Zero-width negative lookbehind assertion: the expression exp must not match immediately before this position.
That is, the matched string cannot be preceded by exp.
Eg.
Seven digits that are not preceded by a lowercase letter:
(?<![a-z])\d{7}
The content inside a simple HTML tag without attributes:
(?<=<(\w+)>).*(?=<\/\1>)
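
The following sketch assumes Python's `re` module. The seven-digit example works as written. The HTML example does not carry over directly, because Python's `re` only accepts fixed-width lookbehind patterns and `<(\w+)>` has variable width, so the sketch swaps in a plain capturing group to pull out the content instead.

```python
import re

# Seven digits that are not preceded by a lowercase letter.
seven_digits = re.findall(r"(?<![a-z])\d{7}", "a1234567 B7654321 9999999")

# Substitute for the HTML example: capture the tag name and the content,
# and use \1 to require a matching closing tag.
tag = re.search(r"<(\w+)>(.*)</\1>", "<li>item one</li>")
tag_name = tag.group(1)
tag_content = tag.group(2)

print(seven_digits, tag_name, tag_content)
```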

Comment

(?#comment)
Eg.
IP Address:
((?<IP>2[0-4]\d(?#200-249)|25[0-5](?#250-255)|[01]?\d\d?(?#0-199))\.){3}\k<IP>
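
Inline comments can be sketched in Python, whose `re` module also accepts `(?#...)`. The example below is an illustration only: it uses the unnamed IP pattern, and it adds the `re.VERBOSE` flag as a Python-specific alternative that is not mentioned in the text above.

```python
import re

# (?#...) groups are ignored by the engine.
commented = (
    r"((2[0-4]\d(?#200-249)|25[0-5](?#250-255)|[01]?\d\d?(?#0-199))\.){3}"
    r"(2[0-4]\d|25[0-5]|[01]?\d\d?)"
)
plain = r"((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)"

# Python alternative: in VERBOSE mode, whitespace is ignored and # starts a comment.
verbose = re.compile(r"""
    ( ( 2[0-4]\d      # 200-249
      | 25[0-5]       # 250-255
      | [01]?\d\d?    # 0-199
      ) \. ){3}
    ( 2[0-4]\d | 25[0-5] | [01]?\d\d? )
""", re.VERBOSE)

samples = ["249.250.7.199", "300.1.1.1"]
commented_results = [re.fullmatch(commented, s) is not None for s in samples]
plain_results = [re.fullmatch(plain, s) is not None for s in samples]
verbose_results = [verbose.fullmatch(s) is not None for s in samples]

print(commented_results, plain_results, verbose_results)
```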

Greedy

When an RE contains a quantifier that accepts repetition, it normally matches as many characters as possible.
Eg.
a.*b matches the longest string that starts with a and ends with b.
For aabab it matches the entire string.
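
A minimal sketch of greedy matching, assuming Python's `re` module. The second sample string is invented to show the same effect on markup.

```python
import re

# .* grabs as much as it can while still letting the final b match.
greedy = re.search(r"a.*b", "aabab").group()

# The same effect on markup: the match runs to the last ">".
greedy_tag = re.search(r"<.+>", "<b>bold</b>").group()

print(greedy, greedy_tag)
```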

Lazy

Match as few characters as possible.
Append ? to a quantifier to switch it to lazy mode.
Eg.
a.*?b applied to aabab matches aab and then ab.

*?	Repeat 0 or more times, but as few as possible
+?	Repeat 1 or more times, but as few as possible
??	Repeat 0 or 1 time, but as few as possible
{n,m}?	Repeat n to m times, but as few as possible
{n,}?	Repeat n or more times, but as few as possible
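
The lazy quantifiers can be compared with their greedy forms in a short sketch, assuming Python's `re` module and invented inputs:

```python
import re

# Lazy .*? stops at the first b it can reach.
lazy_matches = re.findall(r"a.*?b", "aabab")

# {2,4} takes up to four digits; {2,4}? settles for two.
greedy_range = re.search(r"\d{2,4}", "12345").group()
lazy_range = re.search(r"\d{2,4}?", "12345").group()

# +? needs at least one repetition, *? and ?? are satisfied with zero.
lazy_plus = re.search(r"\d+?", "12345").group()
lazy_star = re.search(r"\d*?", "12345").group()
lazy_optional = re.search(r"ab??", "abb").group()

print(lazy_matches, greedy_range, lazy_range, lazy_plus, repr(lazy_star), lazy_optional)
```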

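First match

The match that starts earliest has priority over both greedy and lazy matching. This is why a.*?b applied to aabab first matches aab rather than the shorter ab.

The reasoning above may not be fully rigorous. I only want to explain how I understand these things.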
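
The priority of the earliest starting position can be observed directly. This sketch assumes Python's `re` module:

```python
import re

first = re.search(r"a.*?b", "aabab")

# The match starts at index 0, even though a shorter "ab" starts at index 1.
first_text = first.group()
first_span = first.span()

# Searching from index 1 onward finds the shorter candidate.
shorter = re.compile(r"a.*?b").search("aabab", 1).group()

print(first_text, first_span, shorter)
```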

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
