Regular Expressions: A Quick Recall

Source: Internet
Author: User
This is not an entry-level article. It is meant to help readers who already understand regular expressions, or have used them before, recall them quickly, so it contains few examples. It is a brief overview of the symbols used in regular expressions and how to use them, summarized from years of experience. There are many regular expression articles on the Internet, but I have always found them too heavy on technical terms. The last two sections are taken, with some corrections, from the "Regular Expression 30-Minute Getting Started Tutorial".

A regular expression string consists of two basic kinds of characters: literal text characters and metacharacters. Metacharacters are characters with a special meaning in regular expressions. A metacharacter may be a single character or a basic unit made up of several characters.

Metacharacters

A metacharacter may represent a character (such as a digit or a letter), a position, or a quantity.

Characters

Code    Description
.       Matches any character except a line feed
\w      Matches a letter, digit, or underscore (Unicode-aware engines such as .NET also match Chinese characters; JavaScript matches only A-Z, a-z, 0-9, and _)
\s      Matches any whitespace character
\d      Matches a digit
\S      Opposite of \s
\D      Opposite of \d
\W      Opposite of \w
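As a quick illustration of the shorthand classes above, here is a minimal JavaScript sketch. The sample string and variable names are made up for this example.

```javascript
// Sample text: letters, a space, digits, an underscore, a tab, and punctuation.
const sample = "Room 42_b\t!";

// \d matches one digit at a time; the g flag collects every match.
const digits = sample.match(/\d/g);             // ["4", "2"]

// \w matches letters, digits, and the underscore (ASCII only in JavaScript).
const wordChars = sample.match(/\w/g).join(""); // "Room42_b"

// \s matches whitespace: here one space and one tab.
const spaceCount = sample.match(/\s/g).length;  // 2

// \W is the opposite of \w: the space, the tab, and "!".
const nonWord = sample.match(/\W/g);            // [" ", "\t", "!"]

console.log(digits, wordChars, spaceCount, nonWord);
```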

Quantifiers

Code    Description
*       Repeats zero or more times
+       Repeats one or more times
?       Repeats zero or one time
{n}     Repeats exactly n times
{n,}    Repeats n or more times
{n,m}   Repeats n to m times
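The quantifiers can be compared side by side with a small JavaScript sketch. The test strings are invented for illustration, and the patterns are wrapped in ^ and $ (start and end of string) so that the whole string has to match.

```javascript
const inputs = ["ac", "abc", "abbbc"];

// b* allows zero or more b's, b+ needs at least one, b? allows at most one.
const star = inputs.map((s) => /^ab*c$/.test(s));  // [true, true, true]
const plus = inputs.map((s) => /^ab+c$/.test(s));  // [false, true, true]
const optional = inputs.map((s) => /^ab?c$/.test(s)); // [true, true, false]

// Counted repetition with braces.
const exactlyFour = /^\d{4}$/.test("2024");    // true
const atLeastTwo = /^\d{2,}$/.test("7");       // false: only one digit
const twoToFour = /^\d{2,4}$/.test("12345");   // false: five digits

console.log(star, plus, optional, exactlyFour, atLeastTwo, twoToFour);
```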


Positions

Code    Description
\b      Matches the start or end of a word
\B      Matches a position that is not the start or end of a word
^       Matches the start of the string
$       Matches the end of the string
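Position metacharacters match a place in the string rather than a character. The following JavaScript sketch, with an invented sample sentence, shows the difference.

```javascript
const text = "cat concat cat";

// \bcat\b only matches "cat" as a whole word, not the one inside "concat".
const wholeWords = text.match(/\bcat\b/g);   // ["cat", "cat"]

// \Bcat requires that "cat" does NOT start at a word boundary.
const insideWord = text.match(/\Bcat\b/g);   // ["cat"] (the tail of "concat")

// ^ and $ pin the pattern to the start and end of the string.
const startsWithCat = /^cat/.test(text);     // true
const endsWithCon = /con$/.test(text);       // false

console.log(wholeWords.length, insideWord.length, startsWithCat, endsWithCon);
```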

Escape characters

Since metacharacters have special meanings in regular expressions, what if we want to treat one as a plain text character? Add a backslash (\) before the metacharacter: it then loses its special meaning in the regular expression and becomes a literal character. For example, \. matches a literal period and \\ matches a literal backslash.
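A short JavaScript sketch of escaping follows. The helper name escapeRegExp is hypothetical (it is not a built-in); it uses a commonly seen replace pattern to escape text before building a RegExp from it.

```javascript
// Unescaped, the dot matches any character, so "125" is accepted.
const unescaped = /1.5/.test("125");   // true
// Escaped, the dot only matches a literal period.
const escaped = /1\.5/.test("125");    // false
const literal = /1\.5/.test("1.5");    // true

// Hypothetical helper: put a backslash before every metacharacter.
function escapeRegExp(source) {
  return source.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// A pattern built from escaped user text treats "." literally.
const fromUserText = new RegExp(escapeRegExp("1.5"));
console.log(unescaped, escaped, literal, fromUserText.test("125"));
```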

Custom Character Sets

The metacharacters above match broad ranges. What if I only want to match characters from a small set?

Use square brackets. For example, [aeiou] or [.?!] matches any single character listed inside the brackets.

Note: inside square brackets, metacharacters written with a backslash (such as \d) are still metacharacters, but those without a backslash prefix (such as . or ?) become literal characters. One new metacharacter appears: the hyphen -.

If the hyphen sits between two characters, it denotes a range and is not itself a member of the set, for example [0-9] and [a-z]. If nothing follows the hyphen, it is a literal member of the set, as in [*%-].

Escape characters can be used inside a character set.
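The bracket rules can be tried out with a brief JavaScript sketch. All sample strings here are invented for illustration.

```javascript
// A plain set: count the vowels.
const vowelCount = "regular expression".match(/[aeiou]/g).length; // 7

// Inside brackets, . ? ! are literal characters.
const punctuation = "Wait. What?!".match(/[.?!]/g);   // [".", "?", "!"]

// A hyphen between two characters is a range.
const isHex = /^[0-9a-f]+$/.test("c0ffee");           // true

// A trailing hyphen is a literal member of the set.
const symbols = "50%-off*".match(/[*%-]/g);           // ["%", "-", "*"]

// Backslash metacharacters keep their meaning inside brackets.
const digitOrSpace = "a1 b2".match(/[\d\s]/g);        // ["1", " ", "2"]

console.log(vowelCount, punctuation, isHex, symbols, digitOrSpace);
```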

Negation

Sometimes you only need to find characters that do not belong to some easily defined set. This is negation.

Code       Description
\W         Matches any character that is not a letter, digit, underscore, or Chinese character
\S         Matches any character that is not whitespace
\D         Matches any character that is not a digit
\B         Matches a position that is not the start or end of a word
[^x]       Matches any character except x
[^aeiou]   Matches any character except a, e, i, o, and u
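Negated classes are often handy for stripping unwanted characters. A minimal JavaScript sketch with invented inputs:

```javascript
// [^aeiou] matches everything that is not a vowel; removing those leaves the vowels.
const onlyVowels = "regex".replace(/[^aeiou]/g, "");        // "ee"

// \D matches every non-digit; removing those leaves the digits.
const onlyDigits = "tel: 555-1234".replace(/\D/g, "");      // "5551234"

// \S+ matches runs of non-whitespace, which splits text into tokens.
const tokens = "a  b\tc".match(/\S+/g);                     // ["a", "b", "c"]

console.log(onlyVowels, onlyDigits, tokens);
```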

Alternation (|)

This is the equivalent of an "or" statement: use a vertical bar | to separate the alternative rules. For example, the expression 0\d{2}-\d{8}|0\d{3}-\d{7} matches two kinds of hyphen-separated phone numbers: a three-digit area code followed by an eight-digit local number (for example, 010-12345678), or a four-digit area code followed by a seven-digit local number (for example, 0376-2233445).
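The phone number expression can be checked with a small JavaScript sketch. The ^, $, and the non-capturing group (?:...) are additions made here so that the whole string must be a phone number; the two alternatives themselves are taken from the text.

```javascript
// Either "0dd-dddddddd" or "0ddd-ddddddd", and nothing else in the string.
const phone = /^(?:0\d{2}-\d{8}|0\d{3}-\d{7})$/;

const threeDigitArea = phone.test("010-12345678"); // true: first alternative
const fourDigitArea = phone.test("0376-2233445");  // true: second alternative
const tooShort = phone.test("010-1234567");        // false: neither alternative fits

console.log(threeDigitArea, fourDigitArea, tooShort);
```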

Grouping

Grouping means enclosing part of a regular expression in parentheses () so that it acts as one small matching unit.

Grouping has two functions:

  • By default, the parser assigns a group number to each group, so a later part of the regular expression can use that number to reference the content the group matched.
  • A group can be followed by a quantifier, which simplifies writing the regular expression. For example, (\d{1,3}\.){3}\d{1,3} is a simple IP address matching expression.

A group followed by a quantifier has a pitfall: it may look as if several groups are produced, but there is only one. Group numbers are assigned when the parser parses the expression, and the text of the group appears only once in the regular expression string, so it receives a single group number. As a result, after the regular expression runs, the group holds only the content of its last repetition.

For example:

/(\d{1,3}\.){3}\d{1,3}/g.exec("201.202.203.204");

Result: ["201.202.203.204", "203."]
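If every repetition is needed rather than only the last one, one possible workaround (an assumption of this sketch, not something the original text prescribes) is to match the repeated unit separately with the g flag:

```javascript
const ip = "201.202.203.204";

// The quantified group keeps only its last repetition.
const result = /(\d{1,3}\.){3}\d{1,3}/.exec(ip);
const wholeMatch = result[0];   // "201.202.203.204"
const lastGroup = result[1];    // "203."

// Matching the unit on its own with g returns every occurrence.
const octets = ip.match(/\d{1,3}/g);   // ["201", "202", "203", "204"]

console.log(wholeMatch, lastGroup, octets);
```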

A later part of the regular expression can use a group number to reference the content matched by an earlier group. This is called a backreference. Group numbers are assigned from left to right according to the position of each group's left parenthesis: the first group is 1, the second is 2, and so on. When referencing a group, remember to put a backslash before the group number, as in \1.

For example:

/\b(\w+)\b\s+\1\b/.test("hello hello"); // true

/\b(\w+)\b\s+\1\b/.test("hello hell"); // false
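A common use of backreferences is removing accidentally doubled words. The sketch below is a hedged JavaScript illustration with an invented sentence; in the replacement string, $1 refers to the first group, just as \1 does inside the pattern.

```javascript
const sentence = "It is is what it it is";

// \1 must repeat exactly what group 1 captured; $1 keeps a single copy.
const deduplicated = sentence.replace(/\b(\w+)\s+\1\b/g, "$1");

console.log(deduplicated); // "It is what it is"
```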

 

Grouping syntax:

Code/syntax     Description
(exp)           Matches exp and captures the text into an automatically numbered group
(?<name>exp)    Matches exp and captures the text into a group named name; in .NET this can also be written (?'name'exp)
(?:exp)         Matches exp without capturing the matched text or assigning a group number to this group
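Named and non-capturing groups can be sketched in JavaScript as follows. This assumes an ES2018-capable engine, since that is where (?<name>exp) was introduced; the date string and variable names are invented.

```javascript
// Named groups are exposed on the "groups" property of the match.
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const dateMatch = datePattern.exec("Released 2024-03-15");
const { year, month, day } = dateMatch.groups;   // "2024", "03", "15"

// (?:ab) groups for repetition without taking a group number,
// so (c) is group 1 and the result has only two entries.
const nonCapturing = /(?:ab)+(c)/.exec("ababc");

console.log(year, month, day, nonCapturing[1], nonCapturing.length);
```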

Zero-Width Assertions

A zero-width assertion specifies a position, like \b, ^, and $, but the position must satisfy a condition, and that condition is the assertion. Unlike a group, an assertion gets no group number and consumes none of the string being matched, hence the name zero-width assertion.

Code/syntax   Description
(?=exp)       The text after this position must match exp
(?<=exp)      The text before this position must match exp
(?!exp)       The text after this position must not match exp
(?<!exp)      The text before this position must not match exp
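The four assertions can be compared in a short JavaScript sketch. It assumes an engine with lookbehind support (ES2018); the sample strings are invented. The extra \d inside the negative assertions is an addition made here so that a number cannot be matched partially.

```javascript
const sizes = "10px 20em 30px";
// Lookahead: digits that are followed by "px".
const pxValues = sizes.match(/\d+(?=px)/g);         // ["10", "30"]
// Negative lookahead: digits not followed by another digit or by "px".
const otherValues = sizes.match(/\d+(?!\d|px)/g);   // ["20"]

const prices = "apple $30, pear 45, plum $7";
// Lookbehind: digits that are preceded by "$".
const dollars = prices.match(/(?<=\$)\d+/g);        // ["30", "7"]
// Negative lookbehind: digits not preceded by "$" or another digit.
const plain = prices.match(/(?<![$\d])\d+/g);       // ["45"]

console.log(pxValues, otherValues, dollars, plain);
```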

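Comments

Use (?#comment) to embed a comment in a regular expression. Engines such as .NET and PCRE support this syntax; JavaScript does not.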

Greedy and Lazy Matching

A given regular expression may be able to match a whole string or just part of it. When it can do both, which does it choose?

/h.*o/.exec("hello ho"); // ["hello ho"]

/h.*?o/.exec("hello ho"); // ["hello"]

In the example above, the pattern with the question mark is lazy and matches only "hello", while the pattern without it is greedy and matches the whole string.

Greedy matching repeats as many times as possible, so it matches as many characters as it can; lazy matching matches as few as possible. Matching is greedy by default and becomes lazy only when a question mark follows the quantifier.

Lazy quantifiers

Syntax    Description
*?        Repeats zero or more times, but as few times as possible
+?        Repeats one or more times, but as few times as possible
??        Repeats zero or one time, but as few times as possible
{n,m}?    Repeats n to m times, but as few times as possible
{n,}?     Repeats n or more times, but as few times as possible
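A typical place where lazy quantifiers matter is matching paired delimiters. The following JavaScript sketch uses an invented HTML-like string.

```javascript
const html = "<b>one</b> and <b>two</b>";

// Greedy: .* runs to the LAST </b>, swallowing both elements.
const greedy = html.match(/<b>.*<\/b>/)[0];    // "<b>one</b> and <b>two</b>"

// Lazy: .*? stops at the FIRST </b> it can, so each element matches separately.
const lazy = html.match(/<b>.*?<\/b>/g);       // ["<b>one</b>", "<b>two</b>"]

// The same rule applies to counted repetition.
const greedyRange = "aaaa".match(/a{2,3}/)[0];   // "aaa"
const lazyRange = "aaaa".match(/a{2,3}?/)[0];    // "aa"

console.log(greedy, lazy, greedyRange, lazyRange);
```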

Balancing Groups / Recursive Matching

All the matching described so far is linear. For nested, hierarchical structures such as (100*(50+15)), those methods do not help: you cannot know in advance how deeply the parentheses nest, and what if the numbers of left and right parentheses differ? How do you match the content of the longest balanced pair of parentheses?

This requires the following syntax constructs, which are available in the .NET regular expression engine:

  • (?'group'exp) Names the captured content "group" and pushes it onto the stack.
  • (?'-group'exp) Pops the most recently pushed capture named "group" from the stack. If the stack is already empty, this group fails to match.
  • (?(group)yes|no) If a capture named "group" exists on the stack, matching continues with the yes part; otherwise it continues with the no part.
  • (?!) A zero-width negative lookahead assertion. Since it contains no exp, any attempt to match it always fails.

To keep ( and \( from confusing you completely, we use angle brackets instead of parentheses. The problem now becomes: how do we capture the content inside the longest balanced pair of angle brackets in a string like xx <aa <bbb> <bbb> aa> yy?

    <                     # The outermost left bracket
      [^<>]*              # Content after the outermost left bracket that is not a bracket
      (
        (
          (?'Open'<)      # On a left bracket, write an "Open" on the blackboard
          [^<>]*          # Match content that is not a bracket
        )+
        (
          (?'-Open'>)     # On a right bracket, erase one "Open"
          [^<>]*          # Match content that is not a bracket
        )+
      )*
      (?(Open)(?!))       # Before the outermost right bracket, check whether any "Open" is still on the blackboard; if so, the match fails
    >                     # The outermost right bracket
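JavaScript's regular expression engine has no balancing groups, so the same push/pop idea has to be written by hand there. The sketch below swaps the regex for a plain depth counter; it is not the .NET technique itself, and the function name outermostAngleContent and the sample strings are hypothetical. Unlike the regex, it only tries the first "<" in the input.

```javascript
// Returns the text inside the balanced <...> pair that starts at the
// first "<" in the input, or null if that bracket is never closed.
function outermostAngleContent(input) {
  const start = input.indexOf("<");
  if (start === -1) return null;

  let depth = 0; // plays the role of the "Open" stack
  for (let i = start; i < input.length; i++) {
    if (input[i] === "<") depth++;        // push an "Open"
    else if (input[i] === ">") depth--;   // pop an "Open"
    // Depth back at zero means the outermost pair just closed.
    if (depth === 0) return input.slice(start + 1, i);
  }
  return null; // some "Open" was never erased
}

console.log(outermostAngleContent("xx <aa <bbb> <bbb> aa> yy")); // "aa <bbb> <bbb> aa"
console.log(outermostAngleContent("xx <aa <bbb> yy"));           // null
```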

More

Common escape characters and other metacharacters

Code/syntax   Description
\a            Alarm (bell) character; printing it makes the computer beep
\b            Usually a word boundary, but inside a character class it means backspace
\t            Tab
\r            Carriage return
\v            Vertical tab
\f            Form feed
\n            Line feed
\e            Escape
\0nn          The ASCII character whose octal code is nn
\xnn          The ASCII character whose hexadecimal code is nn
\unnnn        The Unicode character whose hexadecimal code is nnnn
\cN           ASCII control character; for example, \cC stands for Ctrl+C
\A            Start of the string (similar to ^, but not affected by the multiline option)
\Z            End of the string, or just before a line break at the end of the string (not affected by the multiline option)
\z            End of the string (similar to $, but not affected by the multiline option)
\G            Start of the current search
\p{name}      The Unicode character class called name, for example \p{IsGreek}
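A few of these escapes can be tried in JavaScript, as in the hedged sketch below. JavaScript does not implement \a, \e, \A, \Z, \z, or \G, so they are left out, and its Unicode property syntax differs from the .NET-style \p{IsGreek}: it needs the u flag and a name such as Script=Greek.

```javascript
// \x41 and \u0041 both denote the letter "A".
const hexEscape = /\x41/.test("A");          // true
const unicodeEscape = /\u0041/.test("A");    // true

// \t matches a tab character.
const tab = /a\tb/.test("a\tb");             // true

// \cC is the control character Ctrl+C (code 3).
const ctrlC = /\cC/.test("\x03");            // true

// Inside a character class, \b means backspace (code 8), not a word boundary.
const backspace = /[\b]/.test("\x08");       // true

// Unicode property escape, JavaScript flavour (requires the u flag).
const greek = /\p{Script=Greek}/u.test("λ"); // true

console.log(hexEscape, unicodeEscape, tab, ctrlC, backspace, greek);
```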

     
