Detailed explanation of regular expression syntax

Source: Internet
Author: User

As developers, we often run into regular expressions. Usually we are pressed for time, so we either find a ready-made regular expression on the Internet or solve the problem some other way, and I had never studied regular expressions systematically. Today I finally had the time to go through the regular expression syntax in detail. I don't expect to become an expert, but I want to be able to solve problems with regular expressions, or at least understand what a given regular expression means...


A regular expression describes one or more strings to match when searching a body of text. The expression serves as a template: its character pattern is matched against the string being searched.

Regular expressions consist of ordinary characters (for example, the letters a through z) and special characters, called metacharacters.

Special characters

--------------------------------------------------------------------------------

The following table lists the single-character metacharacters and their behavior in regular expressions.

Note
To match one of these special characters literally, you must escape it, that is, precede it with a backslash (\). For example, to search for a literal "+" character, use the expression "\+".
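The same rule can be tried in JavaScript, whose regular expression syntax JScript follows. The sketch below is illustrative and assumes a current JavaScript engine such as Node.js; `escapeRegExp` is a hypothetical helper built on a widely used idiom, not a built-in function.

```javascript
// "\+" is an escaped plus sign, so this pattern matches the text "1+1".
const literalPlus = /1\+1/;
// Unescaped, "+" is a quantifier: "1+1" means one or more "1", then "1".
const quantified = /1+1/;

// With the RegExp constructor the pattern is an ordinary string, so the
// backslash must itself be escaped inside the string literal.
const fromString = new RegExp("1\\+1");

// Hypothetical helper (a common idiom, not a built-in): backslash-escapes
// every metacharacter so that arbitrary text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

console.log(literalPlus.test("1+1")); // true
console.log(quantified.test("1+1"));  // false: there is no "11" in "1+1"
console.log(quantified.test("111"));  // true
console.log(fromString.test("1+1"));  // true
console.log(escapeRegExp("a+b (c)")); // a\+b \(c\)
```

The constructor call needs a double backslash because the string literal consumes one, so the pattern the engine actually receives is `1\+1`.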
 

Metacharacter
Behavior
Example
 
*
Matches the preceding character or subexpression zero or more times.

Equivalent to {0,}.
zo* matches "z" and "zoo".
 
+
Matches the preceding character or subexpression one or more times.

Equivalent to {1,}.
zo+ matches "zo" and "zoo", but not "z".
 
?
Matches the preceding character or subexpression zero times or once.

Equivalent to {0,1}.

When ? immediately follows any other quantifier (*, +, ?, {n}, {n,}, or {n,m}), the matching pattern is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much as possible.
zo? matches "z" and "zo", but not the whole string "zoo".

o+? matches only a single "o" in "oooo", whereas o+ matches all of the "o"s.

do(es)? matches the "do" in "do" or "does".
 
^
Matches the position at the start of the searched string. If the flags include m (multiline search), ^ also matches the position following \n or \r.

When used as the first character in a bracket expression, ^ negates the character set.
^\d{3} matches three digits at the start of the searched string.

[^abc] matches any character except a, b, and c.
 
$
Matches the position at the end of the searched string. If the flags include m (multiline search), $ also matches the position preceding \n or \r.
\d{3}$ matches three digits at the end of the searched string.
 
.
Matches any single character except the newline character \n (JavaScript engines also exclude \r, \u2028, and \u2029). To match any character including \n, use a pattern such as [\s\S].
a.c matches "abc", "a1c", and "a-c".
 
[]
Marks the start and end of a bracket expression.
[1-4] matches "1", "2", "3", or "4". [^aAeEiIoOuU] matches any character that is not a vowel.
 
{}
Marks the start and end of a quantifier expression.
a{2,3} matches "aa" and "aaa".
 
()
Marks the start and end of a subexpression. Subexpressions can be saved for later use.
A(\d) matches "A0" through "A9". The digit is saved for later use.
 
|
Indicates a choice between two or more items.
z|food matches "z" or "food". (z|f)ood matches "zood" or "food".
 
/
Marks the start or end of a regular expression literal in JScript. After the second slash (/), you can add single-character flags that specify the search behavior.
/abc/gi is a JScript regular expression literal that matches "abc". The g (global) flag finds all occurrences of the pattern, and the i (ignore case) flag makes the search case-insensitive.
 
\
Marks the next character as a special character, a literal, a backreference, or an octal escape.
\n matches a newline character. \( matches "(". \\ matches "\".
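The rows of this table can be checked in any current JavaScript engine. This is a minimal sketch under that assumption (JScript and modern JavaScript agree on these cases). The variable names are illustrative, and ^...$ is added in the first few tests so that each pattern must match the whole string.

```javascript
// Quantifiers, tested against whole strings with ^...$.
const star = ["z", "zoo"].map((s) => /^zo*$/.test(s)); // [true, true]
const plusOnZ = /^zo+$/.test("z");                      // false
const optionalOnZoo = /^zo?$/.test("zoo");              // false

// Greedy vs. non-greedy: a trailing "?" makes a quantifier match as little as possible.
const greedy = "oooo".match(/o+/)[0]; // "oooo"
const lazy = "oooo".match(/o+?/)[0];  // "o"

// Anchors: without m, ^ matches only at the start of the whole string;
// with m, it also matches after a line break.
const text = "123 abc\n456 def";
const startOnly = text.match(/^\d{3}/g);  // ["123"]
const everyLine = text.match(/^\d{3}/gm); // ["123", "456"]

// "." stops at "\n"; [\s\S] matches any character.
const dotNewline = /a.c/.test("a\nc");      // false
const anyNewline = /a[\s\S]c/.test("a\nc"); // true

// Alternation, grouping, and a saved subexpression.
const alternatives = "zood food".match(/(z|f)ood/g); // ["zood", "food"]
const savedDigit = /A(\d)/.exec("A7")[1];             // "7"

// Flags after the closing slash: g = all matches, i = ignore case.
const allCases = "ABC abc".match(/abc/gi); // ["ABC", "abc"]

console.log(star, plusOnZ, optionalOnZoo, greedy, lazy);
console.log(startOnly, everyLine, dotNewline, anyNewline);
console.log(alternatives, savedDigit, allCases);
```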
 

Most special characters lose their meaning and represent ordinary characters when they appear inside a bracket expression. For more information, see "Characters in Bracket Expressions" in "Lists of Matching Characters".
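A short sketch of that rule, assuming JavaScript's bracket-expression behavior. The set of characters that keep a special meaning inside brackets (here ^, -, ], and \) is taken from JavaScript itself, not from the list this article refers to.

```javascript
// Inside brackets, . + * ? $ ( ) | are ordinary characters.
const punctuation = "a+b=c.d?".match(/[.+*?$()|]/g); // ["+", ".", "?"]

// ^ negates the set only when it comes first; elsewhere it is literal.
const negated = /[^a]/.test("a");      // false
const literalCaret = /[a^]/.test("^"); // true

// - forms a range between two characters, but is literal at either end.
const dashInRange = /[a-c]/.test("-"); // false
const dashAtEnd = /[ac-]/.test("-");   // true

// ] and \ still need a backslash inside a bracket expression.
const bracketOrBackslash = /[\]\\]/;
const closing = bracketOrBackslash.test("]");    // true
const backslash = bracketOrBackslash.test("\\"); // true

console.log(punctuation, negated, literalCaret, dashInRange, dashAtEnd, closing, backslash);
```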

Metacharacters

--------------------------------------------------------------------------------

The following table lists the multi-character metacharacters and their behavior in regular expressions.

Metacharacter
Behavior
Example
 
\b
Matches a word boundary, that is, the position between a word and a space.
er\b matches the "er" in "never", but not the "er" in "verb".
 
\B
Matches a non-word boundary.
er\B matches the "er" in "verb", but not the "er" in "never".
 
\d
Matches a digit character.

Equivalent to [0-9].
In the searched string "12 345", \d{2} matches "12" and "34". \d matches "1", "2", "3", "4", and "5".
 
\D
Matches a non-digit character.

Equivalent to [^0-9].
\D+ matches "abc" and " def" in "abc123 def".
 
\w
Matches any of the following characters: A-Z, a-z, 0-9, and underscore.

Equivalent to [A-Za-z0-9_].
In the searched string "The quick brown fox...", \w+ matches "The", "quick", "brown", and "fox".
 
\W
Matches any character except A-Z, a-z, 0-9, and underscore.

Equivalent to [^A-Za-z0-9_].
In the searched string "The quick brown fox...", \W+ matches "..." and all of the spaces.
 
[xyz]
A character set. Matches any one of the specified characters.
[abc] matches the "a" in "plain".
 
[^xyz]
A negated character set. Matches any character that is not specified.
[^abc] matches the "p", "l", "i", and "n" in "plain".
 
[a-z]
A character range. Matches any character in the specified range.
[a-z] matches any lowercase letter in the range "a" through "z".
 
[^a-z]
A negated character range. Matches any character that is not in the specified range.
[^a-z] matches any character that is not in the range "a" through "z".
 
{n}
Matches exactly n times. n is a non-negative integer.
o{2} does not match the "o" in "Bob", but matches the two "o"s in "food".
 
{n,}
Matches at least n times. n is a non-negative integer.

* is equivalent to {0,}.

+ is equivalent to {1,}.
o{2,} does not match the "o" in "Bob", but matches all of the "o"s in "foooood".
 
{n,m}
Matches at least n and at most m times. n and m are non-negative integers, where n <= m. There must be no spaces between the comma and the numbers.

? is equivalent to {0,1}.
In the searched string "1234567", \d{1,3} matches "123", "456", and "7".
 
(pattern)
Matches pattern and saves the match. You can retrieve the saved match from the array returned by the exec method in JScript. To match the parenthesis characters themselves, use "\(" or "\)".
(Chapter|Section) [1-9] matches "Chapter 5", and "Chapter" is saved for later use.
 
(?:pattern)
Matches pattern but does not save the match; that is, the match is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|).
industr(?:y|ies) matches "industry" or "industries".
 
(?=pattern)
Positive lookahead. The lookahead consumes no characters: after it succeeds, matching continues from the position before the text it examined. The match is not saved for later use.
^(?=.*\d).{4,8}$ restricts a password to between 4 and 8 characters that include at least one digit.

Within the pattern, .*\d finds any number of characters followed by a digit. For the searched string "abc3qr", this matches "abc3".

Starting at the beginning of that match (rather than after it), .{4,8} matches a string of 4 to 8 characters. This matches "abc3qr".

^ and $ specify the start and end positions of the searched string. They prevent a match when the searched string contains any characters outside the matched characters.
 
(?!pattern)
Negative lookahead. Succeeds at a position where pattern does not match. Like positive lookahead, it consumes no characters, so matching continues from the position before the examined text. The match is not saved for later use.
\b(?!th)\w+\b matches words that do not start with "th".

Within the pattern, \b matches a word boundary. For the searched string " quick ", this is the position after the first space. (?!th) then succeeds because the next two characters, "qu", are not "th".

Starting at that position, \w+ matches a word. This matches "quick".
 
\cx
Matches the control character indicated by x. The value of x must be in the range A-Z or a-z. If it is not, c is assumed to be a literal "c" character.
\cM matches Ctrl+M, that is, a carriage return character.
 
\xn
Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. This allows ASCII codes to be used in regular expressions.
\x41 matches "A". \x041 is equivalent to "\x04" followed by "1" (because n must be exactly two digits).
 
\num
Matches num, where num is a positive integer. This is a reference to a saved match (a backreference).
(.)\1 matches two consecutive identical characters.
 
\n
Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.
(\d)\1 matches two consecutive identical digits.
 
\nm
Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captured subexpressions, n is a backreference followed by the literal m. If neither condition holds, \nm matches the octal escape value nm when n and m are octal digits (0-7).
\11 matches a tab character.
 
\nml
Matches the octal escape value nml when n is an octal digit (0-3) and m and l are octal digits (0-7).
\011 matches a tab character.
 
\un
Matches n, where n is a Unicode character expressed as four hexadecimal digits.
\u00A9 matches the copyright symbol (©).
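Several rows of this table are exercised below, again assuming a current JavaScript engine. The examples reuse the article's search strings where possible. The extra strings (for the password, negative-lookahead, and backreference tests) are hypothetical inputs chosen for illustration. The octal forms are left out because modern JavaScript treats them as legacy syntax and rejects them under the u flag.

```javascript
// Word boundary (\b) vs. non-boundary (\B).
const boundary = [/er\b/.test("never"), /er\b/.test("verb")];    // [true, false]
const nonBoundary = [/er\B/.test("verb"), /er\B/.test("never")]; // [true, false]

// {n,m} is greedy: each match takes up to three digits.
const digitRuns = "1234567".match(/\d{1,3}/g); // ["123", "456", "7"]

// \w+ picks out the words; \W+ picks out everything between them.
const words = "The quick brown fox...".match(/\w+/g); // ["The", "quick", "brown", "fox"]
const gaps = "The quick brown fox...".match(/\W+/g);  // [" ", " ", " ", "..."]

// A capturing group is returned by exec(); (?:...) groups without saving.
const chapter = /(Chapter|Section) [1-9]/.exec("See Chapter 5.");
// chapter[0] is "Chapter 5" and chapter[1] is "Chapter"
const industry = /industr(?:y|ies)/.exec("industries");
// industry.length is 1: only the whole match, no saved group

// Positive lookahead: 4 to 8 characters, at least one of them a digit.
const password = /^(?=.*\d).{4,8}$/;
const passwordResults = ["abc3qr", "abcdef", "a1"].map((s) => password.test(s));
// [true, false, false]

// Negative lookahead: words that do not start with "th".
const notTh = " quick thin them other ".match(/\b(?!th)\w+\b/g); // ["quick", "other"]

// Backreference: \1 repeats whatever group 1 matched.
const doubled = "bookkeeper".match(/(.)\1/g); // ["oo", "kk", "ee"]

// Hex, Unicode, and control-character escapes.
const escapes = [/\x41/.test("A"), /\u00A9/.test("©"), /\cM/.test("\r")]; // [true, true, true]

console.log(boundary, nonBoundary, digitRuns, words, gaps);
console.log(chapter[0], chapter[1], industry.length);
console.log(passwordResults, notTh, doubled, escapes);
```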
 

Non-printable characters

--------------------------------------------------------------------------------

The following table lists escape sequences that represent non-printing characters.

Character
Match
Equivalent
 
\f
Form feed.
\x0c and \cL
 
\n
Line feed (newline).
\x0a and \cJ
 
\r
Carriage return.
\x0d and \cM
 
\s
Any whitespace character, including space, tab, and form feed.
[ \f\n\r\t\v]
 
\S
Any non-whitespace character.
[^ \f\n\r\t\v]
 
\t
Tab.
\x09 and \cI
 
\v
Vertical tab.
\x0b and \cK
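The equivalences in this table can be verified by searching a string that contains each character exactly once. This sketch assumes a current JavaScript engine. There, `\s` matches more than the bracket list shown above, which the last check illustrates.

```javascript
// One of each non-printing character, separated by letters.
const sample = "a\tb\r\nc\fd\ve";

// Each escape should find the same position as its \x and \c equivalents.
const table = {
  "\\t": [/\t/, /\x09/, /\cI/],
  "\\n": [/\n/, /\x0a/, /\cJ/],
  "\\r": [/\r/, /\x0d/, /\cM/],
  "\\f": [/\f/, /\x0c/, /\cL/],
  "\\v": [/\v/, /\x0b/, /\cK/],
};
const positions = {};
for (const [name, patterns] of Object.entries(table)) {
  positions[name] = patterns.map((re) => sample.search(re));
}
// positions["\\t"] is [1, 1, 1] and positions["\\r"] is [3, 3, 3], for example.

// \s+ splits on any run of whitespace ("\r\n" counts as one run); \S keeps the rest.
const parts = sample.split(/\s+/);            // ["a", "b", "c", "d", "e"]
const visible = sample.match(/\S/g).join(""); // "abcde"

// In JavaScript, \s is wider than the bracket list in the table:
// it also covers Unicode spaces such as the no-break space U+00A0.
const nbsp = [/\s/.test("\u00A0"), /[ \f\n\r\t\v]/.test("\u00A0")]; // [true, false]

console.log(positions, parts, visible, nbsp);
```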
 

Order of Precedence

--------------------------------------------------------------------------------

A regular expression is evaluated much like an arithmetic expression: it is evaluated from left to right and follows an order of precedence.

The following table lists the regular expression operators in order of precedence, from highest to lowest.

Operator
Description
 
\
Escape Character
 
(), (?:), (?=), []
Parentheses and brackets
 
*, +, ?, {n}, {n,}, {n,m}
Quantifiers
 
^, $, \anymetacharacter
Anchors and sequences
 
|
Alternation
 

Characters have higher precedence than the alternation operator, which, for example, allows "m|food" to match "m" or "food".
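A small sketch of how precedence shapes a match, assuming JavaScript semantics (which agree with JScript here). As in arithmetic, grouping with parentheses overrides the default order.

```javascript
// Alternation binds most loosely: m|food is "m" OR "food", not "m" or "f" + "ood".
const loose = "mood".match(/m|food/)[0];     // "m"
const whole = "food".match(/m|food/)[0];     // "food"
const grouped = "mood".match(/(m|f)ood/)[0]; // "mood"

// Quantifiers bind tighter than a sequence: ab+ repeats only "b".
const repeatB = "abbbab".match(/ab+/)[0];      // "abbb"
const repeatAB = "ababab".match(/(?:ab)+/)[0]; // "ababab"

// Anchors bind tighter than |: ^m|food$ means (^m) OR (food$).
const anchoredLoose = /^m|food$/.test("mxyz");       // true
const anchoredGrouped = /^(?:m|food)$/.test("mxyz"); // false

console.log(loose, whole, grouped, repeatB, repeatAB, anchoredLoose, anchoredGrouped);
```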


 
