Regular expression quick-check table

Characters
x The character x
\\ The backslash character
\0n The character with octal value 0n (0 <= n <= 7)
\0nn The character with octal value 0nn (0 <= n <= 7)
\0mnn The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh The character with hexadecimal value 0xhh
\uhhhh The character with hexadecimal value 0xhhhh
\t The tab character ('\u0009')
\n The newline (line feed) character ('\u000A')
\r The carriage-return character ('\u000D')
\f The form-feed character ('\u000C')
\a The alert (bell) character ('\u0007')
\e The escape character ('\u001B')
\cx The control character corresponding to x
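
As a quick check of the escapes above, the following sketch writes the same character in several notations and confirms that each pattern matches it. The class and method names are illustrative only and are not part of any API.

```java
import java.util.regex.Pattern;

// EscapeDemo is a hypothetical name used only for this illustration.
class EscapeDemo {
    // Three notations for the same character 'A' (code point 65).
    static boolean allMatchA() {
        return Pattern.matches("\\0101", "A")   // octal 0101
            && Pattern.matches("\\x41", "A")    // hexadecimal 0x41
            && Pattern.matches("\\u0041", "A"); // four-digit hexadecimal escape read by the regex engine
    }

    // The tab escape and the control-character escape for Ctrl-I name the same character.
    static boolean tabForms() {
        return Pattern.matches("\\t", "\t") && Pattern.matches("\\cI", "\t");
    }

    public static void main(String[] args) {
        System.out.println(allMatchA());
        System.out.println(tabForms());
    }
}
```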

Character classes
[abc] a, b, or c (simple class)
[^abc] Any character except a, b, or c (negation)
[a-zA-Z] a through z or A through Z, inclusive (range)
[a-d[m-p]] a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]] d, e, or f (intersection)
[a-z&&[^bc]] a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]] a through z, and not m through p: [a-lq-z] (subtraction)
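
The union, intersection, and subtraction rows can be tried out with a small filter that keeps only the characters a class accepts. This is a minimal sketch; CharClassDemo and keep are hypothetical names chosen for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// CharClassDemo is a hypothetical name used only for this illustration.
class CharClassDemo {
    // Concatenates every match of the given regex found in the input.
    static String keep(String regex, String input) {
        StringBuilder sb = new StringBuilder();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            sb.append(m.group());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String alphabet = "abcdefghijklmnopqrstuvwxyz";
        System.out.println(keep("[a-d[m-p]]", alphabet));    // union
        System.out.println(keep("[a-z&&[def]]", alphabet));  // intersection
        System.out.println(keep("[a-z&&[^m-p]]", alphabet)); // subtraction
    }
}
```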

Predefined character classes
. Any character (may or may not match line terminators)
\d A digit: [0-9]
\D A non-digit: [^0-9]
\s A whitespace character: [ \t\n\x0B\f\r]
\S A non-whitespace character: [^\s]
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]
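
A few typical uses of the predefined classes, sketched with standard String methods. The helper names are assumptions made for this example.

```java
import java.util.Arrays;

// PredefinedDemo is a hypothetical name used only for this illustration.
class PredefinedDemo {
    // Splits on runs of whitespace (\s+) after trimming the ends.
    static String[] words(String s) {
        return s.trim().split("\\s+");
    }

    // Removes every non-digit (\D), leaving only the digits.
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // True when the whole string consists of word characters (\w).
    static boolean isWordLike(String s) {
        return s.matches("\\w+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words(" a\tb\n c ")));
        System.out.println(digitsOnly("+1 (555) 010-9999"));
        System.out.println(isWordLike("foo_42") + " " + isWordLike("foo-42"));
    }
}
```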

POSIX character classes (US-ASCII only)
\p{Lower} A lowercase alphabetic character: [a-z]
\p{Upper} An uppercase alphabetic character: [A-Z]
\p{ASCII} All ASCII: [\x00-\x7F]
\p{Alpha} An alphabetic character: [\p{Lower}\p{Upper}]
\p{Digit} A decimal digit: [0-9]
\p{Alnum} An alphanumeric character: [\p{Alpha}\p{Digit}]
\p{Punct} Punctuation: one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph} A visible character: [\p{Alnum}\p{Punct}]
\p{Print} A printable character: [\p{Graph}\x20]
\p{Blank} A space or a tab: [ \t]
\p{Cntrl} A control character: [\x00-\x1F\x7F]
\p{XDigit} A hexadecimal digit: [0-9a-fA-F]
\p{Space} A whitespace character: [ \t\n\x0B\f\r]
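
The POSIX classes are written with their capitalized names in a pattern. The sketch below exercises three of them; PosixDemo and its methods are illustrative names only.

```java
// PosixDemo is a hypothetical name used only for this illustration.
class PosixDemo {
    // True when every character is a hexadecimal digit.
    static boolean isHex(String s) {
        return s.matches("\\p{XDigit}+");
    }

    // Removes ASCII punctuation characters.
    static String stripPunct(String s) {
        return s.replaceAll("\\p{Punct}", "");
    }

    // True when the string contains only ASCII characters (or is empty).
    static boolean asciiOnly(String s) {
        return s.matches("\\p{ASCII}*");
    }

    public static void main(String[] args) {
        System.out.println(isHex("CAFEbabe"));
        System.out.println(stripPunct("Hello, world!"));
        System.out.println(asciiOnly("plain"));
    }
}
```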

java.lang.Character classes (simple Java character type)
\p{javaLowerCase} Equivalent to java.lang.Character.isLowerCase()
\p{javaUpperCase} Equivalent to java.lang.Character.isUpperCase()
\p{javaWhitespace} Equivalent to java.lang.Character.isWhitespace()
\p{javaMirrored} Equivalent to java.lang.Character.isMirrored()

Classes for Unicode blocks and categories
\p{InGreek} A character in the Greek block (simple block)
\p{Lu} An uppercase letter (simple category)
\p{Sc} A currency symbol
\P{InGreek} Any character except one in the Greek block (negation)
[\p{L}&&[^\p{Lu}]] Any letter except an uppercase letter (subtraction)
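
The block and category classes are case-sensitive: a lowercase \p selects the property and an uppercase \P negates it. A minimal sketch, with hypothetical helper names:

```java
import java.util.regex.Pattern;

// UnicodeDemo is a hypothetical name used only for this illustration.
class UnicodeDemo {
    // True when every character lies in the Greek block.
    static boolean isGreek(String s) {
        return s.matches("\\p{InGreek}+");
    }

    // Removes everything that is not an uppercase letter.
    static String upperOnly(String s) {
        return s.replaceAll("\\P{Lu}", "");
    }

    // True when the string contains at least one currency symbol.
    static boolean hasCurrency(String s) {
        return Pattern.compile("\\p{Sc}").matcher(s).find();
    }

    // True when the string consists of letters, none of them uppercase.
    static boolean lettersButNoUppercase(String s) {
        return s.matches("[\\p{L}&&[^\\p{Lu}]]+");
    }

    public static void main(String[] args) {
        System.out.println(isGreek("\u03b1\u03b2\u03b3")); // alpha, beta, gamma
        System.out.println(upperOnly("Hello World"));
        System.out.println(hasCurrency("$5"));
        System.out.println(lettersButNoUppercase("hello"));
    }
}
```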

Boundary matchers
^ The beginning of a line
$ The end of a line
\b A word boundary
\B A non-word boundary
\A The beginning of the input
\G The end of the previous match
\Z The end of the input but for the final terminator, if any
\z The end of the input
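
The difference between \Z and \z, and the effect of \b and \G, is easiest to see on small inputs. The following is a sketch under the assumption that the default flags are in use; all class and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// BoundaryDemo is a hypothetical name used only for this illustration.
class BoundaryDemo {
    // Counts matches of a regex in the input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Counts occurrences of a literal word that are delimited by word boundaries.
    static int countWholeWord(String word, String text) {
        return count("\\b" + Pattern.quote(word) + "\\b", text);
    }

    public static void main(String[] args) {
        // "concat" and "category" are skipped; "cat" and the "cat" in "cat's" count.
        System.out.println(countWholeWord("cat", "cat concat cat's category"));
        // \Z tolerates one final line terminator, \z does not.
        System.out.println(count("end\\Z", "the end\n"));
        System.out.println(count("end\\z", "the end\n"));
        // \G chains each match to the end of the previous one, so only the leading digits match.
        System.out.println(count("\\G\\d", "123a45"));
    }
}
```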

Greedy quantifiers
X? X, once or not at all
X* X, zero or more times
X+ X, one or more times
X{n} X, exactly n times
X{n,} X, at least n times
X{n,m} X, at least n but not more than m times

Reluctant quantifiers
X?? X, once or not at all
X*? X, zero or more times
X+? X, one or more times
X{n}? X, exactly n times
X{n,}? X, at least n times
X{n,m}? X, at least n but not more than m times

Possessive quantifiers
X?+ X, once or not at all
X*+ X, zero or more times
X++ X, one or more times
X{n}+ X, exactly n times
X{n,}+ X, at least n times
X{n,m}+ X, at least n but not more than m times
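
The three quantifier tables describe the same repetition counts; what differs is how much input each kind takes and whether it gives any back. A greedy quantifier takes as much as possible and backtracks, a reluctant one takes as little as possible, and a possessive one takes as much as possible and never backtracks. A minimal sketch (names are illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// QuantifierDemo is a hypothetical name used only for this illustration.
class QuantifierDemo {
    // Returns the first match of the regex in the input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        System.out.println(firstMatch(".*foo", input));  // greedy: the whole input
        System.out.println(firstMatch(".*?foo", input)); // reluctant: "xfoo"
        System.out.println(firstMatch(".*+foo", input)); // possessive: null, .*+ keeps the final "foo"
    }
}
```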

Logical operators
XY X followed by Y
X|Y Either X or Y
(X) X, as a capturing group

Back references
\n Whatever the nth capturing group matched
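
A common use of a back reference is detecting a word that is immediately repeated. The sketch below is one way to do it; the pattern and the helper names are choices made for this example.

```java
import java.util.regex.Pattern;

// BackReferenceDemo is a hypothetical name used only for this illustration.
class BackReferenceDemo {
    // \1 must match exactly the text that group 1 captured.
    private static final String REPEATED_WORD = "\\b(\\w+)\\s+\\1\\b";

    static boolean hasRepeatedWord(String text) {
        return Pattern.compile(REPEATED_WORD).matcher(text).find();
    }

    // In the replacement string, $1 refers to the text captured by group 1.
    static String collapseRepeats(String text) {
        return text.replaceAll(REPEATED_WORD, "$1");
    }

    public static void main(String[] args) {
        System.out.println(hasRepeatedWord("this is is a test"));
        System.out.println(collapseRepeats("this is is a test"));
    }
}
```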

Quotation
\ Nothing, but quotes the following character
\Q Nothing, but quotes all characters until \E
\E Nothing, but ends quoting started by \Q
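
Quoting matters when the text to match contains metacharacters. The sketch below compares an unquoted pattern with a \Q...\E region and with Pattern.quote, which produces a quoted literal for an arbitrary string. QuoteDemo and its methods are illustrative names.

```java
import java.util.regex.Pattern;

// QuoteDemo is a hypothetical name used only for this illustration.
class QuoteDemo {
    // Treats the whole argument as a literal, whatever metacharacters it contains.
    static boolean containsLiteral(String literal, String input) {
        return Pattern.compile(Pattern.quote(literal)).matcher(input).find();
    }

    public static void main(String[] args) {
        // Unquoted, + is a quantifier, so the pattern matches "11=2" but not "1+1=2".
        System.out.println("1+1=2".matches("1+1=2"));
        // Inside \Q...\E every character is literal.
        System.out.println("1+1=2".matches("\\Q1+1=2\\E"));
        // Unquoted, the dot in "a.c" would also match "abc"; quoted, it does not.
        System.out.println(containsLiteral("a.c", "abc"));
    }
}
```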

Special constructs (non-capturing)
(?:X) X, as a non-capturing group
(?idmsux-idmsux) Nothing, but turns match flags i d m s u x on - off
(?idmsux-idmsux:X) X, as a non-capturing group with the given flags i d m s u x on - off
(?=X) X, via zero-width positive lookahead
(?!X) X, via zero-width negative lookahead
(?<=X) X, via zero-width positive lookbehind
(?<!X) X, via zero-width negative lookbehind
(?>X) X, as an independent, non-capturing group
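
The special constructs are easier to follow on concrete inputs. The following sketch shows lookahead, lookbehind, an inline flag, a non-capturing group, and an independent group; the inputs and helper names are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// SpecialConstructDemo is a hypothetical name used only for this illustration.
class SpecialConstructDemo {
    // Returns the first match of the regex in the input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Number of capturing groups declared by the regex.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // Lookahead: digits that are followed by "px"; "px" itself is not consumed.
        System.out.println(firstMatch("\\d+(?=px)", "width: 12px; height: 5em"));
        // Lookbehind: digits that are preceded by a dollar sign.
        System.out.println(firstMatch("(?<=\\$)\\d+", "cost $42 or 17 EUR"));
        // Negative lookahead: "foo" not followed by "bar".
        System.out.println(firstMatch("foo(?!bar)\\w*", "foobar foobaz"));
        // Inline flag i turns on case-insensitive matching.
        System.out.println(firstMatch("(?i)hello", "Say HELLO"));
        // (?:...) groups without capturing, so only (c) is counted.
        System.out.println(groupCount("(?:ab)+(c)"));
        // An independent group never gives back what it matched, so this finds nothing.
        System.out.println(firstMatch("(?>a+)ab", "aaab"));
    }
}
```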

Backslashes, escapes, and quoting

The backslash character ('\') serves to introduce escaped constructs, as defined in the table above, and also to quote characters that would otherwise be interpreted as unescaped constructs. Thus the expression \\ matches a single backslash, and \{ matches a left brace.

It is an error to use a backslash before any alphabetic character that does not denote an escaped construct; these are reserved for future extensions to the regular-expression language. A backslash may be used before a non-alphabetic character regardless of whether that character is part of an unescaped construct.

Backslashes within string literals in Java source code are interpreted, as required by the Java Language Specification, as either Unicode escapes or other character escapes. It is therefore necessary to double the backslashes in string literals that represent regular expressions, to protect them from interpretation by the Java compiler. For example, the string literal "\b" matches a single backspace character when interpreted as a regular expression, while "\\b" matches a word boundary. The string literal "\(hello\)" is illegal and leads to a compile-time error; to match the string (hello), the string literal "\\(hello\\)" must be used.
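
The doubling rule can be verified directly. In this sketch each method compiles a pattern written as a Java string literal; the class and method names are illustrative.

```java
import java.util.regex.Pattern;

// BackslashDemo is a hypothetical name used only for this illustration.
class BackslashDemo {
    // Two backslashes in the literal reach the regex engine as one, giving a word boundary.
    static boolean wordBoundary() {
        return Pattern.compile("\\bcat\\b").matcher("a cat sat").find();
    }

    // A single backslash followed by b is the Java escape for backspace (code point 8).
    static boolean backspace() {
        return Pattern.matches("\b", String.valueOf((char) 8));
    }

    // Escaped parentheses match literal parentheses.
    static boolean literalParens() {
        return Pattern.matches("\\(hello\\)", "(hello)");
    }

    // Four backslashes in the literal form the two-character regex that matches one backslash.
    static boolean singleBackslash() {
        return Pattern.matches("\\\\", "\\");
    }

    public static void main(String[] args) {
        System.out.println(wordBoundary());
        System.out.println(backspace());
        System.out.println(literalParens());
        System.out.println(singleBackslash());
    }
}
```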

Character classes

Character classes may appear within other character classes, and may be composed by the union operator (implicit) and the intersection operator (&&). The union operator denotes a class that contains every character that is in at least one of its operand classes. The intersection operator denotes a class that contains every character that is in both of its operand classes.

The precedence of character-class operators is as follows, from highest to lowest:
1 Literal escape \x
2 Grouping [...]
3 Range a-z
4 Union [a-e][i-u]
5 Intersection [a-z&&[aeiou]]

Note that a different set of metacharacters is in effect inside a character class than outside it. For instance, the regular expression . loses its special meaning inside a character class, while the expression - becomes a range-forming metacharacter.
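
The change in metacharacter meaning inside a class can be checked with a few one-character matches. A minimal sketch, with illustrative names:

```java
import java.util.regex.Pattern;

// ClassMetaDemo is a hypothetical name used only for this illustration.
class ClassMetaDemo {
    // Outside a class the dot matches any character; inside, only a literal dot.
    static boolean dotIsLiteralInClass() {
        return Pattern.matches(".", "a")
            && !Pattern.matches("[.]", "a")
            && Pattern.matches("[.]", ".");
    }

    // Inside a class the hyphen forms a range unless it is escaped.
    static boolean hyphenFormsRange() {
        return Pattern.matches("[a-c]", "b")
            && !Pattern.matches("[a\\-c]", "b")
            && Pattern.matches("[a\\-c]", "-");
    }

    public static void main(String[] args) {
        System.out.println(dotIsLiteralInClass());
        System.out.println(hyphenFormsRange());
    }
}
```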

Line terminators

A line terminator is a one- or two-character sequence that marks the end of a line of the input character sequence. The following are recognized as line terminators:

A newline (line feed) character ('\n'),
A carriage-return character followed immediately by a newline character ("\r\n"),
A standalone carriage-return character ('\r'),
A next-line character ('\u0085'),
A line-separator character ('\u2028'), or
A paragraph-separator character ('\u2029').

If UNIX_LINES mode is activated, then the only line terminators recognized are newline characters.

The regular expression . matches any character except a line terminator unless the DOTALL flag is specified.

By default, the regular expressions ^ and $ ignore line terminators and match only at the beginning and the end, respectively, of the entire input sequence. If MULTILINE mode is activated, ^ matches at the beginning of the input and after any line terminator except at the end of the input. In MULTILINE mode, $ matches just before a line terminator or at the end of the input sequence.
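
The flags discussed above are constants on java.util.regex.Pattern and are passed as the second argument of Pattern.compile. The sketch below shows how each one changes the result; the helper names and inputs are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// LineTerminatorDemo is a hypothetical name used only for this illustration.
class LineTerminatorDemo {
    // True when the dot in "a.b" is allowed to match the given separator.
    static boolean dotSpans(String separator, int flags) {
        return Pattern.compile("a.b", flags).matcher("a" + separator + "b").matches();
    }

    // Counts the words that begin at a position where ^ matches.
    static int countLineStarts(String input, int flags) {
        Matcher m = Pattern.compile("^\\w+", flags).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(dotSpans("\n", 0));                  // false: newline is a line terminator
        System.out.println(dotSpans("\n", Pattern.DOTALL));     // true: DOTALL lifts the restriction
        System.out.println(dotSpans("\r", Pattern.UNIX_LINES)); // true: only newline counts in this mode
        System.out.println(countLineStarts("one\ntwo\nthree", 0));                 // 1
        System.out.println(countLineStarts("one\ntwo\nthree", Pattern.MULTILINE)); // 3
    }
}
```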

Groups and capturing

Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups:
1 ((A)(B(C)))
2 (A)
3 (B(C))
4 (C)

Group zero always stands for the entire expression.

Capturing groups are so named because, during a match, each subsequence of the input sequence that matches such a group is saved. The captured subsequence may be used later in the expression through a back reference, and may also be retrieved from the matcher once the match operation is complete.

The captured input associated with a group is always the subsequence that the group most recently matched. If a group is evaluated a second time because of quantification, its previously captured value, if any, is retained if the second evaluation fails. For example, matching the string "aba" against the expression (a(b)?)+ leaves group two set to "b". All captured input is discarded at the beginning of each match.

Groups beginning with (? are pure, non-capturing groups that do not capture text and do not count towards the group total.
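
Group numbering and the "most recently matched" rule can both be observed through Matcher.group(int). A minimal sketch; GroupDemo and groups are hypothetical names.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// GroupDemo is a hypothetical name used only for this illustration.
class GroupDemo {
    // Returns group 0 through group N for a full match, or an empty array if the input does not match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // Groups are numbered by the position of their opening parenthesis.
        System.out.println(Arrays.toString(groups("((A)(B(C)))", "ABC")));
        // Group 2 keeps "b" from the first iteration because (b)? matches nothing in the second.
        System.out.println(Arrays.toString(groups("(a(b)?)+", "aba")));
    }
}
```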

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
