Regular Expression Quick look-up table

Characters

x The character x
\\ The backslash character
\0n The character with octal value 0n (0 <= n <= 7)
\0nn The character with octal value 0nn (0 <= n <= 7)
\0mnn The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh The character with hexadecimal value 0xhh
\uhhhh The character with hexadecimal value 0xhhhh
\t The tab character ('\u0009')
\n The newline (line feed) character ('\u000A')
\r The carriage-return character ('\u000D')
\f The form-feed character ('\u000C')
\a The alert (bell) character ('\u0007')
\e The escape character ('\u001B')
\cx The control character corresponding to x

Character classes

[abc] a, b, or c (simple class)
[^abc] Any character except a, b, or c (negation)
[a-zA-Z] a through z or A through Z, inclusive (range)
[a-d[m-p]] a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]] d, e, or f (intersection)
[a-z&&[^bc]] a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]] a through z, and not m through p: [a-lq-z] (subtraction)

Predefined character classes

. Any character (may or may not match line terminators)
\d A digit: [0-9]
\D A non-digit: [^0-9]
\s A whitespace character: [ \t\n\x0B\f\r]
\S A non-whitespace character: [^\s]
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]

POSIX character classes (US-ASCII only)

\p{Lower} A lowercase alphabetic character: [a-z]
\p{Upper} An uppercase alphabetic character: [A-Z]
\p{ASCII} All ASCII: [\x00-\x7F]
\p{Alpha} An alphabetic character: [\p{Lower}\p{Upper}]
\p{Digit} A decimal digit: [0-9]
\p{Alnum} An alphanumeric character: [\p{Alpha}\p{Digit}]
\p{Punct} Punctuation: one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph} A visible character: [\p{Alnum}\p{Punct}]
\p{Print} A printable character: [\p{Graph}\x20]
\p{Blank} A space or a tab: [ \t]
\p{Cntrl} A control character: [\x00-\x1F\x7F]
\p{XDigit} A hexadecimal digit: [0-9a-fA-F]
\p{Space} A whitespace character: [ \t\n\x0B\f\r]

java.lang.Character classes (simple Java character type)

\p{javaLowerCase} Equivalent to java.lang.Character.isLowerCase()
\p{javaUpperCase} Equivalent to java.lang.Character.isUpperCase()
\p{javaWhitespace} Equivalent to java.lang.Character.isWhitespace()
\p{javaMirrored} Equivalent to java.lang.Character.isMirrored()

Classes for Unicode blocks and categories

\p{InGreek} A character in the Greek block (simple block)
\p{Lu} An uppercase letter (simple category)
\p{Sc} A currency symbol
\P{InGreek} Any character except one in the Greek block (negation)
[\p{L}&&[^\p{Lu}]] Any letter except an uppercase letter (subtraction)
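
The Unicode block and category classes listed above can be tried out with a short program. This is only a sketch: the class name UnicodeClassDemo and the helper matches are made up for illustration, and the Greek letter is written as a Unicode escape so the source file stays ASCII.

```java
import java.util.regex.Pattern;

public class UnicodeClassDemo {
    // Hypothetical helper: true when the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String alpha = "\u03B1"; // GREEK SMALL LETTER ALPHA

        System.out.println(matches("\\p{InGreek}", alpha)); // true
        System.out.println(matches("\\P{InGreek}", "a"));   // true
        System.out.println(matches("\\p{Lu}", "A"));        // true
        System.out.println(matches("\\p{Lu}", "a"));        // false
        System.out.println(matches("\\p{Sc}", "$"));        // true

        // Subtraction: any letter that is not an uppercase letter.
        System.out.println(matches("[\\p{L}&&[^\\p{Lu}]]", "a")); // true
        System.out.println(matches("[\\p{L}&&[^\\p{Lu}]]", "A")); // false
    }
}
```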

Boundary matchers

^ The beginning of a line
$ The end of a line
\b A word boundary
\B A non-word boundary
\A The beginning of the input
\G The end of the previous match
\Z The end of the input but for the final terminator, if any
\z The end of the input
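
The difference between \Z and \z, and the behavior of \G, are easiest to see by counting matches. The following is a minimal sketch; BoundaryDemo and countMatches are illustrative names, not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryDemo {
    // Hypothetical helper: counts how many times the regex is found in the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // \Z tolerates one final line terminator, \z does not.
        System.out.println(countMatches("foo\\Z", "foo\n")); // 1
        System.out.println(countMatches("foo\\z", "foo\n")); // 0

        // \G anchors each match to the end of the previous one,
        // so scanning stops at the first non-digit.
        System.out.println(countMatches("\\G\\d", "123a45")); // 3

        // \b requires a word boundary, \B requires its absence.
        System.out.println(countMatches("\\bcat\\b", "cat concat cat")); // 2
        System.out.println(countMatches("\\Bcat", "cat concat"));        // 1
    }
}
```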

Greedy quantifiers

X? X, once or not at all
X* X, zero or more times
X+ X, one or more times
X{n} X, exactly n times
X{n,} X, at least n times
X{n,m} X, at least n but not more than m times

Reluctant quantifiers

X?? X, once or not at all
X*? X, zero or more times
X+? X, one or more times
X{n}? X, exactly n times
X{n,}? X, at least n times
X{n,m}? X, at least n but not more than m times

Possessive quantifiers

X?+ X, once or not at all
X*+ X, zero or more times
X++ X, one or more times
X{n}+ X, exactly n times
X{n,}+ X, at least n times
X{n,m}+ X, at least n but not more than m times
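
The three quantifier families accept the same repetition counts and differ only in how they consume input: greedy quantifiers take as much as possible and backtrack, reluctant quantifiers take as little as possible, and possessive quantifiers take as much as possible and never give anything back. A small sketch of the difference follows; QuantifierDemo and firstMatch are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Hypothetical helper: returns the first match, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";

        // Greedy: .* swallows everything, then backtracks to the last "foo".
        System.out.println(firstMatch(".*foo", input));  // xfooxxxxxxfoo

        // Reluctant: .*? expands one character at a time, stopping at the first "foo".
        System.out.println(firstMatch(".*?foo", input)); // xfoo

        // Possessive: .*+ swallows everything and never backtracks, so "foo" cannot match.
        System.out.println(firstMatch(".*+foo", input)); // null
    }
}
```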

Logical operators

XY X followed by Y
X|Y Either X or Y
(X) X, as a capturing group

Back references

\n Whatever the nth capturing group matched
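
A back reference matches the same text that the numbered group captured, which makes it handy for finding repeats. The sketch below uses made-up names (BackReferenceDemo, firstMatch) and sample strings chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackReferenceDemo {
    // Hypothetical helper: returns the first match, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // (\w)\1 matches any word character followed by the same character.
        System.out.println(firstMatch("(\\w)\\1", "hello")); // ll
        System.out.println(firstMatch("(\\w)\\1", "abc"));   // null

        // \b(\w+) \1\b matches a whole word repeated after a single space.
        System.out.println(firstMatch("\\b(\\w+) \\1\\b", "this is is a test")); // is is
    }
}
```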

Quotation

\ Nothing, but quotes the following character
\Q Nothing, but quotes all characters until \E
\E Nothing, but ends quoting started by \Q
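
Quoting matters when the text to match contains metacharacters. The sketch below compares an unquoted pattern with the \Q...\E form and a single escaped character. It also uses Pattern.quote, a standard library method that this table does not mention, as one more way to get the same effect; QuotationDemo and matches are illustrative names.

```java
import java.util.regex.Pattern;

public class QuotationDemo {
    // Hypothetical helper: true when the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Unquoted, + is a quantifier: the pattern means "one or more 1, then 1=2".
        System.out.println(matches("1+1=2", "1+1=2")); // false
        System.out.println(matches("1+1=2", "11=2"));  // true

        // \Q...\E quotes everything in between, so + is a literal plus sign.
        System.out.println(matches("\\Q1+1=2\\E", "1+1=2")); // true

        // A single backslash quotes just the following character.
        System.out.println(matches("1\\+1=2", "1+1=2")); // true

        // Pattern.quote builds a quoted pattern from arbitrary text.
        System.out.println(matches(Pattern.quote("1+1=2"), "1+1=2")); // true
    }
}
```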

Special constructs (non-capturing)

(?:X) X, as a non-capturing group
(?idmsux-idmsux) Nothing, but turns match flags i d m s u x on - off
(?idmsux-idmsux:X) X, as a non-capturing group with the given flags i d m s u x on - off
(?=X) X, via zero-width positive lookahead
(?!X) X, via zero-width negative lookahead
(?<=X) X, via zero-width positive lookbehind
(?<!X) X, via zero-width negative lookbehind
(?>X) X, as an independent, non-capturing group
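
A few of these constructs in action, as a rough sketch. SpecialConstructDemo, firstMatch and matches are illustrative names, and the sample strings are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpecialConstructDemo {
    // Hypothetical helper: returns the first match, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Hypothetical helper: true when the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // (?i) turns on case-insensitive matching from inside the pattern.
        System.out.println(matches("(?i)hello", "HELLO")); // true

        // Lookahead and lookbehind check context without consuming it.
        System.out.println(firstMatch("\\d+(?= USD)", "price 100 USD")); // 100
        System.out.println(firstMatch("(?<=\\$)\\d+", "cost $42"));      // 42
        System.out.println(firstMatch("foo(?!bar)\\w+", "foobar foobaz")); // foobaz

        // (?:X) groups without capturing, so only (c) is counted.
        System.out.println(Pattern.compile("(?:ab)+(c)").matcher("").groupCount()); // 1

        // (?>X) is independent: once it has matched "bc" it will not backtrack to "b".
        System.out.println(matches("a(?:bc|b)c", "abc")); // true
        System.out.println(matches("a(?>bc|b)c", "abc")); // false
    }
}
```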

Backslashes, escapes, and references

The backslash character ('\') introduces escaped constructs, as defined in the tables above, and also quotes characters that would otherwise be interpreted as unescaped constructs. Thus the expression \\ matches a single backslash, and \{ matches a left brace.

It is an error to use a backslash before any alphabetic character that does not denote an escaped construct; these are reserved for future extensions to the regular-expression language. A backslash may be used before a non-alphabetic character regardless of whether that character is part of an unescaped construct.

Backslashes within string literals in Java source code are interpreted, as required by the Java Language Specification, as either Unicode escapes or other character escapes. It is therefore necessary to double the backslashes in string literals that represent regular expressions, to protect them from interpretation by the Java compiler. For example, the string literal "\b" matches a single backspace character when interpreted as a regular expression, while "\\b" matches a word boundary. The string literal "\(hello\)" is illegal and leads to a compile-time error; to match the string (hello), the string literal "\\(hello\\)" must be used.
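
The doubling rule can be checked directly. In this sketch the class and method names (BackslashDemo, hasWordCat, isParenHello, isSingleBackslash) are made up for illustration; each comment shows the regular expression that the regex engine actually receives.

```java
import java.util.regex.Pattern;

public class BackslashDemo {
    // "\\bcat\\b" in Java source is the regex \bcat\b (word boundaries around "cat").
    static boolean hasWordCat(String input) {
        return Pattern.compile("\\bcat\\b").matcher(input).find();
    }

    // "\\(hello\\)" is the regex \(hello\), which matches the literal text (hello).
    static boolean isParenHello(String input) {
        return Pattern.matches("\\(hello\\)", input);
    }

    // "\\\\" is the regex \\, which matches exactly one backslash character.
    static boolean isSingleBackslash(String input) {
        return Pattern.matches("\\\\", input);
    }

    public static void main(String[] args) {
        System.out.println(hasWordCat("a cat sat"));   // true
        System.out.println(hasWordCat("concatenate")); // false
        System.out.println(isParenHello("(hello)"));   // true
        System.out.println(isSingleBackslash("\\"));   // true
    }
}
```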

Character classes

Character classes may appear within other character classes, and may be composed by the union operator (implicit) and the intersection operator (&&). The union operator denotes a class that contains every character that is in at least one of its operand classes. The intersection operator denotes a class that contains every character that is in both of its operand classes.

The precedence of the character class operators is as follows, from highest to lowest:
Literal escape \x
Grouping [...]
Range a-z
Union [a-e][i-u]
Intersection [a-z&&[aeiou]]

Note that a different set of metacharacters is in effect inside a character class than outside one. For example, the regular expression . loses its special meaning inside a character class, while the expression - becomes a range-forming metacharacter.
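
Union, intersection, subtraction and the changed meaning of . and - inside a class can be verified one character at a time. This is a minimal sketch; CharClassDemo and matches are illustrative names.

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // Hypothetical helper: true when the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Union: a through d, or m through p.
        System.out.println(matches("[a-d[m-p]]", "n")); // true
        System.out.println(matches("[a-d[m-p]]", "e")); // false

        // Intersection: only d, e, or f.
        System.out.println(matches("[a-z&&[def]]", "e")); // true
        System.out.println(matches("[a-z&&[def]]", "a")); // false

        // Subtraction: a through z except b and c.
        System.out.println(matches("[a-z&&[^bc]]", "d")); // true
        System.out.println(matches("[a-z&&[^bc]]", "b")); // false

        // Inside a class, . is a literal dot and - forms a range.
        System.out.println(matches("[.]", "x"));   // false
        System.out.println(matches("[.]", "."));   // true
        System.out.println(matches("[a-c]", "b")); // true
    }
}
```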

Line terminators

A line terminator is a one- or two-character sequence that marks the end of a line of the input character sequence. The following are recognized as line terminators:

A newline (line feed) character ('\n'),
A carriage-return character followed immediately by a newline character ("\r\n"),
A standalone carriage-return character ('\r'),
A next-line character ('\u0085'),
A line-separator character ('\u2028'), or
A paragraph-separator character ('\u2029').
If UNIX_LINES mode is activated, then the only line terminators recognized are newline characters.

The regular expression . matches any character except a line terminator unless the DOTALL flag is specified.

By default, the regular expressions ^ and $ ignore line terminators and match only at the beginning and the end, respectively, of the entire input sequence. If MULTILINE mode is activated, then ^ matches at the beginning of the input and after any line terminator, except at the end of the input. In MULTILINE mode, $ matches just before a line terminator or at the end of the input sequence.
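
The effect of the MULTILINE, DOTALL and UNIX_LINES flags described here can be seen with a few short inputs. The following is a sketch only; LineTerminatorDemo, countMatches and matches are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineTerminatorDemo {
    // Hypothetical helper: counts matches of the regex compiled with the given flags.
    static int countMatches(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Hypothetical helper: true when the whole input matches under the given flags.
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String text = "one\ntwo\nthree";

        // Without MULTILINE, ^ and $ only anchor to the whole input.
        System.out.println(countMatches("^\\w+$", 0, text));                 // 0
        // With MULTILINE, they anchor to every line.
        System.out.println(countMatches("^\\w+$", Pattern.MULTILINE, text)); // 3

        // . skips line terminators unless DOTALL is set.
        System.out.println(matches("a.b", 0, "a\nb"));              // false
        System.out.println(matches("a.b", Pattern.DOTALL, "a\nb")); // true

        // Under UNIX_LINES only \n is a line terminator, so . can match \r.
        System.out.println(matches("a.b", 0, "a\rb"));                  // false
        System.out.println(matches("a.b", Pattern.UNIX_LINES, "a\rb")); // true
    }
}
```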

Groups and captures

Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups:
1 ((A)(B(C)))
2 (A)
3 (B(C))
4 (C)
Group zero always stands for the entire expression.
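
The numbering can be confirmed by matching the example expression against the input "ABC" and listing every group. This is a small sketch; GroupNumberDemo and groups are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumberDemo {
    // Hypothetical helper: returns groups 0..groupCount() for a full match,
    // or an empty array when the input does not match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Index 0 is the entire match; indexes 1 to 4 follow the opening parentheses.
        System.out.println(String.join(", ", groups("((A)(B(C)))", "ABC")));
        // ABC, ABC, A, BC, C
    }
}
```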

Capturing groups are so named because, during a match, each subsequence of the input sequence that matches such a group is saved. The captured subsequence may be used later in the expression via a back reference, and may also be retrieved from the matcher once the match operation is complete.

The captured input associated with a group is always the subsequence that the group most recently matched. If a group is evaluated a second time because of quantification, then its previously captured value, if any, is retained if the second evaluation fails. Matching the string "aba" against the expression (a(b)?)+, for example, leaves group two set to "b". All captured input is discarded at the beginning of each match.
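
The "aba" example can be reproduced as follows. This is a sketch with made-up names (RetainedCaptureDemo, groups); it simply prints what the matcher reports for each group after a full match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RetainedCaptureDemo {
    // Hypothetical helper: returns groups 0..groupCount() for a full match,
    // or an empty array when the input does not match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] g = groups("(a(b)?)+", "aba");
        System.out.println(g[0]); // aba
        // Group 1 holds its most recent match, the final "a".
        System.out.println(g[1]); // a
        // Group 2 did not match in the last iteration, so the earlier "b" is retained.
        System.out.println(g[2]); // b
    }
}
```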

Groups beginning with (? are pure, non-capturing groups that do not capture text and do not count towards the group total.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
