The construction summary of regular expressions
Construct match
Character
X character X
\ backslash Character
\0n with octal value 0 of the character n (0 <= n <= 7)
\0nn with octal value 0 of the character nn (0 <= n <= 7)
\0mnn characters with octal value 0 mnn (0 <= m <= 3, 0 <= n <= 7)
\XHH characters with hexadecimal value of 0x hh
\uhhhh characters with hexadecimal value of 0x HHHH
\ t tab (' \u0009 ')
\ n New Line (newline) character (' \u000a ')
\ r return character (' \u000d ')
\f page feed (' \u000c ')
\a Alarm (Bell) character (' \u0007 ')
\e Escape character (' \u001b ')
\CX corresponds to the control character of X
Character class
[ABC] A, B, or C (simple Class)
[^ABC] Any character except A, B, or C (negation)
[A-za-z] A to Z or A to Z, and the letters at both ends are included (range)
[A-d[m-p]] A to D or M to P:[a-dm-p] (and set)
[A-z&&[def]] D, E or F (intersection)
[A-Z&&[^BC]] A to Z, except B and C:[ad-z] (minus)
[A-z&&[^m-p]] A to Z, not M to P:[a-lq-z] (minus)
Predefined character classes
. Any character (may or may not match the line terminator)
\d number: [0-9]
\d Non-digit: [^0-9]
\s whitespace characters: [\t\n\x0b\f\r]
\s non-whitespace characters: [^\s]
\w Word characters: [a-za-z_0-9]
\w non-word characters: [^\w]
POSIX character class (Us-ascii only)
\p{lower} lowercase alphabetic characters: [A-z]
\p{upper} uppercase characters: [A-z]
\P{ASCII} all ascii:[\x00-\x7f]
\p{alpha} alphabetic characters: [\p{lower}\p{upper}]
\p{digit} decimal digits: [0-9]
\p{alnum} alphanumeric characters: [\p{alpha}\p{digit}]
\P{PUNCT} punctuation:! " #$%& ' () *+,-./:;<=>?@[\]^_ ' {|} ~
\p{graph} visible characters: [\p{alnum}\p{punct}]
\p{print} printable characters: [\p{graph}\x20]
\p{blank} spaces or tabs: [\ t]
\p{cntrl} control character: [\x00-\x1f\x7f]
\p{xdigit} hexadecimal number: [0-9a-fa-f]
\p{space} whitespace characters: [\t\n\x0b\f\r]
Java.lang.Character Class (Simple Java character type)
\p{javalowercase} is equivalent to Java.lang.Character.isLowerCase ()
\p{javauppercase} is equivalent to Java.lang.Character.isUpperCase ()
\p{javawhitespace} is equivalent to Java.lang.Character.isWhitespace ()
\p{javamirrored} is equivalent to java.lang.Character.isMirrored ()
Classes for Unicode blocks and categories
Characters in \p{ingreek} Greek blocks (simple blocks)
\p{lu} Capital Letter (Simple category)
\P{SC} currency symbol
\p{ingreek} All characters, except in the Greek block (negation)
[\p{l}&&[^\p{lu}]] All letters except uppercase letters (minus)
Boundary Matching Device
^ The beginning of a line
$ End of line
\b Word boundaries
\b Non-word boundaries
\a the beginning of the input
\g the end of the previous match
\z the end of the input, only for the last terminator (if any)
End of \z input
Greedy quantity Word
X? X, not once or once
X* X, 0 or more times
x+ X, one or more times
X{n} X, exactly n times
X{n,} X, at least n times
X{n,m} X, at least n times, but not more than m times
Reluctant quantity Word
X?? X, not once or once
X*? X, 0 or more times
X+? X, one or more times
X{n}? X, exactly n times
X{n,}? X, at least n times
X{n,m}? X, at least n times, but not more than m times
Possessive quantity Word
x?+ X, once or once there is no
x*+ X, 0 or more times
x + + x., one or more times
x{n}+ X, exactly n times
x{n,}+ X, at least n times
x{n,m}+ X, at least n times, but not more than m times
Logical operator
XY X followed by Y
X| Y X or Y
(x) x, as a capturing group
Back reference
\ n Any matching nth capture group
Reference
\ Nothing, but the following characters are referenced
\q nothing, but references all characters until \e
\e nothing, but ending a reference starting from \q
Special Construction (not capture)
(?: x) x, as a non-capturing group
(? idmsux-idmsux) Nothing but will match the logo i d m s u x on-off
(? idmsux-idmsux:x) x, as with the given flag I d m s u x on-off
Non-capturing group (? =x) X, through a positive lookahead of 0 widths
(?! x) x, through a negative lookahead of 0 widths
(? <=x) X, through a positive lookbehind of 0 widths
(? <! x) x, through a negative lookbehind of 0 widths
(? >x) X, as a separate, non-capturing group
-------------------------------------------------
Backslashes, escapes, and references
The backslash character (' \ ') is used to reference the escaped construct, as defined in the previous table, and to refer to other characters that will be interpreted as non-escaped constructs. Therefore, the expression \ \ Matches a single backslash, and \{matches the left parenthesis.
It is wrong to use backslashes before any alphabetic characters that escape constructs are used, and they are reserved for future extensions of regular expression languages. You can use a backslash before a non-alphanumeric character, regardless of whether the character is part of an escaped construct or not.
The backslash in the Java source code string is interpreted as Unicode escape or other character escape, as required by the Java Language specification. Therefore, you must use two backslashes in the string literal to indicate that the regular expression is protected and not interpreted by the Java bytecode compiler. For example, when interpreted as a regular expression, the string literal "\b" matches a single backspace character, and "\\b" matches the word boundary. string literal "\ (hello\)" is illegal and will result in a compile-time error; to match the string (hello), you must use string literal "\ (hello\\)".
Character class
A character class can appear in other character classes, and can contain a set operator (implicit) and an intersection operator (&&). The collection operator represents a class that contains at least one of its operand classes. The intersection operator represents a class that contains all the characters in its two operand classes.
The precedence of the character class operators is as follows, in order from highest to lowest:
1-literal escape \x
2 Grouping [...]
3 Range A-Z
4 and set [A-e][i-u]
5 intersection [A-z&&[aeiou]]
Note that the different sets of metacharacters are actually inside the character class, not outside of the character class. For example, regular expressions. It loses its special meaning inside the character class, and the expression-becomes the range that forms the meta character.
Line Terminator
A line terminator is a sequence of one or two characters that marks the end of the line of the input character sequence. The following code is recognized as a line terminator:
New Lines (newline) (' \ n '),
The carriage return immediately followed by the new line character ("\ r \ n"),
A separate carriage return (' \ R '),
Next line of characters (' \u0085 '),
Row delimiter (' \u2028 ') or
Paragraph separator (' \u2029 ').
If you activate Unix_lines mode, the new line character is the only recognized line terminator.
If the DOTALL flag is not specified, the regular expression. can match any character (except the line terminator).
By default, regular expressions ^ and $ ignore line terminators, matching only the beginning and end of the entire input sequence. If the MULTILINE mode is activated, then ^ matches at the beginning of the entry and after the line terminator (the end of the input). When in MULTILINE mode, $ matches only before the row terminator or at the end of the input sequence.
Groups and captures
Capturing groups can be numbered by counting their open brackets from left to right. For example, in an expression ((A) (B (C)), there are four such groups:
1 ((A) (B (C)))
2 \a
3 (B (C))
4 (C)
Group 0 always represents an entire expression.
The capture group is named so that each subsequence of the input sequence that matches the groups is saved in the match. The captured subsequence can later be used in an expression by a back reference, or it can be obtained from the match after the matching operation completes.
The capture input associated with a group is always the child sequence that matches the group most recently. If the group is recalculated again for quantification, the value that was previously captured (if any) will be preserved if the second calculation fails, for example, to "ABA" the string with an expression (a (b)) + matches, the second group is set to "B". At the beginning of each match, all captured inputs are discarded.
The group that begins with (?) is a pure, non-capturing group that does not capture text and is not counted for group totals.
The above is a small series for everyone to bring the regular expression (the grammar recommended) all the content, I hope that we support the cloud-Habitat Community ~