Regular expression quick-check table

Source: Internet
Author: User

Characters

  x                     The character x
  \\                    The backslash character
  \0n                   The character with octal value 0n (0 <= n <= 7)
  \0nn                  The character with octal value 0nn (0 <= n <= 7)
  \0mnn                 The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
  \xhh                  The character with hexadecimal value 0xhh
  \uhhhh                The character with hexadecimal value 0xhhhh
  \t                    The tab character ('\u0009')
  \n                    The newline (line feed) character ('\u000A')
  \r                    The carriage-return character ('\u000D')
  \f                    The form-feed character ('\u000C')
  \a                    The alert (bell) character ('\u0007')
  \e                    The escape character ('\u001B')
  \cx                   The control character corresponding to x

Character classes

  [abc]                 a, b, or c (simple class)
  [^abc]                Any character except a, b, or c (negation)
  [a-zA-Z]              a through z or A through Z, inclusive (range)
  [a-d[m-p]]            a through d, or m through p: [a-dm-p] (union)
  [a-z&&[def]]          d, e, or f (intersection)
  [a-z&&[^bc]]          a through z, except for b and c: [ad-z] (subtraction)
  [a-z&&[^m-p]]         a through z, and not m through p: [a-lq-z] (subtraction)

Predefined character classes

  .                     Any character (may or may not match line terminators)
  \d                    A digit: [0-9]
  \D                    A non-digit: [^0-9]
  \s                    A whitespace character: [ \t\n\x0B\f\r]
  \S                    A non-whitespace character: [^\s]
  \w                    A word character: [a-zA-Z_0-9]
  \W                    A non-word character: [^\w]

POSIX character classes (US-ASCII only)

  \p{Lower}             A lowercase alphabetic character: [a-z]
  \p{Upper}             An uppercase alphabetic character: [A-Z]
  \p{ASCII}             All ASCII: [\x00-\x7F]
  \p{Alpha}             An alphabetic character: [\p{Lower}\p{Upper}]
  \p{Digit}             A decimal digit: [0-9]
  \p{Alnum}             An alphanumeric character: [\p{Alpha}\p{Digit}]
  \p{Punct}             Punctuation: one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
  \p{Graph}             A visible character: [\p{Alnum}\p{Punct}]
  \p{Print}             A printable character: [\p{Graph}\x20]
  \p{Blank}             A space or a tab: [ \t]
  \p{Cntrl}             A control character: [\x00-\x1F\x7F]
  \p{XDigit}            A hexadecimal digit: [0-9a-fA-F]
  \p{Space}             A whitespace character: [ \t\n\x0B\f\r]

java.lang.Character classes (simple Java character type)

  \p{javaLowerCase}     Equivalent to java.lang.Character.isLowerCase()
  \p{javaUpperCase}     Equivalent to java.lang.Character.isUpperCase()
  \p{javaWhitespace}    Equivalent to java.lang.Character.isWhitespace()
  \p{javaMirrored}      Equivalent to java.lang.Character.isMirrored()

Classes for Unicode blocks and categories

  \p{InGreek}           A character in the Greek block (simple block)
  \p{Lu}                An uppercase letter (simple category)
  \p{Sc}                A currency symbol
  \P{InGreek}           Any character except one in the Greek block (negation)
  [\p{L}&&[^\p{Lu}]]    Any letter except an uppercase letter (subtraction)

Boundary matchers

  ^                     The beginning of a line
  $                     The end of a line
  \b                    A word boundary
  \B                    A non-word boundary
  \A                    The beginning of the input
  \G                    The end of the previous match
  \Z                    The end of the input but for the final terminator, if any
  \z                    The end of the input

Greedy quantifiers

  X?                    X, once or not at all
  X*                    X, zero or more times
  X+                    X, one or more times
  X{n}                  X, exactly n times
  X{n,}                 X, at least n times
  X{n,m}                X, at least n but not more than m times

Reluctant quantifiers

  X??                   X, once or not at all
  X*?                   X, zero or more times
  X+?                   X, one or more times
  X{n}?                 X, exactly n times
  X{n,}?                X, at least n times
  X{n,m}?               X, at least n but not more than m times

Possessive quantifiers

  X?+                   X, once or not at all
  X*+                   X, zero or more times
  X++                   X, one or more times
  X{n}+                 X, exactly n times
  X{n,}+                X, at least n times
  X{n,m}+               X, at least n but not more than m times

Logical operators

  XY                    X followed by Y
  X|Y                   Either X or Y
  (X)                   X, as a capturing group

Back references

  \n                    Whatever the nth capturing group matched

Quotation

  \                     Nothing, but quotes the following character
  \Q                    Nothing, but quotes all characters until \E
  \E                    Nothing, but ends quoting started by \Q

Special constructs (non-capturing)

  (?:X)                 X, as a non-capturing group
  (?idmsux-idmsux)      Nothing, but turns match flags i d m s u x on or off
  (?idmsux-idmsux:X)    X, as a non-capturing group with the given flags i d m s u x on or off
  (?=X)                 X, via zero-width positive lookahead
  (?!X)                 X, via zero-width negative lookahead
  (?<=X)                X, via zero-width positive lookbehind
  (?<!X)                X, via zero-width negative lookbehind
  (?>X)                 X, as an independent, non-capturing group

Backslashes, escapes, and quoting

The backslash character ('\') introduces the escaped constructs defined in the table above, and also quotes characters that would otherwise be interpreted as unescaped constructs. Thus the expression \\ matches a single backslash, and \{ matches a left brace.

It is an error to use a backslash before any alphabetic character that does not denote an escaped construct; these are reserved for future extensions to the regular-expression language. A backslash may be used before a non-alphabetic character regardless of whether that character is part of an unescaped construct.

As required by the Java Language Specification, backslashes within string literals in Java source code are interpreted as either Unicode escapes or other character escapes. It is therefore necessary to double the backslashes in string literals that represent regular expressions, to protect them from interpretation by the Java bytecode compiler.
For example, the string literal "\b" matches a single backspace character when interpreted as a regular expression, while "\\b" matches a word boundary. The string literal "\(hello\)" is illegal and leads to a compile-time error; to match the string (hello), the string literal "\\(hello\\)" must be used.

Character classes

Character classes may appear within other character classes, and may be composed by the union operator (implicit) and the intersection operator (&&). The union operator denotes a class that contains every character that is in at least one of its operand classes. The intersection operator denotes a class that contains every character that is in both of its operand classes.

The precedence of the character-class operators is as follows, from highest to lowest:

  1  Literal escape     \x
  2  Grouping           [...]
  3  Range              a-z
  4  Union              [a-e][i-u]
  5  Intersection       [a-z&&[aeiou]]

Note that a different set of metacharacters is in effect inside a character class than outside one. For instance, the regular expression . loses its special meaning inside a character class, while the expression - becomes a range-forming metacharacter.

Line terminators

A line terminator is a one- or two-character sequence that marks the end of a line of the input character sequence. The following are recognized as line terminators:

  A newline (line feed) character ('\n'),
  A carriage-return character followed immediately by a newline character ("\r\n"),
  A standalone carriage-return character ('\r'),
  A next-line character ('\u0085'),
  A line-separator character ('\u2028'), or
  A paragraph-separator character ('\u2029').

If UNIX_LINES mode is activated, the only line terminator recognized is the newline character.

The regular expression . matches any character except a line terminator unless the DOTALL flag is specified.

By default, the regular expressions ^ and $ ignore line terminators and match only at the beginning and the end, respectively, of the entire input sequence.
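The effect of line terminators on . and ^ can be checked with a small program. This is only a sketch: the class name LineTerminatorDemo, its helper methods, and the three-line sample input are made up for illustration, and Pattern.MULTILINE is included purely for contrast with the default behavior.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineTerminatorDemo {
    // Reports whether the regex matches the entire input under the given flags.
    static boolean matchesAll(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    // Counts how many times the regex is found in the input under the given flags.
    static int countMatches(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "one\ntwo\nthree"; // hypothetical three-line input

        // "." does not cross a line terminator unless DOTALL is set.
        System.out.println(matchesAll("o.*e", 0, input));              // false
        System.out.println(matchesAll("o.*e", Pattern.DOTALL, input)); // true

        // By default "^" matches only at the start of the whole input.
        System.out.println(countMatches("^\\w+", 0, input));                 // 1
        // With MULTILINE it also matches after each line terminator.
        System.out.println(countMatches("^\\w+", Pattern.MULTILINE, input)); // 3
    }
}
```

Passing 0 as the flags argument means "no flags", which is the same as calling Pattern.compile with the regex alone.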
If MULTILINE mode is activated, ^ matches at the beginning of the input and after any line terminator except at the end of the input. When in MULTILINE mode, $ matches just before a line terminator or at the end of the input sequence.

Groups and capturing

Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups:

  1  ((A)(B(C)))
  2  (A)
  3  (B(C))
  4  (C)

Group zero always stands for the entire expression.

Capturing groups are so named because, during a match, each subsequence of the input sequence that matches such a group is saved. The captured subsequence may be used later in the expression via a back reference, and may also be retrieved from the matcher once the match operation is complete.

The captured input associated with a group is always the subsequence that the group most recently matched. If a group is evaluated a second time because of quantification, its previously captured value, if any, is retained if the second evaluation fails. Matching the string "aba" against the expression (a(b)?)+, for example, leaves group two set to "b". All captured input is discarded at the beginning of each match.

Groups beginning with (? are pure, non-capturing groups that do not capture text and do not count towards the group total.
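The group numbering and the "aba" case can be reproduced with java.util.regex.Matcher. The following is a minimal sketch; the class name GroupDemo and the helper groups are hypothetical names chosen for this example, and "ABC" is simply an input that the four-group expression matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns the text of every group after a full match; index 0 is the
    // whole match. Returns an empty array if the input does not match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Groups are numbered by their opening parentheses, left to right.
        String[] nested = groups("((A)(B(C)))", "ABC");
        for (int i = 0; i < nested.length; i++) {
            System.out.println(i + ": " + nested[i]);
        }
        // prints 0: ABC, 1: ABC, 2: A, 3: BC, 4: C (one per line)

        // Group 2 keeps "b" from the first iteration because (b)? matches
        // nothing on the second iteration.
        String[] repeated = groups("(a(b)?)+", "aba");
        System.out.println(repeated[1] + " " + repeated[2]); // a b
    }
}
```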

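The greedy, reluctant, and possessive quantifiers in the table share the same descriptions but differ in how much input they try to consume and whether they give any of it back. The sketch below compares them on one input; the class name QuantifierDemo, the helper firstMatch, and the sample string are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the first substring the regex finds in the input, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo"; // hypothetical sample input

        // Greedy: .* takes everything, then backs off until "foo" fits.
        System.out.println(firstMatch(".*foo", input));  // xfooxxxxxxfoo

        // Reluctant: .*? takes as little as possible before "foo".
        System.out.println(firstMatch(".*?foo", input)); // xfoo

        // Possessive: .*+ takes everything and never backs off,
        // so the trailing "foo" can never match.
        System.out.println(firstMatch(".*+foo", input)); // null
    }
}
```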