Java regular expression syntax
Regular expression syntax
A regular expression is a text pattern made up of ordinary characters (for example, the letters a through z) and special characters (known as "metacharacters"). The pattern describes one or more strings to match when searching text.
Regular expression examples
Expression Matches
/^\s*$/
Matches a blank line.
/\d{2}-\d{5}/
Validates an ID number consisting of two digits, a hyphen, and five more digits.
/<\s*(\S+)(\s[^>]*)?>[\s\S]*<\s*\/\1\s*>/
Matches an HTML tag.
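In Java the slash delimiters are dropped and every backslash is doubled inside a string literal. The following is a minimal sketch of the three examples using java.util.regex; the class name ExamplePatterns, the method names, and the sample inputs are illustrative assumptions rather than part of any API.

```java
import java.util.regex.Pattern;

final class ExamplePatterns {
    // A line that is empty or contains only whitespace.
    static final Pattern BLANK = Pattern.compile("^\\s*$");
    // Two digits, a hyphen, then five digits.
    static final Pattern ID = Pattern.compile("\\d{2}-\\d{5}");
    // An opening tag, any content, and a closing tag with the same name.
    static final Pattern TAG =
            Pattern.compile("<\\s*(\\S+)(\\s[^>]*)?>[\\s\\S]*<\\s*/\\1\\s*>");

    static boolean isBlank(String line) { return BLANK.matcher(line).matches(); }
    static boolean isId(String text)    { return ID.matcher(text).matches(); }
    static boolean hasTag(String text)  { return TAG.matcher(text).find(); }

    public static void main(String[] args) {
        System.out.println(isBlank("   "));
        System.out.println(isId("12-34567"));
        System.out.println(hasTag("<p>text</p>"));
    }
}
```

matches() requires the whole input to fit the pattern, while find() looks for the pattern anywhere in the input, which is why the tag check uses find().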
The following table contains a complete list of metacharacters and their behavior in the context of a regular expression:
Character Description
\
Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, "n" matches the character "n", while "\n" matches a newline character. The sequence "\\" matches "\", and "\(" matches "(".
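Because the backslash is also the escape character of Java string literals, each regex backslash is written twice in Java source. The sketch below illustrates this; EscapeDemo and its method names are hypothetical, while Pattern.quote is the standard java.util.regex helper that escapes an entire literal.

```java
import java.util.regex.Pattern;

final class EscapeDemo {
    // Regex \( written as the Java literal "\\(": matches a literal "(".
    static boolean hasOpenParen(String s) {
        return Pattern.compile("\\(").matcher(s).find();
    }

    // Regex \\ written as the Java literal "\\\\": matches one backslash.
    static boolean hasBackslash(String s) {
        return Pattern.compile("\\\\").matcher(s).find();
    }

    // Pattern.quote turns arbitrary text into a pattern that matches it literally.
    static boolean containsLiteral(String s, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasOpenParen("f(x)"));
        System.out.println(hasBackslash("C:\\temp"));
        System.out.println(containsLiteral("axb", "a.b"));
    }
}
```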
^
Matches the position at the start of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position following "\n" or "\r".
$
Matches the position at the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position preceding "\n" or "\r".
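In java.util.regex the multiline behavior is switched on with the Pattern.MULTILINE flag. A minimal sketch, assuming a hypothetical helper that counts how many times a word appears at the start of a line:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AnchorDemo {
    // Counts matches of ^\w+ with or without multiline mode.
    static int countLineStarts(String text, boolean multiline) {
        int flags = multiline ? Pattern.MULTILINE : 0;
        Matcher m = Pattern.compile("^\\w+", flags).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "one\ntwo\nthree";
        System.out.println(countLineStarts(text, false)); // only the start of the input
        System.out.println(countLineStarts(text, true));  // the start of every line
    }
}
```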
*
Matches the preceding character or subexpression zero or more times. For example, "zo*" matches "z" and "zoo". * is equivalent to {0,}.
+
Matches the preceding character or subexpression one or more times. For example, "zo+" matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?
Matches the preceding character or subexpression zero or one time. For example, "do(es)?" matches "do" and "does". ? is equivalent to {0,1}.
{n}
n is a non-negative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob", but matches the two o's in "food".
{n,}
n is a non-negative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob", but matches all the o's in "foooood". "o{1,}" is equivalent to "o+". "o{0,}" is equivalent to "o*".
{n,m}
m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". "o{0,1}" is equivalent to "o?". Note: no space may appear between the comma and the numbers.
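The quantifier examples above can be checked with a short Java sketch. QuantifierDemo and its first helper are hypothetical names; the helper returns the first substring the pattern finds, or null when nothing matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuantifierDemo {
    // Returns the first substring matched by regex, or null if there is none.
    static String first(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(first("zo*", "z"));           // * allows zero occurrences of "o"
        System.out.println(first("zo+", "z"));           // + needs at least one "o"
        System.out.println(first("o{2}", "food"));       // exactly two
        System.out.println(first("o{1,3}", "fooooood")); // at most three
    }
}
```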
?
When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is "non-greedy". A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much of the searched string as possible. For example, in the string "oooo", "o+?" matches a single "o", while "o+" matches all the o's.
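The difference between greedy and non-greedy matching is easiest to see side by side. The following Java sketch is illustrative only; GreedyDemo, its first helper, and the tag-like sample string are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GreedyDemo {
    // Returns the first substring matched by regex, or null if there is none.
    static String first(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(first("o+", "oooo"));     // greedy: takes every "o"
        System.out.println(first("o+?", "oooo"));    // non-greedy: stops after one "o"
        System.out.println(first("<.+>", "<a><b>")); // greedy: runs to the last ">"
        System.out.println(first("<.+?>", "<a><b>")); // non-greedy: stops at the first ">"
    }
}
```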
.
Matches any single character except "\n". To match any character including "\n", use a pattern such as "[\s\S]".
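In java.util.regex there are two ways to let a pattern cross a line break: the "[\s\S]" idiom or the Pattern.DOTALL flag, which makes "." match line terminators as well. A minimal sketch, with DotDemo and its helper as hypothetical names:

```java
import java.util.regex.Pattern;

final class DotDemo {
    // Whole-input match; with dotAll set, "." also matches line terminators.
    static boolean matches(String regex, boolean dotAll, String input) {
        int flags = dotAll ? Pattern.DOTALL : 0;
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("a.b", false, "a\nb"));         // "." stops at the newline
        System.out.println(matches("a[\\s\\S]b", false, "a\nb"));  // class covers every character
        System.out.println(matches("a.b", true, "a\nb"));          // DOTALL lifts the restriction
    }
}
```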
(pattern)
Matches pattern and captures the match. The captured match can be retrieved from the resulting match collection through the $0...$9 properties. To match the parenthesis characters ( ), use "\(" or "\)".
(?:pattern)
Matches pattern but does not capture the match; that is, it is a non-capturing match that is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|). For example, "industr(?:y|ies)" is a more economical expression than "industry|industries".
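In java.util.regex, captured text is read with Matcher.group(n), where group 0 is the whole match, and Matcher.groupCount() reports how many capturing groups a pattern has. The sketch below contrasts the two group forms; GroupDemo and its helpers are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupDemo {
    // Returns the text captured by group 1 of the first match, or null if no match.
    // The regex passed in is assumed to contain at least one capturing group.
    static String group1(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // Number of capturing groups declared in the pattern.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(group1("industr(y|ies)", "industries"));
        System.out.println(groupCount("industr(y|ies)"));   // capturing group is counted
        System.out.println(groupCount("industr(?:y|ies)")); // non-capturing group is not
    }
}
```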
(?=pattern)
Performs a positive lookahead: matches the search string at any point where a string matching pattern begins. This is a non-capturing match; that is, the match is not captured for later use. For example, "Windows(?=95|98|NT|2000)" matches "Windows" in "Windows2000", but not "Windows" in "Windows3.1". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
(?!pattern)
Performs a negative lookahead: matches the search string at any point where a string not matching pattern begins. This is a non-capturing match; that is, the match is not captured for later use. For example, "Windows(?!95|98|NT|2000)" matches "Windows" in "Windows3.1", but not "Windows" in "Windows2000". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
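A short Java sketch of both lookahead forms follows. LookaheadDemo and its methods are hypothetical names; matchEnd reports where the positive-lookahead match ends, which shows that the lookahead text itself is not consumed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookaheadDemo {
    static final Pattern FOLLOWED = Pattern.compile("Windows(?=95|98|NT|2000)");
    static final Pattern NOT_FOLLOWED = Pattern.compile("Windows(?!95|98|NT|2000)");

    static boolean positive(String s) { return FOLLOWED.matcher(s).find(); }
    static boolean negative(String s) { return NOT_FOLLOWED.matcher(s).find(); }

    // End index of the positive-lookahead match, or -1 if there is no match.
    static int matchEnd(String s) {
        Matcher m = FOLLOWED.matcher(s);
        return m.find() ? m.end() : -1;
    }

    public static void main(String[] args) {
        System.out.println(positive("Windows2000"));
        System.out.println(negative("Windows3.1"));
        System.out.println(matchEnd("Windows2000")); // the match covers "Windows" only
    }
}
```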
x|y
Matches either x or y. For example, "z|food" matches "z" or "food". "(z|f)ood" matches "zood" or "food".
[xyz]
Character set. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain".
[^xyz]
Negative character set. Matches any character that is not enclosed. For example, "[^abc]" matches the "p" in "plain".
[a-z]
Character range. Matches any character in the specified range. For example, "[a-z]" matches any lowercase letter in the range "a" to "z".
[^a-z]
Negative character range. Matches any character that is not in the specified range. For example, "[^a-z]" matches any character that is not in the range "a" to "z".
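Alternation and the four bracket forms can be tried out with a small Java sketch. CharClassDemo and its helpers are hypothetical names; first returns the first substring that the pattern finds.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CharClassDemo {
    // Returns the first substring matched by regex, or null if there is none.
    static String first(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // True when the whole input matches the regex.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(first("[abc]", "plain"));  // first character from the set
        System.out.println(first("[^abc]", "plain")); // first character outside the set
        System.out.println(first("[^a-z]", "abc1"));  // first character outside the range
        System.out.println(whole("(z|f)ood", "zood"));
    }
}
```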
\b
Matches a word boundary, that is, the position between a word and a space. For example, "er\b" matches the "er" in "never", but not the "er" in "verb".
\B
Matches a non-word boundary. "er\B" matches the "er" in "verb", but not the "er" in "never".
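The two boundary examples translate directly to Java. This is a minimal sketch with BoundaryDemo as a hypothetical class name:

```java
import java.util.regex.Pattern;

final class BoundaryDemo {
    // "er" that ends a word.
    static boolean endsWord(String s) {
        return Pattern.compile("er\\b").matcher(s).find();
    }

    // "er" that is followed by another word character.
    static boolean insideWord(String s) {
        return Pattern.compile("er\\B").matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(endsWord("never"));
        System.out.println(endsWord("verb"));
        System.out.println(insideWord("verb"));
        System.out.println(insideWord("never"));
    }
}
```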
\cx
Matches the control character indicated by x. For example, \cM matches Control-M, the carriage return character. The value of x must be in the range A-Z or a-z. If it is not, c is assumed to be a literal "c" character.
\d
Matches a digit character. Equivalent to [0-9].
\D
Matches a non-digit character. Equivalent to [^0-9].
\f
Matches a form-feed character. Equivalent to \x0c and \cL.
\n
Matches a newline character. Equivalent to \x0a and \cJ.
\r
Matches a carriage return character. Equivalent to \x0d and \cM.
\s
Matches any whitespace character, including space, tab, and form feed. Equivalent to [ \f\n\r\t\v].
\S
Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t
Matches a tab character. Equivalent to \x09 and \cI.
\v
Matches a vertical tab character. Equivalent to \x0b and \cK.
\w
Matches any word character, including the underscore. Equivalent to "[A-Za-z0-9_]".
\W
Matches any non-word character. Equivalent to "[^A-Za-z0-9_]".
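The shorthand classes \d, \D, \s, and \w are most often used through the regex-aware methods of String. The sketch below is illustrative; ShorthandDemo, its method names, and the sample inputs are assumptions.

```java
import java.util.regex.Pattern;

final class ShorthandDemo {
    // Removes every non-digit character.
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // Splits on runs of whitespace (space, tab, newline, and so on).
    static String[] words(String s) {
        return s.trim().split("\\s+");
    }

    // True when the input consists only of word characters.
    static boolean isWord(String s) {
        return Pattern.matches("\\w+", s);
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("tel: 555-0199"));
        System.out.println(words("a  b\tc").length);
        System.out.println(isWord("snake_case9"));
    }
}
```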
\xn
Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, "\x41" matches "A". "\x041" is equivalent to "\x04" followed by "1". This allows ASCII codes to be used in regular expressions.
\num
Matches num, where num is a positive integer. A backreference to a captured match. For example, "(.)\1" matches two consecutive identical characters.
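Backreferences are handy for spotting repetition. A minimal Java sketch follows; BackrefDemo, its helper, and the doubled-word pattern are illustrative assumptions built on the "(.)\1" example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BackrefDemo {
    // Returns the first substring matched by regex, or null if there is none.
    static String first(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Two consecutive identical characters.
        System.out.println(first("(.)\\1", "hello"));
        // The same word twice in a row, separated by one space.
        System.out.println(first("\\b(\\w+) \\1\\b", "it was the the best"));
    }
}
```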
\n
Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.
\nm
Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal character m. If neither of these conditions holds, \nm matches the octal escape value nm, where n and m are octal digits (0-7).
\nml
Matches the octal escape value nml when n is an octal digit (0-3) and m and l are octal digits (0-7).
\un
Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
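The hexadecimal and Unicode escapes can be exercised in Java as follows. This is a sketch under the assumption that the pattern strings are passed to java.util.regex with doubled backslashes, so that the regex engine (not the Java compiler) interprets the escape; EscapeCodeDemo and its methods are hypothetical names.

```java
import java.util.regex.Pattern;

final class EscapeCodeDemo {
    // Two-digit hexadecimal escape for the letter A.
    static boolean matchesA(String s) {
        return Pattern.matches("\\x41", s);
    }

    // Only two hex digits are read, so this is the character 0x04 followed by "1".
    static boolean matchesCtrlDThenOne(String s) {
        return Pattern.matches("\\x041", s);
    }

    // Four-digit Unicode escape for the copyright symbol.
    static boolean matchesCopyright(String s) {
        return Pattern.matches("\\u00A9", s);
    }

    public static void main(String[] args) {
        System.out.println(matchesA("A"));
        System.out.println(matchesCtrlDThenOne(String.valueOf((char) 4) + "1"));
        System.out.println(matchesCopyright(String.valueOf((char) 0xA9)));
    }
}
```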