Preface
The best-known regular expression article online is probably "Regular expression 30 minutes introductory Tutorial". To be honest, that article was my own introduction to regular expressions, but now that I use them fluently, a lengthy article no longer suits me. So here is a summary for quick review.
Most of the following syntax is valid in Java, and most of it applies to other regex engines as well; a few constructs (such as balancing groups) come from .NET.
Metacharacters
Metacharacters are special symbols that represent a particular kind of character or a position.
Matching characters
.
Matches any character except a line terminator
\w
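Matches a letter, digit, or underscore. In Java this is [a-zA-Z_0-9] by default; letters such as kanji are included only in UNICODE_CHARACTER_CLASS mode, (?U)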
\s
Matches any whitespace character
\d
Matches a digit
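A minimal Java sketch (java.util.regex) that exercises these four metacharacters. The findAll helper and the sample strings are illustrative assumptions, not from the original article; note that every regex backslash is doubled inside a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharMetaDemo {
    // Illustrative helper: collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "id_42 = 7\nok";
        System.out.println(findAll("\\d+", text));        // [42, 7]
        System.out.println(findAll("\\w+", text));        // [id_42, 7, ok]
        System.out.println(findAll("\\s", text).size());  // 3 (two spaces, one newline)
        // '.' stops at the line break, so ".+" yields one match per line
        System.out.println(findAll(".+", text));          // [id_42 = 7, ok]

        // Java's default \w is ASCII-only; (?U) turns on Unicode classes
        String cjk = "ab\u4E2D\u6587"; // "ab" followed by two CJK ideographs
        System.out.println(findAll("\\w+", cjk));                     // [ab]
        System.out.println(findAll("(?U)\\w+", cjk).get(0).length()); // 4
    }
}
```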
Matching positions
\b
Matches a word boundary (the beginning or end of a word)
^
Matches the start of the string (in multiline mode, the start of any line)
$
Matches the end of the string (in multiline mode, the end of any line)
\G
Matches the position where the previous match ended (that is, where the current match starts)
\A
Matches the beginning of the string (like ^, but unaffected by the multiline option)
\Z
Matches the end of the string, or just before a final line terminator (unaffected by the multiline option)
\z
Matches the absolute end of the string (like $, but unaffected by the multiline option)
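The position anchors consume no characters, which is easiest to see by running them. This sketch uses a hypothetical findAll helper and invented inputs; (?m) is Java's inline flag for multiline mode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PositionDemo {
    // Illustrative helper: collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // \b: "cat" as a whole word only, not inside "concatenate"
        System.out.println(findAll("\\bcat\\b", "cat concatenate cat")); // [cat, cat]

        String lines = "first\nsecond";
        // In (?m) mode, ^ and $ match at every line boundary ...
        System.out.println(findAll("(?m)^\\w+", lines));   // [first, second]
        System.out.println(findAll("(?m)\\w+$", lines));   // [first, second]
        // ... while \A and \z still match only at the start and end of the input
        System.out.println(findAll("(?m)\\A\\w+", lines)); // [first]
        System.out.println(findAll("(?m)\\w+\\z", lines)); // [second]

        // \G: each match must start exactly where the previous one ended
        System.out.println(findAll("\\G\\d", "123a45"));   // [1, 2, 3]
    }
}
```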
Repetition
*
Repeat 0 or more times
+
Repeat one or more times
?
Repeat 0 or one time
{n}
Repeat exactly n times
{n,}
Repeat n or more times
{n,m}
Repeat n to m times
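A quick sketch of the quantifiers using Pattern.matches, which requires the whole input to match; the sample inputs are made up for illustration.

```java
import java.util.regex.Pattern;

class RepeatDemo {
    // True if the whole input matches regex.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(full("ab*", "a"));           // true: * allows zero b's
        System.out.println(full("ab+", "a"));           // false: + needs at least one b
        System.out.println(full("colou?r", "color"));   // true: ? makes the u optional
        System.out.println(full("\\d{3}", "123"));      // true: exactly three digits
        System.out.println(full("\\d{3}", "1234"));     // false
        System.out.println(full("\\d{2,}", "12345"));   // true: two or more digits
        System.out.println(full("\\d{2,4}", "12345"));  // false: at most four digits
    }
}
```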
Character escapes
To match a metacharacter, or any other character that is special in regular expressions, literally, escape it with \. For example, use \* to match the character *, and \\ to match the character \ itself.
Characters that need to be escaped: $ ( ) * + . [ ] ? \ ^ { } |
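In Java there are two layers of escaping: the regex layer described above and the string-literal layer, so the regex \* is written "\\*" and the regex \\ is written "\\\\". A small sketch with invented inputs; Pattern.quote is the standard JDK method for escaping a whole literal string.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // True if the whole input matches regex.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // regex \*  is the Java literal "\\*"
        System.out.println(full("2\\*3", "2*3"));            // true
        // regex \\  is the Java literal "\\\\"; the input "C:\\temp" is C:\temp
        System.out.println(full("C:\\\\temp", "C:\\temp"));  // true
        // An unescaped '.' matches any character ...
        System.out.println(full("1.5", "1x5"));              // true
        // ... an escaped one matches only a literal dot
        System.out.println(full("1\\.5", "1x5"));            // false
        // Pattern.quote escapes an entire literal string
        String literal = "$(a+b)?";
        System.out.println(full(Pattern.quote(literal), literal)); // true
    }
}
```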
Character class
A character class matches one character out of a specific set of characters.
Special characters
\0hh
The character with octal value hh
\xhh
The character with hexadecimal value hh
\uhhhh
The Unicode character with hexadecimal value hhhh
\t
Tab
\n
Newline (line feed)
\r
Carriage return character
\f
Form feed
\e
Escape
\cN
The ASCII control character corresponding to N; for example, \cC represents Ctrl+C
\p{name}
A named Unicode character class (such as a script or block), for example \p{IsGreek}
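These escapes are decoded by the regex engine itself, so in Java source the leading backslash is doubled. A sketch with illustrative inputs; each line matches one escape against the character it denotes.

```java
import java.util.regex.Pattern;

class SpecialCharDemo {
    // True if the whole input matches regex.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String ctrlC = String.valueOf((char) 3);         // Ctrl+C is code point 3
        System.out.println(full("a\\tb", "a\tb"));        // true: tab
        System.out.println(full("\\0101", "A"));          // true: octal 101 = 65 = 'A'
        System.out.println(full("\\x41", "A"));           // true: hex 41 = 'A'
        System.out.println(full("\\u0041", "A"));         // true: code point 0x41 = 'A'
        System.out.println(full("\\cC", ctrlC));          // true
        System.out.println(full("\\p{IsGreek}+", "\u03B1\u03B2\u03B3")); // true: Greek letters
        System.out.println(full("\\p{IsGreek}+", "abc")); // false
    }
}
```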
Explicit sets
[aeiou]
Matches a single vowel
[.?!]
Matches one of the punctuation marks . ? or ! (inside brackets, . and ? are literal)
Range
[0-9]
Matches any digit from 0 to 9, the same as \d
[a-z]
Matches any lowercase letter
[a-zA-Z]
Matches any ASCII letter
[a-z0-9A-Z_\u4E00-\u9FFF]
Matches a letter, digit, underscore, or common CJK ideograph (kanji); Java's default \w covers only [a-zA-Z_0-9]
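A short Java sketch of explicit sets and ranges; the inputs are invented and findAll is an illustrative helper. The last two lines show the difference between Java's default \w and the explicit CJK range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassDemo {
    // Illustrative helper: collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findAll("[aeiou]", "regex"));        // [e, e]
        // Inside brackets '.' and '?' are literal
        System.out.println(findAll("[.?!]", "Hi. Why? Go!"));   // [., ?, !]
        System.out.println(findAll("[0-9]+", "a1b22"));         // [1, 22]
        System.out.println(findAll("[a-zA-Z]+", "Ab1Cd"));      // [Ab, Cd]
        // Default \w stops at CJK ideographs; the explicit range includes them
        String mixed = "ok\u4E2D\u6587";
        System.out.println(findAll("\\w+", mixed));             // [ok]
        System.out.println(findAll("[a-z0-9A-Z_\\u4E00-\\u9FFF]+", mixed).get(0).length()); // 4
    }
}
```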
Negation
Negation matches the characters that a given metacharacter or character class does not match.
Negated metacharacters
\W
Matches any character that \w does not match (anything other than a letter, digit, or underscore)
\S
Matches any non-whitespace character
\D
Matches any non-digit character
\B
Matches a position that is not a word boundary
Negated character classes
[^x]
Matches any character except x
[^aeiou]
Matches any character except the vowels a, e, i, o, u
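The negated forms match exactly what their counterparts reject. A sketch with made-up inputs and an illustrative findAll helper:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegationDemo {
    // Illustrative helper: collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\D+", "ab12cd"));       // [ab, cd]
        System.out.println(findAll("\\S+", "to be"));        // [to, be]
        System.out.println(findAll("\\W", "a-b"));           // [-]
        System.out.println(findAll("[^x]+", "axxb"));        // [a, b]
        System.out.println(findAll("[^aeiou]+", "queue"));   // [q]
        // \B on both sides: "cat" only where it sits inside a word
        System.out.println(findAll("\\Bcat\\B", "cat concatenate")); // [cat]
    }
}
```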
Branching conditions
Also called logical operators; here X and Y stand for any two expressions.
XY
X followed by Y
X|Y
Matches X or Y. Alternatives are tried from left to right; as soon as one succeeds, the remaining alternatives are not tried.
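Because alternatives are tried left to right, their order can change the result. A small sketch with invented inputs (the first helper is hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationDemo {
    // Returns the first match of regex in input, or null if there is none.
    static String first(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // XY: X followed by Y (a digit followed by a lowercase letter)
        System.out.println(first("\\d[a-z]", "x7y")); // 7y
        // X|Y: the left alternative "a" succeeds first, so "ab" is never tried
        System.out.println(first("a|ab", "ab"));      // a
        // Putting the longer alternative first makes it win
        System.out.println(first("ab|a", "ab"));      // ab
    }
}
```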
Grouping
The examples below all use \w as the inner expression:
(\w)
The expression inside the parentheses is treated as a single unit, forming a group
(\w)(\w)
Automatically numbered groups: the first pair of parentheses is group 1, the second is group 2
(?'Word'\w+)
Defines a group named Word (.NET syntax; Java does not accept this form)
(?<Word>\w+)
Defines a group named Word (this form is supported in Java)
(?:\w+)
Matches \w+ but does not capture the matched text and does not assign a group number to this group
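A sketch of the three group styles in Java. It uses only the (?<Word>...) named-group form, since that is the syntax java.util.regex accepts; the helper methods and inputs are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Numbered groups: returns {group 1, group 2} for a two-character word, or null.
    static String[] twoChars(String input) {
        Matcher m = Pattern.compile("(\\w)(\\w)").matcher(input);
        return m.matches() ? new String[] {m.group(1), m.group(2)} : null;
    }

    // Named group: retrieved by name instead of by number.
    static String firstWord(String input) {
        Matcher m = Pattern.compile("(?<Word>\\w+)").matcher(input);
        return m.find() ? m.group("Word") : null;
    }

    // Non-capturing group: (?:ab)+ repeats "ab" but gets no group number.
    static int groupCount() {
        return Pattern.compile("(?:ab)+(c)").matcher("").groupCount();
    }

    public static void main(String[] args) {
        String[] g = twoChars("xy");
        System.out.println(g[0] + " " + g[1]);          // x y
        System.out.println(firstWord("hello, world"));  // hello
        System.out.println(groupCount());               // 1: only (c) is counted
    }
}
```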
Backreferences
A later part of an expression can refer back to an earlier group. For example, \1 works as if the text matched by group 1 were stored in a variable named \1, which can be referenced anywhere after the group.
\1
Matches the same text that group 1 matched
\k<Word>
Matches the same text that the group named Word matched
To match an English word that is repeated twice, such as Hello Hello or lei123 lei123:
(\w+)\s+\1
(?<Word>\w+)\s+\k<Word>
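The two patterns above can be tried in Java as follows. As an assumption beyond the original, this sketch wraps them in \b so that the repeat must be a whole word; without it, "the theory" would also match, because "theory" begins with "the".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackrefDemo {
    // Numbered backreference, anchored to word boundaries
    static final Pattern NUMBERED = Pattern.compile("\\b(\\w+)\\s+\\1\\b");
    // Named backreference, same idea
    static final Pattern NAMED = Pattern.compile("\\b(?<Word>\\w+)\\s+\\k<Word>\\b");

    // Returns the first repeated-word match, or null if there is none.
    static String firstRepeat(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstRepeat(NUMBERED, "say Hello Hello there")); // Hello Hello
        System.out.println(firstRepeat(NAMED, "id lei123 lei123"));         // lei123 lei123
        System.out.println(firstRepeat(NUMBERED, "no repeats here"));       // null
    }
}
```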
Zero-width assertions (lookahead and lookbehind)
A zero-width assertion adds a positional condition to a match without consuming any characters, which makes the match more precise.
\w+(?=ing)
Matches word characters that are followed by ing (the ing itself is not part of the match)
\w+(?!ing)
Matches word characters that are not followed by ing
(?<=re)\w+
Matches word characters that are preceded by re (the re itself is not part of the match)
(?<!re)\w+
Matches word characters that are not preceded by re
(?<=\s)\d+(?=\s)
Matches digits that have whitespace on both sides, without including the whitespace
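Running the lookaround patterns in Java shows that the asserted text is not part of the match. The inputs and the findAll helper are invented for illustration; the last line shows a common pitfall with \w+(?!ing).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // Illustrative helper: collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Lookahead: word characters followed by "ing"; "ing" is not consumed
        System.out.println(findAll("\\w+(?=ing)", "reading walking"));       // [read, walk]
        // Lookbehind: word characters preceded by "re"; "re" is not consumed
        System.out.println(findAll("(?<=re)\\w+", "redo rerun"));            // [do, run]
        // Digits with whitespace on both sides; the whitespace is not included
        System.out.println(findAll("(?<=\\s)\\d+(?=\\s)", "a 12 b 345c 6")); // [12]
        // Pitfall: greedy \w+ consumes the whole word and nothing follows it,
        // so (?!ing) succeeds and "sing" is matched in full
        System.out.println(findAll("\\w+(?!ing)", "sing"));                  // [sing]
    }
}
```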
Greedy and lazy matching
Greedy: matches as long a string as possible
Lazy: matches as short a string as possible
Appending ? to a repetition quantifier switches it to lazy mode:
*?
Repeat any number of times, but as few times as possible
+?
Repeat 1 or more times, but as few times as possible
??
Repeat 0 or 1 time, but as few times as possible
{n,m}?
Repeat n to m times, but as few times as possible
{n,}?
Repeat n or more times, but as few times as possible
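A sketch contrasting greedy and lazy quantifiers on the same inputs; the strings and the first helper are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyDemo {
    // Returns the first match of regex in input, or null if there is none.
    static String first(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>bold</b>";
        System.out.println(first("<.*>", html));          // <b>bold</b> (greedy)
        System.out.println(first("<.*?>", html));         // <b> (lazy)
        System.out.println(first("a+?", "aaa"));          // a
        System.out.println(first("ab??", "ab"));          // a: b?? prefers zero
        System.out.println(first("\\d{2,4}", "123456"));  // 1234
        System.out.println(first("\\d{2,4}?", "123456")); // 12
        System.out.println(first("\\d{2,}?", "123456"));  // 12
    }
}
```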
Processing options
These inline flags enable a matching mode from inside the expression itself; the mode takes effect from the point in the pattern where the flag is inserted.
(?i)
: Ignore case (CASE_INSENSITIVE)
(?x)
: Ignore whitespace in the pattern and allow # comments (COMMENTS)
(?s)
: . matches any character, including line terminators (DOTALL)
(?m)
: Multiline mode: ^ and $ also match at line boundaries (MULTILINE)
(?u)
: Unicode-aware case-insensitive matching (UNICODE_CASE); only takes effect together with CASE_INSENSITIVE
(?d)
: Only '\n' is recognized as a line terminator (UNIX_LINES)
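Each inline flag corresponds to a java.util.regex.Pattern constant (CASE_INSENSITIVE, COMMENTS, DOTALL, MULTILINE, UNICODE_CASE, UNIX_LINES). A sketch with invented inputs and hypothetical helpers, including a flag inserted in the middle of a pattern:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagsDemo {
    // True if the whole input matches regex.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Counts how many times regex is found in input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(full("(?i)hello", "HeLLo"));         // true
        // A flag applies from where it appears: 'a' is still case-sensitive
        System.out.println(full("a(?i)b", "aB"));               // true
        System.out.println(full("a(?i)b", "AB"));               // false
        // (?x): whitespace in the pattern is ignored
        System.out.println(full("(?x) \\d+ - \\d+ ", "12-34")); // true
        // (?s): '.' also matches a line break
        System.out.println(full("a.b", "a\nb"));                // false
        System.out.println(full("(?s)a.b", "a\nb"));            // true
        // (?m): '^' matches at the start of every line
        System.out.println(countMatches("^\\w", "x\ny"));       // 1
        System.out.println(countMatches("(?m)^\\w", "x\ny"));   // 2
        // (?u) with (?i): case folding beyond ASCII (e-acute vs E-acute)
        System.out.println(full("(?i)\u00E9", "\u00C9"));       // false
        System.out.println(full("(?iu)\u00E9", "\u00C9"));      // true
        // (?d): only '\n' ends a line, so '.' can match a carriage return
        System.out.println(full("a.b", "a\rb"));                // false
        System.out.println(full("(?d)a.b", "a\rb"));            // true
    }
}
```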
Balancing groups / recursive matching
Balancing groups are used to match nested structures. They are often applied to HTML tags: when the markup is not well-formed and opening and closing tags do not line up, a balancing group helps match correctly paired tags. The examples again use \w as the inner expression. Balancing groups are a .NET feature; Java's regex engine does not support them.
(?'group'\w)
Names the captured text (whatever \w matched) group and pushes it onto the group stack
(?'-group'\w)
After capturing the text matched by \w, pops the top of the group stack (the most recent capture); if the stack is empty, this part of the match fails
(?(group)yes|no)
If the group stack is non-empty, match the expression yes; otherwise, match the expression no
(?!)
A zero-width negative lookahead with an empty body; since the empty expression always matches, this assertion always fails
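Java has no balancing groups, so the following is only an analogue, not an equivalent: it tokenizes simple tags with a regex and does the push/pop bookkeeping with an explicit stack. Unlike a .NET balancing group, which only counts nesting, this sketch also checks that tag names agree. The tag pattern (no attributes) and the balanced method are hypothetical simplifications.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagBalanceDemo {
    // Hypothetical simplified tag syntax: <name> or </name>, no attributes.
    private static final Pattern TAG = Pattern.compile("<(/?)(\\w+)>");

    // True if every closing tag pops a matching opening tag and nothing is left over.
    static boolean balanced(String html) {
        Deque<String> stack = new ArrayDeque<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            String name = m.group(2);
            if (m.group(1).isEmpty()) {
                stack.push(name);   // opening tag: push, like (?'group'...)
            } else if (stack.isEmpty() || !stack.pop().equals(name)) {
                return false;       // pop on empty stack or mismatched name: fail
            }
        }
        return stack.isEmpty();     // leftovers fail, like (?(group)(?!))
    }

    public static void main(String[] args) {
        System.out.println(balanced("<div><b>x</b></div>")); // true
        System.out.println(balanced("<div><b>x</div></b>")); // false
        System.out.println(balanced("<div><b>x</b>"));       // false
    }
}
```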
Comments
Comment syntax: (?#comment)
The content of this construct is ignored by the regex engine and serves only to annotate the pattern. It can be placed anywhere in the expression. (Java's Pattern does not accept this syntax; use (?x) mode with # comments instead.)
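In Java, the usual way to annotate a pattern is (?x) (COMMENTS) mode, where whitespace is ignored and # starts a comment that runs to the end of the line. A sketch with a hypothetical date pattern:

```java
import java.util.regex.Pattern;

class CommentDemo {
    // Hypothetical example: a yyyy-mm-dd date with one comment per part.
    // In (?x) mode, whitespace (including the newlines) is ignored
    // and '#' starts a comment that runs to the end of the line.
    static final Pattern DATE = Pattern.compile(
            "(?x)\n"
            + "\\d{4}  # year\n"
            + "-       # separator\n"
            + "\\d{2}  # month\n"
            + "-       # separator\n"
            + "\\d{2}  # day\n");

    public static void main(String[] args) {
        System.out.println(DATE.matcher("2024-05-17").matches()); // true
        System.out.println(DATE.matcher("2024/05/17").matches()); // false
    }
}
```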
Source: http://www.xiaoleilu.com/regex-guide/
A concise reference to regular expressions