A concise reference to regular expressions

Last Update:2016-01-14 Source: Internet

Author: User

Objective

Probably the best-known regex article online is the "Regular Expression 30-Minute Introductory Tutorial", and it was indeed my own introduction to the topic. As I grew more fluent, though, a long tutorial stopped being what I needed, so here is a condensed summary for quick review.

Unless noted otherwise, the following syntax is valid in Java, and most of it is common to other regex engines.

Metacharacters

Metacharacters (sometimes loosely called character sets) are special symbols that stand for a particular kind of character or a position.

Matching characters

.  Matches any character except a line break
\w  Matches a letter, digit, or underscore (in Java this is [a-zA-Z_0-9] by default; Chinese characters match only when the (?U) flag is enabled)
\s  Matches any whitespace character
\d  Matches a digit

Matching positions

\b  Matches the beginning or end of a word
^  Matches the start of the string
$  Matches the end of the string
\G  The end of the previous match (where the current match starts)
\A  The beginning of the input (like ^, but not affected by the multiline option)
\Z  The end of the input, or the position just before a final line terminator (not affected by the multiline option)
\z  The very end of the input (like $, but not affected by the multiline option)
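
The character and position metacharacters above can be tried out with a small sketch like the following. The class name `MetacharDemo` and the helper `findAll` are made up for illustration; only `java.util.regex.Pattern` and `Matcher` come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetacharDemo {
    // Hypothetical helper: collects every non-overlapping match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // \d matches one digit at a time.
        System.out.println(findAll("\\d", "a1 b22"));                // [1, 2, 2]
        // \b keeps "cat" from matching inside "concat".
        System.out.println(findAll("\\bcat\\b", "cat concat cat"));  // [cat, cat]
        // With (?m), ^ matches at every line start, \A only at the start of input.
        System.out.println(findAll("(?m)^\\w+", "one\ntwo"));        // [one, two]
        System.out.println(findAll("(?m)\\A\\w+", "one\ntwo"));      // [one]
        // \Z tolerates one final line terminator, \z does not.
        System.out.println(findAll("two\\Z", "one\ntwo\n"));         // [two]
        System.out.println(findAll("two\\z", "one\ntwo\n"));         // []
    }
}
```

Note that every backslash is doubled because the pattern is written inside a Java string literal.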

Repeat

*  Repeat 0 or more times
+  Repeat 1 or more times
?  Repeat 0 or 1 times
{n}  Repeat exactly n times
{n,}  Repeat n or more times
{n,m}  Repeat n to m times
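
As a quick illustration of the quantifiers, here is a minimal sketch. `RepeatDemo` and `fullMatch` are hypothetical names; `Pattern.matches` is the JDK call that tests whether the whole input matches.

```java
import java.util.regex.Pattern;

class RepeatDemo {
    // Hypothetical helper: true when the whole input matches regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("ab*", "a"));          // true: b repeated 0 times
        System.out.println(fullMatch("ab+", "a"));          // false: + needs at least one b
        System.out.println(fullMatch("colou?r", "color"));  // true: u is optional
        System.out.println(fullMatch("\\d{3}", "12"));      // false: exactly 3 digits required
        System.out.println(fullMatch("\\d{2,}", "12345"));  // true: 2 or more digits
        System.out.println(fullMatch("\\d{2,4}", "12345")); // false: at most 4 digits
    }
}
```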

Character escapes

To match a metacharacter literally, or any other character that has a special meaning in a regex, escape it with \ . For example, use \* to match the * character and \\ to match the \ character.

Characters to be escaped: $ ( ) * + . [ ] ? \ ^ { } |
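
In Java source the backslash itself has to be escaped inside a string literal, so `\*` is written `"\\*"` and `\\` is written `"\\\\"`. The sketch below shows this, plus `Pattern.quote`, which escapes an entire literal at once. `EscapeDemo` and `containsLiteral` are illustrative names only.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Hypothetical helper: searches for literal as plain text, not as a regex.
    static boolean containsLiteral(String text, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }

    public static void main(String[] args) {
        // Regex 2\*3 matches the three characters 2*3.
        System.out.println("2*3".matches("2\\*3"));                    // true
        // Regex a\\b matches the three characters a, backslash, b.
        System.out.println("a\\b".matches("a\\\\b"));                  // true
        // Quoting makes $ and . lose their special meaning.
        System.out.println(containsLiteral("price: $5.00", "$5.00"));  // true
        System.out.println(containsLiteral("price: $5x00", "$5.00"));  // false
    }
}
```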

Character class

Use a character class when you need to match one character out of a specific set.

Special characters

\0hh  The character with octal value hh
\xhh  The character with hexadecimal value hh
\uhhhh  The Unicode character with hexadecimal value hhhh
\t  Tab
\n  Line feed
\r  Carriage return
\f  Form feed (page break)
\e  Escape
\cN  ASCII control character, such as \cC for Ctrl+C
\p{name}  The Unicode character class called name, such as \p{IsGreek}
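
A few of these escapes, checked in Java. This is only a sketch with made-up names (`SpecialCharDemo`, `fullMatch`); the `\p{IsGreek}` script form assumes Java 7 or later.

```java
import java.util.regex.Pattern;

class SpecialCharDemo {
    // Hypothetical helper: true when the whole input matches regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Octal 101, hexadecimal 41, and the four-digit hexadecimal form all denote 'A'.
        System.out.println(fullMatch("\\0101", "A"));   // true
        System.out.println(fullMatch("\\x41", "A"));    // true
        System.out.println(fullMatch("\\u0041", "A"));  // true
        // The regex tab escape matches a real tab character.
        System.out.println(fullMatch("a\\tb", "a\tb")); // true
        // Ctrl+C is the control character with code 3.
        System.out.println(fullMatch("\\cC", String.valueOf((char) 3)));    // true
        // Greek small letter alpha (code point 0x3B1) belongs to the Greek script.
        System.out.println(fullMatch("\\p{IsGreek}", String.valueOf((char) 0x3B1))); // true
    }
}
```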

Explicit list

[aeiou]  Matches a single vowel
[.?!]  Matches any one of the listed punctuation marks

Range

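[0-9]  Matches a digit from 0 to 9, the same as \d
[a-z]  Matches any lowercase letter
[a-zA-Z]  Matches any letter
[a-z0-9A-Z_\u4E00-\u9FFF]  Matches a letter, digit, underscore, or common Chinese character (broader than Java's default \w, which is [a-zA-Z_0-9])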
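
The sketch below exercises the explicit and range forms, and also shows how a Chinese character behaves with `\w` in Java: it does not match by default, but does match an explicit range or `\w` under the `(?U)` flag (UNICODE_CHARACTER_CLASS, Java 7+). `CharClassDemo` and its helpers are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassDemo {
    // Hypothetical helper: collects every non-overlapping match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Hypothetical helper: true when the whole input matches regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // One common Chinese character, code point 0x4E2D.
        String zhong = String.valueOf((char) 0x4E2D);

        System.out.println(findAll("[aeiou]", "regular"));              // [e, u, a]
        System.out.println(fullMatch("[a-zA-Z]+", "Hello"));            // true
        System.out.println(fullMatch("\\w", zhong));                    // false: ASCII only by default
        System.out.println(fullMatch("[\\u4E00-\\u9FFF]", zhong));      // true: explicit range
        System.out.println(fullMatch("(?U)\\w", zhong));                // true: Unicode-aware \w
    }
}
```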

Negation

Negation matches characters that do not belong to a given metacharacter or character class.

Negated metacharacters

\W  Matches any character that is not a letter, digit, or underscore
\S  Matches any character that is not whitespace
\D  Matches any character that is not a digit
\B  Matches a position that is not the beginning or end of a word

Negated character classes

[^x]  Matches any character except x
[^aeiou]  Matches any character except the letters a, e, i, o, u
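
Negated classes are handy for stripping unwanted characters. A minimal sketch, with hypothetical method names, using the JDK's `String.replaceAll`:

```java
class NegationDemo {
    // Removes everything that is not a digit.
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // Removes everything that is not a lowercase vowel.
    static String vowelsOnly(String s) {
        return s.replaceAll("[^aeiou]", "");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("+1 (555) 010-9999")); // 15550109999
        System.out.println(vowelsOnly("regular"));           // eua
    }
}
```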

Branching conditions

These are also called logical operators. Here X and Y stand for two expressions.

XY  X followed by Y
X|Y  Matches X or Y. Alternatives are tried from left to right; once one succeeds, the remaining ones are not tried.
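
Because alternatives are tried left to right, their order matters when one is a prefix of another. The following sketch (class and helper names are made up) shows the difference:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationDemo {
    // Hypothetical helper: returns the first match of regex in input, or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // "cat" is tried first and succeeds, so "category" is never attempted.
        System.out.println(firstMatch("cat|category", "category")); // cat
        // Putting the longer alternative first changes the result.
        System.out.println(firstMatch("category|cat", "category")); // category
    }
}
```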

Groups

The examples here all use \w as the sub-expression:

(\w)  Parentheses make the enclosed expression a single unit, a group
(\w)(\w)  Groups are numbered automatically: the first pair of parentheses is group 1, the second is group 2
(?'Word'\w+)  Defines a group named Word (quote form, used by .NET; Java accepts only the angle-bracket form below)
(?<Word>\w+)  Defines a group named Word
(?:\w+)  Matches the expression but does not capture the matched text and does not assign the group a number
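
A short sketch of numbered, named, and non-capturing groups in Java. The date pattern and the names `GroupDemo` and `parseDate` are chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Two named groups followed by one plain numbered group.
    static final Pattern DATE =
        Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(\\d{2})");

    // Hypothetical helper: returns {year, month, day}, or null if s is not a date.
    static String[] parseDate(String s) {
        Matcher m = DATE.matcher(s);
        if (!m.matches()) {
            return null;
        }
        // Named groups are numbered too, so the unnamed day group is group 3.
        return new String[] { m.group("year"), m.group("month"), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = parseDate("2016-01-14");
        System.out.println(parts[0] + " / " + parts[1] + " / " + parts[2]); // 2016 / 01 / 14

        // (?:...) groups without capturing, so only (c) gets a number.
        System.out.println(Pattern.compile("(?:ab)+(c)").matcher("").groupCount()); // 1
    }
}
```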

Backreferences

A later part of the expression can refer back to an earlier group. \1 behaves as if the text matched by group 1 had been stored in a variable named \1, which can then be used anywhere later in the expression.

\1  The text matched by group 1
\k<Word>  The text matched by the group named Word

Match a word repeated twice in a row, such as Hello Hello or lei123 lei123:

(\w+)\s+\1
(?<Word>\w+)\s+\k<Word>
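
Both patterns can be run from Java as follows. This is a sketch; `BackrefDemo` and `firstRepeated` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackrefDemo {
    static final Pattern NUMBERED = Pattern.compile("(\\w+)\\s+\\1");
    static final Pattern NAMED = Pattern.compile("(?<Word>\\w+)\\s+\\k<Word>");

    // Hypothetical helper: returns the first doubled word found by p, or null.
    static String firstRepeated(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstRepeated(NUMBERED, "Hello Hello"));        // Hello
        System.out.println(firstRepeated(NAMED, "say lei123 lei123 now")); // lei123
        System.out.println(firstRepeated(NUMBERED, "Hello World"));        // null
    }
}
```

Without word boundaries the pattern can also match inside words (for example the "is is" in "this is"), so wrapping it in \b on both sides is often worthwhile.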

Zero-width assertions (positive and negative)

A zero-width assertion attaches a positional condition to a match without consuming any characters, which makes the match more precise.

\w+(?=ing)  Matches a run of word characters that is followed by ing (ing itself is not included)
\w+(?!ing)  Matches a run of word characters that is not followed by ing
(?<=re)\w+  Matches a run of word characters that is preceded by re (re itself is not included)
(?<!re)\w+  Matches a run of word characters that is not preceded by re
(?<=\s)\d+(?=\s)  Matches digits with whitespace on both sides, not including the whitespace
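
The sketch below runs several of these assertions. The names `LookaroundDemo` and `findAll` are illustrative. The last call shows a common surprise: the negative lookahead is checked only where `\w+` stops, so a whole word ending in ing still matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // Hypothetical helper: collects every non-overlapping match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Positive lookahead: the stem before "ing".
        System.out.println(findAll("\\w+(?=ing)", "singing reading"));       // [sing, read]
        // Positive lookbehind: the rest of the word after "re".
        System.out.println(findAll("(?<=re)\\w+", "reading rewrite"));       // [ading, write]
        // Digits that have whitespace on both sides.
        System.out.println(findAll("(?<=\\s)\\d+(?=\\s)", "a 12 b 345 c6")); // [12, 345]
        // At the end of "singing" nothing follows, so the assertion holds.
        System.out.println(findAll("\\w+(?!ing)", "singing"));               // [singing]
    }
}
```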

Greed and laziness

Greedy: match as long a string as possible.

Lazy: match as short a string as possible.

Lazy mode is enabled only by adding ? after a repetition metacharacter.

*?  Repeat any number of times, but as few times as possible
+?  Repeat 1 or more times, but as few times as possible
??  Repeat 0 or 1 times, but as few times as possible
{n,m}?  Repeat n to m times, but as few times as possible
{n,}?  Repeat n or more times, but as few times as possible
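
The difference is easiest to see on text with several possible end points. A minimal sketch with made-up names (`GreedyLazyDemo`, `firstMatch`):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyLazyDemo {
    // Hypothetical helper: returns the first match of regex in input, or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>bold</b>";
        // Greedy: runs to the last '>' in the input.
        System.out.println(firstMatch("<.+>", html));      // <b>bold</b>
        // Lazy: stops at the first '>' that lets the match succeed.
        System.out.println(firstMatch("<.+?>", html));     // <b>
        // The same contrast with a bounded quantifier.
        System.out.println(firstMatch("a{2,4}", "aaaa"));  // aaaa
        System.out.println(firstMatch("a{2,4}?", "aaaa")); // aa
    }
}
```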

Processing options

These inline flags switch a matching mode on from the point where they are inserted in the expression. The corresponding java.util.regex.Pattern constant is shown in parentheses.

(?i): Ignore case (CASE_INSENSITIVE)
(?x): Ignore whitespace in the pattern and allow # comments (COMMENTS)
(?s): . matches any character, including line breaks (DOTALL)
(?m): Multiline mode, in which ^ and $ also match at line boundaries (MULTILINE)
(?u): Unicode-aware case folding (UNICODE_CASE); takes effect only together with CASE_INSENSITIVE
(?d): Only '\n' is treated as a line terminator (UNIX_LINES)
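
The flags can be checked with a sketch like this one (names are illustrative). The same modes can also be passed as constants, for example `Pattern.compile(regex, Pattern.CASE_INSENSITIVE)`.

```java
import java.util.regex.Pattern;

class FlagsDemo {
    // Hypothetical helper: true when the whole input matches regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("(?i)hello", "HELLO")); // true
        // A flag applies only from the point where it appears.
        System.out.println(fullMatch("a(?i)b", "aB"));       // true
        System.out.println(fullMatch("a(?i)b", "Ab"));       // false
        // Without (?s) the dot does not match a line break.
        System.out.println(fullMatch("a.b", "a\nb"));        // false
        System.out.println(fullMatch("(?s)a.b", "a\nb"));    // true
        // (?x): pattern whitespace is ignored and # starts a comment.
        String verbose = "(?x) \\d{4}  # year\n - \\d{2}  # month";
        System.out.println(fullMatch(verbose, "2016-01"));   // true
        // Non-ASCII case folding needs (?u) in addition to (?i).
        String lower = String.valueOf((char) 0xE9); // e with acute accent
        String upper = String.valueOf((char) 0xC9); // E with acute accent
        System.out.println(fullMatch("(?i)" + lower, upper));  // false
        System.out.println(fullMatch("(?iu)" + lower, upper)); // true
    }
}
```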

Balancing groups/recursive matching

Balancing groups are used to match nested structures. A common use is matching HTML tags (for example, finding the correctly paired opening and closing tags in loosely structured HTML). This syntax comes from .NET; java.util.regex does not support balancing groups or conditionals. The examples again use \w as the sub-expression.

(?'group'\w)  Names the captured content (what \w matched) group and pushes it onto the stack
(?'-group'\w)  After \w matches, pops the top capture (the most recently pushed one) off the group stack; if the stack is already empty, this group fails to match
(?(group)yes|no)  If the group stack is non-empty, continues matching with the expression yes; otherwise matches the expression no
(?!)  Zero-width negative lookahead with an empty sub-expression; an attempt to match it always fails
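
Since java.util.regex has no balancing groups, nested pairs are usually checked in Java with plain code instead. The sketch below is a substitute technique, not a regex feature: a depth counter stands in for the push, pop, and final "stack is empty" test described above. `BalanceDemo` and `isBalanced` are hypothetical names.

```java
class BalanceDemo {
    // Returns true when every open character is closed later, in proper nesting order.
    static boolean isBalanced(String s, char open, char close) {
        int depth = 0; // plays the role of the capture stack's size
        for (char c : s.toCharArray()) {
            if (c == open) {
                depth++;                 // like pushing a capture
            } else if (c == close) {
                if (--depth < 0) {
                    return false;        // like popping from an empty stack
                }
            }
        }
        return depth == 0;               // like the final check that the stack is empty
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("(a(b)c)", '(', ')')); // true
        System.out.println(isBalanced("(a(b)c", '(', ')'));  // false
        System.out.println(isBalanced(")(", '(', ')'));      // false
    }
}
```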

Comments

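Comment syntax: (?#comment). The regex engine ignores the content of this construct, which exists only to annotate the expression, and it can be placed anywhere in the pattern. Note that java.util.regex does not support (?#comment); in Java, enable (?x) and write # comments instead.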

Source Address: http://www.xiaoleilu.com/regex-guide/
