A concise reference to regular expressions

Objective

Probably the best-known regex article online is the "30-Minute Regular Expression Tutorial". To be honest, that article is what got me started with regular expressions, but now that I use them regularly, a long tutorial no longer suits me, so here is a summary for quick review.

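The following syntax is written with Java in mind, and most of it carries over to other regex flavors. A few constructs, such as balancing groups, come from .NET and are not supported by Java's java.util.regex.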

Metacharacters

Metacharacters (also called character sets) are special symbols that stand for a particular kind of character or a position.

Matching characters

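    • . Matches any character except a line break
    • \w Matches a letter, digit, or underscore (in .NET it also matches Unicode letters such as kanji; in Java this requires the UNICODE_CHARACTER_CLASS flag)
    • \s Matches any whitespace character
    • \d Matches a digit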
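
A minimal Java sketch of these four metacharacters. The class CharClassDemo and its findAll helper are hypothetical names used only for illustration. It also shows that Java's \w is ASCII-only unless Unicode character classes are switched on with the (?U) flag (available since Java 7):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassDemo {
    // Hypothetical helper: collects every substring of input that regex matches, left to right.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "id_7 = 42\tok";
        System.out.println(findAll("\\d", text));        // [7, 4, 2]
        System.out.println(findAll("\\w+", text));       // [id_7, 42, ok]
        System.out.println(findAll("\\s", text).size()); // 3 (two spaces and a tab)
        System.out.println(findAll("a.c", "abc a\nc"));  // [abc]: '.' does not match the line break
        // By default \w is ASCII-only; (?U) turns on Unicode classes so CJK letters count.
        System.out.println(findAll("\\w+", "\u6F22\u5B57abc"));            // [abc]
        System.out.println(findAll("(?U)\\w+", "\u6F22\u5B57abc").size()); // 1
    }
}
```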

Matching positions

    • \b Matches the beginning or end of a word (a word boundary)
    • ^ Matches the start of the string (the start of a line in multiline mode)
    • $ Matches the end of the string (the end of a line in multiline mode)
    • \G Matches where the previous match ended (that is, where this match starts)
    • \A Matches the beginning of the string (like ^, but unaffected by the multiline option)
    • \Z Matches the end of the string, or the position just before a final line break (unaffected by the multiline option)
    • \z Matches the very end of the string (like $, but unaffected by the multiline option)
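
A small Java sketch of the position metacharacters, assuming java.util.regex semantics. AnchorDemo, found, and leadingNumbers are illustrative names, not part of any library:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorDemo {
    // True if regex matches somewhere in input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // \G pins each match to where the previous one ended, so only the
    // unbroken run of "n," items at the very start is collected.
    static String leadingNumbers(String input) {
        StringBuilder sb = new StringBuilder();
        Matcher m = Pattern.compile("\\G(\\d+),?").matcher(input);
        while (m.find()) {
            sb.append(m.group(1)).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String text = "first line\nsecond line\n";
        System.out.println(found("\\bline\\b", text));    // true: "line" is a whole word
        System.out.println(found("\\bin\\b", text));      // false: "in" only appears inside "line"
        System.out.println(found("^second", text));       // false: ^ is the start of the string
        System.out.println(found("(?m)^second", text));   // true: with (?m), ^ is a line start
        System.out.println(found("(?m)\\Asecond", text)); // false: \A ignores (?m)
        System.out.println(found("line\\Z", text));       // true: \Z allows a final line break
        System.out.println(found("line\\z", text));       // false: \z is the absolute end
        System.out.println(leadingNumbers("1,2,3 x,4"));  // 1 2 3
    }
}
```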

Repeat

    • * Repeat 0 or more times
    • + Repeat 1 or more times
    • ? Repeat 0 or 1 time
    • {n} Repeat exactly n times
    • {n,} Repeat n or more times
    • {n,m} Repeat n to m times
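
A quick check of each quantifier in Java. The whole helper (a hypothetical name) wraps Pattern.matches, which succeeds only when the entire input matches:

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // Pattern.matches requires the whole input to match the regex.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(whole("ab*c", "ac"));        // true: b appears 0 times
        System.out.println(whole("ab+c", "ac"));        // false: + needs at least one b
        System.out.println(whole("colou?r", "color"));  // true: u is optional
        System.out.println(whole("\\d{3}", "123"));     // true: exactly 3 digits
        System.out.println(whole("\\d{2,}", "12345"));  // true: 2 or more digits
        System.out.println(whole("\\d{2,4}", "12345")); // false: more than 4 digits
    }
}
```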

Character escapes

To match a character that is itself a metacharacter, escape it with \. For example, to match * write \*, and to match \ write \\.

Characters that must be escaped: $ ( ) * + . [ ] ? \ ^ { } |
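
In Java source code each backslash must be doubled again inside a string literal, so the regex \* is written "\\*" and the regex \\ is written "\\\\". The sketch below compares a hand-written escaping helper (hypothetical, covering only the characters listed above, and only for use outside a character class) with the standard Pattern.quote:

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Hypothetical helper: puts a backslash before each metacharacter from the
    // list above. Only meant for text used outside a character class.
    static String escape(String literal) {
        StringBuilder sb = new StringBuilder();
        for (char c : literal.toCharArray()) {
            if ("$()*+.[]?\\^{}|".indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String price = "$3.50 (approx.)";
        System.out.println(escape(price));                               // \$3\.50 \(approx\.\)
        System.out.println(Pattern.matches(escape(price), price));        // true
        System.out.println(Pattern.matches(Pattern.quote(price), price)); // true
        System.out.println(Pattern.matches("a\\*", "a*"));                // true: literal *
        System.out.println(Pattern.matches("\\\\", "\\"));                // true: one literal backslash
    }
}
```

Pattern.quote is usually the safer choice in real code, since it does not depend on keeping a metacharacter list up to date.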

Character class

A character class matches one character out of a specified set of characters.

Special characters

    • \0nn The character with octal (base-8) value nn
    • \xhh The character with hexadecimal (base-16) value hh
    • \uhhhh The Unicode character with hexadecimal value hhhh
    • \t Tab
    • \n Newline (line feed)
    • \r Carriage return
    • \f Form feed (page break)
    • \e Escape
    • \cX ASCII control character; for example, \cC represents Ctrl+C
    • \p{name} A character in the Unicode class named name, such as \p{IsGreek}
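
A sketch checking several of these escapes with java.util.regex (SpecialCharDemo is an illustrative name). Java writes the octal form with a leading zero, as \0n, \0nn, or \0mnn:

```java
import java.util.regex.Pattern;

class SpecialCharDemo {
    // True if the whole input matches regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("\\x41", "A"));         // true: hex 41 is 'A'
        System.out.println(matches("\\0101", "A"));        // true: octal 101 is 65, 'A'
        System.out.println(matches("\\u00e9", "\u00e9"));  // true: e with an acute accent
        System.out.println(matches("a\\tb", "a\tb"));      // true: tab
        System.out.println(matches("\\cC", "\u0003"));     // true: Ctrl+C is code point 3
        System.out.println(matches("\\e", "\u001B"));      // true: escape character
        System.out.println(matches("\\p{IsGreek}+", "\u03b1\u03b2\u03b3")); // true: Greek letters
    }
}
```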

Explicit sets

    • [aeiou] Matches a single vowel
    • [.?!] Matches one of the given punctuation marks (inside brackets, . and ? are literal)

Range

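    • [0-9] Matches a digit from 0 to 9, same as \d
    • [a-z] Matches any lowercase letter
    • [a-zA-Z] Matches any letter
    • [a-z0-9A-Z_\u4E00-\u9FFF] Matches ASCII letters, digits, underscore, and CJK Unified Ideographs; a rough equivalent of \w when \w is taken to include kanji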
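
A sketch of explicit sets and ranges in Java, using the same kind of hypothetical findAll helper (CharSetDemo is also an illustrative name):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharSetDemo {
    // Hypothetical helper: collects every substring of input that regex matches.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findAll("[aeiou]", "regex"));         // [e, e]
        System.out.println(findAll("[.?!]", "Hi. Ok? Go!"));     // [., ?, !]
        System.out.println(findAll("[0-9]+", "a1b22c333"));      // [1, 22, 333]
        System.out.println(findAll("[a-zA-Z]+", "abc123DEF"));   // [abc, DEF]
        // 2 matches: "x_1" and the two CJK characters
        System.out.println(findAll("[a-z0-9A-Z_\\u4E00-\\u9FFF]+", "x_1 \u4e2d\u6587").size());
    }
}
```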

Negation

A negated form matches any character that does NOT belong to the corresponding metacharacter or character class.

Negated metacharacters

    • \W Matches any character not matched by \w
    • \S Matches any character that is not whitespace
    • \D Matches any non-digit character
    • \B Matches a position that is not the beginning or end of a word

Negated character classes

    • [^x] Matches any character except x
    • [^aeiou] Matches any character except the letters a, e, i, o, u
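
The negated forms in a short Java sketch (NegationDemo and findAll are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegationDemo {
    // Hypothetical helper: collects every substring of input that regex matches.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\D+", "a1b22c"));          // [a, b, c]
        System.out.println(findAll("\\S+", "to be  or"));       // [to, be, or]
        System.out.println(findAll("[^aeiou]+", "banana"));     // [b, n, n]
        // \B: only the "cat" inside "concat" is not at a word boundary
        System.out.println(findAll("\\Bcat", "cat concat").size()); // 1
    }
}
```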

Branching conditions

Also called logical operators. Here X and Y stand for two expressions.

    • XY X followed by Y
    • X|Y Either X or Y. Alternatives are tried from left to right; the first alternative that leads to a successful match is used, and the remaining ones are not tried.
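
A sketch showing that alternation order matters in Java: at a given position the engine commits to the first alternative that lets the whole pattern succeed, so a shorter alternative listed first can shadow a longer one. AlternationDemo and firstMatch are illustrative names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationDemo {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("ab", "xaby"));               // ab: a immediately followed by b
        System.out.println(firstMatch("cat|category", "category")); // cat: the left alternative wins
        System.out.println(firstMatch("category|cat", "category")); // category
        // "cat" cannot be followed by "s" here, so the engine falls back to "category"
        System.out.println(firstMatch("(cat|category)s", "categorys")); // categorys
    }
}
```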

Groups

The examples below all use \w as the inner expression:

    • (\w) The parentheses make the enclosed expression a single unit, i.e. a group
    • (\w)(\w) Groups are numbered automatically: the first pair of parentheses is group 1, the second is group 2
    • (?'Word'\w+) Defines a group named Word (.NET syntax; not supported in Java)
    • (?<Word>\w+) Defines a group named Word
    • (?:\w+) Matches \w+ but does not capture the matched text and does not assign a group number to this group
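
A sketch of numbered, named, and non-capturing groups with java.util.regex. Java accepts the (?<Word>...) form but not (?'Word'...); GroupDemo, numbered, and named are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Two automatically numbered groups; returns "group1,group2" or null if no match.
    static String numbered(String input) {
        Matcher m = Pattern.compile("(\\w)(\\w)").matcher(input);
        return m.matches() ? m.group(1) + "," + m.group(2) : null;
    }

    // A named group Word plus a non-capturing group for the numeric suffix.
    // groupCount() counts capturing groups only, so it reports 1 here.
    static String named(String input) {
        Matcher m = Pattern.compile("(?<Word>\\w+)-(?:\\d+)").matcher(input);
        return m.matches() ? m.group("Word") + " groups=" + m.groupCount() : null;
    }

    public static void main(String[] args) {
        System.out.println(numbered("ab"));   // a,b
        System.out.println(named("key-42"));  // key groups=1
    }
}
```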

Backreferences

A backreference refers back to the text matched by an earlier group. \1 acts like a variable holding whatever group 1 matched, and it can be used anywhere after that group.

    • \1 The text matched by group 1
    • \k<Word> The text matched by the group named Word

To match a word that appears twice in a row, such as Hello Hello or lei123 lei123:

    1. (\w+)\s+\1
    2. (?<Word>\w+)\s+\k<Word>
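
Both expressions in a Java sketch (BackrefDemo and findRepeat are illustrative names). The last two checks show a pitfall: without \b anchors, (\w+)\s+\1 also finds "the the" inside "the theory":

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackrefDemo {
    // Returns the first match of regex in input, or null if there is none.
    static String findRepeat(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(findRepeat("(\\w+)\\s+\\1", "say Hello Hello there"));             // Hello Hello
        System.out.println(findRepeat("(?<Word>\\w+)\\s+\\k<Word>", "id lei123 lei123"));   // lei123 lei123
        System.out.println(findRepeat("(\\w+)\\s+\\1", "the theory"));                        // the the
        System.out.println(findRepeat("\\b(\\w+)\\s+\\1\\b", "the theory"));                  // null
    }
}
```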

Zero-width assertions (lookahead and lookbehind)

A zero-width assertion adds a positional condition to a match without consuming any characters, which makes the match more precise.

    • \w+(?=ing) Matches word characters that are followed by ing (ing itself is not included)
    • \w+(?!ing) Matches word characters that are not followed by ing
    • (?<=re)\w+ Matches word characters that are preceded by re (re itself is not included)
    • (?<!re)\w+ Matches word characters that are not preceded by re
    • (?<=\s)\d+(?=\s) Matches digits with whitespace on both sides, not including the whitespace
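
A Java sketch of the lookaround assertions (LookaroundDemo and findAll are hypothetical names). The first result shows that \w+(?=ing) backtracks until the lookahead succeeds, so "singing" yields "sing":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // Hypothetical helper: collects every substring of input that regex matches.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\w+(?=ing)", "singing dancing"));        // [sing, danc]
        System.out.println(findAll("\\d{3}(?!\\d)", "1234 567"));             // [234, 567]
        System.out.println(findAll("(?<=re)\\w+", "redo rework"));            // [do, work]
        System.out.println(findAll("(?<=\\s)\\d+(?=\\s)", "a 12 b 345c 6 ")); // [12, 6]
    }
}
```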

Greed and laziness

Greedy: match as long a string as possible

Lazy: match as short a string as possible

Lazy mode is enabled by appending ? to a repetition metacharacter (quantifier).

    • *? Repeat any number of times, but as few times as possible
    • +? Repeat 1 or more times, but as few times as possible
    • ?? Repeat 0 or 1 times, but as few times as possible
    • {n,m}? Repeat n to m times, but as few times as possible
    • {n,}? Repeat n or more times, but as few times as possible
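
Greedy versus lazy on the same input, sketched in Java (LazyDemo and firstMatch are illustrative names):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyDemo {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>bold</b>";
        System.out.println(firstMatch("<.+>", html));    // <b>bold</b>: greedy runs to the last '>'
        System.out.println(firstMatch("<.+?>", html));   // <b>: lazy stops at the first '>'
        System.out.println(firstMatch("a{2,}", "aaaa")); // aaaa
        System.out.println(firstMatch("a{2,}?", "aaaa"));// aa
    }
}
```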

Processing options

These inline flags switch on a mode from inside the expression; the mode takes effect from the point where the flag is inserted.

    1. (?i): Ignore case (CASE_INSENSITIVE)
    2. (?x): Ignore whitespace in the pattern and allow # comments (COMMENTS)
    3. (?s): . matches any character, including a line break (DOTALL)
    4. (?m): Multiline mode: ^ and $ match at the start and end of each line (MULTILINE)
    5. (?u): Unicode-aware case-insensitive matching (UNICODE_CASE); it only takes effect together with CASE_INSENSITIVE
    6. (?d): Only '\n' is treated as a line terminator (UNIX_LINES)
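
A sketch exercising each inline flag through Pattern.matches and Matcher.find (FlagDemo, whole, and findAll are hypothetical names). The (?i)/(?iu) pair illustrates that Unicode case folding needs both flags, because Java's plain case-insensitive mode only folds US-ASCII letters:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagDemo {
    // True if the whole input matches regex.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Hypothetical helper: collects every substring of input that regex matches.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(whole("(?i)hello", "HeLLo"));   // true
        System.out.println(whole("a(?i)b", "aB"));         // true: the flag starts after a
        System.out.println(whole("a(?i)b", "AB"));         // false: a is still case-sensitive
        System.out.println(whole("(?x) \\d{3} - \\d{4}  # local number", "555-0100")); // true
        System.out.println(whole("a.b", "a\nb"));          // false
        System.out.println(whole("(?s)a.b", "a\nb"));      // true
        System.out.println(findAll("(?m)^\\d+$", "1\n22\nx")); // [1, 22]
        System.out.println(whole("(?i)\u00e9", "\u00c9"));  // false: ASCII-only case folding
        System.out.println(whole("(?iu)\u00e9", "\u00c9")); // true
        System.out.println(whole("a.b", "a\rb"));          // false: '\r' counts as a line terminator
        System.out.println(whole("(?d)a.b", "a\rb"));      // true: only '\n' is a line terminator
    }
}
```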

Balancing groups / recursive matching

Balancing groups match nested structures. A typical use is HTML tags: when the markup is irregular and opening and closing tags do not line up one-to-one, balancing groups can still pair each opening tag with its correct closing tag. The examples again use \w. Note that balancing groups and the (?(group)yes|no) conditional are .NET features; Java's java.util.regex does not support them.

    • (?'group'\w) Captures whatever \w matched under the name group and pushes it onto the group stack
    • (?'-group'\w) After \w matches, pops the top entry (the most recent capture) off the group stack; if the stack is empty, this part of the match fails
    • (?(group)yes|no) If the group stack is non-empty, match expression yes; otherwise match expression no
    • (?!) A zero-width negative lookahead with an empty pattern; since the empty pattern always matches, this assertion always fails

Comments

Comment syntax: the content of (?#comment) is ignored by the regex engine and serves only as an annotation. It can be placed anywhere in the regular expression. (Java's java.util.regex does not support this syntax.)
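
In Java the usual way to annotate a pattern is the (?x) flag, where # starts a comment that runs to the end of the line. A sketch of that approach, which also checks at runtime that (?#...) is rejected (CommentDemo, ISO_DATE, and acceptsInlineComment are illustrative names):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class CommentDemo {
    // In (?x) mode, whitespace is ignored and '#' starts a comment
    // that runs to the end of the line.
    static final Pattern ISO_DATE = Pattern.compile(
            "(?x)\n"
            + "(\\d{4})  # year\n"
            + "-         # separator\n"
            + "(\\d{2})  # month\n"
            + "-         # separator\n"
            + "(\\d{2})  # day\n");

    // Returns false if the running JDK rejects the (?#...) syntax.
    static boolean acceptsInlineComment() {
        try {
            Pattern.compile("a(?#note)b");
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Matcher m = ISO_DATE.matcher("2024-05-17");
        if (m.matches()) {
            System.out.println(m.group(1) + "/" + m.group(2) + "/" + m.group(3)); // 2024/05/17
        }
        System.out.println(acceptsInlineComment());
    }
}
```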

Source Address: http://www.xiaoleilu.com/regex-guide/
