Getting Started with Java Regular Expressions: The Basics

Source: Internet
Author: User

I. Regular expression terminology

1) Metacharacters: characters that carry a special meaning instead of matching themselves. For example, in \bx the \b is a word-boundary metacharacter, so the pattern matches an x at the start of a word.

2) Commonly used metacharacters:

  \d : Matches a single digit. \d+ matches one or more digits.

  \b : A word boundary. For example, \bhe matches "he" at the start of a word, as in "he" or "hello".

  \w : Equivalent to "[a-zA-Z0-9_]". For example, applying \w+ to the string 1,2,3a(Bc4,)5,6(e)f78(g) yields these matches:

1
2
3a
Bc4
5
6
e
f78
g

  () : Groups a subexpression so that a quantifier applies to the whole group. For example, (\d,?)* applied to 1,2,3abc4,5,6ef78 finds the non-empty matches 1,2,3 and 4,5,6 and 78.

  .*? : . matches any single character except line terminators such as \r and \n; * lets the preceding element occur 0 to n times; the trailing ? makes the repetition non-greedy, so as few characters as possible are matched.

  \num : Backreference to capture group number num. For example, a pattern such as (.)\1 applied to "222333" yields the matches 22 and 33.

  escape character \ : To match a metacharacter itself, put the escape character in front of it. For example, with the input res=[1,2,15],res2=[2] and regex = "\[(\d,?)*\]", \[ matches a literal [ and \] matches a literal ].
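The examples in this list can be tried out with a short Java sketch. The class name CommonMetacharacters and the helper findAll are hypothetical names introduced here for illustration, and the sentence used for \bhe is made up; the other inputs come from the list above. Each backslash of the regex is doubled inside a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommonMetacharacters {

    // Hypothetical helper: collects every non-empty match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            if (!m.group().isEmpty()) {
                matches.add(m.group());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // \w+ : runs of word characters
        System.out.println(findAll("\\w+", "1,2,3a(Bc4,)5,6(e)f78(g)"));
        // [1, 2, 3a, Bc4, 5, 6, e, f78, g]

        // \bhe : "he" only at the start of a word (illustrative sentence)
        System.out.println(findAll("\\bhe\\w*", "he said hello to the chef"));
        // [he, hello]

        // (\d,?)* : the quantifier applies to the whole group
        System.out.println(findAll("(\\d,?)*", "1,2,3abc4,5,6ef78"));
        // [1,2,3, 4,5,6, 78]

        // (.)\1 : backreference to the first group
        System.out.println(findAll("(.)\\1", "222333"));
        // [22, 33]

        // \[ and \] : escaped brackets match literally
        System.out.println(findAll("\\[(\\d,?)*\\]", "res=[1,2,15],res2=[2]"));
        // [[1,2,15], [2]]
    }
}
```

The helper skips empty matches because a pattern such as (\d,?)* can also match the empty string at positions where no digit is present.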

3) Complete regular expression syntax reference

Character

Description

\

Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, "n" matches the character "n", while "\n" matches a newline. The sequence "\\" matches "\", and "\(" matches "(".

^

Matches the position at the start of the input string. If multiline mode is enabled (the Pattern.MULTILINE flag in Java), ^ also matches the position after "\n" or "\r".

$

Matches the position at the end of the input string. If multiline mode is enabled (the Pattern.MULTILINE flag in Java), $ also matches the position before "\n" or "\r".

*

Matches the preceding character or subexpression zero or more times. For example, "zo*" matches "z" and "zoo". * is equivalent to {0,}.

+

Matches the preceding character or subexpression one or more times. For example, "zo+" matches "zo" and "zoo", but does not match "z". + is equivalent to {1,}.

?

Matches the preceding character or subexpression zero or one time. For example, "do(es)?" matches "do" or "does". ? is equivalent to {0,1}.

{n}

n is a non-negative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob", but matches the two "o"s in "food".

{n,}

n is a non-negative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob", but matches all the "o"s in "foooood". "o{1,}" is equivalent to "o+". "o{0,}" is equivalent to "o*".

{n,m}

m and n are non-negative integers, where n <= m. Matches at least n times and at most m times. For example, "o{1,3}" matches the first three "o"s in "fooooood". "o{0,1}" is equivalent to "o?". Note: you cannot insert a space between the comma and the numbers.


?

When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the matching is "non-greedy". A non-greedy pattern matches the shortest possible string, while the default greedy pattern matches the longest possible string. For example, in the string "oooo", "o+?" matches only a single "o", while "o+" matches all the "o"s.

.

Matches any single character except line terminators such as "\r" and "\n". To match any character including "\r" and "\n", use a pattern such as "[\s\S]".

(pattern)

Matches pattern and captures the matched subexpression. Captured matches can be retrieved afterwards by group number (in Java, with Matcher.group(n); in replacement strings, with $0...$9). To match the parenthesis characters themselves, use "\(" or "\)".

(?:pattern)

Matches pattern but does not capture the match; that is, it is a non-capturing group and the match is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|). For example, "industr(?:y|ies)" is a more economical expression than "industry|industries".

(?=pattern)

Positive lookahead: matches the search string at any point where a string matching pattern begins. It is non-capturing, so the match cannot be retrieved for later use. For example, "Windows (?=95|98|NT|2000)" matches the "Windows " in "Windows 2000", but not the "Windows " in "Windows 3.1". Lookahead does not consume characters: after a match, the search for the next match starts immediately after the last match, not after the characters that made up the lookahead.

(?!pattern)

Negative lookahead: matches the search string at any point where a string not matching pattern begins. It is non-capturing, so the match cannot be retrieved for later use. For example, "Windows (?!95|98|NT|2000)" matches the "Windows " in "Windows 3.1", but not the "Windows " in "Windows 2000". Lookahead does not consume characters: after a match, the search for the next match starts immediately after the last match, not after the characters that made up the lookahead.

x|y

Matches x or y. For example, "z|food" matches "z" or "food". "(z|f)ood" matches "zood" or "food".

[xyz]

Character set. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain".

[^xyz]

Negated character set. Matches any character that is not enclosed. For example, "[^abc]" matches the "p", "l", "i", and "n" in "plain".

[a-z]

Character range. Matches any character in the specified range. For example, "[a-z]" matches any lowercase letter in the range "a" to "z".

[^a-z]

Negated character range. Matches any character that is not in the specified range. For example, "[^a-z]" matches any character that is not in the range "a" to "z".

\b

Matches a word boundary, that is, the position between a word character and a non-word character such as a space. For example, "er\b" matches the "er" in "never", but not the "er" in "verb".

\B

Matches a non-word boundary. "er\B" matches the "er" in "verb", but not the "er" in "never".

\cx

Matches the control character indicated by x. For example, \cM matches Control-M, the carriage return character. The value of x must be in the range A-Z or a-z. If it is not, c is assumed to be a literal "c" character.

\d

Matches a digit character. Equivalent to [0-9].

\D

Matches a non-digit character. Equivalent to [^0-9].

\f

Matches a form feed. Equivalent to \x0c and \cL.

\n

Matches a newline. Equivalent to \x0a and \cJ.

\r

Matches a carriage return. Equivalent to \x0d and \cM.

\s

Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\x0b].

\S

Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\x0b].

\t

Matches a tab. Equivalent to \x09 and \cI.

\v

Matches a vertical tab. Equivalent to \x0b and \cK. (In Java 8 and later, \v matches any vertical whitespace character, including \n and \r, not only the vertical tab.)

\w

Matches any word character, including underscore. Equivalent to "[A-Za-z0-9_]".

\W

Matches any non-word character. Equivalent to "[^A-Za-z0-9_]".

\xn

Matches n, where n is a hexadecimal escape code. The hexadecimal escape code must be exactly two digits long. For example, "\x41" matches "A". "\x041" is equivalent to "\x04" followed by "1". This allows ASCII codes to be used in regular expressions.

\num

Matches num, where num is a positive integer: a backreference to a captured match. For example, "(.)\1" matches two consecutive identical characters.

\n

Identifies an octal escape code or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape code. (In Java's Pattern class, octal escapes must begin with a zero, as in \0nn, and \1 through \9 are always treated as backreferences.)

\nm

Identifies an octal escape code or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal character m. If neither condition holds, \nm matches the octal value nm, where n and m are octal digits (0-7).

\nml

When n is an octal digit (0-3) and m and l are octal digits (0-7), matches the octal escape code nml.

\un

Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
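A few of the table entries are easier to see in running code. The following is a minimal sketch, assuming a hypothetical class SyntaxTableDemo with a hypothetical helper firstMatch; it checks the greedy and non-greedy quantifiers, the non-capturing group, the two lookaheads, and a backreference against the examples given in the table. The inputs "two industries" and "abccd" are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SyntaxTableDemo {

    // Hypothetical helper: returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Greedy vs. non-greedy quantifier
        System.out.println(firstMatch("o+", "oooo"));   // oooo
        System.out.println(firstMatch("o+?", "oooo"));  // o

        // Non-capturing group combined with alternation
        System.out.println(firstMatch("industr(?:y|ies)", "two industries")); // industries

        // Positive lookahead: "2000" is checked but not consumed,
        // so the match is "Windows" plus the trailing space.
        System.out.println("[" + firstMatch("Windows (?=95|98|NT|2000)", "Windows 2000") + "]"); // [Windows ]
        System.out.println(firstMatch("Windows (?=95|98|NT|2000)", "Windows 3.1"));              // null

        // Negative lookahead: succeeds only when none of the alternatives follow.
        System.out.println("[" + firstMatch("Windows (?!95|98|NT|2000)", "Windows 3.1") + "]");  // [Windows ]
        System.out.println(firstMatch("Windows (?!95|98|NT|2000)", "Windows 2000"));             // null

        // Backreference: \1 must repeat whatever group 1 captured.
        System.out.println(firstMatch("(.)\\1", "abccd")); // cc
    }
}
```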

II. Java regular expression classes

1) Pattern: the compiled representation of a regular expression (the rule engine).

2) Matcher: the engine that applies a Pattern to an input string and performs the match operations.

A look at the use of Pattern and Matcher

Results:

Business

Time

Amount

"URL"

The Matcher constructor is not public, so Matcher objects can only be obtained through Pattern.matcher.
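As a rough illustration of that workflow, the sketch below compiles a Pattern once, obtains a Matcher from Pattern.matcher, and reads the capture groups of each match. The class PatternMatcherDemo, the helper parsePairs, the name=number pattern, and the sample input are all hypothetical and are not taken from the original example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternMatcherDemo {

    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern PAIR = Pattern.compile("(\\w+)=(\\d+)");

    // Hypothetical helper: extracts name=number pairs from the input, in order of appearance.
    static Map<String, Integer> parsePairs(String input) {
        Map<String, Integer> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input); // the only way to obtain a Matcher
        while (m.find()) {
            // group(1) is the name, group(2) the digits; group(0) would be the whole match
            pairs.put(m.group(1), Integer.parseInt(m.group(2)));
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(parsePairs("width=640, height=480")); // {width=640, height=480}
    }
}
```

Keeping the compiled Pattern in a constant avoids recompiling the expression on every call, while a fresh Matcher is created per input because a Matcher holds per-match state.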

Matcher mainly provides the methods matches, lookingAt, and find.

The matches method returns true only if the entire input string matches the pattern; otherwise it returns false.

    // The snippets below assume: import java.util.regex.Matcher; import java.util.regex.Pattern;
    private static void testMatches() {
        String regex = "\\d+";
        System.out.println(Pattern.matches(regex, "123abc123")); // false
        System.out.println(Pattern.matches(regex, "123abc")); // false
    }

The lookingAt method matches from the start of the input: the match must begin at the first character, but it does not have to cover the whole string. It returns true in that case, otherwise false.

    private static void testLookingAt() {
        String str = "123AA";
        String regex = "\\d+";
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(str);
        System.out.println(m.lookingAt()); // true
        str = "aa123";
        m = p.matcher(str);
        System.out.println(m.lookingAt()); // false
    }

The find method looks for a match anywhere in the input. It returns true if one is found, otherwise false.

    private static void testFind() {
        String str = "123AA";
        String regex = "\\d+";
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(str);
        System.out.println(m.find()); // true
        str = "Aa123";
        m = p.matcher(str);
        System.out.println(m.find()); // true
        str = "AA";
        m = p.matcher(str);
        System.out.println(m.find()); // false
    }
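To see the three methods side by side, here is a minimal sketch that runs all of them on the same input with the \d+ pattern. The class MatchModes and the helper describe are hypothetical names introduced for this comparison.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchModes {

    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Hypothetical helper: reports how matches, lookingAt, and find treat the same input.
    static String describe(String input) {
        Matcher m = DIGITS.matcher(input);
        boolean whole = m.matches();    // the entire input must match
        boolean prefix = m.lookingAt(); // the match must start at index 0
        m.reset();                      // otherwise find() would resume after the previous match
        boolean anywhere = m.find();    // the match may start anywhere
        return "matches=" + whole + " lookingAt=" + prefix + " find=" + anywhere;
    }

    public static void main(String[] args) {
        System.out.println(describe("123"));    // matches=true lookingAt=true find=true
        System.out.println(describe("123abc")); // matches=false lookingAt=true find=true
        System.out.println(describe("abc123")); // matches=false lookingAt=false find=true
        System.out.println(describe("abc"));    // matches=false lookingAt=false find=false
    }
}
```

The call to reset matters because a Matcher is stateful: after a successful matches or lookingAt, find continues from the end of that match instead of from the beginning of the input.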

III. Online tools

http://tool.oschina.net/regex/

IV. References

http://deerchao.net/tutorials/regex/regex.htm#howtouse
