Java Regular Expressions: The Pattern Class and the Matcher Class (Repost)

Source: Internet
Author: User

java.util.regex is a library package for matching strings against patterns written as regular expressions. It consists of two main classes, Pattern and Matcher. A Pattern is the compiled representation of a regular expression. A Matcher is a state machine that checks a string against the pattern held by a Pattern object. First, a Pattern instance compiles a regular expression whose syntax is similar to Perl's; then a Matcher instance matches strings under the control of that Pattern instance.

Let's take a look at these two classes:

First, the concept of capturing groups

Capturing groups are numbered from left to right by counting their opening parentheses, starting at 1. For example, the expression ((A)(B(C))) contains four such groups:

1        ((A)(B(C)))
2        (A)
3        (B(C))
4        (C)

Group 0 always stands for the entire expression. A group that begins with (? is a pure, non-capturing group: it captures no text and does not count toward the group total. The exception is a named group of the form (?<name>...), which does capture.
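The numbering rule can be checked directly. The sketch below is only an illustration: the class name GroupNumberingDemo, the helper groupsOf, and the inputs "ABC" and "BC" are assumptions made for this example, while the expression ((A)(B(C))) is the one discussed above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // Returns groups 0..groupCount() when the whole input matches,
    // or an empty array when it does not.
    static String[] groupsOf(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            groups[i] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Four capturing groups, numbered by the position of their opening parenthesis.
        String[] g = groupsOf("((A)(B(C)))", "ABC");
        for (int i = 0; i < g.length; i++) {
            System.out.println(i + ": " + g[i]);
        }
        // (?:...) groups the alternation without capturing, so only (C) is counted.
        System.out.println(groupsOf("(?:A|B)(C)", "BC").length - 1);
    }
}
```

Group 0 is returned alongside the numbered groups, but groupCount() does not include it.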

The captured input associated with a group is always the subsequence that the group most recently matched. If a group is evaluated a second time because of quantification, its previously captured value, if any, is retained when the second evaluation fails. For example, matching the string "aba" against the expression (a(b)?)+ leaves group 2 set to "b". All captured input is discarded at the beginning of each match.
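A small sketch of the retained-capture rule, using the "aba" example from the paragraph above. The class name RepeatedGroupDemo and the helper secondGroup are hypothetical names chosen for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // Returns what group 2 holds after a whole-input match, or null if
    // the input does not match or group 2 never took part in the match.
    static String secondGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // First pass of the + loop matches "ab" (group 2 = "b").
        // Second pass matches "a"; (b)? fails, and group 2 keeps "b".
        System.out.println(secondGroup("(a(b)?)+", "aba"));
        // Here (b) never matches at all, so group 2 is null.
        System.out.println(secondGroup("(a(b)?)+", "a"));
    }
}
```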



Second, the Pattern class and the Matcher class in detail

Java regular expressions are implemented by the Pattern and Matcher classes in the java.util.regex package. (It is a good idea to keep the Java API documentation open while reading this article and to look up each method there as it is mentioned.)
The Pattern class represents a compiled regular expression, in other words a matching pattern. Its constructor is private, so it cannot be instantiated directly; instead, the static factory method Pattern.compile(String regex) creates one.
Java code example:
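A minimal sketch of the factory method, assuming an illustrative regex \w+. The class name CompileDemo and the helper roundTrip are hypothetical.

```java
import java.util.regex.Pattern;

class CompileDemo {
    // Compiles the regex and returns the source text stored in the Pattern.
    static String roundTrip(String regex) {
        Pattern p = Pattern.compile(regex);
        return p.pattern();
    }

    public static void main(String[] args) {
        // In Java source the backslash is doubled; the regex itself is \w+
        System.out.println(roundTrip("\\w+"));
    }
}
```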

pattern() returns the regular expression in string form, that is, the regex argument that was passed to Pattern.compile(String regex).

1. Pattern.split(CharSequence input)

Pattern has a split(CharSequence input) method that splits the input around matches of the pattern and returns a String[]. String.split(String regex) yields the same result as Pattern.compile(regex).split(input).
For example, splitting a contact string on the pattern \\d+ removes the runs of digits and leaves the text between them:

str[0]="My QQ is:"
str[1]="My phone is:"
str[2]="My mailbox is: [email protected]"
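The same idea as a runnable sketch. The input "alpha1beta22gamma", the class name SplitDemo, and the helper splitOnDigits are made up for this illustration and are not taken from the original listing.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitDemo {
    // Splits the input around every run of digits.
    static String[] splitOnDigits(String input) {
        return Pattern.compile("\\d+").split(input);
    }

    public static void main(String[] args) {
        String input = "alpha1beta22gamma";
        System.out.println(Arrays.toString(splitOnDigits(input)));
        // String.split with the same regex gives the same pieces.
        System.out.println(Arrays.toString(input.split("\\d+")));
    }
}
```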

2. Pattern.matches(String regex, CharSequence input)

This static method is a shortcut for one-off matching. It compiles the regex and matches it against the entire input, so it suits cases where the pattern is used only once and the whole string must match.

Java code example:
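A short sketch of the static shortcut, assuming the pattern \d+ and a few illustrative inputs. The class name QuickMatchDemo and the helper isAllDigits are hypothetical.

```java
import java.util.regex.Pattern;

class QuickMatchDemo {
    // True only when the entire input consists of one or more digits.
    static boolean isAllDigits(String input) {
        return Pattern.matches("\\d+", input);
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("2223"));   // whole string is digits
        System.out.println(isAllDigits("2223aa")); // trailing letters break the match
        System.out.println(isAllDigits("22bb23")); // letters in the middle break it too
    }
}
```

Because the pattern is recompiled on every call, reusing a compiled Pattern is usually the better choice inside loops.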

3. Pattern.matcher(CharSequence input)

Now, at last, we come to the Matcher class: Pattern.matcher(CharSequence input) returns a Matcher object.
The Matcher class has no public constructor either, so an instance can be obtained only through the Pattern.matcher(CharSequence input) method.
The Pattern class on its own supports only a few simple matching operations; for more powerful and convenient matching, Pattern and Matcher work together. The Matcher class adds support for groups and for finding multiple matches in one input.
Java code example:
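A minimal sketch of obtaining a Matcher and asking it which Pattern created it. The input "22bb23", the class name MatcherPatternDemo, and the helper sameInstance are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherPatternDemo {
    // Checks that Matcher.pattern() hands back the Pattern that created the matcher.
    static boolean sameInstance() {
        Pattern p = Pattern.compile("\\d+");
        Matcher m = p.matcher("22bb23");
        return m.pattern() == p;
    }

    public static void main(String[] args) {
        System.out.println(sameInstance());
    }
}
```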

4. Matcher.matches() / Matcher.lookingAt() / Matcher.find()

The Matcher class provides three matching methods. All three return a boolean: true if the match succeeds, false otherwise.
matches() attempts to match the entire string and returns true only if the whole string matches.
Java code example:

    Pattern p = Pattern.compile("\\d+");
    Matcher m = p.matcher("22bb23");
    m.matches();   // returns false: "bb" cannot be matched by \d+, so the whole-string match fails
    Matcher m2 = p.matcher("2223");
    m2.matches();  // returns true: \d+ matches the entire string

Looking back at Pattern.matches(String regex, CharSequence input), it is equivalent to the following code:
    Pattern.compile(regex).matcher(input).matches()

lookingAt() matches from the beginning of the string; it returns true only if a prefix of the string matches the pattern.
Java code example:
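A sketch of the prefix behaviour, assuming the pattern \d+ and illustrative inputs. The class name LookingAtDemo and the helper startsWithDigits are hypothetical.

```java
import java.util.regex.Pattern;

class LookingAtDemo {
    // True when the input begins with one or more digits.
    static boolean startsWithDigits(String input) {
        return Pattern.compile("\\d+").matcher(input).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(startsWithDigits("22bb23")); // digits at the front
        System.out.println(startsWithDigits("aa2223")); // digits only later in the string
    }
}
```

Unlike matches(), the rest of the string after the matched prefix is not required to match.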

find() scans the input for the next substring that matches the pattern; the match can be located anywhere in the string.
Java code example:
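A sketch of find(), assuming the pattern \d+ and illustrative inputs. The class name FindDemo and the helpers containsDigits and countMatches are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindDemo {
    // True when a run of digits occurs anywhere in the input.
    static boolean containsDigits(String input) {
        return Pattern.compile("\\d+").matcher(input).find();
    }

    // Counts matches by calling find() until it returns false;
    // each call resumes after the previous match.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(containsDigits("22bb23"));
        System.out.println(containsDigits("aa2223"));
        System.out.println(containsDigits("aabb"));
        System.out.println(countMatches("\\d+", "aa2223bb11"));
    }
}
```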

5. Matcher.start() / Matcher.end() / Matcher.group()

After a match operation with matches(), lookingAt(), or find() succeeds, the three methods above provide more detailed information.
start() returns the index in the string of the first character of the matched substring.
end() returns the index just after the last character of the matched substring.
group() returns the matched substring.
Java code example:

    Pattern p = Pattern.compile("\\d+");
    Matcher m = p.matcher("aaa2223bb");
    m.find();        // matches "2223"
    m.start();       // returns 3
    m.end();         // returns 7, the index just after "2223"
    m.group();       // returns "2223"

    Matcher m2 = p.matcher("2223bb");
    m2.lookingAt();  // matches "2223"
    m2.start();      // returns 0; lookingAt() only matches at the beginning, so start() is always 0
    m2.end();        // returns 4
    m2.group();      // returns "2223"

    // matches() must consume the whole input, so an all-digit string is used here;
    // with "2223bb" it would return false and start() would throw IllegalStateException
    Matcher m3 = p.matcher("222333");
    m3.matches();    // matches the entire string
    m3.start();      // returns 0
    m3.end();        // returns 6, because matches() has to match the whole string
    m3.group();      // returns "222333"

With the methods above covered, let's look at how regular-expression groups are used in Java.
start(), end(), and group() each have an overload, start(int i), end(int i), and group(int i), dedicated to group operations, and the Matcher class has groupCount() to return the number of groups.
Java code example:

    Pattern p = Pattern.compile("([a-z]+)(\\d+)");
    Matcher m = p.matcher("aaa2223bb");
    m.find();        // matches "aaa2223"
    m.groupCount();  // returns 2, because there are 2 groups
    m.start(1);      // returns 0, the index where group 1's match begins
    m.start(2);      // returns 3
    m.end(1);        // returns 3, the index just after the last character matched by group 1
    m.end(2);        // returns 7
    m.group(1);      // returns "aaa", the substring matched by group 1
    m.group(2);      // returns "2223", the substring matched by group 2

Now let's try a slightly more advanced matching task. Suppose a text contains many numbers scattered through it, and we want to extract all of them. Java's regular expression support makes this easy.
Java code example:

    Pattern p = Pattern.compile("\\d+");
    Matcher m = p.matcher("My QQ is: 456456 My phone is: 0532214 my mailbox is: [email protected]");
    while (m.find()) {
        System.out.println(m.group());
    }

Output:

    456456
    0532214

If you replace the while() loop above with

    while (m.find()) {
        System.out.println(m.group());
        System.out.print("start:" + m.start());
        System.out.println(" end:" + m.end());
    }

The output:

    456456
    start:10 end:16
    0532214
    start:30 end:37

By now it should be clear that each successful match operation changes the values returned by start(), end(), and group() so that they describe the substring just matched; their overloads change accordingly.
Note: start(), end(), and group() can be used only after a successful match operation, that is, after matches(), lookingAt(), or find() has returned true. Otherwise they throw java.lang.IllegalStateException.
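A sketch of how this rule plays out in practice, assuming the pattern \d+ and illustrative inputs. The class name MatchStateDemo and both helper methods are hypothetical; the guard in firstNumber is one possible way to avoid the exception, not the only one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchStateDemo {
    // Calls group() only after find() has returned true; returns null otherwise.
    static String firstNumber(String input) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        return m.find() ? m.group() : null;
    }

    // Shows that group() throws when no match operation has succeeded yet.
    static boolean groupThrowsBeforeMatch() {
        Matcher m = Pattern.compile("\\d+").matcher("aaa2223bb");
        try {
            m.group();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("aaa2223bb"));
        System.out.println(firstNumber("no digits here"));
        System.out.println(groupThrowsBeforeMatch());
    }
}
```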

Reposted from http://www.kaiyuanba.cn/html/1/131/138/7609.htm

When calling Pattern.compile, you can pass flags that control how the regular expression matches:
Pattern Pattern.compile(String regex, int flags)
The flags parameter accepts the following values, which can be combined with the bitwise OR operator (|):
Pattern.CANON_EQ: two characters are considered to match if and only if their full canonical decompositions match. With this flag, the expression "a\u030A" matches the string "\u00E5". By default, canonical equivalence is not taken into account.
Pattern.CASE_INSENSITIVE (?i): enables case-insensitive matching. By default, case-insensitive matching covers only the US-ASCII character set. To match Unicode characters without regard to case, combine UNICODE_CASE with this flag.
Pattern.COMMENTS (?x): in this mode, whitespace in the regular expression is ignored. (Translator's note: this does not mean the expression "\\s", but literal spaces, tabs, line breaks, and so on inside the expression.) Comments start with # and run to the end of the line. Comments mode can also be enabled with the embedded flag (?x).
Pattern.DOTALL (?s): in this mode, the expression '.' matches any character, including a line terminator. By default, '.' does not match line terminators.
Pattern.MULTILINE (?m): in this mode, '^' and '$' match at the start and end of each line, respectively. '^' still matches at the beginning of the string, and '$' still matches at the end of the string. By default, these two expressions match only at the beginning and end of the whole string.
Pattern.UNICODE_CASE (?u): in this mode, if the CASE_INSENSITIVE flag is also enabled, case-insensitive matching applies to Unicode characters. By default, case-insensitive matching applies only to the US-ASCII character set.
Pattern.UNIX_LINES (?d): in this mode, only '\n' is recognized as a line terminator in the behavior of '.', '^', and '$'.
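A few of these flags can be tried out with the sketch below. The regexes and inputs are illustrative assumptions, and the class name FlagsDemo and its two helpers are hypothetical; only Pattern.compile(String, int) and the flag constants come from the API described above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagsDemo {
    // Whole-input match with the given flags (0 means no flags).
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    // Counts how many times a word is found at a position where ^ matches.
    static int countLineStarts(String input, int flags) {
        Matcher m = Pattern.compile("^\\w+", flags).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // CASE_INSENSITIVE, as a flag and as the embedded form (?i)
        System.out.println(matches("java", 0, "JAVA"));
        System.out.println(matches("java", Pattern.CASE_INSENSITIVE, "JAVA"));
        System.out.println(matches("(?i)java", 0, "JAVA"));
        // DOTALL lets '.' match the line terminator between a and b
        System.out.println(matches("a.b", 0, "a\nb"));
        System.out.println(matches("a.b", Pattern.DOTALL, "a\nb"));
        // COMMENTS ignores the spaces and the trailing # comment
        System.out.println(matches("\\d+  # digits", Pattern.COMMENTS, "123"));
        // MULTILINE makes ^ match at the start of every line
        System.out.println(countLineStarts("one\ntwo\nthree", 0));
        System.out.println(countLineStarts("one\ntwo\nthree", Pattern.MULTILINE));
        // Flags are combined with |
        System.out.println(matches("a.B", Pattern.CASE_INSENSITIVE | Pattern.DOTALL, "A\nb"));
    }
}
```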
