Java regular expressions: Pattern and Matcher in detail


1. Introduction:

java.util.regex is a library package for matching strings against patterns written as regular expressions.
Its two main classes are Pattern and Matcher.

Pattern: a Pattern is the compiled representation of a regular expression.

Matcher: a Matcher object is a state machine that matches an input string against a Pattern object.

First, a Pattern instance compiles a regular expression whose syntax is similar to Perl's; a Matcher instance then matches the input string under the control of that Pattern instance.

Let's look at these two classes in turn:

2. Pattern class:
The methods of Pattern are as follows:

static Pattern compile(String regex)
  Compiles the given regular expression into a Pattern.
static Pattern compile(String regex, int flags)
  Same as above, but with an additional flags argument. The available flags include CASE_INSENSITIVE, MULTILINE, DOTALL, UNICODE_CASE, and CANON_EQ.
int flags()
  Returns the match flags of this pattern.
Matcher matcher(CharSequence input)
  Creates a Matcher that matches the given input against this pattern.
static boolean matches(String regex, CharSequence input)
  Compiles the given regular expression and matches the whole input string against it. This suits a regular expression that is used only once, that is, for a single match, because no Matcher instance has to be created explicitly.
String pattern()
  Returns the regular expression from which this Pattern object was compiled.
String[] split(CharSequence input)
  Splits the input string around matches of the regular expression held by this Pattern.
String[] split(CharSequence input, int limit)
  Same as above, with a limit argument that caps the number of segments. For example, with limit set to 2 the input string is split into at most two segments.

A regular expression, which is a string of characters with special meaning, must first be compiled into an instance of the Pattern class. The Pattern's matcher() method then creates a Matcher instance, which matches the input string against the compiled regular expression. Multiple Matcher instances can share one Pattern object.
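As a rough sketch of this workflow (the class name PatternBasics and the sample strings are made up for illustration), the following compiles a pattern once, inspects it with pattern() and flags(), and then shares it between two matchers:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternBasics {
    // Compile once; a Pattern is immutable and can be shared by many matchers.
    static final Pattern WORD = Pattern.compile("kevin", Pattern.CASE_INSENSITIVE);

    // Whole-string match using the shared, precompiled pattern.
    static boolean isKevin(CharSequence input) {
        return WORD.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(WORD.pattern());                                   // kevin
        System.out.println((WORD.flags() & Pattern.CASE_INSENSITIVE) != 0);  // true

        // Two independent matchers created from the same Pattern.
        Matcher m1 = WORD.matcher("KEVIN");
        Matcher m2 = WORD.matcher("Kelvin");
        System.out.println(m1.matches()); // true
        System.out.println(m2.matches()); // false

        // One-off convenience: compiles the regex and matches in a single call.
        System.out.println(Pattern.matches("a*b", "aaaab")); // true
    }
}
```

When the same expression is applied many times, keeping the compiled Pattern in a constant like this avoids recompiling it on every call; Pattern.matches() is only a shortcut for the use-once case.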

Now let's look at a simple example and walk through how to create a Pattern object by compiling a regular expression, and then split an input string with it:
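The listing for this example did not survive in this copy of the article. A minimal sketch that is consistent with the output shown next, assuming the pattern is simply "/" and using a hypothetical class name SplitDemo, might look like this:

```java
import java.util.regex.Pattern;

class SplitDemo {
    // Splits the text at every "/" using a compiled Pattern.
    static String[] splitOnSlash(String text) {
        Pattern p = Pattern.compile("/");
        return p.split(text);
    }

    public static void main(String[] args) {
        String text = "Kevin has seen \"LEON\" several times, because it is a good film."
                + "/Kevin has seen \"this killer is not too cold\" several times, because it is a good movie."
                + "/noun: Kevin.";
        // Print each segment on its own line.
        for (String segment : splitOnSlash(text)) {
            System.out.println(segment);
        }
    }
}
```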

The output is:
Kevin has seen "LEON" several times, because it is a good film.
Kevin has seen "this killer is not too cold" several times, because it is a good movie.
noun: Kevin.

Clearly, the program splits the string at each "/".

Next we use the split(CharSequence input, int limit) method to cap the number of segments. The call changes to:
String[] result = p.split("Kevin has seen \"LEON\" several times, because it is a good film./Kevin has seen \"this killer is not too cold\" several times, because it is a good movie./noun: Kevin.", 2);
The argument 2 means that the input string is split into at most two segments.
The output is:
Kevin has seen "LEON" several times, because it is a good film.
Kevin has seen "this killer is not too cold" several times, because it is a good movie./noun: Kevin.

3. Matcher class:
The methods of Matcher are as follows:

Matcher appendReplacement(StringBuffer sb, String replacement)
  Replaces the current match with the specified string, and appends to a StringBuffer the text between the previous match and the current one, followed by the replacement.
StringBuffer appendTail(StringBuffer sb)
  Appends to a StringBuffer the text that remains after the last match.
int end()
  Returns the offset just past the last character of the current match in the input string, that is, the index of the last matched character plus one.
int end(int group)
  Returns the offset just past the last character of the substring matched by the given group.
boolean find()
  Attempts to find the next substring of the input that matches the pattern.
boolean find(int start)
  Resets the Matcher and attempts to find the next match starting at the given index in the input string.
String group()
  Returns the substring matched by the previous match operation.
String group(int group)
  Returns the substring matched by the given group during the previous match operation.
int groupCount()
  Returns the number of capturing groups in this matcher's pattern.
boolean lookingAt()
  Tests whether the input string starts with a substring that matches the pattern.
boolean matches()
  Attempts to match the entire input string against the pattern; returns true only if the whole string matches.
Pattern pattern()
  Returns the Pattern object that this Matcher uses.
String replaceAll(String replacement)
  Replaces every substring of the input that matches the pattern with the specified string.
String replaceFirst(String replacement)
  Replaces the first substring of the input that matches the pattern with the specified string.
Matcher reset()
  Resets this Matcher.
Matcher reset(CharSequence input)
  Resets this Matcher and gives it a new input string.
int start()
  Returns the index in the input string of the first character of the current match.
int start(int group)
  Returns the index in the input string of the first character of the substring matched by the given group.



If the method descriptions alone are hard to follow, don't worry: they are easier to understand with examples.


A Matcher instance matches an input string against an existing pattern (that is, the regular expression compiled into a given Pattern). All input to a Matcher is supplied through the CharSequence interface, so that matching can be applied to data from a variety of sources.
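To illustrate the point about CharSequence, here is a small sketch (the class CharSequenceDemo and its helper countMatches are invented for this example) in which the same pattern is applied to a String, a StringBuilder, and a java.nio.CharBuffer:

```java
import java.nio.CharBuffer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharSequenceDemo {
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // Counts runs of digits in any CharSequence: String, StringBuilder, CharBuffer, ...
    static int countMatches(CharSequence input) {
        Matcher m = DIGITS.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches("a1b22"));                                   // 2
        System.out.println(countMatches(new StringBuilder("x 10 y 20 z 30")));       // 3
        System.out.println(countMatches(CharBuffer.wrap(new char[] {'7', '-', '8'}))); // 2
    }
}
```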
Let's look at how each group of methods is used:
★matches()/lookingAt()/find():
A Matcher object is created by calling the matcher() method of a Pattern object. Once created, it can perform three different kinds of match operation:
The matches() method attempts to match the entire input string against the pattern, and returns true only if the whole string matches.
The lookingAt() method tests whether the input string starts with a substring that matches the pattern.
The find() method attempts to find the next substring of the input that matches the pattern.
All three methods return a boolean that indicates whether the match succeeded.
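The difference between the three operations is easiest to see side by side. The sketch below (class and helper names are illustrative only) runs each of them with the pattern "cat" and then uses repeated find() calls together with start() and end():

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchKinds {
    static final Pattern CAT = Pattern.compile("cat");

    static boolean whole(String s)    { return CAT.matcher(s).matches(); }   // entire input
    static boolean prefix(String s)   { return CAT.matcher(s).lookingAt(); } // start of input
    static boolean anywhere(String s) { return CAT.matcher(s).find(); }      // any position

    public static void main(String[] args) {
        for (String s : new String[] {"cat", "catalog", "fatcat"}) {
            System.out.println(s + ": matches=" + whole(s)
                    + " lookingAt=" + prefix(s) + " find=" + anywhere(s));
        }
        // cat:     matches=true  lookingAt=true  find=true
        // catalog: matches=false lookingAt=true  find=true
        // fatcat:  matches=false lookingAt=false find=true

        // find() can be called repeatedly; start() is inclusive, end() is exclusive.
        Matcher m = CAT.matcher("fatcatfatcatfat");
        while (m.find()) {
            System.out.println("match at [" + m.start() + ", " + m.end() + ")");
        }
        // match at [3, 6)
        // match at [9, 12)
    }
}
```

Note that end() returns the index just past the last matched character, so input.substring(m.start(), m.end()) yields the same text as m.group().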
★replaceAll()/appendReplacement()/appendTail():
The Matcher class also provides four methods for replacing matched substrings with a specified string:
replaceAll()
replaceFirst()
appendReplacement()
appendTail()
replaceAll() and replaceFirst() are simple to use; see the method descriptions above. Here we focus on appendReplacement() and appendTail().
appendReplacement(StringBuffer sb, String replacement) replaces the current match with the specified string and appends to a StringBuffer the text between the previous match and the current one, followed by the replacement. appendTail(StringBuffer sb) appends to a StringBuffer the text that remains after the last match.
For example, take the string fatcatfatcatfat and the regular expression "cat". After the first match, calling appendReplacement(sb, "dog") leaves sb containing fatdog: the "cat" in "fatcat" is replaced with "dog", and the text before the match is added to sb together with the replacement. After the second match, calling appendReplacement(sb, "dog") again makes sb contain fatdogfatdog. If appendTail(sb) is then called, the final content of sb is fatdogfatdogfat.
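The fatcatfatcatfat walk-through can be reproduced with a few lines of code. This is only a sketch; the class AppendDemo and its trace helper are names chosen for this example, and the helper records the buffer after every step:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendDemo {
    // Returns the buffer content after each appendReplacement() call,
    // followed by the final content after appendTail().
    static List<String> trace(String input, String regex, String replacement) {
        List<String> steps = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, replacement);
            steps.add(sb.toString());
        }
        m.appendTail(sb);
        steps.add(sb.toString());
        return steps;
    }

    public static void main(String[] args) {
        for (String step : trace("fatcatfatcatfat", "cat", "dog")) {
            System.out.println(step);
        }
        // fatdog
        // fatdogfatdog
        // fatdogfatdogfat
    }
}
```

For a plain replacement like this one, replaceAll("dog") on the same matcher gives the same final string; the append methods are useful when something has to happen between matches.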

Still a little unclear? Then let's look at a simple program.
This example changes every "Kelvin" in the sentence to "Kevin".

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherTest {
    public static void main(String[] args) throws Exception {
        // Create a Pattern object by compiling the simple regular expression "Kelvin"
        Pattern p = Pattern.compile("Kelvin");
        // Use the Pattern's matcher() method to create a Matcher object
        Matcher m = p.matcher("Kelvin Li and Kelvin Chan are both working in Kelvin Chen's KelvinSoftShop company");
        StringBuffer sb = new StringBuffer();
        int i = 0;
        // Use find() to look for the first match
        boolean result = m.find();
        // Loop to find and replace every "Kelvin" in the sentence, appending the text to sb
        while (result) {
            i++;
            m.appendReplacement(sb, "Kevin");
            System.out.println("After match " + i + ", the content of sb is: " + sb);
            // Look for the next match
            result = m.find();
        }
        // Finally, call appendTail() to append the text after the last match to sb
        m.appendTail(sb);
        System.out.println("After calling m.appendTail(sb), the final content of sb is: " + sb.toString());
    }
}


The resulting output is:
After match 1, the content of sb is: Kevin
After match 2, the content of sb is: Kevin Li and Kevin
After match 3, the content of sb is: Kevin Li and Kevin Chan are both working in Kevin
After match 4, the content of sb is: Kevin Li and Kevin Chan are both working in Kevin Chen's Kevin
After calling m.appendTail(sb), the final content of sb is: Kevin Li and Kevin Chan are both working in Kevin Chen's KevinSoftShop company
Does this example make the use of appendReplacement() and appendTail() clearer? If you are still not sure, the best approach is to write a few lines of code and test them yourself.


★group()/group(int group)/groupCount():
This family of methods is similar to the MatchResult.group() method in Jakarta-ORO described in the previous article (see that article for Jakarta-ORO): they return the substring matched by a group. The following code illustrates their use:
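The listing that belongs here is missing from this copy of the article. The sketch below is a reconstruction under assumptions: the pattern "(ca)(t)" and the sample sentence are chosen so that the program yields two groups, "ca" and "t", and the class and helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns the text of groups 1..groupCount() for the first match,
    // or an empty array if the pattern is not found.
    static String[] firstMatchGroups(Pattern p, CharSequence input) {
        Matcher m = p.matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] groups = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            groups[i - 1] = m.group(i); // group(0) would be the whole match
        }
        return groups;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("(ca)(t)");
        String[] groups = firstMatchGroups(p, "one cat, two cats in the yard");
        System.out.println("The number of groups obtained by this search is: " + groups.length);
        for (int i = 0; i < groups.length; i++) {
            System.out.println("The substring for group " + (i + 1) + " is: " + groups[i]);
        }
    }
}
```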



The output is:
The number of groups obtained by this search is: 2
The substring for group 1 is: ca
The substring for group 2 is: t
The remaining methods of Matcher are easy to understand; for reasons of space, readers are encouraged to try them out in code themselves.


4. A small program that checks an email address:
Finally, let's look at a program that checks an email address: it verifies whether the characters in the address entered are legal. It is not a complete email validator and does not cover every possible case, but you can add further checks on top of it as needed.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Email {
    public static void main(String[] args) throws Exception {
        String input = args[0];
        // Check whether the address starts with an illegal "." or "@"
        Pattern p = Pattern.compile("^\\.|^@");
        Matcher m = p.matcher(input);
        if (m.find()) {
            System.out.println("Email address cannot start with '.' or '@'");
        }
        // Check whether the address starts with "www."
        p = Pattern.compile("^www\\.");
        m = p.matcher(input);
        if (m.find()) {
            System.out.println("Email address cannot start with 'www.'");
        }
        // Check for illegal characters. The allowed set here (letters, digits,
        // ".", "@", "_", "-", "~", "#") is an assumption.
        p = Pattern.compile("[^A-Za-z0-9.@_\\-~#]+");
        m = p.matcher(input);
        StringBuffer sb = new StringBuffer();
        boolean result = m.find();
        boolean deletedIllegalChars = false;
        while (result) {
            // An illegal character was found, so set the flag
            deletedIllegalChars = true;
            // Remove illegal characters such as colons or double quotes and append the rest to sb
            m.appendReplacement(sb, "");
            result = m.find();
        }
        m.appendTail(sb);
        input = sb.toString();
        if (deletedIllegalChars) {
            System.out.println("The email address you entered contains illegal characters such as colons or commas; please correct it.");
            System.out.println("Your current input is: " + args[0]);
            System.out.println("The corrected, legal address should look like: " + input);
        }
    }
}



For example, if we type on the command line: java Email [email protected]
the output is: Email address cannot start with 'www.'
If the address entered is @[email protected]
the output is: Email address cannot start with '.' or '@'
When the input is: cgjmail#$%@163.net
the output is:
The email address you entered contains illegal characters such as colons or commas; please correct it.
Your current input is: cgjmail#$%@163.net
The corrected, legal address should look like: [email protected]

5. Regular expression rules:

Characters
x        The character x
\\       The backslash character
\0n      The character with octal value 0n (0 <= n <= 7)
\0nn     The character with octal value 0nn (0 <= n <= 7)
\0mnn    The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh     The character with hexadecimal value 0xhh
\uhhhh   The character with hexadecimal value 0xhhhh
\t       The tab character ('\u0009')
\n       The newline (line feed) character ('\u000A')
\r       The carriage-return character ('\u000D')
\f       The form-feed character ('\u000C')
\a       The alert (bell) character ('\u0007')
\e       The escape character ('\u001B')
\cx      The control character corresponding to x
  
Character classes
[abc]          a, b, or c (simple class)
[^abc]         Any character except a, b, or c (negation)
[a-zA-Z]       a through z or A through Z, inclusive (range)
[a-z&&[^bc]]   a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]  a through z, except for m through p: [a-lq-z] (subtraction)
[a-z&&[def]]   d, e, or f (intersection)
Note:
Square brackets match exactly one character: the regular expression "t[aeio]n" matches only "tan", "ten", "tin", and "ton".
Parentheses: because square brackets match only a single character, use parentheses "()" together with "|" when an alternative is longer than one character. For example, the regular expression "t(a|e|i|o|oo)n" also matches "toon", and it requires the parentheses.
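A quick way to check these rules is to try them. In the sketch below (CharClassDemo and its helper full are names invented for this example), note that Java writes intersection and subtraction inside a character class with the && operator:

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // True if the whole string matches the regular expression.
    static boolean full(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        System.out.println(full("t[aeio]n", "tin"));        // true
        System.out.println(full("t[aeio]n", "toon"));       // false: brackets match one character
        System.out.println(full("t(a|e|i|o|oo)n", "toon")); // true: alternation in parentheses

        System.out.println(full("[a-z&&[^bc]]", "b"));      // false: b is subtracted
        System.out.println(full("[a-z&&[^bc]]", "d"));      // true
        System.out.println(full("[a-z&&[def]]", "e"));      // true: intersection
        System.out.println(full("[a-z&&[def]]", "g"));      // false
    }
}
```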

Predefined character classes
.    Any character (may or may not match line terminators). Note: the period stands for any single character. For example, the expression "t.n" matches "tan", "ten", "tin", and "ton", and also "t#n", "tpn", and even "t n".
\d   A digit: [0-9]
\D   A non-digit: [^0-9]
\s   A whitespace character: [ \t\n\x0B\f\r]
\S   A non-whitespace character: [^\s]
\w   A word character: [a-zA-Z_0-9]
\W   A non-word character: [^\w]
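The predefined classes can be tried out as follows. This is a small sketch with made-up names and sample strings; remember that in Java source each backslash in a regular expression has to be doubled:

```java
import java.util.regex.Pattern;

class PredefinedDemo {
    // Replaces every digit with '#'.
    static String maskDigits(String s) {
        return Pattern.compile("\\d").matcher(s).replaceAll("#");
    }

    // Removes every non-digit character.
    static String keepDigits(String s) {
        return Pattern.compile("\\D").matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(maskDigits("a1 b22 c333")); // a# b## c###
        System.out.println(keepDigits("a1 b22"));      // 122

        // \s+ treats any run of whitespace (spaces, tabs, ...) as one separator.
        System.out.println(Pattern.compile("\\s+").split("hello   world\tagain").length); // 3

        System.out.println(Pattern.matches("\\w+", "user_01")); // true
        System.out.println(Pattern.matches("\\w+", "user-01")); // false: '-' is not a word character

        // By default '.' does not match a line terminator; DOTALL changes that.
        System.out.println(Pattern.matches("t.n", "t n"));                                // true
        System.out.println(Pattern.matches("t.n", "t\nn"));                               // false
        System.out.println(Pattern.compile("t.n", Pattern.DOTALL).matcher("t\nn").matches()); // true
    }
}
```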

Quantifiers
Symbol   Number of occurrences
*        0 or more times
+        1 or more times
?        0 or 1 times
{n}      Exactly n times
{n,m}    At least n but not more than m times
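The quantifiers can be checked in the same way. The class name QuantifierDemo and the sample expressions below are illustrative only:

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // True if the whole string matches the regular expression.
    static boolean full(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        System.out.println(full("ab*c", "ac"));         // true:  b zero times
        System.out.println(full("ab*c", "abbbc"));      // true:  b three times
        System.out.println(full("ab+c", "ac"));         // false: + needs at least one b
        System.out.println(full("ab?c", "abc"));        // true:  b once
        System.out.println(full("ab?c", "abbc"));       // false: ? allows at most one b
        System.out.println(full("\\d{3}", "123"));      // true:  exactly three digits
        System.out.println(full("\\d{3}", "12"));       // false
        System.out.println(full("\\d{2,4}", "1234"));   // true:  between two and four digits
        System.out.println(full("\\d{2,4}", "12345"));  // false
    }
}
```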

This article draws on:

http://www.jb51.net/article/17943.htm

http://www.cnblogs.com/playing/archive/2011/03/15/1984943.html
