Java Regular Expressions: Pattern and Matcher

Source: Internet
Author: User

1. Introduction:
java.util.regex is a class library package that matches strings against patterns defined by regular expressions.
It contains two main classes: Pattern and Matcher. A Pattern is the compiled form of a regular expression.
A Matcher object is a state machine that checks a string, using a Pattern object as the matching template.
First, a Pattern instance is created by compiling a regular expression whose syntax is similar to Perl's. Then a Matcher instance matches strings under the control of that Pattern instance.
Let's take a look at these two classes:

2. The Pattern class:
The Pattern methods are as follows:

static Pattern compile(String regex)
Compiles the given regular expression into a Pattern.

static Pattern compile(String regex, int flags)
Same as above, but with flags specified. The available flags include CASE_INSENSITIVE, MULTILINE, DOTALL, UNICODE_CASE and CANON_EQ.

int flags()
Returns the match flags of this pattern.

Matcher matcher(CharSequence input)
Creates a Matcher object that will match the given input against this pattern.

static boolean matches(String regex, CharSequence input)
Compiles the given regular expression and attempts to match the entire input against it. This method suits a regular expression that is used only once, that is, for a single matching operation, because you do not need to create a Matcher instance yourself.

String pattern()
Returns the regular expression from which this Pattern object was compiled.

String[] split(CharSequence input)
Splits the target string around matches of this pattern's regular expression.

String[] split(CharSequence input, int limit)
The limit parameter controls the number of segments. For example, if limit is 2, the target string is split into at most two segments, because the pattern is applied only once.
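As a quick illustration of the methods listed above, the sketch below compiles a pattern with flags and inspects it. The class name PatternMethods and the helper matchesIgnoreCase are hypothetical, made up for this example; the Pattern calls themselves are the standard java.util.regex API.

```java
import java.util.regex.Pattern;

public class PatternMethods {
    // Hypothetical helper: true if the ENTIRE input matches regex, ignoring case
    static boolean matchesIgnoreCase(String regex, String input) {
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("kevin", Pattern.CASE_INSENSITIVE);
        // flags() returns the flags given to compile(); test the bit we set
        System.out.println((p.flags() & Pattern.CASE_INSENSITIVE) != 0); // true
        // pattern() returns the source regular expression
        System.out.println(p.pattern());                                 // kevin
        System.out.println(matchesIgnoreCase("kevin", "KEVIN"));         // true

        // Several flags can be combined with bitwise OR
        Pattern multi = Pattern.compile("^term", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
        System.out.println(multi.matcher("Kevin\nTERM: x").find());      // true

        // Static Pattern.matches(): a one-shot check; the WHOLE input must match
        System.out.println(Pattern.matches("[a-z]+", "kevin"));          // true
        System.out.println(Pattern.matches("[a-z]+", "kevin163"));       // false
    }
}
```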
A regular expression, given as a string, must first be compiled into an instance of the Pattern class. The Pattern object's matcher() method then creates a Matcher instance, which matches the target string against the compiled regular expression. Multiple Matcher instances can share one Pattern object.
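The compile-once, match-many workflow can be sketched as follows. According to the Javadoc, Pattern instances are immutable and safe to share between threads, while Matcher instances are not, so the usual approach is to keep one Pattern and create a fresh Matcher per input. The class SharedPattern and its countDigitRuns helper are hypothetical names for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SharedPattern {
    // Compiled once and reused: one or more consecutive digits
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // Creates a new Matcher from the shared Pattern and counts the digit runs in text
    static int countDigitRuns(String text) {
        Matcher m = DIGITS.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Each call uses its own Matcher, but both share the same Pattern
        System.out.println(countDigitRuns("kevin@163.net")); // 1
        System.out.println(countDigitRuns("a1b22c333"));     // 3
    }
}
```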
Now let's look at a simple example that creates a Pattern object by compiling a regular expression and then splits a target string with it:

import java.util.regex.Pattern;

public class Replacement {
    public static void main(String[] args) {
        // Create a Pattern by compiling the regular expression "[/]+" (one or more slashes)
        Pattern p = Pattern.compile("[/]+");
        // Use Pattern's split() method to split the string at each "/"
        String[] result = p.split("Kevin has seen \"Leon\" several times, because it is a good film."
                + "/Kevin has watched \"this killer is not too cold\" several times, because it is a "
                + "good movie./Term: Kevin.");
        for (int i = 0; i < result.length; i++) {
            System.out.println(result[i]);
        }
    }
}

Output:
Kevin has seen "Leon" several times, because it is a good film.
Kevin has watched "this killer is not too cold" several times, because it is a good movie.
Term: Kevin.
Clearly, this program splits the string at each "/". Next, we use the split(CharSequence input, int limit) method to specify the number of segments. The call changes to:
String[] result = p.split("Kevin has seen \"Leon\" several times, because it is a good film."
        + "/Kevin has watched \"this killer is not too cold\" several times, because it is a "
        + "good movie./Term: Kevin.", 2);
The argument 2 means the target string is split into at most two segments.
The output is:
Kevin has seen "Leon" several times, because it is a good film.
Kevin has watched "this killer is not too cold" several times, because it is a good movie./Term: Kevin.
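The limit argument has three regimes, which a short string makes easy to see. The sketch below is illustrative only; the class SplitLimit and its split helper are hypothetical names, and the behavior shown is that documented for Pattern.split(CharSequence, int).

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitLimit {
    private static final Pattern SLASHES = Pattern.compile("[/]+");

    // Splits input on runs of "/" using the given limit
    static String[] split(String input, int limit) {
        return SLASHES.split(input, limit);
    }

    public static void main(String[] args) {
        String s = "a/b//c/";
        // limit 0 behaves like split(input): trailing empty strings are discarded
        System.out.println(Arrays.toString(split(s, 0)));  // [a, b, c]
        // positive limit n: at most n pieces; the last piece keeps the rest of the input
        System.out.println(Arrays.toString(split(s, 2)));  // [a, b//c/]
        // negative limit: split as often as possible and keep trailing empty strings
        System.out.println(Arrays.toString(split(s, -1))); // [a, b, c, ]
    }
}
```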


Comparing the example above with the Jakarta-ORO package shows that java.util.regex builds the Pattern object and compiles the regular expression differently. Jakarta-ORO first constructs a PatternCompiler object and then calls its compile() method to compile the regular expression into a Pattern object:
PatternCompiler oroCom = new Perl5Compiler();
Pattern pattern = oroCom.compile("regular expression");
PatternMatcher matcher = new Perl5Matcher();
In the java.util.regex package, by contrast, we only need the Pattern class and its static compile() method to achieve the same effect:
Pattern p = Pattern.compile("[/]+");
So constructing a pattern with java.util.regex appears simpler and easier to understand than with Jakarta-ORO.

3. The Matcher class:
The Matcher methods are as follows:

Matcher appendReplacement(StringBuffer sb, String replacement)
Replaces the current matched substring with the specified replacement, appending to a StringBuffer the text between the previous match and the current one, followed by the replacement.
StringBuffer appendTail(StringBuffer sb)
Appends the rest of the target string after the last match to a StringBuffer.
int end()
Returns the index in the target string just after the last character of the matched substring.
int end(int group)
Returns the index just after the last character of the substring captured by the specified group.
boolean find()
Finds the next substring of the target string that matches the pattern.
boolean find(int start)
Resets the matcher and tries to find the next matching substring, starting at the specified index of the target string.
String group()
Returns the substring matched by the most recent match.
String group(int group)
Returns the substring captured by the specified group during the most recent match.
int groupCount()
Returns the number of capturing groups in this matcher's pattern.
boolean lookingAt()
Checks whether the beginning of the target string matches the pattern.
boolean matches()
Attempts to match the entire target string against the pattern; returns true only when the whole string matches.
Pattern pattern()
Returns the Pattern object this matcher uses.
String replaceAll(String replacement)
Replaces every substring of the target string that matches the pattern with the specified string.
String replaceFirst(String replacement)
Replaces the first substring of the target string that matches the pattern with the specified string.
Matcher reset()
Resets the matcher.
Matcher reset(CharSequence input)
Resets the matcher and gives it a new target string.
int start()
Returns the index in the target string of the first character of the matched substring.
int start(int group)
Returns the index in the target string of the first character of the substring captured by the specified group.
(Are the method descriptions hard to follow? Don't worry: the examples below make them easier to understand.)

A Matcher instance searches the target string according to an existing pattern (that is, the regular expression compiled by a given Pattern). All input to a Matcher is supplied through the CharSequence interface, so data from a wide range of sources can be matched.
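Because the input type is CharSequence, a Matcher can work directly on a StringBuilder or a java.nio.CharBuffer without first converting it to a String. A small sketch, where the class CharSequenceInput and the helper countMatches are hypothetical:

```java
import java.nio.CharBuffer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharSequenceInput {
    // Counts matches of regex in any CharSequence (String, StringBuilder, CharBuffer, ...)
    static int countMatches(String regex, CharSequence input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("cat");
        sb.append(" and cat");
        System.out.println(countMatches("cat", sb));                         // 2
        System.out.println(countMatches("cat", CharBuffer.wrap("tomcat")));  // 1
    }
}
```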
Let's take a look at the usage of each method:
matches() / lookingAt() / find():
A Matcher object is created by calling a Pattern object's matcher() method. Once created, it can perform three different kinds of match operations:
The matches() method attempts to match the entire target string; it returns true only when the whole string matches.
The lookingAt() method checks whether the beginning of the target string matches the pattern.
The find() method tries to find the next matching substring anywhere in the target string.
All three methods return a boolean value indicating whether the match succeeded.
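The difference between the three operations shows up clearly on a few short inputs. In this sketch (the class MatchModes and the helper check are hypothetical), each operation runs on a fresh Matcher, because match operations change a matcher's internal state and could otherwise affect one another.

```java
import java.util.regex.Pattern;

public class MatchModes {
    // Runs the three match operations on fresh matchers: {matches, lookingAt, find}
    static boolean[] check(Pattern p, String input) {
        return new boolean[] {
            p.matcher(input).matches(),   // the whole input must match
            p.matcher(input).lookingAt(), // only a prefix must match
            p.matcher(input).find()       // a match may occur anywhere
        };
    }

    public static void main(String[] args) {
        Pattern cat = Pattern.compile("cat");
        for (String s : new String[] {"cat", "catfish", "bobcat"}) {
            boolean[] r = check(cat, s);
            System.out.println(s + ": matches=" + r[0] + ", lookingAt=" + r[1] + ", find=" + r[2]);
        }
        // cat: matches=true, lookingAt=true, find=true
        // catfish: matches=false, lookingAt=true, find=true
        // bobcat: matches=false, lookingAt=false, find=true
    }
}
```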

replaceAll() / replaceFirst() / appendReplacement() / appendTail():
The Matcher class also provides four methods for replacing matched substrings with specified strings:
replaceAll()
replaceFirst()
appendReplacement()
appendTail()
replaceAll() and replaceFirst() are easy to use; see their descriptions in the method list above.
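A short sketch of the two simple replacement methods. It also shows that the replacement string is not purely literal: "$n" refers to capturing group n, and Matcher.quoteReplacement() escapes "$" and "\" when a literal replacement is wanted. The class ReplaceDemo and its two helpers are hypothetical wrappers around the standard calls.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceDemo {
    // Replaces every match of regex in input
    static String replaceAll(String regex, String input, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    // Replaces only the first match of regex in input
    static String replaceFirst(String regex, String input, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceFirst(replacement);
    }

    public static void main(String[] args) {
        String s = "one cat, two cats in the yard";
        System.out.println(replaceAll("cat", s, "dog"));   // one dog, two dogs in the yard
        System.out.println(replaceFirst("cat", s, "dog")); // one dog, two cats in the yard
        // $2 and $1 insert the text captured by groups 2 and 1
        System.out.println(replaceAll("(ca)(t)", "cat", "$2$1")); // tca
        // Without quoteReplacement(), "$5" would be read as a reference to a missing group 5
        System.out.println(replaceAll("cat", "cat", Matcher.quoteReplacement("$5"))); // $5
    }
}
```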

Here we focus on the appendReplacement() and appendTail() methods.
appendReplacement(StringBuffer sb, String replacement) replaces the current matched substring with the specified string: it appends to a StringBuffer the text between the previous match and the current one, followed by the replacement. appendTail(StringBuffer sb) appends the text remaining after the last match to the StringBuffer.
For example, take the string fatcatfatcatfat and the pattern "cat". Calling appendReplacement(sb, "dog") after the first match leaves sb containing fatdog: the text before the match ("fat") is appended, followed by "dog" in place of "cat". After the second match, calling appendReplacement(sb, "dog") again makes sb fatdogfatdog. Finally, calling appendTail(sb) makes the content of sb fatdogfatdogfat.
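The fatcatfatcatfat walkthrough translates directly into a loop. This is a minimal sketch; the class AppendDemo and the helper catToDog are hypothetical names, and the printed lines trace the buffer after each step.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppendDemo {
    // Replaces every "cat" with "dog" using appendReplacement()/appendTail()
    static String catToDog(String input) {
        Matcher m = Pattern.compile("cat").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Appends the text since the previous match, then "dog" in place of "cat"
            m.appendReplacement(sb, "dog");
            System.out.println("after match: " + sb);
        }
        // Appends whatever follows the last match ("fat" in the example)
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("final: " + catToDog("fatcatfatcatfat"));
        // after match: fatdog
        // after match: fatdogfatdog
        // final: fatdogfatdogfat
    }
}
```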
Still a little fuzzy? Let's look at a simple program:

// Change "Kelvin" in the sentence to "Kevin"
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherTest {
    public static void main(String[] args) {
        // Create a Pattern object and compile the simple regular expression "Kelvin"
        Pattern p = Pattern.compile("Kelvin");
        // Use the Pattern's matcher() method to create a Matcher object
        Matcher m = p.matcher("Kelvin Li and Kelvin Chan are both working in Kelvin Chen's KelvinSoftShop company");
        StringBuffer sb = new StringBuffer();
        int i = 0;
        // Use find() to locate the first match
        boolean result = m.find();
        // Loop to find every "Kelvin" in the sentence, replace it, and append the text to sb
        while (result) {
            i++;
            m.appendReplacement(sb, "Kevin");
            System.out.println("After match " + i + ", sb contains: " + sb);
            // Continue with the next match
            result = m.find();
        }
        // Finally, call appendTail() to append the rest of the string after the last match
        m.appendTail(sb);
        System.out.println("After calling m.appendTail(sb), sb contains: " + sb.toString());
    }
}

The final output is:
After match 1, sb contains: Kevin
After match 2, sb contains: Kevin Li and Kevin
After match 3, sb contains: Kevin Li and Kevin Chan are both working in Kevin
After match 4, sb contains: Kevin Li and Kevin Chan are both working in Kevin Chen's Kevin
After calling m.appendTail(sb), sb contains: Kevin Li and Kevin Chan are both working in Kevin Chen's KevinSoftShop company
Does the routine above make the use of appendReplacement() and appendTail() clearer? If not, it is worth writing a few lines of code to test them yourself.
group() / group(int group) / groupCount():
These methods are similar to the MatchResult.group() method in Jakarta-ORO introduced in the previous article: they return the content of matched substrings. The following code illustrates their usage:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupTest {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("(ca)(t)");
        Matcher m = p.matcher("one cat, two cats in the yard");
        boolean result = m.find();
        System.out.println("The number of groups in this pattern is: " + m.groupCount());
        // group(i) is only valid after a successful match, so check the result first
        if (result) {
            for (int i = 1; i <= m.groupCount(); i++) {
                System.out.println("The substring in group " + i + " is: " + m.group(i));
            }
        }
    }
}

Output:
The number of groups in this pattern is: 2
The substring in group 1 is: ca
The substring in group 2 is: t
Space is limited, so for the other Matcher methods, readers are encouraged to write small programs and verify them.
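As one such program, the sketch below exercises start(), end(), their group variants, find(int) and reset(CharSequence). Note that end() returns the index just after the last matched character. The class PositionDemo and the helper span are hypothetical names for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PositionDemo {
    // Returns "start-end" for group g of the first match of regex in input, or null if none
    static String span(String regex, String input, int g) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start(g) + "-" + m.end(g) : null;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("(ca)(t)").matcher("one cat, two cats");
        while (m.find()) {
            System.out.println(m.group() + " at " + m.start() + "-" + m.end()
                    + ", group 2 at " + m.start(2) + "-" + m.end(2));
        }
        // find(int) resets the matcher and searches from the given index
        System.out.println(m.find(8) + " at " + m.start());
        // reset(CharSequence) reuses the same matcher on new input
        m.reset("bobcat");
        System.out.println(m.find() + " at " + m.start());
        // cat at 4-7, group 2 at 6-7
        // cat at 13-16, group 2 at 15-16
        // true at 13
        // true at 3
    }
}
```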

4. A small program to check email addresses:
Finally, let's look at a routine that checks email addresses. It checks whether the characters in an input email address are legal. It is not a complete email validator and cannot cover every possible case, but you can build on it to add the functionality you need.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Email {
    public static void main(String[] args) {
        String input = args[0];
        // Check whether the address starts with an illegal "." or "@"
        Pattern p = Pattern.compile("^\\.|^@");
        Matcher m = p.matcher(input);
        if (m.find()) {
            System.out.println("The email address cannot start with '.' or '@'.");
        }
        // Check whether the address starts with "www."
        p = Pattern.compile("^www\\.");
        m = p.matcher(input);
        if (m.find()) {
            System.out.println("The email address cannot start with 'www.'.");
        }
        // Check for illegal characters (anything outside the allowed set)
        p = Pattern.compile("[^A-Za-z0-9.@_\\-~#]+");
        m = p.matcher(input);
        StringBuffer sb = new StringBuffer();
        boolean result = m.find();
        boolean deletedIllegalChars = false;
        while (result) {
            // An illegal character was found, so set the flag
            deletedIllegalChars = true;
            // Delete illegal characters such as colons and double quotes by replacing them with ""
            m.appendReplacement(sb, "");
            result = m.find();
        }
        m.appendTail(sb);
        input = sb.toString();
        if (deletedIllegalChars) {
            System.out.println("The entered email address contains invalid characters such as colons and commas. Please modify it.");
            System.out.println("Your current input is: " + args[0]);
            System.out.println("The valid address after modification should be similar to: " + input);
        }
    }
}
For example, if we run java Email www.kevin@163.net on the command line,
the output will be: The email address cannot start with 'www.'.
If the entered email is @kevin@163.net,
the output is: The email address cannot start with '.' or '@'.
When the input is cgjmail#$%@163.net,
the output is:
The entered email address contains invalid characters such as colons and commas. Please modify it.
Your current input is: cgjmail#$%@163.net
The valid address after modification should be similar to: cgjmail#@163.net
(The "#" is kept because it belongs to the set of allowed characters in the pattern.)
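When every illegal run is simply deleted, the appendReplacement(sb, "")/appendTail(sb) loop above produces the same string as a single replaceAll("") call; the loop form is only needed if you also want to react to each match, as the program does when it sets its flag. A sketch of the shorter form, using the same character set as the program above (the class EmailCleanup and the method removeIllegalChars are hypothetical names):

```java
import java.util.regex.Pattern;

public class EmailCleanup {
    // Anything outside letters, digits and . @ _ - ~ # counts as illegal
    private static final Pattern ILLEGAL = Pattern.compile("[^A-Za-z0-9.@_\\-~#]+");

    // Deletes every run of illegal characters in one call
    static String removeIllegalChars(String input) {
        return ILLEGAL.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(removeIllegalChars("cgjmail#$%@163.net"));   // cgjmail#@163.net
        System.out.println(removeIllegalChars("kevin:\"x\"@163.net"));  // kevinx@163.net
    }
}
```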
