Java Regular Expressions: Practical Use of Pattern and Matcher

1. Introduction:
java.util.regex is a class library package that matches strings against patterns defined by regular expressions.
It consists of two classes, Pattern and Matcher. A Pattern is the compiled representation of a regular expression.
A Matcher is a state machine that performs match operations on a string according to a Pattern.
First, a Pattern instance is created by compiling a regular expression whose syntax is similar to Perl's; a Matcher instance then matches the target string under the control of that Pattern.
Let's take a look at these two classes separately:
2. Pattern class:
The Pattern methods are as follows:
static Pattern compile(String regex)
Compiles the given regular expression and returns it as a Pattern object.
static Pattern compile(String regex, int flags)
Same as above, but with match flags. The available flags include CASE_INSENSITIVE, MULTILINE, DOTALL, UNICODE_CASE, and CANON_EQ.
int flags()
Returns the match flags of this Pattern.
Matcher matcher(CharSequence input)
Creates a Matcher that will match the given input against this Pattern.
static boolean matches(String regex, CharSequence input)
Compiles the given regular expression and attempts to match the entire input against it. This is convenient when a regular expression is used only once, because you do not need to create the Pattern and Matcher instances yourself.
String pattern()
Returns the regular expression from which this Pattern was compiled.
String[] split(CharSequence input)
Splits the target string around matches of this Pattern.
String[] split(CharSequence input, int limit)
Same as above, with a limit on the number of segments. For example, with limit set to 2 the target string is split into at most two segments.
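As a minimal sketch of the one-off matches() call and of compile() with a flag, consider the following. The class name PatternBasics, the helper isAllDigits, and the sample inputs are illustrative assumptions, not part of the original article.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names and inputs are assumptions for illustration.
class PatternBasics {
    // Compiled once with a flag: CASE_INSENSITIVE lets "kevin" match "KEVIN".
    static final Pattern NAME = Pattern.compile("kevin", Pattern.CASE_INSENSITIVE);

    // One-off check: the static matches() compiles the regex and
    // tests the whole input in a single call.
    static boolean isAllDigits(String input) {
        return Pattern.matches("[0-9]+", input);
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("12345"));             // true
        System.out.println(isAllDigits("123a5"));             // false
        System.out.println(NAME.matcher("KEVIN").matches());  // true
        System.out.println(NAME.pattern());                   // kevin
        // flags() reports the flags given to compile()
        System.out.println((NAME.flags() & Pattern.CASE_INSENSITIVE) != 0);  // true
    }
}
```

The static matches() is handy for a single check; when the same expression is applied repeatedly, compiling it once avoids recompiling it on every call.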
A regular expression, that is, a string of characters with special meaning, must first be compiled into an instance of the Pattern class. The Pattern's matcher() method then creates a Matcher instance, which matches the target string against the compiled regular expression. Several Matcher instances can share one Pattern object.
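The sharing of one Pattern among several Matcher instances might look like the sketch below. The class SharedPattern and its countSeparators helper are hypothetical names chosen for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original article.
class SharedPattern {
    // Compiled once and reused for every input.
    static final Pattern SLASHES = Pattern.compile("[/]+");

    // Each call creates its own Matcher, which holds the match state
    // for one input, while the Pattern itself is shared.
    static int countSeparators(CharSequence input) {
        Matcher m = SLASHES.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countSeparators("a/b//c"));         // 2
        System.out.println(countSeparators("no separators"));  // 0
    }
}
```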
Now let's look at a simple example that creates a Pattern object by compiling a regular expression and then splits the target string according to that expression:

import java.util.regex.*;

public class Replacement {
    public static void main(String[] args) throws Exception {
        // Create a Pattern by compiling a regular expression
        Pattern p = Pattern.compile("[/]+");
        // Split the string on "/" using the Pattern's split() method
        String[] result = p.split(
            "Kevin has seen \"LEON\" several times, because it is a good film."
            + "/Kevin has seen \"this killer is not too cold\" several times, because it is a "
            + "good movie./noun: Kevin.");
        for (int i = 0; i < result.length; i++)
            System.out.println(result[i]);
    }
}

The output is:
Kevin has seen "LEON" several times, because it is a good film.
Kevin has seen "this killer is not too cold" several times, because it is a good movie.
noun: Kevin.
As you can see, the program splits the string on "/". To limit the number of segments, we can use the split(CharSequence input, int limit) method instead, and the call changes to:
String[] result = p.split("Kevin has seen \"LEON\" several times, because it is a good film./Kevin has seen \"this killer is not too cold\" several times, because it is a good movie./noun: Kevin.", 2);
The argument 2 here means that the target string is split into at most two segments.
The output is:
Kevin has seen "LEON" several times, because it is a good film.
Kevin has seen "this killer is not too cold" several times, because it is a good movie./noun: Kevin.
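The effect of the limit argument can be seen more easily on a short input. The sketch below uses a made-up string "a/b/c/" and a hypothetical class SplitLimit. It also shows the documented behavior of a zero limit (trailing empty strings are dropped) and a negative limit (they are kept), which the text above does not cover.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; the input string is an assumption for illustration.
class SplitLimit {
    static final Pattern SLASH = Pattern.compile("[/]+");

    static String[] split(String input, int limit) {
        return SLASH.split(input, limit);
    }

    public static void main(String[] args) {
        String input = "a/b/c/";
        // limit 2: at most two segments, the rest stays unsplit
        System.out.println(Arrays.toString(split(input, 2)));   // [a, b/c/]
        // limit 0: same as split(input); trailing empty strings are removed
        System.out.println(Arrays.toString(split(input, 0)));   // [a, b, c]
        // negative limit: no cap on segments, trailing empty string is kept
        System.out.println(Arrays.toString(split(input, -1)));  // [a, b, c, ]
    }
}
```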
From the example above, we can compare how java.util.regex creates a Pattern object from a regular expression with how the Jakarta-ORO package described in the previous article does it. Jakarta-ORO first constructs a PatternCompiler object, then calls its compile() method to compile the regular expression into a Pattern object, and a separate PatternMatcher object performs the matching:
PatternCompiler orocom = new Perl5Compiler();
Pattern pattern = orocom.compile("REGULAR EXPRESSIONS");
PatternMatcher matcher = new Perl5Matcher();
In the java.util.regex package, by contrast, calling the static compile() method of the Pattern class directly achieves the same effect:
Pattern p = Pattern.compile("[/]+");
So the java.util.regex way of constructing a pattern is simpler and easier to understand than that of Jakarta-ORO.
3. Matcher class:
The Matcher methods are as follows:
Matcher appendReplacement(StringBuffer sb, String replacement)
Appends to sb the text between the end of the previous match and the start of the current match, followed by the replacement string in place of the current match.
StringBuffer appendTail(StringBuffer sb)
Appends to sb the remainder of the target string after the last match.
int end()
Returns the offset just past the last character of the current match in the target string.
int end(int group)
Returns the offset just past the last character of the substring captured by the given group in the current match.
boolean find()
Attempts to find the next matching substring in the target string.
boolean find(int start)
Resets the Matcher and attempts to find the next matching substring in the target string, starting at the specified index.
String group()
Returns the substring matched by the current match.
String group(int group)
Returns the substring captured by the given group in the current match.
int groupCount()
Returns the number of capturing groups in the Matcher's pattern.
boolean lookingAt()
Tests whether the target string starts with a substring that matches the pattern.
boolean matches()
Attempts to match the entire target string against the pattern; it returns true only if the whole string matches.
Pattern pattern()
Returns the Pattern object that this Matcher uses.
String replaceAll(String replacement)
Replaces every substring of the target string that matches the pattern with the specified string.
String replaceFirst(String replacement)
Replaces the first substring of the target string that matches the pattern with the specified string.
Matcher reset()
Resets the Matcher.
Matcher reset(CharSequence input)
Resets the Matcher and gives it a new target string.
int start()
Returns the index of the first character of the current match in the target string.
int start(int group)
Returns the index of the first character of the substring captured by the given group in the current match.
(Are these method descriptions a little hard to follow? Don't worry, they become easier to understand alongside the examples.)
A Matcher instance matches the target string against an existing pattern, that is, a regular expression compiled into a given Pattern. All input to the Matcher is supplied through the CharSequence interface, which makes it possible to match data coming from a variety of sources.
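To illustrate the point about CharSequence, here is a small sketch in which the same Pattern is applied to a String, a StringBuffer, and a java.nio.CharBuffer. The class CharSequenceInput, the helper containsCat, and the inputs are assumptions made for this example.

```java
import java.nio.CharBuffer;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original article.
class CharSequenceInput {
    static final Pattern CAT = Pattern.compile("cat");

    // Accepts any CharSequence, mirroring the parameter type of Pattern.matcher().
    static boolean containsCat(CharSequence input) {
        return CAT.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(containsCat("one cat"));                     // true
        System.out.println(containsCat(new StringBuffer("two cats")));  // true
        System.out.println(containsCat(CharBuffer.wrap("no dogs")));    // false
    }
}
```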
Let's look at how each group of methods is used:
★matches()/lookingAt()/find():
A Matcher object is created by calling the matcher() method of a Pattern object. Once created, it can perform three different kinds of match operation:
The matches() method attempts to match the entire target string; it returns true only if the whole string matches.
The lookingAt() method tests whether the target string starts with a substring that matches the pattern.
The find() method attempts to find the next matching substring in the target string.
All three methods return a boolean value indicating success or failure.
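The difference between the three operations can be sketched as follows, using the pattern "cat" on a few made-up inputs. The class MatchKinds and its helper names are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original article.
class MatchKinds {
    static final Pattern CAT = Pattern.compile("cat");

    static boolean wholeInput(String s) { return CAT.matcher(s).matches(); }
    static boolean atStart(String s)    { return CAT.matcher(s).lookingAt(); }
    static boolean anywhere(String s)   { return CAT.matcher(s).find(); }

    public static void main(String[] args) {
        String[] inputs = { "cat", "cats", "fatcat" };
        for (int i = 0; i < inputs.length; i++) {
            String s = inputs[i];
            System.out.println(s + ": matches=" + wholeInput(s)
                    + " lookingAt=" + atStart(s) + " find=" + anywhere(s));
        }
        // After a successful find(), start() and end() give the match position.
        Matcher m = CAT.matcher("fatcat");
        if (m.find()) {
            System.out.println(m.start() + ".." + m.end());  // 3..6
        }
    }
}
```

Here "cat" satisfies all three checks, "cats" fails only matches(), and "fatcat" succeeds only with find().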
★replaceAll()/appendReplacement()/appendTail():
The Matcher class also provides four methods for replacing matched substrings with a specified string:
replaceAll()
replaceFirst()
appendReplacement()
appendTail()
replaceAll() and replaceFirst() are simple to use; see the method descriptions above. Our main focus is on the appendReplacement() and appendTail() methods.
appendReplacement(StringBuffer sb, String replacement) appends to the StringBuffer the text between the end of the previous match and the start of the current match, followed by the replacement string in place of the current match. appendTail(StringBuffer sb) appends to the StringBuffer the remainder of the target string after the last match.
For example, take the string fatcatfatcatfat and the regular expression "cat". After the first match, calling appendReplacement(sb, "dog") leaves sb containing fatdog: the text before the match is copied and the cat in fatcat is replaced by dog. After the second match, calling appendReplacement(sb, "dog") again makes sb contain fatdogfatdog. Finally, calling appendTail(sb) gives sb the final content fatdogfatdogfat.
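The fatcat walkthrough above can be sketched in code as follows. The class FatCat and its steps helper are hypothetical; the helper simply records the buffer content after each call so the intermediate states can be inspected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original article.
class FatCat {
    // Returns the buffer content after each appendReplacement() call,
    // followed by the content after the final appendTail() call.
    static List<String> steps(String input) {
        Matcher m = Pattern.compile("cat").matcher(input);
        StringBuffer sb = new StringBuffer();
        List<String> snapshots = new ArrayList<String>();
        while (m.find()) {
            m.appendReplacement(sb, "dog");
            snapshots.add(sb.toString());
        }
        m.appendTail(sb);
        snapshots.add(sb.toString());
        return snapshots;
    }

    public static void main(String[] args) {
        // prints [fatdog, fatdogfatdog, fatdogfatdogfat]
        System.out.println(steps("fatcatfatcatfat"));
    }
}
```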
Is that still a little unclear? Then let's take a look at a simple program:
// This example changes every "Kelvin" in the sentence to "Kevin"
import java.util.regex.*;

public class MatcherTest {
    public static void main(String[] args) throws Exception {
        // Create a Pattern by compiling the simple regular expression "Kelvin"
        Pattern p = Pattern.compile("Kelvin");
        // Create a Matcher using the matcher() method of the Pattern class
        Matcher m = p.matcher("Kelvin Li and Kelvin Chan are both working in Kelvin Chen's Kelvinsoftshop company.");
        StringBuffer sb = new StringBuffer();
        int i = 0;
        // Find the first match using the find() method
        boolean result = m.find();
        // Loop to find and replace every "Kelvin" in the sentence, appending the text to sb
        while (result) {
            i++;
            m.appendReplacement(sb, "Kevin");
            System.out.println("After match " + i + ", sb contains: " + sb);
            // Continue with the next match
            result = m.find();
        }
        // Finally, call appendTail() to append the text after the last match to sb
        m.appendTail(sb);
        System.out.println("After calling m.appendTail(sb), sb contains: " + sb.toString());
    }
}
The final output is:
After match 1, sb contains: Kevin
After match 2, sb contains: Kevin Li and Kevin
After match 3, sb contains: Kevin Li and Kevin Chan are both working in Kevin
After match 4, sb contains: Kevin Li and Kevin Chan are both working in Kevin Chen's Kevin
After calling m.appendTail(sb), sb contains: Kevin Li and Kevin Chan are both working in Kevin Chen's Kevinsoftshop company.
After this example, the use of the appendReplacement() and appendTail() methods should be clearer. If you are still unsure, the best approach is to write a few lines of code and test them yourself.
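One way to test this is to compare the find()/appendReplacement()/appendTail() loop with replaceAll() and replaceFirst(). The following is a minimal sketch; the class ReplaceShortcuts, its method names, and the shortened input sentence are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original article.
class ReplaceShortcuts {
    static final Pattern KELVIN = Pattern.compile("Kelvin");

    // The explicit loop: replace each match, then append the tail.
    static String viaLoop(String input) {
        Matcher m = KELVIN.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, "Kevin");
        }
        m.appendTail(sb);
        return sb.toString();
    }

    static String viaReplaceAll(String input) {
        return KELVIN.matcher(input).replaceAll("Kevin");
    }

    static String viaReplaceFirst(String input) {
        return KELVIN.matcher(input).replaceFirst("Kevin");
    }

    public static void main(String[] args) {
        String input = "Kelvin Li and Kelvin Chan";
        System.out.println(viaLoop(input));          // Kevin Li and Kevin Chan
        System.out.println(viaReplaceAll(input));    // Kevin Li and Kevin Chan
        System.out.println(viaReplaceFirst(input));  // Kevin Li and Kelvin Chan
    }
}
```

The explicit loop is worth the extra lines mainly when something has to happen between matches, such as counting them or printing intermediate results.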
★group()/group(int group)/groupCount():
These methods are similar to the MatchResult.group() method in Jakarta-ORO described in the previous article: they return the substrings matched by the groups in the pattern. The following code shows how they are used:
import java.util.regex.*;

public class GroupTest {
    public static void main(String[] args) throws Exception {
        Pattern p = Pattern.compile("(ca)(t)");
        Matcher m = p.matcher("one cat,two cats in the yard");
        // Find the first match
        boolean result = m.find();
        System.out.println("The number of capturing groups in this match is: " + m.groupCount());
        // Print the substring captured by each group
        for (int i = 1; i <= m.groupCount(); i++) {
            System.out.println("Group " + i + " contains: " + m.group(i));
        }
    }
}
The output is:
The number of capturing groups in this match is: 2
Group 1 contains: ca
Group 2 contains: t
The other methods of the Matcher class are easier to understand. Because of space limitations, readers are encouraged to write small programs to verify them.
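As one such experiment, the sketch below prints the start and end offsets of each group for every match in a sentence. The class GroupOffsets, the describe helper, and its "text[start,end)" output format are assumptions made for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original article.
class GroupOffsets {
    static final Pattern P = Pattern.compile("(ca)(t)");

    // Describes every match of the given group as "text[start,end)".
    // Group 0 stands for the whole match.
    static String describe(String input, int group) {
        Matcher m = P.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(m.group(group))
               .append('[').append(m.start(group))
               .append(',').append(m.end(group)).append(')');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "one cat,two cats in the yard";
        System.out.println(describe(input, 0));  // cat[4,7) cat[12,15)
        System.out.println(describe(input, 1));  // ca[4,6) ca[12,14)
        System.out.println(describe(input, 2));  // t[6,7) t[14,15)
    }
}
```

Note that end() is exclusive: it is the offset just past the last matched character, so end() minus start() gives the length of the match.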
4. A small program to check an email address:
Finally, let's look at a program that checks an email address. It verifies whether the characters in the address entered are legal. It is not a complete email validation program and cannot cover every possible case, but you can add whatever functionality you need on top of it.

import java.util.regex.*;

public class Email {
    public static void main(String[] args) throws Exception {
        String input = args[0];
        // Check whether the address starts with the illegal character "." or "@"
        Pattern p = Pattern.compile("^\\.|^@");
        Matcher m = p.matcher(input);
        if (m.find()) {
            System.out.println("The email address cannot start with '.' or '@'");
        }
        // Check whether the address starts with "www."
        p = Pattern.compile("^www\\.");
        m = p.matcher(input);
        if (m.find()) {
            System.out.println("The email address cannot start with 'www.'");
        }
        // Look for illegal characters
        p = Pattern.compile("[^A-Za-z0-9\\.\\@_\\-~#]+");
        m = p.matcher(input);
        StringBuffer sb = new StringBuffer();
        boolean result = m.find();
        boolean deletedIllegalChars = false;
        while (result) {
            // An illegal character was found, so set the flag
            deletedIllegalChars = true;
            // Remove illegal characters such as colons and double quotes
            // by replacing them with an empty string while copying to sb
            m.appendReplacement(sb, "");
            result = m.find();
        }
        m.appendTail(sb);
        input = sb.toString();
        if (deletedIllegalChars) {
            System.out.println("The email address contains illegal characters such as colons or commas; please correct it");
            System.out.println("Your input was: " + args[0]);
            System.out.println("A corrected address would look like: " + input);
        }
    }
}

For example, if we enter this at the command line: java Email www.kevin@163.net
the output is: The email address cannot start with 'www.'
If the address entered is @kevin@163.net
the output is: The email address cannot start with '.' or '@'
If the address entered is cgjmail#$%@163.net
the output is:
The email address contains illegal characters such as colons or commas; please correct it
Your input was: cgjmail#$%@163.net
A corrected address would look like: cgjmail#@163.net
(The "#" is kept because the character class in the program treats it as a legal character; only "$" and "%" are removed.)
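To make the checks easier to try without the command line, they could be pulled into small methods, as in the sketch below. The class EmailSanitizer and the method names are hypothetical, the character class is modeled on the one in the program above, and replaceAll() is swapped in for the explicit appendReplacement() loop.

```java
import java.util.regex.Pattern;

// Hypothetical refactoring for illustration; not part of the original article.
class EmailSanitizer {
    // Anything outside this set of characters is treated as illegal.
    static final Pattern ILLEGAL = Pattern.compile("[^A-Za-z0-9\\.\\@_\\-~#]+");
    // An address must not start with ".", "@" or "www."
    static final Pattern BAD_PREFIX = Pattern.compile("^\\.|^@|^www\\.");

    static boolean hasBadPrefix(String input) {
        return BAD_PREFIX.matcher(input).find();
    }

    // Removes every run of illegal characters from the input.
    static String strip(String input) {
        return ILLEGAL.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(hasBadPrefix("www.kevin@163.net"));  // true
        System.out.println(hasBadPrefix("@kevin@163.net"));     // true
        System.out.println(hasBadPrefix("kevin@163.net"));      // false
        System.out.println(strip("cgjmail#$%@163.net"));        // cgjmail#@163.net
    }
}
```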
5. Summary:
This article introduced the classes and methods of java.util.regex, the regular expression library in JDK 1.4.0-beta3. Readers who compare it with the Jakarta-ORO API described in the previous article should find this API easier to use. The library's capabilities will no doubt continue to grow, and readers who want the latest information should check Sun's website regularly.
6. Conclusion:
I had originally planned to write another article about a representative commercial regular expression library. However, since free and excellent regular expression libraries are available, there seems little reason to pay for one, and I believe many readers feel the same. Readers who are interested in other third-party regular expression libraries can search the Internet or visit the website listed in the resources.
Resources
java.util.regex API documentation
"Regular Expressions and the Java Programming Language" by Dana Nourie and Mike McCloskey
For more third-party regular expression resources and the applications built on them, see http://www.meurrens.org/ip-Links/java/regex/index.html