1. Introduction:
java.util.regex is a class library package that matches strings against patterns defined by regular expressions.
It includes two classes: Pattern and Matcher.
Class | Description
Pattern | A Pattern object is the compiled form of a regular expression.
Matcher | A Matcher object is a state machine that checks a string against the pattern held by a Pattern object.
First, a regular expression written in a Perl-like syntax is compiled into a Pattern instance. Then a Matcher instance matches strings against the pattern held by that Pattern instance.
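The compile-then-match workflow can be sketched as follows. This is a minimal illustration, not code from the original article. The class name BasicWorkflow and its helper findAll are hypothetical. The only library calls used are Pattern.compile, Pattern.matcher, Matcher.find, and Matcher.group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BasicWorkflow {
    // Hypothetical helper: collects every substring of input that matches regex, in order.
    static List<String> findAll(String regex, String input) {
        Pattern p = Pattern.compile(regex); // step 1: compile the expression
        Matcher m = p.matcher(input);       // step 2: bind the pattern to a target string
        List<String> hits = new ArrayList<>();
        while (m.find()) {                  // step 3: scan for successive matches
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\d+", "room 12, floor 3")); // prints [12, 3]
    }
}
```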
Let's take a look at these two classes:
2. Pattern class:
The main methods of Pattern are as follows:
Return type | Method | Description
static Pattern | compile(String regex) | Compiles the given regular expression into a Pattern.
static Pattern | compile(String regex, int flags) | Same as above, but with match flags specified. The optional flags include CASE_INSENSITIVE, MULTILINE, DOTALL, UNICODE_CASE, and CANON_EQ.
int | flags() | Returns the match flags of this Pattern.
Matcher | matcher(CharSequence input) | Creates a Matcher that matches the given input against this pattern.
static boolean | matches(String regex, CharSequence input) | Compiles the given regular expression and tries to match the entire input against it. This suits a regular expression that is used only once, for a single match operation, because no Matcher instance needs to be created explicitly.
String | pattern() | Returns the regular expression from which this Pattern was compiled.
String[] | split(CharSequence input) | Splits the target string around matches of this pattern.
String[] | split(CharSequence input, int limit) | The limit parameter controls the number of segments. For example, if limit is 2, the target string is split into at most two segments.
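The table entries for compile(regex, flags), flags(), pattern(), and the static matches() can be tried together in the sketch below. It is an illustration only. The class PatternMethods and its helper matchesIgnoringCase are hypothetical names. The flags shown are standard Pattern constants, and they are combined with bitwise OR.

```java
import java.util.regex.Pattern;

public class PatternMethods {
    // Hypothetical helper: whole-string match that ignores letter case.
    static boolean matchesIgnoringCase(String regex, String input) {
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Flags are int bit masks and can be combined with |
        Pattern p = Pattern.compile("kevin", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
        System.out.println(p.pattern());                                 // kevin
        System.out.println((p.flags() & Pattern.CASE_INSENSITIVE) != 0); // true
        System.out.println(matchesIgnoringCase("kevin", "KeViN"));       // true

        // One-shot static check: the entire input must match
        System.out.println(Pattern.matches("[a-z]+", "kevin"));          // true
        System.out.println(Pattern.matches("[a-z]+", "kevin1"));         // false
    }
}
```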
A regular expression, that is, a string of special characters, must first be compiled into an instance of the Pattern class. The Pattern object's matcher() method then creates a Matcher instance, which matches the target string against the compiled regular expression. Multiple Matcher instances can share one Pattern object.
Now let's look at a simple example that shows how to create a Pattern object by compiling a regular expression and then use it to split a target string:
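The sketch below shows several Matcher instances sharing one Pattern: the pattern is compiled once, and each input gets its own Matcher. The Pattern Javadoc states that Pattern instances are immutable and safe for concurrent use, while Matcher instances are not. The class SharedPattern and its method countWords are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SharedPattern {
    // Compiled once and reused; Pattern objects are immutable
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // Hypothetical helper: counts the runs of letters in text
    static int countWords(String text) {
        Matcher m = WORD.matcher(text); // a fresh Matcher per input
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("one cat, two cats")); // 4
        System.out.println(countWords("in the yard"));       // 3
    }
}
```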
    import java.util.regex.*;

    public class Replacement {
        public static void main(String[] args) throws Exception {
            // Generate a Pattern and compile the regular expression "one or more slashes"
            Pattern p = Pattern.compile("[/]+");
            // Use the split() method of Pattern to split the string at "/"
            String[] result = p.split(
                "Kevin has seen \"Leon\" several times, because it is a good film."
                + "/Kevin has watched \"this killer is not too cold\" several times, because it is"
                + " a good movie./Term: Kevin.");
            for (int i = 0; i < result.length; i++)
                System.out.println(result[i]);
        }
    }

Output:
Kevin has seen "Leon" several times, because it is a good film.
Kevin has watched "this killer is not too cold" several times, because it is a good movie.
Term: Kevin.
As you can see, the program splits the string at each "/". Next, we use split(CharSequence input, int limit) to specify the number of segments, changing the program as follows:
String[] result = p.split("Kevin has seen \"Leon\" several times, because it is a good film./Kevin has watched \"this killer is not too cold\" several times, because it is a good movie./Term: Kevin.", 2);
The argument 2 means that the target string is split into at most two segments.
The output is:
Kevin has seen "Leon" several times, because it is a good film.
Kevin has watched "this killer is not too cold" several times, because it is a good movie./Term: Kevin.
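Positive limits are only one case. The split(CharSequence, int) Javadoc also defines a zero limit (the same as split(input)), which drops trailing empty strings, and a negative limit, which keeps them. The sketch below compares the three on a hypothetical input. The class SplitLimit and its helper splitWith are illustrative names.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitLimit {
    static final Pattern SLASH = Pattern.compile("/");

    // Hypothetical helper: thin wrapper around Pattern.split(input, limit)
    static String[] splitWith(String input, int limit) {
        return SLASH.split(input, limit);
    }

    public static void main(String[] args) {
        String input = "a/b/c//";
        // limit 0 (same as split(input)): as many pieces as possible, trailing empty strings dropped
        System.out.println(Arrays.toString(splitWith(input, 0)));  // [a, b, c]
        // limit 2: at most two pieces; the last piece keeps the rest of the input
        System.out.println(Arrays.toString(splitWith(input, 2)));  // [a, b/c//]
        // negative limit: as many pieces as possible, trailing empty strings kept
        System.out.println(Arrays.toString(splitWith(input, -1))); // [a, b, c, , ]
    }
}
```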
Comparing this example with the Jakarta-ORO package described in the previous article, we can see that java.util.regex builds a Pattern object and compiles a regular expression differently. Jakarta-ORO first constructs a PatternCompiler object and then calls its compile() method to compile the regular expression into a Pattern object:
PatternCompiler orocom = new Perl5Compiler();
Pattern pattern = orocom.compile("regular expression");
PatternMatcher matcher = new Perl5Matcher();
In the java.util.regex package, however, we only need to call the static compile() method of the Pattern class to achieve the same effect:
Pattern p = Pattern.compile("[/]+");
The java.util.regex way of constructing a pattern therefore seems simpler and easier to understand than Jakarta-ORO's.
3. Matcher class:
The main methods of Matcher are as follows:
Return type | Method | Description
Matcher | appendReplacement(StringBuffer sb, String replacement) | Appends to sb the text between the previous match (or the start of the input) and the current match, followed by the replacement string in place of the current match.
StringBuffer | appendTail(StringBuffer sb) | Appends the rest of the target string after the last match to sb.
int | end() | Returns the offset in the target string just after the last character of the current match.
int | end(int group) | Returns the offset just after the last character of the substring captured by the given group.
boolean | find() | Tries to find the next matching substring in the target string.
boolean | find(int start) | Resets the Matcher and then tries to find the next matching substring, starting at the specified index in the target string.
String | group() | Returns the whole substring matched by the previous match.
String | group(int group) | Returns the substring captured by the given group during the previous match.
int | groupCount() | Returns the number of capturing groups in this Matcher's pattern.
boolean | lookingAt() | Checks whether the beginning of the target string matches the pattern.
boolean | matches() | Tries to match the entire target string; returns true only if the whole string matches.
Pattern | pattern() | Returns the Pattern object that this Matcher uses.
String | replaceAll(String replacement) | Replaces every substring of the target string that matches the pattern with the given string.
String | replaceFirst(String replacement) | Replaces the first substring of the target string that matches the pattern with the given string.
Matcher | reset() | Resets the Matcher.
Matcher | reset(CharSequence input) | Resets the Matcher and gives it a new target string.
int | start() | Returns the index in the target string of the first character of the current match.
int | start(int group) | Returns the index in the target string of the first character of the substring captured by the given group.
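The position methods in the table fit together as follows: start() is the index of the first matched character, and end() is one past the last. The sketch below prints each match as a half-open range. The class MatchPositions and its describe helper are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchPositions {
    // Hypothetical helper: describes each match as "text[start,end)"
    static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // end() is exclusive: one past the last matched character
            out.append(m.group()).append('[').append(m.start())
               .append(',').append(m.end()).append(") ");
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(describe("cat", "one cat, two cats")); // cat[4,7) cat[13,16)
    }
}
```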
(Are the method descriptions hard to follow? Don't worry; the examples that follow will make them easier to understand.)
A Matcher instance searches the target string according to an existing pattern, that is, the regular expression compiled into a given Pattern. All input to a Matcher is supplied through the CharSequence interface, so data from a wide range of sources can be matched.
Let's take a look at the usage of each method:
★matches()/lookingAt()/find():
A Matcher object is created by calling a Pattern object's matcher() method. Once created, it can perform three different kinds of match operation:
- The matches() method tries to match the entire target string; it returns true only if the whole string matches.
- The lookingAt() method checks whether the target string begins with a match; the match does not have to cover the whole string.
- The find() method searches the target string for the next matching substring.
All three methods return a boolean indicating whether the operation succeeded.
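The sketch below shows the three operations side by side. It applies the pattern "cat" to three hypothetical inputs and uses a fresh Matcher for each operation, so that state left by one call cannot affect the next. The class ThreeWays and its check helper are illustrative names.

```java
import java.util.regex.Pattern;

public class ThreeWays {
    static final Pattern CAT = Pattern.compile("cat");

    // Hypothetical helper: runs each operation on its own Matcher
    static String check(String s) {
        return "matches=" + CAT.matcher(s).matches()
             + " lookingAt=" + CAT.matcher(s).lookingAt()
             + " find=" + CAT.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"cat", "catfish", "bobcat"}) {
            System.out.println(s + ": " + check(s));
        }
        // cat: matches=true lookingAt=true find=true
        // catfish: matches=false lookingAt=true find=true
        // bobcat: matches=false lookingAt=false find=true
    }
}
```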
★replaceAll()/appendReplacement()/appendTail():
The Matcher class also provides four methods for replacing matched substrings with a specified string:
- replaceAll()
- replaceFirst()
- appendReplacement()
- appendTail()
replaceAll() and replaceFirst() are easy to use; see their descriptions in the method table above. Here we focus on appendReplacement() and appendTail().
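For completeness, here is a brief sketch of replaceAll() and replaceFirst(). Per the Javadoc, both reset the Matcher before working, so they can be called one after another on the same Matcher. In a replacement string, $n refers to capturing group n. The class ReplaceDemo and its helpers are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceDemo {
    static final Pattern CAT = Pattern.compile("cat");

    static String allDogs(String s) { return CAT.matcher(s).replaceAll("dog"); }

    static String firstDog(String s) { return CAT.matcher(s).replaceFirst("dog"); }

    // Hypothetical helper: swaps the two sides of "name@host" using group references
    static String swapAround(String s) {
        return Pattern.compile("(\\w+)@(\\w+)").matcher(s).replaceAll("$2:$1");
    }

    public static void main(String[] args) {
        Matcher m = CAT.matcher("fatcatfatcatfat");
        System.out.println(m.replaceAll("dog"));      // fatdogfatdogfat
        System.out.println(m.replaceFirst("dog"));    // fatdogfatcatfat
        System.out.println(swapAround("kevin@163"));  // 163:kevin
    }
}
```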
appendReplacement(StringBuffer sb, String replacement) appends to a StringBuffer the text between the previous match and the current match, followed by the specified string in place of the current match. appendTail(StringBuffer sb) appends the rest of the target string after the last match to the StringBuffer.
For example, take the string fatcatfatcatfat and the pattern "cat". If appendReplacement(sb, "dog") is called after the first match, sb contains fatdog: the text before the match (fat) is appended, followed by dog in place of cat. After the second match, calling appendReplacement(sb, "dog") again makes sb fatdogfatdog. Finally, calling appendTail(sb) makes the content of sb fatdogfatdogfat.
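The fatcatfatcatfat walkthrough can be traced in code. This sketch records sb after each appendReplacement call so the intermediate states described above can be checked. The class FatCat and its helper replaceWithTrace are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FatCat {
    // Hypothetical helper: replaces every "cat" with "dog", recording sb after each step
    static String replaceWithTrace(String input, List<String> trace) {
        Matcher m = Pattern.compile("cat").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, "dog");
            trace.add(sb.toString());
        }
        m.appendTail(sb); // copy the text after the last match ("fat")
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        String result = replaceWithTrace("fatcatfatcatfat", trace);
        System.out.println(trace);  // [fatdog, fatdogfatdog]
        System.out.println(result); // fatdogfatdogfat
    }
}
```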
Still a little fuzzy? Let's look at a simple program:
    // This example changes "Kelvin" in the sentence to "Kevin"
    import java.util.regex.*;

    public class MatcherTest {
        public static void main(String[] args) throws Exception {
            // Generate the Pattern object and compile a simple regular expression "Kelvin"
            Pattern p = Pattern.compile("Kelvin");
            // Use the matcher() method of the Pattern class to generate a Matcher object
            Matcher m = p.matcher("Kelvin Li and Kelvin Chan are both working in Kelvin Chen's KelvinSoftShop company");
            StringBuffer sb = new StringBuffer();
            int i = 0;
            // Use the find() method to find the first match
            boolean result = m.find();
            // Loop to find and replace every "Kelvin" in the sentence, appending the content to sb
            while (result) {
                i++;
                m.appendReplacement(sb, "Kevin");
                System.out.println("After match " + i + ", the sb content is: " + sb);
                // Continue searching for the next match
                result = m.find();
            }
            // Call appendTail() to add the rest of the string after the last match to sb
            m.appendTail(sb);
            System.out.println("The final content of sb after calling m.appendTail(sb) is: " + sb.toString());
        }
    }

The output is:
After match 1, the sb content is: Kevin
After match 2, the sb content is: Kevin Li and Kevin
After match 3, the sb content is: Kevin Li and Kevin Chan are both working in Kevin
After match 4, the sb content is: Kevin Li and Kevin Chan are both working in Kevin Chen's Kevin
The final content of sb after calling m.appendTail(sb) is: Kevin Li and Kevin Chan are both working in Kevin Chen's KevinSoftShop company
Does the program above make the use of appendReplacement() and appendTail() clearer? If not, it is worth writing a few lines of code to test them yourself.
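The main advantage of appendReplacement() over replaceAll() is that the replacement for each match can be computed in the loop. The sketch below doubles every integer in a string. It wraps the computed text in Matcher.quoteReplacement so that any '$' or '\' it contains would be inserted literally. The class DynamicReplace and the method doubleNumbers are hypothetical, and the sketch assumes each number fits in a long.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DynamicReplace {
    // Hypothetical helper: doubles every run of digits (assumes each fits in a long)
    static String doubleNumbers(String text) {
        Matcher m = Pattern.compile("\\d+").matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            long n = Long.parseLong(m.group());
            // quoteReplacement makes the computed text literal in the replacement
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(n * 2)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("1 cat, 12 cats")); // 2 cat, 24 cats
    }
}
```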
★group()/group(int group)/groupCount():
These methods correspond to the group() method of MatchResult in Jakarta-ORO, which was described in the previous article (refer to it for details on Jakarta-ORO). They return the substrings matched by the pattern's groups. The following code illustrates their usage:
    import java.util.regex.*;

    public class GroupTest {
        public static void main(String[] args) throws Exception {
            Pattern p = Pattern.compile("(ca)(t)");
            Matcher m = p.matcher("one cat, two cats in the yard");
            // group(i) may only be called after a successful match
            if (m.find()) {
                System.out.println("The number of capturing groups is: " + m.groupCount());
                for (int i = 1; i <= m.groupCount(); i++) {
                    System.out.println("The substring in group " + i + " is: " + m.group(i));
                }
            }
        }
    }

Output:
The number of capturing groups is: 2
The substring in group 1 is: ca
The substring in group 2 is: t
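Two further details about groups: group(0) is the whole match (the same as group()) and is not counted by groupCount(), and start(int group) gives the position of a single group's capture. The sketch below runs the same pattern over every match. The class GroupPositions and its describe helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupPositions {
    // Hypothetical helper: reports the whole match and group 2 for every match
    static List<String> describe(String input) {
        Matcher m = Pattern.compile("(ca)(t)").matcher(input);
        List<String> lines = new ArrayList<>();
        while (m.find()) {
            // group(0) is the whole match; groupCount() does not include it
            lines.add("group(0)=" + m.group(0) + " at " + m.start()
                    + ", group(2)=" + m.group(2) + " at " + m.start(2));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describe("one cat, two cats in the yard")) {
            System.out.println(line);
        }
        // group(0)=cat at 4, group(2)=t at 6
        // group(0)=cat at 13, group(2)=t at 15
    }
}
```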
For reasons of space, the remaining Matcher methods are left for readers to explore by writing and running their own test programs.
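As a starting point for that exploration, the sketch below exercises reset(), reset(CharSequence), and find(int). It shows that reset() returns the search to the beginning, that reset(CharSequence) swaps in a new target string while keeping the pattern, and that find(int) resets the Matcher and searches from the given index. The class ResetDemo and its helper firstMatchFrom are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResetDemo {
    // Hypothetical helper: start index of the first match at or after `from`, or -1
    static int firstMatchFrom(String regex, String input, int from) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find(from) ? m.start() : -1;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("cat").matcher("cat cat cat");
        m.find();
        System.out.println(m.start()); // 0
        m.find();
        System.out.println(m.start()); // 4
        m.reset();                     // back to the beginning of the same input
        m.find();
        System.out.println(m.start()); // 0
        m.reset("dog cat");            // same pattern, new target string
        m.find();
        System.out.println(m.start()); // 4
        System.out.println(firstMatchFrom("cat", "cat cat cat", 5)); // 8
    }
}
```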
4. A small program to check an email address:
Finally, let's look at a routine that checks email addresses. The program checks whether the characters in an email address are legal. It is not a complete email validator and does not cover every possible case, but you can extend it with whatever checks you need.
    import java.util.regex.*;

    public class Email {
        public static void main(String[] args) throws Exception {
            if (args.length == 0) {
                System.err.println("Usage: java Email <address>");
                return;
            }
            String input = args[0];
            // Check whether the address starts with the invalid character "." or "@"
            Pattern p = Pattern.compile("^\\.|^\\@");
            Matcher m = p.matcher(input);
            if (m.find()) {
                System.err.println("The email address cannot start with '.' or '@'.");
            }
            // Check whether it starts with "www."
            p = Pattern.compile("^www\\.");
            m = p.matcher(input);
            if (m.find()) {
                System.out.println("The email address cannot start with 'www.'.");
            }
            // Check for illegal characters: anything other than letters, digits, . @ _ - ~ #
            p = Pattern.compile("[^A-Za-z0-9\\.\\@_\\-~#]+");
            m = p.matcher(input);
            StringBuffer sb = new StringBuffer();
            boolean result = m.find();
            boolean deletedIllegalChars = false;
            while (result) {
                // An illegal character was found; remember that
                deletedIllegalChars = true;
                // Remove the illegal characters by replacing them with "" while copying into sb
                m.appendReplacement(sb, "");
                result = m.find();
            }
            m.appendTail(sb);
            input = sb.toString();
            if (deletedIllegalChars) {
                System.out.println("The entered email address contains invalid characters such as colons and commas. Please modify it.");
                System.out.println("Your current input is: " + args[0]);
                System.out.println("The valid address after modification should be similar to: " + input);
            }
        }
    }
For example, if we run java Email www.kevin@163.net on the command line,
the output is: The email address cannot start with 'www.'.
If the entered email address is @kevin@163.net,
the output is: The email address cannot start with '.' or '@'.
When the input is cgjmail#$%@163.net,
the output is:
The entered email address contains invalid characters such as colons and commas. Please modify it.
Your current input is: cgjmail#$%@163.net
The valid address after modification should be similar to: cgjmail#@163.net
Note that '#' is kept because the character class lists it as a legal character; only '$' and '%' are removed.