Java Regular Expression Application

Source: Internet
Author: User

1. Introduction:
java.util.regex is a library package that matches strings against patterns defined by regular expressions.

It includes two classes: Pattern and Matcher. A Pattern is the compiled representation of a regular expression.
A Matcher object is a state machine that checks a string against a Pattern object. First, a Pattern instance compiles a regular expression written in a Perl-like syntax; then a Matcher instance matches strings under the control of that Pattern instance.

Let's take a look at these two classes:

2. Pattern class:
The Pattern methods are as follows:
static Pattern compile(String regex)
Compiles the given regular expression into a Pattern instance.
static Pattern compile(String regex, int flags)
Same as above, but with match flags specified. The available flags include CASE_INSENSITIVE, MULTILINE, DOTALL, UNICODE_CASE, and CANON_EQ.
int flags()
Returns the match flags of the current Pattern.
Matcher matcher(CharSequence input)
Creates a Matcher object that matches the given input against this Pattern.
static boolean matches(String regex, CharSequence input)
Compiles the given regular expression and matches the input string against it. This method suits a regular expression that is used only once, for a single matching operation; in that case you do not need to create a Matcher instance yourself.
String pattern()
Returns the regular expression from which the Pattern object was compiled.
String[] split(CharSequence input)
Splits the target string around matches of the regular expression in the Pattern.
String[] split(CharSequence input, int limit)
The limit parameter specifies the maximum number of segments. For example, if limit is set to 2, the target string is split into at most two segments.
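
The flags passed to compile(String regex, int flags) are bit masks, so several of them can be combined with the | operator. The following is a minimal sketch rather than part of the original article; the class name FlagsSketch and the helper hasKevinLine are hypothetical names chosen for illustration.

```java
import java.util.regex.Pattern;

public class FlagsSketch {
    // Hypothetical helper: true if some line of the input is exactly "kevin", ignoring case.
    static boolean hasKevinLine(String input) {
        // CASE_INSENSITIVE ignores letter case; MULTILINE lets ^ and $ match at line boundaries.
        Pattern p = Pattern.compile("^kevin$", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("^kevin$", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
        // flags() reports the flags the Pattern was compiled with
        System.out.println((p.flags() & Pattern.CASE_INSENSITIVE) != 0);
        System.out.println(hasKevinLine("Leon\nKEVIN\nMathilda"));
        System.out.println(hasKevinLine("Leon and Kevin"));
    }
}
```

Under these assumptions, the first input contains a line that matches and the second does not, because "Kevin" there is only part of a line.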

A regular expression, that is, a string of specific characters, must first be compiled into an instance of the Pattern class. This Pattern object then uses its matcher() method to create a Matcher instance, which can be used to match the target string against the compiled regular expression. Multiple Matcher instances can share one Pattern object.
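
To illustrate sharing, here is a small sketch in which one Pattern is compiled once and every call creates its own Matcher from it. The class name SharedPatternSketch and the helper firstNumber are hypothetical; the sketch relies on the documented fact that Pattern instances are immutable, while Matcher instances hold per-match state.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SharedPatternSketch {
    // Compiled once and reused by every call below.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Hypothetical helper: returns the first run of digits in the input, or null if there is none.
    static String firstNumber(CharSequence input) {
        Matcher m = DIGITS.matcher(input); // each call gets its own Matcher
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("room 42"));
        // Any CharSequence works as input, not only String
        System.out.println(firstNumber(new StringBuilder("year 2001")));
        System.out.println(firstNumber("no digits"));
    }
}
```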

Now let's look at a simple example that shows how to create a Pattern object by compiling a regular expression, and then split the target string with it:
import java.util.regex.*;

public class Replacement {
    public static void main(String[] args) throws Exception {
        // Create a Pattern by compiling a regular expression
        Pattern p = Pattern.compile("[/]+");
        // Use the split() method of Pattern to split the string on "/"
        String[] result = p.split(
            "Kevin has seen \"Leon\" several times, because it is a good film."
            + "/Kevin has watched \"this killer is not too cold\" several times, because it is "
            + "a good movie./Term: Kevin.");
        for (int i = 0; i < result.length; i++)
            System.out.println(result[i]);
    }
}

Output result:

Kevin has seen "Leon" several times, because it is a good film.
Kevin has watched "this killer is not too cold" several times, because it is a good movie.
Term: Kevin.

Obviously, this program splits the string on "/". Next we use the split(CharSequence input, int limit) method to specify the number of segments. The program is changed to:
String[] result = p.split("Kevin has seen \"Leon\" several times, because it is a good film./Kevin has watched \"this killer is not too cold\" several times, because it is a good movie./Term: Kevin.", 2);

The parameter 2 indicates that the target string is divided into at most two segments.

The output result is:

Kevin has seen "Leon" several times, because it is a good film.
Kevin has watched "this killer is not too cold" several times, because it is a good movie./Term: Kevin.

3. Matcher class:
The Matcher methods are as follows:
Matcher appendReplacement(StringBuffer sb, String replacement)
Replaces the current matched substring with the specified string, and appends to a StringBuffer object the text between the previous match and the current match, followed by the replacement.
StringBuffer appendTail(StringBuffer sb)
Appends the text remaining after the last match to a StringBuffer object.
int end()
Returns the index in the original target string just after the last character of the matched substring.
int end(int group)
Returns the index just after the last character of the substring matched by the specified group.
boolean find()
Tries to find the next matching substring in the target string.
boolean find(int start)
Resets the Matcher object and tries to find the next matching substring starting at the specified position in the target string.
String group()
Returns the substring matched by the previous match operation.
String group(int group)
Returns the substring matched by the specified group in the previous match operation.
int groupCount()
Returns the number of capturing groups in the Matcher's pattern.
boolean lookingAt()
Checks whether the target string starts with a matching substring.
boolean matches()
Tries to match the entire target string; true is returned only when the whole target string matches.
Pattern pattern()
Returns the Pattern object that this Matcher uses.
String replaceAll(String replacement)
Replaces every substring of the target string that matches the pattern with the specified string.
String replaceFirst(String replacement)
Replaces the first substring of the target string that matches the pattern with the specified string.
Matcher reset()
Resets the Matcher object.
Matcher reset(CharSequence input)
Resets the Matcher object and specifies a new target string.
int start()
Returns the index in the original target string of the first character of the matched substring.
int start(int group)
Returns the index in the original target string of the first character of the substring matched by the specified group.

(Are these method descriptions hard to follow? Don't worry; the examples below will make them easier to understand.)

A Matcher instance searches the target string based on an existing pattern (that is, the regular expression compiled into a given Pattern). All input to a Matcher is provided through the CharSequence interface, so that data from a wide range of sources can be matched.

Let's take a look at the usage of each method:

★matches() / lookingAt() / find():
A Matcher object is created by a Pattern object calling its matcher() method. Once the Matcher object exists, it can perform three different matching operations:

The matches() method tries to match the entire target string; true is returned only when the whole target string matches.
The lookingAt() method checks whether the target string starts with a matching substring.
The find() method tries to find the next matching substring in the target string.

All three methods return a boolean value indicating whether the operation succeeded.

★replaceAll() / appendReplacement() / appendTail():
The Matcher class also provides four methods for replacing matched substrings with a specified string:

replaceAll()
replaceFirst()
appendReplacement()
appendTail()

replaceAll() and replaceFirst() are easy to use; refer to the method descriptions above. We mainly focus on the appendReplacement() and appendTail() methods.
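
For completeness, the two simpler methods can be sketched as follows. This is an illustrative sketch, not code from the original article; the class name ReplaceSketch and its helper methods are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceSketch {
    private static final Pattern CAT = Pattern.compile("cat");

    // Hypothetical helper: replaces every occurrence of "cat" with "dog".
    static String replaceEvery(String input) {
        Matcher m = CAT.matcher(input);
        return m.replaceAll("dog");
    }

    // Hypothetical helper: replaces only the first occurrence of "cat" with "dog".
    static String replaceOnlyFirst(String input) {
        Matcher m = CAT.matcher(input);
        return m.replaceFirst("dog");
    }

    public static void main(String[] args) {
        System.out.println(replaceEvery("fatcatfatcatfat"));     // fatdogfatdogfat
        System.out.println(replaceOnlyFirst("fatcatfatcatfat")); // fatdogfatcatfat
    }
}
```

Note that in the replacement string the characters $ and \ have a special meaning (for example, $1 refers to group 1), so a replacement that should be taken literally needs them escaped.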

appendReplacement(StringBuffer sb, String replacement) replaces the current matched substring with the specified string and appends to a StringBuffer object the text between the previous match and the current match, followed by the replacement. appendTail(StringBuffer sb) appends the text remaining after the last match to a StringBuffer object.

For example, suppose the string is fatcatfatcatfat and the regular expression pattern is "cat". If appendReplacement(sb, "dog") is called after the first match, the content of the StringBuffer sb is fatdog: the cat in fatcat is replaced with dog and appended to sb together with the text before the match. If appendReplacement(sb, "dog") is called again after the second match, the content of sb becomes fatdogfatdog. If you then call appendTail(sb), the final content of sb is fatdogfatdogfat.
Still a little fuzzy? Let's look at a simple program:
// In this example, every "Kelvin" in the sentence is changed to "Kevin"
import java.util.regex.*;

public class MatcherTest {
    public static void main(String[] args) throws Exception {
        // Create the Pattern object by compiling the simple regular expression "Kelvin"
        Pattern p = Pattern.compile("Kelvin");
        // Use the matcher() method of the Pattern class to create a Matcher object
        Matcher m = p.matcher("Kelvin Li and Kelvin Chan are both working in Kelvin Chen's KelvinSoftShop company");
        StringBuffer sb = new StringBuffer();
        int i = 0;
        // Use the find() method to find the first match
        boolean result = m.find();
        // Loop to find and replace every Kelvin in the sentence, appending the content to sb
        while (result) {
            i++;
            m.appendReplacement(sb, "Kevin");
            System.out.println("After match " + i + ", the content of sb is: " + sb);
            // Continue searching for the next match
            result = m.find();
        }
        // Call appendTail() to append the text remaining after the last match to sb
        m.appendTail(sb);
        System.out.println("The final content of sb after calling m.appendTail(sb) is: " + sb.toString());
    }
}

The final output result is:
After match 1, the content of sb is: Kevin
After match 2, the content of sb is: Kevin Li and Kevin
After match 3, the content of sb is: Kevin Li and Kevin Chan are both working in Kevin
After match 4, the content of sb is: Kevin Li and Kevin Chan are both working in Kevin Chen's Kevin
The final content of sb after calling m.appendTail(sb) is: Kevin Li and Kevin Chan are both working in Kevin Chen's KevinSoftShop company

After this example, the use of the appendReplacement() and appendTail() methods should be clearer. If you are still not sure, it is best to write a few lines of code and test them yourself.

★group() / group(int group) / groupCount():
These methods are similar to the group() methods of MatchResult in Jakarta-ORO described in the previous article (for details about Jakarta-ORO, refer to that article). They return the substring matched by a group. The following code illustrates their usage:
import java.util.regex.*;

public class GroupTest {
    public static void main(String[] args) throws Exception {
        Pattern p = Pattern.compile("(ca)(t)");
        Matcher m = p.matcher("one cat, two cats in the yard");
        // find() must succeed before group(int) can be called
        boolean result = m.find();
        System.out.println("The number of groups in this pattern is: " + m.groupCount());
        for (int i = 1; i <= m.groupCount(); i++) {
            System.out.println("The substring matched by group " + i + " is: " + m.group(i));
        }
    }
}

Output:
The number of groups in this pattern is: 2
The substring matched by group 1 is: ca
The substring matched by group 2 is: t

Because space is limited, and because it aids understanding, the other methods of the Matcher object are left for the reader to program and verify.

4. A small program to check the email address:
Finally, let's look at a routine that checks an email address. This program checks whether the characters in an input email address are legal. It is not a complete email address verification program and cannot test every possible case, but you can add the functions you need on top of it.
import java.util.regex.*;

public class Email {
    public static void main(String[] args) throws Exception {
        String input = args[0];
        // Check whether the entered email address starts with the invalid symbol "." or "@"
        Pattern p = Pattern.compile("^\\.|^\\@");
        Matcher m = p.matcher(input);
        if (m.find()) {
            System.err.println("The email address cannot start with '.' or '@'.");
        }
        // Check whether it starts with "www."
        p = Pattern.compile("^www\\.");
        m = p.matcher(input);
        if (m.find()) {
            System.out.println("The email address cannot start with 'www.'.");
        }
        // Check for illegal characters
        p = Pattern.compile("[^A-Za-z0-9\\.\\@_\\-~#]+");
        m = p.matcher(input);
        StringBuffer sb = new StringBuffer();
        boolean result = m.find();
        boolean deletedIllegalChars = false;
        while (result) {
            // An illegal character was found, so set the flag
            deletedIllegalChars = true;
            // Remove the illegal characters (such as colons or double quotation marks)
            // and append the text before them to sb
            m.appendReplacement(sb, "");
            result = m.find();
        }
        m.appendTail(sb);
        input = sb.toString();
        if (deletedIllegalChars) {
            System.out.println("The entered email address contains invalid characters such as colons and commas. Please modify it.");
            System.out.println("Your current input is: " + args[0]);
            System.out.println("The valid address after modification should be similar to: " + input);
        }
    }
}

For example, if we enter java Email www.kevin@163.net on the command line

The output will be: The email address cannot start with 'www.'.

If the entered email is @kevin@163.net

The output is: The email address cannot start with '.' or '@'.

When the input is: cgjmail#$%@163.net

The output is:

The entered email address contains invalid characters such as colons and commas. Please modify it.
Your current input is: cgjmail#$%@163.net
The valid address after modification should be similar to: cgjmail#@163.net

(The # is kept because the character class in the program lists # among the allowed characters; only $ and % are removed.)

5. Conclusion:
This article introduced java.util.regex, the regular expression library in jdk1.4.0-beta3. If you compare its classes and methods with the Jakarta-ORO API described in the previous article, you will become more familiar with this API. The capabilities of this library will of course continue to expand, and readers who want the latest information should check Sun's website regularly.

Other problems

Java regular expressions are implemented by the Pattern and Matcher classes in the java.util.regex package. (We recommend opening the Java API documentation while you read this article; checking each method's description in the API as it is introduced gives better results.)
The Pattern class is used to create a regular expression, or in other words a matching pattern. Its constructor is private, so it cannot be instantiated directly, but you can create an instance through the simple factory method Pattern.compile(String regex).
Java code example:
Pattern p = Pattern.compile("\\w+");
p.pattern(); // returns \w+

pattern() returns the string form of the regular expression, which is simply the regex argument passed to Pattern.compile(String regex).

1. Pattern.split(CharSequence input)
Pattern has a split(CharSequence input) method that splits a string and returns a String[]. I guess String.split(String regex) is implemented with Pattern.split(CharSequence input).
Java code example:
Pattern p = Pattern.compile("\\d+");
String[] str = p.split("My QQ is: 456456 my phone is: 0532214 my mailbox is: aaa@aaa.com");

Result: str[0] = "My QQ is: " str[1] = " my phone is: " str[2] = " my mailbox is: aaa@aaa.com"
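
The relationship between String.split and Pattern.split can be checked directly: the String.split documentation states that str.split(regex, n) yields the same result as Pattern.compile(regex).split(str, n). The sketch below compares the two; the class name SplitEquivalenceSketch and the helper sameResult are hypothetical names used only for this illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitEquivalenceSketch {
    // Hypothetical helper: true if both ways of splitting produce the same array.
    static boolean sameResult(String input, String regex) {
        String[] viaPattern = Pattern.compile(regex).split(input);
        String[] viaString = input.split(regex);
        return Arrays.equals(viaPattern, viaString);
    }

    public static void main(String[] args) {
        String input = "My QQ is: 456456 my phone is: 0532214 my mailbox is: aaa@aaa.com";
        // Print the pieces so the surrounding spaces are visible
        System.out.println(Arrays.toString(Pattern.compile("\\d+").split(input)));
        System.out.println(sameResult(input, "\\d+"));
    }
}
```

When the same regular expression is used many times, compiling the Pattern once and calling its split() avoids repeating the compilation that String.split may otherwise perform on each call.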

2. Pattern.matches(String regex, CharSequence input) is a static method for quick matching. It suits cases where you match only once and the entire string must match.
Java code example:
Pattern.matches("\\d+", "2223"); // returns true
Pattern.matches("\\d+", "2223aa"); // returns false. True is returned only when the entire string matches; here aa does not match
Pattern.matches("\\d+", "22bb23"); // returns false. True is returned only when the entire string matches; here bb does not match

3. Pattern.matcher(CharSequence input)
At last we come to the Matcher class. Pattern.matcher(CharSequence input) returns a Matcher object.
The Matcher class has no public constructor, so you cannot create an instance directly; you can only obtain one through the Pattern.matcher(CharSequence input) method.
The Pattern class can perform only simple matching operations. For more powerful and convenient regular expression matching, Pattern must work together with Matcher. The Matcher class supports groups and repeated matching.
Java code example:
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("22bb23");
m.pattern(); // returns p, the Pattern object that created this Matcher

4. Matcher.matches() / Matcher.lookingAt() / Matcher.find()
The Matcher class provides three matching methods. All three return a boolean: true if the match succeeds, false if it does not.

matches() matches the entire string. True is returned only when the entire string matches.
Java code example:
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("22bb23");
m.matches(); // returns false because bb cannot be matched by \d+, so the entire string does not match
Matcher m2 = p.matcher("2223");
m2.matches(); // returns true because \d+ matches the entire string

Now let's look back at Pattern.matches(String regex, CharSequence input), which is equivalent to the following code:
Pattern.compile(regex).matcher(input).matches()

lookingAt() matches at the beginning of the string. True is returned only when a matching substring is at the very start.
Java code example:
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("22bb23");
m.lookingAt(); // returns true because \d+ matches the leading 22
Matcher m2 = p.matcher("aa2223");
m2.lookingAt(); // returns false because \d+ cannot match the leading aa

find() searches the string. The matching substring can be at any position.
Java code example:
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("22bb23");
m.find(); // returns true
Matcher m2 = p.matcher("aa2223");
m2.find(); // returns true
Matcher m3 = p.matcher("aa2223bb");
m3.find(); // returns true
Matcher m4 = p.matcher("aabb");
m4.find(); // returns false

5. Matcher.start() / Matcher.end() / Matcher.group()
After a successful matches(), lookingAt(), or find() operation, you can use these three methods to obtain more detailed information.
start() returns the index in the string of the first character of the matched substring.
end() returns the index in the string just after the last character of the matched substring.
group() returns the matched substring.
Java code example:
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("aaa2223bb");
m.find(); // matches 2223
m.start(); // returns 3
m.end(); // returns 7, the index just after the last character of 2223
m.group(); // returns 2223

Matcher m2 = p.matcher("2223bb");
m2.lookingAt(); // matches 2223
m2.start(); // returns 0. Because lookingAt() only matches at the beginning of the string, start() always returns 0 after lookingAt()
m2.end(); // returns 4
m2.group(); // returns 2223

Matcher m3 = p.matcher("222333");
m3.matches(); // matches the entire string (for "2223bb" it would return false, because bb cannot be matched by \d+)
m3.start(); // returns 0, for the same reason as above
m3.end(); // returns 6 because matches() must match the entire string
m3.group(); // returns 222333

Having covered this much, I believe everyone understands how to use the methods above. Now let's see how regular expression groups are used in Java.
start(), end(), and group() each have an overload: start(int i), end(int i), and group(int i), which operate on a specific group. The Matcher class also has groupCount(), which returns the number of groups.
Java code example:
Pattern p = Pattern.compile("([a-z]+)(\\d+)");
Matcher m = p.matcher("aaa2223bb");
m.find(); // matches aaa2223
m.groupCount(); // returns 2 because there are two groups
m.start(1); // returns 0, the index in the string of the first character matched by group 1
m.start(2); // returns 3
m.end(1); // returns 3, the index just after the last character matched by group 1
m.end(2); // returns 7
m.group(1); // returns aaa, the substring matched by group 1
m.group(2); // returns 2223, the substring matched by group 2

Now we can try a slightly more advanced matching operation. Suppose a piece of text contains many numbers scattered through it, and we need to extract all of them. With Java's regular expressions this is simple.
Java code example:
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("My QQ is: 456456 my phone is: 0532214 my mailbox is: aaa123@aaa.com");
while (m.find()) {
    System.out.println(m.group());
}

Output:
456456
0532214
123

Now replace the preceding while() loop with:
while (m.find()) {
    System.out.println(m.group());
    System.out.print("start: " + m.start());
    System.out.println(" end: " + m.end());
}
Output:
456456
start: 10 end: 16
0532214
start: 30 end: 37
123
start: 56 end: 59

6. Matcher.region(int start, int end) / Matcher.regionEnd() / Matcher.regionStart()
By default, a matching operation searches the entire string. For example, given the string "aabbcc" and the pattern "\\d+", find() starts at the first a, that is, at index 0. When the position at index 0 does not match, it moves to the next position, and so on, until a substring matches or the last character has been tried. Obviously "\\d+" cannot match "aabbcc": after the last c has been tried, the match fails. In other words, the whole string is searched. Can the search be limited to part of the string? The answer is yes.
region(int start, int end) sets the limits of the matcher's region.
Let's look at an example.
Java code example:
Pattern p = Pattern.compile("\\d+");
String content = "aaabb2233cc";
Matcher m = p.matcher(content);
System.out.println(m);

Output: java.util.regex.Matcher[pattern=\d+ region=0,11 lastmatch=]

Here region=0,11 means start = 0 and end = 11. Put simply, when a string is searched, matching first starts at index 0; if a substring matches, the result is returned, and if not, matching moves on to the next position. Matching ends once index 11-1 has been tried.
Why 11? Because content.length() == 11.
Now you should understand its role. Let's look at another example.
Java code example:
Pattern p = Pattern.compile("\\d+");
String content = "aaabb2233cc";
Matcher m = p.matcher(content);
m.find(); // returns true, matching 2233

Matcher m2 = p.matcher(content);
m2.region(0, 5);
m2.find(); // returns false. Only the characters at indexes 0 to 5-1 are searched

Matcher m3 = p.matcher(content);
m3.region(3, 8);
m3.find(); // returns true
m3.group(); // returns 223. Count the index numbers to see why: the region covers indexes 3 to 7, that is, bb223

Matcher.regionStart() returns the start value set by region(int start, int end). The default value is 0.
Matcher.regionEnd() returns the end value set by region(int start, int end). The default value is the length() of the string being matched.
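
The following sketch shows regionStart() and regionEnd() before and after a call to region(). It is an illustration only; the class name RegionSketch and the helper findInRegion are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegionSketch {
    // Hypothetical helper: first run of digits found between start (inclusive)
    // and end (exclusive), or null if the region contains none.
    static String findInRegion(String content, int start, int end) {
        Matcher m = Pattern.compile("\\d+").matcher(content);
        m.region(start, end);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("\\d+").matcher("aaabb2233cc");
        // Defaults: the region is the whole input
        System.out.println(m.regionStart() + "," + m.regionEnd()); // 0,11
        m.region(3, 8);
        System.out.println(m.regionStart() + "," + m.regionEnd()); // 3,8
        System.out.println(findInRegion("aaabb2233cc", 3, 8));     // 223
        System.out.println(findInRegion("aaabb2233cc", 0, 5));     // null
    }
}
```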

7. Matcher.reset() / Matcher.reset(CharSequence input)
These methods reset the matcher. See the example.
Java code example:
Pattern p = Pattern.compile("[a-z]+");
String content = "aaabb2233cc";
Matcher m = p.matcher(content); // m has just been created and is in its initial state
m.find();
m.group(); // returns aaabb
m.find();
m.group(); // returns cc

Matcher m2 = p.matcher(content); // m2 has just been created and is in its initial state
m2.find();
m2.group(); // returns aaabb
m2.reset(); // restores the initial state, as if m2 had just been created
m2.find();
m2.group(); // returns aaabb again, because the search restarts from the beginning

Matcher.reset(CharSequence input) restores the initial state and replaces the string being matched with input. Subsequent matching operations run against input instead of the original string.
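
One use of reset(CharSequence input) is to reuse a single Matcher for several inputs instead of creating a new one each time. The sketch below assumes this usage; the class name ResetSketch and the helper countMatches are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResetSketch {
    // Hypothetical helper: points the given Matcher at a new input and counts its matches.
    static int countMatches(Matcher m, CharSequence input) {
        m.reset(input); // discard the old state and switch to the new input
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // One Matcher, created with an empty input and reused below
        Matcher m = Pattern.compile("[a-z]+").matcher("");
        System.out.println(countMatches(m, "aaabb2233cc"));       // 2 (aaabb, cc)
        System.out.println(countMatches(m, "one cat, two cats")); // 4
    }
}
```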

8. Matcher.toMatchResult()
Check the description of the Matcher class in the Java API and you will find that it implements the MatchResult interface. This interface declares the following methods:
groupCount()
group() / group(int i)
start() / start(int i)
end() / end(int i)

The functions of these methods have been described earlier. Now let's take a look at how toMatchResult() is used.
Java code example:
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("My QQ is: 456456 my phone is: 0532214 my mailbox is: aaa123@aaa.com");
List<MatchResult> list = new ArrayList<MatchResult>();
while (m.find()) {
    // Save a snapshot of the current match
    list.add(m.toMatchResult());
}
MatchResult matchResult = null;
Iterator<MatchResult> it = list.iterator();
int i = 1;
while (it.hasNext()) {
    matchResult = it.next();
    System.out.print("Match " + (i++) + " information: ");
    System.out.println(matchResult.group() + "\t" + matchResult.start() + "\t" + matchResult.end());
}

Output (the columns are separated by tabs):
Match 1 information: 456456 10 16
Match 2 information: 0532214 30 37
Match 3 information: 123 56 59

Now you should know that toMatchResult() saves the information of a match so that it can be used later.

You should also know by now that the values returned by start(), end(), and group() change after each matching operation to reflect the newly matched substring, and that their overloads change accordingly.
Note: you can use the start(), end(), and group() methods only after a successful matching operation, that is, only when matches(), lookingAt(), or find() has returned true. Otherwise java.lang.IllegalStateException is thrown.
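
A simple way to respect this rule is to check the boolean result of the matching method before reading the match. The following sketch illustrates both the guarded call and the exception; the class name MatchStateSketch and its helpers are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchStateSketch {
    // Hypothetical helper: returns the next match, or the fallback when find() fails.
    static String groupOrDefault(Matcher m, String fallback) {
        return m.find() ? m.group() : fallback;
    }

    // Hypothetical helper: shows that group() throws when the last match attempt failed.
    static boolean throwsWithoutMatch() {
        Matcher m = Pattern.compile("\\d+").matcher("aabb");
        m.find(); // returns false: there are no digits in "aabb"
        try {
            m.group();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\d+");
        System.out.println(groupOrDefault(p.matcher("aaa2223bb"), "none")); // 2223
        System.out.println(groupOrDefault(p.matcher("aabb"), "none"));      // none
        System.out.println(throwsWithoutMatch());                           // true
    }
}
```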
