The Java Matcher class (java.util.regex.Matcher) is used to search through a text for multiple occurrences of a regular expression. You can also use a Matcher to search for the same regular expression in different texts.
The Java Matcher class has a lot of useful methods. I will cover the core methods of the Java Matcher class in this tutorial. For a full list, see the official JavaDoc for the Matcher class.
Java Matcher Example
Here is a quick Java Matcher example so you can get an idea of how the Matcher class works:
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class MatcherExample {

    public static void main(String[] args) {
        String text =
            "This is the text to be searched " +
            "for occurrences of the http:// pattern.";

        String patternString = ".*http://.*";
        // Compile the regular expression once, then create a Matcher for the text.
        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(text);

        // true only if the pattern matches the whole text.
        boolean matches = matcher.matches();
    }
}
First a Pattern instance is created from a regular expression, and from the Pattern instance a Matcher instance is created. Then the matches() method is called on the Matcher instance. The matches() method returns true if the regular expression matches the text, and false if not.
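As a small sketch of the same idea, one compiled Pattern can hand out a Matcher for each text you want to check. The class name MatcherReuseSketch, the helper matchesWhole() and the sample sentences are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherReuseSketch {

    // Compiled once; a Pattern can create any number of Matcher instances.
    private static final Pattern HTTP_PATTERN = Pattern.compile(".*http://.*");

    // Hypothetical helper: true if the whole text matches the pattern.
    static boolean matchesWhole(String text) {
        Matcher matcher = HTTP_PATTERN.matcher(text);
        return matcher.matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("Read more at http://example.com today."));  // true
        System.out.println(matchesWhole("This sentence contains no link."));         // false
    }
}
```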
You can do a whole lot more with the Matcher class. The rest is covered throughout the rest of this tutorial. The Pattern class is covered separately in my Java Regex Pattern tutorial.
Creating a Matcher
Creating a Matcher is done via the matcher() method in the Pattern class. Here is an example:
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class CreateMatcherExample {

    public static void main(String[] args) {
        String text =
            "This is the text to be searched " +
            "for occurrences of the http:// pattern.";

        String patternString = ".*http://.*";
        Pattern pattern = Pattern.compile(patternString);

        // The Matcher is bound to both the pattern and the input text.
        Matcher matcher = pattern.matcher(text);
    }
}
At the end of this example the matcher variable will contain a Matcher instance which can be used to match the regular expression used to create it against different text input.
matches()
The matches() method in the Matcher class matches the regular expression against the whole text passed to the Pattern.matcher() method when the Matcher was created. Here is a Matcher.matches() example:
String patternString = ".*http://.*";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(text);
boolean matches = matcher.matches();
If the regular expression matches the whole text, then the matches() method returns true. If not, the matches() method returns false.
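To make the "whole text" requirement concrete, here is a minimal sketch that runs two patterns against the same text. The class MatchesWholeTextSketch, the helper wholeTextMatches() and the sample text are assumptions made for this illustration.

```java
import java.util.regex.Pattern;

class MatchesWholeTextSketch {

    // Hypothetical helper: runs matches() for the given regex and text.
    static boolean wholeTextMatches(String regex, String text) {
        return Pattern.compile(regex).matcher(text).matches();
    }

    public static void main(String[] args) {
        String text = "Visit http://example.com";

        // The bare pattern only covers part of the text, so matches() fails.
        System.out.println(wholeTextMatches("http://", text));      // false

        // Wrapping it in .* lets the pattern span the whole text.
        System.out.println(wholeTextMatches(".*http://.*", text));  // true
    }
}
```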
You cannot use the matches() method to search for multiple occurrences of a regular expression in a text. For that, you need to use the find(), start() and end() methods.
lookingAt()
The Matcher lookingAt() method works like the matches() method, with one major difference. The lookingAt() method only matches the regular expression against the beginning of the text, whereas matches() matches the regular expression against the whole text. In other words, if the regular expression matches the beginning of a text but not the whole text, lookingAt() will return true, whereas matches() will return false.
Here is a Matcher.lookingAt() example:
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class CreateMatcherExample {

    public static void main(String[] args) {
        String text =
            "This is the text to be searched " +
            "for occurrences of the http:// pattern.";

        String patternString = "This is the";
        Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);

        // lookingAt() only needs a match at the beginning of the text.
        System.out.println("lookingAt = " + matcher.lookingAt());
        // matches() needs the pattern to cover the whole text.
        System.out.println("matches   = " + matcher.matches());
    }
}
This example matches the regular expression "This is the" against both the beginning of the text, and against the whole text. Matching the regular expression against the beginning of the text (lookingAt()) will return true.
Matching the regular expression against the whole text (matches()) will return false, because the text has more characters than the regular expression. The regular expression says that the text must match the text "This is the" exactly, with no extra characters before or after the expression.
find() + start() + end()
The Matcher find() method searches for occurrences of the regular expression in the text passed to the Pattern.matcher(text) method when the Matcher was created. If multiple matches can be found in the text, the find() method will find the first, and then for each subsequent call to find() it will move to the next match.
The methods start() and end() give the indexes into the text where the found match starts and ends. Actually, end() returns the index of the character just after the end of the matching section. Thus, you can use the return values of start() and end() inside a String.substring() call.
Here is a Java Matcher find(), start() and end() example:
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class MatcherFindStartEndExample {

    public static void main(String[] args) {
        String text =
            "This is the text which is to be searched " +
            "for occurrences of the word 'is'.";

        String patternString = "is";
        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(text);

        int count = 0;
        // Each call to find() moves on to the next occurrence.
        while (matcher.find()) {
            count++;
            System.out.println("found: " + count + " : "
                    + matcher.start() + " - " + matcher.end());
        }
    }
}
This example will find the pattern "is" four times in the searched string. The output printed will be this:
found: 1 : 2 - 4
found: 2 : 5 - 7
found: 3 : 23 - 25
found: 4 : 70 - 72
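Because end() is exclusive, the two indexes can be passed straight to String.substring(). The following is a minimal sketch of that; the class MatcherSubstringSketch, the helper findAll() and the sample text are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherSubstringSketch {

    // Hypothetical helper: collects every matched section via substring().
    static List<String> findAll(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            // end() points just past the match, which is what substring() expects.
            found.add(text.substring(matcher.start(), matcher.end()));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\d+", "Order 66 shipped 3 items on day 120."));
        // prints [66, 3, 120]
    }
}
```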
reset()
The Matcher reset() method resets the matching state internally in the Matcher. In case you have started matching occurrences in a string via the find() method, the Matcher will internally keep a state about how far it has searched through the input text. By calling reset() the matching will start from the beginning of the text again.
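A minimal sketch of this behavior: two calls to find() advance through the text, and after reset() the next find() lands on the first match again. The class MatcherResetSketch and the helper startsAroundReset() are hypothetical names used only here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherResetSketch {

    // Hypothetical helper: records the start index of two matches,
    // resets the matcher, then records where the next find() lands.
    static List<Integer> startsAroundReset(String regex, String text) {
        List<Integer> starts = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);

        if (matcher.find()) starts.add(matcher.start());  // first match
        if (matcher.find()) starts.add(matcher.start());  // second match

        matcher.reset();                                  // forget the search position

        if (matcher.find()) starts.add(matcher.start());  // first match again
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(startsAroundReset("is", "This is the text"));
        // prints [2, 5, 2]
    }
}
```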
There is also a reset(CharSequence) method. This method resets the Matcher, and makes the Matcher search through the CharSequence passed as parameter, instead of the CharSequence the Matcher was originally created with.
group()
Imagine you are searching through a text for URLs, and you would like to extract the found URLs out of the text. Of course you could do this with the start() and end() methods, but it is easier to do so with the group functions.
Groups are marked with parentheses in the regular expression. For instance:
(John)
This regular expression matches the text John. The parentheses are not part of the text that is matched. The parentheses mark a group. When a match is found in a text, you can get access to the part of the text matched by the expression inside the group.
You access a group using the group(int groupNo) method. A regular expression can have more than one group. Each group is thus marked with a separate set of parentheses. To get access to the text matched by the subpart of the expression in a specific group, pass the number of the group to the group(int groupNo) method.
The group with number 0 is always the whole regular expression. To get access to a group marked by parentheses you should start with group number 1.
Here is a Matcher group() example:
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class MatcherGroupExample {

    public static void main(String[] args) {
        String text =
            "John writes about this, and John writes about that, " +
            "and John writes about everything.";

        String patternString1 = "(John)";
        Pattern pattern = Pattern.compile(patternString1);
        Matcher matcher = pattern.matcher(text);

        while (matcher.find()) {
            // Group 1 is the text matched inside the first pair of parentheses.
            System.out.println("found: " + matcher.group(1));
        }
    }
}
This example searches the text for occurrences of the word John. For each match found, group number 1 is extracted, which is what matched the group marked with parentheses. The output of the example is:
found: John
found: John
found: John
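Returning to the URL scenario from the start of this section, the sketch below pulls the parts of each URL out with groups. The regular expression is a deliberately simplified assumption (scheme and host name only), not a full URL grammar, and the class UrlGroupSketch, the helper hosts() and the sample text are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlGroupSketch {

    // Simplified, assumed URL shape: group 1 = scheme, group 2 = host name.
    private static final Pattern URL_PATTERN =
            Pattern.compile("(https?)://([\\w.-]+)");

    // Hypothetical helper: returns the host name (group 2) of every URL found.
    static List<String> hosts(String text) {
        List<String> hosts = new ArrayList<>();
        Matcher matcher = URL_PATTERN.matcher(text);
        while (matcher.find()) {
            hosts.add(matcher.group(2));
        }
        return hosts;
    }

    public static void main(String[] args) {
        String text = "See http://example.com and https://docs.example.org/guide for details.";

        Matcher matcher = URL_PATTERN.matcher(text);
        while (matcher.find()) {
            // group(0) is the whole match, group(1) and group(2) are its parts.
            System.out.println(matcher.group(0) + " -> scheme=" + matcher.group(1)
                    + ", host=" + matcher.group(2));
        }
        System.out.println(hosts(text));  // prints [example.com, docs.example.org]
    }
}
```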
Multiple Groups
As mentioned earlier, a regular expression can have multiple groups. Here is a regular expression illustrating that:
(John) (.+?) 
This expression matches the text "John" followed by a space, then one or more characters, and finally another space. You may not be able to see it in the expression above, but there is a space after the last group too.
This expression contains a few characters with special meanings in a regular expression. The . means "any character". The + means "one or more times", and relates to the . (any character, one or more times). The ? means "match as small a number of characters as possible".
Here is a full code example:
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class MatcherGroupExample {

    public static void main(String[] args) {
        String text =
            "John writes about this, and John Doe writes about that, " +
            "and John Wayne writes about everything.";

        // Note the space after the last group.
        String patternString1 = "(John) (.+?) ";
        Pattern pattern = Pattern.compile(patternString1);
        Matcher matcher = pattern.matcher(text);

        while (matcher.find()) {
            System.out.println("found: " + matcher.group(1) +
                               " " + matcher.group(2));
        }
    }
}
Notice the references to the two groups, matcher.group(1) and matcher.group(2). The characters matched by those groups are printed to System.out. Here is what the example prints out:
found: John writes
found: John Doe
found: John Wayne
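To see what the ? after .+ contributes, the sketch below compares the reluctant pattern with a greedy variant on the same text. The class ReluctantQuantifierSketch, the helper secondGroup() and the shortened sample text are assumptions for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantQuantifierSketch {

    // Hypothetical helper: text captured by group 2 of the first match, or null.
    static String secondGroup(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find() ? matcher.group(2) : null;
    }

    public static void main(String[] args) {
        String text = "John Doe writes about that";

        // Reluctant: .+? stops at the first space it can.
        System.out.println(secondGroup("(John) (.+?) ", text));  // Doe

        // Greedy: .+ runs as far as it can while a space still follows.
        System.out.println(secondGroup("(John) (.+) ", text));   // Doe writes about
    }
}
```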
Groups Inside Groups
It is possible to have groups inside groups in a regular expression. Here is an example:
((John) (.+?)) 
Notice how the two groups from the examples earlier are now nested inside a larger group. (Again, you cannot see the space at the end of the expression, but it is there.)
When groups are nested inside each other, they are numbered based on when the left parenthesis of the group is met. Thus, group 1 is the big group. Group 2 is the group with the expression John inside. Group 3 is the group with the expression .+? inside. This is important to know when you need to reference the groups via the group(int groupNo) method.
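The numbering rule can be checked by listing the groups in order. This is a minimal sketch using Matcher.groupCount(), which reports the number of groups in the pattern without counting group 0; the class NestedGroupNumberingSketch, the helper groupsOfFirstMatch() and the sample text are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedGroupNumberingSketch {

    // Hypothetical helper: text of groups 1..groupCount() for the first match.
    static List<String> groupsOfFirstMatch(String regex, String text) {
        List<String> groups = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        if (matcher.find()) {
            // groupCount() does not include group 0 (the whole match).
            for (int i = 1; i <= matcher.groupCount(); i++) {
                groups.add(matcher.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(groupsOfFirstMatch("((John) (.+?)) ", "John Doe writes about that."));
        // prints [John Doe, John, Doe]
    }
}
```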
Here is an example that uses the above nested groups:
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class MatcherGroupsExample {

    public static void main(String[] args) {
        String text =
            "John writes about this, and John Doe writes about that, " +
            "and John Wayne writes about everything.";

        // Note the space after the outer group.
        String patternString1 = "((John) (.+?)) ";
        Pattern pattern = Pattern.compile(patternString1);
        Matcher matcher = pattern.matcher(text);

        while (matcher.find()) {
            System.out.println("found: <" + matcher.group(1) +
                               "> <" + matcher.group(2) +
                               "> <" + matcher.group(3) + ">");
        }
    }
}
Here is the output from the above example:
found: <John writes> <John> <writes>
found: <John Doe> <John> <Doe>
found: <John Wayne> <John> <Wayne>
Notice how the value matched by the first group (the outer group) contains the values matched by both of the inner groups.
replaceAll() + replaceFirst()
The Matcher replaceAll() and replaceFirst() methods can be used to replace parts of the string the Matcher is searching through. The replaceAll() method replaces all matches of the regular expression. The replaceFirst() method only replaces the first match.
Before any matching is carried out, the Matcher is reset, so that matching starts from the beginning of the input text.
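The implicit reset can be observed with a minimal sketch: even after find() has moved past the first match, replaceAll() still replaces it. The class ReplaceResetSketch, the helper replaceAfterFind() and the sample text are invented for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceResetSketch {

    // Hypothetical helper: calls find() once, then replaceAll().
    static String replaceAfterFind(String regex, String text, String replacement) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        matcher.find();  // the matcher has now moved past the first match
        // replaceAll() resets the matcher first, so the first match is replaced too.
        return matcher.replaceAll(replacement);
    }

    public static void main(String[] args) {
        System.out.println(replaceAfterFind("John", "John and John", "Joe"));
        // prints Joe and Joe
    }
}
```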
Here are two examples:
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class MatcherReplaceExample {

    public static void main(String[] args) {
        String text =
            "John writes about this, and John Doe writes about that, " +
            "and John Wayne writes about everything.";

        // Note the trailing space in both the pattern and the replacement.
        String patternString1 = "((John) (.+?)) ";
        Pattern pattern = Pattern.compile(patternString1);
        Matcher matcher = pattern.matcher(text);

        String replaceAll = matcher.replaceAll("Joe Blocks ");
        System.out.println("replaceAll   = " + replaceAll);

        String replaceFirst = matcher.replaceFirst("Joe Blocks ");
        System.out.println("replaceFirst = " + replaceFirst);
    }
}
And here is what the example outputs:
replaceAll   = Joe Blocks about this, and Joe Blocks writes about that,
    and Joe Blocks writes about everything.
replaceFirst = Joe Blocks about this, and John Doe writes about that,
    and John Wayne writes about everything.
The line breaks and indentation of the continuation lines are not really part of the output. I added them to make the output easier to read.
Notice how the first string printed has all occurrences of John followed by a word replaced with the string Joe Blocks. The second string only has the first occurrence replaced.
appendReplacement() + appendTail()
The Matcher appendReplacement() and appendTail() methods are used to replace string tokens in an input text, and append the resulting string to a StringBuffer.
When you have found a match using the find() method, you can call appendReplacement(). Doing so results in the characters from the input text being appended to the StringBuffer, and the matched text being replaced. Only the characters starting from the end of the last match, and until just before the matched characters, are copied.
The appendReplacement() method keeps track of what has been copied into the StringBuffer, so you can continue searching for matches using find() until no more matches are found in the input text.
Once the last match has been found, a part of the input text will still not have been copied into the StringBuffer. This is the characters from the end of the last match until the end of the input text. By calling appendTail() you can append these last characters to the StringBuffer too.
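One situation where this pair of methods is handy is when the replacement is computed from each match. The sketch below upper-cases every match; it is only an illustration under that assumption, and the class AppendReplacementSketch, the helper upperCaseMatches() and the sample text are hypothetical. Matcher.quoteReplacement() is used so that a $ or \ in the matched text is not interpreted as a group reference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendReplacementSketch {

    // Hypothetical helper: returns the text with every match upper-cased.
    static String upperCaseMatches(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        StringBuffer buffer = new StringBuffer();

        while (matcher.find()) {
            String replacement = matcher.group().toUpperCase();
            // Copies the text since the previous match, then the replacement.
            matcher.appendReplacement(buffer, Matcher.quoteReplacement(replacement));
        }
        // Copies whatever follows the last match.
        matcher.appendTail(buffer);
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(upperCaseMatches("John", "John writes, and John reads."));
        // prints JOHN writes, and JOHN reads.
    }
}
```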
Here is an example:
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class MatcherReplaceExample {

    public static void main(String[] args) {
        String text =
            "John writes about this, and John Doe writes about that, " +
            "and John Wayne writes about everything.";

        // Note the trailing space in both the pattern and the replacement.
        String patternString1 = "((John) (.+?)) ";
        Pattern pattern = Pattern.compile(patternString1);
        Matcher matcher = pattern.matcher(text);
        StringBuffer stringBuffer = new StringBuffer();

        while (matcher.find()) {
            matcher.appendReplacement(stringBuffer, "Joe Blocks ");
            System.out.println(stringBuffer.toString());
        }
        matcher.appendTail(stringBuffer);
        System.out.println(stringBuffer.toString());