Java Regular Expressions
A regular expression defines a pattern for strings.
Regular expressions can be used to search, edit, or process text.
Regular expressions are not limited to a specific language, but they differ slightly from language to language.
Java regular expressions are most similar to those of Perl.
The java.util.regex package mainly includes the following three classes:
- Pattern class:
A Pattern object is the compiled representation of a regular expression. The Pattern class has no public constructor. To create a Pattern object, you must call its public static compile method, which returns a Pattern object. This method accepts a regular expression as its first parameter.
- Matcher class:
A Matcher object is the engine that interprets the pattern and performs match operations against an input string. Like Pattern, Matcher has no public constructor. You obtain a Matcher object by calling the matcher method of a Pattern object.
- PatternSyntaxException:
PatternSyntaxException is an unchecked exception class that indicates a syntax error in a regular expression pattern.
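As a rough illustration of how these classes fit together, the sketch below compiles a pattern once with Pattern.compile and obtains a new Matcher for each input. The class name CompileOnceDemo, the constant DIGITS, and the helper containsNumber are illustrative assumptions, not part of java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class; only Pattern and Matcher come from java.util.regex.
class CompileOnceDemo {

    // A compiled Pattern can be stored and reused for many inputs.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns true if the input contains at least one run of digits.
    static boolean containsNumber(String input) {
        Matcher m = DIGITS.matcher(input); // a Matcher is created per input
        return m.find();
    }

    public static void main(String[] args) {
        System.out.println(containsNumber("QT3000"));         // true
        System.out.println(containsNumber("no digits here")); // false
    }
}
```

Keeping the compiled Pattern in a constant avoids recompiling the expression on every call, which is one common way to use the API.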
Capturing Groups
Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses.
For example, the regular expression (dog) creates a single group containing "d", "o", and "g".
Capturing groups are numbered by counting their opening parentheses from left to right. For example, in the expression ((A)(B(C))), there are four such groups:
- ((A)(B(C)))
- (A)
- (B(C))
- (C)
You can call the groupCount method of a Matcher object to find out how many groups are present in the expression. The groupCount method returns an int giving the number of capturing groups in the matcher's pattern.
There is also a special group (group 0), which always represents the entire expression. This group is not included in the return value of groupCount.
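The numbering rule can be checked directly for the expression ((A)(B(C))). The following is a minimal sketch; the class name GroupNumberingDemo and the helper describeGroups are illustrative names chosen here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class name; not part of java.util.regex.
class GroupNumberingDemo {

    // Returns group 0 through groupCount() for the first match,
    // or an empty list if the regex does not match the input.
    static List<String> describeGroups(String regex, String input) {
        List<String> groups = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        String regex = "((A)(B(C)))";
        Matcher m = Pattern.compile(regex).matcher("ABC");
        // Group 0 (the whole match) is not included in the count.
        System.out.println("groupCount(): " + m.groupCount());

        List<String> groups = describeGroups(regex, "ABC");
        for (int i = 0; i < groups.size(); i++) {
            System.out.println("group(" + i + "): " + groups.get(i));
        }
    }
}
```

For the input "ABC", groups 0 and 1 both hold "ABC", group 2 holds "A", group 3 holds "BC", and group 4 holds "C".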
Example
The following example shows how to find a string of digits in a given string:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    public static void main(String args[]) {
        // String to search
        String line = "This order was placed for QT3000! OK?";
        String pattern = "(.*)(\\d+)(.*)";

        // Create the Pattern object
        Pattern r = Pattern.compile(pattern);

        // Now create the Matcher object
        Matcher m = r.matcher(line);
        if (m.find()) {
            System.out.println("Found value: " + m.group(0));
            System.out.println("Found value: " + m.group(1));
            System.out.println("Found value: " + m.group(2));
        } else {
            System.out.println("no match");
        }
    }
}
Compiling and running the above example produces the following result:
Found value: This order was placed for QT3000! OK?
Found value: This order was placed for QT300
Found value: 0
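The last line of the output may look surprising: group 2 holds only "0" because the first (.*) is greedy and gives back just enough characters for \d+ to match a single digit. As a hedged sketch (the class name GreedyVsReluctantDemo and the helper digitsGroup are illustrative, not part of the original example), making the first group reluctant with .*? lets \d+ capture the whole number:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class name; not part of the original example.
class GreedyVsReluctantDemo {

    // Returns the text captured by group 2, or null if there is no match.
    static String digitsGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        // Greedy first group: it keeps as much as possible, leaving one digit.
        System.out.println(digitsGroup("(.*)(\\d+)(.*)", line));  // 0
        // Reluctant first group: it stops at the first digit.
        System.out.println(digitsGroup("(.*?)(\\d+)(.*)", line)); // 3000
    }
}
```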
Regular expression syntax
| Character | Description |
| \ | Marks the next character as a special character, a literal, a back reference, or an octal escape. For example, "n" matches the character "n", while "\n" matches a newline. The sequence "\\" matches "\", and "\(" matches "(". |
| ^ | Matches the position at the start of the input string. If multiline mode is enabled (Pattern.MULTILINE in Java), ^ also matches the position after "\n" or "\r". |
| $ | Matches the position at the end of the input string. If multiline mode is enabled (Pattern.MULTILINE in Java), $ also matches the position before "\n" or "\r". |
| * | Matches the preceding character or subexpression zero or more times. For example, "zo*" matches "z" and "zoo". * is equivalent to {0,}. |
| + | Matches the preceding character or subexpression one or more times. For example, "zo+" matches "zo" and "zoo", but does not match "z". + is equivalent to {1,}. |
| ? | Matches the preceding character or subexpression zero or one time. For example, "do(es)?" matches "do" or "does". ? is equivalent to {0,1}. |
| {n} | n is a non-negative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob", but matches the two o's in "food". |
| {n,} | n is a non-negative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob", but matches all the o's in "foooood". "o{1,}" is equivalent to "o+". "o{0,}" is equivalent to "o*". |
| {n,m} | m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". "o{0,1}" is equivalent to "o?". Note: do not put a space between the comma and the numbers. |
| ? | When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is non-greedy. A non-greedy pattern matches as little of the searched string as possible, while the default greedy pattern matches as much as possible. For example, in the string "oooo", "o+?" matches a single "o", while "o+" matches all the o's. |
| . | Matches any single character except a line terminator such as "\n". To match any character including "\n", use a pattern such as "[\s\S]". |
| (pattern) | Matches pattern and captures the matched subexpression. In Java, the captured text can be retrieved with Matcher.group(int) or referenced as $0...$9 in a replacement string. To match literal parentheses, use "\(" or "\)". |
| (?:pattern) | Matches pattern but does not capture the match; that is, it is a non-capturing group and the match is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|). For example, "industr(?:y|ies)" is a more economical expression than "industry|industries". |
| (?=pattern) | Positive lookahead. Matches at any position where a string matching pattern begins. It is non-capturing, so the match is not stored for later use. For example, "Windows(?=95|98|NT|2000)" matches "Windows" in "Windows2000", but does not match "Windows" in "Windows3.1". A lookahead does not consume characters: after a match, the search for the next match starts immediately after the last match, not after the characters that made up the lookahead. |
| (?!pattern) | Negative lookahead. Matches at any position where a string not matching pattern begins. It is non-capturing, so the match is not stored for later use. For example, "Windows(?!95|98|NT|2000)" matches "Windows" in "Windows3.1", but does not match "Windows" in "Windows2000". A lookahead does not consume characters: after a match, the search for the next match starts immediately after the last match, not after the characters that made up the lookahead. |
| x|y | Matches x or y. For example, "z|food" matches "z" or "food". "(z|f)ood" matches "zood" or "food". |
| [xyz] | Character set. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain". |
| [^xyz] | Negated character set. Matches any character that is not enclosed. For example, "[^abc]" matches "p", "l", "i", "n" in "plain". |
| [a-z] | Character range. Matches any character in the specified range. For example, "[a-z]" matches any lowercase letter in the range "a" to "z". |
| [^a-z] | Negated character range. Matches any character that is not in the specified range. For example, "[^a-z]" matches any character that is not in the range "a" to "z". |
| \b | Matches a word boundary, that is, the position between a word and a space. For example, "er\b" matches the "er" in "never", but does not match the "er" in "verb". |
| \B | Matches a non-word boundary. "er\B" matches the "er" in "verb", but does not match the "er" in "never". |
| \cx | Matches the control character indicated by x. For example, \cM matches Control-M, that is, a carriage return. x should be in the range A-Z or a-z; the behavior for other values of x differs between regex engines. |
| \d | Matches a digit. Equivalent to [0-9]. |
| \D | Matches a non-digit character. Equivalent to [^0-9]. |
| \f | Matches a form feed. Equivalent to \x0c and \cL. |
| \n | Matches a newline. Equivalent to \x0a and \cJ. |
| \r | Matches a carriage return. Equivalent to \x0d and \cM. |
| \s | Matches any whitespace character, including space, tab, and form feed. In Java, equivalent to [ \t\n\x0B\f\r]. |
| \S | Matches any non-whitespace character. In Java, equivalent to [^ \t\n\x0B\f\r]. |
| \t | Matches a tab. Equivalent to \x09 and \cI. |
| \v | Matches a vertical tab. Equivalent to \x0b and \cK. Note that in Java 8 and later, \v matches any vertical whitespace character; use \x0B to match only the vertical tab. |
| \w | Matches any word character, including the underscore. Equivalent to [A-Za-z0-9_]. |
| \W | Matches any non-word character. Equivalent to [^A-Za-z0-9_]. |
| \xn | Matches n, where n is a hexadecimal escape code. The hexadecimal escape code must be exactly two digits long. For example, "\x41" matches "A". "\x041" is equivalent to "\x04" followed by "1". This allows ASCII codes to be used in regular expressions. |
| \num | Matches num, where num is a positive integer: a back reference to a captured match. For example, "(.)\1" matches two consecutive identical characters. |
| \n | Identifies either an octal escape code or a back reference. If \n is preceded by at least n captured subexpressions, n is a back reference. Otherwise, if n is an octal digit (0-7), n is an octal escape code. Note that in Java, octal escapes must begin with a zero (\0n, \0nn, \0mnn), and \1 through \9 are always treated as back references. |
| \nm | Identifies either an octal escape code or a back reference. If \nm is preceded by at least nm captured subexpressions, nm is a back reference. If \nm is preceded by at least n captured subexpressions, n is a back reference followed by the literal character m. If neither condition holds, \nm matches the octal value nm, where n and m are octal digits (0-7). |
| \nml | Matches the octal escape code nml when n is an octal digit (0-3) and m and l are octal digits (0-7). |
| \un | Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©). |
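Several of the constructs in the table can be tried directly. The sketch below is only illustrative: the class name SyntaxSamplesDemo and the helper findAll are assumptions made for this example, and the sample inputs were chosen here rather than taken from the table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class name; not part of java.util.regex.
class SyntaxSamplesDemo {

    // Collects every match of regex in input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Greedy vs. non-greedy quantifier
        System.out.println(findAll("o+", "oooo"));   // [oooo]
        System.out.println(findAll("o+?", "oooo"));  // [o, o, o, o]

        // Non-capturing group with alternation
        System.out.println(findAll("industr(?:y|ies)", "industry and industries"));

        // Lookahead: the text inside (?=...) or (?!...) is not part of the match
        System.out.println(findAll("Windows(?=95|98|NT|2000)", "Windows2000 Windows3.1"));
        System.out.println(findAll("Windows(?!95|98|NT|2000)", "Windows2000 Windows3.1"));

        // Back reference: two consecutive identical characters
        System.out.println(findAll("(.)\\1", "bookkeeper")); // [oo, kk, ee]

        // Word boundary: only the "er" at the end of a word
        System.out.println(findAll("er\\b", "never verb"));  // [er]
    }
}
```

Note that each backslash in the table is written twice inside a Java string literal, because the Java compiler processes the string before the regex engine sees it.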
Matcher Index Methods
The index methods provide useful index values that show precisely where a match was found in the input string:
| No. | Method and description |
| 1 | public int start(): Returns the start index of the previous match. |
| 2 | public int start(int group): Returns the start index of the subsequence captured by the given group during the previous match operation. |
| 3 | public int end(): Returns the offset after the last character matched. |
| 4 | public int end(int group): Returns the offset after the last character of the subsequence captured by the given group during the previous match operation. |
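To make the group variants of start and end concrete, here is a small sketch. The class name GroupIndexDemo, the helper groupSpan, and the sample input are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class name; not part of java.util.regex.
class GroupIndexDemo {

    // Returns {start, end} of the given group for the first match.
    // Throws IllegalArgumentException if the regex does not match.
    static int[] groupSpan(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            throw new IllegalArgumentException("no match for: " + regex);
        }
        return new int[] { m.start(group), m.end(group) };
    }

    public static void main(String[] args) {
        String input = "order QT3000 shipped";
        String regex = "([A-Z]+)(\\d+)";
        for (int g = 0; g <= 2; g++) {
            int[] span = groupSpan(regex, input, g);
            // end is exclusive, so substring(start, end) yields the group text
            System.out.println("group " + g + ": [" + span[0] + ", " + span[1] + ") -> "
                    + input.substring(span[0], span[1]));
        }
    }
}
```

Because end returns the offset after the last matched character, the pair can be passed straight to String.substring.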
Study Methods
The study methods examine the input string and return a boolean value indicating whether the pattern was found:
| No. | Method and description |
| 1 | public boolean lookingAt(): Attempts to match the input sequence, starting at the beginning of the region, against the pattern. |
| 2 | public boolean find(): Attempts to find the next subsequence of the input sequence that matches the pattern. |
| 3 | public boolean find(int start): Resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index. |
| 4 | public boolean matches(): Attempts to match the entire region against the pattern. |
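The following sketch shows how find(int start) behaves next to lookingAt and matches. The class name StudyMethodsDemo, the helper firstMatchFrom, and the input string are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class name; not part of java.util.regex.
class StudyMethodsDemo {

    // Returns the start index of the first match at or after fromIndex,
    // or -1 if there is none.
    static int firstMatchFrom(String regex, String input, int fromIndex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find(fromIndex) ? m.start() : -1;
    }

    public static void main(String[] args) {
        String input = "cat cat cat";
        Matcher m = Pattern.compile("cat").matcher(input);

        System.out.println("lookingAt(): " + m.lookingAt()); // true: the input starts with "cat"
        System.out.println("matches(): " + m.matches());     // false: the whole input is not just "cat"

        System.out.println("from 0: " + firstMatchFrom("cat", input, 0)); // 0
        System.out.println("from 1: " + firstMatchFrom("cat", input, 1)); // 4
        System.out.println("from 9: " + firstMatchFrom("cat", input, 9)); // -1
    }
}
```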
Replacement Methods
The replacement methods replace text in the input string:
| No. | Method and description |
| 1 | public Matcher appendReplacement(StringBuffer sb, String replacement): Implements a non-terminal append-and-replace step. |
| 2 | public StringBuffer appendTail(StringBuffer sb): Implements a terminal append-and-replace step. |
| 3 | public String replaceAll(String replacement): Replaces every subsequence of the input sequence that matches the pattern with the given replacement string. |
| 4 | public String replaceFirst(String replacement): Replaces the first subsequence of the input sequence that matches the pattern with the given replacement string. |
| 5 | public static String quoteReplacement(String s): Returns a literal replacement string for the specified string. The returned string can be passed as a literal replacement to the appendReplacement method of the Matcher class. |
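In a replacement string, "$" followed by a digit refers to a capturing group, so text that contains a literal "$" needs quoting. The sketch below illustrates this with quoteReplacement; the class name QuoteReplacementDemo, the helper replaceLiterally, and the sample strings are assumptions for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class name; not part of java.util.regex.
class QuoteReplacementDemo {

    // Replaces every match of regex with the replacement text taken literally.
    static String replaceLiterally(String input, String regex, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // "$1" in a replacement string refers to capturing group 1.
        String wrapped = Pattern.compile("(\\d+)").matcher("cost 5").replaceAll("<$1>");
        System.out.println(wrapped); // cost <5>

        // Quoting makes "$" a literal character in the replacement.
        System.out.println(replaceLiterally("cost AMOUNT", "AMOUNT", "$5")); // cost $5
    }
}
```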
The start and end Methods
The following is an example of counting the number of times the word "cat" appears in the input string:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    // \b marks a word boundary, so only the whole word "cat" is counted
    private static final String REGEX = "\\bcat\\b";
    private static final String INPUT = "cat cat cat cattie cat";

    public static void main(String args[]) {
        Pattern p = Pattern.compile(REGEX);
        Matcher m = p.matcher(INPUT); // get a matcher object
        int count = 0;

        while (m.find()) {
            count++;
            System.out.println("Match number " + count);
            System.out.println("start(): " + m.start());
            System.out.println("end(): " + m.end());
        }
    }
}
Compiling and running the above example produces the following result:
Match number 1
start(): 0
end(): 3
Match number 2
start(): 4
end(): 7
Match number 3
start(): 8
end(): 11
Match number 4
start(): 19
end(): 22
We can see that this example uses word boundaries to ensure that the letters "c", "a", "t" are not merely a substring of a longer word. It also gives some useful information about where in the input string each match occurred.
The start method returns the start index of the subsequence captured by the given group during the previous match operation, and the end method returns the index of the last character matched, plus one.
The matches and lookingAt Methods
The matches and lookingAt methods both attempt to match an input sequence against a pattern. The difference is that matches requires the entire input sequence to match, while lookingAt does not.
Both methods always start at the beginning of the input string.
The following example illustrates this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    private static final String REGEX = "foo";
    private static final String INPUT = "fooooooooooooooooo";
    private static Pattern pattern;
    private static Matcher matcher;

    public static void main(String args[]) {
        pattern = Pattern.compile(REGEX);
        matcher = pattern.matcher(INPUT);

        System.out.println("Current REGEX is: " + REGEX);
        System.out.println("Current INPUT is: " + INPUT);

        // lookingAt only needs a match at the start of the input
        System.out.println("lookingAt(): " + matcher.lookingAt());
        // matches needs the whole input to match
        System.out.println("matches(): " + matcher.matches());
    }
}
Compiling and running the above example produces the following result:
Current REGEX is: foo
Current INPUT is: fooooooooooooooooo
lookingAt(): true
matches(): false
The replaceFirst and replaceAll Methods
The replaceFirst and replaceAll methods replace text that matches a given regular expression. The difference is that replaceFirst replaces only the first match, while replaceAll replaces all matches.
The following example illustrates this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    private static String REGEX = "dog";
    private static String INPUT = "The dog says meow. " + "All dogs say meow.";
    private static String REPLACE = "cat";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        // get a matcher object
        Matcher m = p.matcher(INPUT);
        // replace every occurrence of "dog" with "cat"
        INPUT = m.replaceAll(REPLACE);
        System.out.println(INPUT);
    }
}
Compiling and running the above example produces the following result:
The cat says meow. All cats say meow.
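For comparison, the sketch below applies replaceFirst and replaceAll to the same kind of input so the difference is visible side by side. The class name ReplaceFirstDemo and its two helper methods are illustrative names, not part of the original example.

```java
import java.util.regex.Pattern;

// Illustrative class name; not part of java.util.regex.
class ReplaceFirstDemo {

    // Replaces only the first match of regex in input.
    static String replaceFirstMatch(String input, String regex, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceFirst(replacement);
    }

    // Replaces every match of regex in input.
    static String replaceEveryMatch(String input, String regex, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String input = "The dog says meow. All dogs say meow.";
        System.out.println(replaceFirstMatch(input, "dog", "cat"));
        // The cat says meow. All dogs say meow.
        System.out.println(replaceEveryMatch(input, "dog", "cat"));
        // The cat says meow. All cats say meow.
    }
}
```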
The appendReplacement and appendTail Methods
The Matcher class also provides the appendReplacement and appendTail methods for text replacement.
The following example illustrates this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    private static String REGEX = "a*b";
    private static String INPUT = "aabfooaabfooabfoob";
    private static String REPLACE = "-";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        // get a matcher object
        Matcher m = p.matcher(INPUT);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // append the text before the match, then the replacement
            m.appendReplacement(sb, REPLACE);
        }
        // append whatever follows the last match
        m.appendTail(sb);
        System.out.println(sb.toString());
    }
}
Compiling and running the above example produces the following result:
-foo-foo-foo-
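One reason to use appendReplacement and appendTail instead of replaceAll is that the replacement can be computed separately for each match. The following is a hedged sketch of that idea; the class name AppendReplacementDemo, the helper doubleNumbers, and the sample input are assumptions made here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class name; not part of java.util.regex.
class AppendReplacementDemo {

    // Doubles every integer found in the input and leaves other text unchanged.
    static String doubleNumbers(String input) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int doubled = Integer.parseInt(m.group()) * 2;
            // Appends the text since the previous match, then the computed replacement.
            m.appendReplacement(sb, Integer.toString(doubled));
        }
        // Appends the remaining text after the last match.
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("3 apples and 12 pears")); // 6 apples and 24 pears
    }
}
```

The replacement here contains only digits, so no quoting is needed; if a computed replacement could contain "$" or "\", wrapping it in Matcher.quoteReplacement would be the safer choice.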
PatternSyntaxException Class Methods
PatternSyntaxException is an unchecked exception class that indicates a syntax error in a regular expression pattern.
The PatternSyntaxException class provides the following methods to help determine what went wrong:
| No. | Method and description |
| 1 | public String getDescription(): Retrieves the description of the error. |
| 2 | public int getIndex(): Retrieves the error index. |
| 3 | public String getPattern(): Retrieves the erroneous regular expression pattern. |
| 4 | public String getMessage(): Returns a multi-line string containing the description of the syntax error and its index, the erroneous regular expression pattern, and a visual indication of the error index within the pattern. |
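A short sketch of how these accessors might be used when a pattern comes from untrusted input follows. The class name PatternSyntaxDemo, the helper describeSyntaxError, and the sample patterns are illustrative assumptions; the exact description text depends on the Java runtime, so it is not shown here.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Illustrative class name; not part of java.util.regex.
class PatternSyntaxDemo {

    // Returns null if regex compiles, otherwise a short report
    // built from the exception's accessor methods.
    static String describeSyntaxError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return "description=" + e.getDescription()
                    + ", index=" + e.getIndex()
                    + ", pattern=" + e.getPattern();
        }
    }

    public static void main(String[] args) {
        // An unclosed group is a syntax error.
        System.out.println(describeSyntaxError("(unclosed"));
        // A valid pattern produces no report.
        System.out.println(describeSyntaxError("a*b")); // null
    }
}
```

Because PatternSyntaxException is unchecked, the try/catch is optional for the compiler, but it is useful whenever the pattern is not a fixed literal.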
Http://www.w3cschool.cc/java/java-regular-expressions.html