Java Regular Expressions
In program development, strings constantly need to be matched, searched, replaced, and validated. These tasks can become complicated, and solving them with hand-written string-handling code often wastes the programmer's time and energy. Regular expressions are one of the main tools for this kind of work.
A regular expression is a notation for pattern matching and replacement. It is composed of ordinary characters (such as the letters a to z) and special characters (metacharacters), and it describes one or more strings to look for when a body of text is searched. The regular expression acts as a template that is matched against the searched string.
I. What is a regular expression?
1. Definition: A regular expression is a text pattern that can be used for matching and replacement. It consists of ordinary characters (such as the letters a to z) and special characters (metacharacters), and it describes one or more strings to be matched when a body of text is searched.
2. Purpose:
String matching (checking whether a string fits a pattern)
String search
String replacement
String splitting
For example:
Extract the email addresses from a web page
Check whether an IP address is valid
Extract the links from a web page
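As an illustration of the first example, here is a minimal sketch that collects email-like strings from a piece of text. The class name EmailExtractor and the sample text are made up for this sketch, and the pattern is a deliberately loose one adapted from the email test later in this article; it is not a complete address validator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original article.
class EmailExtractor {
    // Loose pattern: word characters, dots or hyphens, then @, a host, a dot and a suffix.
    private static final Pattern EMAIL = Pattern.compile("[\\w.-]+@[\\w.-]+\\.\\w+");

    // Returns every substring of text that fits the loose email pattern, in order.
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(extract("Write to alice@example.com or bob.smith@mail.example.org today"));
    }
}
```

The loop relies on find(), which moves on to the next match each time it is called.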
3. Classes for processing regular expressions in Java:
java.lang.String
java.util.regex.Pattern: the pattern class. It represents the pattern that strings are matched against. A Pattern is compiled once, so reusing it is more efficient than recompiling the expression each time.
java.util.regex.Matcher: the matching class. It holds the results of matching a pattern against a string, and there may be many results.
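The point about compilation can be sketched as follows: the pattern is compiled once into a constant and reused for every check. The class and method names are assumptions made for this sketch.

```java
import java.util.regex.Pattern;

// Hypothetical example class, not part of the original article.
class ThreeLetterCheck {
    // Compiled once and reused; Pattern instances are immutable and can be shared.
    private static final Pattern THREE_LOWERCASE = Pattern.compile("[a-z]{3}");

    // True when s consists of exactly three lowercase letters a-z.
    static boolean isThreeLowercase(String s) {
        return THREE_LOWERCASE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isThreeLowercase("abc"));
        System.out.println(isThreeLowercase("abcd"));
        // String.matches gives the same answer but compiles the expression on every call.
        System.out.println("abc".matches("[a-z]{3}"));
    }
}
```

For a check that runs only once, String.matches is usually fine; the constant pays off when the same expression is applied many times.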
4. A small program gives a simple introduction to regular expressions.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        // matches() determines whether the whole String matches an expression; "." represents any character
        p("abc".matches("..."));
        // Replace every digit in "a2389a" with *; \d means a digit 0-9
        p("a2389a".replaceAll("\\d", "*"));
        // Compile a pattern for any three letters a-z; a compiled pattern can be reused, which speeds up matching
        Pattern p = Pattern.compile("[a-z]{3}");
        // Match and keep the result in the Matcher object
        Matcher m = p.matcher("abc");
        p(m.matches());
        // The three lines above can be replaced by this single line
        p("abc".matches("[a-z]{3}"));
    }

    public static void p(Object o) {
        System.out.println(o);
    }
}
The following is the result.
true
a****a
true
true
The following experiments illustrate the matching rules of regular expressions. The quantifiers used here are greedy.
.        Any character
a?       a, once or not at all
a*       a, zero or more times
a+       a, one or more times
a{n}     a, exactly n times
a{n,}    a, at least n times
a{n,m}   a, at least n times but not more than m times
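What "greedy" means is easiest to see with find(): a greedy quantifier takes as many characters as it is allowed to. The sketch below uses a hypothetical helper, findAll, that lists every match. The second call adds a trailing ? to make the quantifier reluctant, which is shown only as a contrast and is not covered further here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original article.
class GreedyDemo {
    // Collects every successive match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Greedy: the first match takes three a's, the second takes the one that is left.
        System.out.println(findAll("a{1,3}", "aaaa"));  // [aaa, a]
        // Reluctant: each match takes as few a's as possible.
        System.out.println(findAll("a{1,3}?", "aaaa")); // [a, a, a, a]
    }
}
```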
// Getting to know . * + ?
p("a".matches(".")); // true
p("aa".matches("aa")); // true
p("aaaa".matches("a*")); // true
p("aaaa".matches("a+")); // true
p("".matches("a*")); // true
p("aaaa".matches("a?")); // false
p("".matches("a?")); // true
p("a".matches("a?")); // true
p("1232435463685899".matches("\\d{3,100}")); // true
p("192.168.0.aaa".matches("\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}")); // false
p("192".matches("[0-2][0-9][0-9]")); // true
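The dotted pattern above checks only the shape of an IP address: \d{1,3} also accepts values such as 999. A minimal sketch of a fuller check, assuming IPv4 addresses in plain dotted decimal form, captures the four parts in groups and tests each value separately. The class name Ipv4Check is made up for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original article.
class Ipv4Check {
    // Four groups of one to three digits separated by literal dots.
    private static final Pattern DOTTED_QUAD =
            Pattern.compile("(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})");

    // True when s has the dotted form and every part is in the range 0-255.
    static boolean isValid(String s) {
        Matcher m = DOTTED_QUAD.matcher(s);
        if (!m.matches()) {
            return false;
        }
        for (int g = 1; g <= m.groupCount(); g++) {
            if (Integer.parseInt(m.group(g)) > 255) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid("192.168.0.1"));
        System.out.println(isValid("192.168.0.aaa"));
        System.out.println(isValid("999.1.1.1"));
    }
}
```

Doing the range test in Java keeps the expression short; the same rule can be written purely as a regular expression, but it becomes much harder to read.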
[abc]            a, b, or c (simple class)
[^abc]           Any character except a, b, or c (negation)
[a-zA-Z]         a to z or A to Z, inclusive (range)
[a-d[m-p]]       a to d, or m to p: [a-dm-p] (union)
[a-z&&[def]]     d, e, or f (intersection)
[a-z&&[^bc]]     a to z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]    a to z, but not m to p: [a-lq-z] (subtraction)
// Range
p("a".matches("[abc]")); // true
p("a".matches("[^abc]")); // false
p("A".matches("[a-zA-Z]")); // true
p("A".matches("[a-z]|[A-Z]")); // true
p("A".matches("[a-z[A-Z]]")); // true
p("R".matches("[A-Z&&[RFG]]")); // true
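Intersection with a negated class is a compact way to say "everything in this range except these characters". As a small sketch with a made-up class name, the following removes lowercase consonants by describing them as the letters a to z minus the vowels.

```java
// Hypothetical demo class, not part of the original article.
class CharClassDemo {
    // [a-z&&[^aeiou]] means: lowercase letters a-z, except the vowels.
    static String stripConsonants(String s) {
        return s.replaceAll("[a-z&&[^aeiou]]", "");
    }

    public static void main(String[] args) {
        System.out.println(stripConsonants("regular")); // eua
    }
}
```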
\d    A digit: [0-9]
\D    A non-digit: [^0-9]
\s    A whitespace character: [ \t\n\x0B\f\r]
\S    A non-whitespace character: [^\s]
\w    A word character: [a-zA-Z_0-9]
\W    A non-word character: [^\w]
// Getting to know \s \w \d \
p(" \n\r\t".matches("\\s(4)")); // false: (4) is a group that matches the literal 4; \\s{4} would give true
p(" ".matches("\\S")); // false
p("a_8".matches("\\w(3)")); // false: (3) is a group that matches the literal 3; \\w{3} would give true
p("abc888&^%".matches("[a-z]{1,3}\\d+[&^#%]+")); // true
p("\\".matches("\\\\")); // true
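The predefined classes are also handy for string splitting, one of the purposes listed at the start. A minimal sketch, with a hypothetical class name, splits a line on any run of whitespace:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of the original article.
class WhitespaceSplit {
    // Splits a line into words on runs of spaces, tabs, or line breaks.
    static List<String> words(String line) {
        // trim() first, so leading whitespace does not produce an empty first element.
        return Arrays.asList(line.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(words("  java \t regular\nexpressions "));
    }
}
```

As in the experiments above, the backslash has to be doubled inside a Java string literal, so the expression \s+ is written "\\s+".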
Boundary
^     The beginning of a line
$     The end of a line
\b    A word boundary
\B    A non-word boundary
\A    The beginning of the input
\G    The end of the previous match
\Z    The end of the input, except for the final terminator (if any)
\z    The end of the input
// Boundary match
p("hello sir".matches("^h.*")); // true
p("hello sir".matches(".*ir$")); // true
p("hello sir".matches("^h[a-z]{1,3}o\\b.*")); // true
p("hellosir".matches("^h[a-z]{1,3}o\\b.*")); // false
// Blank line: starts with zero or more whitespace characters other than a line break, and ends with a line break
p(" \n".matches("^[\\s&&[^\\n]]*\\n$")); // true
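A common use of \b is counting whole words only, so that "cat" is not found inside "concat" or "category". The sketch below assumes that the word to look for begins and ends with a word character; the class name and sample text are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original article.
class WholeWordCount {
    // Counts occurrences of word that have a word boundary on both sides.
    static int count(String word, String text) {
        // Pattern.quote makes the word literal, in case it contains metacharacters.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        Matcher m = p.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count("cat", "cat concat cat. category")); // 2
    }
}
```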
Method analysis
matches(): matches the entire string
find(): finds the next matching substring
lookingAt(): always starts from the beginning of the string, but does not need to match all of it
// Email
p("asdsfdfagf@adsdsfd.com".matches("[\\w[.-]]+@[\\w[.-]]+\\.[\\w]+")); // true
// matches() find() lookingAt()
Pattern p = Pattern.compile("\\d{3,5}");
Matcher m = p.matcher("123-34345-234-00");
// matches() tries to match the entire "123-34345-234-00". It stops at the first "-", which does not match,
// but it does not give back the characters it has already consumed.
p(m.matches()); // false
// reset() gives the consumed characters back
m.reset();
// 1: With p(m.matches()) above but without m.reset(), find() searches from "-34345-234-00" onward.
//    It finds "34345" and "234", and the third and fourth calls find nothing: true, true, false, false.
// 2: With both p(m.matches()) and m.reset(), find() searches from the start of "123-34345-234-00":
//    true, true, true, false.
p(m.find()); // true
p(m.start() + "---" + m.end()); // 0---3
p(m.find()); // true
p(m.start() + "---" + m.end()); // 4---9
p(m.find()); // true
p(m.start() + "---" + m.end()); // 10---13
p(m.find()); // false
// After a failed find(), calling start() or end() throws java.lang.IllegalStateException.
// p(m.start() + "---" + m.end());
p(m.lookingAt()); // true
p(m.lookingAt()); // true
p(m.lookingAt()); // true
p(m.lookingAt()); // true
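To compare the three methods without the shared matcher state getting in the way, the sketch below creates a fresh Matcher for every call. The class and method names are assumptions for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original article.
class MatchModes {
    private static final Pattern DIGITS = Pattern.compile("\\d{3,5}");

    // The whole string must be three to five digits.
    static boolean whole(String s) {
        return DIGITS.matcher(s).matches();
    }

    // The string must start with three to five digits.
    static boolean prefix(String s) {
        return DIGITS.matcher(s).lookingAt();
    }

    // Three to five digits must occur somewhere in the string.
    static boolean anywhere(String s) {
        return DIGITS.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"12345", "123-34345", "ab-34345"}) {
            System.out.println(s + ": " + whole(s) + " " + prefix(s) + " " + anywhere(s));
        }
    }
}
```

Only "12345" passes all three checks. "123-34345" fails matches() but passes the other two, and "ab-34345" passes find() alone.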
String replacement: the following approach makes string replacement very flexible.
// String replacement
// Pattern.CASE_INSENSITIVE makes the match case insensitive
Pattern p = Pattern.compile("java", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher("java Java jAva ILoveJavA youHateJAVA adsdsfd");
// Buffer that collects the result
StringBuffer buf = new StringBuffer();
// Counter used to tell odd matches from even ones
int i = 0;
while (m.find()) {
    i++;
    if (i % 2 == 0) {
        m.appendReplacement(buf, "java");
    } else {
        m.appendReplacement(buf, "JAVA");
    }
}
// Without this call, the trailing text " adsdsfd" would be dropped.
m.appendTail(buf);
p(buf);
Result printing:
JAVA java JAVA ILovejava youHateJAVA adsdsfd
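One detail to watch with appendReplacement is that $ and \ have a special meaning in the replacement string: $1, for example, refers to group 1. When the replacement text comes from elsewhere, Matcher.quoteReplacement makes it literal. A minimal sketch follows; the placeholder word "amount" and the class name are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original article.
class LiteralReplacement {
    private static final Pattern PLACEHOLDER =
            Pattern.compile("amount", Pattern.CASE_INSENSITIVE);

    // Replaces every occurrence of the placeholder with value, taken literally.
    static String fill(String template, String value) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer buf = new StringBuffer();
        while (m.find()) {
            // Without quoteReplacement, "$5" would be read as a reference to group 5.
            m.appendReplacement(buf, Matcher.quoteReplacement(value));
        }
        m.appendTail(buf);
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.println(fill("Total: amount (was AMOUNT)", "$5"));
    }
}
```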
Group
// Groups are written with ()
Pattern p = Pattern.compile("(\\d{3,5})([a-z]{2})");
String s = "123aa-34345bb-234cc-00";
Matcher m = p.matcher(s);
p(m.groupCount()); // 2 groups
while (m.find()) {
    p(m.group()); // the whole match: digits and letters
    // p(m.group(1)); // the digits only
    // p(m.group(2)); // the letters only
}
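Groups can also be given names, which tends to read better than numbers once a pattern has several of them. The sketch below assumes Java 7 or later, where the (?<name>...) syntax is available. The group names digits and letters and the class name are chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original article.
class GroupPairs {
    // Same pattern as above, with a name attached to each group.
    private static final Pattern PAIR =
            Pattern.compile("(?<digits>\\d{3,5})(?<letters>[a-z]{2})");

    // Returns each match as "digits:letters".
    static List<String> pairs(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = PAIR.matcher(s);
        while (m.find()) {
            out.add(m.group("digits") + ":" + m.group("letters"));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(pairs("123aa-34345bb-234cc-00")); // [123:aa, 34345:bb, 234:cc]
    }
}
```

The trailing "00" is not reported because it has no two letters after it and is shorter than three digits.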
II. Simple use of regular expressions