In-depth analysis of Java Regular Expressions

Source: Internet
Author: User
Tags: character classes

1. Regular expressions (regex) are a string-processing tool that is popular on Unix and in Perl; in Java they can be used instead of StringTokenizer.
They are mainly used for matching, searching, and replacing strings. For example, a regular expression can check whether a string is an IP address (each part less than 256), pull large numbers of e-mail addresses from a web page (as spammers do), or pull the links from a web page. The java.util.regex package contains Pattern (a compiled regular expression) and Matcher (the result of matching a string against a Pattern).

/*
 * matches tells whether the string matches the given regular expression (which is itself a string).
 */
System.out.println("abc".matches("...")); // true: each "." stands for one character

/*
 * Replace every digit in the string with "-". Without regex you would have to test each character with charAt.
 * \d (written "\\d" in a Java string literal) means any digit, the same as [0-9];
 * \D (written "\\D") means any non-digit, the same as [^0-9].
 */
System.out.println("ab54564654sbg31646bshj".replaceAll("[0-9]", "-")); // each digit becomes "-"

2. Pattern and Matcher
/*
 * Pattern.compile compiles the given regular expression into a pattern. Compilation takes time, so compile once and reuse the result.
 * {3} means exactly three times.
 * X{n}    X, exactly n times
 * X{n,}   X, at least n times
 * X{n,m}  X, at least n but not more than m times
 */
Pattern p = Pattern.compile("[a-z]{3}");
Matcher m = p.matcher("ggs"); // creates a Matcher that matches the given input against this pattern; internally a finite-state automaton (as in compiler theory) is built
// The input accepted by matcher and matches is actually a CharSequence (an interface); String implements it, so polymorphism applies
System.out.println(m.matches()); // true; "ggss" would not match
// You can write "ggs".matches("[a-z]{3}") directly, but the version above is more efficient when the pattern is reused, and Pattern and Matcher offer much more functionality
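
The point about compiling once can be sketched as follows. This is a minimal illustration, not code from the original article: the class name PrecompiledPatternDemo and the helper isThreeLowercase are made up for the example. It keeps one compiled Pattern in a constant and creates a new Matcher per call.

```java
import java.util.regex.Pattern;

class PrecompiledPatternDemo {
    // Compiled once and shared; Pattern objects are immutable and safe to reuse.
    private static final Pattern THREE_LOWER = Pattern.compile("[a-z]{3}");

    // Hypothetical helper: true if the whole input is exactly three lowercase letters.
    static boolean isThreeLowercase(CharSequence input) {
        // Each call creates a new Matcher but skips the compile step.
        return THREE_LOWER.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"ggs", "ggss", "GGS"};
        for (String s : samples) {
            System.out.println(s + " -> " + isThreeLowercase(s));
        }
        // A StringBuilder is also a CharSequence, so no conversion to String is needed.
        System.out.println(isThreeLowercase(new StringBuilder("abc")));
    }
}
```

By contrast, String.matches compiles its regular expression on every call, which is why the precompiled form tends to pay off inside loops.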

3. In regex, the characters . * + ? are called metacharacters. (Eclipse tip: Ctrl+Shift+"/" comments out the selected code, and Ctrl+Shift+"\" removes the comment.)
"a".matches("."); // true, "." stands for any single character
"aa".matches("aa"); // true, an ordinary string can also be used as a regular expression
/*
 * true: "*" means zero or more occurrences of the preceding element, so every character must be "a";
 * otherwise the result is false. In effect this checks whether the string consists of one repeated character.
 */
"aaaa".matches("a*");
"".matches("a*"); // true
"aaa".matches("a?"); // false: "?" means zero or one occurrence, and there are three
"".matches("a?"); // true
"a".matches("a?"); // true
"544848154564113".matches("\\d{3,100}"); // true
// This is the simplest IP address check; it cannot reject parts greater than 255
"192.168.0.aaa".matches("\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}"); // false: "aaa" is not numeric
"192".matches("[0-2][0-9][0-9]"); // true

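The simple IP check above accepts parts such as 999. One possible way to enforce the 0-255 range inside the regular expression itself is sketched below; the class name Ipv4Check, the OCTET constant, and the decision to reject leading zeros are assumptions made for this example, not part of the original article.

```java
import java.util.regex.Pattern;

class Ipv4Check {
    // One part of the address: 250-255, 200-249, 100-199, or 0-99 without a leading zero.
    private static final String OCTET = "(?:25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)";

    // Four parts separated by literal dots.
    private static final Pattern IPV4 = Pattern.compile(OCTET + "(?:\\." + OCTET + "){3}");

    // Hypothetical helper: true only if the whole string is a dotted quad in range.
    static boolean isIpv4(String s) {
        return IPV4.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"192.168.0.1", "255.255.255.255", "256.1.1.1", "192.168.0.aaa", "1.2.3"};
        for (String s : samples) {
            System.out.println(s + " -> " + isIpv4(s));
        }
    }
}
```

An alternative design is to keep the simple \d{1,3} pattern, capture each part, and compare it with 255 using Integer.parseInt, which is often easier to read.
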
4. [abc] matches any one of the characters a, b, or c. [^abc] matches any single character other than a, b, or c (it must match exactly one character, so an empty string returns false). [a-zA-Z] is equivalent to "[a-z]|[A-Z]": any uppercase or lowercase letter. [A-Z&&[ABS]] is an intersection: any of A, B, or S that is also an uppercase letter.
// Inside a character class a single "&" is a literal character and only "&&" means intersection; "|" and "||" make no difference.
System.out.println("C".matches("[A-Z&&[ABS]]")); // false
System.out.println("C".matches("[A-Z&[ABS]]")); // true
System.out.println("A".matches("[A-Z&&[ABS]]")); // true
System.out.println("A".matches("[A-Z&[ABS]]")); // true
System.out.println("C".matches("[A-Z|[ABS]]")); // true
System.out.println("C".matches("[A-Z||[ABS]]")); // true
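
The difference between union, intersection, and subtraction in character classes can be tried out with a short sketch like the one below. The class name CharClassDemo and the helper test are invented for the example.

```java
class CharClassDemo {
    // Hypothetical helper: does the whole input match the regex?
    static boolean test(String input, String regex) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        // Intersection: uppercase letters that are also one of A, B, S.
        System.out.println(test("S", "[A-Z&&[ABS]]")); // true
        System.out.println(test("C", "[A-Z&&[ABS]]")); // false

        // A single & is just another member of the class, so this is a union.
        System.out.println(test("&", "[A-Z&[ABS]]"));  // true

        // Subtraction written as an intersection with a negated class:
        // lowercase letters except a, b, and c.
        System.out.println(test("d", "[a-z&&[^abc]]")); // true
        System.out.println(test("a", "[a-z&&[^abc]]")); // false

        // A negated class still has to consume one character.
        System.out.println(test("", "[^abc]"));         // false
    }
}
```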

5. \w word character: [a-zA-Z_0-9], useful for matching user names; \s whitespace character: [ \t\n\x0B\f\r]; \S non-whitespace character: [^\s]; \W non-word character: [^\w].
" \n\t\r".matches("\\s{4}"); // true
" ".matches("\\S"); // false
"a_8".matches("\\w{3}"); // true
// "+" means one or more times
"abc888&^%".matches("[a-z]{1,3}\\d+[&^#%]+"); // true
/*
 * The string to match is a single backslash. It cannot be written as "\", because the backslash
 * would escape the closing quote; the literal would then have no closing quote and cause a compile error (CE).
 * The regex cannot be written as "\\" either: that compiles but fails at run time,
 * because the regex engine also needs its backslash escaped. It must be written "\\\\".
 */
System.out.println("\\".matches("\\\\")); // true
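
The two layers of escaping (first the Java string literal, then the regex engine) can be made visible with a small sketch. The class name BackslashDemo, the helper isDrivePath, and the sample path are assumptions chosen only for illustration.

```java
class BackslashDemo {
    // Hypothetical helper: drive letter, colon, one literal backslash, then word characters.
    // The six backslashes in the literal become three in the regex:
    // an escaped backslash followed by the word-character class.
    static boolean isDrivePath(String s) {
        return s.matches("[A-Z]:\\\\\\w+");
    }

    public static void main(String[] args) {
        String oneBackslash = "\\";
        System.out.println(oneBackslash.length());          // 1: the literal holds a single character
        System.out.println("\\\\".length());                // 2: what the regex engine receives
        System.out.println(oneBackslash.matches("\\\\"));   // true
        System.out.println(isDrivePath("C:\\temp"));        // true
        System.out.println(isDrivePath("C:/temp"));         // false
    }
}
```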

6. POSIX character classes (US-ASCII only)
\p{Lower} lowercase letter: [a-z]
\p{Upper} uppercase letter: [A-Z]
\p{ASCII} all ASCII: [\x00-\x7F]
\p{Alpha} letter: [\p{Lower}\p{Upper}]
\p{Digit} decimal digit: [0-9]
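
A brief sketch of the POSIX classes in use follows. The class name PosixClassDemo and the helper isCapitalizedAsciiWord are made up for the example; the last line illustrates the "US-ASCII only" restriction with an accented letter.

```java
class PosixClassDemo {
    // Hypothetical helper: one ASCII uppercase letter followed by ASCII lowercase letters.
    static boolean isCapitalizedAsciiWord(String s) {
        return s.matches("\\p{Upper}\\p{Lower}+");
    }

    public static void main(String[] args) {
        System.out.println(isCapitalizedAsciiWord("Java"));   // true
        System.out.println(isCapitalizedAsciiWord("java"));   // false
        System.out.println("2024".matches("\\p{Digit}+"));    // true
        System.out.println("abc".matches("\\p{Alpha}+"));     // true
        // U+00E9 (e with acute accent) is a letter, but not a US-ASCII one.
        System.out.println("\u00e9".matches("\\p{Alpha}"));   // false
    }
}
```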

7. Boundary matchers
^ beginning of a line
$ end of a line
\b word boundary
\B non-word boundary
\A beginning of the input
\G end of the previous match
\Z end of the input, except for the final line terminator (if any)
\z end of the input
"hello world".matches("^h.*"); // true: ^ is the start of the line
"hello world".matches(".*ld$"); // true: $ is the end of the line
"hello world".matches("^h[a-z]{1,3}o\\b.*"); // true: \b is a word boundary
"helloworld".matches("^h[a-z]{1,3}o\\b.*"); // false: there is no word boundary after "hello"

" \n".matches("^[\\s&&[^\\n]]*\\n$"); // true: a blank line, that is, only whitespace other than a newline, followed by the newline

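Word boundaries are mostly useful with find rather than matches. The sketch below counts whole-word occurrences; the class name WordBoundaryDemo and the helper countWholeWord are illustrative, and Pattern.quote is used so that the word is treated literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Hypothetical helper: counts occurrences of word that are not part of a longer word.
    static int countWholeWord(String word, String text) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        Matcher m = p.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "concat" and "cats" are skipped because "cat" is not bounded on both sides there.
        System.out.println(countWholeWord("cat", "cat concat cats cat.")); // 2
        // The blank-line test from the text, with a tab as well as a space.
        System.out.println(" \t\n".matches("^[\\s&&[^\\n]]*\\n$"));       // true
    }
}
```
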
8. Inside a find loop you can also call m.start() and m.end() to get the start and end positions of the match just found. If there is no current match, they throw an IllegalStateException.
Pattern p = Pattern.compile("\\d{3,5}");
String s = "133-34444-333-00";
Matcher m = p.matcher(s);
System.out.println(m.matches()); // false: matches tries to match the entire string
m.reset();
/*
 * With reset called first, the four find calls below print true, true, true, false.
 * Without it, the penultimate find also prints false.
 * The reason is as follows:
 * matches reads "133", fails at the first "-", and does not give back the characters it has consumed,
 * so the first find starts at 34444 and the second finds 333, because find looks for the next matching subsequence.
 * reset makes the matcher give back the characters that matches consumed.
 * In short, call reset between matches and find, because the two affect each other.
 */
System.out.println(m.find());
System.out.println(m.find());
System.out.println(m.find()); // tries to find the next subsequence of the input that matches the pattern
System.out.println(m.find());
/*
 * lookingAt tries to match the input against the pattern, starting at the beginning of the region.
 * The author of Thinking in Java sharply criticized this method name, because it does not say where matching starts.
 * All four calls below print true, because each one starts from the beginning.
 */
System.out.println(m.lookingAt());
System.out.println(m.lookingAt());
System.out.println(m.lookingAt());
System.out.println(m.lookingAt());
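
The use of start() and end() inside a find loop can be sketched as follows. The class name FindPositionsDemo and the helper findWithPositions are invented for this example; the input is the same string used above, and the matcher is freshly created so no reset is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindPositionsDemo {
    // Hypothetical helper: each match with its half-open index range [start,end).
    static List<String> findWithPositions(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group() + " [" + m.start() + "," + m.end() + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [133 [0,3), 34444 [4,9), 333 [10,13)]
        System.out.println(findWithPositions("\\d{3,5}", "133-34444-333-00"));

        // Asking for a position before any successful match throws IllegalStateException.
        Matcher fresh = Pattern.compile("\\d{3,5}").matcher("133-34444-333-00");
        try {
            fresh.start();
        } catch (IllegalStateException e) {
            System.out.println("no match available yet");
        }
    }
}
```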

9. String replacement
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestRegexReplacement {

    public static void main(String[] args) {

        Pattern p = Pattern.compile("java", Pattern.CASE_INSENSITIVE); // the second argument means "case insensitive"
        Matcher m = p.matcher("Java java hxsyl Ilovejava java JaVaAcmer");
        while (m.find()) {
            System.out.println(m.group()); // m.group() returns each occurrence of java, ignoring case
        }

        String s = m.replaceAll("Java"); // String has a replaceAll method as well
        System.out.println(s);

        m.reset(); // required, because the earlier calls have moved the matcher's position
        StringBuffer sb = new StringBuffer();
        int i = 0;
        /*
         * Replace the odd-numbered occurrences of java with "Java" and the even-numbered ones with "java".
         */
        while (m.find()) {
            i++;
            // (i & 1) is an int and cannot be used as a condition directly; compare it with 1 to get a boolean
            if ((i & 1) == 1) {
                m.appendReplacement(sb, "Java");
            } else {
                m.appendReplacement(sb, "java");
            }
        }

        m.appendTail(sb); // append the text that follows the last java
        System.out.println(sb); // without the reset, only Acmer is output
    }
}
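
For simple cases the find and appendReplacement loop is not needed. The sketch below shows replaceAll and replaceFirst on a Matcher, plus one pitfall: "$" in the replacement text refers to a group, so a literal replacement should go through Matcher.quoteReplacement. The class name ReplaceDemo and its helper names are assumptions for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceDemo {
    private static final Pattern JAVA = Pattern.compile("java", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: normalizes every spelling of java to "Java".
    static String normalizeAll(String input) {
        return JAVA.matcher(input).replaceAll("Java");
    }

    // Hypothetical helper: changes only the first occurrence.
    static String normalizeFirst(String input) {
        return JAVA.matcher(input).replaceFirst("Java");
    }

    // Hypothetical helper: inserts the replacement literally, even if it contains $ or \.
    static String replaceLiterally(String input, String replacement) {
        return JAVA.matcher(input).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(normalizeAll("Java java hxsyl Ilovejava java JaVaAcmer"));
        System.out.println(normalizeFirst("i love java and JAVA"));
        System.out.println(replaceLiterally("java costs", "$5"));
    }
}
```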

10. Groups
/*
 * Each pair of parentheses forms a group. Groups are numbered by their left parentheses:
 * the first left parenthesis opens group 1. The whole match (group 0) is not counted.
 */
Pattern p = Pattern.compile("(\\d{3,5})([a-z]{2})");
String s = "123aaa-77878bb-646dd-00";
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.println(m.group());
    System.out.println(m.group(1)); // the digits of each match
    System.out.println(m.group(2)); // the letters of each match
}
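
Numbering by left parenthesis matters most when groups are nested. A small sketch, with an invented class name GroupNumberingDemo and helper groupsOfFirstMatch, lists every group of the first match so the order can be seen directly.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // Hypothetical helper: group 0 (whole match) followed by groups 1..groupCount of the first match.
    static String[] groupsOfFirstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // The outer parentheses open first, so they are group 1;
        // the digits are group 2 and the letters group 3.
        String[] groups = groupsOfFirstMatch("((\\d{3,5})([a-z]{2}))", "123aaa-77878bb-646dd-00");
        System.out.println(Arrays.toString(groups)); // [123aa, 123aa, 123, aa]
    }
}
```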

11. Capturing e-mail addresses from a web page
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/*
 * Eclipse tip: if you need a method that does not exist yet, first write the call with the method name,
 * then press Ctrl+1 to list the suggestions and let the IDE create the method.
 */
public class EmailSpider {

    public static void main(String[] args) {
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader("F:\\regex.html"))) {
            String line;
            while ((line = br.readLine()) != null) {
                solve(line);
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private static void solve(String line) {
        // A regular expression that does not do what you intended causes no compile error, because it is just a string.
        Pattern p = Pattern.compile("[\\w[.-]]+@[\\w[.-]]+\\.[\\w]+");
        Matcher m = p.matcher(line);

        while (m.find()) {
            System.out.println(m.group());
        }
    }

}

12. Code statistics
/*
 * Count the blank lines, comment lines, and code lines in the source files.
 * startsWith and endsWith in String can be used as well as regular expressions.
 * If a project manager uses this to measure work, it should also count the characters on each line
 * and check for lines that hold only "{" or ";", to discourage padding.
 */
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class CoderCount {

    static long normalLines = 0;
    static long commentLines = 0;
    static long whiteLines = 0;

    public static void main(String[] args) {
        File f = new File("D:\\share\\src");
        File[] codeFiles = f.listFiles();
        if (codeFiles == null) {
            return; // the path is not a directory or cannot be read
        }
        for (File child : codeFiles) {
            if (child.getName().matches(".*\\.java$")) {
                solve(child);
            }
        }

        System.out.println("normalLines:" + normalLines);
        System.out.println("commentLines:" + commentLines);
        System.out.println("whiteLines:" + whiteLines);

    }

    private static void solve(File f) {
        BufferedReader br = null;
        boolean comment = false;
        try {
            br = new BufferedReader(new FileReader(f));
            String line = "";
            while ((line = br.readLine()) != null) {
                /*
                 * Some comment lines have a tab in front of them, so trim first.
                 * trim cannot be chained directly onto readLine in the loop condition:
                 * readLine returns null at the end of the file, which would cause a NullPointerException.
                 */
                line = line.trim();
                // readLine returns the line without its line terminator
                if (line.matches("^[\\s&&[^\\n]]*$")) {
                    whiteLines++;
                } else if (line.startsWith("/*") && !line.endsWith("*/")) {
                    commentLines++;
                    comment = true;
                } else if (line.startsWith("/*") && line.endsWith("*/")) {
                    commentLines++;
                } else if (comment) {
                    commentLines++;
                    if (line.endsWith("*/")) {
                        comment = false;
                    }
                } else if (line.startsWith("//")) {
                    commentLines++;
                } else {
                    normalLines++;
                }
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (br != null) {
                try {
                    br.close();
                    br = null;
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }

}

13. Quantifiers
The quantifiers include ? * + and {n,m}. By default they are greedy; there are also reluctant and possessive variants.
// The group is added only to make the pattern easier to read.
Pattern p = Pattern.compile("(.{3,10})[0-9]");
String s = "aaaa5bbbb6"; // the length is 10
Matcher m = p.matcher(s);
/*
 * As written, the output is 0----10. The default is greedy: the quantifier first swallows all 10 characters,
 * finds that nothing is left for [0-9], gives one character back, and then matches.
 * With Pattern.compile("(.{3,10}?)[0-9]") it is reluctant: it swallows three characters first and,
 * when that does not match, swallows one more at a time until it matches, so the output is 0----5.
 * With Pattern.compile("(.{3,10}+)[0-9]") it is possessive: it also swallows all 10 characters
 * but never gives any back, so there is no match.
 * Possessive quantifiers are mainly used where efficiency matters, at the risk of missing matches.
 */
if (m.find()) {
    System.out.println(m.start() + "----" + m.end());
} else {
    System.out.println("Not match!");
}
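
The three quantifier modes can be compared side by side with a sketch like the following. The class name QuantifierModesDemo and the helper firstSpan are made up for the example; the input string is the one used above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierModesDemo {
    // Hypothetical helper: "start-end" of the first match, or "no match".
    static String firstSpan(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() + "-" + m.end() : "no match";
    }

    public static void main(String[] args) {
        String s = "aaaa5bbbb6";
        System.out.println("greedy:     " + firstSpan(".{3,10}[0-9]", s));   // 0-10
        System.out.println("reluctant:  " + firstSpan(".{3,10}?[0-9]", s));  // 0-5
        System.out.println("possessive: " + firstSpan(".{3,10}+[0-9]", s));  // no match
    }
}
```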

14. Supplement: non-capturing groups (lookahead)
// (?=a) is a zero-width positive lookahead, one of the non-capturing constructs: it checks that the next character is a without consuming it
Pattern p = Pattern.compile("(?=a).{3}");
/*
 * Outputs a66: the match must start with a. This could also be written Pattern.compile("[a].{2}").
 * With Pattern.compile(".{3}(?!a)") the meaning is not "does not end with a" (that would be .{2}[^a]),
 * but "the character after the match is not a" (a negative lookahead). The output is 44a and 66b, so this form is rarely used.
 * With Pattern.compile(".{3}(?=a)") the output is 444, because (?=a) only looks ahead.
 * Placed in front, the character it checks is part of the match; placed at the end, it is not.
 */
String s = "444a66b";
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.println(m.group());
}
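
The three lookahead variants discussed above can be run against the same input with a small sketch. The class name LookaheadDemo and the helper findAll are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Hypothetical helper: every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "444a66b";
        System.out.println(findAll("(?=a).{3}", s)); // [a66]: the match must begin at an a
        System.out.println(findAll(".{3}(?!a)", s)); // [44a, 66b]: the next character is not a
        System.out.println(findAll(".{3}(?=a)", s)); // [444]: the next character is a, but it is not consumed
    }
}
```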

15. Back references
Pattern p = Pattern.compile("(\\d\\d)\\1");
/*
 * Outputs true: \1 must match the same text that the first group matched. With "1213" the result is false.
 * With Pattern.compile("(\\d(\\d))\\2") the string would have to be changed to "122".
 */
String s = "1212";
Matcher m = p.matcher(s);
System.out.println(m.matches());
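
A common practical use of back references is spotting a word typed twice in a row. The sketch below is one possible way to do it; the class name BackReferenceDemo, the helper firstDoubledWord, and the sample sentence are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackReferenceDemo {
    // A whole word, some whitespace, then the same word again.
    private static final Pattern DOUBLED = Pattern.compile("\\b(\\w+)\\s+\\1\\b");

    // Hypothetical helper: the first word that appears twice in a row, or null if there is none.
    static String firstDoubledWord(String text) {
        Matcher m = DOUBLED.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println("1212".matches("(\\d\\d)\\1"));       // true
        System.out.println("1213".matches("(\\d\\d)\\1"));       // false
        System.out.println("122".matches("(\\d(\\d))\\2"));      // true: \2 repeats the inner group
        System.out.println(firstDoubledWord("this is is a test")); // is
    }
}
```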

16. Abbreviated flags
By default "." does not match a line terminator. Flags such as CASE_INSENSITIVE also have a short form; as the API documentation puts it, "case-insensitive matching can also be enabled via the embedded flag expression (?i)".

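The embedded forms can be sketched as follows. The class name FlagsDemo and its two helpers are made up for the example; (?i) corresponds to Pattern.CASE_INSENSITIVE and (?s) to Pattern.DOTALL, which lets "." match line terminators too.

```java
import java.util.regex.Pattern;

class FlagsDemo {
    // Hypothetical helper: whole-string match with case ignored, using the embedded flag.
    static boolean matchesIgnoringCase(String regex, String input) {
        return input.matches("(?i)" + regex);
    }

    // Hypothetical helper: whole-string match where "." may also match a line terminator.
    static boolean matchesDotAll(String regex, String input) {
        return input.matches("(?s)" + regex);
    }

    public static void main(String[] args) {
        System.out.println(matchesIgnoringCase("java", "JaVa")); // true
        // The same thing with the flag constant instead of the embedded form.
        System.out.println(Pattern.compile("java", Pattern.CASE_INSENSITIVE).matcher("JAVA").matches()); // true
        System.out.println("a\nb".matches("a.b"));               // false: "." skips the newline
        System.out.println(matchesDotAll("a.b", "a\nb"));        // true
    }
}
```
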