---------- Android training, Java training, and hope to communicate with you!
----------
A regular expression is an expression that follows certain rules. You can use one to find the content in a string that conforms to those rules.
Function: operating on strings.
Features: specific symbols stand for certain code operations, which simplifies writing.
Learning regular expressions therefore means learning how to use these special symbols.
Benefits: complex string operations can be simplified.
Disadvantages: the more symbols a rule uses, the longer the regular expression becomes and the worse its readability.
Regular expressions are mainly used for the following four operations:
1. Match: the String matches() method. The entire string is matched against the rule. As soon as any part fails to match, matching ends and false is returned.
2. Split: the String split() method.
3. Replace: String replaceAll(regex, str). If regex defines a group, you can use the $ symbol in the second argument to refer to what that group matched.
4. Get: extract the substrings that match the rule from the string.
Constructs of common regular expressions:
Character classes
[abc]            a, b, or c (simple class)
[^abc]           any character except a, b, or c (negation)
[a-zA-Z]         a through z or A through Z, inclusive (range)
[a-d[m-p]]       a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]     d, e, or f (intersection)
[a-z&&[^bc]]     a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]    a through z, but not m through p: [a-lq-z] (subtraction)
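The character classes above can be tried out with a small sketch like the following. The class name CharClassDemo, the helper accepts, and the sample inputs are made up for illustration; only Pattern.matches comes from the JDK.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names and sample inputs are illustrative only.
class CharClassDemo {
    // Returns true when the whole input is accepted by the given character class.
    static boolean accepts(String charClass, String input) {
        return Pattern.matches(charClass, input);
    }

    public static void main(String[] args) {
        System.out.println(accepts("[abc]", "b"));          // simple class: true
        System.out.println(accepts("[^abc]", "b"));         // negation: false
        System.out.println(accepts("[a-d[m-p]]", "n"));     // union: true
        System.out.println(accepts("[a-z&&[def]]", "e"));   // intersection: true
        System.out.println(accepts("[a-z&&[^bc]]", "b"));   // subtraction: false
        System.out.println(accepts("[a-z&&[^m-p]]", "q"));  // subtraction: true
    }
}
```

Each call checks a single character, so the result depends only on whether that character belongs to the class.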
Predefined character classes
.     any character (may or may not match line terminators)
\d    a digit: [0-9]
\D    a non-digit: [^0-9]
\s    a whitespace character: [ \t\n\x0B\f\r]
\S    a non-whitespace character: [^\s]
\w    a word character: [a-zA-Z_0-9]
\W    a non-word character: [^\w]
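To see what each predefined class covers, one option is to delete everything it matches and look at what is left. This is only a sketch: PredefinedClassDemo, strip, and the sample string are hypothetical names chosen for the example.

```java
// Hypothetical demo class; the names and the sample string are illustrative only.
class PredefinedClassDemo {
    // Removes every character matched by the given predefined class.
    static String strip(String input, String predefinedClass) {
        return input.replaceAll(predefinedClass, "");
    }

    public static void main(String[] args) {
        String sample = "id_42 \t ok!";
        System.out.println(strip(sample, "\\D"));  // only the digits remain: 42
        System.out.println(strip(sample, "\\s"));  // whitespace removed: id_42ok!
        System.out.println(strip(sample, "\\W"));  // only word characters remain: id_42ok
    }
}
```

Note that inside a Java string literal the backslash itself has to be escaped, so \d is written as "\\d".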
Boundary matchers
^     the beginning of a line
$     the end of a line
\b    a word boundary
\B    a non-word boundary
\A    the beginning of the input
\G    the end of the previous match
\Z    the end of the input, except for the final line terminator (if any)
\z    the end of the input
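The boundary matchers are easiest to understand by counting matches. The sketch below uses a hypothetical BoundaryDemo class and sample text. One detail worth knowing: in Java, ^ and $ match only at the start and end of the whole input by default, and match at each line only when Pattern.MULTILINE is set.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names and sample text are illustrative only.
class BoundaryDemo {
    // Counts how many times the regex is found in the input.
    static int count(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "cat concat cat\ncat";
        System.out.println(count("cat", 0, text));                   // 4: includes the one inside "concat"
        System.out.println(count("\\bcat\\b", 0, text));             // 3: whole words only
        System.out.println(count("^cat", 0, text));                  // 1: start of the input only
        System.out.println(count("^cat", Pattern.MULTILINE, text));  // 2: start of each line
        System.out.println(count("cat\\Z", 0, "cat\n"));             // 1: \Z ignores the final line terminator
        System.out.println(count("cat\\z", 0, "cat\n"));             // 0: \z needs the very end of the input
    }
}
```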
Greedy quantifiers
X?        X, once or not at all
X*        X, zero or more times
X+        X, one or more times
X{n}      X, exactly n times
X{n,}     X, at least n times
X{n,m}    X, at least n but not more than m times
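A short sketch of the quantifiers in action follows. QuantifierDemo and firstMatch are hypothetical names used only here. The last two lines show what "greedy" means: the quantifier takes as many characters as the rule allows.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names and sample inputs are illustrative only.
class QuantifierDemo {
    // Returns the first substring the regex finds, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println("color".matches("colou?r"));       // true: u occurs zero times
        System.out.println("colour".matches("colou?r"));      // true: u occurs once
        System.out.println("a".matches("ab*"));               // true: b may be absent
        System.out.println("a".matches("ab+"));               // false: at least one b is required
        System.out.println("12345".matches("\\d{2,4}"));      // false: five digits exceed the upper bound
        System.out.println(firstMatch("\\d{2,4}", "12345"));  // 1234: takes as many digits as allowed
        System.out.println(firstMatch("<.+>", "<a><b>"));     // <a><b>: .+ runs to the last >
    }
}
```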
Logical operators
XY     X followed by Y
X|Y    either X or Y
(X)    X, as a capturing group
Back references
\n     whatever the nth capturing group matched
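The sketch below combines alternation, grouping, and a back reference. GroupDemo, REPEATED_WORD, and hasRepeatedWord are hypothetical names. The repeated-word rule is just one possible use of \1, chosen for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names and sample inputs are illustrative only.
class GroupDemo {
    // A word, a single space, then the same word again (\1 refers back to group 1).
    private static final Pattern REPEATED_WORD = Pattern.compile("\\b(\\w+) \\1\\b");

    // Returns true when the text contains the same word twice in a row.
    static boolean hasRepeatedWord(String text) {
        return REPEATED_WORD.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println("dog".matches("cat|dog"));          // true: either alternative may match
        System.out.println("ababab".matches("(ab)+"));         // true: the group is repeated as a unit
        System.out.println("qq".matches("(.)\\1"));            // true: second character equals group 1
        System.out.println("qz".matches("(.)\\1"));            // false
        System.out.println(hasRepeatedWord("it is is fine"));  // true
        System.out.println(hasRepeatedWord("this is fine"));   // false
    }
}
```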
Examples of these four operations:
1. Match: the String matches() method. The entire string is matched against the rule. As soon as any part fails to match, matching ends and false is returned.
Verify a mobile phone number of the form 13xxxxxxxxx, 15xxxxxxxxx, or 18xxxxxxxxx:
String tel = "13900001111";
// 1[358]: the first digit is 1 and the second is 3, 5, or 8.
// \\d{9}: nine more digits follow; the backslash must be escaped in a Java string.
String telReg = "1[358]\\d{9}";
System.out.println(tel.matches(telReg));
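Verify that an email address is well formed:
String mail = "user@example.com"; // sample address for illustration
// Word characters, then @, then a domain name, then one to three dot-separated suffixes.
String mailReg = "\\w+@[a-zA-Z0-9]+(\\.[a-zA-Z]+){1,3}";
boolean flag = mail.matches(mailReg);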
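Because matches() tests the entire string, it behaves differently from searching for the rule inside a longer text. The following sketch contrasts the two, reusing the phone rule from above. MatchesVsFindDemo, isWholeMatch, and containsMatch are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names and sample inputs are illustrative only.
class MatchesVsFindDemo {
    private static final Pattern TEL = Pattern.compile("1[358]\\d{9}");

    // True only when the entire string conforms to the rule.
    static boolean isWholeMatch(String s) {
        return TEL.matcher(s).matches();
    }

    // True when some substring conforms to the rule.
    static boolean containsMatch(String s) {
        return TEL.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isWholeMatch("13900001111"));        // true
        System.out.println(isWholeMatch("tel: 13900001111"));   // false: extra text around the number
        System.out.println(containsMatch("tel: 13900001111"));  // true
        System.out.println(containsMatch("12900001111"));       // false: second digit is not 3, 5, or 8
    }
}
```

For validation, the whole-string check is usually the one you want. The substring search is what the "get" operation relies on.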
2. Split: the String split() method.
public static void main(String[] args) {
    splitDemo("zhangsan.lisi.wangwu", "\\.");
    splitDemo("c:\\abc\\a.txt", "\\\\");
    // Split on repeated characters. To reuse part of what a rule matched,
    // wrap that part in a group. Groups are numbered starting from 1.
    // To refer to an existing group, use \n, where n is the group number.
    splitDemo("erktyqqquizzzzzo", "(.)\\1+");
}

public static void splitDemo(String str, String reg) {
    String[] arr = str.split(reg);
    System.out.println(arr.length);
    for (String s : arr) {
        System.out.println(s);
    }
}
3. Replace: String replaceAll(regex, str). If regex defines a group, you can use the $ symbol in the second argument to refer to what that group matched.
public static void main(String[] args) {
    String str = "wer138998425ty1234564uiod234345675f";
    // Replace every run of five or more digits in the string with #.
    replaceAllDemo(str, "\\d{5,}", "#");

    String str1 = "erktyqqquizzzzzo";
    // Replace repeated characters with a single character: zzzz -> z.
    replaceAllDemo(str1, "(.)\\1+", "$1"); // $1 is the content matched by group 1
}

public static void replaceAllDemo(String str, String reg, String newStr) {
    str = str.replaceAll(reg, newStr);
    System.out.println(str);
}
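The $ notation also works with more than one group. Here is a minimal sketch under assumed names (ReplaceGroupsDemo, maskMiddle, swapWords, and replaceLiteral are all hypothetical). Matcher.quoteReplacement is the JDK method for inserting a replacement that contains a literal $.

```java
import java.util.regex.Matcher;

// Hypothetical demo class; the names and sample inputs are illustrative only.
class ReplaceGroupsDemo {
    // Keeps the first three and last four digits of an 11-digit number and masks the rest.
    static String maskMiddle(String tel) {
        return tel.replaceAll("(\\d{3})\\d{4}(\\d{4})", "$1****$2");
    }

    // Swaps two words separated by a single space: $2 and $1 refer to groups 2 and 1.
    static String swapWords(String s) {
        return s.replaceAll("(\\w+) (\\w+)", "$2 $1");
    }

    // Inserts the replacement literally, so a $ in it is not read as a group reference.
    static String replaceLiteral(String input, String regex, String replacement) {
        return input.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(maskMiddle("13900001111"));            // 139****1111
        System.out.println(swapWords("zhangsan lisi"));           // lisi zhangsan
        System.out.println(replaceLiteral("cost: X", "X", "$5")); // cost: $5
    }
}
```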
4. Get: extract the substrings that match the rule from the string.
Procedure:
1. Encapsulate the regular expression in an object (Pattern).
2. Associate the pattern object with the string to operate on.
3. Obtain the matching engine (Matcher) that results from the association.
4. Use the engine to operate on the substrings that conform to the rule, for example to extract them.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) {
    getDemo();
}

public static void getDemo() {
    String str = "ming tian jiu yao fang jia le, da jia.";
    System.out.println(str);
    String reg = "\\b[a-z]{4}\\b"; // words of exactly four lowercase letters

    // Encapsulate the rule in an object.
    Pattern p = Pattern.compile(reg);
    // Associate the pattern with the string to operate on and obtain the matcher.
    Matcher m = p.matcher(str);

    // System.out.println(m.matches());
    // The matches method of the String class is in fact carried out by Pattern and Matcher.
    // The String wrapper is easier to use, but it offers only that single function.

    while (m.find()) {
        System.out.println(m.group());
        System.out.println(m.start() + "...." + m.end());
    }
}
Exercise: web crawler
/*
 * Web crawler (spider): collects the information you want.
 */
import java.io.*;
import java.net.*;
import java.util.regex.*;

class RegexTest2 {
    public static void main(String[] args) throws Exception {
        getMails_1();
    }

    /* Get the email addresses in the specified web page. */
    public static void getMails_1() throws Exception {
        // Wrap the address in a URL object and open a connection to it.
        URL url = new URL("http://192.168.1.254:8080/myweb/mail.html");
        URLConnection conn = url.openConnection();

        String mailReg = "\\w+@\\w+(\\.\\w+)+";
        Pattern p = Pattern.compile(mailReg);

        // Read the page line by line; the reader is closed automatically.
        try (BufferedReader bufIn = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line = null;
            while ((line = bufIn.readLine()) != null) {
                Matcher m = p.matcher(line);
                while (m.find()) {
                    System.out.println(m.group());
                }
            }
        }
    }

    /* Get the email addresses in the specified document, using the get operation (Pattern, Matcher). */
    public static void getMails() throws Exception {
        String mailReg = "\\w+@\\w+(\\.\\w+)+";
        Pattern p = Pattern.compile(mailReg);

        try (BufferedReader bufr = new BufferedReader(new FileReader("mail.txt"))) {
            String line = null;
            while ((line = bufr.readLine()) != null) {
                Matcher m = p.matcher(line);
                while (m.find()) {
                    System.out.println(m.group());
                }
            }
        }
    }
}
See http://edu.csdn.net/heima/ for details