JAVA 65th-Regular Expressions
Regular expression: a notation, written with a set of special symbols, that is mainly used to operate on strings.
Example:
QQ number verification
A QQ number has 6 to 9 digits, must not start with 0, and must consist of digits only.
The matches method of the String class
matches(String regex)
Tells whether the string matches the given regular expression.
regex is the given regular expression.
public static void checkQQ() {
    // The first digit is 1-9; each of the remaining 5 to 8 digits is 0-9
    String regex = "[1-9][0-9]{5,8}"; // regular expression
    String qq = "123459";
    boolean flag = qq.matches(regex);
    System.out.println(qq + ":" + flag);
}
PS: Regular expressions make the code shorter to write, but much harder to read.
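To make the rule easier to check, here is a minimal sketch that wraps the same expression in a helper and runs it against a few sample inputs. The class name QqCheckDemo, the method isValidQq, and the sample values are assumptions made for illustration only.

```java
final class QqCheckDemo {
    // Same rule as above: first digit 1-9, then 5 to 8 more digits (6 to 9 in total).
    static final String QQ_REGEX = "[1-9][0-9]{5,8}";

    static boolean isValidQq(String candidate) {
        // String.matches succeeds only when the WHOLE string matches the expression.
        return candidate.matches(QQ_REGEX);
    }

    public static void main(String[] args) {
        String[] samples = {"123459", "012345", "12345", "1234567890", "12345a"};
        for (String s : samples) {
            System.out.println(s + ":" + isValidQq(s));
        }
    }
}
```

Only the first sample passes: the others start with 0, are too short, are too long, or contain a letter.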
Meaning of the symbols
Regular expressions are hard to read because they contain so many symbols.
Predefined character classes
Construct | Matches
. | Any character (may or may not match line terminators)
\d | A digit: [0-9]
\D | A non-digit: [^0-9]
\s | A whitespace character: [ \t\n\x0B\f\r]
\S | A non-whitespace character: [^\s]
\w | A word character: [a-zA-Z_0-9]
\W | A non-word character: [^\w]
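The predefined classes can be tried out one character at a time. The sketch below is only an illustration; the class PredefinedClassDemo and the method describe are hypothetical names. Note that every backslash is doubled because the pattern is written as a Java string literal.

```java
final class PredefinedClassDemo {
    // Classify a single character using the predefined classes \d, \s and \w.
    static String describe(char c) {
        String s = String.valueOf(c);
        if (s.matches("\\d")) return "digit";
        if (s.matches("\\s")) return "whitespace";
        if (s.matches("\\w")) return "word character";
        return "other";
    }

    public static void main(String[] args) {
        char[] samples = {'7', ' ', '\t', '_', 'q', '-'};
        for (char c : samples) {
            System.out.println("[" + c + "] " + describe(c));
        }
    }
}
```

The digit test comes first because a digit is also a word character.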
Character classes
Construct | Matches
[abc] | a, b, or c (simple class)
[^abc] | Any character except a, b, or c (negation)
[a-zA-Z] | a through z or A through Z, inclusive (range)
[a-d[m-p]] | a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]] | d, e, or f (intersection)
[a-z&&[^bc]] | a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]] | a through z, and not m through p: [a-lq-z] (subtraction)
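One way to see what a character class covers is to delete every character it matches and look at what is left. This is a small sketch under that idea; the class CharClassDemo, the method strip, and the sample string are assumptions for illustration.

```java
final class CharClassDemo {
    // Remove every character that belongs to the given character class.
    static String strip(String input, String charClass) {
        return input.replaceAll(charClass, "");
    }

    public static void main(String[] args) {
        String sample = "abcdefmnopxyz";
        System.out.println(strip(sample, "[a-d[m-p]]"));    // union: efxyz
        System.out.println(strip(sample, "[a-z&&[def]]"));  // intersection: abcmnopxyz
        System.out.println(strip(sample, "[a-z&&[^bc]]"));  // subtraction: bc
        System.out.println(strip(sample, "[a-z&&[^m-p]]")); // subtraction: mnop
    }
}
```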
Boundary matchers
Construct | Matches
^ | The beginning of a line
$ | The end of a line
\b | A word boundary
\B | A non-word boundary
\A | The beginning of the input
\G | The end of the previous match
\Z | The end of the input but for the final terminator, if any
\z | The end of the input
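Boundary matchers consume no characters; they only constrain where a match may start or end. The following sketch counts matches to make the difference visible. The class BoundaryDemo and the method count are hypothetical names chosen for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BoundaryDemo {
    // Count how many times the expression is found in the input.
    static int count(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "ni hao, wohao, ta ye hao";
        System.out.println(count(text, "hao"));        // 3: every occurrence
        System.out.println(count(text, "\\bhao\\b"));  // 2: only "hao" as a whole word
        System.out.println(count(text, "\\Bhao"));     // 1: only the "hao" inside "wohao"
        System.out.println(count(text, "^ni"));        // 1: "ni" at the start
        System.out.println(count(text, "hao$"));       // 1: "hao" at the end
    }
}
```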
Greedy quantifiers
Construct | Matches
X? | X, once or not at all
X* | X, zero or more times
X+ | X, one or more times
X{n} | X, exactly n times
X{n,} | X, at least n times
X{n,m} | X, at least n but not more than m times
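The quantifiers can be compared side by side by testing one input against several expressions. This is an illustrative sketch; the class QuantifierDemo, the pattern list, and the method matchingPatterns are assumptions, not part of the original examples.

```java
import java.util.ArrayList;
import java.util.List;

final class QuantifierDemo {
    static final String[] PATTERNS = {
        "ao?z", "ao*z", "ao+z", "ao{3}z", "ao{3,}z", "ao{2,4}z"
    };

    // Return the expressions from PATTERNS that match the whole input.
    static List<String> matchingPatterns(String input) {
        List<String> hits = new ArrayList<>();
        for (String p : PATTERNS) {
            if (input.matches(p)) {
                hits.add(p);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(matchingPatterns("az"));      // no 'o' at all
        System.out.println(matchingPatterns("aoooz"));   // three 'o'
        System.out.println(matchingPatterns("aoooooz")); // five 'o'
    }
}
```

With no 'o', only ao?z and ao*z match; with three, everything except ao?z matches; with five, only ao*z, ao+z and ao{3,}z match.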
Logical operators
Construct | Matches
XY | X followed by Y
X|Y | Either X or Y
(X) | X, as a capturing group
Back references
Construct | Matches
\n | Whatever the nth capturing group matched
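Alternation, grouping, and back references combine naturally. In the sketch below, (ab|cd) captures either "ab" or "cd", and \1 then requires exactly the same text again. The class BackReferenceDemo and the method isDoubled are hypothetical names for illustration.

```java
final class BackReferenceDemo {
    // True when the input is "ab" or "cd" written twice in a row.
    static boolean isDoubled(String input) {
        // \1 repeats the text captured by group 1, not the expression itself.
        return input.matches("(ab|cd)\\1");
    }

    public static void main(String[] args) {
        System.out.println(isDoubled("abab")); // true
        System.out.println(isDoubled("cdcd")); // true
        System.out.println(isDoubled("abcd")); // false: group 1 captured "ab", so "cd" does not repeat it
    }
}
```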
public static void check() {
    String string = "aoooooz";
    String regex = "ao{4,}z"; // 'a', then at least four 'o', then 'z'
    boolean flag = string.matches(regex);
    System.out.println(string + ":" + flag);
}
Common operations
1. Match 2. Split 3. Replace 4. Extract
Match: uses the matches method of the String class.
public static void check() {
    // Check whether a mobile phone number is valid
    String tel = "18753377511";
    // The first digit is 1, the second digit is 3, 5, or 8
    // String regex = "1[358][0-9]{9}";
    // Inside a Java string "\" starts an escape, so "\d" must be written as "\\d"
    String regex = "1[358]\\d{9}";
    boolean flag = tel.matches(regex);
    System.out.println(tel + ":" + flag);
}
Split: this is the split(String regex) method of the String class that we used before, where the argument was a plain delimiter such as " ". Ordinary characters such as the space can be used directly as the rule.
Space
public static void check() {
    // Split on spaces; a space may occur several times in a row
    String str = "a B c d e f";
    String regex = " +"; // one or more spaces
    String[] line = str.split(regex);
    for (String i : line) {
        System.out.println(i);
    }
}
Dot. PS: the dot itself is a special symbol in a regular expression (it matches any character).
String str = ".b.c.d.e..f";
String regex = "\\.+"; // "\." escapes the dot; inside a Java string the backslash must itself be escaped, giving "\\."
String[] line = str.split(regex);
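The reason for the escape becomes obvious when it is left out. The sketch below contrasts an unescaped dot with two escaped forms; the class DotSplitDemo, the method splitOnDots, and the sample strings are assumptions for illustration. Pattern.quote is a standard way to treat a string as a literal.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class DotSplitDemo {
    // Split on runs of one or more literal dots.
    static String[] splitOnDots(String input) {
        return input.split("\\.+");
    }

    public static void main(String[] args) {
        // Unescaped: "." matches every character, so nothing is left but
        // empty strings, and split drops trailing empty strings.
        System.out.println("a.b.c".split(".").length);                          // 0

        System.out.println(Arrays.toString(splitOnDots("a.b..c")));             // [a, b, c]

        // Pattern.quote builds a literal pattern without manual escaping.
        System.out.println(Arrays.toString("a.b.c".split(Pattern.quote("."))));  // [a, b, c]
    }
}
```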
Splitting on repeated characters
Regular expressions use () to wrap part of the expression into a group.
Repeated characters can therefore be expressed as follows: . stands for any character, (.) wraps it into a group, and (.)\1 means that the next character is the same as whatever the first group matched.
String str = "a@@@b####c...dtttef";
String regex = "(.)\\1+"; // any character followed by one or more repeats of itself
String[] line = str.split(regex);
Groups: which groups are there in ((A)(B(C)))?
Groups are numbered by counting opening parentheses from left to right:
((A)(B(C))) 1
(A) 2
(B(C)) 3
(C) 4
Group 0 has no parentheses of its own; it always stands for the whole expression.
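The numbering can be checked with Matcher.group(int). The following is a minimal sketch; the class GroupNumberDemo and the method groupsOf are hypothetical names, and the input "ABC" is chosen so that every group matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupNumberDemo {
    // Return what group 0 and each numbered group captured for the given input.
    static List<String> groupsOf(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> groups = new ArrayList<>();
        if (m.matches()) {
            // groupCount() does not include group 0, hence "<=".
            for (int i = 0; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Index in the printed list = group number.
        System.out.println(groupsOf("((A)(B(C)))", "ABC")); // [ABC, ABC, A, BC, C]
    }
}
```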
Replace:
replaceAll(String regex, String replacement)
Replaces every substring that matches the given regular expression with the given replacement.
replaceFirst(String regex, String replacement)
Replaces the first substring that matches the given regular expression with the given replacement.
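The difference between the two methods is easiest to see on the same input. This sketch is for illustration only; the class ReplaceDemo, its method names, and the sample string are assumptions.

```java
final class ReplaceDemo {
    // Replace every run of digits with "#".
    static String maskAllNumbers(String input) {
        return input.replaceAll("\\d+", "#");
    }

    // Replace only the first run of digits with "#".
    static String maskFirstNumber(String input) {
        return input.replaceFirst("\\d+", "#");
    }

    public static void main(String[] args) {
        String sample = "a1b22c333";
        System.out.println(maskAllNumbers(sample));  // a#b#c#
        System.out.println(maskFirstNumber(sample)); // a#b22c333
    }
}
```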
public static void check() {
    // Collapse each run of repeated characters into a single character
    String str = "abgggggcffffdggggs";
    String regex = "(.)\\1+";
    str = str.replaceAll(regex, "$1");
    System.out.println(str);
}
PS: in the replacement argument, a dollar sign followed by a group number ($1, $2, ...) refers to whatever that group of the regular expression in the first argument matched.
public static void check() {
    // 18753377511 -> 187****7511
    String str = "18753377511";
    String regex = "(\\d{3})\\d{4}(\\d{4})";
    System.out.println(str);
    str = str.replaceAll(regex, "$1****$2");
    System.out.println(str);
}
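Because the dollar sign has this special meaning in the replacement string, a literal "$" has to be quoted. The sketch below shows both uses: group references to reorder text, and Matcher.quoteReplacement to insert a literal dollar sign. The class GroupReferenceDemo, its methods, and the sample values are assumptions for illustration.

```java
import java.util.regex.Matcher;

final class GroupReferenceDemo {
    // Reorder captured groups: "2024-05" becomes "05/2024".
    static String swapYearMonth(String input) {
        return input.replaceAll("(\\d{4})-(\\d{2})", "$2/$1");
    }

    // Put a literal "$" in front of every number; $0 is the whole match.
    static String toDollarPrice(String input) {
        return input.replaceAll("\\d+", Matcher.quoteReplacement("$") + "$0");
    }

    public static void main(String[] args) {
        System.out.println(swapYearMonth("2024-05"));  // 05/2024
        System.out.println(toDollarPrice("price: 5")); // price: $5
    }
}
```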
Extract
A regular expression is itself represented as an object.
Pattern class
A regular expression, specified as a string, must first be compiled into an instance of this class. The resulting pattern can then be used to create a Matcher object that can match arbitrary character sequences against the regular expression. All of the state involved in performing a match resides in the matcher, so many matchers can share the same pattern.
// Wrap the regular expression rule in an object
// Pattern p = Pattern.compile("a*b");
// Associate the pattern with the string through its matcher method to get a Matcher object that works on the string
// Matcher m = p.matcher("aaaab");
// Operate on the string through the methods of the Matcher object
// boolean b = m.matches();
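Since the match state lives in the Matcher, one compiled Pattern can serve many inputs. The following sketch compiles the expression once and creates a new Matcher per input; the class PatternReuseDemo, the method matchAll, and the sample inputs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class PatternReuseDemo {
    // Compiled once, shared by every call.
    private static final Pattern A_STAR_B = Pattern.compile("a*b");

    // Test each input against the shared pattern with its own Matcher.
    static List<Boolean> matchAll(String... inputs) {
        List<Boolean> results = new ArrayList<>();
        for (String input : inputs) {
            results.add(A_STAR_B.matcher(input).matches());
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(matchAll("aaaab", "b", "ba", "aac")); // [true, true, false, false]
    }
}
```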
Matcher class
matches: attempts to match the entire input sequence against the pattern.
lookingAt: attempts to match the input sequence, starting at the beginning, against the pattern.
find: scans the input sequence looking for the next subsequence that matches the pattern.
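The three methods differ only in how much of the input must match. This sketch applies all three to the same inputs; the class MatcherMethodsDemo, the method probe, and the sample strings are assumptions for illustration. A fresh Matcher is used for each call so that one result cannot influence the next.

```java
import java.util.regex.Pattern;

final class MatcherMethodsDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Report the result of matches, lookingAt and find for one input.
    static String probe(String input) {
        boolean whole = DIGITS.matcher(input).matches();     // entire input
        boolean prefix = DIGITS.matcher(input).lookingAt();  // from the start, may stop early
        boolean anywhere = DIGITS.matcher(input).find();     // any position
        return "matches=" + whole + " lookingAt=" + prefix + " find=" + anywhere;
    }

    public static void main(String[] args) {
        System.out.println("123    -> " + probe("123"));
        System.out.println("123abc -> " + probe("123abc"));
        System.out.println("abc123 -> " + probe("abc123"));
    }
}
```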
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void check() {
    String str = "ni hao, wohao, ta ye hao";
    String regex = "\\b[a-z]{3}\\b"; // \b: word boundary
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(str);
    while (m.find()) { // find() must succeed before group(), start(), or end() can be called
        System.out.println(m.group());
        System.out.println(m.start() + ":" + m.end()); // start and end index of the match
    }
}
Exercise:
Change aaa...aa..aaa...bbb...b...bbb...ccc...ccc to abc.
public static void test() {
    String str = "aaa...aa..aaa...bbb...b...bbb...ccc...ccc";
    System.out.println(str);
    String regex = "\\.+";
    str = str.replaceAll(regex, ""); // remove the dots
    regex = "(.)\\1+";
    str = str.replaceAll(regex, "$1"); // collapse repeated characters
    System.out.println(str);
}
Sort IP addresses
import java.util.TreeSet;

public static void test() {
    // First attempt: a TreeSet sorts the addresses as plain strings, which gives the wrong order
    // String str = "192.0.0.1 127.0.0.24 3.3.3.5 150.15.3.41";
    // System.out.println("ip:" + str);
    // String regex = " +";
    // String[] strs = str.split(regex);
    // TreeSet<String> ts = new TreeSet<String>(); // sorts automatically
    // for (String s : strs) {
    //     ts.add(s);
    // }
    // for (String s : ts) { // sorted as strings
    //     System.out.println(s);
    // }

    // So first pad every segment of every IP address with two leading zeros
    String str = "192.0.0.1 127.0.0.24 3.3.3.5 150.15.3.41";
    String regex = "(\\d+)";
    str = str.replaceAll(regex, "00$1");
    System.out.println("padded with 0: " + str);
    // Then keep only the last three digits of each segment
    regex = "0*(\\d{3})";
    str = str.replaceAll(regex, "$1");
    System.out.println("3 digits kept: " + str);
    regex = " +";
    String[] strs = str.split(regex);
    TreeSet<String> ts = new TreeSet<String>(); // sorts automatically
    for (String s : strs) {
        ts.add(s);
    }
    for (String s : ts) {
        // Strip the leading zeros again before printing
        System.out.println(s.replaceAll("0*(\\d+)", "$1"));
    }
}
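For comparison, here is an alternative sketch that does not use the zero-padding technique above. It still splits with regular expressions, but compares the four segments numerically with a comparator. The class IpSortDemo and the method sortIps are hypothetical, and the code assumes every address is a well-formed dotted quad.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class IpSortDemo {
    // Sort space-separated IPv4 addresses by the numeric value of each segment.
    static List<String> sortIps(String spaceSeparated) {
        List<String> ips = new ArrayList<>(Arrays.asList(spaceSeparated.trim().split(" +")));
        ips.sort((a, b) -> {
            String[] x = a.split("\\.");
            String[] y = b.split("\\.");
            for (int i = 0; i < 4; i++) {
                int cmp = Integer.compare(Integer.parseInt(x[i]), Integer.parseInt(y[i]));
                if (cmp != 0) {
                    return cmp;
                }
            }
            return 0;
        });
        return ips;
    }

    public static void main(String[] args) {
        for (String ip : sortIps("192.0.0.1 127.0.0.24 3.3.3.5 150.15.3.41")) {
            System.out.println(ip);
        }
    }
}
```

Unlike a TreeSet, a sorted list keeps duplicate addresses.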
Simple email address verification
public static void test() {
    String mail = "aa_a@163.com.cn";
    // User name, "@", host name, then one or more suffixes such as ".com" or ".cn"
    String regex = "\\w+@\\w+(\\.[a-zA-Z]{2,3})+"; // + means one or more times
    boolean flag = mail.matches(regex);
    System.out.println(mail + ":" + flag);
}
Note: because regular expressions are hard to read, in real development they are usually verified repeatedly and then wrapped in a method for reuse.
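A minimal sketch of such encapsulation is shown below. The class MailValidator and the method isValid are hypothetical names, and the rule is a simple one of the form \w+@\w+(\.[a-zA-Z]{2,3})+, which is far looser than real-world email validation.

```java
import java.util.regex.Pattern;

final class MailValidator {
    // Compiled once; callers never need to read the expression itself.
    private static final Pattern MAIL =
            Pattern.compile("\\w+@\\w+(\\.[a-zA-Z]{2,3})+");

    private MailValidator() {
    }

    // True when the whole candidate matches the simple mail rule.
    static boolean isValid(String candidate) {
        return candidate != null && MAIL.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"aa_a@163.com.cn", "aa_a@163", "@163.com"};
        for (String s : samples) {
            System.out.println(s + ":" + isValid(s));
        }
    }
}
```

Keeping the expression behind one method name means it can be corrected in a single place once testing reveals a gap.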
Exercise: web crawler. A crawler is a program that collects data matching specified rules from the Internet.
Task: crawl email addresses.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class asd {
    public static void main(String[] args) throws Exception {
        // List<String> list = getmail(); // local file
        List<String> list = getweb(); // network
        for (String i : list) {
            System.out.println(i);
        }
    }

    public static List<String> getweb() throws Exception {
        // URL url = new URL("http://192.168.0.1:8080/myweb/mymail.html");
        URL url = new URL("http://news.baidu.com/");
        BufferedReader brin = new BufferedReader(new InputStreamReader(url.openStream()));
        String mail_regex = "\\w+@\\w+(\\.\\w+)+";
        Pattern p = Pattern.compile(mail_regex);
        List<String> list = new ArrayList<String>();
        String line = null;
        while ((line = brin.readLine()) != null) {
            Matcher m = p.matcher(line);
            while (m.find()) {
                list.add(m.group());
            }
        }
        brin.close();
        return list;
    }

    public static List<String> getmail() throws Exception {
        // 1. Read the source file
        BufferedReader br = new BufferedReader(new FileReader("g:\\mymail.html"));
        String mail_regex = "\\w+@\\w+(\\.\\w+)+";
        Pattern p = Pattern.compile(mail_regex);
        List<String> list = new ArrayList<String>();
        String line = null;
        // 2. Match each line against the rule to find the data that complies with it
        while ((line = br.readLine()) != null) {
            Matcher m = p.matcher(line);
            while (m.find()) {
                // 3. Store the data that meets the rule in the collection
                list.add(m.group());
            }
        }
        br.close();
        return list;
    }
}
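The extraction step can be tried without a file or a network connection by running the same find loop over an in-memory string. This is only a sketch; the class MailExtractDemo, the method extractMails, and the sample text are assumptions for illustration, and the rule \w+@\w+(\.\w+)+ is deliberately simple.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MailExtractDemo {
    private static final Pattern MAIL = Pattern.compile("\\w+@\\w+(\\.\\w+)+");

    // Collect every substring of the text that looks like an email address.
    static List<String> extractMails(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = MAIL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "contact: tom@example.com, or jerry_1@mail.example.org.";
        for (String mail : extractMails(text)) {
            System.out.println(mail);
        }
    }
}
```

The trailing comma and full stop are not part of the matches, because each suffix must be a dot followed by at least one word character.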