Java Learning, Lesson 65: Regular Expressions

Source: Internet
Author: User

Regular expression: a rule, written with a set of special symbols, that is mainly used to operate on strings.

Example:

QQ number verification

A QQ number is 6 to 9 digits long, cannot start with 0, and must consist of digits only.

The matches method in the String class

matches(String regex)
Indicates whether the string matches the given regular expression.

Here regex is the given regular expression.

public static void checkQQ() {
    // The first character is 1-9; each of the remaining 5 to 8 characters is 0-9
    String regex = "[1-9][0-9]{5,8}"; // regular expression
    String qq = "123459";
    boolean flag = qq.matches(regex);
    System.out.println(qq + ":" + flag);
}

PS: Regular expressions make code shorter to write, but much harder to read.
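To make that trade-off concrete, here is a minimal sketch that checks the QQ rule twice: once with a hand-written loop and once with the regular expression. The class and method names are invented for this illustration.

```java
class QqCheckDemo {
    // Hand-written check: 6 to 9 characters, no leading 0, digits only
    static boolean checkByHand(String qq) {
        int len = qq.length();
        if (len < 6 || len > 9) {
            return false;
        }
        if (qq.charAt(0) == '0') {
            return false;
        }
        for (int i = 0; i < len; i++) {
            char c = qq.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    // The same rule expressed as one regular expression
    static boolean checkByRegex(String qq) {
        return qq.matches("[1-9][0-9]{5,8}");
    }

    public static void main(String[] args) {
        String[] samples = {"123459", "012345", "12345", "12345a", "1234567890"};
        for (String s : samples) {
            System.out.println(s + ": " + checkByHand(s) + " / " + checkByRegex(s));
        }
    }
}
```

Both methods should agree on every input; the regex version is one line, but the reader has to decode the symbols to see the rule.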

Meaning of the symbols

Regular expressions are hard to read because they contain so many symbols.

Predefined character classes
.    Any character (may or may not match line terminators)
\d   A digit: [0-9]
\D   A non-digit: [^0-9]
\s   A whitespace character: [ \t\n\x0B\f\r]
\S   A non-whitespace character: [^\s]
\w   A word character: [a-zA-Z_0-9]
\W   A non-word character: [^\w]

Character classes
[abc]           a, b, or c (simple class)
[^abc]          Any character except a, b, or c (negation)
[a-zA-Z]        a through z or A through Z, inclusive (range)
[a-d[m-p]]      a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]    d, e, or f (intersection)
[a-z&&[^bc]]    a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]   a through z, and not m through p: [a-lq-z] (subtraction)
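The union, intersection, and subtraction forms are easier to trust after trying them. The sketch below tests single characters against the Java character classes [a-d[m-p]], [a-z&&[def]], [a-z&&[^bc]], and [a-z&&[^m-p]]; the class name and the helper isMatch are invented for this illustration.

```java
class CharClassDemo {
    // True if the whole input matches the regex
    static boolean isMatch(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        // Union: a-d or m-p
        System.out.println(isMatch("[a-d[m-p]]", "n"));     // n is in m-p
        System.out.println(isMatch("[a-d[m-p]]", "f"));     // f is in neither range
        // Intersection: a-z and also one of d, e, f
        System.out.println(isMatch("[a-z&&[def]]", "e"));
        System.out.println(isMatch("[a-z&&[def]]", "a"));
        // Subtraction: a-z without b and c
        System.out.println(isMatch("[a-z&&[^bc]]", "b"));
        System.out.println(isMatch("[a-z&&[^bc]]", "d"));
        // Subtraction: a-z without m-p
        System.out.println(isMatch("[a-z&&[^m-p]]", "n"));
        System.out.println(isMatch("[a-z&&[^m-p]]", "q"));
    }
}
```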

Boundary matchers
^    The beginning of a line
$    The end of a line
\b   A word boundary
\B   A non-word boundary
\A   The beginning of the input
\G   The end of the previous match
\Z   The end of the input but for the final terminator, if any
\z   The end of the input

Greedy quantifiers
X?       X, once or not at all
X*       X, zero or more times
X+       X, one or more times
X{n}     X, exactly n times
X{n,}    X, at least n times
X{n,m}   X, at least n but not more than m times
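As a quick check of the quantifiers, the sketch below runs six patterns against strings that contain zero to four o's between an a and a z, and prints which strings each pattern accepts. The class name, the helper isMatch, and the sample strings are invented for this illustration.

```java
class QuantifierDemo {
    // True if the whole input matches the regex
    static boolean isMatch(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        String[] regexes = {"ao?z", "ao*z", "ao+z", "ao{2}z", "ao{2,}z", "ao{2,3}z"};
        String[] inputs = {"az", "aoz", "aooz", "aoooz", "aooooz"};
        for (String regex : regexes) {
            StringBuilder row = new StringBuilder(regex).append(" matches:");
            for (String input : inputs) {
                if (isMatch(regex, input)) {
                    row.append(' ').append(input);
                }
            }
            System.out.println(row);
        }
    }
}
```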

Logical operators
XY     X followed by Y
X|Y    Either X or Y
(X)    X, as a capturing group

Back references
\n     Whatever the nth capturing group matched

public static void check() {
    String string = "aoooooz";
    String regex = "ao{4,}z"; // regular expression
    boolean flag = string.matches(regex);
    System.out.println(string + ":" + flag);
}

Common operations

1. Match 2. Split 3. Replace 4. Find

Match: uses the matches method of the String class.

public static void check() {
    // Check whether a mobile phone number is valid
    String tel = "18753377511";
    // The first digit is 1, the second digit is 3, 5, or 8
    // String regex = "1[358][0-9]{9}";
    // Inside a string literal "\" starts an escape, so write "\\" to get a literal "\"
    String regex = "1[358]\\d{9}";
    boolean flag = tel.matches(regex);
    System.out.println(tel + ":" + flag);
}


Split: this is the split(String regex) method of the String class used in earlier lessons. There the argument was a plain delimiter such as " "; an ordinary, non-special character such as a space is already a valid rule.

Splitting on spaces

public static void check() {
    // Split on spaces; the space may appear several times in a row
    String str = "a b  c   d e f";
    String regex = " +";
    String[] line = str.split(regex);
    for (String i : line) {
        System.out.println(i);
    }
}

Splitting on dots. PS: the dot itself is a special symbol in regular expressions (it matches any character).

String str = "a.b.c.d.e..f";
// "\." escapes the dot; inside a string literal the "\" must itself be escaped, hence "\\."
String regex = "\\.+";
String[] line = str.split(regex);
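The sketch below shows why the escape matters: with a bare "." every character counts as a delimiter, so nothing is left. The class name, the helper splitOn, and the sample string "a.b.c" are invented for this illustration; Pattern.quote is a standard alternative to escaping by hand.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class DotSplitDemo {
    static String[] splitOn(String str, String regex) {
        return str.split(regex);
    }

    public static void main(String[] args) {
        String str = "a.b.c";
        // Unescaped dot: every character is a delimiter, and the resulting
        // empty strings are dropped, so the array is empty
        System.out.println(Arrays.toString(splitOn(str, ".")));
        // Escaped dot: splits on literal dots only
        System.out.println(Arrays.toString(splitOn(str, "\\.")));
        // Pattern.quote builds a regex that matches the given text literally
        System.out.println(Arrays.toString(splitOn(str, Pattern.quote("."))));
    }
}
```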

Splitting on repeated characters

Regular expressions use () to form groups.

A repeated character can therefore be expressed as follows: . stands for any character, (.) wraps it in a group, and (.)\1 means that the next character is the same as whatever group 1 matched. Adding + allows the character to repeat any number of times.

String str = "a@@@b####c...dtttef";
String regex = "(.)\\1+";
String[] line = str.split(regex);

Groups: which groups does ((A)(B(C))) contain?

Count the opening parentheses from left to right:

1  ((A)(B(C)))

2  (A)

3  (B(C))

4  (C)

Group 0 is the whole expression and needs no parentheses.
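The numbering can be checked with Matcher.group(int). The sketch below matches the expression ((A)(B(C))) against the string "ABC" and prints what each group captured; the class name and the helper groupsOf are invented for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberDemo {
    // Returns what groups 0..groupCount() captured, or an empty array if input does not match
    static String[] groupsOf(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            groups[i] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] groups = groupsOf("((A)(B(C)))", "ABC");
        for (int i = 0; i < groups.length; i++) {
            System.out.println("group " + i + ": " + groups[i]);
        }
    }
}
```

Note that groupCount() reports 4 here: group 0 is not included in the count.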


Replace:

replaceAll(String regex, String replacement)
Replaces each substring of this string that matches the given regular expression with the given replacement.

replaceFirst(String regex, String replacement)
Replaces the first substring of this string that matches the given regular expression with the given replacement.
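A minimal sketch of the difference between the two methods, replacing runs of digits with "#". The class name, method names, and sample string are invented for this illustration.

```java
class ReplaceDemo {
    // Replaces every run of digits with "#"
    static String maskAll(String s) {
        return s.replaceAll("\\d+", "#");
    }

    // Replaces only the first run of digits with "#"
    static String maskFirst(String s) {
        return s.replaceFirst("\\d+", "#");
    }

    public static void main(String[] args) {
        String str = "a1b22c333";
        System.out.println(maskAll(str));   // a#b#c#
        System.out.println(maskFirst(str)); // a#b22c333
    }
}
```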

public static void check() {
    // Collapse each run of a repeated character into a single character
    String str = "abgggggcffffdggggs";
    String regex = "(.)\\1+";
    str = str.replaceAll(regex, "$1");
    System.out.println(str);
}

PS: in the replacement argument, a dollar sign followed by a group number ($1, $2, and so on) refers to whatever that group of the regex in the first argument matched.

public static void check() {
    // 18753377511 -> 187****7511
    String str = "18753377511";
    String regex = "(\\d{3})\\d{4}(\\d{4})";
    System.out.println(str);
    str = str.replaceAll(regex, "$1****$2");
    System.out.println(str);
}

Find

A regular expression is represented as an object.

Pattern class

A regular expression, specified as a string, must first be compiled into an instance of this class. The resulting pattern can then be used to create a Matcher object that can match arbitrary character sequences against the regular expression. All of the state involved in performing a match resides in the matcher, so many matchers can share the same pattern.

// Encapsulate the regular expression in an object
// Pattern p = Pattern.compile("a*b");
// Call the pattern's matcher method on a string to obtain a Matcher object for operating on that string
// Matcher m = p.matcher("aaaab");
// Operate on the string through the Matcher object's methods
// boolean b = m.matches();
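The three steps can be run as follows. This sketch compiles the pattern "a*b" once and creates a fresh Matcher for each input, which is the sharing the Pattern documentation describes; the class name, the constant AB, and the method isAStarB are invented for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternReuseDemo {
    // Compiled once; a Pattern is immutable and can be shared
    private static final Pattern AB = Pattern.compile("a*b");

    static boolean isAStarB(String input) {
        // Each call gets its own Matcher, which holds the state of one match
        Matcher m = AB.matcher(input);
        return m.matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"aaaab", "b", "aaa", "abc"};
        for (String input : inputs) {
            System.out.println(input + ":" + isAStarB(input));
        }
    }
}
```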

Matcher class

  • The matches method attempts to match the entire input sequence against the pattern.

  • The lookingAt method attempts to match the input sequence, starting at the beginning, against the pattern.

  • The find method scans the input sequence looking for the next subsequence that matches the pattern.
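The difference between the three methods shows up when the same pattern is applied to inputs where the match sits in different places. In the sketch below, the class name, the helper probe, and the sample strings are invented for this illustration; a new Matcher is created for each call so that one call's state cannot affect the next.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class MatcherMethodsDemo {
    // Returns {matches, lookingAt, find} for the given regex and input
    static boolean[] probe(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        return new boolean[] {
            p.matcher(input).matches(),   // whole input must match
            p.matcher(input).lookingAt(), // a match must start at index 0
            p.matcher(input).find()       // a match may start anywhere
        };
    }

    public static void main(String[] args) {
        String[] inputs = {"hao", "hao ni hao", "ni hao"};
        for (String input : inputs) {
            System.out.println(input + " -> " + Arrays.toString(probe("hao", input)));
        }
    }
}
```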

public static void check() {
    String str = "ni hao, wohao, ta ye hao";
    String regex = "\\b[a-z]{3}\\b"; // \b: word boundary
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(str);
    while (m.find()) { // find() must succeed before group() can be read
        System.out.println(m.group());
        System.out.println(m.start() + ":" + m.end()); // start and end index of the match
    }
}

Exercise:

Turn aaa...aa..aaa...bbb...b...bbb...ccc...ccc into abc.

public static void test() {
    String str = "aaa...aa..aaa...bbb...b...bbb...ccc...ccc";
    System.out.println(str);
    String regex = "\\.+";
    str = str.replaceAll(regex, ""); // remove the dots
    regex = "(.)\\1+";
    str = str.replaceAll(regex, "$1"); // collapse repeated characters
    System.out.println(str);
}

Sort IP addresses

public static void test() {
    // First attempt: a TreeSet<String> sorts the addresses as plain strings,
    // which is not numeric order
    // String str = "192.0.0.1 127.0.0.24 3.3.3.5 150.15.3.41";
    // System.out.println("ip:" + str);
    // String regex = " +";
    // String[] strs = str.split(regex);
    // TreeSet<String> ts = new TreeSet<String>(); // sorts automatically
    // for (String s : strs) {
    //     ts.add(s);
    // }
    // for (String s : ts) { // sorted as strings
    //     System.out.println(s);
    // }

    // So pad every segment of every IP address with two leading zeros
    String str = "192.0.0.1 127.0.0.24 3.3.3.5 150.15.3.41";
    String regex = "(\\d+)";
    str = str.replaceAll(regex, "00$1");
    System.out.println("padded with zeros: " + str);
    // Keep only the last three digits of each segment
    regex = "0*(\\d{3})";
    str = str.replaceAll(regex, "$1");
    System.out.println("three digits per segment: " + str);
    regex = " +";
    String[] strs = str.split(regex);
    TreeSet<String> ts = new TreeSet<String>(); // sorts automatically
    for (String s : strs) {
        ts.add(s);
    }
    for (String s : ts) {
        // Strip the leading zeros again for display
        System.out.println(s.replaceAll("0*(\\d+)", "$1"));
    }
}
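For comparison, here is a sketch of a different technique from the zero-padding approach: the addresses are split with regular expressions, but ordered with a numeric comparator instead of string order. It assumes every address is well formed (four decimal segments separated by dots); the class name, BY_SEGMENTS, and sortIps are invented for this illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

class IpSortDemo {
    // Compares two dotted IPv4 strings segment by segment as numbers.
    // Assumption: both strings have exactly four decimal segments.
    static final Comparator<String> BY_SEGMENTS = (a, b) -> {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < 4; i++) {
            int diff = Integer.compare(Integer.parseInt(x[i]), Integer.parseInt(y[i]));
            if (diff != 0) {
                return diff;
            }
        }
        return 0;
    };

    // Splits a space-separated list of addresses and sorts it numerically
    static String[] sortIps(String line) {
        String[] ips = line.trim().split(" +");
        Arrays.sort(ips, BY_SEGMENTS);
        return ips;
    }

    public static void main(String[] args) {
        String str = "192.0.0.1 127.0.0.24 3.3.3.5 150.15.3.41";
        for (String ip : sortIps(str)) {
            System.out.println(ip);
        }
    }
}
```

The padding approach keeps everything inside replaceAll and a TreeSet; the comparator trades that for not having to pad and unpad the strings.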

Simple email address verification

public static void test() {
    String mail = "aa_a@163.com.cn";
    String regex = "\\w+@\\w+(\\.[a-zA-Z]{2,3})+"; // + means one or more times
    boolean flag = mail.matches(regex);
    System.out.println(mail + ":" + flag);
}

Note: regular expressions are hard to read, so in real development they are tested repeatedly and then wrapped in methods.


Exercise: a web crawler is a program that fetches data matching specified rules from the Internet.

Crawl email addresses.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class asd {
    public static void main(String[] args) throws Exception {
        // List<String> list = getmail(); // local file
        List<String> list = getweb(); // network
        for (String i : list) {
            System.out.println(i);
        }
    }

    public static List<String> getweb() throws Exception {
        // URL url = new URL("http://192.168.0.1:8080/myweb/mymail.html");
        URL url = new URL("http://news.baidu.com/");
        BufferedReader brin = new BufferedReader(new InputStreamReader(url.openStream()));
        String mail_regex = "\\w+@\\w+(\\.\\w+)+";
        Pattern p = Pattern.compile(mail_regex);
        List<String> list = new ArrayList<String>();
        String line = null;
        while ((line = brin.readLine()) != null) {
            Matcher m = p.matcher(line);
            while (m.find()) {
                list.add(m.group());
            }
        }
        return list;
    }

    public static List<String> getmail() throws Exception {
        // 1. Read the source file
        BufferedReader br = new BufferedReader(new FileReader("g:\\mymail.html"));
        String mail_regex = "\\w+@\\w+(\\.\\w+)+";
        Pattern p = Pattern.compile(mail_regex);
        List<String> list = new ArrayList<String>();
        String line = null;
        // 2. Match each line that was read to find the data that fits the rule
        while ((line = br.readLine()) != null) {
            Matcher m = p.matcher(line);
            while (m.find()) {
                // 3. Store the data that fits the rule in the collection
                list.add(m.group());
            }
        }
        return list;
    }
}





Java Regular Expression Learning

Regular expressions are not specific to Java.
For details, see the regular expression documentation in the Java API or the regular expression reference for jQuery.

Complete list of metacharacters
Character      Description
\              Marks the next character as a special character, a literal character, a back reference, or an octal escape. For example, "n" matches the character "n", while "\n" matches a newline. The sequence "\\" matches "\", and "\(" matches "(".
^              Matches the start of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position after "\n" or "\r".
$              Matches the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before "\n" or "\r".
*              Matches the preceding subexpression zero or more times. For example, "zo*" matches "z" and "zoo". * is equivalent to {0,}.
+              Matches the preceding subexpression one or more times. For example, "zo+" matches "zo" and "zoo" but not "z". + is equivalent to {1,}.
?              Matches the preceding subexpression zero or one time. For example, "do(es)?" matches "do" or "does". ? is equivalent to {0,1}.
{n}            n is a non-negative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob" but matches the two o's in "food".
{n,}           n is a non-negative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob" but matches all the o's in "foooood". "o{1,}" is equivalent to "o+", and "o{0,}" is equivalent to "o*".
{n,m}          m and n are non-negative integers with n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". "o{0,1}" is equivalent to "o?". Note that there must be no space between the comma and the numbers.
?              When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match becomes non-greedy. A non-greedy match consumes as little of the searched string as possible, while the default greedy match consumes as much as possible. For example, on the string "oooo", "o+?" matches a single "o", while "o+" matches all the o's.
.              Matches any single character except "\n". To match any character including "\n", use a pattern such as "(.|\n)".
(pattern)      Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0…$9 properties in JScript. To match parenthesis characters, use "\(" or "\)".
(?:pattern)    Matches pattern but does not capture the match; that is, it is a non-capturing match that is not stored for later use. This is useful for combining parts of a pattern with the "|" character. For example, "industr(?:y|ies)" is a more economical expression than "industry|industries".
(?=pattern)    Positive lookahead: matches the search string at any position where a string matching pattern begins. This is a non-capturing match. For example, "Windows(?=95|98|NT|2000)" matches "Windows" in "Windows2000" but not "Windows" in "Windows3.1". Lookahead does not consume characters: after a match, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
(?!pattern)    Negative lookahead, in any ......
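The table describes these constructs in terms of JScript and VBScript, but greedy versus non-greedy quantifiers and lookahead are also supported by java.util.regex. The sketch below tries the "oooo" and "Windows" examples in Java; the class name and the helper firstMatch are invented for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyAndLookaheadDemo {
    // Returns the first substring of input matched by regex, or null if there is none
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Greedy: takes as many o's as possible
        System.out.println(firstMatch("o+", "oooo"));
        // Non-greedy: takes as few o's as possible
        System.out.println(firstMatch("o+?", "oooo"));
        // Positive lookahead: "Windows" only when followed by one of the listed versions
        System.out.println(firstMatch("Windows(?=95|98|NT|2000)", "Windows2000"));
        System.out.println(firstMatch("Windows(?=95|98|NT|2000)", "Windows3.1"));
        // Negative lookahead: "Windows" only when NOT followed by one of them
        System.out.println(firstMatch("Windows(?!95|98|NT|2000)", "Windows3.1"));
    }
}
```

In the lookahead cases the returned text is just "Windows": the version number is checked but not included in the match.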

Java Regular Expression

^ and $ match the start and the end of a string, respectively. Some examples:

"^The": the string must start with "The";

"of despair$": the string must end with "of despair";

So,

"^abc$": a string that both starts and ends with abc; in fact, only "abc" itself matches.

"notice": matches any string containing "notice".

As the last example shows, if you use neither of these two characters, the pattern (regular expression) can appear anywhere in the string being tested; it is not anchored to either end.
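In Java, String.matches always tests the whole string, so the effect of ^ and $ is easiest to see with Matcher.find, which searches for the pattern anywhere in the input. The sketch below is a minimal illustration; the class name, the helper found, and the sample sentences are invented.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // True if the regex matches somewhere in the input
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("^The", "The end"));                 // starts with "The"
        System.out.println(found("^The", "At The end"));              // "The" is not at the start
        System.out.println(found("of despair$", "depths of despair"));
        System.out.println(found("^abc$", "abc"));
        System.out.println(found("^abc$", "abcabc"));                 // anchored at both ends
        System.out.println(found("notice", "take notice now"));      // unanchored: anywhere
    }
}
```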

Next, let's look at '*', '+', and '?'.

They specify how many times a character or a sequence of characters may occur. They mean:

"zero or more", equivalent to {0,},
"one or more", equivalent to {1,},
"zero or one", equivalent to {0,1}. Here are some examples:
"ab*": same as ab{0,}; matches a string that has an a followed by zero or more b's ("a", "ab", "abbb", etc.);
"ab+": same as ab{1,}; as above, but there must be at least one b ("ab", "abbb", etc.);
"ab?": same as ab{0,1}; there may be zero or one b;
"a?b+$": matches a string that ends with zero or one a followed by one or more b's.
Key point: '*', '+', and '?' apply only to the character immediately before them.

You can also limit the number of occurrences with braces, for example:

"ab{2}": a must be followed by exactly two b's ("abb");
"ab{2,}": a must be followed by two or more b's ("abb", "abbbb", etc.);
"ab{3,5}": a must be followed by three to five b's ("abbb", "abbbb", or "abbbbb").
Now we can put several characters in parentheses, for example:

"a(bc)*": matches a followed by zero or more "bc";
"a(bc){1,5}": a followed by one to five "bc".
There is also the '|' character, which acts as an OR operator:

"hi|hello": matches a string containing "hi" or "hello";

"(b|cd)ef": matches "bef ......
