Using regular expressions in Java

Source: Internet
Author: User
Tags: expression engine

I. What is a regular expression?

1. Definition: a regular expression is a pattern that describes a set of strings and can be used for matching and replacement. It consists of ordinary characters (such as the letters a to z) and special characters (metacharacters), and it describes one or more strings to look for when a body of text is searched. The expression acts as a template that is matched against the string being searched.

2. Purpose:

  • String Matching (character matching)
  • String search
  • String replacement
  • String segmentation

    For example:

    • Extracting email addresses from a web page
    • Checking whether an IP address is well formed
    • Extracting links from a web page
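The four purposes listed above can each be tried with a one-line call. The following is a minimal sketch; the class name RegexPurposes and its method names are made up for illustration, and the IP check only tests the shape of the string (it does not reject numbers above 255).

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexPurposes {
    // Matching: four dot-separated groups of 1 to 3 digits (shape check only).
    static boolean looksLikeIp(String s) {
        return s.matches("\\d{1,3}(\\.\\d{1,3}){3}");
    }

    // Searching: return the first run of digits, or null if there is none.
    static String firstNumber(String s) {
        Matcher m = Pattern.compile("\\d+").matcher(s);
        return m.find() ? m.group() : null;
    }

    // Replacement: mask every digit with '*'.
    static String maskDigits(String s) {
        return s.replaceAll("\\d", "*");
    }

    // Splitting: break on commas, ignoring whitespace around them.
    static String[] splitOnCommas(String s) {
        return s.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeIp("192.168.0.1"));                 // true
        System.out.println(firstNumber("order 42 shipped"));            // 42
        System.out.println(maskDigits("a2389a"));                       // a****a
        System.out.println(Arrays.toString(splitOnCommas("a , b,c")));  // [a, b, c]
    }
}
```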

      3. Classes for processing regular expressions in Java:

      • java.lang.String
      • java.util.regex.Pattern: the pattern class. It represents the pattern that strings are matched against. A Pattern is compiled ahead of time, which makes repeated use more efficient.
      • java.util.regex.Matcher: the matcher class. It holds the results of matching a pattern against a string; there may be many results.
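To make the division of labour between Pattern and Matcher concrete, here is a small sketch that compiles a pattern once and reuses it for several inputs. The class name PatternReuse and the method keepThreeLetterWords are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternReuse {
    // Compiled once; a Pattern is immutable and can be shared between calls.
    private static final Pattern THREE_LOWERCASE = Pattern.compile("[a-z]{3}");

    // Keep only the inputs that consist of exactly three lowercase letters.
    static List<String> keepThreeLetterWords(List<String> inputs) {
        List<String> kept = new ArrayList<>();
        for (String input : inputs) {
            // A new Matcher is created for every input string.
            Matcher m = THREE_LOWERCASE.matcher(input);
            if (m.matches()) {
                kept.add(input);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(keepThreeLetterWords(Arrays.asList("abc", "abcd", "ab1", "xyz")));
    }
}
```

String.matches() compiles its pattern on every call, so a shared Pattern is usually the better choice inside a loop.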

        4. A small program gives a quick introduction to regular expressions.

         

         
        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        public class Test {
            public static void main(String[] args) {
                // matches() checks whether the whole string matches the expression; "." stands for any character
                p("abc".matches("..."));
                // Replace every digit in "a2389a" with "*"; \d stands for a digit 0-9
                p("a2389a".replaceAll("\\d", "*"));
                // Compile the pattern "three letters from a to z" in advance to speed up matching
                Pattern p = Pattern.compile("[a-z]{3}");
                // Match and keep the result in a Matcher object
                Matcher m = p.matcher("abc");
                p(m.matches());
                // The three lines above can be replaced with
                p("abc".matches("[a-z]{3}"));
            }

            public static void p(Object o) {
                System.out.println(o);
            }
        }

        The following is the result.

        true
        a****a
        true
        true

        The following experiments illustrate the matching rules of regular expressions. The quantifiers used here are the greedy ones.

        .        Any character

        X?       X, once or not at all

        X*       X, zero or more times

        X+       X, one or more times

        X{n}     X, exactly n times

        X{n,}    X, at least n times

        X{n,m}   X, at least n but not more than m times

         

        // A first look at . * + ?
        p("a".matches("."));     // true
        p("aa".matches("aa"));   // true
        p("aaaa".matches("a*")); // true
        p("aaaa".matches("a+")); // true
        p("".matches("a*"));     // true
        p("aaaa".matches("a?")); // false
        p("".matches("a?"));     // true
        p("a".matches("a?"));    // true
        p("1232435463685899".matches("\\d{3,100}")); // true
        p("192.168.0.aaa".matches("\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}")); // false
        p("192".matches("[0-2][0-9][0-9]")); // true
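"Greedy" means that a quantifier first takes as many characters as it can and only gives some back if the rest of the pattern fails. The sketch below makes this visible with find(); the class GreedyDemo and the helper firstMatch are illustrative names, and the reluctant form `+?` is shown only as a contrast.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Return the first substring of input that matches regex, or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Greedy a{1,3} takes three of the four a's, not just one.
        System.out.println(firstMatch("a{1,3}", "aaaa"));        // aaa
        // Greedy .+ runs to the last '>' in the input.
        System.out.println(firstMatch("<.+>", "<b>bold</b>"));   // <b>bold</b>
        // The reluctant form .+? stops at the first '>' instead.
        System.out.println(firstMatch("<.+?>", "<b>bold</b>"));  // <b>
    }
}
```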

         

        [abc]          a, b, or c (simple class)

        [^abc]         Any character except a, b, or c (negation)

        [a-zA-Z]       a through z or A through Z, inclusive (range)

        [a-d[m-p]]     a through d, or m through p: [a-dm-p] (union)

        [a-z&&[def]]   d, e, or f (intersection)

        [a-z&&[^bc]]   a through z, except for b and c: [ad-z] (subtraction)

        [a-z&&[^m-p]]  a through z, and not m through p: [a-lq-z] (subtraction)

         

        // Ranges
        p("a".matches("[abc]"));         // true
        p("a".matches("[^abc]"));        // false
        p("A".matches("[a-zA-Z]"));      // true
        p("A".matches("[a-z]|[A-Z]"));   // true
        p("A".matches("[a-z[A-Z]]"));    // true
        p("R".matches("[A-Z&&[RFG]]"));  // true
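Union, intersection and subtraction are easier to see when a class is applied to every character of a string. The following sketch does that; CharClassDemo and filter are hypothetical names, and filter assumes its first argument is a pattern that matches a single character.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Keep only the characters of input that match the given character class.
    static String filter(String charClass, String input) {
        Pattern p = Pattern.compile(charClass);
        StringBuilder kept = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            String ch = input.substring(i, i + 1);
            if (p.matcher(ch).matches()) {
                kept.append(ch);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        System.out.println(filter("[a-d[m-p]]", "abcdefmnopq"));  // union: abcdmnop
        System.out.println(filter("[a-z&&[def]]", "abcdefg"));    // intersection: def
        System.out.println(filter("[a-z&&[^bc]]", "abcd"));       // subtraction: ad
        System.out.println(filter("[a-z&&[^m-p]]", "klmnopqr"));  // subtraction: klqr
    }
}
```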

         

        \d  A digit: [0-9]

        \D  A non-digit: [^0-9]

        \s  A whitespace character: [ \t\n\x0B\f\r]

        \S  A non-whitespace character: [^\s]

        \w  A word character: [a-zA-Z_0-9]

        \W  A non-word character: [^\w]

         

        // Getting to know \s \w \d and \
        p(" \n\r\t".matches("\\s{4}")); // true
        p(" ".matches("\\S"));          // false
        p("a_8".matches("\\w{3}"));     // true
        p("abc888&^%".matches("[a-z]{1,3}\\d+[&^#%]+")); // true
        p("\\".matches("\\\\"));        // true
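The doubled backslashes come from two layers of escaping: the Java compiler turns `\\` in a string literal into one backslash, and the regex engine then reads that backslash as the start of an escape such as `\d`. A short sketch follows, assuming a made-up price format; EscapingDemo, isPrice and containsLiteral are illustrative names. Pattern.quote is the standard way to match arbitrary text literally.

```java
import java.util.regex.Pattern;

class EscapingDemo {
    // "$" and "." are metacharacters, so both are escaped to match literally.
    static boolean isPrice(String s) {
        return s.matches("\\$\\d+\\.\\d{2}");
    }

    // Pattern.quote() escapes a whole string so that it is matched literally.
    static boolean containsLiteral(String text, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(isPrice("$12.50"));           // true
        System.out.println(isPrice("$12x50"));           // false: "." must be a real dot
        System.out.println(containsLiteral("a.b", ".")); // true
        System.out.println(containsLiteral("ab", "."));  // false: "." is not a wildcard here
    }
}
```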

         

        Boundary matchers

        ^   The beginning of a line

        $   The end of a line

        \b  A word boundary

        \B  A non-word boundary

        \A  The beginning of the input

        \G  The end of the previous match

        \Z  The end of the input but for the final terminator, if any

        \z  The end of the input

         

        // Boundary matching
        p("hello sir".matches("^h.*"));                 // true
        p("hello sir".matches(".*ir$"));                // true
        p("hello sir".matches("^h[a-z]{1,3}o\\b.*"));   // true
        p("hellosir".matches("^h[a-z]{1,3}o\\b.*"));    // false
        // Blank line: zero or more whitespace characters other than a newline, followed by a newline
        p(" \n".matches("^[\\s&&[^\\n]]*\\n$"));        // true
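Boundary matchers are most useful together with find(), because they constrain where a match may sit without consuming any characters. The sketch below counts whole-word occurrences with `\b` and contrasts `\Z` with `\z`; BoundaryDemo, countWholeWord and found are hypothetical names, and countWholeWord assumes the word consists of word characters only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Count occurrences of word that are not part of a longer word.
    static int countWholeWord(String word, String text) {
        Matcher m = Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // True if regex matches somewhere in input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // "sirloin" is not counted because no word boundary follows "sir" there.
        System.out.println(countWholeWord("sir", "sir, yes sir; sirloin")); // 2
        // \Z ignores one final line terminator, \z does not.
        System.out.println(found("sir\\Z", "hello sir\n")); // true
        System.out.println(found("sir\\z", "hello sir\n")); // false
    }
}
```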

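        Method analysis

        matches(): matches the pattern against the entire string.

        find(): searches for the next substring that matches the pattern.

        lookingAt(): always starts matching at the beginning of the string, but does not require the whole string to match.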

        // Email address
        p("asdsfdfagf@adsdsfd.com".matches("[\\w[.-]]+@[\\w[.-]]+\\.[\\w]+")); // true

        // matches(), find(), lookingAt()
        Pattern p = Pattern.compile("\\d{3,5}");
        Matcher m = p.matcher("123-34345-234-00");

        // matches() runs the regular expression engine over the whole of "123-34345-234-00".
        // It stops at the first "-", which does not match, but it does not give back
        // the characters it has already consumed.
        p(m.matches());
        // reset() gives the consumed characters back
        m.reset();

        // 1: With only p(m.matches()) before it, find() starts searching at "...34345-234-00":
        //    the first two calls find "34345" and "234", and the last two return false.
        // 2: With p(m.matches()) and m.reset() before it, find() starts searching at
        //    "123-34345-234-00": the results are true, true, true, false.
        p(m.find());
        p(m.start() + "---" + m.end());
        p(m.find());
        p(m.start() + "---" + m.end());
        p(m.find());
        p(m.start() + "---" + m.end());
        p(m.find());
        // If nothing was found, start() and end() throw java.lang.IllegalStateException
        // p(m.start() + "---" + m.end());

        // lookingAt() starts from the beginning of the input on every call
        p(m.lookingAt());
        p(m.lookingAt());
        p(m.lookingAt());
        p(m.lookingAt());
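The difference between the three methods can also be checked side by side on a fresh Matcher for each call, which avoids the need for reset(). This is a sketch under that assumption; MatchMethodsDemo and its method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchMethodsDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d{3,5}");

    // matches(): the whole input must match.
    static boolean wholeMatch(String input) {
        return DIGITS.matcher(input).matches();
    }

    // lookingAt(): only a prefix of the input must match.
    static boolean prefixMatch(String input) {
        return DIGITS.matcher(input).lookingAt();
    }

    // find(): collect every match as "start-end:text".
    static List<String> findAll(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = DIGITS.matcher(input);
        while (m.find()) {
            found.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "123-34345-234-00";
        System.out.println(wholeMatch(input));  // false
        System.out.println(prefixMatch(input)); // true
        System.out.println(findAll(input));     // [0-3:123, 4-9:34345, 10-13:234]
    }
}
```

The trailing "00" is never reported because it is shorter than the three digits the pattern requires.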

        String replacement: appendReplacement() and appendTail() make it possible to choose a different replacement for each match.

        // String replacement
        // Pattern.CASE_INSENSITIVE makes the match case-insensitive
        Pattern p = Pattern.compile("java", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher("java Java jAva ILoveJavA youHateJAVA adsdsfd");
        // Holds the resulting string
        StringBuffer buf = new StringBuffer();
        // Counts the matches to tell odd from even
        int i = 0;
        while (m.find()) {
            i++;
            if (i % 2 == 0) {
                m.appendReplacement(buf, "java");
            } else {
                m.appendReplacement(buf, "JAVA");
            }
        }
        // Without this call, the trailing text "adsdsfd" would be dropped
        m.appendTail(buf);
        p(buf);

        Result printing:

        JAVA java JAVA ILovejava youHateJAVA adsdsfd
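One detail to watch with appendReplacement() and replaceAll(): in the replacement text, `$` refers to a captured group and `\` is an escape character. When the replacement should be inserted as plain text, Matcher.quoteReplacement() escapes both. A minimal sketch, with ReplacementDemo and replaceLiterally as hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplacementDemo {
    // Replace every case-insensitive match of regex with the replacement taken literally.
    static String replaceLiterally(String input, String regex, String replacement) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        StringBuffer buf = new StringBuffer();
        while (m.find()) {
            // Without quoteReplacement, "$1" would be read as a reference to group 1.
            m.appendReplacement(buf, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(buf);
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceLiterally("java costs Java", "java", "$1")); // $1 costs $1
        // An unquoted $1 inserts the text captured by the first group.
        System.out.println("x".replaceAll("(x)", "[$1]"));                     // [x]
    }
}
```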

        Group

         

        // Groups: use () to form groups
        Pattern p = Pattern.compile("(\\d{3,5})([a-z]{2})");
        String s = "123aa-34345bb-234cc-00";
        Matcher m = p.matcher(s);
        p(m.groupCount()); // 2 groups
        while (m.find()) {
            p(m.group());      // digits and letters together
            // p(m.group(1)); // digits only
            // p(m.group(2)); // letters only
        }
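Groups can also be given names with the `(?<name>...)` syntax (available since Java 7), which reads better than numeric indexes when a pattern has several groups. The following sketch reuses the pattern from the example above; the class GroupDemo, the method splitPairs and the group names digits and letters are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Same pattern as above, with the two groups named "digits" and "letters".
    private static final Pattern PAIR =
            Pattern.compile("(?<digits>\\d{3,5})(?<letters>[a-z]{2})");

    // Return every match as "digits|letters".
    static List<String> splitPairs(String s) {
        List<String> pairs = new ArrayList<>();
        Matcher m = PAIR.matcher(s);
        while (m.find()) {
            pairs.add(m.group("digits") + "|" + m.group("letters"));
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(splitPairs("123aa-34345bb-234cc-00")); // [123|aa, 34345|bb, 234|cc]
    }
}
```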
II. Simple applications of regular expressions

        Java regular expression applications

        1. Extracting email addresses from a web page

        Use a regular expression to match text in the web page:

        [\w[.-]]+@[\w[.-]]+\.[\w]+

        Read the page content line by line and extract the matches:

        import java.io.BufferedReader;
        import java.io.FileNotFoundException;
        import java.io.FileReader;
        import java.io.IOException;
        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        public class EmailSpider {
            public static void main(String[] args) {
                // try-with-resources closes the reader even if reading fails
                try (BufferedReader br = new BufferedReader(new FileReader("C:\\emailSpider.html"))) {
                    String line = "";
                    while ((line = br.readLine()) != null) {
                        parse(line);
                    }
                } catch (FileNotFoundException e) {
                    e.printStackTrace();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }

            // Print every email address found in one line of the page
            private static void parse(String line) {
                Pattern p = Pattern.compile("[\\w[.-]]+@[\\w[.-]]+\\.[\\w]+");
                Matcher m = p.matcher(line);
                while (m.find()) {
                    System.out.println(m.group());
                }
            }
        }

        Print result:

        867124664@qq.com
        260678675@QQ.com
        806208721@qq.com
        hr_1985@163.com
        32575987@qq.com
        qingchen0501@126.com
        yingyihanxin@foxmail.com
        1170382650@qq.com
        1170382650@qq.com
        yingyihanxin@foxmail.com
        qingchen0501@126.com
        32575987@qq.com
        hr_1985@163.com
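For testing, the matching step can be separated from the file reading so that it works on any string. The sketch below does this with the same pattern, compiled once; EmailExtractor and extract are hypothetical names, and the addresses in main are invented example.com placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailExtractor {
    // Same pattern as in EmailSpider, compiled once and reused for every line.
    private static final Pattern EMAIL =
            Pattern.compile("[\\w[.-]]+@[\\w[.-]]+\\.[\\w]+");

    // Return every substring of line that the pattern accepts as an email address.
    static List<String> extract(String line) {
        List<String> addresses = new ArrayList<>();
        Matcher m = EMAIL.matcher(line);
        while (m.find()) {
            addresses.add(m.group());
        }
        return addresses;
    }

    public static void main(String[] args) {
        String line = "Contact: alice@example.com, bob.smith@mail.example.org.";
        // The trailing "." is left out because the pattern must end in word characters.
        System.out.println(extract(line)); // [alice@example.com, bob.smith@mail.example.org]
    }
}
```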

        With this many email addresses collected, you could now use JavaMail to send spam!

        2. Counting lines of code

        import java.io.BufferedReader;
        import java.io.File;
        import java.io.FileNotFoundException;
        import java.io.FileReader;
        import java.io.IOException;

        public class CodeCounter {
            static long normalLines = 0;   // code lines
            static long commentLines = 0;  // comment lines
            static long whiteLines = 0;    // blank lines

            public static void main(String[] args) {
                // Only this one folder is scanned; subfolders are not processed recursively
                File f = new File("E:\\Workspaces\\eclipse\\Application\\JavaMailTest\\src\\com\\java\\mail");
                File[] codeFiles = f.listFiles();
                if (codeFiles == null) {
                    // listFiles() returns null if the path is not a readable directory
                    return;
                }
                for (File child : codeFiles) {
                    // Only count .java files
                    if (child.getName().matches(".*\\.java$")) {
                        parse(child);
                    }
                }
                System.out.println("normalLines:" + normalLines);
                System.out.println("commentLines:" + commentLines);
                System.out.println("whiteLines:" + whiteLines);
            }

            private static void parse(File f) {
                BufferedReader br = null;
                // Whether we are inside a multi-line comment
                boolean comment = false;
                try {
                    br = new BufferedReader(new FileReader(f));
                    String line = "";
                    while ((line = br.readLine()) != null) {
                        // Remove the whitespace in front of a comment marker such as /*
                        line = line.trim();
                        // readLine() has already removed the line terminator,
                        // so the blank-line pattern is not ^[\s&&[^\n]]*\n$
                        if (line.matches("^[\\s&&[^\\n]]*$")) {
                            whiteLines++;
                        } else if (line.startsWith("/*") && !line.endsWith("*/")) {
                            // First line of a multi-line /* ... */ comment
                            commentLines++;
                            comment = true;
                        } else if (line.startsWith("/*") && line.endsWith("*/")) {
                            // A /* ... */ comment on a single line
                            commentLines++;
                        } else if (true == comment) {
                            // Inside a multi-line comment, up to and including the closing */
                            commentLines++;
                            if (line.endsWith("*/")) {
                                comment = false;
                            }
                        } else if (line.startsWith("//")) {
                            commentLines++;
                        } else {
                            normalLines++;
                        }
                    }
                } catch (FileNotFoundException e) {
                    e.printStackTrace();
                } catch (IOException e) {
                    e.printStackTrace();
                } finally {
                    if (br != null) {
                        try {
                            br.close();
                            br = null;
                        } catch (IOException e) {
                            e.printStackTrace();
                        }
                    }
                }
            }
        }
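The classification rules can be pulled out of the file loop into a small object that is fed one line at a time, which makes them easy to check without touching the disk. This is a sketch of that idea rather than a replacement for CodeCounter; LineClassifier, Kind and classify are hypothetical names, and like the original it only recognises comment markers at the start and end of a trimmed line.

```java
import java.util.regex.Pattern;

class LineClassifier {
    enum Kind { BLANK, COMMENT, CODE }

    // A line counts as blank if it holds nothing but whitespace.
    private static final Pattern BLANK_LINE = Pattern.compile("\\s*");

    // True while we are between an opening /* and its closing */.
    private boolean inBlockComment = false;

    // Classify the next line of a source file; lines must be fed in order.
    Kind classify(String rawLine) {
        String line = rawLine.trim();
        if (BLANK_LINE.matcher(line).matches()) {
            return Kind.BLANK;
        }
        if (inBlockComment) {
            if (line.endsWith("*/")) {
                inBlockComment = false;
            }
            return Kind.COMMENT;
        }
        if (line.startsWith("/*")) {
            // A block comment that does not close on the same line stays open.
            if (!line.endsWith("*/")) {
                inBlockComment = true;
            }
            return Kind.COMMENT;
        }
        if (line.startsWith("//")) {
            return Kind.COMMENT;
        }
        return Kind.CODE;
    }

    public static void main(String[] args) {
        LineClassifier c = new LineClassifier();
        String[] lines = { "int x = 1;", "   ", "// note", "/* start", "still inside", "end */", "return;" };
        for (String line : lines) {
            System.out.println(c.classify(line) + " <- " + line);
        }
    }
}
```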
