Java regular expressions: simple usage and web crawler code

Source: Internet
Author: User

A regular expression is a pattern used to match and manipulate strings.

1. The String class has methods for matching, splitting, and replacing strings with regular expressions.

Check whether the whole string matches the given regular expression: boolean matches(String regex);

Split the string around matches of the given regular expression: String[] split(String regex);

Replace every substring that matches the regular expression with another string: String replaceAll(String regex, String replacement);


2. Here are some common uses of regular expressions

(1)

String regex = "[1-9][0-9]{4,15}";

[1-9] means the character must be a digit from 1 to 9.
[0-9] means the character must be a digit from 0 to 9.
{4,15} means the element immediately before it is repeated 4 to 15 times.

So this regular expression requires a first digit in 1-9, followed by 4 to 15 digits in 0-9. A matching number is therefore 5 to 16 digits long and does not start with 0.

For example:

10175 matches.

10 does not match, because [0-9]{4,15} requires at least 4 digits after the first one, and here there is only one.
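
The rule above can be tried out with String.matches, which succeeds only when the entire string fits the pattern. This is a minimal sketch; the class name IdCheck and the helper isValidId are made up for illustration and are not part of the original article.

```java
class IdCheck {
    // Same pattern as above: one digit 1-9, then 4 to 15 digits 0-9.
    static final String REGEX = "[1-9][0-9]{4,15}";

    // Hypothetical helper: true only if the whole string matches REGEX.
    static boolean isValidId(String s) {
        return s.matches(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(isValidId("10175")); // true: 1 followed by four digits
        System.out.println(isValidId("10"));    // false: only one digit after the first
        System.out.println(isValidId("01234")); // false: the first digit may not be 0
    }
}
```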

(2)

[a-zA-Z0-9_]{6} matches exactly 6 characters, each of which is a letter (a-z or A-Z), a digit (0-9), or an underscore.

+ means one or more occurrences

* means zero or more occurrences

? means zero or one occurrence
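
The quantifiers can be compared side by side with a few calls to String.matches. The sketch below is only an illustration; the class name QuantifierDemo and the sample strings are assumptions chosen for this example.

```java
class QuantifierDemo {
    public static void main(String[] args) {
        // {6}: exactly six characters from the class
        System.out.println("abc_12".matches("[a-zA-Z0-9_]{6}")); // true
        System.out.println("abc_1".matches("[a-zA-Z0-9_]{6}"));  // false: only five

        // +: one or more digits
        System.out.println("42".matches("[0-9]+")); // true
        System.out.println("".matches("[0-9]+"));   // false: at least one is required

        // *: zero or more digits
        System.out.println("".matches("[0-9]*"));   // true: zero occurrences are allowed

        // ?: an optional minus sign before a single digit
        System.out.println("-7".matches("-?[0-9]"));  // true
        System.out.println("7".matches("-?[0-9]"));   // true
        System.out.println("--7".matches("-?[0-9]")); // false: at most one minus sign
    }
}
```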


(3) Splitting a string with a regular expression

String str = "SJD.KSDJ.SKDJF";

String regex = "\\.";


Note: in a regular expression, . is a special symbol that stands for any character. To split on a literal dot, we must escape it with \ so that it is treated as an ordinary character.

Because \ is itself a special symbol in a Java string literal, it has to be written twice: the regex \. becomes "\\." in source code. To match an ordinary \ we therefore write "\\\\".

String[] ss = str.split(regex); // returns the array "SJD", "KSDJ", "SKDJF", the pieces of the original string
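
The escaping rules are easiest to see by running the three cases next to each other. This is a small sketch, not code from the original article; the class name SplitDemo and the sample path are assumptions.

```java
import java.util.Arrays;

class SplitDemo {
    public static void main(String[] args) {
        String str = "SJD.KSDJ.SKDJF";

        // Escaped dot: splits on the literal '.' character.
        System.out.println(Arrays.toString(str.split("\\."))); // [SJD, KSDJ, SKDJF]

        // Unescaped dot: every character is a delimiter, so only empty
        // pieces remain, and split discards trailing empty strings.
        System.out.println(str.split(".").length); // 0

        // Literal backslash: "\\\\" in source code is the regex \\ .
        String path = "C:\\temp\\a.txt"; // the text C:\temp\a.txt
        System.out.println(Arrays.toString(path.split("\\\\"))); // [C:, temp, a.txt]
    }
}
```

The second call shows why forgetting the escape is easy to miss: it compiles and runs, but returns an empty array.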

(4) Replacing matched text with a regular expression

Replace every run of 5 or more consecutive digits in a string with #

String str = "ABCD1334546LASJDFLDSF2343424SDJ";

String regex = "[0-9]{5,}";

String newStr = str.replaceAll(regex, "#"); // newStr is "ABCD#LASJDFLDSF#SDJ"

(5) Extracting the substrings that match a regular expression

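Pattern p = Pattern.compile(regex); // regex is a String holding the pattern

Matcher m = p.matcher(str); // str is the String to search

while (m.find()) // move to the next match, if there is one
{
    System.out.println(m.group()); // the text of the current match
}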
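
The fragment above leaves regex and str undefined, so here is one way it could be turned into a complete program. It is a sketch under assumptions: the class name FindDemo, the helper findAll, the pattern, and the sample sentence are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindDemo {
    // Hypothetical helper: collect every substring of text that matches regex.
    static List<String> findAll(String regex, String text) {
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(text);
        List<String> hits = new ArrayList<>();
        while (m.find()) {       // advance to the next match, if any
            hits.add(m.group()); // the text of the current match
        }
        return hits;
    }

    public static void main(String[] args) {
        // \b[a-z]{3}\b : words made of exactly three lowercase letters
        System.out.println(findAll("\\b[a-z]{3}\\b", "the quick red fox ran away"));
        // prints [the, red, fox, ran]
    }
}
```

Unlike String.matches, which tests the whole string, find() scans for each successive match inside a longer text.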

3. Writing a web crawler

We write a program that reads all the e-mail addresses out of a web page and stores them in a text file.

/*
Web crawler:
fetch a web page and extract the strings that match a regular expression.

Here it collects the e-mail addresses found on the page.
*/
import java.io.*;
import java.util.regex.*;
import java.net.*;
class MailTest
{
    public static void main(String[] args) throws Exception
    {
        getMailAddr();
    }

    public static void getMailAddr() throws Exception
    {
        URL url = new URL("http://bbs.jb51.net/topics/390148495");
        URLConnection con = url.openConnection();

        // Read the page line by line; write each address to the output file
        BufferedReader bufIn = new BufferedReader(new InputStreamReader(con.getInputStream()));
        BufferedWriter bufw = new BufferedWriter(new FileWriter("E://mailaddress.txt"));
        String str = null;
        // 6 to 12 word characters, @, a host name, then one or more ".xxx" parts
        String regex = "[a-zA-Z0-9_]{6,12}@[a-zA-Z0-9]+(\\.[a-zA-Z]+)+";

        Pattern p = Pattern.compile(regex);
        while ((str = bufIn.readLine()) != null)
        {
            Matcher m = p.matcher(str);
            while (m.find())
            {
                String ss = m.group();
                bufw.write(ss, 0, ss.length());
                bufw.newLine();
                bufw.flush();
            }
        }

        // Release the file and the network stream when done
        bufw.close();
        bufIn.close();
    }
}
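
The matching step of the crawler can also be exercised without a network connection by running the same pattern over a string in memory. The following is a sketch only; the class name MailExtract, the helper extract, and the sample addresses (on the reserved example.com and example.org domains) are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MailExtract {
    // Same pattern as in the crawler, compiled once and reused.
    static final Pattern MAIL =
        Pattern.compile("[a-zA-Z0-9_]{6,12}@[a-zA-Z0-9]+(\\.[a-zA-Z]+)+");

    // Hypothetical helper: return every address found in one line of text.
    static List<String> extract(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = MAIL.matcher(line);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<p>Contact: zhangsan_01@example.com or "
                    + "webmaster@mail.example.org, bob@x.com</p>";
        System.out.println(extract(html));
        // prints [zhangsan_01@example.com, webmaster@mail.example.org]
    }
}
```

bob@x.com is skipped because the pattern requires 6 to 12 characters before the @. Note also that find() looks for a match anywhere in the line, so for a name longer than 12 characters it would pick up only the last 12 characters before the @.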
