Java Regular Expressions and Web Crawler Creation

Source: Internet
Author: User

Regular expressions are rules for matching and manipulating strings.

1. The String class provides several methods that use regular expressions to match, split, and replace strings.

boolean matches(String regex): returns true if the entire string matches the regular expression.

String[] split(String regex): splits the string around each match of the given regular expression.

String replaceAll(String regex, String replacement): replaces every substring that matches the regular expression with the replacement string.
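The three methods can be tried together in a minimal sketch such as the one below. The class name StringRegexMethodsDemo and its helper methods are hypothetical names chosen for illustration; only matches, split, and replaceAll come from the String class.

```java
import java.util.Arrays;

class StringRegexMethodsDemo {
    // matches() succeeds only if the WHOLE string fits the pattern.
    static boolean isAllDigits(String s) {
        return s.matches("[0-9]+");
    }

    // split() cuts the string at every match of the pattern.
    static String[] splitOnCommas(String s) {
        return s.split(",");
    }

    // replaceAll() replaces every match of the pattern with the replacement.
    static String maskDigits(String s) {
        return s.replaceAll("[0-9]", "*");
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("12345"));                     // true
        System.out.println(isAllDigits("123a5"));                     // false
        System.out.println(Arrays.toString(splitOnCommas("a,b,c")));  // [a, b, c]
        System.out.println(maskDigits("room 42"));                    // room **
    }
}
```

Note that matches tests the whole string, so "123a5" fails even though part of it consists of digits.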

2. The following describes common regular expressions.

(1)

String regex = "[1-9][0-9]{4,15}";
// [1-9] means this digit must be one of 1 to 9
// [0-9] means this digit can be any of 0 to 9
// {4,15} means the preceding element is repeated 4 to 15 times

This regular expression means that the first digit must be 1 to 9 and must be followed by 4 to 15 digits in the range 0 to 9, so a matching string has 5 to 16 digits in total.

For example:

10175 matches.

10 does not match, because [0-9] must appear at least four times and appears only once here.
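The rule in (1) can be checked with String.matches. This is only a sketch: the class name AccountNumberCheck and the method isValid are hypothetical, and the pattern is the one from (1) written without spaces.

```java
class AccountNumberCheck {
    // One digit 1-9, then 4 to 15 digits 0-9 (5 to 16 digits in total).
    static final String REGEX = "[1-9][0-9]{4,15}";

    static boolean isValid(String s) {
        // matches() requires the whole string to fit the pattern.
        return s.matches(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(isValid("10175"));             // true
        System.out.println(isValid("10"));                // false: too few digits
        System.out.println(isValid("01234"));             // false: starts with 0
        System.out.println(isValid("12345678901234567")); // false: 17 digits
    }
}
```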

(2)

[a-zA-Z0-9_]{6} matches exactly 6 characters, each of which is a letter (a-z or A-Z), a digit (0-9), or _

+ means one or more occurrences.

* means zero or more occurrences.

? means zero or one occurrence.
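A short sketch can make the quantifiers concrete. The class name QuantifierDemo, the method sixWordChars, and the sample patterns ab+, ab*, and ab? are assumptions made for illustration.

```java
class QuantifierDemo {
    // Exactly six characters from a-z, A-Z, 0-9, or _.
    static boolean sixWordChars(String s) {
        return s.matches("[a-zA-Z0-9_]{6}");
    }

    public static void main(String[] args) {
        // + : the b must appear at least once
        System.out.println("a".matches("ab+"));    // false
        System.out.println("abbb".matches("ab+")); // true

        // * : the b may be absent or repeated
        System.out.println("a".matches("ab*"));    // true

        // ? : the b may appear once or not at all
        System.out.println("ab".matches("ab?"));   // true
        System.out.println("abb".matches("ab?"));  // false

        System.out.println(sixWordChars("abc_12")); // true
        System.out.println(sixWordChars("abc12"));  // false: only 5 characters
        System.out.println(sixWordChars("abc-12")); // false: - is not allowed
    }
}
```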

(3) Splitting strings based on regular expressions

String str = "sjd.ksdj.skdjf";

String regex = "\\.";

Note: In a regular expression, . is a special symbol that matches any character. To split on a literal ., we must turn it into an ordinary character by escaping it as \.

Because \ is itself a special symbol in a Java string literal, it must be written as \\. The regular expression \. is therefore written as "\\." in Java source code.

String[] ss = str.split(regex); returns the string array "sjd", "ksdj", "skdjf", which are the pieces of the original string.
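The effect of forgetting the escape is easy to see in a small sketch. The class name SplitOnDotDemo and its methods are hypothetical; Pattern.quote is a standard alternative that escapes a literal string so the backslashes need not be written by hand.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitOnDotDemo {
    // Escaped dot: splits on the literal '.' character.
    static String[] splitEscaped(String s) {
        return s.split("\\.");
    }

    // Unescaped dot: '.' matches ANY character, so every character is a
    // delimiter. Only empty pieces remain, and split() discards trailing
    // empty strings, which leaves an empty array.
    static String[] splitUnescaped(String s) {
        return s.split(".");
    }

    // Pattern.quote() turns the literal text into an escaped pattern.
    static String[] splitQuoted(String s) {
        return s.split(Pattern.quote("."));
    }

    public static void main(String[] args) {
        String str = "sjd.ksdj.skdjf";
        System.out.println(Arrays.toString(splitEscaped(str)));   // [sjd, ksdj, skdjf]
        System.out.println(Arrays.toString(splitUnescaped(str))); // []
        System.out.println(Arrays.toString(splitQuoted(str)));    // [sjd, ksdj, skdjf]
    }
}
```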

(4) Replacing text that matches a regular expression

Replace every run of five or more digits in the string with #

String str = "abcd1334546lasjdfldsf2343424sdj";
String regex = "[0-9]{5,}";
String newstr = str.replaceAll(regex, "#");
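As a worked example, the replacement in (4) can be wrapped in a small runnable class. The class name ReplaceDigitRunsDemo and the method maskLongNumbers are hypothetical; the pattern is the one from (4) written without spaces.

```java
class ReplaceDigitRunsDemo {
    // Replace each run of five or more digits with a single '#'.
    static String maskLongNumbers(String s) {
        return s.replaceAll("[0-9]{5,}", "#");
    }

    public static void main(String[] args) {
        // Both digit runs are seven digits long, so both are replaced.
        System.out.println(maskLongNumbers("abcd1334546lasjdfldsf2343424sdj")); // abcd#lasjdfldsf#sdj

        // A run of only four digits is left untouched.
        System.out.println(maskLongNumbers("a1234b"));  // a1234b
        System.out.println(maskLongNumbers("a12345b")); // a#b
    }
}
```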

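(5) Obtaining the substrings that match a regular expression

// regex and str are String variables; Pattern and Matcher are in java.util.regex
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(str);
// find() advances to the next match; group() returns the matched text
while (m.find())
{
    System.out.println(m.group());
}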
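The find and group loop can be packaged so that it returns the matches instead of printing them. This is a sketch under assumptions: the class name FindAllDemo, the method findAll, and the sample sentence are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllDemo {
    // Collect every substring of input that matches regex, in order.
    static List<String> findAll(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {          // move to the next match, if any
            found.add(m.group());   // text of the current match
        }
        return found;
    }

    public static void main(String[] args) {
        // \b is a word boundary, so only whole three-letter words match.
        System.out.println(findAll("\\b[a-z]{3}\\b", "the cat sat on a mat"));
        // [the, cat, sat, mat]
    }
}
```

Unlike String.matches, find looks for the pattern anywhere in the input, which is why it suits extraction tasks.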

3. Web Crawler Creation

The following program reads every email address on a web page and stores the addresses in a text file.

 

 
/* Web crawler: fetch a web page and extract the strings that match a regular expression, here email addresses */
import java.io.*;
import java.util.regex.*;
import java.net.*;

class MailTest
{
    public static void main(String[] args) throws Exception
    {
        getMailAddr();
    }

    public static void getMailAddr() throws Exception
    {
        URL url = new URL("http://bbs.csdn.net/topics/390148495");
        URLConnection con = url.openConnection();
        BufferedReader bufIn = new BufferedReader(new InputStreamReader(con.getInputStream()));
        BufferedWriter bufw = new BufferedWriter(new FileWriter(new File("E://mailaddress.txt")));

        String str = null;
        // 6 to 12 word characters, then @, a domain label, and one or more dot-separated suffixes
        String regex = "[a-zA-Z0-9_]{6,12}@[a-zA-Z0-9]+(\\.[a-zA-Z]+)+";
        Pattern p = Pattern.compile(regex);

        // Read the page line by line and write each match on its own line
        while ((str = bufIn.readLine()) != null)
        {
            Matcher m = p.matcher(str);
            while (m.find())
            {
                String ss = m.group();
                bufw.write(ss, 0, ss.length());
                bufw.newLine();
                bufw.flush();
            }
        }

        // Release the network stream and the output file
        bufIn.close();
        bufw.close();
    }
}
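The matching step of the crawler can be tried without a network connection or an output file by running the same pattern over text held in memory. The sketch below is not part of the original program: the class name MailExtractDemo, the method extractEmails, and the sample addresses are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MailExtractDemo {
    // Same pattern as the crawler: 6 to 12 word characters, '@',
    // a domain label, and one or more dot-separated suffixes.
    static final Pattern MAIL =
            Pattern.compile("[a-zA-Z0-9_]{6,12}@[a-zA-Z0-9]+(\\.[a-zA-Z]+)+");

    // Return every address-like substring found in the text.
    static List<String> extractEmails(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = MAIL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String page = "Write to zhangsan@example.com or "
                + "li_si_2012@mail.example.org; bob@example.com is too short.";
        System.out.println(extractEmails(page));
        // [zhangsan@example.com, li_si_2012@mail.example.org]
    }
}
```

This pattern is deliberately simple. It skips addresses whose local part has fewer than 6 characters, and because find is not anchored, a local part longer than 12 characters is matched only from its last 12 characters.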

 

 
