Web crawlers and Crawlers

Source: Internet
Author: User

/* Web crawler: print every regular-expression match found on a page */
import java.io.*;
import java.net.*;
import java.util.regex.*;

class RegexTest2 {
    public static void main(String[] args) throws Exception {
        getMails();
    }

    public static void getMails() throws Exception {
        // Placeholder: replace with the URL of the page to be crawled.
        URL url = new URL("content to be crawled");
        URLConnection conn = url.openConnection();
        BufferedReader bufr = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        String line = null;
        // Placeholder: replace with the regular expression to match (e.g. for e-mail addresses).
        String mailReg = "Regular Expression";
        Pattern p = Pattern.compile(mailReg);
        // Read the page line by line and print each match.
        while ((line = bufr.readLine()) != null) {
            Matcher m = p.matcher(line);
            while (m.find()) {
                System.out.println(m.group());
            }
        }
        bufr.close();
    }
}
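The program above leaves both the URL and the regular expression as placeholders. As a minimal sketch of the matching step only, the example below applies a simplified e-mail pattern to lines of text that have already been fetched. The class name MailExtractor, the method extractMails, the sample lines, and the pattern itself are assumptions made for illustration; the original author's regular expression is not given, and this simplified pattern does not cover every valid address.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MailExtractor {
    // Simplified e-mail pattern (an assumption, not the original author's regex):
    // local part, '@', then a domain with at least one dot-separated label.
    static final Pattern MAIL = Pattern.compile("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+");

    // Collects every match of MAIL, in order of appearance, from the given lines.
    static List<String> extractMails(Iterable<String> lines) {
        List<String> mails = new ArrayList<>();
        for (String line : lines) {
            Matcher m = MAIL.matcher(line);
            while (m.find()) {
                mails.add(m.group());
            }
        }
        return mails;
    }

    public static void main(String[] args) {
        // Hypothetical page content standing in for lines read from a URLConnection.
        List<String> page = Arrays.asList(
                "Contact alice@example.com or bob@test.org for details.",
                "This line contains no address.");
        for (String mail : extractMails(page)) {
            System.out.println(mail);
        }
    }
}
```

Keeping the matching logic in its own method means it can be checked against fixed strings without a network connection; in the crawler, each line returned by readLine() would be passed through the same pattern.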

Web Crawler tools

Heritrix is a powerful crawler tool. An installation and usage guide is available at download.csdn.net/detail/aklakl/4082490.
 
Which is better suited to writing a web crawler: C++ or Java?

As far as the language itself goes, I think Python is an ideal choice for web crawlers. Fetching a page and analyzing the document can often be done in one go.

In terms of performance, C++ still holds an unshakable position. If you need to crawl massive amounts of data and are comfortable with C++, choose it.

If you do not want to start from scratch and would rather build on top of an existing framework, consider Java.

If you only need some simple data capture and analysis, you do not have to work at the programming-language level at all. In some cases, existing crawler tools do a better job than a crawler you write yourself.
