Data Persistence in the WebCollector Crawler

Source: Internet
Author: User

WebCollector does not provide a data persistence interface like Scrapy's pipeline.

In WebCollector, users define what happens to each page by overriding the visit method of BreadthCrawler. Data persistence is likewise left to the user and is implemented in this same method.
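Because persistence lives entirely in visit, you can get something closer to Scrapy's pipeline by defining a small interface yourself and having visit delegate to it. The sketch below is self-contained and does not depend on WebCollector. The names `PagePipeline` and `FilePipeline` are hypothetical. It stores each page's HTML in a file named after a hash of its URL.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical pipeline-style interface; WebCollector itself does not provide one. */
interface PagePipeline {
    void process(String url, String html) throws IOException;
}

/** Hypothetical implementation that writes each page's HTML to its own file. */
class FilePipeline implements PagePipeline {
    private final Path dir;

    FilePipeline(Path dir) throws IOException {
        this.dir = dir;
        Files.createDirectories(dir); // make sure the output directory exists
    }

    /** Derives a file name from the URL's hash code (collisions are possible; illustrative only). */
    Path fileFor(String url) {
        return dir.resolve(Integer.toHexString(url.hashCode()) + ".html");
    }

    @Override
    public void process(String url, String html) throws IOException {
        // Overwrites any earlier copy of the same page
        Files.write(fileFor(url), html.getBytes(StandardCharsets.UTF_8));
    }
}
```

A crawler could hold such an object in a field (say, a hypothetical `pipeline` field) and call `pipeline.process(page.getUrl(), page.getHtml())` from visit. The visit method in the example below declares no checked exceptions, so the `IOException` would have to be caught there.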

For example, the following code shows how to save the source code of a web page to a database:


import cn.edu.hfut.dmic.webcollector.crawler.BreadthCrawler;
import cn.edu.hfut.dmic.webcollector.model.Page;


public class MyCrawler extends BreadthCrawler {

    /* Define your own per-page operations in the visit method */
    @Override
    public void visit(Page page) {
        /* Add data persistence code here.
         * For example, suppose the user has defined a class DBHelper that
         * provides methods for manipulating MySQL (inserting and deleting data).
         * The DBHelper class is not given here; users can easily implement one.
         * Suppose DBHelper has a static method insert(String url, String html)
         * that submits the URL and source of the web page to the MySQL database. */
        DBHelper.insert(page.getUrl(), page.getHtml());
    }

    public static void main(String[] args) throws Exception {
        MyCrawler crawler = new MyCrawler();

        /* Configure crawling of the Hefei University of Technology website */
        crawler.addSeed("http://www.hfut.edu.cn/ch/");
        crawler.addRegex("http://.*hfut\\.edu\\.cn/.*");

        /* Start crawling with a depth of 5 */
        crawler.start(5);
    }
}
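The article leaves DBHelper to the reader. Below is a minimal sketch of it using plain JDBC. It assumes a MySQL JDBC driver on the classpath and a table named `pages` with `url` and `html` columns. The table name, connection URL, and credentials are hypothetical placeholders. insert catches `SQLException` instead of throwing it so that it can be called from visit, which declares no checked exceptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/**
 * Hypothetical helper matching the insert(String url, String html) signature assumed above.
 * Assumes a table created with something like:
 *   CREATE TABLE pages (url VARCHAR(767) PRIMARY KEY, html MEDIUMTEXT);
 */
class DBHelper {
    // Placeholder connection settings (assumptions); replace with your own.
    static String jdbcUrl = "jdbc:mysql://localhost:3306/crawler";
    static String user = "root";
    static String password = "";

    static final String INSERT_SQL = "INSERT INTO pages (url, html) VALUES (?, ?)";

    /** Stores one page. Opening a connection per call keeps the sketch simple but is slow for large crawls. */
    static void insert(String url, String html) {
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password);
             PreparedStatement stmt = conn.prepareStatement(INSERT_SQL)) {
            stmt.setString(1, url);   // parameter binding avoids SQL injection from page content
            stmt.setString(2, html);
            stmt.executeUpdate();
        } catch (SQLException e) {
            // Log and continue so one failed insert does not abort the crawl
            System.err.println("Failed to store " + url + ": " + e.getMessage());
        }
    }
}
```

Each call opens its own connection, so concurrent calls from the crawler's threads do not share state. For larger crawls, a connection pool or batched inserts would be the usual next step.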
  

