Learning Web Crawlers (1)

Source: Internet
Author: User

The following is a summary of resources that I have found useful. All of them were collected from the Internet.

 

Programming language: Java

Web crawler: Spiderman

 

Spiderman is an open-source Java tool for extracting data from the Web. It collects the Web pages you specify and extracts the useful data from them.
To locate and extract that data, Spiderman relies mainly on XPath and regular expressions.
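
To make the combination of XPath and regular expressions concrete, here is a minimal sketch that uses only the JDK's javax.xml.xpath and java.util.regex packages. It does not use Spiderman's own API. The class name XPathRegexSketch, the sample markup, and the /question/<id> link shape are all assumptions made for illustration. The sample is well-formed XML; real HTML usually has to be cleaned before XPath can be applied to it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathRegexSketch {
    // Hypothetical, well-formed sample page (an assumption, not real site markup).
    static final String SAMPLE =
        "<html><body>"
        + "<div class=\"question\"><a href=\"/question/12345\">How do I parse HTML?</a></div>"
        + "<div class=\"question\"><a href=\"/question/67890\">What is XPath?</a></div>"
        + "</body></html>";

    // Assumed link shape: /question/<numeric id>
    private static final Pattern QUESTION_ID = Pattern.compile("/question/(\\d+)");

    // Step 1: use XPath to select nodes and return their text (or attribute value).
    static List<String> selectText(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath evaluation failed: " + expression, e);
        }
    }

    // Step 2: use a regular expression to pull a specific field out of the selected text.
    static Optional<String> questionId(String href) {
        Matcher m = QUESTION_ID.matcher(href);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        List<String> titles = selectText(SAMPLE, "//div[@class='question']/a");
        List<String> hrefs = selectText(SAMPLE, "//div[@class='question']/a/@href");
        for (int i = 0; i < titles.size(); i++) {
            System.out.println(questionId(hrefs.get(i)).orElse("?") + " : " + titles.get(i));
        }
    }
}
```

XPath handles the structural part of the job (which elements on the page hold the data), while the regular expression handles the textual part (which piece of a string is the value you want).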

 

Spiderman project page on OSChina (documentation and downloads): http://www.oschina.net/p/spiderman

Spiderman Java crawler example: http://my.oschina.net/laiweiwei/blog/99937

Capturing the Q&A data of OSC to demonstrate the capabilities of a vertical crawler (the latest update supports pagination of channels and articles): http://my.oschina.net/laiweiwei/blog/100866

Using XPath in Java, summary and sample code: http://www.open-open.com/lib/view/open1397717612656.html

W3School XPath tutorial: http://www.w3school.com.cn/xpath/index.asp

 
