Python in Practice: Web Crawlers

Source: Internet
Author: User

Why learn web crawling:

1. You can build a custom search engine and gain a deep understanding of how search engines work.

2. In the era of big data, any data analysis starts with a data source; learning to write crawlers lets us collect much more data.

3. Practitioners who understand how crawlers work can make better use of them and optimize their own programs accordingly.

Components of a web crawler

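A web crawler consists of a control node, crawler nodes, and a resource database. The control node distributes URLs to the crawler nodes; the crawler nodes download and process the pages and store the results in the resource database.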
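This division of work can be sketched in a few lines of Python. It is a minimal, single-process illustration under several assumptions: the control node is a function that hands URLs to worker threads through a `queue.Queue`, each crawler node is a thread, the resource database is a plain dict, and `fetch` and `FAKE_PAGES` are hypothetical stand-ins for real HTTP downloads so the example runs offline.

```python
import queue
import threading
from typing import Dict, List, Optional

# Hypothetical stand-in for the Internet so the sketch runs offline.
FAKE_PAGES = {f"https://example.com/page{i}": f"content of page {i}" for i in range(6)}


def fetch(url: str) -> str:
    """Hypothetical downloader; a real crawler node would send an HTTP request."""
    return FAKE_PAGES.get(url, "")


def crawler_node(tasks: "queue.Queue[Optional[str]]", db: Dict[str, str],
                 lock: threading.Lock) -> None:
    """Crawler node: fetch each URL assigned by the control node and store it."""
    while True:
        url = tasks.get()
        if url is None:  # stop signal from the control node
            return
        page = fetch(url)
        with lock:  # serialize writes to the shared resource database
            db[url] = page


def control_node(urls: List[str], num_nodes: int = 3) -> Dict[str, str]:
    """Control node: distribute URLs to crawler nodes and wait for them to finish."""
    tasks: "queue.Queue[Optional[str]]" = queue.Queue()
    db: Dict[str, str] = {}  # resource database (a dict standing in for real storage)
    lock = threading.Lock()
    nodes = [threading.Thread(target=crawler_node, args=(tasks, db, lock))
             for _ in range(num_nodes)]
    for node in nodes:
        node.start()
    for url in urls:
        tasks.put(url)  # assign work to whichever node is free
    for _ in nodes:
        tasks.put(None)  # one stop signal per crawler node
    for node in nodes:
        node.join()
    return db


if __name__ == "__main__":
    db = control_node(list(FAKE_PAGES))
    print(len(db))  # 6
```

Threads and a shared queue are only one way to realize this split; a distributed crawler would typically run crawler nodes on separate machines and use persistent storage for the resource database.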

Types of web crawlers

1. General-purpose web crawler: also called a whole-web crawler, it can crawl target resources across the entire web.

2. Focused web crawler: used mainly to crawl specific information, typically to serve a specific group of users.

3. Incremental web crawler: "incremental" refers to incremental updating, in which only the parts that have changed are updated and the unchanged parts are left as they are. An incremental crawler therefore tries, as far as possible, to ensure that the pages it crawls are new pages.

4. Deep web crawler: web pages on the Internet can be classified by how they exist into surface pages and deep pages. Surface pages are static pages that can be reached through static links without submitting a form. Deep pages can only be reached after submitting a form with certain keywords.
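The first three crawler types differ mainly in which pages they keep and revisit. The sketch below illustrates this on a hypothetical in-memory link graph (`web`, mapping each URL to its page text and outgoing links) rather than the real Internet: with no filter it behaves like a general crawler, a `keep` predicate makes it a simple keyword-focused crawler, and a `seen_hashes` record turns it into an incremental crawler that only re-stores pages whose content hash changed since the previous run. The function name, its parameters, and the use of SHA-256 hashes for change detection are illustrative assumptions, not a standard API.

```python
import hashlib
from collections import deque
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would download each page over HTTP instead.
Web = Dict[str, Tuple[str, List[str]]]


def crawl(
    web: Web,
    seeds: List[str],
    keep: Optional[Callable[[str], bool]] = None,
    seen_hashes: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Breadth-first crawl from `seeds`; returns the URLs stored on this run.

    keep=None       -> general crawler: store and expand every reachable page.
    keep=predicate  -> focused crawler: skip (and do not expand) pages whose
                       text fails the predicate.
    seen_hashes     -> incremental crawler: URL -> content hash from earlier
                       runs; only new or changed pages are stored, and the
                       record is updated in place.
    """
    frontier = deque(seeds)
    visited = set(seeds)
    stored: List[str] = []
    while frontier:
        url = frontier.popleft()
        if url not in web:
            continue  # dead link: nothing to fetch
        text, links = web[url]
        if keep is not None and not keep(text):
            continue  # off-topic page: a focused crawler stops here
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if seen_hashes is None or seen_hashes.get(url) != digest:
            stored.append(url)  # new or changed page: (re)store it
            if seen_hashes is not None:
                seen_hashes[url] = digest
        for link in links:  # follow links even from unchanged pages
            if link not in visited:
                visited.add(link)
                frontier.append(link)
    return stored


if __name__ == "__main__":
    web: Web = {
        "/": ("home page", ["/python", "/cooking"]),
        "/python": ("python crawler tutorial", ["/python/urllib"]),
        "/python/urllib": ("python urllib basics", []),
        "/cooking": ("pasta recipes", ["/python/urllib"]),
    }
    print(crawl(web, ["/"]))  # general: ['/', '/python', '/cooking', '/python/urllib']
    print(crawl(web, ["/python"], keep=lambda t: "python" in t))  # focused: ['/python', '/python/urllib']
    hashes: Dict[str, str] = {}
    crawl(web, ["/"], seen_hashes=hashes)  # first run stores all four pages
    web["/cooking"] = ("pasta and pizza recipes", ["/python/urllib"])
    print(crawl(web, ["/"], seen_hashes=hashes))  # incremental: ['/cooking']
```

Hashing the whole page is the simplest change test; real incremental crawlers often also use HTTP caching headers such as `Last-Modified` or `ETag` so that unchanged pages need not be downloaded again.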
