Python static web crawler related knowledge

Last Update: 2016-04-25 Source: Internet

Author: User

If you want to write a simple Python crawler and run it in a Python 3 (or later) environment, what do you need to know to build it?

1 Crawler architecture and implementation

A crawler consists of five components: a scheduler, a URL manager, a parser, a downloader, and an output device. The scheduler can be thought of as the program's entry point, the main function that heads the whole crawler. The manager checks whether a URL is a duplicate and records the URLs that have already been crawled so that no page is crawled twice. The parser parses the content of a web page, extracting new URLs as well as the page's data. The downloader downloads the URLs that the parser has extracted. The output device writes out the collected data.

1.1 Scheduler

The scheduler works like the main function of the program: it starts the crawler, stops it, and monitors how it is running.
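The scheduler's job can be sketched as a loop that keeps taking a URL, downloading it, parsing it, and queuing any new URLs until there is nothing left or a page limit is hit. This is a minimal sketch under stated assumptions: the names `crawl`, `max_pages`, `fake_download`, `fake_parse`, and the in-memory `FAKE_SITE` are hypothetical, two plain sets stand in for the URL manager, and the download and parse steps are passed in as functions so the loop itself needs no network access.

```python
def crawl(root_url, download, parse, max_pages=100):
    """Hypothetical scheduler loop.

    download(url) -> str or None and parse(url, text) -> (set_of_urls, data)
    are supplied by the caller; this function only schedules the work.
    """
    to_crawl = {root_url}   # stand-in for the manager's "to be crawled" set
    crawled = set()         # stand-in for the manager's "already crawled" set
    results = []
    # Stop condition: nothing left to crawl, or the page limit is reached.
    while to_crawl and len(crawled) < max_pages:
        url = to_crawl.pop()
        crawled.add(url)
        try:
            text = download(url)
            if text is None:
                continue
            new_urls, data = parse(url, text)
        except Exception as exc:
            # Monitoring: report the failed page and keep crawling the rest.
            print("crawl failed: {} ({})".format(url, exc))
            continue
        results.append(data)
        # Queue only URLs that have not been crawled yet.
        to_crawl.update(u for u in new_urls if u not in crawled)
    return results


# Hypothetical in-memory "site" used instead of real HTTP requests.
FAKE_SITE = {
    "http://site/": ("Home", ["http://site/a", "http://site/b"]),
    "http://site/a": ("A", ["http://site/", "http://site/b"]),
    "http://site/b": ("B", []),
}


def fake_download(url):
    # Pretend the page body is just its URL; unknown URLs "fail" with None.
    return url if url in FAKE_SITE else None


def fake_parse(url, text):
    title, links = FAKE_SITE[url]
    return set(links), title


if __name__ == "__main__":
    print(sorted(crawl("http://site/", fake_download, fake_parse)))
```

Passing the downloader and parser in as arguments keeps the scheduler independent of how those components are implemented, so each one can be swapped or tested on its own.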

1.2 Manager

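The manager is mainly responsible for URLs. It sorts them into URLs that have already been crawled and URLs still waiting to be crawled, and keeps each group in its own set. Why use a set? A set stores each element at most once and can quickly answer whether a URL is already in it, which is exactly the duplicate check the manager needs.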
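A URL manager built on the two sets described above might look like the following minimal sketch. The class name `UrlManager` and its method names are hypothetical choices for illustration, not an API from the original article or from any library.

```python
class UrlManager:
    """Hypothetical URL manager: tracks URLs to crawl and URLs already crawled."""

    def __init__(self):
        self.new_urls = set()   # URLs waiting to be crawled
        self.old_urls = set()   # URLs that have already been crawled

    def add_new_url(self, url):
        if url is None:
            return
        # Duplicate check: ignore URLs that are already queued or already crawled.
        if url not in self.new_urls and url not in self.old_urls:
            self.new_urls.add(url)

    def add_new_urls(self, urls):
        for url in urls or ():
            self.add_new_url(url)

    def has_new_url(self):
        return len(self.new_urls) > 0

    def get_new_url(self):
        # set.pop() removes an arbitrary element, then the URL is marked as crawled.
        url = self.new_urls.pop()
        self.old_urls.add(url)
        return url
```

Because `set.pop()` returns an arbitrary element, pages are not visited in any guaranteed order; a crawler that needs breadth-first order would keep a queue alongside the set.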

1.3 Downloader

The downloader receives a URL passed in from the URL manager, downloads the page, and returns its content as a string.
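Using only the standard library, the downloader can be sketched with `urllib.request.urlopen`. This is a minimal sketch: the function name `download`, the `timeout` default, and the UTF-8 fallback when the response declares no charset are assumptions for illustration. It returns `None` instead of raising so that a scheduler can simply skip pages that fail.

```python
import urllib.request


def download(url, timeout=10.0):
    """Hypothetical downloader: return the page at url as text, or None on failure."""
    if url is None:
        return None
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            raw = response.read()
            # Use the charset declared in Content-Type, falling back to UTF-8 (assumption).
            charset = response.headers.get_content_charset() or "utf-8"
    except (OSError, ValueError):
        # URLError/HTTPError and timeouts are OSError subclasses;
        # ValueError covers strings that are not valid URLs.
        return None
    return raw.decode(charset, errors="replace")
```

`urlopen` already raises `HTTPError` for non-2xx HTTP responses, so no separate status check is needed here. Real sites may also expect a `User-Agent` header, which would require building a `urllib.request.Request` first.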

1.4 Parser

The parser extracts the valuable data from a page; to target specific data you need some basic knowledge of HTML. A web page also contains many URLs, which the parser extracts and adds to the manager for the next iteration of the loop.
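Here is a minimal parser sketch built on the standard library's `html.parser.HTMLParser`. The article does not say which data is "valuable", so this sketch assumes it is the page title, and it follows only absolute `http`/`https` links. The names `parse` and `_LinkAndTitleParser` are hypothetical. Many tutorials use a third-party HTML library instead; the standard library is used here to keep the example self-contained.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class _LinkAndTitleParser(HTMLParser):
    """Collects every <a href> value and the text inside <title>."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title_parts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def parse(page_url, html_text):
    """Hypothetical parser: return (new_urls, data) for one downloaded page."""
    if page_url is None or html_text is None:
        return set(), None
    p = _LinkAndTitleParser()
    p.feed(html_text)
    p.close()
    new_urls = set()
    for href in p.links:
        # Resolve relative links against the page URL and drop "#fragment" parts.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if absolute.startswith(("http://", "https://")):
            new_urls.add(absolute)
    data = {"url": page_url, "title": "".join(p.title_parts).strip()}
    return new_urls, data
```

Resolving links with `urljoin` matters because most pages use relative links such as `/about`, which the manager could not fetch on their own.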

1.5 Output Device

(Omitted.)
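The original leaves this component out. Purely as an illustration, an output device could collect the records produced by the parser and write them to an HTML table. In this sketch the class name `HtmlOutputer`, its methods, and the record shape (a dict with `url` and `title` keys) are all assumptions.

```python
import html


class HtmlOutputer:
    """Hypothetical output device: collects records and writes an HTML table."""

    def __init__(self):
        self.records = []

    def collect(self, data):
        # Pages that failed to parse produce None and are skipped.
        if data is not None:
            self.records.append(data)

    def render(self):
        # Escape values so text such as "A & B" cannot break the generated HTML.
        rows = "\n".join(
            "<tr><td>{}</td><td>{}</td></tr>".format(
                html.escape(r["url"]), html.escape(r["title"]))
            for r in self.records)
        return "<html><body><table>\n{}\n</table></body></html>".format(rows)

    def write(self, path):
        with open(path, "w", encoding="utf-8") as f:
            f.write(self.render())
```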

Further updates will follow to help you learn Python web development.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.