Python Crawler Learning Chapter II

Last Update: 2018-01-13 Source: Internet

Author: User


Web Crawler Skills Overview

[Figure: web crawler skills overview map]

Search Engine Core

First, the search engine uses a crawler to fetch web pages from the Internet and stores the crawled pages in the original (raw) database. The crawler module consists mainly of a controller and a crawler: the controller controls the crawling process, and the crawler carries out the actual crawling tasks.
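
The split between controller and crawler can be sketched roughly as follows. This is only an illustration under stated assumptions: the names FAKE_WEB, crawl_page, and run_controller are hypothetical, and an in-memory dictionary stands in for the real Internet so the example runs without network access.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would download pages over HTTP instead.
FAKE_WEB = {
    "http://example.com/": ("home page", ["http://example.com/a", "http://example.com/b"]),
    "http://example.com/a": ("page a", ["http://example.com/b"]),
    "http://example.com/b": ("page b", ["http://example.com/"]),
}


def crawl_page(url):
    """Crawler: fetch one page and return its text and outgoing links."""
    return FAKE_WEB.get(url, ("", []))


def run_controller(seed_url, max_pages=10):
    """Controller: decide which URL to crawl next and store each result."""
    raw_database = {}             # stands in for the original (raw) database
    frontier = deque([seed_url])  # URLs waiting to be crawled
    while frontier and len(raw_database) < max_pages:
        url = frontier.popleft()
        if url in raw_database:
            continue  # this page was already crawled
        text, links = crawl_page(url)
        raw_database[url] = text
        # Queue only links that have not been stored yet.
        frontier.extend(link for link in links if link not in raw_database)
    return raw_database


raw_pages = run_controller("http://example.com/")
```

Here the controller owns the queue and the stopping rule, while crawl_page only knows how to fetch a single page.
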
The data in the original database is then indexed and stored in the index database.
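
One common way to index stored pages is an inverted index, which maps each word to the pages that contain it. The source does not say which index structure is used, so the sketch below is an assumption; build_index and the sample raw_database are hypothetical names and data.

```python
from collections import defaultdict


def build_index(raw_database):
    """Map every lowercase word to the set of page IDs that contain it."""
    index = defaultdict(set)
    for page_id, text in raw_database.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


# Sample stand-in for the original database: page ID -> page text.
raw_database = {
    "page1": "Python crawler basics",
    "page2": "search engine crawler",
}
index_database = build_index(raw_database)
```
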
When a user searches for information, they enter a query through the user interface, which is roughly equivalent to the search engine's input box. Once the query has been entered, it goes through word segmentation and similar processing, and the index database is then searched to retrieve the matching data.
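
The retrieval step might look something like the sketch below. It assumes that word segmentation is a simple split on whitespace, which works for English but not for Chinese text, where a dedicated segmenter would be needed. The names segment, retrieve, and INDEX are hypothetical.

```python
def segment(query):
    """Very simplified word segmentation: lowercase and split on whitespace."""
    return query.lower().split()


def retrieve(query, index):
    """Return the page IDs that contain every term in the query."""
    terms = segment(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())  # keep pages matching all terms so far
    return results


# Small stand-in for the index database: word -> page IDs.
INDEX = {
    "python": {"page1"},
    "crawler": {"page1", "page2"},
    "search": {"page2"},
}
matches = retrieve("Python crawler", INDEX)
```
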
While the user enters a query, the user's behavior is also stored in the user log database, for example the user's IP address and the keywords entered. The data in the user log database is then passed to the log parser for processing. Based on a large amount of user data, the log parser adjusts the original database and the index database, changing the ranking of results or making other adjustments.
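
As a rough illustration of how log data could influence ranking, the sketch below counts how often each result was clicked for a keyword and moves frequently clicked pages to the top. The "clicked" field and the click-count rule are assumptions made for this example; the source only mentions IP addresses and keywords and does not describe the actual adjustment logic.

```python
from collections import Counter

# Hypothetical user log entries; the "clicked" field is an assumption.
USER_LOG = [
    {"ip": "203.0.113.5", "keyword": "crawler", "clicked": "page2"},
    {"ip": "203.0.113.9", "keyword": "crawler", "clicked": "page2"},
    {"ip": "203.0.113.5", "keyword": "crawler", "clicked": "page1"},
]


def parse_log(user_log):
    """Log parser: count clicks for each (keyword, page) pair."""
    clicks = Counter()
    for entry in user_log:
        clicks[(entry["keyword"], entry["clicked"])] += 1
    return clicks


def rerank(keyword, results, clicks):
    """Order results by click count (highest first), then by page ID."""
    return sorted(results, key=lambda page: (-clicks[(keyword, page)], page))


click_counts = parse_log(USER_LOG)
ranked = rerank("crawler", ["page1", "page2", "page3"], click_counts)
```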

The distinction between index and retrieval

Retrieval is an action, whereas an index is an attribute. Take a supermarket as an example: the grouping and classification of goods is the index, and the process of finding a product is retrieval. A good index makes retrieval more efficient.
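
The supermarket analogy can be expressed in a few lines of code. In this sketch the shelf layout and the function names are made up for illustration: without an index, finding a product means walking every aisle, while with an index it is a single dictionary lookup.

```python
# Hypothetical store layout: (aisle, products on that aisle).
SHELVES = [
    ("aisle 1", ["apple", "banana"]),
    ("aisle 2", ["milk", "cheese"]),
    ("aisle 3", ["soap", "shampoo"]),
]


def build_product_index(shelves):
    """Index: record once which aisle each product is on."""
    return {product: aisle for aisle, products in shelves for product in products}


def find_without_index(product, shelves):
    """Retrieval without an index: check the aisles one by one."""
    for aisle, products in shelves:
        if product in products:
            return aisle
    return None


def find_with_index(product, index):
    """Retrieval with an index: one direct lookup."""
    return index.get(product)


product_index = build_product_index(SHELVES)
```

Both functions return the same aisle, but the indexed lookup does not get slower as the store adds more aisles.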

This article is an English version of an article originally published in Chinese on aliyun.com.
