Python Crawler Learning Chapter II

Source: Internet
Author: User


Web Crawler Skills Overview

[Figure: web crawler skills overview map]

Search Engine Core

First, the search engine uses a crawler to fetch Web pages from the Internet and stores the crawled pages in the original database. The crawler module consists of a controller and a crawler: the controller manages the crawl, and the crawler carries out the actual crawling tasks.
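The split between controller and crawler can be sketched roughly as follows. This is a minimal illustration, not a real crawler: FAKE_WEB, crawl_page and run_controller are hypothetical names, and the pages live in an in-memory dictionary instead of being fetched over HTTP.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch these pages over HTTP instead.
FAKE_WEB = {
    "http://example.com/": ("home page", ["http://example.com/a", "http://example.com/b"]),
    "http://example.com/a": ("page a", ["http://example.com/b"]),
    "http://example.com/b": ("page b", ["http://example.com/"]),
}


def crawl_page(url):
    """Crawler: fetch one page and return its text and outgoing links."""
    return FAKE_WEB.get(url, ("", []))


def run_controller(seed_url, max_pages=10):
    """Controller: decide which URL is crawled next and when to stop."""
    original_database = {}         # stands in for the original database
    frontier = deque([seed_url])   # URLs waiting to be crawled
    seen = {seed_url}              # URLs that have already been scheduled
    while frontier and len(original_database) < max_pages:
        url = frontier.popleft()
        text, links = crawl_page(url)
        original_database[url] = text
        for link in links:
            if link not in seen:   # never schedule the same URL twice
                seen.add(link)
                frontier.append(link)
    return original_database


original_database = run_controller("http://example.com/")
print(sorted(original_database))
```

The controller only manages the queue and the stop condition, so the crawler function could later be swapped for one that makes real requests without changing the control logic.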
The data in the original database is then indexed, and the resulting index is stored in the index database.
When users search for information, they enter a query through the user interface, which is essentially the search engine's input box. Once the query is submitted, it goes through processing such as word segmentation, and the matching data is then retrieved from the index database.
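As a rough sketch of indexing and retrieval, the example below builds an inverted index (word to pages) and answers a query against it. The page contents and the function names segment, build_index and retrieve are assumptions made for illustration, and splitting on whitespace is only a stand-in for real word segmentation.

```python
from collections import defaultdict

# Hypothetical contents of the original database: page name -> page text.
original_database = {
    "page1": "python crawler tutorial",
    "page2": "search engine index",
    "page3": "python search engine",
}


def segment(text):
    """Very rough word segmentation: lowercase and split on whitespace."""
    return text.lower().split()


def build_index(pages):
    """Map each word to the set of pages that contain it (an inverted index)."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in segment(text):
            index[word].add(url)
    return index


def retrieve(index, query):
    """Return the pages that contain every word of the query."""
    words = segment(query)
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)


index_database = build_index(original_database)
print(retrieve(index_database, "Python search"))  # ['page3']
```

The index is built once from the stored pages, while retrieve runs for every query and only touches the entries for the query's words.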
While users enter their queries, their behavior is recorded in the user log database, for example the user's IP address and the keywords entered. The data in the user log database is then passed to the log parser. Based on a large volume of user data, the log parser adjusts the original database and the index database, for example by changing how results are ranked.
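One way the log parser might influence ranking is sketched below. The IP address and keyword fields come from the description above; the "clicked" field, the sample records, and the functions parse_log and rerank are assumptions added purely for illustration, and real search engines use far richer signals.

```python
from collections import Counter

# Hypothetical user log records. The "clicked" field is an assumption:
# it records which result the user opened after searching.
user_log = [
    {"ip": "192.0.2.1", "keywords": "python crawler", "clicked": "page1"},
    {"ip": "192.0.2.2", "keywords": "python crawler", "clicked": "page3"},
    {"ip": "192.0.2.3", "keywords": "python crawler", "clicked": "page3"},
]


def parse_log(log):
    """Log parser: count how often each page was clicked."""
    return Counter(entry["clicked"] for entry in log)


def rerank(results, click_counts):
    """Order results so that pages with more clicks come first."""
    # A Counter returns 0 for pages that were never clicked.
    return sorted(results, key=lambda url: -click_counts[url])


click_counts = parse_log(user_log)
print(rerank(["page1", "page2", "page3"], click_counts))
```

Here page3 moves to the top because it was clicked most often, and page2, which was never clicked, drops to the end.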

The distinction between index and retrieval

Retrieval is an action, whereas an index is a property of the data. Take a supermarket as an example: grouping and classifying the goods is the index, and the process of finding a particular product is retrieval. A good index makes retrieval more efficient.
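The supermarket analogy can be sketched in a few lines. The product list and the names find_without_index, category_index and find_with_index are hypothetical; the point is only that the index is built once, while retrieval happens on every search.

```python
from collections import defaultdict

# Hypothetical supermarket stock: (product, category) pairs.
products = [
    ("apple", "fruit"),
    ("milk", "dairy"),
    ("banana", "fruit"),
    ("cheese", "dairy"),
]


def find_without_index(category):
    """Retrieval with no index: walk past every product on every search."""
    return [name for name, cat in products if cat == category]


# Building the index (grouping goods by category) is done once, ahead of time.
category_index = defaultdict(list)
for name, cat in products:
    category_index[cat].append(name)


def find_with_index(category):
    """Retrieval with an index: go straight to the right shelf."""
    return category_index.get(category, [])


print(find_without_index("fruit"))
print(find_with_index("fruit"))
```

Both functions return the same products, but the indexed version does not need to look at goods in other categories, which is where the efficiency gain comes from as the stock grows.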
