Web Crawler Skills Overview Map
Search Engine Core
First, the search engine uses a crawler to fetch web pages from the Internet and stores the crawled pages in the original page database. The crawler module consists mainly of a controller and one or more crawlers: the controller coordinates the crawling, while each crawler carries out the actual crawling tasks.
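The controller/crawler split described above can be sketched as a small breadth-first crawl loop. This is a minimal, illustrative sketch: `FAKE_WEB` stands in for real HTTP fetching (so the example runs without network access), and the names `crawl` and `FAKE_WEB` are hypothetical, not part of any library.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# In a real crawler this would be an HTTP fetch plus link extraction.
FAKE_WEB = {
    "http://example.com/":  ("home page text", ["http://example.com/a"]),
    "http://example.com/a": ("page a text", []),
}

def crawl(seed_urls, max_pages=10):
    """Controller: manages the URL frontier and dedup set.
    The loop body plays the role of a crawler that downloads one page."""
    frontier = deque(seed_urls)   # URLs waiting to be crawled
    seen = set(seed_urls)
    original_db = {}              # crawled pages stored as-is
    while frontier and len(original_db) < max_pages:
        url = frontier.popleft()
        text, links = FAKE_WEB.get(url, ("", []))  # stand-in for an HTTP GET
        original_db[url] = text
        for link in links:
            if link not in seen:  # crawl each URL at most once
                seen.add(link)
                frontier.append(link)
    return original_db
```

The returned `original_db` corresponds to the original page database; indexing happens in a later stage.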
The data in the original database is then indexed and stored in the index database.
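Indexing is typically done with an inverted index: a map from each word to the set of pages containing it. The sketch below assumes pages are plain whitespace-separated text; `build_index` is an illustrative name, not a real API.

```python
def build_index(original_db):
    """Build an inverted index: word -> set of URLs whose text contains it."""
    index = {}
    for url, text in original_db.items():
        for word in set(text.lower().split()):  # dedupe words per page
            index.setdefault(word, set()).add(url)
    return index
```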
When a user retrieves information, they enter a query through the user interface, which is roughly equivalent to a search engine's input box. Once the input is complete, the query undergoes word segmentation and other preprocessing, and the index database is then consulted to perform the corresponding retrieval.
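The retrieval step can be sketched as: split the query into terms, look up each term in the inverted index, and intersect the results. This assumes the simple word-to-URL-set index shape from above; `search` is an illustrative name. (Splitting on whitespace stands in for real word segmentation, which is considerably more involved, especially for Chinese text.)

```python
def search(index, query):
    """Return the URLs whose pages contain every term of the query.

    `index` maps a lowercase word to a set of URLs (an inverted index).
    """
    terms = query.lower().split()  # stand-in for proper word segmentation
    if not terms:
        return set()
    results = index.get(terms[0], set()).copy()
    for term in terms[1:]:
        results &= index.get(term, set())  # AND-combine the term postings
    return results
```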
While the user enters the query, the user's behavior is also recorded in a user log database, including information such as the user's IP address and the keywords they typed. The data in the user log database is then handed to a log parser for processing. Based on this large volume of user data, the log parser adjusts the original database and the index database, for example by changing the ranking of results.
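One simple thing a log parser might do with the user log database is count how often each keyword is queried, a signal that can later feed into ranking. The record format `(ip, keyword)` and the name `parse_logs` are assumptions made for this sketch.

```python
from collections import Counter

def parse_logs(log_records):
    """Aggregate per-keyword query counts from user log entries.

    Each record is assumed to be an (ip, keyword) pair; the resulting
    counts could later be used to boost popular queries in ranking.
    """
    return Counter(keyword for _ip, keyword in log_records)
```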
The difference between indexing and retrieval
An index is an attribute of the data; retrieval is a behavior performed on it. Take a supermarket as an example: grouping and classifying the goods is indexing, while the process of finding a particular product is retrieval. A good index makes retrieval far more efficient.
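The supermarket analogy can be made concrete by contrasting the two lookup styles. Both functions below are illustrative sketches: without an index, every page must be scanned (walking every aisle); with an inverted index, one dictionary lookup suffices (reading the aisle sign).

```python
def find_linear(pages, word):
    """No index: scan the text of every page for the word."""
    return {url for url, text in pages.items() if word in text.split()}

def find_indexed(index, word):
    """With an inverted index: a single dictionary lookup."""
    return index.get(word, set())
```

Both return the same URLs, but the indexed lookup runs in roughly constant time per word, while the linear scan grows with the size of the collection.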
Python Crawler Learning Chapter II