Chinese Search Engine Technology Unveiled: System Architecture (III)

Source: e800.com.cn


Search Engine System Architecture

This section describes the system architecture of a full-text search engine. Unless stated otherwise, "search engine" below means a full-text search engine. A search engine works in four steps: crawling web pages from the Internet → creating an index database → searching the index database → processing and sorting the search results.

1. Capture webpages from the Internet

A web crawler (spider) program automatically collects web pages from the Internet. It accesses the Internet on its own, follows every URL on a page to reach other pages, repeats this process, and stores all the pages it crawls on the server.
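
The crawl loop can be sketched as a breadth-first traversal. This is only an illustration: `FAKE_WEB` and `fetch_links` are hypothetical stand-ins for real page downloads, and the `max_pages` limit is an assumption added so the loop always terminates.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> URLs linked from that page.
FAKE_WEB = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://a.example/", "http://d.example/"],
    "http://c.example/": ["http://d.example/"],
    "http://d.example/": [],
}


def fetch_links(url):
    """Stand-in for downloading a page and extracting its links."""
    return FAKE_WEB.get(url, [])


def crawl(seed, max_pages=100):
    """Breadth-first crawl from `seed`; returns URLs in visiting order."""
    seen = {seed}          # URLs already queued, to avoid revisiting
    queue = deque([seed])  # URLs waiting to be crawled
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


print(crawl("http://a.example/"))
```

The `seen` set is what keeps the crawler from looping forever when pages link back to each other.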

2. Create an index database

The indexing system program analyzes the collected web pages and extracts the relevant information from each one (including the page's URL, encoding type, the keywords contained in the page content, keyword positions, generation time, size, and links to other pages). It then applies a relevance algorithm, which involves a large amount of complex calculation, to obtain the relevance (or importance) of each page for each keyword in its content and hyperlinks, and uses this information to create the web index database.
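
One common way to record keywords and their positions is an inverted index. The sketch below assumes a simple regular-expression tokenizer that only suits space-delimited text (Chinese text would need a word segmenter instead), and the names `tokenize` and `build_index` are illustrative, not taken from the article.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each term to {url: [positions]} for every page containing it.

    `pages` is a dict of url -> page text.
    """
    index = defaultdict(dict)
    for url, text in pages.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(url, []).append(pos)
    return index


pages = {
    "u1": "search engine index",
    "u2": "web crawler feeds the search index",
}
index = build_index(pages)
print(index["search"])
```

Storing positions, not just page identifiers, is what later allows a relevance calculation to take keyword location into account.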

3. Search in the index database

When a user enters search keywords, the search system program breaks the request down into its keywords and retrieves from the web index database all pages that match them.
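
A minimal sketch of this lookup, assuming the index maps each term to the pages that contain it and that a page must contain every query term to match (an AND query, which is one possible interpretation). The `index` data and the `search` function are hypothetical.

```python
def search(index, query):
    """Return the set of URLs that contain every term of `query`."""
    terms = query.lower().split()  # break the request into keywords
    if not terms:
        return set()
    result = None
    for term in terms:
        urls = set(index.get(term, {}))
        # Intersect posting lists so only pages with all terms remain.
        result = urls if result is None else result & urls
    return result


# Hypothetical index: term -> {url: [positions]}
index = {
    "search": {"u1": [0], "u2": [4]},
    "engine": {"u1": [1]},
    "crawler": {"u2": [1]},
}
print(search(index, "search engine"))
```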

4. Process and sort search results

The index database already records all keyword-related information for the relevant pages. The system only needs to combine that information with the page level into a relevance value and sort by it: the higher the relevance, the higher the ranking. Finally, the page generation system assembles the URLs of the search results and the content summaries of the pages and returns them to the user.
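
The article does not say how the two signals are combined. The sketch below assumes a simple weighted sum, with `weight` as a hypothetical tuning parameter and both inputs already scaled to the range 0 to 1.

```python
def rank(matches, page_level, weight=0.5):
    """Sort URLs by a combined score, highest first.

    matches:    url -> keyword relevance for the query (assumed 0..1)
    page_level: url -> link-based page level (assumed 0..1)
    weight:     share of the score given to keyword relevance
    """
    scored = []
    for url, relevance in matches.items():
        score = weight * relevance + (1 - weight) * page_level.get(url, 0.0)
        scored.append((score, url))
    # Highest score first; ties broken by URL for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [url for _, url in scored]


matches = {"u1": 0.9, "u2": 0.4, "u3": 0.4}
page_level = {"u1": 0.1, "u2": 0.8, "u3": 0.2}
print(rank(matches, page_level))
```

In this example `u2` outranks `u1` even though its keyword relevance is lower, because its page level is much higher.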

The figure shows a typical architecture diagram of the search engine system. The parts of the search engine depend on one another, and the process works as follows:

The "web spider" crawls web pages from the Internet and sends them to the "web database". The system "extracts URLs" from the pages and sends them to the "URL database". "Spider control" takes URLs from that database and directs the "web spider" to crawl further pages, repeating until all pages have been captured.

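The "extract URLs" step can be sketched with Python's standard `html.parser` and `urllib.parse` modules. The class name `LinkExtractor` and the function `extract_urls` are illustrative, and the sketch assumes only `<a href="...">` links matter.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the <a href="..."> tags of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_urls(base_url, html):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


html = '<a href="other.html">x</a> <a href="/top">y</a>'
print(extract_urls("http://example.com/dir/page.html", html))
```

The extracted URLs are what would be written to the "URL database" for spider control to schedule.
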
The system obtains text information from the "web database" and sends it to the "text index" module, which builds the "index database". At the same time, "link information extraction" sends the link information (including the anchor text and the link itself) to the "link database", which provides the basis for "webpage rating".
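
The article does not name the rating algorithm. As one well-known possibility, the sketch below runs a PageRank-style iteration over the link graph; the damping factor, the iteration count, and the function name `page_rating` are all assumptions, and anchor text is ignored for brevity.

```python
def page_rating(links, damping=0.85, iterations=50):
    """PageRank-style rating over a link graph.

    links: url -> list of URLs that page links to.
    Returns url -> score; scores sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


rating = page_rating({"a": ["c"], "b": ["c"], "c": ["a"]})
print(sorted(rating, key=rating.get, reverse=True))
```

Here page `c` rates highest because two pages link to it, while `b` rates lowest because nothing links to it.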

The "user" submits a query request to the "query server", which searches the "index database" for related pages. At the same time, "webpage rating" combines the query request and the link information to evaluate the relevance of the search results. The "query server" sorts the results by relevance, extracts a content abstract around the keywords, and finally assembles the result page and returns it to the "user".
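
Extracting a content abstract around a keyword can be sketched as follows. The function name `make_abstract` and the `width` parameter (characters of context kept on each side) are assumptions for illustration.

```python
def make_abstract(text, keyword, width=30):
    """Return a short snippet of `text` around the first `keyword` hit."""
    pos = text.lower().find(keyword.lower())
    if pos == -1:
        # Keyword not in the text: fall back to the start of the page.
        return text[: 2 * width].strip()
    start = max(0, pos - width)
    end = min(len(text), pos + len(keyword) + width)
    snippet = text[start:end].strip()
    # Mark where the snippet was cut out of a longer text.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + snippet + suffix


text = ("A web spider crawls pages and an indexer builds "
        "the index database for queries.")
print(make_abstract(text, "indexer", width=10))
```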
