Search engine principles: search engine technology

Source: Internet
Author: User
     A search engine does not search the Internet itself; it actually searches a pre-built index database of web pages.

     A search engine also cannot really understand the content of a web page. It can only mechanically match the text on the page.

A true search engine usually means a full-text search engine: one that collects tens of millions to billions of web pages from the Internet, indexes every word (that is, every keyword) on each page, and builds an index database. When a user searches for a keyword, every page whose content contains that keyword is returned as a search result. The results are then ranked by a complex algorithm and presented in order of their relevance to the search keyword.
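
The full-text indexing described above can be sketched as a small inverted index. This is only an illustration under stated assumptions: the sample pages, the function names `tokenize`, `build_index`, and `search`, and the ranking rule (occurrence count of the keyword) are all made up for the example and stand in for the far more complex ranking a real search engine applies.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each keyword to {url: number of occurrences on that page}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word][url] = index[word].get(url, 0) + 1
    return index

def search(index, keyword):
    """Return URLs containing the keyword, most occurrences first.

    Occurrence count is a deliberately naive stand-in for relevance.
    """
    postings = index.get(keyword.lower(), {})
    return sorted(postings, key=lambda url: (-postings[url], url))

# Hypothetical sample pages (URL -> page text).
pages = {
    "http://example.com/a": "search engine index search",
    "http://example.com/b": "web index database",
    "http://example.com/c": "search the web",
}
index = build_index(pages)
results = search(index, "search")
print(results)
```

Note that the query never touches the page texts themselves: it only looks up the prepared index, which is the point made at the start of the article.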

Current search engines widely use hyperlink analysis. In addition to analyzing the text of the indexed page itself, they also analyze the URLs, the anchor text, and even the text surrounding every link that points to the page. As a result, even if web page A does not contain a phrase such as "Devil Satan", a user searching for "Devil Satan" can still find page A if another web page B links to it with the anchor text "Devil Satan". Moreover, the more web pages (C, D, E, F, ...) that point to page A with a link named "Devil Satan", or the higher the quality of the source pages (B, C, D, E, F, ...) that provide such links, the more relevant page A is considered when users search for "Devil Satan", and the higher it ranks.
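
A minimal sketch of the anchor-text idea, assuming links are available as hypothetical `(source, target, anchor_text)` tuples. It counts only how many distinct pages link to a target with a given phrase; the quality of the source pages, which the paragraph above also mentions, is not modeled here.

```python
from collections import defaultdict

def build_anchor_index(links):
    """Map each anchor phrase to {target_url: set of source URLs using it}."""
    anchor_index = defaultdict(lambda: defaultdict(set))
    for source, target, anchor_text in links:
        anchor_index[anchor_text.lower()][target].add(source)
    return anchor_index

def search_by_anchor(anchor_index, phrase):
    """Rank targets by the number of distinct pages linking to them with the phrase."""
    targets = anchor_index.get(phrase.lower(), {})
    ranked = ((url, len(sources)) for url, sources in targets.items())
    return sorted(ranked, key=lambda item: (-item[1], item[0]))

# Hypothetical link graph: pages B, C, D link to A; page E links to Z.
links = [
    ("B", "A", "Devil Satan"),
    ("C", "A", "Devil Satan"),
    ("D", "A", "devil satan"),
    ("E", "Z", "Devil Satan"),
]
anchor_index = build_anchor_index(links)
ranking = search_by_anchor(anchor_index, "Devil Satan")
print(ranking)
```

Page A is found and ranked first even though its own text is never consulted, which mirrors the example in the text.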

     The working principle of a search engine can be seen as three steps: crawling web pages from the Internet → building an index database → searching and sorting in the index database.

    1. Crawl web pages from the Internet
A spider program, which can collect web pages automatically, accesses the Internet and follows every URL on a web page to reach other web pages. It repeats this process and collects every page it crawls.
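
The crawl loop can be sketched as a breadth-first traversal. To keep the example runnable without network access, pages are served from a hypothetical in-memory dictionary `FAKE_WEB`; the `crawl` function, its `fetch` parameter, and the `max_pages` limit are assumptions of this sketch, and a real spider would also need politeness delays, robots.txt handling, and error handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from seed; fetch(url) returns HTML or None."""
    seen = {seed}
    queue = deque([seed])
    collected = {}
    while queue and len(collected) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # unreachable page: skip it
            continue
        collected[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return collected

# Hypothetical in-memory "web" standing in for real HTTP requests.
FAKE_WEB = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.com/b": '<a href="/missing">gone</a>',
}
pages = crawl("http://example.com/", FAKE_WEB.get)
print(list(pages))
```

The `seen` set is what keeps the spider from revisiting pages when links form cycles, as they do between the home page and `/a` here.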

    2. Create an index database 
An analysis and indexing program analyzes the collected web pages and extracts the relevant information (including the URL, encoding type, all keywords contained in the page content, keyword positions, generation time, size, and links to other web pages). It then performs a large number of complex calculations based on a relevance algorithm to obtain the relevance (or importance) of each page for each keyword in the page text and in hyperlinks, and uses this information to build the web index database.
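
A rough sketch of this step, under loud assumptions: the article does not say which relevance algorithm is used, so TF-IDF is swapped in here purely as a well-known stand-in. The record fields and the function names `analyze_page` and `build_index_database` are likewise hypothetical, and only a subset of the listed page information (URL, size, links, keyword positions) is extracted.

```python
import math
import re
from collections import defaultdict

def analyze_page(url, text, out_links):
    """Extract a per-page record: size, outgoing links, and keyword positions."""
    positions = defaultdict(list)
    words = re.findall(r"[a-z0-9]+", text.lower())
    for pos, word in enumerate(words):
        positions[word].append(pos)
    return {"url": url, "size": len(text), "links": list(out_links),
            "word_count": len(words), "positions": dict(positions)}

def build_index_database(records):
    """Map keyword -> {url: score}; TF-IDF is an assumed stand-in for relevance."""
    doc_freq = defaultdict(int)
    for rec in records:
        for word in rec["positions"]:
            doc_freq[word] += 1
    n = len(records)
    index = defaultdict(dict)
    for rec in records:
        for word, pos_list in rec["positions"].items():
            tf = len(pos_list) / rec["word_count"]        # term frequency
            idf = math.log(n / doc_freq[word]) + 1.0      # rarer words weigh more
            index[word][rec["url"]] = tf * idf
    return dict(index)

# Hypothetical crawled pages.
records = [
    analyze_page("http://example.com/a", "search engine search index",
                 ["http://example.com/b"]),
    analyze_page("http://example.com/b", "index database", []),
]
index_db = build_index_database(records)
print(records[0]["positions"]["search"])
```

Because the scores are computed once at indexing time, a later query only needs to look up the keyword and sort the stored values.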

    3. Search and sort in the index database
