How the Baidu Search Engine Works

Source: Internet
Author: User
Tags: web, database

This article explains how a search engine works, step by step and component by component, as part of a complete SEO training course.

1, WWW: the Internet itself, a huge and complex system;

2, Collector: webmasters know this one well. It is commonly called a spider or crawler, and its job is to visit pages, crawl them, and download them;

3, Controller: what the spider downloads is handed to the controller, whose function is scheduling. Like the dispatch room of a bus company, which controls departure times and destinations, it mainly controls the spiders' crawl interval and sends the nearest spider to crawl a site. For SEO, this suggests that the location of a site's hosting space can be an advantage;

4, Raw database: the database that stores fetched web pages. It keeps pages for the later processing stages and also provides Baidu snapshots. URLs with the same MD5 value are not stored twice. Some pages can be found by URL but not by title; they can only be looked up through the URL, because this database is not built as an index. Its main concerns are read and write speed and storage space, so pages are stored in compressed form. The controller puts the pages that spiders bring back into this database after a simple analysis (extracting URLs and lightly filtering mirrored copies); at this point the data is not yet indexed;

5, Web page analysis module: this part is very important. Filtering of spam pages and mirror pages and the calculation of page weight, all of which matter for SEO, are concentrated here. These are the page weight algorithms, and there are more than a few hundred of them;

6, Indexer: puts valuable pages into the index database so that queries run faster. It converts each valuable page into another representation by turning the page into keywords. A page contains many keywords, and listing the keywords for each page is called a forward index. A word appears in many pages, and listing the pages for each word is called an inverted index. The reason for building the index is convenience and efficiency: with millions of pages and millions of words, looking up a word directly is far easier than scanning every page. The conditions used for ranking are stored here in an efficient storage structure, with many ranking factors kept as fields of each entry, so a search simply reads its data from the inverted index;

7, Index database: holds the data later used for ranking. Keyword count, keyword position, page size, keyword feature tags, links pointing to the page (internal links, external links, anchor text), and user experience data are all stored here and supplied to the retriever. Baidu is fast because it serves results directly from the index database rather than accessing the WWW at query time. This is the preprocessing work;

8, Retriever: takes the user's query, segments it into words, looks up and sorts the matching results, and returns them to the user through the user interface. It is responsible for word segmentation, querying, and sorting the data according to the ranking factors;

9, User interface: this is the Baidu search box. It records the query, IP address, time, the URL clicked, the position of that URL in the results, and the interval between one click and the next, and stores them in the user behavior log database;

10, User behavior log database: a key part of the search engine. It records the process and actions of users as they use the search engine. SEO tools and rank-boosting ("click-brushing") software are built around this data;

11, Log analyzer: continuously analyzes the user behavior log database and stores the resulting behavior records in the indexer, where they affect ranking. This is what so-called malicious clicking, or overnight ranking, exploits. (If a site cannot be found by keyword, users search for its domain name directly, and that too is recorded in the user behavior log database);

12, Thesaurus (word dictionary): the web page analysis module and the log analyzer add newly discovered words to the thesaurus. Word segmentation is performed against the thesaurus, and the web page analysis module relies on it as well.
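
To make item 4 more concrete, here is a minimal Python sketch of a page store that is keyed by the MD5 of the URL, refuses duplicates, and keeps pages compressed. The class name RawPageStore, its methods, and the example URLs are hypothetical, chosen only for illustration; nothing here is Baidu's actual implementation.

```python
import hashlib
import zlib


class RawPageStore:
    """Toy raw page store keyed by the MD5 of the URL (hypothetical class)."""

    def __init__(self):
        # MD5 hex digest of the URL -> compressed HTML bytes
        self._pages = {}

    @staticmethod
    def url_key(url: str) -> str:
        """Return the MD5 hex digest used as the storage key for a URL."""
        return hashlib.md5(url.encode("utf-8")).hexdigest()

    def add(self, url: str, html: str) -> bool:
        """Store a page; return False if this URL is already present."""
        key = self.url_key(url)
        if key in self._pages:
            return False
        # Pages are compressed to save storage space.
        self._pages[key] = zlib.compress(html.encode("utf-8"))
        return True

    def get(self, url: str):
        """Look a page up by URL only; there is no title or keyword index."""
        blob = self._pages.get(self.url_key(url))
        return None if blob is None else zlib.decompress(blob).decode("utf-8")


store = RawPageStore()
first = store.add("http://example.com/a", "<html>page A</html>")
again = store.add("http://example.com/a", "<html>page A again</html>")
```

The second add call is rejected because the URL hashes to the same key, and a page can only be fetched through its URL, which matches the description of a store that is not indexed.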
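
The two indexes in item 6 (page to keywords, and keyword to pages) and the lookup in items 7 and 8 can be sketched in a few lines of Python. The sample pages, the function names, and the use of a plain occurrence count as the only ranking factor are assumptions made for this example; a real engine stores many more factors per entry.

```python
from collections import defaultdict

# Hypothetical sample pages: URL -> text already split into words by spaces.
pages = {
    "a.html": "baidu search engine index",
    "b.html": "search engine spider crawl",
    "c.html": "seo index ranking",
}


def build_forward_index(pages):
    """Forward index: page -> list of keywords in that page."""
    return {url: text.split() for url, text in pages.items()}


def build_inverted_index(forward):
    """Inverted index: keyword -> {page: occurrence count}.

    The count stands in for the ranking factors stored with each entry.
    """
    inverted = defaultdict(dict)
    for url, words in forward.items():
        for word in words:
            inverted[word][url] = inverted[word].get(url, 0) + 1
    return dict(inverted)


def search(inverted, query):
    """Score pages by total occurrences of the query words, best first."""
    scores = defaultdict(int)
    for word in query.split():
        for url, count in inverted.get(word, {}).items():
            scores[url] += count
    # Sort by descending score, then by URL so ties are deterministic.
    return sorted(scores, key=lambda url: (-scores[url], url))


forward = build_forward_index(pages)
inverted = build_inverted_index(forward)
results = search(inverted, "search index")
```

The search function never touches the pages themselves. It reads only the inverted index, which is the preprocessing advantage item 7 describes.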
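
Items 8 and 12 say that queries are segmented against a word dictionary that grows as new words are found. The article does not say which segmentation algorithm is used, so the sketch below swaps in forward maximum matching, a simple and well-known dictionary-based method. The function name, the max_len parameter, and the sample words are all assumptions for illustration.

```python
def segment(text, lexicon, max_len=4):
    """Forward maximum matching: at each position take the longest lexicon word."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when nothing in the lexicon matches.
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical starting dictionary.
lexicon = {"百度", "搜索", "引擎", "搜索引擎", "原理"}
before = segment("百度搜索引擎工作原理", lexicon)

# A newly discovered word is added to the dictionary, as item 12 describes.
lexicon.add("工作原理")
after = segment("百度搜索引擎工作原理", lexicon)
```

Before the new word is added, the unknown span falls apart into single characters; afterwards it is kept as one word. That is why keeping the dictionary up to date matters for both the retriever and the page analysis module.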

Reposted from: http://www.chinaz.com/web/2013/0307/294967.shtml
