The principles behind mainstream search engines



Today I'll introduce how search engines work. First, take a look at this diagram:

[Figure: overall search engine architecture]

Next, let's walk through it layer by layer:

1. WWW: The WWW refers to all the web pages on the Internet. How many websites are there in the world? A 2006 statistic put the count at 80.65 million sites, and with the rapid growth of the web in recent years there are probably hundreds of millions by now. The search engine's task is to collect these sites and rank them for users to search.

2. Collector: The collector is commonly known as a spider. Spiders are responsible for crawling websites, and they crawl in one of two ways: depth-first or breadth-first. Taking my site as an example, depth-first means the spider starts from the first link on the home page and follows it downward, crawling everything beneath that first link before starting on the second link; breadth-first means the spider crawls all the links on one page before moving on to crawl the next page.
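The difference between the two crawl orders can be sketched in a few lines. The link graph, page names, and functions below are all hypothetical, just to show how the visit order differs:

```python
from collections import deque

# A tiny hypothetical link graph: page -> pages it links to.
LINKS = {
    "home": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"],
    "a1": [], "a2": [], "b1": [],
}

def crawl_depth_first(start):
    """Follow each link chain to the bottom before backtracking."""
    order, stack, seen = [], [start], set()
    while stack:
        page = stack.pop()
        if page in seen:
            continue
        seen.add(page)
        order.append(page)
        # Push links in reverse so the first link on the page is crawled first.
        stack.extend(reversed(LINKS.get(page, [])))
    return order

def crawl_breadth_first(start):
    """Crawl every link on a page before moving to the next page."""
    order, queue, seen = [], deque([start]), {start}
    while queue:
        page = queue.popleft()
        order.append(page)
        for nxt in LINKS.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order
```

Depth-first visits home, a, a1, a2 before ever reaching b; breadth-first visits home, a, b first, then descends a level.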


3. Controller: After a spider downloads a web page, it passes the page to the controller. The controller performs simple analysis on these pages, such as de-duplication, and it is also responsible for dispatching the spiders: arranging their crawl times, crawl targets, and so on. The controller extracts all the URLs and divides them into two types, URLs to crawl and URLs not to crawl. The pages that are crawled are stored in the original page database.
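A minimal sketch of those two controller duties, de-duplicating pages and splitting out URLs still worth crawling. The class, fingerprinting choice, and URLs here are assumptions for illustration, not a real crawler's design:

```python
import hashlib

class Controller:
    """Toy controller: drop duplicate pages, split extracted URLs."""

    def __init__(self):
        self.seen_hashes = set()   # content fingerprints, for de-duplication
        self.crawled = set()       # URLs already fetched

    def process(self, url, html, extracted_urls):
        self.crawled.add(url)
        # Fingerprint the page body; an identical body means a duplicate.
        fingerprint = hashlib.md5(html.encode("utf-8")).hexdigest()
        if fingerprint in self.seen_hashes:
            return None, []        # duplicate page: discard it
        self.seen_hashes.add(fingerprint)
        # Only URLs we have not fetched yet still need crawling.
        to_crawl = [u for u in extracted_urls if u not in self.crawled]
        return html, to_crawl
```

In a real system the fingerprint would be shingle- or simhash-based so near-duplicates are caught too, but exact hashing shows the idea.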

4. Original page database: Stores the raw pages crawled by the spiders, exactly as downloaded, with no ranking applied.

5. Web analysis module: The web analysis module is arguably the most important piece. It filters out spam pages, eliminating duplicate, fraudulent, and illegal sites. Baidu's recent major algorithm update changed exactly this module, mainly cracking down on scraped pseudo-original content and junk external links. It also uses some complex algorithms to score the value of each page and its links; that score is what we call weight, and it is what later ranking is based on.

6. Indexer: The indexer organizes the valuable pages passed in by the web analysis module into a forward index and an inverted index. The forward index maps each page to the many keywords it contains; the inverted index, in turn, maps each keyword to the many pages that contain it, and sorts them.
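The forward/inverted relationship is easy to see in code. The mini-corpus and page names below are made up for illustration:

```python
from collections import defaultdict

# Hypothetical mini-corpus: page -> the words it contains.
pages = {
    "p1": ["seo", "ranking", "spider"],
    "p2": ["spider", "crawl"],
}

# Forward index: page -> keywords it contains.
forward_index = {page: sorted(set(words)) for page, words in pages.items()}

# Inverted index: keyword -> pages that contain it.
inverted_index = defaultdict(list)
for page, words in pages.items():
    for word in sorted(set(words)):
        inverted_index[word].append(page)
```

The inverted index is what makes lookup fast: given a keyword, the matching pages are a single dictionary access instead of a scan over every page.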

7. Index database: The index database stores the keyword-to-page lists produced by the indexer.

8. Retriever: The retriever segments the words the user types into keywords, fetches the matching pages from the index database, sorts them, and finally returns the results to the user.
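Those three steps (segment, fetch, sort) can be sketched as a single function. The index, page names, and weights below are hypothetical, and real word segmentation for Chinese is far more involved than a whitespace split:

```python
def search(query, inverted_index, scores):
    """Toy retriever: segment the query into words, intersect the
    posting lists from the inverted index, then sort by page score."""
    words = query.lower().split()
    if not words:
        return []
    pages = set(inverted_index.get(words[0], []))
    for w in words[1:]:
        pages &= set(inverted_index.get(w, []))  # page must contain every word
    # Rank by a precomputed page score -- the "weight" described above.
    return sorted(pages, key=lambda p: scores.get(p, 0), reverse=True)

# Hypothetical index and weights, for illustration only.
index = {"seo": ["p1", "p2"], "spider": ["p1", "p3"]}
weights = {"p1": 5, "p2": 9, "p3": 2}
```

For example, `search("seo spider", index, weights)` returns only the page containing both words, while `search("seo", index, weights)` returns both matches ordered by weight.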

9. User: As the name implies, the netizen doing the searching.

10. User interface: This can be understood as, for example, Baidu's search results page.

11. User behavior log database: Stores user behavior, including which result the user clicks, how long they stay on a given site, the interval before they click a second site, what keywords they search for, and so on.

12. Log analyzer: I personally think this piece is very important. Search engines focus more and more on user experience, and this is the future direction of search engine development. The log analyzer performs detailed analysis of the user behavior stored in the behavior log database, and based on that behavior adds to or subtracts from the weight and ranking of many sites.
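How behavior logs might feed back into weight is not disclosed by any search engine, but the idea can be hedged into a toy formula. The thresholds and bonus values below are invented purely to illustrate "adding and subtracting" weight from click-through rate and dwell time:

```python
def adjust_weight(base_weight, clicks, impressions, avg_dwell_seconds):
    """Toy sketch: nudge a page's weight using behavior-log signals.
    A high click-through rate and long dwell time push the weight up;
    weak signals leave it unchanged. Thresholds are illustrative only."""
    ctr = clicks / impressions if impressions else 0.0
    bonus = 0.0
    if ctr > 0.2:                 # users pick this result often
        bonus += 1.0
    if avg_dwell_seconds > 30:    # users stay, suggesting the page satisfied them
        bonus += 1.0
    return base_weight + bonus
```

A page with a 30% click-through rate and 45-second average dwell time would gain both bonuses; a rarely clicked, quickly abandoned page would keep its base weight.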

This article was published at Zhengzhou SEO: http://www.8abd.com/?p=65. Please credit the link when reprinting. Thank you.
