The Role of the HITS Algorithm in Search Engines

Source: Internet
Author: User
Keywords: Search


Few articles on the internet describe the HITS (Hyperlink-Induced Topic Search) algorithm in detail, even though it holds an important place among current search engine algorithms and is one of the more authoritative and widely used ones. HITS is more complex than PageRank, but its essence can be described in a simple form, which is given here together with an outline of how it works.

The first thing HITS does is identify a collection of web pages related to the topic: for each query a user submits to the search engine, it determines a topic-related page set. A page is considered related to the topic if it meets either of the following conditions:

a. The page belongs to the set of pages whose text is most relevant to the user's query.

b. The page links to a page that satisfies condition a, or is linked to by a page that satisfies condition a.
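The two conditions can be sketched in Python as a set expansion. This is only an illustration under stated assumptions: the function name `expand_root_set`, the dictionary representation of links, and the page names are all hypothetical, and the text-matching step that produces the pages satisfying condition a is assumed to have been done already.

```python
from typing import Dict, Iterable, Set


def expand_root_set(root: Iterable[str], out_links: Dict[str, Set[str]]) -> Set[str]:
    """Return the pages meeting condition a (root) plus those meeting condition b."""
    root_set = set(root)
    topic_set = set(root_set)
    for page, targets in out_links.items():
        if page in root_set:
            # Condition b, first half: pages that a text-matching page links to.
            topic_set.update(targets)
        elif targets & root_set:
            # Condition b, second half: pages that link to a text-matching page.
            topic_set.add(page)
    return topic_set


# Hypothetical toy link graph: page -> set of pages it links to.
links = {
    "a": {"b"},
    "b": {"c"},
    "d": {"a"},
    "e": {"f"},
}

# Assume only page "a" matched the query text (condition a).
topic_set = expand_root_set(["a"], links)
print(sorted(topic_set))
```

In this toy graph, "b" joins the set because "a" links to it and "d" joins because it links to "a", while "c", "e" and "f" stay out because they are more than one link away from a matching page.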

This rests on an important assumption about links, based in part on the "link-content" hypothesis: if a page has a link to a topic-related page, then it may also be related to the topic, even if it contains no text that matches the topic (at least as judged from the user's query text).

Even pages judged relevant by their text content are sometimes not relevant, because topical relevance is hard to determine in practice, especially when the query itself is ambiguous. A classic example is "jaguar": the user may be looking for information about the animal or about the car named after it. As a result, the set of topic-related pages is incomplete and only partially relevant. Kleinberg's experiments show, however, that this is not a serious problem.

The second part of the algorithm calculates the hub (center) score and the authority score of every page in the topic-related collection. Like PageRank, it lets each page vote for the pages it links to, but it also adds a reverse vote, so that each page can vote for the pages that link to it. The result of HITS is therefore a hub score and an authority score for every page, rather than a simple division of pages into hubs and authorities.
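The two directions of voting can be written as a pair of update rules. The notation here is an assumption made for illustration: x_p is the authority weight of page p, y_p is its hub (center) weight, and E is the set of links (q, p), meaning "q links to p", inside the topic-related collection.

```latex
x_p \leftarrow \sum_{q \,:\, (q,p) \in E} y_q
\qquad
y_p \leftarrow \sum_{q \,:\, (p,q) \in E} x_q
```

A page gains authority from the hubs that link to it, and gains hub weight from the authorities it links to.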

Simplified HITS algorithm:

Phase one: Find a collection of pages related to the query or topic

1. From the text query entered by the search engine user, find the n pages whose text is most relevant to the query, where n is a predetermined parameter;

2. Add to the collection every page that links to, or is linked from, one of the matching pages;

3. Remove all internal links (links between pages on the same site);

Phase two: Initialize the hub (center) and authority weights of each page

4. Give each page an authority weight x and a hub weight y, for example x = y = 1;

Phase three: Repeat the voting process

5. For each page, sum the hub weights of the pages that link to it; the result is the page's authority weight;

6. For each page, sum the authority weights of the pages it links to; the result is the page's hub weight;

7. Normalize the hub weights by dividing them all by the highest hub weight, and normalize the authority weights by dividing them all by the highest authority weight;

8. Repeat steps 5 to 7 a fixed number of times; Kleinberg suggests that 20 repetitions are enough;

Phase four: Report results

9. Return sorted lists of pages, one ranked by hub weight and one ranked by authority weight, so that users can choose the type of page they find most useful (Kleinberg recommends showing the top 5-10 hub pages and the top 5-10 authority pages).
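Steps 4 to 9 can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names `hits` and `top_k`, the dictionary link graph, and the page names are hypothetical, and the graph is assumed to be the topic-related collection after phase one. The sketch also assumes that step 6 uses the authority weights just computed in step 5, which the steps above do not specify.

```python
from typing import Dict, List, Set, Tuple


def hits(out_links: Dict[str, Set[str]],
         iterations: int = 20) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (authority, hub) weights, each scaled so the largest value is 1."""
    pages = set(out_links)
    for targets in out_links.values():
        pages.update(targets)
    if not pages:
        return {}, {}

    # Step 4: every page starts with authority weight x = 1 and hub weight y = 1.
    authority = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Step 5: authority weight = sum of hub weights of pages linking in.
        new_authority = {p: 0.0 for p in pages}
        for source, targets in out_links.items():
            for target in targets:
                new_authority[target] += hub[source]

        # Step 6 (assumption: uses the freshly updated authority weights):
        # hub weight = sum of authority weights of the pages linked to.
        new_hub = {
            p: sum(new_authority[t] for t in out_links.get(p, ()))
            for p in pages
        }

        # Step 7: divide by the highest weight; fall back to 1 if all are zero.
        max_authority = max(new_authority.values()) or 1.0
        max_hub = max(new_hub.values()) or 1.0
        authority = {p: v / max_authority for p, v in new_authority.items()}
        hub = {p: v / max_hub for p, v in new_hub.items()}

    return authority, hub


def top_k(weights: Dict[str, float], k: int = 5) -> List[str]:
    """Step 9: the k pages with the highest weight, best first."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


# Hypothetical toy graph: h1 and h2 act as hubs, a1 and a2 as authorities.
links = {"h1": {"a1", "a2"}, "h2": {"a1"}}
authority, hub = hits(links)
print("top authority pages:", top_k(authority, 2))
print("top hub pages:", top_k(hub, 2))
```

Here "a1" ends up with the highest authority weight because both hubs link to it, and "h1" ends up with the highest hub weight because it links to both authorities.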
