Search Engine Algorithm Research Topic VI: The HITS Algorithm

Source: Internet
Author: User

December 19, 2017 | Search technology

HITS (Hyperlink-Induced Topic Search) is a link-analysis algorithm for ranking web pages, proposed by Jon Kleinberg in the late 1990s. The algorithm is query-dependent: the scores it produces are tied to a particular query.

Evaluating web pages with the HITS algorithm yields two quality scores per page: content authority (authority) and link authority (hub). Content authority reflects the quality of the content the page itself provides: the more the page is referenced by other pages, the higher its content authority. Link authority reflects the quality of the hyperlinks the page provides: the more high-quality pages it links to, the higher its link authority.

A good hub page should point to many authoritative pages, and a good authority page should be pointed to by many good hub pages. Across the whole collection of web pages, authorities and hubs are interdependent and mutually reinforcing, and this relationship is the basis of the HITS algorithm.

HITS runs as an iterate-until-convergence process. The link authority (hub value) of page A is determined by the content authority of the pages that A links to, while the content authority of page A is determined by the link authority of the pages that link to A. The authority and hub values are therefore defined in terms of each other: a page's authority value is the sum of the hub values of the pages that point to it, and its hub value is the sum of the authority values of the pages it points to.
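The mutual definition can also be written compactly. The notation below is an assumption introduced here for illustration and does not come from the original text: a(p) and h(p) are the authority and hub values of page p, and q → p means that page q links to page p.

```latex
a(p) = \sum_{q \,:\, q \to p} h(q),
\qquad
h(p) = \sum_{q \,:\, p \to q} a(q)
```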

The values of the hub and authority for each node are calculated using the following algorithm:

1. Give each node a hub value and an authority value of 1.
2. Run the authority update rule.
3. Run the hub update rule.
4. Normalize the values: divide each hub value by the sum of all hub values, and each authority value by the sum of all authority values.
5. Repeat from step 2 as necessary.
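The steps above can be sketched in Python roughly as follows. This is a minimal illustration rather than a reference implementation: the function name `hits`, the dictionary-based graph representation, the fixed iteration count, and the toy graph are all assumptions made for the example.

```python
from typing import Dict, Hashable, List, Tuple

Graph = Dict[Hashable, List[Hashable]]  # node -> list of nodes it links to


def hits(graph: Graph, iterations: int = 50) -> Tuple[Dict, Dict]:
    """Return (authority, hub) score dictionaries for a small link graph."""
    # Collect every node, including ones that only appear as link targets.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    # Step 1: every node starts with hub = authority = 1.
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}

    for _ in range(iterations):
        # Step 2: authority(p) = sum of hub values of the pages linking to p.
        new_auth = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # Step 3: hub(p) = sum of the updated authority values of the pages p links to.
        new_hub = {n: sum(new_auth[dst] for dst in graph.get(n, [])) for n in nodes}

        # Step 4: divide each score by the sum of its kind (guarding against a zero sum).
        a_total = sum(new_auth.values()) or 1.0
        h_total = sum(new_hub.values()) or 1.0
        auth = {n: v / a_total for n, v in new_auth.items()}
        hub = {n: v / h_total for n, v in new_hub.items()}

    return auth, hub


# Hypothetical toy graph: A and B both link to C; B also links to D.
toy = {"A": ["C"], "B": ["C", "D"]}
authority, hub_scores = hits(toy)
```

In this toy graph, C ends up with the highest authority value because both A and B point to it, and B ends up with the highest hub value because it points to both C and D.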

An implementation should also take the relevance of the linked pages into account. The algorithm runs a series of iterations, each consisting of two basic steps:

  • Authority update: set the authority value of each node to the sum of the hub values of the nodes that point to it. A node that is linked to by good hubs therefore receives a high authority value.

  • Hub update: set the hub value of each node to the sum of the authority values of the nodes it points to. A node that links to authority nodes on the same topic therefore receives a high hub value.
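To make the two update steps concrete, here is a small sketch of a single iteration without normalization. The helper names `update_authority` and `update_hub`, the edge-list representation, and the page names are hypothetical and chosen only for this example.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (source page, target page)


def update_authority(edges: List[Edge], hub: Dict[str, float]) -> Dict[str, float]:
    """Authority of a page = sum of the hub values of the pages linking to it."""
    auth = {page: 0.0 for page in hub}
    for src, dst in edges:
        auth[dst] += hub[src]
    return auth


def update_hub(edges: List[Edge], auth: Dict[str, float]) -> Dict[str, float]:
    """Hub value of a page = sum of the authority values of the pages it links to."""
    hub = {page: 0.0 for page in auth}
    for src, dst in edges:
        hub[src] += auth[dst]
    return hub


# Hypothetical pages: h1 links to a1 and a2, h2 links to a1 only.
edges = [("h1", "a1"), ("h1", "a2"), ("h2", "a1")]
hub0 = {page: 1.0 for page in ("h1", "h2", "a1", "a2")}

auth1 = update_authority(edges, hub0)  # a1 = 2.0, a2 = 1.0, h1 = h2 = 0.0
hub1 = update_hub(edges, auth1)        # h1 = 3.0, h2 = 2.0, a1 = a2 = 0.0
```

After one round, a1 has the larger authority value because two pages link to it, and h1 has the larger hub value because it links to both authorities.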

If the update rules are applied on their own, the hub and authority values keep growing and do not converge, so the number of iterations has to be limited. One way around this is to normalize the hub and authority values after each step: divide each authority value by the sum of all authority values, and divide each hub value by the sum of all hub values.
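The normalization step can be sketched as a small helper. The function name `normalize` and the sample values are assumptions for illustration, and the zero-sum guard is an added safety check that the text does not mention.

```python
from typing import Dict


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Divide every score by the sum of all scores so that they add up to 1."""
    total = sum(scores.values())
    if total == 0:
        return dict(scores)  # nothing to scale; avoid dividing by zero
    return {page: value / total for page, value in scores.items()}


# Hypothetical raw values after one unnormalized round on a four-page graph.
raw_auth = {"a1": 2.0, "a2": 1.0, "h1": 0.0, "h2": 0.0}
raw_hub = {"a1": 0.0, "a2": 0.0, "h1": 3.0, "h2": 2.0}

auth = normalize(raw_auth)  # a1 = 2/3, a2 = 1/3
hub = normalize(raw_hub)    # h1 = 0.6, h2 = 0.4
```

Because every round rescales the values to sum to 1, only the relative sizes of the scores matter, which is what the final ranking uses.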

Like PageRank, HITS is an iterative algorithm based on the links between web documents, but there are some important differences:

  • It is executed at query time, not when the index is built, which adds to query processing time. The hub and authority weights assigned to a page are therefore query-specific.

  • It is not commonly used by search engines (although Ask.com is said to use a similar algorithm).

  • It calculates two weights per document, hub and authority, rather than a single weight.

  • It processes only a small subset of relevant documents, whereas PageRank is computed over the complete document collection.
