Search Engine Algorithm Research Topic 6: The HITS Algorithm

Source: Internet
Author: User

HITS (Hyperlink-Induced Topic Search) is a link-analysis algorithm for ranking web pages, proposed by Jon Kleinberg in the late 1990s. The algorithm is query-dependent.

The HITS algorithm evaluates the quality of web pages and yields two scores for each page: content authority (the authority value) and link authority (the hub value). Content authority reflects the quality of the information the page itself provides: the more pages link to it, the higher its content authority. Link authority reflects the quality of the hyperlinks the page provides: the more high-authority pages it links to, the higher its link authority.

A good hub page should point to many authoritative pages, while a good authority page should be pointed to by many good hub pages. Across the whole set of pages, authority and hub values depend on and reinforce each other. This mutual reinforcement is the basis of the HITS algorithm.

The HITS algorithm is implemented as an iterate-until-convergence process: the link authority (hub value) of page A is determined by the content authority of the pages that A links to, and the content authority of page A is determined by the link authority of the pages that link to A. The authority and hub values are therefore defined recursively in terms of each other: a page's authority value is the sum of the hub values of the pages pointing to it, and its hub value is the sum of the authority values of the pages it points to.
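Written in symbols, the recursive definition looks like this. The notation is an assumption introduced here for illustration: a(p) and h(p) stand for the authority and hub values of page p, and q → p means that page q links to page p.

```latex
a(p) = \sum_{q \,:\, q \to p} h(q),
\qquad
h(p) = \sum_{q \,:\, p \to q} a(q)
```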

The hub and authority values of each node are calculated using the following algorithm:

1. Set the hub value and authority value of every node to 1.
2. Run the authority update rule.
3. Run the hub update rule.
4. Normalize the values: divide each hub value by the sum of all hub values, and divide each authority value by the sum of all authority values.
5. If necessary, repeat from step 2.
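The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the function name `hits`, the dictionary-based graph format, the fixed iteration count, and the sample graph are all assumptions made for the example.

```python
def hits(graph, iterations=20):
    """Return (hub, authority) dicts for a graph given as {page: [pages it links to]}."""
    # Collect every page, including pages that only appear as link targets.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    # Reverse index: for each page, the pages that link to it.
    incoming = {n: [] for n in nodes}
    for src, targets in graph.items():
        for dst in targets:
            incoming[dst].append(src)

    # Step 1: every hub and authority value starts at 1.
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}

    for _ in range(iterations):
        # Step 2: authority = sum of hub values of the pages linking in.
        auth = {n: sum(hub[q] for q in incoming[n]) for n in nodes}
        # Step 3: hub = sum of (new) authority values of the pages linked to.
        hub = {n: sum(auth[q] for q in graph.get(n, [])) for n in nodes}
        # Step 4: normalize so each set of values sums to 1.
        a_total = sum(auth.values()) or 1.0
        h_total = sum(hub.values()) or 1.0
        auth = {n: v / a_total for n, v in auth.items()}
        hub = {n: v / h_total for n, v in hub.items()}
        # Step 5: the loop repeats from step 2.

    return hub, auth


# Hypothetical four-page graph: A and B both link to C and D, and D links to C.
links = {"A": ["C", "D"], "B": ["C", "D"], "C": [], "D": ["C"]}
hub, auth = hits(links)
```

In this sample graph, C receives the most links and ends up with the highest authority value, while A and B, which link to both authorities, end up with the highest hub values.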

An implementation should also take the relevance of the linked pages into account. The algorithm runs a series of iterations, each consisting of two basic steps:

Authority value update: set the authority value of each node to the sum of the hub values of the nodes that point to it. In other words, a node that is linked to by good information hubs receives a high authority value.

Hub value update: set the hub value of each node to the sum of the authority values of the nodes it points to. In other words, a node that links to authoritative nodes on the same topic receives a high hub value.
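The two update steps can also be read as matrix-vector products. Assuming an adjacency matrix A in which A[i][j] is 1 when page i links to page j (a representation chosen here for illustration), the authority update is a = Aᵀh and the hub update is h = Aa. The sketch below runs a single un-normalized iteration on a hypothetical three-page graph.

```python
# Hypothetical 3-page graph: page 0 links to pages 1 and 2, page 1 links to page 2.
A = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
n = len(A)
hub = [1.0] * n  # all hub values start at 1

# Authority update: auth[j] is the sum of hub[i] over every page i linking to j.
auth = [sum(A[i][j] * hub[i] for i in range(n)) for j in range(n)]

# Hub update: hub[i] is the sum of auth[j] over every page j that i links to.
hub = [sum(A[i][j] * auth[j] for j in range(n)) for i in range(n)]

print(auth)  # [0.0, 1.0, 2.0]
print(hub)   # [3.0, 2.0, 0.0]
```

Page 2 is linked to by both other pages and gets the largest authority value; page 0 links to both authorities and gets the largest hub value.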

Applied as stated, without normalization, the update rules do not converge: the hub and authority values keep growing, so the number of iterations would have to be capped. One way to address this is to normalize the hub and authority values after each step, that is, to divide each authority value by the sum of all authority values and each hub value by the sum of all hub values.
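The effect of normalization can be seen in a small experiment. The sketch below is illustrative only: the helper name `iterate`, the graph, and the step count are assumptions. It runs the same updates with and without normalization and records the authority values after every step.

```python
def iterate(links, steps, normalize):
    """Run HITS updates on {page: [linked pages]}; return the authority dict after each step."""
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    hub = {n: 1.0 for n in nodes}
    history = []
    for _ in range(steps):
        # Authority update: sum hub values of pages that link to n.
        auth = {n: sum(hub[s] for s in nodes if n in links.get(s, [])) for n in nodes}
        # Hub update: sum authority values of pages that n links to.
        hub = {n: sum(auth[t] for t in links.get(n, [])) for n in nodes}
        if normalize:
            a_sum = sum(auth.values()) or 1.0
            h_sum = sum(hub.values()) or 1.0
            auth = {n: v / a_sum for n, v in auth.items()}
            hub = {n: v / h_sum for n, v in hub.items()}
        history.append(auth)
    return history


# Hypothetical graph: p0 links to p1 and p2, p1 links to p2.
links = {"p0": ["p1", "p2"], "p1": ["p2"]}

raw = iterate(links, 10, normalize=False)
norm = iterate(links, 10, normalize=True)

raw_totals = [sum(a.values()) for a in raw]    # grows at every step
norm_totals = [sum(a.values()) for a in norm]  # stays at 1
```

Without normalization the total authority grows at every step, whereas the normalized values always sum to 1 and settle down after a few iterations, so a fixed step count or a small change threshold becomes a reasonable stopping rule.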

Like PageRank, HITS is an iterative algorithm based on the links between web documents. However, there are some important differences:

  • It is executed at query time, not at indexing time, which carries a query-time performance cost. The hub and authority values assigned to a page are therefore also query-specific.
  • It is not commonly used by search engines (although Ask.com is said to use a similar algorithm).
  • It calculates two weights for each document, hub and authority, instead of a single score.
  • It processes only a small subset of relevant documents, while PageRank targets the complete set of documents.
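The "small subset of relevant documents" is usually built per query: start from a root set of pages that match the query, then add the pages that link to the root set or are linked from it, and run HITS only on the links inside that set. The sketch below illustrates the idea. The function names, the link table, and the one-page root set are hypothetical, and a real system would obtain the root set from a text search.

```python
def build_base_set(root_set, links):
    """Expand a query's root set with pages that link to it or are linked from it."""
    root = set(root_set)
    base = set(root)
    for src, targets in links.items():
        if src in root:
            base.update(targets)  # pages the root set points to
        elif any(t in root for t in targets):
            base.add(src)         # pages pointing into the root set
    return base


def induced_subgraph(base, links):
    """Keep only the links whose source and target both lie in the base set."""
    return {p: [t for t in links.get(p, []) if t in base] for p in base}


# Hypothetical link table; pages e and f are unrelated to the query.
web = {"a": ["b"], "b": ["c"], "c": [], "d": ["b"], "e": ["f"], "f": []}
root = {"b"}  # assumed result of a text search for the query

base = build_base_set(root, web)
sub = induced_subgraph(base, web)
```

Here the base set contains a, b, c and d, so the hub and authority computation never touches e or f.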
