An analysis of the principles behind the search engine HITS algorithm

Source: Internet
Author: User


Link analysis is a common way for search engines to evaluate web pages. A search engine applies a link analysis algorithm to the external and internal links of a page, collects and analyzes that link data, and then scores and ranks the page according to the characteristics of those links. When a user searches for a keyword, the engine analyzes the links of the pages related to that topic, sorts the pages, and produces the final ranking. This article discusses HITS (Hyperlink-Induced Topic Search), one of the more representative link analysis algorithms.

HITS works with two kinds of pages. A hub page contains many outgoing links, most of which point to authoritative pages; hub pages are typically navigation or directory pages. An authority page is a page that many other pages link to. The links between the two kinds of pages reinforce each other, and the algorithm uses this relationship to compute a value for each page. In other words, the search engine divides the pages it considers into hub pages and authority pages. From the search engine's point of view, a good hub page points to many authority pages, and a good authority page is pointed to by many hub pages. This brings us to the core of the HITS algorithm:

First, HITS is a query-dependent algorithm: it runs on the results of a topic query. When a user submits a query, the search engine matches the query keywords and returns a set S of pages that are highly relevant to the topic. The pages in S are connected to each other, and to other pages, by many links. HITS uses those links to expand S: the pages that pages in S link to, and the pages that link to pages in S, are added to the set, forming a new set T. The requirements for T are:

1, T is a collection of pages linked to the pages in S

2, The pages in T should be highly relevant to the topic

3, T should contain a large number of hub and authority pages
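The expansion from S to T described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular search engine: the function name `expand_root_set`, the two link dictionaries, and the `max_in_per_page` cap on how many referring pages are added per page are all assumptions introduced here.

```python
from typing import Dict, Set


def expand_root_set(
    root: Set[str],
    out_links: Dict[str, Set[str]],
    in_links: Dict[str, Set[str]],
    max_in_per_page: int = 50,  # hypothetical cap, not from the article
) -> Set[str]:
    """Build the expanded set T from the query result set S (`root`)."""
    base = set(root)
    for page in root:
        # Add every page that a page in S links to.
        base |= out_links.get(page, set())
        # Add pages that link to a page in S, capped so that one very
        # popular page cannot flood the set.
        referrers = sorted(in_links.get(page, set()))[:max_in_per_page]
        base.update(referrers)
    return base


# Toy link data: s1 and s2 are the pages returned for the query.
out_links = {"s1": {"x", "y"}, "s2": {"y"}}
in_links = {"s1": {"p"}, "s2": {"p", "q"}}
T = expand_root_set({"s1", "s2"}, out_links, in_links)
```

With this toy data, T contains the two original pages plus the pages they link to (`x`, `y`) and the pages that link to them (`p`, `q`).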

With the core idea in place, the next question is how to compute a weight for each page in the set so that the search results can be ranked. The author uses the way the site http://www.gscpp.net operates to dissect the algorithm further. View the expanded page set T as a graph. All hub pages in T form the vertex set A, and all authority pages in T form the vertex set B. The hyperlinks from pages in A to pages in B form the edge set E. Together these form a bipartite directed graph SG = (A, B, E). For any vertex a in A, H(a) denotes the hub value of page a; for any vertex b in B, A(b) denotes the authority value of page b. Initially H(a) = A(b) = 1. An I operation updates A(b) for each b, and an O operation updates H(a) for each a; A(b) and H(a) are then normalized. The I and O operations below are repeated until A(b) and H(a) converge. (The algorithm can be shown to converge.)

I operation: $A(b) = \sum_{a : (a, b) \in E} H(a)$ (1)

O operation: $H(a) = \sum_{b : (a, b) \in E} A(b)$ (2)
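The iteration just described (initialize every value to 1, apply the I operation, apply the O operation, normalize, repeat until the values stop changing) can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `hits`, the Euclidean normalization, the tolerance `tol`, and the iteration cap `max_iter` are choices made here for illustration and are not specified in the article.

```python
import math
from typing import Dict, Iterable, List, Tuple


def _normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores so that their squares sum to 1 (assumed normalization)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {k: v / norm for k, v in scores.items()}


def hits(
    edges: Iterable[Tuple[str, str]],
    max_iter: int = 100,
    tol: float = 1e-9,
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (hub values, authority values) for edges (a, b), a in A, b in B."""
    edge_list: List[Tuple[str, str]] = list(edges)
    hub = {a: 1.0 for a, _ in edge_list}    # H(a) = 1 initially
    auth = {b: 1.0 for _, b in edge_list}   # A(b) = 1 initially
    for _ in range(max_iter):
        # I operation: A(b) is the sum of H(a) over all links (a, b).
        new_auth = {b: 0.0 for b in auth}
        for a, b in edge_list:
            new_auth[b] += hub[a]
        # O operation: H(a) is the sum of the updated A(b) over all links (a, b).
        new_hub = {a: 0.0 for a in hub}
        for a, b in edge_list:
            new_hub[a] += new_auth[b]
        new_auth = _normalize(new_auth)
        new_hub = _normalize(new_hub)
        # Largest change in any value since the previous round.
        delta = max(
            [abs(new_auth[b] - auth[b]) for b in auth]
            + [abs(new_hub[a] - hub[a]) for a in hub],
            default=0.0,
        )
        hub, auth = new_hub, new_auth
        if delta < tol:
            break
    return hub, auth


# Toy graph: h1 links to p1 and p2, h2 links only to p1.
example_edges = [("h1", "p1"), ("h1", "p2"), ("h2", "p1")]
hub_scores, auth_scores = hits(example_edges)
```

In the toy graph, `p1` is linked from both hubs and ends up with a higher authority value than `p2`, while `h1`, which links to both authorities, ends up with a higher hub value than `h2`.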

Analysis: the ideas behind the algorithm above raise several problems.

1, When a user submits a query, the search engine has to expand the initial results before it can return accurate ones. Growing the simple result set into a richer one takes considerable analysis time, which lengthens the response time to the user's request. A search engine that cannot return results in the shortest possible time fails its users, so in this respect the algorithm is impractical.

2, A web page contains many kinds of links, such as navigation links, advertising links, and links generated automatically by programs, and all of them affect the search results. HITS analyzes every link on the pages in the result set, so pages referenced only by these meaningless links may appear in the search results.

3, Expanding the page set also introduces a new problem. Because the result set is generated a second time, the expansion inevitably adds many pages. Some of these pages have little to do with the pages in the original results and are included only because a page in the set references them. Once many such pages are present, the results produced by HITS become broader than the query, and the user may not get precise results.

4, HITS is based on topic queries: the initial results come from exact keyword matching, and the algorithm concentrates on the community of pages most closely tied to the topic, while pages with fewer relevant links are rarely taken into account. This easily leads to topic drift in the search results. The problem cannot be solved for the time being, and it is the algorithm's biggest shortcoming.

Article from Guangzhou website construction, website construction process: http://www.gscpp.net/site/2.html (reprints must keep this link).
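Item 2 above notes that navigation and other non-editorial links distort the results. One commonly used heuristic, which the article itself does not describe, is to ignore links between pages on the same host before scoring, since such links are mostly navigational. The sketch below assumes absolute URLs; the function name `drop_same_host_links` is hypothetical.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit


def drop_same_host_links(
    edges: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Keep only links whose source and target are on different hosts."""
    kept = []
    for src, dst in edges:
        # Assumes absolute URLs; same-host links are treated as navigational.
        if urlsplit(src).hostname != urlsplit(dst).hostname:
            kept.append((src, dst))
    return kept


links = [
    ("http://a.example/index", "http://a.example/about"),  # same host
    ("http://a.example/index", "http://b.example/paper"),  # cross host
]
filtered = drop_same_host_links(links)
```

Only the cross-host link survives the filter. This does not remove advertising links that point to other hosts, so it addresses only part of the problem.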
