Search Engine Optimization: Relevance Ranking Technology

Source: Internet
Author: User


Relevance is the focus of search engine optimization, yet I believe most SEO practitioners have little understanding of how search engines actually determine relevance. Hangzhou SI billion has spent many years researching relevance ranking technology in search engines. Studying search engine algorithms is necessary for a professional SEO, even though we cannot know every algorithm in full. Understanding the direction of mainstream search engine technology is enough to keep a finger on the pulse of the search engine era.

The need for relevance ranking follows mainly from two characteristics of search engines. First, the number of Web pages that modern search engines can access has reached 1 billion, and even when a user is interested in only a small part of it, a search engine based on full-text search can return thousands of pages. Even if all of these result pages match what the user asked for, the user is unlikely to browse through them all, so placing the results the user is most interested in at the front is bound to improve user satisfaction. Second, the search skills of search engine users are usually very limited. In the most common behavior, keyword search, users generally enter only a few words. For example, Spink et al. surveyed nearly 300 users of Excite and other search engines and found that users entered 3.34 search terms on average. Some domestic scholars reached a similar conclusion: about 90% of Chinese queries entered by users are 2 to 6 words long, with 2 words the most common (58%), followed by 4 words (about 18%) and 3 words (about 14%). So few search words cannot fully express the user's search requirements, and users usually do not build complex logical expressions. Only a small number of users use Boolean retrieval, restricted search, or advanced search, and only 5.24% of searches contain Boolean operators. Domestic research also shows that about 40% of users cannot correctly use field search or secondary search (searching within results), and about 80% cannot correctly use advanced search functions. It even found that users lack the motivation to learn complex search skills; most hope that the search engine will automatically construct effective queries for them.
Without the trained search intermediaries who assisted in traditional online retrieval, there is a real gap between how users actually search and how they ideally would, so unsatisfactory results are not surprising. For this reason, the search engine must try to place the pages the user most wants as close to the front of the results as possible. This is why the Web page relevance ranking algorithm is so important in search engines.

At present, there are mainly the following relevance ranking techniques. The first is based on traditional information retrieval: it measures the relevance between a document and the user's query by the importance of the keywords within the document itself, for example the frequency and position of keywords in the Web page. In general, the more query keywords a retrieved page contains, the greater its relevance, and the more discriminating those keywords are, the greater the relevance as well. A query keyword that appears in an important position such as the title field also counts for more than one that appears in the body text. The second is hyperlink analysis, used by search engines such as Google and Baidu. Unlike the first technique, it takes how widely a page's importance is recognized as the basis for ranking the results. By design, it pays more attention to recognition of a page by third parties: a page that many other pages link to is widely recognized as important, whereas the traditional method based on keyword position and frequency is only a form of self-assessment by the page and lacks objectivity. Finally, there are other ways of ordering results, such as user-defined sorting. The Skynet FTP search engine at Peking University uses this approach, letting the user choose a specific criterion such as time, size, stability, or distance to sort the result pages. Another example is paid ranking. As a major source of profit for search engines, it is widely used by large search engines with portal characteristics, but for fear of compromising the objectivity of search results it is not their mainstream ranking method and is shown only as a supplement in a paid search column.
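The keyword-based approach described above can be illustrated with a minimal sketch. This is not the scoring formula of any particular search engine. The function name `keyword_score`, the `title_boost` parameter, the logarithmic rarity weight, and the sample documents are all assumptions made for illustration. It only shows the three ideas from the text: frequency, how discriminating a keyword is, and position.

```python
import math
from collections import Counter


def keyword_score(query_terms, doc, corpus, title_boost=2.0):
    """Score one document for a query (illustrative weights, not a real engine's)."""
    body_counts = Counter(doc["body"].lower().split())
    title_words = set(doc["title"].lower().split())
    score = 0.0
    for term in query_terms:
        term = term.lower()
        # Number of documents that mention the term anywhere.
        df = sum(
            1 for d in corpus
            if term in (d["title"] + " " + d["body"]).lower().split()
        )
        if df == 0:
            continue
        # Rarer terms discriminate better, so they get a larger weight.
        rarity = math.log(1 + len(corpus) / df)
        tf = body_counts[term]
        if term in title_words:
            tf += title_boost  # a title hit counts more than a body hit
        score += tf * rarity
    return score


# Hypothetical three-page collection.
corpus = [
    {"title": "PageRank explained", "body": "pagerank ranks pages using links"},
    {"title": "Cooking tips", "body": "pagerank is not a recipe"},
    {"title": "Gardening", "body": "plants need water"},
]
ranked = sorted(
    corpus, key=lambda d: keyword_score(["pagerank"], d, corpus), reverse=True
)
print([d["title"] for d in ranked])
```

In this toy collection the page with the keyword in its title ranks first, the page that mentions it only in the body ranks second, and the page that never mentions it scores zero.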

Relevance ranking relies mainly on hyperlink analysis. Hyperlink analysis can serve several purposes, the main one being to solve the problem of ranking result pages by relevance. It uses the hyperlinks that exist between Web pages, analyzes the reference relationships between pages, and calculates the importance weight of a page according to the number of links pointing to it. It is generally held that if page A has a hyperlink to page B, page A has in effect cast a vote for page B; that is, A recognizes the importance of B. Looking at hyperlink analysis in more depth, the whole collection of Web documents can be viewed as a graph defined by its link structure: each page is a node in the graph and each link between pages is an edge between nodes. Following this idea, the importance of a page can be evaluated from the out-degree and in-degree of its node.
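As a rough illustration of the "link as a vote" idea, the sketch below counts in-links in a tiny link graph. The function name `count_votes` and the four-page graph are hypothetical, and counting each linking page only once is an assumption of this example rather than something the text specifies.

```python
def count_votes(links):
    """Count, for every page, how many distinct other pages link to it.

    links maps each page to the list of pages it links to.
    """
    votes = {}
    for source, targets in links.items():
        votes.setdefault(source, 0)  # pages with no in-links still appear
        for target in set(targets):  # one vote per linking page
            if target == source:
                continue  # a page cannot vote for itself
            votes[target] = votes.get(target, 0) + 1
    return votes


# Hypothetical link graph: nodes are pages, edges are hyperlinks.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
votes = count_votes(links)
print(sorted(votes.items(), key=lambda item: item[1], reverse=True))
```

Here page C collects three votes and page D none. Raw in-degree treats every vote as equal, which is the limitation that weighted schemes such as PageRank address.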

The representative hyperlink analysis algorithms are the PageRank algorithm designed by Page and the HITS algorithm created by Kleinberg. Of the two, PageRank performs better than HITS in practice, mainly for the following reasons. First, PageRank can estimate the importance of Web pages once, offline and independently of any query; when a user issues a specific query, this estimate is combined with other query-dependent scores to sort the results, which saves computation at query time. Second, PageRank is computed over the whole collection of Web pages, whereas HITS is susceptible to local link traps that produce "topic drift". As a result, PageRank is now widely used in many search engine systems, and the success of the Google search engine also shows that relevance ranking algorithms based on hyperlink analysis are becoming increasingly mature.
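For comparison, here is a minimal sketch of the hub and authority iteration commonly associated with HITS. The subgraph, the page names, and the fixed iteration count are assumptions for illustration. Unlike PageRank, the scores are computed on a small subgraph gathered for one query, which is why a cluster of densely interlinked pages can skew the result.

```python
import math


def hits(links, iterations=50):
    """Return (hub, authority) scores for a small link graph.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    def normalize(scores):
        norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
        return {p: v / norm for p, v in scores.items()}

    for _ in range(iterations):
        # A page is a good authority if good hubs link to it.
        auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        auth = normalize(auth)
        # A page is a good hub if it links to good authorities.
        hub = normalize(
            {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        )
    return hub, auth


# Hypothetical subgraph collected for a single query.
query_subgraph = {
    "hub1": ["a", "b"],
    "hub2": ["a", "b"],
    "hub3": ["a"],
}
hub, auth = hits(query_subgraph)
print(max(auth, key=auth.get))
```

Because the whole computation depends on the query's subgraph, it has to be repeated at query time, in contrast to the one-time offline estimate described above.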

PageRank is based on the following assumption: for a page A on the Web, if other pages link to A, then A can be regarded as an important page. PageRank holds that the number of links pointing to a page reflects its importance. In practice, however, people are often not rigorous when creating hyperlinks, and many hyperlinks exist purely for purposes such as site navigation or commercial advertising. Such links clearly contribute little to the importance of the pages they point to. Because of the complexity this would add to the algorithm, PageRank does not give much consideration to how the content of a hyperlink affects page importance and instead uses two relatively simple rules. First, if a page has too many outbound links, its ability to endorse each page it links to is reduced. Second, if a page's own importance is low because few pages link to it, its influence on the importance of the pages it links to is reduced as well. Therefore, in the actual calculation, the importance weight of page A is proportional to the importance of the pages that link to A and inversely proportional to the number of outbound links on those pages. Since the importance of the linking pages is not known in advance, the importance weights of all pages must be calculated iteratively. In other words, the importance of a Web page depends on the importance of other pages.
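The iterative calculation can be sketched as follows. This is a simplified illustration, not Google's production implementation. The damping factor of 0.85 comes from the commonly cited formulation of PageRank rather than from this article, and the even redistribution of rank from pages without outbound links, the fixed iteration count, and the sample graph are assumptions of this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate page importance from a link graph.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal importance
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among its outbound links,
                # so more outbound links means a weaker vote per link.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outbound links spreads its
                # rank evenly over all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page Web.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = pagerank(links)
for page, value in sorted(rank.items(), key=lambda item: item[1], reverse=True):
    print(page, round(value, 4))
```

Page C ends up with the highest score because three pages link to it, and page A comes second even though it has a single in-link, because that link comes from the important page C. Page D, which nothing links to, keeps only the baseline share.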

Author: Hangzhou SI billion Network Technology Co., Ltd.

Originally published at: http://www.seo.com.cn; now included in the www.buxian123.com webmaster space

Copyright notice: This is an original work. Reprinting is permitted, provided that the original source of the article, the author's information, and this statement are indicated by hyperlink. Otherwise, legal liability will be pursued.
