Research on the PageRank Algorithm with Personalized Webpage Weights

Source: Internet
Author: User

At present, the other common methods of personalized PageRank are modular PageRank and BlockRank. Computationally, their main feature is that they optimize the algorithm for efficiency.

Earlier research on accelerating PageRank mainly relied on techniques for sparse graph structures. For example, Arasu et al. not only use the values from the previous iteration to compute the current round, but also reuse the values already produced in the current round to speed it up. The bow-tie structure of the Web graph has also been identified and used to compute PageRank values efficiently. However, these methods are not very practical, mainly because they require reordering the Web graph matrix, which must follow a depth-first traversal of the graph, an obviously costly operation. More recently, Kamvar et al. proposed algorithms that use successive intermediate iterates to infer a better estimate of the true PageRank vector, but these still suffer from sensitivity to the initial parameters of the PageRank algorithm.
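Reusing values already produced in the current round is what distinguishes a Gauss-Seidel sweep from ordinary (Jacobi-style) power iteration. The sketch below is a minimal illustration of that difference, not Arasu et al.'s implementation; the four-page graph, the damping factor 0.85 and the tolerance are hypothetical choices, and it assumes no page is dangling or links to itself.

```python
def in_links(graph):
    """Map every page to the list of pages that link to it."""
    incoming = {v: [] for v in graph}
    for u, targets in graph.items():
        for v in targets:
            incoming[v].append(u)
    return incoming


def pagerank(graph, d=0.85, tol=1e-10, max_iter=1000, gauss_seidel=False):
    """Iterate x_v = (1-d)/n + d * sum_{u->v} x_u / outdeg(u).

    Assumption: every page has at least one out-link and no self-links.
    gauss_seidel=False: each sweep reads only last round's values (Jacobi).
    gauss_seidel=True: a sweep also reads values updated earlier in the same sweep.
    Returns (ranks, number_of_sweeps).
    """
    nodes = list(graph)
    n = len(nodes)
    incoming = in_links(graph)
    outdeg = {u: len(graph[u]) for u in nodes}
    x = {u: 1.0 / n for u in nodes}
    for sweep in range(1, max_iter + 1):
        previous = dict(x)                      # values from the last round
        source = x if gauss_seidel else previous
        for v in nodes:
            x[v] = (1 - d) / n + d * sum(source[u] / outdeg[u] for u in incoming[v])
        if sum(abs(x[v] - previous[v]) for v in nodes) < tol:
            return x, sweep
    return x, max_iter


# Hypothetical toy graph: page -> pages it links to.
GRAPH = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

jacobi, jacobi_sweeps = pagerank(GRAPH)
gs, gs_sweeps = pagerank(GRAPH, gauss_seidel=True)
print(jacobi_sweeps, gs_sweeps)
```

Both variants converge to the same vector; comparing the two sweep counts shows whether reusing fresh values pays off on a particular graph.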

At present, analysis of the Web graph's structure mainly focuses on properties of the graph, such as the distribution of nodes, the links between pages, and structural modeling. However, these studies do not emphasize how to use these properties effectively to speed up hyperlink analysis.

Many researchers have proposed improvements. For example, Raghavan and Garcia-Molina used the Web structure implied by host names and URLs to obtain a more effective representation of the Web graph, and Jeh and Widom expressed personalized page weights through a limited modification of page weights; these weights reflect the initial pages of interest specified by the user. Computing a personalized view requires iterating over the pages of the entire Web graph and can only be done at run time, yet it is unrealistic to compute and store every personalized view in advance. Using new graph-theoretic results and techniques, Jeh and Widom build vectors for expressing personalized views (partial vectors) that can be shared among the personalized views of different users, giving a reasonable trade-off between computation and storage cost on the one hand and the number of views supported on the other. The computation can also be done incrementally, which makes it possible to construct a personalized view from these vectors at query time. The resulting vector is the personalized PageRank vector (PPV), which, put simply, is a personalized view of the Web. Ranking Web search results by this PPV effectively expresses user preferences.
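To make the PPV concrete, the sketch below computes one by power iteration: the random surfer's jumps go to the user's preferred pages instead of uniformly to all pages. It is only the vector that Jeh and Widom's method is designed to produce, not their partial-vector algorithm; the toy graph, the damping factor 0.85 and the no-dangling-pages restriction are assumptions made for illustration.

```python
def personalized_pagerank(graph, preference, d=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for x = d * M x + (1 - d) * p.

    graph: page -> list of pages it links to (assumption: no dangling pages).
    preference: page -> non-negative weight, keys assumed to be pages of graph;
    it is normalised into the teleport vector p.
    """
    if any(not targets for targets in graph.values()):
        raise ValueError("this sketch does not handle dangling pages")
    total = sum(preference.values())
    p = {v: preference.get(v, 0.0) / total for v in graph}
    x = dict(p)
    for _ in range(max_iter):
        new = {v: (1 - d) * p[v] for v in graph}   # random jumps follow p
        for u, targets in graph.items():
            share = d * x[u] / len(targets)        # follow an out-link uniformly
            for v in targets:
                new[v] += share
        delta = sum(abs(new[v] - x[v]) for v in graph)
        x = new
        if delta < tol:
            break
    return x


# Hypothetical toy graph: page -> pages it links to.
GRAPH = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
uniform = personalized_pagerank(GRAPH, {v: 1.0 for v in GRAPH})
likes_d = personalized_pagerank(GRAPH, {"D": 1.0})
print(uniform["D"], likes_d["D"])
```

Page D has no in-links, so its score is just its teleport share: (1 - d)/n = 0.0375 with a uniform preference, and 1 - d = 0.15 when it is the only preferred page.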

In simple terms, each PPV has length n, the number of Web pages. Because computing a PPV requires several passes over the Web graph, it is clearly impossible to compute one online in response to a user query. Conversely, the number of possible PPVs reaches 2^n (one per possible preference set P, where n is the total number of pages), which is obviously far too many to store offline. Therefore, the pages allowed in the preference set P must be restricted to a subset of a hub page set H. The set H usually contains the pages that interest users most. In practice, H can be a set of pages with high PageRank values (important pages), pages in a human-edited directory (such as Yahoo and the Open Directory), pages important to a particular enterprise or application, and so on. H can be seen as the basis for computing personalization. Unlike traditional approaches, this PPV-based method scales well with the size of H, and the technique achieves similar results for larger sets of PPVs, meeting the personalization requirements of arbitrary preferred pages.
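Restricting preferences to a hub set H is what makes precomputation feasible: with no dangling pages, a PPV is linear in the preference vector, so one can store a single PPV per hub offline and combine them at query time. The sketch below checks this on a hypothetical toy graph; the hub set, the weights and the no-dangling-pages assumption are illustrative, and a real system such as Jeh and Widom's stores shared partial vectors rather than one full vector per hub.

```python
def personalized_pagerank(graph, preference, d=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for x = d * M x + (1 - d) * p (assumes no dangling pages)."""
    total = sum(preference.values())
    p = {v: preference.get(v, 0.0) / total for v in graph}
    x = dict(p)
    for _ in range(max_iter):
        new = {v: (1 - d) * p[v] for v in graph}
        for u, targets in graph.items():
            for v in targets:
                new[v] += d * x[u] / len(targets)
        delta = sum(abs(new[v] - x[v]) for v in graph)
        x = new
        if delta < tol:
            break
    return x


# Hypothetical toy graph and hypothetical hub set H.
GRAPH = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
HUBS = ["A", "D"]

# Offline: one basis PPV per hub page.
BASIS = {h: personalized_pagerank(GRAPH, {h: 1.0}) for h in HUBS}


def ppv_from_hubs(weights):
    """Online: linearly combine stored hub PPVs; weights maps hub -> preference weight."""
    total = sum(weights.values())
    return {v: sum(w / total * BASIS[h][v] for h, w in weights.items()) for v in GRAPH}


combined = ppv_from_hubs({"A": 2.0, "D": 1.0})
direct = personalized_pagerank(GRAPH, {"A": 2.0, "D": 1.0})
print(max(abs(combined[v] - direct[v]) for v in GRAPH))
```

A preference on a page outside HUBS raises KeyError in ppv_from_hubs, which is exactly the restriction that P be a subset of H.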

In addition, there are algorithms that improve how these results are computed.

One of the more successful approaches is BlockRank, which exploits the block structure exhibited by the links between Web pages to improve the efficiency of the algorithm. Many researchers have demonstrated this block structure of the Web. For example, according to the analysis of Bharat et al., a comparison of Web link structure shows that nearly 80% of hyperlinks connect different pages on the same host, while only about 20% connect pages on different hosts. If unwanted dead links are removed, the imbalance becomes even more pronounced, roughly 9:1. If the analysis is further restricted to the domain level, the two proportions rise to 84:16 and 95:5 respectively, so the imbalance becomes markedly stronger. Within a host site, most hyperlinks exist for navigation and site organization, so a few key pages usually receive many internal links. For example, university sites typically have a high proportion of links to pages such as the library, the academic affairs office and the student affairs office. In fact, this pattern of dense internal links and sparse external links is widespread at different levels of the Web graph, producing an obvious block phenomenon, and most blocks are far smaller than the entire Web graph.
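Measuring host-level and domain-level link locality of the kind these figures describe only requires comparing the two endpoints of each hyperlink. The sketch below is a minimal illustration on a hypothetical handful of links; the percentages quoted above come from large crawls, not from this toy data, and grouping by the last two labels of the host name is a naive assumption.

```python
from urllib.parse import urlsplit


def host(url):
    """Host name of a URL, e.g. 'www.example.edu'."""
    return urlsplit(url).hostname


def naive_domain(url):
    # Assumption: the domain is the last two labels of the host name.
    # This is wrong for suffixes such as ".co.uk"; a public-suffix list would be needed.
    return ".".join(host(url).split(".")[-2:])


def intra_fraction(links, key):
    """Fraction of (source, target) links whose endpoints share key(url)."""
    same = sum(1 for src, dst in links if key(src) == key(dst))
    return same / len(links)


# Hypothetical sample of hyperlinks (source URL, target URL).
LINKS = [
    ("http://www.example.edu/a", "http://www.example.edu/library"),
    ("http://www.example.edu/a", "http://www.example.edu/b"),
    ("http://www.example.edu/b", "http://lib.example.edu/"),
    ("http://lib.example.edu/", "http://lib.example.edu/x"),
    ("http://www.example.edu/a", "http://www.other.org/"),
]
print(intra_fraction(LINKS, host), intra_fraction(LINKS, naive_domain))
```

The third link crosses hosts but stays inside one domain, which is why the domain-level fraction is higher than the host-level one, mirroring the trend described above.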

This block structure of the Web can help compute PageRank quickly and provides a good basis for expressing personalized PageRank. The algorithm works as follows. First, the PageRank values of each host's pages are computed, giving the relative importance of pages within that host. These local PageRank vectors are then weighted by the relative importance of the different blocks to approximate the global PageRank vector, and this approximation is used as the starting vector for the standard PageRank algorithm. Personalized PageRank is admittedly an appealing idea, but it requires efficiently computing a large number of PageRank vectors iteratively. Its computational cost can be reduced effectively by using BlockRank and simply restricting the random surfer's behavior: when the surfer gets bored and jumps, he chooses not among all pages but among host sites. That is, the personalization specifies which site the surfer jumps to, not which page. The personalization vector then has dimension k, the number of hosts in the Web graph, and its elements reflect the surfer's preference for the different hosts. Under this restriction, the local PageRank vectors do not need to change for different users. In fact, neither the local PageRank vectors nor the block matrix B change; only the BlockRank vector b changes with the personalization, so for each personalized PageRank vector only b needs to be recomputed on the block structure.
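The three BlockRank steps and the host-level personalization described above can be sketched as follows. This is a simplified illustration under our own assumptions, not Kamvar et al.'s implementation: hosts are taken from URL host names, dangling pages hand their mass to the teleport distribution, and in the final refinement the page-level teleport vector is assumed to spread each host's preference over its pages in proportion to their local PageRank. The links, host names and the `host_preference` parameter are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def weighted_pagerank(out_weights, teleport, d=0.85, start=None, tol=1e-10, max_iter=1000):
    """Power iteration on a weighted digraph.

    out_weights: node -> {successor: weight}; every successor must also be a key.
    teleport: node -> probability (sums to 1). Dangling nodes send their mass to it.
    """
    nodes = list(out_weights)
    init = start if start is not None else teleport
    x = {v: init.get(v, 0.0) for v in nodes}
    for _ in range(max_iter):
        new = {v: 0.0 for v in nodes}
        jump = 1.0 - d                          # random-jump mass (x sums to 1)
        for u in nodes:
            total = sum(out_weights[u].values())
            if total == 0:
                jump += d * x[u]                # dangling node: follow the teleport vector
                continue
            for v, w in out_weights[u].items():
                new[v] += d * x[u] * w / total
        for v in nodes:
            new[v] += jump * teleport.get(v, 0.0)
        delta = sum(abs(new[v] - x[v]) for v in nodes)
        x = new
        if delta < tol:
            break
    return x


def host(url):
    return urlsplit(url).hostname


def blockrank(links, host_preference=None, d=0.85):
    """Return (approximate, refined) PageRank vectors for (src_url, dst_url) links."""
    pages = sorted({url for link in links for url in link})
    out = {p: defaultdict(float) for p in pages}
    for src, dst in links:
        out[src][dst] += 1.0
    by_host = defaultdict(list)
    for p in pages:
        by_host[host(p)].append(p)

    # Step 1: local PageRank inside each host, using intra-host links only.
    local = {}
    for h, members in by_host.items():
        sub = {p: {q: w for q, w in out[p].items() if host(q) == h} for p in members}
        local.update(weighted_pagerank(sub, {p: 1.0 / len(members) for p in members}, d))

    # Step 2: host graph B; a link i->j adds local[i] / outdeg(i) to B[host(i)][host(j)].
    B = {h: defaultdict(float) for h in by_host}
    for src, dst in links:
        B[host(src)][host(dst)] += local[src] / sum(out[src].values())
    # Personalization lives only here: a k-dimensional preference over hosts.
    if host_preference is None:
        host_preference = {h: 1.0 for h in by_host}
    weight = sum(host_preference.get(h, 0.0) for h in by_host)
    v = {h: host_preference.get(h, 0.0) / weight for h in by_host}
    b = weighted_pagerank({h: dict(B[h]) for h in by_host}, v, d)

    # Step 3: approximate global vector, then use it to start the standard iteration.
    approx = {p: local[p] * b[host(p)] for p in pages}
    teleport = {p: v[host(p)] * local[p] for p in pages}  # assumption, see lead-in
    refined = weighted_pagerank({p: dict(out[p]) for p in pages}, teleport, d, start=approx)
    return approx, refined


# Hypothetical two-host Web graph.
LINKS = [
    ("http://a.com/1", "http://a.com/2"), ("http://a.com/2", "http://a.com/1"),
    ("http://a.com/1", "http://b.com/1"),
    ("http://b.com/1", "http://b.com/2"), ("http://b.com/2", "http://b.com/1"),
    ("http://b.com/1", "http://a.com/1"),
]
approx, refined = blockrank(LINKS)
likes_a = blockrank(LINKS, host_preference={"a.com": 1.0})[1]
print(sum(score for page, score in likes_a.items() if host(page) == "a.com"))
```

Changing `host_preference` only changes the host-level vector b and the teleport built from it; the local vectors computed in step 1 are reused unchanged, which is the saving the paragraph above describes.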

Both in theory and in practice, using personalized PageRank to provide personalized search-engine services is a very feasible choice that suits the information retrieval requirements of Web resources. When recommending results, it takes into account the objective weight of Web pages, an important indicator, and it performs well because the main computation is done offline. However, existing personalized PageRank techniques require users to log in and actively submit personalization information; they ignore the user's own understanding of Web pages and do not mine user behavior. Collecting personalization information in this way is unnatural and obviously increases the user's burden, so while users save time picking out relevant pages, they must spend more time personalizing their search. Exploring other effective ways of obtaining user personalization information is therefore key to improving the effectiveness of this method. This book mainly studies better methods for collecting and expressing personalization information and applies them to the personalized PageRank algorithm, making the method more objective and comprehensive. This article was provided by the www.q322.com webmaster.