How to obtain PageRank (2)

Source: Internet
Author: User

Computing PageRank amounts to finding the eigenvector (the dominant eigenvector) that belongs to the largest eigenvalue of this transition probability matrix.

This is because, as t→∞, the asymptotic behavior of a linear transformation system is essentially described by the eigenvalue of largest absolute value of the transformation matrix and by the corresponding eigenvector. In other words, a stochastic process governed by a transition probability matrix is a process of multiplying by that matrix repeatedly, which lets us calculate the probability of future states.

Moreover, although they may sound difficult, eigenvalues and eigenvectors are a basic mathematical tool that can be analyzed rigorously. We are free to choose the initial values of the vector, but as the matrix is multiplied in again and again, the resulting vector settles (up to scaling) into a particular combination of numeric values. That stable combination of values is called the eigenvector, the scalar associated with the eigenvector is called the eigenvalue, the computation as a whole is called eigenvalue decomposition, and the problem of solving for the eigenvalues is called the eigenvalue problem.
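The repeated multiplication described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the 2x2 matrix `T`, the helper names `mat_vec` and `power_iteration`, and the fixed step count are all hypothetical and are not taken from the article.

```python
def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector (list)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def power_iteration(m, v, steps=100):
    """Repeatedly multiply v by m, rescaling so the entries sum to 1."""
    for _ in range(steps):
        v = mat_vec(m, v)
        total = sum(v)
        v = [x / total for x in v]
    return v


# Hypothetical 2x2 matrix whose columns each sum to 1 (illustration only).
T = [[0.9, 0.5],
     [0.1, 0.5]]

# Two different starting vectors are multiplied by T over and over.
r_a = power_iteration(T, [1.0, 0.0])
r_b = power_iteration(T, [0.2, 0.8])
print(r_a)
print(r_b)
```

Both starting vectors should end up at the same stable combination of values, which is the point of the paragraph above: the choice of initial vector does not matter for this kind of matrix.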

(* Note) For an N×N square matrix A, a number λ that satisfies Ax = λx for some nonzero vector x is called an eigenvalue of A, and x is called an eigenvector belonging to λ. If you are not comfortable with the concept of a matrix, you can think of it as an N×N two-dimensional array. Likewise, a vector can be thought of as an ordinary (one-dimensional) array of length N.
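As a small check of the definition in the note, the following sketch treats a matrix as a two-dimensional array and a vector as a one-dimensional array, exactly as suggested. The matrix `B`, the vector `v` and the candidate eigenvalue `lam` are hypothetical values chosen for illustration.

```python
# A 2x2 matrix as a two-dimensional array (list of rows).
B = [[2, 1],
     [1, 2]]

# A vector as an ordinary one-dimensional array.
v = [1, 1]

# Matrix-vector product Bv, computed row by row.
Bv = [sum(b * x for b, x in zip(row, v)) for row in B]

# Candidate eigenvalue for this hypothetical example.
lam = 3

# v is an eigenvector belonging to lam exactly when Bv equals lam * v.
is_eigenpair = Bv == [lam * x for x in v]
print(Bv, is_eigenpair)
```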

Simple example
Let's try computing PageRank on a simple example. First, consider 7 HTML files that link to one another as shown below. The links among these HTML files are closed within files 1 to 7; that is, there are no links to or from any other documents. Also note that every page has both outgoing and incoming links (that is, there are no dead ends). This is an important assumption that will come up later, so we will not go into it here.

Figure: link relationships between the pages

First, the adjacency list of this graph structure can be written as an array, as follows. That is, for each link source ID, the IDs of its link targets are enumerated.

Link source ID   Link target IDs
1                2, 3, 4, 5, 7
2                1
3                1, 2
4                2, 3, 5
5                1, 3, 4, 6
6                1, 5
7                5

The adjacency matrix A for the link relationships in this adjacency list is the following 7x7 square matrix. It is a matrix whose elements are only 0 and 1 (a bitmap matrix). Reading row i horizontally gives the IDs of the files that file i links to.

A = [
0, 1, 1, 1, 1, 0, 1;
1, 0, 0, 0, 0, 0, 0;
1, 1, 0, 0, 0, 0, 0;
0, 1, 1, 0, 1, 0, 0;
1, 0, 1, 1, 0, 1, 0;
1, 0, 0, 0, 1, 0, 0;
0, 0, 0, 0, 1, 0, 0;
]
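
The step from the adjacency list to the adjacency matrix can be sketched as follows. This is a minimal illustration in plain Python rather than Octave; the names `links`, `n` and `A` are choices made for this sketch, and the only data used is the adjacency list given above.

```python
# Adjacency list from the table: link source ID -> link target IDs.
links = {
    1: [2, 3, 4, 5, 7],
    2: [1],
    3: [1, 2],
    4: [2, 3, 5],
    5: [1, 3, 4, 6],
    6: [1, 5],
    7: [5],
}

n = len(links)

# A[i][j] is 1 when page i+1 links to page j+1 (Python lists are 0-based).
A = [[0] * n for _ in range(n)]
for src, targets in links.items():
    for dst in targets:
        A[src - 1][dst - 1] = 1

for row in A:
    print(row)
```

Each printed row should match the corresponding row of A above, and no row or column is all zeros, which reflects the assumption that every page has both outgoing and incoming links.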
The transition probability matrix M used for PageRank is obtained by transposing A and then dividing each column by the number of its nonzero elements. The result is the following 7x7 square matrix. Reading row i horizontally, the nonzero elements indicate the IDs of the files that link to file i (the backlink sources of file i). Note that the values in each column sum to 1 (total probability).

M = [
0, 1, 1/2, 0, 1/4, 1/2, 0;
1/5, 0, 1/2, 1/3, 0, 0, 0;
1/5, 0, 0, 1/3, 1/4, 0, 0;
1/5, 0, 0, 0, 1/4, 0, 0;
1/5, 0, 0, 1/3, 0, 1/2, 1;
0, 0, 0, 0, 1/4, 0, 0;
1/5, 0, 0, 0, 0, 0, 0;
]
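
The construction of M from A can be checked with a short sketch. It uses Python's standard `fractions.Fraction` so the entries stay exact (1/5, 1/3 and so on); the names `out_degree` and `column_sums` are assumptions introduced for this illustration.

```python
from fractions import Fraction

# Adjacency matrix A from the article: row i lists the pages that page i links to.
A = [
    [0, 1, 1, 1, 1, 0, 1],
    [1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [1, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0, 0],
]

n = len(A)

# Number of outgoing links of each page (nonzero elements in each row of A).
out_degree = [sum(row) for row in A]

# Transpose A, then divide column j by the number of links leaving page j+1.
M = [[Fraction(A[j][i], out_degree[j]) for j in range(n)] for i in range(n)]

# Every column should sum to exactly 1 (total probability).
column_sums = [sum(M[i][j] for i in range(n)) for j in range(n)]

for row in M:
    print([str(x) for x in row])
print([str(s) for s in column_sums])
```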
Let r denote the PageRank vector (a column vector holding the rank of each page). Then the relation r = cMr holds, where c is a constant. Here r corresponds to an eigenvector in linear algebra, and c corresponds to the reciprocal of the corresponding eigenvalue. To obtain r, it suffices to perform an eigenvalue decomposition of this square matrix M.
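One way to see the relation r = cMr concretely is power iteration, used here as a simple stand-in for a full eigenvalue decomposition; it is not the Octave computation the article goes on to use. Because every column of M sums to 1, the dominant eigenvalue is 1, so c = 1 at the fixed point. The function names, the tolerance, the step limit and the uniform starting vector are all assumptions made for this sketch.

```python
# Transition probability matrix M from the article (each column sums to 1).
M = [
    [0, 1, 1/2, 0, 1/4, 1/2, 0],
    [1/5, 0, 1/2, 1/3, 0, 0, 0],
    [1/5, 0, 0, 1/3, 1/4, 0, 0],
    [1/5, 0, 0, 0, 1/4, 0, 0],
    [1/5, 0, 0, 1/3, 0, 1/2, 1],
    [0, 0, 0, 0, 1/4, 0, 0],
    [1/5, 0, 0, 0, 0, 0, 0],
]


def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector (list)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def pagerank(m, tol=1e-12, max_steps=10000):
    """Power iteration: multiply by m until the vector stops changing."""
    n = len(m)
    r = [1.0 / n] * n  # uniform starting vector (an arbitrary choice)
    for _ in range(max_steps):
        nxt = mat_vec(m, r)
        total = sum(nxt)
        nxt = [x / total for x in nxt]  # keep the entries summing to 1
        if max(abs(a - b) for a, b in zip(nxt, r)) < tol:
            return nxt
        r = nxt
    return r


r = pagerank(M)

# Largest difference between M r and r; close to zero when r = M r holds.
residual = max(abs(a - b) for a, b in zip(mat_vec(M, r), r))

# Page IDs ordered from highest to lowest rank.
ranking = sorted(range(1, len(r) + 1), key=lambda page: -r[page - 1])

print([round(x, 3) for x in r])
print(ranking)
```

Normalizing r so that its entries sum to 1 is a convention of this sketch; any nonzero multiple of r satisfies the same relation.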

There are various numerical methods for eigenvalue decomposition, but this article will not go into them; please consult an appropriate textbook (surely there is one that has been buried since your summer vacation). Here we will use GNU Octave to actually compute the eigenvalues and eigenvectors.

(* Note) GNU Octave is a programming language for numerical computing, similar to the well-known MATLAB. The language is specialized for matrix calculations, but its syntax is basically C-like, so it is quite readable. Of course, besides Octave, MATLAB and Scilab are also fine languages, but because it is distributed under the GPL, Octave is the easiest to obtain.
