How to obtain PageRank (3)


Practical examples
Here is a practical example. If you don't fully follow what the example does, it is enough to know that a program such as Octave can solve the eigenvalue problem for us.

First, use a suitable editor to create the following Octave script. (Ending a line with a semicolon suppresses its output, but the semicolons are deliberately left out here for illustration.)

% cat pagerank.m
#!/usr/bin/octave
## pagerank.m - Simple GNU Octave script to compute PageRank (TM)

## Start the timer.
tic();

## Following the definition of PageRank, define the transition probability
## matrix M, where M(i,j) is the probability of following a link from
## file j to file i (each column sums to 1).

M = [
0, 1, 1/2, 0, 1/4, 1/2, 0;
1/5, 0, 1/2, 1/3, 0, 0, 0;
1/5, 0, 0, 1/3, 1/4, 0, 0;
1/5, 0, 0, 0, 1/4, 0, 0;
1/5, 0, 0, 1/3, 0, 1/2, 1;
0, 0, 0, 0, 1/4, 0, 0;
1/5, 0, 0, 0, 0, 0, 0;
]

## Compute all eigenvalues of M and the corresponding eigenvectors.

[V, D] = eig(M)

## Store in eigenvector the eigenvector that belongs to the eigenvalue
## with the largest absolute value.

eigenvector = V(:, find(abs(diag(D)) == max(abs(diag(D)))))

## PageRank is this eigenvector normalized to a probability vector.
PageRank = eigenvector ./ norm(eigenvector, 1)

## Print the elapsed time.
elapsed_time = toc()

(2003/7/23: Corrected an error in the script above.)

Wrong:   eigenvector = V(:, find(max(abs(diag(D)))))
Correct: eigenvector = V(:, find(abs(diag(D)) == max(abs(diag(D)))))
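
The correction matters because Octave's find returns the positions of the nonzero elements of its argument. In the wrong version, max(...) is a single nonzero number, so find always reports position 1, wherever the largest eigenvalue actually sits. The Python sketch below imitates this with a hypothetical helper named find (1-based, like Octave); the list of magnitudes is made up for illustration, with the maximum deliberately not in first place.

```python
def find(values):
    """Hypothetical stand-in for Octave's find: 1-based positions of nonzero entries."""
    return [i + 1 for i, v in enumerate(values) if v]

# Illustrative eigenvalue magnitudes (not taken from the article); the maximum is third.
magnitudes = [0.50, 0.32, 1.00, 0.17]

# Wrong: max(...) is one nonzero number, so find always reports position 1.
wrong = find([max(magnitudes)])

# Correct: compare every entry with the maximum, then find the matching positions.
largest = max(magnitudes)
right = find([m == largest for m in magnitudes])

print(wrong, right)  # [1] [3]
```

In the run shown in this article the eigenvalue 1 happens to be in the first column of D, so both versions would pick the same column there.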
Running the pagerank.m script with Octave prints the following to standard output.

% octave pagerank.m
GNU Octave, version 2.0.16 (i586-redhat-linux-gnu).
Copyright (C) 1996, 1997, 1998, 1999 John W. Eaton.
This is free software with ABSOLUTELY NO WARRANTY.
For details, type `warranty'.


M =

0.00000 1.00000 0.50000 0.00000 0.25000 0.50000 0.00000
0.20000 0.00000 0.50000 0.33333 0.00000 0.00000 0.00000
0.20000 0.00000 0.00000 0.33333 0.25000 0.00000 0.00000
0.20000 0.00000 0.00000 0.00000 0.25000 0.00000 0.00000
0.20000 0.00000 0.00000 0.33333 0.00000 0.50000 1.00000
0.00000 0.00000 0.00000 0.00000 0.25000 0.00000 0.00000
0.20000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000

V =

 Columns 1 through 3:

   0.69946 + 0.00000i   0.63140 + 0.00000i   0.63140 + 0.00000i
   0.38286 + 0.00000i  -0.28715 + 0.15402i  -0.28715 - 0.15402i
   0.32396 + 0.00000i  -0.07422 - 0.10512i  -0.07422 + 0.10512i
   0.24297 + 0.00000i   0.00707 - 0.24933i   0.00707 + 0.24933i
   0.41231 + 0.00000i  -0.28417 + 0.44976i  -0.28417 - 0.44976i
   0.10308 + 0.00000i   0.22951 - 0.13211i   0.22951 + 0.13211i
   0.13989 + 0.00000i  -0.22243 - 0.11722i  -0.22243 + 0.11722i

 Columns 4 through 6:

   0.56600 + 0.00000i   0.56600 + 0.00000i  -0.32958 + 0.00000i
   0.26420 - 0.05040i   0.26420 + 0.05040i   0.14584 + 0.00000i
  -0.10267 + 0.14787i  -0.10267 - 0.14787i   0.24608 + 0.00000i
  -0.11643 + 0.02319i  -0.11643 - 0.02319i  -0.24398 + 0.00000i
  -0.49468 - 0.14385i  -0.49468 + 0.14385i   0.42562 + 0.00000i
  -0.14749 + 0.38066i  -0.14749 - 0.38066i  -0.64118 + 0.00000i
   0.03106 - 0.35747i   0.03106 + 0.35747i   0.39720 + 0.00000i

 Column 7:

   0.00000 + 0.00000i
  -0.40825 + 0.00000i
  -0.00000 + 0.00000i
   0.00000 + 0.00000i
  -0.00000 + 0.00000i
   0.81650 + 0.00000i
  -0.40825 + 0.00000i

D =

 Columns 1 through 3:

   1.00000 + 0.00000i   0.00000 + 0.00000i   0.00000 + 0.00000i
   0.00000 + 0.00000i  -0.44433 + 0.23415i   0.00000 + 0.00000i
   0.00000 + 0.00000i   0.00000 + 0.00000i  -0.44433 - 0.23415i
   0.00000 + 0.00000i   0.00000 + 0.00000i   0.00000 + 0.00000i
   0.00000 + 0.00000i   0.00000 + 0.00000i   0.00000 + 0.00000i
   0.00000 + 0.00000i   0.00000 + 0.00000i   0.00000 + 0.00000i
   0.00000 + 0.00000i   0.00000 + 0.00000i   0.00000 + 0.00000i

 Columns 4 through 6:

   0.00000 + 0.00000i   0.00000 + 0.00000i   0.00000 + 0.00000i
   0.00000 + 0.00000i   0.00000 + 0.00000i   0.00000 + 0.00000i
   0.00000 + 0.00000i   0.00000 + 0.00000i   0.00000 + 0.00000i
   0.02731 + 0.31430i   0.00000 + 0.00000i   0.00000 + 0.00000i
   0.00000 + 0.00000i   0.02731 - 0.31430i   0.00000 + 0.00000i
   0.00000 + 0.00000i   0.00000 + 0.00000i  -0.16595 + 0.00000i
   0.00000 + 0.00000i   0.00000 + 0.00000i   0.00000 + 0.00000i

 Column 7:

   0.00000 + 0.00000i
   0.00000 + 0.00000i
   0.00000 + 0.00000i
   0.00000 + 0.00000i
   0.00000 + 0.00000i
   0.00000 + 0.00000i
  -0.00000 + 0.00000i

eigenvector =
0.69946
0.38286
0.32396
0.24297
0.41231
0.10308
0.13989

PageRank =
0.303514
0.166134
0.140575
0.105431
0.178914
0.044728
0.060703

elapsed_time = 0.063995

In the Octave output, the eigenvalues appear as the diagonal entries of the diagonal matrix D, and the eigenvector belonging to each eigenvalue appears as the corresponding column of the matrix V. In other words, M * V = V * D holds. Counting the complex ones, there are 7 eigenvalues, and the one with the largest absolute value is λ = 1. The corresponding eigenvector is the real vector:

eigenvector =
0.69946
0.38286
0.32396
0.24297
0.41231
0.10308
0.13989
This is the 1st column of the matrix V. Note that this eigenvector is not normalized as a probability vector (an N-dimensional non-negative vector whose elements sum to 1); it is normalized so that its Euclidean length equals 1. In symbols, Σ p_i ≠ 1 while Σ (p_i)^2 = 1. Normalizing it to a probability vector gives:

PageRank =
0.303514
0.166134
0.140575
0.105431
0.178914
0.044728
0.060703
This vector is the PageRank. Note that its elements sum to 1. The calculation took only 0.064 seconds.
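
The normalization step can be checked by hand. The following Python sketch takes the eigenvector exactly as Octave printed it (so it carries the printout's rounding), confirms that its Euclidean length is about 1 while its elements do not sum to 1, and then divides by the sum of absolute values, which is what norm(eigenvector, 1) computes in the script. The variable names are chosen here for illustration.

```python
import math

# Eigenvector for lambda = 1, copied from the Octave output (5 decimal places).
eigenvector = [0.69946, 0.38286, 0.32396, 0.24297, 0.41231, 0.10308, 0.13989]

l2_norm = math.sqrt(sum(x * x for x in eigenvector))  # Euclidean length, about 1
l1_norm = sum(abs(x) for x in eigenvector)            # sum of elements, not 1

# Dividing by the 1-norm turns the eigenvector into a probability vector.
pagerank = [x / l1_norm for x in eigenvector]

print(round(l2_norm, 3), round(l1_norm, 3))
print([round(p, 3) for p in pagerank], round(sum(pagerank), 6))
```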

Evaluation of PageRank
The pages are listed below in order of PageRank (PageRank rounded to 3 decimal places).

Rank  PageRank  Page ID  Outgoing links (IDs)  Incoming links (IDs)
  1    0.304       1     2,3,4,5,7             2,3,5,6
  2    0.179       5     1,3,4,6               1,4,6,7
  3    0.166       2     1                     1,3,4
  4    0.141       3     1,2                   1,4,5
  5    0.105       4     2,3,5                 1,5
  6    0.061       7     5                     1
  7    0.045       6     1,5                   5
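
The link columns of this table can be read straight off the matrix M: the nonzero rows of column j are the pages that page j links to, and the nonzero columns of row i are the pages that link to page i. The Python sketch below rebuilds the table that way; the helper names outgoing and incoming are chosen here for illustration, and the PageRank values are copied from the Octave output.

```python
# Transition matrix from the script: M[i][j] is the probability of moving
# from page j+1 to page i+1 (each column sums to 1).
M = [
    [0,   1, 1/2, 0,   1/4, 1/2, 0],
    [1/5, 0, 1/2, 1/3, 0,   0,   0],
    [1/5, 0, 0,   1/3, 1/4, 0,   0],
    [1/5, 0, 0,   0,   1/4, 0,   0],
    [1/5, 0, 0,   1/3, 0,   1/2, 1],
    [0,   0, 0,   0,   1/4, 0,   0],
    [1/5, 0, 0,   0,   0,   0,   0],
]
pagerank = [0.303514, 0.166134, 0.140575, 0.105431, 0.178914, 0.044728, 0.060703]
n = len(M)

def outgoing(page):
    """1-based IDs of the pages that `page` links to (nonzero rows of its column)."""
    return [i + 1 for i in range(n) if M[i][page - 1] != 0]

def incoming(page):
    """1-based IDs of the pages linking to `page` (nonzero columns of its row)."""
    return [j + 1 for j in range(n) if M[page - 1][j] != 0]

# Page IDs sorted by descending PageRank.
ranking = sorted(range(1, n + 1), key=lambda page: -pagerank[page - 1])
for rank, page in enumerate(ranking, start=1):
    print(rank, f"{pagerank[page - 1]:.3f}", page, outgoing(page), incoming(page))
```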

The first thing to notice is that the ordering by PageRank and the ordering by number of backlinks largely agree. The number of outgoing links has almost no effect on a page's PageRank; it is the number of backlinks that basically determines its size. However, this alone does not explain the large gap between 1st and 2nd place (or between 3rd and 4th, or between 6th and 7th). In short, the remarkable point is that PageRank is not determined by the number of backlinks alone.
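
To make these gaps visible, the pages can be grouped by their number of backlinks. The Python sketch below does this with the rounded PageRank values and the incoming-link lists from the table; the dictionary names are chosen here for illustration.

```python
from collections import defaultdict

# Rounded PageRank and incoming links per page ID, copied from the table.
pagerank = {1: 0.304, 2: 0.166, 3: 0.141, 4: 0.105, 5: 0.179, 6: 0.045, 7: 0.061}
incoming = {1: [2, 3, 5, 6], 2: [1, 3, 4], 3: [1, 4, 5], 4: [1, 5],
            5: [1, 4, 6, 7], 6: [5], 7: [1]}

# Group (PageRank, page ID) pairs by the number of backlinks.
by_backlinks = defaultdict(list)
for page, sources in incoming.items():
    by_backlinks[len(sources)].append((pagerank[page], page))

for count in sorted(by_backlinks, reverse=True):
    print(count, sorted(by_backlinks[count], reverse=True))
```

Pages 1 and 5 both have four backlinks, pages 2 and 3 both have three, and pages 7 and 6 both have one, yet within each pair the PageRank differs. These are exactly the three gaps mentioned above.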

Let's look at this in detail. The page with ID=1 has a PageRank of 0.304, about one third of the total, and takes 1st place. It is worth noting that it receives the entire PageRank (0.166) of the ID=2 page, which is ranked 3rd. The ID=2 page has backlinks from 3 pages but only one outgoing link, to ID=1, so that single link passes all of its PageRank to ID=1. That said, ID=1 also has the most outgoing links and, jointly with ID=5, the most backlinks, so it can simply be understood as the most popular page.

Conversely, the last-place ID=6 page is rated at only about 15% of ID=1. This can be understood as the strong effect of having no link from the highly ranked ID=1. In short, even with the same number of backlinks, the rank of the linking pages also affects the level of PageRank.

(Figure: the flow of PageRank along the links between pages, annotated with PageRank values)

Let's actually work out the balance of PageRank. Because λ = 1, the calculation is simple: just add up the flows into each page. For example, the inflow into ID=1 is

Inflow = (rank sent by ID=2) + (rank sent by ID=3) + (rank sent by ID=5) + (rank sent by ID=6)
       = 0.166 + 0.141/2 + 0.179/4 + 0.045/2
       = 0.30375
This agrees with the PageRank of ID=1 (0.304) within rounding error. The same holds for every other page ID. The figure of PageRank flow above depicts this balance: the PageRank sent along each link equals the sending page's PageRank divided by its number of outgoing links, and the amounts flowing into a page add up to that page's PageRank.
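
The same balance can be checked for all seven pages at once. The Python sketch below uses the rounded PageRank values and the outgoing-link lists from the table; the function name inflow is chosen here for illustration. Because the inputs are rounded to 3 decimal places, the inflow only matches each page's PageRank to about that precision.

```python
# Rounded PageRank and outgoing links per page ID, copied from the table.
rank = {1: 0.304, 2: 0.166, 3: 0.141, 4: 0.105, 5: 0.179, 6: 0.045, 7: 0.061}
out_links = {1: [2, 3, 4, 5, 7], 2: [1], 3: [1, 2], 4: [2, 3, 5],
             5: [1, 3, 4, 6], 6: [1, 5], 7: [5]}

def inflow(page):
    """Sum of the shares sent to `page`: each source splits its rank evenly over its links."""
    return sum(rank[src] / len(targets)
               for src, targets in out_links.items() if page in targets)

for page in sorted(rank):
    print(page, rank[page], round(inflow(page), 5))
```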

Of course, such a neat balance is no surprise to anyone who knows linear algebra, because it is simply the defining property of eigenvalues and eigenvectors: the set of values for which this balance holds is, by definition, the eigenvector. Even so, checking it by hand shows how well thought out the PageRank approach is.

The above is the basic principle of PageRank. What Google does is solve this kind of eigenvalue problem on a very large scale.
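
This text does not say which algorithm is used at that scale. A full eigendecomposition like eig(M) becomes impractical for very large matrices, and a standard alternative is power iteration: multiply a start vector by M repeatedly until it stops changing. The Python sketch below applies it to the 7-page example as an illustration under that assumption; the function name power_iteration, the uniform start vector, and the fixed iteration count are choices made here, not taken from the source.

```python
# Transition matrix from the script: M[i][j] is the probability of moving
# from page j+1 to page i+1 (each column sums to 1).
M = [
    [0,   1, 1/2, 0,   1/4, 1/2, 0],
    [1/5, 0, 1/2, 1/3, 0,   0,   0],
    [1/5, 0, 0,   1/3, 1/4, 0,   0],
    [1/5, 0, 0,   0,   1/4, 0,   0],
    [1/5, 0, 0,   1/3, 0,   1/2, 1],
    [0,   0, 0,   0,   1/4, 0,   0],
    [1/5, 0, 0,   0,   0,   0,   0],
]

def power_iteration(matrix, iterations=100):
    """Apply the matrix repeatedly to a uniform start vector, keeping the sum at 1."""
    n = len(matrix)
    p = [1.0 / n] * n
    for _ in range(iterations):
        p = [sum(matrix[i][j] * p[j] for j in range(n)) for i in range(n)]
        total = sum(p)
        p = [x / total for x in p]  # guard against floating-point drift
    return p

pagerank = power_iteration(M)
print([round(x, 6) for x in pagerank])
```

For this small matrix the result should agree with the PageRank vector printed by Octave, since every other eigenvalue has an absolute value of roughly 0.5 or less and its contribution shrinks with each multiplication.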
