"Algorithm" PageRank

Source: Internet
Author: User

1, The basic concept: PageRank determines the importance of every web page from the recursive relationship "a page that is linked to from many high-quality pages must itself be a high-quality page".

2, The specific algorithm: a page's PageRank is divided by the number of outbound (forward) links on that page, and the resulting value is added to the PageRank of each page those links point to. The sum of the values a page receives in this way is the PageRank of the linked page.

3, PageRank concept map: [figure not included]

4, The main points of PageRank:

Number of backlinks (a popularity indicator in the plain sense)

Whether the backlinks come from highly recommended pages (a well-founded popularity indicator)

Number of links on the backlink source page (an indicator of the probability of being selected)

5, An example illustrating how PageRank is computed

Suppose a small group consists of 4 pages: A, B, C and D. If all of the other pages link to A, then A's PR (PageRank) value is the sum of the PR values of B, C and D.

PR(A) = PR(B) + PR(C) + PR(D)

Now assume that B also links to C, and that D links to 3 pages including A. A page cannot vote twice, so B gives each of the pages it links to half a vote. By the same logic, only one third of D's vote counts toward A's PageRank.

PR(A) = PR(B)/2 + PR(C) + PR(D)/3

In other words, a page's PR value is divided evenly by its total number of outbound links, L:

PR(A) = PR(B)/L(B) + PR(C)/L(C) + PR(D)/L(D)
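As a quick numeric check of this formula, the following minimal sketch evaluates PR(A) for the four-page example. The starting value of 0.25 for every page and the names `VoteExample` and `receivedRank` are assumptions made for illustration; the text above does not give initial values.

```java
class VoteExample {
    /** Sums PR(p) / L(p) over the pages p that link to the target page. */
    static double receivedRank(double[] pr, int[] outLinkCounts) {
        double sum = 0.0;
        for (int i = 0; i < pr.length; i++) {
            // Each linking page splits its vote evenly over its outbound links.
            sum += pr[i] / outLinkCounts[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Assumed starting values: B, C and D each begin with PR = 0.25.
        double[] prOfLinkers = {0.25, 0.25, 0.25};
        // L(B) = 2, L(C) = 1, L(D) = 3, as in the example above.
        int[] outLinkCounts = {2, 1, 3};
        System.out.println("PR(A) = " + receivedRank(prOfLinkers, outLinkCounts));
    }
}
```

With these assumed starting values PR(A) comes out at about 0.458, that is 0.125 + 0.25 + 0.0833.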

To prevent pages without outbound links from passing on a PR of 0, Google's mathematical model assigns every page a very small base value, (1-d)/n. This value covers the cases where a page has no outbound links or where the user stops following links and jumps directly to another page.

Note

In the 1998 paper by Sergey Brin and Lawrence Page, the minimum value set for each page is 1-d, not the (1-d)/n used here (see also the English Wikipedia entry on this point). The PageRank of a page is therefore computed from the PageRank of other pages. Google recalculates the PageRank of every page repeatedly. If every page is given an arbitrary non-zero PageRank value, then after repeated calculation the PR values of the pages stabilize, that is, they converge.
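The convergence behaviour described above can be tried out with a minimal sketch that repeats the update with the (1-d)/n term from two different non-zero starting vectors and compares the results. The four-page link structure (A links to B, C, D; B to A, C; C to A; D to A, B, C), the value d = 0.85, the tolerance and all names are assumptions chosen for illustration, not part of the original example.

```java
class ConvergenceSketch {
    /** Repeats r <- (1-d)/n + d * M r until the summed absolute change is at most eps. */
    static double[] iterate(double[][] m, double[] start, double d, double eps) {
        int n = start.length;
        double[] r = start.clone();
        while (true) {
            double[] next = new double[n];
            double change = 0.0;
            for (int i = 0; i < n; i++) {
                double sum = 0.0;
                for (int j = 0; j < n; j++) {
                    sum += m[i][j] * r[j];
                }
                next[i] = (1 - d) / n + d * sum;
                change += Math.abs(next[i] - r[i]);
            }
            r = next;
            if (change <= eps) {
                return r;
            }
        }
    }

    /** Assumed links: A->B,C,D; B->A,C; C->A; D->A,B,C. Entry [i][j] is 1/L(j) if j links to i. */
    static double[][] exampleMatrix() {
        return new double[][] {
            {0,       1 / 2.0, 1, 1 / 3.0},
            {1 / 3.0, 0,       0, 1 / 3.0},
            {1 / 3.0, 1 / 2.0, 0, 1 / 3.0},
            {1 / 3.0, 0,       0, 0}
        };
    }

    public static void main(String[] args) {
        double[][] m = exampleMatrix();
        double[] fromOnes = iterate(m, new double[] {1, 1, 1, 1}, 0.85, 1e-10);
        double[] fromSkewed = iterate(m, new double[] {5, 0.1, 2, 0.5}, 0.85, 1e-10);
        for (int i = 0; i < 4; i++) {
            System.out.printf("page %d: %.6f vs %.6f%n", i, fromOnes[i], fromSkewed[i]);
        }
    }
}
```

Both runs should print matching columns, because each update shrinks the difference between any two candidate vectors by a factor of at most d.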

From the description above, the PageRank formula can be summarized as follows:

$$PR(p_i) = \frac{1-d}{n} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}$$

Note: this form deals with "pages that have no outbound links" (such pages act like "black holes" that swallow the probability of the user continuing to browse). Here d is called the damping factor: the probability that, at any given moment, a user who has reached a page continues to browse onward (the Java code in section 6 uses 0.85). The value 1-d, the probability that the user stops clicking and jumps at random to a new URL, is applied to all pages and estimates the probability that a page may be bookmarked by the surfer.

p_1, p_2, ..., p_n are the pages under study, M(p_i) is the set of pages that link to p_i, L(p_j) is the number of outbound links on page p_j, and n is the total number of pages.
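To make the roles of the set of linking pages, the outbound link count and the total page count concrete, here is a minimal sketch of one application of the formula to every page, working directly from outbound link lists. The three-page graph, the starting values of 1/3 and the names `FormulaStep` and `step` are hypothetical and only serve the illustration.

```java
class FormulaStep {
    /**
     * Applies PR(p_i) = (1-d)/n + d * sum of PR(p_j)/L(p_j) over the pages p_j linking to p_i.
     * outLinks[j] lists the pages that page j links to, so L(p_j) is outLinks[j].length.
     */
    static double[] step(int[][] outLinks, double[] pr, double d) {
        int n = pr.length;
        double[] next = new double[n];
        for (int i = 0; i < n; i++) {
            next[i] = (1 - d) / n;             // base value every page receives
        }
        for (int j = 0; j < n; j++) {
            int count = outLinks[j].length;    // a page without links passes nothing on
            for (int target : outLinks[j]) {
                next[target] += d * pr[j] / count;
            }
        }
        return next;
    }

    public static void main(String[] args) {
        // Hypothetical graph: page 0 links to 1 and 2, page 1 links to 2, page 2 links to 0.
        int[][] outLinks = { {1, 2}, {2}, {0} };
        double[] pr = {1 / 3.0, 1 / 3.0, 1 / 3.0};
        double[] next = step(outLinks, pr, 0.85);
        for (int i = 0; i < next.length; i++) {
            System.out.printf("PR(p_%d) = %.4f%n", i, next[i]);
        }
    }
}
```

For this assumed graph a single step gives roughly 0.333, 0.192 and 0.475, which still sum to 1.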

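The PageRank values are the entries of an eigenvector of a special matrix. This eigenvector is

$$\mathbf{R} = \begin{bmatrix} PR(p_1) \\ PR(p_2) \\ \vdots \\ PR(p_n) \end{bmatrix}$$

R is a solution of the equation

$$\mathbf{R} = \begin{bmatrix} (1-d)/n \\ (1-d)/n \\ \vdots \\ (1-d)/n \end{bmatrix} + d \begin{bmatrix} \ell(p_1,p_1) & \ell(p_1,p_2) & \cdots & \ell(p_1,p_n) \\ \ell(p_2,p_1) & \ddots & & \vdots \\ \vdots & & \ell(p_i,p_j) & \\ \ell(p_n,p_1) & \cdots & & \ell(p_n,p_n) \end{bmatrix} \mathbf{R}$$

where $\ell(p_i,p_j)$ equals 0 if p_j does not link to p_i and equals 1/L(p_j) otherwise, so that for each j

$$\sum_{i=1}^{n} \ell(p_i,p_j) = 1$$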
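The link matrix in this equation can be built mechanically from each page's list of outbound links. The sketch below does so and checks that every column sums to 1. The link lists are read off from the columns of the matrix used in the Java program in section 6; the names `LinkMatrix`, `build` and `columnSum` are hypothetical, and the sketch assumes no page lists the same target twice.

```java
class LinkMatrix {
    /** Entry [i][j] is 1/L(p_j) if page j links to page i, otherwise 0. */
    static double[][] build(int[][] outLinks) {
        int n = outLinks.length;
        double[][] m = new double[n][n];
        for (int j = 0; j < n; j++) {
            for (int i : outLinks[j]) {
                m[i][j] = 1.0 / outLinks[j].length;
            }
        }
        return m;
    }

    /** Sum of column j, which is 1 for every page with at least one outbound link. */
    static double columnSum(double[][] m, int j) {
        double sum = 0.0;
        for (double[] row : m) {
            sum += row[j];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Outbound links of pages 0..6, derived from the columns of the matrix in section 6.
        int[][] outLinks = {
            {1, 2, 3, 4, 6}, {0}, {0, 1}, {1, 2, 4}, {0, 2, 3, 5}, {0, 4}, {4}
        };
        double[][] m = build(outLinks);
        for (int j = 0; j < m.length; j++) {
            System.out.printf("column %d sums to %.3f%n", j, columnSum(m, j));
        }
    }
}
```

Storing the probabilities column by column is a design choice that lets one matrix-vector product distribute each page's rank to the pages it links to.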

  

6, A Java implementation of the PageRank algorithm that simulates the link relationships between HTML pages:

package com.pachira.d;

public class PageRank {
    public static void main(String[] args) {
        // g[i][j] is the probability of following a link from page j to page i,
        // so every column sums to 1.
        double[][] g = {
            {0, 1, 1/2.0, 0, 1/4.0, 1/2.0, 0},
            {1/5.0, 0, 1/2.0, 1/3.0, 0, 0, 0},
            {1/5.0, 0, 0, 1/3.0, 1/4.0, 0, 0},
            {1/5.0, 0, 0, 0, 1/4.0, 0, 0},
            {1/5.0, 0, 0, 1/3.0, 0, 1/2.0, 1},
            {0, 0, 0, 0, 1/4.0, 0, 0},
            {1/5.0, 0, 0, 0, 0, 0, 0}
        };
        double[] pr = {1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0};
        double alpha = 0.85;
        double eps = 0.0000001;
        pageRank(pr, g, alpha, eps);
    }

    /** Prints a vector on one line, tab separated. */
    public static void showVector(double[] v) {
        for (int i = 0; i < v.length; i++) {
            System.out.print(v[i] + "\t");
        }
        System.out.println();
    }

    /** Prints a matrix, one row per line. */
    public static void showMatrix(double[][] m) {
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                System.out.print(m[i][j] + "\t");
            }
            System.out.println();
        }
    }

    /**
     * Main function for calculating PageRank.
     * @param vector initial PageRank vector
     * @param matrix initial HTML reverse link probability matrix
     * @param alpha damping factor
     * @param eps convergence threshold
     * @return the converged PageRank vector
     */
    public static double[] pageRank(double[] vector, double[][] matrix, double alpha, double eps) {
        double[] vectorMove = vector;
        while (true) {
            showVector(vector);
            vectorMove = vectorXMatrix(vector, matrix, alpha);
            double dis = norm(vector, vectorMove);
            if (dis <= eps) {
                break;
            }
            vector = vectorMove;
        }
        return vector;
    }

    /**
     * Calculates the error between two vectors (sum of absolute differences).
     * @param vector
     * @param vectorMove
     * @return the error of the vector
     */
    public static double norm(double[] vector, double[] vectorMove) {
        // Fail loudly: a negative error code would be mistaken for convergence.
        if (vector.length != vectorMove.length) {
            throw new IllegalArgumentException("vectors differ in length");
        }
        double sum = 0;
        for (int i = 0; i < vector.length; i++) {
            sum += Math.abs(vector[i] - vectorMove[i]);
        }
        return sum;
    }

    /**
     * Calculates the new PageRank values.
     * @param vector PageRank vector
     * @param matrix HTML reverse link probability matrix
     * @param alpha damping factor
     * @return new PageRank vector
     * @url: http://zh.wikipedia.org/zh/%E7%9F%A9%E9%98%B5
     * The product of two matrices is defined only when the number of columns of the
     * first matrix A equals the number of rows of the second matrix B.
     * If A is an m x n matrix and B is an n x p matrix, their product AB is an m x p
     * matrix whose element (i, j) is the sum over r of A[i][r] * B[r][j].
     *
     * | 1 0 2|   |3 1|   |( 1*3 + 0*2 + 2*1) ( 1*1 + 0*1 + 2*0)|   |5 1|
     * |-1 3 1| x |2 1| = |(-1*3 + 3*2 + 1*1) (-1*1 + 3*1 + 1*0)| = |4 2|
     *            |1 0|
     *
     * | 1 0 2|   |1|   | 1*1 + 0*1 + 2*1|   |3|
     * |-1 3 1| x |1| = |-1*1 + 3*1 + 1*1| = |3|
     *            |1|
     *
     *           |3 1|
     * |1 1 1| x |2 1| = |(1*3 + 1*2 + 1*1) (1*1 + 1*1 + 1*0)| = |6 2|
     *           |1 0|
     */
    public static double[] vectorXMatrix(double[] vector, double[][] matrix, double alpha) {
        if (vector == null || matrix == null || vector.length == 0 || matrix.length == 0
                || vector.length != matrix[0].length) {
            throw new IllegalArgumentException("vector and matrix dimensions do not match");
        }
        double[] result = new double[vector.length];
        for (int i = 0; i < matrix.length; i++) {
            double sum = 0;
            for (int j = 0; j < matrix[i].length; j++) {
                sum += vector[j] * matrix[i][j];
            }
            // Damped link contribution plus the base value (1 - alpha) / n.
            sum = alpha * sum + (1 - alpha) / vector.length;
            result[i] = sum;
        }
        return result;
    }
}

The above content is excerpted from the Wikipedia entry on PageRank.
