PageRank Program:
File contents:
page1 page3
page2 page1
page4 page1
page3 page1
page4 page2
page3 page4
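To make the shape of this data concrete before Spark is involved, here is a minimal plain-Python sketch of the grouping step: each line is read as a "source target" pair, duplicate pairs are dropped, and targets are collected per source page. This is roughly the role that distinct and groupByKey play in the program. The `EDGES` string and the `build_links` helper are hypothetical names used only for illustration; the lines are inlined rather than read from the file, and page names are written in lowercase to match the keys in the program's printed result. The sketch is written for Python 3.

```python
from collections import defaultdict

# The six link pairs from the input file, inlined so the sketch needs no file.
# (Assumption: one "source target" pair per line, separated by whitespace.)
EDGES = """\
page1 page3
page2 page1
page4 page1
page3 page1
page4 page2
page3 page4
"""


def build_links(text):
    """Map each source page to the sorted list of its distinct target pages."""
    targets = defaultdict(set)  # a set drops duplicate (source, target) pairs
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:  # skip blank or malformed lines
            continue
        targets[parts[0]].add(parts[1])
    return {page: sorted(found) for page, found in targets.items()}


links = build_links(EDGES)
for page in sorted(links):
    print(page, links[page])
```

Four pages appear as sources here, so the grouped structure has four keys: page1 links to one page, page2 to one, and page3 and page4 to two each.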
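# Split a page's rank evenly among the pages it links to.
def computeContribs(neighbors, rank):
    for neighbor in neighbors: yield (neighbor, rank / len(neighbors))

# Parse each line into a (page, target) pair, drop duplicates, and group the targets by page.
links = sc.textFile("Tst001.txt").map(lambda line: line.split()).map(lambda pages: (pages[0], pages[1])) \
    .distinct().groupByKey().persist()
# Every page starts with a rank of 1.0.
ranks = links.map(lambda (page, neighbors): (page, 1.0))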
In [4]: for x in range(1):
   ...:     print "links count:" + str(links.count())
   ...:     print "ranks count:" + str(ranks.count())
In [all]: for x in range(3):
   ....:     contribs = links.join(ranks).flatMap(lambda (page, (neighbors, rank)): computeContribs(neighbors, rank))
   ....:     ranks = contribs.reduceByKey(lambda v1, v2: v1 + v2).map(lambda (page, contrib): (page, contrib * 0.85 + 0.15))
   ....:
for rank in ranks.collect(): print rank
(u'page2', 0.394375)
(u'page3', 1.2619062499999998)
(u'page4', 0.8820624999999999)
(u'page1', 1.4616562499999997)
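These numbers can be checked without a Spark cluster. The following is a minimal sketch, assuming the grouped links shown in `LINKS` and the same three iterations and 0.85 / 0.15 constants as the loop above. It is plain Python 3, not the Spark program itself: the dictionary loop stands in for join plus flatMap, and the running totals stand in for reduceByKey. The names `LINKS`, `compute_contribs`, and `page_rank` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict

# Adjacency list assumed to match the grouped links built from the input file.
LINKS = {
    "page1": ["page3"],
    "page2": ["page1"],
    "page3": ["page1", "page4"],
    "page4": ["page1", "page2"],
}


def compute_contribs(neighbors, rank):
    """Split a page's rank evenly among the pages it links to."""
    for neighbor in neighbors:
        yield neighbor, rank / len(neighbors)


def page_rank(links, iterations=3):
    ranks = {page: 1.0 for page in links}  # every page starts at 1.0
    for _ in range(iterations):
        totals = defaultdict(float)
        for page, neighbors in links.items():
            if page not in ranks:  # mimic an inner join: skip pages with no rank
                continue
            for target, share in compute_contribs(neighbors, ranks[page]):
                totals[target] += share  # sum the contributions per target page
        # Same update rule as the loop above: contrib * 0.85 + 0.15
        ranks = {page: total * 0.85 + 0.15 for page, total in totals.items()}
    return ranks


ranks = page_rank(LINKS)
for page in sorted(ranks):
    print(page, round(ranks[page], 4))
```

With these inputs the computed ranks should agree with the values printed above to within floating-point rounding; the last digits can differ because the contributions may be summed in a different order than Spark uses.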