PageRank is a measure of Web-page importance that is hard to game. It is a function that assigns a real number to every page on the Web (or at least to the portion of the Web that a crawler has discovered and found links into). The intent is that the higher a page's PageRank, the more important the page. There is no single fixed algorithm for computing PageRank.
I will not explain the PageRank algorithm itself in depth here; interested readers can look up the details on their own. I will give the PageRank formula for a single page directly:
P(n) = a/G + (1 - a) * Σ_{m ∈ L(n)} P(m)/C(m)
where G is the total number of pages, P(n) is the PageRank value of page n, C(m) is the number of out-links on page m, a is the random-jump factor, and L(n) is the set of pages that link to page n.
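Before turning to the distributed version, the formula can be sanity-checked with a plain single-machine power iteration. The sketch below is illustrative (the class and field names are mine, not from the code later in this post); it uses the same 5-page sample graph introduced below and the same update rule, with a = 0.85 as in the MapReduce code:

```java
// Minimal single-machine sketch of the update rule
// P(n) = a/G + (1 - a) * sum_{m in L(n)} P(m)/C(m)
public class PageRankSketch {
    static final double A = 0.85; // random jump factor a
    static final int G = 5;       // number of pages

    // OUT_LINKS[m] lists the pages that page m+1 links to (0-based IDs)
    static final int[][] OUT_LINKS = {
        {1, 3},    // page 1 -> 2, 4
        {2, 4},    // page 2 -> 3, 5
        {3},       // page 3 -> 4
        {4},       // page 4 -> 5
        {0, 1, 2}  // page 5 -> 1, 2, 3
    };

    // One iteration: every page starts with a/G, then each page m
    // distributes (1 - a) * P(m)/C(m) to each of its out-neighbors.
    static double[] iterate(double[] p) {
        double[] next = new double[G];
        for (int n = 0; n < G; n++) next[n] = A / G;
        for (int m = 0; m < G; m++) {
            double share = (1 - A) * p[m] / OUT_LINKS[m].length;
            for (int n : OUT_LINKS[m]) next[n] += share;
        }
        return next;
    }

    static double[] run(int iterations) {
        double[] p = new double[G];
        java.util.Arrays.fill(p, 1.0 / G); // initial PageRank 0.2 each
        for (int i = 0; i < iterations; i++) p = iterate(p);
        return p;
    }
}
```

Because each page's contribution is fully redistributed, the total rank stays at 1 across iterations, and consecutive iterations converge quickly, which is what the stop-factor check in the MapReduce driver relies on.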
A MapReduce implementation of PageRank is given below. The input file must have the following format:
Input file Pagerank.txt:
page ID, then the page's initial PageRank value; {set of page IDs the page links to (the out-link set)}; {set of page IDs that link to the page (the in-link set)}; number of out-links
Note: the fields must be separated by semicolons.
1 0.2; {2,4}; {5};2
2 0.2; {3,5}; {1,5};2
3 0.2; {4}; {2,5};1
4 0.2; {5}; {1,3};1
5 0.2; {1,2,3}; {2,4};3
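As a quick check on this record layout, the sketch below (the helper names are mine) pulls the fields out of one such line with the same splits the mapper uses: ';' between fields, braces around the link sets, and whitespace between the page ID and its rank:

```java
// Stand-alone parser for one input record such as "1 0.2; {2,4}; {5};2"
public class RecordParser {
    // out-link set: second ';'-field, text between the braces, split on ','
    public static String[] outLinks(String line) {
        String[] split = line.split(";");
        return split[1].split("[{}]")[1].split(",");
    }

    // current PageRank: first ';'-field, token after the page ID
    public static double pageRank(String line) {
        return Double.parseDouble(line.split(";")[0].split("\\s+")[1]);
    }

    // out-degree C(n): fourth ';'-field
    public static int outDegree(String line) {
        return Integer.parseInt(line.split(";")[3].trim());
    }
}
```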
Distributed cache file RankCache.txt:
Rank<TAB>page ID:PageRank value,page ID:PageRank value,... (a single line; the reducer splits on the tab after "Rank")
Rank 1:0.2,2:0.2,3:0.2,4:0.2,5:0.2
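The reducer's setup() reads this cache line back by splitting on the tab after "Rank", then on ',' and ':'. A small stand-alone sketch of that parsing (the class name is mine, and I assume the separator after "Rank" is a tab, which is what the split("\t") in getRankList below expects):

```java
import java.util.HashMap;
import java.util.Map;

// Parses "Rank\t1:0.2,2:0.2,..." into a page-ID -> PageRank map
public class RankCacheParser {
    public static Map<Integer, Double> parse(String line) {
        Map<Integer, Double> ranks = new HashMap<Integer, Double>();
        String value = line.split("\t")[1];       // everything after "Rank\t"
        for (String entry : value.split(",")) {   // "id:rank" pairs
            String[] kv = entry.split(":");
            ranks.put(Integer.parseInt(kv[0]), Double.parseDouble(kv[1]));
        }
        return ranks;
    }
}
```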
With the two input files introduced, here is the MapReduce implementation of the PageRank algorithm (adjust the input and output paths to suit your own environment):
package soft.project;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Hashtable;
import java.util.Iterator;
import java.util.List;
import java.util.Vector;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PageRank {

    private final static String localInputPath = "/home/hadoop/test/mapreduce/pagerank.txt";
    // private final static String hdfsInputPath = "hdfs://192.168.0.1:9000/user/hadoop/pagerank";
    private final static String localOutputPath = "/home/hadoop/test/mapreduce/pagerank";
    private final static String hdfsOutputPath = "hdfs://192.168.0.1:9000/user/hadoop/pagerank";
    private final static String rankCachePath = "/home/hadoop/test/mapreduce/rankcache.txt";

    // Filled by the reducer's cleanup(); note this static sharing only works
    // when the jobs run in-process (local job runner)
    private static List<RankResult> pageRankList = new Vector<RankResult>();

    private final static double random = 0.85;      // random jump factor a
    private final static double stopFactor = 0.001; // stop iterating once the sum of absolute per-page
                                                    // PageRank differences between two consecutive
                                                    // iterations drops below this value
    private final static long G = 5;                // number of pages

    private static class RankResult {
        private String order = "";
        private double rank = 0;

        @SuppressWarnings("unused")
        public RankResult() {
        }

        public RankResult(String order, double rank) {
            this.order = order;
            this.rank = rank;
        }
    }

    private static class PRMapper extends Mapper<LongWritable, Text, Text, Text> {

        private String keyInfo = "";
        private String valueInfo = "";

        @Override
        protected void map(LongWritable key, Text value,
                Mapper<LongWritable, Text, Text, Text>.Context context)
                throws IOException, InterruptedException {
            // A line looks like: "1 0.2; {2,4}; {5};2"
            String[] split = value.toString().split(";");
            String[] outLink = split[1].split("[{}]")[1].split(",");
            double pageRank = Double.parseDouble(split[0].split("\\s")[1]);
            double c = Double.parseDouble(split[3]);
            double k = pageRank / c;
            // Emit this page's PageRank share P(n)/C(n) to every page it links to
            for (String page : outLink) {
                context.write(new Text(page), new Text(String.valueOf(k)));
            }
            writeNode(value, context);
        }

        // Re-emit the node's link structure so the reducer can reproduce the record
        private void writeNode(Text value,
                Mapper<LongWritable, Text, Text, Text>.Context context)
                throws IOException, InterruptedException {
            String[] split = value.toString().split("\\s");
            valueInfo = split[1].split(";", 2)[1];
            keyInfo = split[0];
            context.write(new Text(keyInfo), new Text(valueInfo));
        }
    }

    private static class PRCombiner extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> value,
                Reducer<Text, Text, Text, Text>.Context context)
                throws IOException, InterruptedException {
            String v = "";
            double pageRank = 0;
            for (Text text : value) {
                String valueString = text.toString();
                if (valueString.contains("{")) {
                    v = valueString; // the link-structure record
                } else {
                    pageRank += Double.parseDouble(valueString); // a PageRank share
                }
            }
            if (v.equals("")) {
                context.write(key, new Text(String.valueOf(pageRank)));
            } else {
                context.write(key, new Text(pageRank + ";" + v));
            }
        }
    }

    private static class PRReducer extends Reducer<Text, Text, Text, Text> {

        // Each job creates a fresh rankList and rankMap
        private List<Double> rankList = new Vector<Double>((int) G);
        private Hashtable<Integer, Double> rankMap = new Hashtable<Integer, Double>();

        @Override
        protected void setup(Reducer<Text, Text, Text, Text>.Context context)
                throws IOException, InterruptedException {
            Configuration conf = context.getConfiguration();
            int order = Integer.parseInt(conf.get("order"));
            Path[] cachePath = DistributedCache.getLocalCacheFiles(conf);
            if (cachePath != null && cachePath.length > 0) { // fixed: was "== null ||", which would NPE
                getRankList(cachePath[order - 1].toString(), context);
            } else {
                System.out.println("cachePath == null || cachePath's length is 0");
            }
        }

        @Override
        protected void reduce(Text key, Iterable<Text> value,
                Reducer<Text, Text, Text, Text>.Context context)
                throws IOException, InterruptedException {
            double pageRank = 0;
            String node = "";
            for (Text v : value) {
                String pString = v.toString();
                String[] split = pString.split(";");
                if (split.length == 1) {
                    // pString looks like "0.2": a PageRank share
                    pageRank += Double.parseDouble(pString);
                } else if (!split[0].contains("{")) {
                    // pString looks like "0.2;{2,4};{1,3};2": share plus link structure
                    pageRank += Double.parseDouble(split[0]);
                    node = pString.split(";", 2)[1];
                } else {
                    // pString looks like "{2,4};{1,3};2": link structure only
                    node = pString;
                }
            }
            pageRank = random / G + (1 - random) * pageRank;
            node = pageRank + ";" + node;
            rankMap.put(Integer.parseInt(key.toString()), pageRank); // record each node's new PageRank
            if (!node.equals(""))
                context.write(key, new Text(node));
        }

        @Override
        protected void cleanup(Reducer<Text, Text, Text, Text>.Context context)
                throws IOException, InterruptedException {
            Configuration configuration = context.getConfiguration();
            String order = configuration.get("order");
            if (rankList.size() == G && rankMap.size() == G) {
                double gammar = 0;
                int length = rankList.size();
                int orderNum = Integer.parseInt(order);
                if (orderNum > 1) {
                    for (int i = 1; i <= length; i++) {
                        gammar += Math.abs(rankMap.get(i) - rankList.get(i - 1));
                    }
                    String s = "difference between iterations " + orderNum + " and " + (orderNum - 1) + ": ";
                    pageRankList.add(new RankResult(s, gammar));
                }
                flushCacheFile(rankMap);
            } else {
                System.out.println("rankList.size() != G || rankMap.size() != G"
                        + " rankList.size():" + rankList.size()
                        + " rankMap.size():" + rankMap.size());
            }
        }

        // Overwrite the cache file with this iteration's PageRank values
        private void flushCacheFile(Hashtable<Integer, Double> rankMap) {
            File file = new File(rankCachePath);
            StringBuffer stringBuffer = new StringBuffer();
            if (rankMap.size() == G) {
                stringBuffer.append("Rank").append("\t");
                for (int i = 1; i <= G; i++) {
                    stringBuffer.append(i + ":" + rankMap.get(i) + ",");
                }
                // drop the trailing comma (fixed: was length() - 2, which also cut the last digit)
                String string = stringBuffer.substring(0, stringBuffer.length() - 1);
                try {
                    BufferedWriter writer = new BufferedWriter(new FileWriter(file, false));
                    writer.write(string);
                    writer.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            } else {
                System.out.println("rankMap holds fewer than G entries; skipping flushCacheFile");
            }
        }

        // Read the previous iteration's PageRank values from the cached file into rankList
        private void getRankList(String path,
                Reducer<Text, Text, Text, Text>.Context context) {
            FileReader reader = null;
            try {
                reader = new FileReader(new File(path)); // fixed: was "new File" with no argument
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            }
            BufferedReader in = new BufferedReader(reader);
            StringBuffer stringBuffer = new StringBuffer();
            String line = "";
            try {
                while ((line = in.readLine()) != null) {
                    stringBuffer.append(line);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
            String value = stringBuffer.toString().split("\t")[1];
            for (String pageRank : value.split(","))
                rankList.add(Double.parseDouble(pageRank.split(":")[1]));
        }
    }

    private static boolean deleteOutput(boolean isLocalFile, Configuration conf) throws IOException {
        if (isLocalFile) {
            return deleteFile(new File(localOutputPath));
        } else {
            FileSystem hdfs = FileSystem.get(conf);
            return hdfs.delete(new Path(hdfsOutputPath), true);
        }
    }

    private static boolean deleteFile(File file) {
        if (file.isFile()) {
            return file.delete();
        } else if (file.isDirectory()) {
            String filePath = file.getAbsolutePath();
            for (String subFile : file.list()) {
                deleteFile(new File(filePath + "/" + subFile));
            }
            file.delete();
        }
        return !file.exists();
    }

    public static Job getJob(Configuration conf, String input, String output) throws IOException {
        Job job = new Job(conf, "PageRank");
        job.setJarByClass(PageRank.class);
        DistributedCache.addCacheFile(new Path(rankCachePath).toUri(), conf);
        job.setMapperClass(PRMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setCombinerClass(PRCombiner.class);
        job.setReducerClass(PRReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(input));
        FileOutputFormat.setOutputPath(job, new Path(output));
        return job;
    }

    public static void run(int number) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration configuration = new Configuration(); // open question: should each job get its own Configuration?
        deleteOutput(true, configuration);
        int i = 1;
        String input = "";
        String output = "";
        while (i <= number) {
            System.out.println("i=" + i + " pageRankList.length:" + pageRankList.size());
            if (i >= 3 && pageRankList.get(i - 3).rank <= stopFactor) {
                System.out.println("pageRankList.get(" + (i - 3) + ").rank="
                        + pageRankList.get(i - 3).rank + " <= " + stopFactor
                        + ", the termination condition is met; stopping the iteration");
                break;
            }
            if (i == 1) {
                // iteration 0: seed the cache-driven pipeline; its output goes to a throwaway directory
                input = localInputPath;
                output = localOutputPath + "/trash";
                configuration.set("order", String.valueOf(0));
                Job job = getJob(configuration, input, output);
                job.waitForCompletion(true);
            } else {
                input = output;
            }
            output = localOutputPath + "/" + i;
            System.out.println("*** MapReduce iteration " + i + " ***");
            configuration.set("order", String.valueOf(i)); // important: "order" must be set here, before the job is created
            Job job = getJob(configuration, input, output);
            job.waitForCompletion(true);
            i++;
        }
    }

    public static void printGap() {
        Iterator<RankResult> iterator = pageRankList.iterator();
        int i = 1;
        while (iterator.hasNext()) {
            RankResult rankResult = iterator.next();
            System.out.print(rankResult.order + rankResult.rank + " ");
            if (i % 3 == 0)
                System.out.println();
            i++;
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        int n = 10;
        long start = System.currentTimeMillis();
        PageRank.run(n);
        PageRank.printGap();
        long end = System.currentTimeMillis();
        System.out.println("\n" + n + " iterations took " + (end - start) / 60000 + " min "
                + ((end - start) % 60000) / 1000 + " s " + (end - start) % 1000 + " ms");
    }
}
MapReduce implementation of PageRank algorithm