/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.graphx

import scala.reflect.ClassTag

import org.apache.spark.Logging

/**
 * Implements a Pregel-like bulk-synchronous message-passing API.
 *
 * Unlike the original Pregel API, the GraphX Pregel API factors the sendMessage computation over
 * edges, enables the message sending computation to read both vertex attributes, and constrains
 * messages to the graph structure.  These changes allow for substantially more efficient
 * distributed execution while also exposing greater flexibility for graph-based computation.
 *
 * @example We can use the Pregel abstraction to implement PageRank:
 * {{{
 * val pagerankGraph: Graph[Double, Double] = graph
 *   // Associate the degree with each vertex
 *   .outerJoinVertices(graph.outDegrees) {
 *     (vid, vdata, deg) => deg.getOrElse(0)
 *   }
 *   // Set the weight on the edges based on the degree
 *   .mapTriplets(e => 1.0 / e.srcAttr)
 *   // Set the vertex attributes to the initial PageRank values
 *   .mapVertices((id, attr) => 1.0)
 *
 * def vertexProgram(id: VertexId, attr: Double, msgSum: Double): Double =
 *   resetProb + (1.0 - resetProb) * msgSum
 * def sendMessage(id: VertexId, edge: EdgeTriplet[Double, Double]): Iterator[(VertexId, Double)] =
 *   Iterator((edge.dstId, edge.srcAttr * edge.attr))
 * def messageCombiner(a: Double, b: Double): Double = a + b
 * val initialMessage = 0.0
 * // Execute Pregel for a fixed number of iterations.
 * Pregel(pagerankGraph, initialMessage, numIter)(
 *   vertexProgram, sendMessage, messageCombiner)
 * }}}
 *
 */
object Pregel extends Logging {

  /**
   * Execute a Pregel-like iterative vertex-parallel abstraction.  The
   * user-defined vertex-program `vprog` is executed in parallel on
   * each vertex receiving any inbound messages and computing a new
   * value for the vertex.  The `sendMsg` function is then invoked on
   * all out-edges and is used to compute an optional message to the
   * destination vertex. The `mergeMsg` function is a commutative
   * associative function used to combine messages destined to the
   * same vertex.
   *
   * On the first iteration all vertices receive the `initialMsg` and
   * on subsequent iterations if a vertex does not receive a message
   * then the vertex-program is not invoked.
   *
   * This function iterates until there are no remaining messages, or
   * for `maxIterations` iterations.
   *
   * @tparam VD the vertex data type
   * @tparam ED the edge data type
   * @tparam A the Pregel message type
   *
   * @param graph the input graph.
   *
   * @param initialMsg the message each vertex will receive on the first iteration
   *
   * @param maxIterations the maximum number of iterations to run for
   *
   * @param activeDirection the direction of edges incident to a vertex that received a message in
   * the previous round on which to run `sendMsg`. For example, if this is `EdgeDirection.Out`, only
   * out-edges of vertices that received a message in the previous round will run. The default is
   * `EdgeDirection.Either`, which will run `sendMsg` on edges where either side received a message
   * in the previous round. If this is `EdgeDirection.Both`, `sendMsg` will only run on edges where
   * *both* vertices received a message.
   *
   * @param vprog the user-defined vertex program which runs on each
   * vertex and receives the inbound message and computes a new vertex
   * value.  On the first iteration the vertex program is invoked on
   * all vertices and is passed the default message.  On subsequent
   * iterations the vertex program is only invoked on those vertices
   * that receive messages.
   *
   * @param sendMsg a user supplied function that is applied to out
   * edges of vertices that received messages in the current
   * iteration
   *
   * @param mergeMsg a user supplied function that takes two incoming
   * messages of type A and merges them into a single message of type
   * A.  ''This function must be commutative and associative and
   * ideally the size of A should not increase.''
   *
   * @return the resulting graph at the end of the computation
   *
   */
  def apply[VD: ClassTag, ED: ClassTag, A: ClassTag]
     (graph: Graph[VD, ED],
      initialMsg: A,
      maxIterations: Int = Int.MaxValue,
      activeDirection: EdgeDirection = EdgeDirection.Either)
     (vprog: (VertexId, VD, A) => VD,
      sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)],
      mergeMsg: (A, A) => A)
    : Graph[VD, ED] =
  {
    var g = graph.mapVertices((vid, vdata) => vprog(vid, vdata, initialMsg)).cache()
    // Compute the messages
    var messages = g.mapReduceTriplets(sendMsg, mergeMsg)
    var activeMessages = messages.count()
    // Loop
    var prevG: Graph[VD, ED] = null
    var i = 0
    while (activeMessages > 0 && i < maxIterations) {
      // Receive the messages. Vertices that didn't get any messages do not appear in newVerts.
      val newVerts = g.vertices.innerJoin(messages)(vprog).cache()
      // Update the graph with the new vertices.
      prevG = g
      g = g.outerJoinVertices(newVerts) { (vid, old, newOpt) => newOpt.getOrElse(old) }
      g.cache()

      val oldMessages = messages
      // Send new messages. Vertices that didn't get any messages don't appear in newVerts, so don't
      // get to send messages. We must cache messages so it can be materialized on the next line,
      // allowing us to uncache the previous iteration.
      messages = g.mapReduceTriplets(sendMsg, mergeMsg, Some((newVerts, activeDirection))).cache()
      // The call to count() materializes `messages`, `newVerts`, and the vertices of `g`. This
      // hides oldMessages (depended on by newVerts), newVerts (depended on by messages), and the
      // vertices of prevG (depended on by newVerts, oldMessages, and the vertices of g).
      activeMessages = messages.count()

      logInfo("Pregel finished iteration " + i)

      // Unpersist the RDDs hidden by newly-materialized RDDs
      oldMessages.unpersist(blocking = false)
      newVerts.unpersist(blocking = false)
      prevG.unpersistVertices(blocking = false)
      prevG.edges.unpersist(blocking = false)
      // Count the iteration
      i += 1
    }

    g
  } // end of apply

} // end of class Pregel
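To make the control flow of `apply` above easier to follow, here is a minimal single-machine sketch of the same superstep loop using plain Scala collections instead of RDDs. It is not the GraphX implementation: `LocalPregel`, `Edge`, `Triplet`, and `LocalPregelDemo` are hypothetical names introduced only for illustration, there is no caching or distribution, and the active-edge rule is assumed to be fixed to the default `EdgeDirection.Either` behaviour (an edge is visited when either endpoint received a message in the previous round).

```scala
// Single-machine sketch of the Pregel loop; no Spark required.
// LocalPregel, Edge and Triplet are hypothetical names, not GraphX classes.
object LocalPregel {
  final case class Edge[ED](src: Long, dst: Long, attr: ED)
  final case class Triplet[VD, ED](srcId: Long, srcAttr: VD, dstId: Long, dstAttr: VD, attr: ED)

  def run[VD, ED, A](
      vertices: Map[Long, VD],
      edges: Seq[Edge[ED]],
      initialMsg: A,
      maxIterations: Int = Int.MaxValue)(
      vprog: (Long, VD, A) => VD,
      sendMsg: Triplet[VD, ED] => Iterator[(Long, A)],
      mergeMsg: (A, A) => A): Map[Long, VD] = {

    // Run sendMsg on every edge that touches an active vertex, then merge
    // the messages per target vertex with mergeMsg.
    // Assumption: both endpoints of every edge are present in `state`.
    def exchange(state: Map[Long, VD], active: Set[Long]): Map[Long, A] =
      edges.iterator
        .filter(e => active(e.src) || active(e.dst))
        .flatMap(e => sendMsg(Triplet(e.src, state(e.src), e.dst, state(e.dst), e.attr)))
        .foldLeft(Map.empty[Long, A]) { case (acc, (id, msg)) =>
          acc.updated(id, acc.get(id).map(mergeMsg(_, msg)).getOrElse(msg))
        }

    // Superstep 0: every vertex receives initialMsg.
    var state = vertices.map { case (id, attr) => id -> vprog(id, attr, initialMsg) }
    var messages = exchange(state, state.keySet)
    var i = 0
    while (messages.nonEmpty && i < maxIterations) {
      // Only vertices that received a message run vprog (the innerJoin step);
      // messages addressed to unknown vertex ids are dropped.
      val updated = messages.collect {
        case (id, msg) if state.contains(id) => id -> vprog(id, state(id), msg)
      }
      // Keep the old value for every other vertex (the outerJoinVertices step).
      state = state ++ updated
      messages = exchange(state, updated.keySet)
      i += 1
    }
    state
  }
}

// Usage: single-source shortest paths from vertex 1 on a three-vertex graph.
object LocalPregelDemo {
  import LocalPregel._

  def distances(): Map[Long, Double] = {
    val edges = Seq(Edge(1L, 2L, 1.0), Edge(2L, 3L, 2.0), Edge(1L, 3L, 5.0))
    val init = Map(1L -> 0.0, 2L -> Double.PositiveInfinity, 3L -> Double.PositiveInfinity)
    run(init, edges, Double.PositiveInfinity)(
      (id, dist, msg) => math.min(dist, msg),
      t =>
        if (t.srcAttr + t.attr < t.dstAttr) Iterator((t.dstId, t.srcAttr + t.attr))
        else Iterator.empty,
      (a, b) => math.min(a, b))
  }

  def main(args: Array[String]): Unit =
    println(distances()) // distances: vertex 1 -> 0.0, vertex 2 -> 1.0, vertex 3 -> 3.0
}
```

In this run the loop stops after two supersteps because no edge can offer a shorter distance any more, which is the "no remaining messages" termination condition described in the Scaladoc. The real implementation additionally caches each iteration's graph and messages and unpersists the previous ones, which the sketch leaves out.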
Learning GraphX Pregel (the BSP model and its message-passing mechanism)