The foldByKey function merges the values V of a JavaPairRDD<K, V> by key. In the Java API, its simplest overload is:
public JavaPairRDD<K, V> foldByKey(V zeroValue, Function2<V, V, V> func)
As you can see, the first parameter is zeroValue, the initial value that is merged with the original V values, and the second parameter is a Function2 that performs the merge.
Take a pair RDD such as Array(("A", 0), ("A", 2), ("B", 1), ("B", 2), ("C", 1)).
When foldByKey(2) is executed with x + y as the function, the process is as follows: 2 is first added to the value of the first element with key "A", giving ("A", 2); that intermediate result is then combined with the remaining elements of key "A", giving ("A", 4). Likewise, the result for key "B" is ("B", 5).
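The walk-through above can be reproduced without a cluster. The following is a minimal sketch in plain Java, not Spark itself: the class FoldByKeySketch and its foldByKey helper are hypothetical names introduced for illustration, and the sketch assumes that all pairs sit in a single partition, so the zero value is picked up once per key.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical helper that mimics foldByKey on a single partition; not part of Spark.
public class FoldByKeySketch {

    public static <K, V> Map<K, V> foldByKey(List<Map.Entry<K, V>> pairs,
                                             V zeroValue,
                                             BinaryOperator<V> func) {
        Map<K, V> result = new LinkedHashMap<>();
        for (Map.Entry<K, V> pair : pairs) {
            // The first value of a key starts from zeroValue;
            // later values start from the running result for that key.
            V acc = result.containsKey(pair.getKey()) ? result.get(pair.getKey()) : zeroValue;
            result.put(pair.getKey(), func.apply(acc, pair.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        pairs.add(new AbstractMap.SimpleEntry<>("A", 0));
        pairs.add(new AbstractMap.SimpleEntry<>("A", 2));
        pairs.add(new AbstractMap.SimpleEntry<>("B", 1));
        pairs.add(new AbstractMap.SimpleEntry<>("B", 2));
        pairs.add(new AbstractMap.SimpleEntry<>("C", 1));

        // zeroValue = 2, function = x + y
        System.out.println(foldByKey(pairs, 2, Integer::sum)); // prints {A=4, B=5, C=3}
    }
}
```

Key "C" has a single value, so its result is simply 2 + 1 = 3.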
Look at the code:
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.sql.SparkSession;
import scala.Tuple2;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * @author Wuweifeng wrote on 2018/4/18.
 */
public class Test {
    public static void main(String[] args) {
        SparkSession sparkSession = SparkSession.builder().appName("JavaWordCount").master("local").getOrCreate();
        // Wrap the SparkContext so that the ordinary List below can be turned into an RDD
        JavaSparkContext javaSparkContext = new JavaSparkContext(sparkSession.sparkContext());
        List<Tuple2<String, Integer>> data = new ArrayList<>();
        data.add(new Tuple2<>("A", 10));
        data.add(new Tuple2<>("A", 20));
        data.add(new Tuple2<>("B", 2));
        data.add(new Tuple2<>("B", 3));
        data.add(new Tuple2<>("C", 5));

        JavaPairRDD<String, Integer> originRDD = javaSparkContext.parallelizePairs(data);
        // The initial value is 2: it is first passed to the function together with the
        // first element of each key, and the result is then combined with the next elements
        Map<String, Integer> map = originRDD.foldByKey(2, new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer v1, Integer v2) throws Exception {
                return v1 * v2;
            }
        }).collectAsMap();
        // {A=400, C=10, B=12}
        System.out.println(map);
    }
}
Note that zeroValue is combined only with the first value of each key (within a partition), not with every value.
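One caveat is worth spelling out: Spark applies the zero value once per key in each partition, so a key whose values are spread over several partitions picks it up more than once. The sketch below illustrates this in plain Java under stated assumptions: PartitionedFoldSketch, foldPartition and foldByKey are hypothetical names, and partitions are modelled as nested lists rather than real RDD partitions.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical model of foldByKey across partitions; not part of Spark.
public class PartitionedFoldSketch {

    // Fold one partition: zeroValue is combined with the first value seen for each key.
    public static <K, V> Map<K, V> foldPartition(List<Map.Entry<K, V>> partition,
                                                 V zeroValue,
                                                 BinaryOperator<V> func) {
        Map<K, V> result = new LinkedHashMap<>();
        for (Map.Entry<K, V> pair : partition) {
            V acc = result.containsKey(pair.getKey()) ? result.get(pair.getKey()) : zeroValue;
            result.put(pair.getKey(), func.apply(acc, pair.getValue()));
        }
        return result;
    }

    // Merge the per-partition results with the same function, without zeroValue.
    public static <K, V> Map<K, V> foldByKey(List<List<Map.Entry<K, V>>> partitions,
                                             V zeroValue,
                                             BinaryOperator<V> func) {
        Map<K, V> merged = new LinkedHashMap<>();
        for (List<Map.Entry<K, V>> partition : partitions) {
            for (Map.Entry<K, V> partial : foldPartition(partition, zeroValue, func).entrySet()) {
                merged.merge(partial.getKey(), partial.getValue(), func);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map.Entry<String, Integer> a10 = new AbstractMap.SimpleEntry<>("A", 10);
        Map.Entry<String, Integer> a20 = new AbstractMap.SimpleEntry<>("A", 20);
        BinaryOperator<Integer> multiply = (x, y) -> x * y;

        // Both values in one partition: 2 * 10 * 20
        List<List<Map.Entry<String, Integer>>> onePartition =
                Collections.singletonList(Arrays.asList(a10, a20));
        System.out.println(foldByKey(onePartition, 2, multiply)); // prints {A=400}

        // One value per partition: (2 * 10) * (2 * 20)
        List<List<Map.Entry<String, Integer>>> twoPartitions =
                Arrays.asList(Collections.singletonList(a10), Collections.singletonList(a20));
        System.out.println(foldByKey(twoPartitions, 2, multiply)); // prints {A=800}
    }
}
```

This is why a neutral zero value (0 for addition, 1 for multiplication) is the safe choice: the result then does not depend on how the data happens to be partitioned.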