This lesson demonstrates two of the most important operators on pair RDDs, join and cogroup, through hands-on code.
join Operator in Action:
Demonstrating the join operator through code (Scala):
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("RDDDemo").setMaster("local")
val sc = new SparkContext(conf)
val arr1 = Array(Tuple2(1, "Spark"), Tuple2(2, "Hadoop"), Tuple2(3, "Tachyon"))
val arr2 = Array(Tuple2(1, 100), Tuple2(2, 70), Tuple2(3, 90))
val rdd1 = sc.parallelize(arr1)
val rdd2 = sc.parallelize(arr2)
val rdd3 = rdd1.join(rdd2)   // join the two pair RDDs on their keys
rdd3.collect().foreach(println)
Operation Result:
(1,(Spark,100))
(3,(Tachyon,90))
(2,(Hadoop,70))
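For context, join on pair RDDs has inner-join semantics: a key that appears in only one of the two RDDs is dropped, and a key with several values on one side produces one output record per pairing. A minimal sketch of this behavior, using made-up data (the key 4 entry and the duplicate score for key 1 are illustrative only) and reusing the sc created above:
// Hypothetical data to show inner-join semantics; not part of the lesson's example
val left = sc.parallelize(Array(Tuple2(1, "Spark"), Tuple2(2, "Hadoop"), Tuple2(4, "Flink")))
val right = sc.parallelize(Array(Tuple2(1, 100), Tuple2(1, 95), Tuple2(2, 70)))
// Key 4 exists only on the left, so it is dropped; key 1 has two scores, so it yields two records
left.join(right).collect().foreach(println)
// Expected output (ordering may vary):
// (1,(Spark,100))
// (1,(Spark,95))
// (2,(Hadoop,70))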
cogroup Operator in Action:
First, written in Java:
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.VoidFunction;

import scala.Tuple2;

SparkConf conf = new SparkConf().setMaster("local").setAppName("Cogroup");
JavaSparkContext sc = new JavaSparkContext(conf);
List<Tuple2<Integer, String>> nameList = Arrays.asList(new Tuple2<Integer, String>(1, "Spark"),
        new Tuple2<Integer, String>(2, "Tachyon"), new Tuple2<Integer, String>(3, "Hadoop"));
List<Tuple2<Integer, Integer>> scoreList = Arrays.asList(new Tuple2<Integer, Integer>(1, 100),
        new Tuple2<Integer, Integer>(2, 95), new Tuple2<Integer, Integer>(3, 80),
        new Tuple2<Integer, Integer>(1, 80), new Tuple2<Integer, Integer>(2, 110),
        new Tuple2<Integer, Integer>(2, 90));
JavaPairRDD<Integer, String> names = sc.parallelizePairs(nameList);
JavaPairRDD<Integer, Integer> scores = sc.parallelizePairs(scoreList);
// cogroup groups the values from both RDDs by key into two Iterables
JavaPairRDD<Integer, Tuple2<Iterable<String>, Iterable<Integer>>> nameAndScores = names.cogroup(scores);
nameAndScores.foreach(new VoidFunction<Tuple2<Integer, Tuple2<Iterable<String>, Iterable<Integer>>>>() {
    public void call(Tuple2<Integer, Tuple2<Iterable<String>, Iterable<Integer>>> t) throws Exception {
        System.out.println("ID:" + t._1);
        System.out.println("Name:" + t._2._1);
        System.out.println("Score:" + t._2._2);
    }
});
sc.close();
Operation Result:
ID:1
Name:[Spark]
Score:[100, 80]
ID:3
Name:[Hadoop]
Score:[80]
ID:2
Name:[Tachyon]
Score:[95, 110, 90]
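One behavior the data above does not exercise: unlike join, cogroup does not drop unmatched keys. Every key that appears in either RDD shows up in the result, with an empty collection on the side that has no values for it. A minimal sketch (written in Scala for brevity, with made-up data and an already-created SparkContext sc):
// Hypothetical data: key 4 has a name but no score, key 5 has a score but no name
val nameRDD = sc.parallelize(Array(Tuple2(1, "Spark"), Tuple2(4, "Flink")))
val scoreRDD = sc.parallelize(Array(Tuple2(1, 100), Tuple2(5, 60)))
nameRDD.cogroup(scoreRDD).collect().foreach(println)
// Expected output (ordering may vary):
// (1,(CompactBuffer(Spark),CompactBuffer(100)))
// (4,(CompactBuffer(Flink),CompactBuffer()))
// (5,(CompactBuffer(),CompactBuffer(60)))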
The same example written in Scala:
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("RDDDemo").setMaster("local")
val sc = new SparkContext(conf)
val arr1 = Array(Tuple2(1, "Spark"), Tuple2(2, "Hadoop"), Tuple2(3, "Tachyon"))
val arr2 = Array(Tuple2(1, 100), Tuple2(2, 70), Tuple2(3, 90), Tuple2(1, 95), Tuple2(1, 110), Tuple2(2, 65))
val rdd1 = sc.parallelize(arr1)
val rdd2 = sc.parallelize(arr2)
val rdd3 = rdd1.cogroup(rdd2)   // group the values of both RDDs by key
rdd3.collect().foreach(println)
sc.stop()
Operation Result:
(1,(CompactBuffer(Spark),CompactBuffer(100, 95, 110)))
(3,(CompactBuffer(Tachyon),CompactBuffer(90)))
(2,(CompactBuffer(Hadoop),CompactBuffer(70, 65)))
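The two operators are closely related: a join can be written as a cogroup followed by flattening the two value collections into pairs, which is essentially how Spark implements join internally. A sketch of that equivalence, reusing rdd1 and rdd2 from the Scala example above (run before sc.stop() is called):
// Express join in terms of cogroup: for each key, emit one record per (name, score) combination
val joinedViaCogroup = rdd1.cogroup(rdd2).flatMapValues {
  case (names, scores) => for (n <- names; s <- scores) yield (n, s)
}
joinedViaCogroup.collect().foreach(println)
// Should print the same records as rdd1.join(rdd2), e.g. (1,(Spark,100)), (1,(Spark,95)), ...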
Note:
Material source: DT Big Data Dream Factory (Spark release-version customization course).
For more exclusive content, follow the official account: DT_Spark.
If you are interested in big data and Spark, you can listen to teacher Liaoliang's permanently free Spark public class every night at 20:00, free of charge, in YY room number 68917580.
This article is from the "DT_Spark Big Data Dream Factory" blog; please contact the author before reproducing it.
Lesson 17: RDD Cases (join, cogroup, etc.)