17th Lesson: RDD Cases (join, cogroup, etc.)


This lesson demonstrates two of the most important operators on pair RDDs, join and cogroup, through hands-on code. join pairs up values that share a key across two RDDs, producing (key, (value1, value2)) tuples, while cogroup collects all the values for each key from both RDDs into a pair of Iterables.


Join Operator in Action:

Demonstrating the join operator through code:

val conf = new SparkConf().setAppName("RDDDemo").setMaster("local")
val sc = new SparkContext(conf)
val arr1 = Array(Tuple2(1, "Spark"), Tuple2(2, "Hadoop"), Tuple2(3, "Tachyon"))
val arr2 = Array(Tuple2(1, 100), Tuple2(2, 70), Tuple2(3, 90))
val rdd1 = sc.parallelize(arr1)
val rdd2 = sc.parallelize(arr2)

val rdd3 = rdd1.join(rdd2)
rdd3.collect().foreach(println)
sc.stop()


Operation Result:

(1,(Spark,100))

(3,(Tachyon,90))

(2,(Hadoop,70))
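
For reference, here is a minimal sketch (not part of the original lesson; the names names, scores, and joined are illustrative only) of what join does logically, expressed with plain Scala collections instead of RDDs: only keys present on both sides survive, and each surviving key is paired with a (value1, value2) tuple.

val names  = Seq((1, "Spark"), (2, "Hadoop"), (3, "Tachyon"))
val scores = Seq((1, 100), (2, 70), (3, 90))
// keep only keys that appear on both sides and pair every matching value combination
val joined = for ((k, n) <- names; (k2, s) <- scores if k == k2) yield (k, (n, s))
// joined: Seq((1,(Spark,100)), (2,(Hadoop,70)), (3,(Tachyon,90)))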


Cogroup Operator in Action:

First, in Java:

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.VoidFunction;

import scala.Tuple2;

SparkConf conf = new SparkConf().setMaster("local").setAppName("Cogroup");
JavaSparkContext sc = new JavaSparkContext(conf);

// names keyed by id
List<Tuple2<Integer, String>> nameList = Arrays.asList(
        new Tuple2<Integer, String>(1, "Spark"),
        new Tuple2<Integer, String>(2, "Tachyon"),
        new Tuple2<Integer, String>(3, "Hadoop"));

// scores keyed by id (a key may appear several times)
List<Tuple2<Integer, Integer>> scoreList = Arrays.asList(
        new Tuple2<Integer, Integer>(1, 100),
        new Tuple2<Integer, Integer>(2, 95),
        new Tuple2<Integer, Integer>(3, 80),
        new Tuple2<Integer, Integer>(1, 80),
        new Tuple2<Integer, Integer>(2, 110),
        new Tuple2<Integer, Integer>(2, 90));

JavaPairRDD<Integer, String> names = sc.parallelizePairs(nameList);
JavaPairRDD<Integer, Integer> scores = sc.parallelizePairs(scoreList);

// cogroup groups all names and all scores for each key into a pair of Iterables
JavaPairRDD<Integer, Tuple2<Iterable<String>, Iterable<Integer>>> nameAndScores = names.cogroup(scores);

nameAndScores.foreach(new VoidFunction<Tuple2<Integer, Tuple2<Iterable<String>, Iterable<Integer>>>>() {
    public void call(Tuple2<Integer, Tuple2<Iterable<String>, Iterable<Integer>>> t) throws Exception {
        System.out.println("ID:" + t._1);
        System.out.println("Name:" + t._2._1);
        System.out.println("Score:" + t._2._2);
    }
});

sc.close();


Operation Result:

ID:1
Name:[Spark]
Score:[100, 80]
ID:3
Name:[Hadoop]
Score:[80]
ID:2
Name:[Tachyon]
Score:[95, 110, 90]


And now the same in Scala:

val conf = new SparkConf().setAppName("RDDDemo").setMaster("local")
val sc = new SparkContext(conf)
val arr1 = Array(Tuple2(1, "Spark"), Tuple2(2, "Hadoop"), Tuple2(3, "Tachyon"))
val arr2 = Array(Tuple2(1, 100), Tuple2(2, 70), Tuple2(3, 90), Tuple2(1, 95), Tuple2(1, 110), Tuple2(2, 65))
val rdd1 = sc.parallelize(arr1)
val rdd2 = sc.parallelize(arr2)

val rdd3 = rdd1.cogroup(rdd2)
rdd3.collect().foreach(println)
sc.stop()


Operation Result:

(1,(CompactBuffer(Spark),CompactBuffer(100, 95, 110)))

(3,(CompactBuffer(Tachyon),CompactBuffer(90)))

(2,(CompactBuffer(Hadoop),CompactBuffer(70, 65)))
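
To make the contrast between the two operators explicit, here is a small self-contained sketch (assuming the same data as above; the object name JoinVsCogroup is made up for illustration) that runs both join and cogroup on one pair of RDDs. join emits one record per matching value combination, while cogroup emits exactly one record per key, holding all values from each side.

import org.apache.spark.{SparkConf, SparkContext}

object JoinVsCogroup {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("JoinVsCogroup").setMaster("local")
    val sc = new SparkContext(conf)

    val rdd1 = sc.parallelize(Array((1, "Spark"), (2, "Hadoop"), (3, "Tachyon")))
    val rdd2 = sc.parallelize(Array((1, 100), (2, 70), (3, 90), (1, 95), (1, 110), (2, 65)))

    // join: one output record per matching (name, score) combination for each key
    rdd1.join(rdd2).collect().foreach(println)

    // cogroup: exactly one output record per key, with all names and all scores grouped
    rdd1.cogroup(rdd2).collect().foreach(println)

    sc.stop()
  }
}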


Note:

Data from: DT Big Data Dream Factory (Spark release version customization)

For more exclusive content, please follow the WeChat official account: DT_Spark

If you are interested in big data and Spark, you can listen for free to teacher Liaoliang's public Spark classes, held every night at 20:00 in YY room 68917580.

This article is from the "DT_Spark Big Data Dream Factory" blog; please contact the author before reproducing it!

