Spark 2.x In-Depth Series (Six): The RDD Java API Explained, Part 4


Before studying any particular piece of Spark knowledge, it is worth building a correct overall understanding of Spark first; for that, see: Understanding Spark Correctly.

This article explains the join-related APIs of the RDD Java API.

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

SparkConf conf = new SparkConf().setAppName("appName").setMaster("local");
JavaSparkContext sc = new JavaSparkContext(conf);

JavaPairRDD<Integer, Integer> javaPairRDD =
        sc.parallelizePairs(Arrays.asList(new Tuple2<>(1, 2),
                new Tuple2<>(3, 4), new Tuple2<>(3, 6), new Tuple2<>(5, 6)));
JavaPairRDD<Integer, Integer> otherJavaPairRDD =
        sc.parallelizePairs(Arrays.asList(new Tuple2<>(3, 9),
                new Tuple2<>(4, 5)));

// Result: [(4,([],[5])), (1,([2],[])), (3,([4, 6],[9])), (5,([6],[]))]
System.out.println(javaPairRDD.cogroup(otherJavaPairRDD).collect());

// Result: [(4,([],[5])), (1,([2],[])), (3,([4, 6],[9])), (5,([6],[]))]
// groupWith and cogroup are identical in effect
System.out.println(javaPairRDD.groupWith(otherJavaPairRDD).collect());

// Result: [(3,(4,9)), (3,(6,9))]
// Built on cogroup: keeps only the keys that have values in both RDDs
System.out.println(javaPairRDD.join(otherJavaPairRDD).collect());

// Result: [(1,(2,Optional.empty)), (3,(4,Optional[9])), (3,(6,Optional[9])), (5,(6,Optional.empty))]
// Built on cogroup: the result contains every key that appears in the left RDD
System.out.println(javaPairRDD.leftOuterJoin(otherJavaPairRDD).collect());

// Result: [(4,(Optional.empty,5)), (3,(Optional[4],9)), (3,(Optional[6],9))]
// Built on cogroup: the result contains every key that appears in the right RDD
System.out.println(javaPairRDD.rightOuterJoin(otherJavaPairRDD).collect());

// Result: [(4,(Optional.empty,Optional[5])), (1,(Optional[2],Optional.empty)), (3,(Optional[4],Optional[9])), (3,(Optional[6],Optional[9])), (5,(Optional[6],Optional.empty))]
// Built on cogroup: the result contains every key that appears in either RDD
System.out.println(javaPairRDD.fullOuterJoin(otherJavaPairRDD).collect());
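A note on the Optional values in the outer-join results: they are org.apache.spark.api.java.Optional and wrap the side that may have no matching key. The snippet below is a small illustrative sketch, not part of the original article (the default value 0 and the name leftWithDefaults are my own choices), showing one way to unwrap them; it reuses javaPairRDD and otherJavaPairRDD defined above.

// Unwrap the Optional from leftOuterJoin: keys with no match in the right RDD
// keep their left value and get 0 as a stand-in right value (0 is an arbitrary default).
JavaPairRDD<Integer, Tuple2<Integer, Integer>> leftWithDefaults =
        javaPairRDD.leftOuterJoin(otherJavaPairRDD)
                .mapValues(pair -> new Tuple2<>(pair._1(),
                        pair._2().isPresent() ? pair._2().get() : 0));

// Result (ordering may vary): [(1,(2,0)), (3,(4,9)), (3,(6,9)), (5,(6,0))]
System.out.println(leftWithDefaults.collect());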


As can be seen from the above, the most basic operation is cogroup; the join variants are all built on top of it. The following is a schematic diagram of cogroup:

[Figure: cogroup.png — schematic diagram of cogroup (https://s5.51cto.com/wyfs02/M02/A5/A7/wKioL1nBGv2wEZjgAB0QijUwttE091.png-wh_500x0-wm_3-wmp_4-s_3307346149.png)]
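To make the "everything is built on cogroup" point concrete, here is a minimal sketch, not Spark's actual source code, of how join could be expressed on top of cogroup. It reuses javaPairRDD and otherJavaPairRDD from the code above and additionally assumes java.util.ArrayList and java.util.List are imported; joinViaCogroup is just an illustrative variable name.

// Sketch: deriving join from cogroup (illustrative, not Spark's internal implementation).
JavaPairRDD<Integer, Tuple2<Iterable<Integer>, Iterable<Integer>>> cogrouped =
        javaPairRDD.cogroup(otherJavaPairRDD);

JavaPairRDD<Integer, Tuple2<Integer, Integer>> joinViaCogroup =
        cogrouped.flatMapToPair(kv -> {
            List<Tuple2<Integer, Tuple2<Integer, Integer>>> out = new ArrayList<>();
            for (Integer v : kv._2()._1()) {      // values for this key in the left RDD
                for (Integer w : kv._2()._2()) {  // values for this key in the right RDD
                    out.add(new Tuple2<>(kv._1(), new Tuple2<>(v, w)));
                }
            }
            return out.iterator();                // keys missing on either side emit nothing
        });

// Prints the same pairs as javaPairRDD.join(otherJavaPairRDD):
// [(3,(4,9)), (3,(6,9))]
System.out.println(joinViaCogroup.collect());

Keys that are present in only one of the two RDDs have an empty Iterable on the other side, so the nested loop produces nothing for them, which is exactly the inner-join behavior shown earlier.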

If you want a more thorough understanding of how cogroup works under the hood, you can refer to: Spark Core RDD API Rationale.
