Spark Introduction (7): key-value pair operations subtractByKey, join, rightOuterJoin, leftOuterJoin

Reposted from: https://blog.csdn.net/t1dmzks/article/details/70557249

subtractByKey

Function definition

def subtractByKey[W](other: RDD[(K, W)])(implicit arg0: ClassTag[W]): RDD[(K, V)]

def subtractByKey[W](other: RDD[(K, W)], numPartitions: Int)(implicit arg0: ClassTag[W]): RDD[(K, V)]

def subtractByKey[W](other: RDD[(K, W)], p: Partitioner)(implicit arg0: ClassTag[W]): RDD[(K, V)]

Similar to subtract: removes from this RDD every element whose key also appears in the other RDD.

join

Function definition

def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]

def join[W](other: RDD[(K, W)], numPartitions: Int): RDD[(K, (V, W))]

def join[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (V, W))]

rdd1.join(rdd2) joins the elements of rdd1 and rdd2 that share the same key, similar to the join operation in SQL.

leftOuterJoin

def leftOuterJoin[W](other: RDD[(K, W)]): RDD[(K, (V, Option[W]))]

def leftOuterJoin[W](other: RDD[(K, W)], numPartitions: Int): RDD[(K, (V, Option[W]))]

def leftOuterJoin[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (V, Option[W]))]

See the figure in the original post.
Joins two RDDs, similar to a left outer join in SQL.

rightOuterJoin

Joins two RDDs, similar to a right outer join in SQL. Where a matching key exists on the other side, its value is wrapped in Some; where it does not, the value is None. For details, see the same figure and the code below.

Code example

Scala language

    scala> val rdd = sc.makeRDD(Array((1,2), (3,4), (3,6)))
    scala> val other = sc.makeRDD(Array((3,9)))

    scala> rdd.subtractByKey(other).collect()
    res0: Array[(Int, Int)] = Array((1,2))

    scala> rdd.join(other).collect()
    res1: Array[(Int, (Int, Int))] = Array((3,(4,9)), (3,(6,9)))

    scala> rdd.leftOuterJoin(other).collect()
    res2: Array[(Int, (Int, Option[Int]))] = Array((1,(2,None)), (3,(4,Some(9))), (3,(6,Some(9))))

    scala> rdd.rightOuterJoin(other).collect()
    res3: Array[(Int, (Option[Int], Int))] = Array((3,(Some(4),9)), (3,(Some(6),9)))

Java language

    // sc is an existing JavaSparkContext
    JavaRDD<Tuple2<Integer, Integer>> rddPre = sc.parallelize(Arrays.asList(
            new Tuple2<>(1, 2), new Tuple2<>(3, 4), new Tuple2<>(3, 6)));
    JavaRDD<Tuple2<Integer, Integer>> otherPre = sc.parallelize(Arrays.asList(new Tuple2<>(3, 10)));

    // Convert JavaRDD to JavaPairRDD
    JavaPairRDD<Integer, Integer> rdd = JavaPairRDD.fromJavaRDD(rddPre);
    JavaPairRDD<Integer, Integer> other = JavaPairRDD.fromJavaRDD(otherPre);

    // subtractByKey
    JavaPairRDD<Integer, Integer> subRDD = rdd.subtractByKey(other);

    // join
    JavaPairRDD<Integer, Tuple2<Integer, Integer>> joinRDD = rdd.join(other);

    // leftOuterJoin
    JavaPairRDD<Integer, Tuple2<Integer, Optional<Integer>>> integerTuple2JavaPairRDD = rdd.leftOuterJoin(other);

    // rightOuterJoin
    JavaPairRDD<Integer, Tuple2<Optional<Integer>, Integer>> rightOutJoin = rdd.rightOuterJoin(other);
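To make the four semantics easier to compare side by side, here is a minimal sketch that models them with plain Java arrays and lists. It does not use Spark at all: the class name PairOpsSketch and its methods are hypothetical helpers written for illustration, a pair is modelled as an int[]{key, value}, and java.util.Optional stands in for the Option/Optional type in the Spark results. Nested loops replace Spark's partitioning and shuffling, so only the key-matching rules are shown, and the output order here is not something Spark guarantees.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class PairOpsSketch {
    // True if any pair in `pairs` has the given key.
    static boolean hasKey(int[][] pairs, int key) {
        for (int[] p : pairs) if (p[0] == key) return true;
        return false;
    }

    // Models subtractByKey: keep left pairs whose key is absent from right.
    static List<String> subtractByKey(int[][] left, int[][] right) {
        List<String> out = new ArrayList<>();
        for (int[] l : left)
            if (!hasKey(right, l[0])) out.add("(" + l[0] + "," + l[1] + ")");
        return out;
    }

    // Models join: one result per (left, right) combination with equal keys.
    static List<String> join(int[][] left, int[][] right) {
        List<String> out = new ArrayList<>();
        for (int[] l : left)
            for (int[] r : right)
                if (l[0] == r[0]) out.add("(" + l[0] + ",(" + l[1] + "," + r[1] + "))");
        return out;
    }

    // Models leftOuterJoin: every left pair survives; a missing match is Optional.empty.
    static List<String> leftOuterJoin(int[][] left, int[][] right) {
        List<String> out = new ArrayList<>();
        for (int[] l : left) {
            if (!hasKey(right, l[0])) {
                out.add("(" + l[0] + ",(" + l[1] + "," + Optional.empty() + "))");
                continue;
            }
            for (int[] r : right)
                if (l[0] == r[0]) out.add("(" + l[0] + ",(" + l[1] + "," + Optional.of(r[1]) + "))");
        }
        return out;
    }

    // Models rightOuterJoin: every right pair survives; a missing match is Optional.empty.
    static List<String> rightOuterJoin(int[][] left, int[][] right) {
        List<String> out = new ArrayList<>();
        for (int[] r : right) {
            if (!hasKey(left, r[0])) {
                out.add("(" + r[0] + ",(" + Optional.empty() + "," + r[1] + "))");
                continue;
            }
            for (int[] l : left)
                if (l[0] == r[0]) out.add("(" + r[0] + ",(" + Optional.of(l[1]) + "," + r[1] + "))");
        }
        return out;
    }

    public static void main(String[] args) {
        // Same data as the Scala example: rdd = (1,2), (3,4), (3,6); other = (3,9)
        int[][] rdd = {{1, 2}, {3, 4}, {3, 6}};
        int[][] other = {{3, 9}};
        System.out.println(subtractByKey(rdd, other));  // [(1,2)]
        System.out.println(join(rdd, other));           // [(3,(4,9)), (3,(6,9))]
        System.out.println(leftOuterJoin(rdd, other));  // [(1,(2,Optional.empty)), (3,(4,Optional[9])), (3,(6,Optional[9]))]
        System.out.println(rightOuterJoin(rdd, other)); // [(3,(Optional[4],9)), (3,(Optional[6],9))]
    }
}
```

The printed results line up with res0 to res3 in the Scala session: key 1 has no partner in other, so it is the only survivor of subtractByKey, is dropped by join and rightOuterJoin, and appears with an empty Optional in leftOuterJoin.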
