Reposted from: https://blog.csdn.net/t1dmzks/article/details/70557249
subtractByKey
Function definition
def subtractByKey[W](other: RDD[(K, W)])(implicit arg0: ClassTag[W]): RDD[(K, V)]
def subtractByKey[W](other: RDD[(K, W)], numPartitions: Int)(implicit arg0: ClassTag[W]): RDD[(K, V)]
def subtractByKey[W](other: RDD[(K, W)], p: Partitioner)(implicit arg0: ClassTag[W]): RDD[(K, V)]
Similar to subtract, but matching on keys only: it removes every element of this RDD whose key also appears in the other RDD.
join
Function definition
def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]
def join[W](other: RDD[(K, W)], numPartitions: Int): RDD[(K, (V, W))]
def join[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (V, W))]
rdd1.join(rdd2)
Joins the elements of rdd1 and rdd2 that share the same key, similar to the join operation in SQL. Keys that appear in only one of the two RDDs are dropped.
leftOuterJoin
def leftOuterJoin[W](other: RDD[(K, W)]): RDD[(K, (V, Option[W]))]
def leftOuterJoin[W](other: RDD[(K, W)], numPartitions: Int): RDD[(K, (V, Option[W]))]
def leftOuterJoin[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (V, Option[W]))]
See the diagram.
Joins two RDDs, similar to a left outer join in SQL. Every element of this RDD is kept; the value from the other RDD is Some(w) when the key matches and None when it does not.
rightOuterJoin
Joins two RDDs, similar to a right outer join in SQL. Every element of the other RDD is kept; the value from this RDD is Some(v) when the key matches and None when it does not. See the diagram above and the code below.
Code example
Scala
scala> val rdd = sc.makeRDD(Array((1,2), (3,4), (3,6)))
scala> val other = sc.makeRDD(Array((3,9)))
scala> rdd.subtractByKey(other).collect()
res0: Array[(Int, Int)] = Array((1,2))
scala> rdd.join(other).collect()
res1: Array[(Int, (Int, Int))] = Array((3,(4,9)), (3,(6,9)))
scala> rdd.leftOuterJoin(other).collect()
res2: Array[(Int, (Int, Option[Int]))] = Array((1,(2,None)), (3,(4,Some(9))), (3,(6,Some(9))))
scala> rdd.rightOuterJoin(other).collect()
res3: Array[(Int, (Option[Int], Int))] = Array((3,(Some(4),9)), (3,(Some(6),9)))
Java
// sc is assumed to be an existing JavaSparkContext.
JavaRDD<Tuple2<Integer, Integer>> rddPre = sc.parallelize(Arrays.asList(
        new Tuple2<>(1, 2), new Tuple2<>(3, 4), new Tuple2<>(3, 6)));
JavaRDD<Tuple2<Integer, Integer>> otherPre = sc.parallelize(Arrays.asList(new Tuple2<>(3, 10)));
// Convert each JavaRDD to a JavaPairRDD
JavaPairRDD<Integer, Integer> rdd = JavaPairRDD.fromJavaRDD(rddPre);
JavaPairRDD<Integer, Integer> other = JavaPairRDD.fromJavaRDD(otherPre);
// subtractByKey
JavaPairRDD<Integer, Integer> subRDD = rdd.subtractByKey(other);
// join
JavaPairRDD<Integer, Tuple2<Integer, Integer>> joinRDD = rdd.join(other);
// leftOuterJoin
JavaPairRDD<Integer, Tuple2<Integer, Optional<Integer>>> integerTuple2JavaPairRDD = rdd.leftOuterJoin(other);
// rightOuterJoin
JavaPairRDD<Integer, Tuple2<Optional<Integer>, Integer>> rightOutJoin = rdd.rightOuterJoin(other);
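The matching rules of the four operations can also be modelled on plain Java collections, without Spark. The following is only a minimal sketch for illustration: the class PairOpsSketch and its helper methods are hypothetical names, not part of the Spark API, and the sketch ignores partitioning and distribution entirely. Pairs are represented as Map.Entry and a missing side as java.util.Optional.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map.Entry;
import java.util.Optional;

// Hypothetical, Spark-free model of the four pair operations (illustration only).
public class PairOpsSketch {

    // Builds a (key, value) pair.
    static <A, B> Entry<A, B> pair(A a, B b) {
        return new SimpleEntry<>(a, b);
    }

    // subtractByKey: keep the left pairs whose key never appears on the right.
    static <K, V, W> List<Entry<K, V>> subtractByKey(
            List<Entry<K, V>> left, List<Entry<K, W>> right) {
        List<Entry<K, V>> out = new ArrayList<>();
        for (Entry<K, V> l : left) {
            boolean matched = false;
            for (Entry<K, W> r : right) {
                if (l.getKey().equals(r.getKey())) {
                    matched = true;
                }
            }
            if (!matched) {
                out.add(l);
            }
        }
        return out;
    }

    // join: one output pair for every (left, right) combination that shares a key.
    static <K, V, W> List<Entry<K, Entry<V, W>>> join(
            List<Entry<K, V>> left, List<Entry<K, W>> right) {
        List<Entry<K, Entry<V, W>>> out = new ArrayList<>();
        for (Entry<K, V> l : left) {
            for (Entry<K, W> r : right) {
                if (l.getKey().equals(r.getKey())) {
                    out.add(pair(l.getKey(), pair(l.getValue(), r.getValue())));
                }
            }
        }
        return out;
    }

    // leftOuterJoin: like join, but unmatched left pairs are kept with an empty right side.
    static <K, V, W> List<Entry<K, Entry<V, Optional<W>>>> leftOuterJoin(
            List<Entry<K, V>> left, List<Entry<K, W>> right) {
        List<Entry<K, Entry<V, Optional<W>>>> out = new ArrayList<>();
        for (Entry<K, V> l : left) {
            boolean matched = false;
            for (Entry<K, W> r : right) {
                if (l.getKey().equals(r.getKey())) {
                    matched = true;
                    out.add(pair(l.getKey(), pair(l.getValue(), Optional.of(r.getValue()))));
                }
            }
            if (!matched) {
                out.add(pair(l.getKey(), pair(l.getValue(), Optional.<W>empty())));
            }
        }
        return out;
    }

    // rightOuterJoin: like join, but unmatched right pairs are kept with an empty left side.
    static <K, V, W> List<Entry<K, Entry<Optional<V>, W>>> rightOuterJoin(
            List<Entry<K, V>> left, List<Entry<K, W>> right) {
        List<Entry<K, Entry<Optional<V>, W>>> out = new ArrayList<>();
        for (Entry<K, W> r : right) {
            boolean matched = false;
            for (Entry<K, V> l : left) {
                if (l.getKey().equals(r.getKey())) {
                    matched = true;
                    out.add(pair(r.getKey(), pair(Optional.of(l.getValue()), r.getValue())));
                }
            }
            if (!matched) {
                out.add(pair(r.getKey(), pair(Optional.<V>empty(), r.getValue())));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample data: pairs (1,2), (3,4), (3,6) combined with the single pair (3,9).
        List<Entry<Integer, Integer>> rdd = Arrays.asList(pair(1, 2), pair(3, 4), pair(3, 6));
        List<Entry<Integer, Integer>> other = Arrays.asList(pair(3, 9));
        System.out.println(subtractByKey(rdd, other));
        System.out.println(join(rdd, other));
        System.out.println(leftOuterJoin(rdd, other));
        System.out.println(rightOuterJoin(rdd, other));
    }
}
```

The nested loops are chosen for readability, not speed. Spark applies the same key matching across partitions, so the order of elements returned by collect() is not guaranteed to match the order produced by this sketch.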