Basic RDD operators for Spark programming: join, leftOuterJoin, rightOuterJoin
1) join
def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]
def join[W](other: RDD[(K, W)], numPartitions: Int): RDD[(K, (V, W))]
def join[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (V, W))]
Performs an inner join of two key-value RDDs on their keys. The result is also a key-value RDD: each key maps to a tuple holding one value from each RDD. Only keys present in both RDDs appear in the result, and when a key has several values in either RDD, every combination of left and right values is emitted.
val a = sc.parallelize(List("dog", "salmon", "salmon", "rat", "elephant"), 3)
val b = a.keyBy(_.length) // key each string by its length, giving (length, string) tuples
val c = sc.parallelize(List("dog", "cat", "gnu", "salmon", "rabbit", "turkey", "wolf", "bear", "bee"), 3)
val d = c.keyBy(_.length)
b.join(d).collect // inner join on the key
res0: Array[(Int, (String, String))] = Array((6,(salmon,salmon)), (6,(salmon,rabbit)), (6,(salmon,turkey)), (6,(salmon,salmon)), (6,(salmon,rabbit)), (6,(salmon,turkey)), (3,(dog,dog)), (3,(dog,cat)), (3,(dog,gnu)), (3,(dog,bee)), (3,(rat,dog)), (3,(rat,cat)), (3,(rat,gnu)), (3,(rat,bee)))
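The pairing rule behind this result can be sketched without Spark. The snippet below is only a model on plain Scala collections, not how Spark executes a join; `JoinSketch` and `innerJoin` are hypothetical names introduced for illustration. It shows why the example produces 14 pairs: each left value is combined with every right value that shares its key.

```scala
object JoinSketch {
  // Plain Scala collections standing in for the RDDs b and d (no Spark involved).
  val left: Seq[(Int, String)] =
    Seq("dog", "salmon", "salmon", "rat", "elephant").map(s => (s.length, s))
  val right: Seq[(Int, String)] =
    Seq("dog", "cat", "gnu", "salmon", "rabbit", "turkey", "wolf", "bear", "bee")
      .map(s => (s.length, s))

  // Hypothetical helper: an inner join modelled as a per-key cross product.
  def innerJoin[K, V, W](l: Seq[(K, V)], r: Seq[(K, W)]): Seq[(K, (V, W))] =
    for {
      (k, v)  <- l
      (k2, w) <- r
      if k == k2
    } yield (k, (v, w))

  val joined: Seq[(Int, (String, String))] = innerJoin(left, right)

  def main(args: Array[String]): Unit = {
    // dog and rat each match 4 right values, each salmon matches 3, elephant matches none.
    println(joined.size)
    joined.foreach(println)
  }
}
```

Keys 8 (elephant) and 4 (wolf, bear) exist on one side only, so they do not appear in the joined result.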
2) leftOuterJoin
def leftOuterJoin[W](other: RDD[(K, W)]): RDD[(K, (V, Option[W]))]
def leftOuterJoin[W](other: RDD[(K, W)], numPartitions: Int): RDD[(K, (V, Option[W]))]
def leftOuterJoin[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (V, Option[W]))]
Performs a left outer join of two RDDs. Every key of the left RDD is kept. The value from the right RDD is wrapped in an Option: Some(w) when the right RDD has a value for the key, and None when it does not.
val a = sc.parallelize(List("dog", "salmon", "salmon", "rat", "elephant"), 3)
val b = a.keyBy(_.length)
val c = sc.parallelize(List("dog", "cat", "gnu", "salmon", "rabbit", "turkey", "wolf", "bear", "bee"), 3)
val d = c.keyBy(_.length)
b.leftOuterJoin(d).collect
res1: Array[(Int, (String, Option[String]))] = Array((6,(salmon,Some(salmon))), (6,(salmon,Some(rabbit))), (6,(salmon,Some(turkey))), (6,(salmon,Some(salmon))), (6,(salmon,Some(rabbit))), (6,(salmon,Some(turkey))), (3,(dog,Some(dog))), (3,(dog,Some(cat))), (3,(dog,Some(gnu))), (3,(dog,Some(bee))), (3,(rat,Some(dog))), (3,(rat,Some(cat))), (3,(rat,Some(gnu))), (3,(rat,Some(bee))), (8,(elephant,None)))
// Key 8 (elephant) has no match in the right RDD, so its right-hand value is None.
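Because the right-hand value is an Option, code that consumes the collected result usually has to decide what to do with None. The following is a minimal sketch on a local array shaped like a few rows of the result above; `LeftOuterSketch` and the placeholder string "<no match>" are assumptions made for illustration, not part of the Spark API.

```scala
object LeftOuterSketch {
  // A few rows shaped like the collected leftOuterJoin result.
  val rows: Array[(Int, (String, Option[String]))] = Array(
    (6, ("salmon", Some("rabbit"))),
    (3, ("dog", Some("cat"))),
    (8, ("elephant", None))
  )

  // Replace a missing right-hand value with a placeholder (arbitrary choice).
  val filled: Array[(Int, (String, String))] =
    rows.map { case (k, (v, w)) => (k, (v, w.getOrElse("<no match>"))) }

  // Keep only the left-hand values that found no partner on the right.
  val unmatched: Array[String] =
    rows.collect { case (_, (v, None)) => v }

  def main(args: Array[String]): Unit = {
    filled.foreach(println)
    println(unmatched.mkString(", "))
  }
}
```

The same `map` and pattern-matching style should carry over to the RDD itself before calling collect, since an RDD offers `map` and `filter` over the same tuple shape.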
3) rightOuterJoin
def rightOuterJoin[W](other: RDD[(K, W)]): RDD[(K, (Option[V], W))]
def rightOuterJoin[W](other: RDD[(K, W)], numPartitions: Int): RDD[(K, (Option[V], W))]
def rightOuterJoin[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (Option[V], W))]
Performs a right outer join of two RDDs. Every key of the right RDD is kept. The value from the left RDD is wrapped in an Option: Some(v) when the left RDD has a value for the key, and None when it does not.
val a = sc.parallelize(List("dog", "salmon", "salmon", "rat", "elephant"), 3)
val b = a.keyBy(_.length)
val c = sc.parallelize(List("dog", "cat", "gnu", "salmon", "rabbit", "turkey", "wolf", "bear", "bee"), 3)
val d = c.keyBy(_.length)
b.rightOuterJoin(d).collect
res2: Array[(Int, (Option[String], String))] = Array((6,(Some(salmon),salmon)), (6,(Some(salmon),rabbit)), (6,(Some(salmon),turkey)), (6,(Some(salmon),salmon)), (6,(Some(salmon),rabbit)), (6,(Some(salmon),turkey)), (3,(Some(dog),dog)), (3,(Some(dog),cat)), (3,(Some(dog),gnu)), (3,(Some(dog),bee)), (3,(Some(rat),dog)), (3,(Some(rat),cat)), (3,(Some(rat),gnu)), (3,(Some(rat),bee)), (4,(None,wolf)), (4,(None,bear)))
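To see which rows a right outer join keeps and which it drops, the rule can again be modelled on plain Scala collections. This is a sketch under the assumption that a local lookup table is an acceptable stand-in for the distributed join; `RightOuterSketch` and its `rightOuterJoin` helper are hypothetical names, not Spark's implementation, and the row order may differ from what Spark returns.

```scala
object RightOuterSketch {
  // Plain Scala collections standing in for the RDDs b and d (no Spark involved).
  val left: Seq[(Int, String)] =
    Seq("dog", "salmon", "salmon", "rat", "elephant").map(s => (s.length, s))
  val right: Seq[(Int, String)] =
    Seq("dog", "cat", "gnu", "salmon", "rabbit", "turkey", "wolf", "bear", "bee")
      .map(s => (s.length, s))

  // Hypothetical helper: every right row survives; the left value becomes an Option.
  def rightOuterJoin[K, V, W](l: Seq[(K, V)], r: Seq[(K, W)]): Seq[(K, (Option[V], W))] = {
    // Group the left values by key so each right row can look up its partners.
    val byKey: Map[K, Seq[V]] =
      l.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2)) }
    r.flatMap { case (k, w) =>
      byKey.get(k) match {
        case Some(vs) => vs.map(v => (k, (Option(v), w)))  // one row per matching left value
        case None     => Seq((k, (Option.empty[V], w)))    // no partner: left side is None
      }
    }
  }

  val joined: Seq[(Int, (Option[String], String))] = rightOuterJoin(left, right)

  // Right-hand values that found no partner on the left.
  val unmatchedRight: Seq[String] = joined.collect { case (_, (None, w)) => w }

  def main(args: Array[String]): Unit = {
    println(joined.size)
    println(unmatchedRight.mkString(", "))
  }
}
```

Key 8 (elephant) exists only on the left, so it is absent from the result, while the two right-only values with key 4 are kept with None on the left.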