map (function)
map applies a specified function to each element of the RDD to produce a new RDD. Every element in the original RDD corresponds to exactly one element in the new RDD.
Example:
val a = sc.parallelize(1 to 9, 3)
val b = a.map(x => x * 2) // x => x * 2 is a function: x is the incoming parameter (each element of the RDD), and x * 2 is the return value
a.collect
// result: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9)
b.collect
// result: Array[Int] = Array(2, 4, 6, 8, 10, 12, 14, 16, 18)
Of course, map can also turn each element into a key-value pair.
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2)
val b = a.map(x => (x, 1))
b.collect.foreach(println(_))
/*
(dog,1)
(tiger,1)
(lion,1)
(cat,1)
(panther,1)
(eagle,1)
*/
mapPartitions (function)
The input function of map() is applied to each element in the RDD, while the input function of mapPartitions() is applied to each partition.
package test

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object TestRdd {
  def sumOfEveryPartition(input: Iterator[Int]): Int = {
    var total = 0
    input.foreach { elem =>
      total += elem
    }
    total
  }

  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Spark Rdd Test")
    val spark = new SparkContext(conf)
    val input = spark.parallelize(List(1, 2, 3, 4, 5, 6), 2) // the RDD has 6 elements, divided into 2 partitions
    val result = input.mapPartitions(
      partition => Iterator(sumOfEveryPartition(partition))) // partition is the incoming parameter, an Iterator; the return value must also be an Iterator, hence Iterator(sumOfEveryPartition(partition))
    result.collect().foreach {
      println(_) // prints 6 and 15, the sum of each partition
    }
    spark.stop()
  }
}
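A common reason to prefer mapPartitions over map is to pay a per-partition setup cost once instead of once per element. The following is a minimal sketch, not from the original text; it assumes a SparkContext named `sc` is already available, and `makeConnection` is a hypothetical stand-in for any expensive resource (a database connection, a parser, etc.):

```scala
// Sketch: amortize an expensive setup over a whole partition.
// `makeConnection` is hypothetical; replace it with your real resource.
val data = sc.parallelize(1 to 6, 2)
val processed = data.mapPartitions { partition =>
  val conn = makeConnection()           // created once per partition, not once per element
  partition.map(x => conn.process(x))   // Iterator in, Iterator out
}
```

With plain map, the setup would run for every one of the 6 elements; with mapPartitions it runs only twice, once per partition.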
mapValues (function)
The keys in the original RDD remain unchanged and are combined with the new values to form the elements of the new RDD. Therefore, this function applies only to RDDs whose elements are key-value pairs.
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2)
val b = a.map(x => (x.length, x))
b.mapValues("x" + _ + "x").collect
// "x" + _ + "x" is equivalent to everyInput => "x" + everyInput + "x"
// result:
// Array(
//   (3,xdogx),
//   (5,xtigerx),
//   (4,xlionx),
//   (3,xcatx),
//   (7,xpantherx),
//   (5,xeaglex)
// )
mapWith and flatMapWith
Not much to add here; refer to http://blog.csdn.net/jewes/article/details/39896301
flatMap (function)
Similar to map; the difference is that each element in the original RDD generates exactly one element after map processing, whereas each element can generate multiple elements after flatMap processing.
val a = sc.parallelize(1 to 4, 2)
val b = a.flatMap(x => 1 to x) // each element x expands into the sequence 1 to x
b.collect
/*
result: Array[Int] = Array(1,
1, 2,
1, 2, 3,
1, 2, 3, 4)
*/
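The classic practical use of flatMap is splitting lines of text into words, where one input element produces a variable number of output elements. A short sketch, assuming a SparkContext named `sc`:

```scala
// Sketch: one line in, many words out.
val lines = sc.parallelize(List("hello spark", "hello scala"))
val words = lines.flatMap(line => line.split(" "))
words.collect
// result: Array[String] = Array(hello, spark, hello, scala)
```

With map instead of flatMap, the result would be an RDD of arrays (Array(Array(hello, spark), Array(hello, scala))) rather than a flat RDD of words.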
flatMapValues (function)
Like mapValues, this applies only to RDDs of key-value pairs; each value is expanded into multiple values, each paired with the original key.
val a = sc.parallelize(List((3, 4), (5, 6)))
val b = a.flatMapValues(x => 1 to x)
b.collect.foreach(println(_))
/*
(3,1)
(3,2)
(3,3)
(3,4)
(5,1)
(5,2)
(5,3)
(5,4)
(5,5)
(5,6)
*/
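For contrast, mapValues with the same expanding function keeps one element per input pair (the value becomes a whole collection), while flatMapValues flattens the expansion into separate pairs. A sketch, assuming a SparkContext named `sc`:

```scala
// Sketch: mapValues nests the expansion, flatMapValues flattens it.
val pairs = sc.parallelize(List((3, 4), (5, 6)))
pairs.mapValues(x => 1 to x).collect
// each value is now the whole range, e.g. (3,Range(1, 2, 3, 4)) — 2 elements total
pairs.flatMapValues(x => 1 to x).collect
// result: Array((3,1), (3,2), (3,3), (3,4), (5,1), (5,2), (5,3), (5,4), (5,5), (5,6)) — 10 elements
```

Choosing between the two comes down to whether downstream operations want the expanded values grouped under their key or as independent pairs.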