map() applies a function to each element of the RDD, and the return values make up a new RDD.
flatMap() applies a function to each element of the RDD and puts all the contents of the returned iterators into a new RDD, so the resulting RDD consists of the elements from each list rather than of the lists themselves.
That is a bit of a mouthful; the examples below make it easier to understand.
val rdd = sc.parallelize(List("coffee panda", "happy panda", "happiest panda party"))
Input
rdd.map(x => x).collect
Results
res9: Array[String] = Array(coffee panda, happy panda, happiest panda party)
Input
rdd.flatMap(x => x.split(" ")).collect
Results
res8: Array[String] = Array(coffee, panda, happy, panda, happiest, panda, party)
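To see what flatMap saves you from, it can help to run the same split through map as well. The following is a minimal sketch that uses a plain Scala List in place of the RDD (an assumption made so it runs in any Scala REPL without a SparkContext); the names lines, nested and words are illustrative only.

```scala
// A plain List stands in for the RDD here (assumption: no SparkContext needed).
val lines = List("coffee panda", "happy panda", "happiest panda party")

// map keeps one result per input element, so we get a list of word lists.
val nested: List[List[String]] = lines.map(line => line.split(" ").toList)

// flatMap concatenates the per-line results into a single flat list of words.
val words: List[String] = lines.flatMap(line => line.split(" ").toList)

println(nested) // List(List(coffee, panda), List(happy, panda), List(happiest, panda, party))
println(words)  // List(coffee, panda, happy, panda, happiest, panda, party)
```

With map the grouping by line is preserved; with flatMap it is gone and only the words remain.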
As the name suggests, flatMap first maps and then flattens. Here is another example.
val rdd1 = sc.parallelize(List(1, 2, 3, 3))
scala> rdd1.map(x => x + 1).collect
res10: Array[Int] = Array(2, 3, 4, 4)
scala> rdd1.flatMap(x => x.to(3)).collect
res11: Array[Int] = Array(1, 2, 3, 2, 3, 3, 3)
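The flatMap result is easier to read once you see what each element expands to on its own. This sketch again assumes a plain Scala List instead of the RDD, and the names nums, expansions and flat are made up for illustration.

```scala
// A plain List stands in for rdd1 (assumption: no SparkContext needed).
val nums = List(1, 2, 3, 3)

// Pair each element with the range it expands to: x.to(3) counts from x up to 3.
val expansions: List[(Int, List[Int])] = nums.map(x => x -> x.to(3).toList)
expansions.foreach { case (x, range) => println(s"$x -> $range") }
// 1 -> List(1, 2, 3)
// 2 -> List(2, 3)
// 3 -> List(3)
// 3 -> List(3)

// flatMap joins those ranges end to end, in the order of the input elements.
val flat: List[Int] = nums.flatMap(x => x.to(3))
println(flat) // List(1, 2, 3, 2, 3, 3, 3)
```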
--------------------------------------------------------------------------------------------------------------- ------------
Donut version: flatMap = map + flatten (map first, then flatten);
Deep-pit version: it is the composition of the morphism map of a covariant functor with a natural transformation in the category of endofunctors!
var li = List(1, 2, 3, 4)
// flatMap: every branch returns a List, and the lists are merged into one
var res = li.flatMap(x => x match { case 3 => List(3.1, 3.2) case _ => List(x * 2) })
println(res)
li = List(1, 2, 3, 4)
// map: the List returned for 3 stays nested inside the result
var res2 = li.map(x => x match { case 3 => List(3.1, 3.2) case _ => x * 2 })
println(res2)
// output =>
// List(2, 4, 3.1, 3.2, 8)
// List(2, 4, List(3.1, 3.2), 8)
The process is like doing a map first and then joining the lists produced by the map into one (flatten).
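That "map first, then flatten" reading can be written down directly. The helper below, flatMapViaMap, is a hypothetical name and not part of the Scala library; it is only a sketch for Lists, and the example uses Double values throughout (an assumption, so that every branch has the same element type).

```scala
// Hypothetical helper (not a library function): flatMap spelled out as map followed by flatten.
def flatMapViaMap[A, B](xs: List[A])(f: A => List[B]): List[B] =
  xs.map(f).flatten

val li = List(1, 2, 3, 4)

// Same idea as the example above, but every branch returns a List[Double].
val expand: Int => List[Double] = {
  case 3 => List(3.1, 3.2)
  case x => List(x * 2.0)
}

val viaHelper = flatMapViaMap(li)(expand) // map, then flatten
val builtIn = li.flatMap(expand)          // the built-in flatMap

println(viaHelper) // List(2.0, 4.0, 3.1, 3.2, 8.0)
println(viaHelper == builtIn) // true
```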