Description
In Spark, map and flatMap are two of the most commonly used functions.
map: applies a function to each element of the collection.
flatMap: applies a function to each element of the collection, then flattens the results.
A simple example makes the flattening clear:
val arr = sc.parallelize(Array(("A", 1), ("B", 2), ("C", 3)))
arr.flatMap(x => x._1 + x._2).foreach(println)
The output is
A
1
B
2
C
3
If you use map instead:
val arr = sc.parallelize(Array(("A", 1), ("B", 2), ("C", 3)))
arr.map(x => x._1 + x._2).foreach(println)
the output is
A1
B2
C3
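Spark RDDs borrow these semantics from Scala collections, so the two calls above can be tried locally without a cluster. A minimal sketch on a plain List (the object name is hypothetical); note that x._1 + x._2 produces a String, and a String is itself a sequence of characters, which is what flatMap flattens:

```scala
object FlatMapDemo {
  def main(args: Array[String]): Unit = {
    val pairs = List(("A", 1), ("B", 2), ("C", 3))

    // map: one output element per input element
    val mapped = pairs.map(x => x._1 + x._2)   // List("A1", "B2", "C3")

    // flatMap: map each element to a collection, then flatten.
    // Each "A1" is a sequence of characters, so the result is the characters.
    val flat = pairs.flatMap(x => x._1 + x._2) // List('A', '1', 'B', '2', 'C', '3')

    // flatMap is equivalent to map followed by flatten
    assert(flat == pairs.map(x => x._1 + x._2).flatten)

    println(mapped)
    println(flat)
  }
}
```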
So flatMap is essentially a map followed by a flatten: each element is mapped to a collection (here, a string of characters), and the resulting collections are concatenated into one.

Actual usage scenario
This scenario is a problem I ran into while writing code: count how many times each pair of adjacent characters occurs in a string. That is, given the string a;b;c;d;b;c, the adjacent pairs (a,b), (c,d), and (d,b) each occur once, while (b,c) occurs twice.
Suppose the data is
a;b;c;d;b;d;c
b;d;a;e;d;c
a;b
The number of occurrences of each adjacent character pair can then be computed as follows:
data.map(_.split(";")).flatMap(x => {
  for (i <- 0 until x.length - 1) yield (x(i) + "," + x(i + 1), 1)
}).reduceByKey(_ + _).foreach(println)
The output result is
(a,e,1)
(e,d,1)
(d,a,1)
(c,d,1)
(b,c,1)
(b,d,2)
(d,c,2)
(d,b,1)
(a,b,2)
This example takes full advantage of flatMap's flattening behavior: each line yields a collection of adjacent pairs, and flatMap merges them all into a single stream before counting.
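The same pipeline can be checked locally on plain Scala collections, without a Spark cluster. A sketch under that assumption (the object name is hypothetical; groupBy plus size stands in for Spark's reduceByKey(_ + _)):

```scala
object PairCount {
  def main(args: Array[String]): Unit = {
    val data = List("a;b;c;d;b;d;c", "b;d;a;e;d;c", "a;b")

    val counts = data
      .map(_.split(";"))                        // each line -> Array of tokens
      .flatMap(x =>                             // each line -> its adjacent pairs,
        for (i <- 0 until x.length - 1)         // flattened into one stream
          yield x(i) + "," + x(i + 1))
      .groupBy(identity)                        // local analog of reduceByKey:
      .map { case (pair, occ) => (pair, occ.size) } // group identical pairs, count them

    // prints entries like (a,b,2); Map iteration order is not guaranteed
    counts.foreach(println)
  }
}
```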