Spark operators fall into three broad classes:
1. Value-type transformation operators. These transformations do not trigger job submission; the data items they process are plain values.
2. Key-value-type transformation operators. These transformations do not trigger job submission; the data items they process are key-value pairs.
3. Action operators. These operators trigger the SparkContext to submit a job.
First, Value-type transformation operators
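Before the individual operators, the distinction drawn above between transformations (no job is submitted) and actions (a job is submitted) can be sketched as follows. This is a minimal sketch, assuming a spark-shell session where `sc` is already defined and Spark 2.0 or later for `longAccumulator`; the names `counter`, `nums`, and `doubled` are illustrative and do not come from the original text.

```scala
// Assumption: running inside spark-shell, so `sc` (the SparkContext) already exists.
// Assumption: Spark 2.0+, where SparkContext.longAccumulator is available.
val counter = sc.longAccumulator("elements seen")

val nums = sc.parallelize(1 to 10, 3)

// Transformation: only records the lineage; the function body is not executed here.
val doubled = nums.map { x => counter.add(1); x * 2 }

val seenBeforeAction = counter.value  // 0: the map function has not run yet

// Action: the SparkContext submits a job and the map function runs on every element.
val total = doubled.reduce(_ + _)     // 110

val seenAfterAction = counter.value   // 10 when no task is re-executed
```

Accumulator updates made inside a transformation can be applied more than once if Spark re-runs a task, so the counter here is only a way to observe laziness, not a reliable metric.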
1) map
val a = sc.parallelize(List("dog", "salmon", "salmon", "rat", "elephant"), 3)
val b = a.map(_.length)
val c = a.zip(b)
c.collect
res0: Array[(String, Int)] = Array((dog,3), (salmon,6), (salmon,6), (rat,3), (elephant,8))
2) flatMap
val a = sc.parallelize(1 to 10, 5)
a.flatMap(1 to _).collect
res47: Array[Int] = Array(1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
sc.parallelize(List(1, 2, 3), 2).flatMap(x => List(x, x, x)).collect
res85: Array[Int] = Array(1, 1, 1, 2, 2, 2, 3, 3, 3)
3) mapPartitions
val x = sc.parallelize(1 to 10, 3)
// Each element is repeated a random number of times (0 to 9); an element drawn 0 times is dropped.
x.flatMap(List.fill(scala.util.Random.nextInt(10))(_)).collect
res1: Array[Int] = Array(1, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10)
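The snippet under this heading expresses the expansion with flatMap, which works element by element. For comparison, here is a minimal sketch of the same idea written with mapPartitions, which hands the function one Iterator per partition. It assumes a spark-shell session where `sc` is predefined; the helper name `repeatRandomly` is hypothetical, and the output is random, so none is shown.

```scala
// Assumption: running inside spark-shell, so `sc` (the SparkContext) already exists.
val x = sc.parallelize(1 to 10, 3)

// mapPartitions calls this function once per partition, passing an iterator over
// that partition's elements, and expects an iterator of results back.
def repeatRandomly(iter: Iterator[Int]): Iterator[Int] =
  iter.flatMap(cur => List.fill(scala.util.Random.nextInt(10))(cur))

val result = x.mapPartitions(repeatRandomly).collect
```

Because the function sees a whole partition at once, mapPartitions is the usual choice when some setup cost (for example opening a connection) should be paid once per partition rather than once per element.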
4) glom (gathers the elements of each partition into an array)
val a = sc.parallelize(1 to 100, 3)
a.glom.collect
res8: Array[Array[Int]] = Array(Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33), Array(34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66), Array(67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100))
5) union
val a = sc.parallelize(1 to 3, 1)
val b = sc.parallelize(5 to 7, 1)
(a ++ b).collect
res0: Array[Int] = Array(1, 2, 3, 5, 6, 7)
6) cartesian (Cartesian product)
val x = sc.parallelize(List(1, 2, 3, 4, 5))
val y = sc.parallelize(List(6, 7, 8, 9, 10))
x.cartesian(y).collect
res0: Array[(Int, Int)] = Array((1,6), (1,7), (1,8), (1,9), (1,10), (2,6), (2,7), (2,8), (2,9), (2,10), (3,6), (3,7), (3,8), (3,9), (3,10), (4,6), (5,6), (4,7), (5,7), (4,8), (5,8), (4,9), (4,10), (5,9), (5,10))
7) groupBy (computes a key for each element and groups elements that share the same key)
val a = sc.parallelize(1 to 9, 3)
a.groupBy(x => { if (x % 2 == 0) "even" else "odd" }).collect
res42: Array[(String, Seq[Int])] = Array((even,ArrayBuffer(2, 4, 6, 8)), (odd,ArrayBuffer(1, 3, 5, 7, 9)))
8) filter
val a = sc.parallelize(1 to 10, 3)
val b = a.filter(_ % 2 == 0)
b.collect
res3: Array[Int] = Array(2, 4, 6, 8, 10)
9) distinct (deduplication)
val c = sc.parallelize(List("Gnu", "Cat", "Rat", "Dog", "Gnu", "Rat"), 2)
c.distinct.collect
res6: Array[String] = Array(Dog, Gnu, Cat, Rat)
10) subtract (removes the elements that also appear in the other RDD)
val a = sc.parallelize(1 to 9, 3)
val b = sc.parallelize(1 to 3, 3)
val c = a.subtract(b)
c.collect
res3: Array[Int] = Array(6, 9, 4, 7, 5, 8)
11) sample
val a = sc.parallelize(1 to 10000, 3)
a.sample(false, 0.1, 0).count
res24: Long = 960
12) takeSample
val x = sc.parallelize(1 to 1000, 3)
x.takeSample(true, 100, 1)
res3: Array[Int] = Array(339, 718, 810, 105, 71, 268, 333, 360, 341, 300, 68, 848, 431, 449, 773, 172, 802, 339, 431, 285, 937, 301, 167, 69, 330, 864, 40, 645, 65, 349, 613, 468, 982, 314, 160, 675, 232, 794, 577, 571, 805, 317, 136, 860, 522, 45, 628, 178, 321, 482, 657, 114, 332, 728, 901, 290, 175, 876, 227, 130, 863, 773, 559, 301, 694, 460, 839, 952, 664, 851, 260, 729, 823, 880, 792, 964, 614, 821, 683, 364, 80, 875, 813, 951, 663, 344, 546, 918, 436, 451, 397, 670, 756, 512, 391, 70, 213, 896, 123, 858)
13) cache, persist
val c = sc.parallelize(List("Gnu", "Cat", "Rat", "Dog", "Gnu", "Rat"), 2)
c.getStorageLevel
res0: org.apache.spark.storage.StorageLevel = StorageLevel(false, false, false, false, 1)
c.cache
c.getStorageLevel
res2: org.apache.spark.storage.StorageLevel = StorageLevel(false, true, false, true, 1)
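The transcript above only exercises cache, which is shorthand for persist with the MEMORY_ONLY storage level. As a complement, here is a minimal sketch of persist with an explicit level. It assumes a spark-shell session where `sc` is predefined; the variable names `d`, `level`, and `sameAsRequested` are illustrative, and MEMORY_AND_DISK is just one of the available levels, chosen here as an example.

```scala
// Assumption: running inside spark-shell, so `sc` (the SparkContext) already exists.
import org.apache.spark.storage.StorageLevel

val d = sc.parallelize(List("Gnu", "Cat", "Rat", "Dog", "Gnu", "Rat"), 2)

// persist only marks the RDD; nothing is stored until an action computes it.
d.persist(StorageLevel.MEMORY_AND_DISK)

val level = d.getStorageLevel
val sameAsRequested = level == StorageLevel.MEMORY_AND_DISK

d.count        // first action: computes the partitions and stores them
d.count        // later actions can reuse the stored partitions

d.unpersist()  // release the storage when the RDD is no longer needed
```

The storage level of an RDD cannot be changed once it has been assigned, which is why this sketch uses a fresh RDD instead of calling persist on the already cached `c`.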
Second, Key-value-type transformation operators
1) mapValues
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2)
val b = a.map(x => (x.length, x))
b.mapValues("x" + _ + "x").collect
res5: Array[(Int, String)] = Array((3,xdogx), (5,xtigerx), (4,xlionx), (3,xcatx), (7,xpantherx), (5,xeaglex))
2) combineByKey
val a = sc.parallelize(List("dog", "cat", "gnu", "salmon", "rabbit", "turkey", "wolf", "bear", "bee"), 3)
val b = sc.parallelize(List(1,1,2,2<