Reprinted from: https://blog.csdn.net/t1dmzks/article/details/70667011
First
Returns the first element
Scala
scala> val rdd = sc.parallelize(List(1, 2, 3, 3))
scala> rdd.first()
res1: Int = 1
Java
JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 3));
Integer first = rdd.first();
Take
rdd.take(n) returns the first n elements
Scala
scala> val rdd = sc.parallelize(List(1, 2, 3, 3))
scala> rdd.take(2)
res3: Array[Int] = Array(1, 2)
Java
JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 3));
List<Integer> take = rdd.take(2);
Collect
rdd.collect() returns all elements in the RDD
Scala
scala> val rdd = sc.parallelize(List(1, 2, 3, 3))
scala> rdd.collect()
res4: Array[Int] = Array(1, 2, 3, 3)
Java
JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 3));
List<Integer> collect = rdd.collect();
Count
rdd.count() returns the number of elements in the RDD
Scala
scala> val rdd = sc.parallelize(List(1, 2, 3, 3))
scala> rdd.count()
res5: Long = 4
Java
JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 3));
Long count = rdd.count();
countByValue
Returns the number of occurrences of each element in the RDD, in the form {(key1, count), (key2, count), ... (keyN, count)}
Scala
scala> val rdd = sc.parallelize(List(1, 2, 3, 3))
scala> rdd.countByValue()
res6: scala.collection.Map[Int,Long] = Map(1 -> 1, 2 -> 1, 3 -> 2)
Java
JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 3));
Map<Integer, Long> integerLongMap = rdd.countByValue();
Reduce
rdd.reduce(func)
Combines all elements of the RDD in parallel, similar to reduce on Scala collections
Scala
scala> val rdd = sc.parallelize(List(1, 2, 3, 3))
scala> rdd.reduce((x, y) => x + y)
res7: Int = 9
Java
Integer reduce = rdd.reduce(new Function2<Integer, Integer, Integer>() {
    @Override
    public Integer call(Integer integer, Integer integer2) throws Exception {
        return integer + integer2;
    }
});
Aggregate
Similar to reduce(), but typically used when the result type differs from the element type
This function is not commonly used
Scala
scala> val rdd = sc.parallelize(List(1, 2, 3, 3))
TODO
Java
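The original leaves the aggregate example unwritten. As a rough illustration of the idea only, the following plain-Java sketch (no Spark dependency) models an aggregate that takes a zero value, a function that merges one element into an accumulator within a partition, and a function that merges two accumulators. The class AggregateSketch, its methods, and the assumed split of the data into partitions [1, 2] and [3, 3] are hypothetical and are not part of the Spark API. It computes a (sum, count) pair from integers, which is the kind of "different result type" case described above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;

// Plain-Java model of aggregate semantics; hypothetical helper, not Spark API.
class AggregateSketch {

    // seqOp merges one element into a partition-local accumulator;
    // combOp merges two accumulators. Neither may mutate its arguments here,
    // because the same zero object is reused for every partition.
    static <T, U> U aggregate(List<List<T>> partitions, U zero,
                              BiFunction<U, T, U> seqOp, BinaryOperator<U> combOp) {
        U result = zero;
        for (List<T> partition : partitions) {
            U acc = zero;
            for (T element : partition) {
                acc = seqOp.apply(acc, element);
            }
            result = combOp.apply(result, acc);
        }
        return result;
    }

    // Computes the average of 1, 2, 3, 3 via a {sum, count} accumulator.
    static double demo() {
        // Assumed partitioning of List(1, 2, 3, 3) into two partitions.
        List<List<Integer>> partitions = Arrays.asList(
                Arrays.asList(1, 2),
                Arrays.asList(3, 3));
        int[] sumAndCount = aggregate(
                partitions,
                new int[]{0, 0},
                (acc, x) -> new int[]{acc[0] + x, acc[1] + 1},
                (a, b) -> new int[]{a[0] + b[0], a[1] + b[1]});
        return (double) sumAndCount[0] / sumAndCount[1];
    }

    public static void main(String[] args) {
        System.out.println(demo()); // sum 9 over count 4
    }
}
```

The element type is Integer while the accumulator is an int pair, which reduce() cannot express because its function must return the element type.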
Fold
rdd.fold(num)(func); this function is not commonly used
Like reduce(), but takes an initial value num. Each partition is folded starting from num, and the per-partition results are then folded together starting from num again, so num is applied once per partition plus once in the final merge
Scala
Explanation TODO
scala> val rdd = sc.parallelize(List(1, 2, 3, 3), 2)
scala> rdd.fold(1)((x, y) => x + y)
res8: Int = 12
Java
JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 3), 2);
Integer fold = rdd.fold(1, new Function2<Integer, Integer, Integer>() {
    @Override
    public Integer call(Integer integer, Integer integer2) throws Exception {
        return integer + integer2;
    }
});
System.out.println(fold);
-------Output-----
12
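To see where the 12 comes from, here is a minimal plain-Java sketch (no Spark dependency) of the per-partition behaviour described above. FoldSketch and its methods are hypothetical names used for illustration, and the split of the four elements into partitions [1, 2] and [3, 3] is an assumption about how two partitions would divide this list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

// Plain-Java model of fold semantics; hypothetical helper, not Spark API.
class FoldSketch {

    // Folds each partition starting from zero, then folds the
    // per-partition results together, again starting from zero.
    static <T> T fold(List<List<T>> partitions, T zero, BinaryOperator<T> op) {
        T result = zero;                      // zero used once for the final merge
        for (List<T> partition : partitions) {
            T partial = zero;                 // zero used once per partition
            for (T element : partition) {
                partial = op.apply(partial, element);
            }
            result = op.apply(result, partial);
        }
        return result;
    }

    static int demo() {
        // Assumed partitioning of List(1, 2, 3, 3) into two partitions.
        List<List<Integer>> partitions = Arrays.asList(
                Arrays.asList(1, 2),
                Arrays.asList(3, 3));
        return fold(partitions, 1, Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // (1+1+2) + (1+3+3) + 1 = 12
    }
}
```

With an initial value of 1 and two partitions, the initial value is added three times on top of the plain sum of 9. A different partition count would therefore change the result, which is why a neutral initial value (0 for addition) is the usual choice.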
Top
rdd.top(n)
Returns the first n elements in descending order, or according to a specified ordering
Scala
scala> val rdd = sc.parallelize(List(1, 2, 3, 3))
scala> rdd.top(2)
res9: Array[Int] = Array(3, 3)
Java
JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 3), 2);
List<Integer> top = rdd.top(2);
takeOrdered
rdd.takeOrdered(n)
Sorts the RDD elements in ascending order and returns the first n. A custom comparator can also be supplied (not described here). This is the opposite of top
Scala
scala> val rdd = sc.parallelize(List(1, 2, 3, 3))
scala> rdd.takeOrdered(2)
res10: Array[Int] = Array(1, 2)
Java
JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 3), 2);
List<Integer> integers = rdd.takeOrdered(2);
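The relationship between top and takeOrdered, and the role a comparator plays, can be sketched in plain Java without Spark. OrderingSketch and its two static methods are hypothetical stand-ins that model the behaviour on an in-memory list; they are not the Spark methods themselves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java model of takeOrdered/top; hypothetical helper, not Spark API.
class OrderingSketch {

    // Returns the first n elements of data under the given ordering.
    static <T> List<T> takeOrdered(List<T> data, int n, Comparator<T> comp) {
        List<T> sorted = new ArrayList<>(data);
        sorted.sort(comp);
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    // top behaves like takeOrdered under the reversed ordering.
    static <T> List<T> top(List<T> data, int n, Comparator<T> comp) {
        return takeOrdered(data, n, comp.reversed());
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 3);
        System.out.println(takeOrdered(data, 2, Comparator.naturalOrder())); // [1, 2]
        System.out.println(top(data, 2, Comparator.naturalOrder()));         // [3, 3]
    }
}
```

Passing a different comparator changes which elements count as "first", so the same two methods cover both the default and the customized cases.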
foreach
Applies the given function to each element of the RDD
Scala
val rdd = sc.parallelize(List(1, 2, 3, 3))
rdd.foreach(print(_))
-----output-----------
1233
Java
rdd.foreach(new VoidFunction<Integer>() {
    @Override
    public void call(Integer integer) throws Exception {
        System.out.println(integer);
    }
});