Basic Spark actions: first, take, collect, count, countByValue, reduce, aggregate, fold, top, takeOrdered, foreach


Reprinted from: https://blog.csdn.net/t1dmzks/article/details/70667011

first

rdd.first() returns the first element of the RDD.
Scala

scala> val rdd = sc.parallelize(List(1, 2, 3, 3))

scala> rdd.first()
res1: Int = 1

Java

    // sc is an existing JavaSparkContext
    JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 3));
    Integer first = rdd.first();
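For intuition, the plain-Java sketch below models an RDD as a list of partitions (List<List<T>>, an assumption made purely for illustration; FirstSketch is a hypothetical name, not Spark code). It follows the behaviour of Spark's first(), which is built on take(1): partitions are visited in order, and an empty RDD raises UnsupportedOperationException("empty collection").

```java
import java.util.List;

// Plain-Java model of RDD.first(); an "RDD" here is a hypothetical list of partitions.
class FirstSketch {
    static <T> T first(List<List<T>> partitions) {
        // Partitions are visited in order, so the result is the head of the first non-empty one.
        for (List<T> partition : partitions) {
            if (!partition.isEmpty()) {
                return partition.get(0);
            }
        }
        // Spark throws this exception type (with this message) for an empty RDD.
        throw new UnsupportedOperationException("empty collection");
    }

    public static void main(String[] args) {
        // Assumed split of List(1, 2, 3, 3) into two partitions.
        List<List<Integer>> rdd = List.of(List.of(1, 2), List.of(3, 3));
        System.out.println(first(rdd)); // prints 1
    }
}
```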
take

rdd.take(n) returns the first n elements of the RDD (in partition order, without sorting).
Scala

scala> val rdd = sc.parallelize(List(1, 2, 3, 3))

scala> rdd.take(2)
res3: Array[Int] = Array(1, 2)

Java

    JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 3));
    List<Integer> take = rdd.take(2);
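A simplified model of take(n), again treating an RDD as a hypothetical list of partitions: take does not sort anything, it reads partitions in order and stops once it has n elements. Real Spark launches jobs over progressively larger groups of partitions rather than one partition at a time; TakeSketch (an illustrative name) keeps only the ordering and early-exit behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of RDD.take(n); partitions are modelled as lists (an illustrative assumption).
class TakeSketch {
    static <T> List<T> take(List<List<T>> partitions, int n) {
        List<T> result = new ArrayList<>();
        for (List<T> partition : partitions) {
            if (result.size() >= n) {
                break; // enough elements: the remaining partitions are never read
            }
            for (T element : partition) {
                if (result.size() >= n) {
                    break;
                }
                result.add(element);
            }
        }
        // If the RDD holds fewer than n elements, everything is returned.
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> rdd = List.of(List.of(1, 2), List.of(3, 3));
        System.out.println(take(rdd, 2));  // prints [1, 2]
        System.out.println(take(rdd, 10)); // prints [1, 2, 3, 3]
    }
}
```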
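collect

rdd.collect() returns all elements of the RDD to the driver. Because the entire dataset is copied into the driver's memory, use it only on small RDDs.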
Scala

scala> val rdd = sc.parallelize(List(1, 2, 3, 3))

scala> rdd.collect()
res4: Array[Int] = Array(1, 2, 3, 3)

Java

    JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 3));
    List<Integer> collect = rdd.collect();
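The sketch below (hypothetical CollectSketch, plain Java, with partitions modelled as lists) illustrates why collect() returns the elements in their original order: each partition is turned into an array on its executor, and the driver concatenates those arrays in partition order.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of RDD.collect(): every partition's elements are shipped to the driver
// and concatenated in partition order.
class CollectSketch {
    static <T> List<T> collect(List<List<T>> partitions) {
        List<T> driverSide = new ArrayList<>();
        for (List<T> partition : partitions) {
            driverSide.addAll(partition); // the whole partition ends up in driver memory
        }
        return driverSide;
    }

    public static void main(String[] args) {
        List<List<Integer>> rdd = List.of(List.of(1, 2), List.of(3, 3));
        System.out.println(collect(rdd)); // prints [1, 2, 3, 3]
    }
}
```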
count

rdd.count() returns the number of elements in the RDD.
Scala

scala> val rdd = sc.parallelize(List(1, 2, 3, 3))

scala> rdd.count()
res5: Long = 4

Java

    JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 3));
    long count = rdd.count();
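count() follows the same per-partition pattern: each partition counts its own elements, and the driver adds up the partial counts. A minimal model, assuming the same hypothetical list-of-partitions representation (CountSketch is an illustrative name, not Spark code):

```java
import java.util.List;

// Plain-Java model of RDD.count(); partitions are modelled as lists (an illustrative assumption).
class CountSketch {
    static <T> long count(List<List<T>> partitions) {
        long total = 0L;
        for (List<T> partition : partitions) {
            // Count by walking the elements, the way a Spark partition iterator is consumed.
            long local = 0L;
            for (T ignored : partition) {
                local++;
            }
            total += local; // the driver sums the per-partition counts
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<Integer>> rdd = List.of(List.of(1, 2), List.of(3, 3));
        System.out.println(count(rdd)); // prints 4
    }
}
```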
countByValue

rdd.countByValue() returns the number of occurrences of each distinct element in the RDD as a map: {(key1, count1), (key2, count2), ..., (keyN, countN)}.
Scala

scala> val rdd = sc.parallelize(List(1, 2, 3, 3))

scala> rdd.countByValue()
res6: scala.collection.Map[Int,Long] = Map(1 -> 1, 2 -> 1, 3 -> 2)

Java

    JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 3));
    Map<Integer, Long> integerLongMap = rdd.countByValue();
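Conceptually, countByValue() counts occurrences within each partition and then merges the per-partition counts. The sketch below (hypothetical CountByValueSketch, plain Java, partitions modelled as lists) performs that merge in a single loop. In recent versions Spark routes the operation through countByKey, that is, a reduceByKey shuffle, before the final map is returned to the driver, so treat this as a model of the result rather than of the execution.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of RDD.countByValue(): count inside each partition, then merge the counts.
class CountByValueSketch {
    static <T> Map<T, Long> countByValue(List<List<T>> partitions) {
        Map<T, Long> merged = new HashMap<>();
        for (List<T> partition : partitions) {
            // Step 1: count occurrences inside one partition.
            Map<T, Long> local = new HashMap<>();
            for (T value : partition) {
                local.merge(value, 1L, Long::sum);
            }
            // Step 2: add this partition's counts to the overall result.
            local.forEach((key, n) -> merged.merge(key, n, Long::sum));
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<Integer>> rdd = List.of(List.of(1, 2), List.of(3, 3));
        System.out.println(countByValue(rdd)); // prints {1=1, 2=1, 3=2}
    }
}
```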
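reduce

rdd.reduce(func)
Combines all elements of the RDD in parallel using func, similar to reduce on Scala collections. func takes two elements and returns one of the same type; it should be commutative and associative, because Spark applies it both within each partition and across the partition results.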
Scala

scala> val rdd = sc.parallelize(List(1, 2, 3, 3))

scala> rdd.reduce((x, y) => x + y)
res7: Int = 9

Java

    Integer reduce = rdd.reduce(new Function2<Integer, Integer, Integer>() {
        @Override
        public Integer call(Integer integer, Integer integer2) throws Exception {
            return integer + integer2;
        }
    });
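Why func should be associative and commutative: reduce() first reduces each partition on its own and then reduces the partial results. The plain-Java model below (hypothetical ReduceSketch, partitions modelled as lists) merges partial results in partition order for simplicity, whereas Spark merges them as tasks finish. Addition gives 9 however the data is split, but subtraction gives different answers for two partitions and for one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

// Plain-Java model of RDD.reduce(func): reduce inside each partition, then reduce the partial results.
class ReduceSketch {
    static <T> T reduce(List<List<T>> partitions, BinaryOperator<T> func) {
        List<T> partials = new ArrayList<>();
        for (List<T> partition : partitions) {
            if (partition.isEmpty()) {
                continue; // empty partitions contribute nothing
            }
            T acc = partition.get(0);
            for (int i = 1; i < partition.size(); i++) {
                acc = func.apply(acc, partition.get(i));
            }
            partials.add(acc);
        }
        if (partials.isEmpty()) {
            throw new UnsupportedOperationException("empty collection");
        }
        // Assumption: partials are merged in partition order (Spark uses task completion order).
        T result = partials.get(0);
        for (int i = 1; i < partials.size(); i++) {
            result = func.apply(result, partials.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> rdd = List.of(List.of(1, 2), List.of(3, 3));
        System.out.println(reduce(rdd, (x, y) -> x + y)); // prints 9
        // Subtraction is not associative, so the answer depends on the partitioning.
        System.out.println(reduce(rdd, (x, y) -> x - y)); // prints -1
        System.out.println(reduce(List.of(List.of(1, 2, 3, 3)), (x, y) -> x - y)); // prints -7
    }
}
```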
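aggregate

rdd.aggregate(zeroValue)(seqOp, combOp)
Similar to reduce(), but the result can have a different type from the RDD's elements: seqOp merges an element into a per-partition accumulator that starts at zeroValue, and combOp merges two accumulators. This function is not commonly used.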
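A plain-Java model of aggregate (not Spark code; AggregateSketch and SumCount are hypothetical names, and partitions are modelled as lists). It computes a (sum, count) pair, a result type different from the Integer elements, which is the typical reason to reach for aggregate: seqOp folds an element into a per-partition accumulator, combOp merges accumulators, and zeroValue seeds every partition as well as the final merge. In Scala this corresponds roughly to rdd.aggregate((0, 0))((acc, v) => (acc._1 + v, acc._2 + 1), (a, b) => (a._1 + b._1, a._2 + b._2)).

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;

// Plain-Java model of RDD.aggregate(zeroValue)(seqOp, combOp) over a hypothetical list of partitions.
class AggregateSketch {
    // Hypothetical immutable accumulator: running sum and element count.
    static final class SumCount {
        final long sum;
        final long count;

        SumCount(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        @Override
        public String toString() {
            return "(" + sum + ", " + count + ")";
        }
    }

    static <T, U> U aggregate(List<List<T>> partitions, U zeroValue,
                              BiFunction<U, T, U> seqOp, BinaryOperator<U> combOp) {
        U result = zeroValue; // the zero value also seeds the final merge
        for (List<T> partition : partitions) {
            U acc = zeroValue; // each partition starts from the zero value
            for (T element : partition) {
                acc = seqOp.apply(acc, element);
            }
            result = combOp.apply(result, acc);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> rdd = List.of(List.of(1, 2), List.of(3, 3));
        SumCount totals = aggregate(rdd, new SumCount(0, 0),
                (acc, v) -> new SumCount(acc.sum + v, acc.count + 1),
                (a, b) -> new SumCount(a.sum + b.sum, a.count + b.count));
        System.out.println(totals); // prints (9, 4)
        System.out.println((double) totals.sum / totals.count); // prints 2.25
    }
}
```

Because SumCount is immutable, sharing zeroValue between partitions is safe here; Spark instead gives each task its own copy of the zero value.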

Scala

scala> val rdd = sc.parallelize(List(1, 2, 3, 3))
TODO

Java


fold

rdd.fold(num)(func) is also not commonly used.
It works like reduce(), but takes an initial value num. Each partition is folded starting from num, and the per-partition results are then folded together, again starting from num. The initial value is therefore applied once per partition plus once more when the partitions are combined.
Scala

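Explanation: parallelize(List(1, 2, 3, 3), 2) splits the list into two partitions, [1, 2] and [3, 3]. Each partition is folded starting from 1: 1 + 1 + 2 = 4 and 1 + 3 + 3 = 7. The two results are then folded together, again starting from 1: 1 + 4 + 7 = 12.

scala> val rdd = sc.parallelize(List(1, 2, 3, 3), 2)

scala> rdd.fold(1)((x, y) => x + y)
res8: Int = 12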

Java

    JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 3), 2);
    Integer fold = rdd.fold(1, new Function2<Integer, Integer, Integer>() {
        @Override
        public Integer call(Integer integer, Integer integer2) throws Exception {
            return integer + integer2;
        }
    });
    System.out.println(fold);
-------Output-----
12
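The result 12 above can be reproduced with a plain-Java model of fold (hypothetical FoldSketch; partitions modelled as lists, not Spark code). The zero value is applied once per partition and once more when merging, so with a sum and num = 1 the result is 9 + 2 + 1 = 12 for two partitions and 9 + 1 + 1 = 11 for one.

```java
import java.util.List;
import java.util.function.BinaryOperator;

// Plain-Java model of RDD.fold(zeroValue)(func) over a hypothetical list of partitions.
class FoldSketch {
    static <T> T fold(List<List<T>> partitions, T zeroValue, BinaryOperator<T> func) {
        T result = zeroValue; // the zero value is used once more when combining partitions
        for (List<T> partition : partitions) {
            T acc = zeroValue; // every partition starts from the zero value
            for (T element : partition) {
                acc = func.apply(acc, element);
            }
            result = func.apply(result, acc);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two partitions, matching sc.parallelize(List(1, 2, 3, 3), 2).
        List<List<Integer>> twoPartitions = List.of(List.of(1, 2), List.of(3, 3));
        System.out.println(fold(twoPartitions, 1, Integer::sum)); // prints 12
        List<List<Integer>> onePartition = List.of(List.of(1, 2, 3, 3));
        System.out.println(fold(onePartition, 1, Integer::sum)); // prints 11
    }
}
```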
top

rdd.top(n)
Returns the first n elements in descending order, or according to a specified ordering (in Java, a Comparator passed as the second argument).
Scala

scala> val rdd = sc.parallelize(List(1, 2, 3, 3))

scala> rdd.top(2)
res9: Array[Int] = Array(3, 3)

Java

    JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 3), 2);
    List<Integer> top = rdd.top(2);
takeOrdered

rdd.takeOrdered(n)
Sorts the elements of the RDD in ascending order and returns the first n, or uses a custom comparator (not covered here). It is the counterpart of top.
Scala

scala> val rdd = sc.parallelize(List(1, 2, 3, 3))

scala> rdd.takeOrdered(2)
res10: Array[Int] = Array(1, 2)

Java

    JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 3), 2);
    List<Integer> integers = rdd.takeOrdered(2);
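top and takeOrdered are two views of the same operation: Spark implements top(n) as takeOrdered(n) with the ordering reversed, and each partition keeps only its best n candidates in a bounded priority queue before the driver merges them. The plain-Java model below (hypothetical TakeOrderedSketch, partitions modelled as lists) uses java.util.PriorityQueue in place of Spark's internal queue.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Plain-Java model of takeOrdered(n) and top(n): each partition keeps only its n best
// candidates in a bounded heap, and the driver merges those candidates.
class TakeOrderedSketch {
    static <T> List<T> takeOrdered(List<List<T>> partitions, int n, Comparator<T> ord) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        List<T> candidates = new ArrayList<>();
        for (List<T> partition : partitions) {
            // Heap ordered by the reverse of ord: its head is the worst of the n elements kept so far.
            PriorityQueue<T> heap = new PriorityQueue<>(ord.reversed());
            for (T element : partition) {
                heap.add(element);
                if (heap.size() > n) {
                    heap.poll(); // drop the current worst element
                }
            }
            candidates.addAll(heap);
        }
        // At most n candidates per partition reach the driver; sort them and keep the first n.
        candidates.sort(ord);
        return new ArrayList<>(candidates.subList(0, Math.min(n, candidates.size())));
    }

    static <T> List<T> top(List<List<T>> partitions, int n, Comparator<T> ord) {
        // top is takeOrdered with the ordering reversed.
        return takeOrdered(partitions, n, ord.reversed());
    }

    public static void main(String[] args) {
        List<List<Integer>> rdd = List.of(List.of(1, 2), List.of(3, 3));
        System.out.println(takeOrdered(rdd, 2, Comparator.naturalOrder())); // prints [1, 2]
        System.out.println(top(rdd, 2, Comparator.naturalOrder()));         // prints [3, 3]
    }
}
```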
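foreach

rdd.foreach(func)
Applies func to each element of the RDD. func runs on the executors, so on a cluster its printed output appears in the executor logs rather than on the driver's console.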
Scala

    val rdd = sc.parallelize(List(1, 2, 3, 3))
    rdd.foreach(print(_))
-----output-----------
1233

Java

    rdd.foreach(new VoidFunction<Integer>() {
        @Override
        public void call(Integer integer) throws Exception {
            System.out.println(integer);
        }
    });
