Various action operator operations in Spark (Java edition)

Source: Internet
Author: User

In my opinion, an action operator in Spark programming acts like a trigger that sets off the preceding transformation operators. Transformations are lazily evaluated: defining one does not execute it immediately; all of the preceding transformation operators run only when an action operator is executed. The common action operators are demonstrated in the code listing below (Java edition).
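To make the lazy-evaluation point concrete before the full listing, here is a minimal, self-contained sketch (the class and variable names are illustrative, not from the original example): the map transformation by itself submits no job; only the count action does.

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class LazyEvaluationSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("lazy").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Defining the map transformation does NOT run anything yet.
        JavaRDD<Integer> doubled = sc.parallelize(Arrays.asList(1, 2, 3))
                .map(new Function<Integer, Integer>() {
                    @Override
                    public Integer call(Integer v) throws Exception {
                        return v * 2;
                    }
                });

        // Only this action triggers Spark to actually execute the map above.
        long count = doubled.count();
        System.out.println(count);

        sc.close();
    }
}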
package cn.spark.study.core;

import java.util.Arrays;
import java.util.List;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;

import scala.Tuple2;

/**
* Hands-on examples of action operations
* @author DD
*
*/
public class ActionOperation {

    public static void main(String[] args) {
        reduceTest();
        collectTest();
        countTest();
        takeTest();
        countByKeyTest();
    }

    /**
     * reduce operator
     * Case: accumulate a sum
     */
    private static void reduceTest() {
        SparkConf conf = new SparkConf().setAppName("reduce").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        List<Integer> numberList = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        JavaRDD<Integer> numbersRDD = sc.parallelize(numberList);

        // Use the reduce operation to accumulate the numbers in the collection
        int sum = numbersRDD.reduce(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer arg0, Integer arg1) throws Exception {
                return arg0 + arg1;
            }
        });

        System.out.println(sum);

        sc.close();
    }

    /**
     * collect operator
     * Pulls the data on the cluster to the local driver for traversal (not recommended)
     */
    private static void collectTest() {
        SparkConf conf = new SparkConf().setAppName("collect").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        List<Integer> numberList = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        JavaRDD<Integer> numbersRDD = sc.parallelize(numberList);

        JavaRDD<Integer> doubleNumbers = numbersRDD.map(new Function<Integer, Integer>() {
            @Override
            public Integer call(Integer arg0) throws Exception {
                return arg0 * 2;
            }
        });

        // The foreach action traverses the elements of the RDD on the remote cluster,
        // whereas the collect action pulls the RDD's data from the distributed cluster
        // back to the local driver. collect is generally not recommended: if the RDD
        // holds a lot of data (say, more than 10,000 elements), performance suffers
        // because of the heavy network transfer, and an OOM (out-of-memory) exception
        // may occur. It is therefore recommended to use foreach to process the final RDD.
        List<Integer> doubleNumList = doubleNumbers.collect();
        for (Integer num : doubleNumList) {
            System.out.println(num);
        }

        sc.close();
    }

    /**
     * count operator
     * Counts the number of elements in the RDD
     */
    private static void countTest() {
        SparkConf conf = new SparkConf().setAppName("count").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        List<Integer> numberList = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        JavaRDD<Integer> numbersRDD = sc.parallelize(numberList);

        // Use the count operation to count the number of elements in the RDD
        long count = numbersRDD.count();
        System.out.println(count);

        sc.close();
    }

    /**
     * take operator
     * Pulls the first n elements of the remote RDD to the local driver
     */
    private static void takeTest() {
        SparkConf conf = new SparkConf().setAppName("take").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        List<Integer> numberList = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        JavaRDD<Integer> numbersRDD = sc.parallelize(numberList);

        // The take operation is similar to collect in that it also fetches RDD data
        // from the remote cluster, but while collect fetches all of the RDD's data,
        // take fetches only the first n elements.
        List<Integer> top3Numbers = numbersRDD.take(3);
        for (Integer num : top3Numbers) {
            System.out.println(num);
        }

        sc.close();
    }

    /**
     * saveAsTextFile operator
     */
    private static void saveAsTextFileTest() {
        SparkConf conf = new SparkConf().setAppName("saveAsTextFile");
        JavaSparkContext sc = new JavaSparkContext(conf);

        List<Integer> numberList = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        JavaRDD<Integer> numbersRDD = sc.parallelize(numberList);

        JavaRDD<Integer> doubleNumbers = numbersRDD.map(new Function<Integer, Integer>() {
            @Override
            public Integer call(Integer arg0) throws Exception {
                return arg0 * 2;
            }
        });

        // The saveAsTextFile operator stores the data in the RDD directly in HDFS.
        // Only the output folder (directory) can be specified; the data is actually
        // saved as part-00000 files under the double_number.txt directory.
        doubleNumbers.saveAsTextFile("hdfs://spark1:9000/double_number.txt");

        sc.close();
    }

    /**
     * countByKey operator
     */
    private static void countByKeyTest() {
        SparkConf conf = new SparkConf().setAppName("countByKey").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        List<Tuple2<String, String>> studentsList = Arrays.asList(
                new Tuple2<String, String>("class1", "leo"),
                new Tuple2<String, String>("class2", "jack"),
                new Tuple2<String, String>("class1", "marry"),
                new Tuple2<String, String>("class2", "tom"),
                new Tuple2<String, String>("class2", "david"));

        JavaPairRDD<String, String> studentsRDD = sc.parallelizePairs(studentsList);

        // The countByKey operator counts the number of elements for each key.
        // countByKey returns a Map<String, Object> directly.
        Map<String, Object> studentsCounts = studentsRDD.countByKey();
        for (Map.Entry<String, Object> studentsCount : studentsCounts.entrySet()) {
            System.out.println(studentsCount.getKey() + ": " + studentsCount.getValue());
        }

        sc.close();
    }

}
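For reference, with the sample data in the listing above, reduceTest prints 55 (the sum of 1 through 10) and countByKeyTest prints class1: 2 and class2: 3.

The comment in collectTest recommends processing a large final RDD with foreach on the cluster instead of pulling everything back to the driver with collect. As a rough sketch of that alternative (a standalone class with illustrative names, not part of the original article):

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.VoidFunction;

public class ForeachSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("foreach").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<Integer> doubleNumbers = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5))
                .map(new Function<Integer, Integer>() {
                    @Override
                    public Integer call(Integer v) throws Exception {
                        return v * 2;
                    }
                });

        // foreach runs on the executors, so the data is never collected to the driver;
        // on a real cluster the println output goes to the executor logs, while with
        // the "local" master it still shows up on the console.
        doubleNumbers.foreach(new VoidFunction<Integer>() {
            @Override
            public void call(Integer num) throws Exception {
                System.out.println(num);
            }
        });

        sc.close();
    }
}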
