Spark Primer: WordCount, Detailed Version

package cn.spark.study.core;

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.api.java.function.VoidFunction;

import scala.Tuple2;

/**
 * A locally runnable WordCount program developed in Java.
 * @author Administrator
 */
public class WordCountLocal {

    public static void main(String[] args) {
        // Write the Spark application.
        // Because it runs locally, it can be executed directly from the main method in Eclipse.

        // Step 1: create a SparkConf object and set the configuration for the Spark application.
        // setMaster() sets the URL of the master node of the Spark cluster the application
        // will connect to; if it is set to "local", the program runs locally.
        // (A sketch of a cluster variant appears after this listing.)
        SparkConf conf = new SparkConf()
                .setAppName("WordCountLocal")
                .setMaster("local");

        // Step 2: create a JavaSparkContext object.
        // In Spark, SparkContext is the entry point to all of Spark's functionality: whether
        // you write in Java, Scala, or even Python, you must have a SparkContext. Its main
        // role is to initialize the core components the application needs, including the
        // schedulers (DAGScheduler, TaskScheduler), and to register with the Spark master
        // node. In a word, SparkContext is the most important object in a Spark application.
        // However, different kinds of Spark applications use different contexts: Scala uses
        // the native SparkContext object, while Java uses JavaSparkContext. A Spark SQL
        // program uses SQLContext or HiveContext, a Spark Streaming program uses its own
        // StreamingContext, and so on.
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Step 3: create an initial RDD from an input source (an HDFS file, a local file, etc.).
        // The data in the input source is distributed across the partitions of the RDD,
        // forming an initial distributed dataset. Because this is a local test, we read a
        // local file here. SparkContext's method for creating an RDD from a file-type input
        // source is called textFile(); in Java, the RDD it creates is a JavaRDD. For an HDFS
        // or local file, each element of the resulting RDD corresponds to one line of the file.
        JavaRDD<String> lines = sc.textFile("C://Users//Administrator//Desktop//spark.txt");

        // Step 4: apply transformation operations to the initial RDD, i.e., the computation.
        // Typically this is done by creating a function and passing it to an RDD operator
        // such as map or flatMap.
        // If the function is simple, it is created in place as an anonymous inner class of
        // the required function interface; if it is more complex, a separate class is
        // created that implements the function interface. (A Java 8 lambda version is
        // sketched after this listing.)

        // First, split each line into individual words.
        // FlatMapFunction takes two generic parameters, representing the input and output
        // types. Here the input is a String, because each element is a line of text, and
        // the output is also a String, because we emit each word of the line. In short,
        // the flatMap operator splits one element of an RDD into one or more elements.
        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {

            private static final long serialVersionUID = 1L;

            @Override
            public Iterable<String> call(String line) throws Exception {
                return Arrays.asList(line.split(" "));
            }

        });

        // Next, map each word to the format (word, 1); only in this form can the
        // occurrences of each word be summed with the word as the key.
        // mapToPair maps each element to an element of type Tuple2 (v1, v2). If you
        // remember tuples from Scala: yes, Tuple2 here is the Scala type containing two
        // values. The mapToPair operator is used together with PairFunction: its first
        // generic parameter is the input type, and the second and third are the types of
        // the first and second values of the output Tuple2. Likewise, JavaPairRDD's two
        // generic parameters are the types of the first and second values of its tuple
        // elements.
        JavaPairRDD<String, Integer> pairs = words.mapToPair(

                new PairFunction<String, String, Integer>() {

                    private static final long serialVersionUID = 1L;

                    @Override
                    public Tuple2<String, Integer> call(String word) throws Exception {
                        return new Tuple2<String, Integer>(word, 1);
                    }

                });

        // Next, count the occurrences of each word, using the word as the key.
        // Here the reduceByKey operator is used: for each key, a reduce operation is
        // applied to its values. For example, if the JavaPairRDD contains the elements
        // (hello, 1), (hello, 1), (hello, 1), (world, 1), the reduce operation combines
        // the first value with the second, then combines that result with the third, and
        // so on. For "hello" this means 1 + 1 = 2, then 2 + 1 = 3. The elements of the
        // returned JavaPairRDD are again tuples, where the first value is each key and
        // the second value is the reduced value for that key; after reduceByKey this is
        // exactly the number of occurrences of each word. (A standalone sketch of these
        // semantics appears after this listing.)
        JavaPairRDD<String, Integer> wordCounts = pairs.reduceByKey(

                new Function2<Integer, Integer, Integer>() {

                    private static final long serialVersionUID = 1L;

                    @Override
                    public Integer call(Integer v1, Integer v2) throws Exception {
                        return v1 + v2;
                    }

                });

        // So far we have counted the words by chaining several Spark operators. However,
        // flatMap, mapToPair, and reduceByKey are all transformation operations. A Spark
        // application consisting only of transformations never executes; there must be an
        // action operation, such as foreach, to trigger the execution of the program.
        wordCounts.foreach(new VoidFunction<Tuple2<String, Integer>>() {

            private static final long serialVersionUID = 1L;

            @Override
            public void call(Tuple2<String, Integer> wordCount) throws Exception {
                System.out.println(wordCount._1 + " appeared " + wordCount._2 + " times.");
            }

        });

        sc.close();
    }

}
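The anonymous inner classes above are the classic Spark 1.x Java API style, where FlatMapFunction.call returns an Iterable. As a minimal sketch, assuming Spark 2.x and Java 8 (where these function interfaces accept lambdas and flatMap's function returns an Iterator instead), the same pipeline shrinks to a few lines; it reuses the `lines` RDD and the imports from the listing above:

        // Sketch only: assumes Spark 2.x (flatMap's function returns an Iterator) and Java 8.
        JavaRDD<String> words = lines.flatMap(line -> Arrays.asList(line.split(" ")).iterator());
        JavaPairRDD<String, Integer> pairs = words.mapToPair(word -> new Tuple2<>(word, 1));
        JavaPairRDD<String, Integer> wordCounts = pairs.reduceByKey((v1, v2) -> v1 + v2);
        wordCounts.foreach(wc -> System.out.println(wc._1 + " appeared " + wc._2 + " times."));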
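To make the reduceByKey walkthrough in the comments concrete, here is a minimal sketch using the same (hello, 1) / (world, 1) example data. It reuses the `sc` context from the listing; parallelizePairs builds a JavaPairRDD from an in-memory list, and collectAsMap brings the result back to the driver:

        // Minimal sketch: reduceByKey folds the values of each key pairwise.
        JavaPairRDD<String, Integer> demo = sc.parallelizePairs(Arrays.asList(
                new Tuple2<String, Integer>("hello", 1),
                new Tuple2<String, Integer>("hello", 1),
                new Tuple2<String, Integer>("hello", 1),
                new Tuple2<String, Integer>("world", 1)));
        // For "hello": 1 + 1 = 2, then 2 + 1 = 3; "world" keeps its single 1.
        java.util.Map<String, Integer> counts = demo.reduceByKey(
                new Function2<Integer, Integer, Integer>() {
                    private static final long serialVersionUID = 1L;
                    @Override
                    public Integer call(Integer v1, Integer v2) throws Exception {
                        return v1 + v2;
                    }
                }).collectAsMap();
        // counts now contains {hello=3, world=1}.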
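Finally, the listing hard-codes setMaster("local") and a local Windows path because it is a local test. As a sketch of what would change when running against a real cluster (the HDFS path below is a placeholder, not a value from this article), the master URL is typically left out of the code and supplied when the jar is submitted:

        // Hypothetical cluster variant; the HDFS path is a placeholder.
        SparkConf conf = new SparkConf()
                .setAppName("WordCountCluster");
        // No setMaster() call here: when the jar is submitted with spark-submit,
        // the master URL is usually passed on the command line via --master.
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> lines = sc.textFile("hdfs://namenode:9000/user/spark/spark.txt");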