Spark Primer: WordCount in Detail

package cn.spark.study.core;

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.api.java.function.VoidFunction;

import scala.Tuple2;

/**
 * A WordCount program developed in Java and tested locally.
 * @author Administrator
 */
public class WordCountLocal {

    public static void main(String[] args) {
        // Write the Spark application.
        // With local execution, it can be run directly from the main method in Eclipse.

        // Step 1: Create a SparkConf object and set the configuration for the Spark application.
        // setMaster() sets the URL of the master node of the Spark cluster that the application
        // will connect to; if it is set to "local", the application runs locally.
        SparkConf conf = new SparkConf()
                .setAppName("WordCountLocal")
                .setMaster("local");

        // Step 2: Create a JavaSparkContext object.
        // In Spark, SparkContext is the entry point to all of Spark's functionality. Whether you
        // write in Java, Scala, or even Python, you must have a SparkContext. Its main role is to
        // initialize the core components a Spark application requires, including the schedulers
        // (DAGScheduler, TaskScheduler), and to register the application with the Spark master
        // node, and so on.
        // In a word, SparkContext is the single most important object in a Spark application.
        // However, different kinds of Spark applications use different contexts: in Scala you use
        // the native SparkContext object, but in Java it is the JavaSparkContext object.
        // If you are developing a Spark SQL program, it is SQLContext or HiveContext;
        // if you are developing a Spark Streaming program, it is that module's own
        // StreamingContext; and so on.
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Step 3: Create an initial RDD from an input source (an HDFS file, a local file, etc.).
        // The data in the input source is scattered across the partitions of the RDD, forming an
        // initial distributed dataset. Because this is a local test, we read from a local file.
        // SparkContext's method for creating an RDD from a file-type input source is textFile().
        // In Java, the RDD created is called a JavaRDD.
        // An RDD has the concept of elements: for an HDFS or local file, each element of the
        // created RDD corresponds to one line of the file.
        JavaRDD<String> lines = sc.textFile("C://Users//Administrator//Desktop//spark.txt");

        // Step 4: Apply transformation operations to the initial RDD, i.e., perform computations.
        // Typically this is done by creating a function and passing it to an RDD operator such as
        // map or flatMap. If the function is simple, it is created as an anonymous inner class of
        // the relevant function interface; if it is more complex, a separate class is created to
        // implement the function interface.

        // First, split each line into individual words.
        // FlatMapFunction takes two generic parameters, representing the input and output types.
        // Here the input is String, because each input element is a line of text, and the output
        // is also String, because each output element is a word from that line.
        // Briefly, the flatMap operator splits one element of the RDD into one or more elements.
        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {

            private static final long serialVersionUID = 1L;

            @Override
            public Iterable<String> call(String line) throws Exception {
                return Arrays.asList(line.split(" "));
            }

        });

        // Next, map each word to the form (word, 1); only then can the occurrence counts be
        // summed per word, using the word as the key.
        // mapToPair maps each element to an element of type Tuple2 (v1, v2). If you remember
        // tuples from Scala, then yes, the Tuple2 here is the Scala type that holds two values.
        // The mapToPair operator is used together with PairFunction: the first generic parameter
        // is the input type, and the second and third are the types of the first and second
        // values of the output Tuple2.
        // JavaPairRDD's two generic parameters are likewise the types of the first and second
        // values of its tuple elements.
        JavaPairRDD<String, Integer> pairs = words.mapToPair(

                new PairFunction<String, String, Integer>() {

                    private static final long serialVersionUID = 1L;

                    @Override
                    public Tuple2<String, Integer> call(String word) throws Exception {
                        return new Tuple2<String, Integer>(word, 1);
                    }

                });

        // Next, count the occurrences of each word, using the word as the key.
        // Here the reduceByKey operator is used: it applies a reduce operation to the values of
        // each key. For example, if the JavaPairRDD contains the elements (hello, 1) (hello, 1)
        // (hello, 1) (world, 1), the reduce operation combines the first value with the second,
        // then combines that result with the third value, and so on. For hello this means
        // 1 + 1 = 2, then 2 + 1 = 3.
        // Each element of the returned JavaPairRDD is also a tuple, where the first value is a
        // key and the second value is that key's reduced value, i.e. the number of occurrences
        // of each word.
        JavaPairRDD<String, Integer> wordCounts = pairs.reduceByKey(

                new Function2<Integer, Integer, Integer>() {

                    private static final long serialVersionUID = 1L;

                    @Override
                    public Integer call(Integer v1, Integer v2) throws Exception {
                        return v1 + v2;
                    }

                });

        // So far, we have counted the words by chaining several Spark operators.
        // However, flatMap, mapToPair, and reduceByKey are all so-called transformation
        // operations. A Spark application cannot consist of transformations alone; they are not
        // executed by themselves. There must be an operation of another kind, called an action,
        // such as foreach, to trigger the execution of the program.
        wordCounts.foreach(new VoidFunction<Tuple2<String, Integer>>() {

            private static final long serialVersionUID = 1L;

            @Override
            public void call(Tuple2<String, Integer> wordCount) throws Exception {
                System.out.println(wordCount._1 + " appeared " + wordCount._2 + " times.");
            }

        });

        sc.close();
    }
}
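The anonymous inner classes above reflect the classic pre-Java-8, Spark 1.x style of the Java API (where FlatMapFunction.call() returns an Iterable). As a point of comparison, here is a minimal sketch of the same program written with Java 8 lambdas, assuming Spark 2.x or later, where FlatMapFunction.call() returns an Iterator instead; the class name WordCountLocalLambda and the file path are illustrative, not part of the original program.

package cn.spark.study.core;

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

// Sketch only: the same WordCount with Java 8 lambdas, assuming Spark 2.x or later.
public class WordCountLocalLambda {

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("WordCountLocalLambda").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> lines = sc.textFile("C://Users//Administrator//Desktop//spark.txt");

        // Each anonymous inner class from the verbose version collapses to a lambda.
        // In Spark 2.x, flatMap expects an Iterator, hence the trailing .iterator().
        JavaRDD<String> words = lines.flatMap(line -> Arrays.asList(line.split(" ")).iterator());
        JavaPairRDD<String, Integer> pairs = words.mapToPair(word -> new Tuple2<>(word, 1));
        JavaPairRDD<String, Integer> wordCounts = pairs.reduceByKey((v1, v2) -> v1 + v2);

        // foreach is the action that triggers execution of the transformations above.
        wordCounts.foreach(wc -> System.out.println(wc._1 + " appeared " + wc._2 + " times."));

        sc.close();
    }
}

Either version behaves the same way: for a spark.txt containing the single line "hello world hello", the output would be along the lines of "hello appeared 2 times." and "world appeared 1 times." (the order of the printed lines is not guaranteed, since the counts are computed per partition). To run locally, only the spark-core dependency needs to be on the classpath.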
