1. Import the Spark package spark-assembly-1.4.0-hadoop2.6.0.jar, found in the lib directory of the Spark installation:
File --> Project Structure
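(As an alternative to adding the assembly jar by hand, a minimal build.sbt along these lines would pull in the same Spark version through sbt; the names and Scala version here are illustrative, not from the original setup:)

name := "spark_scala"
version := "1.0"
scalaVersion := "2.10.4"
// "provided" because the cluster supplies the Spark classes at runtime
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.0" % "provided"

Running sbt package would then produce a jar similar to the one exported manually in step 4 below.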
2. Create a Scala project in IDEA and add a new WordCount object
3. The WordCount code is as follows:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object WordCount {
  def main(args: Array[String]) {
    if (args.length < 1) {
      System.err.println("Usage: <file>")
      System.exit(1)
    }
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val line = sc.textFile(args(0))
    // Split each line on spaces, pair each word with 1, sum the counts, and print the result on the driver
    line.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collect().foreach(println)
    sc.stop()
  }
}
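To make the expected behaviour concrete: given a hypothetical input file jia.txt containing

hello world
hello spark

the program splits each line on spaces and prints pairs such as (hello,2), (world,1), (spark,1) on the driver (the order may vary, since reduceByKey does not sort its output).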
4. Package the jar: IDEA --> Project Structure --> Artifacts --> click +
5. Fill in the output path; mine is the /home/jiahong/sparktest directory
6. Start the Spark cluster and go to http://localhost:8080/ to view the address of the Spark master node; mine is spark://jiahong-optiplex-7010:7077
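(On a standalone cluster, starting Spark typically means running the start script from the Spark directory; a sketch, assuming conf/slaves is already configured:)

jiahong@jiahong-OptiPlex-7010:~/spark-1.4.0-bin-hadoop2.6$ sbin/start-all.sh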
7. Finally, submit the jar package to Spark from the terminal:
jiahong@jiahong-OptiPlex-7010:~/spark-1.4.0-bin-hadoop2.6$ bin/spark-submit --master spark://jiahong-optiplex-7010:7077 --name spark_scala --class WordCount --executor-memory 1G --total-executor-cores 2 ~/sparktest/spark_scala.jar /home/jiahong/jia.txt
Go into the Spark installation directory, then use the spark-submit command to submit the jar package; if you do not understand the command above, run spark-submit --help to view the help
spark://jiahong-optiplex-7010:7077 The address of the master node
--name spark_scala The name of the exported jar package, used as the application name
--class WordCount The name of the word-count object (the main class)
--executor-memory 1G --total-executor-cores 2 Specify how much memory to execute with and how many CPU cores to execute on
~/sparktest/spark_scala.jar The location of the exported jar package
/home/jiahong/jia.txt The location of the input file whose word frequencies WordCount computes
That is how you submit a job to run on the Spark cluster.
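If you want the counts written to a file instead of printed on the driver, one small variation (my own sketch, not from the original post) replaces the collect().foreach(println) step with saveAsTextFile and takes the output directory as a second argument:

// writes part-* files to the output directory given in args(1); the directory must not already exist
line.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).saveAsTextFile(args(1))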