Questions Guide:
1. What is SBT?
2. How is the SBT project environment built?
3. How do you use SBT to compile and package Scala code?
SBT Introduction
SBT is a build tool that plays the same role in the Scala world that Maven (mvn) plays for Java. It can compile Scala, Java and so on, and requires Java 1.6 or above.
SBT Project Environment Setup
SBT compilation requires a fixed directory layout and network access, because SBT downloads the dependent jar packages into the .ivy2 directory under the user's home directory. The directory structure is as follows:
|--build.sbt
|--lib
|--project
|--src
|   |--main
|   |   |--scala
|   |--test
|       |--scala
|--sbt
|--target
Create the directory layout above as follows:
mkdir -p ~/spark_wordcount/lib
mkdir -p ~/spark_wordcount/project
mkdir -p ~/spark_wordcount/src/main/scala
mkdir -p ~/spark_wordcount/src/test/scala
mkdir -p ~/spark_wordcount/target
Then copy the SBT script and the SBT launcher jar from the sbt directory of the Spark installation:
cp /path/to/spark/sbt/sbt* ~/spark_wordcount/
Because Spark's SBT script looks for the launcher jar in the ./sbt directory by default, and the jar now sits next to the script, change
JAR=sbt/sbt-launch-${SBT_VERSION}.jar
to
JAR=sbt-launch-${SBT_VERSION}.jar
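The same edit can be applied without opening an editor. The following is a minimal sketch, assuming the copied script lives at ~/spark_wordcount/sbt and contains the assignment shown above; patch_sbt_launcher_path is a hypothetical helper name, not part of Spark or SBT.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite "=sbt/sbt-launch-" to "=sbt-launch-" in the
# given script, keeping a backup copy with a .bak suffix.
patch_sbt_launcher_path() {
  local script="$1"
  sed -i.bak 's|=sbt/sbt-launch-|=sbt-launch-|' "$script"
}

# Usage (assumed location of the copied script):
# patch_sbt_launcher_path ~/spark_wordcount/sbt
```

The pattern only matches the part of the assignment after the variable name, so it does not depend on how the variable itself is spelled in your copy of the script.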
Copy Spark's assembly jar into the project's lib directory:
cp /path/to/spark/assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar ~/spark_wordcount/lib/
Create the build.sbt configuration file. Each setting must be separated from the next by a blank line:
name := "WordCount"

version := "1.0.0"

scalaVersion := "2.10.3"
Since Spark's SBT script reads the SBT version number from the project/build.properties file, create that file with the following content:
sbt.version=0.12.4
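The setup steps so far (directories, build.sbt and project/build.properties) can be collected into one script. This is only a sketch under the assumptions of this article: scaffold_project is a hypothetical helper name, and the project root is passed as an argument rather than hard-coded. It does not copy the SBT script or the Spark jars.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the SBT project skeleton described above.
scaffold_project() {
  local root="$1"
  mkdir -p "$root/lib" "$root/project" \
           "$root/src/main/scala" "$root/src/test/scala" "$root/target"

  # build.sbt: settings separated by blank lines
  cat > "$root/build.sbt" <<'EOF'
name := "WordCount"

version := "1.0.0"

scalaVersion := "2.10.3"
EOF

  # project/build.properties: the SBT version read by Spark's sbt script
  echo "sbt.version=0.12.4" > "$root/project/build.properties"
}

# Usage:
# scaffold_project ~/spark_wordcount
```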
Writing and Compiling the Spark WordCount Program
Create a WordCount.scala source file, assuming the package is spark.example:
mkdir -p ~/spark_wordcount/src/main/scala/spark/example
vi ~/spark_wordcount/src/main/scala/spark/example/WordCount.scala
Add the program code and save:
package spark.example

import org.apache.spark._
import SparkContext._

object WordCount {
  def main(args: Array[String]) {
    // Command line parameter count check: args(0) is the master,
    // args(1) the input path and args(2) the output path
    if (args.length < 3) {
      System.err.println("Usage: spark.example.WordCount <master> <input> <output>")
      System.exit(1)
    }
    // Use the HDFS file system
    val hdfsPathRoot = "hdfs://hdfshost:9000"
    // Instantiate the Spark context
    val spark = new SparkContext(args(0), "WordCount",
      System.getenv("SPARK_HOME"), SparkContext.jarOfClass(this.getClass))
    // Read the input file
    val inputFile = spark.textFile(hdfsPathRoot + args(1))
    // Perform the word count:
    // flatMap splits each line into words on spaces,
    // map emits a (word, 1) tuple for every word,
    // reduceByKey adds up the counts of identical words
    val countResult = inputFile.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    // Write the WordCount results to the specified directory
    countResult.saveAsTextFile(hdfsPathRoot + args(2))
  }
}
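To sanity-check the job on a small input, the same split-and-count logic can be reproduced locally with standard Unix tools. This is a sketch outside Spark and not part of the original program; local_wordcount is a hypothetical helper name. Like the program it splits on spaces only, but it drops empty tokens, so results may differ from the Spark job for lines with repeated spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "word count" pairs for a local text file.
local_wordcount() {
  tr ' ' '\n' < "$1" |      # one token per line (split on spaces)
    grep -v '^$' |          # drop empty tokens
    sort |                  # bring identical words together
    uniq -c |               # count each run of identical words
    awk '{print $2, $1}'    # reorder to "word count"
}

# Usage (assumed local copy of the input file):
# local_wordcount ./TestWordCount.txt
```

Here tr plays the part of the flatMap step, and sort with uniq -c plays the part of map followed by reduceByKey.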
Change to the spark_wordcount directory and compile:
cd ~/spark_wordcount/
./sbt compile
Build the jar package:
./sbt package
During the build, SBT needs to download the tools it depends on (jna, scala and so on). Once compilation is complete, the packaged jar can be found in the target/scala-2.10/ directory:
# pwd
/usr/local/hadoop/spark_wordcount/target/scala-2.10
# ls
cache  classes  wordcount_2.10-1.0.0.jar
WordCount Execution
You can follow the method for running Spark distributed on YARN (on Hadoop) and write an execution script:
#!/usr/bin/env bash

SPARK_JAR=./assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar \
./bin/spark-class org.apache.spark.deploy.yarn.Client \
--jar ~/spark_wordcount/target/scala-2.10/wordcount_2.10-1.0.0.jar \
--class spark.example.WordCount \
--args yarn-standalone \
--args /TestWordCount.txt \
--args /ResultWordCount \
--num-workers 3 \
--master-memory 4g \
--worker-memory 2g \
--worker-cores 2
Then copy a file named TestWordCount.txt into HDFS:
hdfs dfs -copyFromLocal ./TestWordCount.txt /TestWordCount.txt
Run the script, and the results should appear in the output directory after a minute or so.
Build Spark's WordCount program with SBT