Build Spark's WordCount program with SBT

Source: Internet
Author: User

Questions Guide:
1. What is SBT?
2. How is the SBT project environment set up?
3. How do you use SBT to compile and package Scala code?

SBT Introduction
SBT is a build tool that plays the same role in the Scala world that Maven (mvn) plays for Java. It can compile Scala, Java, and so on, and requires Java 1.6 or later.

SBT Project Environment Establishment
SBT requires a fixed directory layout and network access. It downloads dependent jar packages into the .ivy2 directory under the user's home directory. The directory structure is as follows:

    |--build.sbt
    |--lib
    |--project
    |--src
    |   |--main
    |   |    |--scala
    |   |--test
    |        |--scala
    |--sbt
    |--target

Create the directories above as follows:

    mkdir -p ~/spark_wordcount/lib
    mkdir -p ~/spark_wordcount/project
    mkdir -p ~/spark_wordcount/src/main/scala
    mkdir -p ~/spark_wordcount/src/test/scala
    mkdir -p ~/spark_wordcount/target
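Before moving on, it can help to confirm that the layout is complete. The following is a minimal sketch, not part of the original setup: `check_layout` is a hypothetical helper name, and it only checks the five directories created above.

```shell
#!/bin/sh
# check_layout DIR (hypothetical helper): report every directory of the
# expected SBT layout that is missing under DIR.
# Returns 0 when all directories exist, 1 otherwise.
check_layout() {
    root=$1
    missing=0
    for d in lib project src/main/scala src/test/scala target; do
        if [ ! -d "$root/$d" ]; then
            echo "missing: $root/$d" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_layout ~/spark_wordcount` prints nothing and returns 0 once all directories are in place.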

Then copy the sbt script and the SBT launcher jar from the sbt directory of the Spark installation:

    cp /path/to/spark/sbt/sbt* ~/spark_wordcount/

Because Spark's sbt script looks for the launcher jar in the ./sbt directory by default, change the following line in the copied script

    JAR=sbt/sbt-launch-${SBT_VERSION}.jar

to

    JAR=sbt-launch-${SBT_VERSION}.jar
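The one-line change above can also be applied without opening an editor. This is a sketch under the assumption that the launcher path appears in the script exactly as `=sbt/sbt-launch-`; `patch_sbt_launcher_path` is a hypothetical helper name. It writes back through `cat` rather than `mv` so the script keeps its executable permission.

```shell
#!/bin/sh
# patch_sbt_launcher_path FILE (hypothetical helper): rewrite
# "=sbt/sbt-launch-" to "=sbt-launch-" in FILE, keeping FILE's permissions.
patch_sbt_launcher_path() {
    script=$1
    sed 's|=sbt/sbt-launch-|=sbt-launch-|' "$script" > "$script.tmp" &&
        cat "$script.tmp" > "$script" &&
        rm -f "$script.tmp"
}
```

Usage would be `patch_sbt_launcher_path ~/spark_wordcount/sbt`; it is worth checking the result with `grep sbt-launch ~/spark_wordcount/sbt` afterwards.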

Copy Spark's assembly jar into the project's lib directory:

    cp /path/to/spark/assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar ~/spark_wordcount/lib/

Create the build.sbt configuration file. Each setting must be separated from the next by a blank line:

    name := "WordCount"

    version := "1.0.0"

    scalaVersion := "2.10.3"

Because Spark's sbt script reads the SBT version number from the project's build.properties file, create project/build.properties with the following content:

    sbt.version=0.12.4
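Both configuration files can be written from the shell in one step. The sketch below assumes the project lives in a directory passed as the first argument; `write_build_files` is a hypothetical helper name, and the file contents are exactly the settings given above.

```shell
#!/bin/sh
# write_build_files DIR (hypothetical helper): create DIR/build.sbt and
# DIR/project/build.properties with the settings used in this article.
write_build_files() {
    project_dir=$1
    mkdir -p "$project_dir/project"
    # Quoted heredoc delimiter: the content is written literally,
    # including the blank lines between settings.
    cat > "$project_dir/build.sbt" <<'EOF'
name := "WordCount"

version := "1.0.0"

scalaVersion := "2.10.3"
EOF
    printf 'sbt.version=0.12.4\n' > "$project_dir/project/build.properties"
}
```

For this article's layout the call would be `write_build_files ~/spark_wordcount`.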

Writing and Compiling the Spark WordCount Program
Create a WordCount.scala source file, assuming the package is spark.example:

    mkdir -p ~/spark_wordcount/src/main/scala/spark/example
    vi ~/spark_wordcount/src/main/scala/spark/example/WordCount.scala

Add the following program code and save:

    package spark.example

    import org.apache.spark._
    import SparkContext._

    object WordCount {
      def main(args: Array[String]) {
        // Check the command-line arguments: master, input path, output path
        if (args.length < 3) {
          System.err.println("Usage: spark.example.WordCount <master> <input> <output>")
          System.exit(1)
        }
        // Root of the HDFS file system (placeholder host and port)
        val hdfsPathRoot = "hdfshost:9000"
        // Instantiate the Spark context; args(0) is the master
        val spark = new SparkContext(args(0), "WordCount",
          System.getenv("SPARK_HOME"), SparkContext.jarOfClass(this.getClass))
        // Read the input file
        val inputFile = spark.textFile(hdfsPathRoot + args(1))
        // Perform the word count: flatMap splits each line on spaces,
        // map emits a (word, 1) tuple with the count initialized to 1,
        // and reduceByKey adds up the counts of identical words
        val countResult = inputFile.flatMap(line => line.split(" "))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
        // Write the WordCount results to the specified directory
        countResult.saveAsTextFile(hdfsPathRoot + args(2))
      }
    }
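To cross-check the job's output on a small input, the same counting can be approximated locally. This is only a sketch, not the Spark job itself: `local_wordcount` is a hypothetical helper name, and because awk splits on runs of whitespace, it skips the empty tokens that `split(" ")` can produce for repeated spaces.

```shell
#!/bin/sh
# local_wordcount FILE (hypothetical helper): print "word count" pairs
# for FILE, sorted by word.
local_wordcount() {
    # The loop over fields plays the role of flatMap; incrementing
    # count[word] combines the map to (word, 1) and the reduceByKey sum.
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
         END { for (w in count) print w, count[w] }' "$1" | sort
}
```

Running `local_wordcount ./TestWordCount.txt` on the input file gives counts that can be compared with the job's results.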

Change to the spark_wordcount directory and compile:

    cd ~/spark_wordcount/
    ./sbt compile

Build the jar package:

    ./sbt package

During the build, SBT needs to download dependencies such as JNA and Scala. After the build completes, the packaged jar can be found in the target/scala-2.10/ directory.

    # pwd
    /usr/local/hadoop/spark_wordcount/target/scala-2.10
    # ls
    cache  classes  wordcount_2.10-1.0.0.jar
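The jar's file name follows from the settings in build.sbt: the lowercased project name, the Scala binary version (the first two components of the Scala version), and the project version. The sketch below is a simplified illustration of that naming, not sbt's actual implementation; `expected_jar_name` is a hypothetical helper name, and the only name normalization it handles is lowercasing.

```shell
#!/bin/sh
# expected_jar_name NAME VERSION SCALA_VERSION (hypothetical helper):
# print the jar file name expected under target/scala-<binary version>/.
expected_jar_name() {
    # Lowercase the project name (simplified normalization)
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    version=$2
    # Keep only major.minor of the Scala version, e.g. 2.10.3 becomes 2.10
    binary=$(printf '%s' "$3" | cut -d. -f1,2)
    printf '%s_%s-%s.jar\n' "$name" "$binary" "$version"
}
```

With this article's settings, `expected_jar_name WordCount 1.0.0 2.10.3` prints `wordcount_2.10-1.0.0.jar`, matching the listing above.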

WordCount execution
Following the approach for running Spark on YARN, write an execution script to be run from the Spark installation directory:

    #!/usr/bin/env bash

    SPARK_JAR=./assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar \
    ./bin/spark-class org.apache.spark.deploy.yarn.Client \
    --jar ~/spark_wordcount/target/scala-2.10/wordcount_2.10-1.0.0.jar \
    --class spark.example.WordCount \
    --args yarn-standalone \
    --args /TestWordCount.txt \
    --args /Resultwordcount \
    --num-workers 3 \
    --master-memory 4g \
    --worker-memory 2g \
    --worker-cores 2

Then copy a file named TestWordCount.txt into HDFS:

    hdfs dfs -copyFromLocal ./TestWordCount.txt /TestWordCount.txt

Execute the script and you'll see the results in a minute.
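Once the job finishes, the output directory holds part files whose lines are the text form of the result tuples, such as `(word,3)`. The sketch below lists the most frequent words from such lines read on standard input; `top_words` is a hypothetical helper name, and the `(word,count)` line format is an assumption based on how `saveAsTextFile` writes tuples.

```shell
#!/bin/sh
# top_words N (hypothetical helper): read "(word,count)" lines from stdin
# and print the N highest counts as "count word".
top_words() {
    # Turn "(word,count)" into "count word", then sort by count
    # descending and by word ascending for ties.
    sed 's/^(\(.*\),\([0-9][0-9]*\))$/\2 \1/' |
        sort -k1,1nr -k2,2 |
        head -n "$1"
}
```

Assuming the output directory used in the script, the results could be inspected with `hdfs dfs -cat /Resultwordcount/part-* | top_words 10`.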
