Build Spark's WordCount program with SBT

Last Update:2015-04-07 Source: Internet

Author: User


Questions Guide:
1. What is SBT?
2. How is the SBT project environment built?
3. How do you use SBT to compile and package Scala code?

SBT Introduction
SBT is a build tool that plays the same role in the Scala world that Maven (mvn) plays for Java. It can compile Scala, Java, and other code, and it requires Java 1.6 or later.

SBT Project Environment Establishment
SBT expects a fixed directory layout and needs network access: it downloads dependency jars into the .ivy2 directory under the user's home directory. The directory structure is as follows:

    |--build.sbt
    |--lib
    |--project
    |--src
    |   |--main
    |   |    |--scala
    |   |--test
    |        |--scala
    |--sbt
    |--target

Create the directories above as follows:

    mkdir -p ~/spark_wordcount/lib
    mkdir -p ~/spark_wordcount/project
    mkdir -p ~/spark_wordcount/src/main/scala
    mkdir -p ~/spark_wordcount/src/test/scala
    mkdir -p ~/spark_wordcount/target
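A quick way to confirm the layout is complete before running SBT is a small check like the one below. This is only a sketch: the function name `check_layout` and the scratch directory are illustrative assumptions, and the directory list simply mirrors the tree described above.

```shell
# Hypothetical helper: report which of the expected project directories are missing.
check_layout() {
  root="$1"
  missing=0
  for dir in lib project src/main/scala src/test/scala target; do
    if [ ! -d "$root/$dir" ]; then
      echo "missing: $dir"
      missing=1
    fi
  done
  return "$missing"
}

# Demo against a scratch directory so the real project is left untouched.
demo_root="$(mktemp -d)"
for dir in lib project src/main/scala src/test/scala target; do
  mkdir -p "$demo_root/$dir"
done
check_layout "$demo_root" && echo "layout ok"
```

For the real project, the same function could be called as `check_layout ~/spark_wordcount`.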

Then copy the sbt script and the sbt launcher jar from the sbt directory of the Spark installation:

    cp /path/to/spark/sbt/sbt* ~/spark_wordcount/

Because Spark's sbt script looks for the launcher jar in the ./sbt directory by default, and the jar now sits next to the script, change the line

    jar=sbt/sbt-launch-${sbt_version}.jar

to

    jar=sbt-launch-${sbt_version}.jar
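The same edit can be scripted instead of made by hand. The sketch below runs on a scratch file holding a single stand-in line rather than on the real script; the two `sed` expressions are an assumption to cover the variable being written in either lower or upper case in the copy of the script at hand.

```shell
# Demo on a scratch copy: a one-line stand-in for the launcher script.
demo_dir="$(mktemp -d)"
printf '%s\n' 'jar=sbt/sbt-launch-${sbt_version}.jar' > "$demo_dir/sbt"

# Drop the leading "sbt/" from the jar path. Writing to a temp file and moving
# it into place avoids the differing `sed -i` syntax on GNU and BSD systems.
sed -e 's|^jar=sbt/sbt-launch-|jar=sbt-launch-|' \
    -e 's|^JAR=sbt/sbt-launch-|JAR=sbt-launch-|' \
    "$demo_dir/sbt" > "$demo_dir/sbt.tmp"
mv "$demo_dir/sbt.tmp" "$demo_dir/sbt"

# The rewritten file is a new file, so restore the executable bit.
chmod +x "$demo_dir/sbt"

cat "$demo_dir/sbt"
```

Pointing `demo_dir` at `~/spark_wordcount` and skipping the `printf` line would apply the change to the copied script itself.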

Copy Spark's assembly jar into the project's lib directory:

    cp /path/to/spark/assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar ~/spark_wordcount/lib/

Create the build.sbt configuration file. Each setting must be separated from the next by a blank line:

    name := "WordCount"

    version := "1.0.0"

    scalaVersion := "2.10.3"
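One way to create the file without opening an editor is a heredoc, which also makes the blank separator lines hard to forget. This is a sketch that writes into a scratch directory (an assumption for the demo); the three settings are the ones listed above.

```shell
demo_dir="$(mktemp -d)"

# The quoted 'EOF' keeps the shell from expanding anything inside the heredoc.
cat > "$demo_dir/build.sbt" <<'EOF'
name := "WordCount"

version := "1.0.0"

scalaVersion := "2.10.3"
EOF

cat "$demo_dir/build.sbt"
```

For the real project, the target would be `~/spark_wordcount/build.sbt`.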

Because Spark's sbt script reads the SBT version number from the project's build.properties file, create project/build.properties with the following content:

sbt.version=0.12.4
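To make the role of build.properties concrete, here is a hedged sketch of how a launcher script could pick the version out of that file and derive the launcher jar name from it. The `awk` one-liner and the scratch directory are illustrative assumptions, not a copy of Spark's script.

```shell
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/project"
echo 'sbt.version=0.12.4' > "$demo_dir/project/build.properties"

# Split each line on "=" and print the value from the sbt.version line.
sbt_version="$(awk -F '=' '/^sbt\.version/ {print $2}' "$demo_dir/project/build.properties")"

# The version then selects which launcher jar to run.
echo "sbt-launch-${sbt_version}.jar"
# prints: sbt-launch-0.12.4.jar
```

If the file is missing or lacks the `sbt.version` line, `sbt_version` ends up empty and the jar name no longer matches any file.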

Spark WordCount Program Authoring and Compiling
Create a WordCount.scala source file, assuming the package is spark.example:

    mkdir -p ~/spark_wordcount/src/main/scala/spark/example
    vi ~/spark_wordcount/src/main/scala/spark/example/WordCount.scala

Add the program code and save:

    package spark.example

    import org.apache.spark._
    import SparkContext._

    object WordCount {
      def main(args: Array[String]) {
        // Check the command line arguments: master, input path, output path
        if (args.length < 3) {
          System.err.println("Usage: spark.example.WordCount <master> <input> <output>")
          System.exit(1)
        }
        // Use the HDFS file system; hdfshost stands for the NameNode host
        val hdfsPathRoot = "hdfs://hdfshost:9000"
        // Instantiate the Spark context
        val spark = new SparkContext(args(0), "WordCount",
          System.getenv("SPARK_HOME"), SparkContext.jarOfClass(this.getClass))
        // Read the input file
        val inputFile = spark.textFile(hdfsPathRoot + args(1))
        // Perform the word count:
        // flatMap splits each line into words on spaces,
        // map emits a (word, 1) tuple for each word,
        // reduceByKey sums the counts of identical words
        val countResult = inputFile.flatMap(line => line.split(" "))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
        // Write the word count results to the specified directory
        countResult.saveAsTextFile(hdfsPathRoot + args(2))
      }
    }
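The three stages of the job can be tried on a small input without a cluster. The shell pipeline below is only an analogy, not Spark code: it splits lines into words, groups identical words, and counts them, which is roughly what `flatMap`, `map`, and `reduceByKey` do in the program. The sample text is made up for illustration.

```shell
# Sample input (made up for illustration).
input='to be or
not to be'

# tr            : split each line on spaces, one word per line (like flatMap)
# sort, uniq -c : group identical words and count them (like map + reduceByKey)
# awk           : print "word count" pairs
counts="$(printf '%s\n' "$input" | tr -s ' ' '\n' | LC_ALL=C sort | uniq -c | awk '{print $2, $1}')"

printf '%s\n' "$counts"
# prints:
# be 2
# not 1
# or 1
# to 2
```

Running the same pipeline over the real input file gives a reference result to compare against the job's output on small data sets.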

Change to the spark_wordcount directory and compile:

    cd ~/spark_wordcount/
    ./sbt compile

Build the jar package:

    ./sbt package

During the build, SBT needs to download its dependencies, such as JNA and Scala. Once compilation is complete, the packaged jar can be found in the target/scala-2.10/ directory.

    # pwd
    /usr/local/hadoop/spark_wordcount/target/scala-2.10
    # ls
    cache  classes  wordcount_2.10-1.0.0.jar
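The jar name in the listing follows from the three settings in build.sbt: the lower-cased project name, the major.minor part of the Scala version, and the project version. The helper below is a sketch of that pattern for simple names like the one used here; `expected_jar` is a hypothetical function, and SBT's own naming rules cover more cases than this.

```shell
# Hypothetical helper: predict the jar name from the three build.sbt settings.
expected_jar() {
  name="$1"
  version="$2"
  scala_version="$3"
  # Lower-case the project name, e.g. WordCount -> wordcount.
  module="$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')"
  # Keep only major.minor of the Scala version, e.g. 2.10.3 -> 2.10.
  scala_binary="$(printf '%s' "$scala_version" | cut -d. -f1,2)"
  printf '%s_%s-%s.jar\n' "$module" "$scala_binary" "$version"
}

expected_jar "WordCount" "1.0.0" "2.10.3"
# prints: wordcount_2.10-1.0.0.jar
```

This is handy when a run script needs the jar path and the version in build.sbt changes.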

WordCount Execution
Following the method for running Spark on YARN with Hadoop, write an execution script:

    #!/usr/bin/env bash

    SPARK_JAR=./assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar \
        ./bin/spark-class org.apache.spark.deploy.yarn.Client \
        --jar ~/spark_wordcount/target/scala-2.10/wordcount_2.10-1.0.0.jar \
        --class spark.example.WordCount \
        --args yarn-standalone \
        --args /TestWordCount.txt \
        --args /ResultWordCount \
        --num-workers 3 \
        --master-memory 4g \
        --worker-memory 2g \
        --worker-cores 2

Then copy a file named TestWordCount.txt into HDFS:

    hdfs dfs -copyFromLocal ./TestWordCount.txt /TestWordCount.txt
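Any plain-text file works as input. If none is at hand, a sketch like the following creates a tiny one that can then be uploaded with the command above. The file contents and the scratch directory are made up for illustration.

```shell
demo_dir="$(mktemp -d)"

# Three short lines; "hello" appears on every line, so its count should be 3.
cat > "$demo_dir/TestWordCount.txt" <<'EOF'
hello spark
hello sbt
hello hadoop
EOF

echo "sample input written to $demo_dir/TestWordCount.txt"
```

With this input, the job's output should contain a count of 3 for `hello` and 1 for each of the other words.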

Run the script; after a minute or so the results should appear in the /ResultWordCount directory on HDFS.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
