How to build your first Spark project code
Environment preparation
Local environment
- Operating system: Windows 7 / Mac
- IDE: IntelliJ IDEA Community Edition 14.1.6
- JDK 1.8.0_65
- Scala 2.11.7
Other environment
- Spark: 1.4.1
- Hadoop YARN: hadoop 2.5.0-cdh5.3.2
Create a new project in the IDE
1. New Project
2. Create a Scala project using the Maven model
3. Fill in your own groupId and artifactId; the version does not need to be changed. Maven generates the corresponding directory structure from the groupId, whose value usually follows an a.b.c pattern, while the artifactId is the project name. Click Next, fill in the project name and directory, then click Finish and let Maven create the Scala project for you.
After project creation is complete, the directory structure is as follows
4. Add JDK and Scala SDK to project
Click File -> Project Structure and configure the project's environment under SDKs and Global Libraries.
At this point the project structure and the project environment have been set up.
Writing the main function
The main function goes under projectname/src/main/scala/.../. If the project was created following the steps above, you will find two files at the end of that directory:

MyRouteBuild
MyRouteMain

These two files are template (module) files. Delete MyRouteBuild and rename MyRouteMain to DirectKafkaWordCount. Here I use the example code officially provided with Spark Streaming; the code is as follows:
package org.apache.spark.examples.streaming

import kafka.serializer.StringDecoder

import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka._
import org.apache.spark.SparkConf

object DirectKafkaWordCount {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println("...")
      System.exit(1)
    }

    // StreamingExamples.setStreamingLogLevels()

    val Array(brokers, topics) = args
    val sparkConf = new SparkConf().setAppName("DirectKafkaWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Create direct Kafka stream with brokers and topics
    val topicsSet = topics.split(",").toSet
    val kafkaParams = Map[String, String]("metadata.broker.list" -> brokers)
    val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topicsSet)

    // Get the lines, split them into words, count the words and print
    val lines = messages.map(_._2)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1L)).reduceByKey(_ + _)
    wordCounts.print()

    // Start the computation
    ssc.start()
    ssc.awaitTermination()
  }
}
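The heart of the job is the flatMap / map / reduceByKey word-count pipeline applied to the Kafka message values. As a minimal sketch of that same logic on plain Scala collections (no Spark required; WordCountSketch and the sample input lines are made up for illustration only), it works like this:

// Minimal sketch: the same word-count logic as the streaming job above,
// applied to plain Scala collections instead of a DStream.
object WordCountSketch {
  def main(args: Array[String]): Unit = {
    val lines = Seq("hello spark", "hello kafka")       // sample input lines
    val words = lines.flatMap(_.split(" "))             // split lines into words
    val wordCounts = words
      .map(x => (x, 1L))                                // pair each word with a count of 1
      .groupBy(_._1)                                    // group pairs by word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }  // sum counts per word
    wordCounts.foreach(println)                         // e.g. (hello,2), (spark,1), (kafka,1)
  }
}

In the streaming job, reduceByKey performs the same per-word summation, but on each micro-batch of the DStream.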
Replace the package org.apache.spark.examples.streaming line at the top of the code with the package declaration from your own DirectKafkaWordCount file, then overwrite the contents of the DirectKafkaWordCount file with this code.
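As a sketch of what that swap looks like (the package name huochen.spark.example is simply the one used in the spark-submit command later in this walkthrough; use the package generated from your own groupId):

// Before: the official example's package
// package org.apache.spark.examples.streaming

// After: your own project's package (here huochen.spark.example, matching the
// --class argument used in the spark-submit command below)
package huochen.spark.example

// ... the rest of the DirectKafkaWordCount code stays unchanged ...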
At this point the Spark processing code has been written to completion.
Modify pom.xml to prepare the project for packaging
pom.xml declares the dependencies for the entire project; here we need to add the Spark Streaming related packages:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.4.1</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-kafka_2.10</artifactId>
  <version>1.4.1</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_2.10</artifactId>
  <version>1.4.1</version>
</dependency>
<!-- Scala -->
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-library</artifactId>
  <version>2.10.4</version>
</dependency>
In addition, if you need to package the dependencies into the final JAR, you need to add the following configuration inside the build tag of pom.xml:
<plugins>
  <!-- Plugin to create a single jar that includes all dependencies -->
  <plugin>
    <artifactId>maven-assembly-plugin</artifactId>
    <version>2.4</version>
    <configuration>
      <descriptorRefs>
        <descriptorRef>jar-with-dependencies</descriptorRef>
      </descriptorRefs>
    </configuration>
    <executions>
      <execution>
        <id>make-assembly</id>
        <phase>package</phase>
        <goals>
          <goal>single</goal>
        </goals>
      </execution>
    </executions>
  </plugin>
  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-compiler-plugin</artifactId>
    <version>2.0.2</version>
    <configuration>
      <source>1.7</source>
      <target>1.7</target>
    </configuration>
  </plugin>
  <plugin>
    <groupId>net.alchim31.maven</groupId>
    <artifactId>scala-maven-plugin</artifactId>
    <executions>
      <execution>
        <id>scala-compile-first</id>
        <phase>process-resources</phase>
        <goals>
          <goal>add-source</goal>
          <goal>compile</goal>
        </goals>
      </execution>
      <execution>
        <id>scala-test-compile</id>
        <phase>process-test-resources</phase>
        <goals>
          <goal>testCompile</goal>
        </goals>
      </execution>
    </executions>
  </plugin>
</plugins>
Once pom.xml has been modified, you are ready to package and run the project with Maven:
In the Maven window on the right, click Execute Maven Goal and enter clean package in the command line field.
Spark Job Submission
Two JAR packages can be found under projectname/target in the project directory: one contains only the Scala code, and the other contains all the dependencies.
Upload the JAR package to the Spark server and run the Spark job with the following command:
../bin/spark-submit --master yarn-client --jars ./lib/kafka_2.10-0.8.2.1.jar --class huochen.spark.example.DirectKafkaWordCount sparkExample-1.0-SNAPSHOT-jar-with-dependencies.jar kafka-broker topic
Once the task has been submitted to the YARN cluster with spark-submit, you can see the results of the run. The last two arguments, kafka-broker and topic, are the brokers and topics parameters read by the main function.