How to build your first Spark project


Environment preparation

Local environment
    1. Operating system
      Windows 7 / macOS
    2. IDE
      IntelliJ IDEA Community Edition 14.1.6
    3. JDK 1.8.0_65
    4. Scala 2.11.7
Other environment
    1. Spark: 1.4.1
    2. Hadoop YARN: hadoop-2.5.0-cdh5.3.2
Creating the project in the IDE
    1. New Project
    2. Create a Scala project using a Maven archetype
    3. Fill in your own GroupId and ArtifactId; the Version does not need to be modified. Maven will generate the corresponding directory structure from the GroupId, whose value generally follows an a.b.c structure, while the ArtifactId is the project name. Then click Next, fill in the project name and directory, and click Finish to let Maven create the Scala project for you.

      After project creation completes, Maven generates the standard directory structure, with source code under src/main/scala.

      4. Add the JDK and Scala SDK to the project
      Click File -> Project Structure and configure the project's environment under SDKs and Global Libraries.

      At this point the project structure and environment have been fully set up. The same project skeleton can also be generated from the command line, as sketched below.
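
For reference, here is a minimal command-line sketch of the same step. It is not taken from the original article: the archetype coordinates (net.alchim31.maven:scala-archetype-simple) are an assumption, and the GroupId/ArtifactId/Version values are inferred from the class and jar names used later in this article.

# Generate a Scala project skeleton non-interactively (assumed archetype)
mvn archetype:generate \
  -DarchetypeGroupId=net.alchim31.maven \
  -DarchetypeArtifactId=scala-archetype-simple \
  -DgroupId=huochen.spark.example \
  -DartifactId=sparkExample \
  -Dversion=1.0-SNAPSHOT \
  -DinteractiveMode=false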
Writing the main function

The main function lives under projectname/src/main/scala/.../. If the project was created following the steps above, you will find the following two files at the end of that directory:

MyRouteBuild
MyRouteMain

These two files are template files generated by the archetype. Delete MyRouteBuild and rename MyRouteMain to DirectKafkaWordCount. Here I use the official Spark Streaming example as the sample code, which is as follows:

package org.apache.spark.examples.streaming

import kafka.serializer.StringDecoder

import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka._
import org.apache.spark.SparkConf

object DirectKafkaWordCount {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println("...")
      System.exit(1)
    }

    // StreamingExamples.setStreamingLogLevels()

    val Array(brokers, topics) = args

    // Create context with a 2-second batch interval
    val sparkConf = new SparkConf().setAppName("DirectKafkaWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Create direct Kafka stream with brokers and topics
    val topicsSet = topics.split(",").toSet
    val kafkaParams = Map[String, String]("metadata.broker.list" -> brokers)
    val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topicsSet)

    // Get the lines, split them into words, count the words and print
    val lines = messages.map(_._2)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1L)).reduceByKey(_ + _)
    wordCounts.print()

    // Start the computation
    ssc.start()
    ssc.awaitTermination()
  }
}

Replace the package org.apache.spark.examples.streaming line at the top of this code with the package declaration already present in your own DirectKafkaWordCount file, then overwrite the contents of the DirectKafkaWordCount file with the code.
At this point the Spark processing code is complete.
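
Before packaging, you may want to smoke-test the logic directly from the IDE. The variant below is a sketch, not part of the original article: it assumes a Kafka 0.8 broker at localhost:9092 and a topic named test, and the package name huochen.spark.example is simply the one used in the submit command later on.

// Hypothetical local-mode variant for quick testing from the IDE.
package huochen.spark.example

import kafka.serializer.StringDecoder

import org.apache.spark.SparkConf
import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka._

object DirectKafkaWordCountLocal {
  def main(args: Array[String]) {
    // local[2]: a streaming job needs at least two local threads
    val sparkConf = new SparkConf()
      .setAppName("DirectKafkaWordCountLocal")
      .setMaster("local[2]")
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Assumed broker address and topic; adjust to your Kafka setup
    val kafkaParams = Map[String, String]("metadata.broker.list" -> "localhost:9092")
    val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("test"))

    // Same word count as above, printed every 2-second batch
    messages.map(_._2)
      .flatMap(_.split(" "))
      .map((_, 1L))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}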

Modify pom.xml to prepare the project for packaging

pom.xml declares the dependencies of the entire project; here we need to add the Spark Streaming related packages.

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.4.1</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-kafka_2.10</artifactId>
  <version>1.4.1</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_2.10</artifactId>
  <version>1.4.1</version>
</dependency>
<!-- Scala -->
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-library</artifactId>
  <version>2.10.4</version>
</dependency>
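
An optional refinement not mentioned in the original article: since the cluster already provides the Spark runtime, the core Spark dependencies can be marked with the provided scope so they are excluded from the assembled jar, which keeps it much smaller. For example:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.4.1</version>
  <!-- supplied by the cluster at runtime; excluded from the fat jar -->
  <scope>provided</scope>
</dependency>

Note that spark-streaming-kafka is not part of the standard Spark assembly, so it must still reach the cluster, either inside the fat jar or via --jars as done below.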

In addition, if you need to package the dependencies into the final JAR, write the following configuration in the build tag of pom.xml:

<plugins>
  <!-- Plugin to create a single jar that includes all dependencies -->
  <plugin>
    <artifactId>maven-assembly-plugin</artifactId>
    <version>2.4</version>
    <configuration>
      <descriptorRefs>
        <descriptorRef>jar-with-dependencies</descriptorRef>
      </descriptorRefs>
    </configuration>
    <executions>
      <execution>
        <id>make-assembly</id>
        <phase>package</phase>
        <goals>
          <goal>single</goal>
        </goals>
      </execution>
    </executions>
  </plugin>
  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-compiler-plugin</artifactId>
    <version>2.0.2</version>
    <configuration>
      <source>1.7</source>
      <target>1.7</target>
    </configuration>
  </plugin>
  <plugin>
    <groupId>net.alchim31.maven</groupId>
    <artifactId>scala-maven-plugin</artifactId>
    <executions>
      <execution>
        <id>scala-compile-first</id>
        <phase>process-resources</phase>
        <goals>
          <goal>add-source</goal>
          <goal>compile</goal>
        </goals>
      </execution>
      <execution>
        <id>scala-test-compile</id>
        <phase>process-test-resources</phase>
        <goals>
          <goal>testCompile</goal>
        </goals>
      </execution>
    </executions>
  </plugin>
</plugins>

Once pom.xml has been modified, you are ready to package and run with Maven:

In the Maven panel on the right, open the Execute Maven Goal window and enter clean package on the command line.
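
Equivalently, run the same goal from a terminal in the project root (assuming mvn is on your PATH):

# Clean previous build output, compile, run tests, and build the jars into target/
mvn clean package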

Spark Job Submission

Two jar packages can be found under the projectname/target directory: one containing only the Scala code, and one with the jar-with-dependencies suffix containing all the dependencies as well.
Upload the jar with dependencies to the Spark server and run the Spark job with the following command:

../bin/spark-submit --master yarn-client \
  --jars ./lib/kafka_2.10-0.8.2.1.jar \
  --class huochen.spark.example.DirectKafkaWordCount \
  sparkExample-1.0-SNAPSHOT-jar-with-dependencies.jar kafka-broker topic

Once spark-submit has submitted the task to the YARN cluster, you can watch the job run and see its results.
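
The last two arguments are passed to the main function as brokers and topics. As an illustration (not actual captured output), each 2-second batch printed by wordCounts.print() looks roughly like this, with counts depending on what is being written to the topic:

-------------------------------------------
Time: 1445151234000 ms
-------------------------------------------
(hello,3)
(spark,1)
(world,2)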
