How to build your first Spark project code
Environment preparation
Local environment
- Operating system: Windows 7 / Mac
- IDE: IntelliJ IDEA Community Edition 14.1.6
- JDK 1.8.0_65
- Scala 2.11.7
Other environment
- Spark: 1.4.1
- Hadoop YARN: hadoop 2.5.0-cdh5.3.2
Create a new project in the IDE
1. New Project
2. Create a Scala project using the Maven model
3. Fill in your own groupId and artifactId; the version does not need to be changed. Maven generates the corresponding directory structure from the groupId, whose value usually follows an a.b.c pattern, while the artifactId is the project name. Click Next, fill in the project name and directory, then click Finish and let Maven create the Scala project for you.
After project creation is complete, the directory structure is as follows
4. Add JDK and Scala SDK to project
Click File -> Project Structure and configure the project's environment under SDKs and Global Libraries.
At this point the project structure and the project environment have been set up.
Writing the main function
The main function goes under projectname/src/main/scala/.../. If the project was created following the steps above, you will find two files at the end of that directory:

MyRouteBuild
MyRouteMain

These two files are template (module) files. Delete MyRouteBuild and rename MyRouteMain to DirectKafkaWordCount. Here I use the example code officially provided with Spark Streaming; the code is as follows:
package org.apache.spark.examples.streaming

import kafka.serializer.StringDecoder

import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka._
import org.apache.spark.SparkConf

object DirectKafkaWordCount {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println("...")
      System.exit(1)
    }

    // StreamingExamples.setStreamingLogLevels()

    val Array(brokers, topics) = args
    val sparkConf = new SparkConf().setAppName("DirectKafkaWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Create direct Kafka stream with brokers and topics
    val topicsSet = topics.split(",").toSet
    val kafkaParams = Map[String, String]("metadata.broker.list" -> brokers)
    val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topicsSet)

    // Get the lines, split them into words, count the words and print
    val lines = messages.map(_._2)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1L)).reduceByKey(_ + _)
    wordCounts.print()

    // Start the computation
    ssc.start()
    ssc.awaitTermination()
  }
}
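The heart of the job is the flatMap / map / reduceByKey word-count pipeline applied to the Kafka message values. As a minimal sketch of that same logic on plain Scala collections (no Spark required; WordCountSketch and the sample input lines are made up for illustration only), it works like this:

// Minimal sketch: the same word-count logic as the streaming job above,
// applied to plain Scala collections instead of a DStream.
object WordCountSketch {
  def main(args: Array[String]): Unit = {
    val lines = Seq("hello spark", "hello kafka")       // sample input lines
    val words = lines.flatMap(_.split(" "))             // split lines into words
    val wordCounts = words
      .map(x => (x, 1L))                                // pair each word with a count of 1
      .groupBy(_._1)                                    // group pairs by word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }  // sum counts per word
    wordCounts.foreach(println)                         // e.g. (hello,2), (spark,1), (kafka,1)
  }
}

In the streaming job, reduceByKey performs the same per-word summation, but on each micro-batch of the DStream.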
Replace the package org.apache.spark.examples.streaming line at the top of the code with the package declaration from your own DirectKafkaWordCount file, then overwrite the contents of the DirectKafkaWordCount file with this code.
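As a sketch of what that swap looks like (the package name huochen.spark.example is simply the one used in the spark-submit command later in this walkthrough; use the package generated from your own groupId):

// Before: the official example's package
// package org.apache.spark.examples.streaming

// After: your own project's package (here huochen.spark.example, matching the
// --class argument used in the spark-submit command below)
package huochen.spark.example

// ... the rest of the DirectKafkaWordCount code stays unchanged ...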
At this point the Spark processing code has been written to completion.
Modify pom.xml to prepare the project for packaging
pom.xml declares the dependencies for the entire project; here we need to add the Spark Streaming related packages:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.4.1</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-kafka_2.10</artifactId>
  <version>1.4.1</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_2.10</artifactId>
  <version>1.4.1</version>
</dependency>
<!-- Scala -->
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-library</artifactId>
  <version>2.10.4</version>
</dependency>
In addition, if you need to package the dependencies into the final JAR, you need to add the following configuration inside the build tag of pom.xml:
<plugins>
  <!-- Plugin to create a single jar that includes all dependencies -->
  <plugin>
    <artifactId>maven-assembly-plugin</artifactId>
    <version>2.4</version>
    <configuration>
      <descriptorRefs>
        <descriptorRef>jar-with-dependencies</descriptorRef>
      </descriptorRefs>
    </configuration>
    <executions>
      <execution>
        <id>make-assembly</id>
        <phase>package</phase>
        <goals>
          <goal>single</goal>
        </goals>
      </execution>
    </executions>
  </plugin>
  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-compiler-plugin</artifactId>
    <version>2.0.2</version>
    <configuration>
      <source>1.7</source>
      <target>1.7</target>
    </configuration>
  </plugin>
  <plugin>
    <groupId>net.alchim31.maven</groupId>
    <artifactId>scala-maven-plugin</artifactId>
    <executions>
      <execution>
        <id>scala-compile-first</id>
        <phase>process-resources</phase>
        <goals>
          <goal>add-source</goal>
          <goal>compile</goal>
        </goals>
      </execution>
      <execution>
        <id>scala-test-compile</id>
        <phase>process-test-resources</phase>
        <goals>
          <goal>testCompile</goal>
        </goals>
      </execution>
    </executions>
  </plugin>
</plugins>
Once pom.xml has been modified, you are ready to package and run the project with Maven:
In the Maven window on the right, click Execute Maven Goal and enter clean package in the command line field.
Spark Job Submission
Two JAR packages can be found under projectname/target in the project directory: one contains only the Scala code, and the other contains all the dependencies.
Upload the JAR package to the Spark server and run the Spark job with the following command:
../bin/spark-submit --master yarn-client --jars ./lib/kafka_2.10-0.8.2.1.jar --class huochen.spark.example.DirectKafkaWordCount sparkExample-1.0-SNAPSHOT-jar-with-dependencies.jar kafka-broker topic
Once the task has been submitted to the YARN cluster with spark-submit, you can see the results of the run. The last two arguments, kafka-broker and topic, are the brokers and topics parameters read by the main function.