Tonight I listened to Liaoliang's 10th lesson, "Java Development Spark Combat". The after-school assignment: use Maven to develop Spark's WordCount in Java and run it on the cluster.
First, configure pom.xml:
<groupId>com.dt.spark</groupId>
<artifactId>SparkApps</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.6.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>1.6.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.6.0</version>
    </dependency>
</dependencies>
Then write the program:
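package com.dt.spark;

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.api.java.function.VoidFunction;

import scala.Tuple2;

public class WordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("Spark WordCount by Java");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Read the input file passed as the first program argument
        JavaRDD<String> lines = sc.textFile(args[0]);
        // Split each line into words on single spaces (Spark 1.6 FlatMapFunction returns an Iterable)
        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            public Iterable<String> call(String line) throws Exception {
                return Arrays.asList(line.split(" "));
            }
        });
        // Map each word to the pair (word, 1)
        JavaPairRDD<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
            public Tuple2<String, Integer> call(String word) throws Exception {
                return new Tuple2<String, Integer>(word, 1);
            }
        });
        // Sum the counts for each word
        JavaPairRDD<String, Integer> wordsCount = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
            public Integer call(Integer v1, Integer v2) {
                return v1 + v2;
            }
        });
        // Print every "word:count" pair
        wordsCount.foreach(new VoidFunction<Tuple2<String, Integer>>() {
            public void call(Tuple2<String, Integer> pair) throws Exception {
                System.out.println(pair._1 + ":" + pair._2);
            }
        });
        sc.close();
    }
}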
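To see what each transformation computes without a cluster, here is a minimal plain-Java sketch (no Spark dependency) that applies the same steps to an in-memory list using java.util.stream: flatMap splits each line on a single space, and a toMap collector merging with Integer::sum plays the role of mapToPair plus reduceByKey. The class and method names (LocalWordCount, count) are hypothetical, and the TreeMap is used only so the printed order is stable; Spark itself does not sort the result.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical local sketch (no Spark needed): mirrors the
// flatMap -> mapToPair -> reduceByKey pipeline of WordCount on an in-memory list.
public class LocalWordCount {

    static Map<String, Integer> count(List<String> lines) {
        return lines.stream()
                // flatMap: split each line on single spaces, as line.split(" ") does
                .flatMap(line -> Arrays.stream(line.split(" ")))
                // mapToPair + reduceByKey: key = word, value = 1, merge equal keys by summing
                .collect(Collectors.toMap(word -> word, word -> 1, Integer::sum, TreeMap::new));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello spark", "hello java");
        // Print in the same "word:count" form as the Spark job
        // prints: hello:2, java:1, spark:1 (one per line)
        count(lines).forEach((word, n) -> System.out.println(word + ":" + n));
    }
}
```

Running this on a small sample gives a reference result to compare against the cluster output for the same text.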
Package the project into a jar, copy it to the server, and run:
/usr/lib/spark/bin/spark-submit --master yarn-client --class com.dt.spark.WordCount --executor-memory 2G --executor-cores 4 ~/spark/wc.jar ./mydir/tmp.txt
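Every option of spark-submit must be a separate, space-delimited token, and everything after the application jar is passed to the main class. As an illustration, this hypothetical helper (SubmitCommand.build is not part of Spark) assembles the same invocation as a list of arguments, making it explicit that ./mydir/tmp.txt is what WordCount.main receives as args[0]. It only builds and prints the command; it does not launch anything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: assembles the spark-submit command from the post as a list of
// separate tokens (flag, value, flag, value, ..., jar, program arguments).
public class SubmitCommand {

    static List<String> build(String sparkHome, String mainClass, String jar, String input) {
        List<String> cmd = new ArrayList<>();
        cmd.add(sparkHome + "/bin/spark-submit");
        cmd.addAll(Arrays.asList("--master", "yarn-client"));
        cmd.addAll(Arrays.asList("--class", mainClass));
        cmd.addAll(Arrays.asList("--executor-memory", "2G"));
        cmd.addAll(Arrays.asList("--executor-cores", "4"));
        cmd.add(jar);   // application jar
        cmd.add(input); // first program argument, i.e. args[0] in WordCount.main
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("/usr/lib/spark", "com.dt.spark.WordCount",
                "~/spark/wc.jar", "./mydir/tmp.txt");
        System.out.println(String.join(" ", cmd));
        // If this list were handed to new ProcessBuilder(cmd), note that "~" is expanded
        // by the shell, not by ProcessBuilder, so an absolute jar path would be needed.
    }
}
```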
The results are the same as those produced by the Scala version.
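When checking this, keep in mind that on YARN the println inside foreach runs on the executors, so those lines land in the executors' stdout logs rather than in the driver console (calling wordsCount.collect() would bring the pairs back to the driver instead). The line order also depends on partitioning, so a line-by-line diff against the Scala run can fail even when the counts agree. The sketch below is a hypothetical checker (CompareOutputs, parse and sameCounts are illustrative names) that compares two runs' "word:count" lines as maps, ignoring order and spaces around the colon; it assumes the input contains only result lines, already extracted from the logs.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical checker: compares two runs' "word:count" result lines regardless of order.
// Assumes every line is a result line (log noise already removed).
public class CompareOutputs {

    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            int sep = line.lastIndexOf(':'); // last ':' so words containing ':' still parse
            if (sep < 0) continue;           // not a "word:count" line
            String word = line.substring(0, sep).trim();
            int n = Integer.parseInt(line.substring(sep + 1).trim());
            counts.put(word, n);
        }
        return counts;
    }

    static boolean sameCounts(List<String> javaOut, List<String> scalaOut) {
        return parse(javaOut).equals(parse(scalaOut));
    }

    public static void main(String[] args) {
        List<String> javaRun = Arrays.asList("hello:2", "spark:1");
        List<String> scalaRun = Arrays.asList("spark:1", "hello:2");
        System.out.println(sameCounts(javaRun, scalaRun)); // prints true
    }
}
```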
For follow-up courses, see Liaoliang's Sina Weibo (Liaoliang_DT Big Data Dream Factory): http://weibo.com/ilovepains
Liaoliang, China's number-one Spark expert. WeChat official account: Dt_spark
Blog: http://bolg.sina.com.cn/ilovepains
Please credit the source when reposting!
Spark 3000 Disciples, Lesson 10: Java Development Spark Combat (summary)