Introduction: A HelloWorld Spark application written in Java. It is admittedly crude and of no practical use, but it is worth trying and recording.
Environment: Windows 7, Eclipse + Maven, JDK 1.7; cluster nodes run Ubuntu 14.04.
Step One: Create a Maven project in Eclipse; the process is straightforward, so it is not detailed here. The pom.xml is:

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>edu.berkeley</groupId>
  <artifactId>SparkProj</artifactId>
  <name>Spark Project</name>
  <packaging>jar</packaging>
  <version>1.0</version>
  <dependencies>
    <!-- Spark dependency -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.3.0</version>
    </dependency>
  </dependencies>
</project>
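One optional tweak, not part of the original setup: if the jar is later built as a fat jar (with the maven-shade-plugin or maven-assembly-plugin), the spark-core dependency is usually marked with the provided scope, since the cluster already supplies the Spark classes at runtime through spark-submit. A hedged sketch of that dependency entry:

```xml
<!-- Hypothetical variant: "provided" keeps Spark classes out of a
     fat jar, because spark-submit puts Spark on the classpath. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.3.0</version>
  <scope>provided</scope>
</dependency>
```

With plain jar packaging, as used here, this makes no difference to the artifact; it only matters once dependencies are bundled.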
Step Two: Write the core logic. The function is simple: count how many lines of README.md in the Spark root directory on the cluster contain the letter "a", and how many contain "b". To be honest, this is very easy to write in Scala; the Java version is much more verbose.
package edu.berkeley.SparkProj;

/* SimpleApp.java */
import org.apache.spark.api.java.*;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;

public class SimpleApp {
  public static void main(String[] args) {
    String logFile = "file:///home/fulong/spark/spark-1.3.0-bin-hadoop2.4/README.md"; // should be some file on your system
    SparkConf conf = new SparkConf().setAppName("Simple Application");
    JavaSparkContext sc = new JavaSparkContext(conf);
    JavaRDD<String> logData = sc.textFile(logFile).cache();

    long numAs = logData.filter(new Function<String, Boolean>() {
      public Boolean call(String s) { return s.contains("a"); }
    }).count();

    long numBs = logData.filter(new Function<String, Boolean>() {
      public Boolean call(String s) { return s.contains("b"); }
    }).count();

    System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
  }
}
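The filter-and-count pattern above is just a predicate count over lines. As a quick sanity check of the logic without a Spark cluster, the same computation can be expressed with plain Java 8 streams over an in-memory list (a local sketch, not using Spark; the class and sample lines are invented for illustration):

```java
import java.util.Arrays;
import java.util.List;

public class LocalCount {
    // Mirrors logData.filter(...).count() from the Spark job:
    // counts how many lines contain the given substring.
    static long countLinesContaining(List<String> lines, String needle) {
        return lines.stream().filter(s -> s.contains(needle)).count();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("apache spark", "big data", "banana");
        System.out.println("Lines with a: " + countLinesContaining(lines, "a")
                + ", lines with b: " + countLinesContaining(lines, "b"));
        // prints: Lines with a: 3, lines with b: 2
    }
}
```

The Spark version does the same thing, except the filter runs distributed across the cluster's partitions of the file.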
Step Three: Under a Windows cmd prompt, go to the project root and package it:

D:\WorkSpace2015\SparkProj> mvn package

This builds the jar package: D:\WorkSpace2014\SparkProj\target\SparkProj-1.0.jar
Step Four: Copy the jar with the WinSCP tool to a directory on a node in the cluster: /home/fulong/workspace/spark/SparkProj-1.0.jar
Final Step: Submit the program to the Spark cluster via spark-submit:

[email protected]:~/spark/spark-1.3.0-bin-hadoop2.4$ ./bin/spark-submit --class edu.berkeley.SparkProj.SimpleApp --master yarn-client /home/fulong/workspace/spark/SparkProj-1.0.jar
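For reference, --class names the main class inside the jar and --master selects where the job runs; yarn-client runs the driver on the submitting machine while executors run under YARN. Without a cluster at hand, the same jar can be smoke-tested on a single machine by swapping the master, a hedged variant of the command above (paths as in this post):

```shell
# local[2] runs Spark in-process with two worker threads; no YARN needed.
./bin/spark-submit \
  --class edu.berkeley.SparkProj.SimpleApp \
  --master "local[2]" \
  /home/fulong/workspace/spark/SparkProj-1.0.jar
```

Note that the logFile path hard-coded in SimpleApp must then exist on that local machine.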
The result of the run: 60 lines contain "a" and 29 lines contain "b".
"Gandalf" Java Hello World on Spark