"Gandalf" Java Hello World on Spark

Source: Internet
Author: User

Introduction

This is a Hello World Spark application written in Java. It is crude and does nothing useful, but it is worth trying once and recording.

Environment

Windows 7
Eclipse + Maven
JDK 1.7
Ubuntu 14.04

Step One: Create a Maven project in Eclipse. The process is simple and is not detailed here. The POM file is:

    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
      <groupId>edu.berkeley</groupId>
      <artifactId>SparkProj</artifactId>
      <modelVersion>4.0.0</modelVersion>
      <name>Spark Project</name>
      <packaging>jar</packaging>
      <version>1.0</version>
      <dependencies>
        <dependency> <!-- Spark dependency -->
          <groupId>org.apache.spark</groupId>
          <artifactId>spark-core_2.10</artifactId>
          <version>1.3.0</version>
        </dependency>
      </dependencies>
    </project>
Step Two: Write the core logic code. The function is simple: count how many lines of README.md in the Spark root directory on the cluster contain "a" and how many contain "b". To be honest, this is trivial to write in Scala and painfully verbose in Java.

    package edu.berkeley.SparkProj;

    /* SimpleApp.java */
    import org.apache.spark.api.java.*;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.function.Function;

    public class SimpleApp {
      public static void main(String[] args) {
        // Should be some file on your system
        String logFile = "file:///home/fulong/spark/spark-1.3.0-bin-hadoop2.4/README.md";
        SparkConf conf = new SparkConf().setAppName("Simple Application");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Cache the RDD because it is scanned twice below
        JavaRDD<String> logData = sc.textFile(logFile).cache();

        // Count the lines that contain "a"
        long numAs = logData.filter(new Function<String, Boolean>() {
          public Boolean call(String s) { return s.contains("a"); }
        }).count();

        // Count the lines that contain "b"
        long numBs = logData.filter(new Function<String, Boolean>() {
          public Boolean call(String s) { return s.contains("b"); }
        }).count();

        System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
      }
    }

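
The filter-and-count logic can be checked locally before anything is sent to the cluster. The following is only a minimal sketch and is not part of the original program: it uses no Spark classes, the names LineCountSketch and LinePredicate are hypothetical stand-ins for the RDD and for Spark's Function<String, Boolean>, and the sample lines are made up. It sticks to anonymous classes so that it should compile on JDK 1.7.

```java
import java.util.Arrays;
import java.util.List;

// Local, Spark-free sketch of what filter(...).count() computes.
class LineCountSketch {

    // Hypothetical stand-in for Spark's Function<String, Boolean>.
    interface LinePredicate {
        boolean test(String line);
    }

    // Counts the lines for which the predicate returns true.
    static long countMatching(List<String> lines, LinePredicate predicate) {
        long count = 0;
        for (String line : lines) {
            if (predicate.test(line)) {
                count++;
            }
        }
        return count;
    }

    // Counts the lines containing the given substring (case-sensitive).
    static long countContaining(List<String> lines, final String needle) {
        return countMatching(lines, new LinePredicate() {
            public boolean test(String line) {
                return line.contains(needle);
            }
        });
    }

    public static void main(String[] args) {
        // Made-up sample lines, not the real README.md.
        List<String> lines = Arrays.asList(
                "Apache Spark",
                "BUILD",
                "a b",
                "");
        System.out.println("Lines with a: " + countContaining(lines, "a")
                + ", lines with b: " + countContaining(lines, "b"));
    }
}
```

With these sample lines it prints "Lines with a: 2, lines with b: 1". String.contains is case-sensitive, so the line "BUILD" is not counted for "b"; the same applies to the counts reported by the Spark job.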
Step Three: In Windows cmd, go to the project root directory and build the package:

    D:\WorkSpace2015\SparkProj>mvn package

The jar is built at: D:\WorkSpace2014\SparkProj\target\SparkProj-1.0.jar

Step Four: Copy the jar to a directory on one of the cluster nodes using WinSCP:

    /home/fulong/workspace/spark/SparkProj-1.0.jar

Final Step: Submit the program to the Spark cluster with spark-submit, run from ~/spark/spark-1.3.0-bin-hadoop2.4 on that node:

    ./bin/spark-submit --class edu.berkeley.SparkProj.SimpleApp --master yarn-client /home/fulong/workspace/spark/SparkProj-1.0.jar
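
spark-submit passes any arguments that follow the jar path on to the application's main method, so the input file does not have to stay hard-coded. The following is a rough sketch of that idea only, not something the original program does: InputPathSketch and resolveLogFile are hypothetical names, and the default simply repeats the path used in the program.

```java
// Sketch: choose the input path from the program arguments, with a fallback.
class InputPathSketch {

    // Default copied from the hard-coded path in the program.
    static final String DEFAULT_LOG_FILE =
            "file:///home/fulong/spark/spark-1.3.0-bin-hadoop2.4/README.md";

    // Returns the first non-blank argument, or the default when none is given.
    static String resolveLogFile(String[] args) {
        if (args != null && args.length > 0 && !args[0].trim().isEmpty()) {
            return args[0].trim();
        }
        return DEFAULT_LOG_FILE;
    }

    public static void main(String[] args) {
        System.out.println("Reading from: " + resolveLogFile(args));
    }
}
```

In the Spark program, the result of resolveLogFile(args) would take the place of the logFile string, and a different file could then be chosen by appending its path after the jar on the spark-submit command line.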

Result of the run: 60 lines contain "a" and 29 lines contain "b".
