【甘道夫】Java Hello World on Spark
Introduction: This is a Hello World Spark application written in Java. Admittedly it is clumsy compared with the concise Scala equivalent, but it is still worth trying and writing down.
Environment: Windows 7, Eclipse + Maven, JDK 1.7 (development machine); Ubuntu 14.04 (cluster).
Step 1: Create a Maven project in Eclipse. The process is simple and not detailed here. The pom file is:

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>edu.berkeley</groupId>
  <artifactId>SparkProj</artifactId>
  <name>Spark Project</name>
  <packaging>jar</packaging>
  <version>1.0</version>

  <dependencies>
    <!-- Spark dependency -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.3.0</version>
    </dependency>
  </dependencies>
</project>
Step 2: Write the core logic. The functionality is simple: count how many lines of README.md under the Spark root directory on the cluster contain the letter "a", and how many contain "b". Frankly, this is trivial to write in Scala; in Java it is painfully verbose.
/* SimpleApp.java */
package edu.berkeley.SparkProj;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class SimpleApp {
    public static void main(String[] args) {
        // Should be some file on your system
        String logFile = "file:///home/fulong/Spark/spark-1.3.0-bin-hadoop2.4/README.md";
        SparkConf conf = new SparkConf().setAppName("Simple Application");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> logData = sc.textFile(logFile).cache();

        // Count the lines containing "a"
        long numAs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("a"); }
        }).count();

        // Count the lines containing "b"
        long numBs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("b"); }
        }).count();

        System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);

        // Release the context once the job is done
        sc.stop();
    }
}
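For comparison, the two anonymous Function classes collapse to one-liners on a Java 8+ toolchain. The setup above uses JDK 1.7, so this is only a sketch of what a newer JDK would allow:

// Hypothetical Java 8 variant of the two counts above; requires JDK 8+
long numAs = logData.filter(s -> s.contains("a")).count();
long numBs = logData.filter(s -> s.contains("b")).count();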
Step 3: On Windows, open CMD in the project root and package the project:

D:\WorkSpace2015\SparkProj>mvn package

This produces the jar at D:\WorkSpace2015\SparkProj\target\SparkProj-1.0.jar.
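Before shipping the jar to the cluster, it can be worth confirming that the compiled class is actually inside it; a hypothetical check with the JDK's jar tool:

D:\WorkSpace2015\SparkProj>jar tf target\SparkProj-1.0.jar

The listing should include edu/berkeley/SparkProj/SimpleApp.class.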
Step 4: Copy the jar with WinSCP to a directory on one of the cluster nodes: /home/fulong/Workspace/Spark/SparkProj-1.0.jar
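Any SCP client does the same job as WinSCP; for example, from a machine with a command-line scp (the user@cluster-node placeholder below is hypothetical, not the actual host from this setup):

scp SparkProj-1.0.jar user@cluster-node:/home/fulong/Workspace/Spark/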
Final step: Submit the application to the Spark cluster with spark-submit:

[email protected]:~/Spark/spark-1.3.0-bin-hadoop2.4$ ./bin/spark-submit \
  --class edu.berkeley.SparkProj.SimpleApp \
  --master yarn-client \
  /home/fulong/Workspace/Spark/SparkProj-1.0.jar
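When iterating on the code, it can be quicker to bypass YARN entirely and run the same jar in Spark's local mode; a minimal sketch, assuming the same paths as above:

./bin/spark-submit \
  --class edu.berkeley.SparkProj.SimpleApp \
  --master local[2] \
  /home/fulong/Workspace/Spark/SparkProj-1.0.jar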
Result: 60 lines contain "a", and 29 lines contain "b".
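Those counts can be sanity-checked against the same file with grep -c, which counts matching lines (a hypothetical cross-check, not part of the original post):

grep -c a /home/fulong/Spark/spark-1.3.0-bin-hadoop2.4/README.md    # lines containing "a"
grep -c b /home/fulong/Spark/spark-1.3.0-bin-hadoop2.4/README.md    # lines containing "b"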