[Gandalf] Java Hello World on Spark


Introduction

This is a "Hello World" Spark application written in Java. It is admittedly a bit clumsy, and nowhere near as concise as the Scala equivalent, but it is still worth trying out and writing down.

Environment

Development machine: Windows 7, Eclipse + Maven, JDK 1.7
Cluster: Ubuntu 14.04

Step 1: Create a Maven project in Eclipse. The process is straightforward and not detailed here. The pom.xml is:

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>edu.berkeley</groupId>
  <artifactId>SparkProj</artifactId>
  <name>Spark Project</name>
  <packaging>jar</packaging>
  <version>1.0</version>
  <dependencies>
    <dependency> <!-- Spark dependency -->
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.3.0</version>
    </dependency>
  </dependencies>
</project>
Step 2: Write the core logic. The functionality is simple: count how many lines of README.md, in the Spark root directory on the cluster, contain the letter "a", and how many contain "b". Frankly, this would be trivial in Scala; the Java version is painfully verbose.
/* SimpleApp.java */
package edu.berkeley.SparkProj;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.*;
import org.apache.spark.api.java.function.Function;

public class SimpleApp {
  public static void main(String[] args) {
    // Should be some file on your system
    String logFile = "file:///home/fulong/Spark/spark-1.3.0-bin-hadoop2.4/README.md";
    SparkConf conf = new SparkConf().setAppName("Simple Application");
    JavaSparkContext sc = new JavaSparkContext(conf);
    JavaRDD<String> logData = sc.textFile(logFile).cache();

    // Count lines containing "a"
    long numAs = logData.filter(new Function<String, Boolean>() {
      public Boolean call(String s) { return s.contains("a"); }
    }).count();

    // Count lines containing "b"
    long numBs = logData.filter(new Function<String, Boolean>() {
      public Boolean call(String s) { return s.contains("b"); }
    }).count();

    System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);

    sc.stop();  // release cluster resources when done
  }
}
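The anonymous inner classes above are what make the Java version so verbose. As a minimal sketch of the same filter-and-count logic in plain Java 8 streams, with no Spark involved: the `LineCount` class, its `countContaining` helper, and the sample lines below are all made up for illustration, standing in for the RDD pipeline.

```java
import java.util.Arrays;
import java.util.List;

public class LineCount {
    // Hypothetical helper: count how many lines contain the given token.
    static long countContaining(List<String> lines, String token) {
        return lines.stream().filter(s -> s.contains(token)).count();
    }

    public static void main(String[] args) {
        // Made-up sample lines standing in for README.md.
        List<String> sample = Arrays.asList("apache spark", "big data", "hello");
        long numAs = countContaining(sample, "a");
        long numBs = countContaining(sample, "b");
        System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
        // prints: Lines with a: 2, lines with b: 1
    }
}
```

Spark's Java API gained the same lambda-style brevity once it supported Java 8, but with JDK 1.7 (as in the environment above) the anonymous-class form is unavoidable.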
Step 3: In a Windows CMD prompt, change to the project root directory and package the project:

D:\WorkSpace2015\SparkProj>mvn package

This produces the jar:

D:\WorkSpace2014\SparkProj\target\SparkProj-1.0.jar

Step 4: Copy the jar with WinSCP to a directory on one of the cluster nodes:

/home/fulong/Workspace/Spark/SparkProj-1.0.jar

Final step: Submit the application to the Spark cluster with spark-submit:

[email protected]:~/Spark/spark-1.3.0-bin-hadoop2.4$ ./bin/spark-submit \
  --class edu.berkeley.SparkProj.SimpleApp \
  --master yarn-client \
  /home/fulong/Workspace/Spark/SparkProj-1.0.jar
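Before going through YARN, it can be handy to smoke-test the same jar on a single machine. A sketch, assuming the same directory layout as above (local[2] runs Spark in the local JVM with two worker threads, so no cluster is needed):

```shell
# Hypothetical local smoke test: same jar and entry class as above,
# but executed in-process instead of on the YARN cluster.
./bin/spark-submit \
  --class edu.berkeley.SparkProj.SimpleApp \
  --master local[2] \
  /home/fulong/Workspace/Spark/SparkProj-1.0.jar
```

Note that in local mode the file:// path in SimpleApp must exist on that same machine.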

The result: 60 lines contain "a" and 29 lines contain "b".

