Official docs:
http://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/launcher/package-summary.html
Following this example I wrote a launcher, so a Spark business program can be run from the java command line.
Today I searched again and found an article; below is the original text the author posted online:
Sometimes we need to start our Spark application from another Scala/Java application, and for that we can use SparkLauncher. Below is an example in which we build a Spark application and then launch it from another Scala application.
Let's look at our Spark application code.
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object SparkApp extends App {
  // Build a local SparkContext, save a small RDD as text in the "result" directory, then stop.
  val conf = new SparkConf().setMaster("local[*]").setAppName("spark-app")
  val sc = new SparkContext(conf)
  val rdd = sc.parallelize(Array(2, 3, 2, 1))
  rdd.saveAsTextFile("result")
  sc.stop()
}
This is our simple Spark application. Make a jar of it using sbt assembly; now we write a Scala application through which we start this Spark application, as follows:
import org.apache.spark.launcher.SparkLauncher

object Launcher extends App {
  // Launch the Spark application jar through spark-submit and wait for it to finish.
  val spark = new SparkLauncher()
    .setSparkHome("/home/knoldus/spark-1.4.0-bin-hadoop2.6")
    .setAppResource("/home/knoldus/spark_launcher-assembly-1.0.jar")
    .setMainClass("SparkApp")
    .setMaster("local[*]")
    .launch()
  spark.waitFor()
}
In the above code we use a SparkLauncher object and set values on it:
setSparkHome("/home/knoldus/spark-1.4.0-bin-hadoop2.6") sets the Spark home, which is used internally to call spark-submit.
.setAppResource("/home/knoldus/spark_launcher-assembly-1.0.jar") specifies the jar of our Spark application.
.setMainClass("SparkApp") is the entry point of the Spark program, i.e. the driver program.
.setMaster("local[*]") sets the master where the application starts; here we run it on the local machine.
.launch() simply starts our Spark application.
This is the minimal setup; you can also set many other options, such as passing arguments, adding jars, setting configuration values, etc.
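To illustrate the "many other options" part, here is a minimal sketch building on the same launcher; the argument values, the extra jar path and the memory setting are placeholders I made up:

import org.apache.spark.launcher.SparkLauncher

object LauncherWithExtras extends App {
  // Same launcher as above, plus application arguments, an extra jar and a spark.* config value.
  val process = new SparkLauncher()
    .setSparkHome("/home/knoldus/spark-1.4.0-bin-hadoop2.6")
    .setAppResource("/home/knoldus/spark_launcher-assembly-1.0.jar")
    .setMainClass("SparkApp")
    .setMaster("local[*]")
    .addAppArgs("arg1", "arg2")                        // passed to SparkApp's main method
    .addJar("/home/knoldus/libs/extra-dependency.jar") // placeholder: an extra jar for the classpath
    .setConf(SparkLauncher.EXECUTOR_MEMORY, "1g")      // any spark.* setting can go through setConf
    .setVerbose(true)                                  // print the underlying spark-submit invocation
    .launch()
  process.waitFor()
}

launch() still just returns the java.lang.Process of the underlying spark-submit, so waitFor() blocks until the application finishes.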
For the source code you can check out the following git repo:
Spark_laucher is our Spark application.
launcher_app is our Scala application which starts the Spark application.
Change the paths according to your setup, make a jar of Spark_laucher, then run launcher_app and look for the result RDD in this directory; it is there because the Spark application simply saves it as a text file.
https://github.com/phalodi/Spark-launcher
Roughly speaking: you write a file like myspark.jar, written following the normal Spark workflow.
Then you write a launcher; my understanding is that this is a program similar to spark-class that starts or invokes the myspark.jar you wrote above.
The key calls are setAppResource, setMainClass and setMaster, which set your myspark.jar, the class to run inside that jar, and the run mode, respectively. In my tests setMaster only seemed to support "yarn-client". (That is just what my experiments showed; I suspect it is because my code has interactive parts, so only yarn-client works here. In theory yarn-cluster mode should be supported; it may be a problem with my program.)
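For reference, a minimal sketch of the yarn-client setup described above; the Spark home, jar path and main class name (com.example.MySparkJob) are placeholders, and HADOOP_CONF_DIR is assumed to be set so spark-submit can reach the YARN cluster:

import org.apache.spark.launcher.SparkLauncher

object YarnClientLauncher extends App {
  // Minimal yarn-client variant of the launcher above.
  val process = new SparkLauncher()
    .setSparkHome("/opt/spark-1.4.0-bin-hadoop2.6") // placeholder path
    .setAppResource("/opt/jobs/myspark.jar")        // your Spark business jar
    .setMainClass("com.example.MySparkJob")         // placeholder: main class inside myspark.jar
    .setMaster("yarn-client")                       // the only mode that worked in my tests
    .launch()
  process.waitFor()
}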
How to run it with java:
With java -jar spark-launcher.jar you no longer need a script (the spark-submit style) to run your myspark.jar. (Run this way, you cannot see the output printed to the screen. The official site actually has an example like this, and also one that prints the output to the screen; I think it used an output stream, but I forget....)
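From memory, the screen-output idea looks roughly like the sketch below: read the stdout/stderr of the Process returned by launch() and print it yourself. This is my own reconstruction, not the exact example from the official docs, and the paths and class name are placeholders again:

import java.io.{BufferedReader, InputStreamReader}

import org.apache.spark.launcher.SparkLauncher

object LauncherWithConsoleOutput extends App {
  // Echo the launched application's stdout/stderr to our own console.
  // spark-submit writes most of its log output to stderr, so both streams are drained.
  val process = new SparkLauncher()
    .setSparkHome("/opt/spark-1.4.0-bin-hadoop2.6")
    .setAppResource("/opt/jobs/myspark.jar")
    .setMainClass("com.example.MySparkJob")
    .setMaster("yarn-client")
    .launch()

  // Start a thread that copies one stream of the child process to our console.
  def pipe(in: java.io.InputStream, prefix: String): Thread = {
    val t = new Thread(new Runnable {
      override def run(): Unit = {
        val reader = new BufferedReader(new InputStreamReader(in))
        Iterator.continually(reader.readLine()).takeWhile(_ != null)
          .foreach(line => println(s"$prefix $line"))
      }
    })
    t.start()
    t
  }

  val err = pipe(process.getErrorStream, "[stderr]")
  val out = pipe(process.getInputStream, "[stdout]")
  process.waitFor()
  err.join()
  out.join()
}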
Why use this approach: because the myspark business program needs to run together with a web container. If it can run in a plain Java environment, then Spark can run inside the web container. (Not tested yet.......)