1 How do you view task logs in Oozie?
You can look up a workflow's details by its Oozie job ID with the following command:
oozie job -info 0012077-180830142722522-oozie-hado-W
The workflow details look like this:
Job ID : 0012077-180830142722522-oozie-hado-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : $workflow_name
App Path : hdfs://$hdfs_name/oozie/wf/$workflow_name.xml
Status : KILLED
Run : 0
User : hadoop
Group : -
Created : 2018-09-25 02:51 GMT
Started : 2018-09-25 02:51 GMT
Last Modified : 2018-09-25 02:53 GMT
Ended : 2018-09-25 02:53 GMT
CoordAction ID: -
Actions
------------------------------------------------------------------------------------------------------------------------------------
ID                                                 Status  Ext ID                          Ext Status     Err Code
------------------------------------------------------------------------------------------------------------------------------------
0012077-180830142722522-oozie-hado-W@:start:       OK      -                               OK             -
------------------------------------------------------------------------------------------------------------------------------------
0012077-180830142722522-oozie-hado-W@$action_name  ERROR   application_1537326594090_5663  FAILED/KILLED  JA018
------------------------------------------------------------------------------------------------------------------------------------
0012077-180830142722522-oozie-hado-W@[...]         OK      -                               OK             E0729
------------------------------------------------------------------------------------------------------------------------------------
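To pick out the failed actions and their launcher application IDs from a listing like the one above, a small script can help. The sketch below is illustrative only: it assumes the action rows split on whitespace into the five columns shown (ID, Status, Ext ID, Ext Status, Err Code) and that action IDs have the form jobId@nodeName. The names failed_actions and SAMPLE_ACTIONS are hypothetical, and the sample rows are a cleaned-up copy of the listing.

```python
import re

# Cleaned-up sample rows from the `oozie job -info` listing.
# Columns: ID, Status, Ext ID, Ext Status, Err Code.
SAMPLE_ACTIONS = """\
0012077-180830142722522-oozie-hado-W@:start: OK - OK -
0012077-180830142722522-oozie-hado-W@$action_name ERROR application_1537326594090_5663 FAILED/KILLED JA018
"""

YARN_APP_ID = re.compile(r"application_\d+_\d+")


def failed_actions(listing):
    """Return (action_id, yarn_app_id, err_code) for every row whose status is ERROR."""
    failed = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip separators, headers and anything that is not a 5-column action row.
        if len(fields) != 5 or "@" not in fields[0]:
            continue
        action_id, status, ext_id, _ext_status, err_code = fields
        if status == "ERROR":
            match = YARN_APP_ID.fullmatch(ext_id)
            # The Ext ID is "-" when no YARN application was launched.
            failed.append((action_id, match.group(0) if match else None, err_code))
    return failed


for row in failed_actions(SAMPLE_ACTIONS):
    print(row)
```

The Ext ID of a failed action is the YARN application to inspect next.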
The failed action is defined as follows:
<action name="$action_name">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${job_tracker}</job-tracker>
        <name-node>${name_node}</name-node>
        <master>${jobmaster}</master>
        <mode>${jobmode}</mode>
        <name>${jobname}</name>
        <class>${jarclass}</class>
        <jar>${jarpath}</jar>
        <spark-opts>${sparkopts}</spark-opts>
    </spark>
On YARN, the application with ID application_1537326594090_5663 looks like this:
application_1537326594090_5663 hadoop oozie:launcher:T=spark:W=$workflow_name:A=$action_name:ID=0012077-180830142722522-oozie-hado-W Oozie Launcher
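The launcher's application name encodes the action type, workflow name, action name and Oozie job ID. A minimal sketch for splitting it apart follows; it assumes the T=/W=/A=/ID= layout seen in the line above, and parse_launcher_name is a hypothetical helper, not part of Oozie.

```python
import re

# Layout assumed from the application name shown above:
#   oozie:launcher:T=<type>:W=<workflow>:A=<action>:ID=<oozie job id>
LAUNCHER_NAME = re.compile(
    r"^oozie:launcher:T=(?P<type>[^:]+):W=(?P<workflow>.+):A=(?P<action>.+):ID=(?P<job_id>[^:]+)$"
)


def parse_launcher_name(name):
    """Return the fields of an Oozie launcher application name, or None if it does not match."""
    match = LAUNCHER_NAME.match(name)
    return match.groupdict() if match else None


print(parse_launcher_name(
    "oozie:launcher:T=spark:W=$workflow_name:A=$action_name:ID=0012077-180830142722522-oozie-hado-W"
))
```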
The log of application_1537326594090_5663 contains this line:
2018-09-25 10:52:05,237 [main] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1537326594090_5664
On YARN, the application with ID application_1537326594090_5664 looks like this:
application_1537326594090_5664 hadoop $app_name SPARK
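So application_1537326594090_5664 is the Spark job that actually corresponds to the action. Why is there an extra step in between?
In short, Oozie runs each action through an ActionExecutor. The most important subclass is JavaActionExecutor, which the Hive, Spark and other action executors extend. JavaActionExecutor first submits a LauncherMapper (a map task) to YARN. The LauncherMapper runs a LauncherMain, whose subclasses correspond to the concrete action types, such as JavaMain and SparkMain. A Spark action runs SparkMain, which calls org.apache.spark.deploy.SparkSubmit to submit the real job.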
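Because the real job is submitted from inside the launcher, its application ID only appears in the launcher's log. The sketch below pulls those IDs out of the log text. It assumes the "Submitted application ..." message format shown in the YarnClientImpl log line earlier, and child_applications is a hypothetical helper name.

```python
import re

# Message format assumed from the YarnClientImpl line in the launcher log.
SUBMITTED = re.compile(r"Submitted application (application_\d+_\d+)")


def child_applications(launcher_log):
    """Return the application IDs submitted by the launcher, in order, without duplicates."""
    seen = []
    for app_id in SUBMITTED.findall(launcher_log):
        if app_id not in seen:
            seen.append(app_id)
    return seen


sample_log = (
    "2018-09-25 10:52:05,237 [main] INFO "
    "org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - "
    "Submitted application application_1537326594090_5664\n"
)
print(child_applications(sample_log))
```

If log aggregation is enabled on the cluster, the launcher log text can typically be obtained with yarn logs -applicationId application_1537326594090_5663 and passed to this function.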
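2 How do you add dependencies to a Spark job submitted by Oozie?
Ways to add dependencies to a Spark job:
When running in local mode, dependencies can be added with --jars;
When running on YARN, dependencies can be added through spark.yarn.jars.
Neither approach works under Oozie. First, Oozie cannot, and should not, run jobs in local mode. Second, if you configure spark.yarn.jars you will find that it has no effect at all. Here is why.
Check the LauncherMapper log (see question 1 above for how to find it):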
Spark Version 2.1.1
Spark Action Main class : org.apache.spark.deploy.SparkSubmit
Oozie Spark action configuration
=================================================================
...
--conf
spark.yarn.jars=hdfs://$hdfs_name/spark/sparkjars/*.jar
--conf
spark.yarn.jars=hdfs://$hdfs_name/oozie/share/lib/lib_20180801121138/spark/spark-yarn_2.11-2.1.1.jar
As the log shows, Oozie appends its own spark.yarn.jars setting. What does Spark do when the same key is supplied twice?
org.apache.spark.deploy.SparkSubmit
    val appArgs = new SparkSubmitArguments(args)
org.apache.spark.launcher.SparkSubmitOptionParser
    if (!handle(name, value)) {
org.apache.spark.deploy.SparkSubmitArguments
    override protected def handle(opt: String, value: String): Boolean = {
      ...
      case CONF =>
        value.split("=", 2).toSeq match {
          case Seq(k, v) => sparkProperties(k) = v
          case _ => SparkSubmit.printErrorAndExit(s"Spark config without '=': $value")
        }
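The CONF branch above stores each key/value pair in a map keyed by k, so a repeated key replaces the earlier value. The following Python sketch mirrors that logic for illustration. It is not Spark code, and parse_conf_args is a hypothetical name; it also records which values were replaced, which Spark itself does not report.

```python
def parse_conf_args(args):
    """Collect --conf key=value pairs; a later value for the same key replaces the earlier one."""
    properties = {}
    overridden = []  # (key, replaced value) pairs, kept only for diagnosis
    it = iter(args)
    for arg in it:
        if arg != "--conf":
            continue
        value = next(it, None)
        if value is None:
            raise ValueError("--conf given without a value")
        # Same effect as value.split("=", 2) in the Scala excerpt: split on the first "=" only.
        key, sep, val = value.partition("=")
        if not sep:
            raise ValueError(f"Spark config without '=': {value}")
        if key in properties:
            overridden.append((key, properties[key]))
        properties[key] = val
    return properties, overridden


# The two settings seen in the LauncherMapper log, in the same order.
args = [
    "--conf", "spark.yarn.jars=hdfs://$hdfs_name/spark/sparkjars/*.jar",
    "--conf", "spark.yarn.jars=hdfs://$hdfs_name/oozie/share/lib/lib_20180801121138/spark/spark-yarn_2.11-2.1.1.jar",
]
properties, overridden = parse_conf_args(args)
print(properties["spark.yarn.jars"])
```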
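So the value is simply overwritten and the last setting wins. That is Oozie's setting, not the one the application supplied. The application therefore has to bundle its special dependencies into its own jar. To do so, use Maven's maven-assembly-plugin and configure <dependencySets><dependencySet><includes><include>. The full descriptor is: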
<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">
    <!-- TODO: a jarjar format would be better -->
    <id>jar-with-dependencies</id>
    <formats>
        <format>jar</format>
    </formats>
    <includeBaseDirectory>false</includeBaseDirectory>
    <dependencySets>
        <dependencySet>
            <outputDirectory>/</outputDirectory>
            <useProjectArtifact>true</useProjectArtifact>
            <unpack>true</unpack>
            <scope>runtime</scope>
            <includes>
                <include>redis.clients:jedis</include>
                <include>org.apache.commons:commons-pool2</include>
            </includes>
        </dependencySet>
    </dependencySets>
</assembly>
This simply copies the contents of the default jar-with-dependencies.xml descriptor and adds the includes configuration.
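After building, it is worth confirming that the bundled classes really ended up in the application jar. The sketch below checks a jar for the package directories of the two included artifacts. It assumes the usual package layout of jedis (redis.clients.jedis) and commons-pool2 (org.apache.commons.pool2); missing_packages and EXPECTED_PREFIXES are hypothetical names.

```python
import zipfile

# Package directories expected in the assembled jar (assumed from the artifacts'
# usual package names, since the descriptor unpacks them into the jar root).
EXPECTED_PREFIXES = ("redis/clients/jedis/", "org/apache/commons/pool2/")


def missing_packages(jar, prefixes=EXPECTED_PREFIXES):
    """Return the prefixes with no matching entry in the jar (a path or a file-like object)."""
    with zipfile.ZipFile(jar) as archive:
        names = archive.namelist()
    return [p for p in prefixes if not any(n.startswith(p) for n in names)]
```

An empty result means both dependencies were unpacked into the jar; any prefix returned points at an include that did not match.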
Big Data Basics: Oozie (2) Common Problems