Big Data Basics: Oozie (2) Common Problems



1 How do you view task logs in Oozie?

Given an Oozie job ID, the workflow details can be displayed with the following command:

oozie job -info 0012077-180830142722522-oozie-hado-W

 

The workflow details look like this:

Job ID : 0012077-180830142722522-oozie-hado-W

------------------------------------------------------------------------------------------------------------------------------------

Workflow Name :  $workflow_name

App Path      : hdfs://$hdfs_name/oozie/wf/$workflow_name.xml

Status        : KILLED

Run           : 0

User          : hadoop

Group         : -

Created       : 2018-09-25 02:51 GMT

Started       : 2018-09-25 02:51 GMT

Last Modified : 2018-09-25 02:53 GMT

Ended         : 2018-09-25 02:53 GMT

CoordAction ID: -

 

Actions

------------------------------------------------------------------------------------------------------------------------------------

ID                                                                            Status    Ext ID                 Ext Status Err Code 

------------------------------------------------------------------------------------------------------------------------------------

0012077-180830142722522-oozie-hado-W@:start:                                  OK        -                      OK         -

------------------------------------------------------------------------------------------------------------------------------------

0012077-180830142722522-oozie-hado-W@$action_name                             ERROR     application_1537326594090_5663FAILED/KILLEDJA018

------------------------------------------------------------------------------------------------------------------------------------

0012077-180830142722522-oozie-hado-W@...                                      OK        -                      OK         E0729

------------------------------------------------------------------------------------------------------------------------------------
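When a workflow has many actions, picking the failed ones out of this listing by eye gets tedious. The following is a minimal Python sketch, not part of Oozie, that scans saved `oozie job -info` output for rows in ERROR status and pulls out the YARN application ID. The function name `failed_actions` and the action name `my_action` are illustrative assumptions. The Ext ID is matched with a regular expression rather than by column position, because a wide Ext ID can run straight into the next columns, as in `application_1537326594090_5663FAILED/KILLEDJA018`.

```python
import re

# YARN application IDs have the shape application_<cluster timestamp>_<sequence>.
APP_ID = re.compile(r"application_\d+_\d+")

def failed_actions(info_text):
    """Return (action_id, yarn_app_id) for every action row whose status is ERROR."""
    failed = []
    for line in info_text.splitlines():
        cols = line.split()
        # Action rows start with '<job id>@<action name>' followed by the status.
        if len(cols) < 3 or "@" not in cols[0] or cols[1] != "ERROR":
            continue
        match = APP_ID.search(line)
        failed.append((cols[0], match.group(0) if match else None))
    return failed

# Sample rows modelled on the listing above; 'my_action' is a made-up action name.
SAMPLE = (
    "0012077-180830142722522-oozie-hado-W@:start:    OK     -    OK    -\n"
    "0012077-180830142722522-oozie-hado-W@my_action  ERROR  "
    "application_1537326594090_5663FAILED/KILLEDJA018\n"
)

for action_id, app_id in failed_actions(SAMPLE):
    print(action_id, app_id)
```

The application ID found this way is the one to look up on YARN.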

 

The failed action is defined as follows:

<action name="$action_name"> 

        <spark xmlns="uri:oozie:spark-action:0.1"> 

            <job-tracker>${job_tracker}</job-tracker> 

            <name-node>${name_node}</name-node> 

            <master>${jobmaster}</master> 

            <mode>${jobmode}</mode> 

            <name>${jobname}</name> 

            <class>${jarclass}</class> 

            <jar>${jarpath}</jar> 

            <spark-opts>${sparkopts}</spark-opts> 

        </spark>

        <!-- the ok/error transitions are not shown in the original -->

</action>

 

On YARN, the application that corresponds to application_1537326594090_5663 is:

application_1537326594090_5663       hadoop oozie:launcher:T=spark:W=$workflow_name:A=$action_name:ID=0012077-180830142722522-oozie-hado-W         Oozie Launcher
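The launcher's application name encodes the action type, workflow name, action name and Oozie job ID, which is handy for mapping a YARN application back to a workflow. Below is a small Python sketch that splits such a name into its fields. The function name `parse_launcher_name` and the sample names `my_wf` and `my_action` are illustrative assumptions, and the sketch assumes the workflow and action names contain no colon.

```python
def parse_launcher_name(name):
    """Split 'oozie:launcher:T=..:W=..:A=..:ID=..' into a dict keyed by T, W, A and ID."""
    prefix = "oozie:launcher:"
    if not name.startswith(prefix):
        raise ValueError("not an Oozie launcher application name: " + name)
    fields = {}
    # Assumes no field value contains ':'; otherwise this simple split would break.
    for part in name[len(prefix):].split(":"):
        key, _, value = part.partition("=")
        fields[key] = value
    return fields

# Made-up workflow and action names; the job ID is the one used in this article.
NAME = "oozie:launcher:T=spark:W=my_wf:A=my_action:ID=0012077-180830142722522-oozie-hado-W"
print(parse_launcher_name(NAME))
```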

 

The log of application_1537326594090_5663 contains this line:

2018-09-25 10:52:05,237 [main] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl  - Submitted application application_1537326594090_5664

 

On YARN, the application that corresponds to application_1537326594090_5664 is:

application_1537326594090_5664       hadoop    $app_name SPARK

 

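So application_1537326594090_5664 is the Spark job that actually corresponds to the action. Why is there an extra step in between?

In short, Oozie runs an action through an ActionExecutor. The most important subclass is JavaActionExecutor, and the executors for hive, spark and other actions are subclasses of it. JavaActionExecutor first submits a LauncherMapper (a map task) to YARN. The LauncherMapper runs a LauncherMain, of which each concrete action type is a subclass, for example JavaMain or SparkMain. A Spark action runs SparkMain, and SparkMain calls org.apache.spark.deploy.SparkSubmit to submit the actual job.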
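Because of this two-step launch, the ID of the real job has to be read out of the launcher's log. A minimal Python sketch of that lookup follows. It relies only on the "Submitted application ..." line shown above; the function name `child_applications` is an illustrative assumption, and how the log text is obtained is left out.

```python
import re

# Matches the line YarnClientImpl logs when the launcher submits the real job.
SUBMITTED = re.compile(r"Submitted application (application_\d+_\d+)")

def child_applications(launcher_log):
    """Return the YARN application IDs that the launcher log reports as submitted."""
    return SUBMITTED.findall(launcher_log)

# The log line quoted earlier in this article.
LOG = (
    "2018-09-25 10:52:05,237 [main] INFO  "
    "org.apache.hadoop.yarn.client.api.impl.YarnClientImpl  - "
    "Submitted application application_1537326594090_5664\n"
)

print(child_applications(LOG))
```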

 

2 How do you add dependencies to a Spark job submitted through Oozie?

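There are two usual ways to add dependencies to a Spark job:

When running in local mode, dependencies can be added with --jars;

When running on YARN, dependencies can be added with spark.yarn.jars.

Neither works under Oozie. First, Oozie cannot, and should not, run jobs in local mode. Second, a spark.yarn.jars setting supplied by the application turns out to have no effect at all. Here is why.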

Look at the LauncherMapper log (see question 1 above for how to find it):

 

Spark Version 2.1.1

Spark Action Main class        : org.apache.spark.deploy.SparkSubmit

 

Oozie Spark action configuration

=================================================================

...

                    --conf

                    spark.yarn.jars=hdfs://$hdfs_name/spark/sparkjars/*.jar

                    --conf

                    spark.yarn.jars=hdfs://$hdfs_name/oozie/share/lib/lib_20180801121138/spark/spark-yarn_2.11-2.1.1.jar

 

So Oozie appends a spark.yarn.jars setting of its own. What does Spark do when the same key is supplied twice?

 

org.apache.spark.deploy.SparkSubmit

    val appArgs = new SparkSubmitArguments(args)

 

org.apache.spark.launcher.SparkSubmitOptionParser

        if (!handle(name, value)) {

 

org.apache.spark.deploy.SparkSubmitArguments

  override protected def handle(opt: String, value: String): Boolean = {

  ...

      case CONF =>

        value.split("=", 2).toSeq match {

          case Seq(k, v) => sparkProperties(k) = v

          case _ => SparkSubmit.printErrorAndExit(s"Spark config without '=': $value")

        }
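To make the effect on repeated keys concrete, here is a minimal Python sketch that mirrors the CONF branch above: each `--conf key=value` is split at the first `=` and stored in a map, so a later value replaces an earlier one. This illustrates the behaviour and is not Spark's code. The function name `parse_conf_args` and the shortened HDFS paths are assumptions.

```python
def parse_conf_args(args):
    """Collect `--conf key=value` pairs; a repeated key keeps only its last value."""
    props = {}
    it = iter(args)
    for arg in it:
        if arg != "--conf":
            continue
        value = next(it, "")
        # Split at the first '=' only, like value.split("=", 2) in the Scala code.
        key, sep, val = value.partition("=")
        if not sep:
            raise ValueError("Spark config without '=': " + value)
        props[key] = val  # plain assignment: the later entry overwrites the earlier one
    return props

# Hypothetical paths: the application's setting first, then the one Oozie appends.
ARGS = [
    "--conf", "spark.yarn.jars=hdfs://nn/spark/sparkjars/*.jar",
    "--conf", "spark.yarn.jars=hdfs://nn/oozie/share/lib/spark/spark-yarn_2.11-2.1.1.jar",
]

print(parse_conf_args(ARGS)["spark.yarn.jars"])
```

Since Oozie's entry comes last on the command line, it is the one left in the map.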

 

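As the code shows, a repeated key is simply overwritten, so the last setting wins. That is Oozie's setting, not the one the application supplied. The application therefore has to bundle its special dependencies into its own jar. This can be done with Maven's maven-assembly-plugin by configuring <dependencySets><dependencySet><includes><include>. The full descriptor is: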

 

<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"

          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

          xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">

    <!-- TODO: a jarjar format would be better -->

    <id>jar-with-dependencies</id>

    <formats>

        <format>jar</format>

    </formats>

    <includeBaseDirectory>false</includeBaseDirectory>

    <dependencySets>

        <dependencySet>

            <outputDirectory>/</outputDirectory>

            <useProjectArtifact>true</useProjectArtifact>

            <unpack>true</unpack>

            <scope>runtime</scope>

            <includes>

                <include>redis.clients:jedis</include>

                <include>org.apache.commons:commons-pool2</include>

            </includes>

        </dependencySet>

    </dependencySets>

</assembly>

 

This is simply the content of the built-in jar-with-dependencies.xml descriptor, copied out with an includes section added.

 
