Big data basics - Oozie (2): FAQs

1. How do I view task logs in Oozie?

Given an Oozie job ID, you can view detailed workflow information with the following command:

oozie job -info 0012077-180830142722522-oozie-hado-W

 

The process details are as follows:

Job ID : 0012077-180830142722522-oozie-hado-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : $workflow_name
App Path      : hdfs://$hdfs_name/oozie/wf/$workflow_name.xml
Status        : KILLED
Run           : 0
User          : hadoop
Group         : -
Created       : GMT
Started       : GMT
Last Modified : GMT
Ended         : GMT
CoordAction ID: -

 

Actions

------------------------------------------------------------------------------------------------------------------------------------
ID                                                    Status    Ext ID                           Ext Status     Err Code
------------------------------------------------------------------------------------------------------------------------------------
0012077-180830142722522-oozie-hado-W@:start:          OK        -                                OK             -
------------------------------------------------------------------------------------------------------------------------------------
0012077-180830142722522-oozie-hado-W@$action_name     ERROR     application_1537326594090_5663   FAILED/KILLED  JA018
------------------------------------------------------------------------------------------------------------------------------------
0012077-180830142722522-oozie-hado-W@$kill_node_name  OK        -                                OK             E0729
------------------------------------------------------------------------------------------------------------------------------------

 

The failed action is defined as follows:

<action name="$action_name">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${job_tracker}</job-tracker>
        <name-node>${name_node}</name-node>
        <master>${jobmaster}</master>
        <mode>${jobmode}</mode>
        <name>${jobname}</name>
        <class>${jarclass}</class>
        <jar>${jarpath}</jar>
        <spark-opts>${sparkopts}</spark-opts>
    </spark>
    <ok to="..."/>
    <error to="..."/>
</action>
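The ${...} variables in the action are resolved from the job.properties file submitted with the workflow. A minimal sketch, assuming illustrative values (none of these paths or class names are from the original job):

# job.properties (illustrative values only)
name_node=hdfs://$hdfs_name
job_tracker=$resourcemanager_host:8032
jobmaster=yarn
jobmode=cluster
jobname=$app_name
jarclass=com.example.Main
jarpath=/oozie/wf/lib/app.jar
sparkopts=--executor-memory 2g
oozie.wf.application.path=${name_node}/oozie/wf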

 

In the YARN application list, application_1537326594090_5663 appears as follows:

application_1537326594090_5663  hadoop  oozie:launcher:T=spark:W=$workflow_name:A=$action_name:ID=0012077-180830142722522-oozie-hado-W  Oozie Launcher

 

Viewing the log of application_1537326594090_5663, we find:

10:52:05,237 [main] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1537326594090_5664

 

and in the YARN application list, application_1537326594090_5664 appears as:

application_1537326594090_5664  hadoop  $app_name  SPARK

 

That is, application_1537326594090_5664 is the actual Spark job corresponding to the action. Why is there an extra step in the middle?

In brief: when Oozie executes an action, it goes through an ActionExecutor (the main subclass is JavaActionExecutor; actions such as Hive and Spark are implemented as subclasses of it). JavaActionExecutor first submits a LauncherMapper (a single-map MapReduce job) to YARN, and that map task runs LauncherMain (each concrete action type has its own subclass, such as JavaMain or SparkMain). For a Spark action, SparkMain in turn calls org.apache.spark.deploy.SparkSubmit to submit the real Spark job.
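So chasing a failure always takes two hops: the launcher application, then the Spark application. Assuming standard Oozie and YARN CLIs, the logs at each hop can be pulled like this:

# action/launcher log via the Oozie CLI
oozie job -log 0012077-180830142722522-oozie-hado-W

# launcher and Spark application logs via the YARN CLI (requires log aggregation)
yarn logs -applicationId application_1537326594090_5663
yarn logs -applicationId application_1537326594090_5664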

 

2. How do I add dependencies when Oozie submits a Spark job?

How are dependencies normally added to a Spark job?

If it runs in local mode, you can add dependencies with --jars;

If it runs in yarn mode, you can add dependencies with spark.yarn.jars.

Neither of these works under Oozie: you cannot run in local mode on Oozie at all, and configuring spark.yarn.jars does not take effect either.
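For reference, outside of Oozie the two usual forms look like this (paths and class names are illustrative):

# local mode: ship extra jars with --jars
spark-submit --master local[*] \
  --jars /opt/libs/jedis.jar,/opt/libs/commons-pool2.jar \
  --class com.example.Main app.jar

# yarn mode: point spark.yarn.jars at jars already uploaded to HDFS
spark-submit --master yarn --deploy-mode cluster \
  --conf spark.yarn.jars=hdfs://$hdfs_name/spark/sparkjars/*.jar \
  --class com.example.Main app.jar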

To see why, look at the LauncherMapper log (see Question 1 above):

 

Spark Version 2.1.1
Spark Action Main class        : org.apache.spark.deploy.SparkSubmit

Oozie Spark action configuration
=================================================================
...
--conf
spark.yarn.jars=hdfs://$hdfs_name/spark/sparkjars/*.jar
--conf
spark.yarn.jars=hdfs://$hdfs_name/oozie/share/lib/lib_20180801161138/spark/spark-yarn_2.11-2.1.1.jar

 

As you can see, Oozie appends its own spark.yarn.jars setting. When the same key is provided twice, what does Spark do? Follow the code:

 

org.apache.spark.deploy.SparkSubmit:

    val appArgs = new SparkSubmitArguments(args)

org.apache.spark.launcher.SparkSubmitOptionParser:

    if (!handle(name, value)) {

org.apache.spark.deploy.SparkSubmitArguments:

    override protected def handle(opt: String, value: String): Boolean = {
      ...
      case CONF =>
        value.split("=", 2).toSeq match {
          case Seq(k, v) => sparkProperties(k) = v
          case _ => SparkSubmit.printErrorAndExit(s"Spark config without '=': $value")
        }
    }
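As a standalone illustration of that split-and-assign logic, here is a minimal Scala sketch (the map name and HDFS paths are illustrative, not Spark's actual code path):

import scala.collection.mutable

object LastConfWins {
  def main(args: Array[String]): Unit = {
    val sparkProperties = mutable.HashMap[String, String]()
    val confs = Seq(
      "spark.yarn.jars=hdfs://nn/app/deps/*.jar",             // set by the application
      "spark.yarn.jars=hdfs://nn/oozie/share/lib/spark/*.jar" // appended later by Oozie
    )
    for (value <- confs) {
      value.split("=", 2).toSeq match {
        case Seq(k, v) => sparkProperties(k) = v // same key: later assignment overwrites
        case _         => sys.error(s"Spark config without '=': $value")
      }
    }
    // Prints the Oozie value: the application's setting is gone.
    println(sparkProperties("spark.yarn.jars"))
  }
}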

 

It can be seen that the value is simply overwritten: the configuration that comes last, i.e. the one Oozie appends, wins, and the one provided by the application is discarded. The application therefore has to package its special dependencies into the application jar itself, using maven-assembly-plugin with a <dependencySets><dependencySet><includes><include> configuration. The full descriptor looks like this:

 

<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">
    <!-- TODO: a jarjar format would be better -->
    <id>jar-with-dependencies</id>
    <formats>
        <format>jar</format>
    </formats>
    <includeBaseDirectory>false</includeBaseDirectory>
    <dependencySets>
        <dependencySet>
            <outputDirectory>/</outputDirectory>
            <useProjectArtifact>true</useProjectArtifact>
            <unpack>true</unpack>
            <scope>runtime</scope>
            <includes>
                <include>redis.clients:jedis</include>
                <include>org.apache.commons:commons-pool2</include>
            </includes>
        </dependencySet>
    </dependencySets>
</assembly>

 

This is simply the content of the default jar-with-dependencies.xml descriptor with an <includes> section added.
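To wire the descriptor into the build, a typical maven-assembly-plugin configuration would look like this (the descriptor path and plugin version are illustrative):

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-assembly-plugin</artifactId>
    <version>3.1.1</version>
    <configuration>
        <descriptors>
            <!-- path to the descriptor shown above (illustrative) -->
            <descriptor>src/main/assembly/jar-with-deps.xml</descriptor>
        </descriptors>
    </configuration>
    <executions>
        <execution>
            <id>make-assembly</id>
            <phase>package</phase>
            <goals>
                <goal>single</goal>
            </goals>
        </execution>
    </executions>
</plugin>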

 
