1. How do I view task logs in Oozie?
Detailed job information can be viewed with the Oozie job ID. The command is as follows:
oozie job -info 0012077-180830142722522-oozie-hado-W
The job details are printed like this:
Job ID : 0012077-180830142722522-oozie-hado-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : $workflow_name
App Path      : hdfs://$hdfs_name/oozie/wf/$workflow_name.xml
Status        : KILLED
Run           : 0
User          : hadoop
Group         : -
Created       : GMT
Started       : GMT
Last Modified : GMT
Ended         : GMT
CoordAction ID: -

Actions
------------------------------------------------------------------------------------------------------------------------------------
ID                                                   Status   Ext ID                          Ext Status     Err Code
------------------------------------------------------------------------------------------------------------------------------------
0012077-180830142722522-oozie-hado-W@:start:         OK       -                               OK             -
------------------------------------------------------------------------------------------------------------------------------------
0012077-180830142722522-oozie-hado-W@$action_name    ERROR    application_1537326594090_5663  FAILED/KILLED  JA018
------------------------------------------------------------------------------------------------------------------------------------
0012077-180830142722522-oozie-hado-W@Kill            OK       -                               OK             E0729
------------------------------------------------------------------------------------------------------------------------------------
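If more detail is needed, the Oozie CLI can also dump the full job log or a single action's information; a quick sketch using the IDs above (the part after @ is the action name):

oozie job -log 0012077-180830142722522-oozie-hado-W                    # full job log
oozie job -info 0012077-180830142722522-oozie-hado-W@$action_name      # details of just the failed action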
The failed action is defined as follows:
<action name="$action_name">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${job_tracker}</job-tracker>
        <name-node>${name_node}</name-node>
        <master>${jobmaster}</master>
        <mode>${jobmode}</mode>
        <name>${jobname}</name>
        <class>${jarclass}</class>
        <jar>${jarpath}</jar>
        <spark-opts>${sparkopts}</spark-opts>
    </spark>
    ...
</action>
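For context, the ${...} variables above come from the workflow's job.properties. A hypothetical set of values (placeholders, not taken from the original job) might look like:

job_tracker=$resourcemanager_host:8032
name_node=hdfs://$hdfs_name
jobmaster=yarn
jobmode=cluster
jobname=$app_name
jarclass=$main_class
jarpath=hdfs://$hdfs_name/apps/$app_name.jar
sparkopts=--executor-memory 2g --num-executors 4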
The application corresponding to application_1537326594090_5663 shows up in YARN as follows:
application_1537326594090_5663  hadoop  oozie:launcher:T=spark:W=$workflow_name:A=$action_name:ID=0012077-180830142722522-oozie-hado-W  Oozie Launcher
Viewing the log of application_1537326594090_5663 reveals the following line:
10:52:05,237 [main] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1537326594090_5664
The application corresponding to application_1537326594090_5664 in YARN is as follows:
application_1537326594090_5664  hadoop  $app_name  SPARK
That is, application_1537326594090_5664 is the Spark task corresponding to the action. So why is there an extra step in the middle?
In brief: when Oozie executes an action it goes through an ActionExecutor (the main subclass is JavaActionExecutor; actions such as Hive and Spark are implemented by subclasses of it). JavaActionExecutor first submits a LauncherMapper (a map-only MapReduce job) to YARN; that mapper executes LauncherMain (each concrete action uses a subclass such as JavaMain or SparkMain). For a Spark action, SparkMain runs and calls org.apache.spark.deploy.SparkSubmit to submit the real Spark task.
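To follow the same chain from the command line (assuming YARN log aggregation is enabled), the logs of both applications can be pulled with yarn logs:

yarn logs -applicationId application_1537326594090_5663    # launcher log, contains the SparkSubmit output
yarn logs -applicationId application_1537326594090_5664    # the actual Spark application's log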
2. How do I add dependencies when Oozie submits a Spark task?
There are two usual ways to add dependencies to a Spark task (both are sketched right after this list):
If it runs in local mode, you can use --jars to add dependencies;
If it runs in yarn mode, you can use spark.yarn.jars to add dependencies.
Neither method works under Oozie: you cannot run Spark in local mode on Oozie in the first place, and spark.yarn.jars does not take effect either, for the reason shown below.
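For reference, outside Oozie the two approaches look roughly like this (the jar paths are hypothetical; $jarclass, $jarpath and $hdfs_name reuse the placeholders from above):

spark-submit --master local[2] --class $jarclass --jars /path/to/jedis.jar,/path/to/commons-pool2.jar $jarpath
spark-submit --master yarn --deploy-mode cluster --class $jarclass --conf spark.yarn.jars=hdfs://$hdfs_name/spark/sparkjars/*.jar $jarpath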
Looking at the LauncherMapper log (see Question 1 above) shows:
Spark Version 2.1.1
Spark Action Main class: org.apache.spark.deploy.SparkSubmit
Oozie Spark action configuration
================================================================================
...
--conf
spark.yarn.jars=hdfs://$hdfs_name/spark/sparkjars/*.jar
--conf
spark.yarn.jars=hdfs://$hdfs_name/oozie/share/lib/lib_20180801161138/spark/spark-yarn_2.11-2.1.1.jar
As can be seen, Oozie appends a second spark.yarn.jars entry of its own. So when the same key is passed twice, what does Spark do?
org.apache.spark.deploy.SparkSubmit
val appArgs = new SparkSubmitArguments(args)
org.apache.spark.launcher.SparkSubmitOptionParser
if (!handle(name, value)) {
org.apache.spark.deploy.SparkSubmitArguments
override protected def handle(opt: String, value: String): Boolean = {
  ...
  case CONF =>
    value.split("=", 2).toSeq match {
      case Seq(k, v) => sparkProperties(k) = v
      case _ => SparkSubmit.printErrorAndExit(s"Spark config without '=': $value")
    }
As can be seen, the value is simply overwritten: the last occurrence wins, so the Oozie-provided configuration is used instead of the one supplied by the application. The application therefore has to package its special dependencies into the application jar itself, using maven-assembly-plugin with a <dependencySets><dependencySet><includes><include> configuration. The full descriptor is as follows:
<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">
    <!-- TODO: a jarjar format would be better -->
    <id>jar-with-dependencies</id>
    <formats>
        <format>jar</format>
    </formats>
    <includeBaseDirectory>false</includeBaseDirectory>
    <dependencySets>
        <dependencySet>
            <outputDirectory>/</outputDirectory>
            <useProjectArtifact>true</useProjectArtifact>
            <unpack>true</unpack>
            <scope>runtime</scope>
            <includes>
                <include>redis.clients:jedis</include>
                <include>org.apache.commons:commons-pool2</include>
            </includes>
        </dependencySet>
    </dependencySets>
</assembly>
This is just the content of the default jar-with-dependencies.xml descriptor with the <includes> section added.
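Assuming the descriptor above is registered with maven-assembly-plugin in pom.xml, the fat jar can then be built and sanity-checked like this (the artifact name is hypothetical):

mvn clean package
jar tf target/$app_name-jar-with-dependencies.jar | grep redis/clients    # confirm the Jedis classes were bundled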