1. How do I view task logs in Oozie?
Detailed job information can be viewed with the Oozie job ID. The command is as follows:
oozie job -info 0012077-180830142722522-oozie-hado-W
The job details are printed like this:
Job ID : 0012077-180830142722522-oozie-hado-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : $workflow_name
App Path      : hdfs://$hdfs_name/oozie/wf/$workflow_name.xml
Status        : KILLED
Run           : 0
User          : hadoop
Group         : -
Created       : GMT
Started       : GMT
Last Modified : GMT
Ended         : GMT
CoordAction ID: -

Actions
------------------------------------------------------------------------------------------------------------------------------------
ID                                                   Status   Ext ID                          Ext Status     Err Code
------------------------------------------------------------------------------------------------------------------------------------
0012077-180830142722522-oozie-hado-W@:start:         OK       -                               OK             -
------------------------------------------------------------------------------------------------------------------------------------
0012077-180830142722522-oozie-hado-W@$action_name    ERROR    application_1537326594090_5663  FAILED/KILLED  JA018
------------------------------------------------------------------------------------------------------------------------------------
0012077-180830142722522-oozie-hado-W@Kill            OK       -                               OK             E0729
------------------------------------------------------------------------------------------------------------------------------------
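If more detail is needed, the Oozie CLI can also dump the full job log or a single action's information; a quick sketch using the IDs above (the part after @ is the action name):

oozie job -log 0012077-180830142722522-oozie-hado-W                    # full job log
oozie job -info 0012077-180830142722522-oozie-hado-W@$action_name      # details of just the failed action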
The failed action is defined as follows:
<action name="$action_name">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${job_tracker}</job-tracker>
        <name-node>${name_node}</name-node>
        <master>${jobmaster}</master>
        <mode>${jobmode}</mode>
        <name>${jobname}</name>
        <class>${jarclass}</class>
        <jar>${jarpath}</jar>
        <spark-opts>${sparkopts}</spark-opts>
    </spark>
    ...
</action>
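For context, the ${...} variables above come from the workflow's job.properties. A hypothetical set of values (placeholders, not taken from the original job) might look like:

job_tracker=$resourcemanager_host:8032
name_node=hdfs://$hdfs_name
jobmaster=yarn
jobmode=cluster
jobname=$app_name
jarclass=$main_class
jarpath=hdfs://$hdfs_name/apps/$app_name.jar
sparkopts=--executor-memory 2g --num-executors 4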
The application corresponding to application_1537326594090_5663 shows up in YARN as follows:
application_1537326594090_5663  hadoop  oozie:launcher:T=spark:W=$workflow_name:A=$action_name:ID=0012077-180830142722522-oozie-hado-W  Oozie Launcher
Viewing the log of application_1537326594090_5663 reveals the following line:
10:52:05,237 [main] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1537326594090_5664
The application corresponding to application_1537326594090_5664 in YARN is as follows:
application_1537326594090_5664  hadoop  $app_name  SPARK
That is, application_1537326594090_5664 is the Spark task corresponding to the action. So why is there an extra step in the middle?
In brief: when Oozie executes an action it goes through an ActionExecutor (the main subclass is JavaActionExecutor; actions such as Hive and Spark are implemented by subclasses of it). JavaActionExecutor first submits a LauncherMapper (a map-only MapReduce job) to YARN; that mapper executes LauncherMain (each concrete action uses a subclass such as JavaMain or SparkMain). For a Spark action, SparkMain runs and calls org.apache.spark.deploy.SparkSubmit to submit the real Spark task.
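To follow the same chain from the command line (assuming YARN log aggregation is enabled), the logs of both applications can be pulled with yarn logs:

yarn logs -applicationId application_1537326594090_5663    # launcher log, contains the SparkSubmit output
yarn logs -applicationId application_1537326594090_5664    # the actual Spark application's log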
2. How do I add dependencies when Oozie submits a Spark task?
There are two usual ways to add dependencies to a Spark task (both are sketched right after this list):
If it runs in local mode, you can use --jars to add dependencies;
If it runs in yarn mode, you can use spark.yarn.jars to add dependencies.
Neither method works under Oozie: you cannot run Spark in local mode on Oozie in the first place, and spark.yarn.jars does not take effect either, for the reason shown below.
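For reference, outside Oozie the two approaches look roughly like this (the jar paths are hypothetical; $jarclass, $jarpath and $hdfs_name reuse the placeholders from above):

spark-submit --master local[2] --class $jarclass --jars /path/to/jedis.jar,/path/to/commons-pool2.jar $jarpath
spark-submit --master yarn --deploy-mode cluster --class $jarclass --conf spark.yarn.jars=hdfs://$hdfs_name/spark/sparkjars/*.jar $jarpath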
Looking at the LauncherMapper log (see Question 1 above) shows:
Spark Version 2.1.1
Spark Action Main class: org.apache.spark.deploy.SparkSubmit
Oozie Spark action configuration
================================================================================
...
--conf
spark.yarn.jars=hdfs://$hdfs_name/spark/sparkjars/*.jar
--conf
spark.yarn.jars=hdfs://$hdfs_name/oozie/share/lib/lib_20180801161138/spark/spark-yarn_2.11-2.1.1.jar
As can be seen, Oozie appends a second spark.yarn.jars entry of its own. So when the same key is passed twice, what does Spark do?
org.apache.spark.deploy.SparkSubmit
val appArgs = new SparkSubmitArguments(args)
org.apache.spark.launcher.SparkSubmitOptionParser
if (!handle(name, value)) {
org.apache.spark.deploy.SparkSubmitArguments
override protected def handle(opt: String, value: String): Boolean = {
  ...
  case CONF =>
    value.split("=", 2).toSeq match {
      case Seq(k, v) => sparkProperties(k) = v
      case _ => SparkSubmit.printErrorAndExit(s"Spark config without '=': $value")
    }
As can be seen, the value is simply overwritten: the last occurrence wins, so the Oozie-provided configuration is used instead of the one supplied by the application. The application therefore has to package its special dependencies into the application jar itself, using maven-assembly-plugin with a <dependencySets><dependencySet><includes><include> configuration. The full descriptor is as follows:
<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">
    <!-- TODO: a jarjar format would be better -->
    <id>jar-with-dependencies</id>
    <formats>
        <format>jar</format>
    </formats>
    <includeBaseDirectory>false</includeBaseDirectory>
    <dependencySets>
        <dependencySet>
            <outputDirectory>/</outputDirectory>
            <useProjectArtifact>true</useProjectArtifact>
            <unpack>true</unpack>
            <scope>runtime</scope>
            <includes>
                <include>redis.clients:jedis</include>
                <include>org.apache.commons:commons-pool2</include>
            </includes>
        </dependencySet>
    </dependencySets>
</assembly>
This is just the content of the default jar-with-dependencies.xml descriptor with the <includes> section added.
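Assuming the descriptor above is registered with maven-assembly-plugin in pom.xml, the fat jar can then be built and sanity-checked like this (the artifact name is hypothetical):

mvn clean package
jar tf target/$app_name-jar-with-dependencies.jar | grep redis/clients    # confirm the Jedis classes were bundled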