Oozie Configuration of Hadoop

Source: Internet
Author: User

Description

Tasks performed in Hadoop sometimes require multiple MapReduce jobs to be chained together to achieve a goal. In the Hadoop ecosystem, Oozie lets us combine multiple MapReduce jobs into a single logical unit of work in order to accomplish larger tasks.

Principle

Oozie is a Java web application that runs in a Java servlet container (Tomcat) and uses a database to store the following:

Workflow definitions

Currently running workflow instances, including each instance's state and variables

An Oozie workflow is a set of actions (for example, Hadoop MapReduce jobs, Pig jobs, and so on) arranged in a control-dependency DAG (directed acyclic graph), which specifies the order in which the actions execute. We describe this graph in hPDL (an XML process definition language).

hPDL is a fairly concise language that uses only a handful of control-flow and action nodes. Control nodes define the flow of execution and include the start and end points of a workflow (the start, end, and fail nodes) and the mechanisms that control its execution path (the decision, fork, and join nodes). Action nodes are the mechanism by which a workflow triggers the execution of a computation or processing task. Oozie provides support for the following types of actions: Hadoop MapReduce, Hadoop file system, Pig, Java, and Oozie sub-workflows (the SSH action was removed as of Oozie schema 0.2 and later versions).
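To make the node types concrete, here is a minimal hPDL sketch (the application name demo-wf and the ${jobTracker}, ${nameNode}, ${inputDir}, and ${outputDir} parameters are placeholders chosen for illustration, not values from this article). A start node hands control to a single map-reduce action node; success routes to the end node and failure to a kill node:

<workflow-app xmlns="uri:oozie:workflow:0.2" name="demo-wf">
    <start to="mr-node"/>
    <action name="mr-node">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <!-- input and output paths are supplied as workflow parameters -->
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Map/Reduce failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>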

All computation and processing tasks triggered by action nodes are not executed by Oozie itself; they are executed by Hadoop's MapReduce framework. This approach lets Oozie rely on Hadoop's existing mechanisms for load balancing and failover. Most of these tasks are executed asynchronously (the only exception is the file system action, which is synchronous). This means that for most computation or processing tasks triggered by a workflow, the workflow has to wait until the task finishes before it can transition to the next node. Oozie has two ways to detect whether a computation or processing task has completed: callbacks and polling. When Oozie starts a task, it provides the task with a unique callback URL, and the task sends a notification to that URL when it completes. If the task cannot invoke the callback URL (for any reason, such as a transient network failure), or if the task type does not invoke the callback URL on completion, Oozie falls back to polling the task to make sure it detects completion.

Oozie workflows can be parameterized (for example, by using variables such as ${inputdir} in the workflow definition). When submitting a workflow job, we must supply values for these parameters. If the workflow is parameterized properly (say, each run uses a different output directory), several instances of the same workflow can run concurrently.
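As a sketch of how parameterization looks at submission time (the paths and host names below are assumptions for illustration), the values are usually supplied in a job.properties file; submitting the same workflow twice with different outputDir values lets both runs proceed concurrently:

nameNode=hdfs://master:9000
jobTracker=master:8032
queueName=default
inputDir=${nameNode}/user/${user.name}/demo/input
outputDir=${nameNode}/user/${user.name}/demo/output-run1
oozie.wf.application.path=${nameNode}/user/${user.name}/apps/demo-wf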

Some workflows are triggered on demand, but in most cases they need to run based on time intervals, data availability, external events, or some combination of these. The Oozie Coordinator system allows users to define workflow execution schedules based on these parameters. The Oozie coordinator lets us model workflow execution triggers as predicates, which can refer to time, data, and/or external events. The workflow job starts when the predicate is satisfied.
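As a minimal sketch of a purely time-based trigger (the coordinator name, dates, and application path are assumptions for illustration), the following coordinator runs a workflow once per day within the given window:

<coordinator-app name="daily-demo-coord" frequency="${coord:days(1)}"
                 start="2016-01-01T00:00Z" end="2016-12-31T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.2">
    <action>
        <workflow>
            <app-path>${nameNode}/user/${user.name}/apps/demo-wf</app-path>
        </workflow>
    </action>
</coordinator-app>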

Often we also need to chain workflows that run on schedules but at different intervals, so that the outputs of several runs of one workflow become the input of the next workflow. Chained together in this way, the workflows form what the system refers to as a data application pipeline. The Oozie Coordinator supports the creation of such data application pipelines.
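A data dependency of this kind is expressed with datasets and input events inside a coordinator definition. The fragment below is a sketch under an assumed directory layout and names; it would sit inside the <coordinator-app> element, before <action>. It declares a daily dataset produced by an upstream workflow and makes the downstream workflow wait for the current day's instance; the resolved path can then be passed to the workflow with ${coord:dataIn('input')}:

<datasets>
    <dataset name="upstream-out" frequency="${coord:days(1)}"
             initial-instance="2016-01-01T00:00Z" timezone="UTC">
        <uri-template>${nameNode}/user/${user.name}/pipeline/out/${YEAR}${MONTH}${DAY}</uri-template>
    </dataset>
</datasets>
<input-events>
    <data-in name="input" dataset="upstream-out">
        <instance>${coord:current(0)}</instance>
    </data-in>
</input-events>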

Installation

Installation environment: Hadoop 2.6.0, Maven 3.3.9, Pig 0.15.0, JDK 1.8, MySQL

1. Unzip the source package to the app directory: tar -zxf oozie-4.2.0.tar.gz -C app/

Compile: mvn clean package assembly:single -P hadoop-2 -DskipTests

2. Unzip the compiled distribution: tar -zxf oozie-4.2.0-distro.tar.gz -C ~/app/oozie/

3. Modify the HDFS configuration

Modify core-site.xml under the Hadoop configuration directory:

<property>
    <name>hadoop.proxyuser.[user].hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.[user].groups</name>
    <value>*</value>
</property>

[user] must be replaced with the user that starts the Oozie Tomcat process.
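For example, assuming (purely for illustration) that the Oozie Tomcat process is started by a user named hadoop, the two property names would become:

<name>hadoop.proxyuser.hadoop.hosts</name>
<name>hadoop.proxyuser.hadoop.groups</name>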

To make the configuration take effect without restarting the Hadoop cluster, run:

hdfs dfsadmin -refreshSuperUserGroupsConfiguration

yarn rmadmin -refreshSuperUserGroupsConfiguration

4. Configure Oozie

A. Create a new libext directory under the oozie-4.2.0 directory and copy ext-2.2.zip into it; also copy the Hadoop-related jar packages, including the MySQL JDBC driver jar, into this directory:

cp $HADOOP_HOME/share/hadoop/*/*.jar libext/
cp $HADOOP_HOME/share/hadoop/*/lib/*.jar libext/

Remove (rename) the jar packages that conflict between Hadoop and Tomcat:

mv servlet-api-2.5.jar servlet-api-2.5.jar.bak
mv jsp-api-2.1.jar jsp-api-2.1.jar.bak
mv jasper-compiler-5.5.23.jar jasper-compiler-5.5.23.jar.bak
mv jasper-runtime-5.5.23.jar jasper-runtime-5.5.23.jar.bak

B. Configure the database connection in conf/oozie-site.xml:

<property>
    <name>oozie.service.JPAService.create.db.schema</name>
    <value>true</value>
</property>
<property>
    <name>oozie.service.JPAService.jdbc.driver</name>
    <value>com.mysql.jdbc.Driver</value>
</property>
<property>
    <name>oozie.service.JPAService.jdbc.url</name>
    <value>jdbc:mysql://node4:3306/oozie?createDatabaseIfNotExist=true</value>
</property>
<property>
    <name>oozie.service.JPAService.jdbc.username</name>
    <value>root</value>
</property>
<property>
    <name>oozie.service.JPAService.jdbc.password</name>
    <value>root</value>
</property>
<property>
    <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
    <value>*=/usr/hadoop/hadoop-2.6.0/etc/hadoop</value>
</property>

C. Pre-start initialization

A. Build the war package:

bin/oozie-setup.sh prepare-war

B. Initialize the database:

bin/ooziedb.sh create -sqlfile oozie.sql -run

C. Modify the oozie-4.2.0/oozie-server/conf/server.xml file and comment out the following line:

<!-- <Listener className="org.apache.catalina.mbeans.ServerLifecycleListener" /> -->

D. Upload the Oozie shared library jars to HDFS:

bin/oozie-setup.sh sharelib create -fs hdfs://master:9000

5. Start

bin/oozied.sh start

Example

MapReduce task flow

1.

A. Extract oozie-examples.tar.gz into the oozie-4.2.0 directory

B. Edit examples/apps/map-reduce/job.properties:

nameNode=hdfs://master:9000
jobTracker=master:8032
queueName=default
examplesRoot=examples
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce
outputDir=map-reduce

C. Edit examples/apps/map-reduce/workflow.xml, for example adjusting the following property in the MapReduce action configuration:

<property>
    <name>mapred.map.tasks</name>
    <value>2</value>
</property>

D. Submit the job:

oozie job -oozie http://master:11000/oozie -config examples/apps/map-reduce/job.properties -run

E. View the job in the Oozie web console and the YARN web UI:

http://master:11000/oozie/

http://master:8088/cluster

