Description
Tasks performed in Hadoop sometimes require multiple MapReduce jobs to be chained together to achieve a goal. In the Hadoop ecosystem, Oozie allows us to combine multiple MapReduce jobs into a single logical unit of work in order to accomplish larger tasks.
Principle
Oozie is a Java web application that runs in a Java servlet container (Tomcat) and uses a database to store the following:
Workflow definitions
Currently running workflow instances, including their state and variables
An Oozie workflow is a set of actions (for example, Hadoop MapReduce jobs, Pig jobs, and so on) arranged in a control-dependency DAG (directed acyclic graph), which specifies the order in which the actions are executed. This graph is described in hPDL (an XML process definition language).
hPDL is a very concise language that uses only a handful of control and action nodes. Control nodes define the flow of execution and include the start and end points of the workflow (the start, end, and kill nodes) and the mechanisms that control its execution path (the decision, fork, and join nodes). Action nodes are the mechanism by which a workflow triggers the execution of a computation or processing task. Oozie provides support for the following action types: Hadoop map-reduce, Hadoop file system, Pig, Java, and Oozie sub-workflow (the ssh action was removed as of Oozie schema 0.2).
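As an illustration, a minimal hPDL workflow definition might look like the following sketch. The application name, action name, and property values here are hypothetical, not taken from the installation that follows:

```xml
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.4">
    <!-- Control node: entry point of the workflow -->
    <start to="demo-mr"/>
    <!-- Action node: triggers a MapReduce computation -->
    <action name="demo-mr">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputdir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <!-- Control node: abnormal termination -->
    <kill name="fail">
        <message>MR job failed</message>
    </kill>
    <!-- Control node: normal termination -->
    <end name="end"/>
</workflow-app>
```

Note how ${jobTracker}, ${nameNode}, and ${inputdir} parameterize the definition; their values are supplied in a properties file when the workflow job is submitted.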
Computation and processing tasks triggered by action nodes are not run by Oozie itself; they are executed by Hadoop's MapReduce framework. This approach lets Oozie rely on Hadoop's existing mechanisms for load balancing and failover. Most of these tasks are executed asynchronously (file system actions are the exception; they are synchronous). This means that for most computation or processing tasks triggered by a workflow, the workflow must wait until the task is finished before it can transition to its next node. Oozie has two different ways to detect whether a task has completed: callbacks and polling. When Oozie starts a task, it provides the task with a unique callback URL, and the task sends a notification to that URL when it completes. If the task cannot invoke the callback URL (for any reason, such as a transient network failure), or if the type of task cannot invoke the callback URL on completion, Oozie has a polling mechanism to make sure it still detects the task's completion.
Oozie workflows can be parameterized (using variables like ${inputdir} in the workflow definition). When submitting a workflow job, we must supply the parameter values. If parameterized properly (say, with different output directories), several instances of the same workflow can run concurrently.
Some workflows are triggered on demand, but in most cases they need to run on certain schedules and/or on data availability and/or on external events. The Oozie Coordinator system allows users to define workflow execution plans based on these parameters. The Oozie coordinator lets us model workflow execution triggers as predicates, which can reference data, time, and/or external events. A workflow job is started when the predicate is satisfied.
Often we also need to chain workflows that run on schedules with different intervals, where the output of several successively run workflows becomes the input of the next workflow. Workflows chained together like this are referred to as a data application pipeline. The Oozie Coordinator supports the creation of such pipelines.
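A coordinator definition that triggers a workflow once a day might be sketched as follows. The application name, date range, and HDFS path are hypothetical:

```xml
<coordinator-app name="demo-coord" frequency="${coord:days(1)}"
                 start="2016-01-01T00:00Z" end="2016-12-31T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
    <action>
        <workflow>
            <!-- HDFS directory containing the workflow.xml to run -->
            <app-path>hdfs://master:9000/user/hadoop/apps/demo-wf</app-path>
        </workflow>
    </action>
</coordinator-app>
```

The frequency attribute uses an Oozie EL function; predicates on data availability would be expressed with additional datasets and input-events elements.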
Installation
Installation environment: Hadoop 2.6.0, Maven 3.3.9, Pig 0.15.0, JDK 1.8, MySQL
1. Unzip the source package to the app directory:
tar -zxf oozie-4.2.0.tar.gz -C app/
Compile:
mvn clean package assembly:single -P hadoop-2 -DskipTests
2. Unzip the compiled distribution:
tar -zxf oozie-4.2.0-distro.tar.gz -C ~/app/oozie/
3. Modify the HDFS configuration
Edit core-site.xml under the Hadoop configuration directory:
<property>
<name>hadoop.proxyuser.[user].hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.[user].groups</name>
<value>*</value>
</property>
[user] needs to be replaced with the user that starts the Oozie Tomcat.
To make the configuration take effect without restarting the Hadoop cluster:
hdfs dfsadmin -refreshSuperUserGroupsConfiguration
yarn rmadmin -refreshSuperUserGroupsConfiguration
4. Configure Oozie
a. Create a new libext directory under the oozie-4.2.0 directory and copy ext-2.2.zip into it; also copy the Hadoop-related jar packages, including the MySQL JDBC jar, into this directory:
cp $HADOOP_HOME/share/hadoop/*/*.jar libext/
cp $HADOOP_HOME/share/hadoop/*/lib/*.jar libext/
Remove the jar packages that conflict between Hadoop and Tomcat:
mv servlet-api-2.5.jar servlet-api-2.5.jar.bak
mv jsp-api-2.1.jar jsp-api-2.1.jar.bak
mv jasper-compiler-5.5.23.jar jasper-compiler-5.5.23.jar.bak
mv jasper-runtime-5.5.23.jar jasper-runtime-5.5.23.jar.bak
b. Configure the database connection in conf/oozie-site.xml:
<property>
<name>oozie.service.JPAService.create.db.schema</name>
<value>true</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.driver</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.url</name>
<value>jdbc:mysql://node4:3306/oozie?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.username</name>
<value>root</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.password</name>
<value>root</value>
</property>
<property>
<name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
<value>*=/usr/hadoop/hadoop-2.6.0/etc/hadoop</value>
</property>
c. Initialize before the first start
a. Build the war package:
bin/oozie-setup.sh prepare-war
b. Initialize the database:
bin/ooziedb.sh create -sqlfile oozie.sql -run
c. Modify the oozie-4.2.0/oozie-server/conf/server.xml file and comment out the following line:
<!-- <Listener className="org.apache.catalina.mbeans.ServerLifecycleListener" /> -->
d. Upload the sharelib jar packages to HDFS:
bin/oozie-setup.sh sharelib create -fs hdfs://master:9000
5. Start
bin/oozied.sh start
Example
MR task flow
1.
a. Extract oozie-examples.tar.gz into the oozie-4.2.0 directory
b. Edit examples/apps/map-reduce/job.properties:
vim examples/apps/map-reduce/job.properties
nameNode=hdfs://master:9000
jobTracker=master:8032
queueName=default
examplesRoot=examples
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce
outputDir=map-reduce
c. Edit examples/apps/map-reduce/workflow.xml:
vim examples/apps/map-reduce/workflow.xml
<property>
<name>mapred.map.tasks</name>
<value>2</value>
</property>
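For context, a property like this lives inside the configuration element of the map-reduce action in workflow.xml. A trimmed sketch of the surrounding structure, with the action name and other properties abbreviated from what the stock example may contain:

```xml
<action name="mr-node">
    <map-reduce>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>mapred.map.tasks</name>
                <value>2</value>
            </property>
            <!-- further mapred.* properties from the stock example go here -->
        </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
</action>
```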
d. Submit the job:
oozie job -oozie http://master:11000/oozie -config examples/apps/map-reduce/job.properties -run
e. View the job in the web UIs:
http://master:11000/oozie/
http://master:8088/cluster
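Besides the web UIs, the job can also be inspected from the command line. A sketch, assuming the server address used in this guide; `<job-id>` stands for the id printed by the -run command above:

```shell
# Query status and progress of the submitted workflow job
oozie job -oozie http://master:11000/oozie -info <job-id>
# Fetch the job's log
oozie job -oozie http://master:11000/oozie -log <job-id>
```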
2.
Configuring Oozie for Hadoop