High-availability Hadoop platform-Oozie Workflow

Source: Internet
Author: User

1. Overview

When developing and running Hadoop-related applications, we can schedule them directly with Crontab as long as the business logic is simple and there are only a few tasks. As the number of tasks grows, a system that manages all scheduled tasks in one place becomes useful, and that is what we introduce today. The contents of this article are:

  • Content
  • Oozie Server
  • Preview

Let's get started.

2. Introduction

This article does not cover Oozie's detailed operations; its workflows will be described in detail in the next blog post. Here we look at the role Oozie plays and the steps for integrating it.

2.1 Role

Oozie is an open-source workflow scheduling system. It manages Hadoop jobs with complicated logic and coordinates multiple jobs so that they run in a specified order. For example, consider a typical daily scenario:

  1. Collect data into HDFS.
  2. Write a MapReduce (MR) job to clean the data, generate new data, and store it in a specified HDFS path.
  3. Create a Hive table partition and load the data into the corresponding partition.
  4. Use HQL to compute business metrics and write the results to the corresponding large Hive table.
  5. Export the computed data from the large table so that external services can use it.

Based on the daily process above, we can define a workflow, generate a workflow instance from it, and run that instance on a schedule every day. For this kind of Hadoop application scenario, Oozie simplifies task scheduling and execution.
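To make the idea of "steps that run in a specified order" concrete, here is a minimal Python sketch. It does not use Oozie or any Oozie API; the step names and the `run_workflow` helper are hypothetical and only mirror the five-step scenario above. It shows the behavior we expect from a workflow engine: run each step in order and stop as soon as one fails.

```python
from typing import Callable, List, Tuple

# A step is a (name, action) pair. All names here are hypothetical
# and simply mirror the five-step daily scenario described above.
Step = Tuple[str, Callable[[], None]]


def run_workflow(steps: List[Step]) -> List[str]:
    """Run steps in order and return the names of the steps that completed.

    Execution stops at the first step that raises, so later steps never
    run on top of missing or partial data.
    """
    completed: List[str] = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            print(f"step {name!r} failed: {exc}; remaining steps skipped")
            break
        completed.append(name)
    return completed


def noop() -> None:
    """Placeholder standing in for real work (an MR job, an HQL query, ...)."""


daily_steps: List[Step] = [
    ("collect_to_hdfs", noop),
    ("clean_with_mapreduce", noop),
    ("load_hive_partition", noop),
    ("run_hql_statistics", noop),
    ("export_results", noop),
]

if __name__ == "__main__":
    print(run_workflow(daily_steps))
```

With plain Crontab, each of these steps would be a separate entry started at a fixed time, and nothing would stop a later step from running after an earlier one had failed. That gap is what a workflow scheduler fills.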

2.2 Basic Environment

The basic environment for this article is:

Name               Value
Operating System   CentOS 6.6
Workflow           Oozie 4.2
Hadoop             2.6

This is the basic environment the article depends on. A JDK, Maven, and the MySQL driver files are also required.

3. Oozie Server

Oozie Server provides convenient job management: the running status of jobs can be monitored through its visual interface. It also supports building complex Hadoop job pipelines, in which the dependencies between jobs are configured in a workflow that Oozie Server then executes.
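As a rough illustration of what "dependencies between jobs" means, the sketch below derives a valid execution order from a dependency map using Python's standard `graphlib` module (Python 3.9+). This is not how Oozie is configured; the job names and the `execution_order` helper are assumptions made up for this example.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical jobs: each key maps to the set of jobs it depends on.
# Two statistics jobs both need the loaded partition, and the export
# needs both of them to have finished.
dependencies: Dict[str, Set[str]] = {
    "clean_data": {"collect_data"},
    "load_partition": {"clean_data"},
    "order_stats": {"load_partition"},
    "user_stats": {"load_partition"},
    "export": {"order_stats", "user_stats"},
}


def execution_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return one order in which every job runs after all of its dependencies.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(execution_order(dependencies))
```

The two statistics jobs do not depend on each other, so either may come first in the result; the only guarantee is that every job appears after the jobs it depends on.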

3.1 Prepare dependency packages
  • Maven

To download and install the Maven environment, run the following command:
