High-Availability Hadoop Platform: Oozie Workflow
1. Overview
When developing and using Hadoop applications, we can schedule them directly with crontab as long as the business logic is simple and there are only a few tasks. Once there are many scheduled tasks, a system that manages them in one place becomes useful, and today we introduce one: Oozie. The following is the outline of today's content:
- Introduction
- Oozie Server
- Preview
Let's get started.
2. Introduction
This post does not cover Oozie's detailed operation; its workflows will be covered in the next post. Here we look at the role of Oozie and the steps for integrating it.
2.1 Role
Oozie is an open-source workflow scheduling system for managing multiple Hadoop jobs with complex logic, coordinating them so that they run in a specified order. A typical daily scenario looks like this:
- Collect data into HDFS.
- Write a MapReduce (MR) job to clean the data, generate new data, and store it in a specified HDFS path.
- Create a Hive table partition and load the data into the corresponding partition.
- Use HQL to compute business metrics and write the results to the corresponding Hive summary table.
- Export the aggregated data from the summary table so that external services can use it.
From the daily routine above, we can define a workflow, create a workflow instance from it, and run that instance on a schedule every day. For this kind of Hadoop application scenario, Oozie simplifies task scheduling and execution.
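To make the idea of "steps that run in a specified order" concrete, here is a minimal Python sketch. It is not Oozie's API and not how Oozie is implemented; the step names and the `run_in_order` helper are hypothetical, and each lambda merely stands in for a real job (data collection, MapReduce, Hive, export). It only illustrates the behaviour a workflow scheduler takes over from us: run each step in turn and stop the chain when one fails.

```python
from typing import Callable, List, Tuple

# Hypothetical step type: a name plus a callable that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_in_order(steps: List[Step]) -> List[str]:
    """Run steps sequentially; stop at the first failure.

    Returns the names of the steps that completed successfully.
    """
    completed: List[str] = []
    for name, action in steps:
        if not action():
            print(f"step '{name}' failed, stopping the chain")
            break
        completed.append(name)
    return completed


# Placeholder steps mirroring the daily scenario; real jobs would replace the lambdas.
daily_steps: List[Step] = [
    ("collect_to_hdfs", lambda: True),
    ("mr_clean", lambda: True),
    ("load_hive_partition", lambda: True),
    ("hql_statistics", lambda: True),
    ("export_results", lambda: True),
]

if __name__ == "__main__":
    print(run_in_order(daily_steps))
```

With hand-written scripts and crontab, this ordering and failure handling has to be maintained by hand for every pipeline, which is the burden a workflow system is meant to remove.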
2.2 Basic Environment
The basic environment for this article is:
Name | Value
Operating System | CentOS 6.6
Workflow | Oozie 4.2
Hadoop | 2.6
This is the basic environment this post depends on. The JDK, Maven, and the MySQL driver file are also required.
3. Oozie Server
Oozie Server provides convenient job management: the running status of jobs can be managed through its visual interface. It also supports building complex Hadoop job pipelines, in which the dependencies between jobs are configured in a workflow that the Oozie Server then executes as a whole.
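As a rough illustration of what "dependencies between jobs" means, the sketch below derives a valid execution order from a dependency map using Python's standard `graphlib` module (Python 3.9+). This is only an analogy and not Oozie's configuration format or scheduling algorithm; the job names and the `execution_order` helper are assumptions made for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical jobs: each key maps to the set of jobs it depends on.
dependencies = {
    "clean_data": {"collect_data"},
    "load_partition": {"clean_data"},
    "daily_stats": {"load_partition"},
    "weekly_stats": {"load_partition"},
    "export": {"daily_stats", "weekly_stats"},
}


def execution_order(deps):
    """Return one valid order in which every job runs after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(execution_order(dependencies))
```

Several orders can be valid here, because `daily_stats` and `weekly_stats` do not depend on each other; a scheduler is free to run such independent jobs in either order or in parallel.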
3.1 Prepare dependency packages
To download and install the Maven environment, run the following command: