Without further ado, let's get straight to the substance!
Currently, the two most popular Hadoop workflow schedulers are Azkaban and Oozie.
For more detail, see the related series on my blog:
Azkaban Concept Learning series: http://www.cnblogs.com/zlslch/category/938837.html
Oozie Concept Learning series: http://www.cnblogs.com/zlslch/category/916607.html
The following table compares the key features of these two Hadoop workflow schedulers. Although the scenarios they address are largely the same, they differ in design concepts, target users, application scenarios, and so on.
| Characteristic | Oozie | Azkaban |
|---|---|---|
| Workflow description language | XML (XPDL based) | Text file with key/value pairs |
| Dependency mechanism | Explicit | Explicit |
| Requires a web container | Yes | Yes |
| Progress tracking | Web page | Web page |
| Hadoop job scheduling support | Yes | Yes |
| Operating mode | Daemon | Daemon |
| Pig support | Yes | Yes |
| Event notification | No | No |
| Requires installation | Yes | Yes |
| Supported versions of Hadoop | 0.20+ | Currently unknown |
| Retry support | Workflow node level | Yes |
| Run arbitrary commands | Yes | Yes |
| Amazon EMR support | No | Currently unknown |
Comparison of Azkaban and Oozie
The following is a detailed comparison of the two most popular schedulers on the market. The better known of the two is Apache Oozie, but configuring a workflow means writing a large amount of XML, and its code is complex enough that extending it (secondary development) is not easy. Compared with Azkaban, Oozie is a heavyweight task scheduling system: full-featured, but also more complex to configure and use. The lightweight scheduler Azkaban is a good candidate if you can live without certain features.
Comparison by function
Both can schedule Linux commands, MapReduce, Spark, Pig, Java, and Hive jobs, as well as script-based workflow tasks
Both can run workflow tasks on a schedule
Comparison by workflow definition
1. Azkaban uses properties files to define workflows
2. Oozie uses XML files to define workflows
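To make the properties-file style concrete, here is a minimal sketch in Python that renders two Azkaban job definitions as key=value text and packs them into a zip for upload. The job names (`extract`, `load`) and the `echo` commands are made up for illustration; `type`, `command`, and `dependencies` are the property keys used by Azkaban's classic `.job` format.

```python
import zipfile
from pathlib import Path


def render_job(props: dict) -> str:
    """Render one job as key=value lines, one property per line."""
    return "".join(f"{key}={value}\n" for key, value in props.items())


# Hypothetical two-step flow: "load" declares a dependency on "extract",
# so the scheduler runs it only after "extract" has finished.
JOBS = {
    "extract": {"type": "command", "command": "echo extracting"},
    "load": {
        "type": "command",
        "command": "echo loading",
        "dependencies": "extract",
    },
}


def write_flow_zip(jobs: dict, target: Path) -> Path:
    """Write each job as <name>.job inside a zip archive and return its path."""
    with zipfile.ZipFile(target, "w") as archive:
        for name, props in jobs.items():
            archive.writestr(f"{name}.job", render_job(props))
    return target
```

The equivalent Oozie workflow would be an XML document with explicit action and transition elements, which is where most of the extra configuration effort comes from.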
Comparison by workflow parameter passing
1. Azkaban supports direct parameter substitution, such as ${input}
2. Oozie supports parameters and EL expressions, such as ${fs:dirSize(myInputDir)}
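The idea of direct parameter substitution can be sketched in a few lines of Python. This is only an illustration of the concept, not Azkaban's actual resolver; the function name `substitute` and the sample values are assumptions. Oozie's EL expressions go further by evaluating functions such as `fs:dirSize`, which this sketch does not attempt.

```python
import re

# Matches ${name} placeholders and captures the name.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def substitute(text: str, params: dict) -> str:
    """Replace each ${name} with params[name]; unknown names are left as-is."""

    def lookup(match):
        name = match.group(1)
        return str(params.get(name, match.group(0)))

    return _PLACEHOLDER.sub(lookup, text)
```

For example, `substitute("hadoop fs -ls ${input}", {"input": "/data/in"})` yields `hadoop fs -ls /data/in`.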
Comparison by timed execution
1. Azkaban triggers scheduled tasks based on time
2. Oozie triggers scheduled tasks based on both time and input data
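The difference between the two trigger styles can be sketched as two predicates. This is a simplified illustration under stated assumptions: the function names are hypothetical, and local file paths stand in for the HDFS input data that Oozie would actually check.

```python
from datetime import datetime
from pathlib import Path


def time_trigger_ready(now: datetime, scheduled: datetime) -> bool:
    """Time-based trigger: run as soon as the scheduled time has arrived."""
    return now >= scheduled


def time_and_data_trigger_ready(
    now: datetime, scheduled: datetime, required_inputs: list
) -> bool:
    """Time-and-data trigger: the scheduled time must have arrived AND
    every required input must already exist."""
    return now >= scheduled and all(Path(p).exists() for p in required_inputs)
```

With a purely time-based trigger, a job whose upstream data arrives late still starts on schedule and has to handle the missing input itself; with the second predicate, the run waits until the data is there.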
Comparison by resource management
1. Azkaban has stricter permission control, for example over which users may read, write, or execute a workflow
2. Oozie currently has no strict permission control
Comparison by workflow execution
1. Azkaban has three modes of operation:
1.1 Solo server mode: the simplest mode. The database is the built-in H2 database, and the management server and the execution server run in the same process. Projects with a small task volume can use this mode.
1.2 Two-server mode: the database is MySQL, and the management server and the execution server run in different processes, so they do not affect each other.
1.3 Multiple executor mode: the execution servers and the management server are on different hosts, and there can be more than one execution server.
I use the second mode this time: the management server and the execution server run as separate processes, but on the same host.
2. Oozie runs as a workflow server, supporting multiple users and multiple workflows
Comparison by workflow management
1. Azkaban supports operating workflows through the browser and through AJAX calls
2. Oozie supports operating workflows through the command line, HTTP REST, the Java API, and the browser
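As a rough idea of what the AJAX style of operation looks like, the sketch below builds (but does not send) a login request and an execute-flow request. The parameter names follow Azkaban's documented AJAX API as I understand it, so check them against the documentation for your version. The base URL, credentials, project name, and flow name are placeholders, not values from this post.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: a web server listening locally on the default HTTPS port.
BASE_URL = "https://localhost:8443"


def login_request(username: str, password: str) -> Request:
    """Build the POST that authenticates and should return a session.id."""
    body = urlencode(
        {"action": "login", "username": username, "password": password}
    )
    return Request(BASE_URL, data=body.encode("utf-8"), method="POST")


def execute_flow_request(session_id: str, project: str, flow: str) -> Request:
    """Build the GET that asks the server to run one flow of a project."""
    query = urlencode(
        {
            "session.id": session_id,
            "ajax": "executeFlow",
            "project": project,
            "flow": flow,
        }
    )
    return Request(f"{BASE_URL}/executor?{query}", method="GET")
```

Passing either object to `urllib.request.urlopen` would send it; a real deployment would also need to deal with TLS certificates and error responses.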
Another account of the differences:
The two are roughly the same in terms of functionality, except that Oozie submits Hadoop and Spark jobs through interfaces wrapped by org.apache.hadoop, whereas Azkaban can run shell statements directly. Oozie may be the better choice on security.
Workflow definition: Oozie uses XML; Azkaban uses properties files.
Deployment process: Deploying Oozie is painful and somewhat difficult. It also pulls task logs from YARN.
In Azkaban, if a task fails but the process itself runs to completion, the task is still marked as successful. This is a bug. Oozie, by contrast, can reliably detect whether a task succeeded or failed.
Workflow operation: Azkaban is operated through the web. Oozie supports operation through the web, the REST API, and the Java API.
Permission control: Oozie has essentially no permission control. Azkaban has more complete permission control, covering users' read, write, and execute operations on workflows.
Oozie's actions run primarily in Hadoop, while Azkaban's actions run on Azkaban's servers.
Workflow state recording: Azkaban keeps workflow state in memory, while Oozie stores it in MySQL.
When a failure occurs: Azkaban loses all workflows, but Oozie can continue running a failed workflow.
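On the point above about Azkaban marking a failed task as successful: one plausible reading is that the status is taken from whether the launched process finished cleanly. Under that assumption, a wrapper that runs several steps should stop at the first failure and hand back its non-zero exit code instead of always finishing normally. The helper names below are hypothetical, and this is a sketch of the idea rather than a fix for Azkaban itself.

```python
import subprocess


def run_step(command: list) -> int:
    """Run one command and return its exit code (0 means success)."""
    return subprocess.run(command).returncode


def run_all(steps: list) -> int:
    """Run steps in order; stop at the first failure and return its code."""
    for step in steps:
        code = run_step(step)
        if code != 0:
            return code
    return 0
```

A wrapper script would end with `sys.exit(run_all(steps))`, so that the scheduler sees the same exit status as the step that failed.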
Hadoop workflow engines: a comparison of Azkaban and Oozie (IV)