Execution conditions for Oozie workflows with input-events and done-flag
When a workflow scheduled by a coordinator enters its execution time window, Oozie first checks that all input-events have "occurred" (been satisfied). The check has two parts:
- Does the specified file or folder already exist?
- If a done-flag is specified, does the done-flag file exist?
The workflow enters the running state only once all input-events have occurred. Until then, Oozie keeps monitoring the specified files or folders, and once it sees that they have been created, or that the done-flag files have been generated, the workflow enters the running state.
About Done-flag
From an application's perspective, when you monitor a file or folder, the data may not be fully written at the moment it is created, and executing the action right away may lose data or fail. A good approach is to write a flag file after all the data files have been written, and to start the action only once that flag file exists. This is the role of done-flag.
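The producer side of this pattern can be sketched as follows. This is only an illustration on the local filesystem, not Oozie or HDFS code; the helper names `write_dataset` and `is_complete` and the file name `part-00000` are hypothetical.

```python
import tempfile
from pathlib import Path


def write_dataset(directory: Path, files: dict, flag_name: str = "_SUCCESS") -> Path:
    """Write all data files first, then create the empty flag file last."""
    directory.mkdir(parents=True, exist_ok=True)
    for name, content in files.items():
        (directory / name).write_text(content)
    flag = directory / flag_name
    flag.touch()  # the flag appears only after every data file is written
    return flag


def is_complete(directory: Path, flag_name: str = "_SUCCESS") -> bool:
    """A consumer treats the directory as ready only when the flag exists."""
    return (directory / flag_name).exists()


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "input" / "folder"
    out.mkdir(parents=True)
    print(is_complete(out))  # False: the directory exists, but the flag does not
    write_dataset(out, {"part-00000": "some data\n"})
    print(is_complete(out))  # True: the flag was written after the data
```

The point of the ordering is that the existence of the directory alone says nothing about completeness, while the flag file is created only after the last data file.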
The official documentation describes done-flag as follows:
- If the done-flag is omitted, the coordinator waits for the presence of a _SUCCESS file in the directory (note: MapReduce jobs create this automatically on successful completion).
- If the done-flag is present but empty, then the existence of the directory itself indicates that the dataset is ready.
- If the done-flag is present but non-empty, Oozie checks for the presence of the named file within the directory, and the dataset is considered ready (done) when that file exists.
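The three rules above, together with the requirement that every input event be satisfied, can be modeled in a few lines. This is a minimal sketch and not Oozie's implementation: the set `existing_paths` stands in for the file system, and the function names and the paths in the usage lines are hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple

DEFAULT_DONE_FLAG = "_SUCCESS"


def is_dataset_ready(existing_paths: Set[str], directory: str,
                     done_flag: Optional[str] = None) -> bool:
    """Apply the three done-flag rules to one dataset directory.

    done_flag=None  models an omitted done-flag (wait for _SUCCESS).
    done_flag=""    models a present but empty done-flag (directory is enough).
    done_flag="x"   models a named flag file inside the directory.
    """
    directory = directory.rstrip("/")
    if done_flag is None:
        return f"{directory}/{DEFAULT_DONE_FLAG}" in existing_paths
    if done_flag == "":
        return directory in existing_paths
    return f"{directory}/{done_flag}" in existing_paths


def all_inputs_ready(existing_paths: Set[str],
                     input_events: Iterable[Tuple[str, Optional[str]]]) -> bool:
    """The workflow may start only when every input event is satisfied."""
    return all(is_dataset_ready(existing_paths, d, f) for d, f in input_events)


paths = {"/data/in", "/data/in/part-00000"}
print(is_dataset_ready(paths, "/data/in"))      # False: no _SUCCESS yet
print(is_dataset_ready(paths, "/data/in", ""))  # True: the directory exists
paths.add("/data/in/_SUCCESS")
print(is_dataset_ready(paths, "/data/in"))      # True: _SUCCESS is now present
```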
Two points about done-flag need explaining:
For an input event that refers to a dataset with a done-flag, Oozie "reads" the flag file before executing the action to decide whether the subsequent action can start. Note that an input event does not have to be referenced by an action as a parameter (configuration property), although most of the time it is. An input event that no action references effectively acts as an "input check": before any action starts, Oozie checks that the specified file or folder exists and, if a done-flag file is specified, that the done-flag file exists too.
For an output event that refers to a dataset with a done-flag, Oozie does not "write" the flag file after the action has executed.
Marking data as complete with a done-flag file is an ingenious technique and already one of the most common "patterns" in the big data world. A typical example is the _SUCCESS file that MapReduce generates when a job finishes.
Generating a done-flag is so common that the HDFS CLI directly provides a command for creating such a file. An example:
# Create a flag file named _SUCCESS under a certain input folder.
hdfs dfs -touchz /path/to/input/folder/_SUCCESS