Apache NiFi Processor in Practice

Source: Internet
Author: User
Tags: hadoop ecosystem

1 Preface

What is Apache NiFi? The NiFi website describes it as "an easy to use, powerful, and reliable system to process and distribute data". In other words, Apache NiFi is designed to support powerful, scalable directed graphs of data routing, transformation, and system mediation logic.
To make it clearer what NiFi can do, the following is a brief introduction to the NiFi architecture, as shown in the architecture diagram.

The following is a summary of the official website's description of each component:
• Web Server: hosts NiFi's HTTP-based command and control API.
• Flow Controller: the core of the operation. It provides the threads that extensions run on and manages the schedule of when extensions receive resources to execute.
• Extensions: the various types of NiFi extensions are described in other documents. The key point is that extensions operate and execute within the JVM.
• FlowFile Repository: where NiFi tracks the state of each FlowFile that is currently active in the flow. Its implementation is pluggable; the default is a persistent write-ahead log on a specified disk partition.
• Content Repository: where the actual content bytes of a FlowFile are stored. Its implementation is pluggable; the default is a fairly simple mechanism that stores blocks of data in the file system.
• Provenance Repository: where all provenance event data is stored. It is also pluggable; the default implementation uses one or more physical disk volumes, within which the event data is indexed and searchable.

2 NiFi Processor Introduction

The previous section used the architecture diagram to introduce NiFi's basic concepts. From that overview, the Flow Controller is the core of NiFi, so what exactly does it do? The Flow Controller acts as the broker that facilitates the exchange of FlowFiles between processors: it maintains the connections between processors and manages each individual processor. The processor is the actual processing unit.

So, let's look at the NiFi UI to see which processors NiFi provides.

NiFi ships with many kinds of processors, grouped by tags such as amazon, attributes, and hadoop. A processor's purpose can usually be recognized from its name prefix: Get and Fetch indicate acquisition, as in GetFile, GetFTP, and FetchHDFS; Execute indicates execution, as in ExecuteSQL, ExecuteProcess, and ExecuteFlumeSink. This naming makes it easy to guess what a processor is for.

3 NiFi Processor in Practice

Having introduced the NiFi architecture and its processors, it is time for a practical example. This article uses the author's own requirement as the case study. The requirement is as follows: select a data processing and scheduling tool that can run server scripts on a custom schedule. The server scripts involve environment variables, Oracle databases, and Hadoop ecosystem components. After a scheduled script has run, the tool must return the script's run state and provide an interface for re-running failed jobs.
To meet this requirement, we investigated several scheduling tools, such as Apache Oozie, Azkaban, and Pentaho. After comparing their advantages and disadvantages, we decided to try Apache NiFi. From the NiFi processor documentation, the processor that best supports remote operation is ExecuteProcess. The following sections explain how the requirement was implemented in practice.

3.1 Adding and Configuring the Processor

1. Click "Add Processor", select ExecuteProcess, and click the Add button to complete the addition, as shown in the figure.

2. Right-click ExecuteProcess and select Configure Processor to configure the Properties tab, where each configuration option comes with a description, as shown in the figure.

The options shown need some explanation:
• Command (the command to execute): sh
• Command Arguments (arguments for the command): -c;ssh [email protected] sh js/job/job_hourly.sh ' date
• Batch Duration (execution interval): not set. Our job must run at scheduled times, not at a fixed interval.
• Redirect Error Stream (redirect the error stream): not set.
• Argument Delimiter (delimiter for the command arguments): ; is used to split the arguments.
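To illustrate why ";" was chosen as the Argument Delimiter, here is a minimal Python sketch that imitates the splitting step. It is not NiFi's own code: build_argv is a hypothetical helper, and user@host is a placeholder standing in for the real login, which is not shown in this article.

```python
from typing import List


def build_argv(command: str, arguments: str, delimiter: str) -> List[str]:
    """Imitate splitting a Command Arguments string on the Argument Delimiter."""
    return [command] + arguments.split(delimiter)


# Hypothetical stand-in for the configured arguments; user@host is a placeholder.
ARGS = "-c;ssh user@host sh js/job/job_hourly.sh"

# With ";" the whole ssh command line stays together as the single argument of "sh -c".
print(build_argv("sh", ARGS, ";"))

# With a space, the ssh command line is broken into separate pieces.
print(build_argv("sh", ARGS, " "))
```

With ";" as the delimiter, sh receives "-c" followed by one complete command string, which is what "sh -c" expects.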

3.2 Processor Scheduling

NiFi supports three scheduling strategies: Timer driven, CRON driven, and Event driven (not selectable here). Based on our needs, we chose CRON driven. As I understand it, the cron expression works much like crontab. Its fields are, in order: seconds, minutes, hours, day of month, month, day of week, and year. They are used together with the special characters *, ?, and L (* means every value of the field is valid; ? means no specific value is given for the field; L means "last"). For example, "0 0 13 * * ?" schedules execution at 1 p.m. every day. We configured the scheduling parameters according to our requirements, as shown in the figure.
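The field order is easy to get wrong, so the following Python sketch labels each part of a cron expression with its field name. It only splits and names the fields and does not validate or evaluate the schedule; label_cron_fields is a hypothetical helper written for this illustration.

```python
from typing import Dict

# Field order described above; the year field is optional.
CRON_FIELDS = ["seconds", "minutes", "hours", "day_of_month",
               "month", "day_of_week", "year"]


def label_cron_fields(expression: str) -> Dict[str, str]:
    """Map each whitespace-separated part of a cron expression to its field name."""
    parts = expression.split()
    if not 6 <= len(parts) <= 7:
        raise ValueError("expected 6 or 7 fields, got %d" % len(parts))
    return dict(zip(CRON_FIELDS, parts))


# The example from the text: every day at 1 p.m.
for name, value in label_cron_fields("0 0 13 * * ?").items():
    print(name, "=", value)
```

Read this way, "0 0 13 * * ?" means second 0, minute 0, hour 13, on every day of every month, with no specific day of week.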

3.3 Operational Status Monitoring

NiFi provides a REST API that developers can use for scheduling. Here we use the Processor API to monitor the running state (reading the state parameter, and starting and stopping the processor).
1) Retrieving the running state:
The command is as follows: curl 'http://IP/nifi-api/processors/processorsID'. It returns a JSON document, from which the state can be extracted with a JSON parser.
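As a sketch of the parsing step, the Python below extracts the run state and the revision version from a response. SAMPLE_RESPONSE is a hypothetical, heavily trimmed example, not real output; it assumes the response has the same "revision" and "component" structure as the request bodies used for start and stop. summarize_processor is a hypothetical helper.

```python
import json
from typing import Any, Dict

# Hypothetical, trimmed response; a real response contains many more fields.
SAMPLE_RESPONSE = """
{
  "revision": {"version": 17},
  "component": {
    "id": "39e0dafc-015d-1000-918d-bee89ae2226e",
    "state": "STOPPED"
  }
}
"""


def summarize_processor(response_text: str) -> Dict[str, Any]:
    """Pull out the fields needed to monitor and later update a processor."""
    entity = json.loads(response_text)
    return {
        "id": entity["component"]["id"],
        "state": entity["component"]["state"],
        "version": entity["revision"]["version"],
    }


print(summarize_processor(SAMPLE_RESPONSE))
```

The version is worth keeping alongside the state, because a later request that changes the state sends a revision version as well.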

2) Starting and stopping the processor:
A NiFi processor is started and stopped through the PUT method, whose main use here is to change the processor's running state. A NiFi processor has three states: RUNNING, STOPPED, and DISABLED.
We then run the start and stop REST API commands from a script.
The start command (PUT method of the REST API):
curl -i -X PUT -H 'Content-Type: application/json' -d '
{
  "revision": {
    "clientId": "586ec1d7-015d-1000-6459-28251212434e",
    "version": 17
  },
  "component": {
    "id": "39e0dafc-015d-1000-918d-bee89ae2226e",
    "state": "RUNNING"
  }
}' http://IP/nifi-api/processors/processorsID
The stop command (PUT method of the REST API):
curl -i -X PUT -H 'Content-Type: application/json' -d '
{
  "revision": {
    "clientId": "586ec1d7-015d-1000-6459-28251212434e",
    "version": 17
  },
  "component": {
    "id": "39e0dafc-015d-1000-918d-bee89ae2226e",
    "state": "STOPPED"
  }
}' http://IP/nifi-api/processors/processorsID
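The start and stop commands differ only in the state value, so a script can build both from one helper. The Python sketch below builds the same request body and PUT request using only the standard library. build_state_payload and build_state_request are hypothetical helpers, and http://IP is the same placeholder used in the commands above. Nothing is sent here; passing the request to urllib.request.urlopen would send it.

```python
import json
import urllib.request
from typing import Any, Dict

VALID_STATES = {"RUNNING", "STOPPED", "DISABLED"}


def build_state_payload(processor_id: str, client_id: str,
                        version: int, state: str) -> Dict[str, Any]:
    """Build the JSON body that asks NiFi to change a processor's state."""
    if state not in VALID_STATES:
        raise ValueError("unknown state: %s" % state)
    return {
        "revision": {"clientId": client_id, "version": version},
        "component": {"id": processor_id, "state": state},
    }


def build_state_request(base_url: str, processor_id: str,
                        payload: Dict[str, Any]) -> urllib.request.Request:
    """Wrap the payload in a PUT request; the caller decides when to send it."""
    return urllib.request.Request(
        url="%s/nifi-api/processors/%s" % (base_url, processor_id),
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


payload = build_state_payload(
    processor_id="39e0dafc-015d-1000-918d-bee89ae2226e",
    client_id="586ec1d7-015d-1000-6459-28251212434e",
    version=17,
    state="RUNNING",
)
request = build_state_request(
    "http://IP", "39e0dafc-015d-1000-918d-bee89ae2226e", payload)
print(request.get_method(), request.full_url)
print(json.dumps(payload, indent=2))
```

Rejecting unknown states before building the request keeps a typo such as "RUNING" from ever reaching the server.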

4 Summary and Postscript

This article first introduced Apache NiFi and then, using the author's actual requirement as an example, walked through the practical use of the processor, NiFi's core component. NiFi is an Apache top-level project, and although it is very powerful, the available resources on it are still limited. This article is meant only as a starting point; NiFi's real strength lies in data processing, and interested readers are welcome to explore it together.

Original link: http://www.cobub.com/actual-combat-of-apache-nifi-processor/
