Flume 1.8 Developer Guide: Learning Notes


Overview:

Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to a centralized data store.

Architecture and data flow model:

At runtime, Flume runs as an agent. An agent contains three main components: Source, Channel, and Sink.

The unit of data that flows through a Flume agent is called an event. A Flume agent is a JVM process that hosts the components through which events flow from an external source to the next destination (hop).

An external source, such as a web server, sends events to a Flume source in a format that the target source recognizes. For example, an Avro source can receive Avro events from clients or from other Flume agents whose sinks send Avro events. When a source receives an event, it stores the event in one or more channels. A channel is a passive store that keeps the event until a sink consumes it. The sink removes the event from the channel and either puts it into an external repository such as HDFS or forwards it to the next Flume agent. The source and sink within an agent run asynchronously, with the events staged in the channel.
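To make the asynchronous hand-off concrete, here is a toy Java model of a single agent. It does not use Flume's classes: a bounded ArrayBlockingQueue stands in for a memory channel, a list stands in for the external repository, and the END marker is a convention invented for this sketch so that the sink knows when to stop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Toy model (not Flume's API) of the source -> channel -> sink hand-off.
 * The source and sink run on separate threads, so they proceed asynchronously.
 */
public class AgentFlowModel {
    // Hypothetical end-of-stream marker used only by this sketch.
    static final String END = "__END__";

    public static List<String> run(List<String> incoming) throws InterruptedException {
        // Small capacity so the buffering between source and sink is visible.
        BlockingQueue<String> channel = new ArrayBlockingQueue<>(4);
        // Stands in for HDFS or the next-hop agent.
        List<String> repository = new ArrayList<>();

        Thread source = new Thread(() -> {
            try {
                for (String event : incoming) {
                    channel.put(event); // blocks while the channel is full
                }
                channel.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread sink = new Thread(() -> {
            try {
                while (true) {
                    String event = channel.take(); // blocks while the channel is empty
                    if (event.equals(END)) {
                        break;
                    }
                    repository.add(event);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        source.start();
        sink.start();
        source.join();
        sink.join(); // join() makes the sink thread's writes visible here
        return repository;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> stored = run(List.of("GET /index.html", "GET /a.png", "POST /login"));
        System.out.println(stored);
    }
}
```

Because the queue is bounded, the source waits whenever the sink falls behind, which is roughly the buffering role a channel plays; a real Flume channel adds transactions and its own capacity handling on top of this.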

Client (developing custom components):

A client operates at the point where events originate and delivers them to a Flume agent. Clients typically run in the process space of the application whose data they consume. Flume currently supports Avro, log4j, syslog, and HTTP POST (with a JSON body) as ways to transfer data from an external source. The web server in the data flow model above plays the role of such a client.

If these existing mechanisms are not sufficient, you can build a custom mechanism for sending data to Flume. There are two ways to do this. The first is to write a custom client that communicates with one of Flume's existing sources, such as AvroSource or SyslogTcpSource; the client must convert its data into messages that the source understands. The second is to write a custom Flume source that talks directly to your existing client application using some IPC or RPC protocol, and then converts the client data into Flume events to be sent downstream.

RPC client interface:

An implementation of Flume's RpcClient interface encapsulates one of the RPC mechanisms that Flume supports. A user application can simply call the Flume Client SDK's append(Event) or appendBatch(List<Event>) to send data, without worrying about the details of the underlying message exchange. There are two ways to supply the required Event argument: implement the Event interface directly (for example, by using a convenience implementation such as the SimpleEvent class), or use EventBuilder's overloaded withBody() static helper methods.
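The two ways of supplying an Event can be sketched as follows. So that the snippet compiles without the Flume SDK, Event, SimpleEvent, and EventBuilder are simplified stand-ins declared locally, modeled on org.apache.flume.Event, org.apache.flume.event.SimpleEvent, and org.apache.flume.event.EventBuilder; the "host" header is a hypothetical example. With the real SDK you would import those classes instead of declaring them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for org.apache.flume.Event: headers plus a byte[] body.
interface Event {
    Map<String, String> getHeaders();
    void setHeaders(Map<String, String> headers);
    byte[] getBody();
    void setBody(byte[] body);
}

// Simplified stand-in for SimpleEvent: empty headers and body by default.
class SimpleEvent implements Event {
    private Map<String, String> headers = new HashMap<>();
    private byte[] body = new byte[0];

    public Map<String, String> getHeaders() { return headers; }
    public void setHeaders(Map<String, String> headers) { this.headers = headers; }
    public byte[] getBody() { return body; }
    public void setBody(byte[] body) { this.body = body; }
}

// Simplified stand-in for EventBuilder; only the (String, Charset) overload is shown.
final class EventBuilder {
    private EventBuilder() {}

    static Event withBody(String body, Charset charset) {
        Event event = new SimpleEvent();
        event.setBody(body.getBytes(charset)); // encode the text as the event body
        return event;
    }
}

public class EventCreationDemo {
    public static void main(String[] args) {
        // Way 1: work with an Event implementation directly.
        Event direct = new SimpleEvent();
        direct.setBody("Hello Flume!".getBytes(StandardCharsets.UTF_8));
        direct.getHeaders().put("host", "web01"); // hypothetical header

        // Way 2: let the builder helper create the event.
        Event built = EventBuilder.withBody("Hello Flume!", StandardCharsets.UTF_8);

        System.out.println(new String(direct.getBody(), StandardCharsets.UTF_8));
        System.out.println(new String(built.getBody(), StandardCharsets.UTF_8));
    }
}
```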

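Avro and Thrift RPC clients:

Avro is the default RPC protocol. NettyAvroRpcClient and ThriftRpcClient both implement the RpcClient interface. A client is created with the host and port of the target Flume agent, and can then use the RpcClient to send data to that agent. The target agent must be configured with a matching source, that is, an Avro source for the Avro client or a Thrift source for the Thrift client.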
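The Flume Developer Guide wraps an RpcClient in a small application-side facade that recreates the client when delivery fails. The sketch below follows that pattern with self-contained stand-ins: EventDeliveryException and a cut-down RpcClient interface (whose append takes a raw byte[] body instead of an Event) are declared locally, and the client is obtained from a Supplier. With the real SDK, that Supplier would call RpcClientFactory.getDefaultInstance(host, port) for Avro or RpcClientFactory.getThriftInstance(host, port) for Thrift; the fake client in main is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Local stand-in for org.apache.flume.EventDeliveryException.
class EventDeliveryException extends Exception {
    EventDeliveryException(String message) { super(message); }
}

// Cut-down stand-in: the real RpcClient.append takes an org.apache.flume.Event.
interface RpcClient {
    void append(byte[] body) throws EventDeliveryException;
    void close();
}

class MyRpcClientFacade {
    private final Supplier<RpcClient> factory;
    private RpcClient client;
    private int reconnects = 0;

    // With the real SDK: () -> RpcClientFactory.getDefaultInstance(host, port)
    MyRpcClientFacade(Supplier<RpcClient> factory) {
        this.factory = factory;
        this.client = factory.get();
    }

    /** Returns true if delivered; on failure, rebuilds the client and returns false. */
    boolean sendDataToFlume(String data) {
        try {
            client.append(data.getBytes(StandardCharsets.UTF_8));
            return true;
        } catch (EventDeliveryException e) {
            // Discard the broken connection and create a fresh client.
            client.close();
            client = factory.get();
            reconnects++;
            return false;
        }
    }

    int reconnects() { return reconnects; }

    void cleanUp() { client.close(); }
}

public class RpcClientFacadeDemo {
    public static void main(String[] args) {
        List<String> delivered = new ArrayList<>();
        int[] clientsCreated = {0};

        // Hypothetical fake: the first client created always fails, later ones succeed.
        Supplier<RpcClient> fakeFactory = () -> {
            boolean broken = clientsCreated[0]++ == 0;
            return new RpcClient() {
                public void append(byte[] body) throws EventDeliveryException {
                    if (broken) throw new EventDeliveryException("simulated network error");
                    delivered.add(new String(body, StandardCharsets.UTF_8));
                }
                public void close() { }
            };
        };

        MyRpcClientFacade facade = new MyRpcClientFacade(fakeFactory);
        for (int i = 0; i < 3; i++) {
            facade.sendDataToFlume("Hello Flume! #" + i);
        }
        facade.cleanUp();
        System.out.println(delivered + ", reconnects=" + facade.reconnects());
    }
}
```

As in the guide's example, the event whose delivery failed is not resent; a production client would likely retry it or report the failure to the caller.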

Agent configuration for the Avro client, explained:

① Name the channel, source, and sink c1, r1, and k1, respectively;

② Set the type of channel c1 to memory, so that events are stored in memory;

③ Connect source r1 to channel c1 and set its type to avro, meaning that the client is an Avro client and the data sent to the source is in Avro format; then specify the host (bind address) and port on which the source listens and to which the client connects;

④ Connect sink k1 to channel c1 and set its type to logger, so that events are written to the log.
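The configuration walked through in steps ①-④ might look like the following in Flume's properties format, assuming an agent named a1 whose Avro source listens on 0.0.0.0:41414 (example values, not taken from the original). The Java wrapper only parses the text with java.util.Properties to show how the keys are structured; Flume itself would read the same text from the agent's configuration file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class AgentConfigExample {
    // Agent name, bind address, and port are assumed example values.
    static final String CONFIG = String.join("\n",
        "# (1) name the components",
        "a1.channels = c1",
        "a1.sources = r1",
        "a1.sinks = k1",
        "# (2) in-memory channel",
        "a1.channels.c1.type = memory",
        "# (3) Avro source attached to c1, listening for Avro clients",
        "a1.sources.r1.channels = c1",
        "a1.sources.r1.type = avro",
        "a1.sources.r1.bind = 0.0.0.0",
        "a1.sources.r1.port = 41414",
        "# (4) logger sink draining c1",
        "a1.sinks.k1.channel = c1",
        "a1.sinks.k1.type = logger");

    // Parses the text the same way any Java properties file is parsed.
    static Properties load() throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(CONFIG));
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties props = load();
        System.out.println("source type: " + props.getProperty("a1.sources.r1.type"));
        System.out.println("sink channel: " + props.getProperty("a1.sinks.k1.channel"));
    }
}
```

Note that a source lists its channels with the plural key channels, while a sink names its single channel with channel.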

Transaction interface:

The Transaction interface is the basis of Flume's reliability. All major components (sources, sinks, and channels) must use Flume transactions.

Transactions are implemented within the channel implementation. Each source and sink connected to a channel must obtain a Transaction object from it. Sources use a ChannelProcessor to manage their transactions, while sinks manage them explicitly through their configured channel. Putting an event into a channel or taking an event out of it is done inside an active transaction.
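Flume's documentation wraps channel operations in a begin / commit / rollback / close idiom. The toy channel below (not Flume's MemoryChannel; single-threaded, with String events) makes that idiom runnable under those simplifying assumptions: puts are staged and only become visible on commit, and a rollback discards staged puts and returns taken events to the channel.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical single-threaded channel illustrating transactional put/take.
class ToyChannel {
    private final Deque<String> committed = new ArrayDeque<>();

    class Transaction {
        private final List<String> stagedPuts = new ArrayList<>();
        private final List<String> stagedTakes = new ArrayList<>();
        private boolean open = false;

        void begin() { open = true; }

        void put(String event) { check(); stagedPuts.add(event); }

        String take() {
            check();
            String event = committed.pollFirst(); // null when the channel is empty
            if (event != null) stagedTakes.add(event);
            return event;
        }

        void commit() {
            check();
            committed.addAll(stagedPuts); // staged puts become visible only now
            stagedPuts.clear();
            stagedTakes.clear();          // taken events are now permanently removed
        }

        void rollback() {
            check();
            stagedPuts.clear();
            // Return taken events to the head of the channel in their original order.
            for (int i = stagedTakes.size() - 1; i >= 0; i--) {
                committed.addFirst(stagedTakes.get(i));
            }
            stagedTakes.clear();
        }

        void close() { open = false; }

        private void check() {
            if (!open) throw new IllegalStateException("transaction not begun");
        }
    }

    Transaction getTransaction() { return new Transaction(); }

    int size() { return committed.size(); }
}

public class TransactionDemo {
    // Stages one event following the begin/commit/rollback/close pattern.
    static boolean stage(ToyChannel ch, String event, boolean fail) {
        ToyChannel.Transaction txn = ch.getTransaction();
        txn.begin();
        try {
            txn.put(event);
            if (fail) {
                throw new IllegalStateException("simulated failure before commit");
            }
            txn.commit();
            return true;
        } catch (Throwable t) {
            txn.rollback();
            if (t instanceof Error) {
                throw (Error) t; // never swallow JVM errors
            }
            return false;
        } finally {
            txn.close();
        }
    }

    public static void main(String[] args) {
        ToyChannel ch = new ToyChannel();
        stage(ch, "Hello Flume!", false);
        stage(ch, "lost event", true);
        System.out.println("events in channel: " + ch.size()); // prints "events in channel: 1"
    }
}
```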

Sink:

The purpose of a sink is to take events from a channel and either forward them to the next Flume agent or store them in an external repository. A sink is associated with exactly one channel, as configured in the Flume configuration file. Each configured sink has a SinkRunner instance; when the Flume framework calls SinkRunner.start(), a new thread is created to drive the sink (using SinkRunner.PollingRunner as the thread's Runnable). This thread manages the sink's lifecycle. A sink must implement the start() and stop() methods, which are part of the LifecycleAware interface. Sink.start() should initialize the sink and bring it to a state where it can forward events to their next destination. Sink.process() should do the core work of taking an event from the channel and forwarding it. Sink.stop() should perform any necessary cleanup (for example, releasing resources).
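A self-contained sketch of the start() / process() / stop() lifecycle follows. Status mirrors Flume's Sink.Status values READY and BACKOFF, but ToySink, the queue used as a channel, and drain() are hypothetical. drain() is a simplified, single-threaded stand-in for the loop that SinkRunner.PollingRunner runs; in Flume that loop keeps polling, backing off when process() returns BACKOFF, until the sink is stopped. A real sink would typically extend AbstractSink and take events inside a channel transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Mirrors the two values of Flume's Sink.Status.
enum Status { READY, BACKOFF }

// Hypothetical sink; no transaction handling, for brevity.
class ToySink {
    private final Queue<String> channel;
    private final List<String> repository; // stands in for HDFS or the next agent
    private boolean connected = false;

    ToySink(Queue<String> channel, List<String> repository) {
        this.channel = channel;
        this.repository = repository;
    }

    // start(): e.g. open a connection so events can be forwarded.
    void start() { connected = true; }

    // process(): move one event forward; BACKOFF tells the runner nothing was available.
    Status process() {
        if (!connected) throw new IllegalStateException("sink not started");
        String event = channel.poll();
        if (event == null) return Status.BACKOFF;
        repository.add(event);
        return Status.READY;
    }

    // stop(): release resources, e.g. close the connection.
    void stop() { connected = false; }
}

public class SinkLifecycleDemo {
    // Simplified stand-in for the runner loop: stops at the first BACKOFF.
    static int drain(ToySink sink) {
        int delivered = 0;
        while (sink.process() == Status.READY) delivered++;
        return delivered;
    }

    public static void main(String[] args) {
        Queue<String> channel = new ConcurrentLinkedQueue<>(List.of("e1", "e2", "e3"));
        List<String> repository = new ArrayList<>();
        ToySink sink = new ToySink(channel, repository);

        sink.start();
        int n = drain(sink);
        sink.stop();
        System.out.println(n + " events forwarded: " + repository);
    }
}
```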

Source:

The purpose of a source is to receive data from an external client and store it in its configured channels. A source can obtain an instance of its own ChannelProcessor to process an event, which is committed within a channel-local transaction. Similar to the SinkRunner.PollingRunner Runnable, there is a PollingRunner Runnable that executes on a thread created when the Flume framework calls PollableSourceRunner.start(). Each configured PollableSource is associated with its own thread that runs a PollingRunner. This thread manages the PollableSource's lifecycle, such as starting and stopping. A PollableSource implementation must implement the start() and stop() methods declared in the LifecycleAware interface. The runner of a PollableSource invokes that source's process() method, which should check for new data and store it in the channel as Flume events.

Note that there are actually two kinds of sources. The PollableSource has already been described. The other is the EventDrivenSource, which, unlike a PollableSource, must have its own callback mechanism that captures new data and stores it in the channel. EventDrivenSources are not each driven by their own thread the way PollableSources are.
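The difference between the two source styles can be sketched without the Flume SDK. In this hypothetical example, SourceStatus mirrors PollableSource.Status, a Consumer<String> stands in for ChannelProcessor.processEvent(), and a list stands in for the channel. The pollable source checks an external log each time process() is called; the event-driven source has no runner of its own and acts only when its callback onData() is invoked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Mirrors the READY / BACKOFF values of Flume's PollableSource.Status.
enum SourceStatus { READY, BACKOFF }

// Pollable style: a runner thread would call process() repeatedly.
class ToyPollableSource {
    private final List<String> externalLog;   // data the source checks for
    private final Consumer<String> processor; // stands in for the ChannelProcessor
    private int offset = 0;                   // how much of the log has been consumed

    ToyPollableSource(List<String> externalLog, Consumer<String> processor) {
        this.externalLog = externalLog;
        this.processor = processor;
    }

    SourceStatus process() {
        if (offset >= externalLog.size()) return SourceStatus.BACKOFF; // nothing new
        while (offset < externalLog.size()) {
            processor.accept(externalLog.get(offset++)); // store each new entry as an event
        }
        return SourceStatus.READY;
    }
}

// Event-driven style: no runner thread; the source's own callback pushes data in.
class ToyEventDrivenSource {
    private final Consumer<String> processor;

    ToyEventDrivenSource(Consumer<String> processor) { this.processor = processor; }

    // Invoked by the source's own mechanism, e.g. when a network request arrives.
    void onData(String data) { processor.accept(data); }
}

public class SourceStylesDemo {
    public static void main(String[] args) {
        List<String> channel = new ArrayList<>();
        List<String> externalLog = new ArrayList<>(List.of("line 1", "line 2"));

        ToyPollableSource pollable = new ToyPollableSource(externalLog, channel::add);
        System.out.println(pollable.process()); // READY: two new lines were stored
        System.out.println(pollable.process()); // BACKOFF: nothing new
        externalLog.add("line 3");
        System.out.println(pollable.process()); // READY again

        ToyEventDrivenSource driven = new ToyEventDrivenSource(channel::add);
        driven.onData("pushed by client");
        System.out.println(channel); // [line 1, line 2, line 3, pushed by client]
    }
}
```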

Channel:

Not covered yet.
