Flume Introduction (i)

Source: Internet
Author: User

Introduction:

This article covers Flume's background, its data flow model, common data flow operations, how to start a Flume agent, and a simple Flume agent example. The reference document is the Flume User Guide for Flume 1.8.0 from the official Flume website.

I. BACKGROUND

Flume is a distributed log collection system developed by Cloudera. It was donated to the Apache Software Foundation, where it entered the Apache Incubator in 2011, and it is now an Apache top-level project.

Flume is a highly available, highly reliable, distributed system for collecting, aggregating, and transporting large volumes of log data. Because the data sources are customizable, Flume can carry not only log records but also network traffic data, social media data, email messages, and other kinds of data.

Flume currently has two version lines: the 0.9.x releases are collectively known as Flume-OG, and the 1.x releases as Flume-NG. Flume-NG was substantially refactored, so it differs considerably from Flume-OG, and the two should be told apart when used.

II. DATA FLOW MODEL

Flume uses the following terms:

1. Event: the unit of data flow; its payload is stored as bytes.
2. Agent: a JVM process and the smallest building block of Flume. It manages the Flume components (sources, channels, and sinks) and can contain several of each.
3. Source: receives data sent by an external data source.
4. Channel: a buffer. When a source receives data, it stores the data in one or more channels.
5. Sink: consumes data from a channel and sends it to external storage such as HDFS, or to the source of another Flume agent.

Figure 2.1 Flume Data flow model
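The terms above can be pictured with a small toy model. This is only a sketch in Python, not Flume's Java API: the class names Event, Channel, Source, and Sink and their methods are made up for illustration, and the optional headers field reflects the fact that Flume events can carry headers, which the list above does not mention.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Event:
    """A unit of data flow: a byte payload plus optional headers."""
    body: bytes
    headers: dict = field(default_factory=dict)


class Channel:
    """A buffer that holds events until a sink takes them."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._queue = deque()

    def put(self, event):
        if len(self._queue) >= self.capacity:
            raise OverflowError("channel is full")
        self._queue.append(event)

    def take(self):
        # Taking an event removes it from the channel.
        return self._queue.popleft() if self._queue else None

    def __len__(self):
        return len(self._queue)


class Source:
    """Receives external data and stores it in one or more channels."""

    def __init__(self, channels):
        self.channels = list(channels)

    def receive(self, raw):
        for channel in self.channels:
            channel.put(Event(body=raw))


class Sink:
    """Consumes events from exactly one channel and delivers them."""

    def __init__(self, channel, deliver):
        self.channel = channel
        self.deliver = deliver  # callable standing in for HDFS, a logger, etc.

    def drain(self):
        count = 0
        while (event := self.channel.take()) is not None:
            self.deliver(event)
            count += 1
        return count


# Wire one source, one channel, and one sink together (toy names).
c1 = Channel(capacity=1000)
r1 = Source([c1])
delivered = []
k1 = Sink(c1, delivered.append)

r1.receive(b"hello flume")
drained = k1.drain()
print(drained, delivered[0].body, len(c1))
```

After the sink drains the channel, the channel is empty again, which mirrors the rule that consumed data no longer stays in the channel.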

A typical data flow process for a flume agent is:

1. One or more external data sources send data in a specified format to the Flume agent. The source deserializes the data and stores it in one or more channels.
2. When a sink needs data, it takes the data from its specified channel, and the original data is deleted from that channel.
3. The sink serializes the data in the specified format and sends it to the specified destination.

Important:

1. A source can feed one or more channels, but a sink can read from only one channel.
2. Once a sink has consumed data from its channel, the original data in the channel is deleted.
3. Both sources and sinks must specify a type; different types use different serialization mechanisms.

III. COMMON DATA FLOW OPERATIONS

The Flume 1.8.0 User Guide describes the following common data flow operations:

1. Multi-agent flow

Flume agents can be cascaded to form a chain of agents.

Figure 3.1 Multi-Agent Flow

2. Consolidation

With consolidation, Flume can collect the data produced by multiple data sources into one place.

Figure 3.2 Consolidation

3. Multiplexing the flow

With multiplexing, Flume can send an event stream to one or more destinations.

Figure 3.3 Multiplexing the flow
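The idea behind multiplexing can be sketched as a routing function that picks channels from an event header. This is a toy Python illustration, not Flume's selector implementation: the function select_channels, the header name "state", and the routing table are all hypothetical.

```python
def select_channels(headers, header_name, mapping, default):
    """Return the names of the channels an event should be written to."""
    value = headers.get(header_name)
    return mapping.get(value, default)


# Hypothetical routing table: events are routed by their "state" header.
mapping = {"CZ": ["c1"], "US": ["c2", "c3"]}
default = ["c4"]

# Each channel is modeled as a plain list of event bodies.
channels = {name: [] for name in ("c1", "c2", "c3", "c4")}
events = [
    ({"state": "CZ"}, b"event-1"),
    ({"state": "US"}, b"event-2"),
    ({}, b"event-3"),  # no header, so it falls through to the default
]

for headers, body in events:
    for name in select_channels(headers, "state", mapping, default):
        channels[name].append(body)

print(channels)
```

An event whose header matches several channels is copied to each of them, while an event with no matching header goes to the default channel.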

IV. FLUME AGENT STARTUP

As mentioned above, a Flume agent can contain multiple sources, channels, and sinks. According to the Flume 1.8.0 User Guide, a Flume agent can be started with the following command:

$FLUME_HOME/bin/flume-ng agent -n $agent_name -c conf -f conf/flume-conf.properties.template

Here the -n parameter specifies the agent name, the -c parameter specifies the conf directory, and the -f parameter specifies the configuration file.

The startup process for a flume agent is as follows:

1. Prepare a configuration file that meets your requirements.

2. Run the command above with that configuration file to start the Flume agent.
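When the agent is launched from a script, the same command can be assembled as an argument list. The sketch below assumes only the -n, -c, and -f parameters described above; the helper build_flume_command and the /opt/flume installation path are hypothetical.

```python
import shlex


def build_flume_command(flume_home, agent_name, conf_dir, conf_file):
    """Return the flume-ng argument list for starting one agent."""
    return [
        f"{flume_home}/bin/flume-ng",
        "agent",
        "-n", agent_name,   # agent name
        "-c", conf_dir,     # conf directory
        "-f", conf_file,    # configuration file
    ]


# Hypothetical installation path and agent name, for illustration only.
command = build_flume_command(
    flume_home="/opt/flume",
    agent_name="a1",
    conf_dir="conf",
    conf_file="conf/flume-conf.properties.template",
)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command) on a machine where Flume is actually installed at that path.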

V. FLUME AGENT EXAMPLE

The Flume website provides a simple example of a Flume agent named a1. Its configuration file is as follows:

# example.conf: A single-node Flume configuration

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

The configuration file above defines a Flume agent named a1 with a source named r1, a channel named c1, and a sink named k1. Data flows in the direction r1 --> c1 --> k1. The detailed properties of each component are as follows:

Component  Name  Type    Bind       Port   Channel  Capacity  TransactionCapacity
Source     r1    netcat  localhost  44444  c1       N/A       N/A
Channel    c1    memory  N/A        N/A    N/A      1000      100
Sink       k1    logger  N/A        N/A    c1       N/A       N/A
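The r1 --> c1 --> k1 wiring can also be read off the configuration mechanically. The sketch below is a simplified stand-in, not Flume's own configuration loader: parse_properties and describe_flow are hypothetical helpers, and the parser only understands plain "key = value" lines.

```python
EXAMPLE_CONF = """
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
"""


def parse_properties(text):
    """Parse 'key = value' lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def describe_flow(props, agent):
    """Return (source, channel, sink) triples wired together for one agent."""
    flows = []
    for source in props[f"{agent}.sources"].split():
        for channel in props[f"{agent}.sources.{source}.channels"].split():
            for sink in props[f"{agent}.sinks"].split():
                # A sink reads from exactly one channel.
                if props[f"{agent}.sinks.{sink}.channel"] == channel:
                    flows.append((source, channel, sink))
    return flows


props = parse_properties(EXAMPLE_CONF)
flows = describe_flow(props, "a1")
for source, channel, sink in flows:
    print(f"{source} --> {channel} --> {sink}")
```

Note that a source lists its channels under the plural key "channels", while a sink names its single channel under the singular key "channel".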

In the Flume installation directory, execute the following command to start the Flume agent. (Note: example.conf is located in the Flume installation directory.)

bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console

Command parameter explanation:

1. --conf conf specifies the conf directory under the current directory as the configuration directory. (Note: this directory must contain the flume-env.sh file and the log4j configuration file; otherwise the run fails.)

2. --conf-file example.conf specifies the example.conf file in the current directory as the configuration file.

3. --name a1 specifies a1 as the agent name.

4. -Dflume.root.logger=INFO,console sets the log level to INFO and sends the log output to the console.

  The startup information is as follows:

When sending data to localhost:44444 over a TCP connection, the Flume agent will receive the sent data and print it to the console:

1. Establish a TCP connection via Telnet and send the data:

2. The agent receives the data and prints it on the console:
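The Telnet step can also be scripted. The following is a minimal sketch of a TCP client that sends one line to the agent; the helper name send_line is hypothetical, and whether the netcat source sends back an acknowledgement for each line is treated here as an assumption, so the function simply returns whatever reply arrives before the timeout.

```python
import socket


def send_line(host, port, text, timeout=2.0):
    """Send one newline-terminated line over TCP and return any reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(text.encode("utf-8") + b"\n")
        try:
            reply = conn.recv(1024)
        except socket.timeout:
            reply = b""  # no reply arrived before the timeout
    return reply.decode("utf-8", errors="replace").strip()


# Usage, assuming the agent from this section is listening on localhost:44444:
#   print(send_line("localhost", 44444, "Hello world!"))
```

If the agent is running with the logger sink, the sent line should then appear in the agent's console output.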

  
