Flume is a high-performance, highly available distributed log collection system originally developed by Cloudera.
The core job of Flume is to collect data from a data source and deliver it to a destination. To guarantee delivery, the data is buffered before being sent, and it is deleted only after it has actually arrived at the destination.
The basic unit of data transmitted by Flume is the event; for a text file this is usually one line of a record. The event is also the basic unit of a transaction.
The core of a Flume deployment is the agent. An agent is a complete data collection tool containing three core components: source, channel, and sink. With these components, events can flow from one place to another.
A source receives data sent from an external producer. Different source types accept different data formats. For example, the spooling directory (spooldir) source monitors a specified folder for new files; as soon as a file appears in the directory, its contents are read.
The channel is a staging area that holds the output of a source until a sink consumes it. Data is not deleted from the channel until it has entered the next agent or the final destination, so when a sink write fails the delivery can be retried without data loss; this is what makes Flume reliable.
A sink consumes the data in the channel and then sends it to an external destination or to the source of another agent. For example, the data can be written to HDFS or HBase.
Flume allows multiple agents to be chained together, the sink of one agent feeding the source of the next, to form multi-hop flows.
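A minimal sketch of such a two-hop chain (the agent names, hostname, and port below are illustrative, not from the original): the first agent forwards events through an avro sink, and the second agent receives them with an avro source.
# on the first machine: avro sink pointing at the next hop
agent_a.sinks.k1.type = avro
agent_a.sinks.k1.hostname = hop2.example.com
agent_a.sinks.k1.port = 4141
# on the second machine: avro source listening for the previous hop
agent_b.sources.r1.type = avro
agent_b.sources.r1.bind = 0.0.0.0
agent_b.sources.r1.port = 4141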
Usage
1. Download apache-flume-1.4.0-bin.tar.gz and apache-flume-1.4.0-src.tar.gz from the official website
2. Unzip each archive, then copy all the contents of the src project into the bin project
3. Delete the src project and rename the bin project to flume
4. Configure the environment variables (a shell sketch of steps 1-4 follows this list)
5. Write the agent configuration
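A minimal shell sketch of steps 1 through 4, assuming both tarballs sit in /usr/local and that the environment variables go in /etc/profile (both are illustrative choices, not prescribed by the original):
cd /usr/local
# unpack both archives
tar -zxvf apache-flume-1.4.0-bin.tar.gz
tar -zxvf apache-flume-1.4.0-src.tar.gz
# merge the src project into the bin project, then keep only the merged copy
cp -r apache-flume-1.4.0-src/* apache-flume-1.4.0-bin/
rm -rf apache-flume-1.4.0-src
mv apache-flume-1.4.0-bin flume
# put the flume-ng script on the PATH, e.g. by appending these lines to /etc/profile
export FLUME_HOME=/usr/local/flume
export PATH=$PATH:$FLUME_HOME/bin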
The core of using Flume is configuring the agent. The agent configuration is a plain text file that stores settings as key-value pairs, and a single file can define multiple agents. The configuration covers sources, channels, sinks, and so on; each source, channel, and sink component has a name, a type, and many component-specific properties.
The configuration file takes this general form:
# list the sources, sinks and channels for the agent
<agent>.sources = <source>
<agent>.sinks = <sink>
<agent>.channels = <channel1> <channel2>
# set the channels for a source (a source can feed several channels)
<agent>.sources.<source>.channels = <channel1> <channel2> ...
# set the channel for a sink (a sink drains exactly one channel)
<agent>.sinks.<sink>.channel = <channel1>
# properties for sources
<agent>.sources.<source>.<someProperty> = <someValue>
# properties for channels
<agent>.channels.<channel>.<someProperty> = <someValue>
# properties for sinks
<agent>.sinks.<sink>.<someProperty> = <someValue>
# here is an example
# agent1 below is the agent name; it has one source named src1, two sinks named sink1 and sink2, and two channels named ch1 and ch2
agent1.sources = src1
agent1.sinks = sink1 sink2
agent1.channels = ch1 ch2
# configure the spooling directory source: it monitors the directory (which must already exist) for changes;
# file names must be unique, otherwise Flume reports an error
agent1.sources.src1.type = spooldir
agent1.sources.src1.channels = ch2
agent1.sources.src1.spoolDir = /root/hmbbs
agent1.sources.src1.fileHeader = false
# the timestamp interceptor adds a timestamp header, which the %y-%m-%d escapes in the HDFS path below require
agent1.sources.src1.interceptors = i1
agent1.sources.src1.interceptors.i1.type = timestamp
# configure the memory channel
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 1000
agent1.channels.ch1.transactionCapacity = 1000
agent1.channels.ch1.byteCapacityBufferPercentage = 20
agent1.channels.ch1.byteCapacity = 800000
# configure the file channel
agent1.channels.ch2.type = file
agent1.channels.ch2.checkpointDir = /root/flumechannel/checkpoint
agent1.channels.ch2.dataDirs = /root/flumechannel/data
# configure the HDFS sink
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.channel = ch2
agent1.sinks.sink1.hdfs.path = hdfs://hadoop0:9000/flume/%y-%m-%d/
agent1.sinks.sink1.hdfs.rollInterval = 1
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.hdfs.writeFormat = Text
# configure the HBase sink sink2
agent1.sinks.sink2.type = hbase
agent1.sinks.sink2.channel = ch1
agent1.sinks.sink2.table = hmbbs
agent1.sinks.sink2.columnFamily = cf
agent1.sinks.sink2.serializer = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
# a custom serializer can be used instead, for example:
# agent1.sinks.sink2.serializer = flume.HmbbsHbaseEventSerializer
# agent1.sinks.sink2.serializer.suffix = timestamp
6. The script that starts an agent is flume-ng; you need to specify the agent name, the configuration directory, and the configuration file:
-n specifies the agent name
-c specifies the configuration file directory
-f specifies the configuration file
-Dflume.root.logger=DEBUG,console prints debug-level logging to the console
So the full startup command looks like this:
bin/flume-ng agent -n agent1 -c conf -f conf/example -Dflume.root.logger=DEBUG,console
After a successful startup, you can put files into the directory /root/hmbbs; Flume will detect the new files and upload their contents to the /flume directory in HDFS.
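A quick way to verify the pipeline (the file name below is illustrative): drop a file into the spool directory and watch it get processed; the spooling directory source marks finished files with a .COMPLETED suffix.
echo "hello flume" > /root/hmbbs/test.log
ls /root/hmbbs          # test.log.COMPLETED once Flume has read it
hadoop fs -ls /flume    # the contents appear under a date-stamped subdirectory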