The principle and use of Flume

Source: Internet
Author: User

Flume is a high-performance, highly available distributed log collection system originally developed by Cloudera.

At its core, Flume collects data from a data source and delivers it to a destination. To guarantee delivery, Flume buffers the data before sending it and deletes the buffered copy only after the data has actually arrived at the destination.

The basic unit of data that Flume transfers is the event. For a text file, an event is usually one line (one record). The event is also the basic unit of a transaction.

Flume runs as one or more agents. An agent is a complete data collection tool made up of three core components: source, channel, and sink. Together, these components let events flow from one place to another.

A source receives data sent from an external system. Different source types accept different data formats. For example, the spooling directory source monitors a specified folder for new files; as soon as a file appears in the directory, its contents are read.
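The spooling directory behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Flume's implementation: the function name `poll_spool_dir` is hypothetical, each line of a file is treated as one event body, and a finished file is renamed with a `.COMPLETED` suffix (the default suffix the Flume spooling directory source uses) so that it is not read twice.

```python
import os


def poll_spool_dir(spool_dir, suffix=".COMPLETED"):
    """Read every not-yet-processed file in spool_dir, one event per line.

    Hypothetical helper for illustration only; it is not part of Flume.
    """
    events = []
    for name in sorted(os.listdir(spool_dir)):
        path = os.path.join(spool_dir, name)
        # Skip files that were already consumed, and anything that is not a file.
        if name.endswith(suffix) or not os.path.isfile(path):
            continue
        with open(path, "rb") as handle:
            # Each line becomes one event body (raw bytes, line ending removed).
            events.extend(line.rstrip(b"\r\n") for line in handle)
        # Mark the file as done so the next poll ignores it.
        os.rename(path, path + suffix)
    return events
```

Calling `poll_spool_dir` repeatedly on the same folder returns only the lines of files that arrived since the previous call, which is also why file names in the folder have to be unique.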

A channel is a staging area that holds the output of the source until a sink consumes it. Data in the channel is not deleted until it has entered the next channel or reached the final destination. If a sink write fails, the write is retried automatically and no data is lost, which is what makes the flow reliable.
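The "delete only after arrival" rule can be illustrated with a toy channel. The sketch below is a simplified model, not Flume's channel API: the names `SketchChannel` and `drain` are hypothetical, and a failed write is modelled as an `OSError` raised by the `write` callback.

```python
from collections import deque


class SketchChannel:
    """Toy in-memory channel: events are dropped only when a take is committed."""

    def __init__(self):
        self._queue = deque()   # events waiting for a sink
        self._in_flight = []    # events handed to a sink but not yet confirmed

    def put(self, event):
        self._queue.append(event)

    def take(self, batch_size):
        # Move up to batch_size events into the in-flight list.
        while self._queue and len(self._in_flight) < batch_size:
            self._in_flight.append(self._queue.popleft())
        return list(self._in_flight)

    def commit(self):
        # The destination confirmed the write, so the events can be dropped.
        self._in_flight.clear()

    def rollback(self):
        # The write failed: put the events back at the front, in original order.
        self._queue.extendleft(reversed(self._in_flight))
        self._in_flight.clear()

    def __len__(self):
        return len(self._queue) + len(self._in_flight)


def drain(channel, write, batch_size=2):
    """Try to deliver one batch through write(); return True on success."""
    batch = channel.take(batch_size)
    try:
        write(batch)
    except OSError:
        channel.rollback()
        return False
    channel.commit()
    return True
```

If `write` fails, the batch goes back into the channel unchanged and a later call to `drain` delivers the same events, so a temporary outage at the destination does not lose data.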

A sink consumes the data in the channel and sends it to an external destination or to the source of another agent. For example, the data can be written to HDFS or HBase.

Flume allows multiple agents to be chained together, forming a multi-hop flow in which each agent passes events on to the next.

Use
1. Download apache-flume-1.4.0-bin.tar.gz and apache-flume-1.4.0-src.tar.gz from the official website

2. Unpack both archives, then copy all the contents of the src project into the bin project

3. Delete the src project and rename the bin project to flume

4. Add Flume to the environment variables

5. Write the agent configuration

The key to using Flume is the agent configuration file. It is a plain text file that stores settings as key-value pairs, and it can define more than one agent. The configuration covers sources, channels, sinks, and so on. Every source, channel, and sink has a name, a type, and a set of type-specific properties.

The configuration file follows this template:

# List the sources, sinks and channels for the agent
<Agent>.sources = <Source>
<Agent>.sinks = <Sink>
<Agent>.channels = <Channel1> <Channel2>

# Set channels for the source
<Agent>.sources.<Source>.channels = <Channel1> <Channel2> ...

# Set the channel for the sink
<Agent>.sinks.<Sink>.channel = <Channel1>

# Properties for sources
<Agent>.sources.<Source>.<someProperty> = <someValue>

# Properties for channels
<Agent>.channels.<Channel>.<someProperty> = <someValue>

# Properties for sinks
<Agent>.sinks.<Sink>.<someProperty> = <someValue>

# Here is an example

# agent1 is the agent name. It has one source named src1, one sink named sink1, and one channel named ch2.
agent1.sources = src1
agent1.sinks = sink1
agent1.channels = ch2

# Configure the spooling directory source. It watches a directory (which must exist) for new files.
# File names must be unique, otherwise Flume reports an error.
agent1.sources.src1.type = spooldir
agent1.sources.src1.channels = ch2
agent1.sources.src1.spoolDir = /root/hmbbs
agent1.sources.src1.fileHeader = false
agent1.sources.src1.interceptors = i1
agent1.sources.src1.interceptors.i1.type = timestamp

# Configure a memory channel (ch1 is not listed in agent1.channels above; add it there before using it)
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 1000
agent1.channels.ch1.transactionCapacity = 1000
agent1.channels.ch1.byteCapacityBufferPercentage = 20
agent1.channels.ch1.byteCapacity = 800000

# Configure the file channel
agent1.channels.ch2.type = file
agent1.channels.ch2.checkpointDir = /root/flumechannel/checkpoint
agent1.channels.ch2.dataDirs = /root/flumechannel/data

# Configure the HDFS sink
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.channel = ch2
agent1.sinks.sink1.hdfs.path = hdfs://hadoop0:9000/flume/%y-%m-%d/
agent1.sinks.sink1.hdfs.rollInterval = 1
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.hdfs.writeFormat = Text

# Configure an HBase sink named sink2 (not listed in agent1.sinks above; add it there before using it)
agent1.sinks.sink2.type = hbase
agent1.sinks.sink2.channel = ch1
agent1.sinks.sink2.table = hmbbs
agent1.sinks.sink2.columnFamily = cf
# The serializer key is set twice below: once to a custom class, once to the built-in simple serializer.
# Keep only the line you need; the custom class name must match your own code exactly.
agent1.sinks.sink2.serializer = flume.HmbbsHbaseEventSerializer
agent1.sinks.sink2.serializer.suffix = timestamp
agent1.sinks.sink2.serializer = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
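Component names in an agent configuration are matched exactly, including upper and lower case, so a channel that a source or sink refers to must be spelled the same way as in the agent's channel list. A quick consistency check can be sketched in Python. The helpers `parse_properties` and `undeclared_channels` are hypothetical and not part of Flume, and the sample text is made up for illustration; the check only looks at the `channels` key of sources and the `channel` key of sinks.

```python
def parse_properties(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def undeclared_channels(props, agent):
    """Return channels used by sources or sinks but missing from <agent>.channels."""
    declared = set(props.get(f"{agent}.channels", "").split())
    used = set()
    for key, value in props.items():
        parts = key.split(".")
        if len(parts) == 4 and parts[0] == agent:
            if parts[1] == "sources" and parts[3] == "channels":
                used.update(value.split())   # a source may feed several channels
            elif parts[1] == "sinks" and parts[3] == "channel":
                used.add(value)              # a sink reads from exactly one channel
    return sorted(used - declared)


sample = """
agent1.sources = src1
agent1.sinks = sink1
agent1.channels = ch2
agent1.sources.src1.channels = ch2
agent1.sinks.sink1.channel = CH2
"""
print(undeclared_channels(parse_properties(sample), "agent1"))  # prints ['CH2']
```

In the sample, the sink refers to `CH2` while the agent declares `ch2`, so the check reports the mismatch before the agent is started.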

6. Start the agent with the flume-ng script. You need to specify the agent name, the configuration directory, and the configuration file:

-n specifies the agent name

-c specifies the configuration directory

-f specifies the configuration file

-Dflume.root.logger=DEBUG,console sends debug-level log output to the console

So the full startup command is:

bin/flume-ng agent -n agent1 -c conf -f conf/example -Dflume.root.logger=DEBUG,console

After a successful startup, you can put a file into the /root/hmbbs directory. Flume detects the new file and uploads its contents to the /flume directory in HDFS.
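The exact subdirectory under /flume depends on the event's timestamp. The example configuration attaches a timestamp interceptor to the source, which stores the time in milliseconds in a `timestamp` header, and the HDFS sink path uses the escape sequences %y (two-digit year), %m (month) and %d (day). The Python sketch below shows how such a path would be expanded. It is an illustration only: `expand_path` is a hypothetical helper, the header value is made up, and UTC is used to keep the result reproducible, whereas Flume uses the local time zone unless configured otherwise.

```python
from datetime import datetime, timezone


def expand_path(template, headers):
    """Expand %y, %m and %d in template from the 'timestamp' header (milliseconds)."""
    millis = int(headers["timestamp"])
    when = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    return (template
            .replace("%y", when.strftime("%y"))   # two-digit year
            .replace("%m", when.strftime("%m"))   # month, 01-12
            .replace("%d", when.strftime("%d")))  # day of month, 01-31


headers = {"timestamp": "1388534400000"}  # 2014-01-01T00:00:00Z, made-up value
print(expand_path("hdfs://hadoop0:9000/flume/%y-%m-%d/", headers))
# prints hdfs://hadoop0:9000/flume/14-01-01/
```

Under these assumptions, events collected on the same day end up in the same dated directory.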
