Flume Introduction and Installation

Source: Internet
Author: User

One. What is Flume?

Flume is a distributed, reliable system that efficiently collects, aggregates, and moves large amounts of data from many different sources to a centralized data store.

Flume is a top-level Apache project. It is not limited to aggregating log data: because data sources are customizable, Flume can transport large volumes of event data, including but not limited to network traffic data, data generated by social media, and email messages.

Flume currently has two version lines, 0.9.x and 1.x. The 1.x line offers more flexible configuration and better performance, so it is the recommended one. This article uses version 1.8.

Two. Flume's Data flow model

In a nutshell, a Flume agent (a JVM process) forwards externally generated events to the next destination, which is also called the next hop.

1. Related terms
    • Flume event: the data itself. It is the smallest unit of data transfer in Flume and carries an optional set of attributes (headers) in addition to the payload.
    • Flume source: receives events and places them into one or more channels.
    • Flume channel: holds events until they are consumed by a Flume sink.
    • Flume sink: takes events from the channel and either writes them to an external repository or sends them to the Flume source of the next Flume agent.
      Note: within an agent, the source and the sink interact with the channel asynchronously.
2. Flume Data Flow Process

For example, an external event source such as a web server sends data in a specific format to a Flume source. Flume provides several source types; an Avro source, for instance, receives data from an Avro client or from another Flume agent (whose sink took the data from its channel). The source puts the data into a channel. A sink then takes events from the channel and delivers them; in this example an HDFS sink writes them to HDFS.
Note: Source, Channel, and sink all have different implementations, which correspond to different functions.
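The source, channel, and sink roles described above can be illustrated with a small Python sketch. This is only a toy model, not Flume code: the names MemoryChannel, make_event, source_receive, and sink_drain are hypothetical and exist only for this example. It shows a source wrapping incoming lines as events, a channel buffering them, and a sink draining the channel later.

```python
import queue


class MemoryChannel:
    """Toy stand-in for a channel: buffers events until a sink takes them."""

    def __init__(self, capacity=1000):
        self._queue = queue.Queue(maxsize=capacity)

    def put(self, event):
        # Raises queue.Full if the channel is at capacity.
        self._queue.put_nowait(event)

    def take(self):
        return self._queue.get_nowait()

    def __len__(self):
        return self._queue.qsize()


def make_event(body, headers=None):
    """An event is a payload (bytes) plus optional headers."""
    return {"headers": dict(headers or {}), "body": body}


def source_receive(lines, channels):
    """Toy source: wrap each line as an event and put it into every channel."""
    for line in lines:
        event = make_event(line.encode("utf-8"))
        for channel in channels:
            channel.put(event)


def sink_drain(channel, deliver):
    """Toy sink: take every buffered event and hand it to `deliver`."""
    count = 0
    while len(channel) > 0:
        deliver(channel.take())
        count += 1
    return count


channel = MemoryChannel(capacity=10)
source_receive(["GET /index.html", "GET /about.html"], [channel])

delivered = []
drained = sink_drain(channel, delivered.append)
print(drained, [event["body"] for event in delivered])
```

The source and the sink never call each other directly; the channel is the only thing they share, which is what allows them to run at different speeds.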

Three. Install Flume

1. Download the installation package

~]# wget http://archive.apache.org/dist/flume/stable/apache-flume-1.8.0-bin.tar.gz
~]# tar xf apache-flume-1.8.0-bin.tar.gz -C /opt
~]# cd /opt
~]# ln -sv apache-flume-1.8.0-bin apache-flume

2. Installing the JDK

Flume is a Java program, so it depends on the JDK.
JDK 1.8.0 or later is required.
~]# yum install -y java-1.8.0-openjdk

3. Configuration files

Modify the configuration file conf/flume-conf.properties.template.

(1) Single component configuration

~]# vim conf/flume-conf.properties.template

# Name this Flume agent agent01; the name is arbitrary.
# r1, k1, and c1 are the names of its source, sink, and channel; these are arbitrary too.
agent01.sources = r1
agent01.sinks = k1
agent01.channels = c1

# Configure the source: r1 is a netcat source that listens on the given IP and port for data.
# bind is the address to bind to; port is the port to listen on.
agent01.sources.r1.type = netcat
agent01.sources.r1.bind = localhost
agent01.sources.r1.port = 44444

# Configure the sink: a logger sink logs each event at INFO level; it is mainly used for debugging.
agent01.sinks.k1.type = logger

# Configure the channel: a memory channel keeps events in memory.
agent01.channels.c1.type = memory
agent01.channels.c1.capacity = 1000
agent01.channels.c1.transactionCapacity = 100

# Bind the source and the sink to the channel: r1 puts events into c1, and k1 consumes events from c1.
agent01.sources.r1.channels = c1
agent01.sinks.k1.channel = c1

(2) Multi-flow agent configuration

A single Flume agent can host multiple flows: it can define several sources, channels, and sinks, and different combinations of these components form separate data flows.

Usage scenario: multiple flows can be configured at the same time, and they do not interfere with each other. In the following example,
one flow is: avro-AppSrv-source1 --> mem-channel-1 --> hdfs-Cluster1-sink1
the other is: exec-tail-source2 --> file-channel-2 --> avro-forward-sink2

# list the sources, sinks and channels in the agent
agent_foo.sources = avro-AppSrv-source1 exec-tail-source2
agent_foo.sinks = hdfs-Cluster1-sink1 avro-forward-sink2
agent_foo.channels = mem-channel-1 file-channel-2

# flow #1 configuration
agent_foo.sources.avro-AppSrv-source1.channels = mem-channel-1
agent_foo.sinks.hdfs-Cluster1-sink1.channel = mem-channel-1

# flow #2 configuration
agent_foo.sources.exec-tail-source2.channels = file-channel-2
agent_foo.sinks.avro-forward-sink2.channel = file-channel-2

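One way to double-check this kind of wiring is to derive the flows from the properties themselves. The following Python sketch does that under simplifying assumptions: parse_properties is a deliberately minimal parser (whole-line # comments and key = value pairs only, not the full Java properties format), and list_flows is a hypothetical helper written for this example, not part of Flume.

```python
def parse_properties(text):
    """Minimal parser: skips blank lines and whole-line '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def list_flows(props, agent):
    """Return (source, channel, sink) triples implied by the channel bindings."""
    flows = []
    sinks = props[f"{agent}.sinks"].split()
    for source in props[f"{agent}.sources"].split():
        for channel in props[f"{agent}.sources.{source}.channels"].split():
            for sink in sinks:
                # A sink reads from exactly one channel.
                if props.get(f"{agent}.sinks.{sink}.channel") == channel:
                    flows.append((source, channel, sink))
    return flows


CONFIG = """
agent_foo.sources = avro-AppSrv-source1 exec-tail-source2
agent_foo.sinks = hdfs-Cluster1-sink1 avro-forward-sink2
agent_foo.channels = mem-channel-1 file-channel-2
agent_foo.sources.avro-AppSrv-source1.channels = mem-channel-1
agent_foo.sinks.hdfs-Cluster1-sink1.channel = mem-channel-1
agent_foo.sources.exec-tail-source2.channels = file-channel-2
agent_foo.sinks.avro-forward-sink2.channel = file-channel-2
"""

flows = list_flows(parse_properties(CONFIG), "agent_foo")
for flow in flows:
    print(" --> ".join(flow))
```

If a sink were accidentally left without a channel binding, its flow would simply be missing from the output, which makes this a quick sanity check before starting the agent.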
(3) Fan out flow

Flume supports fanning out events from one source to multiple channels. Fan-out has two modes:

    • Replicating: each event is sent to all of the configured channels.
    • Multiplexing: a selector inspects each event and sends it only to the channels that match. If no mode is specified, the default is replicating.
      Example configuration:
# list the sources, sinks and channels in the agent
agent_foo.sources = avro-AppSrv-source1
agent_foo.sinks = hdfs-Cluster1-sink1 avro-forward-sink2
agent_foo.channels = mem-channel-1 file-channel-2

# set channels for source
agent_foo.sources.avro-AppSrv-source1.channels = mem-channel-1 file-channel-2

# set channel for sinks
agent_foo.sinks.hdfs-Cluster1-sink1.channel = mem-channel-1
agent_foo.sinks.avro-forward-sink2.channel = file-channel-2

# channel selector configuration
agent_foo.sources.avro-AppSrv-source1.selector.type = multiplexing
agent_foo.sources.avro-AppSrv-source1.selector.header = state
agent_foo.sources.avro-AppSrv-source1.selector.mapping.CA = mem-channel-1
agent_foo.sources.avro-AppSrv-source1.selector.mapping.AZ = file-channel-2
agent_foo.sources.avro-AppSrv-source1.selector.mapping.NY = mem-channel-1 file-channel-2
agent_foo.sources.avro-AppSrv-source1.selector.default = mem-channel-1

selector.header specifies the name of the event header to inspect, which is configured here as state. If its value is "CA", the event is sent to mem-channel-1; if it is "AZ", the event is sent to file-channel-2; if it is "NY", the event is sent to both mem-channel-1 and file-channel-2. If nothing matches, the event is sent to the default channel, mem-channel-1.
Note: if one of the selected channels cannot accept the event, the write fails and is retried on all of the channels.
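The routing rule of the multiplexing selector can be sketched in a few lines of Python. This is an illustration of the lookup logic only, assuming the header name and mappings from the example; select_channels is a hypothetical function and does not reflect how Flume implements the selector internally.

```python
def select_channels(headers, header_name, mapping, default):
    """Pick channels by header value; fall back to the default channels."""
    value = headers.get(header_name)
    return mapping.get(value, default)


# Mirrors the selector.mapping.* and selector.default entries of the example.
MAPPING = {
    "CA": ["mem-channel-1"],
    "AZ": ["file-channel-2"],
    "NY": ["mem-channel-1", "file-channel-2"],
}
DEFAULT = ["mem-channel-1"]

for state in ("CA", "AZ", "NY", "TX"):
    event_headers = {"state": state}
    print(state, select_channels(event_headers, "state", MAPPING, DEFAULT))
```

An event whose state header is missing or has an unmapped value (such as "TX" here) falls through to the default channel list.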

4. Start the flume agent

Start the Flume agent with the single-component configuration:
apache-flume]# bin/flume-ng agent --conf ./conf --conf-file ./conf/flume-conf.properties.template --name agent01

    • agent: run as a Flume agent
    • --conf: the configuration directory
    • --conf-file: the configuration file
    • --name: the name of the agent to run


Once the agent has started, you can see that port 44444 is listening.

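5. Verify that the agent works

Connect to the listening port, for example with telnet localhost 44444. After the connection succeeds, type any string; if everything is working, the agent replies OK.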
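The same check can be scripted. The sketch below is a minimal Python client, assuming the agent from the single-component configuration is listening on localhost:44444; send_line is a hypothetical helper name, and the 5-second timeout is an arbitrary choice for this example.

```python
import socket


def send_line(text, host="localhost", port=44444, timeout=5.0):
    """Send one newline-terminated line and return the server's reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(text.encode("utf-8") + b"\n")
        # The reply is expected to be a short acknowledgement line.
        return sock.recv(1024).decode("utf-8").strip()
```

With the agent running, calling send_line("hello flume") should return OK, and the logger sink should print the received event in the agent's output.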

Reference

For details on the different sources, channels, and sinks, refer to the official documentation.
Flume Official documents: http://flume.apache.org/FlumeUserGuide.html
