Apache Flume Introduction and Installation/Deployment

Source: Internet
Author: User

Overview

Flume is a highly available, highly reliable, distributed software for massive log collection, aggregation, and transmission, originally provided by Cloudera.

The core of Flume is to collect data from a data source and then send the collected data to a specified destination (sink). To guarantee that delivery succeeds, the data is buffered in a channel before being sent to the sink; only after the data has actually arrived at the sink does Flume delete its own buffered copy.
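The buffer-then-delete behavior described above can be sketched conceptually. This is a plain-Python illustration of the idea, not Flume's actual implementation (Flume is written in Java and its channels are transactional):

```python
# Conceptual sketch of Flume's channel buffering (illustrative only):
# an event stays in the channel until the sink confirms delivery, and
# only then is it removed from the buffer.
from collections import deque

channel = deque()

def source_put(event):
    channel.append(event)      # the source buffers the event in the channel

def sink_take(deliver):
    event = channel[0]         # peek at the oldest event, don't remove yet
    deliver(event)             # attempt delivery; may raise on failure
    channel.popleft()          # delete from the buffer only after success

source_put("log line 1")
delivered = []
sink_take(delivered.append)
print(delivered)               # ['log line 1']
print(len(channel))            # 0
```

If `deliver` raises, the event is never removed, so a retry will see it again; this mirrors why Flume can guarantee delivery despite sink failures.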

Flume supports customizable data senders for collecting various types of data, and customizable data receivers for the final storage of that data. Common collection requirements can be met through simple Flume configuration, and Flume also offers good extension capabilities for special scenarios. Flume is therefore suitable for most day-to-day data collection scenarios.

Operating mechanism

  The core role in a Flume system is the agent. An agent is itself a Java process that typically runs on a log collection node.

    • Each agent acts as a data transfer agent and contains three components:

    Source: the collection source, which interfaces with the data source to obtain data;

    Sink: the sink, which transmits the collected data to the next-level agent or to the final storage system;

    Channel: the agent's internal data transmission channel, which moves data from the source to the sink;

    • During the whole transmission process, what flows is the event, the basic unit of Flume's internal data transfer. An event encapsulates the data being transmitted; for a text file, an event is usually one line (one record), and the event is also the basic unit of a transaction. An event travels from source, to channel, to sink; it is itself a byte array and can carry headers (header information). An event represents the smallest complete unit of data, from an external data source to an external destination.
    • A complete event includes event headers and an event body; the event body carries the log record that Flume collects.
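The event structure described above (a byte-array body plus optional string headers) can be sketched as follows. This is a conceptual Python illustration only; Flume's real event is the Java interface `org.apache.flume.Event`, not this class:

```python
# Illustrative sketch of a Flume event: a byte-array body plus
# optional key/value headers (not Flume's actual Java API).

class Event:
    def __init__(self, body, headers=None):
        self.body = body              # the payload, e.g. one log line, as bytes
        self.headers = headers or {}  # optional string key/value metadata

# One line of a text log becomes one event:
line = "127.0.0.1 - GET /index.html"
evt = Event(line.encode("utf-8"), {"host": "web01"})
print(evt.headers["host"])   # web01
print(evt.body.decode())     # 127.0.0.1 - GET /index.html
```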
Flume collection system structure diagrams

Simple structure:

Single Agent collects data

Complex structure

Multiple agents chained in series (multi-level)
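A multi-level chain like this is typically wired by pointing an Avro sink on the upstream agent at an Avro source on the downstream agent. A minimal configuration sketch follows; the host name, port, and component names here are illustrative assumptions, not taken from the original:

```properties
# Upstream agent a1: forward collected events to the next-level agent via Avro
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = agent2-host    # assumed host of the downstream agent
a1.sinks.k1.port = 4141               # assumed port

# Downstream agent a2: receive events from the upstream agent via Avro
a2.sources.r1.type = avro
a2.sources.r1.bind = 0.0.0.0
a2.sources.r1.port = 4141
```

Each agent would still need its own channel and the downstream agent its own sink; only the linking components are shown here.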

Flume Installation Deployment
    • Upload the installation package to the node on which the data source resides
    • Extract
tar -zxvf apache-flume-1.6.0-bin.tar.gz
    • Configure the collection scheme according to the data collection requirements, and describe it in a configuration file (the file name can be chosen arbitrarily)

Create a new file in Flume's conf directory:

vi netcat-logger.conf
# Receive data from a network port, sink to logger
# Collection configuration file: netcat-logger.conf

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

    • Start the Flume agent on the appropriate node, specifying the collection scheme configuration file
bin/flume-ng agent --conf conf --conf-file conf/netcat-logger.conf --name a1 -Dflume.root.logger=INFO,console

# --conf specifies the directory of Flume's configuration files (abbreviated -c)

# --conf-file specifies which collection scheme file to use (abbreviated -f)

# --name gives this Flume agent a name (abbreviated -n)

    • Test
Install telnet if it is not already available:

yum install -y telnet

Incoming data:
$ telnet localhost 44444
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
Hello world! <ENTER>
OK
