Overview
Flume, originally developed by Cloudera, is a highly available, highly reliable, distributed system for collecting, aggregating, and moving large volumes of log data.
The core of Flume is to collect data from a data source and deliver it to a specified destination (sink). To guarantee delivery, the data is buffered in a channel before being sent to the sink; only after the data has actually arrived at the sink does Flume delete its buffered copy.
Flume supports custom data senders for collecting many kinds of data, as well as custom data receivers for writing data to its final storage. Typical collection requirements can be met with simple Flume configuration, and Flume also offers good extension points for special scenarios, so it can be used for most day-to-day data collection needs.
Operating mechanism
The core role in the Flume system is the agent; the agent itself is a Java process that typically runs on a log collection node.
- Each agent acts as a data courier, with three components inside:
Source: the collection component; it interfaces with the data source to obtain data;
Sink: the delivery component; it transmits the collected data to the next-level agent or to the final storage system;
Channel: the internal transmission channel of the agent; it moves data from the source to the sink;
- As data moves through the system, it flows as events, the basic unit of data transfer inside Flume. An event encapsulates the data being transmitted (for a text file, usually one line, i.e. one record) and is also the basic unit of a transaction. On its way from source, through channel, to sink, an event is itself a byte array and can carry headers. An event represents the smallest complete unit of data traveling from an external data source to an external destination.
- A complete event consists of the event headers, the event body, and the event information; the event information is the log record that Flume has collected.
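To make this concrete, this is roughly what an event looks like when a logger sink (used in the example below) prints it: empty headers, then the body shown as a hex dump of the byte array followed by its printable form:

Event: { headers:{} body: 68 65 6C 6C 6F hello }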
Flume collection system structure diagrams:
- Simple structure: a single agent collects data.
- Complex structure: multiple agents connected in tandem (multi-level).
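In the tandem arrangement, the usual pattern is to connect the upstream agent's avro sink to the downstream agent's avro source. A minimal configuration sketch, assuming the downstream agent runs on a host called node02 and listens on port 4141 (both are placeholders):

# upstream agent a1: forward events to the next agent over Avro
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = node02
a1.sinks.k1.port = 4141

# downstream agent a2: receive events from the upstream agent
a2.sources.r1.type = avro
a2.sources.r1.bind = 0.0.0.0
a2.sources.r1.port = 4141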
Flume installation and deployment
- Upload the installation package to the node on which the data source resides
- Extract
tar -zxvf apache-flume-1.6.0-bin.tar.gz
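Optionally, point Flume at your JDK by copying the environment template that ships in the conf directory; the JDK path below is an assumption for your environment:

cd apache-flume-1.6.0-bin
cp conf/flume-env.sh.template conf/flume-env.sh
# then set JAVA_HOME in conf/flume-env.sh, for example:
export JAVA_HOME=/usr/local/jdk1.8.0_65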
- Define the collection scheme for your data collection requirements in a configuration file (the file name can be anything)
Create a new file in Flume's conf directory:
vi netcat-logger.conf
# Receive data from a network port and sink it to the logger
# collection configuration file: netcat-logger.conf

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
- Start the Flume agent on the appropriate node, specifying the collection scheme configuration file
bin/flume-ng agent --conf conf --conf-file conf/netcat-logger.conf --name a1 -Dflume.root.logger=INFO,console
# --conf specifies the directory of Flume's own configuration files (can be abbreviated to -c)
# --conf-file specifies which collection scheme file to launch (can be abbreviated to -f)
# --name gives this flume agent a name; it must match the agent name used in the scheme file, a1 here (can be abbreviated to -n)
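The same command with the abbreviated options:

bin/flume-ng agent -c conf -f conf/netcat-logger.conf -n a1 -Dflume.root.logger=INFO,console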
- Test: send data to the port the agent's source is listening on, using telnet:
$ telnet localhost 44444
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
Hello world! <ENTER>
OK
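Back on the agent's console, the logger sink should print the received event, along these lines (the hex values are the bytes that were typed, ending with a carriage return):

Event: { headers:{} body: 48 65 6C 6C 6F 20 77 6F 72 6C 64 21 0D Hello world!. }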