Flume Introduction and Use (Part 1)
Flume Introduction
Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of data from many different data sources.
Distributed: multiple machines can collect data at the same time, and different agents pass data to one another over the network.
Reliable: Flume buffers collected data in a channel, and an event is removed from the channel only after the sink confirms that it has been received.
Practical: Flume is simple to use and easy to extend; by editing a configuration file you can plug in different sources, channels, and sinks, thanks to its elegant modular design.
A Flume agent is a JVM process containing three components: source, channel, and sink. The source collects the data and writes it into the channel; the channel is a queue into which the source inserts events and from which the sink takes them. Once the sink confirms that the data has been received by the next-hop agent or by the destination store (e.g. a database), the data is deleted from the channel.
The most ingenious part of Flume is its modular design. In a real deployment, the data to be collected may come from many kinds of sources: command output, application logs, network traffic, and so on. Users can configure a different source for each, and channels and sinks can likewise be chosen to suit your needs. The following tables list a subset of the component types supported by Flume 1.6.0.
| Source Type | Description |
| --- | --- |
| Avro Source | Supports the Avro protocol (Avro RPC); built-in |
| Thrift Source | Supports the Thrift protocol; built-in |
| Exec Source | Runs a UNIX command and reads events from its standard output |
| JMS Source | Reads data from a JMS queue or topic; tested with ActiveMQ |
| Spooling Directory Source | Watches a specified directory for new files and reads events from them |
| Twitter 1% firehose Source | Continuously downloads Twitter data via the streaming API; experimental |
| Netcat Source | Listens on a port and turns each line of text flowing through the port into an event |
| Sequence Generator Source | Generates a sequence of events; useful for testing |
| Syslog Sources | Read syslog data and generate events; both UDP and TCP are supported |
| HTTP Source | Accepts events via HTTP POST (and GET); supports JSON and BLOB representations |
| Legacy Sources | Compatible with sources from the old Flume OG (0.9.x) releases |
| Channel Type | Description |
| --- | --- |
| Memory Channel | Events are stored in memory |
| JDBC Channel | Events are stored in a persistent database; Derby is currently the only database with built-in support |
| File Channel | Events are stored in files on disk |
| Spillable Memory Channel | Events are held in an in-memory queue and spill to disk files when the queue fills up (currently experimental; not recommended for production) |
| Pseudo Transaction Channel | For testing only |
| Custom Channel | A user-supplied channel implementation |
| Sink Type | Description |
| --- | --- |
| Kafka Sink | Writes data to a Kafka topic |
| Hive Sink | Writes data to a Hive table or partition |
| HDFS Sink | Writes data to HDFS |
| Logger Sink | Writes events to the log |
| Avro Sink | Converts events to Avro and sends them to the configured RPC host/port |
| Thrift Sink | Converts events to Thrift and sends them to the configured RPC host/port |
| IRC Sink | Relays data to an IRC channel |
| File Roll Sink | Stores data on the local file system |
| Null Sink | Discards all data |
| HBase Sink | Writes data to HBase |
| Morphline Solr Sink | Sends data to a Solr search server (or cluster) |
| ElasticSearch Sink | Sends data to an Elasticsearch server (or cluster) |
| Kite Dataset Sink | Writes data to a Kite dataset; experimental |
| Custom Sink | A user-supplied sink implementation |
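As a concrete example of one of these components, the HTTP source (with its default JSON handler) accepts a JSON array of events, each carrying a `headers` map and a `body` string. The following sketch builds such a payload; the port number and header values are illustrative assumptions, not part of any Flume default.

```python
import json

def make_flume_events(bodies, headers=None):
    """Build the JSON payload Flume's HTTP source JSON handler expects:
    a list of events, each with a 'headers' map and a 'body' string."""
    headers = headers or {}
    return json.dumps([{"headers": headers, "body": b} for b in bodies])

payload = make_flume_events(["line one", "line two"], {"host": "web01"})

# Delivery sketch (assumes an HTTP source listening on localhost:5140):
# import urllib.request
# req = urllib.request.Request("http://localhost:5140",
#                              data=payload.encode(),
#                              headers={"Content-Type": "application/json"})
# urllib.request.urlopen(req)
```

The POST itself is left commented out since it requires a running agent; the payload-building part stands alone.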
Flume Installation
This article first describes how to install Flume and set up its configuration file, and then how to collect logs programmatically.
First, install a JDK: Flume 1.6.0 requires JVM 1.6 or later. Then go to the Flume website, download the binary tarball, and unpack it.
Second, enter the conf directory and copy the template: cp flume-conf.properties.template myflumeconf.properties
Third, edit the configuration to suit your needs; the official documentation describes the many available options.
A configured properties file looks similar to the following:
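A minimal single-agent sketch is shown below; the agent name (agent1), the netcat source on port 44444, and the logger sink are illustrative choices, not requirements.

```properties
# agent1: netcat source -> memory channel -> logger sink (illustrative)
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

# Source: listen on a TCP port and turn each line into an event
agent1.sources.src1.type = netcat
agent1.sources.src1.bind = 0.0.0.0
agent1.sources.src1.port = 44444
agent1.sources.src1.channels = ch1

# Channel: hold events in memory
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 1000

# Sink: log each event
agent1.sinks.sink1.type = logger
agent1.sinks.sink1.channel = ch1
```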
Fourth, start Flume. From the Flume directory, run:
bin/flume-ng agent --conf conf --conf-file conf/trafficxxx.properties --name agent1 -Dflume.root.logger=INFO,console
Here --conf specifies the configuration directory, --conf-file specifies the configuration file, --name specifies which agent defined in that file to start (a single configuration file can define multiple agents), and -Dflume.root.logger sets the level and destination of Flume's runtime log output.