Flume Introduction and Use (Part 1)
Flume Introduction
Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of data from many different data sources.
Distributed: multiple machines can collect data at the same time, and different agents pass data to one another over the network.
Reliable: Flume buffers collected data in a channel, and an event is removed from the channel only after the sink confirms that it has been received.
Practical: Flume is simple to use and easy to extend; by editing a configuration file you can plug in different sources, channels, and sinks, thanks to its elegant modular design.
A Flume agent is a JVM process containing three components: source, channel, and sink. The source collects the data and writes it into the channel; the channel is a queue into which the source inserts events and from which the sink takes them. Once the sink confirms that the data has been received by the next-hop agent or by the destination store (e.g. a database), the data is deleted from the channel.
The most ingenious part of Flume is its modular design. In a real deployment, the data to be collected may come from many kinds of sources: command output, application logs, network traffic, and so on. Users can configure a different source for each, and channels and sinks can likewise be chosen to suit your needs. The following tables list a subset of the component types supported by Flume 1.6.0.
| Source Type | Description |
| --- | --- |
| Avro Source | Supports the Avro protocol (Avro RPC); built-in |
| Thrift Source | Supports the Thrift protocol; built-in |
| Exec Source | Runs a UNIX command and reads events from its standard output |
| JMS Source | Reads data from a JMS queue or topic; tested with ActiveMQ |
| Spooling Directory Source | Watches a specified directory for new files and reads events from them |
| Twitter 1% firehose Source | Continuously downloads Twitter data via the streaming API; experimental |
| Netcat Source | Listens on a port and turns each line of text flowing through the port into an event |
| Sequence Generator Source | Generates a sequence of events; useful for testing |
| Syslog Sources | Read syslog data and generate events; both UDP and TCP are supported |
| HTTP Source | Accepts events via HTTP POST (and GET); supports JSON and BLOB representations |
| Legacy Sources | Compatible with sources from the old Flume OG (0.9.x) releases |
| Channel Type | Description |
| --- | --- |
| Memory Channel | Events are stored in memory |
| JDBC Channel | Events are stored in a persistent database; Derby is currently the only database with built-in support |
| File Channel | Events are stored in files on disk |
| Spillable Memory Channel | Events are held in an in-memory queue and spill to disk files when the queue fills up (currently experimental; not recommended for production) |
| Pseudo Transaction Channel | For testing only |
| Custom Channel | A user-supplied channel implementation |
| Sink Type | Description |
| --- | --- |
| Kafka Sink | Writes data to a Kafka topic |
| Hive Sink | Writes data to a Hive table or partition |
| HDFS Sink | Writes data to HDFS |
| Logger Sink | Writes events to the log |
| Avro Sink | Converts events to Avro and sends them to the configured RPC host/port |
| Thrift Sink | Converts events to Thrift and sends them to the configured RPC host/port |
| IRC Sink | Relays data to an IRC channel |
| File Roll Sink | Stores data on the local file system |
| Null Sink | Discards all data |
| HBase Sink | Writes data to HBase |
| Morphline Solr Sink | Sends data to a Solr search server (or cluster) |
| ElasticSearch Sink | Sends data to an Elasticsearch server (or cluster) |
| Kite Dataset Sink | Writes data to a Kite dataset; experimental |
| Custom Sink | A user-supplied sink implementation |
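As a concrete example of one of these components, the HTTP source (with its default JSON handler) accepts a JSON array of events, each carrying a `headers` map and a `body` string. The following sketch builds such a payload; the port number and header values are illustrative assumptions, not part of any Flume default.

```python
import json

def make_flume_events(bodies, headers=None):
    """Build the JSON payload Flume's HTTP source JSON handler expects:
    a list of events, each with a 'headers' map and a 'body' string."""
    headers = headers or {}
    return json.dumps([{"headers": headers, "body": b} for b in bodies])

payload = make_flume_events(["line one", "line two"], {"host": "web01"})

# Delivery sketch (assumes an HTTP source listening on localhost:5140):
# import urllib.request
# req = urllib.request.Request("http://localhost:5140",
#                              data=payload.encode(),
#                              headers={"Content-Type": "application/json"})
# urllib.request.urlopen(req)
```

The POST itself is left commented out since it requires a running agent; the payload-building part stands alone.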
Flume Installation
This article first describes how to install Flume and set up its configuration file, and then how to collect logs programmatically.
First, install a JDK: Flume 1.6.0 requires JVM 1.6 or later. Then go to the Flume website, download the binary tarball, and unpack it.
Second, enter the conf directory and copy the template: cp flume-conf.properties.template myflumeconf.properties
Third, edit the configuration to suit your needs; the official documentation describes the many available options.
A configured properties file looks similar to the following:
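A minimal single-agent sketch is shown below; the agent name (agent1), the netcat source on port 44444, and the logger sink are illustrative choices, not requirements.

```properties
# agent1: netcat source -> memory channel -> logger sink (illustrative)
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

# Source: listen on a TCP port and turn each line into an event
agent1.sources.src1.type = netcat
agent1.sources.src1.bind = 0.0.0.0
agent1.sources.src1.port = 44444
agent1.sources.src1.channels = ch1

# Channel: hold events in memory
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 1000

# Sink: log each event
agent1.sinks.sink1.type = logger
agent1.sinks.sink1.channel = ch1
```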
Fourth, start Flume. From the Flume directory, run:
bin/flume-ng agent --conf conf --conf-file conf/trafficxxx.properties --name agent1 -Dflume.root.logger=INFO,console
Here --conf specifies the configuration directory, --conf-file specifies the configuration file, --name specifies which agent defined in that file to start (a single configuration file can define multiple agents), and -Dflume.root.logger sets the level and destination of Flume's runtime log output.