Flume Introduction and Use (1)



Flume Introduction

Flume is a distributed, reliable, and practical service that efficiently collects, aggregates, and moves massive amounts of data from different data sources.

Distributed: multiple machines can collect data at the same time, and different agents transfer data between one another over the network.

Reliable: Flume caches collected data in a channel, and data is removed from the channel only after the sink confirms that it has been received.

Practical: Flume is very simple to use and easy to extend; thanks to its elegant design, you only need to change options in a configuration file to customize sources, channels, and sinks.

A Flume agent is a JVM process containing three components: source, channel, and sink. A user-defined source caches the data to be collected into the channel. The channel is a queue: the source inserts data into it and the sink takes data out of it. Once the sink confirms that the data has been received by the next-hop agent or by a database, the data is deleted from the channel.
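The same contract is visible from the sink side in Flume's developer API. Below is a minimal sketch of a custom sink (the ConsoleSink class is hypothetical, not one of the built-in types): an event is taken from the channel inside a transaction and only disappears from the channel when the transaction commits; on rollback it stays queued for a retry.

    import org.apache.flume.Channel;
    import org.apache.flume.Event;
    import org.apache.flume.EventDeliveryException;
    import org.apache.flume.Transaction;
    import org.apache.flume.sink.AbstractSink;

    // Hypothetical sink that prints event bodies to standard output.
    public class ConsoleSink extends AbstractSink {
        @Override
        public Status process() throws EventDeliveryException {
            Channel channel = getChannel();
            Transaction tx = channel.getTransaction();
            tx.begin();
            try {
                Event event = channel.take();      // pull one event off the channel queue
                if (event != null) {
                    System.out.println(new String(event.getBody())); // "deliver" it
                }
                tx.commit();                       // receipt confirmed: the event is removed
                return event != null ? Status.READY : Status.BACKOFF;
            } catch (Throwable t) {
                tx.rollback();                     // delivery failed: the event stays in the channel
                return Status.BACKOFF;
            } finally {
                tx.close();
            }
        }
    }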

The most ingenious part of Flume is its modular design. In practice, the data to be collected may come from many different sources: command output on the screen, application logs, network traffic, and so on. Users can customize a different source for each of them, and can likewise tailor channels and sinks to their needs. The following lists cover a subset of the component types supported by Flume 1.6.0; a configuration sketch showing how these types are selected follows the lists.

Source types:

Avro Source: receives data over the Avro protocol (actually Avro RPC); built-in support
Thrift Source: receives data over the Thrift protocol; built-in support
Exec Source: runs a UNIX command and produces data from its standard output
JMS Source: reads data from a JMS system (queue or topic); tested with ActiveMQ
Spooling Directory Source: monitors data changes within a specified directory
Twitter 1% firehose Source: continuously downloads Twitter data via the API; experimental
Netcat Source: listens on a port and turns each text line flowing through the port into an event
Sequence Generator Source: a sequence generator that produces sequence data
Syslog Sources: read syslog data and generate events; both UDP and TCP are supported
HTTP Source: a data source based on HTTP POST or GET; supports JSON and BLOB representations
Legacy Sources: compatible with sources from the old Flume OG (version 0.9.x)

Channel types:

Memory Channel: event data is stored in memory
JDBC Channel: event data is stored in persistent storage; the channel currently has built-in support only for Derby
File Channel: event data is stored in disk files
Spillable Memory Channel: event data is stored in memory and spills over to disk files when the in-memory queue fills up (currently experimental, not recommended for production environments)
Pseudo Transaction Channel: for testing purposes only
Custom Channel: custom channel implementations

Sink types:

Kafka Sink: writes data to a Kafka topic
Hive Sink: writes data to a Hive database or partition
HDFS Sink: writes data to HDFS
Logger Sink: writes data to the log
Avro Sink: converts data into Avro events and sends them to the configured RPC port
Thrift Sink: converts data into Thrift events and sends them to the configured RPC port
IRC Sink: relays data to IRC
File Roll Sink: stores data on the local file system
Null Sink: discards all data
HBase Sink: writes data to an HBase database
Morphline Solr Sink: sends data to a Solr search server (cluster)
ElasticSearch Sink: sends data to an Elasticsearch server (cluster)
Kite Dataset Sink: writes data to a Kite dataset; experimental
Custom Sink: custom sink implementations
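Each component is chosen by setting its type property in the agent's configuration file. As an illustrative sketch (the agent and component names, the log path, and the HDFS URL are assumptions), an agent that tails an application log with an Exec source, buffers events in a File channel, and writes them to HDFS would declare:

    agent1.sources.r1.type = exec
    agent1.sources.r1.command = tail -F /var/log/app.log
    agent1.channels.c1.type = file
    agent1.sinks.k1.type = hdfs
    agent1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events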

Flume Installation

This article describes how to install Flume and how to set up its configuration file; collecting logs programmatically will be covered later.

Flume 1.6.0 requires JVM 1.6 or above. First, install the JDK (which provides the JVM), then download the binary package from the Flume official website and unzip it.
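For example (a sketch assuming the 1.6.0 binary release archive, apache-flume-1.6.0-bin.tar.gz, has already been downloaded):

    # verify that a JVM (1.6 or above) is installed
    java -version
    # unpack the binary release and enter its directory
    tar -xzf apache-flume-1.6.0-bin.tar.gz
    cd apache-flume-1.6.0-bin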

Second, go into the conf directory and copy the template configuration file:

    cp flume-conf.properties.template myflumeconf.properties

Third, modify the configuration according to your own needs; the official website documents the many available configuration options.

A configured properties file looks similar to the following.
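The exact contents depend on your components; as a minimal sketch (the names agent1, r1, c1, k1 and port 44444 are illustrative), this configuration wires a Netcat source through a memory channel to a Logger sink:

    # name the components of this agent
    agent1.sources = r1
    agent1.channels = c1
    agent1.sinks = k1

    # Netcat source: turn every text line arriving on port 44444 into an event
    agent1.sources.r1.type = netcat
    agent1.sources.r1.bind = localhost
    agent1.sources.r1.port = 44444

    # memory channel: buffer up to 1000 events in memory
    agent1.channels.c1.type = memory
    agent1.channels.c1.capacity = 1000

    # Logger sink: write each event to Flume's log
    agent1.sinks.k1.type = logger

    # wire the source and the sink to the channel
    agent1.sources.r1.channels = c1
    agent1.sinks.k1.channel = c1

This is the classic smoke-test setup: every line sent to port 44444 becomes an event and is printed by the Logger sink.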

Fourth, start Flume. Run the following command in the Flume directory:

    bin/flume-ng agent --conf conf --conf-file conf/myflumeconf.properties --name agent1 -Dflume.root.logger=INFO,console

Here --conf specifies the configuration directory, --conf-file specifies the configuration file, --name specifies which agent from the configuration file to start (one configuration file can define more than one agent), and -Dflume.root.logger specifies the level and destination of the logs Flume emits while running.
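With the sample Netcat configuration above, you can check that the agent is working by sending it a few lines from another terminal (this assumes the nc utility is installed):

    nc localhost 44444
    hello flume

Each line you type should appear as an event on the agent's console, since -Dflume.root.logger=INFO,console directs the Logger sink's output there.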
