First, what is Flume?
Flume is a real-time log collection system developed by Cloudera that has been widely recognized and adopted in industry. Its initial releases are now collectively known as Flume OG (Original Generation) and belonged to Cloudera. As Flume's functionality grew, however, the shortcomings of Flume OG became apparent: bloated code, unreasonable core component design, and non-standard core configuration. In the final OG release, 0.94.0, unstable log transmission was especially serious. To solve these problems, on October 22, 2011 Cloudera completed Flume-728 and made a milestone change to Flume, refactoring the core components, core configuration, and code architecture. The refactored version is collectively known as Flume NG (Next Generation). Another reason for the change is that Flume was accepted into Apache, and Cloudera Flume was renamed Apache Flume.
Features of Flume:
Flume is a distributed, reliable, and highly available system for collecting, aggregating, and transmitting large volumes of log data. It supports customizing various data senders in the log system for data collection, and it also provides the ability to do simple processing on the data and write it to various data recipients (such as text, HDFS, HBase, Kafka, etc.).
Flume data flows are always carried by events. An event is Flume's basic unit of data: it carries log data (in the form of a byte array) along with header information. Events are generated by sources outside the agent; when a source captures an event it formats it and then pushes the event into one or more channels. You can think of a channel as a buffer that holds the event until a sink has finished processing it. The sink is responsible for persisting the log or pushing the event on to another source.
Reliability of Flume
When a node fails, logs can be transmitted to other nodes without loss. Flume provides three levels of reliability guarantees, from strong to weak: end-to-end (the receiving agent first writes the event to disk and deletes it only after the data has been transferred successfully; if sending fails, the data can be resent), store on failure (the policy also adopted by Scribe: when the data receiver crashes, data is written locally and sending continues after the receiver recovers), and best effort (data is sent to the receiver without any acknowledgment).
Flume Scalability
Flume employs a three-tier architecture of agent, collector, and storage, each of which can be scaled horizontally. All agents and collectors are managed by a master, which makes the system easy to monitor and maintain. Multiple masters are allowed (management and load balancing use ZooKeeper), which avoids a single point of failure.
Flume Manageability
All agents and collectors are managed centrally by the master, which makes the system easy to maintain. In the multi-master case, Flume uses ZooKeeper and gossip to ensure the consistency of dynamic configuration data. Users can view individual data sources or data flow execution on the master, and individual data sources can be configured and loaded dynamically. Flume provides both a web interface and a shell script command for managing data flows.
Flume Extensibility
Users can add their own agents, collectors, or storage as needed. In addition, Flume ships with a number of components, including various agents (file, syslog, etc.), collectors, and storage (file, HDFS, etc.).
Recoverability of Flume:
Recoverability also relies on the channel. FileChannel is recommended: events are persisted in the local file system (at the cost of performance).
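As a minimal sketch (the directory paths here are illustrative, not from this article), a durable file channel can be declared in an agent configuration in place of a memory channel:

# use a file channel so events survive an agent restart
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /opt/flume/checkpoint
a1.channels.c1.dataDirs = /opt/flume/data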
Some core concepts of Flume:
Agent: runs Flume in a JVM. Each machine runs one agent, but a single agent can contain multiple sources and sinks.
Client: produces the data; runs in a separate thread.
Source: collects data from the client and passes it to the channel.
Sink: collects data from the channel and runs in a separate thread.
Channel: connects sources and sinks; it works somewhat like a queue.
Event: can be a log record, an Avro object, and so on.
The agent is the smallest independent operating unit of Flume. An agent is a JVM. A single agent consists of three components: source, sink, and channel, as shown below:
[Figure: a Flume agent composed of source, channel, and sink]
Flume's three components in detail
The core of Flume is the agent. An agent is a Java process that runs on the log collection side; it receives logs, temporarily stores them, and then sends them on to the destination.
The agent consists of three core components: source, channel, and sink.
Source component: dedicated to collecting logs; it can handle log data of various types and formats, including avro, thrift, exec, JMS, spooling directory, netcat, sequence generator, syslog, HTTP, legacy, and custom sources. The source collects the data and temporarily stores it in the channel.
Channel component: used inside the agent to temporarily store data; it can be memory, JDBC, file, or a custom implementation. Data in the channel is deleted only after the sink has sent it successfully.
Sink component: used to send data to a destination, including HDFS, logger, avro, thrift, IPC, file, null, HBase, Solr, Kafka, and so on.
It is important to note that Flume provides a large number of built-in source, channel, and sink types, and different types of sources, channels, and sinks can be freely combined. The combination is driven by a user-supplied configuration file and is very flexible. For example, a channel can keep events in memory or persist them to the local hard disk, and a sink can write logs to HDFS or HBase, or even to another source. Flume also lets users build multi-level flows, meaning multiple agents can work together, with support for fan-in, fan-out, contextual routing, and backup routes; this is where Flume really shines. As shown below (a configuration sketch follows the figure):
[Figure: a multi-agent flow with fan-in and fan-out]
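As a minimal two-tier sketch (the host name, port, file paths, and agent names are illustrative, not taken from this article), a front-end agent can forward events through an Avro sink to a collector agent whose Avro source listens on the same port; chaining agents this way is how multi-level flows and fan-in are built:

# agent "front" on a web server: tail a log and forward it over Avro
front.sources = s1
front.channels = c1
front.sinks = k1
front.sources.s1.type = exec
front.sources.s1.command = tail -F /var/log/app.log
front.sources.s1.channels = c1
front.channels.c1.type = memory
front.sinks.k1.type = avro
front.sinks.k1.hostname = collector01
front.sinks.k1.port = 4545
front.sinks.k1.channel = c1

# agent "coll" on the collector host: receive over Avro and write to HDFS
coll.sources = s1
coll.channels = c1
coll.sinks = k1
coll.sources.s1.type = avro
coll.sources.s1.bind = 0.0.0.0
coll.sources.s1.port = 4545
coll.sources.s1.channels = c1
coll.channels.c1.type = memory
coll.sinks.k1.type = hdfs
coll.sinks.k1.hdfs.path = /flume/events
coll.sinks.k1.channel = c1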
Second, installation and configuration
1, website address: http://flume.apache.org/
2, Flume installation and configuration
A, configure the Java environment variables first
tar xvf /soft/jdk-7u79-linux-x64.tar.gz -C /soft
vim /etc/profile
#java
export JAVA_HOME=/soft/jdk1.7.0_79/
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
source /etc/profile
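An optional quick check (not part of the original steps) that the JDK is picked up after sourcing the profile:

# should print the installed JDK version
java -version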
B, configure Flume
tar xvf apache-flume-1.6.0-bin.tar.gz -C /usr/local/elk/
mv apache-flume-1.6.0 /usr/local/elk/apache-flume
cd /usr/local/elk/apache-flume/conf
cp flume-env.sh.template flume-env.sh
vi flume-env.sh
JAVA_HOME=/soft/jdk1.8.0_101
C, verify that the installation is successful
/usr/local/elk/apache-flume/bin/flume-ng version
Flume 1.6.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 8633220df808c4cd0c13d1cf0320454a94f1ea97
Compiled by hshreedharan on Wed 7 14:49:18 PDT 2014
From source with checksum a01fe726e4380ba0c9f7a7d222db961f
This output indicates a successful installation.
Third, Flume examples
1) Case 1: Avro
The cases here are named after the type of source they use.
Avro can send a given file to Flume; the Avro source receives it using the Avro RPC mechanism.
A) Creating an agent configuration file
cd /usr/local/elk/apache-flume/conf
vim avro.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141

# Describe the sink (logger outputs the collected logs to the console)
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
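A minimal way to try this out (a sketch assuming the install path used above; the test file /tmp/log.00 is purely illustrative): start the agent with the configuration, then use the built-in Avro client to send a file to port 4141 and watch the events appear on the agent's console.

# Start agent a1 with the avro.conf above
/usr/local/elk/apache-flume/bin/flume-ng agent -c /usr/local/elk/apache-flume/conf -f /usr/local/elk/apache-flume/conf/avro.conf -n a1 -Dflume.root.logger=INFO,console

# In another shell, create a test file and send it through the Avro client
echo "hello flume" > /tmp/log.00
/usr/local/elk/apache-flume/bin/flume-ng avro-client -c /usr/local/elk/apache-flume/conf -H localhost -p 4141 -F /tmp/log.00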
2) Case 2: Exec
Exec can monitor a file in real time, for example using tail -f /opt/logs/usercenter.log.
A) Creating an agent configuration file
vim exec.conf

a2.sources = r2
a2.sinks = k2
a2.channels = c2

# Describe/configure the source
a2.sources.r2.type = exec
a2.sources.r2.channels = c2
a2.sources.r2.command = tail -f /opt/logs/usercenter.log

# Describe the sink (the collected logs are written to this directory)
a2.sinks.k2.type = file_roll
a2.sinks.k2.channel = c2
a2.sinks.k2.sink.directory = /opt/flume

# Use a channel which buffers events in memory
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2
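As with the first case, a sketch of how to run it (paths assume the install above; the monitored log file must exist and /opt/flume must be writable): start agent a2, append a line to the monitored file, and the file_roll sink writes the collected events into /opt/flume.

# Start agent a2 with the exec.conf above
/usr/local/elk/apache-flume/bin/flume-ng agent -c /usr/local/elk/apache-flume/conf -f /usr/local/elk/apache-flume/conf/exec.conf -n a2 -Dflume.root.logger=INFO,console

# In another shell, append to the monitored file and check the output directory
echo "test line" >> /opt/logs/usercenter.log
ls -l /opt/flume/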
For more examples, see:
Http://www.aboutyun.com/thread-8917-1-1.html