http://blog.csdn.net/alphags/article/details/52862578?locationNum=10&fps=1
This article is mainly based on the Apache Flume user guide (http://flume.apache.org/FlumeUserGuide.html). Since there are few Chinese resources on Apache Flume 1.x, I am documenting my deployment process here, hoping to give some hints to people with the same needs.
(The documentation covers a great deal; here I only write up the parts I actually used.)

Overview
Apache Flume is an efficient distributed log collection system that can gather large amounts of log data from different sources into a central place. (PS: for this article, that is all you need to know.)

System requirements
1. JDK 1.7 or later
2. Sufficient memory
3. Sufficient disk space
4. Read and write permissions on the directories the agents use

Data flow model
As you can see from the diagram, each agent contains a source, a channel, and a sink.
A source can be understood as where the data comes from (a log file, Avro, and many more; see the documentation for the full list. I only read from a log file here).
A sink can be understood as where the data goes (again there are many kinds; in my test environment it writes directly to files).
A channel can be understood as the pipeline the data flows through (again many kinds; the documentation examples use the memory channel, but memory is not durable, so my test environment uses the file channel instead).
Put simply (and not rigorously): the source reads log data and writes it into the channel, and the sink reads data from the channel and writes it to its designated destination. If the sink fails to write, the data accumulates in the channel until the sink recovers, which ensures that log data is not lost.
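In configuration terms this wiring is just a few properties. A minimal sketch (the agent and component names here are illustrative, not the configuration used later in this post):

# one source, one channel, one sink, wired together
agent.sources = src
agent.channels = ch
agent.sinks = snk
# a source writes events into one or more channels
agent.sources.src.channels = ch
# a sink drains events from exactly one channel
agent.sinks.snk.channel = ch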
Multiple Apache Flume agents can also be chained together, as shown in the following figure.
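Chaining works by pointing the upstream agent's Avro sink at the host and port where the downstream agent's Avro source listens. A sketch using the same host and port as the configuration later in this post:

# upstream agent: forward events to the collector over Avro
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 192.168.0.101
a1.sinks.k1.port = 4545

# downstream (collector) agent: receive events over Avro
a2.sources.r2.type = avro
a2.sources.r2.bind = 192.168.0.101
a2.sources.r2.port = 4545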
With the above understood, we can start building the test environment.

Hardware environment

Three servers with IP addresses 192.168.0.101 through 192.168.0.103, all running the Ubuntu 12.04 Server operating system.

Architecture

Each of the three servers runs an agent (a1) that tails the local log file and forwards events over Avro to a collector agent (a2) on 192.168.0.101, which merges everything to disk.
Installation process
wget http://mirrors.cnnic.cn/apache/flume/1.7.0/apache-flume-1.7.0-bin.tar.gz   # download the tarball
tar -xvzf apache-flume*.tar.gz   # unpack it
mv apache-flume-1.7.0-bin /data/local/flume   # I prefer to install software under /data/local
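Before writing any configuration, it is worth a quick sanity check that the unpacked distribution runs (executed from the installation directory):

cd /data/local/flume
bin/flume-ng version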
Configuration
Here is the configuration from my own deployment.
# flume.conf: a Flume configuration

# Agent a1
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Source configuration
a1.sources.r1.type = exec
a1.sources.r1.command = tail -f /data/logs/system.log

# Sink configuration
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 192.168.0.101
a1.sinks.k1.port = 4545

# Channel configuration
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /data/logs/channels/a1/checkpoint
a1.channels.c1.dataDirs = /data/logs/channels/a1/data

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

# Agent a2
a2.sources = r2
a2.sinks = k2
a2.channels = c2

# a2 source configuration
a2.sources.r2.type = avro
a2.sources.r2.bind = 192.168.0.101
a2.sources.r2.port = 4545

# a2 sink configuration: write the merged log data to /data/local/collector
a2.sinks.k2.type = file_roll
a2.sinks.k2.sink.directory = /data/local/collector
a2.sinks.k2.sink.rollInterval = 3600

# The commented-out lines below configure an HDFS sink instead, which rolls
# the merged logs into a separate directory per day
#a2.sinks.k2.type = hdfs
#a2.sinks.k2.hdfs.path = hdfs://hadoop-master:9000/events/%y-%m-%d
#a2.sinks.k2.hdfs.filePrefix = events-
#a2.sinks.k2.hdfs.rollInterval = 0
#a2.sinks.k2.hdfs.rollSize = 0
#a2.sinks.k2.hdfs.rollCount = 0
#a2.sinks.k2.hdfs.useLocalTimeStamp = true

# a2 channel configuration
a2.channels.c2.type = file
a2.channels.c2.checkpointDir = /data/logs/channels/a2/checkpoint
a2.channels.c2.dataDirs = /data/logs/channels/a2/data

# Bind the source and sink to the channel
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2
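One preparation step before starting the agents: create the directories the configuration refers to. As far as I know the file channel will create its checkpoint and data directories on demand, but the file_roll sink expects its output directory to exist already, so it is safest to create everything up front:

mkdir -p /data/local/collector
mkdir -p /data/logs/channels/a1/checkpoint /data/logs/channels/a1/data
mkdir -p /data/logs/channels/a2/checkpoint /data/logs/channels/a2/data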
Run
On 192.168.0.101, start agent a2 (the collector) first with the following command:

bin/flume-ng agent --conf conf --conf-file ./conf/flume.conf --name a2 -Dflume.root.logger=INFO,console

Open another terminal on 192.168.0.101 and run the following command to start agent a1:

bin/flume-ng agent --conf conf --conf-file ./conf/flume.conf --name a1 -Dflume.root.logger=INFO,console

Then start agent a1 on 192.168.0.102 and 192.168.0.103 in the same way:

bin/flume-ng agent --conf conf --conf-file ./conf/flume.conf --name a1 -Dflume.root.logger=INFO,console
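The commands above run each agent in the foreground, which is convenient for watching the console output. If you want an agent to keep running after you log out (not something the original setup covers), one standard approach is nohup:

nohup bin/flume-ng agent --conf conf --conf-file ./conf/flume.conf --name a1 > /dev/null 2>&1 &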
Python script for generating test data
#!/usr/bin/python
import os
import random
from time import sleep

for i in range(1, 1000000):
    smil = random.randint(50, 100)  # delay in milliseconds
    print smil / 1000.0
    com = "echo \"Hello message from 202\t" + str(i) + "\" >> /data/logs/system.log"
    print com
    os.system(com)  # append a numbered test line to the log
    sleep(smil / 1000.0)
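To drive the test, save the script on each of the three servers (for example as gen_log.py, a file name I am choosing here) and leave it running so it keeps appending lines to /data/logs/system.log:

python gen_log.py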
Test results
You can see that multiple rolled log files have been generated in the /data/local/collector directory.
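To inspect the merged output yourself (the file names are generated by the file_roll sink and vary from run to run):

ls -l /data/local/collector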
PS: I suggest reading Flume's documentation. Reading it in English is tiring, but everyone who works in technology understands at least a little English, so with a bit of effort you can get through it.
If the Flume workflow is still unclear after this article, I suggest working through the simple example given in the documentation (briefly explained below).
# example.conf: a single-node Flume configuration

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
The documentation explains: this configuration defines a single agent named a1. a1 has a source that listens for data on port 44444, a channel that buffers event data in memory, and a sink that logs event data to the console. In other words, anything received on port 44444 goes into the memory channel, and the logger sink prints it to the console.
$ bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console
Then, from a different terminal window, telnet to the source (if telnet is missing, install it with apt-get install telnet or yum install telnet):
$ telnet localhost 44444
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
Hello world! <ENTER>
OK
We can then see the Flume console print:
12/06/19 15:32:19 INFO source.NetcatSource: Source starting
12/06/19 15:32:19 INFO source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:44444]
12/06/19 15:32:34 INFO sink.LoggerSink: Event: { headers:{} body: 48 65 6C 6C 6F 20 77 6F 72 6C 64 21 0D          Hello world!. }
Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.