http://blog.csdn.net/hijk139/article/details/8308224
A business system needed to collect monitoring logs, which brought Hadoop's Flume to mind. Testing showed that, while its feature set is not especially rich, it basically meets the requirements. Flume is a distributed, reliable, and highly available log collection service that can gather logs and deliver them to stores such as HDFS and Hive for later analysis; a more detailed description is available on the Apache website. A simple installation and configuration procedure follows.
1. Download the Flume NG package and deploy it both on the servers that collect logs and on the server that receives them; each server needs JDK 1.6 or above.
http://flume.apache.org/download.html
tar -zxvf apache-flume-1.3.0-bin.tar.gz
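Since the install is just an unpacked directory plus a JDK, a quick sanity check of the Java version can save debugging later. This is a minimal sketch; the hard-coded version string is a stand-in for the real output of `java -version`:

```shell
# Sketch of the "JDK 1.6 or above" check; "1.7.0_80" stands in for what
# `java -version 2>&1 | awk -F'"' '/version/{print $2}'` would return.
ver="1.7.0_80"
major=${ver%%.*}                    # "1"
rest=${ver#*.}; minor=${rest%%.*}   # "7"
if [ "$major" -gt 1 ] || [ "$minor" -ge 6 ]; then
  echo "JDK OK: $ver"
else
  echo "JDK too old: $ver"
fi
```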
2. On the receiving end, configure conf/flume-conf.properties on the server as follows. The agent receives data from an Avro source and writes it to the HDFS file system.
$ cat flume-conf.properties
agent.sources = avrosrc
agent.channels = memoryChanne3
agent.sinks = hdfsSink
# For each one of the sources, the type is defined
agent.sources.avrosrc.type = avro
agent.sources.avrosrc.bind = 172.16.251.1
agent.sources.avrosrc.port = 44444
# The channel can be defined as follows.
agent.sources.avrosrc.channels = memoryChanne3
# Each channel's type is defined.
agent.channels.memoryChanne3.type = memory
agent.channels.memoryChanne3.keep-alive = 10
agent.channels.memoryChanne3.capacity = 100000
agent.channels.memoryChanne3.transactionCapacity = 100000
# Each sink's type must be defined
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = memoryChanne3
agent.sinks.hdfsSink.hdfs.path = /logdata/%{hostname}_linux/%y%m%d_date
agent.sinks.hdfsSink.hdfs.filePrefix = %{datacenter}_
agent.sinks.hdfsSink.hdfs.rollInterval = 0
agent.sinks.hdfsSink.hdfs.rollSize = 4000000
agent.sinks.hdfsSink.hdfs.rollCount = 0
agent.sinks.hdfsSink.hdfs.writeFormat = Text
agent.sinks.hdfsSink.hdfs.fileType = DataStream
agent.sinks.hdfsSink.hdfs.batchSize = 10
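With hdfs.rollInterval and hdfs.rollCount both set to 0, files are rolled purely by size, at roughly 4 MB each. As an illustrative alternative (these values are not from the original setup), rolling could be switched to a time basis instead:

```properties
# Illustrative alternative: roll every 10 minutes rather than by size.
# rollInterval is in seconds; setting a roll criterion to 0 disables it.
agent.sinks.hdfsSink.hdfs.rollInterval = 600
agent.sinks.hdfsSink.hdfs.rollSize = 0
agent.sinks.hdfsSink.hdfs.rollCount = 0
```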
Note: if Flume and Hadoop run as different users, watch out for HDFS permission issues on the target path.
3. On the log collection end, configure conf/flume-conf.properties on the server as follows; here two log files are collected and forwarded to the receiving end.
agent.sources = tailsource-1 tailsource-2
agent.channels = memoryChannel-1 memoryChannel-2
agent.sinks = remotesink remotesink-2
agent.sources.tailsource-1.type = exec
agent.sources.tailsource-1.command = tail -f /tmp/linux2.log
agent.sources.tailsource-1.channels = memoryChannel-1
agent.sources.tailsource-2.type = exec
agent.sources.tailsource-2.command = tail -f /tmp/linux2_2.log
agent.sources.tailsource-2.channels = memoryChannel-2
agent.sources.tailsource-1.interceptors = host_int timestamp_int inter1
agent.sources.tailsource-1.interceptors.host_int.type = host
agent.sources.tailsource-1.interceptors.host_int.hostHeader = hostname
agent.sources.tailsource-1.interceptors.timestamp_int.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
#agent.sources.tailsource-1.interceptors = inter1
agent.sources.tailsource-1.interceptors.inter1.type = static
agent.sources.tailsource-1.interceptors.inter1.key = datacenter
agent.sources.tailsource-1.interceptors.inter1.value = Beijing
agent.sources.tailsource-2.interceptors = host_int timestamp_int inter1
agent.sources.tailsource-2.interceptors.host_int.type = host
agent.sources.tailsource-2.interceptors.host_int.hostHeader = hostname
agent.sources.tailsource-2.interceptors.timestamp_int.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
#agent.sources.tailsource-1.interceptors = inter1
agent.sources.tailsource-2.interceptors.inter1.type = static
agent.sources.tailsource-2.interceptors.inter1.key = datacenter
agent.sources.tailsource-2.interceptors.inter1.value = linux2_2
agent.channels.memoryChannel-1.type = memory
agent.channels.memoryChannel-1.keep-alive = 10
agent.channels.memoryChannel-1.capacity = 100000
agent.channels.memoryChannel-1.transactionCapacity = 100000
agent.channels.memoryChannel-2.type = memory
agent.channels.memoryChannel-2.keep-alive = 10
agent.channels.memoryChannel-2.capacity = 100000
agent.channels.memoryChannel-2.transactionCapacity = 100000
agent.sinks.remotesink.type = avro
agent.sinks.remotesink.hostname = 172.16.251.1
agent.sinks.remotesink.port = 44444
agent.sinks.remotesink.channel = memoryChannel-1
agent.sinks.remotesink-2.type = avro
agent.sinks.remotesink-2.hostname = 172.16.251.1
agent.sinks.remotesink-2.port = 44444
agent.sinks.remotesink-2.channel = memoryChannel-2
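The receiver's hdfs.path and hdfs.filePrefix refer to %{hostname} and %{datacenter}; those event headers are filled by the host and static interceptors configured above, while the date escapes are expanded from the timestamp header. Assuming a collector host named `linux2` (a hypothetical name) and an event from December 15, 2012, an output file would land under a path shaped roughly like:

```
# Hypothetical expansion of the receiver's hdfs.path and filePrefix
# for an event from host "linux2" with datacenter=Beijing:
#   /logdata/linux2_linux/121215_date/Beijing_<counter>.tmp
# (the .tmp suffix is dropped when the HDFS sink rolls the file)
```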
4. Run the agent in the background:
nohup bin/flume-ng agent -n agent -c conf -f conf/flume-conf.properties > 1.log &
View the log: vi flume.log
Check the port status with netstat -an | grep 44444:
$ netstat -an | grep 44444
tcp        0      0 ::ffff:172.16.251.1:44444        :::*        LISTEN
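The listening check can also be scripted; the sample line below is canned netstat output rather than a live capture, since the assertion only depends on the port and state:

```shell
# Hedged sketch: confirm the avro source's port shows up in LISTEN state.
# "sample" stands in for one line of real `netstat -an` output.
sample="tcp        0      0 ::ffff:172.16.251.1:44444        :::*        LISTEN"
if echo "$sample" | grep -q "44444.*LISTEN"; then
  status="listening"
else
  status="not listening"
fi
echo "$status"
```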
5. Test method
A script like the following can periodically append to a log file for testing:
for i in {1..1000000}; do echo "LINUX2 press ************* flume log rotation $i" >> /tmp/linux3.log; sleep 0.0001; done
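A bounded variant of the loop above makes a quick smoke test, since the line count written can be verified afterwards (the scratch path and the count of 100 are arbitrary choices, not from the original setup):

```shell
# Write 100 numbered lines to a scratch log, then count them back.
LOG=$(mktemp)                       # scratch file instead of /tmp/linux3.log
for i in $(seq 1 100); do
  echo "flume smoke test line $i" >> "$LOG"
done
lines=$(wc -l < "$LOG")             # expect 100
echo "$lines"
rm -f "$LOG"
```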
References:
http://flume.apache.org/FlumeUserGuide.html
Flume NG 1.3 installation (RPM)