A Brief Introduction to Flume
By the time you read this article you probably have a general understanding of Flume, but for the benefit of readers who are just getting started I will still introduce it briefly. To start using Flume you do not need to understand much about its internals; understanding the figure below is enough to use Flume to ship log data into Kafka. The HDFS in the figure is just a representative sink; the sink I actually use is Kafka.
Flume Installation
Flume Environment Preparation
CentOS 6.5
JDK 1.7+
Flume Download and Installation
Flume 1.7 download link
Installing Flume
1. tar -zxvf apache-flume-1.7.0-bin.tar.gz
2. mv apache-flume-1.7.0-bin flume
3. cp conf/flume-conf.properties.template conf/flume-conf.properties # flume-conf.properties configures the source, channel, sink, and related settings
4. cp conf/flume-env.sh.template conf/flume-env.sh # flume-env.sh configures the agent startup options and Java environment variables
Flume Configuration
Configure flume-conf.properties
agent.sources=r1
agent.sinks=k1
agent.channels=c1
agent.sources.r1.type=exec
agent.sources.r1.command=tail -f /data/logs/access.log
agent.sources.r1.restart=true
agent.sources.r1.batchSize=1000
agent.sources.r1.batchTimeout=3000
agent.sources.r1.channels=c1
agent.channels.c1.type=memory
agent.channels.c1.capacity=102400
agent.channels.c1.transactionCapacity=1000
agent.channels.c1.byteCapacity=134217728
agent.channels.c1.byteCapacityBufferPercentage=80
agent.sinks.k1.channel=c1
agent.sinks.k1.type=org.apache.flume.sink.kafka.KafkaSink
agent.sinks.k1.kafka.topic=xxxxx-kafka
agent.sinks.k1.kafka.bootstrap.servers=x.x.x.x:9092,x.x.x.x:9092
agent.sinks.k1.serializer.class=kafka.serializer.StringEncoder
agent.sinks.k1.flumeBatchSize=1000
agent.sinks.k1.useFlumeEventFormat=true
The naming convention is: r1 is the source, k1 is the sink, and c1 is the channel. The agent name (the "agent" prefix on every property) must match the value of the -n parameter passed when you start Flume.
Configure flume-env.sh
export JAVA_HOME=/data/java/jdk1.8.0_102/
I only configure JAVA_HOME here. I have not added any of the JMX options for agent startup; you can add them according to your needs.
Start Flume-agent
./bin/flume-ng agent -c conf -f conf/flume-conf.properties -n agent -Dflume.root.logger=INFO,console
-c: the configuration directory; -f: the Flume configuration file; -n: the agent name; -Dflume.root.logger=INFO,console: print logs at INFO level and above to the console at startup
1. Flume lets you define your own sources and sinks. You can modify the code to suit your needs, or pull the code from GitHub. If you only change the code of one module, you just need to remove the previous jar and drop in the newly compiled one. Other approaches are described in the official documentation.
2. With the memory channel, if the agent is killed, the data in the channel is lost and cannot be recovered.
3. Flume is very flexible for log aggregation and can be combined in many ways. For example, I receive data on a TCP port and forward it to another Flume agent.
4. I recommend reading the official documentation.
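Notes 2 and 3 can be illustrated with a minimal configuration sketch. It is not my production setup. The agent names "edge" and "collector", the ports 44444 and 4545, and the /data/flume directories are all assumptions chosen for illustration. The first agent listens on a TCP port, buffers events in a file channel so that they survive a restart, and forwards them over Avro to the second agent.

```properties
# Hypothetical first agent "edge": receives lines on a TCP port, forwards over Avro
edge.sources=r1
edge.channels=c1
edge.sinks=k1

# netcat source: listens on a TCP port and turns each line of text into an event
edge.sources.r1.type=netcat
edge.sources.r1.bind=0.0.0.0
edge.sources.r1.port=44444
edge.sources.r1.channels=c1

# file channel: events are written to disk, so they are kept if the agent is killed
# (the directories below are assumed paths)
edge.channels.c1.type=file
edge.channels.c1.checkpointDir=/data/flume/checkpoint
edge.channels.c1.dataDirs=/data/flume/data

# avro sink: sends events to the Avro source of the next agent
edge.sinks.k1.type=avro
edge.sinks.k1.hostname=x.x.x.x
edge.sinks.k1.port=4545
edge.sinks.k1.channel=c1

# Hypothetical second agent "collector", started on the receiving host
collector.sources=r1
collector.channels=c1
collector.sinks=k1

# avro source: accepts events sent by the avro sink of "edge"
collector.sources.r1.type=avro
collector.sources.r1.bind=0.0.0.0
collector.sources.r1.port=4545
collector.sources.r1.channels=c1

collector.channels.c1.type=memory

# logger sink is used here only to keep the sketch short;
# in practice this could be a Kafka sink
collector.sinks.k1.type=logger
collector.sinks.k1.channel=c1
```

Each agent is started with its own name as the -n value, for example -n edge on the sending host and -n collector on the receiving host. The file channel trades some throughput for durability, so whether it is worth using depends on how much data loss you can tolerate.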