Stand-alone operation
First, environment preparation
Flume 1.6.0
Hadoop 2.6.0
Spark 1.6.0
Java version 1.8.0_73
Kafka 0.9.0.1 (Scala 2.11 build)
Zookeeper 3.4.6
Second, configuration
For the Spark and Hadoop configuration, see ()
Kafka and ZooKeeper use their default configurations.
1. Kafka Configuration
Start the broker:
bin/kafka-server-start.sh config/server.properties
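If you want the broker to survive closing the shell, the start script can also daemonize it (the -daemon flag is available in this Kafka version):
bin/kafka-server-start.sh -daemon config/server.properties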
Create a topic named test (1 partition, replication factor 1):
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
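To confirm the topic was created, you can describe it:
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test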
2. Flume configuration: create a new dh.conf file with the contents below.
The data being shipped is the access log of apache-tomcat-8.0.32.
# NOTE: a few numeric values and the log-file date were garbled in the source;
# typical values and a yyyy-MM-dd placeholder are substituted below.

# define c1
agent1.channels.c1.type = memory
agent1.channels.c1.capacity = 2000000
agent1.channels.c1.transactionCapacity = 100
# define c1 end

# define c2
agent1.channels.c2.type = memory
agent1.channels.c2.capacity = 2000000
agent1.channels.c2.transactionCapacity = 100
# define c2 end

# define source: monitor a file
agent1.sources.avro-s.type = exec
agent1.sources.avro-s.command = tail -f -n +1 /usr/local/hong/apache-tomcat-8.0.32/logs/localhost_access_log.yyyy-MM-dd.txt
agent1.sources.avro-s.channels = c1 c2
agent1.sources.avro-s.threads = 5

# send to Hadoop
agent1.sinks.log-hdfs.channel = c1
agent1.sinks.log-hdfs.type = hdfs
agent1.sinks.log-hdfs.hdfs.path = hdfs://vm:9000/flume
agent1.sinks.log-hdfs.hdfs.writeFormat = Text
agent1.sinks.log-hdfs.hdfs.fileType = DataStream
agent1.sinks.log-hdfs.hdfs.rollInterval = 0
agent1.sinks.log-hdfs.hdfs.rollSize = 1000000
agent1.sinks.log-hdfs.hdfs.rollCount = 0
agent1.sinks.log-hdfs.hdfs.batchSize = 1000
agent1.sinks.log-hdfs.hdfs.txnEventMax = 1000
agent1.sinks.log-hdfs.hdfs.callTimeout = 60000
agent1.sinks.log-hdfs.hdfs.appendTimeout = 60000

# send to Kafka
agent1.sinks.log-sink2.type = org.apache.flume.sink.kafka.KafkaSink
agent1.sinks.log-sink2.topic = test
agent1.sinks.log-sink2.brokerList = vm:9092
agent1.sinks.log-sink2.requiredAcks = 1
agent1.sinks.log-sink2.batchSize = 100
agent1.sinks.log-sink2.channel = c2

# Finally, now that we've defined all the components,
# tell agent1 which ones we want to activate.
agent1.channels = c1 c2
agent1.sources = avro-s
agent1.sinks = log-hdfs log-sink2
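To give the exec source something to tail, you can generate a few requests against Tomcat so the access log grows (this assumes Tomcat is on its default port 8080):
for i in $(seq 1 100); do curl -s http://localhost:8080/ > /dev/null; done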
Third, test the Flume send path
1. Start HDFS
./start-dfs.sh
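You can confirm the HDFS daemons are up with jps (a NameNode and DataNode should be listed) or with a cluster report:
hdfs dfsadmin -report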
2. Start ZooKeeper
./zkServer.sh start
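To verify ZooKeeper is serving, check its status, or send the ruok four-letter command (it should answer imok):
./zkServer.sh status
echo ruok | nc localhost 2181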
3. Start Kafka as described above.
4. Start Flume
flume-ng agent -c conf -f dh.conf -n agent1 -Dflume.root.logger=INFO,console
Here -c points at Flume's configuration directory, -f at the agent definition file, and -n must match the agent name used in dh.conf (agent1).
Fourth, test results
Run the Kafka console consumer to watch the topic:
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
If the access-log lines appear, Kafka and Flume are configured successfully. (Drop --from-beginning to see only newly arriving messages.)
For the HDFS side, browse /flume in HDFS; if the files are there and you can download and view them, the HDFS sink is working.
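The same check can be done from the command line; by default Flume's HDFS sink names its files with the FlumeData prefix:
hdfs dfs -ls /flume
hdfs dfs -cat /flume/FlumeData.*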