Flume is a highly available, highly reliable, distributed system for collecting, aggregating, and transporting large volumes of log data, provided by Cloudera. Flume lets you customize the data senders in a logging system to collect data; it can also perform simple processing on the data before writing it to a variety of (customizable) data receivers.
Using Flume to move data from Kafka to HDFS
The configuration file is as follows:
flumetohdfs_agent.sources = source_from_kafka
flumetohdfs_agent.channels = mem_channel
flumetohdfs_agent.sinks = hdfs_sink
#auto.commit.enable = true

## Kerberos config ##
#flumetohdfs_agent.sinks.hdfs_sink.hdfs.kerberosPrincipal = flume/datanode2.hdfs.alpha.com@OMGHADOOP.COM
#flumetohdfs_agent.sinks.hdfs_sink.hdfs.kerberosKeytab = /root/apache-flume-1.6.0-bin/conf/flume.keytab

# For each one of the sources, the type is defined
flumetohdfs_agent.sources.source_from_kafka.type = org.apache.flume.source.kafka.KafkaSource
flumetohdfs_agent.sources.source_from_kafka.zookeeperConnect = 10.129.142.46:2181,10.166.141.46:2181,10.166.141.47:2181/testkafka
flumetohdfs_agent.sources.source_from_kafka.topic = itil_topic_4097
#flumetohdfs_agent.sources.source_from_kafka.batchSize = 10000
flumetohdfs_agent.sources.source_from_kafka.groupId = flume4097
flumetohdfs_agent.sources.source_from_kafka.channels = mem_channel

# The sink can be defined as follows.
flumetohdfs_agent.sinks.hdfs_sink.type = hdfs
#flumetohdfs_agent.sinks.hdfs_sink.hdfs.filePrefix = %{host}
flumetohdfs_agent.sinks.hdfs_sink.hdfs.path = hdfs://10.49.133.77:9000/data/4097/ds=%y%m%d

## roll every hour (after gz)
flumetohdfs_agent.sinks.hdfs_sink.hdfs.rollSize = 0
flumetohdfs_agent.sinks.hdfs_sink.hdfs.rollCount = 0
flumetohdfs_agent.sinks.hdfs_sink.hdfs.rollInterval = 3600
#flumetohdfs_agent.sinks.hdfs_sink.hdfs.threadsPoolSize =
#flumetohdfs_agent.sinks.hdfs_sink.hdfs.codeC = gzip
#flumetohdfs_agent.sinks.hdfs_sink.hdfs.fileType = CompressedStream
flumetohdfs_agent.sinks.hdfs_sink.hdfs.fileType = DataStream
flumetohdfs_agent.sinks.hdfs_sink.hdfs.writeFormat = Text

# Specify the channel the sink should use
flumetohdfs_agent.sinks.hdfs_sink.channel = mem_channel

# Each channel's type is defined.
flumetohdfs_agent.channels.mem_channel.type = memory

# Other config values specific to each type of channel (sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
flumetohdfs_agent.channels.mem_channel.capacity = 100000
flumetohdfs_agent.channels.mem_channel.transactionCapacity = 10000
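Before starting the agent, it is worth confirming that the topic exists under the ZooKeeper chroot the source points at and that messages are actually flowing. A minimal check, assuming an older Kafka release (the 0.8.x line that the Flume 1.6 Kafka source targets) installed locally, with the ZooKeeper address and topic taken from the configuration above:

# List the topics registered under the /testkafka chroot
./bin/kafka-topics.sh --zookeeper 10.129.142.46:2181/testkafka --list

# Consume a few messages to confirm data is arriving
./bin/kafka-console-consumer.sh --zookeeper 10.129.142.46:2181/testkafka --topic itil_topic_4097 --from-beginning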
Start the agent:
./flume-ng agent --conf ../conf/ -n flumetohdfs_agent -f ../conf/flume-conf-4097.properties
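If you want the agent to keep running after the shell exits, a common pattern is to launch it with nohup and then check that files are appearing under the sink's hdfs.path. This is a sketch, assuming a Hadoop client is available on the same machine:

# Run the agent in the background and capture its console output
nohup ./flume-ng agent --conf ../conf/ -n flumetohdfs_agent -f ../conf/flume-conf-4097.properties > flume-4097.log 2>&1 &

# List the partition directories to confirm files are being written
hdfs dfs -ls hdfs://10.49.133.77:9000/data/4097/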
The agent name (-n flumetohdfs_agent) must match the agent name used in the configuration file. By default the HDFS sink writes SequenceFile output, which cannot be opened and browsed directly; you can set the output format to plain text instead:
flumetohdfs_agent.sinks.hdfs_sink.hdfs.fileType = DataStream
flumetohdfs_agent.sinks.hdfs_sink.hdfs.writeFormat = Text
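With DataStream/Text output the files are plain text, so they can be inspected directly. For example (ds=160801 is a hypothetical partition date in the %y%m%d format used by hdfs.path above):

# Print the first lines of the files in one daily partition
hdfs dfs -cat 'hdfs://10.49.133.77:9000/data/4097/ds=160801/*' | head -n 20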
You can also enable compressed output:
flumetohdfs_agent.sinks.hdfs_sink.hdfs.codeC = gzip
flumetohdfs_agent.sinks.hdfs_sink.hdfs.fileType = CompressedStream
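Compressed output gets a .gz suffix and is no longer readable with a plain cat, but hdfs dfs -text decompresses known codecs on the fly (again using a hypothetical ds=160801 partition):

# Decompress and print the first lines of a compressed partition
hdfs dfs -text 'hdfs://10.49.133.77:9000/data/4097/ds=160801/*.gz' | head -n 20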
See the Flume User Guide for more information: http://flume.apache.org/FlumeUserGuide.html
For moving data from Kafka to Hive, see: http://geek.csdn.net/news/detail/97941