Using Flume to Move Data from Kafka to HDFS

Source: Internet
Author: User

Flume is a highly available, highly reliable, distributed system for collecting, aggregating, and transporting large volumes of log data. It originated at Cloudera. Flume supports customizable data senders for collecting data from a logging system, can apply simple processing to the data in flight, and can write it to a variety of customizable data receivers.



Using Flume to move data from Kafka to HDFS

The configuration file is as follows:

flumetohdfs_agent.sources = source_from_kafka
flumetohdfs_agent.channels = mem_channel
flumetohdfs_agent.sinks = hdfs_sink
#auto.commit.enable = true

## kerberos config ##
#flumetohdfs_agent.sinks.hdfs_sink.hdfs.kerberosPrincipal = flume/datanode2.hdfs.alpha.com@OMGHADOOP.COM
#flumetohdfs_agent.sinks.hdfs_sink.hdfs.kerberosKeytab = /root/apache-flume-1.6.0-bin/conf/flume.keytab

# For each one of the sources, the type is defined
flumetohdfs_agent.sources.source_from_kafka.type = org.apache.flume.source.kafka.KafkaSource
flumetohdfs_agent.sources.source_from_kafka.zookeeperConnect = 10.129.142.46:2181,10.166.141.46:2181,10.166.141.47:2181/testkafka
flumetohdfs_agent.sources.source_from_kafka.topic = itil_topic_4097
#flumetohdfs_agent.sources.source_from_kafka.batchSize = 10000
flumetohdfs_agent.sources.source_from_kafka.groupId = flume4097
flumetohdfs_agent.sources.source_from_kafka.channels = mem_channel

# The channel can be defined as follows.
flumetohdfs_agent.sinks.hdfs_sink.type = hdfs
#flumetohdfs_agent.sinks.hdfs_sink.filePrefix = %{host}
flumetohdfs_agent.sinks.hdfs_sink.hdfs.path = hdfs://10.49.133.77:9000/data/4097/ds=%y%m%d

## every hour (after gz)
flumetohdfs_agent.sinks.hdfs_sink.hdfs.rollSize = 0
flumetohdfs_agent.sinks.hdfs_sink.hdfs.rollCount = 0
flumetohdfs_agent.sinks.hdfs_sink.hdfs.rollInterval = 3600
flumetohdfs_agent.sinks.hdfs_sink.hdfs.threadsPoolSize =

#flumetohdfs_agent.sinks.hdfs_sink.hdfs.codeC = gzip
#flumetohdfs_agent.sinks.hdfs_sink.hdfs.fileType = CompressedStream
flumetohdfs_agent.sinks.hdfs_sink.hdfs.fileType = DataStream
flumetohdfs_agent.sinks.hdfs_sink.hdfs.writeFormat = Text

# Specify the channel the sink should use
flumetohdfs_agent.sinks.hdfs_sink.channel = mem_channel

# The channel's type is defined.
flumetohdfs_agent.channels.mem_channel.type = memory

# Other config values specific to each type of channel (sink or source)
# can be defined as well.
# In this case, it specifies the capacity of the memory channel.
flumetohdfs_agent.channels.mem_channel.capacity = 100000
flumetohdfs_agent.channels.mem_channel.transactionCapacity = 10000
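
The configuration wires one Kafka source to one memory channel and one HDFS sink, so a mistyped channel name is an easy mistake to make. The following is a minimal sketch of a pre-flight check, not part of Flume itself: parse_properties, check_wiring, and SAMPLE_CONFIG are hypothetical names introduced here, and the sample text is a trimmed copy of the wiring lines only. It assumes simple "key = value" lines and that a memory channel's transactionCapacity should not exceed its capacity.

```python
# Hypothetical helper, not shipped with Flume: verifies that sources and
# sinks only reference channels that the agent declares.

SAMPLE_CONFIG = """
flumetohdfs_agent.sources = source_from_kafka
flumetohdfs_agent.channels = mem_channel
flumetohdfs_agent.sinks = hdfs_sink
flumetohdfs_agent.sources.source_from_kafka.channels = mem_channel
flumetohdfs_agent.sinks.hdfs_sink.channel = mem_channel
flumetohdfs_agent.channels.mem_channel.type = memory
flumetohdfs_agent.channels.mem_channel.capacity = 100000
flumetohdfs_agent.channels.mem_channel.transactionCapacity = 10000
"""


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    prefix = agent + "."
    channels = props.get(prefix + "channels", "").split()

    for source in props.get(prefix + "sources", "").split():
        # A source may feed several channels (space-separated list).
        used = props.get(f"{prefix}sources.{source}.channels", "").split()
        for ch in used:
            if ch not in channels:
                problems.append(f"source {source} uses undeclared channel {ch}")

    for sink in props.get(prefix + "sinks", "").split():
        # A sink reads from exactly one channel.
        ch = props.get(f"{prefix}sinks.{sink}.channel", "")
        if ch not in channels:
            problems.append(f"sink {sink} uses undeclared channel {ch}")

    for ch in channels:
        capacity = props.get(f"{prefix}channels.{ch}.capacity")
        tx = props.get(f"{prefix}channels.{ch}.transactionCapacity")
        # Only compare when both values are set explicitly.
        if capacity and tx and int(tx) > int(capacity):
            problems.append(f"channel {ch}: transactionCapacity exceeds capacity")

    return problems


if __name__ == "__main__":
    found = check_wiring(parse_properties(SAMPLE_CONFIG), "flumetohdfs_agent")
    print(found if found else "wiring looks consistent")
```

Running the same check against the real properties file before starting the agent can catch a misspelled channel name early, although it does not validate any other setting.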

Start Agent:

./flume-ng agent --conf ./conf/ -n flumetohdfs_agent -f ./conf/flume-conf-4097.properties
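
Because the value passed to -n has to be the same name that prefixes the keys in the properties file, it can help to compare the two before launching. The sketch below is one way to do that under some assumptions: agent_names and name_from_command are hypothetical helpers, and COMMAND and CONFIG are sample strings standing in for the real command line and file contents.

```python
import shlex

# Sample inputs; in practice these would come from the launch script and
# from the properties file on disk.
COMMAND = (
    "./flume-ng agent --conf ./conf/ -n flumetohdfs_agent "
    "-f ./conf/flume-conf-4097.properties"
)
CONFIG = """
flumetohdfs_agent.sources = source_from_kafka
flumetohdfs_agent.channels = mem_channel
flumetohdfs_agent.sinks = hdfs_sink
"""


def agent_names(config_text):
    """Collect the first dotted segment of every property key."""
    names = set()
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        names.add(key.split(".", 1)[0])
    return names


def name_from_command(command):
    """Return the value following -n or --name, or None if absent."""
    args = shlex.split(command)
    for flag in ("-n", "--name"):
        if flag in args and args.index(flag) + 1 < len(args):
            return args[args.index(flag) + 1]
    return None


if __name__ == "__main__":
    name = name_from_command(COMMAND)
    if name in agent_names(CONFIG):
        print(f"agent name {name} matches the configuration")
    else:
        print(f"agent name {name} not found in the configuration")
```

The comparison is case-sensitive on purpose, since property keys are case-sensitive as well.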

The agent name (-n flumetohdfs_agent) must match the name used in the configuration file. The default HDFS output file format is SequenceFile, which cannot be opened and browsed directly. To write plain text instead, set the output format as follows:

flumetohdfs_agent.sinks.hdfs_sink.hdfs.fileType = DataStream
flumetohdfs_agent.sinks.hdfs_sink.hdfs.writeFormat = Text

You can also enable compressed output:

flumetohdfs_agent.sinks.hdfs_sink.hdfs.codeC = gzip
flumetohdfs_agent.sinks.hdfs_sink.hdfs.fileType = CompressedStream
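
With the gzip codec, the sink's output should be readable as an ordinary gzip stream once a file has been copied out of HDFS. The following is a minimal sketch for inspecting such a file locally. It assumes the file is already on the local disk, that events are newline-separated UTF-8 text, and that read_events is a hypothetical helper name; the path in the demo is a temporary file created only for illustration.

```python
import gzip
import os
import tempfile


def read_events(path):
    """Yield one decoded line per event from a gzip-compressed text file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Build a small stand-in file; a real one would be copied from HDFS.
    with tempfile.TemporaryDirectory() as workdir:
        sample = os.path.join(workdir, "sample.gz")
        with gzip.open(sample, "wt", encoding="utf-8") as out:
            out.write("first event\nsecond event\n")
        for event in read_events(sample):
            print(event)
```

Reading lazily with a generator keeps memory use flat even when an hourly file is large.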


See the Flume User Guide for more information: http://flume.apache.org/FlumeUserGuide.html

From Kafka to Hive: http://geek.csdn.net/news/detail/97941

