Acquisition layer: two technologies are mainly used here, Flume and Kafka. Flume: Flume is a pipeline-flow system that provides many default implementations, letting users deploy through configuration parameters and extend through its APIs. Kafka: Kafka is a durable, distributed message queue.
Kafka is a very general-purpose system: you can have many producers and many consumers sharing multiple topics. By contrast, Flume is designed for a specific purpose: sending data into HDFS and HBase.
(Reference: https://www.ibm.com/developerworks/cn/opensource/os-cn-kafka/index.html) Kafka and Flume overlap in many of their functions. Here are some suggestions for evaluating the two systems:
Kafka is a general-purpose system. You can have many producers and many consumers sharing multiple topics. Conversely, Flume is designed to work for a specific purpose: it sends data specifically to HDFS and HBase.
Throughout the entire data transfer process, what flows is the event, and the transaction guarantee is at the event level. Flume supports multi-level agents, as well as fan-in and fan-out.
Second, environment preparation:
1) Hadoop cluster (the author's version is 2.7.3, six nodes in total; see http://www.cnblogs.com/qq503665965/p/6790580.html)
2)
a1.sources.r1.command = tail -F /home/hadoop/flume/flume/conf/source.txt
(the case matters a great deal here: tail -F, unlike tail -f, keeps following after the file is rotated or recreated, which solved a big problem for us)
# The sink component type is logger
a1.sinks.k1.type = logger
# The channel component type is memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Connect the sources and the sinks to the channel
a1.sou
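The excerpt cuts off at the wiring lines. For reference, a minimal sketch of what the complete a1 definition would look like; the source type line is missing from the excerpt, so the exec source is an assumption:

a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Assumed: an exec source running the tail command quoted above
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/hadoop/flume/flume/conf/source.txt
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the source and the sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1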
will be explained below.
2.1 Adding a Zabbix monitoring service
On the one hand, Flume itself provides HTTP and Ganglia monitoring services, while we currently rely mainly on Zabbix for monitoring. We therefore added a Zabbix monitoring module to Flume and integrated it seamlessly with the SA team's monitoring services. On the other hand, we purified Flume's metrics, sending only the metrics we actually need
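For context, Flume's monitoring is pluggable: a custom backend can be selected at startup by passing the fully qualified name of a class implementing org.apache.flume.instrumentation.MonitorService as the monitoring type, and flume.monitoring.* properties are handed to that class. A sketch of how a Zabbix module like the one described might be wired in; the class name com.example.flume.ZabbixServer and the host/port are hypothetical:

bin/flume-ng agent --conf conf --conf-file conf/agent.conf --name a1 \
  -Dflume.monitoring.type=com.example.flume.ZabbixServer \
  -Dflume.monitoring.hosts=zabbix01:10051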
Flume provides MemoryChannel, MemoryRecoverChannel, and FileChannel. MemoryChannel achieves high-speed throughput but cannot guarantee the integrity of the data. MemoryRecoverChannel, according to the official documentation, has been superseded by FileChannel. FileChannel guarantees the integrity and consistency of the data. When configuring FileChannel specifically, it is recommended that the FileChannel directory and the program's log files be saved to different disks for increased efficiency. For the sink, when setting up sto
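A minimal FileChannel sketch following that advice, with the checkpoint and data directories on separate disks; the /data1 and /data2 mount points are assumptions:

a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /data1/flume/checkpoint
a1.channels.c1.dataDirs = /data2/flume/data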
Here you need to replace the protobuf jar under Flume's lib folder with the 2.5.0 version from hadoop-2.2.0, and likewise replace the guava jar under Flume's lib folder with the guava from hadoop-2.2.0, deleting the original corresponding jar files. Restart Flume for the changes to take effect.
Flume ships with SimpleHbaseEventSerializer in the org.apache.flume.sink.hbase package, which can be used as the HBase sink's serializer.
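For illustration, a minimal HBase sink definition using that serializer; the table name, column family, and channel name are hypothetical:

a1.sinks.k1.type = hbase
a1.sinks.k1.table = flume_test
a1.sinks.k1.columnFamily = cf
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
a1.sinks.k1.channel = c1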
Open-source Flume alone could not satisfy our requirements, so we added many features on top of the open-source version, fixed some bugs, and did some tuning. Some of the key aspects are explained in section 2.1 above and below.
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100
# Define the sink
a2.sinks.k2.type = hdfs
a2.sinks.k2.hdfs.path = hdfs://life-hadoop.life.com:8020/user/yanglin/flume/hive-tail
# Number of events flushed to HDFS at a time, default: 100
a2.sinks.k2.hdfs.batchSize = 10
# Change the file type, default: SequenceFile
a2.sinks.k2.hdfs.fileType = DataStream
# Change the write format of the file, default: Writable
a2.sinks.k2.hdfs.writeFormat = Text
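The excerpt above begins mid-file, so the source definition is missing. A sketch of the pieces it would need, assuming an exec source tailing a Hive log; the command path is an assumption:

a2.sources = r2
a2.channels = c2
a2.sinks = k2
# Assumed exec source; the original excerpt does not show it
a2.sources.r2.type = exec
a2.sources.r2.command = tail -F /var/log/hive/hive.log
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2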
bin/flume-ng agent --conf /home/flume/conf --conf-file /home/flume/conf/netcat.conf --name agent2 -Dflume.monitoring.type=http -Dflume.monitoring.port=34545
A general explanation:
--name agent2 specifies the name of the agent that is currently running.
--conf /home/flume/conf is best given as an absolute path; it indicates the directory of Flume's own configuration files (flume-env.sh and the log settings).
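For completeness, a minimal netcat.conf that such a command could point at; the bind address and port are illustrative:

agent2.sources = s1
agent2.channels = c1
agent2.sinks = k1
agent2.sources.s1.type = netcat
agent2.sources.s1.bind = localhost
agent2.sources.s1.port = 44444
agent2.sources.s1.channels = c1
agent2.channels.c1.type = memory
agent2.sinks.k1.type = logger
agent2.sinks.k1.channel = c1

With HTTP monitoring enabled as above, the counters are served as JSON at http://<agent-host>:34545/metrics.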
I recently studied how to use Flume, to go along with the log system our company is developing in-house. The official user guide: http://flume.apache.org/FlumeUserGuide.html
Flume architecture
A. Components
First, the architecture diagram (taken from the Internet).
As you can see from the diagram, Flume defines the flow of events as a data stream; a data stream is made up of agents, and an agent is actually a JVM process
Flume supports multi-level agents, as well as fan-in and fan-out.
Fan-in means a source can receive multiple inputs.
Fan-out means a flow can be delivered to multiple destinations (see the fan-out sketch after the installation steps below).
Flume installation:
1. Unzip the two files on each node:
2. Copy the src content into the bin directory:
cp -ri apache-flume-1.4.0-src/* apache-
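As promised above, a minimal fan-out sketch: one source replicating events into two channels, each drained by its own sink, so the same flow reaches two destinations; the component names are illustrative:

a1.sources = r1
a1.channels = c1 c2
a1.sinks = k1 k2
# The replicating selector (the default) copies every event to all listed channels
a1.sources.r1.selector.type = replicating
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2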
The previous article introduced inserting Flume data into HDFS and an ordinary directory; this article continues by introducing how flume-ng inserts data into hbase-0.96.0.
First, modify the flume-node.conf file in the conf directory under the flume folder on the node (for the original configuration, refer to the above) and make the following changes:
A single-node Flume deployment
1. Hadoop preparation
Create the flume directory in HDFS and assign its permissions to the flume user:
hdfs dfs -mkdir /flume
hdfs dfs -chown -R flume:flume /flume
2. flume-env.sh
Enter the ${FLUME_HOME}/conf directory
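For reference, a minimal flume-env.sh sketch; the JDK path and heap sizes are assumptions:

export JAVA_HOME=/usr/java/jdk1.8.0
export JAVA_OPTS="-Xms512m -Xmx1024m"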
# Roll a new file based on elapsed time (seconds); 0 means never roll based on time
a1.sinks.k1.hdfs.rollInterval = 60
# Roll a new file based on file size (bytes); 0 means never roll based on size
a1.sinks.k1.hdfs.rollSize = 10240
# If no data is written to the currently open temporary file within the number of seconds given by this parameter, the temporary file is closed and renamed to the target file
a1.sinks.k1.hdfs.idleTimeout = 3
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
## Generate a directory every five minutes:
# Whether to en
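The excerpt breaks off at the five-minute-directory comment. The usual way to get one directory every five minutes is the HDFS sink's rounding options combined with time escapes in the path; the path here is an assumption, and note that useLocalTimeStamp = true above is what makes the time escapes work without a timestamp header:

a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events/%Y%m%d/%H%M
# Whether to enable rounding of the event timestamp down to the nearest roundValue roundUnits
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 5
a1.sinks.k1.hdfs.roundUnit = minute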
," + "increasing capacity, or increasing thread count") ; }
Before a take there is also a pre-check: if the takeList is full, the take operation is too slow and events are accumulating, so you should adjust the transaction capacity.
What happens when a transaction commits, and what exactly does the commit do? commit() is the transaction commit. There are two cases:
1. Committing the put events:
while (!putList.isEmpty()) {
    if (!queue.offer(putList.removeFirst())) {
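Since putList and takeList are both sized by the channel's transactionCapacity, "adjusting the transaction capacity" comes down to the channel definition. A sketch; the numbers are illustrative, and transactionCapacity must not exceed capacity:

a1.channels.c1.type = memory
# Maximum number of events held in the channel
a1.channels.c1.capacity = 10000
# Maximum number of events per transaction, and thus the size of putList/takeList
a1.channels.c1.transactionCapacity = 1000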
From the bin/flume-ng shell script you can see that Flume starts from the org.apache.flume.node.Application class, which is where Flume's main() function lives.
The main method first parses the command-line arguments, throwing an exception if the specified configuration file does not exist.
Then, depending on whether the command line contains the no-reload-conf parameter, it decides which way to load the configuration.
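Concretely: by default a polling provider re-reads the configuration file (every 30 seconds) and restarts the components that changed, while passing --no-reload-conf selects the load-once provider. For example (paths are illustrative):

bin/flume-ng agent --conf conf --conf-file conf/agent.conf --name a1 --no-reload-conf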
This article describes Flume (spooling directory source) + HDFS; some of Flume's source-code details are described in http://www.cnblogs.com/cnmenglang/p/6544081.html
1. Material preparation: apache-flume-1.7.0-bin.tar.gz
2. Configuration steps:
A. Upload the package to the resources directory under your user's home (the author's user is mfz).
B. Unzip:
tar -xzvf apache-
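A minimal sketch of the spooling directory source this article pairs with HDFS; the spool path follows the resources directory mentioned above but is an assumption:

a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/mfz/resources/logs
# Add a header recording the absolute path of each spooled file
a1.sources.r1.fileHeader = true
a1.sources.r1.channels = c1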