Flume is a highly available, highly reliable, distributed system for collecting, aggregating, and transporting massive volumes of log data, provided by Cloudera. Flume supports customizing the various data senders in a logging system for data collection, and it also provides simple processing of the data before writing it out to various (customizable) data receivers.
I. Overview
1. There are three machines: HADOOP1, HADOOP2, and HADOOP3; HADOOP1 is used for the log summary.
2. HADOOP1 simultaneously outputs the summarized logs to multiple targets.
3. One Flume data source corresponds to multiple channels and multiple sinks; this is configured in the consolidation-accepter.conf file, sketched after this list.
II. Deploy Flume to collect logs and summarize logs
1. Running on the
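The overview is cut off here. As a rough illustration of point 3 (one source feeding multiple channels and sinks), a consolidation-accepter.conf on HADOOP1 might look like the sketch below. The agent name, port, HDFS path, and sink types are assumptions for illustration, not the original configuration.

# Hypothetical consolidation-accepter.conf on HADOOP1: one Avro source
# fans out (replicating selector) to two channels, each drained by its own sink.
agent1.sources  = avroSrc
agent1.channels = ch1 ch2
agent1.sinks    = hdfsSink loggerSink

agent1.sources.avroSrc.type = avro
agent1.sources.avroSrc.bind = 0.0.0.0
agent1.sources.avroSrc.port = 41414                    # assumed port
agent1.sources.avroSrc.selector.type = replicating     # copy every event to all channels
agent1.sources.avroSrc.channels = ch1 ch2              # one source, multiple channels

agent1.channels.ch1.type = memory
agent1.channels.ch2.type = memory

agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.hdfs.path = hdfs://HADOOP1:9000/flume/consolidated   # assumed path
agent1.sinks.hdfsSink.channel = ch1

agent1.sinks.loggerSink.type = logger
agent1.sinks.loggerSink.channel = ch2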
Flume captures log data in real time and uploads it to Kafka
1. With ZooKeeper already configured on Linux, start ZooKeeper first:
sbin/zkServer.sh start
(sbin/zkServer.sh status shows the startup state.) Running jps should show the QuorumPeerMain process.
2. Start Kafka. ZooKeeper must be running before Kafka is started:
bin/kafka-server-start.sh config/server.properties
3. Start a consumer to receive the logs:
bin/kafka-console-consumer.sh --zookeeper localhos
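The consumer command above is cut off. For reference, a complete invocation, assuming ZooKeeper listens on localhost:2181 and the topic is named flume-logs (both assumed here), would be:

bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic flume-logs --from-beginning   # ZooKeeper address and topic name assumed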
Three myths about big data: as the industry's interest in big data grows, big data became one of my favorite speaking topics in 2013, and I did more public speaking on it than in any previous year of my career. I have given many speeches at industr
Today we continue discussing the configuration of several agents. The first agent captures the output of a particular command executed in the terminal and writes it to a file in a specific directory. First look at the configuration code:
agent2.sources = execsource    # the source that gets output from the command
agent2.sinks = filesink        # the sink that writes output to a file
agent2.channels = filechannel  # the channel that buffers events to a file
agent2.sources.execsource.type = exec    # source type
agent2.sources.execs
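The configuration is cut off above. A minimal sketch of how such an exec-to-file agent could be completed is shown below; the tailed command, channel directories, and output directory are assumptions for illustration, not the original values.

agent2.sources = execsource
agent2.sinks = filesink
agent2.channels = filechannel

# exec source: run a command and turn each output line into an event
agent2.sources.execsource.type = exec
agent2.sources.execsource.command = tail -F /var/log/messages    # assumed command
agent2.sources.execsource.channels = filechannel

# file channel: buffer events on local disk
agent2.channels.filechannel.type = file
agent2.channels.filechannel.checkpointDir = /home/hadoop/flume/checkpoint   # assumed
agent2.channels.filechannel.dataDirs = /home/hadoop/flume/data              # assumed

# file_roll sink: write events into files under a local directory
agent2.sinks.filesink.type = file_roll
agent2.sinks.filesink.sink.directory = /home/hadoop/flume/output            # assumed
agent2.sinks.filesink.channel = filechannel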
Flume acquisition process: # Note: in this case Flume listens on the directory /home/hadoop/flume_kafka and collects the data into Kafka. Start the cluster, start Kafka, then start the agent:
flume-ng agent -c . -f /home/hadoop/flume-1.7.0/conf/myconf/flume-kafka.conf -n a1 -Dflume.root.logger=INFO,console
Open c
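The original flume-kafka.conf is not shown. A minimal sketch consistent with the description (a spooling-directory source on /home/hadoop/flume_kafka feeding a Kafka sink, Flume 1.7 syntax) might look like the following; the broker address and topic name are assumed:

a1.sources = r1
a1.channels = c1
a1.sinks = k1

# spooldir source: watch the directory mentioned above for new files
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/hadoop/flume_kafka
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

# Kafka sink (Flume 1.7 syntax); broker list and topic are assumed for illustration
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.bootstrap.servers = localhost:9092
a1.sinks.k1.kafka.topic = flume-logs
a1.sinks.k1.channel = c1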
Goal: accept HTTP requests on port 1084 and store the request information in the Hive database. osgiweb2.db is the name of the database created in Hive, and periodic_report5 is the created table. The Flume configuration is as follows:
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 1084
a1.sources.r1.handler = Jkong.Test.httpsourcedpihandler
#a1.sources.r1.interceptors = i1 i2
#a1.sources.r1.interceptors.i2.type
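Since the configuration above is truncated, here is a hedged sketch of what a complete HTTP-to-Hive agent along these lines could look like. The source type, metastore URI, serializer, and channel settings are assumptions; only the port, database, and table come from the description above:

a1.sources = r1
a1.channels = c1
a1.sinks = k1

# HTTP source listening on port 1084, as described above
a1.sources.r1.type = http
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 1084
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

# Hive sink writing into osgiweb2.periodic_report5; the metastore URI and
# serializer are assumed for illustration
a1.sinks.k1.type = hive
a1.sinks.k1.hive.metastore = thrift://localhost:9083
a1.sinks.k1.hive.database = osgiweb2
a1.sinks.k1.hive.table = periodic_report5
a1.sinks.k1.serializer = JSON
a1.sinks.k1.channel = c1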
That said, this is only a small part of the real-time architecture.
Download the latest version of Flume: apache-flume-1.6.0-bin.tar.gz
Unzip it and modify conf/flume-conf.properties; the file name can be whatever you like.
What I have implemented so far is reading data from a directory and writing it to Kafka. There is plenty of material about the underlying principles on the Internet, so here I only cover how to connect t
Flume NG has 4 main components:
Event represents the unit of data flow passed between Flume agents.
Source receives the event data stream from an external source and then passes it to the channel.
Channel provides temporary storage for the event data stream passed in by the source.
Sink takes the event data stream from the channel and delivers it to the next agent or the final destination.
===========> Create the HBase table and column family first
Case 1: one row of source data corresponds to one HBase row (no problem on hbase-1.12)
# Note: in this case Flume listens on the directory /home/hadoop/flume_hbase and captures the data into HBase; you must first create the table and column family in HBase.
Data directory:
vi /home/hadoop/flume_hbase/word.txt
1
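As a hedged illustration of the preparation and agent wiring described above, the sketch below creates a table in the HBase shell and connects a spooling-directory source to an HBase sink. The table name, column family, and serializer are assumptions, not the original case's values:

# In the HBase shell, create the table and column family first (names assumed):
#   create 'flume_word', 'cf'

a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/hadoop/flume_hbase
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

# HBase sink; table, column family, and serializer are assumed for illustration
a1.sinks.k1.type = hbase
a1.sinks.k1.table = flume_word
a1.sinks.k1.columnFamily = cf
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
a1.sinks.k1.channel = c1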
During an experiment, when using Flume 1.7 to collect local data into the HDFS file system, an error occurred because of an unreasonable configuration file. The error is as follows:
[WARN - org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:611)] Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1281)
    at J
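The post does not show which property was misconfigured. For context only, a typical HDFS sink section looks like the sketch below; the agent name, path, and roll/timeout values are assumptions rather than the author's settings:

agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.hdfs.path = hdfs://namenode:9000/flume/events/%Y%m%d   # assumed path
agent1.sinks.hdfsSink.hdfs.fileType = DataStream
agent1.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfsSink.hdfs.rollInterval = 60        # roll files every 60 s (assumed)
agent1.sinks.hdfsSink.hdfs.rollSize = 134217728     # or every 128 MB (assumed)
agent1.sinks.hdfsSink.hdfs.rollCount = 0            # disable event-count based rolling
agent1.sinks.hdfsSink.hdfs.callTimeout = 60000      # allow slow HDFS calls to finish (assumed)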
hours to 8 seconds, while MKI's genetic analysis time has been shortened from a few days to 20 minutes. Here, let's look at the difference between MapReduce and MPI, the traditional distributed parallel computing environment. MapReduce differs greatly from MPI in its design goals, usage, and file-system support, which makes it better adapted to the processing needs of big data environments. What new met
# the name of the source
agent.sources = kafkaSource
# the name of the channel; it is suggested to name it according to its type
agent.channels = memoryChannel
# the name of the sink; it is suggested to name it according to the target
agent.sinks = hdfsSink
# specify the channel name used by the source
agent.sources.kafkaSource.channels = memoryChannel
# specify the name of the channel the sink uses; note that the property here is "channel"
agent.sinks.hdfsSink.channel = memoryChannel
# -------- kafkaSource related configuration -------------
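The snippet stops at the "kafkaSource related configuration" header. A plausible continuation in Flume 1.7-style Kafka source syntax is sketched below; the broker address, topic, and capacity values are assumptions:

# Kafka source (Flume 1.6+/1.7 style); servers and topic are assumed for illustration
agent.sources.kafkaSource.type = org.apache.flume.source.kafka.KafkaSource
agent.sources.kafkaSource.kafka.bootstrap.servers = localhost:9092
agent.sources.kafkaSource.kafka.topics = flume-logs
agent.sources.kafkaSource.batchSize = 1000

# memory channel capacity settings (assumed values)
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 10000
agent.channels.memoryChannel.transactionCapacity = 1000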
http://www.cognoschina.net/club/thread-66425-1-1.html (for reference only)
"Automatic Big Data Mining" is the true significance of big data.
Nowadays, big data still does not work very well in practice. Almost everyone is talking about
"Foreword" After our unremitting efforts, at the end of 2014 we finally released the Big Data Security analytics platform (Platform, BDSAP). So, what is big Data security analytics? Why do you need big Data security analytics? Whe
Since 2015, big data has been removed from Gartner's Hype Cycle for emerging technologies. The word "
Big Data projects are driven by business. A complete and excellent big data solution is of strategic significance to the development of enterprises.
Due to the diversity of data sources, data types and scales from different