Reprinted from: http://www.cnblogs.com/adealjason/p/6240122.html
Recently I wanted to experiment with stream computing, so I started by reading Flume's design principles and source code. The source can be downloaded from the Apache official website. The following covers Flume's principles and their implementation in code. Flume is a real-time data collection tool and part of the Hadoop ecosystem.
Flume official document translation: Flume 1.7.0 User Guide (unreleased version), Part 1
Flume official document translation: Flume 1.7.0 User Guide (unreleased version), Part 2

Flume Properties
Property Name               Default   Description
flume.called.from.service   –         If this property is specified, the agent keeps polling for the config file even when the file is not found at the expected path; otherwise the agent terminates when the config file is missing.
In Flume 1.5.2, if you want to expose Flume metrics over HTTP monitoring, append the following to the startup command:

  -Dflume.monitoring.type=http -Dflume.monitoring.port=34545

Properties passed with -D can be read directly via System.getProperties(), so these two properties are read by the private method loadMonitoring(), which lives in Flume's entry class Application.
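As a concrete illustration, a full agent start command with HTTP monitoring enabled might look like the following (the agent name and config file are placeholders; the two -D flags are the ones discussed above):

  bin/flume-ng agent -n a1 -c conf -f conf/a1.conf \
    -Dflume.monitoring.type=http \
    -Dflume.monitoring.port=34545

With monitoring enabled this way, the agent serves its metrics as JSON at http://<host>:34545/metrics.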
The collector receives the data sent by agents and forwards it to the designated target machine.
Note: Flume's dependency on Hadoop and ZooKeeper is only at the jar level; starting Flume does not require the Hadoop or ZooKeeper services to be running.
Combining this with the source code above: if you configure the sink to roll every 10 seconds, and a file has only been written for 2 seconds when its block happens to be under replication, the file will still be rolled early even though 10 seconds have not elapsed; the configured file size and event-count thresholds behave the same way. To solve this, we just need to keep the program from reacting to a file block being replicated, that is, make the isUnderReplicated() method always return false.
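For reference, instead of patching isUnderReplicated() itself, the HDFS sink exposes the hdfs.minBlockReplicas property; setting it to 1 is a commonly used way to keep under-replication from triggering these premature rolls (agent and sink names below are placeholders):

  a1.sinks.k1.hdfs.rollInterval = 10
  a1.sinks.k1.hdfs.minBlockReplicas = 1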
the /home/hadoop directory, and you have completed 50% :) Simple.
2) Modify the flume-env.sh configuration file, mainly to set the JAVA_HOME variable:

  # cp conf/flume-env.sh.template conf/flume-env.sh
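After copying the template, edit conf/flume-env.sh and point JAVA_HOME at your JDK; the path below is only a placeholder for whatever your machine actually uses:

  # conf/flume-env.sh
  export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64   # placeholder: substitute your real JDK path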
  a4.sinks.k1.hdfs.filePrefix = events-      # prefix for the generated log files
  a4.sinks.k1.hdfs.fileType = DataStream     # receive as plain text
  a4.sinks.k1.hdfs.rollCount = 0             # events per file before flushing; 0 = do not roll by event count
  a4.sinks.k1.hdfs.rollSize = 134217728      # roll when the file on HDFS reaches 128 MB
  a4.sinks.k1.hdfs.rollInterval = 60         # roll when the file on HDFS has been open for 60 seconds

  # Assemble source, channel, and sink
  a4.sources.r1.channels = c1
  a4.sinks.k1.channel = c1

3. Start Flume
Switch to the Flume installation directory and start the agent.
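A start command for the a4 agent sketched above would look roughly like this (standard flume-ng flags; the config file name conf/a4.conf is a placeholder):

  bin/flume-ng agent -n a4 -c conf -f conf/a4.conf -Dflume.root.logger=INFO,console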
Different types of source, channel, and sink can be freely combined, driven by a user-defined configuration file, which makes Flume very flexible. For example, a channel can persist events in memory or on the local disk; a sink can write logs to HDFS, HBase, or even to another Flume source. Flume also supports multi-level flows, meaning multiple agents can cooperate, with support for fan-in, fan-out, and contextual routing.
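As a small sketch of fan-out, one source can replicate every event into two channels feeding different sinks; all component names below are placeholders, and replicating is Flume's default selector type:

  a1.sources = r1
  a1.channels = c1 c2
  a1.sources.r1.selector.type = replicating
  a1.sources.r1.channels = c1 c2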
the smallest independent unit of operation in Flume is the agent. An agent is a JVM process, and a single agent consists of three components: source, channel, and sink.
2. Start the Flume cluster
1) First, start the Hadoop cluster (see the previous blog for details).
2) Second, install and configure Flume (all remaining steps are performed on master); the steps are as follows:
Flume + Solr + log4j: Building a Web Log Collection System
Preface
Many web applications use ELK as their log collection system. Flume is used here because we are familiar with the Hadoop framework and Flume has several advantages of its own.
For details about the Apache Hadoop ecosystem, see its official documentation.
Questions guide:
1. Compared with Scribe, where are Flume-NG's advantages?
2. What issues should be considered in the architecture design?
3. How is an Agent crash handled?
4. Does a Collector crash have any impact?
5. What measures does Flume-NG take to ensure reliability?

Meituan's log collection system is responsible for collecting all of Meituan's business logs.
makes the system easy to monitor and maintain, and allows multiple masters (managed and load-balanced through ZooKeeper), which avoids a single point of failure.
(3) Manageability
All agents and collectors are centrally managed by the master, which makes the system easy to maintain. With multiple masters, Flume uses ZooKeeper and gossip to keep the dynamic configuration data consistent. On the master you can view the execution status of each data source or data flow.
Apache Flume is a distributed, reliable, and efficient system for collecting, aggregating, and moving large amounts of data from many disparate sources into a centralized data store. Apache Flume is not limited to log collection: since data sources are customizable, Flume can be used to transfer large volumes of custom event data, including but not limited to website traffic data.
Original link: Kafka in Action: From Flume to Kafka
1. Overview
The previous posts introduced the development flow of a complete Kafka project. Today we look at where Kafka gets its data, that is, how data is produced into Kafka. Here is today's agenda:
Data sources
Flume to Kafka
Data source Loading
Preview
Let's start today's content.
2. Data sources
The data produced into Kafka here is supplied by Flume.
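To make the Flume-to-Kafka hop concrete, a minimal Kafka sink configuration for the Flume 1.7 line might look like this (agent, channel, topic, and broker address are placeholders):

  a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
  a1.sinks.k1.kafka.topic = log_topic
  a1.sinks.k1.kafka.bootstrap.servers = localhost:9092
  a1.sinks.k1.channel = c1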
(1) make update support $setOnInsert; (2) fix the bug where update raised an exception when $set or $inc was empty; (3) fix the bulk-insert bug where, when one log entry hit a duplicate-key exception, the remaining logs in the same batch were discarded.
7. Flume and fluentd are very similar, but Flume, coming out of the Hadoop ecosystem, is more popular, so I chose Flume.