As shown in the red box section, I ran a stability test: after Flume had been running for a few days, I noticed the counter value gradually grew to a certain point, then dropped back, over and over in a cycle, which is what prompted this investigation. Let's look at the code:

    if (txnEventCount == 0) {
        sinkCounter.incrementBatchEmptyCount();
    } else if (txnEventCount == batchSize) {
        sinkCounter.incrementBatchCompleteCount();
    }
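For context, here is a minimal sketch of the batch-accounting pattern a Flume sink typically runs inside process(), which is where these counters get updated. The class name, constructor wiring, and the delivery placeholder are illustrative assumptions, not Flume's exact source:

    import org.apache.flume.Channel;
    import org.apache.flume.Event;
    import org.apache.flume.EventDeliveryException;
    import org.apache.flume.Transaction;
    import org.apache.flume.instrumentation.SinkCounter;

    public class BatchAccountingSketch {
        private final Channel channel;
        private final SinkCounter sinkCounter;
        private final int batchSize;

        public BatchAccountingSketch(Channel channel, SinkCounter sinkCounter, int batchSize) {
            this.channel = channel;
            this.sinkCounter = sinkCounter;
            this.batchSize = batchSize;
        }

        public void processOneBatch() throws EventDeliveryException {
            Transaction txn = channel.getTransaction();
            txn.begin();
            try {
                int txnEventCount = 0;
                for (; txnEventCount < batchSize; txnEventCount++) {
                    Event event = channel.take();
                    if (event == null) {
                        break; // channel drained before the batch filled up
                    }
                    // ... deliver the event to the destination here ...
                }
                if (txnEventCount == 0) {
                    sinkCounter.incrementBatchEmptyCount();      // nothing to take
                } else if (txnEventCount == batchSize) {
                    sinkCounter.incrementBatchCompleteCount();   // a full batch
                } else {
                    sinkCounter.incrementBatchUnderflowCount();  // partial batch
                }
                txn.commit();
            } catch (Throwable t) {
                txn.rollback();
                throw new EventDeliveryException("Failed to deliver batch", t);
            } finally {
                txn.close();
            }
        }
    }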
...internal selection of a valid sink for processing. In the exception section, we found that the informSinkFailed() method was triggered; let's take a look at that method:

    public void informFailure(T failedObject) {
        // If there is no backoff this method is a no-op.
        if (!shouldBackOff) {
            return;
        }
        FailureState state = stateMap.get(failedObject);
        long now = System.currentTimeMillis();
        long delta = now - state.lastFail;
        /*
         * When do we increase the backoff period?
         * We basically calculate the ti...
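To make the backoff idea concrete, here is a small, self-contained sketch of the bookkeeping such a method performs: each failure that arrives while the previous backoff window is still fresh widens the window, up to a cap. The constants and field names are illustrative assumptions, not Flume's exact implementation:

    public class BackoffSketch {
        private static final long BASE_MS = 1000;   // assumed base backoff period
        private static final long MAX_MS = 30_000;  // assumed upper bound on backoff
        private static final int MAX_SHIFT = 15;    // keeps the left shift in range

        private long lastFail;        // time of the previous failure
        private int sequentialFails;  // consecutive "fast" failures observed
        private long restoreTime;     // the sink is skipped until this instant

        public void informFailure(long now) {
            long delta = now - lastFail;
            long previousWindow = Math.min(MAX_MS, BASE_MS << Math.min(sequentialFails, MAX_SHIFT));
            // A failure arriving soon after the previous one counts as
            // consecutive and widens the window; otherwise the count resets.
            sequentialFails = (delta < previousWindow * 2) ? sequentialFails + 1 : 0;
            lastFail = now;
            restoreTime = now + Math.min(MAX_MS, BASE_MS << Math.min(sequentialFails, MAX_SHIFT));
        }

        public boolean isActive(long now) {
            return now >= restoreTime; // eligible to be selected again
        }
    }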
I. Installation and deployment of Flume: installing Flume is very simple, you only need to decompress the package (assuming a Hadoop environment is already in place). The installation package is: http://www-us.apache.org/dist/flume/1.7.0/apache-flume-1.7.0-bin.tar.gz 1. Upload the installatio...
http://blog.csdn.net/hijk139/article/details/8308224
Our business system needs to collect monitoring system logs, which brought Hadoop's Flume to mind. After testing, although its features are not especially powerful, it basically meets the functional requirements. Flume is a distributed, reliable, and highly available log collection service, capable of completing log col...
I. netcat source + memory channel + logger sink
1. Modify the configuration
1) Modify the flume-env.sh file under $FLUME_HOME/conf as follows:

    export JAVA_HOME=/opt/modules/jdk1.7.0_67

2) Under the $FLUME_HOME/conf directory, create an agent subdirectory, then create a new netcat-memory-logger.conf with the following configuration:

    # netcat-memory-logger
    # Name the components on this agent
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    # Describe/...
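Once the agent is started, you can exercise the netcat source even without the nc tool: just open a TCP connection to the source's bind address and write newline-terminated lines, which the logger sink will then print. The host and port below are assumptions (the excerpt truncates before the bind/port settings), so adjust them to match netcat-memory-logger.conf:

    import java.io.OutputStream;
    import java.net.Socket;
    import java.nio.charset.StandardCharsets;

    public class NetcatSourceClient {
        public static void main(String[] args) throws Exception {
            // Assumed bind address and port of the netcat source.
            try (Socket socket = new Socket("localhost", 44444)) {
                OutputStream out = socket.getOutputStream();
                out.write("hello flume\n".getBytes(StandardCharsets.UTF_8));
                out.flush();
            }
        }
    }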
"...good performance where multiple disks are not available for checkpoint and data directories." It is natural that performance degrades when channel data is synchronized to disk, and the checkpoint mechanism is what prevents data loss. As for the hybrid variant, i.e. the memory channel and the file channel used together, we will not cover it here: for this mixed mode the official documentation explicitly advises against use in production environments. The reason for this is that data lo...
...a certain range, it will flush:

    private void flushEventBatch(List<Event> events)

Flush delivers the events currently saved in the eventList and then empties it.
1. Put the events into the configured channels:

    for (Event event : events) {
        List<Channel> ...

Here is the detailed procedure for putting the event into the channel, but notice that there are two selector getChannels methods, because there are two channel selector modes: multiplexing and replicating.

    if (restart) {
        logger.info("Restarting in {}ms, ex...
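The difference between the two modes can be sketched in a few lines: a replicating selector hands every event to all of its channels, while a multiplexing selector routes by an event header. The header name "type", the method shapes, and the fallback wiring below are illustrative assumptions, not Flume's actual ChannelSelector API:

    import java.util.List;
    import java.util.Map;
    import org.apache.flume.Channel;
    import org.apache.flume.Event;

    public class SelectorSketch {
        // Replicating: every event goes to every configured channel.
        static List<Channel> replicate(List<Channel> allChannels, Event event) {
            return allChannels;
        }

        // Multiplexing: route by a header value, falling back to a default set.
        static List<Channel> multiplex(Map<String, List<Channel>> mapping,
                                       List<Channel> defaultChannels, Event event) {
            String key = event.getHeaders().get("type"); // assumed routing header
            List<Channel> selected = mapping.get(key);
            return (selected != null) ? selected : defaultChannels;
        }
    }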
1. Background information
Many companies' platforms generate a large number of logs every day (typically streaming data, for example search engine pageviews and queries). Processing these logs requires a specific logging system, and in general such a system needs the following characteristics:
(1) Construct a bridge between application systems and analysis systems, decoupling them from each other;
(2) Support near-real-time online analysis systems as well as offline analysis systems;
a1.sinks.k2.port=9988
a1.sinks.k2.channel=c2
Note: if you want to implement fan-out of the event data stream, you need to configure multiple channels and sinks. With only one channel and multiple sinks, the sinks consume events from that channel mutually exclusively, so each event reaches only one sink instead of being copied to all of them.
2) b1...bn configuration:
b1.sources=r1
b1.sinks=k1
b1.channels=c1
# Describe/configure the source
b1.sources.r1.type=avro
b1.sources.r1.bind=0.0.0.0
b1.sources.r1.port=9988
b1.sources.r1.channels=c1
b1.channels.
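With the b1 agent running, an application can also push events into its avro source directly through Flume's client SDK instead of chaining from another agent. A minimal sketch follows; the host is an assumption (the source binds 0.0.0.0), and port 9988 matches the configuration above:

    import java.nio.charset.StandardCharsets;
    import org.apache.flume.Event;
    import org.apache.flume.api.RpcClient;
    import org.apache.flume.api.RpcClientFactory;
    import org.apache.flume.event.EventBuilder;

    public class AvroSourceClient {
        public static void main(String[] args) throws Exception {
            // Assumed host; the port matches b1.sources.r1.port above.
            RpcClient client = RpcClientFactory.getDefaultInstance("127.0.0.1", 9988);
            try {
                Event event = EventBuilder.withBody("hello avro source", StandardCharsets.UTF_8);
                client.append(event); // throws EventDeliveryException on failure
            } finally {
                client.close();
            }
        }
    }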
1) hostname error:
2011-11-14 11:44:55,497 ERROR com.cloudera.util.NetUtils: Unable to get canonical host name! test: test
java.net.UnknownHostException: test: test
    at java.net.InetAddress.getLocalHost(InetAddress.java:1354)
    at com.cloudera.util.NetUtils. ...
Error cause: the IP address cannot be resolved from the hostname.
Solution: add a hostname-to-IP-address mapping in the /etc/hosts file.
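A quick way to confirm this is the cause is to run the same lookup Flume performs at startup; if the sketch below still throws UnknownHostException, the /etc/hosts mapping is missing or wrong (the class name is illustrative):

    import java.net.InetAddress;
    import java.net.UnknownHostException;

    public class HostnameCheck {
        public static void main(String[] args) {
            try {
                InetAddress addr = InetAddress.getLocalHost();
                System.out.println(addr.getCanonicalHostName() + " -> " + addr.getHostAddress());
            } catch (UnknownHostException e) {
                System.err.println("Unable to resolve the local host name: " + e);
            }
        }
    }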
2) java not found:
line 234: exec: java: not found
Error cause: the java command does not exist. Solution: install a JDK and make sure java is on the PATH.
In a complete big data processing system, besides the core analysis system composed of HDFS + MapReduce + Hive, indispensable auxiliary systems such as data acquisition, result export, and task scheduling are also needed, and for these auxiliary tools the Hadoop ecosystem provides convenient open source frameworks. Log collection framework Flume: Flume is a distributed, reliable, and highly available system for collecting, aggregating, and moving large amounts of log data.
...multiple agents, and that is how we tell them apart.

# We know an agent contains three important components: source, channel, and sink.
# So we give each of these three components a name.
a2.sources = r1
a2.channels = c1
a2.sinks = k1
# Define the source details.
# The data collected here is handed to this source.
a2.sources.r1.type = spooldir
a2.sources.r1.spoolDir = /home/hadoop/hadoop-2.9.0/userlogs
# Define the channel details.
# Now that the source is defined, we need to define our channel.
a2.channels.c1.type = memory
a2.channels.
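To feed this spooldir source, a file must appear in the spool directory as a complete, immutable file, since the source ingests whole files and then renames them. Below is a hedged sketch of doing that safely from Java: write to a staging path first, then move the finished file in, so the source never reads a half-written file. The file names are illustrative, and the move should stay on one filesystem to remain atomic:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.Arrays;

    public class SpoolDirFeeder {
        public static void main(String[] args) throws Exception {
            Path spoolDir = Paths.get("/home/hadoop/hadoop-2.9.0/userlogs");
            // Write the file next to, but outside, the spool directory first...
            Path staging = spoolDir.resolveSibling("events-0001.log.tmp");
            Files.write(staging, Arrays.asList("event one", "event two"));
            // ...then move it in, making it visible to the source all at once.
            Files.move(staging, spoolDir.resolve("events-0001.log"));
        }
    }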
...so that users do not need to care about how or where the data is stored.
It provides interoperability for data processing tools such as Pig, MapReduce, and Hive.
Chukwa:
Chukwa is a Hadoop-based monitoring system for large clusters, contributed by Yahoo.
Cloudera series products:
Founding organization: Cloudera Company
1. Cloudera Manager:
It provides four functions: (1) management, (2) monitoring, (3) diagnostics, (4) integration.
2. Cloudera
An error is reported during installation: Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (site) on project hadoop-hdfs: An Ant BuildException has occured: input file /usr/local/hadoop-2.6.0-stable/hadoop-2.6.0-src/hadoop-hdfs-project/hadoop-hdfs/target/findbugsXml.xml
This mainly introduces the Hadoop family of products; commonly used projects include Hadoop, Hive, Pig, HBase, Sqoop, Mahout, ZooKeeper, Avro, Ambari, and Chukwa, while newer additions include YARN, HCatalog, Oozie, Cassandra, Hama, Whirr, Flume, Bigtop, Crunch, Hue, etc. Since 2011, China has entered an era of surging big data, and the family of software represented by...
Create an empty file kafka.log under /tmp/logs; if there is no logs directory under /tmp, you will need to create it first. 5.3. Create a shell script that generates log data: create a kafkaoutput.sh script under the hadoop user's home directory and grant it execute permission; it appends content to /tmp/logs/kafka.log. The kafkaoutput.sh script is as follows (the loop bound was truncated in the original excerpt, so the 100 here is illustrative):

    for ((i=0; i<100; i++));
    do
        echo "kafka_test-$i" >> /tmp/logs/kafka.log;
    done
I. What is Flume?
Flume is a distributed, reliable system that can efficiently collect, aggregate, and move large amounts of data from many different sources into centralized data storage. Flume is a top-level project under Apache. Flume is not limited to collecting and aggregating log data; because data sources are customizable, Flume can be used to transport massive quantities of event data as well.