The next Apache Flume version (1.6) will bring a new component, KafkaChannel, which, as the name implies, uses Kafka as the channel; this channel already exists in the CDH 5.3 release. As you know, there are three commonly used channels: 1. Memory channel: the advantages are that it is the fastest and the easiest to configure; the disadvantage is that its reliability is the worst, because once the Flume process dies, the data still in memory is lost. 2. File
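For illustration, here is a minimal KafkaChannel configuration sketch based on the Flume 1.6 user guide; the broker list, ZooKeeper address, and topic name are placeholders, not values from the original article:

    a1.channels = c1
    a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
    a1.channels.c1.brokerList = kafka-host:9092
    a1.channels.c1.zookeeperConnect = zk-host:2181
    a1.channels.c1.topic = flume-channel
    a1.channels.c1.parseAsFlumeEvent = true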
We all know about Hadoop, but Hadoop is not all of big data. How do we build a large data project? For offline processing Hadoop is still a good fit, but for strongly real-time work with relatively large data volumes we can use Storm; what technologies should Storm be paired with to make a project that suits our needs? 1. What are the characteristics of a good project architecture? 2. How does the project architecture ensure the accuracy of the data? 3. What is Kafka? 4. How does
Http://blog.csdn.net/alphags/article/details/52862578?locationNum=10fps=1
This article mainly draws on the Apache Flume user documentation (http://flume.apache.org/FlumeUserGuide.html). Because Chinese-language resources for Apache Flume 1.x are scarce, I document my deployment process here, hoping to give some hints to people with the same needs. (The English documentation covers a lot; here I only write so
Question guide: 1. What problems does Flume solve? 2. What additional features does Flume add on top of the open source version? 3. How is the Flume system tuned?
In "Flume-based Log Collection System (Part 1): Architecture and Design", we described Flume's architecture design in detail.
For the solution, see https://issues.apache.org/jira/browse/SPARK-1729. What follows is my personal understanding; if you have questions, please leave a comment. In fact, Flume itself does not support a Kafka-style publish/subscribe model, that is, it cannot let Spark pull data from Flume, so a clever workaround was devised. In Flume, it is actually the sink that actively takes data from the channel.
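As a sketch of the pull-based approach on the Spark side, assuming the spark-streaming-flume integration artifact is on the classpath; the hostname and port are placeholders, not values from the original article:

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.flume.FlumeUtils;
    import org.apache.spark.streaming.flume.SparkFlumeEvent;

    public class FlumePullExample {
        public static void main(String[] args) throws Exception {
            SparkConf conf = new SparkConf().setAppName("FlumePullExample");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));
            // Poll the Flume agent's SparkSink instead of having Flume push to the driver
            JavaReceiverInputDStream<SparkFlumeEvent> events =
                FlumeUtils.createPollingStream(jssc, "flume-host", 9988);
            events.count().print();
            jssc.start();
            jssc.awaitTermination();
        }
    }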
Install Flume
1. Download Flume from the official website: http://flume.apache.org/download.html
2. [root@bicloud77 home]# tar zxvf apache-flume-1.5.2-bin.tar.gz
3. [root@bicloud77 home]# cd apache-flume-1.5.2-bin
4. [root@bicloud76 apache-flume-1.5.2-bin]# b
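After unpacking, a quick way to verify the installation is the single-node example from the Flume user guide; the agent name and netcat port below are the documentation's defaults, not values from this article:

    # example.conf: a single-node Flume configuration
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444
    a1.sinks.k1.type = logger
    a1.channels.c1.type = memory
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1

Start the agent with:

    bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console

Then, from another terminal, telnet localhost 44444 and type a line; it should appear in the agent's console log.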
There are two approaches: in one, Spark Streaming listens in the driver and Flume pushes data to it; in the other, Spark Streaming polls Flume on a time-based schedule and pulls the data. At first I thought only the first method existed, but the problem is that the node the driver comes up on is not fixed, so every time I restarted the streaming job I found I had to change the Flume configuration.
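On the Flume side, the pull approach replaces the push-style Avro sink with Spark's custom sink; a sketch, assuming the Spark sink jar is on Flume's classpath, with the hostname and port as placeholders:

    a1.sinks = spark
    a1.sinks.spark.type = org.apache.spark.streaming.flume.sink.SparkSink
    a1.sinks.spark.hostname = flume-host
    a1.sinks.spark.port = 9988
    a1.sinks.spark.channel = c1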
Flume is a highly available, highly reliable, distributed system for massive log collection, aggregation, and transport, originally provided by Cloudera. Flume supports customizing the various data senders in a log system to collect data, and it also provides the ability to do simple processing on the data and write it to various (customizable) data receivers. The current
1) hostname error:
2011-11-14 11:44:55,497 ERROR com.cloudera.util.NetUtils: Unable to get canonical host name! test: test
java.net.UnknownHostException: test: test
    at java.net.InetAddress.getLocalHost(InetAddress.java:1354)
    at com.cloudera.util.NetUtils.
Error cause: the IP address cannot be resolved from the hostname.
Solution: add a hostname-to-IP-address mapping to the /etc/hosts file.
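For example, if the machine's hostname is test (as in the stack trace above), an entry like the following resolves it; the IP address is a placeholder:

    192.168.1.100   test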
2) Java not found:
line 234: exec: java: not found
Error cause: the Java command does not exist.
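The snippet cuts off before giving a fix; the usual remedy, which I add here as an assumption rather than text from the original, is to install a JDK and put the java command on the PATH, for example (the JDK path is a placeholder):

    export JAVA_HOME=/opt/jdk1.8.0_121
    export PATH=$JAVA_HOME/bin:$PATH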
In a distributed system, each machine keeps local logs of the programs it runs. Sometimes, to meet analysis needs, these scattered logs must be aggregated. Many people would choose rsync or scp, but those are weak in real-time capability, bring filename-conflict problems, and their scalability is unsatisfactory; not elegant at all. In practice, we faced exactly this need: to aggregate the Nginx logs of multiple online servers in real time.
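As a sketch of what the per-server collection agent could look like with Flume; the log path, aggregator hostname, and port are assumptions, not values from the original article:

    # runs on each web server: tails the Nginx access log and forwards via Avro
    a1.sources = nginx
    a1.channels = c1
    a1.sinks = k1
    a1.sources.nginx.type = exec
    a1.sources.nginx.command = tail -F /var/log/nginx/access.log
    a1.sources.nginx.channels = c1
    a1.channels.c1.type = file
    a1.sinks.k1.type = avro
    a1.sinks.k1.hostname = aggregator.example.com
    a1.sinks.k1.port = 4141
    a1.sinks.k1.channel = c1

A file channel is chosen here over a memory channel so that buffered events survive an agent restart, which matters given the reliability complaints about rsync/scp above.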
In a complete big data processing system, besides the core analysis system composed of HDFS + MapReduce + Hive, indispensable auxiliary systems such as data acquisition, result data export, and task scheduling are also needed, and for these auxiliary tasks there are convenient open source frameworks in the Hadoop ecosystem. Log collection framework Flume: Flume is a distributed, reliable, and highly available system for collecting, aggregating, and transporting large volumes of logs.
Download apache-flume-1.7.0-bin.tar.gz and unpack it with tar -zxvf, then add the following settings to the /etc/profile file:
    export FLUME_HOME=/opt/apache-flume-1.7.0-bin
    export PATH=$PATH:$FLUME_HOME/bin
Modify the two files under $FLUME_HOME/conf/ and add JAVA_HOME to flume-env.sh:
    JAVA_HOME=/opt/jdk1.8.0_121
Most importantly, modify
            } else {
                // no event available, so back off
                result = Status.BACKOFF;
            }
            // commit the transaction
            transaction.commit();
        } catch (Exception ex) {
            // roll back the transaction
            transaction.rollback();
            throw new EventDeliveryException("Failed to log event: " + event, ex);
        } finally {
            // close the transaction
            transaction.close();
        }
        return result;
    }
}
3. Package the jar and place it under /soft/flume/lib.
4. Using the custom s
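The fragment above is missing its opening half; here is a minimal sketch of the full process() method it appears to come from, written against Flume's Sink API. The class name and the event-handling body are my assumptions, not the original author's code:

    import org.apache.flume.Channel;
    import org.apache.flume.Event;
    import org.apache.flume.EventDeliveryException;
    import org.apache.flume.Transaction;
    import org.apache.flume.sink.AbstractSink;

    public class LoggerStyleSink extends AbstractSink {
        @Override
        public Status process() throws EventDeliveryException {
            Status result = Status.READY;
            Channel channel = getChannel();
            Transaction transaction = channel.getTransaction();
            Event event = null;
            try {
                transaction.begin();
                // take one event from the channel inside the transaction
                event = channel.take();
                if (event != null) {
                    // handle the event; here we just print its body
                    System.out.println(new String(event.getBody()));
                } else {
                    // no event available, so back off
                    result = Status.BACKOFF;
                }
                transaction.commit();
            } catch (Exception ex) {
                transaction.rollback();
                throw new EventDeliveryException("Failed to log event: " + event, ex);
            } finally {
                transaction.close();
            }
            return result;
        }
    }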
1. Background information
Many of the company's platforms generate large numbers of logs every day (typically streaming data, such as search engine PVs and queries). Processing these logs requires a specific log system, which in general needs the following characteristics:
(1) build a bridge between application systems and analysis systems, and decouple them;
(2) support near-real-time online analysis systems as well as offline analysis systems such as Hadoop;
(3) have high scalabi
An earlier article introduced inserting Flume data into HDFS and into an ordinary directory; this article continues by introducing how flume-ng inserts data into hbase-0.96.0.
First, modify the flume-node.conf file in the conf directory under the Flume folder on the node (for the original configuration, refer to the above) and make the followi
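A sketch of the kind of HBase sink settings such a configuration adds; the agent and sink names, table, and column family are placeholders, while the property names follow the Flume user guide:

    agent.sinks = hbase-sink
    agent.sinks.hbase-sink.type = hbase
    agent.sinks.hbase-sink.table = flume_events
    agent.sinks.hbase-sink.columnFamily = cf
    agent.sinks.hbase-sink.serializer = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
    agent.sinks.hbase-sink.channel = memoryChannel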
Before building, synchronize the time across nodes and turn off the firewall; the package version used is 1.6.0. There are two ways to configure the service. The first one takes the following steps: 1. Copy the package to node1 and extract it to the root directory. 2. Rename the directory with the following command: mv apache-flume-1.6.0-bin /home/install/flume-1.6 3. After entering the
First, Flume basics: Flume is a log collection framework. Background: logs are scattered across machines, and we want to use a big data platform for statistical analysis, so logs must be collected from the other servers and moved to the cluster, with monitoring; this requires timeliness, fault tolerance, and load balancing. Flume is generally driven by a configuration file. For an overview of data collection: flume.apache.org dist