Flume is a highly available, highly reliable, distributed system for massive log collection, aggregation, and transmission, originally provided by Cloudera. Flume supports customizing the various data senders in a log system to collect data, and it also provides the ability to do simple processing of the data and write it out to various (customizable) data receivers.
Today's meeting discussed why log processing uses both Flume and Kafka, and whether it is possible to use only Kafka without Flume. The idea was to use only Flume's interfaces, whether the input interfaces (socket and file) or the output interfaces.
Kafka Source
The Kafka source is an Apache Kafka consumer that reads messages from a Kafka topic. If you have multiple Kafka sources running, you can configure them with the same consumer group, so each will read a unique set of partitions for the topic.
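As a minimal sketch of such a source (assuming the Flume 1.7+ property names; Flume 1.6 uses zookeeperConnect, topic, and groupId instead, and the broker address and topic below are placeholders):

a1.sources = r1
a1.channels = c1

# Kafka source: joins the given consumer group and reads its share of partitions
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.kafka.bootstrap.servers = localhost:9092
a1.sources.r1.kafka.topics = app-logs
a1.sources.r1.kafka.consumer.group.id = flume-collectors
a1.sources.r1.batchSize = 1000
a1.sources.r1.channels = c1

Running two agents with this same group.id would split the topic's partitions between them.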
Kafka's high-level consumer interface hides the details of the brokers, allowing a consumer to pull data from the brokers without having to care about the network topology. More importantly, in most log systems the broker keeps track of which data a consumer has already acquired, whereas in Kafka this consumption state is maintained by the consumer itself.
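To make that concrete, here is a sketch of standard Kafka consumer properties (the server address and group name are placeholders); the offset-related settings are what let the consumer, rather than the broker, track its own position:

# Hypothetical consumer.properties for a Kafka high-level consumer
bootstrap.servers=localhost:9092
group.id=log-readers
# Where to start when the consumer has no saved position of its own
auto.offset.reset=earliest
# The consumer commits its offsets on its own schedule
enable.auto.commit=true
auto.commit.interval.ms=5000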
Cloudera's Flume
Preparation
ELK official website: https://www.elastic.co/ (package downloads and complete documentation).
ZooKeeper official website: https://zookeeper.apache.org/
Kafka official website: http://kafka.apache.org/documentation.html (package downloads and complete documentation).
Flume official website: https://flume.apache.org/
Heka official website: https://hekad.readthedocs.io/en/v0.10.0/
A log acquisition architecture based on Flume + log4j + Kafka
This article will show you how to use Flume, log4j, and Kafka to standardize log collection.
Flume Basic Concepts
Flume is a mature, powerful log collection tool. Plenty of examples and material about its configuration are available on the internet, so only a brief explanation is given here.
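For orientation, every Flume agent is wired as source -> channel -> sink; a minimal skeleton (the agent name a1 and component names are arbitrary) looks like this:

# Name the components of agent a1
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Wire them together: the source feeds the channel, the sink drains it
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1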
Flume's channels include MemoryChannel, MemoryRecoverChannel, and FileChannel. MemoryChannel achieves high throughput but cannot guarantee data integrity. MemoryRecoverChannel is deprecated; the official documentation recommends FileChannel in its place. FileChannel guarantees the integrity and consistency of the data. When configuring FileChannel specifically, it is recommended to put the FileChannel directory and the program's own log files on different disks to increase efficiency. The sink then determines where the collected data is ultimately stored.
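Following that advice, a FileChannel sketch might keep the checkpoint and data directories on a disk separate from the application's own log files (the paths and capacities below are assumptions):

# Durable, file-backed channel
a1.channels.c1.type = file
# Put channel storage on a different disk than the program's log files
a1.channels.c1.checkpointDir = /data1/flume/checkpoint
a1.channels.c1.dataDirs = /data1/flume/data
a1.channels.c1.capacity = 1000000
a1.channels.c1.transactionCapacity = 10000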
3 Implementing the Architecture
The implementation architecture of this scheme is shown in the following figure:
3.1 Analysis of the Producer Layer
Services within the PaaS platform are assumed to be deployed in Docker containers, so to meet the non-functional requirements, a separate process is responsible for collecting logs; it therefore does not intrude into the service framework or processes. Flume NG, an open-source log collection framework, is used for this.
…the service exception.
3. Send data (input).
4. View the data files in the /tmp/log/flume directory.
Integration with Kafka
Flume can be flexibly integrated with Kafka: Flume focuses on data collection, and Kafka focuses on data distribution. Flume can be configured with a Kafka sink so that the collected events are published to a Kafka topic.
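As an illustration of that division of labor, a single agent that tails an application log and publishes to Kafka might look like the following sketch (the log path, topic, and broker list are assumptions; the sink properties are the Flume 1.6 KafkaSink names):

a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Tail the application log file
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /tmp/log/flume/app.log
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Publish each event to a Kafka topic for downstream distribution
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = app-logs
a1.sinks.k1.brokerList = localhost:9092
a1.sinks.k1.requiredAcks = 1
a1.sinks.k1.batchSize = 100
a1.sinks.k1.channel = c1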
Last time, Flume + Kafka + HBase + ELK was implemented: http://www.cnblogs.com/super-d2/p/5486739.html. This time we can add Storm. A simple configuration for storm-0.9.5 is as follows.
Install the dependencies:
wget http://download.oracle.com/otn-pub/java/jdk/8u45-b14/jdk-8u45-linux-x64.tar.gz
tar zxvf jdk-8u45-linux-x64.tar.gz
Then edit /etc/profile and add the following:
export JAVA_HOME=/home/dir/jdk1.8.0_45
Structure: Nginx -> Flume -> Kafka -> Flume -> Kafka. (Because a cross-datacenter problem was involved, an extra Flume hop was added between the two Kafka clusters, which is a pain.)
Phenomenon: in the second layer, when writing to the Kafka topic…
Background: with the Kafka message bus in place, the data of every system can be aggregated at the Kafka nodes; the next task is to maximize the value of that data and let the data "speak".
Environment preparation: a Kafka server, plus a CDH 5.8.3 server with the Flume, Solr, Hue, HDFS, and ZooKeeper services installed.
Flume provides a scalable, real-time data transmission channel, and Morphlines handle the transformation along the way.
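In a CDH setup like this, the usual route from Flume into Solr is the MorphlineSolrSink, which hands each event to a morphline for transformation before indexing; a hedged sketch (the morphline file path and id are assumptions) is:

# Send events through a morphline and into Solr
a1.sinks.solr.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
a1.sinks.solr.morphlineFile = /etc/flume-ng/conf/morphline.conf
a1.sinks.solr.morphlineId = morphline1
a1.sinks.solr.batchSize = 1000
a1.sinks.solr.batchDurationMillis = 1000
a1.sinks.solr.channel = c1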
The data source used in the previous article took its data from a socket, which is a bit unorthodox; a serious setup takes the data from Kafka or another message queue. According to the official website, the supported sources acquire data in two forms: push and pull. For Spark Streaming integration with Flume, there is (1) the push way, but the pull method is more recommended.
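In the pull-based approach, Flume buffers events in a special SparkSink and the Spark Streaming job polls them out (via FlumeUtils.createPollingStream); a minimal sink sketch, with a placeholder hostname and port, is:

# Spark Streaming pulls events from this sink instead of Flume pushing them
a1.sinks.spark.type = org.apache.spark.streaming.flume.sink.SparkSink
a1.sinks.spark.hostname = localhost
a1.sinks.spark.port = 9999
a1.sinks.spark.channel = c1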
Apache Flume 1.6's sinks support Kafka by default:
[FLUME-2242]-FLUME Sink and Source for Apache Kafka
The official example is very thoughtful; you can run it directly and study the detailed configuration slowly afterwards.
Flume: capture log data in real time and upload it to Kafka
1. On Linux, make sure ZooKeeper is configured, and start ZooKeeper first:
sbin/zkServer.sh start
(sbin/zkServer.sh status shows the startup state; jps should show a QuorumPeerMain process.)
2. Start Kafka (ZooKeeper must be started before Kafka):
bin/kafka-server-start.sh config/server.properties
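Before Flume can upload anything, the target topic has to exist. With the older ZooKeeper-based Kafka tooling that matches these commands (the topic name is a placeholder), that would be something like:

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic flume-logs
# Once Flume starts shipping events, verify them with a console consumer:
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic flume-logs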
…messages are retained in the message queue and can support a large number of consumer subscriptions.
Design ideas for a system architecture using Apache Kafka
Example: Online games
Suppose we are developing an online web game platform that needs to support a large number of users online in real time, where players can cooperate in a virtual world to accomplish tasks together.
I haven't written a blog post for a long time. We have recently been studying Storm, Flume, and Kafka. Today I will write down the scenarios and conclusions from testing Flume failover and load balancing.
The test environment contains five configuration files, that is, five agents.
One of them is the main configuration file, that is, the configuration file…
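For reference, both failover and load balancing in Flume are expressed as sink groups over the same set of sinks; a hedged sketch of the two processor types (the sink names are assumptions) looks like this:

# Failover: prefer k1 (higher priority), fall back to k2 if it fails
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5
a1.sinkgroups.g1.processor.maxpenalty = 10000

# Load balancing: spread events across both sinks instead
# a1.sinkgroups.g1.processor.type = load_balance
# a1.sinkgroups.g1.processor.selector = round_robin
# a1.sinkgroups.g1.processor.backoff = true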