Flume and Kafka example (Kafka as the Flume sink, output to a Kafka topic)
To prepare the work:
$ sudo mkdir -p /flume/web_spooldir
$ sudo chmod a+w -R /flume
To edit a Flume configuration file:
$ cat /home/tester/flafka/spooldir_kafka.conf
# Name the components in this agent
Agent1.sources = Weblogsrc
Agent1.sinks = Kafka-sink
Agent1.channe
Collecting user behavior data is undoubtedly a prerequisite for building a recommendation system, and the Flume project under the Apache Foundation is tailored for distributed log collection. This is the first of the Flume research notes and mainly introduces Flume's basic architecture. The next note will illustrate the deployment and usage steps of Flume with an
Flume official document translation: Flume 1.7.0 User Guide (unreleased version) (I)
Flume official document translation: Flume 1.7.0 User Guide (unreleased version) (II)
Flume Properties
Property Name | Default | Description
flume.call
In Flume 1.5.2, to get Flume metrics through HTTP monitoring, append the following to the startup command: -Dflume.monitoring.type=http -Dflume.monitoring.port=34545
Monitoring
The -D properties can be read directly through System.getProperties(), so the two properties above are read by the method loadMonitoring(), which is in Flume's entry-point application: private void loadMonitoring()
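With HTTP monitoring enabled as described, the agent reports its counters as JSON. The following is a minimal sketch of how a client might interpret that output, not Flume's own code. It assumes the metrics are served under the `/metrics` path and that each component reports string-valued counters such as `ChannelSize` and `ChannelCapacity`. The component names `ch1` and `src1` and the sample payload are hypothetical.

```python
import json

# Hypothetical sample payload; the component names and values are made up.
# Assumption: counters arrive as strings, keyed by "<TYPE>.<component name>".
SAMPLE_METRICS = """
{
  "CHANNEL.ch1": {"Type": "CHANNEL", "ChannelSize": "25", "ChannelCapacity": "100"},
  "SOURCE.src1": {"Type": "SOURCE", "EventReceivedCount": "40"}
}
"""


def metrics_url(host, port=34545):
    """Build the URL to poll; 34545 is the port from the -D flag in the text."""
    return f"http://{host}:{port}/metrics"


def channel_fill_percent(metrics_json):
    """Return {channel name: fill percentage} for every channel in the payload."""
    metrics = json.loads(metrics_json)
    fill = {}
    for name, counters in metrics.items():
        if counters.get("Type") != "CHANNEL":
            continue  # skip sources and sinks
        size = float(counters["ChannelSize"])
        capacity = float(counters["ChannelCapacity"])
        fill[name] = 100.0 * size / capacity
    return fill
```

In practice the payload would be fetched from `metrics_url(host)` with any HTTP client and passed to `channel_fill_percent`; a channel that stays close to 100% suggests the sink cannot keep up with the source.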
Architecture of Flume: data flow and basic components. Flume's internal architecture is built on three components: Source, Channel, and Sink. The Channel is the data store, which holds all the data inside Flume; the Source is similar to a producer: it accepts external data and saves the data to ch
1. Pipe overview and related API usage
1.1 Key concepts related to pipes
The pipe is one of the original UNIX IPC forms supported by Linux and has the following features:
A pipe is half-duplex: data can flow in only one direction, so two pipes must be created when both sides need to communicate;
It can only be used between parent and child processes or between sibling processes (related processes);
Separate form a separate fil
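The half-duplex property listed above can be illustrated with a small sketch: because each pipe carries data in one direction only, a parent and child that want to exchange a request and a reply need two pipes. This example assumes a POSIX system (it uses `os.fork`), and the function name `ping_pong` is made up for illustration.

```python
import os


def ping_pong(message: bytes) -> bytes:
    """Send a message to a child process and read back its uppercase reply.

    Two pipes are created because each pipe is one-directional:
    one carries parent -> child, the other child -> parent.
    """
    to_child_r, to_child_w = os.pipe()
    to_parent_r, to_parent_w = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: close the ends it does not use, then answer the request.
        os.close(to_child_w)
        os.close(to_parent_r)
        data = os.read(to_child_r, 1024)
        os.write(to_parent_w, data.upper())
        os.close(to_child_r)
        os.close(to_parent_w)
        os._exit(0)

    # Parent: close the ends it does not use, send, then wait for the reply.
    os.close(to_child_r)
    os.close(to_parent_w)
    os.write(to_child_w, message)
    os.close(to_child_w)
    reply = os.read(to_parent_r, 1024)
    os.close(to_parent_r)
    os.waitpid(pid, 0)
    return reply
```

The pipes work here only because the child inherits the file descriptors from its parent, which matches the restriction to related processes.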
Building a web log collection system with Flume + Solr + log4j
Preface
Many web applications use ELK as the log collection system. Flume is used here because of familiarity with the Hadoop framework and because Flume has many advantages.
For details about Apache Hadoop Ecosystem, click here.
The official Cloudera tutorial is based on this example. get-started-with-h
1. What is Flume?
As a real-time log collection system developed by Cloudera, Flume is recognized and widely used in the industry. The initial releases of Flume are now collectively known as Flume OG (original generation), which belonged to Cloudera. However, with the expansion of the
, logger, Avro, thrift, IPC, file, NULL, HBase, SOLR, Custom, and more.
Understanding Source, Channel, and Sink:
The Source is the water source: it is the entry point for data coming into the agent;
The Channel is the pipeline: the data obtained by the Source flows through it, and its main role is to transmit and store data;
The Sink is the drain: it receives incoming data from the Channel and outputs the data to a specified place.
You can think of the agent as a water
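The division of roles described above can be sketched as a toy in-memory pipeline. This is an illustration of the idea only and does not use Flume's API: the class `MemoryChannel` and the functions `run_source`, `run_sink`, and `run_agent` are hypothetical names, and a real agent runs the source and sink concurrently rather than one after the other.

```python
import queue


class MemoryChannel:
    """Toy channel: a bounded buffer that sits between the source and the sink."""

    def __init__(self, capacity=100):
        self._events = queue.Queue(maxsize=capacity)

    def put(self, event):
        # Raises queue.Full when the channel is at capacity.
        self._events.put_nowait(event)

    def take(self):
        """Return the next event, or None when the channel is empty."""
        try:
            return self._events.get_nowait()
        except queue.Empty:
            return None


def run_source(lines, channel):
    """Source: accept external data and store it in the channel as byte events."""
    for line in lines:
        channel.put(line.encode("utf-8"))


def run_sink(channel, destination):
    """Sink: drain the channel and write each event to the destination."""
    while True:
        event = channel.take()
        if event is None:
            break
        destination.append(event)


def run_agent(lines):
    """Agent: wire one source, one channel, and one sink together."""
    channel = MemoryChannel()
    delivered = []
    run_source(lines, channel)
    run_sink(channel, delivered)
    return delivered
```

The channel is the only point of contact between the two sides, so the source and sink never call each other directly.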
[Flume] Example: using Flume to deliver web logs to HDFS
Create the directory on HDFS where the logs are stored:
$ hdfs dfs -mkdir -p /test001/weblogsflume
Specify the log input directory:
$ sudo mkdir -p /flume/weblogsmiddle
Allow the log directory to be accessed by any user:
$ sudo chmod a+w -R /flume
Set the configuration file contents:
$
Apache Flume is a distributed, reliable, and efficient system that collects, aggregates, and moves data from disparate sources to a centralized data store. Apache Flume is not used only for log collection. Because data sources are customizable, Flume can be used to transport large volumes of custom event data, including but not limited to website traffic
Is Flume a good fit for your problem?
If you need to ingest textual log data into Hadoop/HDFS, then Flume is the right fit for your problem, full stop. For other use cases, here are some guidelines:
Flume is designed to transport and ingest regularly-generated event data over relatively stable, potentially complex topologies. The notion of "event data" is very broadly defined. To Flume
There are many failover examples on the Internet, but the approaches vary. Personally, I prefer to follow the single-responsibility principle:
1. One machine runs one Flume agent.
2. An agent's downstream sink points to one Flume agent; do not configure multiple ports on one Flume agent (this impacts performance).
3. Configure each machine separately, so you can avoid a driver,
1. Introduction
Recently, while working on big data analysis, I used Flume for the collection part, so I deliberately spent some time understanding Flume's working principle and mechanisms. My approach to understanding a new system is to first get a rough grasp of its principles, then read the source code to understand some of its key implementation parts, and fina
Flume: basic knowledge of Flume, sources, and sinks
Contents
Basic concepts
Common sources
Common sinks
Basic concepts
What is Flume?
A distributed, reliable tool for collecting, aggregating, and moving large volumes of logs.
Events
An event, which is the byte data of one row of data, is the basic unit of Flume sending f
Flume NG overview:
Flume NG is a distributed, highly available, reliable system that collects, moves, and stores large amounts of disparate data into a single data storage system. It is lightweight, simple to configure, suitable for many kinds of log collection, and supports failover and load balancing. An agent consists of three parts: Source, Channel, and Sink. Their duties are as follows:
Source: Used to consume (collect) th
I haven't written a blog post for a long time. We have recently been studying Storm, Flume, and Kafka. Today I will write down the scenarios and conclusions from testing Flume failover and load balancing.
The test environment contains five configuration files, that is, five agents.
A main configuration file, that is, the configuration file (flume-sink.properties) for configur
Implementation Architecture
A scenario implementation architecture is shown in the following illustration:
3.1 Analysis of the producer layer
Services within the PaaS platform are assumed to be deployed in Docker containers, so to meet non-functional requirements, a separate process is responsible for collecting logs and therefore does not intrude into the service frameworks and processes. Flume NG is used for log collection; this open source component is very powerful
I. Introduction to Flume
Flume is a distributed, reliable, and highly available system for aggregating massive volumes of logs. It supports customizing various data senders in the system to collect data, and it can also perform simple processing on the data and write it to various (customizable) data receivers.
Design goals:
(1) Reliability
When a node fails, logs can be transmitted to other nodes without loss.
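The reliability goal can be sketched as a priority-based failover loop: an event is offered to the preferred node first and, if that node is down, to the next one, so nothing is dropped while at least one node is alive. This is a simplified illustration, not Flume's implementation. The function `deliver_with_failover`, the tuple layout of `sinks`, and the use of `ConnectionError` to signal a failed node are all assumptions made for the example.

```python
def deliver_with_failover(event, sinks):
    """Try each sink in descending priority order until one accepts the event.

    `sinks` is a list of (priority, name, send) tuples, where `send` is a
    callable that raises ConnectionError when its node is unavailable.
    Returns the name of the sink that accepted the event.
    """
    for _priority, name, send in sorted(sinks, key=lambda s: s[0], reverse=True):
        try:
            send(event)
            return name
        except ConnectionError:
            continue  # this node failed; fall through to the next one
    # No node accepted the event; surface the failure instead of losing data silently.
    raise RuntimeError("all sinks failed; event not delivered")
```

A caller that receives the `RuntimeError` would keep the event buffered and retry later, which is what lets the data survive until a node recovers.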