Recently, Flume has been used for data collection. The spooldir source has the following problems:
If a line of a file contains garbled characters that do not comply with the specified encoding, Flume throws an exception and stops right there.
Once a file in the folder watched by spooldir is modified, Flume throws an exception and stops right there.
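The first problem can typically be mitigated with the spooling directory source's decodeErrorPolicy property. A minimal sketch, in which the agent/component names (a1, r1) and the spool path are placeholders rather than anything from the original post:

    a1.sources = r1
    a1.sources.r1.type = spooldir
    # Directory to watch (placeholder path)
    a1.sources.r1.spoolDir = /var/log/spool
    # Expected encoding of incoming files
    a1.sources.r1.inputCharset = UTF-8
    # REPLACE substitutes U+FFFD for undecodable bytes instead of failing;
    # IGNORE drops them; FAIL (the default) raises the exception described above
    a1.sources.r1.decodeErrorPolicy = REPLACE

The second problem is by design: the spooling directory source requires files to be immutable once they have been dropped into the watched directory.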
First, netcat source + memory channel + logger sink.
1. Modify the configuration
1) Modify the flume-env.sh file under $FLUME_HOME/conf as follows:
    export JAVA_HOME=/opt/modules/jdk1.7.0_67
2) Under the $FLUME_HOME/conf directory, create an agent subdirectory, then create a new netcat-memory-logger.conf with the following configuration:
    # netcat-memory-logger
    # Name the components in this agent
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    # Describe/
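The excerpt cuts off at the "# Describe/" comment. Following the standard netcat example in the Flume user guide, the rest of netcat-memory-logger.conf would typically read (port 44444 is the user guide's example value, not taken from the original post):

    # Describe/configure the source
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444
    # Describe the sink
    a1.sinks.k1.type = logger
    # Use a channel which buffers events in memory
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100
    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1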
1. First, you need to start Flume with HTTP monitoring enabled; refer to the blog post on Flume's monitoring parameters. That is, at http://localhost:3000/metrics you can then access the following content.
2. Install the Flume monitor plugin in Open-Falcon; refer to the official documentation at http://book.open-falcon.org/zh_0_2/usage/flume.html. The official documentation is very unclear, so please refer to the next steps in t
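Flume's built-in HTTP metrics reporting is enabled with JVM system properties at startup. A sketch, assuming port 3000 to match the URL above; the conf paths and agent name are placeholders:

    flume-ng agent --conf ./conf --conf-file ./conf/agent.conf --name a1 \
        -Dflume.monitoring.type=http \
        -Dflume.monitoring.port=3000

The counters are then served as JSON under the /metrics path, which is presumably what the Open-Falcon plugin reads.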
win7 + ubuntu16.04 + flume1.8.0
1. Download apache-flume-1.8.0-bin.tar.gz from http://flume.apache.org/download.html
2. Unzip it into /usr/local/flume
3. Edit the /etc/profile configuration file to add Flume to the path:
① vi /etc/profile
    export FLUME_HOME=/usr/local/flume
    export PATH=$PATH:$FLUME_HOME/bin
② Make the configuration take effect immediately:
    source /etc/profile
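As a quick sanity check after updating the PATH, the flume-ng script can report its own version:

    source /etc/profile
    flume-ng version
    # Should print something like "Flume 1.8.0" if the installation is on the PATH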
1. Download and unzip the installation package: http://flume.apache.org/download.html
   Decompress: tar zxvf apache-flume-1.8.0-bin.tar.gz
2. Configure environment variables:
   vi ~/.bashrc
   export FLUME_HOME=/hmaster/flume/apache-flume-1.8.0-bin
   export FLUME_CONF_DIR=$
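The last export is cut off. FLUME_CONF_DIR conventionally points at the distribution's conf directory, so a typical completion of this .bashrc block (an assumption, since the original value is truncated) would be:

    export FLUME_HOME=/hmaster/flume/apache-flume-1.8.0-bin
    # Conventional value; the original post's actual setting is cut off
    export FLUME_CONF_DIR=$FLUME_HOME/conf
    export PATH=$PATH:$FLUME_HOME/bin
    # Reload the shell configuration
    source ~/.bashrc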
Original link: http://www.tuicool.com/articles/Z73UZf6
The data collected on HADOOP2 and HADOOP3 is sent to HADOOP1, and HADOOP1 in turn sends it on to a number of different destinations.
I. Overview
1. There are three machines, HADOOP1, HADOOP2, and HADOOP3, with HADOOP1 acting as the log aggregation node.
2. HADOOP1 outputs the aggregated logs to multiple targets simultaneously.
3. In Flume, one data source corresponds to multiple channels and multiple sinks; this is configured in the consolidation-accepter.conf file, roughly as sketched below.
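A hedged sketch of what such a fan-out section of consolidation-accepter.conf might look like; the component names are illustrative assumptions, since the excerpt does not show the actual file:

    # One source replicated into two channels, each drained by its own sink
    agent.sources = r1
    agent.channels = c1 c2
    agent.sinks = k1 k2
    # The replicating selector (Flume's default) copies every event to all listed channels
    agent.sources.r1.selector.type = replicating
    agent.sources.r1.channels = c1 c2
    # Each sink reads from its own channel and writes to a different destination
    agent.sinks.k1.channel = c1
    agent.sinks.k2.channel = c2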
IP implementation. Pasted below is the configuration used for the test. The configuration is the same in both cases; when using it, just comment or uncomment the sinkgroup lines. This is the configuration of the collection node:
    # Flume configuration file
    agent1.sources = execSource
    agent1.sinks = avroSink1 avroSink2
    agent1.channels = fileChannel
    # sink groups affect performance very much
    #agent1.sinkgroups = avroGroup
    #agent1.sinkgroups.avroGroup.sinks = avroSink1 avroSink2
    # sink scheduling mode: load_balance or failover
    #agent1.sinkgroups
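The commented-out sinkgroup block is cut off. A sketch of how the scheduling mode would typically be completed, with property names as documented for Flume's sink processors (the round_robin selector is an assumption):

    # Load-balancing sink processor spreading events across both Avro sinks
    agent1.sinkgroups = avroGroup
    agent1.sinkgroups.avroGroup.sinks = avroSink1 avroSink2
    agent1.sinkgroups.avroGroup.processor.type = load_balance
    agent1.sinkgroups.avroGroup.processor.selector = round_robin
    # For failover instead, set the type to failover and give each sink a priority:
    # agent1.sinkgroups.avroGroup.processor.type = failover
    # agent1.sinkgroups.avroGroup.processor.priority.avroSink1 = 10
    # agent1.sinkgroups.avroGroup.processor.priority.avroSink2 = 5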
The previous post described how to produce data with a Thrift source; today describes how to use the Kafka sink to deliver that data to Kafka for consumption. In fact, the Kafka sink is already set up in the Flume configuration file:
    agent1.sinks.kafkaSink.type = org.apache.flume.sink.kafka.KafkaSink
    agent1.sinks.kafkaSink.topic = TRAFFIC_LOG
    agent1.sinks.kafkaSink.brokerList = 10.208.129.3:9092,10.208.129.4:9092,10.208.129.5:9092
    agent1.sinks.kafkaSink.metadata.broker.list = 10.
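To verify that events actually reach the topic, Kafka's console consumer can be pointed at the same cluster. A sketch; note that older Kafka clients (contemporary with the metadata.broker.list style above) take --zookeeper instead of --bootstrap-server:

    # Read the topic from the beginning to confirm Flume is delivering events
    bin/kafka-console-consumer.sh \
        --bootstrap-server 10.208.129.3:9092 \
        --topic TRAFFIC_LOG \
        --from-beginning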
Flume architecture: Flume consists mainly of three components, the source, the channel, and the sink, and these three components move events along the Flume data flow, or pipeline. Their roles can be seen from Flume's own introduction: when a Flume source receives an event, it stores it into one or more channels. The channel is a passive store that keeps th
Flume 1.7: Installing and running under Windows
Install Java and configure environment variables.
Install Flume: download it from the official website, http://flume.apache.org/, and simply unzip it after downloading.
Second, running it. Create a configuration file: create an example.conf under the extracted directory apache-flume-1.6.0-bin/conf, as follows (the run step is sketched after this).
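The excerpt ends before the file contents; in the Flume user guide, example.conf is the standard netcat-to-logger configuration (the same one completed earlier on this page), so only the run step is sketched here. The binary distribution ships a flume-ng.cmd wrapper for Windows; assuming it accepts the same flags as the Unix script:

    cd apache-flume-1.6.0-bin
    bin\flume-ng.cmd agent --conf conf --conf-file conf\example.conf --name a1 ^
        -Dflume.root.logger=INFO,console
    # In a second console, send a test line to the netcat source:
    telnet localhost 44444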
Recently, after listening to Liao Liang's 2016 big data Spark "mushroom cloud" course, I needed to integrate Flume, Kafka, and Spark Streaming. It felt hard to get started with all of it at once, so I started from something simple. My idea: Flume produces the data and then outputs it to Spark Streaming. The Flume source is netcat (address: localhost, port 22222), and the output is Avro (addre
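A sketch of the Flume side of that pipeline, based on the details the excerpt does give (netcat on localhost:22222, Avro out). The Avro target host/port and the component names are illustrative assumptions, since the excerpt cuts off before the Avro address:

    # netcat in -> memory channel -> avro out (toward the Spark Streaming receiver)
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 22222
    a1.channels.c1.type = memory
    a1.sinks.k1.type = avro
    # Placeholder address: wherever the Spark Streaming Avro receiver listens
    a1.sinks.k1.hostname = localhost
    a1.sinks.k1.port = 23333
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1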
First, Flume. Flume is a distributed, reliable, available, and very efficient service for collecting, aggregating, and moving large volumes of log data.
1. Ways to structure a deployment:
1) All applications use one Flume server;
2) All applications share a Flume cluster;
3) Each application uses its own Flume agent, which then forwards to a flume
The first is a basic introduction to Flume's components and what each one does:
Agent: Runs Flume in a JVM. Each machine runs one agent, but a single agent can contain multiple sources and sinks.
Client: Produces the data; runs in a separate thread.
Source: Collects data from the client and passes it to t
Flume is a highly available, highly reliable, distributed system for massive log capture, aggregation, and transmission, provided by Cloudera. Flume supports customizing the various data senders in a logging system in order to collect data; at the same time, Flume provides the ability to apply simple processing to the data and write it out to various (customizable) data receivers.
II. Deploying Flume to collect and aggregate the logs
1. Run on HADOOP1:
    flume-ng agent --conf ./ -f consolidation
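The command is truncated. Given the overview's mention of consolidation-accepter.conf, a plausible full form (the agent name agent1 and the console logger option are assumptions) would be:

    flume-ng agent --conf ./ -f consolidation-accepter.conf -n agent1 \
        -Dflume.root.logger=INFO,console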