Use Apache Flume to read messages from a JMS message queue and write them to HDFS. The Flume agent configuration (flume-agent.conf) looks like the following:
  # Name the components in this agent
  agentHdfs.sources = jms_source
  agentHdfs.sinks = hdfs_sink
  agentHdfs.channels = mem_channel
  # Describe/configure the source
  agentHdfs.sources.jms_source.type = jms
  agentHdfs.sources.jms_source.initialCont...
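For reference, a minimal sketch of what a complete JMS-to-HDFS agent might look like, using the property names from the Flume user guide; the broker URL, JNDI names, queue name and HDFS path below are assumptions, not values from the original article.
  # flume-agent.conf (sketch only; connection details are placeholders)
  agentHdfs.sources = jms_source
  agentHdfs.sinks = hdfs_sink
  agentHdfs.channels = mem_channel
  # JMS source (values assumed, e.g. an ActiveMQ broker)
  agentHdfs.sources.jms_source.type = jms
  agentHdfs.sources.jms_source.initialContextFactory = org.apache.activemq.jndi.ActiveMQInitialContextFactory
  agentHdfs.sources.jms_source.connectionFactory = ConnectionFactory
  agentHdfs.sources.jms_source.providerURL = tcp://mqserver:61616
  agentHdfs.sources.jms_source.destinationName = BUSINESS_DATA
  agentHdfs.sources.jms_source.destinationType = QUEUE
  agentHdfs.sources.jms_source.channels = mem_channel
  # HDFS sink (path is a placeholder)
  agentHdfs.sinks.hdfs_sink.type = hdfs
  agentHdfs.sinks.hdfs_sink.hdfs.path = hdfs://namenode:8020/flume/jms/%Y%m%d
  agentHdfs.sinks.hdfs_sink.hdfs.fileType = DataStream
  agentHdfs.sinks.hdfs_sink.hdfs.useLocalTimeStamp = true
  agentHdfs.sinks.hdfs_sink.channel = mem_channel
  # Memory channel
  agentHdfs.channels.mem_channel.type = memory
  agentHdfs.channels.mem_channel.capacity = 10000
  agentHdfs.channels.mem_channel.transactionCapacity = 1000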
Using Flume to collect log data in real time and upload it to Kafka
1. On Linux, with ZooKeeper already configured, start ZooKeeper first:
  sbin/zkServer.sh start
  (sbin/zkServer.sh status shows the startup state; jps should list the QuorumPeerMain process)
2. Start Kafka; ZooKeeper must be running before Kafka:
  bin/kafka-server-start.sh config/server.properties
3. Start a consumer to receive the log:
  bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic topic-004
(Before you can consume, the topic needs to be created first; see the sketch below.)
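A minimal sketch of creating the topic and pushing a test message, assuming an older Kafka distribution whose tooling still takes --zookeeper/--broker-list (consistent with the consumer command above); the broker address is an assumption.
  # Create the topic the consumer above subscribes to (assumed single-node settings)
  bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic topic-004
  # Optional smoke test: type a line and it should appear in the console consumer
  bin/kafka-console-producer.sh --broker-list localhost:9092 --topic topic-004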
win7 + ubuntu16.04 + flume 1.8.0
1. Download apache-flume-1.8.0-bin.tar.gz from http://flume.apache.org/download.html
2. Unzip it into /usr/local/flume
3. Add the Flume path to the /etc/profile configuration file:
  ① vi /etc/profile
    export FLUME_HOME=/usr/local/flume
    export PATH=$PATH:$FLUME_HOME/bin
  ② Make the configuration take effect immediately:
    source /etc/profile
1. Download and unzip the installation package: http://flume.apache.org/download.html
  Decompression: tar zxvf apache-flume-1.8.0-bin.tar.gz
2. Configure environment variables:
  vi ~/.bashrc
  export FLUME_HOME=/hmaster/flume/apache-flume-1.8.0-bin
  export FLUME_CONF_DIR=$...
Original link: http://www.tuicool.com/articles/Z73UZf6
Data collected on HADOOP2 and HADOOP3 is sent to HADOOP1, and HADOOP1 forwards the aggregated data to several different destinations.
I. Overview
1. There are three machines, HADOOP1, HADOOP2 and HADOOP3; HADOOP1 is used for log aggregation.
2. HADOOP1 outputs the aggregated data to multiple targets at the same time.
3. In Flume, one data source corresponds to multiple channels and multiple sinks; this is configured in the... (a sketch of such a fan-out configuration follows below).
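A minimal sketch of the fan-out agent on HADOOP1, assuming the two upstream machines forward events over Avro and that the two destinations are HDFS and a console logger; the port, HDFS path and sink choices are assumptions, not values from the original article.
  # hadoop1-fanout.conf: one source replicated to two channels/sinks (sketch)
  a1.sources = r1
  a1.channels = c1 c2
  a1.sinks = k1 k2
  # Avro source receiving from the agents on HADOOP2/HADOOP3
  a1.sources.r1.type = avro
  a1.sources.r1.bind = 0.0.0.0
  a1.sources.r1.port = 4141
  # replicating selector copies every event into both channels
  a1.sources.r1.selector.type = replicating
  a1.sources.r1.channels = c1 c2
  a1.channels.c1.type = memory
  a1.channels.c2.type = memory
  # Destination 1: HDFS (path is a placeholder)
  a1.sinks.k1.type = hdfs
  a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/collected/%Y%m%d
  a1.sinks.k1.hdfs.fileType = DataStream
  a1.sinks.k1.hdfs.useLocalTimeStamp = true
  a1.sinks.k1.channel = c1
  # Destination 2: console logger, useful while testing
  a1.sinks.k2.type = logger
  a1.sinks.k2.channel = c2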
Continuing today with the configuration of several agents. The first agent captures the output of a particular command executed in the terminal and writes it to a file in a specific directory. First look at the configuration:
  # the source that captures the command output
  agent2.sources = execsource
  # the sink that writes to a file
  agent2.sinks = filesink
  # the channel connecting them
  agent2.channels = filechannel
  # source type
  agent2.sources.execsource.type = exec
  agent2.sources.execs...
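A minimal sketch of how this exec-to-file agent could be completed, assuming the command is a tail -F on some log file and that a file channel and a file_roll sink are wanted; the command, directories and roll interval are assumptions.
  # Exec source: run a command and turn each output line into an event
  agent2.sources.execsource.command = tail -F /var/log/app/app.log
  agent2.sources.execsource.channels = filechannel
  # File channel: durable buffering on local disk
  agent2.channels.filechannel.type = file
  agent2.channels.filechannel.checkpointDir = /var/flume/checkpoint
  agent2.channels.filechannel.dataDirs = /var/flume/data
  # file_roll sink: write events into files under a specific directory
  agent2.sinks.filesink.type = file_roll
  agent2.sinks.filesink.sink.directory = /var/flume/output
  agent2.sinks.filesink.sink.rollInterval = 300
  agent2.sinks.filesink.channel = filechannel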
A log collection architecture based on Flume + log4j + Kafka. This article shows how to use Flume, log4j and Kafka for standardized log collection. Flume basic concepts: Flume is a mature, powerful log collection tool, and there are many examples of and much material on its configuration online, so only a brief explanation is given here.
Official documentation for the parameters: http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
Note the file format: hdfs.fileType defaults to SequenceFile, a Hadoop file format; set it to DataStream so the output can be read directly as plain text. (How to make use of SequenceFile is still unclear to me.)
Configuration file: hdfs.conf
  a1.sources = r1
  a1.sinks = k1
  a1.channels = c1
  # Describe/configure the source
  a1.sources.r1.type = spooldir
  a1.sources.r1.channels = c1
  a1.sources.r1.spoolDir = /usr/local/hadoop/apache-...
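A minimal sketch of how the rest of hdfs.conf might look, showing the hdfs.fileType = DataStream setting discussed above; the HDFS path and the channel sizing are assumptions.
  # HDFS sink: write the spooled files to HDFS as plain text (fileType defaults to SequenceFile)
  a1.sinks.k1.type = hdfs
  a1.sinks.k1.hdfs.path = hdfs://localhost:9000/flume/spool/%Y%m%d
  a1.sinks.k1.hdfs.fileType = DataStream
  a1.sinks.k1.hdfs.writeFormat = Text
  a1.sinks.k1.hdfs.useLocalTimeStamp = true
  a1.sinks.k1.channel = c1
  # Memory channel
  a1.channels.c1.type = memory
  a1.channels.c1.capacity = 1000
  a1.channels.c1.transactionCapacity = 100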
Flume and Sqoop are both Hadoop data integration and collection systems, but they are positioned differently; the following introduction is based on personal experience and understanding. Flume was developed by Cloudera and has two major product lines: Flume-OG and Flume-NG. The Flume-OG architecture was too complex and could lose data, so it was abandoned; what is used now is Flume-NG, mainly l...
Flume acquisition process:
# Note: in this case Flume listens to the directory /home/hadoop/flume_kafka and collects into Kafka.
Start the cluster, start Kafka, then start the agent:
  flume-ng agent -c . -f /home/hadoop/flume-1.7.0/conf/myconf/flume-kafka.conf -n a1 -Dflume.root.logger=INFO,console
Open c...
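A minimal sketch of what flume-kafka.conf might contain for this case, assuming a spooling-directory source watching /home/hadoop/flume_kafka and the Flume 1.7 Kafka sink; the broker address and topic name are assumptions.
  a1.sources = r1
  a1.channels = c1
  a1.sinks = k1
  # Watch the directory mentioned above for new files
  a1.sources.r1.type = spooldir
  a1.sources.r1.spoolDir = /home/hadoop/flume_kafka
  a1.sources.r1.channels = c1
  a1.channels.c1.type = memory
  a1.channels.c1.capacity = 10000
  a1.channels.c1.transactionCapacity = 1000
  # Kafka sink (Flume 1.7+ property names; broker and topic are placeholders)
  a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
  a1.sinks.k1.kafka.bootstrap.servers = localhost:9092
  a1.sinks.k1.kafka.topic = flume-kafka-topic
  a1.sinks.k1.channel = c1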
A Flume task is an agent, which consists of three parts (source, channel and sink), as shown in the figure.
The focus here is mainly on the source and the sink.
Sources are divided into active sources and passive sources.
Sinks include, for example, an HDFS client, a Kafka client, and so on.
  tar -zxvf apache-flume-1.6.0-bin.tar.gz
Configure environment variables:
  vim ~/.bash_profile
  ...
  source ~/.bash_profile
  vim test01
  # example.conf: A single-node...
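For completeness, here is the canonical single-node example from the Flume user guide that this file usually contains (netcat source, logger sink, memory channel); treat it as a reference sketch rather than the exact contents of the original test01.
  # example.conf: A single-node Flume configuration
  a1.sources = r1
  a1.sinks = k1
  a1.channels = c1
  # Netcat source listening on localhost:44444
  a1.sources.r1.type = netcat
  a1.sources.r1.bind = localhost
  a1.sources.r1.port = 44444
  # Logger sink prints events to the console
  a1.sinks.k1.type = logger
  # Memory channel buffering events between source and sink
  a1.channels.c1.type = memory
  a1.channels.c1.capacity = 1000
  a1.channels.c1.transactionCapacity = 100
  # Bind the source and sink to the channel
  a1.sources.r1.channels = c1
  a1.sinks.k1.channel = c1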
Create a new Java project and edit the POM file, removing the parent. Removing the parent and the rat plugin avoids common errors that occur at compile time (https://issues.apache.org/jira/browse/FLUME-1372). A custom sink implementation needs to extend AbstractSink, implement the Configurable interface, and override some of its methods, as follows:
  package com.cmcc.chiwei.kafka;
  import java.u...
..., we first use Gson to deserialize it into a Java object and then take the log field we care about to get the original log text; the rest of the processing is the same as before.
  in.tell();
  String preReadLine = readSingleLine();
  if (preReadLine == null) return null;
  // if the log is wrapped by the docker log format,
  // we should extract the original log first
  if (wrappedByDocker) {
      DockerLog dockerLog = GSON.fromJson(preReadLine, DockerLog.class);
      preReadLine = dockerLog.getLog();
  }
This allows the agent to c...
First of all, installation of the tools is not explained here; there is plenty of material online that you can consult yourself. Here we use an example to illustrate the configuration of each tool and the final result. Suppose we have a batch of tracklog logs that need to be displayed in ELK in real time. First, collect the logs with the Flume tool: place an agent on the log server side that sends to the collector tier, configured as follows (there can be more than one agent):
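A minimal sketch of what such an agent-to-collector configuration could look like, assuming the agent tails the tracklog file and forwards events to a collector over Avro; the log path, collector host and port are assumptions, not values from the original article.
  # agent tier (one per log server)
  agent.sources = r1
  agent.channels = c1
  agent.sinks = k1
  # Tail the tracklog file
  agent.sources.r1.type = exec
  agent.sources.r1.command = tail -F /data/logs/tracklog/access.log
  agent.sources.r1.channels = c1
  agent.channels.c1.type = memory
  agent.channels.c1.capacity = 10000
  # Forward to the collector over Avro
  agent.sinks.k1.type = avro
  agent.sinks.k1.hostname = collector-host
  agent.sinks.k1.port = 4545
  agent.sinks.k1.channel = c1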
Teacher Liaoliang's course: the 2016 Big Data Spark "Mushroom Cloud" action, a job on Spark Streaming consuming, in the direct way, Kafka data collected by Flume.
I. Basic background
Spark Streaming can get Kafka data in two ways, the receiver way and the direct way; this article describes the direct way. The process is roughly this:
1. In direct mode, Spark connects to the Kafka nodes directly to obtain data.
2. The direct-based approach periodically queries Kafka to obtain the latest...
When using Flume, network and HDFS problems can leave some of the logs Flume collects into HDFS in an abnormal state, which shows up as:
1. Files that were never closed: files ending in .tmp (the default suffix). The files written to HDFS should be gz-compressed files, and a file still ending in .tmp cannot be used.
2. Files of size 0, for example a gz-compressed file whose size is 0; if we take such a file alone and d...
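One way to reduce the number of unclosed .tmp files is to make the HDFS sink roll and close files more aggressively; a minimal sketch of the relevant settings follows (the sink name, thresholds and codec are assumptions, not taken from the original post).
  # Roll files by time/size so they get closed promptly
  a1.sinks.k1.type = hdfs
  a1.sinks.k1.hdfs.codeC = gzip
  a1.sinks.k1.hdfs.fileType = CompressedStream
  a1.sinks.k1.hdfs.rollInterval = 300
  a1.sinks.k1.hdfs.rollSize = 134217728
  a1.sinks.k1.hdfs.rollCount = 0
  # Close a file that has received no events for 60 seconds
  a1.sinks.k1.hdfs.idleTimeout = 60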
Written up front: I am switching careers into the big data field. I did not sign up for a class and am trying to teach myself; if I can persist, this line of work will suit me, and if not...! I am getting started with this set of it18 screen-recorded videos. Self-study is painful, so I blog to share the results of my learning with everyone, and also to supervise myself and push myself to keep learning. (The teaching videos were given away in an it18 promotion; the recordings are not very complete, and classroom materials such as notes and source code are not...
Preparatory work:
1. Download Flume from the Apache site.
2. Unzip Flume.
3. Modify flume-env.sh and configure JAVA_HOME (see the sketch below).
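A minimal sketch of the flume-env.sh change from step 3; the JDK path is an assumption and should point at the local installation.
  # conf/flume-env.sh
  # assumed JDK location; adjust to the local install
  export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64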
Netcat capture demo:
1. Create netcat-logger.conf in conf/:
  # Define the names of the components in this agent
  a1.sources = r1
  a1.sinks = k1
  a1.channels = c1
  # Describe and configure the source component r1
  a1.sources.r1.type = netcat
  a1.sources.r1.bind = localhost
  a1.sources.r1.port = 44444
  # Describe and configure the sink component k1
  a1....
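Once the configuration is complete, the agent can be started and tested; a sketch of the usual commands follows, assuming the file sits in conf/ under the Flume installation directory.
  # Start the agent defined in netcat-logger.conf
  bin/flume-ng agent --conf conf --conf-file conf/netcat-logger.conf --name a1 -Dflume.root.logger=INFO,console
  # In another terminal, send a test line; it should show up in the agent's console
  telnet localhost 44444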