Hello everyone. Our company needed Flume to store messages from MQ into HDFS, so I wrote a custom Flume source. I had only just started working with Flume, so please bear with any mistakes. Looking at the Flume NG source code, custom sources for different scenarios generally extend AbstractSource and implement EventDrivenSource and Configurable. The MQSou
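The excerpt is cut off above, so here is only a minimal sketch of what such an event-driven source skeleton can look like. It is not the author's MQ source: the class name, the brokerUrl property, and the MQ-connection comments are placeholders; only the extended/implemented Flume interfaces come from the text above.

import org.apache.flume.Context;
import org.apache.flume.EventDrivenSource;
import org.apache.flume.conf.Configurable;
import org.apache.flume.source.AbstractSource;

// Minimal sketch of a custom event-driven source; "MyMqSource" is a placeholder name.
public class MyMqSource extends AbstractSource implements EventDrivenSource, Configurable {

    private String brokerUrl;   // hypothetical property read from the agent config

    @Override
    public void configure(Context context) {
        // read properties such as a1.sources.r1.brokerUrl from the flume config file
        brokerUrl = context.getString("brokerUrl", "tcp://localhost:61616");
    }

    @Override
    public synchronized void start() {
        super.start();
        // here one would connect to MQ using brokerUrl and register a message listener;
        // each received message is wrapped in a Flume event and handed to the channel:
        // getChannelProcessor().processEvent(EventBuilder.withBody(messageBytes));
    }

    @Override
    public synchronized void stop() {
        // close the MQ connection here
        super.stop();
    }
}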
Create a new Java project and edit the POM file (the parent section is removed here). Removing the parent and the RAT plugin avoids common errors that occur at compile time, see https://issues.apache.org/jira/browse/FLUME-1372. A custom sink implementation needs to extend AbstractSink, implement the Configurable interface, and override some of its methods, as follows (the snippet is cut off here):

package com.cmcc.chiwei.kafka;
import java.u
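Since the original code is cut off above, the following is a minimal sketch of such a sink, under the assumption that it simply takes events from the channel inside a transaction and forwards them; the class name and the body of process() are placeholders, not the author's Kafka logic.

package com.cmcc.chiwei.kafka;

import org.apache.flume.Channel;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.Transaction;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;

public class MyKafkaSink extends AbstractSink implements Configurable {

    @Override
    public void configure(Context context) {
        // read sink properties (topic, broker list, ...) from the agent config here
    }

    @Override
    public Status process() throws EventDeliveryException {
        Status status = Status.READY;
        Channel channel = getChannel();
        Transaction tx = channel.getTransaction();
        tx.begin();
        try {
            Event event = channel.take();
            if (event == null) {
                status = Status.BACKOFF;   // nothing in the channel right now
            } else {
                // forward event.getBody() to the target system (e.g. a Kafka producer)
            }
            tx.commit();
        } catch (Throwable t) {
            tx.rollback();
            throw new EventDeliveryException(t);
        } finally {
            tx.close();
        }
        return status;
    }
}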
We first use Gson to deserialize it into a Java object, then take the log field we care about to recover the original log text; the rest of the processing is the same as before.

in.tell();
String preReadLine = readSingleLine();
if (preReadLine == null) return null;
// if the log is wrapped by the docker log format,
// we should extract the original log first
if (wrappedByDocker) {
    DockerLog dockerLog = GSON.fromJson(preReadLine, DockerLog.class);
    preReadLine = dockerLog.getLog();
}

This allows the agent to c
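For context, the Docker json-file logging driver wraps each line as a JSON object with log, stream, and time fields. The DockerLog class used above is not shown in the excerpt, so the POJO below is only an assumed shape that Gson could map that JSON onto:

import com.google.gson.Gson;

// Assumed shape of the DockerLog class referenced above; field names follow the
// json-file driver output: {"log":"...","stream":"stdout","time":"..."}
public class DockerLog {
    private String log;
    private String stream;
    private String time;

    public String getLog() { return log; }
    public String getStream() { return stream; }
    public String getTime() { return time; }
}

// usage: String origin = new Gson().fromJson(line, DockerLog.class).getLog();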
Installation of the tools is not covered here; there are plenty of guides online that you can consult yourself. Here we use an example to illustrate how each tool is configured and what the final result looks like. Suppose we have a batch of tracklog logs that need to be displayed in real time in ELK. First, collect the logs with Flume: an agent placed on each log server sends the data to the collector, configured as follows. Agent (there can be multiple):
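The excerpt cuts off before the configuration itself, so the following is only a plausible sketch of the agent tier: an exec source tailing the tracklog file and an avro sink pointing at the collector host. Host name, file path and port are assumptions, not values from the post.

# agent tier: tail the tracklog and forward to the collector over avro
agent.sources = r1
agent.channels = c1
agent.sinks = k1

agent.sources.r1.type = exec
agent.sources.r1.command = tail -F /data/logs/tracklog.log
agent.sources.r1.channels = c1

agent.channels.c1.type = memory
agent.channels.c1.capacity = 10000

agent.sinks.k1.type = avro
agent.sinks.k1.hostname = collector-host
agent.sinks.k1.port = 4545
agent.sinks.k1.channel = c1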
Teacher Liaoliang's course: the 2016 Big Data Spark "Mushroom Cloud" series, a job on Spark Streaming consuming Kafka data (collected by Flume) in the direct way.
First, the basic background. Spark Streaming can get Kafka data in two ways, the receiver way and the direct way; this article describes the direct way. The specific process is this:
1. In direct mode, Spark connects directly to the Kafka nodes to obtain data.
2. The direct-based approach periodically queries Kafka to obtain the latest
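As a minimal sketch of what the direct approach looks like in code, assuming the spark-streaming-kafka 0.8 API that was current around that course; the broker addresses and topic name are placeholders:

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import kafka.serializer.StringDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class DirectKafkaSketch {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("DirectKafkaSketch");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, String> kafkaParams = new HashMap<>();
        kafkaParams.put("metadata.broker.list", "spark1:9092,spark2:9092,spark3:9092");

        Set<String> topics = new HashSet<>();
        topics.add("tracklog");   // placeholder topic name

        // direct stream: no receiver, offsets are tracked by Spark itself
        JavaPairInputDStream<String, String> lines = KafkaUtils.createDirectStream(
                jssc, String.class, String.class,
                StringDecoder.class, StringDecoder.class,
                kafkaParams, topics);

        lines.map(t -> t._2()).print();

        jssc.start();
        jssc.awaitTermination();
    }
}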
While using Flume we found that, due to network issues, HDFS problems and other reasons, some of the logs Flume collected into HDFS were abnormal. This shows up as:
1. Files that were never closed: files ending in .tmp (the default suffix). The files written to HDFS should be gz-compressed files, and a file still ending in .tmp cannot be used.
2. Files with a size of 0, for example a gz file of size 0; if we take such a file alone and d
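One way to guard downstream jobs against these two cases is a small cleanup pass over the HDFS output directory. The sketch below is not the author's fix; the directory path is a placeholder, and the policy (delete rather than move aside) is an assumption.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FlumeHdfsCleanup {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path dir = new Path("/flume/logs/2016-01-01");   // placeholder directory

        for (FileStatus st : fs.listStatus(dir)) {
            String name = st.getPath().getName();
            if (name.endsWith(".tmp") || st.getLen() == 0) {
                // unclosed or empty output: drop it (or move it aside for inspection)
                System.out.println("removing bad file: " + st.getPath());
                fs.delete(st.getPath(), false);
            }
        }
        fs.close();
    }
}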
A note up front: I am switching careers into the big data field. I did not sign up for a class; I am trying to teach myself. If I can stick with it, I will make this my line of work, and if not...! I plan to start with this set of it18 screen recordings. Self-study is painful, so I am blogging to share what I learn with everyone, and also to supervise and push myself to keep studying. (The teaching video recordings were given away during an it18 promotion; the recordings are not complete, and class notes, source code and other classroom-related materials have
For importing logs into the Elasticsearch cluster via Flume, see here: Flume log import into Elasticsearch. Kibana introduction (Kibana home): Kibana is a powerful Elasticsearch data display client. Logstash has Kibana built in, and you can also deploy Kibana on its own. The latest version, Kibana 3, is a pure HTML+JS client and can be deployed very conveniently on Apache, Nginx or any other HTTP server. Address of Kibana 3: https://github.com/elasticsearch
Preparatory work:
1. Download Flume from Apache
2. Unzip Flume
3. Modify flume-env.sh and configure JAVA_HOME, as shown below
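A minimal sketch of the flume-env.sh change; the JDK path is an assumption, point it at your own installation:

# conf/flume-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_144   # placeholder path, use your own JDK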
Netcat capture demo:
1. Create netcat-logger.conf under the conf directory:

# define the name of each component in the agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# describe and configure the source component r1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# describe and configure the sink component k1
a1.
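The excerpt stops at the sink section. For completeness, a common way to finish this demo config looks like the following; the logger sink and memory channel are the usual choices for this exercise, not taken verbatim from the post:

# sink: just log events to the console
a1.sinks.k1.type = logger

# channel: buffer events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# bind source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

The agent can then be started with bin/flume-ng agent -c conf -f conf/netcat-logger.conf -n a1 -Dflume.root.logger=INFO,console and tested with telnet localhost 44444.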
Business background: send the log output generated by a Java project to Flume.
Step one: output the logs to Flume. Configure log4j in the Java program and specify which Flume server to send to:

log4j.rootLogger=INFO,flume
log4j.appender.flume=org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.Hostname=192.168.13.132
log4j.appender.flume.Port=41414

Step two: import java.util.
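The excerpt cuts off at the start of step two. A plausible sketch of the Java side is simply logging through log4j as usual, since the appender above ships every message to the Flume avro source listening on that host and port; the class name and messages here are placeholders:

import java.util.Date;

import org.apache.log4j.Logger;

public class FlumeLog4jDemo {
    private static final Logger LOG = Logger.getLogger(FlumeLog4jDemo.class);

    public static void main(String[] args) throws InterruptedException {
        // every message logged here is forwarded by the Log4jAppender configured above
        while (true) {
            LOG.info("heartbeat at " + new Date());
            Thread.sleep(1000);
        }
    }
}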
Spooling Directory Source: the following two parameters are explained, fileHeader and fileHeaderKey. fileHeader is a boolean that can be configured as true or false and indicates whether, after Flume has read the data, the file name is added to the headers of the wrapped event. fileHeaderKey indicates, when the event does carry such a header (i.e. when fileHeader is configured as true), the key under which that header stores the file name. The basename
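A small configuration sketch of how these two properties are used; the directory path is a placeholder, and "file" is also the documented default for fileHeaderKey:

a1.sources.r1.type = spooldir
# placeholder directory to watch
a1.sources.r1.spoolDir = /data/flume/spool
# add the file name to each event's headers
a1.sources.r1.fileHeader = true
# header key to store it under
a1.sources.r1.fileHeaderKey = file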
pool. Each sink has a priority; the larger the value, the higher the priority, so a priority of 100 ranks above a priority of 80. If the currently active sink fails to send an event, the remaining sink with the highest priority will attempt to send the failed event.
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.k1 = 5
a1.sinkgroups.g1.processor.priority.k2 = 10
a1.sinkgroups.g1.processor.maxpenalty = 10000
The above configuration puts the two sinks k1 and k2 into one failover group
1. Flume is a distributed log collection system that transmits the collected data to its destination.
2. Flume has a core concept called an agent. An agent is a Java process that runs on a log collection node.
3. An agent consists of 3 core components: source, channel and sink.
3.1 The source component is dedicated to collecting logs and can handle log data of various types and formats, including avro, th
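To make the source-channel-sink relationship concrete, here is the bare wiring of a single agent; the names are arbitrary and the component types are left out because they depend on the scenario:

# one agent named a1 with one source, one channel, one sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# a source writes into one or more channels ...
a1.sources.r1.channels = c1
# ... and a sink drains exactly one channel
a1.sinks.k1.channel = c1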
Flume configuration for getting the collected data into the Kafka cluster. Create a new configuration file under the conf directory:

vim conf/file-monitor.conf

# declare the agent
a1.sources=r1
a1.sinks=k1
a1.channels=c1
# define the data source
a1.sources.r1.type=exec
a1.sources.r1.command=tail -f /data/xx.log
a1.sources.r1.channels=c1
# interceptor (filter)
a1.sources.r1.interceptors=i1
a1.sources.r1.interceptors.i1.typ
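The interceptor type and the rest of the file are cut off above. A common way such a file continues is sketched below, assuming Flume 1.6's Kafka sink property names; the interceptor regex, broker list and topic are placeholders, not taken from the post:

# hypothetical continuation: regex filtering interceptor
a1.sources.r1.interceptors.i1.type = regex_filter
a1.sources.r1.interceptors.i1.regex = .*ERROR.*

# memory channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Kafka sink (Flume 1.6 style property names)
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.brokerList = kafka1:9092,kafka2:9092,kafka3:9092
a1.sinks.k1.topic = file-monitor
a1.sinks.k1.channel = c1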
A complete real-time stream processing flow based on Flume + Kafka + Spark Streaming
1. Environment preparation: four test servers
Spark cluster of three: spark1, spark2, spark3
Kafka cluster of three: spark1, spark2, spark3
Zookeeper cluster of three: spark1, spark2, spark3
Log receiving server: spark1
Log collection server: redis (this machine is normally used for Redis development; it is now used for the log collection test, and the hostname has not been changed)
Log collection process:
Log
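The excerpt breaks off here. To give an idea of the flow, here is a sketch of the receiving-side agent on spark1, which accepts events from the collectors over avro and pushes them into the Kafka cluster; the port, topic and sink property names (Flume 1.6 style) are assumptions:

# on spark1 (the log receiving server): accept avro from collectors and push to Kafka
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4545
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.brokerList = spark1:9092,spark2:9092,spark3:9092
a1.sinks.k1.topic = tracklog
a1.sinks.k1.channel = c1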
Recently I have been using the Flume thrift source. At the start I ran into a lot of problems (it depends on quite a few other components). After getting the client program to compile with g++, which needs the thrift include and library paths, roughly: g++ -g -I/usr/local/include/thrift -L/usr/local/lib FlumeThriftClient.cpp gen-cpp/flume_constants.cpp gen-cpp/flume_types.cpp gen-cpp/ThriftSourceProtocol.cpp -o FlumeClient -lthriftnb -levent -lthrift -lrt, I have summarized these two points.
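For reference, the Flume side of this setup is just a thrift source in the agent configuration. A minimal sketch follows; the port and the logger sink are placeholders, the C++ client above would connect to this source:

# thrift source that the C++ client connects to
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = thrift
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1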
After the online test, we found that an app