Requirements background: When using Flume for log collection, an error message prints a multi-line stack trace, and those lines need to be merged into a single record and packaged into one event for transmission. Solution ideas: this requirement can be met either with a custom interceptor or with a custom Deserializer. There is plenty of material on custom interceptors on the web, but considering where the merging has to happen and the usage scenario, a custom Deserializer is the more natural fit here.
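As a rough illustration of the merging logic (not the article's actual implementation), the sketch below groups raw lines into records: any line that does not look like the start of a new log entry is appended to the previous one, and each merged record is wrapped into a Flume event with EventBuilder. The package name, the start-of-entry pattern and the charset are assumptions you would adapt to your own log format.

package example.flume;   // hypothetical package

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

import org.apache.flume.Event;
import org.apache.flume.event.EventBuilder;

public class StackTraceMerger {
    // Assumed pattern: a new log entry starts with a date such as "2016-01-01"; adjust to your layout.
    private static final Pattern NEW_ENTRY = Pattern.compile("^\\d{4}-\\d{2}-\\d{2}.*");

    /** Merge continuation lines (e.g. stack-trace lines) into the preceding entry and build one event per entry. */
    public static List<Event> merge(List<String> rawLines) {
        List<Event> events = new ArrayList<>();
        StringBuilder current = null;
        for (String line : rawLines) {
            if (NEW_ENTRY.matcher(line).matches() || current == null) {
                if (current != null) {
                    events.add(EventBuilder.withBody(current.toString(), StandardCharsets.UTF_8));
                }
                current = new StringBuilder(line);
            } else {
                current.append('\n').append(line);   // continuation line: keep it in the same record
            }
        }
        if (current != null) {
            events.add(EventBuilder.withBody(current.toString(), StandardCharsets.UTF_8));
        }
        return events;
    }
}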
Target: accept HTTP request information on port 1084 and store it in the Hive database; osgiweb2.db is the database created for Hive and periodic_report5 is the table created for the data. The Flume configuration is as follows:
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = http
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 1084
a1.sources.r1.handler = jkong.test.HTTPSourceDPIHandler
#a1.sources.r1.interceptors = i1 i2
#a1.sources.r1.interceptors.i2.type = timestamp
a1.channels.c1.type = memory
a1.chann…
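For reference, a minimal sketch of how the Hive sink side of such a configuration might look; the metastore URI, serializer choice and batch size are assumptions, while the database and table names follow the ones above:

a1.sinks.k1.type = hive
a1.sinks.k1.channel = c1
a1.sinks.k1.hive.metastore = thrift://127.0.0.1:9083
a1.sinks.k1.hive.database = osgiweb2
a1.sinks.k1.hive.table = periodic_report5
a1.sinks.k1.serializer = JSON
a1.sinks.k1.batchSize = 100
a1.sources.r1.channels = c1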
As the title suggests, this is only a small part of the overall real-time architecture.
Download the latest version of Flume: apache-flume-1.6.0-bin.tar.gz
Unzip it and modify conf/flume-conf.properties (the file name can be anything you like).
What I have implemented so far is reading data from a directory and writing it to Kafka. There is plenty online about the underlying principles, so I will go straight to the configuration:
a1.sources = r1
a1.sinks = k1
a1.channels = c1
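The snippet above is cut off; a minimal directory-to-Kafka configuration for Flume 1.6 might look like the sketch below. The spool directory, broker list and topic name are placeholders, not values from the original article:

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /data/logs/incoming
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000

a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.brokerList = kafka1:9092,kafka2:9092
a1.sinks.k1.topic = flume-logs
a1.sinks.k1.requiredAcks = 1
a1.sinks.k1.batchSize = 20
a1.sinks.k1.channel = c1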
Uploading Avro files to HDFS using Flume
Scenario description: upload the Avro files under a folder to HDFS. The source uses spooldir and the sink uses HDFS. Configure flume.conf:
# memory channel called ch1 on agent1
agent1.channels.ch1.type = memory
# source
agent1.sources.spooldir-source1.channels = ch1
agent1.sources.spooldir-source1.type = spooldir
agent1.sources.spooldir-source1.spoolDir = /home/yang/data/avro
agent1.sources.spooldir-source1.base…
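The configuration above is truncated; a complete sketch of such an agent might look like the following. The HDFS path and roll settings are assumptions, and the AVRO deserializer / avro_event serializer pair is one possible way to keep the records in Avro form end to end:

agent1.channels = ch1
agent1.sources = spooldir-source1
agent1.sinks = hdfs-sink1

agent1.channels.ch1.type = memory

agent1.sources.spooldir-source1.type = spooldir
agent1.sources.spooldir-source1.channels = ch1
agent1.sources.spooldir-source1.spoolDir = /home/yang/data/avro
agent1.sources.spooldir-source1.deserializer = AVRO

agent1.sinks.hdfs-sink1.type = hdfs
agent1.sinks.hdfs-sink1.channel = ch1
agent1.sinks.hdfs-sink1.hdfs.path = hdfs://namenode:8020/flume/avro
agent1.sinks.hdfs-sink1.hdfs.fileType = DataStream
agent1.sinks.hdfs-sink1.serializer = avro_event
agent1.sinks.hdfs-sink1.hdfs.rollInterval = 300
agent1.sinks.hdfs-sink1.hdfs.rollSize = 0
agent1.sinks.hdfs-sink1.hdfs.rollCount = 0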
1. Flume is a distributed log collection system that transmits the collected data to its destination.
2. Flume has a core concept called an agent. An agent is a Java process that runs on a log collection node.
3. An agent consists of 3 core components: source, channel, and sink.
3.1 The source component is dedicated to collecting logs and can handle log data of various types and formats, including Avro, Thrift, …
Flume configuration for getting the collected information into the Kafka cluster. Create a new configuration file under the conf directory:
vim conf/file-monitor.conf
# declare the agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# define the data source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -f /data/xx.log
a1.sources.r1.channels = c1
# interceptor filter
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.typ…
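The interceptor definition is cut off above. As one hedged possibility, a regex_filter interceptor (a standard Flume interceptor type) could be used to keep or drop lines matching a pattern; the regex itself is just a placeholder:

a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = regex_filter
a1.sources.r1.interceptors.i1.regex = .*ERROR.*
a1.sources.r1.interceptors.i1.excludeEvents = false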
Create a new Java project and edit the POM file; in the POM contents, remove the parent section. With the parent removed and the rat plugin removed, you avoid common errors that occur at compile time (https://issues.apache.org/jira/browse/FLUME-1372). A custom sink implementation needs to inherit AbstractSink and implement the Configurable interface, overriding some of its methods, as follows:
package com.cmcc.chiwei.kafka;
import java.u…
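The class body is truncated above, so here is a minimal, hedged sketch of what a custom sink extending AbstractSink and implementing Configurable generally looks like; the Kafka-producer details are omitted and the "topic" property name is an assumption, not the article's actual code:

package com.cmcc.chiwei.kafka;

import org.apache.flume.Channel;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.Transaction;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;

public class CmccKafkaSink extends AbstractSink implements Configurable {

    private String topic;   // hypothetical property read from the agent configuration

    @Override
    public void configure(Context context) {
        // read sink properties declared as a1.sinks.k1.<name> in the agent file
        topic = context.getString("topic", "default-topic");
    }

    @Override
    public Status process() throws EventDeliveryException {
        Channel channel = getChannel();
        Transaction tx = channel.getTransaction();
        try {
            tx.begin();
            Event event = channel.take();
            if (event == null) {          // nothing to deliver right now
                tx.commit();
                return Status.BACKOFF;
            }
            byte[] body = event.getBody();
            // ... send 'body' to Kafka here (producer code omitted in this sketch) ...
            tx.commit();
            return Status.READY;
        } catch (Exception e) {
            tx.rollback();
            throw new EventDeliveryException("Failed to deliver event", e);
        } finally {
            tx.close();
        }
    }
}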
We first use Gson to deserialize it into a Java object, then take the log field we care about to get the original log text; the rest of the processing is the same as before. The relevant part of the reader looks roughly like this:
in.tell();
String preReadLine = readSingleLine();
if (preReadLine == null) return null;
// if the log is wrapped by the docker log format,
// we should extract the original log first
if (wrappedByDocker) {
    DockerLog dockerLog = GSON.fromJson(preReadLine, DockerLog.class);
    preReadLine = dockerLog.getLog();
}
This allows the agent to consume Docker-wrapped logs while the rest of the pipeline stays unchanged.
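For context, Docker's json-file logging driver wraps each line as a JSON object with log, stream and time fields, so the DockerLog class used above is presumably a simple POJO along these lines (a hedged sketch, not the article's actual class):

import com.google.gson.annotations.SerializedName;

public class DockerLog {
    @SerializedName("log")
    private String log;       // the original log line, usually ending with '\n'
    @SerializedName("stream")
    private String stream;    // "stdout" or "stderr"
    @SerializedName("time")
    private String time;      // RFC3339 timestamp written by the docker daemon

    public String getLog() { return log; }
    public String getStream() { return stream; }
    public String getTime() { return time; }

    // usage: new Gson().fromJson("{\"log\":\"hello\\n\",\"stream\":\"stdout\",\"time\":\"2016-01-01T00:00:00Z\"}", DockerLog.class)
}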
First of all, installing the tools is not covered here; there is plenty online that you can look up yourself. Here we use an example to illustrate the configuration of each tool and the final result. Suppose we have a batch of tracklog logs that need to be displayed in real time in ELK. First, collect the logs; for this we use Flume. An agent is placed on the log server side and sends the data on to a collector (there can be more than one agent), configured roughly as in the sketch that follows.
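The agent/collector configuration itself is missing from this excerpt; a minimal sketch of the usual pattern is below, with an exec source on the application server forwarding over Avro RPC to a collector. Host names, ports and the log path are placeholders:

# --- agent on the log server (forwards to the collector) ---
agent.sources = r1
agent.channels = c1
agent.sinks = k1
agent.sources.r1.type = exec
agent.sources.r1.command = tail -F /data/tracklog/track.log
agent.sources.r1.channels = c1
agent.channels.c1.type = memory
agent.sinks.k1.type = avro
agent.sinks.k1.hostname = collector-host
agent.sinks.k1.port = 4545
agent.sinks.k1.channel = c1

# --- collector (receives from the agents) ---
collector.sources = r1
collector.channels = c1
collector.sinks = k1
collector.sources.r1.type = avro
collector.sources.r1.bind = 0.0.0.0
collector.sources.r1.port = 4545
collector.sources.r1.channels = c1
collector.channels.c1.type = memory
collector.sinks.k1.type = logger
collector.sinks.k1.channel = c1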
Liaoliang's course: the 2016 Big Data Spark "Mushroom Cloud" action, a Spark Streaming job that consumes the Kafka data collected by Flume in the direct way.
First, the basic background: Spark Streaming can get Kafka data in two ways, the receiver way and the direct way; this article describes the direct way. The process is roughly this:
1. The direct way connects to the Kafka nodes directly to obtain data.
2. The direct-based approach periodically queries Kafka to obtain the latest offset of each topic+partition, and then processes each batch according to the defined offset ranges.
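As a hedged illustration of the direct approach (not the course's actual job), the Java sketch below uses the Spark 1.x KafkaUtils.createDirectStream API against the Kafka 0.8 consumer; the broker addresses and topic name are placeholders:

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import kafka.serializer.StringDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class DirectKafkaSketch {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("DirectKafkaSketch");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        // The direct approach talks to the brokers itself, so only the broker list is needed here.
        Map<String, String> kafkaParams = new HashMap<>();
        kafkaParams.put("metadata.broker.list", "kafka1:9092,kafka2:9092");
        Set<String> topics = new HashSet<>(Arrays.asList("flume-topic"));

        JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(
                jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
                kafkaParams, topics);

        // Just print the message values of each micro-batch.
        stream.map(record -> record._2()).print();

        jssc.start();
        jssc.awaitTermination();
    }
}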
When using Flume we found that, due to the network, HDFS and other reasons, some of the logs Flume collected into HDFS were abnormal, showing up as:
1. Files that were never closed: files ending with .tmp (the default in-use suffix). The files landed in HDFS should be gz-compressed files, and a file still ending in .tmp cannot be used.
2. Files with a size of 0, for example a gz-compressed file of size 0; when we take such a file on its own and try to d…
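One hedged mitigation for the lingering .tmp files is to make the HDFS sink close files deterministically; the options below are standard HDFS sink settings, with the concrete values being assumptions rather than a recommendation from the article:

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.fileType = CompressedStream
a1.sinks.k1.hdfs.codeC = gzip
# roll a file every 10 minutes regardless of size or event count
a1.sinks.k1.hdfs.rollInterval = 600
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
# close a file that has received no events for 5 minutes, so .tmp files do not linger
a1.sinks.k1.hdfs.idleTimeout = 300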
For details about how to import logs into an Elasticsearch cluster through Flume, see "Flume log import to Elasticsearch clusters".
Kibana Introduction
Kibana Homepage
Kibana is a powerful Elasticsearch data display client. Logstash has a built-in Kibana; you can also deploy Kibana separately. The latest version, Kibana 3, is a pure HTML + JS client that can be conveniently deployed on HTTP servers such as Apache and Nginx. The address of Kibana 3: https://github.com/elasticsearch…
Written up front: I am switching careers into the big data field. I did not sign up for a class; I am trying to teach myself. If I can stick with it, I will make a go of this line of work; if not...! I plan to start with this set of it18 screencast videos. Self-study is painful, so I blog to share my learning results with everyone, and also to supervise and push myself to keep learning. (The video screencasts were given away as part of an it18 promotion, so the recordings are not complete; classroom-related materials such as class notes and source code …
Preparatory work:
1. Download Flume from Apache.
2. Unzip Flume.
3. Modify flume-env.sh and configure JAVA_HOME.
Netcat capture demo:
1. Create netcat-logger.conf under conf:
# defines the name of each component in the agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# describe and configure the source component: r1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# describe and configure the sink component: k1
a1.…
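The file is cut off above; the rest of the classic netcat-logger example (as in the Flume user guide) usually looks like this, followed by the command to start the agent and a quick netcat test:

a1.sinks.k1.type = logger
# describe and configure the channel component
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# bind the source and the sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

bin/flume-ng agent --conf conf --conf-file conf/netcat-logger.conf --name a1 -Dflume.root.logger=INFO,console
telnet localhost 44444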
Business background: send the log output generated by a Java project to Flume.
Step one: output the logs to Flume. Configure log4j in the Java program and specify which Flume server to send to:
log4j.rootLogger=INFO,flume
log4j.appender.flume=org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.Hostname=192.168.13.132
log4j.appender.flume.Port=41414
Step two: import java.util.…
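On the Flume side, the Log4jAppender sends events over Avro RPC, so the receiving agent needs an Avro source listening on the same port; a minimal hedged sketch (with a logger sink just for verification) could look like this:

a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 41414
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1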
Spooling Directory Source: the following 2 sets of parameters are explained.
fileHeader and fileHeaderKey:
fileHeader is a Boolean that can be set to true or false, indicating whether, after Flume has read the data, the file name is added to the header of the encapsulated event.
fileHeaderKey indicates, when the event does carry such a header (i.e. fileHeader is set to true), the key under which the file name is stored; the related basenameHeader/basenameHeaderKey pair does the same for just the base name of the file.
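A small hedged example of how these options appear in a spooling directory source definition (directory path and agent/component names are placeholders):

a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /data/spool
a1.sources.r1.channels = c1
# add the absolute path of the file to each event's headers under the key "file"
a1.sources.r1.fileHeader = true
a1.sources.r1.fileHeaderKey = file
# add just the file's base name under the key "basename"
a1.sources.r1.basenameHeader = true
a1.sources.r1.basenameHeaderKey = basename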