Log collection exception; the production error log reported:
(org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run:280) - FATAL: Spool Directory source spool_source: { spoolDir: /apps/logs/libra }: Uncaught exception in SpoolDirectorySource thread. Restart or reconfigure Flume to continue processing. java.lang.IllegalStateException: File has been modified since being read: /apps/logs/libra/financial-webapp/spool/libra.2018-03-09_09-10-16.tmp
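This error means a file in the spooling directory was modified after the Spooling Directory Source began reading it; here it is a .tmp file that was still being written. A common mitigation, shown below only as a sketch with hypothetical agent and channel names, is to have the source skip files that are still being written by matching them with ignorePattern (and to have producers write to a temporary name and rename the file into place once it is complete):
# Sketch: spooling directory source that ignores in-progress .tmp files (hypothetical names)
agent1.sources = spool_source
agent1.channels = mem_channel
agent1.sources.spool_source.type = spooldir
agent1.sources.spool_source.spoolDir = /apps/logs/libra
# Skip temporary files such as libra.2018-03-09_09-10-16.tmp until they are renamed
agent1.sources.spool_source.ignorePattern = ^.*\.tmp$
agent1.sources.spool_source.channels = mem_channel
agent1.channels.mem_channel.type = memory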
Flume collects logs from different sources, aggregates them, and transfers them to a storage system.
A source reads data (from a variety of clients, or from another agent) and deposits it into a channel; a sink then consumes it. The whole process is asynchronous.
An event is deleted only after it has been successfully stored in the channel of the next agent (in a multi-agent flow) or in the final destination (in a single-agent flow), which guarantees reliability.
Channels come in two kinds: file and memory.
Multiple instances to
Flume Official website: http://flume.apache.org/FlumeUserGuide.html
First, a simple metaphor to help understand Flume:
Imagine a pool with water flowing in at one end and out at the other. The inlet can be fitted with a variety of pipes, and so can the outlet, and there can be multiple inlets and multiple outlets.
In Flume terms, the water is the event, the inlet is the source, the outlet is the sink, and the pool itself is the channel.
For configuring a Flume cluster, see https://www.cnblogs.com/jifengblog/p/9277793.html. Load-balance: load balancing introduction.
Load balancing is used when a single machine (or a single process) cannot handle all requests on its own.
The load-balancing sink processor implements this in Flume: for example, agent1 acts as a routing node that distributes the events staged in its channel across multiple sink components, and each sink is connected to a separate downstream agent.
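A sketch of what such a configuration might look like (agent, sink, and collector host names are hypothetical): agent1 fans events out from one channel to two Avro sinks through a load_balance sink group.
# Sketch: load-balancing sink processor (hypothetical names)
agent1.sources = r1
agent1.channels = c1
agent1.sinks = k1 k2
# Both sinks drain the same channel
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = collector1.example.com
agent1.sinks.k1.port = 4545
agent1.sinks.k1.channel = c1
agent1.sinks.k2.type = avro
agent1.sinks.k2.hostname = collector2.example.com
agent1.sinks.k2.port = 4545
agent1.sinks.k2.channel = c1
# Sink group with round-robin load balancing
agent1.sinkgroups = g1
agent1.sinkgroups.g1.sinks = k1 k2
agent1.sinkgroups.g1.processor.type = load_balance
agent1.sinkgroups.g1.processor.selector = round_robin
agent1.sinkgroups.g1.processor.backoff = true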
Preliminary preparation
ELK official website: https://www.elastic.co/, with package downloads and thorough documentation.
ZooKeeper official website: https://zookeeper.apache.org/
Kafka official website: http://kafka.apache.org/documentation.html, with package downloads and thorough documentation.
Flume official website: https://flume.apache.org/
Heka official website: https://hekad.readthedocs.io/en/v0.10.0/
The system is a 64-bit CentOS 6.6 machine.
Versions of the software used
1. Background information
Many of the company's platforms generate large volumes of logs every day (typically streaming data, such as search engine page views and queries). Processing these logs requires a dedicated logging system, and in general such systems need the following characteristics:
(1) build a bridge between the application systems and the analysis systems, decoupling the two from each other;
(2) support near-real-time online analysis systems as well as offline analysis systems such as Hadoop;
To achieve near-real-time search there must be a mechanism that processes data in real time and writes it into the Solr index. Flume NG provides exactly such a mechanism: it collects data in real time, runs ETL on it through MorphlineSolrSink, and finally writes it into the Solr index, so that newly arrived data can be queried in near real time in the Solr search engine.
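As a rough sketch (agent, channel, and sink names as well as the morphline file path are assumptions), a MorphlineSolrSink is typically attached to a channel like this; the morphline file defines the ETL steps that parse the events and load them into Solr:
# Sketch: MorphlineSolrSink wiring (hypothetical names and paths)
agent1.sinks = solr_sink
agent1.sinks.solr_sink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
# Morphline configuration file with the ETL commands (e.g. readLine, grok, loadSolr)
agent1.sinks.solr_sink.morphlineFile = /etc/flume/conf/morphline.conf
agent1.sinks.solr_sink.channel = c1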
Build steps:
1. We only do a demo here, so we have created a new file
Flume: Flume is a distributed, reliable service for efficiently collecting, aggregating, and moving large volumes of data. Flume has a simple and flexible architecture based on streaming data flows. It is robust and fault-tolerant thanks to its tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple, extensible data model that allows for online analytic applications.
The previous section built a simple operating environment for Flume and provided a netcat-based demonstration. This section continues with a fuller explanation of the entire Flume pipeline. First, the basic structure diagram of Flume: the following diagram illustrates the role of Flume and its basic components: source, channel, and sink. Source: completes the collection of log data and pushes it into the channel as events.
Based on the three components ThriftSource, MemoryChannel, and HDFSSink, this article analyzes the transactions involved in Flume data transfer; if you are using other components, Flume transactions are handled differently. Under normal circumstances MemoryChannel works well (this is what our company uses). FileChannel is slower; although it provides log-level data recovery, in general, as long as the machine does not lose power, MemoryChannel is good enough.
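For reference, a hedged sketch of an agent built from these three components (agent name, port, and HDFS path are hypothetical):
# Sketch: ThriftSource -> MemoryChannel -> HDFSSink (hypothetical names and paths)
a1.sources = thrift_src
a1.channels = mem_ch
a1.sinks = hdfs_sink
a1.sources.thrift_src.type = thrift
a1.sources.thrift_src.bind = 0.0.0.0
a1.sources.thrift_src.port = 4141
a1.sources.thrift_src.channels = mem_ch
a1.channels.mem_ch.type = memory
a1.channels.mem_ch.capacity = 10000
a1.channels.mem_ch.transactionCapacity = 1000
a1.sinks.hdfs_sink.type = hdfs
a1.sinks.hdfs_sink.hdfs.path = hdfs://namenode:8020/flume/events/%Y-%m-%d
a1.sinks.hdfs_sink.hdfs.fileType = DataStream
a1.sinks.hdfs_sink.hdfs.useLocalTimeStamp = true
a1.sinks.hdfs_sink.channel = mem_ch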
• Ability to master HBase enterprise-level development and management
• Ability to master Pig enterprise-level development and management
• Ability to master Hive enterprise-level development and management
• Ability to use Sqoop to move data freely between traditional relational databases and HDFS
• Ability to collect and manage distributed logs using Flume
• Ability to master the entire analysis, development, and deployment process of complete Hadoop projects
Use Apache Flume to read messages from a JMS message queue and write them to HDFS. The Flume agent configuration is as follows:
flume-agent.conf
# Name the components in this agent
agentHdfs.sources = jms_source
agentHdfs.sinks = hdfs_sink
agentHdfs.channels = mem_channel
# Describe/configure the source
agentHdfs.sources.jms_source.type = jms
# Bind to all interfaces
agentHdfs.sources.jms_source.initialCont
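The snippet above is cut off at the initialContext property. Purely as a hedged sketch (the JNDI factory class, broker URL, queue name, and HDFS path below are assumptions, not taken from the original), such a JMS-to-HDFS agent is typically completed roughly as follows:
# Sketch continuation (hypothetical values)
agentHdfs.sources.jms_source.initialContextFactory = org.apache.activemq.jndi.ActiveMQInitialContextFactory
agentHdfs.sources.jms_source.connectionFactory = ConnectionFactory
agentHdfs.sources.jms_source.providerURL = tcp://mqserver:61616
agentHdfs.sources.jms_source.destinationType = QUEUE
agentHdfs.sources.jms_source.destinationName = LOG.QUEUE
agentHdfs.sources.jms_source.channels = mem_channel
# Memory channel and HDFS sink
agentHdfs.channels.mem_channel.type = memory
agentHdfs.channels.mem_channel.capacity = 10000
agentHdfs.sinks.hdfs_sink.type = hdfs
agentHdfs.sinks.hdfs_sink.hdfs.path = hdfs://namenode:8020/flume/jms/%Y-%m-%d
agentHdfs.sinks.hdfs_sink.hdfs.fileType = DataStream
agentHdfs.sinks.hdfs_sink.hdfs.useLocalTimeStamp = true
agentHdfs.sinks.hdfs_sink.channel = mem_channel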
Today we continue discussing the configuration of several agents. The first agent captures the output of a particular command executed on the terminal and writes it to a file in a specific directory. First, look at the configuration code:
agent2.sources = execsource     // the source that gets output from the command
agent2.sinks = filesink         // the sink that writes output to a file
agent2.channels = filechannel   // the channel
agent2.sources.execsource.type = exec    // source type
agent2.sources.execs
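The configuration above is also truncated; as a hedged sketch (the tailed command and the directories below are assumptions), the rest of such an exec-to-file agent usually looks roughly like this:
# Sketch continuation (hypothetical command and paths)
agent2.sources.execsource.command = tail -F /var/log/app/app.log
agent2.sources.execsource.channels = filechannel
agent2.channels.filechannel.type = file
agent2.channels.filechannel.checkpointDir = /var/flume/checkpoint
agent2.channels.filechannel.dataDirs = /var/flume/data
agent2.sinks.filesink.type = file_roll
agent2.sinks.filesink.sink.directory = /var/flume/output
agent2.sinks.filesink.channel = filechannel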
A log collection architecture based on Flume + log4j + Kafka. This article shows how to use Flume, log4j, and Kafka for standardized log capture. Flume basic concepts: Flume is a mature, powerful log collection tool; there are many examples and plenty of information about its configuration available on the internet, so only a brief explanation is given here rather than full details.
Flume and Sqoop are both Hadoop data integration and collection systems, but they are positioned differently; the following introduction is based on personal experience and understanding. Flume was developed by Cloudera and has two major product lines: Flume OG and Flume NG. The Flume OG architecture was overly complex and could lose data in operation, so it was abandoned. What we use now is Flume NG.
Flume collection process:
# Note: in this case Flume watches the directory /home/hadoop/flume_kafka and collects into Kafka.
Start the cluster,
start Kafka,
start the agent:
flume-ng agent -c . -f /home/hadoop/flume-1.7.0/conf/myconf/flume-kafka.conf -n a1 -Dflume.root.logger=INFO,console
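For context, a hedged sketch of what such a flume-kafka.conf might contain (the Kafka broker address and topic name are assumptions; the watched directory and agent name a1 follow the command above):
# Sketch: spooling-directory source -> memory channel -> Kafka sink (hypothetical broker and topic)
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/hadoop/flume_kafka
a1.sources.r1.channels = c1
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.bootstrap.servers = localhost:9092
a1.sinks.k1.kafka.topic = flume_kafka
a1.sinks.k1.channel = c1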
A Flume task is an agent, which consists of three parts, as shown in the figure:
The focus here is mainly on sources and sinks.
Sources are divided into active sources and passive sources.
Sinks include, for example, an HDFS client, a Kafka client, and so on.
tar -zxvf apache-flume-1.6.0-bin.tar.gz
Configure environment variables
vim ~/.bash_profile
...
source ~/.bash_profile
vim test01
# example.conf: A single-node Flume configuration
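This header matches the single-node example from the Flume user guide. As a sketch, the file typically continues with a netcat source, a memory channel, and a logger sink wired together:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1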