I. Overview
1. There are now three machines: HADOOP1, HADOOP2, HADOOP3, with logs summarized on HADOOP1.
2. HADOOP1 outputs the summary to multiple targets simultaneously.
3. In Flume, one data source corresponds to multiple channels and multiple sinks; this is configured in the consolidation-accepter.conf file.
II. Deploy flume to collect and summarize the logs
1. Run on HADOOP1:
flume-ng agent --conf ./ -f consolidation-accepter.conf …
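As a sketch of item 3 above (one source fanned out to multiple channels and sinks via a replicating selector), with hypothetical agent names, port, and paths rather than the original consolidation-accepter.conf:

# one Avro source replicated into two channels, each drained by its own sink
agent1.sources = avro_src
agent1.channels = ch_hdfs ch_log
agent1.sinks = hdfs_sink log_sink

agent1.sources.avro_src.type = avro
agent1.sources.avro_src.bind = 0.0.0.0
agent1.sources.avro_src.port = 4545
agent1.sources.avro_src.selector.type = replicating
agent1.sources.avro_src.channels = ch_hdfs ch_log

agent1.channels.ch_hdfs.type = memory
agent1.channels.ch_log.type = memory

agent1.sinks.hdfs_sink.type = hdfs
agent1.sinks.hdfs_sink.channel = ch_hdfs
agent1.sinks.hdfs_sink.hdfs.path = hdfs://HADOOP1:9000/flume/consolidated
agent1.sinks.log_sink.type = logger
agent1.sinks.log_sink.channel = ch_log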
Log collection exception; production reported the following error log:
(org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run:280) - FATAL: Spool Directory source spool_source: { spoolDir: /apps/logs/libra }: Uncaught exception in SpoolDirectorySource thread. Restart or reconfigure Flume to continue processing.
java.lang.IllegalStateException: File has been modified since being read: /apps/logs/libra/financial-webapp/spool/libra.2018-03-09_09-10-16.tmp
The hin…
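This error comes from the spooling directory source's requirement that files dropped into spoolDir be immutable: if a file is modified after Flume starts reading it, the IllegalStateException above is thrown. A common workaround (a sketch, with hypothetical paths) is to write the file somewhere else and only move it into the spool directory once it is complete:

# write the finished log into a staging directory first
cp financial-webapp.log /apps/logs/libra/staging/libra.2018-03-09.log
# mv on the same filesystem is atomic, so Flume never sees a half-written file
mv /apps/logs/libra/staging/libra.2018-03-09.log /apps/logs/libra/financial-webapp/spool/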
Collect from different sources, aggregate logs, and transfer them to the storage system.
The source reads data, which can come from a variety of clients or from another agent, and deposits it into the channel; the sink then consumes it. The entire process is asynchronous.
An event is only deleted once it has been successfully stored in the channel of the next agent (in a multi-agent flow) or in the final destination (in a single-agent flow), which ensures reliability.
Channels come in two kinds: file and memory.
Multiple instances to
Flume Official website: http://flume.apache.org/FlumeUserGuide.html
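As a minimal sketch of the two channel types mentioned above (property names follow the user guide linked here; the agent and channel names are hypothetical):

# memory channel: fast, but events are lost if the process or machine dies
a1.channels.mem_ch.type = memory
a1.channels.mem_ch.capacity = 10000
a1.channels.mem_ch.transactionCapacity = 1000

# file channel: durable, events survive a restart because they are checkpointed to disk
a1.channels.file_ch.type = file
a1.channels.file_ch.checkpointDir = /var/flume/checkpoint
a1.channels.file_ch.dataDirs = /var/flume/data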
First, a simple metaphor to help understand Flume:
Think of a pool: water flows in at one end and out at the other. The inlet can be fitted with a variety of pipes, and so can the outlet; there can be multiple inlets and multiple outlets.
In Flume terms, the water is the event, the inlet is the Source, the outlet is the Sink, and the pool itself is the Channel.
Flume: Flume is a distributed, reliable service for efficiently collecting, aggregating, and moving large volumes of log data. It has a simple, flexible architecture based on streaming data flows, and it is robust and fault-tolerant thanks to its tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple, extensible data model that allows for online analytic applications.
The previous section built a simple operating environment for Flume and gave a netcat-based demonstration. This section continues by explaining the entire Flume process in more detail.
First, the basic structure diagram of Flume: the diagram illustrates the role of Flume and its basic components: source, channel, sink. Source: completes t…
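For reference, the netcat demonstration mentioned above typically looks like the following minimal agent (a sketch with hypothetical names, not the previous section's exact file), which listens on a TCP port and logs each received line:

# netcat source -> memory channel -> logger sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

It can be started with flume-ng agent -c conf -f netcat.conf -n a1 -Dflume.root.logger=INFO,console (the file name netcat.conf is hypothetical) and tested by sending lines with nc localhost 44444.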
Based on the three components ThriftSource, MemoryChannel, and HDFSSink, this article analyzes the transactions involved in Flume data transfer; if you use other components, Flume transactions are handled differently. Under normal circumstances MemoryChannel is fine (that is what our company uses). FileChannel is slower; it does provide data recovery via its write-ahead log, but in general, as long as the machine does not lose power, MemoryChannel…
• Master HBase enterprise-level development and management
• Master Pig enterprise-level development and management
• Master Hive enterprise-level development and management
• Use Sqoop to freely move data between traditional relational databases and HDFS
• Collect and manage distributed logs using Flume
• Master the entire process of analyzing, developing, and deploying complete Hadoop projects
Use Apache Flume to read messages from a JMS message queue and write them to HDFS. The Flume agent configuration is as follows:
flume-agent.conf
# name the components in this agent
agentHdfs.sources = jms_source
agentHdfs.sinks = hdfs_sink
agentHdfs.channels = mem_channel
# Describe/configure the source
agentHdfs.sources.jms_source.type = jms
# Bind to all interfaces
agentHdfs.sources.jms_source.initialCont…
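The excerpt breaks off at the JNDI settings; a fuller sketch of such a JMS-to-HDFS agent, assuming an ActiveMQ broker and hypothetical hostnames, queue name, and HDFS path (property names follow the Flume user guide):

agentHdfs.sources = jms_source
agentHdfs.sinks = hdfs_sink
agentHdfs.channels = mem_channel

# JMS source reading from an ActiveMQ queue (hypothetical broker/queue)
agentHdfs.sources.jms_source.type = jms
agentHdfs.sources.jms_source.initialContextFactory = org.apache.activemq.jndi.ActiveMQInitialContextFactory
agentHdfs.sources.jms_source.connectionFactory = ConnectionFactory
agentHdfs.sources.jms_source.providerURL = tcp://mqserver:61616
agentHdfs.sources.jms_source.destinationName = LOG.QUEUE
agentHdfs.sources.jms_source.destinationType = QUEUE
agentHdfs.sources.jms_source.channels = mem_channel

agentHdfs.channels.mem_channel.type = memory

# HDFS sink writing plain text files (hypothetical path)
agentHdfs.sinks.hdfs_sink.type = hdfs
agentHdfs.sinks.hdfs_sink.hdfs.path = hdfs://namenode:8020/flume/jms/%Y-%m-%d
agentHdfs.sinks.hdfs_sink.hdfs.fileType = DataStream
agentHdfs.sinks.hdfs_sink.hdfs.useLocalTimeStamp = true
agentHdfs.sinks.hdfs_sink.channel = mem_channel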
Flume captures log data in real time and uploads it to Kafka
1. With ZooKeeper already configured on Linux, start ZooKeeper first
sbin/zkServer.sh start
(sbin/zkServer.sh status shows the startup state; jps should show the QuorumPeerMain process)
2. Start Kafka (ZooKeeper must be started before Kafka)
bin/kafka-server-start.sh config/server.properties
3. Start a consumer to receive the logs
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic topic-004
(Before this, the topic must be created.)
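With the same ZooKeeper-based tooling used above, the topic consumed in step 3 would typically be created like this (a sketch; the replication factor and partition count are assumptions for a single-broker test setup):

# create the topic consumed above
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic topic-004
# verify it exists
bin/kafka-topics.sh --list --zookeeper localhost:2181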
Complete real-time stream processing flow based on Flume + Kafka + Spark Streaming
1. Environment preparation: four test servers
Spark cluster (three nodes): SPARK1, SPARK2, SPARK3
Kafka cluster (three nodes): SPARK1, SPARK2, SPARK3
ZooKeeper cluster (three nodes): SPARK1, SPARK2, SPARK3
Log receiving server: SPARK1
Log collection server: REDIS (this machine is normally used for Redis development; it is now used for the log collection test, and the hostname has not been changed)
Log collection process:
Log
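As a sketch of the flow this excerpt describes, the collection server (REDIS) could forward logs to the receiving server (SPARK1) over Avro like this (the log path, agent names, and port are hypothetical):

# on the collection server: tail a log file and ship it to SPARK1 via Avro
collector.sources = taillog
collector.channels = mem
collector.sinks = avro_out

collector.sources.taillog.type = exec
collector.sources.taillog.command = tail -F /var/log/app/app.log
collector.sources.taillog.channels = mem

collector.channels.mem.type = memory

collector.sinks.avro_out.type = avro
collector.sinks.avro_out.hostname = SPARK1
collector.sinks.avro_out.port = 4141
collector.sinks.avro_out.channel = mem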
I have recently been using the Flume thrift source and ran into a lot of problems at first (it depends on quite a few other components). After compiling the client program with g++ (roughly: g++ -g -DHAVE_… -I/usr/local/include/thrift -L/usr/local/lib flumethriftclient.cpp gen-cpp/flume_constants.cpp gen-cpp/flume_types.cpp gen-cpp/ThriftSourceProtocol.cpp -o flumeclient -lthriftnb -levent -lthrift -lrt), these two points are summarized.
After going online, testing found that an app…
1. Install the JDK on all hosts and configure the JDK environment variables.
2. Install SSH on all hosts and set up passwordless access between them.
3. Modify the hosts file (/etc/hosts) so that the machines can reach each other by hostname.
4. Install Python 2.6 or above (required by Storm).
5. ZeroMQ:
wget http://download.zeromq.org/zeromq-2.1.7.tar.gz
tar -xzf zeromq-2.1.7.tar.gz
cd zeromq-2.1.7
./configure
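The excerpt stops at ./configure; the usual continuation for building and installing ZeroMQ from source (not shown in the original, and assuming root or sudo rights) would be:

make
sudo make install
# refresh the shared-library cache so libzmq can be found at runtime
sudo ldconfig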
Today we continue discussing the configuration of several agents. The first agent captures the output of a particular command executed in the terminal and writes it to a file in a specific directory. First look at the configuration:
agent2.sources = execsource    // the source that gets output from the command
agent2.sinks = filesink        // the sink that writes to a file
agent2.channels = filechannel  // the channel
agent2.sources.execsource.type = exec    // source type
agent2.sources.execs…
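The excerpt cuts off mid-configuration; a complete sketch of the same idea (an exec source piping a command's output to a file_roll sink that writes into a directory) might look like the following, with the command and directories chosen as hypothetical examples:

agent2.sources = execsource
agent2.sinks = filesink
agent2.channels = filechannel

# run a command and treat each line of its output as an event
agent2.sources.execsource.type = exec
agent2.sources.execsource.command = tail -F /var/log/syslog
agent2.sources.execsource.channels = filechannel

# durable file channel
agent2.channels.filechannel.type = file
agent2.channels.filechannel.checkpointDir = /var/flume/agent2/checkpoint
agent2.channels.filechannel.dataDirs = /var/flume/agent2/data

# file_roll sink writes rolling files into a local directory
agent2.sinks.filesink.type = file_roll
agent2.sinks.filesink.sink.directory = /var/flume/output
agent2.sinks.filesink.sink.rollInterval = 300
agent2.sinks.filesink.channel = filechannel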
A log collection architecture based on Flume + log4j + Kafka. This article shows how to use Flume, log4j, and Kafka for standardized log capture.
Flume basic concepts: Flume is a mature, powerful log collection tool; there are many examples and plenty of material about its configuration available online, so only a brief explanation is given here. The…
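On the log4j side, the usual way to feed Flume is the Log4j appender shipped with Flume's client libraries, pointed at an Avro source; a sketch of the log4j.properties (the hostname and port are assumptions) could be:

# route application logs to a Flume Avro source (hypothetical host/port)
log4j.rootLogger = INFO, flume

log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.Hostname = flume-host
log4j.appender.flume.Port = 41414
# don't let logging failures break the application
log4j.appender.flume.UnsafeMode = true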
Official documentation for the parameters: http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
Note the file format: hdfs.fileType defaults to SequenceFile, which is a Hadoop file format; set it to DataStream so the files can be read directly. (How to consume SequenceFile I still don't know.)
Configuration file: hdfs.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.channels = c1
a1.sources.r1.spoolDir = /usr/local/hadoop/apache-…
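The file is cut off at the spoolDir path; a sketch of how the rest of such an hdfs.conf typically continues, with the HDFS path and roll settings as assumptions and hdfs.fileType set to DataStream as discussed above:

# channel
a1.channels.c1.type = memory

# HDFS sink writing readable text files
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://localhost:9000/flume/spool/%Y%m%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.rollInterval = 60
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0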
Flume and Sqoop are both Hadoop data integration and collection systems, but they are positioned differently; the following introduction is based on personal experience and understanding. Flume was developed by Cloudera and has two major products: Flume-OG and Flume-NG. Flume-OG's architecture was overly complex and was found to lose data, so it was abandoned. We now use Flume-NG, mainly…
Flume collection process:
# Note: in this case Flume monitors the directory /home/hadoop/flume_kafka and collects it into Kafka.
Start the cluster, start Kafka, then start the agent:
flume-ng agent -c . -f /home/hadoop/flume-1.7.0/conf/myconf/flume-kafka.conf -n a1 -Dflume.root.logger=INFO,console
Open the consumer…
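The flume-kafka.conf referenced above is not shown; a sketch consistent with the description (a spooling-directory source on /home/hadoop/flume_kafka feeding a Kafka sink, using the Flume 1.7 property names; the topic and broker address are assumptions):

a1.sources = r1
a1.channels = c1
a1.sinks = k1

# watch the directory for completed log files
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/hadoop/flume_kafka
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

# publish each event to a Kafka topic
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.bootstrap.servers = localhost:9092
a1.sinks.k1.kafka.topic = topic-004
a1.sinks.k1.channel = c1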