Flume: Real-time Collection of Logs


In a distributed system, every machine keeps local logs of the programs it runs. When analysis requires gathering these scattered logs in one place, many people reach for rsync or scp, but those tools are weak on real-time delivery, invite file-name conflicts, and scale poorly. Not elegant at all.

In our case, we needed to aggregate the Nginx logs of several production servers in real time, and Flume did the job well.

Flume Introduction

Flume is a distributed, reliable, and efficient log-collection system. It lets users customize the data-transfer model, so it is highly extensible, and it ships with strong fault-tolerance and recovery mechanisms. A few important concepts:

    • Event: the basic unit of data transfer in Flume. Flume carries data from the source to the final destination in the form of events.
    • Agent: a process containing Sources, Channels, Sinks, and other components; it uses these components to move events from one node to the next, or to the final destination.
    • Source: receives events and places them, in batches, into one or more Channels.
    • Channel: sits between Source and Sink and buffers incoming events; events are removed from the Channel once the Sink has successfully delivered them to the next-hop Channel or the final destination.
    • Sink: delivers events to the next hop or the final destination, then removes them from the Channel on success.

    • Sources include Syslog Source, Kafka Source, HTTP Source, Exec Source, Avro Source, and so on.
    • Sinks include Kafka Sink, Avro Sink, File Roll Sink, HDFS Sink, and so on.
    • Channels include Memory Channel, File Channel, etc.

Flume provides the skeleton, plus a variety of Sources, Sinks, and Channels, and lets you design the data model that fits your needs. In fact, a pipeline can be assembled from multiple Flume agents chained together, like the cars of a subway train.
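As a minimal illustration of how these pieces snap together (this example is not part of the original setup; the agent name a1, the file name hello.conf, and the port are made up for illustration), here is a single agent that reads lines from a TCP socket through a Netcat Source and prints them through a Logger Sink:

# hello.conf — one agent named a1 with one source, channel, and sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Netcat Source: listens on a TCP port, turning each line of input into an event
a1.sources.r1.type = netcat
a1.sources.r1.bind = 127.0.0.1
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

# Memory Channel with default capacity
a1.channels.c1.type = memory

# Logger Sink: writes events to Flume's own log output
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

Start it with bin/flume-ng agent -n a1 -c conf -f conf/hello.conf -Dflume.root.logger=INFO,console, connect with telnet 127.0.0.1 44444, and every line you type should be echoed back as an event on the console.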

Defining a Data Flow Model

Back to the scenario from the beginning: we want to aggregate the Nginx logs of several production servers for analysis. The pipeline is split across two Flume agents (sketched after the list):

    • Flume1's data flow is Exec Source, Memory Channel, Avro Sink; it is deployed on each business machine.
    • Flume2's data flow is Avro Source, Memory Channel, File Roll Sink; it runs on the machine that collects the logs.
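Putting the two agents together, the data path looks roughly like this (a sketch of the setup above; host names and the Nginx log path are placeholders):

nginx access.log
   |  tail -F
   v
Flume1 (per business machine): Exec Source -> Memory Channel -> Avro Sink
   |  Avro RPC over the network
   v
Flume2 (collector): Avro Source -> Memory Channel -> File Roll Sink
   |
   v
/data/log/ngxlog/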

Required Preparation

You will need to:

    • Download Flume.
    • Install the Java SDK; after extracting the Flume download, configure conf/flume-env.sh:

# I use oracle-java-8
export JAVA_HOME=/usr/lib/jvm/java-8-oracle/jre/

    • Think through your data flow model and write the configuration. For Flume1 described above, tail2avro.conf:

agent.sources = s1
agent.channels = c1
agent.sinks = k1

agent.sources.s1.type = exec
agent.sources.s1.command = tail -F <your file path>
agent.sources.s1.channels = c1

agent.channels.c1.type = memory
agent.channels.c1.capacity = 10000
agent.channels.c1.transactionCapacity = 10000

agent.sinks.k1.type = avro
agent.sinks.k1.hostname = <your target address>
agent.sinks.k1.port = <your target port>
agent.sinks.k1.channel = c1

And avro2file.conf for Flume2:

agent.sources = s1
agent.channels = c1
agent.sinks = k1

agent.sources.s1.type = avro
agent.sources.s1.bind = <your address>
agent.sources.s1.port = <your port>
agent.sources.s1.channels = c1

agent.sinks.k1.type = file_roll
agent.sinks.k1.sink.directory = /data/log/ngxlog
# rolling interval (seconds)
agent.sinks.k1.sink.rollInterval = 86400
agent.sinks.k1.channel = c1

agent.channels.c1.type = memory
# capacity of events in the queue
agent.channels.c1.capacity = 10000
agent.channels.c1.transactionCapacity = 10000
agent.channels.c1.keep-alive =
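A note on the channel choice in both configs: the Memory Channel keeps events in RAM, so it is fast, but any events still buffered are lost if the agent process dies. If delivery guarantees matter more than throughput, the File Channel is the usual substitute.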

    • Start them up:

# Start flume1
bin/flume-ng agent -n agent -c conf -f conf/tail2avro.conf -Dflume.root.logger=WARN
# Start flume2
bin/flume-ng agent -n agent -c conf -f conf/avro2file.conf -Dflume.root.logger=INFO
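To sanity-check the pipeline (a sketch; the directory comes from avro2file.conf above), it helps to start Flume2 before Flume1 so the Avro Sink has an endpoint to connect to and does not log connection errors while waiting. Then watch the output directory:

# file_roll creates a new file per roll interval; follow the newest one
ls -lt /data/log/ngxlog | head
tail -F /data/log/ngxlog/$(ls -t /data/log/ngxlog | head -1)

New requests hitting Nginx on the business machine should show up here within moments.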
