In a distributed system, each machine keeps its own local logs for the programs it runs. When it comes time to analyze them, these scattered logs have to be gathered in one place. Many people reach for rsync or scp, but those tools are weak on real-time delivery, invite filename conflicts, and do not scale well. Not elegant at all.
In our case, we needed to aggregate the Nginx logs of several production servers in real time, and Flume proved its worth.
Flume Introduction
Flume is a distributed, reliable and efficient log collection system. It lets users customize the data transfer model, so it scales well, and it has solid fault-tolerance and recovery mechanisms. Here are a few important concepts:
- Event: the event is the basic unit of data transfer in Flume. Flume carries data from the source to the final destination in the form of events.
- Agent: an agent contains Sources, Channels, Sinks and other components, and uses them to move events from one node to the next, or to the final destination.
- Source: a source receives events and places them, in batches, into one or more channels.
- Channel: a channel sits between a Source and a Sink and buffers incoming events; an event is removed from the channel once the Sink has successfully passed it to the next-hop channel or the final destination.
- Sink: a sink delivers events to the next hop or the final destination, and removes them from the channel after a successful delivery.
- Sources include the Syslog source, Kafka source, HTTP source, Exec source, Avro source, and so on.
- Sinks include the Kafka sink, Avro sink, File Roll sink, HDFS sink, and so on.
- Channels include the Memory channel, File channel, and so on.
Flume provides the skeleton, along with a variety of Sources, Sinks and Channels, and lets you design the data flow model that fits your needs. Multiple Flume agents can also be chained together, like the cars of a subway train.
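To make the skeleton concrete, here is a minimal, hypothetical single-agent configuration, separate from the setup described below: a netcat source listening on a local port, a memory channel, and a logger sink that simply prints each event. The agent name a1 and the component names r1, c1, k1 are arbitrary.

a1.sources = r1
a1.channels = c1
a1.sinks = k1

# a netcat source: every line received on the port becomes one event
a1.sources.r1.type = netcat
a1.sources.r1.bind = 127.0.0.1
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

# an in-memory buffer between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# a logger sink: events are written to Flume's own log for inspection
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1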
Defining a Data flow model
Back to the scenario at the beginning: we want to aggregate the Nginx logs of multiple servers for analysis. The work is split between two Flume agents:
- Flume1: Exec Source, Memory Channel, Avro Sink, deployed on each business machine
- Flume2: Avro Source, Memory Channel, File Roll Sink, deployed on the machine that aggregates the logs
Required Preparation
You need to:
- Download Flume
- Install the Java SDK, then edit conf/flume-env.sh in the extracted Flume directory and configure:
# I am using oracle-java-8
export JAVA_HOME=/usr/lib/jvm/java-8-oracle/jre/
- Think through your data flow model and write the configuration. For Flume1 described above, tail2avro.conf:
agent.sources = s1
agent.channels = c1
agent.sinks = k1

agent.sources.s1.type = exec
agent.sources.s1.command = tail -F <your file path>
agent.sources.s1.channels = c1

agent.channels.c1.type = memory
agent.channels.c1.capacity = 10000
agent.channels.c1.transactionCapacity = 10000

agent.sinks.k1.type = avro
agent.sinks.k1.hostname = <your target address>
agent.sinks.k1.port = <your target port>
agent.sinks.k1.channel = c1
And for Flume2, avro2file.conf:
agent.sources = s1
agent.channels = c1
agent.sinks = k1

agent.sources.s1.type = avro
agent.sources.s1.bind = <your address>
agent.sources.s1.port = <your port>
agent.sources.s1.channels = c1

agent.sinks.k1.type = file_roll
agent.sinks.k1.sink.directory = /data/log/ngxlog
# roll interval in seconds
agent.sinks.k1.sink.rollInterval = 86400
agent.sinks.k1.channel = c1

agent.channels.c1.type = memory
# capacity of events in the queue
agent.channels.c1.capacity = 10000
agent.channels.c1.transactionCapacity = 10000
agent.channels.c1.keep-alive =
# start flume1
bin/flume-ng agent -n agent -c conf -f conf/tail2avro.conf -Dflume.root.logger=WARN
# start flume2
bin/flume-ng agent -n agent -c conf -f conf/avro2file.conf -Dflume.root.logger=INFO
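Once both agents are up, it is worth checking the pipeline end to end. Below is a minimal sketch, assuming Flume2 listens on the address and port configured above; the file paths are placeholders, not part of the original setup.

# confirm the Flume installation can find Java
bin/flume-ng version

# on a business machine: append a test line to the tailed file;
# the Exec source should pick it up and forward it over Avro
echo "flume-test $(date +%s)" >> <your file path>

# alternatively, send a sample file straight to Flume2's Avro source
# using the avro-client bundled with Flume
bin/flume-ng avro-client -H <your address> -p <your port> -F /path/to/sample.log

# on the aggregating machine: the File Roll sink should have written
# the events into the configured directory
ls -l /data/log/ngxlog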
Flume: real-time log collection.