Distributed Log Import Tool: Flume


Background

Flume is a distributed log-collection system sponsored by Apache. Its main function is to collect the logs generated by each worker in a cluster and deliver them to a specified location.

Why write this article? Because most of the documentation you can find today covers old versions of Flume. Flume 1.x (the flume-ng line) changed a great deal compared with earlier versions, so much of the material in circulation is outdated. Keep this in mind; I list a few newer, more useful references at the end.

Flume's advantages fall into several areas:
* Implemented in Java, so it is highly portable across platforms
* Has a certain degree of fault tolerance, plus mechanisms to protect data from loss
* Provides many ready-made agent components
* Easy to extend, with good developer support

Function


The standalone version takes the form above: three components, namely source, channel, and sink. To use it, simply install Flume and then write the corresponding conf file.
Source: where the log data originates (an agent may have multiple sources, and multiple data-source types are supported)
Channel: a queue-like buffer that stages the received log data
Sink: where the log data is written out (there are many options: print it to the screen, read it into a database, or write it to a specified file)

# Name the components in this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro          # avro is one of Flume's source types; reads the local log file
a1.sources.r1.bind = localhost     # together with the port below, matches the avro-client's address
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = com.waqu.sink.OdpsSink   # matches the package/class name in the sink code
a1.sinks.k1.sink.batchSize = -              # must be greater than 10
a1.sinks.k1.sink.table = *******            # your own hub table and key-id information
a1.sinks.k1.sink.project = *******
a1.sinks.k1.sink.odps.access_id = **********
a1.sinks.k1.sink.odps.access_key = **********
a1.sinks.k1.sink.odps.end_point = ***********
a1.sinks.k1.sink.tunnel.end_point = *******

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.checkpointDir = +
a1.channels.c1.dataDirs = -

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

The following sections introduce these three points in more detail.

Flume Workflow

An agent supports a variety of input sources. The more commonly used types are:
* http: listens on an HTTP port and collects the posted log data
* netcat: listens on a port for Telnet-like text data
* spooldir (spooling directory): watches a directory and ingests newly added files
* avro: receives events sent from a client for a specified file; it does not support real-time monitoring, meaning that if we watch a.log, we will not see entries appended to it afterwards
* exec: runs a command, which makes real-time monitoring of a file possible
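To make the list concrete, here is a minimal sketch of an agent built on the netcat source, following the standard single-node example from the Flume user guide. The agent and component names (a1, r1, k1, c1) and the port are arbitrary choices for illustration:

```properties
# minimal sketch: netcat source -> memory channel -> logger sink
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44445

# logger sink prints events to the console (at INFO level)
a1.sinks.k1.type = logger

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

With the agent running, anything typed into `telnet localhost 44445` shows up as a Flume event on the console.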

The highlight is the exec source. It is quite handy: it lets the agent run a shell command, so we can use the tail command to monitor new content appended to a file.

tail -F log.txt
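A sketch of what the corresponding source configuration could look like (agent/component names and the file path are placeholders, not from the original article):

```properties
# exec source: run tail -F so the agent sees lines as they are appended
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app.log
a1.sources.r1.channels = c1
```

Note that because exec relies on an external process, Flume cannot guarantee delivery if that process dies; `tail -F` (rather than `-f`) at least survives log rotation.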
Development

* First use the official SDK package to develop and package a jar file
* Put the jar into Flume's lib directory
* Write the conf file
* Start the agent: flume-ng agent --conf conf --conf-file ./conf/my.conf --name a1 -Dflume.root.logger=INFO,console
* Start the data source: flume-ng avro-client -H localhost -p 44444 -F /home/garvin/log.txt -Dflume.root.logger=INFO,console
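The first step, developing a custom sink jar, can be sketched against the flume-ng-core API. This is only an illustrative outline, not the actual odps_sink code from the repository linked below; the package and class names here are invented:

```java
package com.example.sink; // illustrative package, not the article's com.waqu.sink

import org.apache.flume.Channel;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.Transaction;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;

public class DemoSink extends AbstractSink implements Configurable {

    private int batchSize;

    @Override
    public void configure(Context context) {
        // picks up a1.sinks.k1.batchSize from the conf file, defaulting to 100
        batchSize = context.getInteger("batchSize", 100);
    }

    @Override
    public Status process() throws EventDeliveryException {
        Channel channel = getChannel();
        Transaction txn = channel.getTransaction();
        txn.begin();
        try {
            for (int i = 0; i < batchSize; i++) {
                Event event = channel.take();
                if (event == null) {
                    break; // channel drained for now
                }
                // deliver event.getBody() to the destination here
            }
            txn.commit();
            return Status.READY;
        } catch (Throwable t) {
            txn.rollback();
            return Status.BACKOFF;
        } finally {
            txn.close();
        }
    }
}
```

The key design point is the channel transaction: events are only committed once the sink has handled them, otherwise the transaction is rolled back and Flume retries, which is where the fault tolerance mentioned earlier comes from. (This sketch compiles only against the flume-ng-core dependency, so it is shown without a standalone test.)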

Recommend a few useful resources:
An example code implementation: https://github.com/waqulianjie/odps_sink
Developer documentation: http://flume.apache.org/FlumeUserGuide.html
A more complete introduction: http://www.aboutyun.com/thread-8917-1-1.html

This article comes from the blog "Bo Li Garvin"
Please indicate the source when reprinting: http://blog.csdn.net/buptgshengod

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission.
