Using the Flume exec source to collect logs from each server and aggregate them on another server


Reprint: http://blog.csdn.net/liuxiao723846/article/details/78133375

Scenario one description:

The online API service writes its logs to the local disk via log4j. Flume is installed on each API server and collects the logs with an exec source, then sends them to the aggregation server through an Avro sink. On the aggregation server, Flume receives the logs with an Avro source and writes them to the local disk with a file_roll sink.

Assumption: two API servers, 10.153.140.250 and 10.153.140.251, and one log aggregation server, 10.153.137.211.

1. Flume configuration on the API servers:

1) Download, unzip, and install Flume on each API server:

cd /usr/local/
wget http://mirror.bit.edu.cn/apache/flume/1.7.0/apache-flume-1.7.0-bin.tar.gz
tar -xvzf apache-flume-1.7.0-bin.tar.gz
vim /etc/profile
export PS1="[\u@`/sbin/ifconfig eth0 | grep 'inet ' | awk -F'[ :]+' '{print $4}'` \W]"'$ '
export FLUME_HOME=/usr/local/apache-flume-1.7.0-bin
export PATH=$PATH:$FLUME_HOME/bin
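
After saving /etc/profile, reload it and make sure the flume-ng command is on the PATH (a quick sanity check):

source /etc/profile     # reload the environment so FLUME_HOME and PATH take effect
flume-ng version        # should print the Apache Flume 1.7.0 version banner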


2) Modify the flume-env.sh configuration file:

cd /usr/local/apache-flume-1.7.0-bin/conf

vim flume-env.sh

Specify JAVA_HOME in this file, and also add a log4j.properties file to the conf directory.
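
For reference, a minimal flume-env.sh might look like the following; the JAVA_HOME path is an assumption and should point at the JDK actually installed on the server:

export JAVA_HOME=/usr/local/jdk1.8.0_121    # assumption: adjust to the local JDK install path
export JAVA_OPTS="-Xms512m -Xmx1024m"       # optional heap settings for the agent JVM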

3) Flume configuration file (test-tomcat-log.conf):

agent1.sources = ngrinder
agent1.channels = mc1
agent1.sinks = avro-sink

agent1.sources.ngrinder.type = exec
agent1.sources.ngrinder.command = tail -f /data/logs/ttbrain/ttbrain-recommend-api.log
agent1.sources.ngrinder.channels = mc1

agent1.channels.mc1.type = memory
agent1.channels.mc1.capacity = 100000
agent1.channels.mc1.keep-alive = 30

agent1.sinks.avro-sink.type = avro
agent1.sinks.avro-sink.channel = mc1
agent1.sinks.avro-sink.hostname = 10.153.137.211
agent1.sinks.avro-sink.port = 4545

Note: the sink here is Avro; the Flume agent on each API server sends the log data to the log aggregation server over RPC.

4) Start:

nohup flume-ng agent -c /usr/local/apache-flume-1.7.0-bin/conf -f /usr/local/apache-flume-1.7.0-bin/conf/test-tomcat-log.conf -n agent1 >/dev/null 2>&1 &
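
To confirm the agent started and its Avro sink is connecting to the aggregation server, a couple of quick checks (a sketch; the log path assumes the default log4j.properties, which writes flume.log under a logs directory relative to the working directory):

ps -ef | grep flume-ng          # the agent process should be running
tail -n 50 logs/flume.log       # look for the Avro sink starting up and any connection errors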

2. Flume configuration on the log aggregation server:

1) Install, unzip, and configure Flume (same as above):

2) Flume configuration file (tomcat_collection.conf):

collector1.sources = AvroIn
collector1.channels = mc1
collector1.sinks = LocalOut

collector1.sources.AvroIn.type = avro
collector1.sources.AvroIn.bind = 10.153.137.211
collector1.sources.AvroIn.port = 4545
collector1.sources.AvroIn.channels = mc1

collector1.channels.mc1.type = memory
collector1.channels.mc1.capacity = 100000
collector1.channels.mc1.transactionCapacity = 100000

collector1.sinks.LocalOut.type = file_roll
collector1.sinks.LocalOut.sink.directory = /data/tomcat_log_bak
collector1.sinks.LocalOut.sink.rollInterval = 0
collector1.sinks.LocalOut.channel = mc1

Description

a. The source here is Avro, which receives the data sent by the Flume agents on the API servers;

b. The sink here is file_roll, which saves the log data to the local disk;

Note: bind must be set to the machine's own IP address or hostname; localhost and the like will not work.

3) Start:

nohup flume-ng agent -c /usr/local/apache-flume-1.7.0-bin/conf -f /usr/local/apache-flume-1.7.0-bin/conf/tomcat_collection.conf -n collector1 -Dflume.root.logger=INFO,console >/dev/null 2>&1 &
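
Once the collector is up, its Avro source should be listening on the configured port; a quick check (a sketch, assuming netstat is available on the server):

netstat -nltp | grep 4545       # the collector's Avro source should be listening here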


At this point, the /data/tomcat_log_bak directory on the aggregation server will contain the logs collected from both interface servers.
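
A simple end-to-end test (a sketch; the test string is arbitrary, and the file_roll output file names, which are timestamp-based by default, will differ):

# on one of the API servers: append a test line to the monitored log
echo "flume end-to-end test" >> /data/logs/ttbrain/ttbrain-recommend-api.log

# on the aggregation server: the line should show up in the file_roll output directory
grep -r "flume end-to-end test" /data/tomcat_log_bak/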

Scenario two description:

The online API service writes its logs to the local disk via log4j. Flume is installed on each API server and collects the logs with an exec source, then sends them to the Flume agent on the aggregation server through an Avro sink. On the aggregation server, Flume receives the logs with an Avro source and backs them up to HDFS with an HDFS sink.

Assumption: two API servers, 10.153.140.250 and 10.153.140.251, and one log aggregation server, 10.153.137.211.

1. Flume configuration on the API servers:

Same as in scenario one. Note that the Avro sink's hostname and port on the API servers must match the Avro source configured on the aggregation server below.

2. Flume configuration on the log aggregation server:

1) Install and unzip Flume (same as above):

2) Flume configuration file (hdfs.conf):

agent1.sources = s1
agent1.channels = ch1
agent1.sinks = log-sink1

agent1.sources.s1.type = avro
agent1.sources.s1.bind = 10.153.135.113
agent1.sources.s1.port = 41414
agent1.sources.s1.threads = 5
agent1.sources.s1.channels = ch1

agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 100000
agent1.channels.ch1.transactionCapacity = 100000
agent1.channels.ch1.keep-alive = 30

agent1.sinks.log-sink1.type = hdfs
agent1.sinks.log-sink1.channel = ch1
agent1.sinks.log-sink1.hdfs.path = hdfs://hadoop-jy-namenode/data/qytt/flume
agent1.sinks.log-sink1.hdfs.writeFormat = Text
agent1.sinks.log-sink1.hdfs.fileType = DataStream
agent1.sinks.log-sink1.hdfs.rollInterval = 0
agent1.sinks.log-sink1.hdfs.rollSize = 60554432
agent1.sinks.log-sink1.hdfs.rollCount = 0
agent1.sinks.log-sink1.hdfs.batchSize = 1000
agent1.sinks.log-sink1.hdfs.txnEventMax = 1000
agent1.sinks.log-sink1.hdfs.callTimeout = 60000
agent1.sinks.log-sink1.hdfs.appendTimeout = 60000
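
One practical note: the HDFS sink needs the Hadoop client libraries on Flume's classpath. The flume-ng startup script adds them automatically when it can find a hadoop command, so something like the following in flume-env.sh is enough; the install path is an assumption for this sketch:

export HADOOP_HOME=/usr/local/hadoop      # assumption: local Hadoop client install path
export PATH=$PATH:$HADOOP_HOME/bin        # lets the flume-ng script locate the hadoop command and pick up its classpath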


Description

a. The source here is Avro, which receives the data sent by the Flume agents on the API servers;

b. The sink here is HDFS, which writes the data into HDFS; the NameNode address of the Hadoop cluster must be specified in the path (hdfs://hadoop-jy-namenode/).

3) Start:

nohup flume-ng agent -c /usr/local/apache-flume-1.7.0-bin/conf -f /usr/local/apache-flume-1.7.0-bin/conf/hdfs.conf -n agent1 >/dev/null 2>&1 &


At this point, the logs collected from the two interface servers will appear under the /data/qytt/flume directory in HDFS.
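
To verify, list the target directory on HDFS (a quick sketch; by default the HDFS sink names files with the FlumeData prefix, but the exact names will differ):

hdfs dfs -ls /data/qytt/flume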



