Flume single channel multi-sink test

Source: Internet
Author: User
Tags: failover

Description

These results come from my own testing, backed by only a simple analysis of the data. The analysis is rough, so the conclusions may not be accurate.

The conclusion first. Multiple sinks can be configured directly in the ordinary way. Each sink then gets its own SinkRunner, which means one thread per sink with no interference between them. Load balancing happens through the channel, and throughput rises to roughly n times that of a single sink.

If a sink group is added on top of this, the sink group gets a single SinkRunner, which is single-threaded. The sink group reads data from the channel and distributes it to the sinks attached to it. Throughput is then no better than with a single sink, but the two sinks can run in load-balancing or hot-standby (failover) mode.
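To make the two setups concrete, the difference between them comes down to a few lines. This is a minimal sketch that reuses the agent, channel, and sink names from the test configuration in this article; the comments restate the analysis above rather than anything verified independently.

```properties
# Case 1: no sink group.
# Per the analysis above, each sink gets its own SinkRunner thread
# and both sinks drain the same channel independently.
agent1.sinks = avrosink1 avrosink2
agent1.sinks.avrosink1.channel = filechannel
agent1.sinks.avrosink2.channel = filechannel

# Case 2: the same sinks plus a sink group.
# A single SinkRunner drives both sinks through the group's processor.
agent1.sinkgroups = avrogroup
agent1.sinkgroups.avrogroup.sinks = avrosink1 avrosink2
agent1.sinkgroups.avrogroup.processor.type = load_balance
```

Switching between the two cases only requires commenting the three sinkgroups lines in or out.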

This analysis is based on the source code and on articles found online. I cannot guarantee that it is entirely correct, so please bear with me.

My view is that in practice it is better to configure multiple sinks directly, which improves throughput and gives load balancing. Hot standby can instead be provided through a virtual IP managed by separate load-balancing software or hardware.
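For comparison, hot standby can also be expressed inside Flume with a failover sink group, at the single-thread cost described earlier. The following is a minimal sketch, assuming the two Avro sinks from the test configuration. The priority values and the 10-second penalty cap are illustrative assumptions and were not part of the test.

```properties
agent1.sinkgroups = avrogroup
agent1.sinkgroups.avrogroup.sinks = avrosink1 avrosink2
agent1.sinkgroups.avrogroup.processor.type = failover
# A larger value means a higher priority: avrosink1 is used while it is healthy
agent1.sinkgroups.avrogroup.processor.priority.avrosink1 = 10
agent1.sinkgroups.avrogroup.processor.priority.avrosink2 = 5
# Upper bound on the back-off period for a failed sink, in milliseconds (assumed value)
agent1.sinkgroups.avrogroup.processor.maxpenalty = 10000
```

With this setup avrosink2 only receives events while avrosink1 is failing, so it provides standby capacity but no extra throughput.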

The test configuration follows.

The same configuration was used for both cases. The sink group lines were commented in or out depending on the test.

This is the configuration of the collection node.

# Flume configuration file
agent1.sources=execsource
agent1.sinks=avrosink1 avrosink2
agent1.channels=filechannel


# Sink groups hurt performance badly
#agent1.sinkgroups=avrogroup
#agent1.sinkgroups.avrogroup.sinks=avrosink1 avrosink2

# Sink processor type: load_balance or failover
#agent1.sinkgroups.avrogroup.processor.type=load_balance

# Load-balancing selector: round_robin (polling) or random
#agent1.sinkgroups.avrogroup.processor.selector=round_robin

# Back off from failed sinks
#agent1.sinkgroups.avrogroup.processor.backoff=true

# Maximum back-off time: 30 seconds, given in milliseconds
#agent1.sinkgroups.avrogroup.processor.selector.maxTimeOut=30000


# execsource configuration

# Channel
agent1.sources.execsource.channels=filechannel

# Source type
agent1.sources.execsource.type=exec

# Tail the log file that is currently being written
agent1.sources.execsource.command=tail -f /home/flume/log/test.log

# Restart the command if it dies
agent1.sources.execsource.restart=true

# Delay before restarting the command, in milliseconds
agent1.sources.execsource.restartThrottle=2000

# Log the command's stderr output
agent1.sources.execsource.logStdErr=true

# Number of events per batch
agent1.sources.execsource.batchSize=1000

# Batch timeout, in milliseconds
agent1.sources.execsource.batchTimeout=1000


# filechannel configuration

# Channel type: file or memory
agent1.channels.filechannel.type=memory
#agent1.channels.filechannel.checkpointDir=/home/flume/channel/log/ckpdir
#agent1.channels.filechannel.dataDirs=/home/flume/channel/log/data

# Maximum size of a single data file, in bytes (originally annotated as 100M; this value is about 200 MB)
#agent1.channels.filechannel.maxFileSize=204800000

# Number of events the channel can hold
agent1.channels.filechannel.capacity=20000000

# Number of events per transaction
agent1.channels.filechannel.transactionCapacity=10000

# Memory the memory channel may use, in bytes; the default is 80% of the JVM's memory
agent1.channels.filechannel.byteCapacity=1024000000


# avrosink1 configuration

# Channel the sink reads from
agent1.sinks.avrosink1.channel=filechannel

# Sink type: avro or thrift
agent1.sinks.avrosink1.type=avro

# IP address
agent1.sinks.avrosink1.hostname=10.8.6.161

# Port
agent1.sinks.avrosink1.port=1463

# Number of events per batch
agent1.sinks.avrosink1.batch-size=1000

# Connection timeout, in milliseconds
agent1.sinks.avrosink1.connect-timeout=3000

# Request timeout, in milliseconds
agent1.sinks.avrosink1.request-timeout=20000

# Interval, in seconds, after which the connection to the source is reset; this lets a back-end load balancer rotate connections
agent1.sinks.avrosink1.reset-connection-interval=300

# Maximum number of connections (default 5)
agent1.sinks.avrosink1.maxConnections=5


# avrosink2 configuration

# Channel the sink reads from
agent1.sinks.avrosink2.channel=filechannel

# Sink type: avro or thrift
agent1.sinks.avrosink2.type=avro

# IP address
agent1.sinks.avrosink2.hostname=10.8.6.160

# Port
agent1.sinks.avrosink2.port=1463

# Number of events per batch
agent1.sinks.avrosink2.batch-size=1000

# Connection timeout, in milliseconds
agent1.sinks.avrosink2.connect-timeout=3000

# Request timeout, in milliseconds
agent1.sinks.avrosink2.request-timeout=20000

# Interval, in seconds, after which the connection to the source is reset; this lets a back-end load balancer rotate connections
agent1.sinks.avrosink2.reset-connection-interval=300

# Maximum number of connections (default 5)
agent1.sinks.avrosink2.maxConnections=5


This is the configuration of the aggregation node.

# Flume configuration file
agent1.sources=avrosource
agent1.sinks=hdfssink1 hdfssink2
agent1.channels=filechannel


# Sink groups: several sinks can be listed, separated by spaces. They hurt performance badly, so they are turned off here.
#agent1.sinkgroups=hdfsgroup
#agent1.sinkgroups.hdfsgroup.sinks=hdfssink1 hdfssink2

# Sink processor type: load_balance or failover
#agent1.sinkgroups.hdfsgroup.processor.type=load_balance

# Load-balancing selector: round_robin (polling) or random
#agent1.sinkgroups.hdfsgroup.processor.selector=round_robin

# Back off from failed sinks
#agent1.sinkgroups.hdfsgroup.processor.backoff=true

# Maximum back-off time: 30 seconds, given in milliseconds
#agent1.sinkgroups.hdfsgroup.processor.selector.maxTimeOut=30000


# avrosource configuration

# Channel
agent1.sources.avrosource.channels=filechannel

# Source type: thrift or avro
agent1.sources.avrosource.type=avro

# Address to listen on
agent1.sources.avrosource.bind=0.0.0.0

# Port
agent1.sources.avrosource.port=1463

# Number of threads
agent1.sources.avrosource.threads=24

# Interceptors; several can be listed, separated by spaces
agent1.sources.avrosource.interceptors=i1

# Interceptor type; a custom interceptor must be specified through its Builder class
agent1.sources.avrosource.interceptors.i1.type=com.cfto.flume.interceptor.TimestampInterceptor$Builder


# filechannel configuration

# Channel type: file or memory
agent1.channels.filechannel.type=memory
agent1.channels.filechannel.checkpointDir=/tmp/flume1/channel/log/ckpdir
agent1.channels.filechannel.dataDirs=/tmp/flume1/channel/log/data

# Maximum size of a single data file, in bytes (originally annotated as 100M; this value is about 200 MB)
#agent1.channels.filechannel.maxFileSize=204800000

# Number of events the channel can hold
agent1.channels.filechannel.capacity=200000000

# Number of events per transaction
agent1.channels.filechannel.transactionCapacity=10000

# Memory the memory channel may use, in bytes; the default is 80% of the JVM's memory
agent1.channels.filechannel.byteCapacity=1024000000


# hdfssink1 configuration

# Channel the sink reads from
agent1.sinks.hdfssink1.channel=filechannel

# Sink type
agent1.sinks.hdfssink1.type=hdfs

# HDFS path to write to. %{name} takes a value from the event header; escapes such as %y/%m/%d are resolved by the sink itself.
# Do not end the path with /
agent1.sinks.hdfssink1.hdfs.path=hdfs://nameservice1/flumelog/%{datedir}

# File name prefix
agent1.sinks.hdfssink1.hdfs.filePrefix=hostxx_1

# Use the local timestamp; set to true when a time is needed and the header has no timestamp attribute
agent1.sinks.hdfssink1.hdfs.useLocalTimeStamp=true

# File type: SequenceFile (default), DataStream (uncompressed) or CompressedStream (compressed)
agent1.sinks.hdfssink1.hdfs.fileType=CompressedStream

# Compression codec
agent1.sinks.hdfssink1.hdfs.codeC=lzop

# Write format: Text or Writable
agent1.sinks.hdfssink1.hdfs.writeFormat=Text

# Roll files by time, in seconds; default 30, 0 disables time-based rolling
agent1.sinks.hdfssink1.hdfs.rollInterval=0

# Roll files by size, in bytes; about 1 GB here
agent1.sinks.hdfssink1.hdfs.rollSize=1024000000

# Roll files by event count; default 10, 0 disables count-based rolling
agent1.sinks.hdfssink1.hdfs.rollCount=0

# Number of events per batch
agent1.sinks.hdfssink1.hdfs.batchSize=1000

# Thread pool size for HDFS IO operations
agent1.sinks.hdfssink1.hdfs.threadsPoolSize=10

# Timeout for HDFS operations, in milliseconds (the documented default for Flume 1.6 is 10000)
agent1.sinks.hdfssink1.hdfs.callTimeout=30000

# Idle time before an inactive file is closed, in seconds; the default 0 never closes idle files
agent1.sinks.hdfssink1.hdfs.idleTimeout=300

# User that writes the HDFS files
agent1.sinks.hdfssink1.hdfs.proxyUser=hadoop

# Retry interval after a failed HDFS file operation, in seconds; default 180
agent1.sinks.hdfssink1.hdfs.retryInterval=3


# hdfssink2 configuration

# Channel the sink reads from
agent1.sinks.hdfssink2.channel=filechannel

# Sink type
agent1.sinks.hdfssink2.type=hdfs

# HDFS path to write to. %{name} takes a value from the event header; escapes such as %y/%m/%d are resolved by the sink itself.
# Do not end the path with /
agent1.sinks.hdfssink2.hdfs.path=hdfs://nameservice1/flumelog/%{datedir}

# File name prefix
agent1.sinks.hdfssink2.hdfs.filePrefix=hostxx_2

# Use the local timestamp; set to true when a time is needed and the header has no timestamp attribute
agent1.sinks.hdfssink2.hdfs.useLocalTimeStamp=true

# File type: SequenceFile (default), DataStream (uncompressed) or CompressedStream (compressed)
agent1.sinks.hdfssink2.hdfs.fileType=CompressedStream

# Compression codec
agent1.sinks.hdfssink2.hdfs.codeC=lzop

# Write format: Text or Writable
agent1.sinks.hdfssink2.hdfs.writeFormat=Text

# Roll files by time, in seconds; default 30, 0 disables time-based rolling
agent1.sinks.hdfssink2.hdfs.rollInterval=0

# Roll files by size, in bytes; about 1 GB here
agent1.sinks.hdfssink2.hdfs.rollSize=1024000000

# Roll files by event count; default 10, 0 disables count-based rolling
agent1.sinks.hdfssink2.hdfs.rollCount=0

# Number of events per batch
agent1.sinks.hdfssink2.hdfs.batchSize=1000

# Thread pool size for HDFS IO operations
agent1.sinks.hdfssink2.hdfs.threadsPoolSize=10

# Timeout for HDFS operations, in milliseconds (the documented default for Flume 1.6 is 10000)
agent1.sinks.hdfssink2.hdfs.callTimeout=30000

# Idle time before an inactive file is closed, in seconds; the default 0 never closes idle files
agent1.sinks.hdfssink2.hdfs.idleTimeout=300

# User that writes the HDFS files
agent1.sinks.hdfssink2.hdfs.proxyUser=hadoop

# Retry interval after a failed HDFS file operation, in seconds; default 180
agent1.sinks.hdfssink2.hdfs.retryInterval=3
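The HDFS paths in this configuration depend on a datedir header, which is presumably set by the custom interceptor on the Avro source. Where no such interceptor is available, date-based directories can be built from the sink's own time escapes instead. This is a sketch of that alternative rather than part of the tested setup; the directory layout is an assumption.

```properties
# Build date-based directories from time escapes instead of a custom header.
# With hdfs.useLocalTimeStamp=true the sink uses its local clock,
# so events do not need a timestamp header.
agent1.sinks.hdfssink1.hdfs.path=hdfs://nameservice1/flumelog/%Y/%m/%d
agent1.sinks.hdfssink1.hdfs.useLocalTimeStamp=true
```

The same two lines would apply to hdfssink2. Because the time comes from the aggregation node's clock rather than from the event, events that arrive late land in the directory for the arrival date.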


This article is from the "IT Worker" blog. Please keep this source attribution: http://luhaiyou.blog.51cto.com/3179056/1703291
