Description
These results come from my own tests, with only a simple analysis of the data. The analysis is rough, so the conclusions may be inaccurate.
First, the conclusion. Multiple sinks can be configured directly in the usual way. Each sink then gets its own SinkRunner, which amounts to one thread per sink, and the sinks do not interfere with one another. Load balancing happens through the channel, and throughput rises to about n times that of a single sink, where n is the number of sinks.
If a sink group is added on top of this, the sink group starts a single SinkRunner, which is single-threaded. The sink group reads data from the channel and distributes it to the sinks attached to it, so throughput is no better than with a single sink. However, the two sinks can then run in load-balancing or hot-standby (failover) mode.
The analysis above is based on reading the source code and on articles found online. I cannot guarantee that it is entirely correct, so please forgive any mistakes.
In my opinion, it is better in practice to configure multiple sinks directly, which improves throughput and achieves load balancing. Hot standby can be provided by other load-balancing software or hardware that exposes a virtual IP.
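For comparison, the hot-standby mode mentioned above could be expressed with a sink group roughly as follows. This is a minimal sketch rather than a tested configuration: it reuses the sink and group names from the test configuration in this article, the priority and penalty values are illustrative assumptions, and the property names are the failover sink processor settings as I understand them from the Flume user guide.

```properties
# Hypothetical failover (hot-standby) sink group; values are illustrative only
agent1.sinkgroups = avrogroup
agent1.sinkgroups.avrogroup.sinks = avrosink1 avrosink2
agent1.sinkgroups.avrogroup.processor.type = failover
# The sink with the higher priority is used first; the other one is the standby
agent1.sinkgroups.avrogroup.processor.priority.avrosink1 = 10
agent1.sinkgroups.avrogroup.processor.priority.avrosink2 = 5
# Upper limit on the backoff period for a failed sink, in milliseconds
agent1.sinkgroups.avrogroup.processor.maxpenalty = 10000
```

As described above, a group like this is driven by a single SinkRunner thread, so it trades throughput for failover.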
The test configuration follows.
The configuration is the same for both cases. To switch between them, uncomment or comment out the sink group lines.
This is the configuration of the collection node.
# Flume configuration file
agent1.sources = execsource
agent1.sinks = avrosink1 avrosink2
agent1.channels = filechannel
# Sink groups hurt performance significantly
#agent1.sinkgroups = avrogroup
#agent1.sinkgroups.avrogroup.sinks = avrosink1 avrosink2
# Sink scheduling mode: load_balance or failover
#agent1.sinkgroups.avrogroup.processor.type = load_balance
# Load-balancing selector: random or round_robin
#agent1.sinkgroups.avrogroup.processor.selector = round_robin
# Back off sinks that have failed
#agent1.sinkgroups.avrogroup.processor.backoff = true
# Maximum backoff time, in milliseconds (30 seconds)
#agent1.sinkgroups.avrogroup.processor.selector.maxTimeOut = 30000
# Configure execsource
# Channel
agent1.sources.execsource.channels = filechannel
# Source type
agent1.sources.execsource.type = exec
# Tail the log file that is being written
agent1.sources.execsource.command = tail -f /home/flume/log/test.log
# Restart the command if it dies
agent1.sources.execsource.restart = true
# Delay before restarting the command, in milliseconds
agent1.sources.execsource.restartThrottle = 2000
# Log the command's stderr output
agent1.sources.execsource.logStdErr = true
# Number of events per batch commit
agent1.sources.execsource.batchSize = 1000
# Batch commit timeout, in milliseconds
agent1.sources.execsource.batchTimeout = 1000
# Configure filechannel
# Channel type: file or memory
agent1.channels.filechannel.type = memory
#agent1.channels.filechannel.checkpointDir = /home/flume/channel/log/ckpdir
#agent1.channels.filechannel.dataDirs = /home/flume/channel/log/data
# Maximum size of a single data file, in bytes (about 200 MB)
#agent1.channels.filechannel.maxFileSize = 204800000
# Maximum number of events held in the channel
agent1.channels.filechannel.capacity = 20000000
# Maximum number of events per transaction
agent1.channels.filechannel.transactionCapacity = 10000
# Memory used by the memory channel, in bytes; defaults to 80% of the JVM's maximum memory
agent1.channels.filechannel.byteCapacity = 1024000000
# Configure avrosink1
# Channel the sink reads from
agent1.sinks.avrosink1.channel = filechannel
# Sink type: avro or thrift
agent1.sinks.avrosink1.type = avro
# IP address
agent1.sinks.avrosink1.hostname = 10.8.6.161
# Port
agent1.sinks.avrosink1.port = 1463
# Number of events per batch commit
agent1.sinks.avrosink1.batch-size = 1000
# Connection timeout, in milliseconds
agent1.sinks.avrosink1.connect-timeout = 3000
# Request timeout, in milliseconds
agent1.sinks.avrosink1.request-timeout = 20000
# Interval after which the connection to the source is reset, in seconds; used for polling behind a back-end load balancer
agent1.sinks.avrosink1.reset-connection-interval = 300
# Maximum number of connections (default 5)
agent1.sinks.avrosink1.maxConnections = 5
# Configure avrosink2
# Channel the sink reads from
agent1.sinks.avrosink2.channel = filechannel
# Sink type: avro or thrift
agent1.sinks.avrosink2.type = avro
# IP address
agent1.sinks.avrosink2.hostname = 10.8.6.160
# Port
agent1.sinks.avrosink2.port = 1463
# Number of events per batch commit
agent1.sinks.avrosink2.batch-size = 1000
# Connection timeout, in milliseconds
agent1.sinks.avrosink2.connect-timeout = 3000
# Request timeout, in milliseconds
agent1.sinks.avrosink2.request-timeout = 20000
# Interval after which the connection to the source is reset, in seconds; used for polling behind a back-end load balancer
agent1.sinks.avrosink2.reset-connection-interval = 300
# Maximum number of connections (default 5)
agent1.sinks.avrosink2.maxConnections = 5
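To run the sink group variant of the test on the collection node, the commented sink group lines are enabled and everything else stays the same. The fragment below is a sketch of what that part would look like once uncommented. The property name `processor.selector.maxTimeOut` follows the Flume user guide as I understand it and should be checked against the Flume version in use.

```properties
# Sink group enabled: one SinkRunner thread drives both sinks
agent1.sinkgroups = avrogroup
agent1.sinkgroups.avrogroup.sinks = avrosink1 avrosink2
agent1.sinkgroups.avrogroup.processor.type = load_balance
agent1.sinkgroups.avrogroup.processor.selector = round_robin
agent1.sinkgroups.avrogroup.processor.backoff = true
# Upper limit on the backoff of a failed sink, in milliseconds
agent1.sinkgroups.avrogroup.processor.selector.maxTimeOut = 30000
```

With these lines commented out again, each sink is polled by its own SinkRunner, which is the multi-sink case measured in this test.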
This is the configuration of the aggregation node.
# Flume configuration file
agent1.sources = avrosource
agent1.sinks = hdfssink1 hdfssink2
agent1.channels = filechannel
# Sink groups: several can be listed, separated by spaces. They hurt performance significantly, so they are disabled here
#agent1.sinkgroups = hdfsgroup
#agent1.sinkgroups.hdfsgroup.sinks = hdfssink1 hdfssink2
# Sink scheduling mode: load_balance or failover
#agent1.sinkgroups.hdfsgroup.processor.type = load_balance
# Load-balancing selector: random or round_robin
#agent1.sinkgroups.hdfsgroup.processor.selector = round_robin
# Back off sinks that have failed
#agent1.sinkgroups.hdfsgroup.processor.backoff = true
# Maximum backoff time, in milliseconds (30 seconds)
#agent1.sinkgroups.hdfsgroup.processor.selector.maxTimeOut = 30000
# Configure avrosource
# Channel
agent1.sources.avrosource.channels = filechannel
# Source type: thrift or avro
agent1.sources.avrosource.type = avro
# Address to listen on
agent1.sources.avrosource.bind = 0.0.0.0
# Port
agent1.sources.avrosource.port = 1463
# Number of worker threads
agent1.sources.avrosource.threads = 24
# Interceptors; several can be listed, separated by spaces
agent1.sources.avrosource.interceptors = i1
# Interceptor type; a custom interceptor must be referenced through its builder class (match the capitalization of your own class)
agent1.sources.avrosource.interceptors.i1.type = com.cfto.flume.interceptor.timestampinterceptor$builder
# Configure filechannel
# Channel type: file or memory
agent1.channels.filechannel.type = memory
# checkpointDir and dataDirs are file channel settings
agent1.channels.filechannel.checkpointDir = /tmp/flume1/channel/log/ckpdir
agent1.channels.filechannel.dataDirs = /tmp/flume1/channel/log/data
# Maximum size of a single data file, in bytes (about 200 MB)
#agent1.channels.filechannel.maxFileSize = 204800000
# Maximum number of events held in the channel
agent1.channels.filechannel.capacity = 200000000
# Maximum number of events per transaction
agent1.channels.filechannel.transactionCapacity = 10000
# Memory used by the memory channel, in bytes; defaults to 80% of the JVM's maximum memory
agent1.channels.filechannel.byteCapacity = 1024000000
# Configure hdfssink1
# Channel the sink reads from
agent1.sinks.hdfssink1.channel = filechannel
# Sink type
agent1.sinks.hdfssink1.type = hdfs
# HDFS path to write to. %{name} takes a value from the event header; escapes such as %y/%m/%d are parsed by the sink itself
# Do not end the path with /
agent1.sinks.hdfssink1.hdfs.path = hdfs://nameservice1/flumelog/%{datedir}
# File name prefix
agent1.sinks.hdfssink1.hdfs.filePrefix = hostxx_1
# Use the local timestamp; set to true when the header has no timestamp attribute and the time is needed
agent1.sinks.hdfssink1.hdfs.useLocalTimeStamp = true
# File type: SequenceFile (default), DataStream (uncompressed), CompressedStream (compressed)
agent1.sinks.hdfssink1.hdfs.fileType = CompressedStream
# Compression codec
agent1.sinks.hdfssink1.hdfs.codeC = lzop
# Write format: Text or Writable
agent1.sinks.hdfssink1.hdfs.writeFormat = Text
# Roll the file by time, in seconds; default 30, 0 disables rolling
agent1.sinks.hdfssink1.hdfs.rollInterval = 0
# Roll the file by size, in bytes (about 1 GB)
agent1.sinks.hdfssink1.hdfs.rollSize = 1024000000
# Roll the file by event count; default 10, 0 disables rolling
agent1.sinks.hdfssink1.hdfs.rollCount = 0
# Number of events per batch commit
agent1.sinks.hdfssink1.hdfs.batchSize = 1000
# Thread pool size for HDFS IO operations
agent1.sinks.hdfssink1.hdfs.threadsPoolSize = 10
# Timeout for HDFS operations, in milliseconds; default 10000
agent1.sinks.hdfssink1.hdfs.callTimeout = 30000
# Idle time before an inactive file is closed, in seconds; default 0 never closes
agent1.sinks.hdfssink1.hdfs.idleTimeout = 300
# User that writes the HDFS files
agent1.sinks.hdfssink1.hdfs.proxyUser = hadoop
# Retry interval after a failed HDFS file operation, in seconds; default 180
agent1.sinks.hdfssink1.hdfs.retryInterval = 3
# Configure hdfssink2
# Channel the sink reads from
agent1.sinks.hdfssink2.channel = filechannel
# Sink type
agent1.sinks.hdfssink2.type = hdfs
# HDFS path to write to. %{name} takes a value from the event header; escapes such as %y/%m/%d are parsed by the sink itself
# Do not end the path with /
agent1.sinks.hdfssink2.hdfs.path = hdfs://nameservice1/flumelog/%{datedir}
# File name prefix
agent1.sinks.hdfssink2.hdfs.filePrefix = hostxx_2
# Use the local timestamp; set to true when the header has no timestamp attribute and the time is needed
agent1.sinks.hdfssink2.hdfs.useLocalTimeStamp = true
# File type: SequenceFile (default), DataStream (uncompressed), CompressedStream (compressed)
agent1.sinks.hdfssink2.hdfs.fileType = CompressedStream
# Compression codec
agent1.sinks.hdfssink2.hdfs.codeC = lzop
# Write format: Text or Writable
agent1.sinks.hdfssink2.hdfs.writeFormat = Text
# Roll the file by time, in seconds; default 30, 0 disables rolling
agent1.sinks.hdfssink2.hdfs.rollInterval = 0
# Roll the file by size, in bytes (about 1 GB)
agent1.sinks.hdfssink2.hdfs.rollSize = 1024000000
# Roll the file by event count; default 10, 0 disables rolling
agent1.sinks.hdfssink2.hdfs.rollCount = 0
# Number of events per batch commit
agent1.sinks.hdfssink2.hdfs.batchSize = 1000
# Thread pool size for HDFS IO operations
agent1.sinks.hdfssink2.hdfs.threadsPoolSize = 10
# Timeout for HDFS operations, in milliseconds; default 10000
agent1.sinks.hdfssink2.hdfs.callTimeout = 30000
# Idle time before an inactive file is closed, in seconds; default 0 never closes
agent1.sinks.hdfssink2.hdfs.idleTimeout = 300
# User that writes the HDFS files
agent1.sinks.hdfssink2.hdfs.proxyUser = hadoop
# Retry interval after a failed HDFS file operation, in seconds; default 180
agent1.sinks.hdfssink2.hdfs.retryInterval = 3
This article is from the IT Worker blog. Please keep this source link when reposting: http://luhaiyou.blog.51cto.com/3179056/1703291
Flume single channel multi-sink test