1. Load Balancing Scenario
1) Initial: the upstream agent sends events through the round_robin selector, distributing them to the downstream Collector1 and Collector2
2) Fault: shut down the Collector1 process to simulate a failure; because backoff is configured, the agent temporarily removes Collector1 from the send list and all events are sent to Collector2
3) Restore: restart the Collector1 process; after the maximum backoff timeout, Collector1 re-enters the send list and events are again distributed to Collector1 and Collector2
2. Node Configuration
2.1 Flume configuration of the upstream agent
# flume-loadbalance-client
# agent name: a1
# source: exec
# channel: memory
# sinks: k1 k2, each of Avro type, linking to the next-level collectors

# define source, channel, and sink names
a1.sources = r1
a1.channels = c1
a1.sinks = k1 k2

# define source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -f /root/flume_test/server.log

# define sinks; each connects to a next-level collector via hostname and port
# (the upstream Avro sink binds to the downstream host for RPC)
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = slave1
a1.sinks.k1.port = 4444
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = slave2
a1.sinks.k2.port = 4444

# define the sink group; a sink is selected for event distribution by the selector
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.selector = round_robin
# when a node fails, remove it from the sink group for a period of time
a1.sinkgroups.g1.processor.backoff = true
# maximum time (in milliseconds) a failed node stays removed; while removed,
# the selector does not attempt to send data to it, which speeds up event
# distribution somewhat, but events may be distributed unevenly
a1.sinkgroups.g1.processor.selector.maxTimeOut = 10000

# define channel
a1.channels.c1.type = memory
# number of events the memory queue can hold
a1.channels.c1.capacity = 1000
# number of events per commit (committed to the memory queue in one transaction)
a1.channels.c1.transactionCapacity = 100

# bind source and sinks to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1
2.2 Flume configuration of downstream Collector1
# specify agent, source, sink, and channel names
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# define Avro source listening on local port 4444; the downstream Avro source
# binds to a local address, and the port must match the upstream agent's config
a1.sources.r1.type = avro
a1.sources.r1.bind = slave1
a1.sources.r1.port = 4444

# define logger sink
a1.sinks.k1.type = logger

# define memory channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# bind source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
2.3 Flume configuration of downstream Collector2
# specify agent, source, sink, and channel names
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# define Avro source listening on local port 4444; the downstream Avro source
# binds to a local address, and the port must stay consistent with the
# upstream agent's config
a1.sources.r1.type = avro
a1.sources.r1.bind = slave2
a1.sources.r1.port = 4444

# define logger sink
a1.sinks.k1.type = logger

# define memory channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# bind source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
3. Start the flume agent on each node
Start Collector1
flume-ng agent --conf conf --conf-file ./conf/flume-failover-server.properties --name a1 -Dflume.root.logger=INFO,console
Start Collector2
flume-ng agent --conf conf --conf-file ./conf/flume-failover-server.properties --name a1 -Dflume.root.logger=INFO,console
Start an upstream agent
flume-ng agent --conf conf --conf-file ./conf/flume-loadbalance-client.properties --name a1 -Dflume.root.logger=INFO,console
Note: start the downstream collector nodes before starting the upstream agent; otherwise the agent will start, but with no downstream collector running it will find no available downstream nodes and report errors.
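To honor that start order without guessing, the collectors' Avro ports can be polled before launching the agent. A minimal sketch, assuming bash (for its /dev/tcp pseudo-device); the `wait_for_port` helper is not part of Flume, and the slave1/slave2 host names and port 4444 come from the configs above:

```shell
# Poll a collector's Avro port until it accepts TCP connections, so the
# upstream agent is only started once the downstream side is up.
wait_for_port() {
  host="$1"; port="$2"; retries="${3:-30}"
  for _ in $(seq 1 "$retries"); do
    # the subshell attempts a TCP connection and closes it when it exits
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      echo "$host:$port is up"
      return 0
    fi
    sleep 1
  done
  echo "$host:$port not reachable after $retries attempts" >&2
  return 1
}

# e.g. wait_for_port slave1 4444 && wait_for_port slave2 4444
```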
4. Fault Simulation
1) Before the fault, append data to the log file on the agent's machine (for example, by redirecting output into it) and check whether events are sent round-robin to Collector1 and Collector2
The following data is appended on the agent
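The exact test lines are not shown here; as a sketch, numbered events can be appended to the file tailed by the exec source like so. The path /root/flume_test/server.log matches the upstream agent's config; a temp file is used by default so the snippet can be tried anywhere:

```shell
# Append numbered test events to the tailed log file; point LOG at
# /root/flume_test/server.log on the agent host to feed the exec source.
LOG="${LOG:-$(mktemp)}"
for i in $(seq 1 8); do
  echo "event-$i" >> "$LOG"
done
echo "appended $(wc -l < "$LOG") lines to $LOG"
```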
Collector1 receives and prints to the console events (2, 4, 7)
Collector2 receives and prints to the console events (1, 4, 5, 6, 8)
Summary: with Flume's round_robin distribution, the result on a small test set is not strictly round-robin; some nodes receive more events and others fewer.
2) Simulate the failure and kill the Collector1 process
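One way to kill the collector is to match its JVM by command line. A minimal sketch, assuming the collector process's command line contains its properties file name (the `kill_collector` helper is hypothetical, not a Flume tool):

```shell
# Kill the process whose full command line matches the given pattern,
# simulating a collector failure.
kill_collector() {
  pkill -f "$1"
}

# e.g. on slave1: kill_collector flume-failover-server.properties
```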
3) Append data on the agent again to check whether all events are distributed to Collector2
At this point Collector2 receives all events and prints them to the console
Note one detail: when Collector1 fails, the agent logs that one sink is unavailable and tries the next sink for the event
4) Restore Collector1 and view how events are distributed at this point
Append data on the agent
Data distributed to Collector1
Data distributed to Collector2
5. Official configuration reference for the load-balancing scenario
04_flume Multi-node Load_balance practice