03_flume Multi-node failover practice


1. Practice Scenario

This exercise simulates failover between downstream collectors while an upstream Flume agent is sending events.

1) Initial state: the upstream agent sends events to the active downstream node, Collector1

2) Collector1 failure: kill the Collector1 process to simulate a fault; events are then sent to Collector2, completing the failover

3) Collector1 recovery: restart the process; after the maximum penalty time (maxpenalty) expires, events are sent to Collector1 again

2. Configuration files

Agent configuration file
# flume-failover-client
# agent name: a1
# source:  exec with a given command; the output of the command is monitored and each line becomes an event
# channel: memory
# sink:    k1 and k2, both Avro sinks pointing to the next-level collectors

# 1. define source, channel and sink names
a1.sources = r1
a1.channels = c1
a1.sinks = k1 k2

# 2. define the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -f /root/flume_test/server.log

# 3. define the sinks; each one connects to a next-level collector via hostname and port
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = slave1    # the upstream agent's Avro sink points to the downstream host (RPC)
a1.sinks.k1.port = 4444
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = slave2
a1.sinks.k2.port = 4444

# 4. define the sink group; only one sink is selected as active, based on priority and online status
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
# k1 is the active sink while it is online, otherwise k2 is used;
# sinks with equal priority are chosen in the order they are listed
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 1
# failback time in milliseconds: if k1 goes down and comes back up,
# it becomes the active sink again after 1 second
a1.sinkgroups.g1.processor.maxpenalty = 1000

# 5. define the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000                # maximum number of events held in the memory queue
a1.channels.c1.transactionCapacity = 100      # number of events per transaction (commit to the memory queue)

# 6. bind the source and sinks to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1
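
Before starting the agent, the file monitored by the exec source has to exist, otherwise tail -f has nothing to read. A minimal preparation step on the agent host, assuming the path from the configuration above:

# on the agent host: create the directory and file watched by the exec source
mkdir -p /root/flume_test
touch /root/flume_test/server.log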
Collector1 configuration file
# 1. specify source, sink and channel names for agent a1
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# 2. Avro source, listening on local port 4444
a1.sources.r1.type = avro
a1.sources.r1.bind = slave1      # the downstream Avro source binds to this host; the port must match the upstream agent's sink
a1.sources.r1.port = 4444

# 3. logger sink
a1.sinks.k1.type = logger

# 4. memory channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# 5. bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Collector2 configuration file
# 1. specify source, sink and channel names for agent a1
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# 2. Avro source, listening on local port 4444
a1.sources.r1.type = avro
a1.sources.r1.bind = slave2      # the downstream Avro source binds to this host; the port must match the upstream agent's sink
a1.sources.r1.port = 4444

# 3. logger sink
a1.sinks.k1.type = logger

# 4. memory channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# 5. bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

3. Start Collector1, Collector2 and the agent

Start Collector1

./bin/flume-ng agent --conf conf --conf-file ./conf/flume-failover-server.properties --name a1 -Dflume.root.logger=INFO,console

Interpretation: this starts a Flume agent using the flume-failover-server.properties configuration file in the conf directory under the current directory; the agent name is a1; Flume prints log messages at INFO level and above to the console.

Start Collector2

./bin/flume-ng agent --conf conf --conf-file ./conf/flume-failover-server.properties --name a1 -Dflume.root.logger=INFO,console

Start Agent

./bin/flume-ng agent --conf conf --conf-file ./conf/flume-failover-client.properties --name a1 -Dflume.root.logger=INFO,console

Attention:

1) Start the downstream collectors first, then start the agent. When the agent starts it tries to select an active downstream node, and if the collectors are not running yet it will report connection errors (a quick check is shown after this list)

2) After all three agents have started normally, the upstream agent establishes connections to every downstream node; each connection goes through the CONNECTED, BOUND and OPEN stages
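
A quick sanity check before starting the agent is to confirm that each collector's Avro source is actually listening on port 4444; this check is not part of the original walkthrough, just a common way to verify it:

# run on slave1 and slave2 after starting the collectors
netstat -tlnp | grep 4444     # or: ss -tlnp | grep 4444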

4. Fault simulation and recovery

1) Before the failure occurs: append data to the monitored log file (an example command is shown below) and check whether the event is printed on Collector1's terminal

Collector1, running on node slave1, receives the event and prints it to its terminal
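
A minimal way to append test data on the agent host (the message text is arbitrary; any line appended to the monitored file becomes an event):

echo "test event 1" >> /root/flume_test/server.log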

2) Failure simulation: kill the Collector1 process (a sketch of this follows)
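
One way to do this on slave1, assuming the collector is the only Flume process running there (the grep pattern is an assumption about how the process shows up in ps output):

# on slave1: find the collector's PID and kill it to simulate a crash
ps -ef | grep flume-failover-server | grep -v grep
kill -9 <pid>      # replace <pid> with the PID printed above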

3) Try sending the data again

Collector2, running on node slave2, receives the event and prints it to its terminal

Meanwhile, the agent keeps trying to re-establish its connection to Collector1

4) Restart the Collector1 process to simulate failback

./bin/flume-ng agent --conf conf --conf-file ./conf/flume-failover-server.properties --name a1 -Dflume.root.logger=INFO,console

5) Append data to the log again to see whether the event is sent back to Collector1 and printed on its terminal

At this point Collector1 receives and prints the event (the failback time, maxpenalty, is set to 1 second in the agent's configuration)

6) What happens if all downstream nodes go down and then recover: which node ends up receiving the data?

Because Flume has an event-level transaction mechanism, events are kept in the channel while all downstream nodes are down.

When a downstream node recovers, the agent selects an active node again and resends the events.

Once a downstream node has received an event, the agent removes it from the channel.

So if Collector2 is restored first, the buffered events are sent to Collector2 and removed from the agent's channel; nothing is delivered to Collector1 when it comes back later.
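
Note that the memory channel buffers events in RAM only, so they survive a collector outage but not a restart of the agent process itself. If durability is needed, a file channel could be used instead; a minimal sketch, with illustrative paths that are not part of the original setup:

# durable alternative to the memory channel in the agent configuration
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /root/flume_test/flume-checkpoint
a1.channels.c1.dataDirs = /root/flume_test/flume-data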
