Flume NG High Availability Cluster Setup:
Overall diagram of the architecture:
Schema allocation:
Role       | Host    | Port
Agent1     | hadoop3 | 52020
Collector1 | hadoop1 | 52020
Collector2 | hadoop2 | 52020
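The configuration files below refer to the three machines by hostname, so every node must be able to resolve hadoop1, hadoop2, and hadoop3. If DNS is not set up, a hosts-file sketch like the following works; the IP addresses here are placeholders for your own network:

# /etc/hosts on every node (example addresses, adjust to your network)
192.168.1.101   hadoop1
192.168.1.102   hadoop2
192.168.1.103   hadoop3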
Agent1 configuration (flume-client.conf):
# agent1 name
agent1.channels = c1
agent1.sources = r1
agent1.sinks = k1 k2

# set sink group
agent1.sinkgroups = g1

# set channel
agent1.channels.c1.type = memory
agent1.channels.c1.capacity = 1000
agent1.channels.c1.transactionCapacity = 100

# exec source tailing a local log file
agent1.sources.r1.channels = c1
agent1.sources.r1.type = exec
agent1.sources.r1.command = tail -f /home/sky/flume/log_exec_tail

agent1.sources.r1.interceptors = i1 i2
agent1.sources.r1.interceptors.i1.type = static
agent1.sources.r1.interceptors.i1.key = Type
agent1.sources.r1.interceptors.i1.value = LOGIN
agent1.sources.r1.interceptors.i2.type = timestamp

# set sink1 (to Collector1 on hadoop1)
agent1.sinks.k1.channel = c1
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = hadoop1
agent1.sinks.k1.port = 52020

# set sink2 (to Collector2 on hadoop2)
agent1.sinks.k2.channel = c1
agent1.sinks.k2.type = avro
agent1.sinks.k2.hostname = hadoop2
agent1.sinks.k2.port = 52020

# set sink group
agent1.sinkgroups.g1.sinks = k1 k2

# set failover: k1 (priority 10) is preferred; k2 (priority 1) takes over only if k1 fails
agent1.sinkgroups.g1.processor.type = failover
agent1.sinkgroups.g1.processor.priority.k1 = 10
agent1.sinkgroups.g1.processor.priority.k2 = 1
agent1.sinkgroups.g1.processor.maxpenalty = 10000
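The exec source tails a fixed file, so it helps to make sure that file exists before the agent starts. A minimal sketch, using the path from the config above:

# on hadoop3: create the file tailed by the exec source (path from flume-client.conf)
mkdir -p /home/sky/flume
touch /home/sky/flume/log_exec_tail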
Collector1 configuration (flume-server.conf):
# set agent name
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# set channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# avro source receiving events from the agent
a1.sources.r1.type = avro
a1.sources.r1.bind = hadoop1
a1.sources.r1.port = 52020
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = Collector
a1.sources.r1.interceptors.i1.value = hadoop1
a1.sources.r1.channels = c1

# set sink (a logger sink is used here for testing instead of an HDFS sink)
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
Collector2 configuration (flume-server.conf):
# set agent name
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# set channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# avro source receiving events from the agent
a1.sources.r1.type = avro
a1.sources.r1.bind = hadoop2
a1.sources.r1.port = 52020
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = Collector
a1.sources.r1.interceptors.i1.value = hadoop2
a1.sources.r1.channels = c1

# set sink (a logger sink is used here for testing instead of an HDFS sink)
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
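Both collectors use a logger sink, which simply prints events to the console; that is what makes the test below easy to observe. If the collectors should eventually land data in HDFS, a sink sketch along these lines could replace the logger sink; the namenode address, path, and roll settings here are assumptions to adjust for your cluster:

# hypothetical HDFS sink (replaces the logger sink; path and roll settings are examples)
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://hadoop1:9000/flume/logs/%Y%m%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.rollInterval = 60

The timestamp interceptor configured on agent1 supplies the event timestamp header that the %Y%m%d escape in the path needs.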
Start the servers (the collectors on hadoop1 and hadoop2) first, then start the client (the agent on hadoop3):

flume-ng agent --conf conf --conf-file /usr/local/flume/conf/flume-server.conf --name a1 -Dflume.root.logger=INFO,console

flume-ng agent --conf conf --conf-file /usr/local/flume/conf/flume-client.conf --name agent1 -Dflume.root.logger=INFO,console
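Running with -Dflume.root.logger=INFO,console keeps each process in the foreground, which is convenient for the test below. To keep a collector running after you log out, one option (an assumption, not part of the original walkthrough) is to push it into the background and log to a file via Flume's LOGFILE appender:

# optional: run a collector in the background instead of the foreground console
nohup flume-ng agent --conf conf --conf-file /usr/local/flume/conf/flume-server.conf \
  --name a1 -Dflume.root.logger=INFO,LOGFILE > flume-server.out 2>&1 &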
Test validation:
Append a line to the tailed log on hadoop3 (see the sketch below) and watch the collector consoles: hadoop1 receives the message while hadoop2 gets nothing, because sink k1 (pointing to hadoop1) has the higher priority in the failover group.
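A simple way to generate a test event, using the file path from the agent config above:

# on hadoop3: append to the file the exec source tails
echo "test message from hadoop3" >> /home/sky/flume/log_exec_tail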
(Console screenshots omitted: the hadoop3 agent, the hadoop1 collector showing the received event, and the quiet hadoop2 collector.)
Next, test failover:
Stop the Flume process on hadoop1 (one way to do this is shown below) and send data again from hadoop3. The hadoop3 agent logs a connection error and reconnects, and hadoop2 now receives the data. If the Flume process on hadoop1 is started again, delivery fails back to hadoop1, since its sink has the higher priority.
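If the collector on hadoop1 is running in the foreground, Ctrl-C is enough to stop it; otherwise a sketch like the following (assuming it was started with flume-server.conf as above) finds and kills the process:

# on hadoop1: stop the collector process to simulate a failure
kill $(ps -ef | grep '[f]lume-server.conf' | awk '{print $2}')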
(Console screenshots omitted: the hadoop3 agent logging the reconnect, and the hadoop2 collector showing the received event.)
That completes the test: the Flume high-availability cluster is up and running.