Sink: reads events from the channel and removes them from the channel once they have been successfully delivered
The Flume single-node architecture diagram (from the official documentation) is shown below:
As the figure shows, logs are collected from an external system (a web server) and handed to the source component of the Flume agent, which passes them to the channel component for temporary storage; finally the sink component writes the data directly into the HDFS file system.
This article uses the latest Flume release, 1.8.0, and describes the single-node configuration and the cluster-mode configuration in turn. A Hadoop cluster has already been set up beforehand.
1. Single-node mode
1.1 Download and install
[[email protected] ~]$ wget http://mirrors.hust.edu.cn/apache/flume/stable/apache-flume-1.8.0-bin.tar.gz
[[email protected] ~]$ tar -xzf apache-flume-1.8.0-bin.tar.gz;mv apache-flume-1.8.0-bin /u01/flume
1.2 Setting environment variables
[[email protected] ~]$ vi .bash_profile
export FLUME_HOME=/u01/flume
export PATH=$PATH:$FLUME_HOME/bin
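After saving the profile, the settings can be applied and the installation checked, for example:
$ source ~/.bash_profile
$ flume-ng version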
1.3 Creating a Flume configuration file
[[email protected] ~]$ vi /u01/flume/conf/flume-hdfs.conf
#Agent Name
a1.sources = so1
a1.sinks = si1
a1.channels = ch1
#Setting Source so1
a1.sources.so1.type = spooldir
a1.sources.so1.spoolDir = /u01/flume/loghdfs
a1.sources.so1.channels = ch1
a1.sources.so1.fileHeader = false
a1.sources.so1.interceptors = i1
a1.sources.so1.interceptors.i1.type = timestamp
a1.sources.so1.ignorePattern = ^(.)*\\.tmp$
#Setting Sink si1 to HDFS
a1.sinks.si1.channel = ch1
a1.sinks.si1.type = hdfs
a1.sinks.si1.hdfs.path = hdfs://NNcluster/flume/input
a1.sinks.si1.hdfs.fileType = DataStream
a1.sinks.si1.hdfs.writeFormat = Text
a1.sinks.si1.hdfs.rollInterval = 1
a1.sinks.si1.hdfs.filePrefix = %Y-%m-%d
a1.sinks.si1.hdfs.fileSuffix = .txt
#Setting Channel ch1
a1.channels.ch1.type = file
a1.channels.ch1.checkpointDir = /u01/flume/loghdfs/point
a1.channels.ch1.dataDirs = /u01/flume/loghdfs
[[email protected] ~]$ cp /u01/flume/conf/flume-env.sh.template /u01/flume/conf/flume-env.sh
[[email protected] ~]$ vi /u01/flume/conf/flume-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_152
--Create related directories
[[email protected] ~]$ mkdir -p /u01/flume/loghdfs/point
--Link the Hadoop configuration files to /u01/flume/conf
The existing Hadoop environment is configured with NameNode high availability, so the relevant configuration files must be linked in; otherwise Flume cannot resolve where to store the data.
[[email protected] ~]$ ln -s /u01/hadoop/etc/hadoop/core-site.xml /u01/flume/conf/core-site.xml
[[email protected] ~]$ ln -s /u01/hadoop/etc/hadoop/hdfs-site.xml /u01/flume/conf/hdfs-site.xml
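Before starting the agent it is worth confirming that the NNcluster nameservice resolves from this host; a minimal check, assuming an HDFS client is available here, might be:
$ hdfs dfs -ls hdfs://NNcluster/
$ hdfs dfs -mkdir -p /flume/input   # optionally pre-create the sink path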
This completes the single-node configuration.
1.4 Starting the Flume service
[[email protected] ~]$ flume-ng agent --conf conf --conf-file /u01/flume/conf/flume-hdfs.conf --name a1 -Dflume.root.logger=INFO,console > /u01/flume/logs/flume-hdfs.log 2>&1 &
Note: a1 in the command is the agent name defined in the configuration file, and the Flume configuration file must be given as an absolute path.
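A quick way to confirm the agent is up is to check the process and watch its log (a sketch; the log path matches the redirection used in the command above):
$ ps -ef | grep flume-hdfs.conf | grep -v grep
$ tail -f /u01/flume/logs/flume-hdfs.log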
1.5 Testing the result
Create a file under /u01/flume/loghdfs and write some data to it, as shown in the following example:
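For instance, using an illustrative file name and content; after the spooling-directory source has consumed the file it is renamed with a .COMPLETED suffix, and the events should appear under /flume/input in HDFS:
$ echo "hello flume" > /u01/flume/loghdfs/test.log
$ ls /u01/flume/loghdfs
$ hdfs dfs -ls /flume/input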
2. Flume cluster mode
The architecture diagram for Flume cluster mode (from the official documentation) is shown below:
Flume can deliver data to a variety of storage back ends; the diagram lists only HDFS and Kafka (for example, keeping the most recent week of logs and providing a real-time log stream to a Storm system). Here the Oracle alert log is used as the example. The environment is shown in the following table:
The alert logs of the two RAC nodes in the table are stored in HDFS via Collector1 and Collector2. In addition, Flume itself provides a failover mechanism, so the collectors can be switched over and restored automatically.
2.1 Installing Flume on the RAC nodes
[[email protected] ~]$ wget http://mirrors.hust.edu.cn/apache/flume/stable/apache-flume-1.8.0-bin.tar.gz
[[email protected] ~]$ tar -xzf apache-flume-1.8.0-bin.tar.gz;mv apache-flume-1.8.0-bin /u01/app/oracle/flume
The other RAC node is installed in the same way.
2.2 Configuring the agents on the RAC nodes
2.2.1 Configuring the EBSDB1 agent
[[email protected] ~]$ vi /u01/flume/conf/flume-client.properties
#agent name
agent1.channels = c1
agent1.sources = r1
agent1.sinks = k1 k2
#Setting Group
agent1.sinkgroups = g1
#Setting Channel
agent1.channels.c1.type = memory
agent1.channels.c1.capacity = 100000
agent1.channels.c1.transactionCapacity = 100
#Just for the following error message:
#Space for commit to queue couldn't be acquired. Sinks are likely not keeping up with sources, or the buffer size is too tight
agent1.channels.c1.byteCapacityBufferPercentage=20
agent1.channels.c1.byteCapacity=800000
agent1.channels.c1.keep-alive = 60
#Setting Sources
agent1.sources.r1.channels = c1
agent1.sources.r1.type = exec
agent1.sources.r1.command = tail -F /u01/app/oracle/diag/rdbms/prod/prod1/trace/alert_prod1.log
agent1.sources.r1.interceptors = i1 i2
agent1.sources.r1.interceptors.i1.type = static
agent1.sources.r1.interceptors.i1.key = Type
agent1.sources.r1.interceptors.i1.value = LOGIN
agent1.sources.r1.interceptors.i2.type = timestamp
# Setting Sink1
agent1.sinks.k1.channel = c1
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = hdp01
agent1.sinks.k1.port = 52020
# Setting Sink2
agent1.sinks.k2.channel = c1
agent1.sinks.k2.type = avro
agent1.sinks.k2.hostname = hdp02
agent1.sinks.k2.port = 52020
#Setting Sink Group
agent1.sinkgroups.g1.sinks = k1 k2
#Setting Failover
agent1.sinkgroups.g1.processor.type = failover
agent1.sinkgroups.g1.processor.priority.k1 = 10
agent1.sinkgroups.g1.processor.priority.k2 = 1
agent1.sinkgroups.g1.processor.maxpenalty = 10000
2.2.2 Configuring the agent for EBSDB2
[[email protected] ~]$ vi /u01/flume/conf/flume-client.properties
#Setting Agent Name
agent1.channels = c1
agent1.sources = r1
agent1.sinks = k1 k2
#Setting Group
agent1.sinkgroups = g1
#Setting Channel
agent1.channels.c1.type = memory
agent1.channels.c1.capacity = 100000
agent1.channels.c1.transactionCapacity = 100
#Just for the following error message:
#Space for commit to queue couldn't be acquired. Sinks are likely not keeping up with sources, or the buffer size is too tight
agent1.channels.c1.byteCapacityBufferPercentage=20
agent1.channels.c1.byteCapacity=800000
agent1.channels.c1.keep-alive = 60
#Setting Sources
agent1.sources.r1.channels = c1
agent1.sources.r1.type = exec
agent1.sources.r1.command = tail -F /u01/app/oracle/diag/rdbms/prod/prod2/trace/alert_prod2.log
agent1.sources.r1.interceptors = i1 i2
agent1.sources.r1.interceptors.i1.type = static
agent1.sources.r1.interceptors.i1.key = Type
agent1.sources.r1.interceptors.i1.value = LOGIN
agent1.sources.r1.interceptors.i2.type = timestamp
#Setting Sink1
agent1.sinks.k1.channel = c1
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = hdp01
agent1.sinks.k1.port = 52020
# Setting Sink2
agent1.sinks.k2.channel = c1
agent1.sinks.k2.type = avro
agent1.sinks.k2.hostname = hdp02
agent1.sinks.k2.port = 52020
#Setting Sink Group
agent1.sinkgroups.g1.sinks = k1 k2
#Setting Failover
agent1.sinkgroups.g1.processor.type = failover
agent1.sinkgroups.g1.processor.priority.k1 = 10
agent1.sinkgroups.g1.processor.priority.k2 = 1
agent1.sinkgroups.g1.processor.maxpenalty = 10000
2.3 Configuring the Flume collector
2.3.1 hdp01 collector configuration
[[email protected] conf]$ vi flume-server.properties
#Setting Agent Name
a1.sources = r1
a1.channels = c1
a1.sinks = k1
#Setting Channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
#Setting Sources
a1.sources.r1.type = avro
a1.sources.r1.bind = hdp01
a1.sources.r1.port = 52020
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = Collector
a1.sources.r1.interceptors.i1.value = hdp01
a1.sources.r1.channels = c1
#Setting Sink To HDFS
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=hdfs://NNcluster/flume/Oracle/logs
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=TEXT
a1.sinks.k1.hdfs.rollInterval=1
a1.sinks.k1.channel=c1
a1.sinks.k1.hdfs.filePrefix=%Y-%m-%d
a1.sinks.k1.hdfs.fileSuffix=.txt
2.3.2 hdp02 collector configuration
[[email protected] conf]$ vi flume-server.properties
#Setting Agent Name
a1.sources = r1
a1.channels = c1
a1.sinks = k1
#Setting Channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
#Setting Sources
a1.sources.r1.type = avro
a1.sources.r1.bind = hdp02
a1.sources.r1.port = 52020
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = Collector
a1.sources.r1.interceptors.i1.value = hdp02
a1.sources.r1.channels = c1
#Setting Sink To HDFS
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=hdfs://NNcluster/flume/Oracle/logs
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=TEXT
a1.sinks.k1.hdfs.rollInterval=1
a1.sinks.k1.channel=c1
a1.sinks.k1.hdfs.filePrefix=%Y-%m-%d
a1.sinks.k1.hdfs.fileSuffix=.txt
2.4 Starting the Flume cluster services
2.4.1 Start the Flume collector
[[email protected] conf]$ flume-ng agent --conf conf --conf-file /u01/flume/conf/flume-server.properties --name a1 -Dflume.root.logger=INFO,console > /u01/flume/logs/flume-server.log 2>&1 &
[[email protected] conf]$ flume-ng agent --conf conf --conf-file /u01/flume/conf/flume-server.properties --name a1 -Dflume.root.logger=INFO,console > /u01/flume/logs/flume-server.log 2>&1 &
After startup, you can check the collector's log file, as follows:
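For example, on each collector host:
$ tail -f /u01/flume/logs/flume-server.log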
2.4.2 Start the flume agent
[[email protected] bin]$ ./flume-ng agent --conf conf --conf-file /u01/app/oracle/flume/conf/flume-client.properties --name agent1 -Dflume.root.logger=INFO,console > /u01/app/oracle/flume/logs/flume-client.log 2>&1 &
[[email protected] bin]$ ./flume-ng agent --conf conf --conf-file /u01/app/oracle/flume/conf/flume-client.properties --name agent1 -Dflume.root.logger=INFO,console > /u01/app/oracle/flume/logs/flume-client.log 2>&1 &
After the agents start, observe the collector logs; you will see that each agent has successfully connected to the collector, for example:
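One way to spot the new connections is to search the collector log for entries that mention the Avro source port (a sketch; the exact log message format may vary with the Avro/Netty version in use):
$ grep 52020 /u01/flume/logs/flume-server.log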
2.5 Flume High-availability Test
Because Collector1 (k1) is configured with a higher failover priority than Collector2 (k2), Collector1 normally receives the events and uploads them to the storage system. If the Collector1 process is killed, Collector2 should take over the log collection and upload work, and we can check whether the upload still succeeds.
After the Flume service on the Collector1 node is restored and agent1 sends more data, Collector1 resumes the collection work because of its higher priority.
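A minimal way to run this test, assuming the collector process can be identified by its configuration file name (the PID placeholder is illustrative):
--On hdp01, find and kill the collector process
$ ps -ef | grep flume-server.properties | grep -v grep
$ kill <collector_pid>
--Generate new alert-log entries on the database side (any activity that writes to the alert log will do),
--then confirm that new files still arrive in HDFS via Collector2
$ hdfs dfs -ls /flume/Oracle/logs
--Restore the collector on hdp01 and verify that it takes over the collection again
$ flume-ng agent --conf conf --conf-file /u01/flume/conf/flume-server.properties --name a1 -Dflume.root.logger=INFO,console > /u01/flume/logs/flume-server.log 2>&1 &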
Reference Documents:
1. Flume 1.8.0 User Guide