Sink: reads events from the channel and removes them from the channel once they have been successfully delivered
The Flume single-node architecture diagram (from the official documentation) is shown below:
As the figure shows, logs are collected from an external system (a web server) and handed to the source component of the Flume agent, which passes them to the channel component for temporary storage; finally the sink component writes the data directly into the HDFS file system.
This article uses the latest Flume release, 1.8.0, and describes the single-node configuration and the cluster-mode configuration in turn. A Hadoop cluster has already been set up beforehand.
1. Single-node mode
1.1 Download and install
[[email protected] ~]$ wget http://mirrors.hust.edu.cn/apache/flume/stable/apache-flume-1.8.0-bin.tar.gz
[[email protected] ~]$ tar -xzf apache-flume-1.8.0-bin.tar.gz;mv apache-flume-1.8.0-bin /u01/flume
1.2 Setting environment variables
[[email protected] ~]$ vi .bash_profile
export FLUME_HOME=/u01/flume
export PATH=$PATH:$FLUME_HOME/bin
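After saving the profile, the settings can be applied and the installation checked, for example:
$ source ~/.bash_profile
$ flume-ng version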
1.3 Creating a Flume configuration file
[[email protected] ~]$ vi /u01/flume/conf/flume-hdfs.conf
#Agent Name
a1.sources = so1
a1.sinks = si1
a1.channels = ch1
#Setting Source so1
a1.sources.so1.type = spooldir
a1.sources.so1.spoolDir = /u01/flume/loghdfs
a1.sources.so1.channels = ch1
a1.sources.so1.fileHeader = false
a1.sources.so1.interceptors = i1
a1.sources.so1.interceptors.i1.type = timestamp
a1.sources.so1.ignorePattern = ^(.)*\\.tmp$
#Setting Sink si1 to HDFS
a1.sinks.si1.channel = ch1
a1.sinks.si1.type = hdfs
a1.sinks.si1.hdfs.path = hdfs://NNcluster/flume/input
a1.sinks.si1.hdfs.fileType = DataStream
a1.sinks.si1.hdfs.writeFormat = Text
a1.sinks.si1.hdfs.rollInterval = 1
a1.sinks.si1.hdfs.filePrefix = %Y-%m-%d
a1.sinks.si1.hdfs.fileSuffix = .txt
#Setting Channel ch1
a1.channels.ch1.type = file
a1.channels.ch1.checkpointDir = /u01/flume/loghdfs/point
a1.channels.ch1.dataDirs = /u01/flume/loghdfs
[[email protected] ~]$ cp /u01/flume/conf/flume-env.sh.template /u01/flume/conf/flume-env.sh
[[email protected] ~]$ vi /u01/flume/conf/flume-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_152
--Create related directories
[[email protected] ~]$ mkdir -p /u01/flume/loghdfs/point
--Link the Hadoop configuration files to /u01/flume/conf
The existing Hadoop environment is configured with NameNode high availability, so the relevant configuration files must be linked in; otherwise Flume cannot resolve where to store the data.
[[email protected] ~]$ ln -s /u01/hadoop/etc/hadoop/core-site.xml /u01/flume/conf/core-site.xml
[[email protected] ~]$ ln -s /u01/hadoop/etc/hadoop/hdfs-site.xml /u01/flume/conf/hdfs-site.xml
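Before starting the agent it is worth confirming that the NNcluster nameservice resolves from this host; a minimal check, assuming an HDFS client is available here, might be:
$ hdfs dfs -ls hdfs://NNcluster/
$ hdfs dfs -mkdir -p /flume/input   # optionally pre-create the sink path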
This completes the single-node configuration.
1.4 Starting the Flume service
[[email protected] ~]$ flume-ng agent --conf conf --conf-file /u01/flume/conf/flume-hdfs.conf --name a1 -Dflume.root.logger=INFO,console > /u01/flume/logs/flume-hdfs.log 2>&1 &
Note: a1 in the command is the agent name defined in the configuration file, and the Flume configuration file must be given as an absolute path.
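A quick way to confirm the agent is up is to check the process and watch its log (a sketch; the log path matches the redirection used in the command above):
$ ps -ef | grep flume-hdfs.conf | grep -v grep
$ tail -f /u01/flume/logs/flume-hdfs.log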
1.5 Testing the result
Create a file under /u01/flume/loghdfs and write some data to it, as shown in the following example:
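For instance, using an illustrative file name and content; after the spooling-directory source has consumed the file it is renamed with a .COMPLETED suffix, and the events should appear under /flume/input in HDFS:
$ echo "hello flume" > /u01/flume/loghdfs/test.log
$ ls /u01/flume/loghdfs
$ hdfs dfs -ls /flume/input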
2. Flume cluster mode
The architecture diagram for Flume cluster mode (from the official documentation) is shown below:
Flume can deliver data to a variety of storage back ends; the diagram lists only HDFS and Kafka (for example, keeping the most recent week of logs and providing a real-time log stream to a Storm system). Here the Oracle alert log is used as the example. The environment is shown in the following table:
The alert logs of the two RAC nodes in the table are stored in HDFS via Collector1 and Collector2. In addition, Flume itself provides a failover mechanism, so the collectors can be switched over and restored automatically.
2.1 Installing Flume on the RAC nodes
[[email protected] ~]$ wget http://mirrors.hust.edu.cn/apache/flume/stable/apache-flume-1.8.0-bin.tar.gz
[[email protected] ~]$ tar -xzf apache-flume-1.8.0-bin.tar.gz;mv apache-flume-1.8.0-bin /u01/app/oracle/flume
The other RAC node is installed in the same way.
2.2 Configuring the agents on the RAC nodes
2.2.1 Configuring the EBSDB1 agent
[[email protected] ~]$ vi /u01/flume/conf/flume-client.properties
#agent name
agent1.channels = c1
agent1.sources = r1
agent1.sinks = k1 k2
#Setting Group
agent1.sinkgroups = g1
#Setting Channel
agent1.channels.c1.type = memory
agent1.channels.c1.capacity = 100000
agent1.channels.c1.transactionCapacity = 100
#Just for the following error message:
#Space for commit to queue couldn't be acquired. Sinks are likely not keeping up with sources, or the buffer size is too tight
agent1.channels.c1.byteCapacityBufferPercentage=20
agent1.channels.c1.byteCapacity=800000
agent1.channels.c1.keep-alive = 60
#Setting Sources
agent1.sources.r1.channels = c1
agent1.sources.r1.type = exec
agent1.sources.r1.command = tail -F /u01/app/oracle/diag/rdbms/prod/prod1/trace/alert_prod1.log
agent1.sources.r1.interceptors = i1 i2
agent1.sources.r1.interceptors.i1.type = static
agent1.sources.r1.interceptors.i1.key = Type
agent1.sources.r1.interceptors.i1.value = LOGIN
agent1.sources.r1.interceptors.i2.type = timestamp
# Setting Sink1
agent1.sinks.k1.channel = c1
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = hdp01
agent1.sinks.k1.port = 52020
# Setting Sink2
agent1.sinks.k2.channel = c1
agent1.sinks.k2.type = avro
agent1.sinks.k2.hostname = hdp02
agent1.sinks.k2.port = 52020
#Setting Sink Group
agent1.sinkgroups.g1.sinks = k1 k2
#Setting Failover
agent1.sinkgroups.g1.processor.type = failover
agent1.sinkgroups.g1.processor.priority.k1 = 10
agent1.sinkgroups.g1.processor.priority.k2 = 1
agent1.sinkgroups.g1.processor.maxpenalty = 10000
2.2.2 Configuring the agent for EBSDB2
[[email protected] ~]$ vi /u01/flume/conf/flume-client.properties
#Setting Agent Name
agent1.channels = c1
agent1.sources = r1
agent1.sinks = k1 k2
#Setting Group
agent1.sinkgroups = g1
#Setting Channel
agent1.channels.c1.type = memory
agent1.channels.c1.capacity = 100000
agent1.channels.c1.transactionCapacity = 100
#Just for the following error message:
#Space for commit to queue couldn't be acquired. Sinks are likely not keeping up with sources, or the buffer size is too tight
agent1.channels.c1.byteCapacityBufferPercentage=20
agent1.channels.c1.byteCapacity=800000
agent1.channels.c1.keep-alive = 60
#Setting Sources
agent1.sources.r1.channels = c1
agent1.sources.r1.type = exec
agent1.sources.r1.command = tail -F /u01/app/oracle/diag/rdbms/prod/prod2/trace/alert_prod2.log
agent1.sources.r1.interceptors = i1 i2
agent1.sources.r1.interceptors.i1.type = static
agent1.sources.r1.interceptors.i1.key = Type
agent1.sources.r1.interceptors.i1.value = LOGIN
agent1.sources.r1.interceptors.i2.type = timestamp
#Setting Sink1
agent1.sinks.k1.channel = c1
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = hdp01
agent1.sinks.k1.port = 52020
# Setting Sink2
agent1.sinks.k2.channel = c1
agent1.sinks.k2.type = avro
agent1.sinks.k2.hostname = hdp02
agent1.sinks.k2.port = 52020
#Setting Sink Group
agent1.sinkgroups.g1.sinks = k1 k2
#Setting Failover
agent1.sinkgroups.g1.processor.type = failover
agent1.sinkgroups.g1.processor.priority.k1 = 10
agent1.sinkgroups.g1.processor.priority.k2 = 1
agent1.sinkgroups.g1.processor.maxpenalty = 10000
2.3 Configuring the Flume collector
2.3.1 hdp01 collector configuration
[[email protected] conf]$ vi flume-server.properties
#Setting Agent Name
a1.sources = r1
a1.channels = c1
a1.sinks = k1
#Setting Channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
#Setting Sources
a1.sources.r1.type = avro
a1.sources.r1.bind = hdp01
a1.sources.r1.port = 52020
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = Collector
a1.sources.r1.interceptors.i1.value = hdp01
a1.sources.r1.channels = c1
#Setting Sink To HDFS
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=hdfs://NNcluster/flume/Oracle/logs
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=TEXT
a1.sinks.k1.hdfs.rollInterval=1
a1.sinks.k1.channel=c1
a1.sinks.k1.hdfs.filePrefix=%Y-%m-%d
a1.sinks.k1.hdfs.fileSuffix=.txt
2.3.2 hdp02 collector configuration
[[email protected] conf]$ vi flume-server.properties
#Setting Agent Name
a1.sources = r1
a1.channels = c1
a1.sinks = k1
#Setting Channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
#Setting Sources
a1.sources.r1.type = avro
a1.sources.r1.bind = hdp02
a1.sources.r1.port = 52020
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = Collector
a1.sources.r1.interceptors.i1.value = hdp02
a1.sources.r1.channels = c1
#Setting Sink To HDFS
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=hdfs://NNcluster/flume/Oracle/logs
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=TEXT
a1.sinks.k1.hdfs.rollInterval=1
a1.sinks.k1.channel=c1
a1.sinks.k1.hdfs.filePrefix=%Y-%m-%d
a1.sinks.k1.hdfs.fileSuffix=.txt
2.4 Starting the Flume cluster services
2.4.1 Start the Flume collector
[[email protected] conf]$ flume-ng agent --conf conf --conf-file /u01/flume/conf/flume-server.properties --name a1 -Dflume.root.logger=INFO,console > /u01/flume/logs/flume-server.log 2>&1 &
[[email protected] conf]$ flume-ng agent --conf conf --conf-file /u01/flume/conf/flume-server.properties --name a1 -Dflume.root.logger=INFO,console > /u01/flume/logs/flume-server.log 2>&1 &
After startup, you can check the collector's log file, as follows:
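For example, on each collector host:
$ tail -f /u01/flume/logs/flume-server.log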
2.4.2 Start the flume agent
[[email protected] bin]$ ./flume-ng agent --conf conf --conf-file /u01/app/oracle/flume/conf/flume-client.properties --name agent1 -Dflume.root.logger=INFO,console > /u01/app/oracle/flume/logs/flume-client.log 2>&1 &
[[email protected] bin]$ ./flume-ng agent --conf conf --conf-file /u01/app/oracle/flume/conf/flume-client.properties --name agent1 -Dflume.root.logger=INFO,console > /u01/app/oracle/flume/logs/flume-client.log 2>&1 &
After the agents start, observe the collector logs; you will see that each agent has successfully connected to the collector, for example:
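One way to spot the new connections is to search the collector log for entries that mention the Avro source port (a sketch; the exact log message format may vary with the Avro/Netty version in use):
$ grep 52020 /u01/flume/logs/flume-server.log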
2.5 Flume High-availability Test
Because Collector1 (k1) is configured with a higher failover priority than Collector2 (k2), Collector1 normally receives the events and uploads them to the storage system. If the Collector1 process is killed, Collector2 should take over the log collection and upload work, and we can check whether the upload still succeeds.
After the Flume service on the Collector1 node is restored and agent1 sends more data, Collector1 resumes the collection work because of its higher priority.
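A minimal way to run this test, assuming the collector process can be identified by its configuration file name (the PID placeholder is illustrative):
--On hdp01, find and kill the collector process
$ ps -ef | grep flume-server.properties | grep -v grep
$ kill <collector_pid>
--Generate new alert-log entries on the database side (any activity that writes to the alert log will do),
--then confirm that new files still arrive in HDFS via Collector2
$ hdfs dfs -ls /flume/Oracle/logs
--Restore the collector on hdp01 and verify that it takes over the collection again
$ flume-ng agent --conf conf --conf-file /u01/flume/conf/flume-server.properties --name a1 -Dflume.root.logger=INFO,console > /u01/flume/logs/flume-server.log 2>&1 &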
Reference Documents:
1. Flume 1.8.0 User Guide