Setting up Flume High Availability



Flume is a distributed, highly available, and reliable system that collects, aggregates, and moves large volumes of data from disparate sources into a central data store. It is lightweight, simple to configure, well suited to a wide variety of log-collection scenarios, supports failover and load balancing, and ships with a rich set of components. A typical Flume deployment has a three-tier architecture: an agent layer, a collector layer, and a storage layer, each of which can be scaled out horizontally. An agent is made up of three components: a Source, a Channel, and a Sink. Their responsibilities are as follows:




  • Source: consumes (collects) data from the data source and delivers it into the Channel component
  • Channel: interim storage; buffers all events handed over by the Source component
  • Sink: reads events from the Channel and removes them from the Channel once they have been delivered successfully
    The Flume single-node architecture diagram (from the official documentation) is shown below:

    As the figure shows, logs produced by an external system (a web server) are collected by the Source component of the Flume agent, handed to the Channel component for temporary storage, and finally passed to the Sink component, which stores the data directly in the HDFS file system.
    This article uses Flume 1.8, the latest release at the time of writing, and describes the single-node configuration and the cluster-mode configuration in turn. The Hadoop cluster is assumed to be configured already.

    1. Single-node mode


    1.1 Download and install


    
    
    
    
    [[email protected] ~]$ wget http://mirrors.hust.edu.cn/apache/flume/stable/apache-flume-1.8.0-bin.tar.gz
    [[email protected] ~]$ tar -xzf apache-flume-1.8.0-bin.tar.gz;mv apache-flume-1.8.0-bin /u01/flume


    1.2 Setting environment variables


    
    
    
    
    [[email protected] ~]$ vi .bash_profile
    export FLUME_HOME=/u01/flume
    export PATH=$PATH:$FLUME_HOME/bin
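    After editing the profile, reload it so the new PATH takes effect and confirm the flume-ng launcher is reachable; a minimal sketch, assuming the layout above:

    $ source ~/.bash_profile
    $ flume-ng version    # should report Flume 1.8.0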


    1.3 Creating a Flume configuration file


    [[email protected] ~]$ vi /u01/flume/conf/flume-hdfs.conf
    #Agent name
    a1.sources = so1
    a1.sinks = si1
    a1.channels = ch1
    #Setting Source so1
    a1.sources.so1.type = spooldir
    a1.sources.so1.spoolDir = /u01/flume/loghdfs
    a1.sources.so1.channels = ch1
    a1.sources.so1.fileHeader = false
    a1.sources.so1.interceptors = i1
    a1.sources.so1.interceptors.i1.type = timestamp
    a1.sources.so1.ignorePattern = ^(.)*\\.tmp$
    #Setting Sink si1 with HDFS
    a1.sinks.si1.channel = ch1
    a1.sinks.si1.type = hdfs
    a1.sinks.si1.hdfs.path = hdfs://NNcluster/flume/input
    a1.sinks.si1.hdfs.fileType = DataStream
    a1.sinks.si1.hdfs.writeFormat = Text
    a1.sinks.si1.hdfs.rollInterval = 1
    a1.sinks.si1.hdfs.filePrefix = %Y-%m-%d
    a1.sinks.si1.hdfs.fileSuffix = .txt
    #Setting Channel ch1 (used by the source and sink above)
    a1.channels.ch1.type = file
    a1.channels.ch1.checkpointDir = /u01/flume/loghdfs/point
    a1.channels.ch1.dataDirs = /u01/flume/loghdfs
    [[email protected] ~]$ cp /u01/flume/conf/flume-env.sh.template /u01/flume/conf/flume-env.sh
    [[email protected] ~]$ vi /u01/flume/conf/flume-env.sh
    export JAVA_HOME=/usr/java/jdk1.8.0_152
    --Create the related directories
    [[email protected] ~]$ mkdir -p /u01/flume/loghdfs/point
    --Link the Hadoop configuration files into /u01/flume/conf
    The existing Hadoop environment is configured with NameNode high availability, so the relevant configuration files must be linked in; otherwise Flume does not know where to store the data.
    [[email protected] ~]$ ln -s /u01/hadoop/etc/hadoop/core-site.xml /u01/flume/conf/core-site.xml
    [[email protected] ~]$ ln -s /u01/hadoop/etc/hadoop/hdfs-site.xml /u01/flume/conf/hdfs-site.xml
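    Optionally, pre-create the HDFS target directory used by the sink and check that it is visible; a hedged sketch, assuming the hdfs client is on the PATH and NNcluster is the default file system from the linked core-site.xml:

    $ hdfs dfs -mkdir -p /flume/input
    $ hdfs dfs -ls /flume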
    


    With this, the single-node configuration is complete.
    1.4 Starting the Flume service


    
    
    [[email protected] ~]$ flume-ng agent --conf conf --conf-file /u01/flume/conf/flume-hdfs.conf --name a1 -Dflume.root.logger=INFO,console > /u01/flume/logs/flume-hdfs.log 2>&1 &


    Note: a1 in the command is the name of the agent defined in the configuration file, and the Flume configuration file must be given with an absolute path.
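    To confirm the agent is running, a quick hedged check of the JVM process and the log file is enough (the redirect above assumes /u01/flume/logs already exists; a Flume agent's main class shows up in jps as Application):

    $ jps | grep Application              # the Flume agent JVM
    $ tail -n 50 /u01/flume/logs/flume-hdfs.log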
    1.5 Effect Test
    Under /u01/flume/loghdfs, create a file and write some data into it, as shown in the following example:
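    A minimal sketch of such a test, assuming the single-node configuration above: drop a file into the spooling directory, wait a moment, then check both the local directory and HDFS.

    $ echo "hello flume $(date)" > /u01/flume/loghdfs/test.log
    $ ls /u01/flume/loghdfs           # test.log is renamed test.log.COMPLETED once the spooldir source has consumed it
    $ hdfs dfs -ls /flume/input       # a %Y-%m-%d-prefixed .txt file should have been written by the HDFS sink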


    2. Flume cluster mode


    The architecture diagram for Flume cluster mode (from the official documentation) is shown below:

    Flume supports a variety of storage back ends; the diagram shows only HDFS and Kafka (for example, keeping the most recent logs in HDFS while providing a real-time log stream to a Storm system). This article uses the Oracle alert log as the example. The environment consists of the two Oracle RAC nodes ebsdb1 and ebsdb2, which run the Flume agents, and the two Hadoop nodes hdp01 and hdp02, which run the Flume collectors.

    The alert logs of the two RAC nodes are stored in HDFS via collector1 and collector2. In addition, Flume itself provides a failover mechanism, so collection can switch over automatically and later be restored.
    2.1 RAC Node Installation flume


    
    
    
    
    [[email protected] ~]$ wget http://mirrors.hust.edu.cn/apache/flume/stable/apache-flume-1.8.0-bin.tar.gz
    [[email protected] ~]$ tar -xzf apache-flume-1.8.0-bin.tar.gz;mv apache-flume-1.8.0-bin /u01/app/oracle/flume


    The other RAC node is installed in the same way.
    2.2 Configure the agent for the RAC node
    2.2.1 Configuring EBSDB1 Agent


    
    
    [[email protected] ~]$ vi /u01/app/oracle/flume/conf/flume-client.properties
    #agent name
    agent1.channels = c1
    agent1.sources = r1
    agent1.sinks = k1 k2
    #Setting sink group
    agent1.sinkgroups = g1
    #Setting Channel
    agent1.channels.c1.type = memory
    agent1.channels.c1.capacity = 100000
    agent1.channels.c1.transactionCapacity = 100
    #Just for the following error message:
    #Space for commit to queue couldn't be acquired. Sinks are likely not keeping up with sources, or the buffer size is too tight
    agent1.channels.c1.byteCapacityBufferPercentage=20
    agent1.channels.c1.byteCapacity=800000
    agent1.channels.c1.keep-alive = 60
    #Setting Sources
    agent1.sources.r1.channels = c1
    agent1.sources.r1.type = exec
    agent1.sources.r1.command = tail -F /u01/app/oracle/diag/rdbms/prod/prod1/trace/alert_prod1.log
    agent1.sources.r1.interceptors = i1 i2
    agent1.sources.r1.interceptors.i1.type = static
    agent1.sources.r1.interceptors.i1.key = Type
    agent1.sources.r1.interceptors.i1.value = LOGIN
    agent1.sources.r1.interceptors.i2.type = timestamp
    # Setting Sink1
    agent1.sinks.k1.channel = c1
    agent1.sinks.k1.type = avro
    agent1.sinks.k1.hostname = hdp01
    agent1.sinks.k1.port = 52020
    # Setting Sink2
    agent1.sinks.k2.channel = c1
    agent1.sinks.k2.type = avro
    agent1.sinks.k2.hostname = hdp02
    agent1.sinks.k2.port = 52020
    #Setting sink group
    agent1.sinkgroups.g1.sinks = k1 k2
    #Setting Failover
    agent1.sinkgroups.g1.processor.type = failover
    agent1.sinkgroups.g1.processor.priority.k1 = 10
    agent1.sinkgroups.g1.processor.priority.k2 = 1
    agent1.sinkgroups.g1.processor.maxpenalty = 10000
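    With this failover group, events normally flow through k1 (hdp01, priority 10); if that sink fails, the processor penalizes it for up to maxpenalty milliseconds and fails over to k2 (hdp02). If spreading traffic across both collectors is preferred over an active/standby pair, the sink group could instead use the load_balance processor; a hedged alternative sketch, not used in this article:

    agent1.sinkgroups.g1.processor.type = load_balance
    agent1.sinkgroups.g1.processor.backoff = true
    agent1.sinkgroups.g1.processor.selector = round_robin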


    2.2.2 Configuring the agent for EBSDB2


    
    
    [[email protected] ~]$ vi /u01/app/oracle/flume/conf/flume-client.properties
    #Setting Agent Name
    agent1.channels = c1
    agent1.sources = r1
    agent1.sinks = k1 k2
    #Setting sink group
    agent1.sinkgroups = g1
    #Setting Channel
    agent1.channels.c1.type = memory
    agent1.channels.c1.capacity = 100000
    agent1.channels.c1.transactionCapacity = 100
    #Just for the following error message:
    #Space for commit to queue couldn't be acquired. Sinks are likely not keeping up with sources, or the buffer size is too tight
    agent1.channels.c1.byteCapacityBufferPercentage=20
    agent1.channels.c1.byteCapacity=800000
    agent1.channels.c1.keep-alive = 60
    #Setting Sources
    agent1.sources.r1.channels = c1
    agent1.sources.r1.type = exec
    agent1.sources.r1.command = tail -F /u01/app/oracle/diag/rdbms/prod/prod2/trace/alert_prod2.log
    agent1.sources.r1.interceptors = i1 i2
    agent1.sources.r1.interceptors.i1.type = static
    agent1.sources.r1.interceptors.i1.key = Type
    agent1.sources.r1.interceptors.i1.value = LOGIN
    agent1.sources.r1.interceptors.i2.type = timestamp
    #Setting Sink1
    agent1.sinks.k1.channel = c1
    agent1.sinks.k1.type = avro
    agent1.sinks.k1.hostname = hdp01
    agent1.sinks.k1.port = 52020
    # Setting Sink2
    agent1.sinks.k2.channel = c1
    agent1.sinks.k2.type = avro
    agent1.sinks.k2.hostname = hdp02
    agent1.sinks.k2.port = 52020
    #Setting Sink Group
    agent1.sinkgroups.g1.sinks = k1 k2
    #Setting Failover
    agent1.sinkgroups.g1.processor.type = failover
    agent1.sinkgroups.g1.processor.priority.k1 = 10
    agent1.sinkgroups.g1.processor.priority.k2 = 1
    agent1.sinkgroups.g1.processor.maxpenalty = 10000


    2.3 Configuring the Flume collector
    2.3.1 hdp01's collector configuration


    
    
    
    
    [[email protected] conf]$ vi flume-server.properties
    #Setting Agent Name
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1
    #Setting Channel
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100
    #Setting Sources
    a1.sources.r1.type = avro
    a1.sources.r1.bind = hdp01
    a1.sources.r1.port = 52020
    a1.sources.r1.interceptors = i1
    a1.sources.r1.interceptors.i1.type = static
    a1.sources.r1.interceptors.i1.key = Collector
    a1.sources.r1.interceptors.i1.value = hdp01
    a1.sources.r1.channels = c1
    #Setting Sink To HDFS
    a1.sinks.k1.type=hdfs
    a1.sinks.k1.hdfs.path=hdfs://NNcluster/flume/Oracle/logs
    a1.sinks.k1.hdfs.fileType=DataStream
    a1.sinks.k1.hdfs.writeFormat=TEXT
    a1.sinks.k1.hdfs.rollInterval=1
    a1.sinks.k1.channel=c1
    a1.sinks.k1.hdfs.filePrefix=%Y-%m-%d
    a1.sinks.k1.hdfs.fileSuffix=.txt


    2.3.2 hdp02's collector configuration


    
    
    
    
    [[email protected] conf]$ vi flume-server.properties 
    #Setting Agent Name
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1
    #Setting Channel
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100
    #Setting Sources
    a1.sources.r1.type = avro
    a1.sources.r1.bind = hdp02
    a1.sources.r1.port = 52020
    a1.sources.r1.interceptors = i1
    a1.sources.r1.interceptors.i1.type = static
    a1.sources.r1.interceptors.i1.key = Collector
    a1.sources.r1.interceptors.i1.value = hdp02
    a1.sources.r1.channels = c1
    #Setting Sink To HDFS
    a1.sinks.k1.type=hdfs
    a1.sinks.k1.hdfs.path=hdfs://NNcluster/flume/Oracle/logs
    a1.sinks.k1.hdfs.fileType=DataStream
    a1.sinks.k1.hdfs.writeFormat=TEXT
    a1.sinks.k1.hdfs.rollInterval=1
    a1.sinks.k1.channel=c1
    a1.sinks.k1.hdfs.filePrefix=%Y-%m-%d
    a1.sinks.k1.hdfs.fileSuffix=.txt


    2.4 Flume Cluster service startup
    2.4.1 Start the Flume collector


    
    
    
    
    [[email protected] conf]$ flume-ng agent --conf conf --conf-file /u01/flume/conf/flume-server.properties --name a1 -Dflume.root.logger=INFO,console > /u01/flume/logs/flume-server.log 2>&1 &
    [[email protected] conf]$ flume-ng agent --conf conf --conf-file /u01/flume/conf/flume-server.properties --name a1 -Dflume.root.logger=INFO,console > /u01/flume/logs/flume-server.log 2>&1 &
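    Once both collectors are up, it is worth confirming that each Avro source is actually listening on port 52020 before starting the agents; a hedged check (netstat availability depends on the host):

    $ jps | grep Application          # the collector JVM on hdp01 / hdp02
    $ netstat -tln | grep 52020       # the Avro source bound to port 52020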


    After startup, you can check the Flume log file, as follows:

    2.4.2 Start the flume agent


    
    
    
    
    [[email protected] bin]$ ./flume-ng agent --conf conf --conf-file /u01/app/oracle/flume/conf/flume-client.properties --name agent1 -Dflume.root.logger=INFO,console > /u01/app/oracle/flume/logs/flume-client.log 2>&1 &  
    [[email protected] bin]$ ./flume-ng agent --conf conf --conf-file /u01/app/oracle/flume/conf/flume-client.properties --name agent1 -Dflume.root.logger=INFO,console > /u01/app/oracle/flume/logs/flume-client.log 2>&1 &


    After the agents start, observe the collector logs; you will see that the agents have connected to the collectors successfully, for example:

    2.5 Flume High-availability Test
    Since collector1 (sink k1, priority 10) is configured with a higher priority than collector2 (sink k2, priority 1), collector1 is preferred for collecting and uploading to the storage system. If collector1 is killed, collector2 takes over the collection and upload work; we can then check whether the upload still succeeds, as sketched below.
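    A hedged way to simulate the failure and watch the takeover, assuming the collector is the only Flume JVM on hdp01:

    # on hdp01: stop the collector
    $ kill $(jps | grep Application | awk '{print $1}')
    # on hdp02: watch the collector log and the HDFS output
    $ tail -f /u01/flume/logs/flume-server.log
    $ hdfs dfs -ls /flume/Oracle/logs     # new files should continue to appear via collector2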


    Then restore the Flume service on the collector1 node and upload files through agent1 again; collector1, with its higher priority, resumes the collection work.
    Reference Documents:
    1. Flume 1.8.0 User Guide


