High-Availability Hadoop Platform: Flume NG Practical Illustration


1. Overview

Today I am adding a post about Flume that was omitted when I explained the highly available Hadoop platform. This post covers the following:

    • Flume NG brief introduction
    • Single-point Flume NG setup and operation
    • Highly available Flume NG setup
    • Failover test
    • Preview

Let's get started with today's post.

2. Flume NG Overview

Flume NG is a distributed, highly available, and reliable system that collects, moves, and stores massive amounts of data from disparate sources into a centralized data store. It is lightweight, simple to configure, suitable for a variety of log-collection scenarios, and supports failover and load balancing. It also ships with a very rich set of components. Flume NG uses a three-tier architecture: the agent layer, the collector layer, and the store layer, and each layer can be scaled out horizontally. An agent is made up of three components: source, channel, and sink. Their duties are as follows:

    • Source: consumes (collects) data from the data source and writes it into the channel component
    • Channel: temporary storage that buffers all events received from the source component
    • Sink: reads events from the channel and removes them from the channel after they have been successfully delivered

The architecture diagram for Flume NG is shown below:

The diagram describes how logs generated by an external system (web server) are collected: the source component of the Flume agent picks them up, the data is sent to the channel component for temporary storage, and finally the sink component writes the data into the HDFS file system.
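As a minimal sketch of that wiring (not part of the setup below; the names a1, r1, c1, and k1 are just illustrative placeholders), an agent's properties file declares the three components and connects them through the channel:

# declare the components of an agent named a1
a1.sources=r1
a1.channels=c1
a1.sinks=k1
# the source writes events into the channel; the sink drains the same channel
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1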

3. Single-Point Flume NG Setup and Operation

Now that we are familiar with Flume NG's architecture, we first set up a single-point Flume agent to gather information into the HDFS cluster. Due to limited resources, Flume is built directly on the high-availability Hadoop cluster from the previous posts.

The scenario is as follows: build Flume NG on the NNA node and collect the local logs into the HDFS cluster.

3.1 Basic software

Before we build Flume NG, we need to prepare the necessary software, as follows:

    • Flume (apache-flume-1.5.2-bin.tar.gz)

The JDK was already configured when the Hadoop cluster was installed, so it is not covered here; if you still need to set it up, refer to the configuration in the high-availability Hadoop platform post.

3.2 Installation and configuration
    • Installation

First, we unzip the Flume installation package; the command looks like this:

tar -zxvf apache-flume-1.5.2-bin.tar.gz
    • Configuration

The environment variable configuration content is as follows:

export FLUME_HOME=/home/hadoop/flume-1.5.2
export PATH=$PATH:$FLUME_HOME/bin
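After adding these lines (assuming they were placed in ~/.bashrc or an equivalent profile), reload the profile and check that the flume-ng script is on the PATH:

source ~/.bashrc
flume-ng version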

flume-conf.properties

#agent1 name
agent1.sources=source1
agent1.sinks=sink1
agent1.channels=channel1

#set source1 (spooling directory)
agent1.sources.source1.type=spooldir
agent1.sources.source1.spoolDir=/home/hadoop/dir/logdfs
agent1.sources.source1.channels=channel1
agent1.sources.source1.fileHeader=false
agent1.sources.source1.interceptors=i1
agent1.sources.source1.interceptors.i1.type=timestamp

#set sink1
agent1.sinks.sink1.type=hdfs
agent1.sinks.sink1.hdfs.path=/home/hdfs/flume/logdfs
agent1.sinks.sink1.hdfs.fileType=DataStream
agent1.sinks.sink1.hdfs.writeFormat=TEXT
agent1.sinks.sink1.hdfs.rollInterval=1
agent1.sinks.sink1.channel=channel1
agent1.sinks.sink1.hdfs.filePrefix=%y-%m-%d

#set channel1
agent1.channels.channel1.type=file
agent1.channels.channel1.checkpointDir=/home/hadoop/dir/logdfstmp/point
agent1.channels.channel1.dataDirs=/home/hadoop/dir/logdfstmp

flume-env.sh

JAVA_HOME=/usr/java/jdk1.7

  Note: If a directory referenced in the configuration does not exist, it needs to be created in advance.
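For the paths used in the configuration above, the local directories and the HDFS target directory can be created in advance, for example:

mkdir -p /home/hadoop/dir/logdfs
mkdir -p /home/hadoop/dir/logdfstmp/point
hdfs dfs -mkdir -p /home/hdfs/flume/logdfs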

3.3 Start

The start command looks like this:

flume-ng agent -n agent1 -c conf -f flume-conf.properties -Dflume.root.logger=DEBUG,console

Note: agent1 in the command is the name of the agent defined in the configuration file (agent1 in this example). flume-conf.properties is the configuration file; you need to supply its exact path.

3.4 Effect Preview

After a successful upload, the file in the local spooling directory is marked as completed (the spooling directory source renames it). As shown in the following:
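To reproduce this, you can drop a file into the spooling directory and then check both the local directory and HDFS; the file name test.log here is just an example:

cp test.log /home/hadoop/dir/logdfs/
ls /home/hadoop/dir/logdfs/            # processed files carry the .COMPLETED suffix
hdfs dfs -ls /home/hdfs/flume/logdfs   # the uploaded data appears here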

4. Highly Available Flume NG Setup

After completing the single-point Flume NG setup, we now build a highly available Flume NG cluster. The architecture is as follows:

As the figure shows, Flume can write to many kinds of storage; here only HDFS and Kafka are used (for example: storing the most recent week of logs, and providing a real-time log stream to the Storm system).

4.1 Node Assignment

The flume agent and collector distribution is shown in the following table:

Name HOST Role
Agent1 10.211.55.14 Web Server
Agent2 10.211.55.15 Web Server
Agent3 10.211.55.16 Web Server
Collector1 10.211.55.18 AgentMstr1
Collector2 10.211.55.19 AgentMstr2

As shown in the figure, data from Agent1, Agent2, and Agent3 flows into Collector1 and Collector2. Flume NG itself provides a failover mechanism that can switch over and recover automatically. In this deployment, there are three log-generating servers distributed across different server rooms, and all of their logs must be collected into a single cluster for storage. Below we configure the Flume NG cluster.

4.2 Configuration

With the single-point Flume setup above, the basic configuration is already complete; we only need to add two new configuration files, flume-client.properties and flume-server.properties. Their contents are as follows:

    • flume-client.properties
#agent1 name
agent1.channels=c1
agent1.sources=r1
agent1.sinks=k1 k2

#set group
agent1.sinkgroups=g1

#set channel (capacity values are typical settings)
agent1.channels.c1.type=memory
agent1.channels.c1.capacity=10000
agent1.channels.c1.transactionCapacity=100

#set source
agent1.sources.r1.channels=c1
agent1.sources.r1.type=exec
agent1.sources.r1.command=tail -F /home/hadoop/dir/logdfs/test.log
agent1.sources.r1.interceptors=i1 i2
agent1.sources.r1.interceptors.i1.type=static
agent1.sources.r1.interceptors.i1.key=Type
agent1.sources.r1.interceptors.i1.value=LOGIN
agent1.sources.r1.interceptors.i2.type=timestamp

#set sink1
agent1.sinks.k1.channel=c1
agent1.sinks.k1.type=avro
agent1.sinks.k1.hostname=nna
agent1.sinks.k1.port=52020

#set sink2
agent1.sinks.k2.channel=c1
agent1.sinks.k2.type=avro
agent1.sinks.k2.hostname=nns
agent1.sinks.k2.port=52020

#set sink group
agent1.sinkgroups.g1.sinks=k1 k2

#set failover
agent1.sinkgroups.g1.processor.type=failover
agent1.sinkgroups.g1.processor.priority.k1=10
agent1.sinkgroups.g1.processor.priority.k2=1
agent1.sinkgroups.g1.processor.maxpenalty=10000

Note: Here you specify the host (or IP) and port of each collector.

    • flume-server.properties
#set agent name
a1.sources=r1
a1.channels=c1
a1.sinks=k1

#set channel (capacity values are typical settings)
a1.channels.c1.type=memory
a1.channels.c1.capacity=10000
a1.channels.c1.transactionCapacity=100

#other node, nna to nns
a1.sources.r1.type=avro
a1.sources.r1.bind=nna
a1.sources.r1.port=52020
a1.sources.r1.interceptors=i1
a1.sources.r1.interceptors.i1.type=static
a1.sources.r1.interceptors.i1.key=Collector
a1.sources.r1.interceptors.i1.value=NNA
a1.sources.r1.channels=c1

#set sink to hdfs
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=/home/hdfs/flume/logdfs
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=TEXT
a1.sinks.k1.hdfs.rollInterval=1
a1.sinks.k1.channel=c1
a1.sinks.k1.hdfs.filePrefix=%y-%m-%d

Note: On the other collector node, modify the bind host; for example, on the NNS node change the bind object from nna to nns.
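Concretely, on the NNS collector only the lines that reference the local host change; a sketch of the difference in flume-server.properties on NNS:

a1.sources.r1.bind=nns
a1.sources.r1.interceptors.i1.value=NNS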

4.3 Start

The start command on the agent node is as follows:

flume-ng agent -n agent1 -c conf -f flume-client.properties -Dflume.root.logger=DEBUG,console

 Note: agent1 in the command is the name of the agent defined in the configuration file (agent1 in this example). flume-client.properties is the configuration file; you need to supply its exact path.

The start command on the Collector node is as follows:

flume-ng agent -n a1 -c conf -f flume-server.properties -Dflume.root.logger=DEBUG,console

  Note: a1 in the command is the name of the agent defined in the configuration file (a1 in this example). flume-server.properties is the configuration file; you need to supply its exact path.
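In practice the collector (and agent) processes usually need to survive the terminal closing; one common way, shown here as a sketch with an arbitrary log file name, is to start them in the background with nohup:

nohup flume-ng agent -n a1 -c conf -f flume-server.properties -Dflume.root.logger=INFO,console > flume-collector.log 2>&1 &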

5. Failover Test

Let's now test the high availability (failover) of the Flume NG cluster. The scenario is as follows: we upload files from the Agent1 node. Because Collector1 is configured with a higher priority than Collector2, Collector1 collects the data and uploads it to the storage system first. We then kill Collector1, at which point Collector2 takes over the log collection and upload work. After that, we manually restore the Flume service on Collector1 and upload another file from Agent1, and we find that Collector1 has regained its priority and resumed the collection work. The specific steps are illustrated below, followed by a shell-level sketch of the same sequence:

    • Collector1 uploads first (higher priority)

    • Preview of the log content uploaded to the HDFS cluster

    • Collector1 goes down; Collector2 takes over the upload work

    • Restart the Collector1 service; Collector1 regains upload priority
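A shell-level sketch of the steps above (the appended text and the pgrep pattern are illustrative; adjust them to your environment):

# on Agent1: append a line so the exec source (tail -F) picks it up
echo "failover test 1" >> /home/hadoop/dir/logdfs/test.log
# on Collector1: stop the Flume process to simulate downtime
kill $(pgrep -f flume-server.properties)
# on Agent1: append again; the event should now arrive via Collector2
echo "failover test 2" >> /home/hadoop/dir/logdfs/test.log
# after restarting Collector1, later appends go through Collector1 again; verify in HDFS
hdfs dfs -cat /home/hdfs/flume/logdfs/*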

6. Preview

Below is a preview of the HDFS file system, as shown in the following:

    • File preview in the HDFS file system

    • Preview of the uploaded file contents

7. Summary

There are a few things to keep in mind when configuring highly available Flume NG. In the agent configuration you need to bind the IP (or hostname) and port of both Collector1 and Collector2. When configuring a collector node, you need to modify that node's own configuration file so that the bind IP (or hostname) is the one of the current node. Finally, when starting, specify the agent name defined in the configuration file and the exact path to the configuration file, otherwise an error occurs.

8. Concluding remarks

That is all I want to share in this post. If you run into any problems while studying, you can join the discussion group or send me an e-mail, and I will do my best to answer. Let's encourage each other!

