1. Overview
Today I am adding a blog about Flume, which was omitted when I explained the highly available Hadoop platform. This post covers the following topics:
- Flume NG brief introduction
- Single-point Flume NG setup and operation
- Highly available Flume NG setup
- Failover test
- Preview
Let's get started.
2. Flume NG Overview
Flume NG is a distributed, highly available, and reliable system that collects, aggregates, and moves large amounts of data from disparate sources into a centralized data store. It is lightweight, simple to configure, suitable for a variety of log-collection scenarios, and supports failover and load balancing. It also ships with a rich set of components. Flume NG uses a three-tier architecture: the agent layer, the collector layer, and the store layer, and each layer can be scaled horizontally. An agent is made up of three components: Source, Channel, and Sink. Their responsibilities are as follows:
- Source: consumes (collects) data from the data source and writes it into the Channel
- Channel: temporary storage that buffers all events delivered by the Source
- Sink: reads events from the Channel and removes them from the Channel after they are successfully delivered
The Flume NG architecture diagram is shown below:
The diagram shows logs generated by an external system (Web Server) being collected by the Source component of a Flume agent, passed to the Channel component for temporary storage, and finally handed to the Sink component, which writes the data directly into the HDFS file system.
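To make the wiring concrete, here is a minimal, purely illustrative agent definition in Flume's properties format; the names a1, r1, c1, and k1 are placeholders and are not the names used in the configuration later in this post:
#minimal agent skeleton: one source, one channel, one sink
a1.sources=r1
a1.channels=c1
a1.sinks=k1
#the source writes events into the channel
a1.sources.r1.channels=c1
#the sink reads (and removes) events from the same channel
a1.sinks.k1.channel=c1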
3. Single-Point Flume NG Setup and Operation
Now that we are familiar with Flume NG's architecture, let us first set up a single-point Flume agent that collects data into the HDFS cluster. Due to limited resources, Flume is installed directly on the previously built high-availability Hadoop cluster.
The scenario is as follows: build a Flume NG agent on the NNA node and collect local logs into the HDFS cluster.
3.1 Basic software
Before building Flume NG, we need to prepare the following software:
The JDK was already configured when the Hadoop cluster was installed, so it is not covered here; if you still need to configure it, refer to the configuration described in the high-availability Hadoop platform post.
3.2 Installation and configuration
First, we extract the Flume installation package; the command is as follows:
tar -zxvf apache-flume-1.5.2-bin.tar.gz
The environment variable configuration is as follows:
export FLUME_HOME=/home/hadoop/flume-1.5.2
export PATH=$PATH:$FLUME_HOME/bin
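Since the tarball extracts to apache-flume-1.5.2-bin while FLUME_HOME points to /home/hadoop/flume-1.5.2, the extracted directory has presumably been renamed. A sketch of making the variables permanent for the hadoop user, assuming ~/.bashrc is the profile file in use:
#rename the extracted directory to match FLUME_HOME (assumption)
mv apache-flume-1.5.2-bin /home/hadoop/flume-1.5.2
#append the Flume environment variables to the shell profile
cat >> ~/.bashrc << 'EOF'
export FLUME_HOME=/home/hadoop/flume-1.5.2
export PATH=$PATH:$FLUME_HOME/bin
EOF
#reload the profile in the current session
source ~/.bashrc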
flume-conf.properties
#agent1 name
agent1.sources=source1
agent1.sinks=sink1
agent1.channels=channel1
#spooling directory
#set source1
agent1.sources.source1.type=spooldir
agent1.sources.source1.spoolDir=/home/hadoop/dir/logdfs
agent1.sources.source1.channels=channel1
agent1.sources.source1.fileHeader=false
agent1.sources.source1.interceptors=i1
agent1.sources.source1.interceptors.i1.type=timestamp
#set sink1
agent1.sinks.sink1.type=hdfs
agent1.sinks.sink1.hdfs.path=/home/hdfs/flume/logdfs
agent1.sinks.sink1.hdfs.fileType=DataStream
agent1.sinks.sink1.hdfs.writeFormat=TEXT
agent1.sinks.sink1.hdfs.rollInterval=1
agent1.sinks.sink1.channel=channel1
agent1.sinks.sink1.hdfs.filePrefix=%Y-%m-%d
#set channel1
agent1.channels.channel1.type=file
agent1.channels.channel1.checkpointDir=/home/hadoop/dir/logdfstmp/point
agent1.channels.channel1.dataDirs=/home/hadoop/dir/logdfstmp
flume-env.sh
JAVA_HOME=/usr/java/jdk1.7
Note: If the directory in the configuration does not exist, it needs to be created in advance.
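For reference, the directories referenced in the configuration above can be created like this (local paths on the NNA node, HDFS path on the cluster; adjust as needed):
#local spooling directory for the source
mkdir -p /home/hadoop/dir/logdfs
#checkpoint and data directories for the file channel
mkdir -p /home/hadoop/dir/logdfstmp/point
mkdir -p /home/hadoop/dir/logdfstmp
#target directory for the HDFS sink
hdfs dfs -mkdir -p /home/hdfs/flume/logdfs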
3.3 Start
The start command looks like this:
flume-ng agent -n agent1 -c conf -f flume-conf.properties -Dflume.root.logger=DEBUG,console
Note: agent1 in the command is the agent name defined in the configuration file (agent1 in the configuration above), and flume-conf.properties is the configuration file; fill in its exact path.
3.4 Effect Preview
After a file has been uploaded successfully, the spooling directory source marks the local file as completed. As shown below:
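A quick way to observe this (a sketch; test.log is an arbitrary sample file name): drop a file into the spooling directory, then check both the local directory and the HDFS path.
#copy a sample log file into the spooling directory
cp test.log /home/hadoop/dir/logdfs/
#after ingestion the spooling directory source renames it with a .COMPLETED suffix
ls /home/hadoop/dir/logdfs/
#the data should appear under the HDFS sink path
hdfs dfs -ls /home/hdfs/flume/logdfs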
4. Highly Available Flume NG Setup
After completing the single-point Flume NG setup, we now build a highly available Flume NG cluster. The architecture is as follows:
As the figure shows, Flume can write to a variety of storage backends; only HDFS and Kafka are drawn here (for example, storing the most recent week of logs and providing a real-time log stream to a Storm system).
4.1 Node Assignment
The flume agent and collector distribution is shown in the following table:
| Name | HOST | Role |
| --- | --- | --- |
| Agent1 | 10.211.55.14 | Web Server |
| Agent2 | 10.211.55.15 | Web Server |
| Agent3 | 10.211.55.16 | Web Server |
| Collector1 | 10.211.55.18 | AgentMstr1 |
| Collector2 | 10.211.55.19 | AgentMstr2 |
As shown in the figure, data from Agent1, Agent2, and Agent3 flows into Collector1 and Collector2. Flume NG itself provides a failover mechanism that switches over and recovers automatically. In this scenario, three log-producing servers are distributed across different machine rooms, and all of their logs must be collected into a single cluster for storage. Below we configure the Flume NG cluster.
4.2 Configuration
The basic configuration was already completed in the single-point Flume setup above; we only need to add two new configuration files, flume-client.properties (on the agent nodes) and flume-server.properties (on the collector nodes). Their contents are as follows:
flume-client.properties
#agent1 name
agent1.channels=c1
agent1.sources=r1
agent1.sinks=k1 k2
#set group
agent1.sinkgroups=g1
#set channel
agent1.channels.c1.type=memory
agent1.channels.c1.capacity=1000
agent1.channels.c1.transactionCapacity=100
agent1.sources.r1.channels=c1
agent1.sources.r1.type=exec
agent1.sources.r1.command=tail -f /home/hadoop/dir/logdfs/test.log
agent1.sources.r1.interceptors=i1 i2
agent1.sources.r1.interceptors.i1.type=static
agent1.sources.r1.interceptors.i1.key=Type
agent1.sources.r1.interceptors.i1.value=LOGIN
agent1.sources.r1.interceptors.i2.type=timestamp
#set sink1
agent1.sinks.k1.channel=c1
agent1.sinks.k1.type=avro
agent1.sinks.k1.hostname=nna
agent1.sinks.k1.port=52020
#set sink2
agent1.sinks.k2.channel=c1
agent1.sinks.k2.type=avro
agent1.sinks.k2.hostname=nns
agent1.sinks.k2.port=52020
#set sink group
agent1.sinkgroups.g1.sinks=k1 k2
#set failover
agent1.sinkgroups.g1.processor.type=failover
agent1.sinkgroups.g1.processor.priority.k1=10
agent1.sinkgroups.g1.processor.priority.k2=1
agent1.sinkgroups.g1.processor.maxpenalty=10000
Note: Specify the IP and port of the collector.
flume-server.properties
#set agent name
a1.sources=r1
a1.channels=c1
a1.sinks=k1
#set channel
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
#other node, nna to nns
a1.sources.r1.type=avro
a1.sources.r1.bind=nna
a1.sources.r1.port=52020
a1.sources.r1.interceptors=i1
a1.sources.r1.interceptors.i1.type=static
a1.sources.r1.interceptors.i1.key=Collector
a1.sources.r1.interceptors.i1.value=NNA
a1.sources.r1.channels=c1
#set sink to hdfs
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=/home/hdfs/flume/logdfs
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=TEXT
a1.sinks.k1.hdfs.rollInterval=1
a1.sinks.k1.channel=c1
a1.sinks.k1.hdfs.filePrefix=%Y-%m-%d
Note: on the other collector node, modify the bind address accordingly; for example, on the NNS node, change the bind value (and the interceptor value) from nna/NNA to nns/NNS.
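If the file is copied to NNS unchanged, the two lines that differ can be adjusted in place; a sketch using sed:
#on NNS: bind to the local hostname instead of nna
sed -i 's/a1.sources.r1.bind=nna/a1.sources.r1.bind=nns/' flume-server.properties
#and tag events with the local collector name
sed -i 's/a1.sources.r1.interceptors.i1.value=NNA/a1.sources.r1.interceptors.i1.value=NNS/' flume-server.properties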
4.3 Start
The start command on the agent node is as follows:
flume-ng agent -n agent1 -c conf -f flume-client.properties -Dflume.root.logger=DEBUG,console
Note: agent1 in the command is the agent name defined in the configuration file (agent1 in the configuration above), and flume-client.properties is the configuration file; fill in its exact path.
The start command on the Collector node is as follows:
flume-ng agent -n a1 -c conf -f flume-server.properties -Dflume.root.logger=DEBUG,console
Note: a1 in the command is the agent name defined in the configuration file (a1 in the configuration above), and flume-server.properties is the configuration file; fill in its exact path.
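In practice the agents and collectors are usually started in the background rather than in a foreground console; one common way to do this (a sketch, using nohup and a local log file of your choosing):
#run the collector in the background and keep its output in a local log file
nohup flume-ng agent -n a1 -c conf -f flume-server.properties -Dflume.root.logger=INFO,console > flume-server.log 2>&1 &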
5. Failover Test
Let's now test the high availability (failover) of the Flume NG cluster. The scenario is as follows: we upload files on the Agent1 node. Because Collector1 is configured with a higher priority than Collector2, Collector1 takes precedence in collecting the data and uploading it to the storage system. We then kill Collector1, at which point Collector2 takes over the log collection and upload work. After that we manually restore the Flume service on the Collector1 node and upload another file on Agent1, and we find that Collector1 regains the higher-priority collection role. The specific screenshots are listed below, followed by a command-level sketch of the test:
- Collector1 uploads with priority
- Preview of the uploaded log contents in the HDFS cluster
- Collector1 goes down, and Collector2 takes over the upload work
- Collector1 service restarted; Collector1 regains upload priority
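The same test can be sketched at the command line as follows (the process-lookup step assumes the Flume agent appears as Application in jps; the pid is a placeholder):
#on Agent1: append a line to the tailed log; Collector1 should deliver it to HDFS
echo "failover test 1" >> /home/hadoop/dir/logdfs/test.log
#on NNA (Collector1): find and kill the Flume collector process
jps | grep Application
kill -9 <collector1_pid>
#on Agent1: append another line; Collector2 should now deliver it
echo "failover test 2" >> /home/hadoop/dir/logdfs/test.log
#on NNA: restart Collector1; it regains priority because priority.k1 > priority.k2
flume-ng agent -n a1 -c conf -f flume-server.properties -Dflume.root.logger=DEBUG,console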
6. Preview
Below is a preview of the results in the HDFS file system:
- File preview in the HDFS file system
- Preview of uploaded file contents
7. Summary
There are a few things to keep in mind when configuring highly available Flume NG. On the agent nodes, you need to bind the correct IPs (or hostnames) and ports of Collector1 and Collector2. When configuring a collector node, you need to modify that node's configuration file so that the bind IP (or hostname) is the IP (or hostname) of the current node. Finally, when starting, specify the agent name defined in the configuration file and the exact path to the configuration file, otherwise an error occurs.
8. Concluding remarks
That is all I want to share in this blog. If you run into any problems while studying, you can join the discussion group or send me an e-mail, and I will do my best to answer. Let us encourage each other!