Original link: Kafka in Action - Flume to Kafka

1. Overview
Previously, I walked through the overall development process of a Kafka project. Today I will share how Kafka gets its data source, that is, how data is produced into Kafka. Here is today's agenda:
- Data sources
- Flume to Kafka
- Data loading
- Preview
Let's get started with today's content.
2. Data sources
The data produced into Kafka is provided by Flume sinks. We use a Flume cluster to collect the logs gathered by the agents and deliver them both to the Kafka cluster (for real-time computation) and to HDFS (for offline computation). The deployment of the Flume cluster agents is not covered here; readers who are unfamiliar with it can refer to the article "High-Availability Hadoop Platform - Flume NG Practical Illustration". The data-source flow chart is shown below:
Here, Flume serves as the log-collection system: the collected data is sent to the Kafka middleware for Storm to consume and compute in real time. The overall flow is as follows: logs from each Web node are collected by a Flume agent, aggregated into the Flume cluster, and finally delivered to the Kafka cluster by Flume's sink, which completes the data-production process.
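Since the flow chart from the original post is not reproduced here, the pipeline it describes can be sketched roughly as follows:

Web nodes -> Flume agents -> Flume cluster -> Kafka cluster -> Storm (real-time computation)
                                          \-> HDFS (offline computation)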
3. Flume to Kafka
From the diagram above, the data-production process is clear. Now let's see how to implement the transport from Flume to Kafka, which I describe with the brief diagram below:
This illustrates the delivery pipeline from Flume to Kafka; let's look at how to implement it.
First, to complete this part of the process, we need to deploy both a Flume cluster and a Kafka cluster. After the clusters are deployed, we configure the data flow of the Flume sink. The configuration is as follows:
- First, configure the Spooldir source, which reads as follows:
producer.sources.s.type = spooldir
producer.sources.s.spoolDir = /home/hadoop/dir/logdfs
- Of course, Flume supports a variety of sink types, including Console, Text, HDFS, RPC, and so on. Here our receiver is the Kafka middleware, and the configuration is as follows:
producer.sinks.r.type = org.apache.flume.plugins.KafkaSink
producer.sinks.r.metadata.broker.list = dn1:9092,dn2:9092,dn3:9092
producer.sinks.r.partition.key = 0
producer.sinks.r.partitioner.class = org.apache.flume.plugins.SinglePartition
producer.sinks.r.serializer.class = kafka.serializer.StringEncoder
producer.sinks.r.request.required.acks = 0
producer.sinks.r.max.message.size = 1000000
producer.sinks.r.producer.type = sync
producer.sinks.r.custom.encoding = UTF-8
producer.sinks.r.custom.topic.name = test
With this, we have configured the data flow from the Flume sink to the receiver.
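For completeness: the two snippets above configure only the source and the sink, while a runnable agent file also needs the component declarations and a channel. Below is a minimal sketch of what the full flume-kafka-sink.properties might look like, assuming the agent is named producer (to match the -n producer flag used later) and adding a memory channel named c; the channel name and capacities are illustrative assumptions, not values from the original article:

# declare the agent's components (assumed names: source s, channel c, sink r)
producer.sources = s
producer.channels = c
producer.sinks = r

# spooldir source, bound to channel c
producer.sources.s.type = spooldir
producer.sources.s.spoolDir = /home/hadoop/dir/logdfs
producer.sources.s.channels = c

# in-memory channel; capacities are illustrative
producer.channels.c.type = memory
producer.channels.c.capacity = 10000
producer.channels.c.transactionCapacity = 100

# Kafka sink (settings as shown above), reading from channel c
producer.sinks.r.channel = c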
4. Data loading
After the configuration is complete, we begin loading data. First, we produce logs in Flume's Spooldir directory for Flume to collect, and then we use the KafkaOffsetMonitor tool to monitor how the data is being produced into Kafka. Let's start loading.
- Start the ZK cluster, as follows:
zkServer.sh start
Note: run this on each ZK node, respectively.
- Next, start the Kafka cluster, as follows:
kafka-server-start.sh config/server.properties &
Run the same command on the other Kafka nodes to complete the startup.
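To confirm that the processes are up, jps (a standard JDK tool) can be run on each node; the ZooKeeper process should appear as QuorumPeerMain and the Kafka broker as Kafka:

jps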
- Start the Kafka monitoring tool, as follows:
java -cp KafkaOffsetMonitor-assembly-0.2.0.jar com.quantifind.kafka.offsetapp.OffsetGetterWeb --zk dn1:2181,dn2:2181,dn3:2181 --port 8089 --retain 1.days
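If the tool starts successfully, its web UI should be reachable in a browser at http://dn1:8089 (assuming it was launched on dn1 with the port above), where topics, consumers, and offsets can be inspected.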
- Start the Flume agent, as follows:
flume-ng agent -n producer -c conf -f flume-kafka-sink.properties -Dflume.root.logger=ERROR,console
Then I upload logs to the /home/hadoop/dir/logdfs directory. Here I upload only a small portion of the logs; as shown, the logs are uploaded successfully.
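With the Spooldir source, "uploading" simply means placing a finished log file into the watched directory. A sketch (the file name access.log is a hypothetical example):

# copy a finished log file into the directory watched by the spooldir source
cp access.log /home/hadoop/dir/logdfs/
# after Flume ingests the file, it is renamed with a .COMPLETED suffix by default
ls /home/hadoop/dir/logdfs/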
5. Preview
Below, we use the Kafka monitoring tool to preview the uploaded log records and check whether message data has been generated in Kafka:
- Launch the Kafka cluster and preview the produced messages
- Upload logs via Flume to generate message data in Kafka (a command-line check is sketched below)
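Besides the monitoring UI, the messages can also be checked from the command line. A sketch using the ZooKeeper-based console consumer shipped with 0.8-era Kafka (matching the ZK quorum and the topic name test configured above):

kafka-console-consumer.sh --zookeeper dn1:2181,dn2:2181,dn3:2181 --topic test --from-beginning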
6. Summary
This article described Kafka's message-production process. A follow-up article in the Kafka in Action series will cover Kafka's message-consumption process. The goal here is only to lay a foundation for the later hands-on Kafka coding, so that everyone first has an overall picture of how messages are produced into Kafka.