Flume is a real-time message collection system. It provides a variety of sources, channels, and sinks, which can be chosen according to the actual situation.
Flume Download and Documentation:
http://flume.apache.org/
Kafka
Kafka is a high-throughput distributed publish-subscribe messaging system that has the following features:
Persists messages via O(1) disk data structures, so performance stays stable even with terabytes of stored messages.
High throughput: even on very ordinary hardware, Kafka can handle hundreds of thousands of messages per second.
Supports partitioning messages across Kafka brokers and distributing consumption over consumer clusters.
Supports parallel data loading into Hadoop.
The purpose of Kafka is to provide a publish-subscribe solution that can handle all the activity-stream data of a consumer-scale website. This kind of activity (page views, searches, and other user actions) is a key ingredient of many social features on the modern web. Because of the throughput involved, such data is usually handled by log processing and log aggregation. That approach works for log-like data destined for offline analysis systems such as Hadoop, but it cannot satisfy real-time processing constraints. Kafka's goal is to unify online and offline message processing: it can load data into Hadoop in parallel, and it also supports real-time consumption across a cluster of machines.
Kafka's distributed publish-subscribe architecture is shown below (diagram taken from the Kafka official website):
[Figure: Kafka distributed publish-subscribe architecture]
Configure Kafka's configuration file config/server.properties; other settings can be adjusted according to your own environment.
[Figure: server.properties settings used in this setup]
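As a rough sketch, the handful of server.properties settings that usually matter for a single-broker setup look like the following (the values are illustrative assumptions, using the broker address that appears later in this post):
broker.id=0
port=9092
host.name=10.10.10.127
log.dirs=/tmp/kafka-logs
zookeeper.connect=localhost:2181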
Start Kafka. Before starting Kafka, start ZooKeeper first; ZooKeeper's configuration is not described again here.
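If a standalone ZooKeeper is not already running, the ZooKeeper bundled with Kafka can be started with its default configuration, for example:
# bin/zookeeper-server-start.sh config/zookeeper.properties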
# bin/kafka-server-start.sh config/server.properties
Create a topic
# bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
View Topic
# bin/kafka-topics.sh --list --zookeeper localhost:2181
Test normal production and consumption to verify that the whole path works correctly:
# bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
# bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
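Anything typed into the producer terminal should show up in the consumer terminal, which confirms that the broker itself is working.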
Next comes the integration between the frameworks.
Flume and Kafka Integration
1. Download the flumeng-kafka-plugin: https://github.com/beyondj2ee/flumeng-kafka-plugin
2. Extract the flume-conf.properties file from the plugin.
Modify the #source section of that file:
producer.sources.s.type = exec
producer.sources.s.command = tail -f -n +1 /mnt/hgfs/vmshare/test.log
producer.sources.s.channels = c
Change the value of every topic property to test.
Put the modified configuration file into the flume/conf directory.
From the plugin project, copy the following jar packages into flume's lib directory:
[Figure: jar packages to copy into flume/lib]
Also put the flumeng-kafka-plugin.jar from the plugin's package directory into flume's lib directory.
The complete Flume configuration file is attached below:
############################################
# producer Config
###########################################
#agent section
producer.sources = s
producer.channels = c
producer.sinks = r
#source section
producer.sources.s.type = exec
producer.sources.s.channels = c
producer.sources.s.command = tail -f /var/log/messages
#producer.sources.s.type=spooldir
#producer.sources.s.spoolDir=/home/xiaojie.li
#producer.sources.s.fileHeader=false
#producer.sources.s.type=syslogtcp
#producer.sources.s.port=5140
#producer.sources.s.host=localhost
# each sink's type must be defined
producer.sinks.r.type = org.apache.flume.plugins.KafkaSink
producer.sinks.r.metadata.broker.list=10.10.10.127:9092
producer.sinks.r.zk.connect=10.10.10.127:2181
producer.sinks.r.partition.key=0
producer.sinks.r.partitioner.class=org.apache.flume.plugins.SinglePartition
producer.sinks.r.serializer.class=kafka.serializer.StringEncoder
producer.sinks.r.request.required.acks=0
producer.sinks.r.max.message.size=1000000
producer.sinks.r.producer.type=sync
producer.sinks.r.custom.encoding=UTF-8
producer.sinks.r.custom.topic.name=test
#Specify the channel the sink should use
producer.sinks.r.channel = c
# each channel's type is defined.
producer.channels.c.type = memory
producer.channels.c.capacity = 1000
producer.channels.c.transactionCapacity = 100
#producer.channels.c.type=file
#producer.channels.c.checkpointDir=/home/checkdir
#producer.channels.c.dataDirs=/home/datadir
Validating the Flume and Kafka combination
Kafka has already been started above, so here we just start Flume directly:
# bin/flume-ng agent -c conf -f conf/master.properties -n producer -Dflume.root.logger=INFO,console
[Figure: Flume agent startup output]
Use Kafka's kafka-console-consumer.sh script to check whether Flume has delivered data to Kafka.
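For example, re-running the same console consumer as before:
# bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning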
[Figure: kafka-console-consumer output]
You can see that the output of tail /var/log/messages has been delivered through Flume to Kafka, which shows that the Flume + Kafka combination is working.
The logs ultimately need to be stored in HDFS.
That part also requires developing a plugin of your own, essentially a Kafka consumer that writes to HDFS; it is not covered in detail here.
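As a minimal sketch of what such a consumer could look like (an assumption-laden illustration, not the actual plugin: it uses the Kafka 0.8 high-level consumer API and the Hadoop 2.x HDFS client, and the NameNode URI, output path, and consumer group id are placeholders):

import java.net.URI;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class KafkaToHdfs {
    public static void main(String[] args) throws Exception {
        // High-level consumer configuration; the ZooKeeper address matches the setup above.
        Properties props = new Properties();
        props.put("zookeeper.connect", "10.10.10.127:2181");
        props.put("group.id", "hdfs-writer");          // placeholder consumer group
        props.put("auto.offset.reset", "smallest");    // start from the beginning of the topic
        ConsumerConnector consumer =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // One stream for the "test" topic used throughout this post.
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put("test", 1);
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                consumer.createMessageStreams(topicCountMap);
        KafkaStream<byte[], byte[]> stream = streams.get("test").get(0);

        // Open an HDFS output file; the NameNode host and path are placeholders.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), new Configuration());
        FSDataOutputStream out = fs.create(new Path("/logs/kafka/test.log"));

        // Write every Kafka message as one line in the HDFS file.
        ConsumerIterator<byte[], byte[]> it = stream.iterator();
        while (it.hasNext()) {
            MessageAndMetadata<byte[], byte[]> msg = it.next();
            out.write(msg.message());
            out.write('\n');
            out.hflush(); // flush to HDFS after each message; fine for a demo, too slow for production
        }
    }
}

A real plugin would additionally handle file rolling, batching, and offset management instead of flushing every single message.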
This article is from the "Technology never ends, we keep moving forward" blog; please be sure to keep this source: http://470220878.blog.51cto.com/3101627/1566728