Flume + Kafka + HDFS: Building a Real-Time Message Processing System


Flume is a real-time message collection system. It defines a variety of sources, channels, and sinks, which can be selected and combined to fit the situation at hand.
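As a minimal sketch of how these pieces are wired together (the agent and component names here are illustrative, not from the original setup), a Flume agent is declared in a properties file:

agent.sources = src
agent.channels = ch
agent.sinks = snk

# bind the source and the sink to the channel
agent.sources.src.channels = ch
agent.sinks.snk.channel = ch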

Flume Download and Documentation:

http://flume.apache.org/

Kafka

Kafka is a high-throughput distributed publish-subscribe messaging system with the following features:

    • Provides message persistence through an O(1) disk data structure, which maintains stable performance even with terabytes of stored messages.

    • High throughput: even on very common hardware, Kafka can support hundreds of thousands of messages per second.

    • Support for partitioning messages across Kafka brokers and for distributing consumption over consumer clusters.

    • Support for parallel data loading into Hadoop.

The purpose of Kafka is to provide a publish-subscribe solution that can handle all the activity-stream data of a consumer-scale website. This kind of activity data (page views, searches, and other user actions) is a key ingredient of many social features on the modern web. Because of throughput requirements, such data is usually handled through log processing and log aggregation. That approach works for log data destined for offline analysis systems such as Hadoop, but it cannot satisfy real-time processing constraints. Kafka aims to unify online and offline message processing: it supports parallel loading into Hadoop while also allowing real-time consumption across a cluster of machines.

Kafka's distributed publish-subscribe architecture is shown below (taken from the Kafka official website):

[Figure: Kafka distributed publish-subscribe architecture]

Configure Kafka's config/server.properties; other settings can be modified to suit your own environment.

[Figure: screenshot of the server.properties configuration]
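The exact values in the screenshot are not recoverable here; as a rough sketch, the essential entries of a single-broker server.properties in this era of Kafka (0.8.x) look roughly like the following, with all values illustrative:

broker.id=0
port=9092
log.dirs=/tmp/kafka-logs
zookeeper.connect=localhost:2181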

Start ZooKeeper before starting Kafka; ZooKeeper's own configuration is not described here.
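If you are using the ZooKeeper bundled with the Kafka distribution, it can be started with the script Kafka ships:

# bin/zookeeper-server-start.sh config/zookeeper.properties

Then start Kafka: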

# bin/kafka-server-start.sh config/server.properties

Create a topic

# bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
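If you want to confirm the topic's partition and replica assignment, the same script also supports a --describe flag:

# bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test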

View the topic list:

# bin/kafka-topics.sh --list --zookeeper localhost:2181

Test normal production and consumption to verify that the pipeline works. Run the producer and consumer below in two terminals; lines typed into the producer should appear in the consumer:

# bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

# bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning


Next comes the integration between the frameworks.


Flume and Kafka Integration

1. Download flumeng-kafka-plugin: https://github.com/beyondj2ee/flumeng-kafka-plugin

2. Extract the flume-conf.properties file from the plugin.

Modify the file's #source section:

producer.sources.s.type = exec
producer.sources.s.command = tail -f -n +1 /mnt/hgfs/vmshare/test.log
producer.sources.s.channels = c

Change every topic value to test (e.g., producer.sinks.r.custom.topic.name=test in the sink section).

Put the modified configuration file into the flume/conf directory.

From the project, copy the following jar packages into flume's lib directory:

[Figure: list of jar packages required by the plugin]

The plugin's own flumeng-kafka-plugin.jar, from its package directory, also goes into flume's lib directory.
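As a rough sketch, assuming the plugin checkout lives under /opt/flumeng-kafka-plugin and Flume under /opt/flume (both paths, and the lib subdirectory of the plugin, are assumptions based on the description above), the copy amounts to:

# cp /opt/flumeng-kafka-plugin/lib/*.jar /opt/flume/lib/
# cp /opt/flumeng-kafka-plugin/package/flumeng-kafka-plugin.jar /opt/flume/lib/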

The full Flume configuration file is attached below:

############################################
# producer config
###########################################

# agent section
producer.sources = s
producer.channels = c
producer.sinks = r

# source section
producer.sources.s.type = exec
producer.sources.s.channels = c
producer.sources.s.command = tail -f /var/log/messages
#producer.sources.s.type = spooldir
#producer.sources.s.spoolDir = /home/xiaojie.li
#producer.sources.s.fileHeader = false
#producer.sources.s.type = syslogtcp
#producer.sources.s.port = 5140
#producer.sources.s.host = localhost

# each sink's type must be defined
producer.sinks.r.type = org.apache.flume.plugins.KafkaSink
producer.sinks.r.metadata.broker.list = 10.10.10.127:9092
producer.sinks.r.zk.connect = 10.10.10.127:2181
producer.sinks.r.partition.key = 0
producer.sinks.r.partitioner.class = org.apache.flume.plugins.SinglePartition
producer.sinks.r.serializer.class = kafka.serializer.StringEncoder
producer.sinks.r.request.required.acks = 0
producer.sinks.r.max.message.size = 1000000
producer.sinks.r.producer.type = sync
producer.sinks.r.custom.encoding = UTF-8
producer.sinks.r.custom.topic.name = test

# specify the channel the sink should use
producer.sinks.r.channel = c

# each channel's type is defined
producer.channels.c.type = memory
producer.channels.c.capacity = 1000
producer.channels.c.transactionCapacity = 100
#producer.channels.c.type = file
#producer.channels.c.checkpointDir = /home/checkdir
#producer.channels.c.dataDirs = /home/datadir


Validating the Flume and Kafka combination

Kafka has already been started above, so here we start Flume directly (the -n flag names the agent and must match producer, the agent name used in the configuration):

# bin/flume-ng agent -c conf -f conf/master.properties -n producer -Dflume.root.logger=INFO,console

[Figure: Flume agent startup log]

Use Kafka's kafka-console-consumer.sh script to check whether Flume has delivered data to Kafka:
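This is the same consumer command used earlier:

# bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning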

[Figure: console consumer output] The output shows that the tail of /var/log/messages has been passed through Flume to Kafka, confirming that the Flume + Kafka combination works.


The logs ultimately need to be stored in HDFS.

At the time of writing, this step also required developing your own plugin; it is not covered further here.
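The original article leaves the HDFS leg as an exercise. As a minimal sketch of one way to close the loop, assuming a later Flume release (1.6+) that ships a built-in Kafka source and an HDFS sink, a second agent could consume from the test topic and write to HDFS (the hostnames and paths below are placeholders):

# consumer agent: Kafka source -> memory channel -> HDFS sink
consumer.sources = ks
consumer.channels = mc
consumer.sinks = hs

# Kafka source (built into Flume 1.6+)
consumer.sources.ks.type = org.apache.flume.source.kafka.KafkaSource
consumer.sources.ks.zookeeperConnect = 10.10.10.127:2181
consumer.sources.ks.topic = test
consumer.sources.ks.channels = mc

consumer.channels.mc.type = memory
consumer.channels.mc.capacity = 1000

# HDFS sink; rolls a new file every 300 seconds
consumer.sinks.hs.type = hdfs
consumer.sinks.hs.hdfs.path = hdfs://namenode:8020/flume/kafka/%Y-%m-%d
consumer.sinks.hs.hdfs.fileType = DataStream
consumer.sinks.hs.hdfs.rollInterval = 300
consumer.sinks.hs.hdfs.useLocalTimeStamp = true
consumer.sinks.hs.channel = mc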

This article is from the blog "Technology never stops; we keep moving forward". Please keep this source: http://470220878.blog.51cto.com/3101627/1566728

