[Reprint] Building Big Data real-time systems using Flume+kafka+storm+mysql

Source: Internet
Author: User
Tags: syslog, uuid, zookeeper



Original: http://mp.weixin.qq.com/s?__biz=MjM5NzAyNTE0Ng==&mid=205526269&idx=1&sn=6300502dad3e41a36f9bde8e0ba2284d






Although I have always disapproved of building a system entirely from open source software, for a startup it offers high efficiency at low cost, so there is real application potential. The risk lies in maintaining the system.








This article describes how to use Flume + Kafka + Storm + MySQL to build a distributed real-time big data streaming architecture, covering the basic architecture, installation, and deployment.




Architecture diagram




Data Flow graph





(The diagrams were drawn in Visio; they are too large to embed at full size, and scaled down the labels become hard to read. If you need them, leave your email address.)


Introduction to Real-time log Analysis system architecture


The system is divided into four main parts:









1). Data acquisition



Responsible for collecting data from each node in real time; Cloudera's Flume is chosen to implement this.



2). Data access



Because the speed of data acquisition and the speed of data processing are not necessarily synchronized, a message middleware layer is added as a buffer; Apache's Kafka is used.



3). Stream computing



Real-time analysis of the collected data, using Apache's Storm.



4). Data output



The results of the analysis are persisted, tentatively to MySQL; a sketch of such a persistence bolt follows.
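As an illustration of this output stage, here is a minimal sketch of a Storm bolt that persists (word, count) results to MySQL over JDBC. The database, table, and credentials are placeholders rather than anything from the original article, and the MySQL connector jar must be on Storm's classpath (see the note in the Kafka and Storm integration section below).

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Map;

import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;

// Terminal bolt: upserts (word, count) pairs into a MySQL table.
public class MysqlWriterBolt extends BaseBasicBolt {
    private transient Connection conn; // opened per worker, not serialized

    @Override
    public void prepare(Map stormConf, TopologyContext context) {
        try {
            // placeholder database and credentials
            conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/logdb", "storm", "secret");
        } catch (Exception e) {
            throw new RuntimeException("cannot open MySQL connection", e);
        }
    }

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        String word = input.getStringByField("word");
        long count = input.getLongByField("count");
        try {
            PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO word_count(word, cnt) VALUES(?, ?) "
                    + "ON DUPLICATE KEY UPDATE cnt = ?");
            ps.setString(1, word);
            ps.setLong(2, count);
            ps.setLong(3, count);
            ps.executeUpdate();
            ps.close();
        } catch (Exception e) {
            throw new RuntimeException("MySQL write failed", e);
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // terminal bolt: emits nothing downstream
    }
}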





Detailed description of each component and installation configuration:


Operating system: CentOS 6.4


Flume


Flume is Cloudera's distributed, reliable, and highly available log collection system. It supports customizing the various data senders within a logging system for data collection, and it provides simple in-flight processing of the data and the ability to write it to a variety of (customizable) data recipients.



Typical architecture for Flume:









Flume data source and output mode:



Flume can collect data from sources such as console, RPC (Thrift-RPC), text (file), tail (UNIX tail), syslog (the syslog system, with TCP and UDP support), and exec (command execution). Our system currently uses exec for log capture.



Flume's data recipients can be console, text (file), DFS (HDFS file), RPC (Thrift-RPC), syslogtcp (TCP syslog), and so on. In our system the data is received by Kafka.






Flume version: 1.4.0



Flume Download and Documentation:



http://flume.apache.org/



Flume Installation:



$ tar zxvf apache-flume-1.4.0-bin.tar.gz -C /usr/local



Flume Start command:



$ bin/flume-ng agent --conf conf --conf-file conf/flume-conf.properties --name producer -Dflume.root.logger=INFO,console



Note: you need to edit the configuration file under the conf directory and add the required jar packages to the lib directory; a sketch of a minimal configuration follows.
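For reference, a minimal sketch of what conf/flume-conf.properties might contain for the agent started above: an exec source tailing a log file into a memory channel drained by a logger sink. The agent name producer matches the --name flag; the log path is a placeholder.

producer.sources = s
producer.channels = c
producer.sinks = r

# exec source: tail a log file (path is a placeholder)
producer.sources.s.type = exec
producer.sources.s.command = tail -F /var/log/test.log
producer.sources.s.channels = c

# in-memory buffer between source and sink
producer.channels.c.type = memory
producer.channels.c.capacity = 1000

# logger sink: prints events to the console, useful for a first test
producer.sinks.r.type = logger
producer.sinks.r.channel = c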





Kafka


Kafka is a message middleware characterized by the following:



1. It focuses on high throughput rather than a broad feature set.



2. It targets real-time scenarios.



3. The state of message consumption is maintained on the consumer side, not on the Kafka server side.



4. It is distributed: producers, brokers, and consumers all run spread across multiple machines. (A consumer sketch illustrating point 3 follows this list.)
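To make point 3 concrete, here is a minimal sketch of a Kafka 0.8 high-level consumer in Java. The group id is a hypothetical name; offsets for that group are tracked in ZooKeeper on behalf of the consumer, not by the broker.

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

// Minimal Kafka 0.8 high-level consumer: the consumed offset is kept
// per consumer group in ZooKeeper, not on the Kafka server side.
public class SimpleConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // where offsets are stored
        props.put("group.id", "test-group");              // hypothetical group name
        props.put("auto.commit.interval.ms", "1000");     // commit offsets every second

        ConsumerConnector consumer =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // ask for one stream on the "test" topic
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put("test", 1);
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                consumer.createMessageStreams(topicCountMap);

        // block and print each message as it arrives
        ConsumerIterator<byte[], byte[]> it = streams.get("test").get(0).iterator();
        while (it.hasNext()) {
            System.out.println(new String(it.next().message()));
        }
    }
}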



Architecture diagram for Kafka:









Kafka version: 0.8.0



Kafka Download and Documentation: http://kafka.apache.org/



Kafka Installation:



> tar xzf kafka-<version>.tgz



> cd kafka-<version>



> ./sbt update



> ./sbt package



> ./sbt assembly-package-dependency






Start and test commands:



(1) Start server



> bin/zookeeper-server-start.sh config/zookeeper.properties



> bin/kafka-server-start.sh config/server.properties



(2) Create a topic
> bin/kafka-create-topic.sh --zookeeper localhost:2181 --replica 1 --partition 1 --topic test



> bin/kafka-list-topic.sh --zookeeper localhost:2181



(3) Send some messages



> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test



(4) Start a consumer



> bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
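The console scripts above also have a programmatic equivalent. Here is a minimal sketch of a Java producer against the 0.8 API, sending to the test topic created above; the broker address is the one from this quickstart.

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

// Sends a handful of string messages to the "test" topic.
public class SimpleProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "localhost:9092"); // broker started above
        props.put("serializer.class", "kafka.serializer.StringEncoder");

        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
        for (int i = 0; i < 10; i++) {
            producer.send(new KeyedMessage<String, String>("test", "message-" + i));
        }
        producer.close();
    }
}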





Storm


Storm is a distributed, highly fault-tolerant real-time computation system.



Storm framework components:






Storm worker task topology:
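The topology diagram from the original is not reproduced here; as a concrete stand-in, the following minimal word-count topology shows how a spout and bolts are wired together with the 0.9 API and run in-process. RandomSentenceSpout ships with the storm-starter project used later in this article; the two bolts are defined inline.

import java.util.HashMap;
import java.util.Map;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import storm.starter.spout.RandomSentenceSpout; // from storm-starter

public class ExampleTopology {

    // Splits each incoming sentence into words.
    public static class SplitBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            for (String word : input.getString(0).split("\\s+")) {
                collector.emit(new Values(word));
            }
        }
        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
    }

    // Keeps a running count per word and emits (word, count) pairs.
    public static class CountBolt extends BaseBasicBolt {
        private final Map<String, Long> counts = new HashMap<String, Long>();
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            String word = input.getStringByField("word");
            Long count = counts.get(word);
            count = (count == null) ? 1L : count + 1;
            counts.put(word, count);
            collector.emit(new Values(word, count));
        }
        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word", "count"));
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("sentences", new RandomSentenceSpout(), 1);
        builder.setBolt("split", new SplitBolt(), 2).shuffleGrouping("sentences");
        builder.setBolt("count", new CountBolt(), 2)
               .fieldsGrouping("split", new Fields("word")); // same word -> same task

        LocalCluster cluster = new LocalCluster(); // in-process mode for testing
        cluster.submitTopology("word-count", new Config(), builder.createTopology());
    }
}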






Storm version: 0.9.0



Storm Download: http://storm-project.net/



Storm Installation:



Step one, install Python 2.7.2



# wget http://www.python.org/ftp/python/2.7.2/Python-2.7.2.tgz



# tar zxvf Python-2.7.2.tgz



# cd Python-2.7.2



# ./configure



# make



# make install



# vi /etc/ld.so.conf



Step two, install ZooKeeper (Kafka ships with a bundled ZooKeeper; if you use Kafka's, this step can be omitted)



# wget http://ftp.meisei-u.ac.jp/mirror/apache/dist//zookeeper/zookeeper-3.3.3/zookeeper-3.3.3.tar.gz



# tar zxf zookeeper-3.3.3.tar.gz



# ln -s /usr/local/zookeeper-3.3.3/ /usr/local/zookeeper



# vi ~/.bashrc (set ZOOKEEPER_HOME and add $ZOOKEEPER_HOME/bin to PATH)



Step three, install Java



$ tar zxvf jdk-7u45-linux-x64.tar.gz -C /usr/local






If you use a Storm version below 0.9, you need to install ZeroMQ and JZMQ.



Step four, install ZeroMQ and JZMQ



JZMQ depends on ZeroMQ, so install ZeroMQ first and then JZMQ.



1) Install ZeroMQ (not required):


    • # wget http://download.zeromq.org/historic/zeromq-2.1.7.tar.gz

    • # tar zxf zeromq-2.1.7.tar.gz

    • # cd zeromq-2.1.7

    • # ./configure

    • # make

    • # make install

    • # sudo ldconfig (refresh the dynamic linker cache)


If the C++ toolchain is missing: # yum install gcc-c++



The following issue may be encountered: "Error: cannot link with -luuid, install uuid-dev".



This is because no UUID-related package is installed.



The workaround is: # yum install uuid*



# yum install e2fsprogs*



# yum install libuuid*






2) Install JZMQ (not required)


    • # yum install git

    • # git clone git://github.com/nathanmarz/jzmq.git

    • # cd jzmq

    • # ./autogen.sh

    • # ./configure

    • # make

    • # make install


With that, JZMQ is installed. Some problems referenced in guides online did not come up here; if you hit them, those references may help. If ./autogen.sh fails with "autogen.sh: error: could not find libtool. libtool is required to run autogen.sh", libtool is missing and can be installed with # yum install libtool*.



If you install Storm 0.9 or above, you do not need ZeroMQ and JZMQ, but you do need to modify storm.yaml to specify Netty as the message transport:



storm.local.dir: "/tmp/storm/data"


storm.messaging.transport: "backtype.storm.messaging.netty.Context"
storm.messaging.netty.server_worker_threads: 1
storm.messaging.netty.client_worker_threads: 1
storm.messaging.netty.buffer_size: 5242880
storm.messaging.netty.max_retries: 100
storm.messaging.netty.max_wait_ms: 1000
storm.messaging.netty.min_wait_ms: 100





Step five, install Storm



$ unzip storm-0.9.0-wip16.zip



Note: the standalone version needs no configuration changes. When editing the configuration file for a distributed deployment, note that every colon must be followed by a space.



To test whether Storm installed successfully:



1. Download the storm-starter code: git clone https://github.com/nathanmarz/storm-starter.git



2. Compile it with: mvn -f m2-pom.xml package



If you have not installed Maven, install it as follows:
1. Download Maven from its website: http://maven.apache.org/



tar zxvf apache-maven-3.1.1-bin.tar.gz -C /usr/local



Configure the Maven environment variables:



export MAVEN_HOME=/usr/local/maven



export PATH=$PATH:$MAVEN_HOME/bin



Verify that Maven installed successfully: mvn -v



Modify the storm-starter POM file m2-pom.xml, changing the dependency versions of the twitter4j-core and twitter4j-stream packages as follows:

<dependency>
  <groupId>org.twitter4j</groupId>
  <artifactId>twitter4j-core</artifactId>
  <version>[2.2,)</version>
</dependency>
<dependency>
  <groupId>org.twitter4j</groupId>
  <artifactId>twitter4j-stream</artifactId>
  <version>[2.2,)</version>
</dependency>



Compiling produces the target folder.



Start Zookeeper



zkServer.sh start



Start Nimbus, the Supervisor, and the UI:



storm nimbus



storm supervisor



storm ui



Check the startup status with jps.



Go to the target directory and execute:



storm jar storm-starter-0.0.1-SNAPSHOT-jar-with-dependencies.jar storm.starter.WordCountTopology wordcounttop



Then view http://localhost:8080



Note: the standalone version does not require modifying storm.yaml.





Kafka and Storm integration


1. Download the Kafka-Storm 0.8 plugin: https://github.com/wurstmeister/storm-kafka-0.8-plus



2. After downloading, the project needs some debugging to locate its dependent jar packages; then repackage it as a jar for our Storm project.



3. Add that jar package plus kafka_2.9.2-0.8.0-beta1.jar, metrics-core-2.2.0.jar, and scala-library-2.9.2.jar (these three jars can be found among the storm-kafka-0.8-plus project dependencies).



Note: if your own project needs additional jars, remember to put them into Storm's lib directory; for example, when using MySQL, add mysql-connector-java-5.1.22-bin.jar to Storm's lib. A wiring sketch follows.
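Putting the pieces together, here is a sketch of how the storm-kafka-0.8-plus spout can be wired into a topology. The zkRoot and spout id are arbitrary names under which consumer offsets are stored in ZooKeeper, and PrintBolt is a trivial stand-in for real processing (for example, the MySQL bolt sketched earlier).

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;
import storm.kafka.BrokerHosts;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class KafkaTopology {

    // Trivial stand-in bolt: just prints each raw log line from Kafka.
    public static class PrintBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            System.out.println(input.getString(0));
        }
        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // terminal bolt: emits nothing
        }
    }

    public static void main(String[] args) throws Exception {
        BrokerHosts hosts = new ZkHosts("localhost:2181"); // ZooKeeper from the Kafka setup
        // topic "test"; zkRoot and spout id are arbitrary offset-storage names
        SpoutConfig spoutConfig = new SpoutConfig(hosts, "test", "/kafkastorm", "log-reader");
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme()); // decode as strings

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 1);
        builder.setBolt("process", new PrintBolt(), 2).shuffleGrouping("kafka-spout");

        StormSubmitter.submitTopology("log-topology", new Config(), builder.createTopology());
    }
}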





Flume and Kafka Integration


1. Download flume-kafka-plus: https://github.com/beyondj2ee/flumeng-kafka-plugin



2. Extract the flume-conf.properties file from the plugin.



Modify the file's #source section:



producer.sources.s.type = exec
producer.sources.s.command = tail -f -n+1 /mnt/hgfs/vmshare/test.log
producer.sources.s.channels = c



Change the value of every topic property to test.



Put the modified configuration file into the flume/conf directory.



In the plugin project, extract the required jar packages and put them into Flume's lib directory.









The above covers the stand-alone installation and configuration of Flume + Kafka + Storm.






Flume + Storm plug-in:



https://github.com/xiaochawan/edw-Storm-Flume-Connectors





Startup steps





After installing Storm, Flume, and Kafka, deploy and start the project. (Before deploying, it is a good idea to test each component individually against its installation documentation.)



Step one
Package the completed Storm project as a jar and place it on the server, for example at /usr/local/project/storm.xx.jar.



Note: for how to write the Storm project, see "Kafka and Storm integration" in the installation documentation above.



Step two



Start ZooKeeper (you can start the ZooKeeper bundled with Kafka or a separately installed one; the Kafka-bundled one is used below).



cd /usr/local/kafka


bin/zookeeper-server-start.sh config/zookeeper.properties

Step three

Start Kafka:

cd /usr/local/kafka
> bin/kafka-server-start.sh config/server.properties

Create a topic:

> bin/kafka-create-topic.sh --zookeeper localhost:2181 --replica 1 --partition 1 --topic test

Note: because Kafka message offsets are recorded by ZooKeeper, the ZooKeeper address must be specified. --replica sets how many copies of the topic's messages are kept, --partition sets how many partitions the topic is split into, and test is the topic name.

Step four

Start Storm:

> storm nimbus
> storm supervisor
> storm ui

cd /usr/local/project/
> storm jar storm.xx.jar storm.testtopology test

Note: storm.xx.jar is the jar package of our Storm project produced in step one. storm.testtopology is the fully qualified class containing the project's main method, and test is the name of this topology.
Step five

Start Flume:

cd /usr/local/flume
bin/flume-ng agent --conf conf --conf-file conf/flume-conf.properties --name producer -Dflume.root.logger=INFO,console

Note: flume-conf.properties is our custom Flume configuration file; a fresh Flume installation does not include it, so we write it ourselves (see the Flume installation section above for how).

Now everything that needs to be started is running and the Storm project is live; you can open the Storm UI to check whether it is working:
http://localhost:8080
Note: use the IP of the machine running the Storm Nimbus; the port can be changed in the Storm configuration file storm/conf/storm.yaml.




This digest, titled "Flume+kafka+storm+mysql Architecture Design", is taken from http://blog.csdn.net/mylittlered/article/details/20810265.




