Kafka in Practice: Storm Cluster

Source: Internet
Author: User
Tags: scp command

1. Overview

In "Kafka in Practice: Real-Time Log Statistics Process" we touched on Storm: once the real-time log statistics pipeline is in place, Storm is needed to consume data from the Kafka cluster. This post therefore covers building and deploying a Storm cluster on its own. Today's topics are:

    • Storm overview
    • Basic software
    • Installation and deployment
    • Results preview

Let's get started.

2. Storm Overview

Storm is a distributed, fault-tolerant, real-time computation system that was open-sourced by Twitter and later contributed to the Apache Foundation. It can be downloaded from:

http://storm.apache.org/downloads.html
The main features of Storm are as follows:
    • A simple programming model. Just as MapReduce reduces the complexity of parallel batch processing, Storm reduces the complexity of real-time processing.
    • Support for multiple programming languages. Clojure, Java, Ruby, and Python are supported by default. Support for another language can be added by implementing a simple Storm communication protocol.
    • Fault tolerance. Storm manages failures of worker processes and nodes.
    • Horizontal scalability. Computation runs in parallel across multiple threads, processes, and servers.
    • Reliable message processing. Storm guarantees that every message is processed at least once. When a task fails, Storm replays the message from the message source.
    • Fast. The system is designed so that messages are processed quickly, using ØMQ as its underlying message queue.
    • Local mode. Storm has a local mode that fully simulates a Storm cluster in-process, which lets you develop and unit test quickly.
A Storm cluster consists of one master node and multiple worker nodes. The master node runs a daemon called "Nimbus", which distributes code, assigns tasks, and detects failures. Each worker node runs a daemon called "Supervisor", which listens for assigned work and starts and stops worker processes. Both Nimbus and Supervisor are fail-fast and stateless, which makes them very robust; coordination between the two is handled by Apache ZooKeeper.

Storm's terminology includes Stream, Spout, Bolt, Task, Worker, Stream Grouping, and Topology:

    • A stream is the data being processed.
    • A spout is a data source.
    • A bolt processes data.
    • A task is a thread that runs inside a spout or bolt.
    • A worker is a process that runs these threads.
    • A stream grouping specifies which data a bolt receives as input. Data can be distributed randomly (shuffle), distributed according to field values (fields), broadcast to every task (all), always sent to a single task (global), distributed without regard to how it is grouped (none), or routed by custom logic (direct).
    • A topology is a network of spout and bolt nodes connected by stream groupings.

These terms are described in more detail on the Storm Concepts page.

3. Basic software

To build a Storm cluster we need the Storm installation package. Here I use the Apache release of Storm. The required packages are:

    • Storm installation package
    • ZooKeeper installation package

Once the required software has been downloaded, we can begin installing and deploying the Storm cluster.

4. Installation and deployment

First, unpack the downloaded software. Installing ZooKeeper (ZK) is not covered here; see my earlier post "Configure high-availability Hadoop platform", which describes the ZK installation steps in detail. The following focuses on the details of building the Storm cluster.
    • Unpack the Storm installation package
tar -zxvf apache-storm-0.9.4.tar.gz
    • Configure the environment variables
export STORM_HOME=/home/hadoop/storm-0.9.4
export PATH=$PATH:$STORM_HOME/bin
    • Configure the Storm configuration file (storm.yaml)
########### These MUST be filled in for a storm configuration
storm.zookeeper.servers:
    - "DN1"
    - "DN2"
    - "DN3"
storm.zookeeper.port: 2181
nimbus.host: "DN1"
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
storm.local.dir: "/home/hadoop/data/storm"
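
Because the same storm.yaml has to be identical on every node, it can help to generate it from a host list instead of editing it by hand. The following is a minimal sketch, not part of Storm itself: `gen_storm_yaml` is a hypothetical helper name, and the port and slot values are simply the ones used in the configuration above.

```shell
# Sketch: print a storm.yaml to stdout.
# gen_storm_yaml is a hypothetical helper, not a Storm command.
# Usage: gen_storm_yaml <nimbus-host> <local-dir> <zk-host>...
gen_storm_yaml() {
    gsy_nimbus="$1"; shift
    gsy_local_dir="$1"; shift

    echo "storm.zookeeper.servers:"
    # Every remaining argument is treated as a ZooKeeper host.
    for gsy_zk in "$@"; do
        printf '    - "%s"\n' "$gsy_zk"
    done
    echo "storm.zookeeper.port: 2181"
    printf 'nimbus.host: "%s"\n' "$gsy_nimbus"
    echo "supervisor.slots.ports:"
    # One worker slot per port, matching the four ports used in this post.
    for gsy_port in 6700 6701 6702 6703; do
        echo "    - $gsy_port"
    done
    printf 'storm.local.dir: "%s"\n' "$gsy_local_dir"
}
```

For example, `gen_storm_yaml DN1 /home/hadoop/data/storm DN1 DN2 DN3 > "$STORM_HOME/conf/storm.yaml"` would write the file into Storm's conf directory, assuming STORM_HOME is set as shown earlier.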

The role assignment for the Storm cluster is shown in the following figure:

[Figure: role assignment of the Storm cluster nodes]

Once the configuration files are complete, we use the scp command to distribute the Storm directory to the other nodes:

scp -r storm-0.9.4/ [email protected]:~/
scp -r storm-0.9.4/ [email protected]:~/
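
With more than a couple of nodes, the copy step is easier to repeat as a loop. The sketch below is one possible way to do it under stated assumptions: `distribute_storm` and the `DRY_RUN` variable are hypothetical names introduced for illustration, and the targets in the usage line are placeholders, not the hosts of this cluster.

```shell
# Sketch: copy a directory to the home directory of each target node.
# distribute_storm and DRY_RUN are hypothetical names, not standard tools.
# Usage: distribute_storm <source-dir> <user@host>...
distribute_storm() {
    ds_src="$1"; shift
    for ds_target in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Only print the command that would be executed.
            echo "scp -r $ds_src $ds_target:~/"
        else
            # Stop at the first failed copy so a partial rollout is noticed.
            scp -r "$ds_src" "$ds_target:~/" || return 1
        fi
    done
}
```

Running `DRY_RUN=1 distribute_storm storm-0.9.4/ user@host-a user@host-b` prints the scp commands without executing them, which is a cheap way to check the host list before copying anything.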
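    • Start the ZK cluster (on each of the three nodes)
[[email protected] ~]$ zkServer.sh start
[[email protected] ~]$ zkServer.sh start
[[email protected] ~]$ zkServer.sh start
    • Start the Storm cluster
# Start the Nimbus service on the Nimbus node
[[email protected] ~]$ storm nimbus &
# Start the Supervisor service on the supervisor nodes
[[email protected] ~]$ storm supervisor &
[[email protected] ~]$ storm supervisor &
    • Start the Storm UI
[[email protected] ~]$ storm ui &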
    • View the started processes
[[email protected] storm-0.9.4]$ jps
2098 Jps
1983 core
1893 QuorumPeerMain
1930 nimbus
[[email protected] storm-0.9.4]$ jps
1763 worker
1762 worker
1662 QuorumPeerMain
1765 worker
1692 supervisor
1891 Jps
[[email protected] storm-0.9.4]$ jps
QuorumPeerMain
2057 supervisor
2213 Jps
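
Reading jps listings by eye gets tedious across several nodes. As a rough sketch, the check can be scripted: `check_daemons` below is a hypothetical helper that reads jps output on stdin and reports which of the expected process names are missing. The names passed to it are assumptions taken from the listings above, and matching is case-sensitive, so adjust them to whatever jps prints on your installation.

```shell
# Sketch: verify that expected daemons appear in `jps` output.
# check_daemons is a hypothetical helper, not part of the JDK or Storm.
# Usage: jps | check_daemons <process-name>...
check_daemons() {
    cd_output="$(cat)"
    cd_missing=""
    for cd_name in "$@"; do
        # jps prints "<pid> <name>", so compare against the second column.
        if ! printf '%s\n' "$cd_output" |
            awk -v n="$cd_name" '$2 == n { found = 1 } END { exit !found }'; then
            cd_missing="$cd_missing $cd_name"
        fi
    done
    if [ -n "$cd_missing" ]; then
        echo "missing:$cd_missing"
        return 1
    fi
    echo "ok"
}
```

On the Nimbus node this might be called as `jps | check_daemons QuorumPeerMain nimbus core`, and on a supervisor node as `jps | check_daemons QuorumPeerMain supervisor`. A non-zero exit status signals that at least one daemon was not found.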
5. Results Preview

I have already tested this cluster and submitted a topology, so the UI shows submission records, and the process list of the DN2 node above includes the corresponding worker processes. On a fresh installation with no topology submitted, these entries will not appear. A preview of the Storm UI follows:

[Figure: Storm UI preview]

6. Summary

That concludes this introduction to deploying a Storm cluster. Looking closely at the role assignment above, you will notice that the deployment has a single point of failure: there is only one Nimbus node. An HA version of Storm exists, but it is unofficial. Storm does, however, provide a mechanism that ensures data is processed correctly even when a node goes down or a message is lost. For details, see the official documentation:

http://storm.apache.org/documentation/guaranteeing-message-processing.html
7. Concluding remarks

That is all for this post. If you run into problems while studying, you can join the discussion group or send me an email, and I will do my best to answer. Let us encourage each other!
