Storm: considerations during use
Introduction
Over the past few days, while optimizing our original data processing framework, we systematically studied some of Storm's concepts and organized our notes.
1. Storm provides a data-processing paradigm; it does not provide specific solutions.
The core of Storm is the definition of the topology, and the topology carries all of the processing logic.
1. Modify /etc/hosts:
   172.16.3.7  nimbus
   172.16.3.8  supervisor1
   172.16.3.9  supervisor2
   172.16.3.10 supervisor3
2. Install ZooKeeper on every machine in the cluster; Storm needs ZK to store data and to coordinate between Nimbus and the supervisors:
   tar xzvf zookeeper-3.4.3.tar.gz
   mv zookeeper-3.4.3 ~/platform/zookeeper
   cp ~/platform/zookeeper/conf/zoo_sample.cfg ~/platform/zookeeper/conf/zoo.cfg (made from zoo_sample.cfg in $zookeeper_home/
The installation of a Storm cluster is divided into the following steps:
1. First ensure the ZooKeeper cluster service is running normally and the necessary components are installed correctly
2. Unpack the release tarball
3. Modify storm.yaml to add the cluster configuration information
4. Use the storm script to start the services and view service status
5. View the state of a
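As a sketch of step 3, a minimal storm.yaml for a small cluster might look like the following. The host names and the local directory are illustrative assumptions, not values from this article (nimbus.seeds applies to Storm 1.0+; older releases use nimbus.host):

```yaml
# Illustrative storm.yaml fragment; host names and paths are assumptions.
storm.zookeeper.servers:
  - "zk1"
  - "zk2"
  - "zk3"
nimbus.seeds: ["nimbus-host"]
storm.local.dir: "/var/storm"
# One worker slot per port listed here:
supervisor.slots.ports:
  - 6700
  - 6701
  - 6702
  - 6703
```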
information chains and even information networks, so data must be cross-linked across every dimension, and a data explosion is unavoidable. Streaming and NoSQL products therefore came into being, solving the problems of real-time frameworks and of data storage and computation. As early as seven or eight years ago, universities such as UC Berkeley and Stanford began studying streaming data processing, but because they focused more on financial-industry business scenarios or Internet traffic m
Storm is a free, open-source, distributed, highly fault-tolerant real-time computing system that Twitter developers contributed to the community. Storm makes it easy to perform continuous stream computation, meeting the real-time requirements that Hadoop batch processing cannot. Storm is often used in real-time analytics, online machine learning, continuous computation, dis
Submit topologies
Command format: storm jar <jar path> <topology package name.topology class name> <topology name>
Example: storm jar /storm-starter.jar storm.starter.WordCountTopology wordcounttop
# Submits storm-starter.jar to the remote cluster and starts the wordcounttop topology.
Stop topologies
Command format:
Reproduced from: http://weyo.me/pages/techs/storm-topology-remote-submission/
As a late-stage sufferer of lazy cancer, although Storm requires only one command to submit a task, I have always wanted an even simpler (read: lazier) approach, such as submitting a task directly after writing it on Windows, without having to manually copy the jar package to the serve
Starting from this article, we will cover the first and second parts of the official Storm documentation, beginning with the introduction section of the Getting Started Guide.
In the past decade, the field of data processing has undergone great changes, which can fairly be called a revolution. MapReduce, Hadoop, and other related technologies have made it possible to store and process volumes of data we previously could not imagine. U
The environment of this article is as follows:
Operating system: CentOS 6 32-bit
ZooKeeper version: 3.4.8
Storm version: 1.0.0
JDK version: 1.8.0_77 32-bit
Python version: 2.6.6
Cluster: one master node (master) and two worker nodes (slave1, slave2)
1. Build the ZooKeeper cluster
Installation reference: ZooKeeper standalone-mode and cluster-mode installation under CentOS
2. Install dependency packages on the Nimbus and worker machines
Java 6
Python 2.6.6
The versions above are the officially recommended ones.
Overview
Recently I have been working on a real-time analytics project, so I needed to dig deep into Storm.
Why Storm
Taken together, there are the following points:
1. The timing of its birth
The MapReduce computing model opened another door to distributed computing and greatly lowered the threshold for implementing it. With the support of the MapReduce architecture, developers need only focus o
To the big shots learning Storm: I have come to preach, teach, and resolve doubts, for those who think they already know how to use ack. Well then, let me start correcting you. Let's talk about the ack mechanism: to ensure that data is handled correctly, Storm tracks every tuple generated by a spout. This involves ack/fail processing: if a tuple is processed successfully, it means that the tuple, together with all the tuples produced by
We know that Storm has a very important feature: its API ensures that each tuple can be fully processed. This is especially important; in fact, Storm's reliability is achieved by the spout and bolt components working together. The following starts from the two sides, spout and bolt, to introduce the reliability of
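To make the spout-side bookkeeping concrete, here is a minimal, self-contained sketch in plain Python (not the Storm API; the class and method names are our own illustrative assumptions) of how a spout can track pending tuples by message id and replay them on fail:

```python
# Illustrative sketch of spout-side reliability bookkeeping (not the Storm API).
# Each emitted tuple is remembered by message id; ack() forgets it,
# fail() re-queues it for replay.

from collections import deque

class ReliableSpout:
    def __init__(self, source):
        self.source = deque(source)   # tuples waiting to be emitted
        self.pending = {}             # msg_id -> tuple, awaiting ack/fail
        self.next_id = 0

    def next_tuple(self):
        """Emit one tuple, tagged with a message id so it can be tracked."""
        if not self.source:
            return None
        tup = self.source.popleft()
        msg_id = self.next_id
        self.next_id += 1
        self.pending[msg_id] = tup
        return msg_id, tup

    def ack(self, msg_id):
        """Tuple tree fully processed: drop it from the pending set."""
        self.pending.pop(msg_id, None)

    def fail(self, msg_id):
        """Processing failed (or timed out): queue the tuple for replay."""
        tup = self.pending.pop(msg_id, None)
        if tup is not None:
            self.source.append(tup)

spout = ReliableSpout(["a", "b"])
m0, t0 = spout.next_tuple()   # emits "a"
m1, t1 = spout.next_tuple()   # emits "b"
spout.ack(m0)                 # "a" is done
spout.fail(m1)                # "b" will be replayed
replay_id, replay_tup = spout.next_tuple()
print(replay_tup)             # "b" comes back for another attempt
```

In real Storm the timeout-based fail path is driven by the acker; here the caller invokes fail() directly to keep the sketch short.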
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
# log dir
dataLogDir=/usr/local/zookeeper-3.4.7/log
Create the myid file for the ZooKeeper cluster:
cd /usr/local/zookeeper-3.4.7/data
echo 1 > myid
Start ZooKeeper:
cd /usr/local/zookeeper-3.4.7/bin
./zkServer.sh start
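The standalone steps above extend to a cluster by listing every node in zoo.cfg and giving each node a matching myid file. A sketch, assuming three hosts named zk1, zk2, and zk3 (illustrative names, not from this article):

```
# Appended to /usr/local/zookeeper-3.4.7/conf/zoo.cfg on every node
# (host names zk1..zk3 are illustrative assumptions)
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
```

On each node, the number written into data/myid must match that node's server.N line: echo 1 > myid on zk1, echo 2 > myid on zk2, and so on.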
Install Storm:
Download Storm: http://storm.apache.org/downloa
The Hadoop cluster environment deployed earlier is mentioned because we need HDFS: we store Storm's data offline into HDFS, and then use Hadoop to extract data from HDFS for analytical processing.
As a result, we needed to integrate storm-hdfs. We ran into many problems during the integration; some solutions can be found online, but they are not practical, so
For fault tolerance, Storm uses a system-level component, the Acker, combined with an XOR check mechanism, to determine whether a tuple has been processed successfully; if not, the spout resends the tuple, so in the case of an error each tuple is processed at least once.
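The XOR trick can be demonstrated in a few lines of plain Python (a simulation of the idea, not Storm's actual Acker code): each tuple id is XORed into a running value once when the tuple is anchored and once when it is acked, so the value returns to zero exactly when every tuple has been both emitted and acked.

```python
# Simulation of the Acker's XOR check (an illustrative sketch, not Storm internals).

class Acker:
    def __init__(self):
        self.ack_val = {}  # root spout-tuple id -> running XOR of tuple ids

    def anchor(self, root_id, tuple_id):
        # A new tuple joins the tree rooted at root_id: XOR its id in.
        self.ack_val[root_id] = self.ack_val.get(root_id, 0) ^ tuple_id

    def ack(self, root_id, tuple_id):
        # A tuple finished processing: XOR its id in again (cancels the anchor).
        self.ack_val[root_id] ^= tuple_id

    def is_complete(self, root_id):
        # Zero means every anchored tuple has also been acked.
        return self.ack_val.get(root_id, 0) == 0

# In real Storm these ids are random 64-bit values; fixed ones keep the demo deterministic.
root, child_a, child_b = 0xACE1, 0xB00F, 0xC0DE

acker = Acker()
for tid in (root, child_a, child_b):
    acker.anchor(root, tid)          # each tuple in the tree is registered
print(acker.is_complete(root))       # False: nothing has been acked yet

for tid in (root, child_a, child_b):
    acker.ack(root, tid)             # each tuple reports success
print(acker.is_complete(root))       # True: the XOR collapsed back to zero
```

Because XOR is order-independent, the acker needs only a constant amount of state per spout tuple no matter how large the tuple tree grows.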
But when you need to count tuples exactly, for example in sales scenarios, and want each tuple to be processed once and only once, Storm
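At-least-once replay means a tuple can be counted twice. One common workaround, sketched below in plain Python (a generic technique, not Storm code; Storm itself offers the Trident API for exactly-once semantics), is to make the consumer idempotent by remembering which message ids it has already applied:

```python
# Idempotent counting despite replays (a generic sketch; names are illustrative).

class ExactlyOnceCounter:
    def __init__(self):
        self.seen = set()   # message ids already applied
        self.total = 0

    def apply(self, msg_id, amount):
        """Apply each message id at most once, even if it is replayed."""
        if msg_id in self.seen:
            return self.total          # duplicate replay: ignore
        self.seen.add(msg_id)
        self.total += amount
        return self.total

counter = ExactlyOnceCounter()
counter.apply("sale-1", 100)
counter.apply("sale-2", 50)
counter.apply("sale-1", 100)   # replayed after a failure; not double-counted
print(counter.total)           # 150
```

In a real system the seen-id set would live in durable storage so deduplication survives restarts; the in-memory set here only illustrates the idea.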
Storm standalone deployment directory in the Ubuntu environment: check Ubuntu, install JDK, install Python, install ZooKeeper, install ZeroMQ, install JZMQ, install Storm.
Check whether Ubuntu is 32-bit or 64-bit:
uname -a
Returned result => ... SMP Fri Feb 22 00:31:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linu
Storm standalone deployment in Ubuntu
Directory
View Ubuntu
Install JDK
Install Python
Submit Topologies
Command format: storm jar [jar path] [topology package name.topology class name] [topology name]
Example: storm jar /storm-starter.jar storm.starter.WordCountTopology wordcounttop # Submit storm-starter.jar to the remote cluster and start the wordcou
From: http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/
When doing performance tuning on a Storm topology, it helps to understand how Storm's internal message queues are configured and used. In this article I will explain and demonstrate the worker and its internal threads when
Recording the problems that occurred during the configuration of a Storm cluster:
1. Storm is managed through ZooKeeper, so first install ZooKeeper. Download it from the ZK official site (I used 3.4.9), move it to /usr/local, and unzip:
tar -zxvf zookeeper-3.4.9.tar.gz
2. Go to the conf directory, copy zoo_sample.cfg and rename it to zoo.cfg, then modify the zoo.cfg configuration file:
cp zoo_sample.cfg /usr/local/zook