Cloud Computing with Me (1): Storm

Source: Internet
Author: User
Tags: zookeeper

Overview

I recently started working on a real-time analysis project, so I need to study Storm in depth.

Why Storm

Taken together, the main reasons are as follows:

1. Born at the right time

The MapReduce computing model opened a new door for distributed computing and greatly lowered the barrier to implementing it. With a MapReduce framework in place, developers only need to express their business logic in MapReduce semantics and no longer have to worry about fault tolerance, scalability, reliability, and similar concerns. For a while, people holding the MapReduce hammer treated every problem as a nail, and naturally tried to apply the MapReduce model to stream processing as well. After various failed attempts, it became clear that even a modified MapReduce does not fit streaming scenarios and that a new architecture was needed. Yahoo!'s paper introducing S4 explains in some detail why MapReduce is not suitable for stream processing; UC Berkeley's Spark Streaming project is now trying to challenge that conclusion, and interested readers can look into it themselves. At the same time, people had doubts about traditional CEP solutions, whose non-distributed architecture was considered unable to scale out far enough to meet massive data processing requirements. Yahoo! S4 and Twitter Storm arrived at just the right moment to scratch that itch.

2. Scalability

More precisely, this means the ability to scale out. Scaling out (http://en.wikipedia.org/wiki/Scalability) simply means that when a cluster runs short of processing capacity, you add new nodes and computation migrates onto them to meet demand. Preferring scale-out to scale-up wherever possible is by now a deeply rooted idea. In general, the key to scale-out is a shared-nothing architecture: the state each computation needs is self-contained, with no strong dependency on a specific node, so computation can move easily between nodes, and when the system as a whole runs short of capacity, adding nodes is enough. Storm's computing model is scale-out friendly: the spouts and bolts of a topology are not bound to specific nodes and can easily be distributed across many of them. In addition, Storm provides a powerful command, rebalance, that dynamically adjusts the number of components (spouts and bolts) in a given topology and their mapping to the actual compute nodes.
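The scale-out idea above can be illustrated with a small toy model. This is only a sketch: the round-robin placement, the task names, and the node names are assumptions made for illustration, not Storm's actual scheduler or the behavior of its rebalance command.

```python
from collections import defaultdict


def assign_tasks(tasks, nodes):
    """Spread tasks over nodes round-robin; no task is pinned to a node."""
    assignment = defaultdict(list)
    for i, task in enumerate(tasks):
        assignment[nodes[i % len(nodes)]].append(task)
    return dict(assignment)


# Hypothetical topology: one spout with 2 tasks and one bolt with 4 tasks.
tasks = ["spout-0", "spout-1", "bolt-0", "bolt-1", "bolt-2", "bolt-3"]

# Same tasks, first on two nodes, then on three (one node added).
before = assign_tasks(tasks, ["node-a", "node-b"])
after = assign_tasks(tasks, ["node-a", "node-b", "node-c"])

# The heaviest node carries fewer tasks once the cluster has grown.
max_load_before = max(len(ts) for ts in before.values())
max_load_after = max(len(ts) for ts in after.values())
print(max_load_before, max_load_after)  # prints "3 2"
```

Because no task depends on a particular node, recomputing the placement is all it takes to use the new capacity, which is the property a shared-nothing design is meant to provide.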

3. System Reliability

As a distributed stream computing framework, Storm is built on ZooKeeper, and a large amount of the system's runtime state and metadata is serialized into ZooKeeper. When a node fails, the corresponding critical state is therefore not lost; in other words, the high availability of ZooKeeper is what guarantees the high availability of Storm. The document at https://github.com/nathanmarz/storm/wiki/Fault-tolerance discusses the fault-tolerance behavior of Storm's subsystems and is worth consulting for details.
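The following sketch shows why keeping state in an external store makes a failed process replaceable. Everything here is hypothetical: `CoordinationStore` is an in-memory stand-in rather than the ZooKeeper API, and the `Master` class, the path layout, and the topology name are invented for illustration.

```python
import json


class CoordinationStore:
    """In-memory stand-in for a ZooKeeper-like store (not the real API)."""

    def __init__(self):
        self._data = {}

    def write(self, path, value):
        # State is serialized before it is stored, as the text describes.
        self._data[path] = json.dumps(value)

    def read(self, path):
        return json.loads(self._data[path])


class Master:
    """A process that keeps no state of its own; it all lives in the store."""

    def __init__(self, store):
        self.store = store

    def assign(self, topology, assignment):
        self.store.write(f"/assignments/{topology}", assignment)

    def assignment_for(self, topology):
        return self.store.read(f"/assignments/{topology}")


store = CoordinationStore()
master = Master(store)
master.assign("wordcount", {"node-a": ["spout-0"], "node-b": ["bolt-0", "bolt-1"]})

del master                   # simulate the process crashing
replacement = Master(store)  # a fresh process with no local state
recovered = replacement.assignment_for("wordcount")
print(recovered)
```

The replacement recovers the full assignment because nothing important was held only in the crashed process. The availability of the whole system then depends on the store itself staying available.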

4. Reliability of computation

Distributed computing involves communication and dependencies among many nodes and processes, and correctly tracking the state and dependencies of all participants is a challenging task. Storm implements a complete mechanism to guarantee that every message is fully processed (https://github.com/nathanmarz/storm/wiki/Guaranteeing-message-processing). In addition, with transactional topologies (https://github.com/nathanmarz/storm/wiki/Transactional-topologies), Storm can guarantee that each tuple is processed exactly once.
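One idea behind "fully processed" tracking, as described in the message-processing document linked above, is to keep a single XOR checksum per source tuple: every tuple id is XORed in once when the tuple is created and once when it is acknowledged, so the checksum returns to zero exactly when every tuple in the tree has been acknowledged. The sketch below is a simplified model of that idea, not Storm's implementation; the `Acker` class, its `update` method, and the fixed ids are assumptions made for illustration (real ids would be random 64-bit values).

```python
class Acker:
    """Toy tracker: one running XOR checksum per root (spout) tuple."""

    def __init__(self):
        self.pending = {}  # root id -> XOR of ids seen so far

    def update(self, root_id, value):
        """XOR value into the checksum; return True once the tree is complete."""
        checksum = self.pending.get(root_id, 0) ^ value
        if checksum == 0:
            self.pending.pop(root_id, None)
            return True
        self.pending[root_id] = checksum
        return False


# Fixed ids keep the example deterministic.
a, b, c = 0x1A, 0x2B, 0x3C
root = "msg-1"
acker = Acker()

steps = [
    acker.update(root, a),          # spout emits tuple a
    acker.update(root, a ^ b ^ c),  # bolt acks a and emits b and c from it
    acker.update(root, b),          # a downstream bolt acks b
    acker.update(root, c),          # a downstream bolt acks c
]
print(steps)  # prints "[False, False, False, True]"
```

The tracker needs constant memory per source tuple regardless of how large the tuple tree grows, which is the main appeal of this design.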

5. Open source

This goes without saying: being open source keeps the Storm community active. At the time of writing, Storm has reached version 0.8.1, and its list of users is long (https://github.com/nathanmarz/storm/wiki/powered-by), including Internet giants such as Taobao, Alipay, Twitter, and Groupon.

6. Implemented in Clojure

Storm's core code is written in Clojure and Java. Clojure (http://clojure.org/) is a functional programming language that runs on the JVM and is one of the few languages that support STM (Software Transactional Memory). Clojure has attracted wide attention since its introduction, and it is generally believed that its functional programming features are well suited to distributed environments; Storm is a good example. Seen from the other side, Storm can also do a great deal to popularize Clojure.

In short, the times make the hero: Storm appeared at the right time and in the right place, and did the right thing, so it would be strange if it had not become popular.

High-level architecture

Storm's architecture, viewed from a high level:
