Storm in Detail, Part 1: Storm Overview

Source: Internet
Author: User
First, Storm overview

Storm is a distributed, reliable, fault-tolerant system for processing streams of data. It delegates work to different types of components, each of which handles a simple task independently of the others. The spout is the component that handles the input stream of a Storm cluster; it passes the data it reads to a component called a bolt. A bolt processes the tuples it receives and can pass its output on to the next bolt. You can think of a Storm cluster as a chain of bolts along which the data travels, with each bolt acting as a node on the chain that processes the data.

On the surface, a Storm cluster looks similar to a Hadoop cluster, but Hadoop runs MapReduce jobs while Storm runs topologies, and the two are very different. The key difference is that a MapReduce job eventually ends, whereas a topology runs forever (unless you kill it manually). In other words, Storm is for real-time data analysis and Hadoop is for offline data analysis.

Consider an example. In televised political debates, the speakers keep mentioning certain names and hot topics, and a running count of how often each name and topic is repeated would make for an interesting result. In a Storm environment, the text of the debate serves as the input stream. A spout reads the data and sends each sentence to the BOLT1 component, which splits the sentence into words and sends those words to the BOLT2 component. BOLT2 counts the occurrences of each word and stores that information in a database. As the debate goes on, Storm keeps refreshing the results in the database, so whenever you want to see the results you only need to query the database.

Now imagine distributing these spouts and bolts evenly across a cluster, so that the system can easily be scaled out by adding machines. This is the power of Storm.
Figure 1.1: A simple topology
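The debate example above can be sketched in a few lines of plain Python. This is only an illustration of the data flow, not Storm code: it runs in a single process and does not use the Storm API. The function names (sentence_spout, split_bolt, count_bolt), the sample sentences, and the use of a Counter in place of the database are all assumptions made for the sketch.

```python
from collections import Counter
from typing import Iterable, Iterator


def sentence_spout(lines: Iterable[str]) -> Iterator[str]:
    """Stand-in for the spout: emits one non-empty sentence at a time."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def split_bolt(sentences: Iterable[str]) -> Iterator[str]:
    """Stand-in for BOLT1: splits each sentence into lowercase words."""
    for sentence in sentences:
        for word in sentence.lower().split():
            word = word.strip(".,;:!?\"'")  # drop surrounding punctuation
            if word:
                yield word


def count_bolt(words: Iterable[str], store: Counter) -> None:
    """Stand-in for BOLT2: updates the running count of each word."""
    for word in words:
        store[word] += 1


# Hypothetical input; in a real topology the stream would never end.
debate = [
    "The economy needs jobs.",
    "Jobs come from a strong economy!",
]
word_counts = Counter()  # plays the role of the database
count_bolt(split_bolt(sentence_spout(debate)), word_counts)
print(word_counts.most_common(2))
```

Because each stage only consumes tuples from the previous one, the stages could in principle run on different machines, which is what a Storm cluster arranges for a real topology.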
Some typical scenarios for Storm:
1. Data stream processing: unlike other stream-processing systems, Storm does not require intermediate queues.
2. Real-time computing: data is processed continuously in real time, and the results are pushed to the client as they are updated.
3. Distributed remote procedure call (DRPC): CPU-intensive computations can be spread across the cluster.
Second, Storm components

There are two types of nodes in a Storm cluster: the master node and worker nodes.
Master node: runs the Nimbus process, which distributes code, schedules tasks, and monitors the running state (mainly whether nodes have succeeded or failed).
Worker node: runs the Supervisor process and is responsible for executing a subset of a topology.

Figure 1.2: Storm components in the cluster
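As a rough illustration of the master/worker split, the sketch below hands out the tasks of a topology to worker nodes in round-robin order, so that each worker ends up with a subset of the topology. It is not Storm's actual scheduler; the function assign_tasks and the task and worker names are hypothetical.

```python
from collections import defaultdict
from itertools import cycle


def assign_tasks(tasks, workers):
    """Spread tasks over worker nodes in round-robin order (illustrative only)."""
    if not workers:
        raise ValueError("at least one worker node is required")
    assignment = defaultdict(list)
    # cycle() repeats the worker list, so tasks are dealt out evenly.
    for task, worker in zip(tasks, cycle(workers)):
        assignment[worker].append(task)
    return dict(assignment)


# Hypothetical task and worker names for the word-count topology.
tasks = ["spout-1", "split-1", "split-2", "count-1", "count-2"]
workers = ["supervisor-a", "supervisor-b"]
plan = assign_tasks(tasks, workers)
for worker, assigned in plan.items():
    print(worker, assigned)
```

If a worker fails, calling assign_tasks again with the remaining workers yields a new plan, which loosely mirrors the master node rescheduling work.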

State in a Storm cluster is stored either in ZooKeeper or on local disk, so the Storm processes themselves are stateless, and the failure or restart of any single node does not affect the cluster as a whole.

Under the hood, Storm uses ZeroMQ, whose features help make Storm possible:
A socket library that doubles as a concurrency framework
Described as faster than TCP, and suited to clustered environments and supercomputing
Carries messages over inproc, IPC, TCP, and multicast
Asynchronous I/O
Connects N-to-N via fanout, pubsub, pipeline, and request-reply
Storm uses only the push/pull mode.

Third, Storm characteristics

Simple to program: development relies primarily on spouts and bolts.
Support for multiple programming languages: JVM-based languages are supported, and any other language can be supported as long as an intermediate class is implemented for it.
Highly fault-tolerant: workers that go down are restarted.
Scalable: nodes can be added to or removed from the cluster at will.
Highly reliable: every message is guaranteed to be consumed at least once, so messages in Storm are never lost.
Fast: this hardly needs saying.

Now that you have a preliminary understanding of Storm, the next section will walk through a simple demo to give you a real feel for it.
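The "consumed at least once" guarantee mentioned above can be pictured with a small replay loop. The sketch below is a single-process illustration under stated assumptions, not Storm's actual acknowledgement mechanism: process_at_least_once, flaky_handler, and the max_attempts limit are all hypothetical. A message is dropped from the pending queue only after its handler succeeds; a failure puts it back for replay, so a message may be handled more than once but is never lost.

```python
from collections import deque


def process_at_least_once(messages, handler, max_attempts=5):
    """Replay each message until handler succeeds; return attempts per message."""
    pending = deque((msg, 1) for msg in messages)
    attempts = {}
    while pending:
        msg, attempt = pending.popleft()
        try:
            handler(msg)  # returning normally acts as the acknowledgement
            attempts[msg] = attempt
        except Exception:
            if attempt >= max_attempts:
                raise  # give up instead of looping forever
            pending.append((msg, attempt + 1))  # failure: queue for replay
    return attempts


seen = []            # every delivery, including replays
failed_once = set()  # messages that have already failed one time


def flaky_handler(msg):
    """Hypothetical bolt logic that fails the first time it sees 'b'."""
    seen.append(msg)
    if msg == "b" and msg not in failed_once:
        failed_once.add(msg)
        raise RuntimeError("simulated worker failure")


attempts = process_at_least_once(["a", "b", "c"], flaky_handler)
print(attempts)
print(seen)
```

The seen list shows the cost of this guarantee: the replayed message is delivered twice, so downstream logic has to tolerate duplicates.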