1. Brief introduction
Storm is an open-source distributed stream-computing system for processing streaming data, often described as "streaming Hadoop". In the telecommunications industry it can be used for traffic alerts, terminal marketing, and monitoring access to competitors' products in order to retain customers. This article introduces Storm from several angles: its position in the Hadoop ecosystem, its terminology, building a Storm platform, building Storm applications, and more.
2. Location of Storm in the big data ecosystem
[Figure: Big Data platform architecture]
As the figure shows, Storm sits on top of HDFS. This does not mean that Storm can only process data stored in HDFS: its data source is usually log files or Kafka. After Storm has processed the data, the results can flow to HDFS, HBase, a relational database, and so on.
Storm is a computing system. In big data processing, the computing framework we are already familiar with is MapReduce, and the architecture diagram shows Storm and MapReduce as siblings, which is why Storm is called "streaming Hadoop". The next section therefore introduces Storm by comparing it with MapReduce.
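To make the contrast with MapReduce concrete, here is a minimal sketch in plain Python. It uses neither the Storm nor the Hadoop API, and the function names `batch_word_count` and `streaming_word_count` are hypothetical. The batch version needs the whole input before it produces a result, while the streaming version updates its result as each record arrives, which is the style of computation Storm targets.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List


def batch_word_count(lines: List[str]) -> Dict[str, int]:
    """Batch style: the full input is available before computation starts."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return dict(counts)


def streaming_word_count(lines: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Streaming style: yield an updated snapshot after every incoming record."""
    counts: Counter = Counter()
    for line in lines:  # `lines` may be an unbounded source, e.g. a log tail
        counts.update(line.split())
        yield dict(counts)


if __name__ == "__main__":
    data = ["storm storm kafka", "kafka hdfs"]
    print(batch_word_count(data))
    for snapshot in streaming_word_count(iter(data)):
        print(snapshot)
```

Once the stream has been fully consumed, the last streaming snapshot equals the batch result. The difference is that intermediate results are available while data is still arriving.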
3. Introduction to common Storm terms
| Storm | MapReduce (based on Hadoop 2.x) | Description |
|---|---|---|
| Nimbus | ApplicationMaster | In MapReduce, the ApplicationMaster is responsible for task allocation and for requesting resources from the ResourceManager; likewise, Storm's Nimbus is responsible for distributing code and for assigning and scheduling tasks |
| Supervisor | NodeManager | The MapReduce NodeManager handles resource requests and starts and monitors work processes; Storm's Supervisor is likewise responsible for starting and stopping task processes |
| Worker | YarnChild | The process that actually performs task processing |
| Topology | MapReduce | Driver program |
4. Storm architecture
- Topology: An application built with Storm. It describes the source of the data, the processing logic applied to the data, and where the data flows.
- Spout: A component in the topology that describes the data source. A spout has a nextTuple() function that is called continuously; the source data is produced inside this function and then flows to the next node. A topology can contain one or more spouts.
- Bolt: A component in the topology that receives data emitted by the previous node (a spout or another bolt). It has an execute(Tuple tuple) method that is invoked passively whenever data arrives, and it performs work such as merging, filtering, and persisting. A bolt can be the end point of a complete data-processing flow in the topology, or an intermediate hop.
- Tuple: The basic unit for passing messages in Storm. The data structure of a tuple is a list.
- Stream: A continuous sequence of tuples.
- Stream grouping: Describes how tuples are partitioned when data flows between components (spouts and bolts). The types are:
  1. Shuffle grouping: Tuples in the stream are distributed randomly, so that each bolt task receives roughly the same number of tuples.
  2. Fields grouping: Tuples are grouped by field. Tuples with the same field value are sent to the same bolt task, and tuples with different values may go to different tasks.
  3. All grouping: Broadcast. Every bolt task receives every tuple.
  4. Global grouping: The entire stream goes to a single task of the bolt, namely the task with the lowest ID.
  5. None grouping: No grouping is specified. Currently this behaves the same as shuffle grouping.
  6. Direct grouping: The sender of a tuple specifies which recipient receives it.
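The spout, bolt, and tuple roles described above can be sketched as a toy pipeline in plain Python. This is an illustration only and does not use the Storm API: the class names `SentenceSpout`, `SplitBolt`, and `CountBolt` and the driver `run_topology` are hypothetical, and `next_tuple` and `execute` merely mirror the method names mentioned in the text. Real Storm runs these components in parallel across worker processes, which this single-threaded sketch does not attempt to model.

```python
from collections import Counter
from typing import List, Optional, Tuple


class SentenceSpout:
    """Data source: next_tuple() is called repeatedly, one tuple per call."""

    def __init__(self, sentences: List[str]) -> None:
        self._pending = list(sentences)

    def next_tuple(self) -> Optional[Tuple[str]]:
        # Return the next tuple, or None when nothing is left to emit.
        return (self._pending.pop(0),) if self._pending else None


class SplitBolt:
    """Intermediate bolt: turns one sentence tuple into one tuple per word."""

    def execute(self, tup: Tuple[str]) -> List[Tuple[str]]:
        return [(word,) for word in tup[0].split()]


class CountBolt:
    """Terminal bolt: keeps a running count per word and emits nothing."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def execute(self, tup: Tuple[str]) -> List[Tuple[str]]:
        self.counts[tup[0]] += 1
        return []


def run_topology(spout: SentenceSpout, split: SplitBolt, count: CountBolt) -> None:
    """Drive spout -> split -> count until the spout has no more tuples."""
    while True:
        tup = spout.next_tuple()
        if tup is None:
            break
        for word_tuple in split.execute(tup):
            count.execute(word_tuple)


if __name__ == "__main__":
    counter = CountBolt()
    run_topology(SentenceSpout(["storm kafka", "storm hdfs"]), SplitBolt(), counter)
    print(dict(counter.counts))
```

Here `SplitBolt` acts as an intermediate hop and `CountBolt` as the end point of the flow, matching the two roles a bolt can play.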
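The routing rules behind several of the stream groupings can be sketched as small functions that map a tuple to the list of task IDs that should receive it. This is a plain-Python illustration under stated assumptions, not Storm's implementation: the function names are hypothetical, the random pick in `shuffle_grouping` is a simplification, and the CRC32 hash in `fields_grouping` is an assumed stand-in for whatever hashing Storm uses internally.

```python
import random
import zlib
from typing import List, Sequence


def shuffle_grouping(task_ids: Sequence[int], rng: random.Random) -> List[int]:
    """Pick one task at random so that load spreads roughly evenly."""
    return [rng.choice(list(task_ids))]


def fields_grouping(task_ids: Sequence[int], field_value: str) -> List[int]:
    """Same field value -> same task, via a stable hash (CRC32 is an assumption)."""
    ordered = sorted(task_ids)
    index = zlib.crc32(field_value.encode("utf-8")) % len(ordered)
    return [ordered[index]]


def all_grouping(task_ids: Sequence[int]) -> List[int]:
    """Broadcast: every task receives the tuple."""
    return sorted(task_ids)


def global_grouping(task_ids: Sequence[int]) -> List[int]:
    """The whole stream goes to the task with the lowest ID."""
    return [min(task_ids)]


def direct_grouping(task_ids: Sequence[int], chosen_task: int) -> List[int]:
    """The emitter names the receiving task explicitly."""
    if chosen_task not in task_ids:
        raise ValueError(f"unknown task id: {chosen_task}")
    return [chosen_task]


if __name__ == "__main__":
    tasks = [3, 1, 2]
    print(global_grouping(tasks), all_grouping(tasks))
    print(fields_grouping(tasks, "user-42"), direct_grouping(tasks, 2))
```

The property that matters for fields grouping is determinism: repeated calls with the same field value always return the same task, which is what lets a bolt keep per-key state such as a running count.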
This article is from the "Shing" blog. Please retain this source attribution: http://hellowode.blog.51cto.com/8646864/1704426
Introduction to stream computing for big data processing