Introduction to stream computing for big data processing

Source: Internet
Author: User

    1. Brief introduction

      Storm is an open-source distributed stream computing system for processing streaming data, sometimes called the streaming Hadoop. In the telecommunications industry, for example, it can be used for traffic alerts, terminal marketing, and detecting visits to competitors' products in order to retain customers. This article describes Storm in terms of its position in the Hadoop ecosystem, Storm terminology, building a Storm platform, building Storm applications, and more.

    2. Position of Storm in the big data ecosystem


[Figure: Big data platform architecture]

    1. As the figure shows, Storm is layered on HDFS. This does not mean that Storm can only process data stored in HDFS: its data source is usually log files or Kafka, and once the data has been processed by Storm, the results can flow to HDFS, HBase, relational databases, and so on.

    2. Storm is a computing system. The computing framework most familiar in big data processing is MapReduce, and the architecture diagram shows Storm and MapReduce as siblings, which is why Storm is called the streaming Hadoop. The next section therefore introduces Storm by comparing it with MapReduce.

    3. Common Storm terms

Storm | MapReduce (based on Hadoop 2.x) | Description
Nimbus | ApplicationMaster | In MapReduce, the ResourceManager is responsible for task allocation and resource requests; likewise, in Storm, Nimbus is responsible for code distribution, task allocation, and scheduling.
Supervisor | NodeManager | In MapReduce, the NodeManager is responsible for resource requests and for starting and monitoring worker processes; in Storm, the Supervisor is likewise responsible for starting and stopping task processes.
Worker | YarnChild | The process that actually performs the task processing.
Topology | MapReduce driver program |






    4. Storm architecture

    • Topology: an application built with Storm. A topology describes where the data comes from, how it is processed, and where it flows.

    • Spout: the component in a topology that describes the source of the data. A spout has a nextTuple() function that is called continuously; the source data is produced inside this function and then flows to the next node. A topology can contain one or more spouts.

    • Bolt: a component in a topology that receives data emitted by the previous node (a spout or another bolt). It has an execute(Tuple tuple) method that is invoked whenever data arrives, to merge, filter, persist, and so on. A bolt can be the end point of a complete data processing flow in the topology, or an intermediate point.

    • Tuple: the basic unit for passing messages in Storm. The data structure of a tuple is a list.

    • Stream: a continuous sequence of tuples forms a stream.

    • Stream grouping: describes how data is partitioned when it flows between components (spout/bolt). The types are:

1. Shuffle grouping: tuples in the stream are distributed randomly, so that each bolt task receives roughly the same number of tuples.
2. Fields grouping: tuples are grouped by field. Tuples with the same field value are always sent to the same bolt task; tuples with different values may be sent to different tasks.
3. All grouping: broadcast. Every bolt task receives every tuple.
4. Global grouping: the whole stream is sent to a single task of the bolt, namely the task with the lowest ID.
5. None grouping: no grouping is specified. Currently the effect is the same as shuffle grouping.
6. Direct grouping: the sender of the tuple specifies which task receives it.
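The roles described above can be sketched in plain Python. This is a toy, single-process model and not the Storm API: the names SentenceSpout, SplitBolt, CountBolt, run_topology, and the grouping helpers are hypothetical and exist only for illustration, task IDs are assumed to be 0..n-1, and a real Storm topology is usually written in Java and runs distributed across worker processes.

```python
import random
import zlib
from collections import Counter


class SentenceSpout:
    """Toy spout: next_tuple() is called repeatedly and yields one tuple per call."""

    def __init__(self, sentences):
        self._pending = list(sentences)

    def next_tuple(self):
        # A tuple is modelled as a list of values; None means the toy source is empty.
        return [self._pending.pop(0)] if self._pending else None


class SplitBolt:
    """Toy intermediate bolt: splits a sentence tuple into one tuple per word."""

    def execute(self, tup):
        return [[word] for word in tup[0].split()]


class CountBolt:
    """Toy terminal bolt: counts the words it receives and emits nothing."""

    def __init__(self):
        self.counts = Counter()

    def execute(self, tup):
        self.counts[tup[0]] += 1
        return []


# Each grouping returns the list of target task IDs (0..num_tasks-1) for one tuple.
def shuffle_grouping(tup, num_tasks, rng):
    return [rng.randrange(num_tasks)]  # random task, roughly even load


def fields_grouping(tup, num_tasks, field_index=0):
    key = str(tup[field_index]).encode("utf-8")
    return [zlib.crc32(key) % num_tasks]  # same field value -> same task


def all_grouping(tup, num_tasks):
    return list(range(num_tasks))  # broadcast to every task


def global_grouping(tup, num_tasks):
    return [0]  # single task with the lowest ID


def run_topology(sentences, num_count_tasks=3):
    """Wire spout -> SplitBolt -> CountBolt tasks, using fields grouping on the word."""
    spout = SentenceSpout(sentences)
    split_bolt = SplitBolt()
    count_tasks = [CountBolt() for _ in range(num_count_tasks)]
    while True:
        tup = spout.next_tuple()
        if tup is None:
            break
        for word_tuple in split_bolt.execute(tup):
            for task_id in fields_grouping(word_tuple, num_count_tasks):
                count_tasks[task_id].execute(word_tuple)
    return count_tasks
```

In this sketch, fields grouping keeps every occurrence of a word on one CountBolt task, so each task holds complete counts for its words. Swapping in shuffle_grouping would balance the load but leave each task with only partial counts for a given word.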






This article is from the "Shing" blog; please keep this source attribution: http://hellowode.blog.51cto.com/8646864/1704426
