Understanding the parallel execution of Storm: the relationship between worker, executor, and task, and the scheduling algorithm

Source: Internet
Author: User

The official explanation of Storm workers, executors, and tasks (https://github.com/nathanmarz/storm/wiki/Understanding-the-parallelism-of-a-storm-topology) is very clear, and it is reprinted here on a personal blog. A picture is worth a thousand words:

Storm distinguishes between the following three main entities that are used to actually run a topology in a Storm cluster:

    1. Worker processes
    2. Executors (Threads)
    3. Tasks

Here is a simple illustration of their relationships:

A worker process executes a subset of a topology. A worker process belongs to a specific topology and may run one or more executors for one or more components (spouts or bolts) of this topology. A running topology consists of many such processes running on many machines within a Storm cluster.

An executor is a thread, which is spawned by a worker process. It may run one or more tasks for the same component (spout or bolt).

A task performs the actual data processing: each spout or bolt that you implement in your code executes as many tasks across the cluster. The number of tasks for a component is always the same throughout the lifetime of a topology, but the number of executors (threads) for a component can change over time. This means that the following condition holds true: #threads ≤ #tasks. By default, the number of tasks is set to be the same as the number of executors, i.e. Storm will run one task per thread.
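
The relationship between executors and tasks can be captured in a few lines. The following is a minimal sketch in plain Java, not Storm's own code: the class ComponentParallelism and its methods are hypothetical names used only for illustration. It models the default of one task per executor and the condition #threads ≤ #tasks. Rejecting an invalid executor count with an exception is a choice of this sketch, not a statement about how Storm itself reacts.

```java
/** Hypothetical model of one component's parallelism; not part of the Storm API. */
public final class ComponentParallelism {
    private final int numTasks; // fixed for the lifetime of the topology
    private int numExecutors;   // may change over time

    /** Default case: no explicit task count, so one task per executor. */
    public ComponentParallelism(int numExecutors) {
        this(numExecutors, numExecutors);
    }

    public ComponentParallelism(int numExecutors, int numTasks) {
        check(numExecutors, numTasks);
        this.numExecutors = numExecutors;
        this.numTasks = numTasks;
    }

    /** Changes the executor count while the task count stays fixed. */
    public void setNumExecutors(int newNumExecutors) {
        check(newNumExecutors, numTasks);
        this.numExecutors = newNumExecutors;
    }

    /** Enforces 1 <= #executors <= #tasks (this sketch's reading of the invariant). */
    private static void check(int executors, int tasks) {
        if (executors < 1 || executors > tasks) {
            throw new IllegalArgumentException("need 1 <= #executors <= #tasks");
        }
    }

    public int getNumExecutors() { return numExecutors; }
    public int getNumTasks() { return numTasks; }

    public static void main(String[] args) {
        ComponentParallelism c = new ComponentParallelism(2, 4);
        c.setNumExecutors(4); // allowed: 4 executors <= 4 tasks
        System.out.println(c.getNumExecutors() + " executors, " + c.getNumTasks() + " tasks");
    }
}
```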

Configuring the Parallelism of a topology

Note that in Storm's terminology "parallelism" is specifically used to describe the so-called parallelism hint, which means the initial number of executors (threads) of a component. In this document though we use the term "parallelism" in a more general sense to describe how you can configure not only the number of executors but also the number of worker processes and the number of tasks of a Storm topology. We will specifically call out when "parallelism" is used in the normal, narrow definition of Storm.

The following sections give an overview of the various configuration options and how to set them in your code. There is more than one way of setting these options though, and the table lists only some of them. Storm currently has the following order of precedence for configuration settings: defaults.yaml < storm.yaml < topology-specific configuration < internal component-specific configuration < external component-specific configuration.
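
One way to picture this order of precedence is as a series of maps merged from lowest to highest priority, where a later layer overrides an earlier one. The sketch below is plain Java and only illustrates the idea; the class ConfigPrecedence, the method merge, and the sample values are assumptions made for this example and are not how Storm loads its configuration internally.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of layered configuration; not part of the Storm API. */
public final class ConfigPrecedence {

    /** Merges layers given from lowest to highest precedence; later layers win. */
    public static Map<String, Object> merge(List<Map<String, Object>> layersLowToHigh) {
        Map<String, Object> effective = new LinkedHashMap<>();
        for (Map<String, Object> layer : layersLowToHigh) {
            effective.putAll(layer); // overwrites keys set by lower layers
        }
        return effective;
    }

    public static void main(String[] args) {
        // Sample values chosen for illustration only.
        Map<String, Object> defaultsYaml = Map.of("topology.workers", 1, "topology.debug", false);
        Map<String, Object> stormYaml = Map.of("topology.workers", 2);
        Map<String, Object> topologyConf = Map.of("topology.workers", 4);

        Map<String, Object> effective = merge(List.of(defaultsYaml, stormYaml, topologyConf));
        System.out.println("topology.workers = " + effective.get("topology.workers"));
    }
}
```

Here the topology-specific value for topology.workers wins over both YAML files, while topology.debug falls through from the lowest layer because nothing overrides it.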

Number of worker processes
    • Description: How many worker processes to create for the topology across machines in the cluster.
    • Configuration option: TOPOLOGY_WORKERS
    • How to set in your code (examples):
      • Config#setNumWorkers
Number of executors (threads)
    • Description: How many executors to spawn per component.
    • Configuration option: ?
    • How to set in your code (examples):
      • TopologyBuilder#setSpout()
      • TopologyBuilder#setBolt()
      • Note that as of Storm 0.8 the parallelism_hint parameter now specifies the initial number of executors (not tasks!) for that bolt.
Number of tasks
    • Description: How many tasks to create per component.
    • Configuration option: TOPOLOGY_TASKS
    • How to set in your code (examples):
      • ComponentConfigurationDeclarer#setNumTasks()

Here is an example code snippet to show these settings in practice:

    topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
                   .setNumTasks(4)
                   .shuffleGrouping("blue-spout");

In the above code we configured Storm to run the bolt GreenBolt with an initial number of two executors and four associated tasks. Storm will run two tasks per executor (thread). If you do not explicitly configure the number of tasks, Storm will run by default one task per executor.
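
To make "two tasks per executor" concrete, the following plain-Java sketch splits a component's tasks into one group per executor. It assumes an even, contiguous split, which matches the two-per-executor outcome described above but is not taken from Storm's source; the class TaskDistribution, the method assign, and the zero-based task ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of spreading tasks over executors; not part of the Storm API. */
public final class TaskDistribution {

    /**
     * Splits task ids 0..numTasks-1 into numExecutors contiguous groups whose
     * sizes differ by at most one (an even split is assumed for illustration).
     */
    public static List<List<Integer>> assign(int numTasks, int numExecutors) {
        if (numExecutors < 1 || numExecutors > numTasks) {
            throw new IllegalArgumentException("need 1 <= #executors <= #tasks");
        }
        int base = numTasks / numExecutors;
        int extra = numTasks % numExecutors; // the first 'extra' executors get one more task
        List<List<Integer>> groups = new ArrayList<>();
        int next = 0;
        for (int e = 0; e < numExecutors; e++) {
            int size = base + (e < extra ? 1 : 0);
            List<Integer> tasks = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                tasks.add(next++);
            }
            groups.add(tasks);
        }
        return groups;
    }

    public static void main(String[] args) {
        // green-bolt: 4 tasks on 2 executors
        System.out.println(assign(4, 2)); // prints [[0, 1], [2, 3]]
    }
}
```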

Example of a running topology

The following illustration shows what a simple topology looks like in operation. The topology consists of three components: one spout called BlueSpout and two bolts called GreenBolt and YellowBolt. The components are linked such that BlueSpout sends its output to GreenBolt, which in turn sends its own output to YellowBolt.

The GreenBolt was configured as per the code snippet above, whereas BlueSpout and YellowBolt only set the parallelism hint (number of executors). Here is the relevant code:

    Config conf = new Config();
    conf.setNumWorkers(2); // use two worker processes

    topologyBuilder.setSpout("blue-spout", new BlueSpout(), 2); // set parallelism hint to 2

    topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
                   .setNumTasks(4)
                   .shuffleGrouping("blue-spout");

    topologyBuilder.setBolt("yellow-bolt", new YellowBolt(), 6)
                   .shuffleGrouping("green-bolt");

    StormSubmitter.submitTopology(
        "mytopology",
        conf,
        topologyBuilder.createTopology()
    );
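
The numbers in this example can be checked with a little arithmetic: the parallelism hints 2, 2, and 6 give 10 executors in total, and with two worker processes that comes to 5 executors per worker if they are spread evenly. The plain-Java sketch below does this calculation. The even spread is an assumption of the sketch (actual placement is up to Storm's scheduler), and TopologyLayout and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for reasoning about a topology's layout; not part of the Storm API. */
public final class TopologyLayout {

    /** Sums the executor counts (parallelism hints) of all components. */
    public static int totalExecutors(Map<String, Integer> executorsByComponent) {
        int total = 0;
        for (int n : executorsByComponent.values()) {
            total += n;
        }
        return total;
    }

    /** Spreads executors over workers as evenly as possible (an assumption of this sketch). */
    public static int[] executorsPerWorker(int totalExecutors, int numWorkers) {
        if (numWorkers < 1) {
            throw new IllegalArgumentException("need at least one worker");
        }
        int[] perWorker = new int[numWorkers];
        for (int i = 0; i < totalExecutors; i++) {
            perWorker[i % numWorkers]++; // round-robin placement
        }
        return perWorker;
    }

    public static void main(String[] args) {
        Map<String, Integer> hints = new LinkedHashMap<>();
        hints.put("blue-spout", 2);
        hints.put("green-bolt", 2);
        hints.put("yellow-bolt", 6);

        int total = totalExecutors(hints);              // 2 + 2 + 6 = 10
        int[] perWorker = executorsPerWorker(total, 2); // two worker processes
        System.out.println(total + " executors, per worker: " + Arrays.toString(perWorker));
    }
}
```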

And of course Storm comes with additional configuration settings to control the parallelism of a topology, including:

    • TOPOLOGY_MAX_TASK_PARALLELISM: This setting puts a ceiling on the number of executors that can be spawned for a single component. It is typically used during testing to limit the number of threads spawned when running a topology in local mode. You can set this option via e.g. Config#setMaxTaskParallelism().
How to change the parallelism of a running topology

A nifty feature of Storm is that you can increase or decrease the number of worker processes and/or executors without being required to restart the cluster or the topology. The act of doing so is called rebalancing.

You have two options to rebalance a topology:

    1. Use the Storm Web UI to rebalance the topology.
    2. Use the CLI tool storm rebalance as described below.

Here's an example of using the CLI tool:

    # Reconfigure the topology "mytopology" to use 5 worker processes,
    # the spout "blue-spout" to use 3 executors and
    # the bolt "yellow-bolt" to use 10 executors.

    $ storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10
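
If rebalancing is triggered from a deployment script, the same command line can be assembled programmatically. The sketch below builds the argument list for the example above in plain Java. The flags -n (number of workers) and -e (component=executors) are those of the storm rebalance tool; the class RebalanceCommand and the method build are hypothetical names, and the sketch only constructs the arguments without running anything.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical builder for a "storm rebalance" command line; not part of the Storm API. */
public final class RebalanceCommand {

    /** Returns the argument list: storm rebalance <topology> -n <workers> [-e <component>=<executors>]... */
    public static List<String> build(String topology, int numWorkers,
                                     Map<String, Integer> executorsByComponent) {
        List<String> args = new ArrayList<>();
        args.add("storm");
        args.add("rebalance");
        args.add(topology);
        args.add("-n");
        args.add(String.valueOf(numWorkers));
        for (Map.Entry<String, Integer> e : executorsByComponent.entrySet()) {
            args.add("-e");
            args.add(e.getKey() + "=" + e.getValue());
        }
        return args;
    }

    public static void main(String[] args) {
        Map<String, Integer> executors = new LinkedHashMap<>(); // keeps insertion order
        executors.put("blue-spout", 3);
        executors.put("yellow-bolt", 10);
        // prints: storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10
        System.out.println(String.join(" ", build("mytopology", 5, executors)));
    }
}
```

The returned list could be handed to java.lang.ProcessBuilder to run the command, provided the storm executable is on the PATH.
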
References for this article
    • Concepts
    • Configuration
    • Running topologies on a production cluster
    • Local mode
    • Tutorial
    • Storm API documentation, most notably the class Config
