Storm topology concurrency

Source: Internet
Author: User
Tags: configuration, settings
Document directory
  • Concept
  • Configuring the parallelism of a topology (concurrency configuration)
  • Example of a running topology
  • How to change the parallelism of a running topology (dynamic change of concurrency)

Understanding the parallelism of a Storm topology

https://github.com/nathanmarz/storm/wiki/Understanding-the-parallelism-of-a-Storm-topology

 

Concept

A topology can contain one or more workers, which run in parallel and may be spread across different machines. Each worker process executes a subset of the topology, and a worker belongs to exactly one topology.

A worker can contain one or more executors. Each component (spout or bolt) corresponds to at least one executor, so an executor executes a subset of a component, and an executor belongs to exactly one component.

A task carries out the actual processing logic. An executor thread can execute one or more tasks.
By default, however, each executor runs only one task, which is why a task is often thought of as an execution thread.

 

A worker process executes a subset of a topology.
A worker process belongs to a specific topology and may run one or more executors for one or more components (spouts or bolts) of this topology.
A running topology consists of many such processes running on many machines within a Storm cluster.

An executor is a thread that is spawned by a worker process. It may run one or more tasks for the same component (spout or bolt).

A task performs the actual data processing: each spout or bolt that you implement in your code executes as many tasks across the cluster.
The number of tasks for a component is always the same throughout the lifetime of a topology, but the number of executors (threads) for a component can change over time. This means that the following condition holds true: #threads ≤ #tasks.
By default, the number of tasks is set to be the same as the number of executors, i.e. Storm will run one task per thread.
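
The two rules above (one task per executor by default, and #threads ≤ #tasks) can be modelled in a few lines of plain Java. This is only a sketch with no Storm dependency: the class name ComponentParallelism and the use of null for "not configured" are hypothetical, and rejecting an invalid combination is a choice made for this sketch, not a description of how Storm itself reacts.

```java
// Hypothetical model of the executor/task rules described above; not part of Storm's API.
final class ComponentParallelism {
    final int executors; // threads spawned for the component
    final int tasks;     // fixed for the lifetime of the topology

    // requestedTasks == null models "number of tasks not configured".
    ComponentParallelism(int executors, Integer requestedTasks) {
        if (executors < 1) {
            throw new IllegalArgumentException("a component needs at least one executor");
        }
        // Default rule: one task per executor.
        int resolved = (requestedTasks == null) ? executors : requestedTasks;
        // Condition from the text: #threads <= #tasks (this sketch simply rejects violations).
        if (executors > resolved) {
            throw new IllegalArgumentException("#threads must not exceed #tasks");
        }
        this.executors = executors;
        this.tasks = resolved;
    }

    public static void main(String[] args) {
        ComponentParallelism byDefault = new ComponentParallelism(2, null);
        ComponentParallelism explicit = new ComponentParallelism(2, 4);
        System.out.println(byDefault.tasks); // 2 (same as the number of executors)
        System.out.println(explicit.tasks);  // 4 (two tasks per executor)
    }
}
```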

 

Configuring the parallelism of a topology (concurrency configuration)

The following sections give an overview of the various configuration options and how to set them in your code. There is more than one way of setting these options, though, and the list below covers only some of them.

Storm currently has the following order of precedence for configuration settings:

defaults.yaml < storm.yaml < topology-specific configuration < internal component-specific configuration < external component-specific configuration
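
One way to read this ordering is as a layered merge in which each later source overrides the earlier ones. The sketch below illustrates that idea with plain Java maps. The class ConfigPrecedence and its merge method are hypothetical, the values are made up for illustration, and this is not how Storm's own configuration loading is implemented.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of "later layer wins"; not Storm's actual config loader.
final class ConfigPrecedence {
    // Layers are listed from lowest to highest precedence.
    static Map<String, Object> merge(List<Map<String, Object>> layersLowToHigh) {
        Map<String, Object> effective = new LinkedHashMap<>();
        for (Map<String, Object> layer : layersLowToHigh) {
            effective.putAll(layer); // a later layer overwrites keys set by earlier ones
        }
        return effective;
    }

    public static void main(String[] args) {
        // Illustrative values only.
        Map<String, Object> defaultsYaml = Map.of("topology.workers", 1);
        Map<String, Object> stormYaml = Map.of("topology.workers", 2);
        Map<String, Object> topologyConf = Map.of("topology.workers", 4);
        System.out.println(merge(List.of(defaultsYaml, stormYaml, topologyConf)));
        // prints {topology.workers=4}
    }
}
```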

 

Concurrency can be configured in several places in Storm, with the precedence shown above. The settings include:

The number of worker processes, which can be set in the configuration file or in code. A worker is an execution process, so to benefit from parallelism across machines the number of workers should generally be no smaller than the number of machines.

The number of executors, i.e. the number of concurrent threads of a component, which can only be set in code through the parallelism parameter of setBolt and setSpout, for example setBolt("green-bolt", new GreenBolt(), 2).

The number of tasks, which is optional. By default each executor runs one task. You can also set it explicitly with setNumTasks().

Number of worker processes
  • Description: How many worker processes to create for the topology across machines in the cluster.
  • Configuration option: TOPOLOGY_WORKERS
  • How to set in your code (examples):
    • Config#setNumWorkers
Number of executors (threads)
  • Description: How many executors to spawn per component.
  • Configuration option: ?
  • How to set in your code (examples):
    • TopologyBuilder#setSpout()
    • TopologyBuilder#setBolt()
    • Note that as of Storm 0.8 the parallelism_hint parameter now specifies the initial number of executors (not tasks!) for that bolt.
Number of tasks
  • Description: How many tasks to create per component.
  • Configuration option: TOPOLOGY_TASKS
  • How to set in your code (examples):
    • ComponentConfigurationDeclarer#setNumTasks()

Here is an example code snippet to show these settings in practice:

topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
               .setNumTasks(4)
               .shuffleGrouping("blue-spout");

In the above code we configured Storm to run the bolt GreenBolt with an initial number of two executors and four associated tasks. Storm will run two tasks per executor (thread). If you do not explicitly configure the number of tasks, Storm will by default run one task per executor.

 

Example of a running topology

The following example shows how a simple topology would look in operation.

The topology consists of three components: one spout called BlueSpout and two bolts called GreenBolt and YellowBolt.

The components are linked such that BlueSpout sends its output to GreenBolt, which in turn sends its own output to YellowBolt.

 

Config conf = new Config();
conf.setNumWorkers(2); // use two worker processes

topologyBuilder.setSpout("blue-spout", new BlueSpout(), 2); // set parallelism hint to 2

topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
               .setNumTasks(4) // set the number of tasks to 4
               .shuffleGrouping("blue-spout");

topologyBuilder.setBolt("yellow-bolt", new YellowBolt(), 6)
               .shuffleGrouping("green-bolt");

StormSubmitter.submitTopology(
        "mytopology",
        conf,
        topologyBuilder.createTopology()
    );

The code makes the numbers clear: setSpout and setBolt together define 2 + 2 + 6 = 10 executor threads.

In addition, setNumWorkers sets two workers, so Storm runs an average of five executors on each worker.

For green-bolt, four tasks are defined for two executors, so each executor runs two tasks.
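
The arithmetic above can be checked with a short, self-contained Java sketch. The class ExampleTopologyMath and its helper methods are hypothetical names introduced here, and the even spread of executors over workers is the simplifying assumption already made in the text.

```java
// Hypothetical helper that reproduces the arithmetic of the example topology.
final class ExampleTopologyMath {
    // Average executors per worker, assuming executors are spread evenly.
    static double averageExecutorsPerWorker(int totalExecutors, int workers) {
        return (double) totalExecutors / workers;
    }

    // Tasks per executor when the tasks divide evenly among the executors.
    static int tasksPerExecutor(int tasks, int executors) {
        if (executors < 1 || tasks % executors != 0) {
            throw new IllegalArgumentException("this sketch only handles an even split");
        }
        return tasks / executors;
    }

    public static void main(String[] args) {
        int blueSpout = 2;  // parallelism hint of blue-spout
        int greenBolt = 2;  // parallelism hint of green-bolt
        int yellowBolt = 6; // parallelism hint of yellow-bolt
        int totalExecutors = blueSpout + greenBolt + yellowBolt;

        System.out.println(totalExecutors);                               // 10
        System.out.println(averageExecutorsPerWorker(totalExecutors, 2)); // 5.0
        System.out.println(tasksPerExecutor(4, greenBolt));               // 2
    }
}
```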

 

How to change the parallelism of a running topology (dynamic change of concurrency)

Storm supports dynamically changing (increasing or decreasing) the number of worker processes and the number of executors without restarting the topology. This is called rebalancing.

You can use either the Storm web UI or the storm rebalance command, as described below.

A nifty feature of Storm is that you can increase or decrease the number of worker processes and/or executors without being required to restart the cluster or the topology. The act of doing so is called rebalancing.

You have two options to rebalance a topology:

  1. Use the storm web UI to rebalance the topology.
  2. Use the CLI tool storm rebalance as described below.

Here is an example of using the CLI tool:

# Reconfigure the topology "mytopology" to use 5 worker processes,
# the spout "blue-spout" to use 3 executors and
# the bolt "yellow-bolt" to use 10 executors.
$ storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10
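
If the rebalance is triggered from a script or tool rather than typed by hand, the command line can be assembled programmatically. The following Java sketch only builds the argument list using the -n and -e options shown above. The class RebalanceCommand and its build method are hypothetical, and the sketch does not run the command or talk to a cluster.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that assembles a "storm rebalance" command line; it does not execute it.
final class RebalanceCommand {
    static List<String> build(String topology, int workers, Map<String, Integer> executorsByComponent) {
        List<String> command = new ArrayList<>(
                List.of("storm", "rebalance", topology, "-n", String.valueOf(workers)));
        // One "-e component=count" pair per component whose executor count should change.
        for (Map.Entry<String, Integer> entry : executorsByComponent.entrySet()) {
            command.add("-e");
            command.add(entry.getKey() + "=" + entry.getValue());
        }
        return command;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the components in insertion order.
        Map<String, Integer> executors = new LinkedHashMap<>();
        executors.put("blue-spout", 3);
        executors.put("yellow-bolt", 10);
        System.out.println(String.join(" ", build("mytopology", 5, executors)));
        // prints: storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10
    }
}
```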
