The components that run in a Storm cluster and their parallelism

Source: Internet
Author: User

First, the components that run in Storm
Storm's strength is that its computing power scales horizontally across a cluster with little effort: it splits a whole computation into separate tasks that run in parallel on the cluster. In Storm, a task is a spout or bolt instance running in the cluster. To make it easier to see how Storm parallelizes the work we give it, here are the four components involved in running a topology in a cluster:
    • Nodes (machines): the machines in the cluster that work together to run a topology.
    • Workers (JVMs): a worker is a separate JVM process. Each node can be configured to run one or more workers, and a topology can specify how many workers it uses.
    • Executors (threads): an executor is a thread running inside a worker JVM. A worker process can run one or more executor threads. An executor can run multiple tasks; by default, Storm assigns one task to each executor.
    • Tasks (bolt/spout instances): tasks are instances of spouts and bolts, and executor threads run them to do the actual processing.

Second, parallelism in Storm (taking WordCountTopology as an example)
We can adjust the degree of parallelism through configuration; if we set nothing, Storm defaults most parallelism settings to 1. If we leave WordCountTopology unconfigured, it runs as follows: one node assigns a single worker to the topology, and that worker starts one executor thread for each task.
2.1 Adding workers to a topology
One of the simplest ways to increase a topology's computing power is to give it more workers. Storm offers two ways to do so: through the configuration file or in code.
    • Description: how many worker processes are created for the topology in the cluster
    • Configuration option: TOPOLOGY_WORKERS
    • Configure in code:
      • Config#setNumWorkers
Configure workers with the Config object:
    Config config = new Config();
    config.setNumWorkers(2);
Note: in local mode there is only ever one worker JVM process, no matter how many workers are configured.
2.2 Configuring executors and tasks
As said earlier, Storm creates tasks for each topology component, and by default each executor runs exactly one task. A task is an instance of a spout or bolt, and one executor thread can run more than one task. Tasks are what actually process the data: the spouts and bolts we write in code can be thought of as being carried out by tasks distributed across the cluster. The number of tasks of a component normally stays constant while the topology runs, but its number of executors can change. This also means: #threads <= #tasks.
2.2.1 Setting the number of executors (threads)
Set a component's number of executors through its parallelism hint.
    • Description: how many executors are created for each component
    • Configuration option: none (the value is passed as the parallelism hint to setSpout or setBolt)
    • Configure in code:
      • TopologyBuilder#setSpout()
      • TopologyBuilder#setBolt()
      • Note that as of Storm 0.8, the parallelism_hint parameter specifies the initial number of executors (not tasks!) for that bolt.
Below we set the parallelism of SentenceSpout to 2, so this spout component gets two executors, each assigned one task:
    builder.setSpout(SENTENCE_SPOUT_ID, spout, 2);
2.2.2 Setting the number of tasks
The setNumTasks() method specifies the number of tasks for a component.
    • Description: how many tasks are created for each component
    • Configuration option: TOPOLOGY_TASKS
    • Configure in code:
      • ComponentConfigurationDeclarer#setNumTasks()
Below we give SplitSentenceBolt 4 tasks and 2 executors, so each executor thread runs 4/2 = 2 tasks. We then give WordCountBolt 4 executors; with the default of one task per executor it has 4 tasks, each run by its own executor:
    builder.setBolt(SPLIT_BOLT_ID, splitBolt, 2)
           .setNumTasks(4)
           .shuffleGrouping(SENTENCE_SPOUT_ID);
    builder.setBolt(COUNT_BOLT_ID, countBolt, 4)
           .fieldsGrouping(SPLIT_BOLT_ID, new Fields("word"));
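The 4/2 = 2 arithmetic can be made concrete with a small model. The sketch below is plain Java and does not use the Storm API; the class name, the method name and the zero-based task ids are all hypothetical, and the even, contiguous split is an assumption made for illustration rather than a copy of Storm's scheduler.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model, NOT Storm's scheduler: split a component's tasks into
// contiguous groups, one group per executor, as evenly as possible.
class TaskAssignmentSketch {

    static List<List<Integer>> assignTasks(int numTasks, int numExecutors) {
        if (numTasks < 1 || numExecutors < 1) {
            throw new IllegalArgumentException("counts must be positive");
        }
        // An executor without a task would have nothing to run, so the
        // executor count is capped at the task count (#threads <= #tasks).
        int executors = Math.min(numExecutors, numTasks);
        int base = numTasks / executors;   // every executor gets at least this many
        int extra = numTasks % executors;  // the first `extra` executors get one more

        List<List<Integer>> groups = new ArrayList<>();
        int nextTask = 0; // hypothetical zero-based task id
        for (int e = 0; e < executors; e++) {
            int size = base + (e < extra ? 1 : 0);
            List<Integer> group = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                group.add(nextTask++);
            }
            groups.add(group);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Split bolt: 4 tasks on 2 executors
        System.out.println(assignTasks(4, 2)); // prints [[0, 1], [2, 3]]
        // Count bolt: 4 executors with one task each
        System.out.println(assignTasks(4, 4)); // prints [[0], [1], [2], [3]]
    }
}
```

Each inner list stands for one executor thread and the task ids it would run under this model.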
This assumes that 2 workers are allocated to the topology from the start.
Third, an example of a running topology
The following example gives the full picture of a running topology. The topology consists of three components: one spout, BlueSpout, and two bolts, GreenBolt and YellowBolt.
Suppose we configure two worker processes, two spout threads, two GreenBolt threads and six YellowBolt threads. The cluster then runs 2 + 2 + 6 = 10 executor threads, so each of the two worker processes gets 5. Here is the code:
    Config conf = new Config();
    conf.setNumWorkers(2); // use two worker processes

    TopologyBuilder topologyBuilder = new TopologyBuilder();

    topologyBuilder.setSpout("blue-spout", new BlueSpout(), 2); // set parallelism hint to 2

    topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
                   .setNumTasks(4)
                   .shuffleGrouping("blue-spout");

    topologyBuilder.setBolt("yellow-bolt", new YellowBolt(), 6)
                   .shuffleGrouping("green-bolt");

    StormSubmitter.submitTopology("mytopology", conf, topologyBuilder.createTopology());
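The figure of 5 executor threads per worker follows from simple arithmetic, which the sketch below reproduces. It is plain Java rather than Storm code; the class and method names are hypothetical, and it assumes executors are spread across workers as evenly as possible, whereas the actual placement is up to Storm's scheduler.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Back-of-the-envelope arithmetic only; Storm's scheduler decides the real placement.
class ExecutorsPerWorkerSketch {

    // Sum of the parallelism hints of all components.
    static int totalExecutors(Map<String, Integer> parallelismHints) {
        int total = 0;
        for (int hint : parallelismHints.values()) {
            total += hint;
        }
        return total;
    }

    // Executors per worker, assuming they are dealt out round-robin.
    static int[] executorsPerWorker(int totalExecutors, int numWorkers) {
        int[] perWorker = new int[numWorkers];
        for (int i = 0; i < totalExecutors; i++) {
            perWorker[i % numWorkers]++;
        }
        return perWorker;
    }

    public static void main(String[] args) {
        Map<String, Integer> hints = new LinkedHashMap<>();
        hints.put("blue-spout", 2);
        hints.put("green-bolt", 2);
        hints.put("yellow-bolt", 6);

        int total = totalExecutors(hints);
        int[] perWorker = executorsPerWorker(total, 2);
        System.out.println(total);        // prints 10
        System.out.println(perWorker[0]); // prints 5
        System.out.println(perWorker[1]); // prints 5
    }
}
```

Note that the 4 tasks of green-bolt do not change the thread count: its 2 executors simply run 2 tasks each.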
Storm also has a parameter that limits the parallelism of a topology:
    • TOPOLOGY_MAX_TASK_PARALLELISM: sets an upper limit on the number of executors that can be created for a single component. It is typically used in testing to limit the number of threads when a topology runs in local mode. It can also be set in code with Config#setMaxTaskParallelism().
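Read as a plain upper bound, the effect of this limit on a component can be modelled in one line. The sketch below is an assumption-laden illustration in plain Java, not Storm's implementation: the class and method names are hypothetical, and a null limit stands for "option not set".

```java
// Simplified model of an upper bound on a component's executors; not Storm code.
class MaxParallelismSketch {

    // maxParallelism == null means the limit is not configured.
    static int effectiveExecutors(int parallelismHint, Integer maxParallelism) {
        if (maxParallelism == null) {
            return parallelismHint;
        }
        return Math.min(parallelismHint, maxParallelism);
    }

    public static void main(String[] args) {
        System.out.println(effectiveExecutors(6, 3));    // prints 3
        System.out.println(effectiveExecutors(6, null)); // prints 6
        System.out.println(effectiveExecutors(2, 3));    // prints 2
    }
}
```

Under this reading, a local-mode test with a limit of 3 would run yellow-bolt on at most 3 threads even though its parallelism hint is 6.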
Fourth, how to change the parallelism of a running topology
A very useful feature of Storm is that the number of worker processes or executor threads can be changed while a topology is running, without restarting it. This mechanism is called rebalancing. There are two ways to rebalance a topology:
    1. Rebalance through the Storm web UI
    2. Rebalance with the storm CLI tool
Here is an example that uses the CLI tool:

# Reconfigure the topology "mytopology" to use 5 worker processes,
# the spout "blue-spout" to use 3 executors and
# the bolt "yellow-bolt" to use 10 executors.

$ storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10
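When rebalancing is triggered from a program or script, the command line has to be assembled from the desired settings. The sketch below builds the argument list for the rebalance command shown above, using only the -n and -e flags that appear in it. It is plain Java with hypothetical class and method names, and it only builds the arguments; it does not run storm.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Builds the argument list for `storm rebalance`; it does not execute anything.
class RebalanceCommandSketch {

    static List<String> rebalanceCommand(String topologyName,
                                         int numWorkers,
                                         Map<String, Integer> executorsPerComponent) {
        List<String> command = new ArrayList<>();
        command.add("storm");
        command.add("rebalance");
        command.add(topologyName);
        command.add("-n"); // new number of worker processes
        command.add(String.valueOf(numWorkers));
        for (Map.Entry<String, Integer> entry : executorsPerComponent.entrySet()) {
            command.add("-e"); // new executor count for one component
            command.add(entry.getKey() + "=" + entry.getValue());
        }
        return command;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the components in insertion order.
        Map<String, Integer> executors = new LinkedHashMap<>();
        executors.put("blue-spout", 3);
        executors.put("yellow-bolt", 10);

        List<String> command = rebalanceCommand("mytopology", 5, executors);
        // prints: storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10
        System.out.println(String.join(" ", command));
    }
}
```

The returned list has the shape that java.lang.ProcessBuilder accepts, so it could be handed to new ProcessBuilder(command) on a machine where the storm client is installed.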













