Tasks & Executors Relation
Q1. However, I'm a bit confused by the concept of a "task". Is a task a running instance of the component (spout or bolt)? Does an executor having multiple tasks mean that the same component is executed multiple times by the executor?
A1: Yes, and yes.
A task is just an instance of a component (spout or bolt). During execution, the executor thread calls the task's nextTuple method (for a spout) or execute method (for a bolt).
Q2. Moreover, in a general parallelism sense, Storm spawns a dedicated thread (executor) for a spout or bolt, but what does an executor (thread) having multiple tasks contribute to the parallelism?
A2: Running more than one task per executor does not increase the level of parallelism. An executor always has one thread that it uses for all of its tasks, which means that tasks run serially on an executor.
In other words, running multiple tasks does not increase the degree of parallelism, because an executor is just a single thread that executes all of its tasks sequentially.
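The serial behaviour described above can be illustrated with a small plain-Java sketch. It does not use the Storm API: SketchTask and SketchExecutor are hypothetical names invented for this illustration, and the real executor loop in Storm is more involved. The point is only that one thread visits each of its tasks in turn.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a component instance (a "task").
// Real bolts implement Storm's interfaces; this only models the idea.
class SketchTask {
    final int taskId;
    final List<String> log;

    SketchTask(int taskId, List<String> log) {
        this.taskId = taskId;
        this.log = log;
    }

    // Plays the role of execute(): records which thread handled the tuple.
    void execute(String tuple) {
        log.add("task-" + taskId + ":" + tuple + "@" + Thread.currentThread().getName());
    }
}

// Hypothetical executor: a single thread that serves all of its tasks in turn.
class SketchExecutor {
    static List<String> runOnce(int numTasks, String tuple) {
        List<String> log = new ArrayList<>();
        List<SketchTask> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(new SketchTask(i, log));
        }
        // One thread, many tasks: the tasks are called one after another.
        Thread executor = new Thread(() -> {
            for (SketchTask t : tasks) {
                t.execute(tuple);
            }
        }, "executor-0");
        executor.start();
        try {
            executor.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        for (String line : runOnce(3, "hello")) {
            System.out.println(line);
        }
    }
}
```

Every log entry carries the same thread name, so adding tasks to this executor adds work to the one thread rather than adding concurrency.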
- The number of executor threads can be changed after the topology has been started (see the storm rebalance command).
- The number of tasks of a topology is static.
And by definition, there is the invariant #executors <= #tasks.
The number of tasks for a topology is fixed, but the number of executors (threads) can be changed dynamically. The number of executors never exceeds the number of tasks.
So one reason for having 2+ tasks per executor is to give you the flexibility to expand/scale up the topology through the storm rebalance command in the future without taking the topology offline. For instance, imagine you start out with a Storm cluster of a certain number of machines but already know that next week another ten boxes will be added. Here you could opt for running the topology at the parallelism level anticipated for the enlarged cluster already on the initial boxes (which is, of course, slower than running on the full set of machines). Once the additional boxes are integrated, you can then storm rebalance the topology to make use of all boxes without any downtime.
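To make the effect of rebalancing concrete, here is a minimal plain-Java sketch of a fixed set of tasks being spread over a changing number of executors. The class SketchRebalance, the round-robin rule, and the numbers 12, 4 and 6 are assumptions chosen for illustration; Storm's actual scheduler may distribute tasks differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of spreading a fixed set of tasks over executors.
// It only demonstrates the invariant #executors <= #tasks.
class SketchRebalance {
    // Returns, for each executor, the list of task ids it runs.
    static List<List<Integer>> assign(int numTasks, int numExecutors) {
        if (numExecutors < 1 || numExecutors > numTasks) {
            throw new IllegalArgumentException("need 1 <= #executors <= #tasks");
        }
        List<List<Integer>> executors = new ArrayList<>();
        for (int e = 0; e < numExecutors; e++) {
            executors.add(new ArrayList<>());
        }
        // Round-robin: task i goes to executor i mod numExecutors.
        for (int task = 0; task < numTasks; task++) {
            executors.get(task % numExecutors).add(task);
        }
        return executors;
    }

    public static void main(String[] args) {
        int numTasks = 12;                       // fixed for the topology's lifetime
        System.out.println(assign(numTasks, 4)); // before rebalancing: 3 tasks per executor
        System.out.println(assign(numTasks, 6)); // after rebalancing: 2 tasks per executor
    }
}
```

In this model the task count never changes. Only the number of threads sharing the tasks does, which is why starting with more tasks than executors leaves room to scale up later.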
Another reason to run 2+ tasks per executor is for (primarily functional) testing. For instance, if your dev machine or CI server is only powerful enough to run, say, 2 executors alongside all the other stuff running on the machine, you can still run 30 tasks (here: 15 per executor) to see whether code such as your custom Storm grouping is working as expected.
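As a rough illustration of why the task count matters for such a test, the sketch below routes tuple keys to one of 30 target tasks. SketchGrouping and chooseTask are hypothetical names, and the hash-based rule is an assumption. This is not Storm's actual grouping API, only the kind of routing logic one might want to exercise.

```java
// Hypothetical grouping rule: route a tuple key to one of numTasks target tasks.
class SketchGrouping {
    static int chooseTask(String key, int numTasks) {
        // floorMod keeps the result in [0, numTasks) even for negative hash codes.
        return Math.floorMod(key.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 30; // e.g. 2 executors with 15 tasks each
        for (String key : new String[] {"alpha", "beta", "alpha"}) {
            System.out.println(key + " -> task " + chooseTask(key, numTasks));
        }
    }
}
```

The routing depends only on the number of tasks, not on how many executor threads run them, so two executors are enough to check it against the full set of 30 tasks.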
An executor usually runs 2+ tasks in one of two cases:
- To keep the topology flexible, so that its parallelism can be scaled up while it is running
- For functional testing
In practice, we normally run 1 task per executor.
PS: Note that Storm will actually spawn a few more threads behind the scenes. For instance, each executor has its own "send thread", which is responsible for handling outgoing tuples. There are also "system-level" background threads, e.g. for acking tuples, that run alongside "your" threads. IIRC the Storm UI counts those acking threads in addition to "your" threads.
In other words, the number of executors usually equals the number of tasks.
Reference
http://stackoverflow.com/questions/17257448/what-is-the-task-in-storm-parallelism
http://www.cnblogs.com/yufengof/p/storm-worker-executor-task.html
http://storm.apache.org/releases/0.9.6/Understanding-the-parallelism-of-a-Storm-topology.html
Understanding of concurrency of [Storm]