Comparison of worker, executor and task in the distributed computing system Storm (http://www.111cn.net/sys/linux/96715.htm)
Overview of the three schedulers
EvenScheduler: distributes the available resources in the system evenly among the topologies that need them. The distribution is not perfectly uniform, as explained in detail below.
DefaultScheduler: similar to EvenScheduler, but it first reclaims the resources that other topologies no longer need and then calls EvenScheduler.
IsolationScheduler: lets users define the machine resources reserved for given topologies. When Storm allocates resources, these topologies are handled first, and the machines assigned to a topology serve only that topology.
DefaultScheduler
Call the cluster's needsSchedulingTopologies method to obtain the topologies that need task assignment
Start processing each topology separately
Call the cluster's getAvailableSlots method to get the slots currently available in the cluster, returned as a collection of <node, port> pairs, and store the result in available-slots
Get the executor information of the current topology, convert it to a collection of <start-task-id, end-task-id> pairs and store it in all-executors. The executors are computed from the topology information by the compute-executors algorithm, which will be explained later
Then call EvenScheduler's get-alive-assigned-node+port->executors method to get the resources the topology already holds, returned as a collection of <node+port, executors> entries and stored in alive-assigned. Why compute the resources allocated to the current topology instead of all the allocated resources in the cluster? My guess is that this is useful when rebalancing tasks.
Then call slots-can-reassign to check the slots in alive-assigned and store those that can be reassigned in can-reassigned
The resources available to the topology therefore consist of available-slots plus can-reassigned.
Next, calculate the total number of slots the current topology can use: total-slots-to-use = min(number of workers configured for the topology, number of available slots computed above)
If total-slots-to-use is greater than the number of slots currently allocated, the bad-slots method is invoked to compute the slots that can be freed
Call the cluster's freeSlots method to release the computed bad slots
Finally, call EvenScheduler's schedule-topologies-evenly to perform the allocation
Continue to the next topology
Summary of the main flow: get the cluster's currently idle resources -> compute the current topology's executor information (used during allocation) -> compute the resources that can be reallocated or released -> allocate
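The slot budget in the steps above can be sketched as follows. This is a minimal Python model, not Storm's own code: the function names and the tuple representation of slots are assumptions made for illustration, and the sample cluster is hypothetical. It assumes the budget is the topology's worker count capped by the free plus reassignable slots, and that bad slots are computed only when that budget exceeds the number of slots the topology already holds.

```python
def total_slots_to_use(num_workers, available_slots, can_reassign_slots):
    """Slots the topology may occupy: its configured worker count,
    capped by what the cluster can offer (free + reassignable slots)."""
    return min(num_workers, len(available_slots) + len(can_reassign_slots))


def needs_more_slots(total_to_use, alive_assigned):
    """True when the topology could use more slots than it currently holds,
    the condition under which bad slots are computed and freed."""
    return total_to_use > len(alive_assigned)


# Hypothetical cluster: 2 free slots, and the topology already holds 1 slot.
available = [("S1", 6701), ("S2", 6700)]
alive_assigned = {("S1", 6700): [(1, 2), (3, 4)]}
can_reassign = list(alive_assigned.keys())

budget = total_slots_to_use(3, available, can_reassign)
print(budget)                                    # 3
print(needs_more_slots(budget, alive_assigned))  # True
```

With 3 requested workers and 3 slots on offer, the budget is 3, which is more than the single slot already held, so a reallocation would be considered.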
EvenScheduler
Compared with DefaultScheduler, the EvenScheduler algorithm lacks only the step that computes which already-allocated resources can be reallocated. It distributes directly from the supervisors' idle slots, so it is not described in detail here.
EvenScheduler and DefaultScheduler scheduling examples:
In general these two scheduling mechanisms produce essentially the same results, so they are examined together:
Cluster initial state
Next we submit 3 topologies:
Topology | Worker number | Executor number | Task number
T-1 | 3 | 8 | 16
T-2 | 5 | 10 | 10
T-3 | 3 | 5 | 10
1. Submit T-1
The sort-slots algorithm processes the available slots, with the result {[S1 6700] [S2 6700] [S3 6700] [S4 6700] [S1 6701] [S2 6701] [S3 6701] [S4 6701] [S1 6702] [S2 6702] [S3 6702] [S4 6702] [S1 6703] [S2 6703] [S3 6703] [S4 6703]}
After the compute-executors algorithm runs, the executor list is: {[1 2] [3 4] [5 6] [7 8] [9 10] [11 12] [13 14] [15 16]}. Note: the format is [start-task-id end-task-id]. There are 8 executors in total; the first contains 2 tasks, with start-task-id 1 and end-task-id 2, so it is recorded as [1 2], and so on. The compute-executors algorithm will be detailed in the next blog post
The distribution of the 8 executors over the 3 workers is [3,3,2]
The results of the assignment are:
{[1 2] [3 4] [5 6]}-> [S1 6700]
{[7 8] [9 10] [11 12]}-> [S2 6700]
{[13 14] [15 16]}-> [S3 6700]
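The slot ordering used for T-1 can be reproduced with a short sketch. This is an illustrative Python model written to match the list shown above, not Storm's own sort-slots code: it assumes slots are grouped by supervisor and then taken one per supervisor in turn. The function name `sort_slots` and the (node, port) tuples are choices made for this example.

```python
from collections import OrderedDict


def sort_slots(slots):
    """Group slots by supervisor, then take one slot from each group in turn."""
    groups = OrderedDict()
    for node, port in slots:
        groups.setdefault(node, []).append(port)

    ordered = []
    round_index = 0
    # Keep cycling over the supervisors until every group is exhausted.
    while any(round_index < len(ports) for ports in groups.values()):
        for node, ports in groups.items():
            if round_index < len(ports):
                ordered.append((node, ports[round_index]))
        round_index += 1
    return ordered


# Initial cluster from the example: 4 supervisors, ports 6700-6703, all free.
free = [(node, port) for node in ("S1", "S2", "S3", "S4")
        for port in range(6700, 6704)]
ordered = sort_slots(free)

# T-1 asks for 3 workers, so it takes the first 3 slots of the ordered list.
print(ordered[:3])  # [('S1', 6700), ('S2', 6700), ('S3', 6700)]
```

Because the order walks across supervisors before moving to the next port, the three workers of T-1 land on three different machines.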
After allocation, the cluster status is:
2. Submit T-2
Available slots after sort-slots: {[S1 6701] [S2 6701] [S3 6701] [S4 6700] [S1 6702] [S2 6702] [S3 6702] [S4 6701] [S1 6703] [S2 6703] [S3 6703] [S4 6702] [S4 6703]}
Executor list computed by compute-executors: {[1 1] [2 2] [3 3] [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10]}
The distribution of the 10 executors over the 5 workers is [2,2,2,2,2]
The results of the assignment are:
{[1 1] [2 2]}-> [S1 6701]
{[3 3] [4 4]}-> [S2 6701]
{[5 5] [6 6]}-> [S3 6701]
{[7 7] [8 8]}-> [S4 6700]
{[9 9] [10 10]}-> [S1 6702]
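The executor lists of T-1 and T-2 can be reproduced with a small sketch. The article defers the real compute-executors algorithm to a later post, so this is only a stand-in under stated assumptions: task ids start at 1, the topology is treated as a single block of tasks, and the tasks are split into contiguous ranges of near-equal size. The function name `compute_executors` and the rule that earlier ranges absorb any remainder are assumptions for this example.

```python
def compute_executors(num_tasks, num_executors):
    """Split task ids 1..num_tasks into num_executors contiguous
    (start-task-id, end-task-id) ranges of near-equal size."""
    base, extra = divmod(num_tasks, num_executors)
    ranges = []
    start = 1
    for i in range(num_executors):
        # Assumption: the first `extra` executors take one additional task.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


t1 = compute_executors(16, 8)   # T-1: 16 tasks on 8 executors
t2 = compute_executors(10, 10)  # T-2: 10 tasks on 10 executors
print(t1)
print(t2)
```

For T-1 every executor covers two tasks, and for T-2 every executor covers exactly one, which matches the [1 1] ... [10 10] list above.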
After allocation, the cluster status is:
3. Submit T-3
Slot list after sort-slots: {[S1 6703] [S2 6702] [S3 6702] [S4 6701] [S2 6703] [S3 6703] [S4 6702] [S4 6703]}
Executor list after compute-executors: {[1 2] [3 4] [5 6] [7 8] [9 10]}
The distribution of the 5 executors over the 3 workers is [2,2,1]
The results of the assignment are:
{[1 2] [3 4]}-> [S1 6703]
{[5 6] [7 8]}-> [S2 6702]
{[9 10]}-> [S3 6702]
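The per-worker distributions quoted in the three examples ([3,3,2], [2,2,2,2,2] and [2,2,1]) follow from a simple integer division, sketched below. The contiguous grouping of executors mirrors the assignment listings in this article; it is an assumption of this sketch, and the order in which Storm actually pairs executors with slots may differ. The names `distribute` and `group_executors` are made up for illustration.

```python
def distribute(num_executors, num_workers):
    """Executors per worker, larger shares first, e.g. 8 over 3 -> [3, 3, 2]."""
    base, extra = divmod(num_executors, num_workers)
    return [base + 1] * extra + [base] * (num_workers - extra)


def group_executors(executors, num_workers):
    """Cut the executor list into contiguous groups, one group per worker."""
    groups, start = [], 0
    for size in distribute(len(executors), num_workers):
        groups.append(executors[start:start + size])
        start += size
    return groups


print(distribute(8, 3))   # [3, 3, 2]
print(distribute(10, 5))  # [2, 2, 2, 2, 2]
print(distribute(5, 3))   # [2, 2, 1]

# T-3: 5 executors grouped for 3 workers.
t3_executors = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
t3_groups = group_executors(t3_executors, 3)
print(t3_groups)
```

Pairing each group with the first three slots of the sorted slot list gives the T-3 assignment listed above.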
After allocation, the cluster status is:
As the figure shows, this scheduling method is not perfectly uniform: S1 is already running at full load, while S4 uses only one slot.
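The imbalance can be checked by replaying the three submissions in a few lines. This is a simplified simulation, not Storm code: it assumes 4 supervisors with ports 6700-6703, that each topology takes the first slots of the interleaved free list, and that no slots are freed between submissions. All names are illustrative.

```python
from collections import Counter
from itertools import zip_longest


def sort_slots(free):
    """Interleave free slots across supervisors (one per supervisor per round)."""
    by_node = {}
    for node, port in free:
        by_node.setdefault(node, []).append((node, port))
    rounds = zip_longest(*by_node.values())
    return [slot for rnd in rounds for slot in rnd if slot is not None]


# 4 supervisors with ports 6700-6703, as in the example.
free = [(n, p) for n in ("S1", "S2", "S3", "S4") for p in range(6700, 6704)]
used = Counter()

# Worker counts of T-1, T-2 and T-3, submitted in that order.
for workers in (3, 5, 3):
    taken = sort_slots(free)[:workers]
    for node, _ in taken:
        used[node] += 1
    free = [slot for slot in free if slot not in taken]

print(dict(used))  # {'S1': 4, 'S2': 3, 'S3': 3, 'S4': 1}
```

S1 ends up with all 4 of its slots occupied while S4 has only 1 in use, because every submission restarts the interleaving from the first supervisor that still has free slots.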