Background
Crawler system: one machine is the control server and the other 100 are crawlers. The server distributes collection tasks on a regular schedule every day.
Problem: Because the collection workload is large, we are going to add 10 crawlers. We want to keep the original task allocation rules unchanged while also giving priority to machines with fewer tasks and distributing tasks among them as evenly as possible. This ensures that every machine has tasks and none sits idle.
Solution
Solution 1: Compute task_sum, the total number of tasks already assigned across all machines, add the number of tasks to be allocated, dis_num, and divide by the number of machines. The result, avg_task_num, is the average number of tasks each machine should hold after the allocation. A machine whose assigned count is less than avg_task_num is allocated avg_task_num minus its assigned count. As long as tasks remain, machines with fewer tasks are served first.
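The rule described in Solution 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name allocate_by_average is hypothetical, the average is rounded up so that rounding never leaves tasks undistributed, and ties between machines are broken by their position in the input list.

```python
def allocate_by_average(assigned, dis_num):
    """Solution 1 sketch: top each machine up to the post-allocation average.

    assigned: tasks each machine already holds.
    dis_num:  tasks to distribute in this round.
    Returns the number of new tasks given to each machine.
    """
    n = len(assigned)
    task_sum = sum(assigned)
    # Ceiling division (assumption): avoids leaving tasks over due to rounding.
    avg_task_num = -(-(task_sum + dis_num) // n)

    shares = [0] * n
    remaining = dis_num
    # Serve machines with the fewest assigned tasks first.
    # sorted() is stable, so ties keep their input order.
    for i in sorted(range(n), key=lambda i: assigned[i]):
        give = min(max(avg_task_num - assigned[i], 0), remaining)
        shares[i] = give
        remaining -= give
    return shares


print(allocate_by_average([50, 50, 50, 0, 0], 200))
```

With five machines holding 50, 50, 50, 0 and 0 tasks and 200 tasks to distribute, every machine ends the round at 70 tasks.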
Analysis: At first glance this solution looks suitable. Tasks are spread out sensibly, the amounts are computed reasonably, and tasks are not concentrated on a single machine.
However, if there are too few tasks, they cannot be distributed evenly. Suppose five machines hold [50, 50, 50, 0, 0] tasks, where machines 4 and 5 are newly added to the crawler system. The assigned total task_sum is 150. For dis_num values of 200, 100, 50 and 10, the number of tasks allocated to each machine is as follows:
| dis_num (tasks to be assigned) | avg_task_num | Machine 1 | Machine 2 | Machine 3 | Machine 4 | Machine 5 |
| --- | --- | --- | --- | --- | --- | --- |
| 200 | 70 | 50 + 20 | 50 + 20 | 50 + 20 | 0 + 70 | 0 + 70 |
| 100 | 50 | 50 + 0 | 50 + 0 | 50 + 0 | 0 + 50 | 0 + 50 |
| 50 | 40 | 50 + 0 | 50 + 0 | 50 + 0 | 0 + 40 | 0 + 10 |
| 10 | 32 | 50 + 0 | 50 + 0 | 50 + 0 | 0 + 10 | 0 + 0 |
In the table above, each machine's task count is written as A + B, where A is the number of tasks already assigned to the machine and B is the number actually allocated in this round. When dis_num is 200 or 100, the scheme gives priority to the emptier machines and allocates evenly. When dis_num is 50, it still gives priority to machines 4 and 5, which have no tasks, but the amounts are skewed (40 versus 10). When dis_num is 10, machine 5 receives 0 tasks because of the allocation order. This is not the expected result: with 10 tasks to allocate, we would like machines 4 and 5 to receive five tasks each so that neither sits idle. How can this problem be solved?
At this point the natural idea is to add a conditional check: if some machines have 0 tasks, serve them first and split the tasks evenly among them. But if there are plenty of tasks to assign, how do we decide how many to give the zero-task machines first? If we give them too many, the other machines end up with comparatively few and the crawler load is unbalanced. If we give them too few, do we run a second allocation pass?
This approach always seems to have some flaw. Turned into code, it would end up needing extra if statements, and any case that is not fully thought through could pose a serious risk to task scheduling.
Solution 2: We can reframe the problem:
Figure 1: ideal task allocation under normal circumstances
Figure 2: ideal task allocation with newly added machines
We compare task allocation to pouring water into connected buckets. Assume the buckets are connected (ideally, connected at every height) and each bucket holds more or fewer ice cubes (ideal ice cubes that do not float and whose volume, like that of the water, stays fixed). When water is added, it always settles into the lowest positions first (picture it: people climb upward, water flows downward!).
If we regard each bucket as the task pool of a crawler machine and the ice as the tasks already assigned to that machine, then adding water corresponds to allocating tasks, and the water level after filling corresponds to the minimum number of tasks any machine holds once the allocation is done. I call this level min_line.
For the allocation problem above, if we can compute this minimum level (min_line) before allocating, the desired allocation is easy to implement: each machine receives the difference between min_line and its number of assigned tasks, and if the difference is negative it receives nothing. Can min_line be computed? Of course it can.
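Once min_line is known, the per-machine share follows directly from the rule just stated. A minimal Python sketch, assuming a hypothetical helper named shares_for_level and a min_line value supplied by hand:

```python
def shares_for_level(assigned, min_line):
    """Give each machine the gap between min_line and its assigned count.

    A machine already at or above min_line has a zero or negative gap
    and receives nothing.
    """
    return [max(min_line - a, 0) for a in assigned]


# Three loaded machines and two empty ones, filled up to a level of 5.
print(shares_for_level([50, 50, 50, 0, 0], 5))
```

Here the three machines holding 50 tasks sit above the level and get nothing, while each empty machine is filled to 5.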
Computing min_line
total_sum: sum of the tasks already assigned on the machines participating in the allocation (all machines are assumed to participate at first, so it is initialized to the sum over all machines)
dis_num: number of tasks to be allocated
t_num: number of machines actually participating in the allocation (all machines are assumed to participate at first, so it is initialized to the total number of machines)
avg_num: number of tasks each participating machine holds after the allocation is complete
Ideas:
1. min_line is simply avg_num = (total_sum + dis_num) / t_num, computed over the machines that actually participate in the allocation. Start by assuming that all machines participate.
2. Check whether every participating machine's assigned count is no greater than avg_num: traverse the machines and compare each one's assigned count with avg_num. If a machine's assigned count is greater than avg_num, that machine certainly takes no part in the final allocation, so remove it (subtract its tasks from total_sum and decrement t_num). If a traversal removes at least one machine, recompute avg_num and repeat this step; otherwise min_line = avg_num.
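The two steps above can be sketched in Python as follows. This is a sketch under stated assumptions, not a definitive implementation: the function names compute_min_line and allocate are hypothetical, task counts are integers, and any remainder that does not divide evenly is handed out one task at a time to the least-loaded machines (a detail the description above leaves open).

```python
def compute_min_line(assigned, dis_num):
    """Return (min_line, leftover) for the water-filling allocation.

    assigned: tasks each machine already holds (non-empty list of ints).
    dis_num:  tasks to distribute.
    leftover: tasks that do not divide evenly among participating machines.
    """
    if not assigned:
        raise ValueError("at least one machine is required")
    active = list(assigned)  # machines still participating
    while True:
        total_sum = sum(active)
        t_num = len(active)
        avg_num, leftover = divmod(total_sum + dis_num, t_num)
        # Machines already above the candidate level receive nothing: drop them.
        remaining = [a for a in active if a <= avg_num]
        if len(remaining) == t_num:
            return avg_num, leftover  # nobody removed, so min_line = avg_num
        active = remaining  # recompute avg_num over the smaller set


def allocate(assigned, dis_num):
    """Number of new tasks per machine, filling up to min_line."""
    min_line, leftover = compute_min_line(assigned, dis_num)
    shares = [max(min_line - a, 0) for a in assigned]
    # Assumption: the remainder goes, one task each, to the least-loaded machines.
    order = sorted(range(len(assigned)), key=lambda i: assigned[i])
    for i in order[:leftover]:
        shares[i] += 1
    return shares


for dis_num in (200, 100, 50, 10):
    print(dis_num, allocate([50, 50, 50, 0, 0], dis_num))
```

For five machines holding [50, 50, 50, 0, 0] and 10 tasks to distribute, the first pass gives avg_num = 32, the three machines holding 50 are removed, and the second pass gives min_line = 5, so each empty machine receives five tasks.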
Summary
I am simply exploring a rule that schedules tasks better, letting them flow like water so that tasks are truly allocated with priority and evenly (at least in theory)!