This pattern consolidates multiple different tasks and operations into a single computational unit, allowing cloud applications to increase resource utilization, reduce administrative overhead, and cut the cost of communication between tasks.
(Translator's note on "computational unit": here a unit can be thought of as a logical running container. It may be any kind of cgroup-based container, or an executor such as a worker in Storm or an executor in Spark. Beyond serving as a reference when designing a distributed system, this pattern is also useful when configuring the scheduling mechanisms of frameworks like Storm or Spark to optimize the whole computational framework; Storm, for example, supports custom schedulers.)

Context and Problem
A cloud application typically performs a variety of computing operations. In some solutions it makes sense, early in the design, to follow the separation-of-concerns principle and run each operation in its own independently deployed computational unit. Although this strategy simplifies the business logic, deploying a large number of computational units within the same cloud application can significantly increase runtime overhead and the complexity of system administration.
Consider, as an example, a cloud application architecture with multiple computational units, where each feature is implemented as a separate task running in its own independent unit inside its own virtual environment.
Each unit consumes resources even when it is idle or under low load, so this is not the most cost-effective arrangement.

Solution
To reduce system overhead, improve resource utilization and communication speed, and simplify system management, multiple tasks or operations can be consolidated into a single computational unit for execution.
Tasks can be grouped according to the characteristics of the system and the cost of combining them. A common approach is to look for tasks with similar scalability, lifecycle, and resource requirements. Tasks grouped in this way scale as a unit: the elasticity of the cloud platform can then add or remove these consolidated units according to load.
Here is a counter-example showing how a task's scalability determines which tasks should not be grouped together:
Suppose there are two tasks: Task 1 listens for a very rare message on a queue, while Task 2 handles a high volume of network communication events. Task 2 needs elasticity, and the application is likely to add or remove its computational units as network traffic rises and falls. If Task 1 were placed in the same units, scaling out for Task 2 would also multiply the listeners for the rare message, which is pointless. In other words, Task 1 rarely needs to scale, Task 2 scales frequently, and grouping them together means every scaling action wastes resources on Task 1.
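As an illustration, the grouping idea can be sketched in code. The task records and profile labels below are hypothetical, invented for this example; the point is simply that tasks sharing a scaling profile and lifecycle land in the same group, so scaling one group never drags unrelated tasks along:

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass(frozen=True)
class Task:
    name: str
    scaling: str    # "static" (rarely scales) or "elastic" (scales with load)
    lifecycle: str  # "short" or "long"

def group_tasks(tasks):
    """Group tasks that share a scaling profile and lifecycle,
    so a unit can scale without duplicating unrelated tasks."""
    groups = defaultdict(list)
    for t in tasks:
        groups[(t.scaling, t.lifecycle)].append(t.name)
    return dict(groups)

tasks = [
    Task("rare-message-listener", "static", "long"),   # Task 1 from the example
    Task("network-event-handler", "elastic", "long"),  # Task 2 from the example
    Task("metrics-flusher", "static", "long"),
]
print(group_tasks(tasks))
# The listener and the network handler land in different groups,
# so scaling the handler's unit never duplicates the listener.
```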
In many cloud environments you can specify the resources allocated to a computational unit, such as CPU cores, memory, and disk space. In general, the more resources a unit is allocated, the more it costs. For efficiency, a unit provisioned with a large amount of resources should be kept busy for as long as possible: assign tasks to it rather than leaving expensive capacity idle.
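A minimal sketch of this packing idea, assuming units are sized only by CPU cores (the task names and capacities are made up): greedy first-fit places each task into the first unit that still has room, opening a new unit only when none fits, so provisioned units stay as full as possible:

```python
def pack_tasks(task_cores, unit_capacity):
    """Greedy first-fit: place each (name, cores) task into the first
    unit with spare capacity; open a new unit only when none fits."""
    units = []  # each unit is [remaining_capacity, [task names]]
    for name, cores in task_cores:
        for unit in units:
            if cores <= unit[0]:
                unit[0] -= cores
                unit[1].append(name)
                break
        else:  # no existing unit had room
            units.append([unit_capacity - cores, [name]])
    return [names for _, names in units]

demo = [("encode", 4), ("resize", 2), ("index", 2), ("scan", 3)]
print(pack_tasks(demo, 8))
# → [['encode', 'resize', 'index'], ['scan']]
```

First-fit is deliberately simple; real schedulers weigh several resource dimensions at once, but the goal is the same: fewer, busier units.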
If you have a set of tasks that each need a large amount of CPU in short bursts, consolidate them as much as possible into one unit with enough resources, and schedule them (for example, in batches) so the CPU stays busy while the tasks do not contend for resources when system pressure is high, that is, when existing resources cannot satisfy all tasks at once. In contrast, long-running, compute-intensive tasks should not be placed together in the same computational unit.

Issues and Considerations
Here are a few things to look at when considering how to implement this pattern:
1. Scalability and elasticity: Many cloud platforms achieve scalability and elasticity by dynamically adding and removing computational units, so tasks with different scaling profiles should be placed in separate units (as the earlier example demonstrates).
2. Lifecycle: Some cloud platforms periodically recycle the virtualized environment that hosts a computational unit. When a unit runs long-lived tasks, it generally needs to be configured not to be recycled until those tasks finish. An alternative is check-pointing: when the unit is about to be terminated, the task pauses and records a checkpoint, then resumes from that checkpoint when the unit is restored.
3. Release cadence: If a task's configuration or business code changes frequently, each release requires stopping the computational unit, updating the code and configuration, reloading the task, and restarting the unit. Every other task in the same unit is paused and restarted along with it. Tasks that are updated often should therefore be kept in their own units so their release cycle does not disrupt other tasks.
4. Security: Tasks in the same unit typically run in the same security context and share access to resources. Tasks grouped together must therefore trust one another, and each must be prevented from adversely affecting the others. Moreover, the more tasks a unit contains, the greater the damage if the unit is attacked; the most vulnerable task becomes the weak point for everything in the unit.
5. Fault tolerance: If one task in a unit fails, it can affect the other tasks in the same unit; for example, a poorly designed task that fails during startup can break the startup logic of the entire unit. Fault tolerance must therefore be handled at the computational-unit level so that an individual task failure does not cascade to the other tasks.
6. Resource contention: Avoid grouping tasks that compete for the same resource. Ideally, tasks placed together should have different resource-utilization profiles: two compute-intensive tasks should not share a unit, and likewise two memory-hungry tasks should not, whereas pairing a compute-intensive task with a memory-intensive one is a good combination (contrast this with the bursty-CPU strategy described earlier).
7. Complexity: Mixing several tasks into a single unit inevitably complicates the unit's program logic compared with the traditional one-task-per-unit design, making maintenance, debugging, and testing harder.
8. Keep the logical architecture stable: Design each task so that its logical structure is unlikely to change in the future, insulating the consolidated units from churn.
9. Other strategies: Consolidating computational resources is only one way to reduce the overhead of running tasks concurrently, and it requires careful planning and monitoring to remain effective. Depending on the characteristics of the system and its operating environment, other strategies may sometimes be more useful, so do not feel confined to this pattern.
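The check-pointing approach from consideration 2 above can be sketched as follows. The checkpoint file name and the item-by-item work loop are assumptions made for this example, not part of the pattern itself; the point is that a task interrupted by unit recycling resumes where it left off instead of starting over:

```python
import json
import os

CHECKPOINT = "task.ckpt"  # hypothetical checkpoint file for this sketch

def run_with_checkpoint(total_items, budget):
    """Process up to `budget` items, persisting progress after each one
    so a recycled unit can resume from the last checkpoint."""
    start = 0
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            start = json.load(f)["next_item"]
    end = min(start + budget, total_items)
    for i in range(start, end):
        pass  # do the real work for item i here
        with open(CHECKPOINT, "w") as f:
            json.dump({"next_item": i + 1}, f)
    return end  # index of the next unprocessed item

# First run processes items 0..4; after a simulated unit restart,
# the second run resumes at item 5 instead of starting over.
done = run_with_checkpoint(8, budget=5)
done = run_with_checkpoint(8, budget=5)
os.remove(CHECKPOINT)
print(done)  # → 8
```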
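Consideration 5 can be illustrated with a small sketch: each task in a unit is started under its own error boundary, so one failing task cannot abort the startup of the rest. The task functions are hypothetical stand-ins:

```python
def run_cell(tasks):
    """Run each named task in the unit; a failing task is recorded
    but must not prevent the remaining tasks from starting."""
    results = {}
    for name, task in tasks:
        try:
            results[name] = ("ok", task())
        except Exception as exc:  # isolate the failure at the unit level
            results[name] = ("failed", str(exc))
    return results

def good():
    return 42

def bad():
    raise RuntimeError("bad config")

print(run_cell([("good", good), ("bad", bad), ("also-good", good)]))
# "bad" fails, but both "good" tasks still start and complete.
```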
Cloud Computing Design Patterns Translation (V): Compute Resource Consolidation Pattern