Cloud computing design patterns (17) - Queue-Based Load Leveling pattern
Use a queue that acts as a buffer between a task and a service that it invokes in order to smooth intermittent heavy loads that might otherwise cause the service to fail or the task to time out. This pattern can help to minimize the impact of peaks in demand on the availability and responsiveness of both the task and the service.
Context and problem
Many cloud solutions involve running tasks that invoke services. In this environment, a service that is subjected to intermittent heavy load can suffer performance or reliability problems.
A service might be a component that is part of the same solution as the tasks that use it, or it might be a third-party service providing access to frequently used resources such as a cache or a storage service. If the same service is used by a number of tasks running concurrently, it can be difficult to predict the volume of requests the service will receive at any given point in time.
A service may experience peaks in demand that cause it to become overloaded and unable to respond to requests in a timely manner. Flooding a service with a large number of concurrent requests may also cause it to fail if it cannot handle the contention that these requests create.
Solution
Refactor the solution and introduce a queue between the task and the service. The task and the service run asynchronously. The task posts a message containing the data required by the service to a queue. The queue acts as a buffer, storing the message until it is retrieved by the service. The service retrieves the messages from the queue and processes them. Requests from a number of tasks, which can be generated at a highly variable rate, can all be passed to the service through the same message queue. Figure 1 shows this structure.
Figure 1 - Using a queue to level the load on a service
The queue effectively decouples the tasks from the service, and the service can process messages at its own pace regardless of the volume of requests from concurrent tasks. In addition, a task experiences no delay if the service is unavailable at the time it posts a message to the queue.
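To make the structure in Figure 1 concrete, here is a minimal Python sketch of the idea. It uses the standard library's in-memory queue.Queue as a stand-in for a durable message queue (in a real deployment this would be something like an Azure storage queue), and the task names, burst sizes, and processing delay are illustrative only.

```python
import queue
import threading
import time

message_queue = queue.Queue()            # stand-in for a durable message queue

def task(task_id, burst):
    """A task posts a burst of messages and returns immediately;
    it is not delayed even if the service is busy."""
    for n in range(burst):
        message_queue.put({"task": task_id, "payload": n})

def service_worker():
    """The service drains the queue at its own pace, regardless of how
    quickly the tasks produced the messages."""
    while True:
        message = message_queue.get()
        if message is None:              # sentinel: stop processing
            break
        time.sleep(0.05)                 # simulate steady per-message work
        print("processed", message)
        message_queue.task_done()

worker = threading.Thread(target=service_worker)
worker.start()

# Several tasks generate requests at a highly variable rate...
for i in range(3):
    task(i, burst=10)

# ...but the service only ever handles one message at a time.
message_queue.join()                     # wait for the backlog to drain
message_queue.put(None)
worker.join()
```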
This pattern provides the following benefits:
• It can help to maximize availability, because delays arising in the service do not have an immediate and direct impact on the application, which can continue to post messages to the queue even when the service is unavailable or is not currently processing messages.
• It can help to maximize scalability, because both the number of queues and the number of service instances can be varied to meet demand (a small sketch after this list shows one way to scale out the consuming side).
• It can help to control costs, because the number of service instances deployed only needs to be sufficient to handle the average load rather than the peak load.
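As a rough illustration of the scalability point above, the following sketch (a continuation of the in-memory example, not Azure code) drains a single queue with a configurable number of worker threads; the worker count and the sample requests are arbitrary illustrative values.

```python
import queue
import threading

def start_consumers(message_queue, handler, count):
    """Scale out the service side by draining one queue with several workers."""
    def worker():
        while True:
            message = message_queue.get()
            if message is None:          # sentinel: shut this worker down
                message_queue.task_done()
                break
            handler(message)
            message_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(count)]
    for t in threads:
        t.start()
    return threads

def stop_consumers(message_queue, threads):
    for _ in threads:
        message_queue.put(None)          # one sentinel per worker
    for t in threads:
        t.join()

# Usage: vary the number of consumers to match demand.
backlog = queue.Queue()
workers = start_consumers(backlog, handler=print, count=2)
for i in range(10):
    backlog.put(f"request-{i}")
backlog.join()                           # wait for the backlog to drain
stop_consumers(backlog, workers)
```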
Note:
Some services implement throttling when demand reaches a threshold beyond which the system could fail. Throttling can reduce the functionality available to clients. You may be able to implement load leveling with these services to ensure that this threshold is not reached.
Issues and considerations
Consider the following points when deciding how to implement this pattern:
• It is necessary to implement application logic that controls the rate at which the service processes messages, to avoid overwhelming the target resource. Avoid passing spikes in demand on to the next stage of the system. Test the system under load to ensure that it provides the required leveling, and adjust the number of queues and the number of service instances that process messages to achieve it.
• A message queue is a one-way communication mechanism. If a task expects a reply from the service, it may be necessary to implement a mechanism that the service can use to send a response; a minimal correlation-ID sketch follows this list. For more information, see the Asynchronous Messaging Primer.
• You must be careful if you apply autoscaling to the services that take requests from the monitored queue, because this can result in increased contention for any resources that these services share and can diminish the effectiveness of using the queue to level the load.
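The second point above notes that the queue itself carries no replies. One common approach is to tag each request with a correlation identifier and return the result over a separate response channel. The sketch below shows a minimal version of that idea using the same in-memory primitives; the reply_queues dictionary and the field names are illustrative, not part of any particular messaging product, and in practice the service loop would run in a separate process or role.

```python
import queue
import uuid

request_queue = queue.Queue()
reply_queues = {}                        # correlation id -> per-request reply queue

def send_request(payload):
    """Post a request and return a handle the task can later wait on."""
    correlation_id = str(uuid.uuid4())
    reply_queues[correlation_id] = queue.Queue(maxsize=1)
    request_queue.put({"id": correlation_id, "payload": payload})
    return correlation_id

def service_loop():
    """Service side: process one request and route the reply by correlation id."""
    message = request_queue.get()
    result = message["payload"].upper()  # placeholder for the real work
    reply_queues[message["id"]].put(result)

def wait_for_reply(correlation_id, timeout=5.0):
    return reply_queues.pop(correlation_id).get(timeout=timeout)

# Usage: the task posts a request, the service processes it asynchronously,
# and the task collects the reply when it needs it.
cid = send_request("hello")
service_loop()
print(wait_for_reply(cid))               # -> HELLO
```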
When to use this pattern
This pattern is ideally suited to any type of application that uses services that may be subject to overloading.
This pattern may not be suitable if the application expects a response from the service with minimal latency.
Example
A Microsoft Azure web role stores data using a separate storage service. If a large number of web role instances run concurrently, the storage service may be overwhelmed and unable to respond to requests quickly enough to prevent those requests from timing out or failing. Figure 2 illustrates this problem.
Figure 2 - The service being overwhelmed by a large number of concurrent requests from web role instances
To resolve this problem, you can use a queue to level the load between the web role instances and the storage service. However, the storage service is designed to accept synchronous requests and cannot easily be modified to read messages and manage throughput. Therefore, you can introduce a worker role that acts as a proxy service, receiving requests from the queue and forwarding them to the storage service. The application logic in the worker role can control the rate at which it passes requests on to the storage service, preventing the storage service from being overwhelmed. Figure 3 shows this solution.
Figure 3 - Using a queue and a worker role to level the load between the web role instances and the storage service
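The worker-role proxy described above essentially reduces to a loop that pulls requests from the queue and forwards them to the storage service no faster than a chosen rate. The sketch below shows one way to pace that loop with a fixed per-request interval; forward_to_storage and the limit of 20 requests per second are placeholders, and a real worker role would read from a durable Azure queue rather than an in-memory one.

```python
import queue
import time

MAX_REQUESTS_PER_SECOND = 20             # illustrative ceiling for the storage service

def forward_to_storage(request):
    """Placeholder for the call the proxy makes to the real storage service."""
    print("forwarding", request)

def proxy_loop(request_queue):
    """Worker-role proxy: pull requests from the queue and pass them on at a
    controlled rate so the storage service is never overwhelmed."""
    min_interval = 1.0 / MAX_REQUESTS_PER_SECOND
    while True:
        request = request_queue.get()
        if request is None:              # sentinel: shut the proxy down
            break
        started = time.monotonic()
        forward_to_storage(request)
        # Sleep off whatever remains of this request's time slice.
        elapsed = time.monotonic() - started
        if elapsed < min_interval:
            time.sleep(min_interval - elapsed)

# Usage: web role instances put requests on the queue; the proxy drains it.
requests = queue.Queue()
for i in range(5):
    requests.put({"blob": f"item-{i}"})
requests.put(None)
proxy_loop(requests)
```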
MSDN: http://msdn.microsoft.com/en-us/library/dn589783.aspx