Cloud Computing Design Patterns (23): Throttling Pattern




Control the consumption of resources used by an instance of an application, an individual tenant, or an entire service. This pattern can allow the system to continue to function and meet service level agreements, even when an increase in demand places an extreme load on resources.

Background and issues


The load on a cloud application typically varies over time based on the number of active users or the types of activities they are performing. For example, more users are likely to be active during business hours, or the system may be required to perform computationally expensive analytics at the end of each month. There may also be sudden and unanticipated bursts in activity. If the processing requirements of the system exceed the capacity of the resources that are available, it will suffer from poor performance and may even fail. If the system has to meet an agreed level of service, such failure could be unacceptable.

There are many strategies available for handling varying load in the cloud, depending on the business goals for the application. One strategy is to use autoscaling to match the provisioned resources to the user needs at any given time. This has the potential to consistently meet user demand while optimizing running costs. However, while autoscaling may trigger the provisioning of additional resources, this provisioning is not instantaneous. If demand grows quickly, there may be a window of time during which there is a resource deficit.

Solution


An alternative to autoscaling is to allow applications to use resources only up to a soft limit, and then throttle them when this limit is reached. The system should monitor how it is using resources so that, when usage exceeds a system-defined threshold, it can throttle requests from one or more users. This enables the system to continue functioning and meet any service level agreements (SLAs) that are in place. For more information about monitoring resource usage, see the Instrumentation and Telemetry Guidance.
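As a rough illustration of the monitoring step, the following Python sketch decides when to start and stop throttling from sampled utilization. The class name SoftLimitMonitor and both threshold values are hypothetical. Utilization is assumed to be a single number between 0 and 1 that aggregates whatever metrics the system actually collects.

```python
from dataclasses import dataclass


@dataclass
class SoftLimitMonitor:
    """Decides whether to throttle from sampled resource utilization.

    Hypothetical sketch: utilization is a fraction of capacity (0.0 to 1.0) that
    aggregates whatever metrics (CPU, memory, bandwidth, ...) the system collects.
    """
    soft_limit: float = 0.80    # start throttling at or above this level
    resume_below: float = 0.65  # stop throttling only once usage falls below this
    throttling: bool = False

    def observe(self, utilization: float) -> bool:
        """Record one utilization sample and return the resulting throttling state."""
        if not self.throttling and utilization >= self.soft_limit:
            self.throttling = True
        elif self.throttling and utilization < self.resume_below:
            self.throttling = False
        return self.throttling


monitor = SoftLimitMonitor()
samples = [0.50, 0.78, 0.83, 0.75, 0.70, 0.60]
print([monitor.observe(u) for u in samples])
# [False, False, True, True, True, False]
```

The resume threshold is set lower than the soft limit. This lets the system return to normal once load eases without switching between throttled and unthrottled on every sample that hovers near the limit.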

The system can implement a variety of throttling strategies, including:
• Rejecting requests from an individual user who has already accessed the system APIs more than n times per second over a given period of time. This requires the system to meter the use of resources for each tenant or user running the application. For more information, see the Service Metering Guidance.
• Disabling or degrading the functionality of selected nonessential services so that essential services can run unimpeded with sufficient resources. For example, if the application is streaming video output, it could switch to a lower resolution.
• Using load leveling to smooth the volume of activity (this approach is covered in more detail by the Queue-based Load Leveling pattern). In a multi-tenant environment, this approach reduces performance for every tenant. If the system must support a mix of tenants with different SLAs, the work for high-value tenants might be performed immediately, while requests from other tenants are held back and handled once the backlog has eased. The Priority Queue pattern can be used to help implement this approach.
• Deferring operations being performed on behalf of lower-priority applications or tenants. These operations can be suspended or curtailed, with an exception generated to inform the tenant that the system is busy and that the operation should be retried later.
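The load-leveling and deferral strategies can be sketched with a priority queue. Work for high-value tenants is taken first, and requests from other tenants wait in a backlog until there is capacity. This is a minimal in-process sketch built on Python's heapq module. PriorityWorkQueue, the numeric priority levels, and the per-cycle capacity are hypothetical. A real system would more likely use a durable message queue, as described in the Queue-based Load Leveling and Priority Queue patterns.

```python
import heapq
import itertools
from typing import Callable, List, Tuple


class PriorityWorkQueue:
    """Hypothetical in-process sketch: serve high-value tenants first, defer the rest.

    Lower priority numbers are served first; a sequence counter keeps requests that
    share a priority in arrival order (and avoids ever comparing the callables).
    """

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str, Callable[[], str]]] = []
        self._seq = itertools.count()

    def submit(self, priority: int, tenant: str, work: Callable[[], str]) -> None:
        heapq.heappush(self._heap, (priority, next(self._seq), tenant, work))

    def drain(self, capacity: int) -> List[str]:
        """Run at most `capacity` items in this cycle; everything else stays queued."""
        results: List[str] = []
        while self._heap and len(results) < capacity:
            _, _, tenant, work = heapq.heappop(self._heap)
            results.append(f"{tenant}:{work()}")
        return results

    def backlog(self) -> int:
        """Number of deferred requests still waiting for capacity."""
        return len(self._heap)


work_queue = PriorityWorkQueue()
work_queue.submit(2, "standard-tenant", lambda: "report")
work_queue.submit(0, "premium-tenant", lambda: "order")
work_queue.submit(2, "standard-tenant", lambda: "export")
print(work_queue.drain(capacity=2))  # ['premium-tenant:order', 'standard-tenant:report']
print(work_queue.backlog())          # 1
```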

Figure 1 shows an area graph of resource utilization (a combination of memory, CPU, bandwidth, and other factors) against time for applications that are making use of three features. A feature is an area of functionality, such as a component that performs a specific set of tasks, a piece of code that performs a complex calculation, or an element that provides a service such as an in-memory cache. These features are labeled A, B, and C.

Figure 1 - Graph showing resource utilization against time for applications running on behalf of three users

Note:

The area immediately below the line for a feature indicates the resources used by applications when they invoke that feature. For example, the area below the line for Feature A shows the resources used by applications that are using Feature A, and the area between the lines for Feature A and Feature B shows the resources used by applications invoking Feature B. Summing the areas for all features gives the total resource utilization of the system.



The graph in Figure 1 illustrates the effects of deferring operations. Just prior to time T1, the total resources allocated to all applications using these features reach a threshold (the soft limit of resource utilization). At this point, the applications are in danger of exhausting the available resources. In this system, Feature B is less critical than Feature A or Feature C, so it is temporarily disabled and the resources it was using are released. Between times T1 and T2, the applications using Feature A and Feature C continue running as normal. Eventually, the resource use of these two features diminishes to the point where, at time T2, there is sufficient capacity to enable Feature B again.
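The behavior in Figure 1 can be approximated with a small shedding routine. When total usage reaches the soft limit, the least important features are disabled first. A disabled feature is re-enabled only once it fits under the limit again. The Feature type, the importance ranking, and the usage figures below are hypothetical and were chosen to mirror the A/B/C scenario.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Feature:
    name: str
    importance: int   # higher means more critical; the least critical is shed first
    usage: float      # estimated share of capacity the feature uses when enabled
    enabled: bool = True


def rebalance(features: List[Feature], soft_limit: float) -> Dict[str, bool]:
    """Shed or restore features so that enabled usage stays under `soft_limit`."""

    def total() -> float:
        return sum(feat.usage for feat in features if feat.enabled)

    # Shed load: disable the least important features until usage is under the limit.
    for feat in sorted(features, key=lambda x: x.importance):
        if total() < soft_limit:
            break
        feat.enabled = False

    # Restore: most important disabled feature first, stopping at the first that does not fit.
    for feat in sorted(features, key=lambda x: x.importance, reverse=True):
        if feat.enabled:
            continue
        if total() + feat.usage >= soft_limit:
            break
        feat.enabled = True

    return {feat.name: feat.enabled for feat in features}


features = [Feature("A", 3, 0.40), Feature("B", 1, 0.30), Feature("C", 2, 0.25)]
print(rebalance(features, soft_limit=0.90))  # just before T1: {'A': True, 'B': False, 'C': True}
features[0].usage, features[2].usage = 0.30, 0.20  # usage of A and C eases
print(rebalance(features, soft_limit=0.90))  # at T2: {'A': True, 'B': True, 'C': True}
```

Features are restored strictly in order of importance, and restoration stops at the first feature that does not fit. This prevents a less important feature from coming back ahead of a more important one.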

The autoscaling and throttling approaches can also be combined to help keep the applications responsive and within SLAs. If demand is expected to remain high, throttling may provide a temporary solution while the system scales out. Once scaling has completed, the full functionality of the system can be restored.

Figure 2 shows an area graph of the overall resource utilization against time for all applications running in a system, and illustrates how throttling can be combined with autoscaling.

Figure 2 - Graph showing the effects of combining throttling with autoscaling


At time T1, the threshold specifying the soft limit of resource utilization is reached. At this point, the system can start to scale out. However, if the new resources do not become available quickly enough, the existing resources might be exhausted and the system could fail. To prevent this from happening, the system is temporarily throttled, as described earlier. When autoscaling has completed and the additional resources are available, throttling can be relaxed.
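The interaction between throttling and a slow scale-out can be modeled with a simple tick-based simulation. This is a hypothetical model, not a description of any particular autoscaler. Demand and capacity are abstract units, and scale-out adds a fixed scale_step after provisioning_delay ticks. While the new capacity is pending, load above the soft limit is throttled.

```python
from typing import List, Optional, Tuple


def simulate(demand: List[float], capacity: float, soft_limit: float,
             scale_step: float, provisioning_delay: int) -> List[Tuple[int, float, float, bool]]:
    """Hypothetical tick-based model of throttling while a scale-out is in progress.

    Each tick admits demand up to soft_limit * capacity. When demand first exceeds
    that limit, a scale-out of `scale_step` units is requested; it comes online only
    `provisioning_delay` ticks later, and excess demand is throttled meanwhile.
    Returns (tick, capacity, admitted load, throttled) for every tick.
    """
    ready_at: Optional[int] = None  # tick at which pending capacity becomes available
    history = []
    for tick, load in enumerate(demand):
        if ready_at is not None and tick >= ready_at:
            capacity += scale_step  # scale-out has completed
            ready_at = None
        limit = soft_limit * capacity
        throttled = load > limit
        if throttled and ready_at is None:
            ready_at = tick + provisioning_delay  # request more capacity
        history.append((tick, capacity, min(load, limit), throttled))
    return history


log = simulate([60, 70, 90, 95, 95, 95], capacity=100, soft_limit=0.8,
               scale_step=50, provisioning_delay=2)
print([throttled for *_, throttled in log])  # [False, False, True, True, False, False]
```

In this run, demand crosses the soft limit at tick 2. Two ticks of throttling cover the provisioning delay, and throttling is relaxed at tick 4, when the additional capacity comes online.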

Issues and considerations


When deciding how to implement this pattern, you should consider the following points:
• Throttling an application, and the strategy to use, is an architectural decision that impacts the entire design of a system. Throttling should be considered early in the application design process, because it is not easy to add once a system has been implemented.
• Throttling must be performed quickly. The system must be capable of detecting an increase in activity and reacting accordingly. The system must also be able to revert to its original state quickly after the load has eased. This requires the appropriate performance data to be continually captured and monitored.
• If a service needs to temporarily deny a user request, it should return a specific error code so that the client application understands that the refusal to perform an operation is due to throttling. The client application can wait for a period before retrying the request.
• Throttling can be used as an interim measure while a system autoscales. In some cases it may be better simply to throttle, rather than to scale, if a burst in activity is sudden and not expected to be long-lived, because scaling can add considerably to running costs.
• If throttling is being used as a temporary measure while a system autoscales, and resource demands grow very quickly, the system might not be able to continue functioning, even when operating in a throttled mode. If this is not acceptable, consider maintaining larger capacity reserves and configuring more aggressive autoscaling.
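On the client side, the "wait, then retry" behavior might look like the sketch below. ThrottledError is a hypothetical exception that stands in for whatever specific error code the service returns. For HTTP services, 429 (Too Many Requests) is a common choice, but the pattern itself does not mandate a particular code. The retry_after hint, the exponential backoff, and the jitter are assumptions layered on top of the basic advice. Only throttling errors are retried; any other exception propagates immediately.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical exception for the service's throttling error code
    (for HTTP APIs, commonly 429 Too Many Requests)."""

    def __init__(self, retry_after: Optional[float] = None) -> None:
        super().__init__("request throttled")
        self.retry_after = retry_after  # optional server hint, in seconds


def call_with_retry(op: Callable[[], T], max_attempts: int = 5, base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `op`, retrying only when it fails because the request was throttled."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return op()
        except ThrottledError as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttling error to the caller
            # Prefer the server's hint; otherwise back off exponentially.
            delay = err.retry_after if err.retry_after is not None else base_delay * 2 ** attempt
            sleep(delay + random.uniform(0, delay / 10))  # small jitter spreads out retries
    raise RuntimeError("unreachable")


calls = {"count": 0}


def flaky_request() -> str:
    """Hypothetical operation that is throttled twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ThrottledError(retry_after=0.01)
    return "ok"


print(call_with_retry(flaky_request))  # ok
```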

When to use this pattern


Use this pattern:
• To ensure that a system continues to meet service level agreements.
• To prevent a single tenant from monopolizing the resources provided by an application.
• To handle bursts in activity.
• To help cost-optimize a system by limiting the maximum resource levels needed to keep it functioning.

Example


Figure 3 shows how throttling can be implemented in a multi-tenant system. Users from each tenant organization access a cloud-hosted application where they fill out and submit surveys. The application contains instrumentation that monitors the rate at which these users submit requests to the application.

To prevent the users from one tenant from affecting the responsiveness and availability of the application for all other users, the application limits the number of requests per second that the users from any one tenant can submit. Requests that exceed this limit are blocked.
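Here is a minimal sketch of the Figure 3 policy. It assumes a fixed one-second window per tenant: every request is attributed to its tenant, and requests beyond the limit in the current window are blocked. TenantRateLimiter, its limit parameter, and the injectable clock are hypothetical names used for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class TenantRateLimiter:
    """Hypothetical fixed-window limiter: at most `limit` requests per one-second
    window from all users of the same tenant combined."""

    def __init__(self, limit: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.clock = clock  # injectable so the behavior can be tested deterministically
        self._windows: Dict[str, Tuple[int, int]] = {}  # tenant -> (window, count)

    def allow(self, tenant_id: str) -> bool:
        """Return True to accept the request, or False to block it."""
        window = int(self.clock())  # index of the current one-second window
        start, count = self._windows.get(tenant_id, (window, 0))
        if start != window:
            start, count = window, 0  # a new window has begun: reset the counter
        allowed = count < self.limit
        self._windows[tenant_id] = (start, count + 1 if allowed else count)
        return allowed


now = [0.0]
limiter = TenantRateLimiter(limit=3, clock=lambda: now[0])
print([limiter.allow("tenant-a") for _ in range(4)])  # [True, True, True, False]
print(limiter.allow("tenant-b"))  # True: each tenant has its own budget
now[0] = 1.2
print(limiter.allow("tenant-a"))  # True: a new one-second window has started
```

A fixed window is the simplest counter to implement. However, it can admit up to twice the limit in a short span that straddles a window boundary. A sliding window or token bucket smooths this out at the cost of a little more state.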

Figure 3 - Throttling in a multi-tenant application

This article is translated from MSDN: http://msdn.microsoft.com/en-us/library/dn589798.aspx
