A brief introduction to reliable service auto-scaling based on feedback control

Source: Internet
Author: User

Introduction

When we deploy a service to a production environment, we need to decide how many servers to bring online. This is a difficult decision because, for a given traffic load, we usually don't know how many servers are needed. As a result, people provision more (and possibly far too many) servers to be "safe". But servers cost money, so this quickly becomes expensive.

In fact, things are worse than that. Traffic does not stay the same all day. If we provision for peak traffic, most of the servers sit underused most of the time. Especially in a cloud deployment, where server instances can be allocated and released at any time, we can save significantly by running only the instances needed to handle the current load.

One possible way to do this is to use a fixed schedule that specifies (determined in some way) the number of server instances required for each hour of the day. The difficulty with this approach is that a fixed schedule cannot handle random changes: if for some reason today's traffic is 10% higher than yesterday's, the schedule will not provide the additional servers needed to handle the unexpected load. Similarly, if the traffic peak arrives half an hour ahead of schedule, a schedule-based system cannot cope.

Instead of a fixed (time-based) plan, we can also consider a rule-based solution: for any given traffic load, a rule specifies how many server instances to use. This approach is more flexible than a schedule-based one, but it requires us to know how many servers are needed for each level of traffic. And what happens if the nature of the traffic changes, for example if the proportion of long-running queries increases? A rule-based scheme will not be able to respond correctly.

Feedback control is a design pattern that can address all of these challenges. The idea is to monitor some service metric (such as response time) continuously and, whenever its value deviates from the desired value, make an appropriate adjustment (such as adding or removing servers). Because the feedback comes from the actual behavior of the controlled system, it can handle even unforeseen events (such as traffic exceeding all expectations). In addition, in contrast to the rule-based scheme above, feedback control requires little information about the controlled system. The reason is that feedback is self-correcting: because the quality-of-service metric is monitored continuously, any deviation from the desired value is observed and corrected right away, and the process repeats as often as necessary. Simply put: if the response time worsens, the feedback control system activates an additional server instance, and if that still does not help, it adds more. That is the whole process of feedback control.
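The "simply put" description above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `response_time` model (latency proportional to the load per server), the capacity of 10 requests per second per server, the traffic numbers, and the 1-second target are all hypothetical and do not come from a real service.

```python
def response_time(load, servers, capacity=10.0):
    """Toy model of the service (an assumption, not a real measurement):
    latency in seconds grows with the load each server has to carry."""
    return load / (capacity * servers)


def run_loop(loads, servers, target):
    """Observe the metric once per tick; add a server whenever it is too slow."""
    trace = []
    for load in loads:
        measured = response_time(load, servers)
        if measured > target:   # worse than desired: correct it
            servers += 1        # still too slow on the next tick? add another
        trace.append(servers)
    return trace


# Hypothetical traffic: steady at 100 req/s, then an unforeseen jump to 150 req/s.
loads = [100.0] * 10 + [150.0] * 10
trace = run_loop(loads, servers=8, target=1.0)
print(trace)
```

The loop never looks at the traffic itself, only at the measured response time. In this toy run it settles at 10 servers for the first load level and climbs to 15 after the jump, without having been told that the load changed.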

Feedback control has long been a standard method in mechanical and electrical engineering, but it is rarely used as a design concept in software architecture. It is particularly useful in situations where information is incomplete or subject to random change, and it is quite different from the deterministic algorithms typical of computer science. Finally, although the concept of feedback control is simple, deploying an effective controller in a production environment still requires understanding a number of practical "tricks". This article introduces some of the concepts and common difficulties.

The nature of the feedback loop

The following figure shows a basic feedback loop. On the right, we see the controlled system. Its "output" is the relevant quality-of-service metric. The value of this metric is continuously fed back to the controller, which compares it to the desired value supplied as the input on the left (the desired value of the system's output metric is called the "setpoint"). Based on these two inputs (the actual and the desired value of the quality-of-service metric), the controller calculates an appropriate control action for the controlled system. For example, if the actual response time is slower than desired, the control action will consist of activating additional server instances.

The diagram shows the general structure of all feedback loops. Its basic components are the controller and the controlled system. Information about the system's output flows through the return path to the controller, where it is compared to the setpoint. From these two inputs, the controller determines the corresponding control action.

So what does the controller actually do? How does it decide what action should be taken?

To answer this question, we should remember that the main purpose of feedback control is to reduce the deviation between the actual output of the system and its desired value. This deviation is called the "tracking error":

Error = actual value - expected value

To reduce this error, the controller can do anything it deems appropriate. We have complete freedom in designing this algorithm, but we do need some knowledge of the controlled system.

Let's reconsider the data center scenario. We know that increasing the number of servers reduces the average response time. So we can choose a control strategy that increases the number of online servers when the actual response time is worse than desired (and reduces it in the opposite case). But we can do better, because this algorithm looks only at the sign of the error and ignores its size. If the tracking error is large, we should make a bigger adjustment. In fact, the usual practice is to make the control action proportional to the tracking error:
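A minimal sketch of the sign-only strategy may make its limitation concrete. The plant model `response_time`, the capacity of 10 requests per second per server, and all numbers are assumptions chosen for illustration; the error follows the definition Error = actual value - expected value.

```python
def response_time(load, servers, capacity=10.0):
    """Toy model of the service (an assumption): latency in seconds."""
    return load / (capacity * servers)


def sign_only_action(actual, expected):
    """Return +1, -1 or 0 servers, looking only at the sign of the error."""
    error = actual - expected
    if error > 0:
        return 1    # too slow: add one server
    if error < 0:
        return -1   # faster than needed: remove one server
    return 0


def simulate(load, servers, expected, ticks):
    """Run the closed loop and record the server count after every tick."""
    trace = []
    for _ in range(ticks):
        servers += sign_only_action(response_time(load, servers), expected)
        servers = max(1, servers)   # never drop below one instance
        trace.append(servers)
    return trace


sign_trace = simulate(load=125.0, servers=3, expected=1.0, ticks=12)
print(sign_trace)   # [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 12, 13]
```

Starting far from the target, this controller needs nine ticks of one-server steps to get close, because a large error triggers the same correction as a small one. Once it is close, it keeps switching between 12 and 13 servers in this toy run, since the step size never shrinks.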

Action = k * Error

where k is a numeric constant. With this algorithm, large deviations result in large corrective actions, while small deviations result in small corrections. Both are important: big moves quickly reduce large deviations, but it is equally important that the control action becomes smaller when the error is small. Only then will the control loop settle into a steady state. Otherwise, the output will keep oscillating around the desired value, which we usually want to avoid.
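As a rough sketch, the proportional rule can be tried in a simulated loop. Several things here are assumptions rather than part of the article: the toy `response_time` model, the gain k = 2.0, the load of 130 requests per second, and the choice to interpret the action as a change in the server count, rounded to whole servers.

```python
def response_time(load, servers, capacity=10.0):
    """Toy model of the service (an assumption): latency in seconds."""
    return load / (capacity * servers)


def proportional_action(actual, expected, k):
    """Action = k * Error, with Error = actual value - expected value."""
    return k * (actual - expected)


def simulate(load, servers, expected, k, ticks):
    """Run the closed loop; record (servers added, server count) per tick."""
    trace = []
    for _ in range(ticks):
        action = proportional_action(response_time(load, servers), expected, k)
        step = round(action)                # only whole servers can be added
        servers = max(1, servers + step)    # never drop below one instance
        trace.append((step, servers))
    return trace


p_trace = simulate(load=130.0, servers=3, expected=1.0, k=2.0, ticks=6)
print(p_trace[:3])   # [(7, 10), (1, 11), (0, 11)]
```

The corrections shrink as the error shrinks (7 servers, then 1, then 0), so the loop settles instead of chattering. In this toy run it settles at 11 servers with a response time slightly above the target, because rounding to whole servers swallows corrections smaller than half a server.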

As we have said before, we are free to choose the specific algorithm for the feedback controller, but it is usually a good idea to keep it simple. The "magic" of feedback control lies in the loop structure of the information flow, not in a particularly complex controller. In other words, feedback control accepts a more complex system architecture in exchange for a simpler controller.

One thing that must be ensured, however, is that the control action acts in the correct direction. To ensure this, we need to know something about the behavior of the controlled system. This is usually not a problem: we know that more servers mean faster response times, and so on. But it is a key piece of information that we cannot do without.
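The importance of the direction can be illustrated by running the same loop once with a positive and once with a negative gain. This is a sketch under assumptions: the toy `response_time` model, the gains of +2.0 and -2.0, and the load of 130 requests per second are hypothetical values chosen for illustration.

```python
def response_time(load, servers, capacity=10.0):
    """Toy model of the service (an assumption): more servers, lower latency."""
    return load / (capacity * servers)


def simulate(k, load=130.0, servers=8, expected=1.0, ticks=5):
    """Proportional loop: change the server count by round(k * error) per tick."""
    trace = []
    for _ in range(ticks):
        error = response_time(load, servers) - expected
        servers = max(1, servers + round(k * error))   # at least one instance
        trace.append(servers)
    return trace


right_way = simulate(k=2.0)    # slow service leads to more servers
wrong_way = simulate(k=-2.0)   # slow service leads to fewer servers
print(right_way)   # [9, 10, 11, 11, 11]
print(wrong_way)   # [7, 5, 2, 1, 1]
```

With the wrong sign, every correction makes the response time worse, which in turn triggers an even larger correction in the wrong direction, until the toy service is left with a single server. Knowing that more servers reduce response time is what fixes the sign of k.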
