Cloud Design Patterns (18) -- Retry Pattern

Source: Internet
Author: User


Enable an application to handle anticipated, temporary failures when it tries to connect to a service or network resource by transparently retrying an operation that has previously failed, in the expectation that the cause of the failure is transient. This pattern can improve the stability of the application.

Background and issues


An application that communicates with elements running in the cloud must be sensitive to the transient faults that can occur in this environment. Such faults include the momentary loss of network connectivity to components and services, the temporary unavailability of a service, or timeouts that arise when a service is busy.

These faults are typically self-correcting, and if the action that triggered a fault is repeated after a suitable delay it is likely to succeed. For example, a database service that is processing a large number of concurrent requests may implement a throttling strategy that temporarily rejects any further requests until its workload has eased. An application that tries to access the database may fail to connect, but if it tries again after a suitable delay it may succeed.

Solution


In the cloud, transient faults are not uncommon, and an application should be designed to handle them gracefully and transparently, reducing the impact that such faults may have on the business tasks that the application is performing.

If an application detects a failure when it attempts to send a request to a remote service, it can handle the failure by using one of the following strategies:
• If the fault indicates that the failure is not transient or is unlikely to succeed if repeated (for example, an authentication failure caused by providing invalid credentials is unlikely to succeed no matter how many times it is attempted), the application should abort the operation and report a suitable exception.
• If the specific fault reported is unusual or rare, it may have been caused by freak circumstances such as a network packet becoming corrupted while it was being transmitted. In this case, the application can retry the failing request immediately, because the same failure is unlikely to be repeated and the request will probably succeed.
• If the fault is caused by one of the more commonplace connectivity or "busy" failures, the network or service may need a short period while the connectivity issues are rectified or the backlog of work is cleared. The application should wait for a suitable time before retrying the request.
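
The three strategies above can be sketched as a small classifier. This is only an illustration: the RetryAction enum, the FaultClassifier class, and the choice of which exception maps to which strategy are assumptions made for this example, not part of the original article or of any library.

```csharp
using System;
using System.Net;

// Hypothetical names, introduced for illustration only.
public enum RetryAction { Abort, RetryImmediately, RetryAfterDelay }

public static class FaultClassifier
{
    // Maps a failure to one of the three strategies. The mapping itself is
    // an assumption; a real application would tune it to its own services.
    public static RetryAction Classify(Exception ex)
    {
        // Invalid credentials will not fix themselves, so give up at once.
        if (ex is UnauthorizedAccessException)
            return RetryAction.Abort;

        var webException = ex as WebException;
        if (webException != null)
        {
            switch (webException.Status)
            {
                // Treated here as a rare, one-off fault.
                case WebExceptionStatus.ConnectionClosed:
                    return RetryAction.RetryImmediately;
                // A timeout suggests a busy service, so wait before retrying.
                case WebExceptionStatus.Timeout:
                    return RetryAction.RetryAfterDelay;
            }
        }

        // Unknown faults are not assumed to be transient.
        return RetryAction.Abort;
    }
}
```

Defaulting unknown faults to Abort is a deliberately conservative choice; it avoids retrying failures that nobody has confirmed to be transient.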

For the more common transient failures, the period between retries should be chosen so as to spread requests from multiple instances of the application as evenly as possible. This can reduce the chance of a busy service continuing to be overloaded. If many instances of an application continually bombard a service with retry requests, the service may take longer to recover.

If the request still fails, the application can wait for a further period and make another attempt. If necessary, this process can be repeated with increasing delays between retry attempts until some maximum number of requests have been attempted and failed. The delay can be increased incrementally or according to a timing strategy such as exponential back-off, depending on the nature of the failure and the likelihood that it will be corrected during this time.
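
As a rough sketch of such a timing strategy, the helper below doubles the delay ceiling on each attempt, caps it, and then picks a random value below the ceiling so that multiple instances do not retry at the same moment. The randomization (often called "full jitter") is one possible way to spread requests and is an assumption of this example; the RetryDelay class and its parameters are hypothetical names.

```csharp
using System;

// Hypothetical helper, introduced for illustration only.
public static class RetryDelay
{
    // Computes the wait before retry number `attempt` (1-based).
    // The ceiling doubles with each attempt and is capped at maxDelay;
    // the returned delay is a random value between zero and that ceiling.
    public static TimeSpan Exponential(
        int attempt, TimeSpan baseDelay, TimeSpan maxDelay, Random random)
    {
        if (attempt < 1)
            throw new ArgumentOutOfRangeException(nameof(attempt));

        // Limit the exponent so that the multiplication stays in a sane range.
        double factor = Math.Pow(2, Math.Min(attempt - 1, 30));
        double ceilingMs = Math.Min(
            baseDelay.TotalMilliseconds * factor,
            maxDelay.TotalMilliseconds);

        return TimeSpan.FromMilliseconds(random.NextDouble() * ceilingMs);
    }
}
```

With a base delay of 100 ms and a cap of 5 seconds, the ceiling grows as 100 ms, 200 ms, 400 ms and so on until it reaches the cap. A caller would typically pass the result to Task.Delay before the next attempt.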

Figure 1 illustrates this pattern. If the request is still unsuccessful after a predefined number of attempts, the application should treat the fault as an exception and handle it accordingly.

Figure 1 - Invoking an operation in a hosted service using the Retry pattern


The application should wrap all attempts to access a remote service in code that implements a retry policy matching one of the strategies listed above. Requests sent to different services can be subject to different policies, and some vendors provide libraries that encapsulate this approach. These libraries typically implement policies that are parameterized, and the application developer can specify values for items such as the number of retries and the time between retry attempts.
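
To make the idea of a parameterized policy concrete, here is a minimal sketch. The RetryPolicy class is hypothetical and is not the API of any vendor library; it only shows how the retry count, the interval, and the transient-fault test can be supplied as parameters. It uses C# 6 exception filters.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical policy type, introduced for illustration only.
public sealed class RetryPolicy
{
    private readonly int maxRetries;
    private readonly TimeSpan interval;
    private readonly Func<Exception, bool> isTransient;

    public RetryPolicy(int maxRetries, TimeSpan interval, Func<Exception, bool> isTransient)
    {
        this.maxRetries = maxRetries;
        this.interval = interval;
        this.isTransient = isTransient;
    }

    // Runs the operation, retrying transient failures up to maxRetries times.
    // Any other failure, or a failure after the last retry, propagates to the caller.
    public async Task<T> ExecuteAsync<T>(Func<Task<T>> operation)
    {
        for (int retry = 0; ; retry++)
        {
            try
            {
                return await operation();
            }
            catch (Exception ex) when (retry < maxRetries && isTransient(ex))
            {
                // Transient failure with retries remaining: wait, then loop again.
            }

            await Task.Delay(interval);
        }
    }
}
```

Different services can then be given different policy instances, for example a short interval with few retries for an interactive call and a longer interval for a background job.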

The code in an application that detects faults and retries failing operations should log the details of these failures. This information may be useful to operators. If a service is frequently reported as unavailable or busy, it is often because the service has exhausted its resources. You can reduce the frequency with which these faults occur by scaling out the service. For example, if a database service is continually overloaded, it may be beneficial to partition the database and spread the load across multiple servers.

Note:

Microsoft Azure provides extensive support for the Retry pattern. The patterns & practices Transient Fault Handling Block enables an application to handle transient faults in many Azure services by using a range of retry strategies. Microsoft Entity Framework version 6 provides facilities for retrying database operations. Additionally, many of the Azure Service Bus and Azure Storage APIs implement retry logic transparently.

Issues and considerations


When deciding how to implement this pattern, you should consider the following points:
• The retry policy should be tuned to match the business requirements of the application and the nature of the failure. It may be better for some noncritical operations to fail fast rather than retry several times and impact the throughput of the application. For example, in an interactive web application that attempts to access a remote service, it may be better to fail after a small number of retries with only a short delay between attempts, and display a suitable message to the user (for example, "Please try again later") to prevent the application from becoming unresponsive. For a batch application, it may be more appropriate to increase the number of retry attempts with an exponentially increasing delay between attempts.
• A highly aggressive retry policy, with minimal delay between attempts and a large number of retries, could further degrade a busy service that is running close to or at capacity. This retry policy could also affect the responsiveness of the application if it is continually attempting to perform a failing operation rather than doing useful work.
• If a request still fails after a significant number of retries, it may be better for the application to prevent further requests going to the same resource for a period and simply report a failure immediately. When the period expires, the application can tentatively allow one or more requests through to see whether they succeed. For more information on this strategy, see the Circuit Breaker pattern.
• The operations in a service that are invoked by an application that implements a retry policy may need to be idempotent. For example, a request sent to a service may be received and processed successfully but, due to a transient fault, the service may be unable to send a response indicating that the processing has completed. The retry logic in the application might then attempt to repeat the request on the assumption that the first request was not received.
• A request to a service may fail for a variety of reasons and raise different exceptions, depending on the nature of the failure. Some exceptions indicate a failure that could be resolved very quickly, while others indicate that the failure is longer lasting. It may be beneficial for the retry policy to adjust the time between retry attempts based on the type of the exception.
• Consider how retrying an operation that is part of a transaction affects the overall transaction consistency. It may be useful to fine-tune the retry policy for transactional operations to maximize the chance of success and reduce the need to undo all the transaction steps.
• Ensure that all retry code is fully tested against a variety of failure conditions. Check that it does not severely impact the performance or reliability of the application, cause excessive load on services and resources, or generate race conditions or bottlenecks.
• Implement retry logic only where the full context of a failing operation is understood. For example, if a task that contains a retry policy invokes another task that also contains a retry policy, this extra layer of retries can add long delays to the processing. It may be better to configure the lower-level task to fail fast and report the reason for the failure back to the task that invoked it. The higher-level task can then decide how to handle the failure based on its own policy.
• It is important to log all connectivity failures that prompt a retry so that underlying problems with the application, services, or resources can be identified.
• Investigate the faults that are most likely to occur for a service or resource to discover whether they are likely to be long lasting or terminal. If this is the case, it may be better to handle the fault as an exception. The application can report or log the exception, and then attempt to continue either by invoking an alternative service (if one is available) or by offering degraded functionality. For more information on how to detect and handle long-lasting faults, see the Circuit Breaker pattern.
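
The point about idempotent operations can be illustrated with a small sketch. It assumes that the caller attaches a unique request identifier and reuses it on every retry; the AccountService class and its members are hypothetical, and the in-memory dictionary stands in for whatever durable store a real service would use. The class is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory service, introduced for illustration only.
public sealed class AccountService
{
    // Remembers the result of each request that has already been applied.
    private readonly Dictionary<Guid, decimal> processed = new Dictionary<Guid, decimal>();

    public AccountService(decimal openingBalance)
    {
        Balance = openingBalance;
    }

    public decimal Balance { get; private set; }

    // Applies the debit once per requestId. A retried request that carries
    // the same id gets the earlier result back instead of debiting again.
    public decimal Debit(Guid requestId, decimal amount)
    {
        decimal previousResult;
        if (processed.TryGetValue(requestId, out previousResult))
            return previousResult;

        Balance -= amount;
        processed[requestId] = Balance;
        return Balance;
    }
}
```

If the response to the first call is lost and the client retries with the same identifier, the balance is reduced only once.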

When to use this pattern

Use this pattern:
• When an application could experience transient faults as it interacts with a remote service or accesses a remote resource. These faults are expected to be short lived, and repeating a request that has previously failed could succeed on a subsequent attempt.

This pattern might not be suitable:
• When a fault is likely to be long lasting, because this can affect the responsiveness of the application. The application may simply waste time and resources attempting to repeat a request that is most likely to fail.
• For handling failures that are not due to transient faults, such as internal exceptions caused by errors in the business logic of an application.
• As an alternative to addressing scalability issues in a system. If an application experiences frequent "busy" faults, it is often an indication that the service or resource being accessed should be scaled up.

Example


This example illustrates an implementation of the Retry pattern. The OperationWithBasicRetryAsync method, shown below, invokes an external service asynchronously through the TransientOperationAsync method (the details of this method are specific to the service and are omitted from the sample code).

    // Requires: using System; using System.Diagnostics; using System.Threading.Tasks;

    private int retryCount = 3;
    // Illustrative value only; choose a delay that suits the service.
    private readonly TimeSpan delay = TimeSpan.FromSeconds(5);
    ...

    public async Task OperationWithBasicRetryAsync()
    {
        int currentRetry = 0;

        for (;;)
        {
            try
            {
                // Call the external service.
                await TransientOperationAsync();

                // The call succeeded, so leave the retry loop.
                break;
            }
            catch (Exception ex)
            {
                Trace.TraceError("Operation Exception");

                currentRetry++;

                // Check whether the exception thrown was a transient exception,
                // based on the logic in the error detection strategy.
                // Determine whether to retry the operation, as well as how
                // long to wait, based on the retry strategy.
                if (currentRetry > this.retryCount || !IsTransient(ex))
                {
                    // If this is not a transient error, or we should not
                    // retry, rethrow the exception.
                    throw;
                }
            }

            // Wait before retrying the operation.
            // Consider calculating an exponential delay here and
            // using a strategy best suited to the operation and fault.
            await Task.Delay(delay);
        }
    }

    // Async method that wraps a call to a remote service (details not shown).
    private async Task TransientOperationAsync()
    {
        ...
    }


The statement that invokes this method is enclosed in a try/catch block that is wrapped in a for loop. The for loop exits if the call to the TransientOperationAsync method succeeds without throwing an exception. If the TransientOperationAsync method fails, the catch block examines the reason for the failure, and if it is deemed to be a transient error the code waits for a short delay before retrying the operation.

The for loop also tracks the number of times that the operation has been attempted. If the operation still fails after three retries (four attempts in total), the fault is assumed to be longer lasting. If the exception is not transient, or it is long lasting, the catch handler rethrows the exception. This exception exits the for loop and should be caught by the code that invokes the OperationWithBasicRetryAsync method.

The IsTransient method, shown below, checks for a specific set of exceptions that are relevant to the environment in which the code is run. The definition of a transient exception may vary depending on the resources being accessed and the environment in which the operation is being performed.

    // Requires: using System; using System.Linq; using System.Net;
    // OperationTransientException is an application-defined exception type (not shown).

    private bool IsTransient(Exception ex)
    {
        // Determine if the exception is transient.
        // In some cases this is as simple as checking the exception type; in other
        // cases it may be necessary to inspect other properties of the exception.
        if (ex is OperationTransientException)
            return true;

        var webException = ex as WebException;
        if (webException != null)
        {
            // If the web exception contains one of the following status values
            // it may be transient.
            return new[] { WebExceptionStatus.ConnectionClosed,
                           WebExceptionStatus.Timeout,
                           WebExceptionStatus.RequestCanceled }
                .Contains(webException.Status);
        }

        // Additional exception checking logic goes here.
        return false;
    }


This article is translated from MSDN: http://msdn.microsoft.com/en-us/library/dn589788.aspx
