"Remus: High Availability via Asynchronous Virtual Machine Replication" translation

Source: Internet
Author: User

Abstract

Allowing an application to survive hardware failure is an expensive undertaking, because it generally means re-engineering the software to include complicated recovery logic as well as deploying special-purpose hardware. This is a severe barrier to improving the reliability of large or legacy applications. We describe a general, transparent high-availability service that protects existing, unmodified software from the failure of the physical machine on which it runs. Remus provides a very high degree of fault tolerance: when a failure occurs, a running system can transparently continue execution on another physical machine with only a short period of downtime, while completely preserving host state such as active network connections. Our approach encapsulates the protected software in a virtual machine, asynchronously propagates changed state to a backup host at frequencies as high as forty times per second, and lets the active virtual machine run slightly ahead of the replicated system state.

1 Introduction

Highly available systems have traditionally been the preserve of those with large budgets. Yet the desire for reliability is widespread, even among system designers with modest resources. Unfortunately, high availability is hard to achieve, because it requires that systems be built with redundant components that can maintain backups and switch to them when a failure occurs. Commercial high-availability systems that protect modern servers typically use specialized hardware, customized software, or both. Either way, the ability to survive failure transparently is too complex and expensive to deploy on ordinary servers.

This paper describes Remus, a software system that provides operating-system- and application-independent high availability on commodity hardware. Our approach builds on the ability of virtualization to live-migrate running virtual machines between physical hosts, and extends that technique to replicate snapshots of an entire running operating system instance between a pair of physical machines at very high frequencies, as often as every 25 ms. Using this technique, our system discretizes the execution of a virtual machine into a series of replicated snapshots. External output, specifically transmitted network packets, is not released until the system state that produced it has been replicated.
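The idea of discretizing execution into replicated snapshots can be illustrated with a toy model. The sketch below is not Remus code: it assumes dirty-page tracking in the style of live migration, and the names `ToyVM`, `ToyBackup`, `write`, `checkpoint`, and `apply` are hypothetical, chosen only for this illustration. It shows that each checkpoint needs to carry only the pages changed during the preceding epoch.

```python
class ToyVM:
    """Hypothetical stand-in for a guest: memory is a dict of page number -> bytes."""

    def __init__(self):
        self.pages = {}
        self.dirty = set()  # pages written since the last checkpoint

    def write(self, page_no, data):
        self.pages[page_no] = data
        self.dirty.add(page_no)

    def checkpoint(self):
        """Return only the pages changed in this epoch, then start a new epoch."""
        delta = {n: self.pages[n] for n in self.dirty}
        self.dirty.clear()
        return delta


class ToyBackup:
    """Hypothetical backup host that accumulates checkpoint deltas."""

    def __init__(self):
        self.pages = {}

    def apply(self, delta):
        self.pages.update(delta)


vm, backup = ToyVM(), ToyBackup()

# Epoch 1: two pages are written, so the checkpoint carries two pages.
vm.write(0, b"boot")
vm.write(1, b"data-v1")
delta1 = vm.checkpoint()
backup.apply(delta1)

# Epoch 2: only page 1 changes, so the checkpoint carries one page.
vm.write(1, b"data-v2")
delta2 = vm.checkpoint()
backup.apply(delta2)
```

After the second checkpoint the backup holds the same pages as the primary, although the second transfer contained a single page.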

Virtualization makes it possible to create a copy of a running machine, but it does not guarantee that the process will be efficient. Propagating every state change synchronously is impractical, because it effectively reduces memory throughput to that of the network device performing the replication. Instead of running two hosts in lock-step, we let a single host execute speculatively and then checkpoint and replicate its state asynchronously. System state is not made externally visible until the checkpoint is committed. In effect, we achieve high-speed replication by running the system tens of milliseconds in the past.

The contribution of this paper is a practical one. Whole-system replication is a well-known approach to providing high availability, but it has usually been considered significantly more expensive than application-specific checkpointing, which replicates only the relevant data. Our approach can bring HA to the masses as a platform service for virtual machines. Despite the hardware and software constraints under which it operates, the system provides protection equal to or better than that of expensive commercial offerings. Many existing systems actively mirror only persistent storage and require applications to recover from crash-consistent persistent state. In contrast, Remus ensures that no externally visible state is ever lost, regardless of the moment at which the primary fails.

1.1 Goals

Remus aims to make mission-critical availability accessible to mid-range and low-end systems. By simplifying provisioning and allowing many servers to be consolidated onto a handful of physical machines, virtualization has made these systems more popular than ever. However, the benefits of consolidation come with a hidden cost: greater exposure to hardware failure. Remus addresses this by making high availability a commodity service offered by the virtualization platform itself, giving administrators of individual virtual machines a tool to mitigate the risks associated with virtualization.

The implementation of Remus is based on the following high-level objectives:

Generality: It can be prohibitively expensive to customize a single application to support high availability, let alone the wide variety of software on which an organization may rely. To address this, high availability should be provided as a low-level service, with common mechanisms that apply regardless of the application being protected or the hardware on which it runs.

Transparency: In most real-world situations, the source code of the operating system and applications is not available for modification. To support the widest possible range of applications, high availability must not require that the operating system or application code be modified to provide functions such as failure detection or state recovery.

Seamless failure recovery: No externally visible state should ever be lost when a single machine fails. In addition, recovery must be fast enough that, to external users, it looks like nothing more than temporary packet loss. Established TCP connections must not be lost or reset.

These are lofty goals, and they entail a degree of protection well beyond that of common HA systems, which rely on asynchronous storage mirroring followed by application-specific recovery code. Moreover, providing this level of availability without modifying the code inside the virtual machine requires a very coarse-grained approach to the problem. A final and pervasive goal is that the system meets these objectives while delivering deployable levels of performance, even on the SMP hardware that is common in today's servers.

1.2 Approach

Remus runs paired servers in an active-passive configuration. We use three major techniques to overcome the difficulties traditionally associated with this approach. First, we build the system on a virtualized infrastructure to facilitate whole-system replication. Second, we improve performance through speculative execution, which decouples external output from synchronization points. This allows the primary server to remain productive while synchronization with the replicated server is performed asynchronously. The basic stages of operation in Remus are shown in Figure 1.

VM-based whole-system replication. Hypervisors have been used to build HA systems before. In that work, virtualization is used to run a pair of systems in lock-step, with additional support to ensure that the virtual machines on a pair of physical hosts follow a deterministic path of execution: external events are carefully injected into both the primary and fallback VMs so that they end up in identical states. Enforcing such deterministic execution has two fundamental problems. First, it is highly architecture-specific, requiring that the system fully understand the instruction set being executed and the sources of external events. Second, it incurs unacceptable overhead on multiprocessor systems, where shared-memory communication between processors must be accurately tracked and propagated.

Speculative execution. Replication can be achieved either by copying the state of a system or by replaying input deterministically. We believe the latter to be impractical for real-time operation, especially in a multiprocessor environment. Remus therefore does not attempt to make computation deterministic: there is a very real possibility that the output produced by a system after a given checkpoint will differ if the system is rolled back to that checkpoint and its input is replayed. However, the state of the replica needs to be synchronized with the primary only when the output of the primary has become externally visible. Instead of letting the normal output stream dictate when synchronization must occur, we can buffer output until a more convenient time and perform computation speculatively ahead of synchronization points. This allows a favorable trade-off between output latency and runtime overhead, the degree of which can be controlled by the administrator.
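The output-buffering rule described above can be sketched in a few lines. This is an illustrative model only, not the Remus implementation: the class `OutputBuffer` and its methods (`send`, `end_epoch`, `on_ack`, `on_primary_failure`) are hypothetical names, and packets are represented as plain strings. The sketch assumes that the backup acknowledges checkpoints in order.

```python
from collections import deque


class OutputBuffer:
    """Holds outbound packets until the epoch that produced them is replicated."""

    def __init__(self):
        self.current_epoch = 0
        self.pending = deque()  # (epoch, packet) pairs not yet released
        self.released = []      # packets that have become externally visible

    def send(self, packet):
        # The guest believes the packet was sent; it is actually queued.
        self.pending.append((self.current_epoch, packet))

    def end_epoch(self):
        """Close the current epoch at a checkpoint and return its number."""
        finished = self.current_epoch
        self.current_epoch += 1
        return finished

    def on_ack(self, epoch):
        """The backup has confirmed checkpoint `epoch`: release its output."""
        while self.pending and self.pending[0][0] <= epoch:
            self.released.append(self.pending.popleft()[1])

    def on_primary_failure(self):
        """Unacknowledged output was never visible, so it can be dropped."""
        dropped = [packet for _, packet in self.pending]
        self.pending.clear()
        return dropped


buf = OutputBuffer()
buf.send("reply-1")
buf.send("reply-2")
e0 = buf.end_epoch()   # checkpoint 0 is taken; the guest keeps running
buf.send("reply-3")    # produced speculatively during epoch 1
buf.on_ack(e0)         # the backup acknowledges checkpoint 0
visible = list(buf.released)
lost = buf.on_primary_failure()
```

In this model the epoch length is the knob behind the trade-off: longer epochs mean fewer checkpoints but hold each packet longer before release. If the primary fails, only output that no external party ever saw is discarded, so the backup can resume from the last acknowledged checkpoint without contradicting anything already sent.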

Asynchronous replication. Buffering output at the primary server allows replication to be performed asynchronously. The primary host can resume execution as soon as its machine state has been captured, without waiting for acknowledgment from the remote end. Overlapping normal execution with the replication process yields substantial performance benefits, and it enables efficient operation even when checkpointing every few tens of milliseconds.
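A minimal sketch of this overlap, under stated assumptions, is given below. It is not Remus code: the "machine state" is a small dictionary, the "network" is an in-process queue drained by a background thread, and the names `Backup`, `replicator`, and `run_primary` are hypothetical. The point it illustrates is that the primary only pays for the local capture and then keeps running, while transmission and acknowledgment happen in the background.

```python
import copy
import queue
import threading


class Backup:
    """Hypothetical backup host: stores the latest snapshot it has received."""

    def __init__(self):
        self.state = {}
        self.acked = []  # epochs acknowledged, in order

    def receive(self, epoch, snapshot):
        self.state = snapshot
        self.acked.append(epoch)


def replicator(outbox, backup):
    """Background worker: ships captured snapshots to the backup."""
    while True:
        item = outbox.get()
        if item is None:  # sentinel: no more checkpoints
            outbox.task_done()
            break
        epoch, snapshot = item
        backup.receive(epoch, snapshot)
        outbox.task_done()


def run_primary(num_epochs, backup):
    state = {"counter": 0}
    outbox = queue.Queue()
    worker = threading.Thread(target=replicator, args=(outbox, backup))
    worker.start()
    for epoch in range(num_epochs):
        state["counter"] += 1            # normal execution during the epoch
        snapshot = copy.deepcopy(state)  # brief pause: capture state locally
        outbox.put((epoch, snapshot))    # hand off and resume without waiting
    outbox.put(None)
    outbox.join()                        # only so the demo can inspect results
    worker.join()
    return state


backup = Backup()
final_state = run_primary(5, backup)
```

The deep copy stands in for capturing state into a local buffer; because the primary continues to mutate `state` afterwards, the captured snapshot must be independent of it. The final `join` calls exist only to make the demonstration deterministic.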

