Implement SLA-based high availability in the cloud for System x and System p

Source: Internet
Author: User

High availability (HA) is a term often associated with cloud infrastructure solutions; it refers to business continuity and minimal downtime. Specifically, HA in any cloud infrastructure should have the following objectives:

  • Reduce planned downtime
  • Prevent unplanned outages
  • Recover quickly from downtime
  • Provide continuous availability

Supporting the cloud infrastructure is the modern hypervisor, which provides most of the functionality and features of HA. This article outlines how IBM SmartCloud Enterprise+ handles planned and unplanned server outages, how it recovers from downtime, and how it ensures continuous server availability. It then describes the HA implementations for virtual machines (VMs) in IBM SmartCloud Enterprise+, which run on VMware on the IBM System x platform and in AIX logical partitions (LPARs) on the IBM System p platform.

Reduce planned downtime

Planned downtime typically covers software maintenance or releases, updates, and scheduled equipment maintenance. Most cloud vendors have planned downtime, but because business operations demand high uptime, it needs to be kept to a minimum.
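To put "a minimum" into numbers, an availability target can be converted into a downtime budget. The sketch below is only illustrative arithmetic: the function name and the 99.9% and 30-day figures are assumptions chosen for the example, not values taken from any IBM SmartCloud Enterprise+ SLA.

```python
def allowed_downtime_minutes(sla_percent: float, period_days: int = 30) -> float:
    """Return the downtime budget, in minutes, that an availability target
    leaves over a period of the given length.

    Both arguments are illustrative assumptions, not figures from a real SLA.
    """
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


if __name__ == "__main__":
    # A 99.9% target over 30 days leaves roughly 43 minutes of downtime.
    print(round(allowed_downtime_minutes(99.9), 1))
```

Every planned maintenance window is spent from the same budget as unplanned outages, which is why reducing planned downtime matters for meeting an availability target.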

IBM SmartCloud Enterprise+ provides an automated way to apply VM patches, covering both security and non-security updates to the OS. It deploys updates automatically according to predefined cycles (the customer decides which VMs are patched during a given cycle), without any human intervention. This fully automated patching dramatically reduces the number of planned outages and keeps VMs available for long periods, which supports business continuity.
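The cycle-based selection described above can be pictured with a small sketch. This is not the SmartCloud Enterprise+ implementation: the `VM` class, the `patch_opt_in` flag, and the 30-day cycle length are hypothetical names and values used only to show the idea that a cycle boundary triggers patching for the VMs the customer has enrolled.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class VM:
    name: str
    patch_opt_in: bool  # hypothetical flag: customer enrolled this VM in the cycle


def vms_to_patch(vms, cycle_start: date, today: date, cycle_days: int = 30):
    """Return the names of enrolled VMs if today is a cycle boundary, else [].

    cycle_days is an assumed cycle length, not a documented product setting.
    """
    elapsed = (today - cycle_start).days
    if elapsed < 0 or elapsed % cycle_days != 0:
        return []
    return [vm.name for vm in vms if vm.patch_opt_in]


if __name__ == "__main__":
    fleet = [VM("web01", True), VM("db01", False)]
    # 31 January is exactly 30 days after 1 January, so the cycle fires.
    print(vms_to_patch(fleet, date(2024, 1, 1), date(2024, 1, 31)))
```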

Prevent unplanned downtime

Unplanned outages in a cloud environment have a number of causes. The main ones are hypervisor infrastructure failures, OS failures, and network failures.

IBM SmartCloud Enterprise+ can handle most common failures with minimal downtime. As described later in this article, the monitoring agent on System x and the native daemon on System p can detect OS failures, while the VMware heartbeat interval on System x and some native daemons on System p can detect network failures.
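Heartbeat-based detection generally works by declaring a node failed once it has been silent for longer than some multiple of the heartbeat interval. The following is a generic sketch of that idea, not VMware's or AIX's actual mechanism; the class name, the 30-second interval, and the three-missed-beats threshold are all assumptions made for illustration.

```python
class HeartbeatMonitor:
    """Track the last heartbeat per VM and flag VMs that have gone silent."""

    def __init__(self, interval_s: float = 30.0, missed_allowed: int = 3):
        # A VM is considered failed after this many seconds without a heartbeat.
        self.timeout_s = interval_s * missed_allowed
        self.last_seen = {}

    def record(self, vm: str, now: float) -> None:
        """Store the time (in seconds) at which a heartbeat from vm arrived."""
        self.last_seen[vm] = now

    def failed(self, now: float) -> list:
        """Return the sorted names of VMs silent for longer than the timeout."""
        return sorted(
            vm for vm, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


if __name__ == "__main__":
    monitor = HeartbeatMonitor()
    monitor.record("vm-a", now=0.0)
    monitor.record("vm-b", now=60.0)
    print(monitor.failed(now=100.0))  # vm-a has been silent for 100 s
```

A shorter interval detects failures sooner but raises the risk of treating a brief network hiccup as an outage, so the threshold is a trade-off rather than a fixed rule.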

Recover quickly from downtime

For downtime caused by unplanned outages, the speed of recovery depends on the nature of the failure. Downtime may be caused by a host platform failure, a storage failure, an OS failure, or a network failure. If the cloud vendor has not planned properly, downtime caused by a host platform or storage failure can result in severe loss of data and running time.

The failover mechanism in IBM SmartCloud Enterprise+ enables the system to recover quickly from host platform and storage failures. All workloads on a failed host platform are reassigned to other host platforms, so the downtime is short. Storage failures are handled by mirrored data stores: all VM data is replicated in two data stores, and if one data store fails, the VM can start and run from the other replica.
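Both recovery paths can be sketched in a few lines. This is a simplified model under stated assumptions, not the product's placement logic: the function names are hypothetical, capacity is counted as a number of VMs per host, and displaced VMs go to the least-loaded surviving host.

```python
def fail_over(placement: dict, capacity: dict, failed_host: str) -> dict:
    """Move every VM from failed_host to the least-loaded surviving host.

    placement maps host -> list of VM names; capacity maps host -> max VMs.
    Both structures are assumptions made for this sketch.
    """
    survivors = {h: list(vms) for h, vms in placement.items() if h != failed_host}
    for vm in placement.get(failed_host, []):
        candidates = [h for h in survivors if len(survivors[h]) < capacity[h]]
        if not candidates:
            raise RuntimeError(f"no spare capacity to restart {vm}")
        # Prefer the host with the fewest VMs; break ties by host name.
        target = min(candidates, key=lambda h: (len(survivors[h]), h))
        survivors[target].append(vm)
    return survivors


def pick_datastore(replicas: list, healthy: set) -> str:
    """Return the first replica data store that is still healthy."""
    for datastore in replicas:
        if datastore in healthy:
            return datastore
    raise RuntimeError("all replicas are unavailable")


if __name__ == "__main__":
    placement = {"h1": ["a", "b"], "h2": ["c"], "h3": []}
    capacity = {"h1": 4, "h2": 4, "h3": 4}
    print(fail_over(placement, capacity, "h1"))
    print(pick_datastore(["ds1", "ds2"], healthy={"ds2"}))
```

The sketch also shows why spare capacity has to be reserved in advance: if the surviving hosts are already full, the displaced VMs cannot be restarted anywhere.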

Continuous availability

Reducing planned and unplanned outages and recovering rapidly from downtime all contribute to continuous availability, in which servers (in a platform-as-a-service cloud) stay active most of the time and need only very short downtime. Continuous availability can be achieved in the following ways:

  • Appropriately configure the HA attributes in the underlying hypervisor
  • Implement fault-detection monitoring services, using features provided by the operating system, to monitor for any OS failure
  • Use application monitoring to help achieve high application availability

IBM SmartCloud Enterprise+ uses most of the HA features provided by the hypervisor, such as host platform failover, restart priorities, heartbeat intervals, OS monitoring and fault detection, and panic detection.
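Of the features listed, restart priority is the easiest to illustrate: after a host failure, VMs with a higher priority are restarted before those with a lower one. The sketch below assumes priority labels of "high", "medium", "low", and "disabled"; the labels and the function are illustrative and do not reproduce any particular hypervisor's API.

```python
# Assumed priority labels; lower rank means the VM is restarted earlier.
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


def restart_order(vms: list) -> list:
    """Return VM names in restart order.

    vms is a list of (name, priority) pairs. VMs whose priority is not in
    PRIORITY_RANK (for example "disabled") are not restarted at all.
    VMs with equal priority keep their input order because sorted() is stable.
    """
    eligible = [(name, prio) for name, prio in vms if prio in PRIORITY_RANK]
    ordered = sorted(eligible, key=lambda item: PRIORITY_RANK[item[1]])
    return [name for name, _ in ordered]


if __name__ == "__main__":
    fleet = [
        ("app", "low"),
        ("db", "high"),
        ("batch", "disabled"),
        ("web", "medium"),
        ("cache", "high"),
    ]
    print(restart_order(fleet))
```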
