1. Summary
Enterprises can start their move to the cloud with cloud-based disaster preparedness
To make applications highly available and resilient, the cloud platform itself should provide disaster preparedness and HA mechanisms for its infrastructure and cloud services, such as:
- The cloud platform itself: distributed cloud storage, virtual machine HA, controller disaster recovery, SDN network disaster recovery, virtual machine data protection
- Disaster-resilient cloud services delivered by the cloud
–EC2 (virtual machines can be moved to other hosts), EC2 instance recovery
–AMI (virtual machine template backup)
–EBS, EBS snapshot/recovery
–S3 (multiple copies of data)
–ELB (load balancing across multiple EC2 instances)
–RDS (read/write splitting, primary/standby deployment)
–CloudFormation (consistent deployment across multiple zones)
–Auto Scaling (automatically scales the number of servers)
–EIP (the IP of an EC2 instance can be remapped to another instance)
The cloud can also provide resiliency for applications running on physical machines, for example:
– Put application and database data into S3 (may require development work) for highly reliable storage
– Put data into a disaster recovery service (such as the one from Eisoo (爱数)) and restore it locally when needed
– P2V physical machines (for example, with Eisoo's service) to EC2 and bring them up in the cloud when needed
– Replicate database data to RDS in the cloud (using the replication tools of MySQL, Oracle, or SQL Server)
Enterprises can implement cloud disaster preparedness gradually
- Simple backup and restore of data
- Pilot light (cold standby of the application)
- Warm standby (a scaled-down copy of the application's resources running in the cloud)
- Hot standby (1:1 hot spares in the cloud for the physical machines)
2. Disaster tolerance and availability overview
2.1 Data centers: comparison of cloud and traditional methods
Traditional methods | Cloud
Building a disaster preparedness system in your own data centers is costly | Low cost: lower total cost of ownership
Tools and processes for storage, archiving, backup, and recovery all incur costs | Unified processes and tools: highly scalable storage, with the same processes and tools across AWS regions and Availability Zones
Capacity planning, equipment procurement, and so on are challenging | Elasticity: no need for detailed up-front planning
2.2 What can cause a system outage?
Causes include:
- Human error (more than 60% of cases)
- System crashes
- Equipment failure
- Natural disasters
- Security incidents
- ...
A typical human-error incident: despite an A/B backup mechanism, someone with root privileges deleted files on machine A (in effect, rm -rf /). Machine B took over, but in the rush the recovery was run in the wrong direction: instead of restoring A from B, B was restored from A.
2.3 Definitions of high availability (HA) and fault tolerance (FT)
- Both are part of a business continuity plan
- They are not all-or-nothing; the level can be tailored to the need
- Decide what must hold when something goes wrong: 24x7 business operation, data safety, restoring application operations after a major disaster
2.4 Timeline of disaster backup and recovery
Recovery Time Objective (RTO): how long does recovery take after a problem occurs? Recovery Point Objective (RPO): to which earlier point in time can the data be restored?
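The two objectives can be read straight off an incident timeline. The following is a minimal sketch in Python, not part of the original talk; the function names and all timestamps are made up for illustration.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_good_backup: datetime, disaster: datetime) -> timedelta:
    """Data-loss window: everything written after the last good backup is lost."""
    return disaster - last_good_backup


def achieved_rto(disaster: datetime, service_restored: datetime) -> timedelta:
    """Downtime window: how long the service was unavailable."""
    return service_restored - disaster


# Hypothetical incident timeline (illustrative values only).
last_backup = datetime(2015, 6, 1, 2, 0)   # nightly backup finished at 02:00
disaster = datetime(2015, 6, 1, 14, 30)    # outage begins at 14:30
restored = datetime(2015, 6, 1, 18, 30)    # service is back at 18:30

rpo = achieved_rpo(last_backup, disaster)
rto = achieved_rto(disaster, restored)
print(f"RPO achieved: {rpo}")  # 12:30:00 of data lost
print(f"RTO achieved: {rto}")  # 4:00:00 of downtime
```

In this made-up incident a nightly backup gives an RPO of 12.5 hours. Shrinking that window means backing up or replicating more often, which is what the patterns in section 4 trade against cost.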
Traditional IT approach: build a disaster preparedness center at another physical site
- Low-end disaster preparedness: off-site backup within the same city
- High-end disaster preparedness: active-active operation across two sites
3. Using AWS to implement disaster preparedness
3.1 Benefits of AWS backup and disaster preparedness
Reliable and durable, simplified infrastructure, pay-as-you-go, global deployment, large scale and easy scaling, security
3.1 High-availability targets
Cut the disaster preparedness budget by 50%, shrink on-premises IT systems, remove the need for a second physical data center, and eliminate tape libraries
3.2 Basic principles of disaster tolerance design
- Objective: the application keeps working even if physical hardware breaks, is removed, or is replaced
- Principles: avoid single points of failure, assume every device will fail, and have a recovery process ready
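Why avoiding single points of failure pays off can be shown with a little arithmetic. The sketch below is illustrative only and rests on a loudly stated assumption: component failures are independent, which real systems only approximate. The function names and the 99% figure are hypothetical.

```python
def parallel_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant components when any one of them suffices.

    Assumes failures are independent (an idealization).
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one copy")
    return 1.0 - (1.0 - single) ** copies


def serial_availability(*components: float) -> float:
    """Availability of a chain in which every component must be up."""
    result = 1.0
    for availability in components:
        result *= availability
    return result


# Hypothetical figures: one server that is up 99% of the time.
one_server = 0.99
two_servers = parallel_availability(one_server, 2)      # redundant pair
web_then_db = serial_availability(one_server, one_server)  # two single points in a row

print(f"single server:        {one_server:.4%}")
print(f"redundant pair:       {two_servers:.4%}")
print(f"two tiers, no spares: {web_then_db:.4%}")
```

Under these assumptions a redundant pair lifts 99% to roughly 99.99%, while chaining two unprotected tiers drops it to about 98%, which is the reasoning behind duplicating every tier.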
3.3 High availability and fault-tolerant services
- Most AWS services are themselves highly available and fault tolerant
–S3, SQS, RDS, DynamoDB, ELB, Route 53
- Services that can be used to build highly available applications
– Availability Zone design, Elastic IP, EBS, EBS snapshots, ELB, Auto Scaling (AS), Route 53
–EC2 (reserved instances), AMIs
3.4 AWS's Global Infrastructure
- 11 regions (in individual US states, China, Europe, and so on). Regions have their own data security requirements (for example, data in the China region is not available elsewhere, and its accounts are separate)
- 30 Availability Zones (each region has several). Availability Zones in a region are comparable to sites in the same city: no more than 100 km apart, with latency under 3 milliseconds
- 53 edge locations
3.5 Disaster preparedness at AWS: multiple Availability Zones
- Rely on the high-availability design of AWS services
- Deploy EC2 and RDS across multiple AZs (Multi-AZ)
- Use an EIP (Elastic IP)
- Use ELB and Auto Scaling (AS)
- Use AMIs to meet RTO requirements
- EC2 reserved instances (large discount, and capacity is guaranteed for the instance)
3.6 Disaster recovery at AWS, part 2: cross-region deployment (equivalent to remote off-site recovery, "three data centers in two locations")
- Automate deployment with CloudFormation
- Cross-region replication of S3 data, EBS snapshots, and AMIs
- Block storage replication tools (common tools: rsync, xcopy, robocopy)
- Route 53, Auto Scaling
- Cross-region replicas for RDS and MySQL
- Set up a cross-region master-master database (with the database's own tools: MySQL multi-master, MS SQL Server AlwaysOn clusters, Oracle multi-master)
3.7 AWS Storage Options
- EBS (high-performance block storage)
- S3 (highly scalable object storage, eleven nines of durability)
4. Common disaster preparedness architecture patterns
4.1 AWS disaster preparedness model overview
AWS cloud storage is well suited to backup and disaster preparedness
- S3 (cloud object storage)
– Highly durable object storage
– Lifecycle management
– Well suited to backup and archiving
- EBS (cloud disk)
– Persistent storage for EC2 instances
– Replicated within a single Availability Zone (AZ)
– Snapshots provide durable backups, replicated across the Availability Zones of the region
- Glacier
– Long-term archive storage
– Retrieval takes 3-5 hours
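Lifecycle management is what ties S3 and Glacier together for backups: a rule moves aging objects to archive storage and eventually deletes them. The sketch below only builds such a rule as a plain Python dictionary. The shape is intended to follow the S3 lifecycle configuration format (the kind of structure passed to boto3's put_bucket_lifecycle_configuration), but no AWS call is made here, and the helper name, the "backups/" prefix, and the 30/365 day values are assumptions chosen for illustration.

```python
from typing import Any, Dict


def backup_lifecycle_rule(prefix: str, archive_after_days: int,
                          expire_after_days: int) -> Dict[str, Any]:
    """Build one lifecycle rule: move objects to Glacier, then delete them."""
    if not 0 < archive_after_days < expire_after_days:
        raise ValueError("need 0 < archive_after_days < expire_after_days")
    return {
        "ID": f"archive-{prefix.strip('/')}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        # Objects older than this move to the Glacier storage class.
        "Transitions": [{"Days": archive_after_days, "StorageClass": "GLACIER"}],
        # Objects older than this are deleted.
        "Expiration": {"Days": expire_after_days},
    }


# Hypothetical policy: archive backups after 30 days, delete them after a year.
lifecycle_configuration = {"Rules": [backup_lifecycle_rule("backups/", 30, 365)]}
print(lifecycle_configuration)
```

Because Glacier retrieval takes hours, a rule like this suits only backups that are allowed to fall outside the RTO; recent backups that must restore quickly stay in S3.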
4.2 Backup and restore pattern
A. Features
- Benefits of simple backup and restore
– Simple and quick to start
– Only basic backup costs (mainly storage)
- Preparation for backup and restore
– Back up existing systems
– Store the backups in S3
– Be familiar with the steps to restore from a backup
B. Backing up to the cloud
- Several ways to get data into S3
– Over a dedicated AWS Direct Connect link
– Over the internet
– AWS Import/Export (mail physical drives to AWS, where an administrator imports the data)
– AWS Storage Gateway
C. Restoring from the cloud
- Export application data at the OS level
- Create an instance from the AMI and put the data on EBS
- Copy data from S3 to EBS
- Then copy it to the on-premises data center via AWS Import/Export
- or over an AWS Direct Connect link
- or via AWS Storage Gateway
4.3 Pilot light architecture
A. Features
Back up only your core data, and prepare the deployment architecture without running it (for example with AMIs and CloudFormation). When a disaster occurs, restore the data onto AWS resources such as EC2 and EBS
B. Pilot light architecture
Prepare AMIs for the web and application servers so that they are ready to launch, and keep only a small RDS instance running (it only has to hold the replicated data)
C. Pilot light: actions after a failure
When the application in the main data center has a problem, the EC2 instances in the cloud can be started immediately
4.4 Warm standby architecture
A. Features
- Build an environment similar to the production environment but scaled down. When a disaster occurs, expand the AWS resources to meet production needs
- It costs more than pilot light but less than hot standby: because it is smaller than production, it is called "warm" rather than "hot" (1:1) standby
B. Warm standby architecture
Keep a proportionally scaled-down copy of the production environment running in the cloud (e.g., if production has 4 CPUs and 8 GB of memory, the cloud copy can have 2 CPUs and 4 GB)
C. Warm standby: actions after a failure
When the production environment has a problem, the standby environment can take over immediately, and its processing capacity can be increased as needed by scaling up or scaling out
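How far a warm standby has to grow at failover is simple arithmetic. The sketch below uses the 4-CPU production versus 2-CPU standby example from above; the function names are hypothetical, and CPU count stands in for capacity as a simplification (memory and I/O would need the same check).

```python
import math


def instances_needed(production_cpus: int, cpus_per_instance: int) -> int:
    """Number of standby-sized instances required to match production CPU capacity."""
    if production_cpus <= 0 or cpus_per_instance <= 0:
        raise ValueError("CPU counts must be positive")
    return math.ceil(production_cpus / cpus_per_instance)


def scale_out_delta(production_cpus: int, cpus_per_instance: int,
                    running_instances: int) -> int:
    """How many extra instances to launch at failover (never negative)."""
    target = instances_needed(production_cpus, cpus_per_instance)
    return max(0, target - running_instances)


# Example from the text: production has 4 CPUs, the warm standby runs one 2-CPU instance.
extra = scale_out_delta(production_cpus=4, cpus_per_instance=2, running_instances=1)
print(f"launch {extra} more instance(s) to reach production capacity")
```

Scaling out (more instances of the same size) is modeled here; scaling up would instead replace the 2-CPU instance with a 4-CPU one, which usually implies a restart.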
4.5 Multi-site active-active architecture
A. Features
- All production data is replicated synchronously or asynchronously to the off-site data center, with RIs (reserved instances) used to guarantee capacity
- The primary system can run on the enterprise's physical machines or in the cloud
B. Multi-site active-active architecture
The production system in the physical data center has a full standby copy ready in the cloud
C. Multi-site active-active architecture: an additional disaster recovery site for the applications in the cloud
The systems in the cloud can themselves be given a further standby (across 2 regions)
5. Walkthrough: building a disaster preparedness system
5.1 Disaster preparedness targets
- The system uses as many AWS services as possible
- Reuse existing licenses wherever possible
- The system generates no revenue, so costs must be minimized
- Load varies widely, so the system must scale with traffic
- Service level of up to 99.99%, with automatic failover and redundancy
- RPO: 0-2 minutes; RTO: 0-15 minutes
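To get a feel for these targets, it helps to convert the 99.99% service level into a downtime budget and to check a measured recovery against the RPO/RTO limits. This is a small illustrative sketch; the function names are hypothetical, and the 365-day year and the sample measurements are assumptions.

```python
def downtime_budget_minutes(availability_percent: float, days: float = 365.0) -> float:
    """Minutes of downtime allowed over `days` at the given availability level."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be a percentage between 0 and 100")
    return (1.0 - availability_percent / 100.0) * days * 24 * 60


def meets_targets(rpo_minutes: float, rto_minutes: float,
                  max_rpo_minutes: float = 2.0, max_rto_minutes: float = 15.0) -> bool:
    """True if a measured recovery stays within the RPO and RTO limits above."""
    return rpo_minutes <= max_rpo_minutes and rto_minutes <= max_rto_minutes


budget = downtime_budget_minutes(99.99)
print(f"99.99% allows about {budget:.2f} minutes of downtime per year")

# Hypothetical drill results: 1 minute of data lost, 12 minutes to recover.
print("drill within targets:", meets_targets(rpo_minutes=1, rto_minutes=12))
```

At 99.99% the yearly budget is under an hour, so three or four failovers that each use the full 15-minute RTO would already exhaust it, which is why automatic failover is listed alongside the service level.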
5.2 Planning the services to migrate
Disaster preparedness target | AWS service used
DNS/domain name server | Amazon Route 53
Load balancer | Elastic Load Balancing
Web/application server | Auto Scaling
Database server | Multi-node, cluster deployment
Authentication directory server | Multi-node deployment
Data center failure | Multi-Availability Zone (AZ) deployment
Disaster | Multi-region deployment
5.3 AWS-related services
5.4 Region Selection
Two regions are chosen: Singapore and Tokyo
5.5 Setting up VPC and subnets
The VPC setup is also highly available: multiple VPN instances provide multiple connection points (connections can be established across VPCs)
The VPN connections themselves must also be highly available, so prepare 2 VPN instances to keep the two VPCs connected
5.6 Select Web, app, and DB instance types
EC2 instance types include: general purpose, compute optimized (high clock frequency), storage and I/O optimized (high-performance local disks), GPU (with GPUs), and memory optimized (high memory-to-compute ratio)
5.7 Consider the domain name service and load balancing service
- The domain name service can use Route 53 (one global service); load balancing can use ELB (deployed separately in each region)
- The connected instances are deployed in both the Tokyo and Singapore regions, and across 2 Availability Zones (AZs) in Tokyo
5.8 Deploying the RDP gateway server (bastion host)
- Place the EC2 instance running the RDP gateway in the public part of the VPC; it acts as a bastion host
- All remote access to EC2 instances in the private networks of the VPCs must go through these bastion hosts
5.9 Domain (AD) login permission control
AD is deployed in both the Tokyo and Singapore regions, and across 2 Availability Zones (AZs) in Tokyo
5.10 Web and application servers
- The production environment is deployed across 2 AZs in Tokyo
- It is then copied to Singapore via AMIs, in warm standby mode
5.11 Setting up a database
Use Microsoft's WSFC and SQL Server AlwaysOn to provide multi-AZ and multi-region data replication for the SQL databases
5.12 Deploy an additional AD backup in the other region
Deploy AD in a multi-data-center architecture using AD's own cluster configuration tools
5.13 Final cross-region architecture
5.14 Failover of specific applications
Selective deployment: for example, to be ready for a failure of the Tokyo region, the Singapore region can run some components (such as content initialization) by default; consider using RIs (reserved instances) for these
5.15 If AZ1 in the region has a problem, switch to AZ2
5.16 If the whole region has a problem, fail over across regions
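The two failover steps (first to the second AZ, then to the other region) amount to walking a preference list until a healthy site is found. The sketch below is plain Python with no AWS calls; the site labels, the preference order, and the health map are assumptions for illustration, and in a real deployment health would come from checks such as those run by the load balancer or DNS service.

```python
from typing import Dict, List, Optional, Tuple

Site = Tuple[str, str]  # (region, availability zone); labels are illustrative

# Preference order: primary AZ, second AZ in the same region, then the standby region.
PREFERENCE: List[Site] = [
    ("tokyo", "az1"),
    ("tokyo", "az2"),
    ("singapore", "az1"),
]


def choose_target(health: Dict[Site, bool]) -> Optional[Site]:
    """Return the most preferred healthy site, or None if every site is down.

    A site missing from `health` is treated as unhealthy.
    """
    for site in PREFERENCE:
        if health.get(site, False):
            return site
    return None


all_up = {site: True for site in PREFERENCE}
az1_down = {**all_up, ("tokyo", "az1"): False}
tokyo_down = {**az1_down, ("tokyo", "az2"): False}

print(choose_target(all_up))      # stays on the primary AZ
print(choose_target(az1_down))    # 5.15: switch to AZ2 in the same region
print(choose_target(tokyo_down))  # 5.16: fail over to the other region
```

Keeping the order explicit makes the intent testable: traffic leaves the region only when both of its AZs are unhealthy.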
6. Summary
6.1 Benefits of using AWS for cloud resiliency
- The cloud's own services are designed for availability and are ready to use off the shelf
- Fine-grained control makes RTO/RPO trade-offs achievable
- Disaster preparedness systems and plans are easy to deploy and test
- Global deployment is within reach
- A large ecosystem of partners
- In addition ..... A/B testing, grayscale deployment, and so on can be implemented
6.2 Stepping up high availability and disaster backup
- Start from traditional data center disaster preparedness
- Move to cloud disaster preparedness progressively
- Backup, archive, and restore with S3
- Pilot light disaster preparedness
- Warm standby disaster preparedness
- Hot standby disaster preparedness
- Migrate production systems to the cloud and keep disaster preparedness in the local machine room
- Add hot standby in the cloud
- Enhance high availability for major applications
- Reach a high level of cross-region disaster preparedness, with multiple regions further improving availability
"Summary" leverages AWS for high availability and cloud disaster preparedness