Disaster-tolerant design needs a clear line of thinking that lets us see the big picture while still taking care of the details. Business requirements should lead the design, rather than the specific features of a product. I have summarized the approach in the following three steps:
I. A deep understanding of business needs
The figure above, excerpted from this article, shows some business parameters.
We focus on a few of these elements:
RTO (Recovery Time Objective): the period within which the application must be restored after a disaster.
RPO (Recovery Point Objective): the period of data loss that can be tolerated after a disaster occurs.
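To make the two definitions concrete, here is a minimal sketch that checks a single simulated outage against both objectives. The function name, the timestamps, and the target values are all hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def meets_objectives(last_replica, disaster, restored, rto, rpo):
    """Return (rto_ok, rpo_ok) for one simulated outage.

    last_replica: time of the last data copy that survived the disaster
    disaster:     time the disaster occurred
    restored:     time the application was back in service
    """
    data_loss_window = disaster - last_replica  # data written in this window is lost
    downtime = restored - disaster              # time the application was unavailable
    return downtime <= rto, data_loss_window <= rpo


# Hypothetical outage: last replica 10 minutes before the disaster,
# service restored 3 hours after it.
disaster = datetime(2024, 1, 1, 12, 0)
rto_ok, rpo_ok = meets_objectives(
    last_replica=disaster - timedelta(minutes=10),
    disaster=disaster,
    restored=disaster + timedelta(hours=3),
    rto=timedelta(hours=4),      # assumed target
    rpo=timedelta(minutes=15),   # assumed target
)
print(rto_ok, rpo_ok)
```

RTO is measured forward from the disaster (how long until service is back), while RPO is measured backward from it (how far back the last usable copy of the data lies).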
In theory, of course, the smaller the RTO and RPO a disaster-recovery solution supports, the better. In practice there is no need to chase the minimum values: that is over-engineering, and it comes at a high cost. A good architect should provide a solution that meets the actual need, seen from the customer's perspective.
When communicating with customers, be sure to ask what the RTO and RPO values are, and why. Often you will find that no one can give a clear answer. The analysis needs to start from the application. For example, some applications have already achieved high availability on their own, with MSCluster, LVS and so on, so the infrastructure supporting them does not need much additional disaster tolerance. Most of the time, the hypervisor's own HA is sufficient.
Risk
Risk is considered in terms of severity and likelihood. For example, financial institutions are very demanding; one of my clients cannot accept the huge losses caused by system downtime, so after the risk assessment they required zero RTO and zero RPO.
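One common way to combine the two dimensions is a simple risk matrix. The sketch below is illustrative only: the 1 to 5 scales, the thresholds, and the example risks are assumptions, not values from any particular assessment.

```python
def risk_score(severity, likelihood):
    """Multiply severity by likelihood, each rated on an assumed 1-5 scale."""
    if not (1 <= severity <= 5 and 1 <= likelihood <= 5):
        raise ValueError("severity and likelihood must be between 1 and 5")
    return severity * likelihood


def risk_level(score):
    """Map a score to a coarse level; the thresholds are hypothetical."""
    if score >= 15:
        return "high"
    if score >= 6:
        return "medium"
    return "low"


# Hypothetical risks: name -> (severity, likelihood)
risks = {
    "site-wide power loss": (5, 2),
    "single host failure": (2, 4),
    "storage array failure": (4, 2),
}

# List the risks from highest to lowest score.
for name, (sev, lik) in sorted(risks.items(), key=lambda kv: -risk_score(*kv[1])):
    score = risk_score(sev, lik)
    print(f"{name}: score={score}, level={risk_level(score)}")
```

Risks that land in the highest level are the ones that justify demanding targets such as the zero RTO and zero RPO mentioned above.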
II. Consideration of the factors affecting key architecture decisions
Site:
Local: some disaster-recovery solutions can be implemented locally and still meet the customer's needs.
Dedicated DR site: whether a dedicated DR site is needed is determined by the company's IT strategy and its plans for sustainable development. Of course, the cost impact is very large.
Shared DR site: the site serves as the disaster-recovery site and may have other uses as well.
Cloud-based recovery: a cloud service provider's disaster-recovery offering can be considered. For example, the VMware Hybrid Cloud (vCHS) has recently launched a dedicated disaster-recovery offering.
Storage replication
Software: data synchronization is achieved entirely in software, with no reliance on SAN replication.
SAN-based: most high-end storage devices support SAN-based replication themselves. If you have special requirements, you can also use software to achieve more advanced SAN replication, such as EMC RecoverPoint.
Network between data centers
DR dedicated: a link used exclusively for DR.
MPLS: a public (shared) network.
Measure whether the disaster-tolerant solution meets the RTO and RPO requirements based on the available bandwidth and the volume of data to be synchronized.
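As a rough illustration of that measurement, the sketch below estimates how long a batch of changed data takes to cross the inter-site link and compares it with the RPO. The function names, the 70% link-efficiency factor, and the sample figures are assumptions; real sizing should use measured change rates and measured throughput.

```python
def transfer_seconds(data_gb, link_mbps, efficiency=0.7):
    """Estimate the seconds needed to replicate data_gb over the link.

    Uses decimal units (1 GB = 8000 megabits). `efficiency` is an assumed
    fraction of the nominal bandwidth that replication actually achieves.
    """
    megabits = data_gb * 8 * 1000
    return megabits / (link_mbps * efficiency)


def meets_rpo(peak_changed_gb, link_mbps, rpo_minutes, efficiency=0.7):
    """True if the largest expected batch of changes replicates within the RPO."""
    return transfer_seconds(peak_changed_gb, link_mbps, efficiency) <= rpo_minutes * 60


# Hypothetical workload: 5 GB of changes at peak, 15-minute RPO.
for mbps in (100, 20):
    seconds = transfer_seconds(5, mbps)
    print(f"{mbps} Mbps: {seconds:.0f} s, meets RPO: {meets_rpo(5, mbps, 15)}")
```

The same transfer-time estimate can be reused for the RTO whenever recovery involves copying data between sites before the application can start.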