Ctrip's website was unreachable yesterday because of a server failure, and many of you probably saw the pictures of the Ctrip technology building lit up all night while staff rushed to restore service. So what should you do when a server goes down? This article walks through a contingency plan for server failures.
First, identify which kind of factor caused the server to fail:
1. External attack
2. Internal attack
3. Operations and maintenance errors
What to do about server downtime: a server failure contingency plan
Whether the cause of a failure is external or internal, sound backup and redundancy measures can keep downtime to a minimum.
Backup issues
It may sound incredible, but in practice many enterprises have never set up a tested backup system. The point of a backup is that the production system can be quickly restored or rebuilt at a critical moment. In corporate networks, the problems that keep recurring are:
A flaw in one of the backup steps means the backup process never completes correctly
Storage space runs out after a while, so every subsequent backup fails
The backup media is damaged, so the data cannot be restored
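The three problems above can each be guarded against with a small check. The following is a minimal sketch in Python, not a complete backup tool: the function names, the `min_free_ratio` threshold, and the single-file scope are all assumptions made for illustration. It refuses to run when the backup volume is short on space, and it compares checksums after copying so that an incomplete or corrupted copy is detected immediately rather than at restore time.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(source: Path, dest_dir: Path, min_free_ratio: float = 0.1) -> Path:
    """Copy source into dest_dir and verify the copy.

    min_free_ratio is a hypothetical policy knob: the fraction of the
    backup volume that must remain free after the copy.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    usage = shutil.disk_usage(dest_dir)
    needed = source.stat().st_size

    # Guard against the "storage space ran out" failure mode.
    if usage.free - needed < usage.total * min_free_ratio:
        raise RuntimeError("not enough free space on the backup volume")

    target = dest_dir / source.name
    shutil.copy2(source, target)

    # Guard against incomplete copies and damaged media.
    if sha256_of(source) != sha256_of(target):
        raise RuntimeError("backup verification failed: checksum mismatch")
    return target
```

A checksum comparison only proves that the copy matched at backup time. A periodic test restore is still the only way to confirm that a backup can actually be recovered.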
Traditionally, tape has been regarded as an ideal backup medium because of its low cost and high storage density. However, several serious drawbacks of this traditional medium often leave the data on it inaccessible:
The tape index card is missing
Tape media is susceptible to external magnetic fields while in storage
The media itself is damaged
The tape drive fails while the media is being read
In addition, backup tapes are usually kept in a tape warehouse, and the time needed to retrieve the required tapes from the warehouse, transport them to the data center, and reload the data is often considerable.
Even with a backup system in place, not every accident can be withstood. In 2014, a fire at a Samsung data center interrupted its cloud services. Without offsite backups, a fire like this makes it extremely difficult to restore from local backups.
Redundancy matters when a sudden incident hits and service has to be restored as quickly as possible, or kept running throughout. This month, a well-known payment company suffered a service interruption because of a network connectivity failure in its data center. With a better redundancy scheme, the impact of such an incident would have been smaller, and it might even have shrunk into an internal event that users never noticed.
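One way redundancy turns an outage into an internal event is failover: when the primary site stops answering, traffic is sent to a standby. The sketch below shows only the selection logic, in Python. The endpoint names and the `is_healthy` probe are hypothetical placeholders; a real probe would be whatever health check the service already exposes.

```python
from typing import Callable, Iterable, Optional


def pick_healthy(
    endpoints: Iterable[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first endpoint whose probe succeeds, or None if all fail.

    Endpoints are tried in order, so list the preferred (primary) site first.
    """
    for endpoint in endpoints:
        try:
            if is_healthy(endpoint):
                return endpoint
        except Exception:
            # A probe that raises is treated the same as an unhealthy endpoint.
            continue
    return None


if __name__ == "__main__":
    # Hypothetical sites; the primary is simulated as being down.
    sites = ["dc-a.example.internal", "dc-b.example.internal"]
    down = {"dc-a.example.internal"}
    print(pick_healthy(sites, lambda site: site not in down))
```

Passing the probe in as a function keeps the selection logic independent of how health is measured, which also makes it easy to test without a network.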
Most servers have two independent power supply units (PSUs), so the failure of either one does not affect normal service. The two PSUs are generally connected to two different circuits or uninterruptible power supplies (UPS) to guard against a mains failure. Data centers back their power supply with UPS units and diesel generators so that service is not interrupted when the utility cuts power without notice. The same applies to the network: connecting to several ISP lines, each with independent cabling, and announcing the addresses over multiple lines at once makes the network service more robust.
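The benefit of doubling up components can be put into rough numbers. The sketch below, in Python, uses the standard formulas for components in parallel (any one survivor is enough) and in series (all must work). The availability figures are made-up illustrations, not measurements, and the parallel formula assumes the components fail independently, which is exactly why the two PSUs should sit on different circuits and the ISP lines should use independent cabling.

```python
from math import prod
from typing import Iterable


def parallel_availability(availabilities: Iterable[float]) -> float:
    """Availability of redundant components where one survivor is enough.

    Assumes failures are independent of each other.
    """
    return 1.0 - prod(1.0 - a for a in availabilities)


def series_availability(availabilities: Iterable[float]) -> float:
    """Availability of a chain in which every component must work."""
    return prod(availabilities)


if __name__ == "__main__":
    # Illustrative figures only.
    single_feed = 0.99   # one power feed
    single_isp = 0.995   # one ISP line

    power = parallel_availability([single_feed, single_feed])
    network = parallel_availability([single_isp, single_isp])

    print("no redundancy:  ", series_availability([single_feed, single_isp]))
    print("with redundancy:", series_availability([power, network]))
```

If both feeds share a circuit, or both lines share a conduit, a single event takes out both and the independence assumption no longer holds, so the real figure will be lower than the formula suggests.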
From the perspective of the system as a whole, only a combination of backup and redundancy can improve availability and prevent long service interruptions caused by factors outside your control.
That concludes this introduction to contingency plans for server downtime.