Reflections on the Ctrip incident: it is time to pay attention to database disaster recovery!

Source: Internet
Author: User


Last week, the big news in the internet technology community was the outage of a major travel portal website. The outage was reportedly caused by the physical deletion of the site's back-end database, which resulted in the loss of all data. At the time of writing, the site's business has still not been restored, more than 10 hours after the outage began. The losses from this accident are estimated at more than 10 million yuan per hour.

Whether the accident was caused by an operational error, a device fault, or deliberate sabotage, a database problem brought down the entire website, and people have begun to pay serious attention to database security and high availability. I will use this incident as a starting point to share some views on database disaster recovery.

"What is database disaster recovery"

Database disaster recovery means running a second database at a different location alongside the primary database, kept synchronized with it in real time. If anything goes wrong with the primary database, the database at the other location can take over the business immediately, so the business is not interrupted.

In many traditional databases, disaster recovery is a very mature technology. For example, Oracle Data Guard and DB2 HADR are well-known cross-data-center database disaster recovery solutions. With such a mechanism in place, when the data center hosting the primary system fails, whether the system is killed or the entire data center is destroyed by an earthquake, the standby system can detect the failure and take over data services within a short time, so online services keep running.
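The "detect and take over" step can be illustrated with a heartbeat timeout. The following Python sketch is a simplified, hypothetical model, not how Data Guard or HADR actually work: the standby records heartbeats from the primary and promotes itself once the primary has been silent for longer than an assumed `timeout_s`.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical sketch: a standby that promotes itself to primary when no
# heartbeat has arrived from the primary within `timeout_s` seconds.
@dataclass
class StandbyMonitor:
    timeout_s: float = 3.0
    last_heartbeat: float = field(default_factory=time.monotonic)
    role: str = "standby"

    def on_heartbeat(self, now: Optional[float] = None) -> None:
        """Record that the primary is still alive."""
        self.last_heartbeat = time.monotonic() if now is None else now

    def check(self, now: Optional[float] = None) -> str:
        """Promote to primary if the primary has been silent for too long."""
        now = time.monotonic() if now is None else now
        if self.role == "standby" and now - self.last_heartbeat > self.timeout_s:
            self.role = "primary"  # take over data services
        return self.role


m = StandbyMonitor(timeout_s=3.0, last_heartbeat=0.0)
m.check(now=2.0)  # still "standby"
m.check(now=5.0)  # silent for 5 s, so the standby becomes "primary"
```

A timeout alone cannot tell a dead primary from a network partition. If the standby acts on the timeout alone, both sites may accept writes at once ("split brain"). Production systems therefore usually add fencing or a third-party witness before promoting the standby.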

From a technical perspective, a traditional disaster recovery solution generally ships logs from the master database to the slave database in sequence, and the slave replays the operations that were executed on the master. As a result, the slave lags the master by only a very small delay, and the two are essentially synchronized. In many newer distributed databases, however, disaster recovery solutions are not as widespread as in traditional databases. For many enterprises, data security risk is therefore an important factor when deciding whether to adopt a new distributed database.
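To make the log-shipping idea concrete, here is a minimal Python sketch over a toy key-value store. The log format `(lsn, op, key, value)` and the class names are hypothetical and do not correspond to any real product. The primary appends every change to an ordered log. The standby fetches only the entries after its last applied log sequence number (LSN) and replays them strictly in order.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical log entry: (lsn, op, key, value); LSNs start at 1.
LogEntry = Tuple[int, str, str, object]


@dataclass
class Primary:
    data: Dict[str, object] = field(default_factory=dict)
    log: List[LogEntry] = field(default_factory=list)

    def put(self, key: str, value: object) -> None:
        self.data[key] = value
        self.log.append((len(self.log) + 1, "put", key, value))

    def delete(self, key: str) -> None:
        self.data.pop(key, None)
        self.log.append((len(self.log) + 1, "delete", key, None))

    def ship_since(self, lsn: int) -> List[LogEntry]:
        """Return, in order, the entries the standby has not applied yet."""
        return self.log[lsn:]


@dataclass
class Standby:
    data: Dict[str, object] = field(default_factory=dict)
    applied_lsn: int = 0

    def replay(self, entries: List[LogEntry]) -> None:
        for lsn, op, key, value in entries:
            # Entries must be applied strictly in sequence; a gap means lost log.
            if lsn != self.applied_lsn + 1:
                raise ValueError(f"gap in log: expected {self.applied_lsn + 1}, got {lsn}")
            if op == "put":
                self.data[key] = value
            elif op == "delete":
                self.data.pop(key, None)
            self.applied_lsn = lsn


p, s = Primary(), Standby()
p.put("a", 1)
p.put("b", 2)
s.replay(p.ship_since(s.applied_lsn))  # the standby now mirrors the primary
```

The delay between `put` on the primary and `replay` on the standby is exactly the master-slave latency discussed in the next section.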

"Considerations for database disaster recovery"

Beyond basic data replication, what other important factors must a disaster recovery solution take into account?

1) Latency between the master and slave databases. Because the master and slave databases are deployed in different data centers, network latency between them is a factor that must be considered. The lower the latency between master and slave, the less data is lost when the master fails. For example, if the latency can be kept below one second, then when the site hosting the master suffers a man-made or uncontrollable disaster, failing over to the slave loses at most about one second of data. Compared with the paralysis of an entire portal website, such a loss is almost negligible for the enterprise.
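The link between replication latency and data loss can be expressed directly. The Python sketch below assumes a simplified model in which the slave lags the master by a fixed `lag_s` seconds. If the master fails at `failure_time`, every transaction committed in the last `lag_s` seconds has not reached the slave yet. The function names are hypothetical.

```python
from typing import List


# Simplified model (assumption): the slave lags the master by a fixed lag_s.
def lost_transactions(commit_times: List[float], failure_time: float,
                      lag_s: float) -> List[float]:
    """Commit timestamps that had not reached the slave when the master failed."""
    return [t for t in commit_times if failure_time - lag_s < t <= failure_time]


def expected_loss(writes_per_second: float, lag_s: float) -> float:
    """Approximate number of transactions lost at failover at a steady write rate."""
    return writes_per_second * lag_s


commits = [0.0, 0.5, 9.2, 9.6, 10.0]
lost_transactions(commits, failure_time=10.0, lag_s=1.0)  # [9.2, 9.6, 10.0]
expected_loss(2000, 1.0)  # about 2000 transactions at 2000 writes/s
```

Halving the lag halves the exposure, which is why replication latency is listed as the first consideration.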

2) Low bandwidth usage. Network bandwidth between the master and slave data centers is generally very expensive. Because the link between them usually crosses a WAN, you cannot assume 1 Gbit/s or 10 Gbit/s of bandwidth as you would on a LAN. The number of data channels used during transmission and the compression ratio achieved are therefore very important indicators.
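As an illustration of why the compression ratio matters, the following Python sketch batches replication records and compresses them with the standard `zlib` module before they would be sent over the WAN. The JSON record format is a hypothetical stand-in for a real replication log.

```python
import json
import zlib
from typing import List, Tuple


# Hypothetical sketch: compress a batch of replication records before WAN transfer.
def encode_batch(records: List[dict], level: int = 6) -> Tuple[bytes, float]:
    """Serialize and compress a batch; return the payload and its compression ratio."""
    raw = json.dumps(records).encode("utf-8")
    compressed = zlib.compress(raw, level)
    return compressed, len(raw) / len(compressed)


def decode_batch(payload: bytes) -> List[dict]:
    """Decompress and deserialize a batch on the receiving side."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


records = [{"op": "put", "table": "orders", "id": i, "status": "PAID"}
           for i in range(1000)]
payload, ratio = encode_batch(records)
```

Batching matters here because log records are highly repetitive, so compressing many of them together yields a much better ratio than compressing each small record on its own. The trade-off is extra CPU time and a little added latency while a batch fills up.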

3) Secure transmission channels. Since data travels across a wide area network, could someone outside the data center set up a sniffer and intercept the communication? If the master and slave always communicate in plain text, this is a serious security risk. Whether the communication between the master and slave data centers is encrypted is therefore the third important indicator.
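One common way to encrypt such a channel is TLS. The Python sketch below uses only the standard `ssl` and `socket` modules to build a client context for a hypothetical setup in which the standby connects to the primary. The certificate file arguments are placeholders you would supply. Hostname checking and certificate verification are already enabled by `create_default_context`, and loading a client certificate lets the primary authenticate the standby as well (mutual TLS).

```python
import socket
import ssl
from typing import Optional


def make_replication_tls_context(cafile: Optional[str] = None,
                                 certfile: Optional[str] = None,
                                 keyfile: Optional[str] = None) -> ssl.SSLContext:
    """Build a TLS client context for a (hypothetical) replication channel."""
    # Verifies the server certificate and hostname by default.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=cafile)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy protocol versions
    if certfile:
        # Present a client certificate so the primary can authenticate the standby.
        ctx.load_cert_chain(certfile, keyfile)
    return ctx


def open_replication_channel(host: str, port: int,
                             ctx: ssl.SSLContext) -> ssl.SSLSocket:
    """Connect to the primary and wrap the socket in TLS."""
    raw = socket.create_connection((host, port), timeout=10)
    try:
        return ctx.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```

With this in place, a sniffer on the WAN sees only ciphertext. Compression should be applied before encryption, because encrypted data does not compress.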

SequoiaDB Enterprise Edition pays close attention to these cross-data-center replication factors. Provided the network is stable, remote cross-data-center replication with SequoiaDB can keep the master-slave latency within a few seconds. In addition, its SSL data channel works together with data compression to make data transmission efficient, secure, and reliable.

"Accident prevention also requires sound permission management"

Of course, technology aside, many disaster scenarios are caused by people. Strict encryption cannot stop a DBA from selling data from the inside, and no firewall, however strict, can block a system administrator's "rm -rf". It is therefore necessary to train, monitor, and isolate internal staff.

First, for database administrators, it is recommended that the master and slave databases be managed by different teams. For example, the two data centers in Beijing and Shanghai can be operated by different DBA teams. Then, even if one team has problems, the operations and maintenance (O&M) of the other team's system are unaffected.

Second, many enterprises attach great importance to data backup, but a single team may be able to manage and modify the backup files at will. If members of that team turn malicious and destroy all production systems and backups together, the enterprise's entire IT system will be completely paralyzed. Isolating the management of historical data from that of online data is therefore also an important measure in an enterprise's internal security processes.
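Both recommendations are forms of separation of duties, which can be audited automatically. The Python sketch below assumes a hypothetical permission model in which each team holds a set of named scopes. The sketch flags any team holding two scopes that should never be combined: control of both the master and the slave site, or control of both production and backup deletion. The scope names are illustrative only.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical scope names; pairs that no single team should hold together.
CONFLICTING_SCOPES: List[Tuple[str, str]] = [
    ("admin:db-master-beijing", "admin:db-slave-shanghai"),  # master vs. slave site
    ("admin:production", "delete:backups"),                  # online data vs. backups
]


def separation_violations(grants: Dict[str, Set[str]]) -> List[Tuple[str, str, str]]:
    """Return (team, scope_a, scope_b) for every conflicting pair a team holds."""
    violations = []
    for team, scopes in sorted(grants.items()):
        for a, b in CONFLICTING_SCOPES:
            if a in scopes and b in scopes:
                violations.append((team, a, b))
    return violations


grants = {
    "dba-beijing": {"admin:db-master-beijing", "admin:production"},
    "dba-shanghai": {"admin:db-slave-shanghai"},
    "backup-ops": {"delete:backups"},
}
separation_violations(grants)  # [] because no team holds a conflicting pair
```

Running such a check whenever permissions change helps ensure that no single team, or single compromised account, can destroy both copies of the data.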

Finally, I deeply sympathize with the website over this outage, and I hope other companies will learn from it, pay more attention to data security, and avoid similar accidents.