Disaster Recovery for ORALCE databases

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

With the rapid development of office automation and electronic commerce, the dependence of enterprise on information system is more and more high, and database plays an important role as the core of information system. Especially in some industries with high data reliability requirements such as banks, securities, telecommunications and so on, if there is an accidental downtime or loss of data will be very heavy. This database administrator should develop a detailed database backup and disaster recovery strategy for specific business requirements, and rigorously test every possible situation by simulating failures to ensure high availability of data. Database backup is a long-term process, and recovery is only after an accident, recovery can be seen as the reverse process of backup, the degree of recovery depends largely on the situation of backup. In addition, the correctness of the steps taken by the database administrator during the recovery has a direct impact on the final recovery results, this article mainly for the Oracle database may encounter a variety of failures to provide a corresponding recovery method, for everyone's reference.
To have a clear understanding of Oracle database Backup and recovery, it is necessary to have a good understanding of several running states of the database. The operating state of an Oracle database is divided into 3 main categories, which are:
L Nomount (not installed) Oracle simply reads the configuration information in the INI file and initializes the SGA area.
L mount (install) Oracle reads the INI file as well as read the control file and obtains information about the physical structure of the database.
L Open Database to check that all files are at the same point in time, restore errors to rollback for unfinished transactions, and eventually allow users to access.

Database backup mainly divided into three types: cold backup, hot backup; logical backup;
The backup of the database is not the focus of this article, here is a brief overview, Oracle database backup mainly:
Cold backup is a database full backup that includes all data files, control files, online log files, and INI files in the state of shutting down the database.
Hot backup refers to backing up the data files and control files while the database is running, and you must run the database in the (Archive Log) archive mode to use hot backups.
L Export (logical backup) This is the simplest backup method that can be exported by a table, a user, or an entire database in a database, and supports full, cumulative, and incremental three ways. With this method, the database must be open, and if the database is not in the restrict state it will not guarantee the consistency of the exported data.

The recovery of database can be divided into two main categories: complete recovery; incomplete recovery;
Full recovery refers to restoring the database to a point in time of failure without losing any data. Incomplete recovery refers to restoring a database to a point in time before a failure occurs, and all changes after this point in time will be lost. If there are no special requirements, we recommend that full recovery should be used as far as possible.
The Oracle Database recovery process takes place in two steps, starting with all the redo in the redo log file to the data file, and then rolling back all uncommitted transactions in the redo so that all data is restored to the moment of disaster. The recovery of a database can only be done on the data file before the failure, restoring it to the point of failure, and the data file cannot be rolled back to a previous time. For example, we have a 2001/1/1 database backup, and when 2001/5/1 brings us to the confusion of data in the database and wants to restore the database to 2001/4/30, we can only recover the 2001/1/ 1 database backup and then use the redo record on it to roll it forward to the 2001/4/30 state, instead of rolling back the 2001/5/1 database to 2001/4/30.
In order to design the database recovery scheme, we first classify the errors that may be encountered, the Oracle database errors are divided into 5 main categories:
L SQL statement failed
L Thread Failure
L Instance failed
L User Action failed
L Storage Device failed
If the first three failures occur without our human intervention, the Oracle system will automatically recover. For user-operated failures (such as mistakenly deleting data), the main remedial action we take is to import the most recent logical backup or to an incomplete recovery at a point in time. A tablespace-based point-in-time recovery (TSPITR) has been introduced from a new version after Oracle 8 to restore the tablespace containing the error actions to a specified time, without having to perform incomplete recovery of the entire database. You can also consider using Logminer to generate reverse SQL when the error action is found to be timely and the amount of data is small.

The failure of the storage device is more complex is the focus of this article, the failure of the storage device will inevitably make the file placed on it become unavailable, we first of all the Oracle database involved in a partition, the main can be divided into:
Oracle's system files, refers to Oracle's running files, various applications
L Database Control files
L database redo log files online
L Data Files
L Archive Log files
Avoiding the first file failure relies primarily on system administrators for operating system-level backups, which can only be recovered by operating system backups after an incident occurs.
The control file records the structure of the entire database, the status of each data file, the system SCN, checkpoint counters, and other important information, in the creation of the database will allow the user to specify three locations to store the control files, they mirror each other, when any one of them fails, You can restart the data by simply annotating it from the INI file and commenting out the failed data file. When all controls are invalidated, create controlfile can be executed in Nomount mode to regenerate the control file, but redo Log,data file, filename and address, and maxlogfiles,maxdatafiles must be provided. Maxinstances and other information. If ALTER DATABASE backup Controlfile to trace or ALTER DATABASE backup Controlfile to ' is run before failure xxx ' Backs up the control files, can be rebuilt using the generated script or overwritten with backup files, if the old control file is used to restore using the Recover XXX using Backup controlfile option on recovery. and use the Resetlogs option to open the database.
If the missing is an online log file, 1, missing, inactive log files are handled in two cases; 2, the currently active log file is missing.
If this is the first case, the failed log filegroup has more than one member, you can shutdown the database, and then use the operating system command to overwrite the corrupted log member file with the corrupt member file (each of the log files in the same log member group is mirror-like). If its physical location is not available, copy it to a new drive, change the file location using ALTER DATABASE rename file ' xxxx ' to ' xxxx ', and then start the database if a cold backup is done normally. If there is only one log member in the corrupted log group, mount the database first, convert it to Noarchivelog mode, execute ALTER DATABASE add logfile member ' xxx ' to group ' X ' adds a member to the related group, then executes ALTER DATABASE drop logfile ' Bad_ File ' will delete the corrupted log file, because the database structure changes need to back up the control file, then change the database back to Archivelog mode, do a cold backup.
If the current active log file is missing, the database is not mirrored and all members of the current log group become unavailable. First, the database is shutdown abort, all the data files are recovered from the most recent full backup, and the database is booted to the Mount State. If the original log file physical location is not available, change the location of the file using ALTER DATABASE rename file ' xxx ' to ' xxx '. Then, use the Recover DB until cancel command to recover the database until you are prompted to use the last archive log, and then enter Cancel. Then open the database with ALTER DATABASE open resetlogs, and if there is no problem, make a cold backup immediately. Attention! All information contained in the corrupted redo log will be lost, meaning that data submitted before the database crash is likely to be lost. This can be costly for some demanding applications, so try to have multiple log members per log group and place them on a different drive to prevent media failure.
Data file failure is also divided into several situations, 1, loss of data files contained in the system table space, 2, the loss of the non-rollback segment of the system data files, 3, missing a rollback segment of the non-system data files.
If the corrupted data file is the system tablespace. The only way is to recover the corrupted data file from the previous backup (if the original location is not available to change the location of the new file using the ALTER DATABASE Rename command), and then execute the recover in the state of the database Mount database/ DataFile the database to open the database by replying to it. Note: When the system table space or its data files are offline, the database cannot be opened, so all recovery efforts must be completed in the Mount state.
When a missing data file does not belong to the system tablespace and does not include a rollback segment, there are options to recover in both states of the database---in the state of the database open or in the database mount. If a user is anxious to access data in the undamaged part of the database or restore a corrupted data file for a long time, you can take the damaged data file offline, open the database to the user, and then recover the corrupted data file and then bring it back online. The steps are as follows: When the database is mount, the relevant data file or TABLESPACE is taken offline ALTER DATABASE datafile XXX offline, and then the databases open so that the part of the database that is not damaged can be accessed by the user before the recover Datafile/tablespace, when finished, use ALTER DATABASE datafile/tablespace ' xxx ' online to bring it back online to be accessed. Of course, users can also choose to be in the database Mount state, with recover database/datafile all the recovery work done, all the data files open together for users to access.
If the missing data file is the last case, a non-system Tablespace data file with a rollback segment is included. You can also choose whether you want to restore in the open state of the database or in the Mount state. Unlike the previous case, however, when the data file containing the rollback segment is corrupted, if the first offline opens the database, the tables that are involved in uncommitted transactions before all database crashes are inaccessible, meaning that the objects involved before the rollback segment recovery are not allowed to be accessed. And when all data files containing rollback segments are in the offline state, the database cannot perform any DML operations, so you can create several temporary rollback segments for data to use the Create rollback when you restore the data file that contains the rollback segment in database open state segment Temp1 tablespace system; Alter rollback segment TEMP1 online; When the data file is restored, delete the alter rollback segment TEMP1; Drop rollback segment Temp1;. Note: When you use this method to make the recovered data file online, all the original rollback segments will be in offline state, and you must manually use alter ROLLBACK segment RBSXX online to bring them back online so that they can be used properly by the database. This step is not required if all recoveries are completed under the database Mount State.
If the data file is lost, the user discovers a backup of the data file before the failure, and has not opened the database since the missing data file was first established without using the resetlogs option. That is, the user's control file was created before the corrupted data file was established, and the archive log contains all the redo records for the corrupted data file. There is also a recovery method that allows the user to take the corrupted data file or tablespace offline ALTER DATABASE datafile/tablespace XXX offline before executing ALTER DATABASE create DataFile ' new/ Xxx.dbf ' as ' old/xxx.dbf ', the database rebuilds an empty data file based on the information saved in the control file, then executes the recover tablespace/datafile to apply all the redo records to the data file so that it is fully restored to the current state. You can then bring it back online again.
If the missing is the last file---the physical location of the archive or archive is not available, first shutdown the database and immediately makes a cold backup. Then modify the archive log file destination path in the INI file to restart the database. A disaster in the future just restore the relevant files from the most recent backup, and the database does not need to back up the lost archive when it is recover. Parameters such as Log_archive_duplex_dest and log_archive_dest_1...5 are provided in a new release after Oracle 8 to allow for the reliability of the archive by keeping multiple archives in different locations, or even to remote servers.

Finally, a few points to note when database recovery:
1. This article discusses the default premise for all cases that the database is running in an archive (Archivelog) mode and involves only common scenarios and most basic recovery methods. Using the recovery manager provided by Oracle, Rman can do the same, and it is recommended that Rman be used for backup and recovery if the running environment is more complex.
2. Once the database is in disaster, it is best to make a full cold backup before the recovery so that errors can be remedied when the recovery occurs. A large part of the data loss is caused by improper recovery operations.
3. Once the database has been restored, especially after using the Resetlogs option to open the database, close the database immediately for a full cold backup. Because, to prevent discarded redo logs from being used again the next time you restore, the resetlogs option will re-create the Redo log file and clear its count to zero, which will make all previously made backups unusable (in general).
4. Special attention should be paid to complete recovery of the database, from the failure of the time before the backup to recover the damaged file, do not make the backup of the redo log file cover the current redo log file, otherwise you can only do incomplete recovery and to lose part of the data.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Disaster Recovery for ORALCE databases

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Disaster Recovery for ORALCE databases

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support