Emergency fault handling for Oracle Systems

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Oracle
Oracle Physical Structure Failure treatment method:
An Oracle physical structure failure is a variety of database failures that result from corruption of each physical file that constitutes a database. These failures may be caused by a hardware failure or by a human error. So we first need to determine the cause of the problem, if it is a hardware failure first to solve the hardware problem. In the absence of hardware problems, we can follow the following processing side to further processing.

Control file corruption:
The control file records important configuration information about Oracle, such as database name, character set name, individual data files, location of log files, and so on. Controlling file corruption can cause the database to shut down abnormally. Once the control file is missing, the database does not start, which is a more serious error.
You can locate a corrupted control file by querying the database's log file. The log file is located in $oracle_base/admin/bdump/alert_orcl.ora.

To corrupt a single control file:
1. Make sure that the database is turned off without using the following command to close the database:
Svrmgrl>shutdown immediate;
2. View initialization file $oracle_base/admin/pfile/initorcl.ora, and determine the path to all control files.
3. Overwrite the wrong control file with the other correct control file with the operating system command.
4. Restart the database with the following command
svrmgrl>startup;
5. Use the appropriate method for database full backup.

Damage to all control files:
1. Make sure that the database is turned off without using the following command to close the database:
Svrmgrl>shutdown immediate;
2. Restore the most recent control files from the corresponding backup result set. For points that do not take a library backup, the closest control file backup can be recovered directly from the tape to the appropriate directory, and the appropriate Rman script is used to restore the most recent control file for points with a library backup.
3. Use the following command to create a script that produces a database control file:
Svrmgrl>startup Mount;
Svrmgrl>alter database backup controlfile to trace noresetlogs;
4. Modify the trace file generated in step three to copy and modify some of the statements about creating the control file so that it can reflect the latest database structure. Assume that the resulting SQL file name is createcontrol.sql.
Attention:
The exact path of the trace file can be determined by viewing the $oracle_base/admin/bdump/alert_orcl.ora file after performing the 3rd step.
5. Recreate the control file with the following command:
Svrmgrl>shutdown abort;
Svrmgrl>startup Nomount;
Svrmgrl> @createcontrol. sql;
6. Use the appropriate method for database full backup.

Redo log file corruption:
The database of all the increase, deletion, change will be recorded in the Redo log. If the currently active redo log file is corrupted, it causes the database to shut down abnormally. Inactive redo logs end up with log switching becoming active redo logs, so corrupted inactive redo logs can eventually cause the database to terminate abnormally. Each group of redo logs in Ipas/mswitch has only one member, so in the following analysis only the corruption of the Redo log group is considered, regardless of the corruption of individual redo log members.

Determine the location of the corrupted redo log and its status:
1. If the database is in a usable state:
SELECT * from V$logfile;
Svrmgrl>select * from V$log;
2. If the database is terminated abnormally:
Svrmlgr>startup Mount;
Svrmgrl>select * from V$logfile;
Svrmgrl>select * from V$log;
Where the logfile state is invalid to indicate that the log file has been corrupted; log state is inactive: The Redo log file is inactive; active: Indicates that the Redo log file is in the active state ; Current: Indicates that the redo log is the log file that is currently in use.

The corrupted log file is not active:
1. Delete the corresponding log group:
Svrmgrl>alter database drop logfile Group group_number;
2. Re-create the corresponding log group:
Svrmgrl>alter Database Add log file group Group_number (' Log_file_descritpion ', ...) ) Size log_file_size;

The corrupted log file is active and is not in the current log:
1. Clear the corresponding log group:
Svrmgrl>alter Database Clear unarchived logfile Group Group_number;

The corrupted log file is the current active log file:
To clear the appropriate log group with a command:
Svrmgrl>alter Database Clear unarchived logfile Group Group_number;
If the purge fails, you can only do incomplete recovery based on Point-in-time.
Open the database and use the appropriate method for full database backup:
Svrmgrl>alter database open;

Partial data file corruption:
If the corrupted data file belongs to a non-system tablespace, the database can still be open for operation, except that the corrupted data file is inaccessible. In this case, the corrupted data file can be recovered separately in the database open state. The database system terminates abnormally if the data file in the system table space is corrupted. The database can only be opened in Mount mode before the data file is restored. You can view the database log file to determine whether the currently corrupted data file belongs to the system tablespace.

Corrupted data file in non-system tablespace
1. Determine the name of the damaged file:
Svrmgrl>select name from V$datafile where status= ' INVALID ';
2. The corrupted data file is in the offline state:
Svrmgrl>alter database datafile ' datafile_name ' offline;

3. Restore the most recent backup of this data file from the corresponding backup result set. Points that do not take a library backup can be recovered directly from the tape, and the corresponding Rman script is used for points with a library backup.
4. Recover Data files:
Svrmgrl>alter database recover datafile ' file_name ';
5. Make database file online:
Svrmgrl>alter database datafile ' datafile_name ' online;
6. Use the appropriate method for database full backup.

Corrupted data file for the system tablespace:
1. Start the database by Mount method
Svrmgrl>startup Mount;
2. Restore the most recent backup of this data file from the corresponding backup result set. Points that do not take a library backup can be recovered directly from the tape, and the corresponding Rman script is used for points with a library backup.
3. Restore system table Space:
Svrmgrl>alter database recover datafile ' datafile_name ';
4. Open the database:
Svrmgrl>alter database open;
5. Use the appropriate method for database full backup.

Table Space Corruption:
If the system table space is corrupted, the database can still be open for operation, but the corrupted tablespace cannot be accessed. This allows for a separate recovery of the corrupted tablespace in the case of a database open state. The database system terminates abnormally if the system table space is corrupted. The database can only be opened in Mount mode, and then the table space is restored. You can view the database log file to determine whether the currently corrupted tablespace is the system tablespace.

Non-system table space corruption:
1. Place the damaged tablespace in a offline state:
Svrmgrl>alter tablespace ' tablespace_name ' offline;
2. Restore the most recent backup of this table space from the corresponding backup result set. Points that do not take a library backup can be recovered directly from the tape, and the corresponding Rman script is used for points with a library backup.
3. Restore the tablespace:
Svrmgrl>alter database recover tablespace ' tablespace_name ';
4. Make Table space Online:
Svrmgrl>alter tablespace ' tablespace_name ' online;
5. Use the appropriate method for database full backup.

system table space is corrupted:
1. Start the database by Mount method
Svrmgrl>startup Mount;
2. Restore the most recent backup of the system table space from the corresponding backup result set. Points that do not take a library backup can be recovered directly from the tape, and the corresponding Rman script is used for points with a library backup.
3. Restore system table Space:
Svrmgrl>alter database recover tablespace system;
4. Open the database:
Svrmgrl>alter database open;
5. Use the appropriate method for database full backup.

All files in the entire database are corrupted:
The corruption of all the files in the entire database typically occurs when a shared disk array fails to recover from a disaster, in which case only the database can be recovered. If the archive directory of the database has also been lost, the database will not be able to do a full recovery, with the loss of user data.

No site with a library backup:
1. Unpack the various files from the tape to the appropriate directory with the most recent backup.
2. To open the database by Mount method:
Svrmgrl>startup Mount;
3. Restore the database:
Svrmgrl>recover database until cancel;
4. Open the database:
Svrmgrl>alter database open resetlogs;
5. Use the appropriate method for database full backup.

Live with a library backup:
1. Open the database in Nomount mode:
Svrmgrl>startup Nomount;
2. Soft recovery of database through the corresponding Rman script.
$rman CMDFILE=HOT_DATABASE_RESTORE.RCV
3. Open the database:
Svrmgrl>alter database open resetlogs;
4. Use the appropriate method for database full backup.

There are some classic emergency processing scenarios with the most recent database full cold backups:
Data files, archive redo logs and control files are lost or corrupted at the same time:
Status when no new archives:
Conditions and assumptions: New archive log (s) has not been generated since the last mirrored backup; Archivelog Mode; Mirrored (cold) copies of DataFile (s) and control file (s) with synchronization
Recovery steps:
1. CC The mirrored copy of DataFile (s) and control file (s) back to the original location:
$ cp/backup/good_one.dbf/orig_loc/bad_one.dbf
$ cp/backup/control1.ctl/disk1/control1.ctl
2. Start the database with the Mount option:
$ svrmgrl
Svrmgrl> Connect Internal
Svrmgrl> Startup Mount
3. Restore the database with the old control file:
svrmgrl> Recover database using Backup controlfile until cancel;
Media Restore Complete
(Must cancel immediately)
4. Reset the logfiles (cannot be omitted for startup):
svrmgrl> ALTER DATABASE open resetlogs;
5. Close the database and do a whole-store cold backup.

Status of New archives:
Conditions and assumptions: New archive log (s) has been generated since the last mirrored backup; Archivelog Mode; A mirrored (cold) copy of the synchronized datafile (s) and control file (s); archive log (s) is available.
Recovery steps:
1. If the database is not closed, first close it:
$ svrmgrl
Svrmgrl> Connect Internal
svrmgrl> Shutdown Abort
2. CC The backup file back to the original location:
All database Files
All control files (without archive (s) or redo (s), the update of control files has no meaning)
All on-line Redo Logs (not archives)
Init.ora file (Options)
3. Start the database:
$ svrmgrl
Svrmgrl> Connect Internal
Svrmgrl> Startup

Data files, redo logs and control files are lost or corrupted at the same time:
Conditions and assumptions: Archivelog Mode; Mirror (cold) copy of all lost files that have been synchronized; archive log (s) available
Recovery steps (must take the approach of incomplete recovery):
1. If the database is not closed, first close it:
$ svrmgrl
Svrmgrl> Connect Internal
svrmgrl> Shutdown Abort
2. CC The backup file back to the original location:
All database Files
All Control Files
All on-line Redo Logs (not archives)
Init.ora file (Options)
3. Start the database but does not open:
Svrmgrl>startup Mount
4. Do incomplete database recovery, apply all from the last Mirror (cold) Backup accumulated archives:
svrmgrl> recover database until cancel using Backup controlfile;
......
......
Cancel
5. Reset the logfiles (cannot be omitted for startup):
svrmgrl> ALTER DATABASE open resetlogs;
6. Close the database and do a whole-store cold backup.

Data files and control files are lost or corrupted at the same time:
Conditions and assumptions: Archivelog Mode; Cold copy with synchronized datafile (s) and control file (s); archive log (s) available
Recovery steps:
1. CC Cold copy of Datafiles (s) and control file (s) back to the original location:
$ cp/backup/good_one.dbf/orig_loc/bad_one.dbf
$ cp/backup/control1.ctl/disk1/control1.ctl
2. Start the database with the Mount option:
$ svrmgrl
Svrmgrl> Connect Internal
Svrmgrl> Startup Mount
3. Restore the database with the old control file:
svrmgrl> recover database until cancel using Backup controlfile;
Media Restore Complete
(Cancel after applying the last archive log)
4. Reset the logfiles (cannot be omitted for startup):
svrmgrl> ALTER DATABASE open resetlogs;

When a redo log and control file is lost or corrupted at the same time:
Conditions and assumptions: control Files are all lost or damaged; Archivelog Mode; Mirror (cold) copy with control Files
Recovery steps:
1. If the database is not closed, first close it:
$ svrmgrl
Svrmgrl> Connect Internal
svrmgrl> Shutdown Abort
Svrmgrl>exit
2. Overwrite the corrupted control file with the mirror (cold) Copy of control file:
$ cp/backup/control1.ctl/disk1/control1.ctl
3. Start the database but does not open:
$ svrmgrl
Svrmgrl> Connect Internal
Svrmgrl> Startup Mount
4. Drop Broken redo log (troubleshooting hardware failure):
svrmgrl> ALTER DATABASE drop logfile Group 2;
5. Re-create redo log:
svrmgrl> ALTER DATABASE Add logfile Group 2 '/orig_loc/log2.dbf ' size 10M;
6. Restore the database with the old control file:
svrmgrl> recover database until cancel using Backup controlfile;
(Must cancel immediately)
7. Reset the logfiles (cannot be omitted for startup):
svrmgrl> ALTER DATABASE open resetlogs;
8. Close the database and do a whole-store cold backup

Only occurs when the archive redo log is missing or corrupted:
According to different circumstances and circumstances, choose one of the following means:
A. Backup all datafiles immediately (if the system uses a general hot backup or Rman hot backup)
B. Shut down the database and make a cold backup immediately (if the system uses cold backup)
C. Venture forward! Keep the database running without backing up until the next backup cycle. This is the bet that the database will not have to recover before the next backup cycle arrives.
Note: The risky choice: if you have an error that requires database recovery, you can only revert to the scene of the operation before the problem archive log. From another point of view, when the archive log (s) is problematic, the database itself does not have any problems if it does not need to be restored.

Oracle Logical Structure Failure processing method:
The fault of the logical structure generally refers to the situation that the important data is lost because of the human error operation. In this case the physical structure of the database is complete and consistent. For this situation to take a full recovery of the original database is not appropriate, we generally use three ways to restore user data.

Use the Exp/imp tool to recover user data:
This is only possible if the missing data has a backup that was previously used with the EXP command.
1. Create a temporary user within the database:
Svrmgrl>create user Test_user identified by test;
Svrmgrl>grant Connect,resource to Test_user;
2. In the files backed up from the previous exp command, drop the data in the user's way into the test user:
$imp System/manager file=export_file_name tables= (lost_data_table_name ...) fromuser=lost_data_table_owner touser=test _user constraint=n;
3. Restore the lost data from the test user to the original user with the corresponding DML statement.
4. Delete the test User:
Svrmgrl>drop user Test_user Cascede;

Use Logminer to restore user data:
Logminer is a log analysis tool provided by Oracle. It can analyze online logs and archived logs based on the data dictionary to obtain a history of the various DML operations of the database and fallback information for various DML operations. These users can then rejoin the database for lost data due to misoperation.
1. Confirm that the database Utl_file_dir parameters have been set, if not, you need to add this parameter to the Oracle initialization parameter file, and then restart the database. The following example assumes utl_file_dir= '/OPT/ORACLE/DB01 ';
2. Create the data dictionary information required by the Logminer, assuming that the generated data dictionary text file is Dict.ora:
Svrmgrl>execute dbms_logmnr_d.build (dictionary_filename=> ' Dict.ora ', dictionary_location=> '/opt/oracle/ Db01 ');
3. Determine the required analysis of the log or the scope of the archive log. This can determine the approximate log range based on the time that the user is mistakenly operating. The possible log files are/opt/oracle/db02/oradata/orcl/redo3.log and archived log '/opt/oracle/arch/orcl/orclarc_1_113.ora ' If the user is mistaken.
4. Create a list of the log files to parse by adding in the order of the log files:
Svrmgrl>execute dbms_logmnr.add_logfile (logfilename=> '/opt/oracle/arch/orcl/orclarc_1_113.ora ', options= &GT;DBMS_LOGMNR. NEW);
Svrmgrl> Execute dbms_logmnr.add_logfile (logfilename=> '/opt/oracle/db02/oradata/orcl/redo3.log ', options= &GT;DBMS_LOGMNR. AddFile);
5. Start the log analysis, assuming that the time required for analysis between ' 2003-06-28 12:00:00 ' and ' 2003-06-28 13:00:00 ':
Svrmgrl>execute DBMS_LOGMNR.START_LOGMNR (dictfilename=> '/opt/oracle/db01/dict.ora ', starttime=>to_date ( ' 2003-06-28 12:00:00 ', ' yyyy-mm-dd HH:MI:SS '), Endtime=>to_date (To_date (' 2003-06-28 13:00:00 ', ' yyyy-mm-dd HH:MI: SS '));
6. Obtain analysis results:
Svrmgrl>select Operation,sql_redo,sql_undo from V$logmnr_contents;
7. Repair data based on analysis results.
8. End LOGMNR:
svrmgrl>dbms_logmnr.end_logmnr;
9. Make database full backup of the original database with appropriate method.

Restore user data with backup:
Instead of restoring in the original database, this method uses a database backup to re-establish a new database on the new machine. Restore the database to a user error operation on the new machine through backup, so that the lost data can be recovered to the original database.
1. Install the database software on the new machine.
2. For the site with a library backup, you need to install and debug the appropriate backup tube software on the new database server.
3. Based on Point-in-time database recovery operation According to the time point of user misoperation. For sites that do not have a library backup, you can select the most recent backup tapes to restore before the user's misoperation, and the points that are only used with a library backup can be recovered through Rman scripts that are recovered based on the point-in-time recovery point.
4. Re-open the database:
Svrmgrl>alter database open resetlogs;
5. Retrieve the lost user data from the new database and restore it to the original database through DML operations.
6. Make database full backup of the original database with appropriate method.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Emergency fault handling for Oracle Systems

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Emergency fault handling for Oracle Systems

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support