-- ======================================
-- Oracle instance recovery
-- ======================================
I. Oracle instance failure
Oracle instance failures are mostly caused by instance non-consistent shutdown, which is usually called a crash ). The instance failure result is equivalent to shutdown abort.
Cause of instance failure
Power Load failure
Hardware faults
Background process failed
Shut down database exceptions
Status after instance failure
The database may lose committed transactions and store uncommitted transactions, resulting in Database Inconsistency.
Solution
Use startup to restart the instance. The instance implements automatic recovery and rolls back uncommitted transactions based on online log files.
View alarm logs and tracking logs to find out the cause of the fault.
Ii. Check Points
The checkpoint has been discussed in the architecture. The recovery of the instance is closely related to the checkpoint. Therefore, the checkpoint process is discussed again.
1. What is a checkpoint?
Is a database event used to reduce the crash recovery time. The checkpoint position determines the start position of instance recovery.
Triggered by background processes. When triggered, the ckpt process notifies the dbwn process to write dirty data in the data buffer to the data file.
The ckpt process is also responsible for updating the header information of the data file and the checkpoint information on the control file.
2. Checkpoint Triggering Conditions
Log switching (automatic switch or manual switch)
When the database is shut down with the immediate, transaction, and normal options
Manually trigger (alter system checkpoint)
Alter tablespace tablespace_name begin | end bakcup
Alter tablespace tablespace_name offline
Alter database datafile '<dir> 'offline
Alter tablespace | datafile read only
3. Checkpoint queue
Is a dirty database linked list
Each modified record package in the checkpoint queue has a unique data block identifier (log file number, block number, offset)
The earliest queue will be preferentially written to the data file (regardless of whether the queue is modified multiple times)
The earliest queue will be cleared from the queue after being written.
4. Checkpoint Classification
Full checkpoint
Before Oracle 8i, when a checkpoint occurs, Oracle writes all the data in the dirty buffer list to the data file, which is called a full checkpoint or a regular checkpoint.
Specific trigger conditions
Alter system switch logfile
Shutdown normal, immediate, transactional
Alter system checkpoint
Incremental checkpoint (fast-start checkpoint)
The checkpoint queue mechanism is introduced. Every second, ckpt updates the oldest RBA In the checkpoint queue to the control file. RBA (redo log block address) also serves as the starting point for instance recovery.
Incremental checkpoints are subdivided into full checkpoints so that data can be periodically written to the data file based on the oldest data blocks.
Each dirty block will be moved to the checkpoint queue and arranged according to the redo block address modified by LRBALow RBA for the first time.
The low rba value of the data block written to the checkpoint queue is the smallest. Even if the minimum queue in the queue is modified multiple times, the order of the data block in the checkpoint queue does not change after modification.
When incremental check points are executed, DBWn ensures that the first modified data is written in the order of LRBA In the checkpoint queue to improve the checkpoint.
In this case, the ckpt process uses the lightweight control file update protocol to write the currently lowest RBA to the control file.
Ckpt does not rewrite the checkpoint information and data file header information of the data file in the control file when performing lightweight updates.
Only records the control file checkpoint SCN and writes RBA information based on the incremental checkpoint.
By converting a full checkpoint to an incremental checkpoint, the instance recovery time is greatly shortened.
Note: updating the data file header and control file lags behind the occurrence of checkpoint events.
Trigger of incremental checkpoint
Log_checkpoint_interval, log_checkpoint_timeout,
Value set for fast_start_io_target and fast_start_mttr_target
Minimum Log File Size
Number of dirty blocks in Buffer Cacha
Some checkpoints
Dirty data in the tablespace is written to the disk.
Triggered by alter tablespace tablespace_name offline
5. Differences between full checkpoints and incremental checkpoints
The full checkpoint will write the checkpoint information to the control file and data file at the same time.
The incremental checkpoint only writes RBA to the control file.
6. view the checkpoint information and set the LOG_CHECKPOINTS_TO_ALERT parameter to true.
Alter system set LOG_CHECKPOINTS_TO_ALERT = TRUE;
-- View log_checkpoints_to_alert Parameters
SQL> SHOW PARAMETER log_checkpoints_to_alert
NAME TYPE VALUE
-----------------------------------------------------------------------------
Log_checkpoints_to_alert boolean FALSE
-- Set the log_checkpoints_to_alert Parameter
SQL> ALTER SYSTEM set log_checkpoints_to_alert = TRUE;
System altered.
-- Clear the alarm Log File
SQL> ho cat/dev/null>/u01/app/oracle/admin/orcl/bdump/alert_orcl.log
-- Check that the control file in the data file header is 3037172
SQL> SELECT file #, status, tablespace_name,
2 dbms_flashback.get_system_change_number cur_scn,
3 to_char (resetlogs_time, 'yyyy-mm-dd hh24: mi: ss') rst_dt,
4 resetlogs_change # rst_scn,
5 to_char (checkpoint_time, 'yyyy-mm-dd hh24: mi: ss') ckpt_dt,
6 checkpoint_change # ckpt_scn, checkpoint_count ckpt_cnt
7 FROM v $ datafile_header;
FILE # STATUS TABLESPACE_NAME CUR_SCN RST_DT RST_SCN CKPT_DT CKPT_SCN CKPT_CNT
--------------------------------------------------------------------------------------------------------------
1 online system 3037641 2837290 11:59:23 3037172 19:05:30 531
2 ONLINE UNDOTBS1 3037641 11:59:23 2837290 19:05:30 3037172
3 online sysaux 3037641 11:59:23 2837290 19:05:30 3037172
4 online users 3037641 11:59:23 2837290 19:05:30 3037172
5 online example 3037641 11:59:23 2837290 19:05:30 3037172
6 ONLINE TBS1 3037641 11:59:23 2837290 19:05:30 3037172
7 ONLINE TBS1 3037641 11:59:23 2837290 19:05:30 3037172
7 rows selected.
SQL> save/u01/app/oracle/oradata/query_1. SQL;
Created file/u01/app/oracle/oradata/query_1. SQL
SQL> ALTER SYSTEM SWITCH LOGFILE; -- SWITCH logs
System altered.
SQL> ho cat/u01/app/oracle/admin/orcl/bdump/alert_orcl.log | more -- View alarm logs
Sun Jul 25 19:14:29 2010
Beginning log switch checkpoint up to RBA [0xd. 2.10], SCN: 3037657
Thread 1 advanced to log sequence 13
Current log #3 seq #13 mem #0:/u01/app/oracle/oradata/orcl/redo3a. rdo
Current log #3 seq #13 mem #1:/u01/app/oracle/oradata/orcl/redo3b. rdo
SQL> @/u01/app/oracle/oradata/query_1. SQL; -- the data file header is the same as the SCN recorded in the alarm log after a minute of lag.
FILE # STATUS TABLESPACE_NAME CUR_SCN RST_DT RST_SCN CKPT_DT CKPT_SCN CKPT_CNT
--------------------------------------------------------------------------------------------------------------
1 online system 3037803 2837290 11:59:23 3037657 19:14:29 532
2 ONLINE UNDOTBS1 3037803 11:59:23 2837290 19:14:29 3037657
3 online sysaux 3037803 11:59:23 2837290 19:14:29 3037657
4 online users 3037803 11:59:23 2837290 19:14:29 3037657
5 online example 3037803 11:59:23 2837290 19:14:29 3037657
6 ONLINE TBS1 3037803 11:59:23 2837290 19:14:29 3037657
7 ONLINE TBS1 3037803 11:59:23 2837290 19:14:29 3037657
SQL> SELECT TO_CHAR (sysdate, 'yyyy-mm-dd hh24: mi: ss') FROM dual; -- Time Lag minute 19:19:59-19:14:29
TO_CHAR (SYSDATE, 'yy'
-------------------
2010-07-25 19:19:59
SQL> ALTER SYSTEM CHECKPOINT; -- generate a CHECKPOINT
System altered.
SQL> ho cat/u01/app/oracle/admin/orcl/bdump/alert_orcl.log | more -- view the SCN: 3037881 in the alarm log
Sun Jul 25 19:14:29 2010
Beginning log switch checkpoint up to RBA [0xd. 2.10], SCN: 3037657
Thread 1 advanced to log sequence 13
Current log #3 seq #13 mem #0:/u01/app/oracle/oradata/orcl/redo3a. rdo
Current log #3 seq #13 mem #1:/u01/app/oracle/oradata/orcl/redo3b. rdo
Sun Jul 25 19:19:34 2010
Completed checkpoint up to RBA [0xd. 2.10], SCN: 3037657
Sun Jul 25 19:21:55 2010
Beginning global checkpoint up to RBA [0xd. 116.10], SCN: 3037881
Completed checkpoint up to RBA [0xd. 116.10], SCN: 3037881
SQL> @/u01/app/oracle/oradata/query_1. SQL; -- the data file header is synchronized with the SCN of the alarm log, which is 3037881
FILE # STATUS TABLESPACE_NAME CUR_SCN RST_DT RST_SCN CKPT_DT CKPT_SCN CKPT_CNT
--------------------------------------------------------------------------------------------------------------
1 online system 3037890 2837290 11:59:23 3037881 19:21:55 533
2 ONLINE UNDOTBS1 3037890 11:59:23 2837290 19:21:55 3037881
3 online sysaux 3037890 11:59:23 2837290 19:21:55 3037881
4 online users 3037890 11:59:23 2837290 19:21:55 3037881
5 online example 3037890 11:59:23 2837290 19:21:55 3037881
6 ONLINE TBS1 3037890 11:59:23 2837290 19:21:55 3037881
7 ONLINE TBS1 3037890 11:59:23 2837290 19:21:55 3037881
-- View full checkpoints
SQL> SELECT addr, indx, rtckp_scn,
2 rtckp_tim,
3 rtckp_rba_seq, rtckp_rba_bno
4 FROM x $ kccrt;
Addr indx RTCKP_SCN RTCKP_TIM RTCKP_RBA_SEQ RTCKP_RBA_BNO
--------------------------------------------------------------------------------
B7D59C10 0 3037881 07/25/2010 19:21:55 13 278
SQL> show parameter log_check -- view the value of log_checkpoint_timeout
NAME TYPE VALUE
-----------------------------------------------------------------------------
Log_checkpoint_interval integer 0
Log_checkpoint_timeout integer 1800
Log_checkpoints_to_alert boolean TRUE
Ho cat/u01/app/oracle/admin/orcl/bdump/alert_orcl.log -- Incremental checkpoint in the alarm Log File
----------------------------------------------------------------------------------
Sun Jul 25 19:22:46 2010
Incremental checkpoint up to RBA [0xd. 119.0], current log tail at RBA [0xd. 119.0]
Sun Jul 25 19:52:51 2010
Incremental checkpoint up to RBA [0xd. 37a. 0], current log tail at RBA [0xd. 420.0]
---------------------------------------------------------------------------------
SQL> select CPDRT, CPLRBA_SEQ | '.' | CPLRBA_BNO | '.' | CPLRBA_BOF "Low RBA ",
2 CPODR_SEQ | '.' | CPODR_BNO | '.' | CPODR_BOF "On disk RBA", CPODS, CPODT, CPHBT
3 from x $ kcccp where indx = 0; -- get the incremental checkpoint information in the control file
CPDRT Low RBA On disk RBA CPODS CPODT CPHBT
-----------------------------------------------------------------------------
97 13.5574.0 13.6391.0 3041226 07/25/2010 22:16:37 725323317
The -- CPDRT column is the number of dirty blocks in the checkpoint queue.
-- The CPODS column is the scn of on disk rba.
-- The CPODT column is the timestamp of on disk rba.
-- The CPHBT column is the heartbeat
Iii. instance recovery
1. When you enable the non-consistent shutdown or shutdown abort database, the instance will be restored.
2. The instance recovery process is automatic.
3. Use information in online redo log files to synchronize data files
4. Two different operations are involved.
Roll-forward: Restore the data file to the status before the instance fails.
Rollback: data that has been modified but has not been submitted will be removed to the status before modification.
Iv. Process of instance recovery
The following picture is from the official Oracle textbooks.
650) this. width = 650; "src =" http://www.bkjia.com/uploads/allimg/131229/204S0M11-0.jpg "title =" 1.jpg" alt = "165908555.jpg"/>
1. First, Oracle compares the checkpoint information in the control file with the header information of the data file and finds that the data is inconsistent.
2. From the last checkpoint to the end of the log file, it will be re-applied to the data file and the undo information (rollback) will be generated. This stage is also called cache recovery
3. The data file contains submitted or unsubmitted data. Even if unsubmitted data exists, the database is opened and users are allowed to connect.
4. uncommitted transactions will be rolled back
5. The data file contains only submitted data.
5. Adjust instance recovery
1. Set the appropriate size for the number of online log records and data blocks that have an impact on the recovery process in the parameter file.
2. Adjust the size of online log files to affect the checkpoint occurrence frequency.
3. Use SQL commands to create a checkpoint event
4. Use Fast-start fault recovery
5. Restore related parameters
LOG_CHECKPOINT_TIMEOUT --> interval between two checkpoints (in seconds). This parameter is rarely used now.
LOG_CHECKPOINT_INTERVAL --> Number of redo block data blocks between two checkpoints (not db_block ),
-- Redo block size = OS block size this parameter is rarely used now
FAST_START_MTTR_TARGET --> specify how long it will take to restore the instance (unit: seconds) (This will be discussed later)
RECOVERY_PARALLELISM --> specifies the concurrency during the pre-rolling process.
FAST_START_PARALLEL_ROLLBACK --> pre-UNDO required block in the rollback phase, and then increase the rollback concurrency
-- We recommend that you set 2-way CPU to LO and 4-way CPU to HI. Otherwise, the default value is false.
FAST_START_IO_TARGET --> the number of I/O operations required for recovery when the database goes down, which is rarely used after 10 GB
Vi. View related to instance recovery
V $ INSTACE_RECOVERY --> View fast_start_mttr_target settings and MTTR information.
V $ FAST_START_SERVERS --> related concurrency information during transaction rollback
V $ FAST_START_TRANSACTION --> information about the transaction being recovered
Full checkpoint
Select * from X $ KCCRT where indx = 0;
Incremental checkpoint
SQL> select * from X $ KCCCP where indx = 0;
VII. instance recovery demonstration
-- Delete alarm logs
SQL> ho rm-f/u01/app/oracle/admin/orcl/bdump/alert_orcl.log
SQL> SELECT * FROM scott. emp WHERE ename = 'Scott ';
EMPNO ENAME JOB MGR HIREDATE SAL DEPTNO
----------------------------------------------------------------------------------------
7788 scott analyst 7566 19-APR-87 7100 20
-- Update SCOTT's salary and submit the transaction
SQL> UPDATE scott. emp SET sal = sal/2 WHERE ename = 'Scott ';
1 row updated.
SQL> COMMIT;
Commit complete.
-- Insert two new records without committing transactions
SQL> INSERT INTO scott. emp (empno, ename, job) SELECT '000000', 'mark', 'develer' FROM dual;
1 row created.
SQL> INSERT INTO scott. emp (empno, ename, job) SELECT '000000', 'Mary ', 'designer' FROM dual;
1 row created.
SQL> SELECT * FROM scott. emp WHERE empno IN (2001,2002 );
EMPNO ENAME JOB MGR HIREDATE SAL DEPTNO
----------------------------------------------------------------------------------------
2001 Mark Develpoer
2002 Mary Designer
-- Disable the instance and restart the instance. The instance will be automatically rolled back.
SQL> SHUTDOWN ABORT;
ORACLE instance shut down.
SQL> STARTUP
ORACLE instance started.
Total System Global Area 251658240 bytes
Fixed Size 1218796 bytes
Variable Size 88082196 bytes
Database Buffers 159383552 bytes
Redo Buffers 2973696 bytes
Database mounted.
Database opened.
-- Obtain rollback information from the alarm log
SQL> ho ls/u01/app/oracle/admin/orcl/bdump
Alert_orcl.log orcl_arc0_4016.trc orcl_arc1_4018.trc orcl_lgwr_3995.trc
SQL> ho cat/u01/app/oracle/admin/orcl/bdump/alert_orcl.log
----------------------------------------------------------------
ALTER DATABASE MOUNT
Thu Jul 22 12:44:40 2010
Setting recovery target incarnation to 10
Thu Jul 22 12:44:40 2010
Successful mount of redo thread 1, with mount id 1252833332
Thu Jul 22 12:44:40 2010
Database mounted in Exclusive Mode
Completed: ALTER DATABASE MOUNT
Thu Jul 22 12:44:41 2010
ALTER DATABASE OPEN
Thu Jul 22 12:44:41 2010
Beginning crash recovery of 1 threads -- start crash recovery
Thu Jul 22 12:44:41 2010
Started redo scan -- scan redo
Thu Jul 22 12:44:42 2010
Completed redo scan
142 redo blocks read, 58 data blocks need recovery
Thu Jul 22 12:44:42 2010
Started redo application
Thread 1: logseq 4, block 3156
Thu Jul 22 12:44:42 2010
Recovery of Online Redo Log: Thread 1 Group 3 Seq 4 Reading mem 0
Mem #0 errs 0:/u01/app/oracle/oradata/orcl/redo3a. rdo
Mem #1 errs 0:/u01/app/oracle/oradata/orcl/redo3b. rdo
Thu Jul 22 12:44:42 2010
Completed redo application
Thu Jul 22 12:44:43 2010
Completed crash recovery at -- restore
Thread 1: logseq 4, block 3298, scn 2921577
58 data blocks read, 58 data blocks written, 142 redo blocks read
Thu Jul 22 12:44:43 2010
LGWR: STARTING ARCH PROCESSES
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid = 17, OS id = 4018
ARC0 started with pid = 16, OS id = 4016
Thu Jul 22 12:44:45 2010
Thread 1 advanced to log sequence 5
Thread 1 opened at log sequence 5
Current log #1 seq #5 mem #0:/u01/app/oracle/oradata/orcl/redo1a. rdo
Current log #1 seq #5 mem #1:/u01/app/oracle/oradata/orcl/redo1b. rdo
Successful open of redo thread 1
Thu Jul 22 12:44:45 2010
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set -- FAST_START_MTTR_TARGET is not set
Bytes --------------------------------------------------------------------------------------------------
-- The scott user has been submitted, so it is submitted after recovery
SQL> SELECT * FROM scott. emp WHERE ename = 'Scott ';
EMPNO ENAME JOB MGR HIREDATE SAL DEPTNO
----------------------------------------------------------------------------------------
7788 scott analyst 7566 19-APR-87 3550 20
-- The newly added two records are not submitted and are rolled back after the instance recovers
SQL> SELECT * FROM scott. emp WHERE empno IN (2001,2002 );
No rows selected
8. Set the FAST_START_MTTR_TARGET parameter.
/*
The FAST_START_MTTR_TARGET parameter is used to reduce the cache recovery time.
After the FAST_START_MTTR_TARGET value is set, the Write Attempt of the incremental checkpoint in database management reaches the set target recovery time.
If the set value is reasonable, the entire recovery process is close to the set time.
Note: When the FAST_START_MTTR_TARGET parameter is used, the FAST_START_IO_TARGET, LOG_CHECKPOINT_INTERVAL,
LOG_CHECKPOINT_TIMEOUT parameter. If these parameters are set, the cache rediscovery will be prevented from meeting the specified FAST_START_MTTR_TARGET value.
A reasonable time value should be set for FAST_START_MTTR_TARGET.
The default value is 0, indicating that the automatic check point adjustment function is disabled.
The maximum value is 3600. When the set value is greater than 3600, it is automatically rounded to 3600.
The minimum value is 1. When it is set to 1, in fact it is not practical. Therefore, the recovery time cannot reach the set target value */
-- Set the value of fast_start_mttr_target to 0.
SQL> alter system set fast_start_mttr_target = 0;
System altered.
SQL> CREATE TABLE tb_test AS SELECT * FROM all_objects WHERE 1 = 2; -- CREATE a new TABLE
Table created.
SQL> INSERT INTO tb_test SELECT * FROM all_objects; -- INSERT record to new table
49945 rows created.
-- The following query shows that ESTIMATED_MTTR is 28.
SQL> SELECT recovery_estimated_ios, actual_redo_blks, target_mttr, estimated_mttr,
2 optimal_logfile_size FROM v $ instance_recovery;
RECOVERY_ESTIMATED_IOS ACTUAL_REDO_BLKS TARGET_MTTR ESTIMATED_MTTR OPTIMAL_LOGFILE_SIZE
-----------------------------------------------------------------------------------
762 11661 0 28
SQL> COMMIT; -- submit a transaction
Commit complete.
SQL> SELECT recovery_estimated_ios, actual_redo_blks, target_mttr, estimated_mttr,
2 optimal_logfile_size FROM v $ instance_recovery;
RECOVERY_ESTIMATED_IOS ACTUAL_REDO_BLKS TARGET_MTTR ESTIMATED_MTTR OPTIMAL_LOGFILE_SIZE
-----------------------------------------------------------------------------------
767 11669 0 28
-- As we can see from the above, commit only updates the content of the log buffer to the log file
SQL> ALTER SYSTEM CHECKPOINT; -- manually update the CHECKPOINT
System altered.
SQL> SELECT recovery_estimated_ios, actual_redo_blks, target_mttr, estimated_mttr,
2 optimal_logfile_size FROM v $ instance_recovery;
RECOVERY_ESTIMATED_IOS ACTUAL_REDO_BLKS TARGET_MTTR ESTIMATED_MTTR OPTIMAL_LOGFILE_SIZE
-----------------------------------------------------------------------------------
0 0 0 28
-- The preceding query shows that the RECOVERY_ESTIMATED_IOS and ACTUAL_REDO_BLKS fields have been reduced to 0.
-- When the checkpoint is generated, the dirty content in the database buffer is written to the data file.
-- ESTIMATED_MTTR does not change because the column is not updated in real time.
Oracle video tutorial follow: http://u.youku.com/user_video/id_UMzAzMjkxMjE2.html