Oracle instance recovery

Source: Internet
Author: User

-- ======================================

-- Oracle instance recovery

-- ======================================


I. Oracle instance failure

Oracle instance failures are mostly caused by instance non-consistent shutdown, which is usually called a crash ). The instance failure result is equivalent to shutdown abort.

Cause of instance failure

Power Load failure

Hardware faults

Background process failed

Shut down database exceptions

Status after instance failure

The database may lose committed transactions and store uncommitted transactions, resulting in Database Inconsistency.

Solution

Use startup to restart the instance. The instance implements automatic recovery and rolls back uncommitted transactions based on online log files.

View alarm logs and tracking logs to find out the cause of the fault.


Ii. Check Points

The checkpoint has been discussed in the architecture. The recovery of the instance is closely related to the checkpoint. Therefore, the checkpoint process is discussed again.

1. What is a checkpoint?

Is a database event used to reduce the crash recovery time. The checkpoint position determines the start position of instance recovery.

Triggered by background processes. When triggered, the ckpt process notifies the dbwn process to write dirty data in the data buffer to the data file.

The ckpt process is also responsible for updating the header information of the data file and the checkpoint information on the control file.

2. Checkpoint Triggering Conditions

Log switching (automatic switch or manual switch)

When the database is shut down with the immediate, transaction, and normal options

Manually trigger (alter system checkpoint)

Alter tablespace tablespace_name begin | end bakcup

Alter tablespace tablespace_name offline

Alter database datafile '<dir> 'offline

Alter tablespace | datafile read only

3. Checkpoint queue

Is a dirty database linked list

Each modified record package in the checkpoint queue has a unique data block identifier (log file number, block number, offset)

The earliest queue will be preferentially written to the data file (regardless of whether the queue is modified multiple times)

The earliest queue will be cleared from the queue after being written.

4. Checkpoint Classification

Full checkpoint

Before Oracle 8i, when a checkpoint occurs, Oracle writes all the data in the dirty buffer list to the data file, which is called a full checkpoint or a regular checkpoint.

Specific trigger conditions

Alter system switch logfile

Shutdown normal, immediate, transactional

Alter system checkpoint

Incremental checkpoint (fast-start checkpoint)

The checkpoint queue mechanism is introduced. Every second, ckpt updates the oldest RBA In the checkpoint queue to the control file. RBA (redo log block address) also serves as the starting point for instance recovery.

Incremental checkpoints are subdivided into full checkpoints so that data can be periodically written to the data file based on the oldest data blocks.

Each dirty block will be moved to the checkpoint queue and arranged according to the redo block address modified by LRBALow RBA for the first time.

The low rba value of the data block written to the checkpoint queue is the smallest. Even if the minimum queue in the queue is modified multiple times, the order of the data block in the checkpoint queue does not change after modification.

When incremental check points are executed, DBWn ensures that the first modified data is written in the order of LRBA In the checkpoint queue to improve the checkpoint.

In this case, the ckpt process uses the lightweight control file update protocol to write the currently lowest RBA to the control file.

Ckpt does not rewrite the checkpoint information and data file header information of the data file in the control file when performing lightweight updates.

Only records the control file checkpoint SCN and writes RBA information based on the incremental checkpoint.

By converting a full checkpoint to an incremental checkpoint, the instance recovery time is greatly shortened.

Note: updating the data file header and control file lags behind the occurrence of checkpoint events.

Trigger of incremental checkpoint

Log_checkpoint_interval, log_checkpoint_timeout,

Value set for fast_start_io_target and fast_start_mttr_target

Minimum Log File Size

Number of dirty blocks in Buffer Cacha

Some checkpoints

Dirty data in the tablespace is written to the disk.

Triggered by alter tablespace tablespace_name offline

5. Differences between full checkpoints and incremental checkpoints

The full checkpoint will write the checkpoint information to the control file and data file at the same time.

The incremental checkpoint only writes RBA to the control file.

6. view the checkpoint information and set the LOG_CHECKPOINTS_TO_ALERT parameter to true.

Alter system set LOG_CHECKPOINTS_TO_ALERT = TRUE;

-- View log_checkpoints_to_alert Parameters

SQL> SHOW PARAMETER log_checkpoints_to_alert


NAME TYPE VALUE

-----------------------------------------------------------------------------

Log_checkpoints_to_alert boolean FALSE


-- Set the log_checkpoints_to_alert Parameter

SQL> ALTER SYSTEM set log_checkpoints_to_alert = TRUE;


System altered.

-- Clear the alarm Log File

SQL> ho cat/dev/null>/u01/app/oracle/admin/orcl/bdump/alert_orcl.log

-- Check that the control file in the data file header is 3037172

SQL> SELECT file #, status, tablespace_name,

2 dbms_flashback.get_system_change_number cur_scn,

3 to_char (resetlogs_time, 'yyyy-mm-dd hh24: mi: ss') rst_dt,

4 resetlogs_change # rst_scn,

5 to_char (checkpoint_time, 'yyyy-mm-dd hh24: mi: ss') ckpt_dt,

6 checkpoint_change # ckpt_scn, checkpoint_count ckpt_cnt

7 FROM v $ datafile_header;


FILE # STATUS TABLESPACE_NAME CUR_SCN RST_DT RST_SCN CKPT_DT CKPT_SCN CKPT_CNT

--------------------------------------------------------------------------------------------------------------

1 online system 3037641 2837290 11:59:23 3037172 19:05:30 531

2 ONLINE UNDOTBS1 3037641 11:59:23 2837290 19:05:30 3037172

3 online sysaux 3037641 11:59:23 2837290 19:05:30 3037172

4 online users 3037641 11:59:23 2837290 19:05:30 3037172

5 online example 3037641 11:59:23 2837290 19:05:30 3037172

6 ONLINE TBS1 3037641 11:59:23 2837290 19:05:30 3037172

7 ONLINE TBS1 3037641 11:59:23 2837290 19:05:30 3037172

7 rows selected.


SQL> save/u01/app/oracle/oradata/query_1. SQL;

Created file/u01/app/oracle/oradata/query_1. SQL

SQL> ALTER SYSTEM SWITCH LOGFILE; -- SWITCH logs


System altered.


SQL> ho cat/u01/app/oracle/admin/orcl/bdump/alert_orcl.log | more -- View alarm logs

Sun Jul 25 19:14:29 2010

Beginning log switch checkpoint up to RBA [0xd. 2.10], SCN: 3037657

Thread 1 advanced to log sequence 13

Current log #3 seq #13 mem #0:/u01/app/oracle/oradata/orcl/redo3a. rdo

Current log #3 seq #13 mem #1:/u01/app/oracle/oradata/orcl/redo3b. rdo


SQL> @/u01/app/oracle/oradata/query_1. SQL; -- the data file header is the same as the SCN recorded in the alarm log after a minute of lag.


FILE # STATUS TABLESPACE_NAME CUR_SCN RST_DT RST_SCN CKPT_DT CKPT_SCN CKPT_CNT

--------------------------------------------------------------------------------------------------------------

1 online system 3037803 2837290 11:59:23 3037657 19:14:29 532

2 ONLINE UNDOTBS1 3037803 11:59:23 2837290 19:14:29 3037657

3 online sysaux 3037803 11:59:23 2837290 19:14:29 3037657

4 online users 3037803 11:59:23 2837290 19:14:29 3037657

5 online example 3037803 11:59:23 2837290 19:14:29 3037657

6 ONLINE TBS1 3037803 11:59:23 2837290 19:14:29 3037657

7 ONLINE TBS1 3037803 11:59:23 2837290 19:14:29 3037657


SQL> SELECT TO_CHAR (sysdate, 'yyyy-mm-dd hh24: mi: ss') FROM dual; -- Time Lag minute 19:19:59-19:14:29


TO_CHAR (SYSDATE, 'yy'

-------------------

2010-07-25 19:19:59


SQL> ALTER SYSTEM CHECKPOINT; -- generate a CHECKPOINT


System altered.


SQL> ho cat/u01/app/oracle/admin/orcl/bdump/alert_orcl.log | more -- view the SCN: 3037881 in the alarm log

Sun Jul 25 19:14:29 2010

Beginning log switch checkpoint up to RBA [0xd. 2.10], SCN: 3037657

Thread 1 advanced to log sequence 13

Current log #3 seq #13 mem #0:/u01/app/oracle/oradata/orcl/redo3a. rdo

Current log #3 seq #13 mem #1:/u01/app/oracle/oradata/orcl/redo3b. rdo

Sun Jul 25 19:19:34 2010

Completed checkpoint up to RBA [0xd. 2.10], SCN: 3037657

Sun Jul 25 19:21:55 2010

Beginning global checkpoint up to RBA [0xd. 116.10], SCN: 3037881

Completed checkpoint up to RBA [0xd. 116.10], SCN: 3037881


SQL> @/u01/app/oracle/oradata/query_1. SQL; -- the data file header is synchronized with the SCN of the alarm log, which is 3037881


FILE # STATUS TABLESPACE_NAME CUR_SCN RST_DT RST_SCN CKPT_DT CKPT_SCN CKPT_CNT

--------------------------------------------------------------------------------------------------------------

1 online system 3037890 2837290 11:59:23 3037881 19:21:55 533

2 ONLINE UNDOTBS1 3037890 11:59:23 2837290 19:21:55 3037881

3 online sysaux 3037890 11:59:23 2837290 19:21:55 3037881

4 online users 3037890 11:59:23 2837290 19:21:55 3037881

5 online example 3037890 11:59:23 2837290 19:21:55 3037881

6 ONLINE TBS1 3037890 11:59:23 2837290 19:21:55 3037881

7 ONLINE TBS1 3037890 11:59:23 2837290 19:21:55 3037881


-- View full checkpoints

SQL> SELECT addr, indx, rtckp_scn,

2 rtckp_tim,

3 rtckp_rba_seq, rtckp_rba_bno

4 FROM x $ kccrt;


Addr indx RTCKP_SCN RTCKP_TIM RTCKP_RBA_SEQ RTCKP_RBA_BNO

--------------------------------------------------------------------------------

B7D59C10 0 3037881 07/25/2010 19:21:55 13 278


SQL> show parameter log_check -- view the value of log_checkpoint_timeout


NAME TYPE VALUE

-----------------------------------------------------------------------------

Log_checkpoint_interval integer 0

Log_checkpoint_timeout integer 1800

Log_checkpoints_to_alert boolean TRUE

Ho cat/u01/app/oracle/admin/orcl/bdump/alert_orcl.log -- Incremental checkpoint in the alarm Log File

----------------------------------------------------------------------------------

Sun Jul 25 19:22:46 2010

Incremental checkpoint up to RBA [0xd. 119.0], current log tail at RBA [0xd. 119.0]

Sun Jul 25 19:52:51 2010

Incremental checkpoint up to RBA [0xd. 37a. 0], current log tail at RBA [0xd. 420.0]


---------------------------------------------------------------------------------

SQL> select CPDRT, CPLRBA_SEQ | '.' | CPLRBA_BNO | '.' | CPLRBA_BOF "Low RBA ",

2 CPODR_SEQ | '.' | CPODR_BNO | '.' | CPODR_BOF "On disk RBA", CPODS, CPODT, CPHBT

3 from x $ kcccp where indx = 0; -- get the incremental checkpoint information in the control file


CPDRT Low RBA On disk RBA CPODS CPODT CPHBT

-----------------------------------------------------------------------------

97 13.5574.0 13.6391.0 3041226 07/25/2010 22:16:37 725323317


The -- CPDRT column is the number of dirty blocks in the checkpoint queue.

-- The CPODS column is the scn of on disk rba.

-- The CPODT column is the timestamp of on disk rba.

-- The CPHBT column is the heartbeat


Iii. instance recovery

1. When you enable the non-consistent shutdown or shutdown abort database, the instance will be restored.

2. The instance recovery process is automatic.

3. Use information in online redo log files to synchronize data files

4. Two different operations are involved.

Roll-forward: Restore the data file to the status before the instance fails.

Rollback: data that has been modified but has not been submitted will be removed to the status before modification.

Iv. Process of instance recovery

The following picture is from the official Oracle textbooks.

650) this. width = 650; "src =" http://www.bkjia.com/uploads/allimg/131229/204S0M11-0.jpg "title =" 1.jpg" alt = "165908555.jpg"/>

1. First, Oracle compares the checkpoint information in the control file with the header information of the data file and finds that the data is inconsistent.

2. From the last checkpoint to the end of the log file, it will be re-applied to the data file and the undo information (rollback) will be generated. This stage is also called cache recovery

3. The data file contains submitted or unsubmitted data. Even if unsubmitted data exists, the database is opened and users are allowed to connect.

4. uncommitted transactions will be rolled back

5. The data file contains only submitted data.

5. Adjust instance recovery

1. Set the appropriate size for the number of online log records and data blocks that have an impact on the recovery process in the parameter file.

2. Adjust the size of online log files to affect the checkpoint occurrence frequency.

3. Use SQL commands to create a checkpoint event

4. Use Fast-start fault recovery

5. Restore related parameters

LOG_CHECKPOINT_TIMEOUT --> interval between two checkpoints (in seconds). This parameter is rarely used now.

LOG_CHECKPOINT_INTERVAL --> Number of redo block data blocks between two checkpoints (not db_block ),

-- Redo block size = OS block size this parameter is rarely used now

FAST_START_MTTR_TARGET --> specify how long it will take to restore the instance (unit: seconds) (This will be discussed later)

RECOVERY_PARALLELISM --> specifies the concurrency during the pre-rolling process.

FAST_START_PARALLEL_ROLLBACK --> pre-UNDO required block in the rollback phase, and then increase the rollback concurrency

-- We recommend that you set 2-way CPU to LO and 4-way CPU to HI. Otherwise, the default value is false.

FAST_START_IO_TARGET --> the number of I/O operations required for recovery when the database goes down, which is rarely used after 10 GB


Vi. View related to instance recovery

V $ INSTACE_RECOVERY --> View fast_start_mttr_target settings and MTTR information.

V $ FAST_START_SERVERS --> related concurrency information during transaction rollback

V $ FAST_START_TRANSACTION --> information about the transaction being recovered

Full checkpoint

Select * from X $ KCCRT where indx = 0;

Incremental checkpoint

SQL> select * from X $ KCCCP where indx = 0;


VII. instance recovery demonstration

-- Delete alarm logs

SQL> ho rm-f/u01/app/oracle/admin/orcl/bdump/alert_orcl.log


SQL> SELECT * FROM scott. emp WHERE ename = 'Scott ';


EMPNO ENAME JOB MGR HIREDATE SAL DEPTNO

----------------------------------------------------------------------------------------

7788 scott analyst 7566 19-APR-87 7100 20

-- Update SCOTT's salary and submit the transaction

SQL> UPDATE scott. emp SET sal = sal/2 WHERE ename = 'Scott ';


1 row updated.


SQL> COMMIT;


Commit complete.

-- Insert two new records without committing transactions

SQL> INSERT INTO scott. emp (empno, ename, job) SELECT '000000', 'mark', 'develer' FROM dual;


1 row created.


SQL> INSERT INTO scott. emp (empno, ename, job) SELECT '000000', 'Mary ', 'designer' FROM dual;


1 row created.


SQL> SELECT * FROM scott. emp WHERE empno IN (2001,2002 );


EMPNO ENAME JOB MGR HIREDATE SAL DEPTNO

----------------------------------------------------------------------------------------

2001 Mark Develpoer

2002 Mary Designer

-- Disable the instance and restart the instance. The instance will be automatically rolled back.

SQL> SHUTDOWN ABORT;

ORACLE instance shut down.

SQL> STARTUP

ORACLE instance started.


Total System Global Area 251658240 bytes

Fixed Size 1218796 bytes

Variable Size 88082196 bytes

Database Buffers 159383552 bytes

Redo Buffers 2973696 bytes

Database mounted.

Database opened.

-- Obtain rollback information from the alarm log

SQL> ho ls/u01/app/oracle/admin/orcl/bdump

Alert_orcl.log orcl_arc0_4016.trc orcl_arc1_4018.trc orcl_lgwr_3995.trc


SQL> ho cat/u01/app/oracle/admin/orcl/bdump/alert_orcl.log


----------------------------------------------------------------


ALTER DATABASE MOUNT

Thu Jul 22 12:44:40 2010

Setting recovery target incarnation to 10

Thu Jul 22 12:44:40 2010

Successful mount of redo thread 1, with mount id 1252833332

Thu Jul 22 12:44:40 2010

Database mounted in Exclusive Mode

Completed: ALTER DATABASE MOUNT

Thu Jul 22 12:44:41 2010

ALTER DATABASE OPEN

Thu Jul 22 12:44:41 2010

Beginning crash recovery of 1 threads -- start crash recovery

Thu Jul 22 12:44:41 2010

Started redo scan -- scan redo

Thu Jul 22 12:44:42 2010

Completed redo scan

142 redo blocks read, 58 data blocks need recovery

Thu Jul 22 12:44:42 2010

Started redo application

Thread 1: logseq 4, block 3156

Thu Jul 22 12:44:42 2010

Recovery of Online Redo Log: Thread 1 Group 3 Seq 4 Reading mem 0

Mem #0 errs 0:/u01/app/oracle/oradata/orcl/redo3a. rdo

Mem #1 errs 0:/u01/app/oracle/oradata/orcl/redo3b. rdo

Thu Jul 22 12:44:42 2010

Completed redo application

Thu Jul 22 12:44:43 2010

Completed crash recovery at -- restore

Thread 1: logseq 4, block 3298, scn 2921577

58 data blocks read, 58 data blocks written, 142 redo blocks read

Thu Jul 22 12:44:43 2010

LGWR: STARTING ARCH PROCESSES

ARC0: Archival started

ARC1: Archival started

LGWR: STARTING ARCH PROCESSES COMPLETE

ARC1 started with pid = 17, OS id = 4018

ARC0 started with pid = 16, OS id = 4016

Thu Jul 22 12:44:45 2010

Thread 1 advanced to log sequence 5

Thread 1 opened at log sequence 5

Current log #1 seq #5 mem #0:/u01/app/oracle/oradata/orcl/redo1a. rdo

Current log #1 seq #5 mem #1:/u01/app/oracle/oradata/orcl/redo1b. rdo

Successful open of redo thread 1

Thu Jul 22 12:44:45 2010

MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set -- FAST_START_MTTR_TARGET is not set


Bytes --------------------------------------------------------------------------------------------------

-- The scott user has been submitted, so it is submitted after recovery

SQL> SELECT * FROM scott. emp WHERE ename = 'Scott ';


EMPNO ENAME JOB MGR HIREDATE SAL DEPTNO

----------------------------------------------------------------------------------------

7788 scott analyst 7566 19-APR-87 3550 20


-- The newly added two records are not submitted and are rolled back after the instance recovers

SQL> SELECT * FROM scott. emp WHERE empno IN (2001,2002 );


No rows selected


8. Set the FAST_START_MTTR_TARGET parameter.

/*

The FAST_START_MTTR_TARGET parameter is used to reduce the cache recovery time.

After the FAST_START_MTTR_TARGET value is set, the Write Attempt of the incremental checkpoint in database management reaches the set target recovery time.

If the set value is reasonable, the entire recovery process is close to the set time.

Note: When the FAST_START_MTTR_TARGET parameter is used, the FAST_START_IO_TARGET, LOG_CHECKPOINT_INTERVAL,

LOG_CHECKPOINT_TIMEOUT parameter. If these parameters are set, the cache rediscovery will be prevented from meeting the specified FAST_START_MTTR_TARGET value.

A reasonable time value should be set for FAST_START_MTTR_TARGET.

The default value is 0, indicating that the automatic check point adjustment function is disabled.

The maximum value is 3600. When the set value is greater than 3600, it is automatically rounded to 3600.

The minimum value is 1. When it is set to 1, in fact it is not practical. Therefore, the recovery time cannot reach the set target value */



-- Set the value of fast_start_mttr_target to 0.

SQL> alter system set fast_start_mttr_target = 0;


System altered.


SQL> CREATE TABLE tb_test AS SELECT * FROM all_objects WHERE 1 = 2; -- CREATE a new TABLE


Table created.


SQL> INSERT INTO tb_test SELECT * FROM all_objects; -- INSERT record to new table


49945 rows created.


-- The following query shows that ESTIMATED_MTTR is 28.

SQL> SELECT recovery_estimated_ios, actual_redo_blks, target_mttr, estimated_mttr,

2 optimal_logfile_size FROM v $ instance_recovery;


RECOVERY_ESTIMATED_IOS ACTUAL_REDO_BLKS TARGET_MTTR ESTIMATED_MTTR OPTIMAL_LOGFILE_SIZE

-----------------------------------------------------------------------------------

762 11661 0 28


SQL> COMMIT; -- submit a transaction


Commit complete.

SQL> SELECT recovery_estimated_ios, actual_redo_blks, target_mttr, estimated_mttr,

2 optimal_logfile_size FROM v $ instance_recovery;


RECOVERY_ESTIMATED_IOS ACTUAL_REDO_BLKS TARGET_MTTR ESTIMATED_MTTR OPTIMAL_LOGFILE_SIZE

-----------------------------------------------------------------------------------

767 11669 0 28


-- As we can see from the above, commit only updates the content of the log buffer to the log file

SQL> ALTER SYSTEM CHECKPOINT; -- manually update the CHECKPOINT


System altered.


SQL> SELECT recovery_estimated_ios, actual_redo_blks, target_mttr, estimated_mttr,

2 optimal_logfile_size FROM v $ instance_recovery;


RECOVERY_ESTIMATED_IOS ACTUAL_REDO_BLKS TARGET_MTTR ESTIMATED_MTTR OPTIMAL_LOGFILE_SIZE

-----------------------------------------------------------------------------------

0 0 0 28

-- The preceding query shows that the RECOVERY_ESTIMATED_IOS and ACTUAL_REDO_BLKS fields have been reduced to 0.

-- When the checkpoint is generated, the dirty content in the database buffer is written to the data file.

-- ESTIMATED_MTTR does not change because the column is not updated in real time.

Oracle video tutorial follow: http://u.youku.com/user_video/id_UMzAzMjkxMjE2.html


Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.