1. backup and recovery problems
The final result of considering business needs, performance, and capital costs is usually a compromise. It is extremely important to record this scheme, usually in the form of a service level agreemenet ).
The service level agreement is related to backup and recovery in three aspects:
- Mean Time (mean time Betwwen failures, MTBF)
- Mean time to recover (MTTR)
- Data Loss
The goal of DBA is to increase MTBF while reducing MTTR and data loss.
MTBF refers to the frequency of database failures. Oracle provides two advanced options for availability: RAC and streams.
MTTR refers to the downtime after the database fails.
The third goal is to minimize the loss of data.
2. failure type
2.1 statement failure
Four common causes of statement failure are:
Invaild data: This is usually caused by violation of the format or integrity constraints.
Space management ):
Space allocation problem: related statement failure causes include: unable to expand a segment because the tablespace is full; depletion of Undo space; insufficient temporary space when querying by disk order; a user reaches the quota; an object reaches the maximum interval limit.
Logic error: in some cases, programmers may develop code that cannot be run by the database.
2.2 user process failure
User processes may fail for many reasons, including abnormal user logout, terminal restart, and programs that cause address violations.
2.3 network failure
You need to consider three aspects: listeners, network interface cards, and routes.
2.4. User error
The ideal way to solve user errors is to first prevent these errors. Flash-back query is a query of the data that exists at a time. By undoing data, you can only construct a read-consistent database for the current session.
? {
Copycode ('p639code1 ');
} "Href =" javascript:; "> [copy to clipboard] {
Showcodetxt ('p639code1'); Return false;
} "Href =" javascript:; "> View code SQL
SQL> conn Scott/tigerconnected. SQL> Delete from EMP; 14 rows deleted. SQL> commit; Commit complete. SQL> select count (*) from EMP; count (*) ---------- 0 ###################################### ########## the time here is 1/24 hours, 1/24/60 is 1 minute, restore the record deleted 2 minutes ago ################################ ############### SQL> insert into EMP select * from EMP as of Timestamp (sysdate-2/24/60 ); 14 rows created. SQL> commit; Commit complete. SQL> select count (*) from EMP; count (*) ---------- 14 SQL> drop table EMP; Table dropped. SQL> select count (*) from EMP; select count (*) from EMP * error at line 1: ORA-00942: Table or view does not exist SQL> flashback table EMP to before drop; flashback complete. SQL> select count (*) from EMP; count (*) ---------- 14 |
Log miner is an advanced tool for extracting information from online and archived redo logs. Redo all changes that contain data blocks. By extracting changes in table data blocks, you can reconstruct the changes. Therefore, redo can be used to restore backups at an earlier time. If there is a copy of the relevant log, log miner can return to any previous time point, and the flashback query can only return to the time allowed to cancel the tablespace.
For fixing user errors, Incomplete recovery and database flashback are more significant methods.
2.5 media failure
If the media fails, the disk is damaged. Therefore, the files stored on the disk are damaged. To cope with media failures, you must generate multiple copies of control files, online redo logs, and archived redo log files. In addition, you must back up control files, data files, and archived log files. You do not need to back up the redo log. In fact, the redo log is backed up when it is copied to the archived log. Multiplexing does not protect data files. data files must be protected through hardware redundancy.
2.6 instance failure
Instance failure refers to the unordered shutdown of an instance, which is usually called a crash ). Power failure, shutdown or restart the server, and many critical hardware problems can cause the instance to fail. In some situations where the background process may fail, an immediate instance failure will also be triggered. The results are the same as those of executing the shutsown abort command.
When the instance fails, the database may lose the committed transactions and store the uncommitted transactions, causing the database to crash. This is because the server process is working in the memory, and the process updates the data block and the revocation block in the cache area of the database. At this time, the dbwn process did not completely write the updated data block to the data file. However, the lgwr process performs write operations in as real time as possible.
3. instance recovery
3.1 process of instance recovery
Instance recovery only uses the content of the online log file to re-build the database cache area to the status before the crash. This process will repeat all the changes extracted from the redo logs related to the data blocks that were not written to the disk at the time of crash. After completing the preceding operations, you can open the database. At this time, the database still has a restore error, but the instance you see has been repaired, so you are allowed to connect. This stage of instance recovery is called rollback, which restores all changes. Each redo record has the minimum information required to reconstruct a change: the address of the data block and the new value. During the rollback, each redo record is read, and the corresponding data blocks are loaded into the cache area of the database from the data file, and the application changes accordingly. Data blocks are written back to the disk.
Instance recovery is automatic and inevitable. You can use startup to call instance recovery. First, the SMON process reads the control file when the database is in the load mode. When the database is too open, the SMON process will view all the data files and the file headers of the online redo log files. At this time, if the instance has failed, because the file header is not fully synchronized, the SMON process will find the instance failed, and thus enter the instance recovery routine. The database can only be opened after the previous rollback phase ends.
If the startup command is executed, no data will be lost. In the event of any crash, you should try to execute the startup command and check the degree of the crash.
3.2 database crash cannot be caused by instance recovery
Because the lgwr process always performs write operations prior to the dbwn process and performs real-time write operations at the same time of submission, there is always enough information in the redo stream, this allows you to rebuild any submitted changes that have not been written to the data file and roll back any uncommitted changes that have been written to the data file. The instance recovery mechanism of redo and rollback makes it impossible for the Oracle database to have a Rolling Error.
3.3 adjust instance recovery
MTTR (the average recovery time after various events) is an important part of many SLA. Instance recovery ensures that no renewal error is generated, but it takes a lot of time before the database is opened to roll back the instance. This time depends on two factors: the number of retries to be read, and the number of read/write operations to be completed on the data file during application redo. Both factors are controlled by checkpoints.
The checkpoint () ensures a specific time. All data changes in the dbwn process that constitute a specific system change number (SCN) have been written to the data file. In an instance crash event, only the SMON process needs to repeat the redo generated from the previous checkpoint. All changes made before the checkpoint position have been written into the data file, whether submitted or not. Therefore, it is clear that you do not need to use redo to reconstruct the transactions committed before the checkpoint location. In addition, changes made by transactions not submitted before the checkpoint will also be written into the data file, so you do not need to reconstruct the Undo data before the checkpoint, the data to be used for rollback already exists in the disk revocation segment.
The closer the checkpoint location is, the faster the instance recovers.
The fast_start_mttr_target parameter makes it easy to control the instance recovery time. This parameter is specified in seconds. Oracle then confirms that the dbwn process can write data blocks to the disk at a sufficient speed after the instance crashes, so that the specified number of seconds is not exceeded. Therefore, the smaller the value of the fast_start_mttr_target parameter, the dbwn process will try to minimize the interval between the checkpoint location and the actual time. Note that if an impractical small value is set, the dbwn process cannot meet the requirements in any way.
? {
Copycode ('p639code2 ');
} "Href =" javascript:; "> [copy to clipboard] {
Showcodetxt ('p639code2 '); Return false;
} "Href =" javascript:; "> View code SQL
SQL> conn system/oracleconnected. ######################################## ######### set the value of fast_start_mttr_target to 0, to disable the checkpoint adjustment function ################################### ############ SQL> alter system set fast_start_mttr_target = 0; system altered. ######################################## ######## create a table and start a transaction, workload simulation ##################################### ########## SQL> Create Table T1 as select * From all_objects where 1 = 2; table created. SQL> insert into T1 select * From all_objects; 11185 rows created. ######################################## ######### query and display the read/write operations and redo data blocks that must be processed on the data file during instance recovery # estimated_mttr displays the unit of seconds instance recovery time #################################### ########### SQL> select recovery_estimated_ios, starting, estimated_mttr 2 from V $ instance_recovery; recovery_estimated_ios prepare estimated_mttr ---------------------- ------------ ---------------- 1311 17326 16 SQL> commit; Commit complete. ######################################## ######## this is the submitted query, we can see that the results have not changed much, because commit does not affect the dbwn process ################################ ############### SQL> select recovery_estimated_ios, actual_redo_blks, estimated_mttr 2 from V $ instance_recovery; recovery_estimated_ios effecestimated_mttr certificate ---------------- -------------- 1311 17343 16 ############################### ################# manually execute the Checkpoint Process, as you can see, recovery_estimated_ios and actual_redo_blks have been cleared # estimated_mttr will not become smaller because the content of this column will not be updated in real time ###################### ########################## SQL> alter system checkpoint; system altered. SQL> select region, region, estimated_mttr 2 from V $ instance_recovery; recovery_estimated_ios region estimated_mttr -------------------- ------------------ ---------------- 10 6 13 |