• Identify the types of failure that can occur in Oracle DB
• Describe how to tune instance recovery
• Explain the importance of checkpoints, redo log files, and archived log files
• Configure the fast recovery area
• Configure ARCHIVELOG mode
The responsibilities of the database administrator include:
• Avoiding database failure wherever possible
• Increasing the mean time between failures (MTBF)
• Protecting critical components through redundancy
• Reducing the mean time to recover (MTTR)
• Minimizing data loss

The goal of the database administrator (DBA) is to ensure that the database is open and available when users need it. To achieve this goal, DBAs (working with system administrators) should:
• Anticipate common causes of failure and work to avoid them.
• Work to increase the MTBF, because frequent failures reduce availability. Ensure hardware reliability, protect critical components through redundancy, and perform operating system maintenance regularly. Oracle DB offers advanced configuration options for increasing MTBF, including Real Application Clusters, Streams, and Oracle Data Guard.
• Reduce the MTTR by practicing recovery procedures in advance and configuring backups so that they are available whenever needed.
• Minimize data loss. DBAs who follow accepted best practices can configure the database so that committed transactions are never lost. The features that support this goal include archived log files, Flashback technology, and standby databases with Oracle Data Guard.
Failures are usually divided into the following categories:
• Statement failure: A single database operation (select, insert, update, or delete) fails.
• User process failure: A single database session fails.
• Network failure: The connection to the database is lost.
• User error: A user completes an operation successfully, but the operation was incorrect (for example, a table was dropped or bad data was entered).
• Instance failure: The database instance shuts down unexpectedly.
• Media failure: One or more files required for database operation are lost (that is, files have been deleted or a disk has failed).
When a single database operation fails, the DBA may need to intervene to correct user privileges or database space allocation errors. DBAs may also be asked to help diagnose a failure even when the problem lies outside their direct area of responsibility. This varies by organization. For example, in organizations that use off-the-shelf applications (that is, organizations without software developers), the DBA is the only point of contact and must investigate logic errors in the application. To understand logic errors in an application, work with the developers to determine the scope of the problem. Oracle DB tools can help by examining audit trails or earlier transactions.

Note: In many cases, a statement failure is intended by design and is the desired result. For example, security policies and quota rules are usually decided in advance. When a user exceeds a limit and receives an error, the failed operation may be exactly what was intended, and no resolution is required.
A user process that is abnormally disconnected from the instance may have uncommitted work in progress that needs to be rolled back. To make sure that server process sessions remain connected, the Process Monitor (PMON) background process polls the server processes periodically. If PMON finds a server process whose user is no longer connected, it recovers from any in-progress transactions: it rolls back uncommitted changes and releases any locks held by the failed session.

No DBA intervention is required to recover from a user process failure, but the administrator should watch for trends. One or two users disconnecting abnormally is no cause for concern, and a small number of user process failures is to be expected. Persistent, systematic failures, however, point to other problems. A high percentage of abnormal disconnects may indicate that users need training (including learning how to log out rather than simply terminating their programs). It may also indicate a network or application problem.
The best solution to network failure is to provide redundant paths for network connections. Providing backup listeners, network connections, and network interface cards reduces the chance that a network failure will affect system availability.
Users may inadvertently delete or modify data. If they have not yet committed or exited their program, they can simply roll back. Oracle LogMiner can be used, through Enterprise Manager or through its SQL interface, to query the online redo logs and archived redo logs. Transaction data may persist in the online redo logs longer than it persists in the undo segments, and if redo is configured to be archived, it persists until the archived files are deleted. A user who drops a table can recover it from the recycle bin by flashing the table back to the state before the drop.
If the recycle bin has been purged, or if the user dropped the table with the PURGE option, the dropped table can still be recovered by using point-in-time recovery (PITR), provided that the database is configured correctly.
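As a rough illustration of the LogMiner SQL interface mentioned above, the following sketch mines a single archived log for changes to one table. The file path and the HR.EMPLOYEES table are hypothetical placeholders, and the sketch assumes that the session has the privileges required to run DBMS_LOGMNR.

```sql
-- Register one archived log file to mine (the path is a placeholder).
BEGIN
  DBMS_LOGMNR.ADD_LOGFILE(
    LOGFILENAME => '/u03/arch/arch_1_120_812345678.arc',
    OPTIONS     => DBMS_LOGMNR.NEW);
END;
/

-- Start LogMiner, using the online data dictionary to resolve object names.
BEGIN
  DBMS_LOGMNR.START_LOGMNR(
    OPTIONS => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);
END;
/

-- Inspect the changes recorded for the (hypothetical) table.
-- SQL_UNDO holds a statement that would reverse each change.
SELECT scn, timestamp, username, operation, sql_redo, sql_undo
FROM   v$logmnr_contents
WHERE  seg_owner = 'HR'
AND    seg_name  = 'EMPLOYEES';

-- Release the LogMiner session resources.
BEGIN
  DBMS_LOGMNR.END_LOGMNR;
END;
/
```

Review the SQL_UNDO statements before running any of them, so that only the erroneous change is reversed.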
Using Flashback technology:
• View past states of data
• Wind data back and forth in time
• Assist users in error analysis and recovery

Oracle DB provides Oracle Flashback technology: a group of features that let you view past states of data, and wind data back and forth in time, without restoring the database from backup. Use this technology to help users analyze and recover from errors.

For users who have committed erroneous changes, the following features can be used to analyze the error:
• Flashback Query: View committed data as it existed at some point in the past. A SELECT command with the AS OF clause references a time in the past through a time stamp or SCN.
• Flashback Version Query: View committed historical data for a specific time interval. Use the VERSIONS BETWEEN clause of the SELECT command (for performance reasons, existing indexes are used).
• Flashback Transaction Query: View all database changes made at the transaction level.

Possible solutions for recovering from user error include:
• Flashback Transaction Backout: Rolls back a specific transaction and its dependent transactions.
• Flashback Table: Rewinds one or more tables to their contents at a previous time without affecting other database objects.
• Flashback Drop: Reverses the effects of dropping a table by returning the dropped table, along with dependent objects such as indexes and triggers, from the recycle bin to the database.
• Flashback Database: Returns the database to a past time or system change number (SCN).
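The statements below sketch how some of these Flashback features look in SQL. The HR.EMPLOYEES table, its columns, and the time intervals are illustrative assumptions. How far back a query can reach depends on how much undo the database has retained.

```sql
-- Flashback Query: the row as it was 15 minutes ago.
SELECT employee_id, salary
FROM   hr.employees
       AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '15' MINUTE)
WHERE  employee_id = 100;

-- Flashback Version Query: every committed version of the row
-- during the last hour, with the operation that produced it.
SELECT versions_startscn, versions_endscn, versions_operation, salary
FROM   hr.employees
       VERSIONS BETWEEN TIMESTAMP (SYSTIMESTAMP - INTERVAL '1' HOUR)
                    AND SYSTIMESTAMP
WHERE  employee_id = 100;

-- Flashback Table: rewind the table contents by 15 minutes.
-- Row movement must be enabled on the table first.
ALTER TABLE hr.employees ENABLE ROW MOVEMENT;
FLASHBACK TABLE hr.employees
  TO TIMESTAMP (SYSTIMESTAMP - INTERVAL '15' MINUTE);

-- Flashback Drop: bring a dropped table back from the recycle bin.
FLASHBACK TABLE hr.employees TO BEFORE DROP;
```

Running the two queries first helps confirm the target time or SCN before any table is actually rewound.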
Instance failure occurs when the database instance is shut down before all database files have been synchronized. It can be caused by a hardware or software failure, or by the use of the emergency SHUTDOWN ABORT and STARTUP FORCE commands. If Oracle Restart is enabled and is monitoring the database, administrator involvement in recovering from an instance failure is rarely required: when the instance fails, Oracle Restart attempts to restart it. If manual intervention is required, there may be a more serious problem that prevents the instance from restarting, such as a memory or CPU failure.
- Understanding Instance Recovery: Checkpoint (CKPT) Process
To understand instance recovery, you need to understand how certain background processes work. Every three seconds (or more frequently), the CKPT process stores data in the control file to record which modified data blocks DBWn has written from the SGA to disk. This is called an "incremental checkpoint". The purpose of a checkpoint is to identify the place in the online redo log file where instance recovery is to begin (this place is called the "checkpoint position"). When a log switch occurs, the CKPT process also writes this checkpoint information to the data file headers.

Checkpoints exist for the following reasons:
• To ensure that modified data blocks in memory are written to disk regularly, so that no data is lost when the system or database fails
• To reduce the time required for instance recovery (only the online redo log entries that follow the last checkpoint need to be processed)
• To ensure that all committed data has been written to the data files during shutdown

The checkpoint information written by the CKPT process includes the checkpoint position, the system change number (SCN), the location in the online redo log file where recovery begins, information about the logs, and so on.

Note: The CKPT process does not write data blocks to disk, and it does not write redo blocks to the online redo log files.
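One way to observe the checkpoint information described above is to compare the checkpoint SCN in the control file with the one in each data file header. This is a minimal sketch that assumes a session with access to the V$ views. After a full checkpoint, the values normally agree for online read/write data files.

```sql
-- Checkpoint SCN as recorded in the control file.
SELECT checkpoint_change# FROM v$database;

-- Checkpoint SCN as recorded in each data file header.
SELECT file#, checkpoint_change#, checkpoint_time
FROM   v$datafile_header;

-- Request a full checkpoint manually, then rerun the queries above.
ALTER SYSTEM CHECKPOINT;
```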
- Understanding Instance Recovery: Redo Log Files and the Log Writer Process
Redo log files record the changes made to the database as a result of transactions and internal Oracle server actions. (A transaction is a logical unit of work consisting of one or more SQL statements run by a user.) Redo log files protect the database from the loss of integrity that results from system failures caused by power outages, disk failures, and so on. Redo log files should be multiplexed to ensure that the information stored in them is not lost if a disk fails.

The redo log consists of redo log file groups. A group consists of a redo log file and its multiplexed copies. Each identical copy is called a member of the group, and each group is identified by a number. The Log Writer (LGWR) process writes redo records from the redo log buffer to all members of a redo log group until the files are full or a log switch operation is requested. It then switches to the files in the next group and writes to them. Redo log groups are used in a circular fashion.

Best practice tip: If possible, multiplexed redo log files should reside on separate disks.
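The group and member layout described above can be inspected from the data dictionary. The following sketch assumes a session with access to the V$ views and the privilege to request a log switch.

```sql
-- One row per redo log group: the group that LGWR is currently
-- writing to has STATUS = 'CURRENT'.
SELECT group#, sequence#, members, bytes, archived, status
FROM   v$log
ORDER  BY group#;

-- One row per member file, showing which group it belongs to.
SELECT group#, member, status
FROM   v$logfile
ORDER  BY group#, member;

-- Request a log switch, then rerun the first query to watch
-- the CURRENT status move to the next group.
ALTER SYSTEM SWITCH LOGFILE;
```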
- Understanding Instance Recovery
Automatic instance recovery, or crash recovery:
• Is caused by attempts to open a database whose files were not synchronized on shutdown
• Uses the information stored in the redo log groups to synchronize the files
• Involves two distinct operations:
  – Rolling forward: Redo log changes (both committed and uncommitted) are applied to the data files.
  – Rolling back: Changes that were made but not committed are returned to their original state.

Oracle DB recovers from instance failure automatically. All that is needed is for the instance to be started normally. If Oracle Restart is enabled and configured to monitor the database, the startup happens automatically. The instance mounts the control files and then attempts to open the data files. If it discovers that the data files were not synchronized during shutdown, it uses the information in the redo log groups to roll the data files forward to their state at the time of shutdown. The database is then opened, and all uncommitted transactions are rolled back.
- Phases of Instance Recovery
For an instance to open a data file, the system change number (SCN) contained in the data file header must match the current SCN stored in the database control files. If the numbers do not match, the instance applies redo data from the online redo logs, sequentially "redoing" transactions until the data files are up to date. After all data files have been synchronized with the control files, the database is opened and users can log in.

When the redo logs are applied, all transactions are applied, which brings the database back to its state at the time of the failure. This usually includes transactions that were in progress but not yet committed. After the database has been opened, those uncommitted transactions are rolled back. At the end of the rollback phase of instance recovery, the data files contain only committed data.
- Optimizing Instance Recovery
• During instance recovery, the transactions between the checkpoint position and the end of the redo log must be applied to the data files.
• You optimize instance recovery by controlling the difference between the checkpoint position and the end of the redo log.

Transaction information is recorded in the redo log groups before the instance returns "commit complete" for a transaction. The information in the redo log groups guarantees that the transaction can be recovered if a failure occurs. The transaction information also needs to be written to the data files. Because writing to data files is much slower than writing redo, the data file write usually happens some time after the information has been recorded in the redo log groups. (Random writes to data files are slower than sequential writes to redo log files.)

Every three seconds, the checkpoint process records in the control file information about the checkpoint position in the redo log. Oracle DB therefore knows that all redo log entries recorded before this point are not needed for database recovery. In the original illustration, the striped blocks are those that have not yet been written to disk.

The time required for instance recovery is the time required to bring the data files from their last checkpoint to the latest SCN recorded in the control file. The administrator controls that time by setting an MTTR target (in seconds) and by sizing the redo log groups. For example, with two redo log groups, the distance between the checkpoint position and the end of the redo log group cannot be more than 90% of the smallest redo log group.
• Specify the desired time in seconds or minutes.
• The default value is 0 (disabled).
• The maximum value is 3,600 seconds (one hour).

To get help with setting the MTTR target, open the MTTR Advisor through either of the following paths:
• Enterprise Manager > Advisor Central > MTTR Advisor, where Advisor Central is found in the Related Links section
• Enterprise Manager > Availability > Recovery Settings

When the desired mean time to recover is set to 41 seconds, the SQL statement shown is:
ALTER SYSTEM SET fast_start_mttr_target = 41 SCOPE=BOTH
The FAST_START_MTTR_TARGET initialization parameter simplifies the configuration of recovery time after an instance or system failure. The MTTR Advisor converts the FAST_START_MTTR_TARGET value into several parameters so that instance recovery can complete within the desired time (or as close to it as possible). Note that explicitly setting FAST_START_MTTR_TARGET to 0 disables the MTTR Advisor.

The value of FAST_START_MTTR_TARGET must support your system's service-level agreement. A small MTTR target increases I/O overhead because of the additional data file writes, which can affect performance. If the MTTR target is set too large, however, the instance takes a long time to recover after a crash.
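To judge whether a given target is realistic, the configured value can be compared with the database's own estimate. The sketch below assumes SQL*Plus and a session with access to V$INSTANCE_RECOVERY.

```sql
-- Current setting of the parameter (SQL*Plus command).
SHOW PARAMETER fast_start_mttr_target

-- TARGET_MTTR is the effective target in seconds; ESTIMATED_MTTR is
-- the current estimate of how long instance recovery would take.
-- The redo block columns show how far the checkpoint position is
-- allowed to lag, and how far it actually lags.
SELECT target_mttr,
       estimated_mttr,
       actual_redo_blks,
       target_redo_blks
FROM   v$instance_recovery;
```

If the estimate stays well above the target under normal load, the target may be too aggressive for the I/O capacity available.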
Oracle Corporation defines a media failure as any failure that results in the loss or corruption of one or more database files (data files, control files, or redo log files). Recovering from a media failure requires that you restore and recover the missing files. To ensure that the database can be recovered from a media failure, configure it for recoverability as described next.
To configure your database for maximum recoverability, and so provide the best protection for your data, you must:
• Schedule regular backups: Repairing a media failure typically requires restoring the lost or damaged files from a backup.
• Multiplex control files: All control files associated with a database are identical. Recovering from the loss of a single control file is not difficult, but recovering from the loss of all control files is much harder. To guard against losing all control files, keep at least two copies.
• Multiplex redo log groups: To recover from an instance failure or a media failure, redo log information is used to roll the data files forward to the last committed transaction. If a redo log group relies on a single redo log file, the loss of that file means that data is likely to be lost. Make sure that there are at least two copies of each redo log group and, if possible, that each copy is on a different disk controller.
• Retain archived copies of redo logs: If a file is lost and restored from backup, the instance must apply redo information to bring that file up to the latest SCN contained in the control file. With the default settings, the database can overwrite redo information after it has been written to the data files. The database can be configured to retain the redo information in archived copies of the redo logs. This is known as placing the database in ARCHIVELOG mode.

You can perform these configuration tasks in Enterprise Manager or from the command line.
- Configuring the Fast Recovery Area
Fast recovery area:
• Strongly recommended for simplified backup storage management
• Storage space (separate from working database files)
• Location specified by the DB_RECOVERY_FILE_DEST parameter
• Size specified by the DB_RECOVERY_FILE_DEST_SIZE parameter
• Large enough for backups, archived logs, flashback logs, multiplexed control files, and multiplexed redo logs
• Automatically managed according to your retention policy

Configuring the fast recovery area means determining its location, size, and retention policy. For example, to change the fast recovery area size to 8122 MB:
ALTER SYSTEM SET db_recovery_file_dest_size = 8516534272 SCOPE=BOTH

The fast recovery area is a space set aside on disk to contain archived logs, backups, flashback logs, multiplexed control files, and multiplexed redo logs. Because it simplifies backup storage management, its use is strongly recommended. Place the fast recovery area on storage that is separate from the database data files and from the primary online log files and control files.

The amount of disk space to allocate to the fast recovery area depends on the size and activity level of the database. As a general rule, the larger the fast recovery area, the more useful it is. Ideally, the fast recovery area should be large enough to hold copies of the data files and control files, as well as the flashback logs, online redo logs, and archived logs needed to recover the database from the backups kept under the retention policy. In short, the fast recovery area should be at least twice the size of the database, so that it can hold one backup and several archived logs.

Space management in the fast recovery area is governed by the backup retention policy. A retention policy determines when files become obsolete, that is, when they are no longer needed to meet your data recovery objectives.
Oracle DB automatically manages the storage by removing files that are no longer needed.
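A minimal sketch of setting up and monitoring the fast recovery area from SQL follows. The directory path and the 8G size are placeholders chosen for illustration. The size parameter is set first because a location cannot be specified until a size has been defined.

```sql
-- Define the size first, then the location (hypothetical path).
ALTER SYSTEM SET db_recovery_file_dest_size = 8G SCOPE = BOTH;
ALTER SYSTEM SET db_recovery_file_dest = '/u02/fast_recovery_area' SCOPE = BOTH;

-- Monitor usage: SPACE_RECLAIMABLE is the space held by files that
-- have become obsolete and can be removed when room is needed.
SELECT name,
       space_limit,
       space_used,
       space_reclaimable,
       number_of_files
FROM   v$recovery_file_dest;
```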
- Multiplexing Control files
To protect against database failure, the database should have multiple copies of the control file.

A control file is a small binary file that describes the structure of the database. It must be available for writing by the Oracle server whenever the database is mounted or opened. Without this file, the database cannot be mounted, and the control file must be recovered or re-created. The database should have at least two control files, on separate disks, to minimize the impact of losing one of them.

Because all control files must be available at all times, the loss of a single control file causes the instance to fail. Recovery in this case is not difficult, however: simply copy one of the remaining control files. Losing all control files is harder to recover from, but it is usually not catastrophic.

Adding a control file: If you use ASM as your storage technology, then as long as you have two control files, one in each disk group (for example, +DATA and +FRA), no further multiplexing is needed. In a database that uses OMF (such as a database that uses ASM storage), any additional control file must be created as part of a recovery process by using RMAN (or through Oracle Enterprise Manager). In a database that uses regular file system storage, adding a control file is a manual operation:
1. Alter the SPFILE with the following command:
ALTER SYSTEM SET control_files = '/u01/app/oracle/oradata/orcl/control01.ctl', '/u02/app/oracle/oradata/orcl/control02.ctl', '/u03/app/oracle/oradata/orcl/control03.ctl' SCOPE=SPFILE;
2. Shut down the database.
3. Use the operating system to copy an existing control file to the location selected for the new file.
4. Open the database.
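Steps 2 through 4 of the manual procedure can be sketched as a SQL*Plus session. This is only an outline under stated assumptions: a Unix-like host, control01.ctl as an existing file, and control03.ctl as the newly added copy (the source does not say which of the three files is new).

```sql
-- Step 2: shut down cleanly so that the control files are consistent.
SHUTDOWN IMMEDIATE

-- Step 3: copy an existing control file to the new location.
-- HOST runs an operating system command from within SQL*Plus.
HOST cp /u01/app/oracle/oradata/orcl/control01.ctl /u03/app/oracle/oradata/orcl/control03.ctl

-- Step 4: start the instance and open the database.
STARTUP

-- Verify that the instance is now using every listed control file.
SELECT name, status FROM v$controlfile;
```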
Multiplex redo log groups to protect against media failure and loss of data. Doing so increases database I/O. It is suggested that redo log groups have:
• At least two members (files) per group
• Each member:
  – On a separate disk or controller, if using file system storage
  – In a separate disk group (such as +DATA and +FRA), if using ASM

Note: Multiplexing redo logs may affect overall database performance.

A redo log group is made up of one or more redo log files. Each log file in a group is a duplicate of the others. Oracle Corporation recommends that each redo log group contain at least two files. If you use file system storage, each member should be placed on a separate disk or controller, so that no single equipment failure destroys an entire log group. If you use ASM storage, each member should be in a separate disk group, such as +DATA and +FRA.

The loss of an entire current log group is one of the most serious media failures because it can result in loss of data. The loss of a single member of a multiple-member log group is trivial and does not affect database operation (other than causing an alert to be published in the alert log).

Keep in mind that multiplexing redo logs can heavily influence database performance, because a commit cannot complete until the transaction information has been written to the logs. Place the redo log files on your fastest disks, served by your fastest controllers. If possible, do not place any other database files on the same disks as your redo log files (unless you are using Automatic Storage Management [ASM]). Because only one group is written to at a given time, having members of several groups on the same disk has no performance impact.
The redo log is multiplexed by adding members to an existing log group. To add a member to a redo log group (this can be done while the database is open, without affecting user performance), perform the following steps:
1. Select Enterprise Manager > Server > Redo Log Groups.
2. Select a group and click the Edit button, or click the group number link. The Edit Redo Log Group page appears.
3. In the Redo Log Members region, click Add. The Add Redo Log Member page appears.
4. Choose the appropriate storage type and enter the required information. For ASM, choose the disk group and, if desired, specify template and alias information. For file system storage, enter the file name and the file directory.
5. Click Continue.
Repeat these steps for every existing group that you want to multiplex.

The equivalent SQL for file system storage is:
ALTER DATABASE ADD LOGFILE MEMBER '/u01/app/oracle/oradata/Test0924/redo01a.log' TO GROUP 1;

An example of the SQL syntax for adding a redo log member to redo log group 1 (using ASM) is:
SQL> ALTER DATABASE ADD LOGFILE MEMBER '+DATA' TO GROUP 1;

When you add a redo log member to a group, the member's status is marked as INVALID (as can be seen in the V$LOGFILE view). This is the expected state, because the new member of the group has not yet been written to. When a log switch occurs and the group containing the new member becomes CURRENT, the member's status changes to null.
To preserve redo information, create archived copies of the redo log files by performing the following steps:
1. Specify the archive log file naming convention.
2. Specify one or more archive log file locations.
3. Switch the database to ARCHIVELOG mode.

The instance treats the online redo log groups as a circular buffer in which to store transaction information: it fills one group and then moves on to the next. After all groups have been written to, the instance begins overwriting the information in the first log group. To configure your database for maximum recoverability, you must instruct the database to make a copy of each online redo log group before allowing it to be overwritten. These copies are known as "archived logs".

To set up the creation of archive log files:
1. Specify a naming convention for your archived logs.
2. Specify one or more destinations for storing your archived logs. One of the destinations can be the fast recovery area.
3. Place the database in ARCHIVELOG mode.

Note: Steps 1 and 2 are not necessary if you are using a fast recovery area.

The destination should exist before you place the database in ARCHIVELOG mode. When a directory is specified as a destination, the directory name should end with a trailing slash.
ARCn is an optional background process. It is, however, crucial to recovering a database after the loss of a disk. As an online redo log group fills, the Oracle instance begins writing to the next online redo log group. The process of switching from one online redo log group to another is called a "log switch". The ARCn process initiates archiving of the filled log group at every log switch. It automatically archives the online redo log group before the group can be reused, so that all changes made to the database are preserved. This enables the database to be recovered to the point of failure, even if a disk drive is damaged.

One of the important decisions that a DBA must make is whether to configure the database to operate in ARCHIVELOG mode or in NOARCHIVELOG mode.
• In NOARCHIVELOG mode, the online redo log files are overwritten each time a log switch occurs.
• In ARCHIVELOG mode, inactive groups of filled online redo log files must be archived before they can be used again.

Note:
• ARCHIVELOG mode is essential for most backup strategies, and it is easy to configure.
• If the archive log file destination fills up or cannot be written to, the database eventually comes to a halt. Remove archived log files from the archive log file destination and the database resumes operation.
- Archive log files: Naming and destination locations
Specify the naming and archive destination information on the Recovery Settings page. If you use file system storage, it is recommended that you add multiple locations on different disks.

To configure archive log file names and destinations, select Enterprise Manager > Availability > Configure Recovery Settings. Each archive log file must have a unique name to avoid overwriting older log files. Specify the naming format on this page. To help create unique file names, Oracle Database 11g allows several wildcard characters in the name format:
• %s: Includes the log sequence number as part of the file name
• %t: Includes the thread number as part of the file name
• %r: Includes the resetlogs ID to ensure that the archive log file name remains unique (even after certain advanced recovery techniques that reset log sequence numbers)
• %d: Includes the database ID as part of the file name

As a best practice, the format should include %s, %t, and %r (%d can also be included if multiple databases share the same archive log destination).

By default, if the fast recovery area is enabled, USE_DB_RECOVERY_FILE_DEST is specified as an archive log file destination. Archive log files can be written to as many as 10 different destinations. A destination can be local (a directory) or remote (an Oracle Net alias for a standby database). Click Add Another Row to add further destinations. To change recovery settings, you must be connected as SYSDBA or SYSOPER.

Note: If you do not want archive log files sent to USE_DB_RECOVERY_FILE_DEST, delete this location.
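The same settings can be made from SQL through initialization parameters. In this sketch, the file name format and the directory are illustrative assumptions. LOG_ARCHIVE_FORMAT is a static parameter, so its new value is written to the SPFILE and takes effect at the next restart.

```sql
-- Naming convention containing thread (%t), sequence (%s),
-- and resetlogs ID (%r), as recommended above.
ALTER SYSTEM SET log_archive_format = 'arch_%t_%s_%r.arc' SCOPE = SPFILE;

-- A local directory destination (hypothetical path, trailing slash).
ALTER SYSTEM SET log_archive_dest_1 = 'LOCATION=/u03/app/oracle/arch/' SCOPE = BOTH;

-- A second destination that points at the fast recovery area.
ALTER SYSTEM SET log_archive_dest_2 = 'LOCATION=USE_DB_RECOVERY_FILE_DEST' SCOPE = BOTH;

-- Check which destinations are defined and whether they are usable.
SELECT dest_name, status, destination
FROM   v$archive_dest
WHERE  status <> 'INACTIVE';
```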
To place the database in ARCHIVELOG mode, perform the following steps in Enterprise Manager:
1. Select the ARCHIVELOG Mode check box and click Apply. The database can be set to ARCHIVELOG mode only from the MOUNT state.
2. Restart the database (with SYSDBA privileges).
3. (Optional) View the archive status.
4. Back up your database.

Note: Databases in ARCHIVELOG mode have access to the full range of backup and recovery options.

The equivalent SQL*Plus session is:
sqlplus / as sysdba
shutdown immediate
startup mount
alter database archivelog;
alter database open;
archive log list

In more detail:
1. In Enterprise Manager, select Availability > Configure Recovery Settings > ARCHIVELOG Mode. The equivalent SQL command is:
SQL> ALTER DATABASE ARCHIVELOG;
This command can be issued only while the database is in the MOUNT state. The instance must therefore be restarted to complete this step.
2. When you restart the database in Enterprise Manager, you are prompted for operating system and database credentials. The database credentials must be for a user with SYSDBA privileges.
3. After the instance is restarted, the changes that you made to the archive processes, log format, and log destinations are in effect. In SQL*Plus, you can see this information by using the ARCHIVE LOG LIST command.
4. Back up your database after switching to ARCHIVELOG mode, because the database is recoverable only from the last backup taken in that mode.

With the database in NOARCHIVELOG mode (the default), recovery is possible only up to the time of the last backup. All transactions made after that backup are lost. In ARCHIVELOG mode, recovery is possible up to the time of the last commit. Most production databases are operated in ARCHIVELOG mode.
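After the switch, it is worth confirming that archiving actually works before relying on it. The following check is a sketch that assumes a SYSDBA session. The one-hour window in the last query is an arbitrary choice.

```sql
-- Should return ARCHIVELOG after the mode change.
SELECT log_mode FROM v$database;

-- Force the current online redo log group to be archived.
ALTER SYSTEM ARCHIVE LOG CURRENT;

-- Confirm that an archived log file was produced recently.
SELECT sequence#, name, completion_time
FROM   v$archived_log
WHERE  completion_time > SYSDATE - 1/24
ORDER  BY sequence#;
```

If the last query returns no rows, check the archive destinations and the alert log before taking the first backup.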
Source: http://blog.csdn.net/rlhua/article/details/12616383