RMAN Performance Optimization Overview

Source: Internet
Author: User

RMAN performance optimization: the full strategy

(I) Overview

Put simply, during backup and recovery RMAN reads data from the "source" into a buffer and then writes it from the buffer to the "destination". Many operations take place along the way, including data block verification. If one of these operations is slow, we call it a bottleneck. The key to tuning is to find that bottleneck.

① Determine the maximum speed of the backup source and the backup device. The backup cannot run faster than the disk read speed or the tape write speed; we can only try to get as close to them as possible.

I. Disk read speed: run sar -d on the database server at peak load, add up the blks/s column for the physical disks, and multiply by the operating system block size. Alternatively, pick some disks or LVs, read them into /dev/null with dd, and watch the rate with sar -d.

II. Backup device speed: measure it by backing up several large file systems in parallel.

② Use v$session_longops to monitor RMAN performance. v$session_longops is a useful view: operations that run longer than 6 seconds are recorded in it. The following query shows how long each RMAN operation has taken and how much time remains:

[SQL]
    SELECT a.sid, a.program, a.status, b.opname,
           b.elapsed_seconds, b.time_remaining
    FROM v$session a, v$session_longops b
    WHERE a.sid = b.sid
      AND a.serial# = b.serial#
      AND upper(a.program) LIKE '%RMAN%'
      AND b.time_remaining > 0

③ Use v$backup_sync_io and v$backup_async_io to check for I/O bottlenecks. A backup consists mainly of I/O operations, so I/O is also the most likely place for a bottleneck. Oracle provides the v$backup_sync_io and v$backup_async_io views for observing the actual backup speed and the wait time during the backup.
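The disk read estimate from step ① (sum of the sar -d blks/s column multiplied by the block size) and the rule that a backup cannot exceed the slower of the two ends can be sketched in a few lines of Python. The 512-byte block size, the sample readings, and the tape rate below are illustrative assumptions, not measurements from the article; check the block size that sar reports on your own platform.

```python
def disk_read_bytes_per_sec(blks_per_sec, block_size_bytes=512):
    """Add up the blks/s readings of all physical disks and convert to bytes/s.

    block_size_bytes=512 is an assumption; use the block size of your OS.
    """
    return sum(blks_per_sec) * block_size_bytes


def backup_speed_ceiling(disk_read_bps, device_write_bps):
    """A backup can run no faster than the slower of source and destination."""
    return min(disk_read_bps, device_write_bps)


# Hypothetical peak-load readings (blks/s) for three physical disks.
samples = [40_000, 35_000, 25_000]
disk_bps = disk_read_bytes_per_sec(samples)

# Hypothetical measured write rate of the backup device, in bytes/s.
tape_bps = 60_000_000

ceiling = backup_speed_ceiling(disk_bps, tape_bps)
print(f"disk read: {disk_bps} B/s, backup ceiling: {ceiling} B/s")
```

With these made-up numbers the disks are the limiting side, so tuning effort would go into the read path first.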
The data in these two views is kept only while the instance is running; when the database is restarted, both views are cleared.

(1) Synchronous I/O bottleneck

Query the v$backup_sync_io view and look at the discrete_bytes_per_second column on the row whose TYPE is AGGREGATE. It gives the number of bytes backed up or restored per second in synchronous mode, and it should be close to the read/write rate of the backup device. If it is lower than the device rate, there is room for optimization: check the CPU load, the backup process, the network, the MML interface configuration, and so on. Script:

[SQL]
    SELECT device_type device, type, filename,
           to_char(open_time, 'yyyymmdd hh24:mi:ss') open,
           to_char(close_time, 'yyyymmdd hh24:mi:ss') close,
           elapsed_time elapse, discrete_bytes_per_second
    FROM v$backup_sync_io
    WHERE close_time > SYSDATE - 1
    ORDER BY close_time

(2) Asynchronous I/O bottleneck

I. Look at the backup and recovery throughput per second. Query v$backup_async_io and look at effective_bytes_per_second on the row whose TYPE is AGGREGATE. Production environments almost always use asynchronous I/O, so this view is the one consulted most often. Script:

[SQL]
    SELECT device_type device, type, filename,
           to_char(open_time, 'yyyymmdd hh24:mi:ss') open,
           to_char(close_time, 'yyyymmdd hh24:mi:ss') close,
           elapsed_time elapse, effective_bytes_per_second
    FROM v$backup_async_io
    WHERE close_time > SYSDATE - 1
    ORDER BY close_time

Similarly, effective_bytes_per_second is the number of bytes backed up or restored per second in asynchronous mode.
The effective_bytes_per_second value should be close to the read/write speed of the backup device; if it is much lower, that also deserves attention.

II. Look at I/O waits. The wait-related columns in v$backup_async_io are:

● IO_COUNT: total number of I/Os performed
● READY: number of asynchronous buffer requests for which a buffer was available immediately
● SHORT_WAITS: number of times a buffer was not available immediately but became available after a short, non-blocking poll
● LONG_WAITS: number of times a buffer was not available immediately and became available only after a blocking wait on the I/O device

LONG_WAITS deserves the most attention. A relatively high LONG_WAITS/IO_COUNT ratio indicates an I/O bottleneck; look at the files concerned and check whether the I/O is poorly distributed.

(II) Preparations before optimization

(1) Strategic

① I/O tuning. Backup and recovery are read- and write-intensive operations, so the data file I/O should be balanced. If the database I/O is well balanced and the data files are striped across disks, Oracle's testing shows a backup performance improvement of at least 10%.

② Memory tuning. An RMAN backup reads data into buffers and then writes it to the backup device through the MML interface. Oracle recommends setting a reasonably sized large pool so that the RMAN buffers are allocated from it.

③ SQL tuning. Poorly optimized SQL statements consume I/O, cache, and other database resources, which leaves fewer resources for RMAN.

④ Backup policy tuning. Schedule backups away from busy business periods and plan the mix of full and incremental backups. In a Data Guard environment, for example, the full backup can be taken from the standby node, which does not affect the service and also keeps the backup fast. In a RAC environment, two nodes can back up at the same time to increase the read speed.
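The LONG_WAITS/IO_COUNT check from the I/O wait discussion above is easy to automate once the rows have been fetched from v$backup_async_io. The sketch below works on plain dictionaries, so it assumes the rows were already retrieved by whatever client you use; the file names and counts are invented for illustration, and the article does not give a numeric threshold, so the code only ranks files rather than flagging them.

```python
def long_wait_ratio(io_count, long_waits):
    """Return LONG_WAITS / IO_COUNT, or 0.0 when no I/O was recorded."""
    if io_count == 0:
        return 0.0
    return long_waits / io_count


def rank_files_by_long_waits(rows):
    """Sort rows so that the files with the highest ratio come first."""
    return sorted(
        rows,
        key=lambda r: long_wait_ratio(r["io_count"], r["long_waits"]),
        reverse=True,
    )


# Hypothetical rows, shaped like a subset of v$backup_async_io columns.
rows = [
    {"filename": "/u01/oradata/users01.dbf", "io_count": 1000, "long_waits": 50},
    {"filename": "/u02/oradata/sales01.dbf", "io_count": 2000, "long_waits": 900},
    {"filename": "/u03/oradata/hist01.dbf", "io_count": 500, "long_waits": 0},
]

ranked = rank_files_by_long_waits(rows)
for r in ranked:
    ratio = long_wait_ratio(r["io_count"], r["long_waits"])
    print(f'{r["filename"]}: {ratio:.2%}')
```

The files at the top of the list are the ones whose disks or stripes are worth inspecting first.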
(2) Tactical

① Channel parallelism. RMAN performs backup and recovery through channels. On the database server a channel appears as a server process: when RMAN allocates a channel, it establishes a connection to a database instance. Multiple channels can back up and recover independently of one another, so the number of active channels is the degree of parallelism. In short, the purpose of channel parallelism is to let several channels work at the same time.

② Multiplexing. Multiplexing speeds up reading data from disk during backup, and it applies to a single channel: while backing up, one channel reads from several data files at the same time and writes the data into the same backup set. The multiplexing level depends on three factors:

● the FILESPERSET parameter
● the MAXOPENFILES parameter
● the number of files read by the channel

For example, if the database has 100 data files, FILESPERSET is 12, and MAXOPENFILES is 10, then multiplexing level = min(min(100, 12), 10) = 10.

③ Synchronous/asynchronous I/O. The following briefly describes the data flow when backing up to tape in synchronous and asynchronous mode:

[Figure: data flow during synchronous and asynchronous backup to tape]

④ Disk/tape buffers. The buffer size determines how much data a single I/O can transfer.

I. Disk buffer. The size of the disk buffer is determined by the multiplexing level. The following table lists the corresponding values:

[Table: disk buffer size by multiplexing level]

To see the buffers allocated for each file, use this statement:

[SQL]
    SELECT type, filename, buffer_size, buffer_count, open_time, close_time
    FROM v$backup_async_io
    ORDER BY type, open_time, close_time

II. Tape buffer. When a tape library is the backup device, Oracle allocates buffers for each SBT channel at the time the channel is allocated.
When the BACKUP_TAPE_IO_SLAVES initialization parameter is TRUE, the tape buffers are allocated from the SGA. When it is FALSE, they are allocated from the PGA. Oracle recommends allocating this space from the large pool to avoid contention between the RMAN I/O buffers and the library cache.

⑤ The tape hardware itself. Each vendor's products have their own strengths and weaknesses.

(III) Improving backup performance

① Allocate a reasonable number of parallel channels. Actual testing shows that when the backup device is a tape library, the best performance is reached when the number of parallel channels equals the number of tape drives in the library. There are also cases where allocating two or three channels gives the best performance. If the number of parallel channels is greater than the number of tape drives, backup sets end up interleaved across several tapes, which slows down recovery.

When backing up to disk, the best performance is reached when the number of parallel channels equals the number of disk subsystems, that is, the number of physical disks the output device is spread across. For example, if the disk subsystem is spread across three physical disks, configure three parallel channels.

The following example configures two parallel channels:

[SQL]
    CONFIGURE DEVICE TYPE SBT_TAPE PARALLELISM 2;
    CONFIGURE CHANNEL 1 DEVICE TYPE 'SBT_TAPE' PARMS 'ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)';
    CONFIGURE CHANNEL 2 DEVICE TYPE 'SBT_TAPE' PARMS 'ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)';

② Determine a reasonable multiplexing level.
According to actual testing and Oracle's recommendations, the rules for multiplexing are as follows. If the disks or data files being backed up are well striped, multiplexing does not need to be high, and the level can be set to 1 or 2. In general the level should be kept below 8. Values greater than 8 are typically used when backing up many files that contain empty blocks or when taking incremental backups.

③ Use asynchronous I/O. By default, RMAN uses synchronous I/O when backing up to tape. Synchronous I/O can perform only one operation at a time, so backup performance is bound to be poor. Asynchronous I/O can perform several operations at a time and keeps the write buffer better filled, which keeps the tape streaming. Enabling it is simple: on systems that support native asynchronous I/O, set the initialization parameter BACKUP_TAPE_IO_SLAVES to TRUE.

④ Adjust the tape buffer with the BLKSIZE parameter. When the backup device is tape, this is an important part of improving RMAN backup performance. The BLKSIZE parameter of the RMAN channel determines the size of the tape buffer. Actual testing and Oracle's recommendations both indicate that the tape buffer should be at least 256 KB. If your tape backup is not streaming, and you have confirmed that the cause is not the backup of empty files or an incremental backup, try changing the tape buffer size with BLKSIZE. Raising BLKSIZE can relieve the not-streaming problem, and doing so is simple: adjust the PARMS parameter of ALLOCATE CHANNEL or CONFIGURE CHANNEL. For example, to set the tape buffer to 512 KB:

[SQL]
    RMAN> CONFIGURE CHANNEL DEVICE TYPE sbt PARMS="BLKSIZE=524288";

⑤ Set a reasonable LARGE_POOL_SIZE value. If the LARGE_POOL_SIZE parameter is not set, Oracle tries to allocate the disk and tape buffers from the shared pool.
Allocating the buffers from the shared pool causes contention with its other components, such as the library cache, so the large pool should be given a reasonable size. If the size is insufficient, the disk and tape buffers are allocated from the PGA and a warning is written to the alert log:

[SQL]
    ksfqxcre: failure to allocate shared memory means sync I/O will be used whenever async I/O to file not supported natively

⑥ Backing up files with many empty blocks, and incremental backups. In both cases it is hard to keep the tape buffer full, which leads to the tape not streaming. The remedy is simple: raise the multiplexing level to a relatively large value, such as 50.

(IV) Improving recovery performance

① Database performance

● I/O. Recovery is a read- and write-intensive operation. It has to read the archived logs, read the affected data file blocks into the cache, and write the recovered dirty blocks back to disk. The database therefore needs well-balanced I/O and good I/O performance.

● DBWR performance. During recovery, dirty blocks are written back to the data files by the DBWR process, so DBWR performance also affects recovery performance. DBWR performance can be checked through the "free buffer waits" event in the v$session_wait view: if this wait is present at every sample, DBWR write speed is a bottleneck. Ways to increase DBWR write speed include enabling asynchronous I/O and adding DBWR processes.

● CPU performance. Every data block that needs recovery must be read into the buffer cache before redo is applied. This involves acquiring latches, including cache buffers chains and cache buffers lru chain, and both latch acquisition and block modification consume CPU.
You must therefore ensure sufficient CPU capacity during recovery, especially when performing parallel recovery.

② Archived logs and incremental backups needed for recovery. Theory and actual measurement both show that incremental backups speed up recovery, and that the more archived logs have to be applied, the longer the recovery takes. Later versions added the block change tracking mechanism, which greatly speeds up incremental backups and at the same time greatly reduces their impact on application performance.

③ The amount of data to restore. Analyze carefully what must be restored in order to minimize the amount of restore and media recovery. For example, if only a few blocks in a data file are affected, consider block media recovery.

④ Where the archived logs are kept. A good practice is to keep the most recent archived logs on disk, which speeds up recovery.

⑤ Parallel recovery (10g or later). You can specify a degree of parallelism for the RECOVER command. When recovering on a system with multiple CPUs, several parallel processes work at the same time. For example:

[SQL]
    RMAN> RECOVER TABLESPACE users PARALLEL 4;

(V) Summary

RMAN tuning is hands-on work that requires continual testing. It is also a matter of finding a balance point, so that backup and recovery performance meets actual requirements while the impact on production is kept to a minimum.
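As a closing worked example, the multiplexing rule from the tactical section (the level is bounded by FILESPERSET, by MAXOPENFILES, and by the number of files the channel reads) can be written as a small Python function. The function name and argument names are chosen here for illustration only; the arithmetic follows the article's own 100-file example.

```python
def multiplexing_level(files_for_channel, filesperset, maxopenfiles):
    """Return the number of data files one channel reads at the same time.

    files_for_channel: data files assigned to this channel
    filesperset:       FILESPERSET setting of the backup
    maxopenfiles:      MAXOPENFILES setting of the channel
    """
    # A backup set holds at most FILESPERSET files, and never more files
    # than the channel actually has to read.
    files_in_set = min(files_for_channel, filesperset)
    # Of those, at most MAXOPENFILES are open simultaneously.
    return min(files_in_set, maxopenfiles)


# The article's example: 100 data files, FILESPERSET 12, MAXOPENFILES 10.
level = multiplexing_level(100, 12, 10)
print(level)
```

Lowering either FILESPERSET or MAXOPENFILES below the other makes that parameter the limiting one, which is the knob to turn when following the level recommendations in section (III).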
