Standby disk IO performance is poor, affecting primary performance

Last Update:2016-09-06 Source: Internet

Author: User

1. I recently handled a case in which poor disk IO on a standby degraded the performance of the primary.
Sessions on the primary were mostly waiting on "log file switch completion". Analysis of an ASH dump showed that the underlying wait event was "LGWR-LNS wait on channel". This event generally comes down to network performance or standby IO performance, and the customer was running in "MAXIMUM AVAILABILITY" mode.
We proposed two solutions:
(1). Replace the standby storage with better-performing storage
(2). Change the protection mode to maximum performance and use LGWR ASYNC transport
For reference, here are the three standby protection modes and the transport settings available in each:

Item	Maximum Protection	Maximum Availability	Maximum Performance
Redo write/transfer process	LGWR	LGWR	LGWR or ARCH
Network transfer mode	SYNC	SYNC	SYNC or ASYNC
Disk write confirmation	AFFIRM	AFFIRM	AFFIRM or NOAFFIRM
Standby redo logs	Required	Required	Required for LGWR, not for ARCH
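
The table can also be read as a set of rules. The following is a minimal Python sketch that only transcribes the table above; the names ALLOWED, destination_ok and needs_standby_redo_logs are made up for illustration and are not part of any Oracle tool.

```python
# Allowed redo transport settings per protection mode, transcribed from the table.
# All names in this sketch are hypothetical; it is not an Oracle API.
ALLOWED = {
    "MAXIMUM PROTECTION": {
        "process": {"LGWR"}, "network": {"SYNC"}, "disk": {"AFFIRM"},
    },
    "MAXIMUM AVAILABILITY": {
        "process": {"LGWR"}, "network": {"SYNC"}, "disk": {"AFFIRM"},
    },
    "MAXIMUM PERFORMANCE": {
        "process": {"LGWR", "ARCH"},
        "network": {"SYNC", "ASYNC"},
        "disk": {"AFFIRM", "NOAFFIRM"},
    },
}


def destination_ok(mode, process, network, disk):
    """Return True if the transport settings are listed for the given mode."""
    rule = ALLOWED[mode.upper()]
    return (
        process.upper() in rule["process"]
        and network.upper() in rule["network"]
        and disk.upper() in rule["disk"]
    )


def needs_standby_redo_logs(process):
    """Per the table, standby redo logs are needed for LGWR but not for ARCH."""
    return process.upper() == "LGWR"


if __name__ == "__main__":
    # Solution (2): LGWR ASYNC is only listed under maximum performance.
    print(destination_ok("Maximum Availability", "LGWR", "ASYNC", "NOAFFIRM"))
    print(destination_ok("Maximum Performance", "LGWR", "ASYNC", "NOAFFIRM"))
```

This shows why solution (2) changes the protection mode as well: according to the table, LGWR ASYNC transport is not available under maximum availability.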

The root cause is poor standby IO performance. In "MAXIMUM AVAILABILITY" mode, redo is shipped with SYNC and the primary must wait for confirmation that the standby has written the redo to disk, so a slow standby disk drags down the primary.

2. An introduction to SYNC and ASYNC, quoted from the Oracle documentation:

http://docs.oracle.com/cd/B10501_01/server.920/a96653/log_arch_dest_param.htm#77394

SYNC=PARALLEL
SYNC=NOPARALLEL

The SYNC attribute specifies that network I/O is to be performed synchronously for the destination, which means that once the I/O is initiated, the archiving process waits for the I/O to complete before continuing. The SYNC attribute is one requirement for setting up a no-data-loss environment, because it ensures that the redo records were successfully transmitted to the standby site before continuing.

If the log writer process is defined to be the transmitter to multiple standby destinations that use the SYNC attribute, the user has the option of specifying SYNC=PARALLEL or SYNC=NOPARALLEL for each of those destinations.

- If SYNC=NOPARALLEL is used, the log writer process performs the network I/O to each destination in series. In other words, the log writer process initiates an I/O to the first destination and waits until it completes before initiating the I/O to the next destination. Specifying the SYNC=NOPARALLEL attribute is the same as specifying the ASYNC=0 attribute.

- If SYNC=PARALLEL is used, the network I/O is initiated asynchronously, so that I/O to multiple destinations can be initiated in parallel. However, once the I/O is initiated, the log writer process waits for each I/O operation to complete before continuing. This is, in effect, the same as performing multiple, synchronous I/O operations simultaneously. The use of SYNC=PARALLEL is likely to perform better than SYNC=NOPARALLEL.

Because the PARALLEL and NOPARALLEL qualifiers only make a difference if multiple destinations are involved, Oracle Corporation recommends that all destinations use the same value.
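
The difference between the two qualifiers can be sketched with simple arithmetic. The Python function below is an illustrative model only, not a measurement of Oracle's behavior: it assumes each destination has a fixed round-trip time, and the function name and the sample latencies are invented for the example.

```python
def lgwr_wait_ms(dest_latencies_ms, parallel):
    """Rough model of how long the log writer waits on SYNC destinations.

    NOPARALLEL: I/O to each destination is done in series, so waits add up.
    PARALLEL:   I/O is started to all destinations at once, but the log
                writer still waits for every one, so the slowest dominates.
    """
    if not dest_latencies_ms:
        return 0
    return max(dest_latencies_ms) if parallel else sum(dest_latencies_ms)


if __name__ == "__main__":
    latencies = [5, 8, 40]  # hypothetical per-destination times in ms
    print(lgwr_wait_ms(latencies, parallel=False))  # series: 5 + 8 + 40
    print(lgwr_wait_ms(latencies, parallel=True))   # bounded by the slowest
```

Either way, under this model a single slow destination sets a floor on the wait, which is the situation described in this article.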

ASYNC[=blocks]

The ASYNC attribute specifies that network I/O is performed asynchronously for the destination. Once the I/O is initiated, the log writer continues processing the next request without waiting for the I/O to complete and without checking the completion status of the I/O. Use of the ASYNC attribute allows standby environments to be maintained with little or no performance effect on the primary database. The optional block count determines the size of the SGA network buffer to be used. In general, the slower the network connection, the larger the block count should be. Also, specifying the ASYNC=0 attribute is the same as specifying the SYNC=NOPARALLEL attribute.

Reading the documentation carefully, we can summarize the following points:
SYNC: after the IO is initiated, the primary can continue only once the standby has acknowledged that the IO completed, so poor standby IO performance slows the primary.
ASYNC: no acknowledgement is required. After the primary initiates the IO it moves straight on to the next step, so the standby's write speed does not affect the primary.
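
To make the summary concrete, here is a small Python sketch of the per-commit wait on the primary. It is a simplified model under stated assumptions: the local redo write and the remote shipment are assumed to overlap, all timings are hypothetical, and the function name is invented for illustration.

```python
def commit_wait_ms(local_write_ms, network_rtt_ms, standby_write_ms, sync):
    """Approximate time a commit waits on redo, in milliseconds.

    SYNC:  the primary waits for its own redo write and for the standby's
           acknowledgement (network round trip plus standby disk write).
           This model assumes the two happen concurrently.
    ASYNC: the primary waits only for its own local redo write.
    """
    if sync:
        return max(local_write_ms, network_rtt_ms + standby_write_ms)
    return local_write_ms


if __name__ == "__main__":
    # Hypothetical numbers: fast primary disk, saturated standby disk.
    print(commit_wait_ms(2, 1, 30, sync=True))   # dominated by the standby
    print(commit_wait_ms(2, 1, 30, sync=False))  # standby no longer matters
```

With SYNC, the standby's write time appears directly in the primary's commit wait; with ASYNC it drops out of the formula entirely.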

3. With these two concepts understood, let's return to the customer's problem:
The customer has three standbys, but the server behind log_archive_dest_3 performs poorly. During busy periods, the OSWatcher logs show standby IO utilization at 100%.
This confirms the problem: there is a large performance gap between that standby server and the primary, and with LGWR SYNC transport the standby comes under heavy IO pressure.
The primary must wait for confirmation that the redo has been transmitted and written before it can continue, so primary performance suffers badly.

4. Summary: the standby should not be much weaker than the primary. It should deliver at least 70~80% of the primary's performance; otherwise it cannot take over the primary's workload after a switchover or failover.
A weak standby can also affect primary performance during routine redo transport.
After reading this you may ask: doesn't maximum availability automatically switch to maximum performance? Then how can it affect performance?

5. With that question in mind, let's first look at the definition:

Maximum Availability: This protection mode provides the highest level of data protection that is possible without compromising the availability of the primary database. Like maximum protection mode, a transaction will not commit until the redo needed to recover that transaction is written to the local online redo log and to the standby redo logs of at least one transactionally consistent standby database. Unlike maximum protection mode, the primary database does not shut down if a fault prevents it from writing its redo stream to a remote standby redo log. Instead, the primary database operates in maximum performance mode until the fault is corrected, and all gaps in redo log files are resolved. When all gaps are resolved, the primary database automatically resumes operating in maximum availability mode.
This mode ensures that no data loss will occur if the primary database fails, but only if a second fault does not prevent a complete set of redo data from being sent from the primary database to at least one standby database.

In maximum availability mode, as long as the connection to the standby is normal, operation is equivalent to maximum protection mode: a transaction commits on the primary only after its redo has also been written on the standby. If the standby loses contact with the primary, the primary automatically switches to maximum performance mode to keep itself available.

Did you spot it? The key phrase is "loses contact": the fallback applies only if the standby loses contact with the primary. In the case described here the connection was normal. The standby's IO was slow, but it had not stopped providing service entirely, so the primary stayed in maximum availability mode and kept waiting on it.
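
The distinction between "slow" and "out of contact" can be sketched as follows. This Python model is an illustration only: the function name, the timeout_ms threshold and the sample values are assumptions made for the example, and the model does not claim to reproduce Oracle's exact timeout handling.

```python
def effective_mode(standby_reachable, standby_ack_ms, timeout_ms):
    """Return (effective mode, ms the primary waits on the standby per commit).

    Hypothetical model: the primary falls back to maximum performance only
    when the standby is unreachable or fails to acknowledge within
    timeout_ms. A standby that is merely slow keeps the primary waiting.
    The wait returned for the fallback case is the wait after falling back.
    """
    if not standby_reachable or standby_ack_ms > timeout_ms:
        return ("MAXIMUM PERFORMANCE", 0.0)
    return ("MAXIMUM AVAILABILITY", standby_ack_ms)


if __name__ == "__main__":
    # Slow but reachable standby: no fallback, every commit pays the wait.
    print(effective_mode(True, 200.0, 30000.0))
    # Standby out of contact: primary stops waiting on it.
    print(effective_mode(False, 0.0, 30000.0))
```

Under these assumptions, a standby whose acknowledgements are slow but still arrive never triggers the fallback, which matches what the customer observed.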

This article is from the "Small Kennel" blog; please retain this source: http://hsbxxl.blog.51cto.com/181620/1846499

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
