Oracle AWR report: log file sync wait event optimization summary (reposted from ITPUB)

Last Update:2014-08-27 Source: Internet

Author: User

Tags mutex cpu usage

A summary of log file sync wait event optimization from Master Bai (Baishan), for ITPUB readers to study and reference:

First, if the average log file sync wait time exceeds 7 ms, the waits are too long, which means each log write is taking too long. If you can optimize the storage of the redo log files by placing them on faster disks (for example, moving from RAID 5 to RAID 10), you reduce the time of each individual wait.
What if optimizing the I/O performance of the redo log does not solve the problem, or the result after optimization still falls short of expectations?

Second, an experienced DBA might recommend increasing the log buffer. Some readers may be puzzled by this: how is a long redo log file write wait directly related to flushing the log buffer? It is not hard to explain. If data file I/O performance is a problem and the average block read wait is too long, enlarging the DB cache reduces the total number of I/Os and so improves I/O. Enlarging the log buffer works on the same principle: the log buffer can hold more redo data, which reduces the number of LGWR writes triggered by the redo log buffer filling up. Each redo log file write then carries more redo bytes on average, the number of redo I/Os falls, and the log file sync wait event improves.

Third, if neither of these methods works, there is another way: reduce the number of commits. If commits are too frequent, no other optimization can completely solve the problem.
Increasing the number of records per commit, and so reducing the number of commits, effectively lowers log file sync wait time. This approach requires larger changes, possibly even to the application architecture, which is costly.
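The batch-commit idea above can be sketched as follows. This is only an illustration under stated assumptions, not the author's code: the function name `commit_in_batches` and the default batch size are hypothetical, and the standard-library `sqlite3` module stands in for an Oracle driver purely so that the sketch runs anywhere. Placeholder syntax (`?` here) differs between drivers.

```python
import sqlite3
from itertools import islice


def commit_in_batches(conn, insert_sql, rows, batch_size=1000):
    """Insert rows, committing once per batch instead of once per row.

    Returns the number of commits issued. `conn` is assumed to be a
    DB-API style connection (hypothetical helper, for illustration only).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    cursor = conn.cursor()
    row_iter = iter(rows)
    commits = 0
    while True:
        batch = list(islice(row_iter, batch_size))
        if not batch:
            break
        cursor.executemany(insert_sql, batch)
        conn.commit()  # one commit (one log sync) per batch
        commits += 1
    return commits


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE t (id INTEGER)")
    data = [(i,) for i in range(10)]
    issued = commit_in_batches(demo, "INSERT INTO t (id) VALUES (?)", data, batch_size=4)
    print("commits issued:", issued)
```

With row-by-row commits, the same ten rows would need ten commits. The trade-off is that a failure in the middle of a batch rolls back more work, so the application has to be able to retry a whole batch.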
　　
Fourth, there is one more way to optimize the log file sync event: switch some of the frequently committing transactions to asynchronous commit.
Asynchronous commit is a new feature introduced in 10g, and it is controlled by the commit_write parameter.
The default behavior of commit_write corresponds to "IMMEDIATE,WAIT".
It can be set to "IMMEDIATE,NOWAIT" to enable asynchronous commits.
Systems that use asynchronous commit need extra checking and processing: cleaning up inconsistent data and reinserting data that was lost because of the asynchronous commit. This requires special handling at the application level, namely a validation mechanism and a mechanism for repairing bad data. Note that data which is particularly important, and which cannot be fully restored afterwards, is not suitable for this approach.
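As a small sketch of how such a setting might be issued from application code, the helper below only builds the statement text. The function name `commit_write_statement` is hypothetical, and whether the parameter is still the right one should be checked against the documentation for your release. Only the option values IMMEDIATE/BATCH and WAIT/NOWAIT are accepted.

```python
VALID_FLUSH = ("IMMEDIATE", "BATCH")
VALID_WAIT = ("WAIT", "NOWAIT")
VALID_SCOPE = ("SESSION", "SYSTEM")


def commit_write_statement(flush="IMMEDIATE", wait="NOWAIT", scope="SESSION"):
    """Build an ALTER SESSION/SYSTEM statement for COMMIT_WRITE.

    Hypothetical helper for illustration; it validates the option
    names and returns the SQL text without executing anything.
    """
    flush, wait, scope = flush.upper(), wait.upper(), scope.upper()
    if flush not in VALID_FLUSH:
        raise ValueError(f"flush must be one of {VALID_FLUSH}")
    if wait not in VALID_WAIT:
        raise ValueError(f"wait must be one of {VALID_WAIT}")
    if scope not in VALID_SCOPE:
        raise ValueError(f"scope must be one of {VALID_SCOPE}")
    return f"ALTER {scope} SET COMMIT_WRITE = '{flush},{wait}'"


if __name__ == "__main__":
    print(commit_write_statement())
```

Setting it at session level for only the frequently committing transactions keeps the rest of the system on synchronous commit, which matches the advice above to limit asynchronous commit to data that can be repaired later.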
　　

The log file sync wait event is critical. We should establish a baseline for this metric in day-to-day database maintenance, and if the metric changes, analyze and resolve the cause as quickly as possible. Once it deteriorates, system performance can drop sharply, and the system can even hang briefly.

Last year, at one customer's site, log file sync was normally 2-3 ms. During a routine inspection, Lao Bai (the author) found that it had grown to 7 ms. The inspection report advised the customer to watch this metric and to check the storage system and operating system as soon as possible to find the cause of the slowdown. The customer checked the storage, found no fault, and took no further action. At the next month's inspection the metric had grown to 13 ms; a warning was raised again, and again no problem was found. Over the following two months the metric kept deteriorating, to more than 20 ms. Because the earlier checks had found nothing and the system still appeared to run normally, the customer did not investigate seriously. Then one day the system suddenly hung and only returned to normal 5 minutes later. The cause turned out to be log file sync waits. Following my suggestion, the customer checked everything from end to end and finally found that one of the LVM links was dropping intermittently. Once the link was fixed, everything returned to normal.

The lesson from this case is that when the log file sync metric deteriorates, we must quickly track down the root cause. If the log file sync wait time keeps rising, the likelihood of a system hang rises with it, so the cause has to be found as soon as possible.
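A baseline check of this kind could be sketched as below. The function name `check_against_baseline` and the warning ratio of 2.0 are assumptions chosen for illustration, not values from the article. The sample readings are the ones from the case (7, 13 and 20 ms), and the 2.5 ms baseline is taken as the midpoint of the 2-3 ms range.

```python
def check_against_baseline(current_ms, baseline_ms, warn_ratio=2.0):
    """Compare the current average log file sync wait with a baseline.

    Returns (ratio, should_warn). The warn_ratio threshold is an
    illustrative assumption; choose one that fits your own system.
    """
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    ratio = current_ms / baseline_ms
    return ratio, ratio >= warn_ratio


if __name__ == "__main__":
    baseline = 2.5  # midpoint of the 2-3 ms seen in normal operation
    for reading in (7.0, 13.0, 20.0):
        ratio, warn = check_against_baseline(reading, baseline)
        print(f"{reading:5.1f} ms  ratio={ratio:.1f}  warn={warn}")
```

Every monthly reading in the case would have tripped this check, which is the point of keeping a baseline: the trend is visible long before the hang.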

-----------------------------------------------------------------------------

A summary of log file sync wait event optimization from Master Gai (Eygle), for ITPUB readers to study and reference:

http://www.eygle.com/statspack/statspack14-LogFileSync.htm
When a user commits or rolls back, the session's redo information needs to be written out to the redo log file.
The user process posts LGWR to perform the write, and LGWR notifies the user process once the write has completed.
This wait event is the user process waiting for that completion notification from LGWR.

For a rollback, the event records the time from the user issuing the rollback command until the rollback completes.

If this wait is excessive, it may indicate that LGWR writes are inefficient or that the system commits too frequently.
To investigate, look at:
The log file parallel write wait event
Statistics such as user commits and user rollbacks, which show the number of commits and rollbacks
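To turn those cumulative statistics into commit and rollback rates, two snapshots can be compared. The sketch below assumes the values have already been read into plain dictionaries keyed by statistic name; the function name `per_second_rates` and the sample numbers are hypothetical.

```python
def per_second_rates(begin, end, elapsed_seconds,
                     names=("user commits", "user rollbacks")):
    """Compute per-second rates from two snapshots of cumulative statistics.

    `begin` and `end` map statistic names to cumulative values taken
    `elapsed_seconds` apart (illustrative helper, not a library API).
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return {name: (end[name] - begin[name]) / elapsed_seconds for name in names}


if __name__ == "__main__":
    snap_begin = {"user commits": 1000, "user rollbacks": 10}
    snap_end = {"user commits": 61000, "user rollbacks": 70}
    # Snapshots taken 10 minutes apart.
    print(per_second_rates(snap_begin, snap_end, 600))
```

A high commit rate together with a normal log file parallel write time points toward commit frequency rather than LGWR write efficiency.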

Solutions:
1. Improve LGWR performance
Use fast disks where possible, and do not store redo log files on RAID 5
2. Use batch commits
3. Use NOLOGGING/UNRECOVERABLE and similar options where appropriate

The average redo write size can be calculated by the following formula:

avg. redo write size = (redo blocks written / redo writes) * 512 bytes
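The formula can be worked through in a few lines. In this sketch the function name `avg_redo_write_bytes` and the sample counter values are hypothetical; the 512-byte block size is the one used in the formula above and is exposed as a parameter in case the redo block size on your platform differs.

```python
def avg_redo_write_bytes(redo_blocks_written, redo_writes, block_size=512):
    """Average size of one LGWR write, per the formula above.

    Inputs are the 'redo blocks written' and 'redo writes' statistics
    over the same interval (illustrative helper).
    """
    if redo_writes <= 0:
        raise ValueError("redo_writes must be positive")
    return redo_blocks_written / redo_writes * block_size


if __name__ == "__main__":
    # Hypothetical values: 200,000 blocks written in 50,000 writes.
    size = avg_redo_write_bytes(200_000, 50_000)
    print(f"average redo write size: {size:.0f} bytes")
```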

If the system generates a lot of redo but each write is small, it generally means that LGWR is being activated too frequently.
This can cause excessive contention on the redo-related latches, and Oracle may not be able to make effective use of its piggyback (group commit) mechanism.

We extract some data from a Statspack report to look at this problem.

Here we see that log file sync and db file parallel write waits appear together.
Apparently log file sync is waiting for db file parallel write to complete.

There must be a disk I/O bottleneck here. In this user's system the redo logs and the data files are stored together on a RAID disk with performance problems.
This needs to be adjusted.

Because transactions commit too frequently, LGWR is activated too often, and we see contention on the redo writing latch.
A detailed description of redo writing latch contention can be found on Steve's site:
http://www.ixora.com.au/notes/lgwr_latching.htm

Oracle Internals Notes
LGWR Latching

When LGWR wakes up, it first takes the redo writing latch to update the SGA variable that shows whether it is active. This prevents other Oracle processes from posting LGWR needlessly. LGWR then takes the redo allocation latch to determine how much redo might be available to write (subject to the release of the redo copy latches). If none, it takes the redo writing latch again to record that it is no longer active, before starting another rdbms ipc message wait.
If there is redo to write, LGWR then inspects the latch recovery areas for the redo copy latches (without taking the latches) to determine whether there are any incomplete copies into the log buffer. For incomplete copies above the sync RBA, LGWR just defers the writing of that block and subsequent log buffer blocks. For incomplete copies below the sync RBA, LGWR sleeps on a LGWR wait for redo copy wait event, and is posted when the required copy latches have been released. The time taken by LGWR to take the redo writing and redo allocation latches and to wait for the redo copy latches is accumulated in the redo writer latching time statistic.

(Prior to release 8i, foreground processes held the redo copy latches more briefly because they did not retain them for the application of the change vectors. Therefore, LGWR would instead attempt to assure itself that there were no ongoing copies into the log buffer by taking all the redo copy latches.)

After each redo write has completed, LGWR takes the redo allocation latch again in order to update the SGA variable containing the base disk block for the log buffer. This effectively frees the log buffer blocks that have just been written, so that they can be reused.

------------------------------------------------------------------------------------

A summary of log file sync wait event optimization from Master Lu (Vage), for ITPUB readers to study and reference:

1. Log file sync is the time from the start of a commit to its end. Log file parallel write is the time from when LGWR starts writing the redo file until that write finishes. It follows that log file sync includes log file parallel write. So when log file sync waits are too long, look at log file parallel write first. If the average log file sync wait (that is, the commit response time) is 20 ms and log file parallel write is 19 ms, the problem is obvious: redo file I/O is slow and is slowing down commits.

2. Log file sync takes longer than log file parallel write. When a server process starts a commit, it notifies LGWR to write redo, and when LGWR has finished writing it notifies the process that the commit is complete. These round-trip notifications consume CPU. Besides the notifications, a commit also has to increment the SCN, among other things. If the gap between log file sync and log file parallel write is large, I/O is not the problem. CPU resources may be tight, so that the notifications between the process and LGWR, or other operations that need CPU, cannot get enough of it and are delayed.

In this case, check CPU utilization and load. If both are high, CPU shortage is what is lengthening the log file sync response time. The database will usually show other symptoms as well, such as latch/mutex contention that is more severe than usual, because latch/mutex contention grows sharply when CPU is tight.

3. The gap between log file sync and log file parallel write is large, but CPU utilization is not high. This situation is relatively rare and is one of the hard cases. I/O is fast, CPU is plentiful, and log file parallel write response time is very short, yet log file sync response time is long. This is the hardest situation to pin down. Compare the redo-related statistics (v$sysstat) and the changes in redo-related latches as a whole.
For example, look at the average response time of redo synch time. Not every redo synch time comes from a commit, but every commit incurs redo synch time. If redo synch time is short while log file sync is slow, the problem lies in the mutual notification phase between LGWR and the process. Also look at redo entries: the real meaning of the redo entries count is the number of times processes write redo into the log buffer. Redo log space wait time, redo log space requests and the log buffer space wait event are worth watching too. The size of the log buffer usually does not affect log file sync, but changes in log buffer usage show how the redo volume is changing.
About the effect of the log buffer on log file sync:

Under the new IMU mechanism, redo data first sits in the shared pool and is copied to the log buffer at commit. If there is a wait at that point, the wait event is log buffer space. From the log buffer to disk, the wait event is log file sync.
The same is true under the old mechanism: a wait before the log buffer is log buffer space, and a wait after the log buffer is log file sync.

4. Control file I/O may affect log file sync.
This issue has not been studied in depth yet, but it has been observed before in Alibaba's databases.

5. Log file sync and buffer busy waits.
There is no direct relationship between the two. Other causes, such as redo-related latches, can make log file sync and buffer busy waits appear at the same time. In that case neither log file sync nor buffer busy waits is the root cause; the real culprit is degraded log buffer access performance.

6. Shipping redo to a remote Data Guard standby in synchronous mode also causes log file sync waits.
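The first three cases above amount to a simple triage, which can be sketched as follows. The function name `classify_log_file_sync` and both thresholds (an 80% I/O share and 80% CPU busy) are assumptions chosen for illustration; the article gives only the 20 ms versus 19 ms example and no fixed cut-offs.

```python
def classify_log_file_sync(sync_ms, parallel_write_ms, cpu_busy_pct,
                           io_share=0.8, cpu_high_pct=80.0):
    """Rough triage of a slow log file sync, following cases 1 to 3.

    sync_ms:           average log file sync wait
    parallel_write_ms: average log file parallel write wait
    cpu_busy_pct:      host CPU utilization in percent
    Thresholds are illustrative assumptions, not documented values.
    """
    if sync_ms <= 0:
        raise ValueError("sync_ms must be positive")
    if parallel_write_ms / sync_ms >= io_share:
        return "redo I/O"   # case 1: most of the commit time is the redo write
    if cpu_busy_pct >= cpu_high_pct:
        return "CPU"        # case 2: large gap and the CPU is busy
    return "other"          # case 3: compare redo statistics and latches


if __name__ == "__main__":
    print(classify_log_file_sync(20.0, 19.0, 30.0))
    print(classify_log_file_sync(20.0, 2.0, 95.0))
    print(classify_log_file_sync(20.0, 2.0, 30.0))
```

The "other" result is where the remaining items apply: redo statistics, control file I/O, redo-related latches and synchronous redo transport are all candidates to examine.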

Redo is an important optimization target in Oracle. I have worked out most of how DBWR operates, and the next goal is LGWR. Unfortunately I have not had time for it yet; I will summarize it in detail for everyone later.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
