In the RAC Database
The 'Log file sync' wait event in a RAC database is more complex than in a standalone database, mainly because a RAC database must also synchronize the commit SCN to all instances.
In a standalone database, when a user session commits, it notifies the LGWR process to write the contents of the redo buffer to the redo log file. After LGWR completes the write, it posts (notifies) the user session, and only then is the commit complete. The session therefore waits until it receives LGWR's post, and this wait is recorded as the 'Log file sync' event.
In a RAC database, the commit SCN must also be synchronized/propagated to all nodes so that consistent reads on every instance see the committed change. Two main methods of SCN synchronization/propagation are available: Lamport SCN and immediate commit propagation (broadcast on commit, BOC).
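The standalone commit handshake described above (session hands redo to LGWR, LGWR writes, LGWR posts the session) can be sketched with a queue and an event. This is a minimal illustrative model, not Oracle internals: `redo_requests`, `lgwr` and `user_commit` are hypothetical names, and a 10 ms sleep stands in for the redo log write. The time the session spends blocked in `posted.wait()` plays the role of 'Log file sync'.

```python
import queue
import threading
import time

# Hypothetical model: the queue carries (redo, event) pairs from sessions to LGWR.
redo_requests = queue.Queue()

def lgwr(redo_log):
    """Simulated LGWR: write each request's redo, then post the waiting session."""
    while True:
        item = redo_requests.get()
        if item is None:              # shutdown signal
            break
        redo, posted = item
        time.sleep(0.01)              # stands in for the redo log file write I/O
        redo_log.append(redo)
        posted.set()                  # "post" the session: the write is complete

def user_commit(redo):
    """Simulated session commit: hand redo to LGWR, then block until posted."""
    posted = threading.Event()
    start = time.perf_counter()
    redo_requests.put((redo, posted))
    posted.wait()                     # this blocked time models 'Log file sync'
    return time.perf_counter() - start

redo_log = []
writer = threading.Thread(target=lgwr, args=(redo_log,))
writer.start()
waited = user_commit("txn-1 redo")
redo_requests.put(None)
writer.join()
print(redo_log, f"simulated log file sync: {waited * 1000:.1f} ms")
```

Because the session cannot return from `user_commit` before the simulated write finishes, anything that slows LGWR's write lengthens the session's wait one-for-one.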
By default, 10gR1 and earlier versions use Lamport SCN. In Lamport SCN mode, the commit SCN on one node is not synchronized/propagated to the other nodes immediately; propagation may be delayed, which makes Lamport SCN unsuitable for RAC databases with strict real-time consistency requirements. To have the commit SCN synchronized/propagated to all nodes immediately in these versions, manually set the parameter MAX_COMMIT_PROPAGATION_DELAY = 1. From 10gR2 onward, immediate commit propagation (BOC) is the default: the commit SCN on one node is immediately synchronized/propagated to all nodes.
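The difference between the two modes can be illustrated with a deliberately simplified two-instance model. The assumptions are loud: `Instance` and `commit` are hypothetical, the "Lamport" case is modeled as "the remote node learns the SCN only when a later message carries it", and real RAC piggybacks SCNs on many kinds of messages. The Lamport rule itself (never let the local SCN fall behind an SCN you have seen) is the `max` in `receive`.

```python
class Instance:
    """A RAC instance holding its own view of the SCN (simplified model)."""

    def __init__(self, name):
        self.name = name
        self.scn = 0

    def receive(self, remote_scn):
        # Lamport rule: the local SCN never falls behind an SCN we have seen.
        self.scn = max(self.scn, remote_scn)

def commit(node, cluster, broadcast_on_commit):
    """Allocate a commit SCN on `node`; under BOC, push it to all other nodes now."""
    node.scn += 1
    if broadcast_on_commit:
        for other in cluster:
            if other is not node:
                other.receive(node.scn)
    return node.scn

for boc in (False, True):
    a, b = Instance("inst1"), Instance("inst2")
    commit_scn = commit(a, [a, b], broadcast_on_commit=boc)
    # Without BOC, inst2 has not yet heard of the new SCN, so a query that
    # starts on inst2 right now could miss the change committed on inst1.
    print("BOC" if boc else "Lamport", "-> inst2 sees commit:", b.scn >= commit_scn)
```

The lazy mode avoids a round of messages per commit; BOC pays that cost so that every instance's SCN is current as soon as the commit returns.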
This article describes how immediate commit propagation (BOC) works:
1. When a user session commits, it notifies the LGWR process to write the contents of the redo buffer to the redo log file;
2. On receiving the notification, LGWR writes the redo buffer contents to the redo log file and also propagates the commit SCN to the LMS processes of the remote instances;
3. The LMS of each remote instance applies the commit SCN to its local SCN and then notifies the LMS of the committing instance that SCN synchronization is complete;
4. When the LMS of the committing instance has received notifications from all remote instances, it notifies the local LGWR that SCN synchronization is complete;
5. Once LGWR has both completed the I/O and received the LMS notification, it notifies the user session that the commit succeeded. Until the session receives this notification from LGWR, it waits on 'Log file sync';
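The five BOC steps above can be simulated with threads to show why the commit wait is governed by whichever leg is slower: the redo write or the SCN round trip to the remote LMS processes. This is a sketch under stated assumptions: `boc_commit` is a hypothetical function, sleeps stand in for the I/O and interconnect latencies, and remote acknowledgements are collected directly into a shared list that stands in for the local LMS.

```python
import threading
import time

def boc_commit(io_ms, remote_ack_ms):
    """Return simulated 'Log file sync' time (ms) for one commit under BOC.

    io_ms: duration of LGWR's redo write.
    remote_ack_ms: one entry per remote instance, the time for its LMS to
    apply the SCN and acknowledge (hypothetical inputs).
    """
    io_done = threading.Event()
    all_acked = threading.Event()
    acks = []
    lock = threading.Lock()
    if not remote_ack_ms:              # single instance: nothing to wait for
        all_acked.set()

    def lgwr_write():                  # step 2: LGWR writes redo to disk
        time.sleep(io_ms / 1000)
        io_done.set()

    def remote_lms(delay_ms):          # step 3: remote LMS applies SCN and acks
        time.sleep(delay_ms / 1000)
        with lock:
            acks.append(delay_ms)
            if len(acks) == len(remote_ack_ms):
                all_acked.set()        # step 4: local LMS has every ack, tells LGWR

    start = time.perf_counter()
    workers = [threading.Thread(target=lgwr_write)]
    workers += [threading.Thread(target=remote_lms, args=(d,)) for d in remote_ack_ms]
    for w in workers:
        w.start()
    io_done.wait()                     # step 5: LGWR needs both the I/O ...
    all_acked.wait()                   # ... and the LMS notification
    elapsed = (time.perf_counter() - start) * 1000
    for w in workers:
        w.join()
    return elapsed

print(f"I/O-bound commit:          {boc_commit(30, [5, 8]):.0f} ms")
print(f"interconnect-bound commit: {boc_commit(5, [30, 8]):.0f} ms")
```

Both calls should report a little over 30 ms: speeding up the disk does not help the second case, which is why slow I/O, CPU starvation of LMS/LGWR, and a slow interconnect all surface as the same wait event.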
Based on the above, the main causes of 'Log file sync' waits are not hard to identify:
1. Slow disk I/O delays LGWR's write of the redo buffer to the redo log file;
2. User sessions commit too frequently;
3. CPU resources on the local or remote server are insufficient, so LMS and/or LGWR are not scheduled in time to do their work;
4. Poor performance of the RAC private network (interconnect) slows LMS propagation of the commit SCN;
5. Oracle bugs;
Important logs and information for analyzing and resolving 'Log file sync' waits:
1. AWR
For example, if the average wait time for log file sync in AWR is about the same as that for log file parallel write, the log file sync waits are mainly caused by I/O problems.
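The AWR comparison above can be expressed as a rough check. The inputs are the average waits (ms) for 'log file sync' and 'log file parallel write' read from the AWR report; `likely_io_bound` is a hypothetical helper and the 20% `tolerance` is an assumed cut-off, not an Oracle rule.

```python
def likely_io_bound(avg_log_file_sync_ms, avg_log_file_parallel_write_ms, tolerance=0.2):
    """Return True when redo write time accounts for most of the commit wait.

    tolerance is an assumed threshold: with 0.2, the write must make up at
    least 80% of the average 'log file sync' time.
    """
    if avg_log_file_sync_ms <= 0:
        raise ValueError("average log file sync wait must be positive")
    share = avg_log_file_parallel_write_ms / avg_log_file_sync_ms
    return share >= 1 - tolerance

print(likely_io_bound(12.0, 11.0))   # write is ~92% of the wait
print(likely_io_bound(20.0, 2.0))    # write is only 10% of the wait
```

When the check fails (the commit wait is much longer than the write), the remaining time points toward the other causes listed earlier, such as CPU scheduling or SCN propagation over the interconnect.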
2. LGWR and LMS process trace files
For example, a warning like the following in the LGWR trace file may indicate slow I/O:
Warning: log write time 1000 ms, size 2 kb
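Scanning an LGWR trace file for warnings like the one above can be automated. This sketch matches exactly the message format quoted here; the wording may differ across Oracle versions, so the pattern is an assumption to adjust. `slow_writes` and the 500 ms default threshold are hypothetical.

```python
import re

# Pattern follows the warning line quoted above; adjust it for other versions.
WARNING = re.compile(r"Warning: log write time (\d+)\s*ms, size (\d+)\s*kb", re.IGNORECASE)

def slow_writes(lines, threshold_ms=500):
    """Yield (write_time_ms, size_kb) for LGWR warnings at or above threshold_ms."""
    for line in lines:
        m = WARNING.search(line)
        if m and int(m.group(1)) >= threshold_ms:
            yield int(m.group(1)), int(m.group(2))

trace = [
    "*** 2024-01-01 00:00:00.000",
    "Warning: log write time 1000 ms, size 2 kb",
]
print(list(slow_writes(trace)))
```

A long write time for a tiny redo size, as in the quoted line (1000 ms for 2 KB), points at storage latency rather than redo volume.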
3. OSWatcher (helps confirm the server's CPU usage)
For example, in the following vmstat output collected by OSWatcher, the run queue (r) reaches 48 and CPU idle (id) is only 2-3%, indicating that CPU resources were very tight at the time. LMS/LGWR could then not be scheduled on a CPU in time, which causes 'Log file sync' waits.
procs memory page faults cpu
r b w avm free re at pi po fr de sr in sy cs us sy id
48 22 0 23877753 30244459 0 0 0 0 0 153454 2184632 38 60 2
48 22 0 23877753 30244094 0 0 0 0 0 153694 2181493 36 61 3
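Flagging CPU-starved samples in vmstat output like the one above can be sketched as follows. Assumptions: the run queue `r` is the first column and CPU idle `id` the last (as in this sample), the CPU count is supplied by you, and "run queue larger than the number of CPUs" is a common rule of thumb rather than an Oracle threshold. `cpu_pressure` and `cpu_count=16` are hypothetical.

```python
def cpu_pressure(vmstat_lines, cpu_count):
    """Return (run_queue, idle_pct) for samples whose run queue exceeds cpu_count.

    Assumes 'r' is the first column and 'id' the last; header lines
    (whose first field is not a number) are skipped.
    """
    flagged = []
    for line in vmstat_lines:
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue
        run_queue, idle = int(fields[0]), int(fields[-1])
        if run_queue > cpu_count:
            flagged.append((run_queue, idle))
    return flagged

sample = [
    "procs memory page faults cpu",
    "r b w avm free re at pi po fr de sr in sy cs us sy id",
    "48 22 0 23877753 30244459 0 0 0 0 0 153454 2184632 38 60 2",
    "48 22 0 23877753 30244094 0 0 0 0 0 153694 2181493 36 61 3",
]
print(cpu_pressure(sample, cpu_count=16))
```

Reading only the first and last columns keeps the check usable even when, as in this excerpt, some middle columns are missing from the captured output.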
4. Alert log
5. Script to Collect Log File Sync Diagnostic Information (lfsdiag.sql) (Doc ID 1064487.1)
The main ways to resolve 'Log file sync' waits are as follows:
1. Improve disk IO speed
2. Use batch commits to reduce the number of application commits
3. Install OSWatcher to locate processes with high CPU usage
4. Use a dedicated private network and set the network UDP buffer parameters correctly
5. Upgrade to the latest database version to avoid known bugs. For specific bug fixes, see:
WAITEVENT: "log file sync" Reference Note (Doc ID 34592.1)
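The effect of batch commits (item 2 above) on the number of commits, and therefore on the number of 'Log file sync' waits, can be shown with a small loader. Python's built-in `sqlite3` is used here only as a runnable stand-in for a DB-API connection; an Oracle driver would use its own placeholder syntax. `load_rows` and the table `t` are hypothetical.

```python
import sqlite3

def load_rows(conn, rows, batch_size):
    """Insert rows, committing once per batch; return the number of commits."""
    cur = conn.cursor()
    commits = 0
    for i in range(0, len(rows), batch_size):
        cur.executemany("INSERT INTO t (id, val) VALUES (?, ?)", rows[i:i + batch_size])
        conn.commit()        # in Oracle, each commit would wait on 'log file sync'
        commits += 1
    return commits

rows = [(i, f"v{i}") for i in range(1000)]
for size in (1, 100):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
    print(f"batch_size={size}: {load_rows(conn, rows, size)} commits")
    conn.close()
```

Larger batches cut the commit count (here from 1000 to 10) at the cost of larger units of work: if the load fails, every row in the uncommitted batch must be redone, so the batch size should match what the application can safely retry.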