This case comes from a customer in March: after a server hardware failure, business transactions occasionally became slow to commit. Let's look at the AWR report first.
The load profile shows that the load on this system is actually low: only about 21 transactions per second. Next, look at the Top 5 Timed Events:
The Top 5 events show that the average wait (Avg wait) for log file sync is very high, up to 124 ms. In the vast majority of systems the average log file sync wait is below 5 ms, so this value is clearly too high.
There are many possible causes of log file sync waits. Tanel Poder has written an excellent PDF on log file sync that is well worth reading.
Here I mainly follow his diagram to briefly describe the possible causes of log file sync. First, let's look at the full path from the moment the foreground process issues a commit to the moment it receives confirmation, including the processing in between:
The diagram shows the whole process clearly. A brief description:
1. The user issues a commit.
2. The foreground process (that is, the server process) posts a message to the LGWR process telling it to write out the redo buffer.
3. Once LGWR receives the request, it calls operating system functions to perform the physical write; the time spent in this write is recorded as the log file parallel write wait. You may wonder why it is called a "parallel" write when, before 12c, there was only a single LGWR process. The "parallel" refers to LGWR issuing its writes to all members of the current redo log group concurrently. In addition, LGWR writes the contents of the redo buffer to the log file in batches (DBWn likewise writes in batches), controlled by related hidden parameters.
4. When LGWR finishes the write, it posts a message back to the foreground (server) process to say the write is done and the commit can complete.
5. The user's commit completes.
This relies on Oracle's write-ahead logging principle: if the relevant redo entries in the redo buffer were not written to the redo log file before the commit completed, a database crash would lose committed data.
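To make the handshake in steps 1 to 5 concrete, here is a toy Python model. Everything in it (the ToyLGWR class, post, write_once, commit, and the 4 ms write latency) is hypothetical and only mimics the shape of the protocol; it is not how Oracle implements LGWR. It shows why log file sync, timed by the foreground around the whole round trip, can never be shorter than log file parallel write, timed by LGWR around the physical write alone.

```python
import threading
import time


class ToyLGWR:
    """Hypothetical stand-in for LGWR; a teaching model, not Oracle's implementation."""

    def __init__(self, write_latency_s):
        self.write_latency_s = write_latency_s  # simulated disk write time (assumption)
        self.cond = threading.Condition()
        self.pending = []  # (redo_entry, done_event) pairs posted by foregrounds
        self.redo_log = []  # stands in for the online redo log file
        self.parallel_write_times = []  # one entry per simulated physical write

    def post(self, redo_entry):
        """Step 2: a foreground posts LGWR and gets an event to wait on."""
        done = threading.Event()
        with self.cond:
            self.pending.append((redo_entry, done))
            self.cond.notify()
        return done

    def write_once(self):
        """Steps 3 and 4: write all pending entries as one batch, then post back."""
        with self.cond:
            while not self.pending:
                self.cond.wait()
            batch, self.pending = self.pending, []
        start = time.perf_counter()
        time.sleep(self.write_latency_s)  # the simulated "log file parallel write"
        self.redo_log.extend(entry for entry, _ in batch)
        self.parallel_write_times.append(time.perf_counter() - start)
        for _, done in batch:
            done.set()  # tell each waiting foreground its redo is on "disk"


def commit(lgwr, redo_entry):
    """Steps 1 and 5 from the foreground side; returns the simulated log file sync time."""
    start = time.perf_counter()
    done = lgwr.post(redo_entry)
    done.wait()  # the foreground blocks here: the simulated "log file sync"
    return time.perf_counter() - start


lgwr = ToyLGWR(write_latency_s=0.004)
writer = threading.Thread(target=lgwr.write_once)
writer.start()
sync_time = commit(lgwr, "txn-1 redo")
writer.join()
print(f"log file sync ~ {sync_time * 1000:.1f} ms; "
      f"log file parallel write ~ {lgwr.parallel_write_times[0] * 1000:.1f} ms")
```

Because the foreground's timer starts before it posts LGWR and stops only after LGWR posts back, any extra delay in steps 2 or 4 (for example LGWR waiting for CPU) is added to log file sync without necessarily appearing in log file parallel write.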
The flowchart shows that log file sync and log file parallel write are closely related: log file parallel write is one part of log file sync, so if log file parallel write takes a long time, log file sync will inevitably take longer too.
If log file parallel write waits are very high, the cause is most likely a physical disk I/O problem, as illustrated below:
As the figure shows, if LGWR takes too long to complete its I/O, the log file parallel write wait goes up.
In fact, the path from the user issuing a commit to the commit completing involves many steps, and physical I/O is not the only thing that affects log file sync and log file parallel write; CPU affects both of them as well. Look at the following figure:
As the figure shows, four of the steps in this process involve CPU scheduling. If the CPU is heavily contended while a transaction commits, LGWR may not get CPU time and has to queue for it, which inevitably increases log file sync or log file parallel write waits.
Note: Oracle can also raise the CPU scheduling priority of specific processes through the hidden parameter _high_priority_processes. On systems where CPU is scarce, setting this parameter can help mitigate the problem.
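One way to summarize the I/O and CPU discussion is to split a single log file sync wait into the pieces of the commit path. This is an illustrative model that assumes the steps happen one after another; it is not a formula from Oracle documentation, and the term names are my own:

```latex
t_{\text{log file sync}} \approx
  \underbrace{t_{\text{post to LGWR}}}_{\text{step 2}}
  + \underbrace{t_{\text{CPU queue}}}_{\text{scheduling}}
  + \underbrace{t_{\text{log file parallel write}}}_{\text{step 3}}
  + \underbrace{t_{\text{post back}}}_{\text{step 4}}
```

Slow storage inflates the log file parallel write term and therefore both events; CPU starvation mainly inflates the scheduling and posting terms, so log file sync can rise even while log file parallel write looks normal.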
Back to this case: in the customer's environment we could rule out CPU. The next suspect would be the storage itself making I/O very slow, but that can in fact also be ruled out. Notice in the Top 5 events that the average wait for log file parallel write is not high; if storage I/O were the problem, the average wait for this event would be relatively high as well.
The number of waits for log file sync and for log file parallel write is almost identical, yet the average wait for log file parallel write is only 4 ms, which is normal. That rules out a storage I/O problem.
So what is the problem? We used a script provided on Oracle MOS to query the wait-time distribution of log file sync and log file parallel write (a live view):
INST_ID EVENT WAIT_TIME_MILLI WAIT_COUNT
---------- ---------------------------------------- --------------- ----------
1 log file sync 1 259306
1 log file sync 2 2948999
1 log file sync 4 1865918
1 log file sync 8 173699
1 log file sync 16 43194
1 log file sync 32 6095
1 log file sync 64 1717
1 log file sync 128 2458
1 log file sync 256 5180
1 log file sync 512 9140
1 log file sync 1024 558347
1 log file parallel write 1 5262
1 log file parallel write 2 4502377
1 log file parallel write 4 1319211
1 log file parallel write 8 46055
1 log file parallel write 16 23694
1 log file parallel write 32 3149
1 log file parallel write 64 283
1 log file parallel write 128 267
1 log file parallel write 256 157
1 log file parallel write 512 73
1 log file parallel write 1024 42
1 log file parallel write 2048 39
1 log file parallel write 4096 103
1 log file parallel write 8192 21
1 log file parallel write 16384 22
1 log file parallel write 32768 190
1 log file parallel write 65536 1
A quick calculation shows that most waits for both events fall in the buckets below 4 ms (about 98.7% of log file parallel write waits and about 86% of log file sync waits), which is normal. However, a minority of waits are very long: log file sync has 558,347 waits in the 1024 bucket, that is, waits of between 512 ms and about 1 second. Because these occasional waits are so long, they pull up the average wait time of log file sync as a whole.
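The arithmetic in the paragraph above can be checked with a few lines of Python over the histogram. The bucket interpretation is an assumption based on the documented meaning of WAIT_TIME_MILLI (a bucket labelled u holds waits shorter than u ms that are not counted in a smaller bucket); since exact durations are not recorded, the sketch only brackets the mean between a lower and an upper bound. The helper name summarize is hypothetical.

```python
# (WAIT_TIME_MILLI, WAIT_COUNT) pairs copied from the gv$event_histogram output above.
LOG_FILE_SYNC = [
    (1, 259306), (2, 2948999), (4, 1865918), (8, 173699), (16, 43194),
    (32, 6095), (64, 1717), (128, 2458), (256, 5180), (512, 9140),
    (1024, 558347),
]
LOG_FILE_PARALLEL_WRITE = [
    (1, 5262), (2, 4502377), (4, 1319211), (8, 46055), (16, 23694),
    (32, 3149), (64, 283), (128, 267), (256, 157), (512, 73), (1024, 42),
    (2048, 39), (4096, 103), (8192, 21), (16384, 22), (32768, 190),
    (65536, 1),
]


def summarize(buckets):
    """Summarize a wait-event histogram (hypothetical helper).

    Assumption: a bucket with upper bound u counts waits in [u/2, u) ms,
    and the first bucket (u = 1) counts waits in [0, 1) ms. Exact wait
    durations are unknown, so the mean is bounded from below and above.
    """
    total = sum(count for _, count in buckets)
    under_4ms = sum(count for upper, count in buckets if upper <= 4)
    low_ms = sum(count * (upper / 2 if upper > 1 else 0) for upper, count in buckets)
    high_ms = sum(count * upper for upper, count in buckets)
    return {
        "total_waits": total,
        "share_under_4ms": under_4ms / total,
        "mean_ms_low": low_ms / total,
        "mean_ms_high": high_ms / total,
    }


for name, buckets in [("log file sync", LOG_FILE_SYNC),
                      ("log file parallel write", LOG_FILE_PARALLEL_WRITE)]:
    s = summarize(buckets)
    print(f"{name:24s} waits={s['total_waits']:>9,d} "
          f"<4ms={s['share_under_4ms']:.1%} "
          f"mean between {s['mean_ms_low']:.1f} and {s['mean_ms_high']:.1f} ms")
```

For the data above this gives about 86% of log file sync waits under 4 ms with a mean somewhere between roughly 50 and 101 ms, against about 99% and roughly 2 to 4 ms for log file parallel write: the roughly 9.5% of log file sync waits in the 1024 bucket dominate its average. Note that v$event_histogram counts are cumulative since instance startup, so these bounds cover a different interval than the 124 ms AWR figure and need not match it.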
At this point the picture is clearer: I believe the cause was an abnormal or unstable link between the host and the storage. As a temporary workaround, we moved the redo log files to local disk, which resolved the problem.
Postscript: the customer later confirmed that the storage fiber-optic cable connector was indeed loose.
Source: http://www.cnblogs.com/hrhguanli/p/3891951.html
Oracle wait events log file sync + log file parallel write (AWR optimization)