An example of Log File sync waiting for Ultra-High Performance

Source: Internet
Author: User

This is the case of a customer in February March. The reason is that after the server hardware is replaced, the Business Translation occasionally experiences slow submission. Let's take a look at AWR.

 

 

We can see that the load profile information of the system is actually not high, only 21 transactions per second. Let's take a look at top5events:

From the top 5 events, we can find that the AVG wait of Log File sync is very high, as high as 124 Ms. You should know that for the vast majority of cases

The average wait time of Log File sync is less than 5 ms, which is a little too high.

We know that there are many reasons for the log file sync wait. Regarding Log File sync, tanel poder has written a very good PDF, so you can take the lead.

Here I mainly reference the map of the great god to briefly describe the possible causes of Log File sync. First, let's look at the feedback from the front-end process submission to the final feedback, and the entire process of intermediate processing:

From this, we can clearly see the entire process. Here we can briefly describe:

1. When a user initiates a commit;

2. The frontend process (that is, the server process) will post a message to the lgwr process and tell it that you should write redo buffer.

3. When the lgwr process is instructed, it starts to call the operating system function for physical writing. During this time

Log File parallel write is waiting. Some people may wonder why there was only one lgwr process in 12C, but it was parallel.

What about write? Here, we need to note that the lgwr process writes the data in the redo buffer to the log file in batch mode.

Process (in fact, the dbwn process is also in the batch mode), with related implicit batch number control.

4. After lgwr completes the wrtie operation, the lgwr process will return a message to the front-end process (server process), telling it that I have completed writing,

You can complete the submission.

5. the user completes the commit operation.

Here, we will add that this is because of the Oracle log write priority principle. If the redo buffer entry information is not immediately written to redo before commit

In log file, if the database has crash, data will be lost.

From the preceding flowchart, we can see that log file sync and Log File parallel write are interrelated. In other words, if the log file parallel write takes a long time, the log file sync wait time will be extended.

If the log file parallel write wait is very high, it may be a physical disk Io problem, for example:

 

From being able to release, assuming that the lgwr process takes too long to complete Io operations, the log file parallel write waits to rise.

In fact, the entire process of sending a commit to the completion of the commit involves many links, not only physical Io will affect Log File sync/log file parallel write. CPU also affects Log File sync and Log File parallel write. Let's take a look at the figure:

 

We can see that the four links in the above process involve CPU scheduling. Assuming that the system CPU is extremely tight during the entire transaction commit process, this may cause the lgwr process to fail to obtain the CPU and wait in queue. Obviously, this will inevitably cause Log File sync or log file parallel write to wait.

.

Note: Oracle can also control the process to get the CPU priority by implicit arguments _ high_priority_processes. In a system with a relatively low CPU, you can set the number of workers to mitigate this problem.

Finally, let's go back to this case. We can eliminate the CPU problem in the customer's environment. The biggest suspect may be the storage problem, resulting in a very slow Io. However, in fact, this can also be ruled out. In fact, we should note the top 5 events, average Log File parallel write wait

The time is not high. If it is a storage Io problem, the average wait time of this event should be higher than that of the queue.

 

We can see that the Waits of Log File sync and Log File parallel write are almost the same. However, the AVG wait time of Log File parallel write is only 4 ms, which is a normal value. That is to say, we can eliminate the storage Io problem.

So what is the problem? We use the script provided by Oracle MOs to query the distribution of Log File sync and Log File parallel write WAITS:

?
123456789101112131415161718192021222324252627282930    INST_ID EVENT                                    WAIT_TIME_MILLI WAIT_COUNT---------- ---------------------------------------- --------------- ----------         1 log file sync                                          1     259306         1 log file sync                                          2    2948999         1 log file sync                                          4    1865918         1 log file sync                                          8     173699         1 log file sync                                         16      43194         1 log file sync                                         32       6095         1 log file sync                                         64       1717         1 log file sync                                        128       2458         1 log file sync                                        256       5180         1 log file sync                                        512       9140         1 log file sync                                       1024     558347         1 log file parallel write                                1       5262         1 log file parallel write                                2    4502377         1 log file parallel write                                4    1319211         1 log file parallel write                                8      46055         1 log file parallel write                               16      23694         1 log file parallel write                               32       3149         1 log file parallel write                               64        283         1 log file parallel write                              128        267         1 log file parallel write                              256        157         1 log file parallel write                              512         73         1 log file parallel write                             1024         42         1 log file parallel write                             2048         39         1 log file parallel write                             4096        103         1 log file parallel write                             8192         21         1 log file parallel write                            16384         22         1 log file parallel write                            32768        190         1 log file parallel write                            65536          1

You can simply calculate the log file sync and Log File parallel write wait events. The average wait time of about 99% is

It is normal if the value is less than or equal to 4 ms. However, in a few cases, the waiting time is very long. For example, the maximum waiting time of Log File sync is

The wait time is as high as 1 second, because the occasional wait is very high, so the average wait time of the entire log file sync is increased.

In the end, the problem is clearer than the problem. I think this is because the link between the host and storage may be abnormal or unstable. Temporary solution

This problem is solved by moving the redo logfile to the local disk.

Note: After confirmation from the customer, the storage fiber optic cable interface is indeed loose. Haha

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.