A company user reported that one of the systems was noticeably slow between 14:00 and 15:00 on 2016-08-16, so I generated an AWR report for that time period.
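For reference, an AWR report like the one discussed below can be generated interactively with the Oracle-supplied script; this invocation is only a minimal sketch, and the script itself prompts for the report type, number of days and the begin/end snapshot IDs.

-- Run in SQL*Plus as a suitably privileged user
@?/rdbms/admin/awrrpt.sql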
As shown above, comparing the Elapsed time with the DB time indicates that the server was not particularly busy during this period. From the Top 5 Timed Events section we can examine the top five wait events.
You can see that the wait event enq: TX - row lock contention accounts for 17.3% of all wait events. My guess was that the slowdown might be caused by a lock wait (enqueue wait), but this was not conclusive, because CPU time and the db file sequential read wait event took an even larger share. So I used awrddrpt.sql to generate an AWR comparison report for the same time period on the 15th and the 16th.
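As a rough sketch of that step (the snapshot layout and exact times are assumptions based on this case), the snapshot IDs for the two windows can be looked up first and then fed to the Oracle-supplied comparison script:

-- Find the snapshots covering 14:00-15:00 on the 15th and 16th
SELECT snap_id, begin_interval_time, end_interval_time
  FROM dba_hist_snapshot
 WHERE begin_interval_time >= TIMESTAMP '2016-08-15 13:00:00'
   AND end_interval_time   <= TIMESTAMP '2016-08-16 16:00:00'
 ORDER BY snap_id;

-- Then generate the side-by-side comparison report
@?/rdbms/admin/awrddrpt.sql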
As shown above, the DB time for 14:00-15:00 on the 16th is actually smaller than the DB time for the same period on the 15th, and in the Top 5 Timed Events of the 15th there is no enq: TX - row lock contention wait event, so it is very likely that this wait event is the cause. I then looked at the Wait Events section of the AWR report for 14:00-15:00 on the 16th, shown below: enq: TX - row lock contention has a Total Wait Time (s) of 1,022 seconds and an Avg Wait (ms) of 2,848 milliseconds, so the lock wait (enqueue wait) is still quite serious.
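To drill down further than the report, the ASH history can also be queried directly; the following is a minimal sketch, assuming the Diagnostics Pack is licensed and that the event name matches what this database version reports:

SELECT sample_time, session_id, blocking_session, sql_id
  FROM dba_hist_active_sess_history
 WHERE event = 'enq: TX - row lock contention'
   AND sample_time BETWEEN TIMESTAMP '2016-08-16 14:00:00'
                       AND TIMESTAMP '2016-08-16 15:00:00'
 ORDER BY sample_time;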
So let's check the Segments by Row Lock Waits section to see which objects produced the enq: TX - row lock contention wait event. As shown below, it is mostly the table INV_NEXT_NO and the index IDX_INV_SRN_HD_N1.
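The same information can be cross-checked outside the report; a small sketch using the segment statistics view:

SELECT owner, object_name, object_type, value AS row_lock_waits
  FROM v$segment_statistics
 WHERE statistic_name = 'row lock waits'
   AND value > 0
 ORDER BY value DESC;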
In addition, in the trace files under bdump I found TNS-12535 and TNS-12606 errors at around 14:30, so I generated an ASH report for the 14:25:00~14:35:00 time period for further analysis. An excerpt from the trace file is shown below:
Client Address = (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.xxx.xxx)(PORT=44913))
2016-08-16 14:32:36.012
NS Primary Error: TNS-12535: TNS:operation timed out
NS Secondary Error: TNS-12606: TNS:application timeout occurred
kmduicxd: 0x7f381c4bc770, kmduiflg: 1, circuit: 0x3778688f0
process id = (0X37F799EF8, 1)
process id = (17, 1)
# = 416
Connection context = 0x7f381c4bc770
User session = ((nil)), flag = (100c0), queue = (9)
Current buffer = (0), status = (4, 0)
Client Address = (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.xxx.xx)(PORT=60069))
2016-08-16 14:32:58.679
NS Primary Error: TNS-12535: TNS:operation timed out
NS Secondary Error: TNS-12606: TNS:application timeout occurred
kmduicxd: 0x7f381c4bd960, kmduiflg: 1, circuit: 0x377942fd0
process id = (0X37F799EF8, 1)
process id = (17, 1)
# = 798
Connection context = 0x7f381c4bd960
User session = ((nil)), flag = (100c0), queue = (9)
Current buffer = (0), status = (4, 0)
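For completeness, the ASH report for that ten-minute window can be generated with the Oracle-supplied script below; like the AWR scripts, it prompts interactively for the begin and end times (this invocation is a sketch, not the exact commands used here):

-- Run in SQL*Plus; enter 14:25:00 and 14:35:00 of 2016-08-16 when prompted
@?/rdbms/admin/ashrpt.sql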
From the ASH report it can be seen that the waits were mainly caused by the following statements:
SELECT ... FROM ... WHERE ... FOR UPDATE
SELECT ... FROM ... WHERE ... 'srn-ymg-201608-' ... FOR UPDATE
As soon as you see the FOR UPDATE statement, the picture is pretty clear: when two or more sessions update the same row at the same time, the enq: TX - row lock contention wait event appears. If one of the sessions is slow to commit its transaction, or is held up by some other failure such as a network problem, the other sessions have to wait for the resource, and under high concurrency this produces a large amount of blocking. The real root cause here is an unreasonable design. The system has to generate a unique and sequential ticket number (prefix + number); to guarantee that the numbers are both unique and gap-free, the design uses SELECT ... FOR UPDATE instead of a sequence (a sequence may skip numbers and leave gaps), and no other approach was adopted. If the business logic for generating the number can be changed, this problem will be solved.
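To make the contention mechanism concrete, here is a minimal sketch of the kind of number-allocation logic described above; only the table name INV_NEXT_NO comes from the report, while the column NEXT_NO and the prefix value are assumptions used purely for illustration:

-- Every session allocates the next ticket number like this
SELECT next_no
  FROM inv_next_no
 WHERE prefix = 'srn-ymg-201608-'
   FOR UPDATE;                    -- the row is now locked by this session

-- Any other session running the same SELECT ... FOR UPDATE now blocks on
-- enq: TX - row lock contention until this session commits or rolls back
UPDATE inv_next_no
   SET next_no = next_no + 1
 WHERE prefix = 'srn-ymg-201608-';
COMMIT;

If the holding session is slow to commit (long business logic, or network failures such as the TNS timeouts above), every other session queues behind that single row, which matches the long average waits seen in the AWR report.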
Oracle: AWR combined with ASH to diagnose enq: TX - row lock contention