Performance case 02 -- sybase connection blocking, 02 sybase
Symptom:Recently, a problem was reported on the site. during the approval process, the system often gets stuck. The whole system is useless and the browser access is in the loading state.
Troubleshooting:
1. generally, when the system crashes, we first think of the memory problem, but the phenomenon is loading. That is to say, the thread is not running and it is suspected that the thread is blocked. The jvisualvm monitors the configuration, if the memory is not full after the problem occurs, check the thread dump and find that most of them are executing SQL queries,
It was initially found that SQL Execution was slow.
2. We used the sybase Database. After executing a few simple SQL statements, we found that the execution was not completed in a few minutes.Sp_sysmon "00:00:30"In the last 30 seconds of monitoring, we found that the cpu, memory, and thread are okay, and almost 1% of the usage was not available. We suspect that a connection has blocked the table, resulting in all other connections being blocked.
3. Use the stored procedure we write to view the blocked connection. The result is as follows:
PS: Use the sp_lock command to check which connection is blocking the database, but tableid is displayed. You also need to query and convert it to a specific table name, the stored procedure is converted here and will be attached later.
Found that more than 10 thousand locks, more than 7000 exclusive locks, Ex_row-blk is blocked other connections locks, found 21 blocked other connections locks, the corresponding table is T_ZXLD_SYYH, execute select * from master .. sysprocesses finds the corresponding connection, for example, if the User has 316/287/283, the result is as follows:
At this point, the cause is basically determined. We use the c3p0 connection pool, which should be a connection that blocks the table. All other connection queries are blocked, resulting in the connection pool being full, all requests that involve database queries are blocked, and the page is always in the loading state.
4. the next step is to find the blocking point and find that tran_name is both $ chained_transaction, which is determined by the program. That is to say, all transactions in a chain of responsibility are blocked, the system uses the chain of responsibility for two items, and you can directly determine the chain of responsibility resulting from the approval. The next step is to check whether there is a problem with the code here. The approval process is as follows:
(1) Start a transaction
(2) query the primary table data to be approved based on the parameters. Two SQL statements
(3) call the approval component one by one (a component similar to a workflow) for approval. About five SQL statements are updated for each piece of data, and two SQL statements are queried.
(4) update the status of the master table. One SQL statement is updated for each piece of data.
(5) Insert audit logs. One SQL statement is inserted for each piece of data.
(6)Generates a reminder message to query all the data in the master table, 20 query SQL statements, and 5 update and delete SQL statements (very slow)
(7) Update the incremental record table, record the modification time and status of the data, and update one SQL statement for each data record.
(8) commit a transaction
If mutual blocking is suspected during concurrency, all business tables will be queried where the reminder message is generated. If other connections are approved in the transaction at this time, the transaction will be blocked, and other connections are generating messages, mutual congestion, resulting in deadlocks. Theoretically, the database automatically handles deadlocks, but it does not know the cause. The number of deadlocks reported in the log is not very large.
In addition, the business processing here is too reasonable. Normal users approve about 100 pieces of data in batches each time, so there are more than 900 SQL statements for addition, deletion, and modification, and more than 200 SQL queries, SQL statements such as message generation are very slow (because old messages are updated), so many operations are very slow in a transaction. The approval table contains about million data and the business master table contains million data, the execution is not very fast.
Solution: rebuild the code here and start a new thread in step (5) (6) (7) of the slow execution above, instead of being executed in the transaction. Even if the execution fails, it will not affect much. In this way, SQL is reduced by 30%, and the speed is much faster. It was updated to the site and observed for a week. It was considered a solution to the problem if no response was reported.
Although the problem is solved, the root cause is not found. Why is the deadlock blocked? Why is the deadlock not detected automatically? This will be tracked in the future.
Finally, although the problem was solved, it took a lot of time to communicate with the O & M personnel. It took a lot of time to ask the O & M personnel for on-site data. After a simple arrangement, we collected the data when there was a problem in the future:
1. sp _ lock result; (use the stored procedure provided at the end of the article)
2. select * from master... sysprocesses;
3. If the database is slow, add the result of sp_sysmon "00:00:30;
4. dump the middleware thread file;
Attach the stored procedure of sybase to view the lock
IF OBJECT_ID ('dbo.sp__lock') IS NOT NULLDROP PROCEDURE dbo.sp__lockGOcreate procedure sp__lock( @dbname char(30)=null,@spid int=null, @dont_format char(1) = null)asbegin declare @dbid smallint if @dbname is not null select @dbid=db_id(@dbname) if (charindex("sa_role", show_role()) > 0) begin if @dont_format is null select "Type"=substring(v.name,1,11), "User"=substring(suser_name(p.suid)+" ("+rtrim(convert(char(6),l.spid))+")",1,20), "Table"=substring(db_name(l.dbid)+".."+convert(char(20),object_name(l.id,l.dbid)),1,26), "Page"=convert(char(8),l.page), "Cmd"=substring(p.cmd,1,11) from master..syslocks l, master..sysprocesses p, master..spt_values v where p.spid=l.spid and l.type = v.number and v.type = "L" and p.dbid=isnull(@dbid,p.dbid) and p.spid=isnull(@spid,p.spid) and l.dbid=isnull(@dbid,l.dbid) and l.spid=isnull(@spid,l.spid) order by l.dbid, l.id, v.name else select "Type"=v.name, "User"=suser_name(p.suid)+" ("+rtrim(convert(char(6),l.spid))+")", "Table"=db_name(l.dbid)+".."+object_name(l.id,l.dbid), "Page"=l.page, "Cmd"=p.cmd from master..syslocks l, master..sysprocesses p, master..spt_values v where p.spid=l.spid and l.type = v.number and v.type = "L" and p.dbid=isnull(@dbid,p.dbid) and p.spid=isnull(@spid,p.spid) and l.dbid=isnull(@dbid,l.dbid) and l.spid=isnull(@spid,l.spid) order by l.dbid, l.id, v.name return end select "Type"=v.name, "Usernm"=convert(varchar(60),suser_name(p.suid)+" ("+rtrim(convert(char(6),l.spid))+")"), "TableNm"=convert(varchar(60),db_name(l.dbid)+".."), "Page"=l.page, "Cmd"=p.cmd, l.id, l.dbid into #locks from master..syslocks l, master..sysprocesses p, master..spt_values v where p.spid=l.spid and l.type = v.number and v.type = "L" and l.dbid=isnull(@dbid,l.dbid) and l.spid=isnull(@spid,l.spid) and p.dbid=isnull(@dbid,p.dbid) and p.spid=isnull(@spid,p.spid) update #locks set TableNm=TableNm+object_name(id,dbid) where dbid=db_id() or dbid=1 or dbid=2 update #locks set TableNm=TableNm+convert(varchar,id) where dbid<>db_id() and dbid>2 delete #locks where TableNm like "tempdb..#locks%" if @dont_format is null select substring(Type, 1,11), "User"=substring(Usernm, 1,14), "Table"=convert(char(26),TableNm), "Page"=convert(char(8),Page), "Cmd"=substring(Cmd,1,11) from #locks order by dbid, id, Type else select Type, "User"=Usernm, "Table"=TableNm, Page, Cmd from #locks order by dbid, id, TypeendGO