A very rare troubleshooting process for SQL Server mirroring exceptions


This Monday night we received an alert that a database mirroring session kept switching back and forth between the SUSPENDED and SYNCHRONIZING states. Seen on its own, this error is actually quite common.
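To check the state for ourselves, the session state can be read from sys.database_mirroring; a minimal sketch, assuming only that the alerting database is mirrored:

SELECT d.name, m.mirroring_role_desc, m.mirroring_state_desc
FROM sys.database_mirroring m
JOIN sys.databases d ON d.database_id = m.database_id
WHERE m.mirroring_guid IS NOT NULL;  -- only databases that take part in mirroring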

Our usual repair steps are as follows; a combined T-SQL sketch of steps 1-3 follows the list.

1. ALTER DATABASE *** SET PARTNER RESUME

If this step does not work, you need to trigger a transaction. Occasionally a session stuck in the SYNCHRONIZING state can be repaired this way; the likely cause is that the mirror stalled briefly because of a network hiccup.

2. CREATE TABLE dbo.tb_repair_mirroring_**** (ID int), then drop the table

If this repair also fails, we need to try removing mirroring and then re-establishing it. Of course, for this to succeed, the transaction log on both the principal and the mirror must still be able to roll forward; that is, the log chain must not be lost or broken.

3. ALTER DATABASE *** SET PARTNER OFF, then re-establish with ALTER DATABASE *** SET PARTNER = 'tcp://***' (or a similar statement)

4. If none of the above repairs work, the only option left is to rebuild the mirror from scratch. Up to now this has always solved the problem; it can be called the last resort.
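A minimal combined sketch of steps 1-3; the database name, table suffix, and endpoint address are placeholders for the values elided as '***' above:

-- 1. Try to resume the suspended session.
ALTER DATABASE [YourDb] SET PARTNER RESUME;

-- 2. Push a small transaction through so fresh log ships to the mirror.
CREATE TABLE dbo.tb_repair_mirroring_YourDb (ID int);
DROP TABLE dbo.tb_repair_mirroring_YourDb;

-- 3. Remove mirroring, then re-establish it (the log chain must be intact).
ALTER DATABASE [YourDb] SET PARTNER OFF;
ALTER DATABASE [YourDb] SET PARTNER = 'tcp://mirror-host:5022';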

Unfortunately, this time every one of these methods was tried and none of them worked; the error kept coming back.

Error log on the principal:

2014-12-17 15:57:30.640 spid33s Error: 1453, Severity: 16, State: 1.

2014-12-17 15:57:30.640 spid33s 'tcp://***', the remote mirroring partner for database '***', encountered error 3624, status 1, severity 20. Database mirroring has been suspended. Resolve the error on the remote server and resume mirroring, or remove mirroring and re-establish the mirror server instance.

Error log on the mirror:

2014-12-17 00:00:42.97 spid41s Error: 17066, Severity: 16, State: 1.
2014-12-17 00:00:42.97 spid41s SQL Server Assertion: File: <loglock.cpp>, line=807 Failed Assertion = 'result == LCK_OK'. This error may be timing-related. If the error persists after rerunning the statement, use DBCC CHECKDB to check the database for structural integrity, or restart the server to ensure in-memory data structures are not corrupted.

This looks like a problem with a data page. We looked up the error, which resembles a known bug, and many dump files had been generated as well. We analyzed the situation further:

Current TPS of the affected database on the principal: around 3,000

-- Transactions/sec is a cumulative counter: sample it twice,
-- one second apart, and take the difference to get the rate.
BEGIN TRAN
DECLARE @value int
DECLARE @value2 int

SELECT @value = CONVERT(int, cntr_value)
FROM sys.dm_os_performance_counters
WHERE object_name LIKE 'MSSQL$***:Database%%'
  AND instance_name = '***'
  AND counter_name LIKE 'Transactions/sec%%'

WAITFOR DELAY '00:00:01'

SELECT @value2 = CONVERT(int, cntr_value)
FROM sys.dm_os_performance_counters
WHERE object_name LIKE 'MSSQL$***:Database%%'
  AND instance_name = '***'
  AND counter_name LIKE 'Transactions/sec%%'

SELECT @value2 - @value AS TPS

COMMIT TRAN

The send queue on the principal keeps growing

-- Log Send Queue KB: log on the principal that has not yet been sent
-- to the mirror; sampled twice to see the trend.
BEGIN TRAN
DECLARE @value int
DECLARE @value2 int

SELECT @value = CONVERT(int, cntr_value)
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Log Send Queue KB'
  AND instance_name = '***';

WAITFOR DELAY '00:00:01'

SELECT @value2 = CONVERT(int, cntr_value)
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Log Send Queue KB'
  AND instance_name = '***';

SELECT @value * 1. / 1024 AS FIRST_SECOND_MB,
       @value2 * 1. / 1024 AS SECOND_SECOND_MB,
       (@value2 - @value) * 1. / 1024 AS DIFF_MB

COMMIT TRAN

Sometimes it briefly catches up.

Redo queue on the mirror

-- Redo Queue KB: log received on the mirror but not yet rolled forward.
BEGIN TRAN
DECLARE @value int
DECLARE @value2 int

SELECT @value = CONVERT(int, cntr_value)
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Redo Queue KB'
  AND instance_name = '***';

WAITFOR DELAY '00:00:01'

SELECT @value2 = CONVERT(int, cntr_value)
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Redo Queue KB'
  AND instance_name = '***';

SELECT @value * 1. / 1024 AS FIRST_SECOND_MB,
       @value2 * 1. / 1024 AS SECOND_SECOND_MB,
       (@value2 - @value) * 1. / 1024 AS DIFF_MB

COMMIT TRAN

The redo rate on the mirror is about 14 MB/s.

-- Redo Bytes/sec is also cumulative; keep the raw byte counts in bigint
-- and convert to MB only at the end, so nothing is truncated.
BEGIN TRAN
DECLARE @value bigint
DECLARE @value2 bigint

SELECT @value = CONVERT(bigint, cntr_value)
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Redo Bytes/sec'
  AND instance_name = '***';

WAITFOR DELAY '00:00:01'

SELECT @value2 = CONVERT(bigint, cntr_value)
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Redo Bytes/sec'
  AND instance_name = '******scard';

SELECT (@value2 - @value) * 1. / 1024 / 1024 AS SPEED_MB

COMMIT TRAN

Ignoring any new log the principal keeps generating, the mirror needs about 32 minutes to catch up; counting the newly generated log as well, more than 40 minutes:

SELECT 27338.278320 / 14 / 60  -- redo queue (MB) / 14 MB/s / 60 = 32.545569428566 minutes

Troubleshooting further:

SELECT cntr_value, *
FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Log Send Flow Control Time (ms)', 'Bytes Sent/sec',
    'Log Bytes Sent/sec', 'Log Compressed Bytes Sent/sec', 'Log Harden Time (ms)',
    'Log Send Queue KB', 'Mirrored Write Transactions/sec', 'Pages Sent/sec',
    'Send/Receive Ack Time', 'Sends/sec', 'Transaction Delay')
  AND instance_name = '***';

Send/Receive Ack Time:

Milliseconds that messages waited for acknowledgement from the partner, in the last second.

This counter is helpful in troubleshooting a problem that might be caused by a network bottleneck, such as unexplained failovers, a large send queue, or high transaction delay. In such cases, you can analyze the value of this counter to determine whether the network is causing the problem.

Could the problem be on the network? The TOA (a network component) on the principal host had been upgraded, but the TOA on the mirror had not.

The acknowledgement wait time held steady at around 800 ms, compared with roughly 100 ms on hosts with the upgraded TOA. Of course this also depends on the TPS of the instances being compared, but this is likely where the problem lies.

-- Send/Receive Ack Time is cumulative as well; diff two samples.
BEGIN TRAN
DECLARE @value bigint
DECLARE @value2 bigint

SELECT @value = CONVERT(bigint, cntr_value)
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Send/Receive Ack Time'
  AND instance_name = '***';

WAITFOR DELAY '00:00:01'

SELECT @value2 = CONVERT(bigint, cntr_value)
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Send/Receive Ack Time'
  AND instance_name = '***';

SELECT @value2 - @value AS 'Send/Receive Ack Time'

COMMIT TRAN

After a while we found that the send queue had grown larger again, and the redo queue on the mirror was draining very slowly as well.

As a follow-up, we arranged a maintenance window with the user and ran DBCC CHECKDB, which found no bad pages. So the data itself was fine, and the problem had to be in the log.
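For reference, the integrity check was along these lines (database name elided as elsewhere in the post):

DBCC CHECKDB ('***') WITH NO_INFOMSGS, ALL_ERRORMSGS;  -- report only errors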

Meanwhile we noticed that the transaction log stayed large and shrinking the log file had no effect. Looking at the log reuse wait type, we found ACTIVE_TRANSACTION. Normally the only waits we expect here are LOG_BACKUP and DATABASE_MIRRORING (this holds for our RDS environment; more complex environments can show more wait types), so this caught our attention.
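The wait type can be read straight from sys.databases; a minimal sketch:

SELECT name, log_reuse_wait_desc
FROM sys.databases
WHERE name = '***';  -- what is preventing log truncation for this database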

log_reuse_wait_desc was ACTIVE_TRANSACTION

ACTIVE_TRANSACTION: a transaction is active (all recovery models).

    • A long-running transaction might exist at the start of the log backup. In this case, freeing the space might require another log backup.
    • A transaction is deferred (SQL Server 2005 Enterprise Edition and later only). A deferred transaction is essentially an active transaction whose rollback is blocked because some resource is unavailable.

First we check whether any transaction is blocking log truncation:

DBCC OPENTRAN ('***')

This turned up a major clue: session 277 was the blocking culprit (a transaction started by the user). Analyzing session 277 showed a user script that queries system views: one SELECT statement, with similar text in the input buffer. In short, an open transaction was sitting inside it.
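Two standard ways to see what session 277 last executed (the session id comes from the DBCC OPENTRAN output above):

DBCC INPUTBUFFER (277);

-- Or via the DMVs:
SELECT t.text
FROM sys.dm_exec_connections c
CROSS APPLY sys.dm_exec_sql_text(c.most_recent_sql_handle) t
WHERE c.session_id = 277;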

The transaction had been started on the 15th, and today was the 19th, so all the log in between counted as active and could not be truncated. After confirming with the user, we killed the session and ran DBCC OPENTRAN ('***') again.
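Killing the session and re-checking is straightforward:

KILL 277;              -- the session id reported by DBCC OPENTRAN
DBCC OPENTRAN ('***'); -- the oldest active transaction should now move forward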

This time the output kept changing, which we took to mean the active log was moving forward (less and less of it remained). After a log backup, the active log dropped back to about 7 GB, and the shrink succeeded.
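A minimal log backup sketch; the disk path is a placeholder:

BACKUP LOG [***] TO DISK = N'X:\backup\mirror_db_log.trn';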

In the end the issue was resolved without any impact on the user. We had almost asked them to stop their business for a restart (which would not necessarily have succeeded). Once again it proved that the strength of the team is enormous, and that everyone's carefulness is the key to solving problems.

https://connect.microsoft.com/SQLServer/feedback/details/412955/
http://stackoverflow.com/questions/4192730/sqlbulkcopy-mirroring-failed
http://msdn.microsoft.com/zh-cn/library/ms188290(v=sql.105).aspx
http://msdn.microsoft.com/zh-cn/library/ms189931.aspx
http://msdn.microsoft.com/zh-cn/library/ms176064.aspx
http://msdn.microsoft.com/zh-cn/library/ms178037(v=sql.105).aspx
