Event Cause: Recent research and development response, a database from 08 to 12 environment, the occasional write-operation delay problem;
Event analysis: After excluding system resource contention, initial analysis may cause the synchronization mode AlwaysOn node to often appear the problem of session timeout waiting for submission due to network jitter.
After troubleshooting, the extended event found that 35202 errors occurred irregularly, which is the message of a replica connection recovery.
Because the computer room network environment is complex, the database server and the application Server mix a switch, at the peak of business, because the upper port traffic is full and cause the connection failure occurs repeatedly.
Since the network environment can not be rebuilt in the short term, it starts from the SQL Server itself, and only the part of the data synchronization is reformed;
Existing environment:
SQL AG: Two-node synchronization mode, two nodes each have a network card connected to the switch, no direct-link jumper (Wsfc also no longer require a separate heartbeat network)
Transformation programme:
1, two nodes each enable a network card, the direct connection mode of communication, while configuring private address
Copy Code code as follows:
server_a:10.0.0.11
server_b:10.0.0.12
2, delete the endpoint of two nodes, manually recreate listener_ip as direct-attached IP endpoint
3, change the endpoint_url of the AG, each copy
4, waiting for data resynchronization;
The third step of the script is as follows, to operate on two nodes separately, note that LISTENER_IP is a direct-attached network card IP
Copy Code code as follows:
/****** Object:endpoint [Hadr_endpoint] Script DATE:2015/1/6 16:06:17 ******/
DROP ENDPOINT [Hadr_endpoint]
Go
/****** Object:endpoint [Hadr_endpoint] Script DATE:2015/1/6 16:06:17 ******/
CREATE ENDPOINT [Hadr_endpoint]
state=started
As TCP (Listener_port = 5022, listener_ip = (10.0.0.11))
For data_mirroring (role = all, authentication = WINDOWS Negotiate
, encryption = REQUIRED algorithm AES)
Go
The fourth step is the following script, which can be executed in the master copy
Copy Code code as follows:
ALTER availability GROUP [Alwayson01]
MODIFY REPLICA on n ' node_01 ' with (Endpoint_url = n ' tcp://10.0.0.11:5022 ')
ALTER availability GROUP [Alwayson01]
MODIFY REPLICA on n ' node_02 ' with (Endpoint_url = n ' tcp://10.0.0.12:5022 ')
Note: The two replicas are not synchronized after the deletion of the endpoint, but the databases in the Listener and group AG are unaffected, and the service of the master copy is still normal for the application;