Possible Failures During Sessions Between SQL Server AlwaysOn Availability Replicas


Introduction

A physical failure, an operating system failure, or a SQL Server failure can cause a session between two availability replicas to fail. An availability replica does not periodically check the components that Sqlservr.exe depends on to verify whether they are working correctly or have failed. However, for some types of failure, the affected component reports an error to Sqlservr.exe. An error reported by another component is called a "hard error." To detect other failures that would otherwise go unnoticed, AlwaysOn availability groups implement their own session-timeout mechanism. The session-timeout period is specified in seconds: it is the maximum time a server instance waits to receive a PING message from the other instance before it considers that instance disconnected. When a session time-out occurs between two availability replicas, the replicas assume that a failure has occurred and declare a "soft error."

Failures caused by hard errors

Possible causes of hard errors include (but are not limited to) the following situations:

    • A broken connection or a disconnected network cable
    • A failed network card
    • A router change
    • A firewall change
    • Endpoint reconfiguration
    • Loss of the drive where the transaction log resides
    • Operating system or process failure

For example, if the drive holding the primary database's transaction log stops responding or fails, the operating system notifies Sqlservr.exe that a critical error has occurred.

Some components, such as network components and some I/O subsystems, use their own time-outs to detect failures. These time-outs are independent of AlwaysOn availability groups, which are unaware of them and cannot recognize their behavior. In these cases, the component's own time-out delay lengthens the interval between the moment a failure occurs and the moment the availability replica receives the resulting hard error.

Failures caused by soft errors

Scenarios that may cause session timeouts include (but are not limited to) the following:

    • Network errors, such as TCP link time-outs, dropped or corrupted packets, or packets arriving out of order.
    • The operating system, server, or database stops responding (hangs).
    • A Windows server times out.
    • Insufficient computing resources, such as CPU or disk overload, the transaction log filling up, or the system running out of memory or threads. In these cases, you need to increase the time-out period, reduce the workload, or upgrade the hardware so that it can handle the workload.

Session-timeout mechanism

Because a server instance cannot detect soft errors directly, a soft error could cause an availability replica to wait indefinitely for a response from the other availability replica in the session. To prevent this, AlwaysOn availability groups implement a session-timeout mechanism based on the following behavior: connected availability replicas send a ping at a fixed interval on each open connection. Receiving a ping within the time-out period indicates that the connection is still open and that the server instances are communicating over it. On receiving a ping, a replica resets its time-out counter for that connection. The primary and secondary replicas ping each other to indicate that they are still active. The session-timeout limit is a user-configurable replica property with a default value of 10 seconds.
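The ping-and-reset behavior described above amounts to a simple heartbeat failure detector. The Python sketch below models the time-out counter for one connection so the logic can be simulated without real replicas. It is not SQL Server's implementation: the class `ConnectionMonitor`, its methods, and the choice to pass the current time in explicitly are hypothetical, and treating a gap of exactly `session_timeout` seconds as still healthy is an assumption.

```python
from dataclasses import dataclass

# Default AlwaysOn session-timeout, in seconds, as stated in the text.
DEFAULT_SESSION_TIMEOUT = 10.0


@dataclass
class ConnectionMonitor:
    """Hypothetical model of the time-out counter kept for one open connection.

    Times are plain numbers of seconds passed in by the caller, so the
    behavior can be simulated without sockets or real sleeps.
    """
    session_timeout: float = DEFAULT_SESSION_TIMEOUT
    last_ping_at: float = 0.0
    disconnected: bool = False

    def on_ping(self, now: float) -> None:
        # A ping proves the partner replica is alive: reset the counter.
        # Pings are ignored once the connection has been closed.
        if not self.disconnected:
            self.last_ping_at = now

    def check(self, now: float) -> bool:
        """Return True once a soft error has been declared for this connection."""
        if not self.disconnected and now - self.last_ping_at > self.session_timeout:
            # No ping within the session-timeout period: close the connection
            # and treat the partner as DISCONNECTED.
            self.disconnected = True
        return self.disconnected


if __name__ == "__main__":
    monitor = ConnectionMonitor()       # 10-second default
    monitor.on_ping(now=3.0)
    print(monitor.check(now=12.0))      # False: 9 s since the last ping
    print(monitor.check(now=13.5))      # True: 10.5 s without a ping
```

Raising `session_timeout` in this model tolerates longer gaps (for example, under heavy load) at the cost of detecting a real failure later, which is the trade-off behind the advice to increase the time-out when resources are insufficient.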

If no ping is received from the other replica within the session-timeout period, the connection times out: the connection is closed and the timed-out replica enters the DISCONNECTED state. Even if the disconnected replica is configured for synchronous-commit mode, transactions do not wait for it to reconnect and resynchronize; the primary temporarily treats that secondary replica as if it were in asynchronous-commit mode. After the secondary replica reconnects to the primary replica, synchronous-commit behavior resumes.
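The effect of a time-out on a synchronous-commit replica can be sketched as a small state model. In this hypothetical Python sketch, a secondary's configured commit mode never changes; only the primary's decision about whether to wait for it does. The names `Secondary`, `replicas_to_wait_for`, and `SQLNODE2`, and the `caught_up` flag standing in for "resynchronized", are assumptions for illustration and do not correspond to SQL Server APIs or catalog views.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class CommitMode(Enum):
    SYNCHRONOUS = "SYNCHRONOUS_COMMIT"
    ASYNCHRONOUS = "ASYNCHRONOUS_COMMIT"


class ConnState(Enum):
    CONNECTED = "CONNECTED"
    DISCONNECTED = "DISCONNECTED"


@dataclass
class Secondary:
    """Hypothetical view of one secondary replica, as seen by the primary."""
    name: str
    configured_mode: CommitMode
    state: ConnState = ConnState.CONNECTED
    caught_up: bool = True  # assumption: stands in for "resynchronized"

    def effective_mode(self) -> CommitMode:
        # A synchronous-commit secondary is only waited for while it is
        # connected and caught up; otherwise it behaves asynchronously.
        if (self.configured_mode is CommitMode.SYNCHRONOUS
                and self.state is ConnState.CONNECTED
                and self.caught_up):
            return CommitMode.SYNCHRONOUS
        return CommitMode.ASYNCHRONOUS

    def on_session_timeout(self) -> None:
        # Session time-out: the connection is closed and the replica falls behind.
        self.state = ConnState.DISCONNECTED
        self.caught_up = False

    def on_reconnect(self) -> None:
        # Reconnected, but the primary still does not wait until it catches up.
        self.state = ConnState.CONNECTED

    def on_caught_up(self) -> None:
        self.caught_up = True


def replicas_to_wait_for(secondaries: Iterable[Secondary]) -> List[str]:
    """Names of the secondaries a commit on the primary must wait for."""
    return [s.name for s in secondaries
            if s.effective_mode() is CommitMode.SYNCHRONOUS]


if __name__ == "__main__":
    node2 = Secondary("SQLNODE2", CommitMode.SYNCHRONOUS)
    print(replicas_to_wait_for([node2]))   # ['SQLNODE2']
    node2.on_session_timeout()
    print(replicas_to_wait_for([node2]))   # []
    node2.on_reconnect()
    node2.on_caught_up()
    print(replicas_to_wait_for([node2]))   # ['SQLNODE2']
```

Keeping `configured_mode` separate from `effective_mode()` reflects the point in the text: the replica is not reconfigured; the primary simply stops waiting for it until it is connected and synchronized again.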

Reference: https://msdn.microsoft.com/zh-cn/library/ff877884(v=sql.120).aspx

Summary

A failure in a database other than the primary database cannot be detected. A data disk failure is also unlikely to be detected unless it causes the database to restart. Effective error checking by the availability replica happens only when a soft error occurs.

Note:

Author: pursuer.chen

Blog: http://www.cnblogs.com/chenmh

All articles on this blog are original. You are welcome to reprint them, but a reprint must credit the source and give a clear link to the original article at the beginning.

Discussion is welcome.
