Managing SQL Server AlwaysOn (1)-Basic maintenance

Source: Internet
Author: User
Tags failover

This article belongs to the managing SQL Server AlwaysOn series article


Objective:
The previous series has covered SQL Server AlwaysOn knowledge points, installation demos, and considerations. But this is not the end, but more is the starting point. It's like you can't give birth to a child, you have to keep it (management). As a DBA, much of the work is about managing AlwaysOn. So here's a separate series that describes SQL Server AlwaysOn management. This series follows an AlwaysOn environment that deploys the foundation from 0 onwards.In this series, you are ready to tell the following:
    1. Managing SQL Server AlwaysOn (1)-Basic maintenance
    2. Manage SQL Server AlwaysOn (2)--Add, remove secondary replicas
    3. Managing SQL Server AlwaysOn (3)--Availability Group backup
    4. Managing SQL Server AlwaysOn (4)-Common exceptions
    5. Managing SQL Server AlwaysOn (5)-General monitoring (1)--general monitoring
    6. Managing SQL Server AlwaysOn (5)-General monitoring (2)--extended event monitoring
    7. Managing SQL Server AlwaysOn (6)--Warning
    8. Managing SQL Server AlwaysOn (7)--to be replenished
Note: Due to work requirements, they may not be updated sequentially.

Basic maintenance:
For basic maintenance, this article probably describes the following:
    • Cluster maintenance, including patch upgrades.
    • Manage availability groups, including how to fail over synchronous/asynchronous nodes.
    • Add multiple listeners
    • Other management content
Below I began to introduce and demonstration, if the reader does not have a real environment, you can refer to the article mentioned in the opening, the first self-built, if there is an environment, it is recommended to first make a backup of the virtual machine, because some operations are very destructive.

Cluster Management:
The topic in this series is SQL Server AlwaysOn, and just because AlwaysOn needs to be built on Windows Server Failover Cluster (WSFC, or Windows Clustering), we need to introduce the underlying environment a little bit, But will not do in-depth introduction, after all, each knowledge point learned deep, is not trivial.First we must be clear that the cluster installation is not equal to the end of the work, from the beginning of installation, our management and maintenance work to really start. This section will introduce:
    • Moving instances between nodes
    • Rolling Patch Upgrades

Moving instances between nodes:
In the daily operations, in addition to preventing accidental power outages, one of the benefits of deploying a highly available technology is the significant reduction in downtime caused by the maintenance process. In particular, the operating system or SQL Server patching operations. If your environment is a two-node cluster (assume that a,b,a is the primary node/active node. b is the passive node), then the node moves when it is possible to initiate a restart such as patching. The steps are as follows:
    1. Patch upgrade to passive Node B. (a Master B is)
    2. After the first step succeeds, the instance of master Node A fails over (Failover) to passive Node B. The role of the original active Node A and the original passive Node B is reversed at this time. (A by B master)
    3. The original active Node A, now passive node to patch upgrade. After the success, the operation is finished. (A by B master)
Then decide if you want to switch the activity instance (now B) back to the a node, depending on your specific needs. There is no mandatory requirement, but the actual situation needs to be considered. For example, if the availability of the instance is the first priority and is much larger than all the other requirements, then it may not be appropriate to failover back again because this operation will cause a short service to be unavailable. or because B-node is a high-level server configured to provide performance, then through this patching or failover operation to move the service to the update, the stronger node, there is no need to failover back to the original node, of course, WSFC is recommended to use the same hardware and software configuration, so after this, Finally, A is also promoted to the same configuration.In order to complete the above steps, there are two ways to do this, one for graphics and the other for PowerShell commands.
Graphics operations:
First log on to Node 1, open failover Cluster Manager, right-click the cluster role name for Windows → "Move" → "Select Node":


The Move Cluster Role dialog box is displayed:



In the can choose to move the node, because currently is a two-node, so there is actually no other choice, click OK, after the node began to move, after the move will see:



PowerShell operations:
With sufficient permissions, you can use the Move-clustergroup cmdlet to implement a role failover. For example, the following command can failover the role of Node1 to Node2:
Move-clustergroup-name "WSFC cluster role name"-node Target section name
Note the WSFC cluster role name is double-quoted. The demo environment looks like this, note that my environment is now Node1 for the owner, I want to go to Node2, so in PowerShell I need to write "Target node name”:



After execution, you can see that the current cluster owner has been replaced with Node2 without a refresh:




Again, this article is aimed at basic maintenance and is focused on SQL Server AlwaysOn, so the true professional maintenance of WSFC is not discussed and demonstrated.
Rolling Patch Upgrade:
The above shows a two-node cluster, where a cluster of more than 2 nodes is common in a large production environment, and if you are upgrading SQL Server, you need to consider rolling patch updates. We assume that SQL Server now runs on a different large or small version, which can result in data corruption, so it needs to be repaired by patching. Here are the steps to proceed:
    1. Lists the possible owners of all the nodes in the role.
    2. Select 50% of these nodes, and then remove the possible owners. , because there are only two nodes, so you can only tick off one of them, such as Node1 (because the current active node in the Node2, so you can not tick off Node2). The same can be done with PowerShell commands: You can see the same result after execution
      Get-clusterresource "AlwaysOn role name (Sqlag in this case)" | Set-clusterownernode-owners reserved nodes (in this case Node2 reserved, i.e. Node1 removed)
      PowerShell Execution Demo:
    3. When a 50% of the possible owner nodes are removed, a patch upgrade can be made to the selected nodes. After verifying that the nodes have succeeded, add them back again.
    4. Move the node of the role to one of the nodes that has been successfully upgraded. Repeat the above actions to remove the non-upgraded nodes from the possible owners → upgrade these nodes → verify → add back to the list of possible owners for the role.


Manage Availability Groups:
For AlwaysOn availability groups, when the configuration is complete, you also start the administrative operation. Includes failover of availability groups, monitoring, and adding additional listeners in special cases. The following are discussed individually:
Failover:
When the secondary replica is configured for synchronous-commit mode (synchronous commit modes) and an automatic failover is set, the availability group is automatically moved to the redundant copy when the primary replica encounters a failover condition. In some cases, manual failover is required. Common situations are: Dr Test, planned maintenance operation, etc.

Sync failover:
If you need to fail over a replica in synchronous-commit mode, you can do it on the primary replica in SSMs, as follows:





Note the following figure, if you are in the secondary copy of the failover interface open, there is no such step


Then click Next and Finish. If there is no error, the availability group transfer succeeds. But that doesn't mean it's done, and it's often easy to forget that it's important to check that the job, account number, and program are properly configured and running on the new primary, because in my past environment, the secondary replicas were not configured for log backups, and in the previous section, AlwaysOn must use the full recovery model. In this mode, the database log files for the production environment are likely to burst abnormally, causing insufficient disk space to stop the service. This has been the case with AlwaysOn in the past, and in some cases, such as account configuration issues, unsynchronized dependencies, and other factors that have resulted in a successful transfer but the system is not functioning properly.so have enough to check! Also we can use T-SQL to execute, for example we now need to cut from Node2 to Node1Note: Execute the following command in the secondary replica:




Refresh after execution to see the copy from "(secondary)" to "(primary)"

Asynchronous fail-over:
When an availability group is asynchronous-commit mode, it is technically possible to perform failover operations like synchronous-commit mode, but you need to "force" the failover and must accept the risk of possible data loss. You can use the following command to failover:Note that you need to perform on the secondary replica:





In addition to executing on the secondary replica, your cluster must also have quorum in order to run successfully, otherwise it is necessary to force the cluster online before forcing the AlwaysOn availability group online. In order to avoid data loss as much as possible, a planned failover can be performed under the possible conditions, following the specific steps:
    1. Disable various login accounts, but make sure the account you are working on is not disabled.
    2. Modify the replica mode to synchronous-commit mode.
    3. Failover
    4. Change the commit mode back to asynchronous commit mode.
    5. Enable the account that was disabled in the first step.
Synchronizing non-contained objects:
Regardless of which failover mode is used, there are some objects at the instance level that are not included in the availability group, so synchronization is required. The most straightforward approach is to use SSIS packages to periodically synchronize objects between instances. Here are some things to sync:
    • Logins
    • Credentials (certificate)
    • SQL Server Agent Jobs
    • Custom error messages
    • Linked Servers
    • Server-level Event warning
    • Stored procedures in the master library
    • Server-level triggers
    • Key-related content

To add multiple listeners:

In general, there is only one single availability group listener (Listener) for each availability group, but in some extreme cases multiple listeners may be created for the same availability group. For example, some historical problems. At this point, you may need to create an additional listener for hard coding.However, it is not possible to create a second listener through the graphical interface (SSMS), T-SQL, or even PowerShell, which must be done using failover Cluster Manager. We create a client access point resource in our "Sqlag" role. Follow the steps below:

Add a new Client access call name and enter a new IP address:



You can see that the status of the Sqlag role is "partially running" after adding, and you see that the following new client access point is "offline" status:

At this point we right-click the new Client Access point and select Properties:
InDependent Relationships, select OR, and add the original listener name (in this case, 192.168.1.123):

Then right-click "AGListener2" and select online. Once the configuration is complete, the client can access it using any of the listener names:



Other management matters:
In addition to the aforementioned, there are other things that make the database unusable, most notably the inability to turn the database into Single-user or read-only mode. This can cause the application's security state to change. This is why you need to disable the login account in the failover section. And if you have to change the single-user mode of the database, you first need to remove the availability group.You can remove from an availability group using the following command:

After execution, the secondary copy becomes "Restoring" from "synchronized"
At this point it may be necessary to restore the database back, this time can execute the following statement:

Another important consideration is the deployment of the database and its log files. These files must exist in each replica under the same path.


Summarize:
The basic basic maintenance of SQL Server AlwaysOn is here first, and if there is a follow-up supplement, add it as explicitly as possible or, if necessary, another article. Next section describes: Managing SQL Server AlwaysOn (2)-adding, removing secondary replicas





Managing SQL Server AlwaysOn (1)-Basic maintenance

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.