Configure Always On Availability Group for SQL Server on Ubuntu




Here's a quick step-by-step introduction to creating a SQL Server AG (Always On Availability Group) on Ubuntu, along with workarounds for the pitfalls encountered during configuration.






There are currently two types of SQL Server AG available on Linux. The first is the high-availability architecture, which uses a cluster manager to provide business continuity; this structure can also include read-scale nodes, and it is the method described in this article. The other is the read-scale AG, which has no cluster service: it provides only read-only scalability, not high availability. For how to create that simpler type, refer to: Configure read-scale availability group for SQL Server on Linux.



You can also specify the cluster type (CLUSTER_TYPE) when you create an availability group:


    • WSFC: Windows Server Failover Cluster. This is the default value on Windows;
    • EXTERNAL: a failover cluster manager on a non-Windows server, such as Pacemaker on Linux;
    • NONE: no cluster manager is included; this creates a read-scale availability group.


On Linux you can use EXTERNAL or NONE. As I understand it, EXTERNAL behaves like the existing SQL Server AG feature, while NONE is a new type without a cluster manager: it does not support high availability or disaster recovery. Its main role is to offload reads from the primary server, and it supports multiple read-only secondary nodes. This type can also be used on Windows and is a new feature of SQL Server 2017. For more details, refer to: Read-scale availability groups.
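As a quick way to see which cluster type an existing AG uses (a small sketch I'm adding, valid on SQL Server 2017 and later):

-- cluster_type_desc returns WSFC, EXTERNAL, or NONE
SELECT name, cluster_type_desc FROM sys.availability_groups;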






The rest of this article walks through building the high-availability type of availability group.





1. Installing and Configuring SQL Server


A SQL Server AG has at least two nodes; because my environment is limited, only the simplest two-node AG is built here. First install two Ubuntu machines with SQL Server, as described in SQL Server on Ubuntu (full).



Note: the nodes of an AG must be either all physical machines or all virtual machines, and when they are virtual machines they must run on the same virtualization platform. This is because Linux needs a fencing agent to isolate resources on a failed node, and the fencing agent type differs by platform. For details, refer to Policies for Guest Clusters.





2. Create AG


On Linux, you must first create the AG and then add it to the cluster as a resource. The following describes how to create the AG.



a) Preparatory work:



Make sure the machine name of each node server meets these requirements: 15 characters or less, and unique on the network. If it does not, change the machine name with the following command:


sudo vi /etc/hostname
Use the following command to modify the Hosts file to ensure that multiple nodes in the same AG can communicate with each other:

sudo vi /etc/hosts
It must be noted here: after the modification, use the ping command to ping each hostname; it must return the node's real IP address. In other words, the hosts file must not map the hostname to 127.0.1.1. In my configuration the line "127.0.1.1 Ubuntu1604Bob2" is commented out; otherwise ping <hostname> returns 127.0.1.1 instead of the real IP, and there may be problems when starting the Cluster service later.
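For reference, a hosts file along these lines works (a sketch I'm adding; the **<...>** placeholders and the node1 name are illustrative, only Ubuntu1604Bob2 appears in my setup):

127.0.0.1          localhost
# 127.0.1.1        Ubuntu1604Bob2      # commented out so the hostname resolves to the real IP
**<node1 real IP>**    **<node1 hostname>**
**<node2 real IP>**    Ubuntu1604Bob2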

In addition, you can use this command to view the IP of the current server:

sudo ip addr show
b) Enable the Always On Availability Group feature on all nodes and restart the SQL Server service:

sudo /opt/mssql/bin/mssql-conf set hadr.hadrenabled 1
sudo systemctl restart mssql-server
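To confirm the setting took effect after the restart, you can run a quick check (a sketch I'm adding, not part of the original steps):

-- returns 1 when the Always On Availability Group feature is enabled
SELECT SERVERPROPERTY('IsHadrEnabled') AS IsHadrEnabled;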
c) On all nodes, execute the following SQL to enable the AlwaysOn_health event session, which facilitates problem diagnosis:

ALTER EVENT SESSION AlwaysOn_health ON SERVER WITH (STARTUP_STATE = ON);
GO
For more information about Event Session, please refer to: AlwaysOn Extended Events.
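To verify the session is actually running, a quick sketch:

-- returns one row once the AlwaysOn_health session has been started
SELECT name FROM sys.dm_xe_sessions WHERE name = 'AlwaysOn_health';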

d) Create the login and user used by the database mirroring endpoint:

CREATE LOGIN dbm_login WITH PASSWORD = '**<Your Password>**';
CREATE USER dbm_user FOR LOGIN dbm_login;
e) Create certificate:

The SQL Server mirroring endpoint on Linux uses certificates to authenticate communication. The following commands create a master key and a certificate and back them up. Connect to the primary SQL Server and execute:

CREATE MASTER KEY ENCRYPTION BY PASSWORD = '**<Master_Key_Password>**';
CREATE CERTIFICATE dbm_certificate WITH SUBJECT = 'dbm';
BACKUP CERTIFICATE dbm_certificate
   TO FILE = '/var/opt/mssql/data/dbm_certificate.cer'
   WITH PRIVATE KEY (
           FILE = '/var/opt/mssql/data/dbm_certificate.pvk',
           ENCRYPTION BY PASSWORD = '**<Private_Key_Password>**'
       );
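To confirm the certificate was created before copying it around, a quick sketch:

-- the dbm certificate should appear on the primary
SELECT name, subject, start_date, expiry_date FROM sys.certificates WHERE name = 'dbm_certificate';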
f) Copy the backup of the certificate to all non-primary nodes and use it to create the certificate:

First execute the following command on the Primary node to copy the backup of the certificate to other nodes:

cd /var/opt/mssql/data
scp dbm_certificate.* **<user>**@**<node2>**:/var/opt/mssql/data/
Note: If you encounter Permission denied, you can use the sz and rz commands to transfer files through the host.
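Alternatively (a sketch I'm adding, assuming a non-root SSH user), you can copy to a world-writable directory first and then move the files into place with sudo on the destination node:

# on the primary node: copy the certificate backup to /tmp on the secondary
scp dbm_certificate.* **<user>**@**<node2>**:/tmp/
# on the secondary node: move the files into the SQL Server data directory
sudo mv /tmp/dbm_certificate.* /var/opt/mssql/data/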

Then, on the destination secondary node, execute the following commands to give the mssql user ownership of the files:

cd /var/opt/mssql/data
chown mssql:mssql dbm_certificate.*
Finally, create a certificate on the secondary node at the destination using the backed up certificate:

CREATE MASTER KEY ENCRYPTION BY PASSWORD = '**<Master_Key_Password>**';
CREATE CERTIFICATE dbm_certificate
    AUTHORIZATION dbm_user
    FROM FILE = '/var/opt/mssql/data/dbm_certificate.cer'
    WITH PRIVATE KEY (
        FILE = '/var/opt/mssql/data/dbm_certificate.pvk',
        DECRYPTION BY PASSWORD = '**<Private_Key_Password>**'
        );
g) Create database mirroring endpoint on all nodes:

CREATE ENDPOINT [Hadr_endpoint]
    AS TCP (LISTENER_IP = (0.0.0.0), LISTENER_PORT = **<5022>**)
    FOR DATABASE_MIRRORING (
        ROLE = ALL,
        AUTHENTICATION = CERTIFICATE dbm_certificate,
        ENCRYPTION = REQUIRED ALGORITHM AES
        );
ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED;
GRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO [dbm_login];
Note: for now the listener IP cannot be changed; it can only be 0.0.0.0. This is a current bug that may be fixed in a future release.
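To confirm the endpoint is started and listening on the expected port, a quick sketch:

-- state_desc should be STARTED and port should match the value used above (5022 here)
SELECT name, state_desc, port FROM sys.tcp_endpoints WHERE name = 'Hadr_endpoint';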

h) Create AG on the Primary node:

CREATE AVAILABILITY GROUP [UbuntuAG]
    WITH (DB_FAILOVER = ON, CLUSTER_TYPE = EXTERNAL)
    FOR REPLICA ON
        N'**<node1>**'
        WITH (
            ENDPOINT_URL = N'tcp://**<node1>**:**<5022>**',
            AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
            FAILOVER_MODE = EXTERNAL,
            SEEDING_MODE = AUTOMATIC
            ),
        N'**<node2>**'
        WITH (
            ENDPOINT_URL = N'tcp://**<node2>**:**<5022>**',
            AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
            FAILOVER_MODE = EXTERNAL,
            SEEDING_MODE = AUTOMATIC
            );
ALTER AVAILABILITY GROUP [UbuntuAG] GRANT CREATE ANY DATABASE;
Note: the warning "Attempt to access non-existent or uninitialized availability group with ID" may appear during execution. It can be ignored for now and may be fixed in a future version.

At this point the AG has been created, but the secondary node is still in the OFFLINE state.
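You can check the replica states from the primary with a query along these lines (a sketch I'm adding):

-- before the secondary joins, its connected_state_desc shows DISCONNECTED
SELECT ar.replica_server_name, ars.role_desc, ars.connected_state_desc, ars.synchronization_health_desc
FROM sys.availability_replicas ar
JOIN sys.dm_hadr_availability_replica_states ars ON ar.replica_id = ars.replica_id;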

i) Join the other secondary nodes to the AG (run on each secondary node):

ALTER AVAILABILITY GROUP [UbuntuAG] JOIN WITH (CLUSTER_TYPE = EXTERNAL);
ALTER AVAILABILITY GROUP [UbuntuAG] GRANT CREATE ANY DATABASE;
After the secondary joins, both replicas appear in the AG.

j) Test: create a database and add it to the newly created AG:

CREATE DATABASE [db1];
ALTER DATABASE [db1] SET RECOVERY FULL;
BACKUP DATABASE [db1]
   TO DISK = N'/var/opt/mssql/data/db1.bak';
ALTER AVAILABILITY GROUP [UbuntuAG] ADD DATABASE [db1];
k) Verification: Check whether the DB has successfully synchronized on the Secondary side:

SELECT * FROM sys.databases WHERE name = 'db1';
GO
SELECT DB_NAME(database_id) AS 'database', synchronization_state_desc FROM sys.dm_hadr_database_replica_states;
At this point a basic AG has been created, but it cannot yet provide high availability or disaster recovery; a cluster technology must be configured for that. If the T-SQL in steps h) and i) above is replaced with the following two statements, a read-scale AG is created instead.

Create AG command:
CREATE AVAILABILITY GROUP [UbuntuAG]
    WITH (CLUSTER_TYPE = NONE)
    FOR REPLICA ON
        N'**<node1>**' WITH (
            ENDPOINT_URL = N'tcp://**<node1>**:**<5022>**',
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
            FAILOVER_MODE = MANUAL,
            SEEDING_MODE = AUTOMATIC,
            SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL)
            ),
        N'**<node2>**' WITH (
            ENDPOINT_URL = N'tcp://**<node2>**:**<5022>**',
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
            FAILOVER_MODE = MANUAL,
            SEEDING_MODE = AUTOMATIC,
            SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL)
            );
ALTER AVAILABILITY GROUP [UbuntuAG] GRANT CREATE ANY DATABASE;
Add the Secondary node to the AG command:

ALTER AVAILABILITY GROUP [UbuntuAG] JOIN WITH (CLUSTER_TYPE = NONE);
ALTER AVAILABILITY GROUP [UbuntuAG] GRANT CREATE ANY DATABASE;
Note: an AG of this type has no listener, and the current version cannot create one yet.

 

3. Configure a cluster resource manager, such as Pacemaker
Specific steps are as follows:

a) Install and configure Pacemaker on all Cluster nodes:

First configure the firewall to allow the relevant ports (for the Pacemaker high-availability services, the SQL Server instance, and the availability group endpoint):

sudo ufw allow 2224/tcp
sudo ufw allow 3121/tcp
sudo ufw allow 21064/tcp
sudo ufw allow 5405/udp
sudo ufw allow 1433/tcp # replace with your TDS endpoint port if different
sudo ufw allow 5022/tcp # replace with your database mirroring endpoint port if different
sudo ufw reload
Or you can disable the firewall directly:

sudo ufw disable
Install the Pacemaker package on all nodes:

sudo apt-get install pacemaker pcs fence-agents resource-agents
Set the password of the default user created by Pacemaker and Corosync packages during installation. Make sure that the password is the same on all nodes:

sudo passwd hacluster
b) Enable and start the pcsd service, and enable the Pacemaker service:

sudo systemctl enable pcsd
sudo systemctl start pcsd
sudo systemctl enable pacemaker
The error "pacemaker Default-Start contains no runlevels, aborting." may occur during execution; it can be ignored for now.
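To double-check the services on each node (a sketch I'm adding; at this stage pcsd should already be running, while pacemaker is only enabled and is started later by pcs cluster start):

sudo systemctl status pcsd --no-pager
sudo systemctl is-enabled pacemaker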

c) Create and start the Cluster:

First, to prevent leftover cluster configuration files from interfering with the new setup, delete any existing cluster:

sudo pcs cluster destroy # On all nodes
sudo systemctl enable pacemaker
Then create and configure the Cluster:

sudo pcs cluster auth **<nodeName1>** **<nodeName2>** -u hacluster -p **<password for hacluster>**
sudo pcs cluster setup --name **<clusterName>** **<nodeName1>** **<nodeName2…>**
sudo pcs cluster start --all
At this point this error may appear: "Job for corosync.service failed because the control process exited with error code. See 'systemctl status corosync.service' and 'journalctl -xe' for details." Diagnose and fix it as follows:

Check whether there is a problem as described in 2-a);
Run the following command to find the log file path in the configuration file:
vi /etc/corosync/corosync.conf
Check whether that log file exists. If it does not, create it. Assuming the logfile path is /var/log/cluster/corosync.log and the file does not exist yet, execute the following commands:
sudo mkdir /var/log/cluster
sudo chmod 777 /var/log/cluster
sudo touch /var/log/cluster/corosync.log
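After creating the missing log file, you can try bringing the cluster up again and check its state (a sketch):

sudo pcs cluster start --all
sudo pcs cluster status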
d) Configure fencing (STONITH). For simplicity, the test environment here does not configure fencing; this may be updated later. In production, a fencing agent is normally required to isolate resources. For support information, refer to: Support Policies for RHEL High Availability Clusters - Virtualization Platforms.

For now, execute the following command to disable fencing:

sudo pcs property set stonith-enabled=false
e) Set start-failure-is-fatal to false:

sudo pcs property set start-failure-is-fatal=false
The default value is true. When it is true and the cluster fails to start a resource for the first time, the cluster fails over automatically, and afterwards you must manually clear the resource's recorded failure count. Use this command to reset the resource state:

sudo pcs resource cleanup <resourceName>
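If you want to see the recorded failure count before clearing it, pcs can show it (a sketch; <resourceName> is the same placeholder as above):

sudo pcs resource failcount show <resourceName>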
 

4. Add AG to the Cluster
Specific steps are as follows:

a) Install the SQL Server resource package integrated with Pacemaker on all nodes:

sudo apt-get install mssql-server-ha
b) Create a SQL Server login user for Pacemaker on all nodes:

USE [master]
GO
CREATE LOGIN [pacemakerLogin] WITH PASSWORD = N'<Your Password>';
ALTER SERVER ROLE [sysadmin] ADD MEMBER [pacemakerLogin];
Alternatively, instead of granting sysadmin, you can grant just enough permissions as follows:

GRANT ALTER, CONTROL, VIEW DEFINITION ON AVAILABILITY GROUP::UbuntuAG TO pacemakerLogin;
c) On all nodes, save the SQL Server Login information:

echo 'pacemakerLogin' >> ~/pacemaker-passwd
echo '<Your Password>' >> ~/pacemaker-passwd
sudo mv ~/pacemaker-passwd /var/opt/mssql/secrets/passwd
sudo chown root:root /var/opt/mssql/secrets/passwd
sudo chmod 400 /var/opt/mssql/secrets/passwd # only readable by root
d) Create AG resources on the Primary node in the Cluster:

sudo pcs resource create ag_cluster ocf:mssql:ag ag_name=UbuntuAG --master meta notify=true
e) Create a virtual IP resource on the Primary node in the Cluster:

sudo pcs resource create virtualip ocf:heartbeat:IPaddr2 ip=**<10.2.38.204>**
f) Configure the dependencies and startup sequence of the Cluster resources:

sudo pcs constraint colocation add virtualip ag_cluster-master INFINITY with-rsc-role=Master
sudo pcs constraint order promote ag_cluster-master then start virtualip
g) Finally check the status of the Cluster as follows:

sudo pcs status
The AG can now be accessed through the virtual IP (10.2.38.204).
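As a quick connectivity test (a sketch I'm adding, assuming the sqlcmd tools are installed and an sa or equivalent login is available), connect through the virtual IP and ask which replica answered:

sqlcmd -S 10.2.38.204 -U sa -P '<Your Password>' -Q "SELECT @@SERVERNAME AS current_primary;"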

 

At this point, the SQL Server Always On Availability Group managed by Cluster on Ubuntu is set up.

 

Note:

Once the AG is added to the Cluster as a resource, T-SQL can no longer be used to fail over the AG. SQL Server itself is not aware of the cluster; the whole system is controlled through the Linux cluster manager: use the pcs command on Ubuntu and RHEL, and the crm command on SLES.
After all the configuration is complete, you can access the whole AG through the virtual IP. You can manually register a listener name in DNS pointing to this virtual IP and use it like an AG listener on Windows.
SQL Server 2017 CTP 1.4 introduced the sequence_number concept to prevent data loss. For details, refer to Understand SQL Server resource agent for pacemaker (https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-availability-group-cluster-ubuntu).
Reference links:

Configure Always On availability group for SQL Server on Linux
Configure Ubuntu Cluster and Availability Group Resource
Configure read-scale availability group for SQL Server on Linux
Overview of Always On Availability Groups (SQL Server)
 

This article mainly introduced how to configure the AG and how to solve the problems encountered during configuration. Managing and using the AG will be described in detail later. If there are errors or anything is explained insufficiently, please bear with me.

 

[Original article, reproduced please indicate the source, for study and research purposes only, if there is an error, please leave a message, if you like, please recommend, thank you for your support]

[Original: http://www.cnblogs.com/lavender000/p/6906280.html, from Forever Kaoru]

