In a Linux HA cluster, high availability refers not to the availability of hosts but to the availability of services.
What is high availability? A server can go down in many ways, and any failure carries risk. Downtime is very costly, especially for web sites. When a server that provides a service goes down, the setup is called highly available if the service itself is not interrupted.
What is a heartbeat? Multiple servers are connected over a network, and each server keeps notifying the standby hosts on the same network that it is still online. As long as the other servers keep receiving this heartbeat information, the sending server, in particular the master server, is considered to be online.
How is heartbeat information sent, and who receives it? Processes on two different hosts cannot talk to each other directly, so the network must be used: a process listens on a socket and handles data transmission and data requests. Every server therefore runs the same process, and these processes continuously communicate with each other, the software on the master node constantly sending heartbeat information to its peer process on the other node. This layer is called the base layer of the highly available cluster, also known as the heartbeat information transmission layer or the cluster transaction information transmission layer, and it is a process running on every node in the cluster. The process is a piece of service software that must be started at boot before information can be transmitted between hosts; generally, heartbeats are sent from the master node to the slave nodes.
So-called resources: taking a web service as an example, the VIP is a resource, the web server is a resource, and the web pages are resources too. A service comprises multiple resources, and shared storage for the web content is also a resource. Different services require different resources, and shared storage is the hardest problem to solve in highly available clusters.
If the master node fails, how do we select one slave node to take over and provide the service? Which slave node should be chosen, and by what mechanism? This is the process of cluster transaction decision-making.
ha_aware: if an application can itself use the underlying heartbeat information transmission layer to complete the cluster transaction decision-making process, it is called ha_aware.
DC (Designated Coordinator): the node elected as coordinator. When the host where the DC resides crashes, a new DC is elected first, and that DC then makes the transaction decisions. Note: the most basic management unit in a highly available cluster is the resource; resources are combined into a service.
No resource in a high-availability cluster should start itself; resources must be started under the management of the CRM.
CRM (Cluster Resource Manager): manages cluster resources. The CRM is the sole decision-maker.
Heartbeat v1 already had the concept of resource management; in v1 it is built into heartbeat itself and is called haresources. haresources is a configuration file, and this configuration interface is likewise referred to as haresources.
In heartbeat v2, heartbeat was greatly improved. The resource manager can run as an independent process and receive user requests; it is called crm. At runtime a crmd process runs on each node, usually listening on a socket on port 5560. The server side is therefore called crmd, while the client is called crm (or crm shell), a command-line interface through which you communicate with crmd on the server. heartbeat also has a graphical tool, heartbeat-GUI, which can do the same configuration.
In the third version, heartbeat v3 was split into three independent projects: heartbeat, pacemaker, and cluster-glue. With the architecture separated, each piece can work with other components.
RA (resource agent): a tool that accepts scheduling from the CRM to manage a resource on a node. This management tool is usually a script, so we generally call it a resource agent. Every resource agent must follow the same style and accept four arguments: {start|stop|restart|status}; the agent for each resource, including the one for IP addresses, must implement these four arguments.
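As a sketch of that contract, the script below manages a made-up resource named `myres` (the name, the PID-file path, and the `sleep` stand-in for a real daemon are all invented for illustration; real agents live in /etc/ha.d/resource.d or /etc/init.d):

```shell
#!/bin/sh
# Hypothetical resource-agent sketch for an invented resource "myres".
# It honours the same start|stop|restart|status contract heartbeat expects.
PIDFILE=/tmp/myres.pid

myres_agent() {
  case "$1" in
    start)
      sleep 300 &                      # stand-in for the real service process
      echo $! > "$PIDFILE"
      echo "started"
      ;;
    stop)
      [ -f "$PIDFILE" ] && kill "$(cat "$PIDFILE")" 2>/dev/null
      rm -f "$PIDFILE"
      echo "stopped"
      ;;
    restart)
      myres_agent stop >/dev/null
      myres_agent start
      ;;
    status)
      if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        echo "running"
      else
        echo "not running"
      fi
      ;;
    *)
      echo "Usage: myres_agent {start|stop|restart|status}"
      return 1
      ;;
  esac
}

myres_agent start    # prints "started"
myres_agent status   # prints "running"
myres_agent stop     # prints "stopped"
myres_agent status   # prints "not running"
```

The CRM (or, in v1, heartbeat itself) only ever drives a resource through these four verbs, which is why any script that implements them can be used as a resource.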
When a node fails, the resources on it are automatically transferred to another healthy standby node and started there. This is called failover.
If a failed node comes back, we may want to add it back to the cluster; the process of transferring resources back to it is called failback.
Resource contention and resource isolation:
When a cluster splits, resources must not be contended by nodes that are no longer part of the cluster; otherwise, with a file system mounted by multiple nodes, the file system may be corrupted. So the newly formed cluster "shoots" the nodes that are no longer in it, so that the services of nodes outside the cluster are shut down completely and receive no more requests. This is called STONITH (shoot the other node in the head), and the practice is known as resource isolation. The consequences of contending for shared storage are very serious: if the shared storage is corrupted, the entire file system crashes and all data is lost.
There are two levels of resource isolation:
Node level: this is STONITH; the other node's power is simply cut off. Such hosts are generally connected to a power switch.
Resource level: this relies on certain hardware devices. For example, with a fibre channel switch connected to the shared storage, the fibre port of the node to be kicked out is blocked; this is called resource-level isolation.
A split in which the two halves of a cluster each believe they are the cluster, with the two sides inconsistent, is usually called split-brain. In high-availability clusters, avoiding resource contention by achieving resource isolation is a problem that must be addressed in the design.
With two nodes, once the cluster splits, one of the nodes has failed. When we cannot determine which node is abnormal, the node that should serve must be the one still connected to the network, i.e. the one that can still reach the front-end router. We therefore treat the front-end router as a third node, called the ping node: when the nodes lose contact with each other, each pings the front-end node first, and a node that can reach it is considered normal and holds the extra vote. Such a front-end ping node acts as an arbitration (quorum) device, helping the nodes decide which side wins. An arbitration device is needed whenever the number of nodes is even.
RHCS does not use a ping node for this determination; it uses a quorum disk (qdisk) on shared storage. With an even number of nodes, the active nodes constantly write to the disk: at every heartbeat interval a node updates its data bit on the device, and as long as it keeps updating the bit it is considered active. If a node is found not to have written its bit for several intervals, it is considered to have crashed. So there are two kinds of arbitration devices: ping node and qdisk.
How is the heartbeat transmitted? How do multiple hosts cooperate? Consider a highly available master/slave pair of nodes:
Messaging Layer: the heartbeat information of the master and slave nodes is carried by this information layer, also called the underlying infrastructure layer for transmitting heartbeat information. Implementations include heartbeat and corosync, the latter derived from the OpenAIS project.
Resource Allocation Layer: also called the resource manager layer, whose core component is the CRM (Cluster Resource Manager). One CRM in the cluster must be elected as the manager, a leader that decides all cluster transactions; it is called the DC (Designated Coordinator). The DC runs two additional processes. One is the PE (Policy Engine), which collects information about all nodes in the cluster from the underlying information layer to build a complete picture, decides on which node each resource should run, and has the resource managers on those nodes start or stop the resources. The other is the TE (Transition Engine), which carries out the PE's decisions by notifying the CRMs on the corresponding nodes.
The cluster resource manager must advertise through the Messaging Layer to every node, broadcasting or multicasting automatically so that the information on every node is the same. How is this data represented? The data format is based on the Extensible Markup Language, i.e. semi-structured XML data, so the configuration information is saved on every node in an XML file. The component that understands and maintains this XML file is the CIB (Cluster Information Base). Anything that can connect to the CRM can be used to configure this XML file; changes are first saved to the DC's copy, and the DC then synchronises them to the XML files on every node.
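As a schematic illustration only (the exact schema differs between heartbeat v2 and later pacemaker versions, and the `webip` resource id is an invented example), a CIB-style XML file describing the two nodes and a VIP resource might look roughly like this:

```xml
<!-- Schematic sketch of a CIB-style configuration; not an exact schema. -->
<cib>
  <configuration>
    <nodes>
      <node id="1" uname="node1.tanxw.com"/>
      <node id="2" uname="node2.tanxw.com"/>
    </nodes>
    <resources>
      <primitive id="webip" class="ocf" provider="heartbeat" type="IPaddr">
        <instance_attributes>
          <nvpair name="ip" value="172.16.249.66"/>
        </instance_attributes>
      </primitive>
    </resources>
  </configuration>
  <status/>
</cib>
```

The point is the shape, not the details: nodes and resources are all described as semi-structured XML, which is what the DC synchronises to every node.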
Resources Layer: the PE (policy engine) obtains the resource configuration from the CIB, obtains the activity information of the current nodes through the Messaging Layer, and then makes a decision. Once a decision is made, the resources are started: the decision is announced through the Messaging Layer to the CRMs on the other nodes, and the CRM on each affected node, after receiving the notice, does not start the resource itself but calls its LRM (Local Resource Manager). An LRM runs on every node and manages concrete resources through RAs (Resource Agents). In short: the CRMs collect information and submit it to the PE running on the DC; the PE integrates all the resources in the cluster and ensures they run on the appropriate nodes; once a decision is made it is announced to the CRMs on the other nodes; on receiving the notice, the CRM on the corresponding node calls its own LRM, and the LRM directs the RA to complete the operation.
The following describes how to implement heartbeat v1:
Install and configure a high-availability cluster: Implements heartbeat v1
1. Node names are critical: the name of every node in the cluster must resolve;
Use the hosts file; the forward and reverse resolution of the host names in /etc/hosts must be consistent with the output of `uname -n`;
2. Time must be synchronized; use a network time server;
3. The nodes must be able to authenticate and communicate with each other via ssh keys;
1) Configure the host names:
The host name of the first node is node1.tanxw.com, and that of the second is node2.tanxw.com.
# vim /etc/hosts   # edit the hosts file; note that entries for both nodes must be added, on each node
172.16.249.61 node1.tanxw.com
172.16.249.62 node2.tanxw.com
# uname -n
# hostname node1.tanxw.com
# hostname node2.tanxw.com
# cat /etc/sysconfig/network   # if the HOSTNAME here differs from node1/node2, modify this file so the host name persists across the next reboot
# If the change does not take effect immediately, press ctrl+d to log out and log in again.
2) Set up passwordless ssh communication between the two (or more) hosts:
# ssh-keygen -t rsa -P ''   # generate a key pair with an empty passphrase, then copy the public key to the peer node
# ssh-copy-id -i .ssh/id_rsa.pub root@node2.tanxw.com   # the peer's host name, with the login user name
Both hosts must communicate with each other, so both must generate a key pair and copy their public key to the peer. The hosts file on each node must resolve the other host's name:
172.16.249.61 node1.tanxw.com
172.16.249.62 node2.tanxw.com
# ssh node2.tanxw.com 'date'; date   # test whether mutual trust has been established
3) Install the heartbeat v1 program; the heartbeat packages must be installed on both nodes.
# Install these packages; they have dependencies that need to be resolved:
heartbeat-2.1.4-12.el6.x86_64.rpm, heartbeat-pils-2.1.4-12.el6.x86_64.rpm,
heartbeat-stonith-2.1.4-12.el6.x86_64.rpm
# Resolve dependencies:
# yum -y install perl-TimeDate net-snmp-libs libnet PyXML
# rpm -ivh heartbeat-pils-2.1.4-12.el6.x86_64.rpm heartbeat-stonith-2.1.4-12.el6.x86_64.rpm heartbeat-2.1.4-12.el6.x86_64.rpm
A highly available cluster depends on three layers: 1. the information layer; 2. the resource manager; 3. the resource agents.
The configuration process works through these layers;
Note: only the nodes we intend should be able to join the cluster, and cluster information must not be transmitted arbitrarily. heartbeat nodes transmit over a multicast address, so if heartbeat were installed on some other node it would be unsafe; we therefore authenticate the information transmitted between nodes. This authentication is based on an HMAC (message authentication code), computed with a one-way algorithm; there are generally three choices: crc (cyclic redundancy check), md5 (message digest algorithm), and sha1. heartbeat uses the UDP protocol and listens on port 694.
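To get a feel for what such a message authentication code looks like, you can compute a SHA1 HMAC by hand with openssl (the message and key below are arbitrary example values, not heartbeat's actual wire format):

```shell
# Compute a SHA1 HMAC over an example message with an example key;
# heartbeat computes a similar code over each packet it sends.
echo -n "heartbeat message" | openssl dgst -sha1 -hmac "ee869d3d86e1556f"
```

The receiver, knowing the same key, recomputes the code and discards any packet whose code does not match.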
4) Configure heartbeat. Its configuration files live in the /etc/ha.d/ directory, but they do not exist there after installation; only the /usr/share/doc/heartbeat-2.1.4/ directory contains a sample of the main configuration file, ha.cf, which can be copied to /etc/ha.d/ and modified. There is also an authkeys authentication file, which stores the authentication password and mechanism used between the nodes; its permissions are critical and must be 600, otherwise the service will not start. The third file is haresources; when resources are defined, the resource manager reads this file.
# cp /usr/share/doc/heartbeat-2.1.4/{ha.cf,authkeys,haresources} /etc/ha.d/
# cd /etc/ha.d/
# openssl rand -hex 8   # generate 8 random bytes, i.e. a 16-character hex string
ee869d3d86e1556f
# vim /etc/ha.d/authkeys
auth 2   # the number after auth selects which key line below is used; the number itself is arbitrary as long as they match
2 sha1 ee869d3d86e1556f
# chmod 600 authkeys
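The secret itself can be any string; a quick check of what `openssl rand -hex 8` actually produces:

```shell
# Generate an authkeys secret the same way as above and confirm its length:
# `openssl rand -hex 8` emits 8 random bytes rendered as 16 hex characters.
key=$(openssl rand -hex 8)
echo "$key"       # different on every run
echo "${#key}"    # 16
```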
# vim /etc/ha.d/ha.cf   # enable the following parameters and options
logfile /var/log/ha-log
# where log files are written; records normal log information
keepalive 1000ms
# interval at which heartbeat information is sent; the unit is seconds unless a unit such as ms is given explicitly
deadtime 8
# how long without a heartbeat from the peer before it is declared dead
warntime 10
# how long without a heartbeat before a warning is logged
udpport 694
mcast eth0 225.0.0.1 694 1 0
# define the multicast address
auto_failback on
# enable failback: when the original master returns, resources move back to it
node node2.tanxw.com
# Define two nodes
node node1.tanxw.com
#crm on
# enables the v2 crm; leave it off (commented out) for the v1-style haresources configuration used below, since with crm on the haresources file is ignored
ping 172.16.0.1
# define a ping node
compression bz2
# Compression format
compression_threshold 2
# data smaller than 2 KB is not compressed
Define resources in the resource manager's configuration file, /etc/ha.d/haresources. The /etc/ha.d/resource.d directory contains the various resource agent scripts; when a resource is defined in the configuration file, the corresponding script is invoked to run the program.
# vim /etc/ha.d/haresources
node1.tanxw.com 172.16.249.66 httpd
# 172.16.249.66 is the floating address (VIP).
Note: node1.tanxw.com indicates which host is the master node, i.e. which node the resources prefer.
[It can also be defined with an explicit netmask and interface, e.g.:
node1.tanxw.com 172.16.249.61/16/eth0 httpd
How is httpd invoked? heartbeat first looks for an httpd script in the /etc/ha.d/resource.d directory; if it is not found there, it looks in /etc/init.d/ and starts that one.]
# scp -p authkeys haresources ha.cf node1.tanxw.com:/etc/ha.d/
# service heartbeat start
# ssh node2.tanxw.com 'service heartbeat start'   # start heartbeat on the other node as well
Conclusion:
When one node fails, the other takes over as master and the service continues without interruption. For testing, prepare two different web pages on the two nodes so the content tells them apart; then stop the web service (or heartbeat) on the active node and observe heartbeat automatically switching to the other, healthy node, which continues to provide service.
This article is from the "warm boiled frog" blog; please keep this source: http://tanxw.blog.51cto.com/4309543/1401096