Explanation of heartbeat v1 and NFS file sharing


High Availability Basics

I. Definition of high-availability clusters
A high-availability cluster (HA cluster) is a group of computers that, as a whole, provides users with a set of network resources; the individual computer systems are the cluster's nodes.
High-availability clusters exist to keep the cluster's overall services available as much of the time as possible, reducing the losses caused by hardware and software faults. If one node fails, its standby node takes over its duties within seconds, so from the user's point of view the service never stops.
The main job of high-availability cluster software is to automate fault detection and service switchover. A two-node high-availability cluster, also called dual-machine hot standby, uses two servers that back each other up: when one server fails, the other takes over its service tasks, so the system keeps serving clients without manual intervention. Hot standby is only one kind of high-availability cluster; high-availability clusters can have more than two nodes and provide more advanced functions than hot standby, which better satisfies users' changing needs.
II. How high availability is measured
HA (high availability) is measured in terms of the system's reliability and maintainability. In engineering, MTTF (mean time to failure) is usually used to measure reliability and MTTR (mean time to repair) to measure maintainability, so availability is defined as HA = MTTF / (MTTF + MTTR) * 100%. A worked example follows the table below.
Specific HA levels:
99%: downtime per year must not exceed 4 days
99.9%: downtime per year must not exceed 10 hours
99.99%: downtime per year must not exceed 1 hour
99.999%: downtime per year must not exceed 6 minutes
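To sanity-check one row with the formula above: if MTTF = 8,750 hours and MTTR = 8.75 hours, then HA = 8750 / (8750 + 8.75) * 100% ≈ 99.9%, and 0.1% of a year (8,760 hours) is about 8.76 hours of permitted downtime, which matches the 99.9% line.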

III. Components of a high-availability cluster
1. CCM component (cluster consensus membership service): determines from the heartbeat information which nodes are currently members of the cluster, then passes the result to the upper layer, which decides what measures to take. CCM can also generate a topology overview of every node's status from the current node's point of view, so that the node can act appropriately in special circumstances.
2. CRM component (cluster resource manager; pacemaker in Heartbeat v3): performs resource allocation. Every resource-allocation action must go through the CRM, which makes it the core component. The CRM on each node maintains a CIB that defines the specific attributes of resources and which resources are placed on which node.
3. CIB component (cluster information base): an XML-format configuration file describing the cluster's resources. It is kept on disk but resides in memory while the cluster runs, and changes have to be propagated to the other nodes. Only the CIB on the DC (designated coordinator) may be modified; the CIBs on the other nodes are copies of the DC's. The CIB can be configured either from the command line or through a front-end GUI.
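As an illustration only (this fragment is hypothetical, not taken from this lab), a Heartbeat v2-era CIB defining a floating-IP resource looks roughly like this; the id values are made up, and the ip value reuses this lab's VIP:

<cib>
  <configuration>
    <crm_config/>
    <nodes/>
    <resources>
      <!-- hypothetical floating-IP primitive -->
      <primitive id="webip" class="ocf" provider="heartbeat" type="IPaddr">
        <instance_attributes id="webip_attrs">
          <attributes>
            <nvpair id="webip_ip" name="ip" value="172.16.100.23"/>
          </attributes>
        </instance_attributes>
      </primitive>
    </resources>
    <constraints/>
  </configuration>
  <status/>
</cib>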
4. LRM component (local resource manager): obtains the status of local resources and manages them; for example, if no heartbeat information is detected from the peer, it starts the local service processes.
5. PEngine components:
PE (policy engine): the policy engine that works out the complete set of steps for transferring resources. It is only the policy maker; it does not carry out the transfer itself but lets the TE execute its policy.

TE (transition engine): executes the policies produced by the PE. PE and TE run only on the DC.

6. STONITHd component
STONITH (shoot the other node in the head, colloquially a "headshot") operates the power switch directly. If one node fails and the other node can detect it, a command is sent over the network to the power switch of the faulty node, restarting it by cutting and restoring its power. This approach requires hardware support.
A typical STONITH scenario (master/slave servers): the master is so busy for a period of time that it fails to answer heartbeats, so the slave seizes the service resources even though the master has not actually gone down. That is resource preemption: users can now reach both the master and the slave. With read-only access this might be tolerable, but any write would corrupt the shared file system and everything would be lost. Therefore an isolation method is applied when resources are seized: before the slave takes over the resources, it STONITHs the master, the "headshot" just described.
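For reference, STONITH devices are declared in ha.cf; the sample ha.cf shipped with heartbeat contains commented-out examples along these lines (the device type, address, and credentials below are the sample's placeholders, not values from this lab):

stonith_host * baytech 10.0.0.3 mylogin mysecretpassword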

V. High-availability cluster software
Messaging and membership layer:
Heartbeat (v1, v2, v3); in v3, Heartbeat was split into heartbeat, pacemaker, and cluster-glue

Corosync

CMAN

Keepalived

UltraMonkey

Cluster resource manager (CRM) layer:
haresources, CRM (Heartbeat v1/v2)

Pacemaker (Heartbeat v3 / Corosync)

rgmanager (CMAN)

Common combinations:
Heartbeat v2 + haresources (or CRM) (note: generally used on CentOS 5.x)

Heartbeat v3 + Pacemaker (note: generally used on CentOS 6.x)

Corosync + Pacemaker (note: the most common combination)

CMAN + rgmanager (note: components of the Red Hat Cluster Suite, which also includes gfs2 and clvm)

Keepalived + LVS (note: commonly used for LVS high availability)

Summary: technical blogs often describe heartbeat + pacemaker for MySQL high availability, or corosync + pacemaker for MySQL high availability, and readers ask which combination is best practice. After the explanations above, you should have a good idea!

VI. Shared storage
When talking about clusters we have to talk about shared storage: whether it is web high availability or MySQL high availability, the data is shared, so it must live on shared storage that both the master node and the slave node can access. Shared storage is briefly described below.
1. DAS (Direct Attached Storage)
Note: attached directly to the host bus, so the distance is limited; on failover the device has to be re-mounted, and data transfers between devices incur delay.
RAID array

SCSI array

2. NAS (Network Attached Storage)
Note: file-level sharing
NFS

FTP

CIFS

3. SAN (Storage Area Network)
Note: block-level access; the SCSI protocol is carried over the network
FC SAN (fibre-channel switch ports are very expensive, roughly 20,000 apiece, so the cost is too high)

IP SAN (iSCSI): block-level, fast access, and low cost

VII. Cluster file systems and cluster LVM (clvm)
Cluster file systems: gfs2 and ocfs2
Cluster LVM: clvm
Note: generally used in dual-master high-availability models

VIII. How a high-availability cluster works
Note: the working principle is described below in terms of a master/slave high-availability pair.
The master server and the slave server form a dual-machine hot-standby pair and share one storage device; take MySQL as an example. Normally the database files are mounted on the master database server, and users connect to the master to operate on the database. When the master fails, the slave automatically mounts the database files and takes over from the master; users connect to the database through the slave without noticing the switch. Once the master's fault has been repaired, it can provide service again.
So how does the slave server know the master has crashed? This requires a detection mechanism: heartbeat detection. Each node periodically advertises its own heartbeat information to the other nodes, above all to its peer. If the slave detects no heartbeat from the master for several heartbeat cycles (the cycle length is configurable), it concludes that the master is down. TCP is unsuitable for advertising heartbeats, because TCP detection requires a three-way handshake, which may not complete within a heartbeat cycle; heartbeat information is therefore sent over UDP port 694. If the master is merely too busy to answer heartbeats for a while and the slave then seizes the service resources (the shared data files) even though the master is not down, resource preemption results: users can reach both the master and the slave. With only reads that might be survivable, but any writes would corrupt the file system and everything would be lost. Hence an isolation method is applied when resources are seized: the slave first STONITHs the master, the "headshot" described earlier.
And how is the heartbeat actually carried? Over a heartbeat line: the heartbeat daemon running on the slave can observe the master's running state through an Ethernet connection, and as soon as it can no longer detect the master's heartbeat it takes over the master's resources. Typically the heartbeat link between master and slave is an independent physical connection, either a serial cable or an Ethernet connection over a crossover cable. Heartbeat can even monitor the master over several physical links at once, and treats the master as healthy as long as any one of them shows it alive. Practical experience suggests configuring multiple independent physical links for heartbeat, to avoid a single point of failure in the heartbeat path.
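To make the heartbeat cycle and UDP port 694 mentioned above concrete, these are the ha.cf directives that control the timing; the values shown are illustrative, not taken from this lab's configuration:

keepalive 2      # send a heartbeat every 2 seconds
deadtime 30      # declare the peer dead after 30 seconds of silence
warntime 10      # log a "late heartbeat" warning after 10 seconds
initdead 120     # allow extra time at initial start-up
udpport 694      # the UDP port the heartbeats travel on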
The principle above mentioned "isolation methods"; there are two: node isolation and resource isolation. Node isolation is STONITH (shoot the other node in the head, the "headshot"), which cuts the node's power directly: commonly all nodes are wired to a controllable power switch, and when a fault occurs the faulty node is power-cycled or shut down. Resource isolation, fencing, instead cuts the faulty node off from a particular resource.

Classroom environment
First, make sure the clocks of the two machines are synchronized; two ways to set the time:
ntpdate 172.16.0.1

date -s "MM/DD/YY hh:mm:ss"

Modify the hostnames:
vim /etc/hosts
192.168.1.11 node1.www.dingchao.com node1
192.168.1.12 node2.www.dingchao.com node2
Edit this on both machines; afterwards, ping node1 to check that the name resolves to the correct IP address.

Set up key-based authentication so that the two hosts can log in to each other without a password.
ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub root@node2.dingchao.com (this form may error out)
If it does, copy to the IP address instead; that will not go wrong:
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.1.12
Generate a key pair on each side and copy it to the other host. Then test that key-based authentication works:
ssh node1.dingchao.com 'date'; date

Software installation is tricky: the heartbeat packages have dependency problems, so the dependencies have to be installed first.
yum install perl-TimeDate PyXML libnet net-snmp-libs
The libnet package would not install from the base repositories here (it normally comes from the EPEL repository); without it, the packages below cannot be installed.
yum install heartbeat-2.1.4-12.el6.x86_64.rpm heartbeat-pils-2.1.4-12.el6.x86_64.rpm heartbeat-stonith-2.1.4-12.el6.x86_64.rpm

cd ~

scp -r heartbeat2/ node2:/root
On node2, install the same software:
yum install perl-TimeDate PyXML libnet net-snmp-libs

yum install heartbeat-2.1.4-12.el6.x86_64.rpm heartbeat-pils-2.1.4-12.el6.x86_64.rpm heartbeat-stonith-2.1.4-12.el6.x86_64.rpm

Heartbeat communicates over multicast; pick a multicast address from the range 224.0.1.0 - 238.255.255.255.

grep 694 /etc/services (694/udp is the registered ha-cluster port)

The three configuration files:
ha.cf: the main configuration file
authkeys: the authentication key; its permissions must deny access to group and others
haresources: the file in which resources are defined

cd /etc/ha.d
ls

cp /usr/share/doc/heartbeat-2.1.4/{ha.cf,haresources,authkeys} ./
chmod 600 authkeys

Edit the main configuration file:

vim ha.cf
Enable:
logfile /var/log/ha-log
compression_threshold 2 (compress messages above this size, in KB)

mcast eth0 225.100.90.101 694 1 0 (multicast address, UDP port 694, TTL 1, loop 0)

node node1.www.dingchao.com
node node2.www.dingchao.com

ping 172.16.0.1 (a ping node; generally the gateway is used)

vim authkeys (the authentication key)

auth 2
2 sha1 XXXXX (a random string; generate one with: openssl rand -hex 8)
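Assembled, the finished authkeys file looks like this (the hex string is a made-up example of openssl rand -hex 8 output, not the key used here):

auth 2
2 sha1 9f86d081884c7d65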

vim haresources

Define the resources at the end of the file.

The first resource is the VIP (floating IP)/netmask/NIC alias/broadcast address; the resources after it are the services, in order:
node1.dingchao.com 172.16.100.23/16/eth0/172.16.255.255 httpd

Make sure the httpd resource actually exists.

The scripts that implement the IP resource live in

/etc/ha.d/resource.d

namely IPaddr and IPaddr2.

The httpd resource script
is /etc/rc.d/init.d/httpd
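Heartbeat simply runs these scripts with start/stop arguments, so a resource can be exercised by hand; a rough sketch using this lab's values:

/etc/ha.d/resource.d/IPaddr 172.16.100.23/16/eth0/172.16.255.255 start   # bring up the VIP
/etc/rc.d/init.d/httpd start                                             # start the web service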

scp -p ha.cf haresources authkeys node2:/etc/ha.d

This configures node 2 by sending it node 1's configuration files.

vim /var/www/html/index.html (provide a test web page)

Test it: curl http://XXXXX
Then stop the service: service httpd stop
and make sure it does not start at boot:
chkconfig httpd off

Configure node 2 and define its httpd resource in the same way.

Test the experiment's results.
On the master node:
service heartbeat start
ss -unl (UDP port 694 should now be listening)
ssh node2.dingchao.com 'service heartbeat start'
tail -f /var/log/ha-log
On node 2, check ss -unl for port 694; ifconfig shows no floating IP there, because node 1 holds it.
service heartbeat stop (on the master)
The web page test still succeeds: node 2 has taken over.
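One way to watch the failover from a client machine is to poll the floating IP while heartbeat is stopped on the master (a minimal sketch; 172.16.100.23 is the VIP defined in haresources):

while true; do curl -s http://172.16.100.23/; sleep 1; done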

Serve the web pages from NFS shared storage for highly available web access

On the NFS server, 172.16.1.143:
vim /etc/exports
/www/html 172.16.0.0/16(rw,no_root_squash)

Mount the export on a node and test that the web page is served normally:
mount -t nfs 172.16.1.143:/www/html /var/www/html
curl the local web service to confirm
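If the mount fails, first confirm on the NFS server that the export is active (standard NFS utilities, nothing specific to this lab):

exportfs -arv               # re-export everything in /etc/exports, verbosely
showmount -e 172.16.1.143   # list the exports the server is offering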

Stop heartbeat on all nodes: service heartbeat stop
Redefine the resources:
vim /etc/ha.d/haresources
node1.dingchao.com 172.16.100.23/16/eth0/172.16.255.255 Filesystem::172.16.100.9:/www/html::/var/www/html::nfs httpd
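For clarity, in haresources the :: separates a resource script's name from its arguments, so the general shape of the line above is (placeholders, not literal values):

node-name VIP/netmask/interface/broadcast Filesystem::device::mountpoint::fstype service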

scp haresources node2:/etc/ha.d/
service heartbeat start
ssh node2.dingchao.com 'service heartbeat start'

On each node, check with ifconfig and mount that the floating IP and the NFS mount are where they should be, then test the web page.

Take the active node offline and test the web page.

Bring the node back up and test taking the other node offline.
ss -unl

Two more useful commands:

cd /usr/lib64/heartbeat

hb_standby (give up the resources and become the standby)
hb_takeover (take the resources over)
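A sketch of a manual switchover test with these scripts, run on the node that currently holds the resources:

cd /usr/lib64/heartbeat
./hb_standby     # hand all resources over to the peer node
# verify on the peer, then take them back:
./hb_takeover    # reclaim all resources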

