Heartbeat V2 adds resource monitoring and support for multi-node clusters. To combine monitoring with switching, it provides a scoring policy that controls how each resource in the cluster is switched between nodes. This point mechanism calculates a total score for each node, and the node with the highest score becomes the active node that manages a resource (or a group of resources).
If nothing is configured in the CIB, the initial score of each resource (resource-stickiness) defaults to 0, and the score lost after each failure (resource-failure-stickiness) is also 0. In this case, Heartbeat only restarts a resource, no matter how many times it fails. In general, the value of resource-stickiness is positive and the value of resource-failure-stickiness is negative. There are also two special values: positive infinity (INFINITY) and negative infinity (-INFINITY). If a node's score is negative, that node will never take over the resource, whatever happens (a cold standby node). As the status of the resource changes, the score of each node changes. Once the score of another node becomes greater than the score of the node currently running the resource, Heartbeat switches the resource: the node running the resource releases it, and the node with the higher score takes it over.
In the CIB configuration, you can define a score for each resource with resource-stickiness, and the score lost after each failure with resource-failure-stickiness, as follows:
...
The preceding configuration sets two scores for the resource mysql_db: the score gained when the resource runs successfully (resource_stickiness) and the score lost each time it fails (resource_failure_stickiness). Here both have the same magnitude: 100 points for success and -100 points for failure.
Instead of setting the two scores on each resource, you can also give all resources the same default scores, as shown below:
...
...
...
In this configuration, two default scores are set for all resources, which saves the trouble of setting them on each resource separately. If scores are also set on some or all individual resources, the per-resource scores are used instead of the defaults.
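The override rule described above can be pictured with a short Python sketch. It only models the lookup order (per-resource value first, cluster default otherwise) and is not Heartbeat code; the function name effective_scores and the dictionary layout are assumptions made for illustration.

```python
def effective_scores(resource_attrs, defaults):
    """Return the scores that apply to one resource.

    Hypothetical helper: values set on the resource itself win,
    and anything the resource does not set falls back to the defaults.
    """
    merged = dict(defaults)        # start from the cluster-wide defaults
    merged.update(resource_attrs)  # per-resource settings override them
    return merged


defaults = {"resource_stickiness": 0, "resource_failure_stickiness": 0}

# mysql_db sets both scores itself, so the defaults are ignored for it.
print(effective_scores(
    {"resource_stickiness": 100, "resource_failure_stickiness": -100},
    defaults,
))

# A resource with no settings of its own simply gets the defaults.
print(effective_scores({}, defaults))
```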
In addition to the resource scores, nodes also have scores. The node score can be set as follows:
...
...
Note that the node score settings are placed under the constraints section of the configuration and are set through a rule. The node's host name is used for matching (in fact, many Heartbeat settings are sensitive to host names). The value is the node's host name, and the score in the rule is that node's score.
Through the above configuration, we can make the following calculations:
A. If Heartbeat is started on both nodes at the same time, neither node is running the resource yet. The resource therefore contributes no score, and only the node scores are counted:
mysql1 score: node + resource + failcount * failure = 200 + 0 + (0 * (-100)) = 200
mysql2 score: node + resource + failcount * failure = 150 + 0 + (0 * (-100)) = 150
Heartbeat chooses to run the resource mysql_db on mysql1. The score of mysql1 then changes, because the resource score is added to it:
mysql1 score: node + resource + failcount * failure = 200 + 100 + (0 * (-100)) = 300
mysql2 score: node + resource + failcount * failure = 150 + 0 + (0 * (-100)) = 150
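The score formula used in these calculations can be written as a small Python helper. This is only a sketch of the arithmetic in this article, not Heartbeat's actual implementation; the function name node_score and its parameter names are made up for illustration.

```python
def node_score(node, resource, failcount, failure):
    """Total score = node + resource + failcount * failure."""
    return node + resource + failcount * failure


# Before the resource starts anywhere, only the node scores count.
print(node_score(200, 0, 0, -100))    # mysql1: 200
print(node_score(150, 0, 0, -100))    # mysql2: 150

# Once mysql_db runs on mysql1, its resource score of 100 is added there.
print(node_score(200, 100, 0, -100))  # mysql1: 300
```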
B. After a while, Heartbeat's monitor detects that the mysql_db resource has crashed (or has some other problem), and the scores change immediately, as shown below:
mysql1 score: node + resource + failcount * failure = 200 + 100 + (1 * (-100)) = 200
mysql2 score: node + resource + failcount * failure = 150 + 0 + (0 * (-100)) = 150
Heartbeat finds that the score of mysql1 is still higher than that of mysql2, so the resource is not migrated and is restarted on mysql1 instead.
C. After running for a while longer, the resource fails again (or the restart in step B did not succeed), and the scores change again:
mysql1 score: node + resource + failcount * failure = 200 + 100 + (2 * (-100)) = 100
mysql2 score: node + resource + failcount * failure = 150 + 0 + (0 * (-100)) = 150
At this point, Heartbeat finds that mysql2 has a higher score than mysql1, so the resource is migrated. mysql1 releases the resources related to mysql_db, and mysql2 takes them over and runs mysql_db. The node scores then change as follows:
mysql1 score: node + resource + failcount * failure = 200 + 0 + (2 * (-100)) = 0
mysql2 score: node + resource + failcount * failure = 150 + 100 + (0 * (-100)) = 250
If the resource then fails three times on mysql2, the score of mysql2 drops to -50, which is less than the score of mysql1 (0). The resource is migrated back to mysql1, the score of mysql1 becomes 100, and the score of mysql2 becomes -150, because mysql2 no longer holds the 100 points for running the resource. The score of mysql2 is now negative, and Heartbeat has a rule that a resource is never migrated to a node with a negative score. In other words, no matter how many times mysql_db fails on mysql1 or what problems it has, it will not be migrated back to mysql2. A node's score is reset to its initial state when Heartbeat is restarted on that node. You can also view or reset the failcount of a resource or resource group on a node with the following commands:
crm_failcount -G -U mysql1 -r mysql_db # view the failcount of the resource mysql_db on the mysql1 node
crm_failcount -D -U mysql1 -r mysql_db # reset the failcount of the resource mysql_db on the mysql1 node
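The whole walkthrough, including the move back to mysql1, can be replayed with a short Python simulation. It is a simplified model built only from the rules stated in this article (every failure happens on the active node, a node with a negative score never takes over, and a switch needs a strictly higher score); the function replay and its parameters are hypothetical and do not reflect Heartbeat's real internals.

```python
def replay(base_scores, n_failures, stickiness=100, failure=-100):
    """Return the active node initially and after each failure."""
    failcount = {name: 0 for name in base_scores}
    # Initial placement: nothing runs yet, so only the node scores count.
    active = max(base_scores, key=base_scores.get)
    history = [active]
    for _ in range(n_failures):
        failcount[active] += 1  # the resource fails on the active node
        scores = {
            name: base
            + (stickiness if name == active else 0)
            + failcount[name] * failure
            for name, base in base_scores.items()
        }
        # Only other nodes with a non-negative score that strictly beats
        # the active node's score are allowed to take over.
        candidates = [
            name for name in scores
            if name != active
            and scores[name] >= 0
            and scores[name] > scores[active]
        ]
        if candidates:
            active = max(candidates, key=scores.get)
        history.append(active)
    return history


print(replay({"mysql1": 200, "mysql2": 150}, 7))
# ['mysql1', 'mysql1', 'mysql2', 'mysql2', 'mysql2', 'mysql1', 'mysql1', 'mysql1']
```

The resource moves to mysql2 on the second failure, returns to mysql1 after three failures on mysql2, and then stays on mysql1 because the score of mysql2 is negative.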
In practice, we usually put associated resources together to form a resource group. Once one resource in the group has a problem, all resources in the group need to be migrated together. This is not much different from the single-resource case above. You just need to apply the settings shown above for mysql_db to the resource group instead, as shown below:
...
... ...
...
In this way, if any resource in the resource group has a problem, the whole group is considered to have a problem. If the node's score then falls below that of another node, the entire resource group is switched.
In addition, the values INFINITY and -INFINITY are mainly used to control whether switching happens at all. They represent a positive infinite score and a negative infinite failure score, and they give a simple way to configure extreme rules.
In general, the number of failures a resource (or resource group) can have on one node before it is migrated to another can be estimated with the following formula:
(nodeA score - nodeB score + stickiness) / abs(failure stickiness)
That is, the score of node A minus the score of node B plus the resource's running score, divided by the absolute value of the resource's failure score. With the values above, (200 - 150 + 100) / abs(-100) = 1.5, so the resource is migrated from mysql1 to mysql2 on the second failure.
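As a rough check, the formula can be evaluated in Python. This sketch assumes, as stated earlier in the article, that a switch needs the other node's score to be strictly greater, so the migration happens at the first whole failure count above the ratio. The function names are hypothetical.

```python
import math


def failure_ratio(node_a, node_b, stickiness, failure_stickiness):
    """(nodeA score - nodeB score + stickiness) / abs(failure stickiness)."""
    if failure_stickiness == 0:
        # With no failure penalty the score never drops, so the
        # resource is only restarted and never migrated.
        return math.inf
    return (node_a - node_b + stickiness) / abs(failure_stickiness)


def failures_until_migration(node_a, node_b, stickiness, failure_stickiness):
    """First failure count at which node A's score is strictly below node B's."""
    ratio = failure_ratio(node_a, node_b, stickiness, failure_stickiness)
    if math.isinf(ratio):
        return ratio
    return math.floor(ratio) + 1


print(failure_ratio(200, 150, 100, -100))             # 1.5
print(failures_until_migration(200, 150, 100, -100))  # 2
```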
Original article address: Heartbeat's switching policy-points statistics method. Thanks to the original author for sharing.