Heartbeat CRM configuration for a Linux high-availability solution


By default, Heartbeat cannot monitor resources: if a resource crashes, no action is taken. Heartbeat acts only when it considers the other machine dead, that is, crashed, network disconnected, and so on. This obviously does not meet our goal. To meet it, we need to adopt the CRM (cluster resource management) mode.
The goal of this article is to let HA automatically monitor the running state of its resources: bring up the service IP address 192.168.0.222 and run the script echo.sh automatically.
The echo.sh script content is as follows:
#!/bin/bash
echo "ii" > /var/lib/heartbeat/crm/1.txt
exit 0
Configure CRM
First, install and configure Heartbeat in the default mode.

After the default mode works, add the following line to ha.cf:
crm on
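For reference, a minimal ha.cf with CRM enabled might look like the sketch below; the node names and interface are placeholders (ha1 and ha2 match the node names used later in this article):
logfile /var/log/ha-log
keepalive 2        # heartbeat interval, in seconds
deadtime 30        # declare a peer dead after 30 seconds of silence
bcast eth0         # send heartbeats by broadcast on eth0
node ha1           # must match the output of uname -n on each node
node ha2
crm on             # enable the CRM mode described above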
Convert the haresources resource file to cib.xml with the conversion script that ships with Heartbeat 2.1.3:
# /usr/lib/heartbeat/haresources2cib.py haresources
The output file is /var/lib/heartbeat/crm/cib.xml.
If the hacluster user and haclient group were created after Heartbeat was installed, fix the ownership of the heartbeat directories with the following commands:
# find / -type d -name "heartbeat" -exec chown -R hacluster {} \;
# find / -type d -name "heartbeat" -exec chgrp -R haclient {} \;
In version 2.0, ipfail conflicts with the CRM mode, so ipfail cannot be enabled in ha.cf.
Modifying the cib.xml file
Heartbeat resources in cib.xml come in two configuration styles: OCF and LSB. Before modifying the file, here is the difference between the two:
An LSB-style script must support the status function; that is, it must accept the start, stop, and status parameters. LSB-style startup scripts live in the /usr/lib/lsb/resource.d/heartbeat directory.
For example, when an LSB-style script is run as ./mysql status:
if the output contains OK or running, the resource is considered normal;
if the output contains stopped or No, the resource is considered failed.
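As a sketch, a script satisfying these LSB conventions might look like the following; the daemon name mydaemon is hypothetical:
#!/bin/bash
# Minimal LSB-style resource script sketch (hypothetical daemon "mydaemon").
case "$1" in
start)
    /usr/local/bin/mydaemon &        # start the service
    ;;
stop)
    killall mydaemon 2>/dev/null     # stop the service
    ;;
status)
    if pgrep -x mydaemon >/dev/null; then
        echo "mydaemon is running"   # output contains "running": resource OK
    else
        echo "mydaemon is stopped"   # output contains "stopped": resource failed
    fi
    ;;
*)
    echo "Usage: $0 {start|stop|status}"
    exit 1
    ;;
esac
exit 0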
An OCF-style script must support the start, stop, and monitor parameters; the monitor parameter is used to monitor the resource. OCF-style startup scripts live in /usr/lib/ocf/resource.d/heartbeat (the directory may differ on your machine; search for ocf).
For an OCF-style script, when it is run as ./mysql monitor:
an exit code of 0 means the resource is normal;
an exit code of 7 means the resource has failed (is not running).
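A minimal OCF-style script follows the same pattern but implements monitor and reports status through numeric exit codes. A sketch, again with a hypothetical daemon (real OCF agents also implement meta-data and validate-all, omitted here):
#!/bin/bash
# Minimal OCF-style agent sketch (hypothetical daemon "mydaemon").
case "$1" in
start)
    /usr/local/bin/mydaemon &
    exit 0                     # 0 = success
    ;;
stop)
    killall mydaemon 2>/dev/null
    exit 0
    ;;
monitor)
    if pgrep -x mydaemon >/dev/null; then
        exit 0                 # 0 = resource is running normally
    else
        exit 7                 # 7 = OCF_NOT_RUNNING, resource is stopped
    fi
    ;;
*)
    exit 3                     # 3 = unimplemented action
    ;;
esac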
In Heartbeat 2.1.3 the default resource configuration style is OCF. You can switch a resource's configuration style from OCF to LSB in either of two ways:
1. Modify cib.xml, changing the resource class from ocf to lsb, and copy the script into /usr/lib/lsb/resource.d/heartbeat (if the directory does not exist, create it manually and grant ownership to hacluster:haclient).
2. Modify the script under /usr/lib/ocf/resource.d/heartbeat so that it works properly, or copy the corresponding script from /etc/init.d/ and extend it to support the monitor operation.
Start the heartbeat service after the configuration is complete:
# /etc/init.d/heartbeat start
Create Cluster Resources
You can create the following types of resources:
Primitive (original) resource: the most basic type of resource.
Resource group: a group contains a series of resources that need to be located together, started sequentially, and stopped in the reverse order.
Clone resource: a clone is a resource that can be active on multiple hosts. Any resource can be cloned, provided its resource agent supports it.
Master resource: a special type of clone resource that can be in one of multiple modes. A master resource must contain exactly one group or one regular resource.
Resource options
You can define options for each resource you add. The cluster uses these options to decide how the resource should behave; they tell the CRM how to treat a specific resource. Resource options can be set with the crm_resource --meta command or with the GUI.
Primitive resource options:
priority: if not all resources can be active, the cluster stops lower-priority resources to keep higher-priority ones active.
target-role: the state the cluster should try to keep this resource in. Allowed values: Stopped and Started.
is-managed: whether the cluster is allowed to start and stop the resource. Allowed values: true and false.
resource-stickiness: how strongly the resource prefers to stay where it is. The default is taken from default-resource-stickiness.
migration-threshold: how many failures may occur for this resource on a node before the node becomes ineligible to host it. Default: none.
multiple-active: what the cluster should do if the resource is ever found active on more than one node. Allowed values: block (mark the resource as unmanaged), stop_only, and stop_start.
failure-timeout: how many seconds to wait before acting as if the failure had never occurred (and potentially allowing the resource back onto the node where it failed). Default: never.
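As a sketch of setting one of these options from the command line with crm_resource (flag spellings vary between Heartbeat and Pacemaker versions, and the resource name vip is hypothetical):
crm_resource --resource vip --set-parameter target-role \
             --meta --parameter-value Stopped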
Resource Operations
By default, the cluster does not ensure that your resources stay healthy. To instruct the cluster to do this, add a monitoring operation to the resource's definition. Monitoring operations can be added for all classes of resource agents. An operation has the following fields:
id: your name for the operation. Must be unique.
name: the operation to perform. Common values: monitor, start, and stop.
interval: how frequently to perform the operation, in seconds.
timeout: how long to wait before declaring the operation failed.
requires: what conditions must be met before the operation can run. Allowed values: nothing, quorum, and fencing. The default depends on whether fencing is enabled and whether the resource's class is stonith; for STONITH resources, the default is nothing.
on-fail: the action to take if the operation fails. Allowed values:
ignore: pretend the resource did not fail.
block: perform no further operations on the resource.
stop: stop the resource and do not start it elsewhere.
restart: stop the resource and start it again (possibly on a different node).
fence: fence (STONITH) the node on which the resource failed.
standby: move all resources away from the node on which the resource failed.
enabled: if false, the operation is treated as if it does not exist. Allowed values: true and false.
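Combined into a crm shell resource definition, these fields might look like the following sketch (the resource name is hypothetical; lsb:mysql assumes an LSB mysql init script):
primitive mysql-server lsb:mysql \
    op monitor interval="30s" timeout="20s" on-fail="restart" \
    op start timeout="60s" \
    op stop timeout="60s" on-fail="fence"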
Parameters contained in a primitive resource
Meta attributes: options that can be added to a resource. They tell the CRM how to handle the specific resource.
Instance attributes: parameters of a specific resource class, used to determine the behavior of that class and of the service instance it controls.
Operations: monitoring operations you can add to a resource. They instruct the cluster to make sure the resource is still healthy. Monitoring operations can be added for all resource agent classes, and you can set operation-specific parameters, such as the timeout for the start or stop operation.
Defining a primitive resource:
primitive <unique id> <resource agent class>:<resource agent provider>:<resource agent name> [params <instance attributes>] [op <operations>] [meta <meta attributes>]
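Instantiated for the virtual IP used in this article, that syntax might look like this sketch (the resource ID vip is hypothetical; ip is the standard parameter of the IPaddr agent):
primitive vip ocf:heartbeat:IPaddr \
    params ip="192.168.0.222" \
    op monitor interval="10s" \
    meta target-role="Started"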
migration-threshold defines the number of failures a resource may have. Suppose a preferred-location constraint has been configured so the resource runs on a particular node. On a failure, the system compares the failure count with the migration-threshold; if failure count >= migration-threshold, the resource is migrated to the next node of choice.
By default, once a resource has failed that often, it can run on the node again only after the administrator manually resets the resource's failure count (after fixing the cause of the failure).
However, you can set the resource's failure-timeout option to expire the failure count. If you set migration-threshold=2 and failure-timeout=60s, the resource migrates to a new node after two failures and may be allowed to move back after one minute (depending on stickiness and constraint scores).
There are two exceptions to the migration-threshold concept: when a resource fails to start or fails to stop. A start failure sets the failure count to INFINITY and therefore always causes an immediate migration. A stop failure causes fencing (this is the default when stonith-enabled is set to true); if no STONITH resource is defined (or stonith-enabled is set to false), the resource is not migrated at all.
To reset a resource's failure count, run the crm_resource -C and crm_failcount -D commands for the given resource on the given node.
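A hedged sketch of those commands, assuming a resource vip on node ha1 (flag spellings differ between versions; the Heartbeat 2 era tools used -H/-U for the node, newer ones use -N):
crm_resource -C -r vip -H ha1     # clean up the resource's state, including the fail count
crm_failcount -D -r vip -U ha1    # delete the fail count attribute directly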
If you set a resource's initial state to stopped when creating it (target-role meta attribute set to stopped), the resource does not start automatically after creation. To start it, run: crm resource start <resource id>.
Configure resource monitoring (a monitor operation can be defined with op monitor)
Although High Availability Extension can detect node failures, it can also detect when an individual resource on a node fails. If you want to make sure a resource is running, configure resource monitoring for it. Resource monitoring consists of specifying a timeout and/or start delay and an interval; the interval tells the CRM how often to check the resource's status.
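For example, extending the vip sketch from above with a timeout and start delay (the start-delay operation attribute name is an assumption):
primitive vip ocf:heartbeat:IPaddr \
    params ip="192.168.0.222" \
    op monitor interval="10s" timeout="20s" start-delay="5s"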
Configure resource constraints
Configuring all resources is only part of the job. Even if the cluster knows all the necessary resources, it may still not handle them correctly. Resource constraints let you specify which cluster nodes resources run on, the order in which resources are loaded, and which other resources a specific resource depends on.
There are three different kinds of constraints:
Resource location: location constraints define which nodes a resource can, cannot, or would prefer to run on.
Resource colocation: colocation constraints tell the cluster which resources may or may not run together on a node.
Resource order: ordering constraints define the sequence of actions.
When defining constraints, you also need to specify scores. Scores are an important part of how the cluster works; in fact, everything from migrating resources to deciding which resources to stop in a degraded cluster is achieved by manipulating scores in some way. Scores are calculated per resource: any node with a negative score for a resource cannot run that resource. After calculating a resource's scores, the cluster chooses the node with the highest score. INFINITY is currently defined as 1,000,000. Additions and subtractions involving infinity follow three basic rules:
Any value + INFINITY = INFINITY
Any value - INFINITY = -INFINITY
INFINITY - INFINITY = -INFINITY
When defining resource constraints, you can also specify a score for each constraint. The score indicates the value you assign to the constraint; constraints with higher scores are applied before constraints with lower scores. By creating additional location constraints with different scores for a given resource, you can specify the order of target nodes for failover.
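For example, two location constraints with different scores would make ha1 the preferred node for the hypothetical vip resource and ha2 the first failover target (sketch):
location vip-prefers-ha1 vip 200: ha1
location vip-fallback-ha2 vip 100: ha2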
Resource failover node
A resource is automatically restarted when it fails. If it cannot be restarted on the current node, or it fails N times on the current node, it attempts to fail over to another node. You can define the number of failures (migration-threshold) after which the resource migrates to a new node.
Resource failback (resource stickiness)
When the original node is back online and in the cluster, a resource may fail back to it. To prevent a resource from failing back to the node it ran on before failover, or to have it fail back to a different node, change the resource's stickiness value. You can specify resource stickiness when creating a resource or afterwards.
When specifying the resource stickiness value, consider the following:
0: the default. The resource is placed optimally in the system, which may mean it is moved when a better or less-loaded node becomes available. This option is almost equivalent to automatic failback, except that the resource may be moved to a node other than the one it was previously active on.
Value greater than 0: the resource prefers to stay where it is but will move if a more suitable node becomes available. Higher values mean a stronger preference to stay put.
Value less than 0: the resource prefers to move away from its current location. Higher absolute values mean a stronger preference to leave.
INFINITY: the resource stays where it is unless it is forced off (node shutdown, node standby, migration-threshold reached, or configuration change). This option is almost equivalent to disabling automatic failback.
-INFINITY: the resource always moves away from its current location.
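A sketch of both ways to set stickiness, reusing the hypothetical vip resource:
property default-resource-stickiness="100"    # cluster-wide default
primitive vip ocf:heartbeat:IPaddr \
    params ip="192.168.0.222" \
    meta resource-stickiness="INFINITY"       # pin this resource to its current node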
Defining a location constraint: location <unique location id> <resource id> <rules>
For example:
location Prefer-Node1 ldirectord \
    rule $id="prefer-node1-rule" 100: #uname eq NODE1
Resource colocation constraints
The colocation command defines which resources should run on the same host and which should run on different hosts. It is generally used in a sequence like the following:
crm(live)configure# order rsc1 rsc2
crm(live)configure# colocation rsc2 rsc1
Only the scores +INFINITY and -INFINITY define resources that must always, or must never, run on the same node. For example, to always run the two resources with the IDs filesystem_resource and nfs_group on the same host, use the following constraint:
crm(live)configure# colocation nfs_on_filesystem inf: nfs_group filesystem_resource
For master/slave configurations, it is not enough that resources run together on the same node; you also need to know whether the current node is the master. This can be checked by additionally specifying the to_role or from_role attribute on the constraint.
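In the crm shell the role is written as a suffix on the resource name; for example, to keep a hypothetical mysql_group on whichever node is master of the ms_drbd_mysql resource defined later in this article (sketch):
colocation mysql_on_drbd_master inf: mysql_group ms_drbd_mysql:Master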
Ordering constraints:
Sometimes it is necessary to specify the order in which resources start. For example, you cannot mount a file system before its device is available to the system. Ordering constraints can start or stop a service before or after another resource meets a particular condition, such as started, stopped, or promoted to master. Configure an ordering constraint in the crm shell like this:
crm(live)configure# order nfs_after_filesystem mandatory: filesystem_resource nfs_group
Configure a cluster Resource Group
Some cluster resources depend on other components or resources and require each component or resource to start in a specific order and run on the same server. To simplify this configuration, the concept of resource groups was introduced.
Resource groups have the following properties:
Starting and stopping: resources are started in the order in which they appear in the group and stopped in the reverse order.
Dependency: if a resource in the group cannot run somewhere, none of the resources listed after it in the group may run either.
Contents: a group may contain only primitive cluster resources. To refer to the child of a group resource, use the child's ID instead of the group's ID.
Constraints: although a group's children can be referenced in constraints, the group's name is usually used instead.
Stickiness: stickiness is additive in groups; every active member of the group contributes its stickiness value to the group's total. So with a default resource-stickiness of 100 and a group of seven members, five of them active, the group as a whole prefers its current location with a score of 500.
Defining a resource group: group <unique group id> <list of resources>
For example:
group Load-Balancing Virtual-IP-Tomcat ldirectord
Configure clone resource:
You may want certain resources to run on multiple nodes of the cluster at the same time. To do this, configure the resource as a clone. Examples of resources that might be configured as clones include STONITH and cluster file systems (such as OCFS2). Any resource can be cloned, provided its resource agent supports it. Clone resources can even be configured differently depending on which node they are hosted on.
There are three types of resource clones:
Anonymous clones: the simplest type. They behave identically everywhere they run; therefore, only one instance of an anonymous clone can be active per machine.
Globally unique clones: these instances are distinct entities. An instance running on one node is not equivalent to an instance running on another node, nor are any two instances on the same node equivalent to each other.
Stateful clones: active instances of these resources are in one of two states, active or passive, also called master and slave or primary and secondary. A stateful clone can be either an anonymous clone or a globally unique clone.
Defining a clone resource: clone <unique clone id> <resource id>
For example:
clone cl-tomcat
clone cl-mysql
A stateful clone example:
primitive drbd_mysql ocf:linbit:drbd \
    params drbd_resource="mysql" \
    op monitor interval="15s"
ms ms_drbd_mysql drbd_mysql \
    meta master-max="1" master-node-max="1" \
        clone-max="2" clone-node-max="1" \
        notify="true"
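Such a stateful clone is typically tied to whatever needs its master side. A hedged sketch with a hypothetical filesystem resource (device, mount point, and fstype are placeholders):
primitive fs_mysql ocf:heartbeat:Filesystem \
    params device="/dev/drbd0" directory="/var/lib/mysql" fstype="ext3"
colocation fs_on_drbd_master inf: fs_mysql ms_drbd_mysql:Master        # mount only on the master
order fs_after_drbd_promote inf: ms_drbd_mysql:promote fs_mysql:start  # mount only after promotion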
Set other CRM attributes
For a cluster with two nodes, set no-quorum-policy to ignore so that when one node goes down, the other can still run normally. Set start-failure-is-fatal to false, which allows the migration-threshold attribute to be honored for each resource. If no STONITH resource is defined, stonith-enabled must be set to false.
property no-quorum-policy="ignore" \
    start-failure-is-fatal="false" \
    stonith-enabled="false"
Migrate Cluster Resources
crm(live)# resource
crm(live)resource# migrate VIP node2
Start/stop resources:
crm resource start <resource-id>
crm resource stop <resource-id>
Run the following commands on a specific node:
To turn a node into a standby node:
crm node standby
To turn a node back into an active node:
crm node online
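A typical maintenance sequence combining these commands might look like this (sketch; unmigrate removes the temporary constraint that migrate creates):
crm node standby                  # evacuate all resources from this node
crm node online                   # rejoin the cluster as an active node
crm resource migrate VIP node2    # move the VIP resource to node2
crm resource unmigrate VIP        # let VIP move back according to normal scores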
cib.xml
The full cib.xml listing did not survive; only the author's inline comments remain. In order, they covered:
# Configure resource metadata
# Set the default score to 0
# Set the default score loss to 0
# Configure resource group attributes
# Configure the ip score
# Configure the ip score loss
# Remove the interval value to avoid running the echo.sh script continuously
# Configure resource constraints
# Configure the default score for the ip resource on ha1
# Configure the default score for the script resource on ha1
# Configure the default score for the ip resource on ha2
# Configure the default score for the script resource on ha2
Note from the author on the constraints: if the rsc field references a resource group such as group_1 and only per-resource scores are set, without a default, the constraint is invalid. The author could not successfully configure a score on a resource group and asks readers who have succeeded to share their configuration experience.

Author: CzmMiao's blog Life
