RHCS principles and operations

RHCS component introduction:
1.1 Distributed Cluster Manager (CMAN)
Cluster Manager (CMAN) is a distributed cluster management tool that runs on every node of the cluster and performs cluster management tasks for RHCS. CMAN manages cluster members, messages, and notifications. It monitors the running status of each node in order to track the membership relationships among nodes. When a node in the cluster fails, the membership changes; CMAN promptly notifies the underlying layers of this change so that they can make the corresponding adjustments.
1.2 Distributed Lock Manager (DLM)
The Distributed Lock Manager (DLM) is an underlying component of RHCS that provides a common lock operation mechanism for the cluster. In an RHCS cluster, DLM runs on every node. GFS uses the lock manager to synchronize access to file system metadata, and CLVM uses it to synchronize updates to LVM volumes and volume groups. DLM does not require a dedicated lock management server; it uses a peer-to-peer lock management model, which greatly improves processing performance and avoids the performance bottleneck of a full recovery when a single node fails. Furthermore, DLM lock requests are local and require no network round trip, so they take effect immediately. Finally, DLM uses a layered mechanism to provide parallel lock modes across multiple lock spaces.
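To see the lock spaces DLM is managing on a node, you can list them from the command line (a hedged illustration: depending on the cluster suite version, the utility is group_tool or dlm_tool, and the lock space names would match your own GFS2 file systems):
[root@web1 ~]# group_tool ls    # cluster 2.x: lists fence, dlm, and gfs groups
[root@web1 ~]# dlm_tool ls      # cluster 3.x: lists active DLM lock spaces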
1.3 Configuration File Management (CCS)
The Cluster Configuration System (CCS) manages the cluster configuration file and synchronizes it between nodes. CCS runs on every node of the cluster and monitors the state of the single configuration file /etc/cluster/cluster.conf on each node. When this file changes, CCS propagates the update to every node in the cluster, so the configuration files stay synchronized at all times. For example, if the administrator updates the cluster configuration file on node A, CCS detects the change and immediately transmits it to the other nodes. The RHCS configuration file is cluster.conf, an XML file that contains the cluster name, cluster node information, cluster resource and service definitions, and fence device settings.
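For orientation, a heavily abridged cluster.conf might look like the sketch below (the structure follows the elements just listed; the fence agent, IP address, and credentials are invented placeholders):

<?xml version="1.0"?>
<cluster name="mycluster" config_version="35">
  <clusternodes>
    <clusternode name="web1" nodeid="4" votes="1">
      <fence>
        <method name="1">
          <device name="ilo-web1"/>
        </method>
      </fence>
    </clusternode>
    <!-- additional clusternode entries for web2, Mysql1, Mysql2 -->
  </clusternodes>
  <fencedevices>
    <fencedevice name="ilo-web1" agent="fence_ilo" ipaddr="192.168.12.99" login="admin" passwd="secret"/>
  </fencedevices>
  <rm>
    <!-- failover domains, resources, and service definitions -->
  </rm>
</cluster>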
1.4 Fence Device (FENCE)
FENCE devices are an essential part of an RHCS cluster; they prevent the "split-brain" phenomenon that unpredictable failures can otherwise cause. A FENCE device issues hardware management commands directly to a server or storage unit, through the server's or storage's hardware management interface or through an external power management device, to restart or shut down the server or disconnect it from the network. FENCE works as follows: when a host hangs or goes down unexpectedly, the standby node first invokes the FENCE device to restart the abnormal host or isolate it from the network; after the FENCE operation completes successfully, the result is returned to the standby node, which then takes over the services and resources of the failed host. In this way, the FENCE device releases the resources occupied by the abnormal node and ensures that resources and services always run on exactly one node. RHCS FENCE devices fall into two types: internal FENCE and external FENCE. Common internal FENCE devices include IBM RSA II cards, HP iLO cards, and IPMI devices; external FENCE devices include UPS units, SAN switches, and network switches.
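Fencing normally happens automatically, but it can also be triggered by hand for testing (a hedged example; fence_node reads the fence device definitions from cluster.conf, and web2 is simply the node name used elsewhere in this article):
[root@web1 ~]# fence_node web2    # fence node web2 using its configured FENCE device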
RHCS operation:
I. Start the RHCS Cluster
The core processes of the RHCS cluster include cman and rgmanager. To start the cluster, execute the following commands on each node of the cluster in sequence:
service cman start
service rgmanager start
Note that these two commands must be executed in order: start cman first, then rgmanager. Only after the cman service has started successfully on all cluster nodes should you start the rgmanager service on each node in turn.
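To have both services start automatically at boot on each node, you can register them with chkconfig (a hedged convenience step; run this on every cluster node):
[root@web1 ~]# chkconfig cman on
[root@web1 ~]# chkconfig rgmanager on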

II. Stop the RHCS Cluster
The commands to stop the RHCS cluster are as follows:
service rgmanager stop
service cman stop
First stop the rgmanager service on each node of the cluster; after the rgmanager service has stopped successfully on all nodes, stop the cman service on each node in turn to shut the cluster down.
If stopping the cman service fails or hangs, check whether the locally mounted GFS2 shared file system has been unmounted, and check whether the rgmanager service on the other nodes has been stopped properly.
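A typical shutdown sequence on one node therefore looks like this (a hedged sketch; /gfs2 is the example mount point used later in this article):
[root@web1 ~]# umount /gfs2             # unmount the GFS2 shared file system first
[root@web1 ~]# service rgmanager stop
[root@web1 ~]# service cman stop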

III. Manage Application Services
After the cluster system starts, application services are started automatically by default. If a service does not start automatically, you must start it manually. The command for managing application services is clusvcadm, which can start, stop, restart, and relocate the application services in the cluster.
1. Start an Application Service
You can start the application service of a node as follows:
clusvcadm -e <service> -m <node>
Where:
- <service>: the name of the application service created in the cluster.
- <node>: the name of the cluster node.
For example, to start the webserver service on node web1, perform the following operations:
[root@web1 ~]# clusvcadm -e webserver -m web1
Member web1 trying to enable service: webserver... Success
Service: webserver is now running on web1
You can view the details of starting the application service in the /var/log/messages file. After webserver starts, the service-related cluster resources, such as the virtual IP address and the application service script, are also brought up. You can then check whether these cluster resources have been loaded properly.
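For example (a hedged illustration; the interface name eth0 is an assumption), you could verify the virtual IP address and the service state with:
[root@web1 ~]# ip addr show eth0          # the service's virtual IP should appear on the interface
[root@web1 ~]# clustat | grep webserver   # the service should be in the "started" state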
2. Stop an Application Service
You can stop the application service of a node as follows:
clusvcadm -s <service> -m <node>
For example, to stop the mysqlserver service on node Mysql1, perform the following operations:
[root@Mysql1 ~]# clusvcadm -s mysqlserver -m Mysql1
Member Mysql1 stopping service: mysqlserver... Success
You can view the details of stopping the application service in the /var/log/messages file. When mysqlserver stops, the service-related cluster resources, such as the virtual IP address and the application service script, are also released.
3. Restart an Application Service
You can restart the application service of a node as follows:
clusvcadm -R <service> -m <node>
For example, to restart the webserver service on node web1, perform the following operations:
[root@web2 ~]# clusvcadm -R webserver -m web1
Member web1 trying to restart service: webserver... Success
This command was executed on node web2, yet it successfully restarted the webserver service on node web1. This shows that the clusvcadm command can be executed on any node in the cluster.
4. Switch a service
You can switch an application service from one node to another by using the following method:
clusvcadm -r <service> -m <node>

For example, to switch the webserver service from node web1 to node web2, perform the following operations:
[root@web1 ~]# clusvcadm -r webserver -m web2
Trying to relocate service: webserver to web2... Success
Service: webserver is now running on web2
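If you need to take a service offline entirely rather than move it to another node, clusvcadm also supports disabling and re-enabling (a hedged note; see "clusvcadm -h" for the full option list):
[root@web1 ~]# clusvcadm -d webserver    # disable the service cluster-wide
[root@web1 ~]# clusvcadm -e webserver    # enable it again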

IV. Monitor the RHCS Cluster Status
By monitoring the RHCS cluster, you can track the health of each node, identify problems, and resolve them promptly. RHCS provides a variety of status commands; this section describes how to use cman_tool, clustat, and ccs_tool.
1. cman_tool command
cman_tool has many parameters, but its usage is relatively simple. The basic format is as follows:
cman_tool <subcommand> [options]
The following are some simple examples:
[root@web1 ~]# cman_tool nodes -a
Node  Sts   Inc   Joined               Name
   0   M       0  2010-08-23 01:24:00  /dev/sdb7
   1   M    2492  2010-08-23 01:22:43  web2
       Addresses: 192.168.12.240
   2   M    2492  2010-08-23 01:22:43  Mysql1
       Addresses: 192.168.12.231
   3   M    2492  2010-08-23 01:22:43  Mysql2
       Addresses: 192.168.12.20
   4   M    2488  2010-08-23 01:22:43  web1
       Addresses: 192.168.12.230
This command displays the node name, the corresponding node IP address, and the time when the node was added to the cluster.
To learn more about cluster nodes, run the following command:
[root@web1 ~]# cman_tool status
Version: 6.2.0
Config Version: 35                      # cluster configuration file version
Cluster Name: mycluster                 # cluster name
Cluster Id: 56756
Cluster Member: Yes
Cluster Generation: 2764
Membership state: Cluster-Member
Nodes: 4                                # number of cluster nodes
Expected votes: 6                       # expected number of votes
Quorum device votes: 2                  # votes contributed by the quorum disk
Total votes: 6                          # total of all vote values in the cluster
Quorum: 4                               # quorum value; if votes fall below this, the cluster stops providing services
Active subsystems: 9
Flags: Dirty
Ports Bound: 0 177
Node name: web1
Node ID: 4                              # ID of the current node in the cluster
Multicast addresses: 239.192.221.146    # cluster multicast address
Node addresses: 192.168.12.230          # IP address of the current node
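If nodes are taken down for maintenance and the cluster risks losing quorum, the expected-votes value shown above can be lowered at runtime (a hedged example; adjust vote values with care, since they directly affect quorum):
[root@web1 ~]# cman_tool expected -e 4    # tell cman to expect only 4 votes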
2. clustat command
The clustat command is very simple to use. For details, run "clustat -h" to obtain help information. Here are just a few examples.

[root@web1 ~]# clustat -i 3
Cluster Status for mycluster @ Mon Aug 23 18:54:15 2010
Member Status: Quorate

 Member Name                   ID   Status
 ------ ----                   --   ------
 web2                           1   Online, rgmanager
 Mysql1                         2   Online, rgmanager
 Mysql2                         3   Online, rgmanager
 web1                           4   Online, Local, rgmanager
 /dev/sdb7                      0   Online, Quorum Disk

 Service Name                  Owner (Last)      State
 ------- ----                  ----- ------      -----
 service:mysqlserver           Mysql1            started
 service:webserver             web1              started
The output is interpreted as follows:
The "-i" option of clustat displays the running status of each node and service in the cluster in real time; "-i 3" refreshes the cluster status every three seconds.
In this output, every node is in the "Online" state, indicating that all nodes are running normally. If a node left the cluster, its status would be "Offline". The two cluster services are both in the "started" state, running on Mysql1 and web1 respectively.
In addition, the "ID" column shows the mapping between node names and node IDs: web2 is node 1 in this cluster, and likewise web1 is node 4. Understanding the node ordering helps you interpret the cluster logs.
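For scripting, clustat can also emit its status as XML (a hedged note based on clustat's option list), which is easier to parse than the table above:
[root@web1 ~]# clustat -x    # print cluster status as XML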

3. ccs_tool command
ccs_tool is used to manage the cluster configuration file cluster.conf. With ccs_tool you can add and delete nodes, add and delete fence devices, and update the cluster configuration file across the cluster.
Here are a few usage examples of ccs_tool:
After modifying the configuration file on a node, you can run the "ccs_tool update" command to update the configuration file on all nodes. For example:
[root@web1 cluster]# ccs_tool update /etc/cluster.conf
Proposed updated config file does not have greater version number.
Current config_version: 35
Proposed config_version: 35
Failed to update config file.
ccs_tool uses the "config_version" value in cluster.conf to decide whether an update is needed. Therefore, after modifying cluster.conf, be sure to increment its config_version value; only then will running ccs_tool actually update the configuration file.
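For example (a hedged illustration), the attribute to increment is on the root element of cluster.conf; after raising the version from 35 to 36, its first element reads:

<cluster name="mycluster" config_version="36">

Rerunning the update then succeeds: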
[root@web1 cluster]# ccs_tool update /etc/cluster.conf
Config file updated from version 35 to 36
Update complete.
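ccs_tool can also inspect the node and fence device definitions directly (a hedged example; the output depends on your cluster.conf):
[root@web1 cluster]# ccs_tool lsnode     # list the nodes defined in cluster.conf
[root@web1 cluster]# ccs_tool lsfence    # list the fence devices defined in cluster.conf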

V. Manage and Maintain the GFS2 File System
The GFS2 file system provides many management and maintenance tools, including gfs2_fsck, gfs2_tool, gfs2_jadd, gfs2_quota, and gfs2_convert. The usage of the first three commands is described here.

1. gfs2_fsck command
The gfs2_fsck command is similar to the fsck.ext3 command for the ext3 file system; it is mainly used to detect and repair file system errors. GFS2 also provides an fsck.gfs2 command, which is identical to gfs2_fsck.
gfs2_fsck is used as follows:
gfs2_fsck [-afhnpqvVy] <device>
The following are examples:
[root@Mysql1 ~]# gfs2_fsck -y /dev/sdb5
Initializing fsck
Validating Resource Group index.
Level 1 RG check.
(Level 1 passed)
Starting pass1
Starting pass1c
Pass1c complete
............
Pass5 complete
gfs2_fsck complete
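Note that, as with other fsck tools, the file system should be unmounted on every node before gfs2_fsck is run (a hedged reminder; /gfs2 and /dev/sdb5 are the example mount point and device used in this article):
[root@Mysql1 ~]# umount /gfs2
[root@Mysql1 ~]# gfs2_fsck -y /dev/sdb5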

2. gfs2_tool command
The gfs2_tool command has many parameters, but its usage is not complicated. It is mainly used to view and modify parameters of the GFS2 file system.
The following are examples:
1) View the mount information of the GFS2 file system
[root@web1 ~]# gfs2_tool df /gfs2
/gfs2:
  SB lock proto = "lock_dlm"
  SB lock table = "mycluster:my-gfs2"
  SB ondisk format = 1801
  SB multihost format = 1900
  Block size = 4096
  Journals = 4
  Resource Groups = 19
  Mounted lock proto = "lock_dlm"
  Mounted lock table = "mycluster:my-gfs2"
  Mounted host data = "jid=2:id=65539:first=0"
  Journal number = 2
  Lock module flags = 0
  Local flocks = FALSE
  Local caching = FALSE

  Type    Total Blocks  Used Blocks  Free Blocks  use%
  ------------------------------------------------------
  data    1220724       136578      1084146      11%
  inodes  1084263       117         1084146       0%

2) Lock and unlock the GFS2 File System:
[root@node1 gfs2]# gfs2_tool freeze /gfs2
[root@node1 gfs2]# gfs2_tool unfreeze /gfs2
After the GFS2 file system is frozen, no read/write operations can be performed on it until it is unfrozen.
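A typical use of freeze is to quiesce the file system just long enough to take a consistent backup (a hedged sketch; the dd backup step and paths are examples only):
[root@node1 gfs2]# gfs2_tool freeze /gfs2
[root@node1 gfs2]# dd if=/dev/sdb5 of=/backup/gfs2.img    # example: image the underlying device
[root@node1 gfs2]# gfs2_tool unfreeze /gfs2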
3) Query the number of nodes that can mount the GFS2 file system
[root@web1 ~]# gfs2_tool journals /gfs2
journal2 - 128MB
journal3 - 128MB
journal1 - 128MB
journal0 - 128MB
4 journal(s) found.
This shows that up to four nodes can mount the file system at the same time (one journal per node), and that each journal is 128 MB.
4) Display the GFS2 version information:
[root@web1 ~]# gfs2_tool version
gfs2_tool 0.1.62 (built Mar 31 2010 07:34:25)
Copyright (C) Red Hat, Inc. 2004-2006 All rights reserved

3. gfs2_jadd command
gfs2_jadd is mainly used to configure the number and size of GFS2 journals. Its usage is very simple:
gfs2_jadd [-cDhJjqV] /path/to/filesystem
The following are some examples:
Set the size of new journals to 64 MB:
[root@Mysql1 ~]# gfs2_jadd -J 64 /gfs2
Increase the number of journals (and thus the number of nodes that can mount GFS2 simultaneously):
[root@Mysql1 ~]# gfs2_jadd -j 5 /gfs2
In addition, gfs2_quota manages disk quotas on the GFS2 file system, and gfs2_convert is a data conversion tool that updates the metadata of a GFS file system and converts it to GFS2. For more information, see the corresponding man pages.
