Linux cluster resource management and system management

Source: Internet
Author: User
Tags ibm server

This article introduces the main tasks of resource management and system management in a cluster system, and then lists and compares several common resource management and system management software packages.

1. Cluster Job Management
From the user's perspective, a cluster system behaves like a single server or PC that many users can work on at the same time. When too many users compete for it, however, performance degrades. Resource management handles the jobs that users submit and allocates resources to each job sensibly, so that the cluster's computing power is fully used and results are returned as quickly as possible. In brief, cluster resource management consists of two components:

  • Resource Manager: to allocate appropriate resources to jobs, the resource manager maintains a database that records the attributes and status of every resource in the cluster, all user-submitted requests, and the running jobs. The policy manager generates a priority list from this data and the configured scheduling policy, and the resource manager schedules jobs according to that list. The resource manager should also be able to reserve resources. Reservation makes it possible both to hold powerful resources for the jobs that need them and to keep spare resources in reserve for node failures and sudden bursts of computing demand.
  • Job Scheduling Policy Manager: the policy manager obtains the resource status of each node and the system's job information from the resource manager and generates a priority list. This list tells the resource manager when to run which job on which nodes. The policy manager not only offers a rich set of parameters for describing the computing environment and the jobs, but also provides a simple, flexible way to express these definitions, so that the system administrator can implement policy-driven resource scheduling.
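The division of labor between the two components can be illustrated with a minimal Python sketch. It is not how PBS or Maui are implemented: the `Job` fields, the weights, and the scoring formula are all assumptions made up for this example. The policy manager produces a priority list, and the resource manager walks that list while holding back a few reserved nodes.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes_needed: int      # number of nodes the job requests
    queued_minutes: int    # how long the job has been waiting
    user_priority: int     # site-assigned weight (hypothetical)


def priority_list(jobs, wait_weight=1, user_weight=10):
    """Policy manager: order jobs by a weighted score, highest first."""
    def score(job):
        return wait_weight * job.queued_minutes + user_weight * job.user_priority
    return sorted(jobs, key=score, reverse=True)


def schedule(jobs, free_nodes, reserved_nodes=0):
    """Resource manager: walk the priority list and start every job that fits.

    reserved_nodes are held back, for example to cover node failures.
    Jobs that do not fit are skipped, so a smaller job with a lower
    priority may start before a larger one.
    """
    available = free_nodes - reserved_nodes
    started = []
    for job in priority_list(jobs):
        if job.nodes_needed <= available:
            available -= job.nodes_needed
            started.append(job.name)
    return started, available


jobs = [
    Job("render", nodes_needed=4, queued_minutes=30, user_priority=1),
    Job("cfd", nodes_needed=8, queued_minutes=5, user_priority=5),
    Job("blast", nodes_needed=2, queued_minutes=60, user_priority=0),
]
started, left = schedule(jobs, free_nodes=12, reserved_nodes=2)
print(started, left)
```

With these made-up numbers the scores are 60 for `blast`, 55 for `cfd`, and 40 for `render`, so the first two jobs start on the ten unreserved nodes and `render` waits.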

2. Job Management Software in the Beowulf Cluster
There are many options for managing resources in a cluster system. Of these, the PBS resource manager and the Maui job scheduler are the best suited to cluster systems.

2.1 PBS
PBS (Portable Batch System) is a flexible batch processing system developed by NASA. It is used on cluster systems, supercomputers, and massively parallel systems. PBS has the following features:

  • Ease of use: provides a unified interface to all resources and is easy to configure for the needs of different systems. The flexible job scheduler lets each system adopt its own scheduling policy.
  • Portability: complies with the POSIX 1003.2 standard and can be used in shell, batch processing, and other environments.
  • Adaptability: adapts to a variety of management policies and provides an extensible authentication and security model. It supports dynamic distribution of load across a wide area network and virtual organizations built from multiple physical entities in different locations.
  • Flexibility: supports both interactive and batch jobs.

OpenPBS (http://www.OpenPBS.org/) is the open source implementation of PBS. For the commercial version of PBS, see http://www.pbspro.com/.

2.2 Maui
Maui is an advanced job scheduler. It uses aggressive scheduling policies to optimize resource utilization and reduce job response time. Maui's resource and workload management allows advanced configuration of job priority, scheduling and allocation, fairness (fair share), and reservation policy. Maui's QoS mechanism allows preferential delivery of resources and services, policy exemptions, and restricted access to specified features. Maui uses an advanced resource reservation architecture to control precisely when, where, by whom, and how resources are used. This reservation architecture fully supports non-intrusive metascheduling.

Maui's design draws on the experience of the world's largest high-performance computing centers. Maui also provides test tools and a simulator for estimating and tuning system performance.

Maui needs a resource manager to work with. We can think of Maui as a plug-in scheduler for PBS.

For more Maui information, visit: http://www.supercluster.org

3. Cluster System Management
From the perspective of system composition, a cluster system consists of multiple computers. From the end user's perspective, however, it is a single computer; in other words, the structure of the cluster is transparent to users. The goal of cluster system management is therefore to make the cluster as easy to manage as a single computer. In summary, cluster system management generally covers the following tasks:

3.1 Resource Management
Simply put, resource management means allocating system resources and monitoring their usage. "Resource" is a broad concept here: hardware devices, data, and programs can all be regarded as resources, for example CPUs, storage, network cards, and even system events and logs.

3.2 Event Service
An event is a change in system state. For example, "CPU usage exceeds 90%" can be treated as an event. Simply put, the event service is a notification service: when an event occurs, it notifies the parties interested in that kind of event. Event services can operate in push mode (also called publish-subscribe) or pull mode. The system administrator should also be able to use the event service to configure automatic responses to events.
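The push mode can be sketched in a few lines of Python. This is only an illustration of the publish-subscribe idea, not the interface of any particular cluster product; the `EventService` class, the `cpu_high` event name, and the `check_cpu` helper are all hypothetical.

```python
from collections import defaultdict


class EventService:
    """Minimal publish-subscribe (push mode) event service."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # event name -> callbacks

    def subscribe(self, event_name, callback):
        """Register interest in one kind of event."""
        self._subscribers[event_name].append(callback)

    def publish(self, event_name, **details):
        """Notify every subscriber; return how many were notified."""
        callbacks = self._subscribers[event_name]
        for callback in callbacks:
            callback(event_name, details)
        return len(callbacks)


def check_cpu(service, node, cpu_percent, threshold=90):
    """Turn a state change into an event once the threshold is exceeded."""
    if cpu_percent > threshold:
        return service.publish("cpu_high", node=node, cpu_percent=cpu_percent)
    return 0


service = EventService()
log = []
# The callback stands in for an automatic response set up by the administrator.
service.subscribe(
    "cpu_high",
    lambda name, d: log.append(f"{d['node']}: {name} {d['cpu_percent']}%"),
)
check_cpu(service, "node01", 95)   # exceeds 90%, subscribers are notified
check_cpu(service, "node02", 40)   # below the threshold, nothing happens
print(log)
```

In pull mode the interested party would instead poll the service for new events; the sketch leaves that variant out.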

3.3 Distributed Commands and Files
Distributed commands and files means that commands and file operations are executed concurrently on all nodes of the cluster or on a specified group of nodes.

Distributed commands are usually provided through a distributed shell, generally called dsh (distributed shell) or psh (parallel shell). A distributed shell can be implemented on top of rsh or ssh.
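A distributed shell built on ssh can be sketched as follows. This is a simplified illustration, not the actual dsh or psh code: the function names are invented for the example, and it assumes key-based ssh logins to the nodes are already set up. The `fake_runner` stand-in lets the sketch run without a cluster.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_over_ssh(node, command, timeout=30):
    """Run one command on one node through the ssh client."""
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", node, command],
        capture_output=True, text=True, timeout=timeout,
    )
    return result.returncode, result.stdout.strip()


def dsh(nodes, command, runner=run_over_ssh, max_workers=16):
    """Run command on all nodes concurrently; return {node: (rc, output)}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {node: pool.submit(runner, node, command) for node in nodes}
        return {node: future.result() for node, future in futures.items()}


def fake_runner(node, command):
    """Stand-in runner so the sketch can be tried without a cluster."""
    return 0, f"{node} ran: {command}"


results = dsh(["node01", "node02", "node03"], "uptime", runner=fake_runner)
for node, (rc, output) in results.items():
    print(node, rc, output)
```

Dropping the `runner=fake_runner` argument would make `dsh` call the real ssh client for each node.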

Distributed files are mainly used to synchronize configuration files across the cluster. Because a cluster consists of multiple nodes, any configuration of the cluster must be published to every node (or to a group of nodes). For example, to make Apache on every node support CGI, you need to publish the configuration files under /etc/httpd to /etc/httpd on each node. Put simply, cluster configuration management means publishing one or more configuration files to the specified nodes. Many open source tools, such as rdist and cfengine, help provide this distributed file function.
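The publishing step can be sketched in Python as below. This is not how rdist or cfengine work internally: it only shows the idea of copying a master file to each node when the node's copy differs. Temporary directories stand in for the nodes' file systems, and the `publish` helper and the file contents are assumptions for the example.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def digest(path):
    """Return the SHA-256 hex digest of a file, or None if it is missing."""
    if not path.exists():
        return None
    return hashlib.sha256(path.read_bytes()).hexdigest()


def publish(master_file, node_roots, relative_path):
    """Copy master_file to relative_path under each node root if it differs.

    Returns the names of the node roots that were updated.
    """
    updated = []
    wanted = digest(master_file)
    for root in node_roots:
        target = root / relative_path
        if digest(target) != wanted:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(master_file, target)
            updated.append(root.name)
    return updated


# Demo: temporary directories stand in for the file systems of two nodes.
workdir = Path(tempfile.mkdtemp())
master = workdir / "httpd.conf"
master.write_text("Options +ExecCGI\n")
nodes = [workdir / "node01", workdir / "node02"]
first = publish(master, nodes, "etc/httpd/httpd.conf")
second = publish(master, nodes, "etc/httpd/httpd.conf")
print(first, second)
```

The second call updates nothing because every node already holds an identical copy, which is the behavior one would want from a synchronization tool that runs repeatedly.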

3.4 Monitoring and Diagnosis
For a cluster system that runs continuously, you need tools that monitor the state of each part of the system during normal operation, such as system processes, CPU utilization, and memory utilization. On an ordinary Unix system, ps and top are enough. A cluster system needs dedicated tools, and the monitoring system should preferably support network management protocols such as SNMP and WBEM. When the cluster is not working properly, you need other tools to help diagnose the problem. For example, if a system stops providing service, you may ping it to check whether the network is at fault. When services on several nodes fail at the same time, you need to ping those nodes concurrently to check whether the network is the cause.
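A concurrent reachability check can be sketched as follows. Real ICMP ping needs raw sockets and usually root privileges, so this sketch substitutes a TCP connection attempt (port 22 is an assumption); the function names and the triage rule are likewise invented for illustration.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def tcp_probe(host, port=22, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pping(hosts, probe=tcp_probe, max_workers=32):
    """Probe all hosts concurrently; return {host: reachable}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(hosts, pool.map(probe, hosts)))


def diagnose(status):
    """Rough triage: if no node answers at all, suspect the network."""
    down = [host for host, ok in status.items() if not ok]
    if not down:
        return "all nodes reachable"
    if len(down) == len(status):
        return "no node reachable: suspect the network"
    return "unreachable: " + ", ".join(down)


# Demo with a fake probe so the sketch runs without a cluster.
fake = {"node01": True, "node02": False, "node03": True}
status = pping(list(fake), probe=lambda host: fake[host])
print(diagnose(status))
```

Because the probes run in parallel, the total wait is close to one timeout rather than one timeout per node.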

3.5 Hardware Control
Management tasks that are simple on a PC can be difficult on a cluster system. For example, manually restarting a group of nodes is impractical. A cluster system therefore needs special hardware devices for these tasks. The following management functions require hardware support:

  • Remote power management: used to power nodes off, power them on, and restart them remotely, and to query node power status. The IBM eServer Cluster 1300 uses ASM for this.
  • Remote console: when a remote node has a problem or needs special software installed, you must log on to the node to do the work. A KVM switch can meet this need, but with many nodes the KVM setup becomes very complicated. In addition, a KVM switch must be switched by hand and cannot be driven by software. A terminal server overcomes these drawbacks: it connects to the serial port of each node and presents that serial port as a terminal device on the management node. The operating system on each node must of course be configured accordingly.

3.6 System Installation
Cluster system installation mainly means installing the operating system, file system, parallel runtime libraries, job management software, and system management software on each node. Because it is a prerequisite for putting the cluster into service, installation is a very important task. A cluster typically consists of dozens, hundreds, or even thousands of computers, so installing the system by hand is practically impossible. The usual installation mechanism is:

  1. Network boot: the node to be installed is set to boot from the network, and the management node then restarts it remotely. After the node starts, it fetches a small operating system kernel from the boot server. Network boot generally uses the Intel PXE (Preboot Execution Environment) standard. PXELinux is a network boot loader that supports PXE; it can start a small Linux kernel on the network-booted node and run a specified init program, which is responsible for the rest of the installation.
  2. Network installation: this small kernel fetches installation packages or a system image from the installation server (usually a file server) and installs the system locally. Several Linux tools support network-based installation; typical examples are KickStart, ALICE (Automatic Linux Installation and Configuration Environment), SIS (System Installation Suite), and PartImage. These tools fall into the following types:
    A. Script-based installation: the installation process is controlled by an installation script, and you configure the process by modifying that script. The installation server is really just a file server that supplies the software packages to the node. Apart from the packages not being local, this method differs little from a local installation: all the local installation steps (configuring hardware, installing packages, configuring the system, and so on) are still required. KickStart uses this method. Script-based installation is flexible, but it depends on the operating system; KickStart, for example, supports only Red Hat Linux.
    B. Imaging-based installation: unlike script-based installation, imaging does not go through the local installation steps. It simply copies the image to be installed from the file server to the local hard disk. The system image is taken from a prototype system that has already been installed and configured. Imaging is independent of the operating system being installed, but it depends on the file systems supported by the network-booted kernel. Its disadvantage is that it is difficult to provide a configuration method that is independent of the operating system. PartImage uses the imaging method. SIS is a hybrid of the script and imaging methods: it uses the Linux chroot command to install a virtual operating system image in a directory on the installation server, and it also lets you supply shell scripts that complete the configuration after installation.
    C. Cloning-based installation: like imaging, cloning uses a system image, but here the image is a clone of the hard disk partitions of the prototype. Cloning therefore does not need to recognize the file system type inside the image and is independent of the file system; it depends only on the hard disk device types (IDE or SCSI) supported by the network-booted kernel. As with imaging, it is difficult to provide a configuration method that is independent of the operating system, and cloning is also less efficient than imaging. A clone can be made simply with the dd command.
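The block-by-block copy that dd performs for cloning can be sketched in Python as follows. The sketch copies raw bytes without interpreting any file system, which is why cloning is independent of the file system type. It runs on ordinary temporary files here; pointing it at real partition devices is an assumption that would need root privileges and great care. The `clone` function and the file names are hypothetical.

```python
import os
import tempfile


def clone(source, target, block_size=1024 * 1024):
    """Copy source to target block by block, much like dd if=source of=target."""
    copied = 0
    with open(source, "rb") as src, open(target, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:          # end of the source reached
                break
            dst.write(block)
            copied += len(block)
    return copied


# Demo on ordinary files that stand in for the prototype and target partitions.
workdir = tempfile.mkdtemp()
prototype = os.path.join(workdir, "prototype.img")
replica = os.path.join(workdir, "replica.img")
with open(prototype, "wb") as f:
    f.write(os.urandom(3 * 1024 * 1024 + 17))

copied = clone(prototype, replica)
with open(prototype, "rb") as a, open(replica, "rb") as b:
    identical = a.read() == b.read()
print(copied, identical)
```

Every byte of the source is copied, including unused space, which is one reason cloning tends to be less efficient than imaging.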

The following table summarizes the features of several installation tools:


| Installation tool | Installation method | Supported systems | Supported network protocols |
|---|---|---|---|
| KickStart | Script | Red Hat Linux | NFS and FTP |
| SIS | Mix of Script and Imaging | Red Hat Linux, SuSE Linux, Turbo Linux, ... | Rsync |
| PartImage | Imaging | EXT2, FAT, NTFS, HPFS, ... | Private protocol |

3.7 Domain Management
The domain management of a cluster system can be viewed simply as node management. It mainly includes the following basic functions:

  • Add, delete, and list nodes in the cluster system
  • Group nodes in a cluster
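These functions amount to keeping a node registry with named groups. A minimal Python sketch follows; the `ClusterDomain` class and its method names are hypothetical and not taken from CSM, xCAT, or any other tool.

```python
class ClusterDomain:
    """Keeps the node list and the named node groups of one cluster."""

    def __init__(self):
        self._nodes = set()
        self._groups = {}  # group name -> set of node names

    def add_node(self, name):
        self._nodes.add(name)

    def delete_node(self, name):
        self._nodes.discard(name)
        for members in self._groups.values():
            members.discard(name)  # keep the groups consistent

    def list_nodes(self, group=None):
        """List all nodes, or only the members of one group, sorted by name."""
        members = self._nodes if group is None else self._groups.get(group, set())
        return sorted(members)

    def group_nodes(self, group, names):
        """Add known nodes to a group; unknown node names are rejected."""
        unknown = set(names) - self._nodes
        if unknown:
            raise KeyError(f"unknown nodes: {sorted(unknown)}")
        self._groups.setdefault(group, set()).update(names)


domain = ClusterDomain()
for name in ["node01", "node02", "node03", "node04"]:
    domain.add_node(name)
domain.group_nodes("compute", ["node02", "node03", "node04"])
domain.delete_node("node03")
print(domain.list_nodes())
print(domain.list_nodes("compute"))
```

Groups are what make the other management tasks practical, since a distributed command or a configuration push can then target a group name instead of a long node list.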

In fact, job management could also be counted among the tasks of cluster system management. However, job management plays a more important role in a cluster than the other management tasks, and common cluster system management software does not implement it directly. We therefore treat job management as a major software component of the cluster system in its own right rather than as a task of cluster system management.

4. Several Cluster System Management Packages
Cluster system management software is as diverse as cluster systems themselves. The following describes several packages and compares their functions.

4.1 IBM CSM
IBM CSM (Cluster Systems Management) is the system management software for the IBM eServer Cluster 1300. Part of IBM's Linux cluster strategy is to port the PSSP software that runs on the RS/6000 SP platform to xSeries-based Linux clusters. Most CSM functions come from the SP platform, but CSM also integrates technology from WebSM 2000, xSeries, and open source tools. CSM is a full-featured management tool and is still under development.

4.2 xCAT
xCAT is system management software for the IBM eServer Cluster 1300, developed by Egan Ford. It is written mostly as shell scripts and is quite simple, yet it implements most of the cluster system management tasks and is a very good management package.

4.3 Mon
Mon was developed on Linux but is also known to run on Solaris. The Mon server and clients are written in Perl, so Mon is easy to port to other UNIX and UNIX-like platforms.

The following table compares the above three cluster system management software:

| Item | CSM | xCAT | Mon |
|---|---|---|---|
| Supported cluster systems | IBM eServer Cluster 1300 | IBM eServer Cluster 1300 | Not specific to a cluster system |
| Supported operating systems | Red Hat, SuSE | Red Hat; nodes can use Imaging and Cloning to install other operating systems, even Windows | Developed on Linux, but well known for running on Solaris. Easy to port to other Unix and Unix-like operating systems. |
| Resource management | Provides unified, scalable, and comprehensive resource management, but its power makes it very complicated to use. | Basically none | Basically none |
| Event service | Provides an event subscription and publishing mechanism, and predefines many system events and responses to events. | Mon will be integrated in the future to provide the event service. | Supported |
| Configuration management | Supported | None | None |
| Monitoring and diagnosis | Supports distributed shell (dsh) and SNMP | Supports concurrent shell (psh) and concurrent ping (pping) | Supports SNMP |
| Hardware control | Remote power (rpower), remote console | Remote power (rpower), remote console (rcon, wcon) | None |
| System installation | Supports KickStart, SIS, and PXE | Supports KickStart, Imaging, and Cloning. Supports PXE and Etherboot. | None |
| Domain management | Comprehensive | Basically none | Basically none |
| Integration | Apart from the necessary open source packages, not integrated with any other software. However, the underlying resource management and event services provide programming interfaces for easy integration, and upper layers can integrate by calling commands. | Automatically installs PBS, Maui, Myrinet, and MPI. SgridEngine Scheduler will be supported in the future. | Basically none. Integration should be possible through the command line. |
| Ease of use | Provides powerful command line tools and simple GUI tools | Command line tools; will be integrated with Ganglia in the future to provide some GUI | Provides command line and Web-based tools |
