Linux High-Performance Clusters: Software Architecture

Source: Internet
Author: User
This article is the third part of the High-Performance cluster series. In this article, I take IBM eServer Cluster 1300 as an example to introduce the hardware and network architecture and components of Beowulf clusters.

1 Beowulf cluster Software Structure



Figure: the software architecture of a Beowulf cluster.



Generally, Beowulf clusters are composed of the following software components:

Operating system: the operating system is the software foundation of any computer system. Compared with a desktop system, a cluster places higher demands on the operating system's job scheduling and file management.

Parallel development library: mainly the software library used for communication between processes in the cluster. Message passing and threads are the two basic communication methods, but message passing is the better fit for Beowulf clusters. The common development libraries for Beowulf clusters are MPI and PVM.

Job management: schedules jobs and manages the cluster's resources so that those resources are used as fully as possible.

System management: manages and monitors the entire cluster system.

Development environment: tools for developing and debugging efficient applications.

Standard applications: standard high-performance applications, such as CFD (computational fluid dynamics) packages.

Customer applications: applications written or customized for a particular user.

2 Operating System

Not all operating systems are suitable for high-performance cluster systems. In theory, the hardware architecture, the operating system's task scheduling mode, and its IPC (inter-process communication) mode are the main factors that determine how effectively an application can be parallelized. Based on these three factors, we can distinguish the following five platforms for implementing parallel applications:

Single-task operating system: the CPU processes only one task from the task queue at a time. MS-DOS is representative of such systems.

Multi-task operating system: a multitasking operating system based on time-sharing. Although all processes appear to run during the same period of time, at any given instant the CPU executes only one process. Operating systems of this type can be preemptive or non-preemptive. Unix and NT on a single CPU belong to this type.

Multi-CPU multi-task operating system: unlike the single-CPU case, multiple processes can run at the same instant because there are multiple CPUs. Unix and NT on multiple CPUs belong to this type.

Multi-CPU multi-task operating system + threads: some tasks finish sooner when they are divided into parallel sub-tasks that run on multiple CPUs at the same time, even though the total CPU time consumed by the task is higher. Because part of an application usually cannot be parallelized, performance does not increase linearly with the number of CPUs; Amdahl's law describes this limit. Unix and NT with threads, running on multiple CPUs on the same motherboard, belong to this type. This method is suitable for SMP systems.

Multi-CPU multi-task operating system + message passing: in SMP systems, the CPUs share memory, so the time they spend communicating can almost be ignored. In a system such as a cluster, however, communication time has to be taken into account. Using threads becomes too costly there, and message passing is the better approach (the second part of this series explains why). Multiple CPUs on the same board or on several boards, running Unix or NT with message passing, belong to this type.
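The diminishing returns described for the threaded SMP platform can be made concrete with Amdahl's law: if a fraction p of a task can be parallelized, the speedup on n CPUs is at most 1 / ((1 - p) + p / n). The following is a minimal sketch; the function name and the 90% figure are illustrative assumptions, not values from this article.

```python
def amdahl_speedup(parallel_fraction, cpus):
    """Upper bound on speedup when only part of a task can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time on any number of CPUs;
    # only the parallel part is divided among them.
    return 1.0 / (serial_fraction + parallel_fraction / cpus)


# Assumed example: 90% of the work can be parallelized.
for cpus in (1, 2, 4, 8, 16):
    print(cpus, round(amdahl_speedup(0.9, cpus), 2))
```

With 90% of the work parallelizable, the speedup can never reach 10 however many CPUs are added, because the remaining 10% always runs serially.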

Beowulf clusters use the fifth type of platform. A cluster can be built from SMP and PC servers; it runs Linux as the operating system and uses MPI or PVM for communication.
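To illustrate the message-passing style, here is a rough sketch that uses only the Python standard library. It is not MPI or PVM: worker processes on one machine stand in for cluster nodes, and pipes stand in for the network. What it does show is the defining property of the model: each worker has its own address space and sees only the data sent to it in a message. The names WORKER_SOURCE and sum_of_squares are hypothetical.

```python
import json
import subprocess
import sys

# Program run by every worker. It shares no memory with the coordinator:
# it receives one message on stdin and sends one reply on stdout.
WORKER_SOURCE = """
import json, sys
numbers = json.loads(sys.stdin.read())          # receive the work message
print(json.dumps(sum(x * x for x in numbers)))  # send the partial result
"""


def sum_of_squares(numbers, workers=4):
    """Scatter the numbers to worker processes and gather the partial sums."""
    chunks = [numbers[i::workers] for i in range(workers)]
    procs = []
    for chunk in chunks:
        proc = subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )
        proc.stdin.write(json.dumps(chunk))  # send: the data is copied, not shared
        proc.stdin.close()                   # signals the end of the message
        procs.append(proc)

    total = 0
    for proc in procs:
        total += json.loads(proc.stdout.read())  # receive the reply
        proc.stdout.close()
        proc.wait()
    return total


print(sum_of_squares(list(range(10)), workers=3))
```

Every value crosses a process boundary as a serialized copy, which is why communication time matters in this model in a way it does not with shared memory.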

3 File System

A file system is an important part of an operating system and is used to store programs and data. Efficient, consistent, and simple data sharing among nodes is a challenge posed by the cluster system to the file system.

3.1 Clusters and File Systems

Obviously, file systems that can manage only local storage (such as EXT and FAT) cannot meet the file-sharing requirements of a cluster system. In a cluster environment, a distributed file system is the easiest kind of shared file system to implement. Compared with a local file system, a distributed file system has the following advantages:

Network transparency: Remote and local file access can be completed through the same system call.

Location transparency: The full file path does not need to be bound to the file storage service. That is to say, the server name or address is not part of the file path.

Location independence: because the server name or address is not part of the file path, changing the location of the file storage will not change the file path.
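The difference between a file's path and its storage location can be sketched with a toy in-memory model. This is an illustration only, not the design of any real distributed file system; the class ClusterNamespace, its methods, and the server names are hypothetical.

```python
class ClusterNamespace:
    """Toy model: clients use logical paths; the server mapping stays hidden."""

    def __init__(self):
        self._server_for = {}  # logical path -> server currently holding the file
        self._contents = {}    # (server, path) -> file data

    def create(self, path, server, data):
        self._server_for[path] = server
        self._contents[(server, path)] = data

    def read(self, path):
        # Location transparency: the caller never names a server.
        server = self._server_for[path]
        return self._contents[(server, path)]

    def migrate(self, path, new_server):
        # Location independence: the data moves, the path stays the same.
        old_server = self._server_for[path]
        self._contents[(new_server, path)] = self._contents.pop((old_server, path))
        self._server_for[path] = new_server

    def server_of(self, path):
        return self._server_for[path]


ns = ClusterNamespace()
ns.create("/data/results.dat", "node01", b"42")
ns.migrate("/data/results.dat", "node02")
print(ns.read("/data/results.dat"), ns.server_of("/data/results.dat"))
```

A client that opened /data/results.dat before the migration can keep using the same path afterwards, because the server name was never part of it.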

A distributed file system makes it easy to share data among the nodes of a cluster. However, to improve performance a distributed file system usually relies on a local cache on each node, which makes it difficult to keep data consistent across the cluster. In addition, a distributed file system usually keeps only one copy of the data, so the server holding it is a single point of failure.
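The consistency problem caused by local caches can be shown with a small sketch. It assumes a deliberately naive client that caches on first read and never invalidates; real distributed file systems use various invalidation or lease schemes that are not modeled here. The classes FileServer and CachingClient are hypothetical.

```python
class FileServer:
    """Holds the single authoritative copy of each file."""

    def __init__(self):
        self.files = {}

    def read(self, path):
        return self.files[path]

    def write(self, path, data):
        self.files[path] = data


class CachingClient:
    """A node that caches file contents locally and never invalidates them."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, path):
        if path not in self.cache:  # only the first read goes to the server
            self.cache[path] = self.server.read(path)
        return self.cache[path]

    def write(self, path, data):
        self.cache[path] = data     # update the local copy
        self.server.write(path, data)  # write through to the server


server = FileServer()
server.write("/shared/config", "v1")
node_a, node_b = CachingClient(server), CachingClient(server)

node_a.read("/shared/config")
node_b.read("/shared/config")      # node_b now holds "v1" in its cache
node_a.write("/shared/config", "v2")

print(server.read("/shared/config"), node_b.read("/shared/config"))
```

After node_a writes, the server holds the new contents but node_b still returns its cached copy, so two nodes see different data for the same path.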

Parallel file systems built on shared disks can overcome these shortcomings of distributed file systems. By using storage devices that are shared among the nodes, parallel file systems offer several advantages:

High availability: removes the server-side single point of failure (SPOF) found in distributed file systems and so improves the availability of the file system.

Load balancing: there are multiple access points, so the load can be spread across them.

Scalability: it is easy to expand capacity and access bandwidth.