Cluster Scalability and Distributed Architecture (Part 2)


Architecture Comparison and Scalability

Lin Fan (iamafan@21cn.com)

Manager of the R&D Department, Chen Xun Software Studio
November 2001

This article is the second half of "Cluster Scalability and Distributed Architecture". We continue by introducing several common parallel computing architectures, scalability and the single system image, and the important metrics of clusters.

Scalable Parallel Computing Architecture
First, let's look at the main types of computer system architecture. The architectures differ chiefly in their interconnection technology, node complexity, and degree of coupling. In cluster computing and distributed systems, the following three architectures are representative.

  • Shared-nothing architecture

    (Figure: shared-nothing architecture. Most clusters currently use this approach; each node is an independent PC or workstation.)

    Most of the cluster systems we study belong to this architecture. Each node in the cluster is a complete, independent computer with its own operating system and hardware. Nodes are loosely coupled through a LAN or a switch fabric, and each contributes some or even all of its local resources (CPU, memory, disk, I/O devices, and so on) to form a single, powerful computer system. Such systems have weak SSI capabilities and require special middleware or OS extensions (a minimal message-passing sketch follows this list).

  • Shared-disk architecture

    (Figure: shared-disk architecture. Each node is essentially an independent computer that does not use a local disk file system.)

    Distributed file systems are an application of this architecture; NFS, AFS, and GFS all belong to this category. The hardware is often implemented as a shared disk array or a SAN. This architecture mainly addresses the capacity problem of local storage: by constructing a single virtual file system, it presents one huge storage device to the entire cluster. In high-availability scenarios, shared disk arrays often also solve reliability problems such as file-system fault tolerance and data consistency.

  • Shared-memory architecture

    (Figure: shared-memory architecture. The hardest to implement, with the strongest SSI capabilities.)

    Whether measured by the complexity of the hardware or the difficulty of the software, this architecture is much harder to implement than the other types. Cluster systems of this kind involve technologies such as DSM (distributed shared memory), NUMA, and ccNUMA. In this architecture, the computing resources of multiple nodes are combined into a single system with a consistent memory space. As we will see later, such systems have the best SSI (single system image) capabilities.
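To make the shared-nothing model concrete, here is a minimal sketch of two nodes cooperating without any common memory, using MPI message passing in the middleware role described above. The MPI calls are standard, but the scenario (two ranks exchanging one integer) is an illustrative assumption, not code from the article.

```c
/* Shared-nothing coordination: rank 0's value exists only in its own
 * private memory, so it must be sent to rank 1 as an explicit message.
 * Build: mpicc hello.c -o hello ; run with at least 2 ranks: mpirun -np 2 ./hello
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int value = 0;
    if (rank == 0) {
        value = 42;  /* lives in node 0's local memory only */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("node 1 received %d via message passing\n", value);
    }
    MPI_Finalize();
    return 0;
}
```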

Scalability and Single System Image
Ultimately, we hope the parallel clusters we build will have good scalability, acceptable per-unit computing cost, and predictable technical prospects, whether they serve compute-intensive workloads or highly reliable commercial environments. Therefore, when designing a computing system, especially a cluster in a parallel environment, never forget the core requirement of scalability.

However, when we look at parallel computing clusters from another perspective, we reach a different conclusion. For end users and programmers, the focus of a parallel computer model is what they see on the computer, which is what we usually call the SSI (single system image).

If you are a programmer, you want to face a single machine rather than a pile of machines: one machine means a single address space, with no need for complex programming techniques such as message passing or remote calls. In this sense, a cluster with a single address space provides that capability. Or, if users want one huge, consistent file system (with only one root directory), then SSI should be provided at the file-system level; a small sketch of this style follows.
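As a small illustration of file-system-level SSI, the hedged sketch below has cluster nodes update one counter file on a shared file system using POSIX advisory locks. The path /shared/counter.txt is hypothetical, and advisory locking over NFS additionally assumes a working lock daemon; this is a sketch of the idea, not a recipe from the article.

```c
/* Nodes coordinate through the single shared file system rather than
 * through messages: an advisory write lock serializes the update. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    const char *path = "/shared/counter.txt";  /* hypothetical shared mount */
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    struct flock lk = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
    if (fcntl(fd, F_SETLKW, &lk) < 0) { perror("lock"); return 1; }

    /* Critical section: read the counter, increment it, write it back. */
    char buf[32] = "0";
    ssize_t n = pread(fd, buf, sizeof buf - 1, 0);
    if (n > 0) buf[n] = '\0';
    long v = atol(buf) + 1;
    int len = snprintf(buf, sizeof buf, "%ld", v);
    pwrite(fd, buf, len, 0);

    lk.l_type = F_UNLCK;                /* release the lock */
    fcntl(fd, F_SETLK, &lk);
    close(fd);
    printf("counter is now %ld\n", v);
    return 0;
}
```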

From the user's perspective, however, the user does not care how you handle things that seem unrelated to him, such as address spaces and message passing. He only cares that he is working with a single, independent computer system, which reduces the complexity of use and spares him from switching back and forth between multiple systems: he can easily manage the "one machine" in front of him. SSI must therefore be provided at the management and usage levels as well.

Therefore, a parallel computing model is the abstract parallel computer seen by its users (both programmers and end users). Just as the von Neumann model describes a computer that executes sequential programs, a parallel computing model describes a system that executes parallel programs and parallel computing tasks.

Parallel systems can be classified by processor, memory, OS, and interconnection method. Judged along the two dimensions of scalability and single system image, we obtain the following figure:

(Figure: architecture comparison of clusters, distributed systems, MPP, and SMP)

A node can be a PC, a workstation, or an SMP server. Node complexity refers to the capability of a node's software and hardware. Cluster nodes are generally more complex than MPP nodes, because each cluster node runs an independent operating system and has its own peripherals, whereas an MPP node may run only an operating-system microkernel.

The node complexity of an SMP server is in turn higher than that of the PCs in a commodity cluster. Take the most common x86 SMP server as an example: not only are its mainboard and bus technology far more complex than a PC's, but to support enterprise-class application environments it must also support more high-end peripherals and provide features such as hot-pluggable storage and memory error correction. Applying these technologies inevitably increases the complexity of SMP.

MPP is a massively parallel processing system built on a shared-nothing structure. It typically contains hundreds of processor nodes; each node usually runs an incomplete OS (also called a microkernel), and nodes are interconnected through high-speed switches. Such proprietary systems often scale well, but technology upgrades are constrained by the proprietary design itself.

As a major element of cluster implementation, SSI spans the application level, subsystems, the runtime system, the operating-system kernel, and the hardware level. In other words, SSI is not absolute; it is a relative concept that depends on the user's view of the system. Whether the SSI takes the form of a single IP address, a single memory space, or a single file system is determined by the final application environment.

Within the scope of distributed systems, a system often presents multiple system images: a collection with multiple entry points, where each node retains a high degree of autonomy. MPP and SMP, by contrast, present relatively unified computing resources in a compact form, like one huge workstation. Distributed systems, in addition to homogeneous nodes, often incorporate heterogeneous platforms as needed, which inevitably increases their design difficulty and management complexity. Other characteristics are shown in the following table:

| Feature | MPP | SMP | Cluster | Distributed system |
|---|---|---|---|---|
| Number of nodes | O(100) to O(1000) | O(10) to O(100) | About O(100) or fewer | O(10) to more than O(1000) |
| Node complexity | Fine to medium grain | Medium or coarse grain | Medium grain | Wide range |
| Inter-node communication | Message passing or shared variables | Shared memory | Message passing | Shared files, RPC, message passing |
| Task scheduling | Single run queue on host | Single run queue | Coordinated multiple queues | Independent queues |
| Single system image | Partially supported | Full SSI | Supported at some levels | Not supported at present |
| Node operating system | One main kernel plus many microkernels | One complete, independent OS | N similar OSes | Similar or heterogeneous OSes |
| Address space | Multiple (single with distributed shared memory) | Single | Multiple | Multiple |
| System availability | Low or medium | Low | High or fault-tolerant | Moderate |
| Ownership | One organization | One organization | One or more organizations | Many organizations |
| Connection distance | Tightly coupled, one machine room | Tightly coupled, one chassis | Loosely coupled, building or campus (depends on medium) | Loosely coupled, cross-region (country or wider) |

(Table: comparison of various parallel systems)

Of these four system types, SMP has the highest level of SSI: it provides SSI at every level, sharing all system resources, with a single address space, a single file system, and a single operating-system kernel, so that it looks like a machine with a single CPU. MPP supports SSI only at some application and system layers. Clusters provide a lower level of SSI, typically satisfying it in only one or two respects. For distributed systems such as grids, the achievable SSI is lower still; using a cross-platform tool like Java, a distributed system may provide SSI in a certain sense, such as a single Java runtime space.

Important metrics of a cluster
For clusters we can start from a simple definition: a cluster is a collection of complete computers (also called nodes) physically interconnected by a high-performance network or LAN. In typical cases each node is an SMP server, a workstation, or an ordinary PC. Most importantly, these independent computers must be able to work together and appear to the outside world as a single, integrated computing resource.

Merely wiring some machines together with a LAN and calling the result a cluster has no practical value. What matters is how the cluster measures up against the following performance and functional criteria.

Usability: Since each node in the cluster runs a conventional platform, users can develop and run their programs in a familiar, mature environment. The common platform provides the programming environment, operating interface, control and monitoring tools, and even a GUI, letting users run most of their existing workstation programs without modification. We can therefore regard the cluster system as one large workstation: to the user, operation feels much as before, but performance is greatly improved.

Availability: Availability is the percentage of time during which a system is engaged in productive use; it is commonly estimated from the MTBF (mean time between failures) and the repair time (a first-order estimate is sketched after this list). Traditional monolithic systems, such as mainframes and fault-tolerant systems, rely on expensive custom designs to achieve high availability. Clusters use no custom components; instead they use low-cost commodity components to provide high availability. High equipment redundancy is the most common means:

  • Processors and memory: a cluster has many processors and memory modules, so when one component fails the others can still be used, without affecting the overall operation of the cluster. In an SMP, because the processors communicate through the shared memory over a bus, the system crashes once the memory fails: memory is the SMP's "single point of failure".
  • Disk arrays: RAID 1 or RAID 5 (RAID 0 stripes for speed but provides no redundancy) can meet a computer's disk redundancy and fault-tolerance requirements. In a cluster, the many local disks are often used to meet fault-tolerance needs through standard sharing protocols such as NFS: when a node's local disk fails, work can continue on a remote disk. A common NAS device is a disk appliance dedicated to cluster network storage, and distributed file system software can also provide disk fault tolerance across cluster nodes.
  • Operating systems: a cluster usually achieves a single system image only at certain levels; multiple operating-system images still exist, with each node running its own independent OS. When one node crashes because of a software or hardware fault, the other nodes are unaffected and the cluster works just as before. We sometimes call this property "node fault tolerance".
  • Communication networks: a good cluster design anticipates every kind of possible failure and takes all feasible measures to avoid them, and communication faults between cluster nodes must be considered too. In a large, complex cluster, the failure of one communication link may bring down more than one node, or even make the whole cluster unavailable, so appropriate redundant links should be placed between the cluster's key points. Since a cluster's entry node, master node, or monitoring node easily becomes a single point of failure, backup links in the access paths to these nodes give particularly good results.
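The availability defined above is commonly estimated to first order as A = MTBF / (MTBF + MTTR), where MTTR is the mean time to repair. The short C program below just illustrates the arithmetic; the MTBF and MTTR values are assumed sample numbers, not figures from this article.

```c
/* First-order availability estimate: A = MTBF / (MTBF + MTTR). */
#include <stdio.h>

int main(void) {
    double mtbf_hours = 2000.0;  /* assumed mean time between failures */
    double mttr_hours = 4.0;     /* assumed mean time to repair */
    double a = mtbf_hours / (mtbf_hours + mttr_hours);
    printf("availability = %.4f (about %.2f%%)\n", a, a * 100.0);
    return 0;
}
```

With these assumed numbers the estimate is about 99.8%; redundancy raises system-level availability by keeping repairs from interrupting productive use.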

Scalability: A cluster's computing power grows as nodes are added, and cluster scalability is scalability in numbers: thanks to the loosely coupled structure, a cluster can be expanded to several hundred nodes, whereas it is very difficult for an SMP to grow beyond a few dozen processors.

In an SMP, the shared memory and the memory bus are the bottlenecks of system performance. When the same workload runs on a cluster there is no such memory bottleneck: each process runs on its own node and makes full use of that node's local memory. For such applications, a cluster offers higher aggregate memory bandwidth and lower memory latency. The cluster's local disks likewise aggregate into a large disk space that can easily exceed a centralized RAID array. This enhanced processing, storage, and I/O capability lets clusters solve large-scale application problems with mature parallel software packages such as PVM or MPI (see the timing sketch below).
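To illustrate how a cluster aggregates memory and compute, here is a hedged MPI sketch: each rank sums an array slice held in its own node's RAM, and the partial results are combined with MPI_Reduce. The array size and rank count are illustrative assumptions.

```c
/* Data-parallel sum across cluster nodes.
 * Build: mpicc -O2 sum.c -o sum ; run: mpirun -np 4 ./sum
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long n = 1L << 24;             /* elements per node (assumed) */
    double *a = malloc(n * sizeof *a);   /* lives in this node's local RAM */
    for (long i = 0; i < n; i++) a[i] = 1.0;

    double t0 = MPI_Wtime();
    double local = 0.0, total = 0.0;
    for (long i = 0; i < n; i++) local += a[i];
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("%d nodes summed %ld elements in %.3f s (total = %.0f)\n",
               size, (long)size * n, t1 - t0, total);
    free(a);
    MPI_Finalize();
    return 0;
}
```

Adding nodes grows both the aggregate memory (size × n elements fit without swapping) and the compute applied to the loop, which is exactly the scalability argument made above.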

SMP does not scale well because it relies on a contended bus and centralized shared memory. Its single operating-system image and shared memory are also two potential single points of failure, which lowers SMP availability (see the sketch below).
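For contrast with the cluster sketches, here is a minimal pthreads sketch of the SMP model: all threads live in one address space and update one shared counter, so every update serializes on the same shared memory, which is the contention this section describes. The thread and iteration counts are illustrative assumptions.

```c
/* All threads share one address space; the mutex-protected counter
 * makes the shared-memory contention explicit. Build: cc -pthread smp.c */
#include <pthread.h>
#include <stdio.h>

#define THREADS 4
#define ITERS   1000000

static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *work(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        pthread_mutex_lock(&lock);    /* every thread contends here */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t[THREADS];
    for (int i = 0; i < THREADS; i++) pthread_create(&t[i], NULL, work, NULL);
    for (int i = 0; i < THREADS; i++) pthread_join(t[i], NULL);
    printf("counter = %ld (expected %ld)\n", counter, (long)THREADS * ITERS);
    return 0;
}
```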

Fault-tolerant systems offer high availability, but expanding them is expensive. MPP scales better and maintains stronger SSI capabilities. Clusters currently occupy a reasonable middle ground and are evolving toward both higher performance and higher availability.

Price/performance: Clusters obtain the advantages above at an effective cost. A traditional supercomputer or MPP easily costs tens of millions of dollars, while a cluster with the same peak performance costs one to two orders of magnitude less. Because clusters are built largely from commodity components whose performance and price follow Moore's law, a cluster's performance/cost ratio also improves much faster than MPP's.


(Figure: comprehensive comparison of availability and scalability)

To design a cluster with good scalability, we must take into account all the above aspects.

First, keep the components of the cluster as independent of each other as possible, so that each can be expanded locally and independently while remaining backward compatible. Commodity components, including the OS, the interconnection network, the host systems, and even the application programming environment, should be used wherever possible. The goal: algorithms independent of the architecture, applications independent of the platform, languages independent of the machine, and nodes independent of the network.

Second, choose an appropriate implementation model for the cluster design, and use popular, open, standard parts as far as possible to reduce unit cost.

Finally, balance performance across the whole design to avoid the "barrel principle" (as is well known, the amount of water a barrel can hold is limited by its shortest stave). When considering availability, also watch out for single points of failure, so that a small fault does not render the entire system unavailable in real use.

With these points in mind, let us review what we should expect of a cluster.

Conclusion
The reason we have spent so much time on these key architectural concepts is that together they constitute the cluster as a whole. Finally, let's review the essential elements of a cluster:

  • Independent nodes: each node is a complete computer, generally running its own single system.
  • Single-system-image capability: a cluster is a single computing resource. It treats each node as a separate resource and uses single-system-image technology to present one unified entry point to all resources. SSI makes the cluster easier to use and manage.
  • Effective interconnection: cluster nodes usually use commodity networks such as Ethernet, FDDI, fiber optics, or ATM, together with standard network protocols, to establish the inter-node communication mechanism. This ensures effective communication within the cluster.
  • Enhanced availability: clustering provides a cost-effective way to increase a system's availability; compared with component-level fault-tolerant products, clusters usually achieve the same result at a more reasonable cost. Most commercial applications are designed for enhanced availability and can therefore be built with cluster technology.
  • Better performance: the birth of the cluster was driven partly by performance. In scientific computing, engineering applications, remote virtual-reality simulation, and similar fields, a cluster should deliver higher performance and serve as a super server, completing in the shortest time tasks the original standalone systems could never finish, or providing the huge disk and memory space needed to accomplish "impossible missions".



About the author
Lin Fan is currently engaged in Linux-related research at Xiamen University. He has a strong interest in cluster technology and hopes to exchange ideas with like-minded friends. You can reach him by email at iamafan@21cn.com.
