How to differentiate distributed/cluster/parallel file systems?

Source: Internet
Author: User
Tags glusterfs

Distributed file systems, cluster file systems, and parallel file systems are easy to confuse, and the terms are often used in different contexts. People frequently ask how the three differ and how they relate. Their concepts do overlap, but there are also significant differences.


Distributed File System

Naturally, the focus is on "distributed", as opposed to a local file system. A distributed file system usually has a client/server (C/S) architecture, that is, it is a network file system: user data is stored on a remote storage server rather than on storage directly attached to the local host. NFS and CIFS are the most common distributed file systems; this is what we usually call NAS. A distributed file system may have one storage server (as in traditional NAS) or several (as in cluster NAS). A single-server distributed file system suffers from a single point of failure (SPOF) and performance bottlenecks. Besides NAS, typical distributed file systems include AFS and the cluster file systems introduced below (such as Lustre, GlusterFS, and PVFS2).


Cluster File System

"Clusters" are mainly divided into high-performance clusters (HPC), high-availability clusters (HAC), and load-balancing clusters (LBC). A cluster file system is one in which multiple nodes work together to provide high performance, high availability, or load balancing. It is a subset of distributed file systems that eliminates the single point of failure and performance bottleneck problems. The cluster is transparent to the client: it presents a single global namespace, and user file access requests are distributed across the nodes of the cluster for processing. Scalability (both scale-up and scale-out), reliability, and ease of management are also goals of cluster file systems. Metadata can be managed by a dedicated server, by a cluster of servers, or by a fully peer-to-peer architecture with no metadata server. Typical cluster file systems include SONAS, Isilon, IBRIX, NetApp GX, Lustre, PVFS2, GlusterFS, Google File System, LoongStore, and CZSS.
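The idea of spreading requests across nodes behind one namespace, without a metadata server, can be illustrated with a minimal sketch. It assumes a simple hash-modulo placement rule; the function name `pick_node` and the node names are hypothetical, and this is not the actual placement algorithm of GlusterFS or any other system named above.

```python
import hashlib


def pick_node(path, nodes):
    """Map a file path to one storage node using only a hash of the path.

    Every client computes the same answer independently, so no central
    metadata server has to be consulted (illustrative assumption only).
    """
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


# Hypothetical node names; clients see one namespace such as /data/...
NODES = ["node-a", "node-b", "node-c"]

for p in ["/data/a.txt", "/data/b.txt", "/data/c.txt"]:
    print(p, "->", pick_node(p, NODES))
```

A plain modulo rule like this moves most files when a node is added or removed, which is one reason real systems use more elaborate schemes.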


Parallel File System

A parallel file system supports parallel applications, such as MPI programs. In a parallel file system, all clients can read and write the same file concurrently. Most file systems can support concurrent reads; concurrent writes are much harder to implement. To ensure data consistency while maximizing parallelism, a locking mechanism must be designed, such as fine-grained byte-range locks. SAN shared file systems are generally parallel file systems, for example GPFS, StorNext, GFS, and BWFS. Most cluster file systems are also parallel file systems, for example Lustre and Panasas.
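To make the idea of fine-grained byte-range locking concrete, here is a minimal sketch using POSIX advisory locks on an ordinary local file via Python's `fcntl.lockf`. It only illustrates the principle that writers lock just the region they touch; real parallel file systems implement their own distributed lock managers. The helper names `write_region` and `demo` are hypothetical, and the code runs only on Unix-like systems.

```python
import fcntl
import os
import tempfile


def write_region(path, offset, data):
    """Write data at offset while holding an exclusive lock on that range only."""
    fd = os.open(path, os.O_RDWR)
    try:
        # Lock len(data) bytes starting at offset; other ranges stay unlocked,
        # so writers working on disjoint regions do not block each other.
        fcntl.lockf(fd, fcntl.LOCK_EX, len(data), offset, os.SEEK_SET)
        try:
            os.pwrite(fd, data, offset)
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN, len(data), offset, os.SEEK_SET)
    finally:
        os.close(fd)


def demo():
    """Two writes to disjoint regions of one shared 8-byte file."""
    fd, path = tempfile.mkstemp()
    os.write(fd, b"\0" * 8)
    os.close(fd)
    try:
        write_region(path, 0, b"AAAA")  # first half of the file
        write_region(path, 4, b"BBBB")  # second half of the file
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


print(demo())
```

In a real parallel job each region would be written by a different process or client; the calls are made sequentially here only to keep the sketch short.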


How to differentiate?

The key to distinguishing the three lies in their prefixes: "distributed", "cluster", and "parallel". In short, a file system that is accessed over the network rather than through a local direct connection is a distributed file system. A distributed file system whose server side consists of multiple nodes is a cluster file system. A file system that supports parallel applications (such as MPI) is a parallel file system. The examples above show that the three concepts overlap: Lustre, for instance, is at once a distributed, a cluster, and a parallel file system. They also differ. Every cluster file system is a distributed file system, but not vice versa; NAS and AFS are counterexamples. A SAN file system is a parallel file system but may not be a cluster file system; StorNext is an example. GFS (the Google File System) and HDFS are cluster file systems but may not be parallel file systems. In practice, once the three concepts are clear and the characteristics of a given file system have been analyzed, it is easy to classify it correctly.
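The three rules above can be written down as a small decision function. This is a sketch of the article's own criteria only: the class `FileSystemTraits`, the function `classify`, and the boolean trait values assigned to each example are illustrative assumptions drawn from the descriptions in this article, not an authoritative taxonomy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileSystemTraits:
    name: str
    network_attached: bool   # data reached over the network, not local direct attach
    multiple_servers: bool   # server side consists of several cooperating nodes
    parallel_io: bool        # clients may read and write the same file concurrently


def classify(fs):
    """Return the labels that apply, following the three rules in the text."""
    labels = []
    if fs.network_attached:
        labels.append("distributed")
    if fs.network_attached and fs.multiple_servers:
        labels.append("cluster")
    if fs.parallel_io:
        labels.append("parallel")
    return labels


# Trait values follow this article's examples (assumed for illustration).
lustre = FileSystemTraits("Lustre", True, True, True)
single_nas = FileSystemTraits("single-server NAS", True, False, False)
hdfs = FileSystemTraits("HDFS", True, True, False)

for fs in (lustre, single_nas, hdfs):
    print(fs.name, "->", classify(fs))
```

Note that the "cluster" label requires "distributed" first, which mirrors the statement that cluster file systems are a subset of distributed file systems.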

