Design of a Distributed (Cluster) File System

The distributed file system described in this article is implemented as a cluster, so it is also a cluster file system. This section describes common problems in distributed file systems and how GFS solves them.
Design Points: Performance

The basic method for improving performance is parallelism: a task is divided into multiple subtasks that run at the same time.
The idea in GFS is to split files into blocks. Each block is a chunk, and each chunk is stored separately; the node that stores chunks is called a chunkserver. Reading and writing a file is thereby converted into reading and writing chunks, and different chunks can be accessed in parallel to improve efficiency. Each chunk is identified by a unique chunk handle. The chunk size must be chosen according to the characteristics of the application: if it is too large, the degree of parallelism suffers; if it is too small, metadata occupies a lot of space and resolving the many chunks of a file takes time.
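As a minimal sketch of the offset arithmetic this implies, the following assumes the 64 MB chunk size used by GFS; the function name is illustrative, not part of any real GFS API.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # GFS's default chunk size: 64 MB

def locate(offset: int) -> tuple[int, int]:
    """Translate a byte offset within a file into a (chunk index, offset
    within that chunk) pair. The client sends the chunk index to the
    master, which maps it to a chunk handle and replica locations."""
    return offset // CHUNK_SIZE, offset % CHUNK_SIZE

# Example: byte 150,000,000 falls in chunk 2 at offset 15,782,272.
print(locate(150_000_000))
```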

High Availability and Reliability

Availability is the proportion of total time during which the system is up and usable; the higher, the better. It is commonly expressed as Availability = MTBF / (MTBF + MTTR), where MTBF is the mean time between failures and MTTR is the mean time to repair. Reliability refers to how long the system runs, on average, without a fault (the MTBF); the longer, the better.
High availability can be achieved through hardware redundancy: when one piece of hardware fails, the system can quickly switch to a backup, and shortening the switching time raises availability. Fault tolerance improves system reliability: a component can fail without affecting the overall function.
GFS uses hardware redundancy to achieve high availability and reliability: both the master and the chunkservers are backed up. Chunkservers back each other up in units of chunks, with three replicas per chunk. The master is backed up in a master/slave arrangement: the primary master synchronizes its operation log and checkpoints to backup nodes, and when the primary becomes unavailable, a backup node can take over as the new master.
Chunkserver replication also serves load balancing. Each chunk is stored on three chunkservers, each of which can serve requests. For a read request, the chunkserver closest to the client is selected; for a write request, the master designates a primary replica, which is responsible for synchronizing the client's data to the other replicas.
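A minimal sketch of how a client might choose where to send a request, assuming the master has already returned the replica list and primary for a chunk; the types, fields, and distance metric here are hypothetical, not GFS's actual RPC interface.

```python
from dataclasses import dataclass

@dataclass
class ChunkInfo:
    handle: str            # globally unique chunk handle
    replicas: list[str]    # chunkserver addresses holding this chunk
    primary: str           # replica the master designated for writes

def pick_read_replica(info: ChunkInfo, distance) -> str:
    """Reads may go to any replica; pick the one closest to the client,
    given a caller-supplied distance metric over addresses."""
    return min(info.replicas, key=distance)

def pick_write_target(info: ChunkInfo) -> str:
    """Writes go through the primary, which orders the mutation and
    forwards it to the secondary replicas."""
    return info.primary
```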

Scalability

The scale of the cluster can be expanded dynamically as needed. This mainly refers to the scalability of storage capacity, that is, the ability to add and remove chunkservers. Supporting dynamic expansion requires location transparency for chunk storage: clients do not need to know which machines hold each chunk, since the mapping is maintained dynamically by the system. In this way, adding or removing chunkservers does not affect clients.
The idea in GFS is to use a single master node that stores all metadata, which achieves location transparency for chunks. The master's main work:

File namespace management. Similar to the directory structure of a traditional file system, the master maintains the set of files that exist in the system.

Mapping files to chunks. Each file is composed of several chunks. From the client's perspective, a file's chunks are simply numbered 0 through N; the master maps each logical chunk number (0, 1, ..., N) to an internal chunk handle (the chunk's unique global identifier).

Chunk location management. The master records which chunkservers hold each chunk; since each chunk is saved on multiple chunkservers, this information is what makes chunk location transparency possible. The master communicates with every chunkserver to learn which chunks it holds. When a new chunk is created, the master selects chunkservers according to certain algorithms (by default, three replicas are saved) to store it.

Fault-tolerance handling. A chunkserver may fail while the system is running, in which case the master must replicate the chunks stored on that machine elsewhere (re-replication). When machines are added, chunk placement must be balanced again (rebalancing, which migrates chunks between chunkservers). A sketch of these data structures follows.
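A minimal sketch of these master-side structures under the assumptions above; the field names and the random placement policy are illustrative (GFS's real placement weighs disk utilization and rack layout), not the actual implementation.

```python
import random

class Master:
    def __init__(self, replication: int = 3):
        self.replication = replication
        self.namespace: dict[str, list[str]] = {}  # file path -> ordered chunk handles
        self.locations: dict[str, set[str]] = {}   # chunk handle -> chunkserver addresses
        self.chunkservers: set[str] = set()        # known chunkserver addresses
        self.next_handle = 0

    def create_chunk(self, path: str) -> str:
        """Append a new chunk to a file and place it on `replication`
        chunkservers (chosen at random here, purely for illustration)."""
        handle = f"chunk-{self.next_handle}"
        self.next_handle += 1
        servers = random.sample(sorted(self.chunkservers), self.replication)
        self.namespace.setdefault(path, []).append(handle)
        self.locations[handle] = set(servers)
        return handle

    def lookup(self, path: str, index: int) -> tuple[str, set[str]]:
        """Map (file path, logical chunk index) to (chunk handle, replica set)."""
        handle = self.namespace[path][index]
        return handle, self.locations[handle]
```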
Metadata storage, backup, and recovery. All of the functions above depend on metadata, which is kept mainly in memory. Every change (mutation) to files in the system is recorded in an operation log, similar to the log of a relational database system. The in-memory metadata is also written to disk as checkpoints. From a checkpoint plus the operation log written after it, the system state can be restored: replay all operations performed since the checkpoint. The master backs up its checkpoints and operation log to other machines so they are available for recovery.
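A minimal sketch of checkpoint-plus-log recovery under these assumptions; the JSON record format and function names are hypothetical, chosen only to make the replay loop concrete.

```python
import json

def recover(checkpoint_path: str, log_path: str) -> dict:
    """Restore master state by loading the last checkpoint and replaying
    every operation logged after it."""
    with open(checkpoint_path) as f:
        state = json.load(f)          # snapshot of the in-memory metadata
    with open(log_path) as f:
        for line in f:
            op = json.loads(line)     # one mutation per log record
            apply_op(state, op)
    return state

def apply_op(state: dict, op: dict) -> None:
    """Apply a single logged mutation (only file creation shown here)."""
    if op["type"] == "create":
        state.setdefault("namespace", {})[op["path"]] = []
```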

Consistency Model

Consistency describes whether, and when, other clients can see a change after one client modifies a file's content.
In GFS, a strong consistency model is used: when one client makes a change, all other clients see it immediately, no matter which replica they read. This matches the behavior of an ordinary single-machine file system. However, GFS does not provide concurrency control for file reads and writes: when multiple clients change the same region at the same time, the resulting file content is undefined.

Concurrency Control

When multiple clients run operations at the same time, concurrency control ensures that the system remains correct.

In GFS, file namespace operations are atomic, so multiple clients can perform them at the same time; internally this is enforced by a concurrency-control policy implemented with locks. However, no concurrency control is provided for reads and writes of file content.
GFS also provides an atomic operation for appending records (atomic record append): the client supplies only the data, and GFS appends it to the file atomically at an offset of GFS's own choosing, which is returned to the client.
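The sketch below models the essence of record append in a single process, with a lock standing in for the primary replica's ordering of mutations; the class and method names are illustrative, not GFS's actual interface.

```python
import threading

class AppendOnlyFile:
    """Toy model of atomic record append: the caller supplies only the
    data, and the system picks the offset while holding a lock, so
    concurrent appends never interleave within a record."""
    def __init__(self):
        self._lock = threading.Lock()
        self._data = bytearray()

    def record_append(self, record: bytes) -> int:
        with self._lock:
            offset = len(self._data)  # system-chosen offset, returned to client
            self._data += record
            return offset
```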
Summary: these problems arise in every distributed file system, and different systems may solve them in different ways. When studying a new system, you can focus on these points first.
