A comparative introduction to GFS, HDFS, and other distributed file systems


Transferred from: http://www.nosqlnotes.net/archives/119

There are many distributed file systems: GFS, HDFS, Taobao's open-source TFS, Tencent's file system for album storage (Tencent FS, hereafter called QFS for ease of differentiation), Facebook's Haystack, and others. TFS, QFS, and Haystack solve very similar problems with very similar architectures, so these three are collectively called Blob file systems (Blob FS). This article compares these typical file systems from the perspective of distributed architecture.

Let's look at GFS and HDFS first. HDFS can basically be thought of as a simplified implementation of GFS, so the two have many similarities. First, both GFS and HDFS use a single-master, multiple-worker architecture: a single master stores all of the system's metadata and makes the data placement, replication, and backup decisions; the master also checkpoints its metadata and records and replays an operation log. The worker machines store the data and carry out data storage, migration, and computation according to the master's instructions. Second, both GFS and HDFS achieve high reliability and high performance by splitting data into chunks and replicating each chunk (typically 3 replicas). When one replica becomes unavailable, the chunk is automatically re-replicated from the remaining copies. Since the workload is read-mostly, read requests are also spread across the machines holding the replicas, improving the overall throughput of the system. Finally, both GFS and HDFS provide a tree-structured file system namespace, in which files can be copied, renamed, moved, created, and deleted much as under Linux.
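The single-master model described above can be sketched as a toy metadata table: the master maps each file path to an ordered list of chunk IDs, and each chunk ID to the worker machines holding its replicas. This is a hypothetical illustration, not actual GFS or HDFS code; the round-robin placement policy is an assumption made only to keep the sketch short.

```python
# Toy sketch of a single-master metadata model (hypothetical, not real
# GFS/HDFS code). The master tracks file -> chunks and chunk -> replicas.

REPLICATION = 3  # both systems replicate each chunk, typically 3 ways

class ToyMaster:
    def __init__(self, workers):
        self.workers = list(workers)
        self.files = {}        # path -> [chunk_id, ...]
        self.chunks = {}       # chunk_id -> [worker, ...] replica locations
        self.next_chunk_id = 0

    def allocate_chunk(self, path):
        """Assign a new chunk ID for `path` and pick REPLICATION workers."""
        chunk_id = self.next_chunk_id
        self.next_chunk_id += 1
        # Naive round-robin placement; a real master weighs disk usage,
        # rack topology, and load when choosing replicas.
        start = chunk_id % len(self.workers)
        replicas = [self.workers[(start + i) % len(self.workers)]
                    for i in range(REPLICATION)]
        self.files.setdefault(path, []).append(chunk_id)
        self.chunks[chunk_id] = replicas
        return chunk_id, replicas

    def locate(self, path):
        """Return (chunk_id, replica list) for every chunk of `path`."""
        return [(cid, self.chunks[cid]) for cid in self.files.get(path, [])]

master = ToyMaster(["w1", "w2", "w3", "w4"])
master.allocate_chunk("/logs/a")
master.allocate_chunk("/logs/a")
# Reads can then go to any of the 3 replicas of each chunk.
assert master.locate("/logs/a")[0] == (0, ["w1", "w2", "w3"])
```

Note that all of this state lives on one machine; the comparison below of metadata volume in Blob file systems shows where this design stops scaling.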

However, GFS and HDFS differ greatly on several key design points, where HDFS deliberately simplifies away complexity that GFS takes on. First, the most complex part of GFS is concurrent appends to the same file by many clients, i.e., the multi-client concurrent-append model. GFS allows a file to be opened multiple times, or by multiple clients at once, each appending data in units of records. Given that GFS append records range from 16KB to 16MB with an average size around 1MB, contacting the GFS master on every append would clearly be inefficient, so GFS delegates write authority for each chunk to a chunk server through a lease mechanism. A write lease means that one chunk server holds write permission for a chunk within the lease validity period (say, 12 seconds); the chunk server holding the lease is called the primary chunk server for that chunk. If the primary chunk server goes down, the chunk's write lease can be reassigned to another chunk server after the lease expires. With multiple clients appending the same file concurrently, the primary chunk server must impose an order on the records; client writes may be retried, producing duplicate records, and because the client API is asynchronous, records can also arrive out of order. The lease mechanism, and especially the fact that the lease for a chunk can migrate between chunk servers, greatly increases the complexity of the system design and the consistency model under the append model. HDFS, by contrast, allows a file to be opened for appending only once at a time; the client writes data to a local temporary file until the accumulated data reaches one chunk size (typically 64MB), then asks the HDFS master to assign a worker machine and a chunk number, and writes the whole chunk of data to HDFS in one operation.
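The lease mechanism described above can be sketched as follows. This is a hypothetical illustration of the idea, not real GFS code: the master grants one chunk server a time-limited lease, that server acts as primary and orders appends while the lease is valid, and only after expiry can the lease move to another server. The 12-second validity period is the figure assumed in the text.

```python
# Minimal sketch of a write-lease table (hypothetical, not real GFS code).
# While a lease is valid, its holder is the primary chunk server for that
# chunk and clients append through it without contacting the master.

import time

LEASE_SECONDS = 12  # lease validity period assumed in the text

class LeaseManager:
    def __init__(self):
        self.leases = {}  # chunk_id -> (primary_server, expiry_time)

    def grant(self, chunk_id, server, now=None):
        """Grant (or renew) the write lease for a chunk, if it is free."""
        now = time.time() if now is None else now
        primary = self.primary(chunk_id, now)
        if primary is not None and primary != server:
            return False  # another server still holds a valid lease
        self.leases[chunk_id] = (server, now + LEASE_SECONDS)
        return True

    def primary(self, chunk_id, now=None):
        """Return the current primary, or None if no valid lease exists."""
        now = time.time() if now is None else now
        entry = self.leases.get(chunk_id)
        if entry is None or entry[1] <= now:
            return None  # never granted, or expired (e.g. primary crashed)
        return entry[0]

mgr = LeaseManager()
mgr.grant("chunk-42", "cs1", now=0.0)
assert mgr.primary("chunk-42", now=5.0) == "cs1"     # cs1 is primary
assert not mgr.grant("chunk-42", "cs2", now=5.0)     # cs2 must wait
assert mgr.primary("chunk-42", now=13.0) is None     # lease expired
assert mgr.grant("chunk-42", "cs2", now=13.0)        # lease migrates to cs2
```

The waiting-out-the-lease step at the end is exactly why primary failover is safe but not instantaneous: a crashed primary's chunk stays unwritable until its lease runs out.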
Because 64MB of data is accumulated before each actual write to HDFS, the pressure on the HDFS master is kept low, just as GFS keeps it low by delegating write leases to the worker machines; and HDFS avoids the duplicate-record and record-ordering problems entirely, greatly simplifying the system design. The cost is that HDFS cannot support many uses of the append model. Hypertable and HBase, built on HDFS, need HDFS to store the tabular system's operation log; because an HDFS client may buffer up to 64MB of data before it reaches HDFS, if a table service node (corresponding to the tablet server in Bigtable) goes down, some operation log entries may not yet have been written to HDFS, and data may be lost. The second difference is handling of master single-point failure. GFS backs up system metadata in master-slave mode: when the primary master fails, a standby master chosen by distributed election can take over and continue serving. Because replication and master failover carry their own complexity, the HDFS master instead persists its data only to the local machine (possibly written to multiple disks on the master machine, to guard against a single disk failing) and requires manual intervention in the event of a failure. The third difference is snapshot support. GFS uses copy-on-write data structures internally to implement cluster snapshots, while HDFS provides no snapshot functionality. In a large-scale distributed system, program bugs are entirely normal; although most bugs can be fixed, it is hard to repair the damage afterward and restore the system's data to a consistent state, which often requires the underlying system to provide snapshots so that the system can be rolled back to a recent consistent state.
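The copy-on-write idea behind GFS snapshots can be illustrated with a toy sketch (hypothetical, not GFS internals): taking a snapshot only bumps reference counts on existing chunks, so it is a cheap metadata operation; only when a shared chunk is later written does the system actually copy it.

```python
# Toy sketch of copy-on-write snapshots (hypothetical, not real GFS code).
# A snapshot shares chunks with the live file via reference counts; data
# is copied only when a shared chunk is overwritten.

class CowFile:
    def __init__(self):
        self.chunks = {}     # chunk_id -> chunk contents
        self.refcount = {}   # chunk_id -> number of versions sharing it
        self.live = []       # chunk ids making up the live file
        self.snapshots = {}  # snapshot name -> list of chunk ids
        self._next = 0

    def append_chunk(self, data):
        cid = self._next; self._next += 1
        self.chunks[cid] = data
        self.refcount[cid] = 1
        self.live.append(cid)

    def snapshot(self, name):
        # Metadata-only operation: no chunk data is copied here.
        for cid in self.live:
            self.refcount[cid] += 1
        self.snapshots[name] = list(self.live)

    def write_chunk(self, index, data):
        cid = self.live[index]
        if self.refcount[cid] > 1:          # shared with a snapshot:
            self.refcount[cid] -= 1         # copy on write
            new_cid = self._next; self._next += 1
            self.chunks[new_cid] = data
            self.refcount[new_cid] = 1
            self.live[index] = new_cid
        else:
            self.chunks[cid] = data         # exclusive: write in place

    def read(self, version=None):
        ids = self.live if version is None else self.snapshots[version]
        return [self.chunks[cid] for cid in ids]

f = CowFile()
f.append_chunk(b"v1")
f.snapshot("s1")
f.write_chunk(0, b"v2")        # triggers the copy; snapshot s1 is untouched
assert f.read() == [b"v2"]
assert f.read("s1") == [b"v1"]
```

Restoring to a consistent state after a bug then amounts to pointing the live file back at a snapshot's chunk list, which is why snapshots are so valuable for recovery.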

In short, HDFS can basically be considered a simplified version of GFS: for reasons of schedule and of its application scenarios, it simplifies away some GFS functionality, greatly reducing complexity.

The requirements of a Blob file system are actually quite different from those of GFS/HDFS:
(1) GFS and HDFS are general-purpose file systems, on which general-purpose tabular systems such as Bigtable, Hypertable, and HBase can be built, while a Blob file system typically targets BLOB data such as pictures and photo albums.
(2) GFS data is appended to the system bit by bit, while BLOB data is generally written to the Blob file system as one complete BLOB block, for example a user uploading a picture.
(3) GFS is a large-file system designed for throughput, suitable as a base for general-purpose KV or tabular systems, while a Blob file system is a small-file system generally used only to store BLOB data.

GFS and Blob FS also seem to have much in common.
For example, GFS and TFS both currently use a single-master, multiple-worker architecture: the master makes the data placement, replication, and backup decisions, while the workers store the data and carry out storage, migration, and so on according to the master's instructions.

However, the differences between the two are still considerable.
Because the business scenarios differ, the write model of a Blob FS is inherently simpler: the entire BLOB block is ready before writing, so each write requests the master to allocate a BLOB block number and a list of machines, then writes the block to those machines in one shot. The challenge for a Blob FS is instead that the metadata is too large. A Blob FS stores a huge number of BLOB blocks: for the pictures Taobao stores in TFS, assuming each picture's metadata occupies 20 bytes and there are on the order of 10 billion (10G) pictures, the metadata alone occupies 10G × 20 bytes = 200GB, which cannot be held in a single machine's memory, and the data volume keeps expanding rapidly. As a result, TFS, QFS, and Facebook's Haystack all take the same approach: the Blob FS itself does not store this metadata; the metadata is stored in an external system. For example, the metadata for Taobao's TFS is the IDs of the images, and these image IDs are stored in an external database, such as the commodity database, generally an Oracle or sharded MySQL cluster. The Blob FS still organizes its data in chunks: each blob file is a logical file, the chunks are physical files, and many logical files share the same physical file, which keeps the number of physical files on a single worker machine small. Because all the physical-file metadata can then be held in memory, reading a blob logical file costs only one disk IO, which can basically be considered optimal.
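The "many logical files sharing one physical file" layout can be sketched as follows. This is a simplified illustration inspired by the Haystack/TFS design described above, not code from either system; an `io.BytesIO` buffer stands in for the large physical file, and a plain dict stands in for the in-memory index.

```python
# Simplified sketch of a blob volume (hypothetical, inspired by the
# Haystack/TFS design): many small blobs packed into one large physical
# file, with an in-memory index of blob_id -> (offset, size).

import io

class BlobVolume:
    def __init__(self):
        self.volume = io.BytesIO()   # stands in for one large physical file
        self.index = {}              # blob_id -> (offset, size), kept in RAM

    def put(self, blob_id, data):
        offset = self.volume.seek(0, io.SEEK_END)   # append-only volume
        self.volume.write(data)
        self.index[blob_id] = (offset, len(data))

    def get(self, blob_id):
        offset, size = self.index[blob_id]   # in-memory lookup: no disk IO
        self.volume.seek(offset)
        return self.volume.read(size)        # the single "disk" read

store = BlobVolume()
store.put("img-1", b"jpeg-bytes-1")
store.put("img-2", b"jpeg-bytes-2")
assert store.get("img-1") == b"jpeg-bytes-1"
```

Because the per-blob index entry is small and the heavy per-blob metadata (ownership, naming) lives in the external database of IDs, the entire index fits in memory and each read costs exactly one disk IO, as the text notes.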

In short, HDFS and GFS can be considered similar, with GFS basically covering HDFS's functionality, while Blob FS and GFS face different problems and start from different design points; the two kinds of systems differ in essence. To unify GFS and Blob FS into a single system, that system would need to support both large and small files, and the master itself would need to be distributed because the metadata it stores is too large. The complexity of such a unified system is very high; for now, probably only GFS v2 could achieve it.
