A comparative introduction to GFS, HDFS, and other distributed file systems


Reposted from: http://www.nosqlnotes.net/archives/119

There are many distributed file systems, including GFS, HDFS, Taobao's open-source TFS, Tencent's TFS for Qzone album storage (Tencent FS, hereafter called QFS to distinguish the two), and Facebook's Haystack. Among them, TFS, QFS, and Haystack solve similar problems with similar architectures; these three are known as BLOB file systems (BLOB FS). This article compares these typical file systems from the perspective of distributed architecture.

Let's look at GFS and HDFS first. HDFS is generally considered a simplified version of GFS, and the two have a lot in common. First, both adopt a single-master + multiple-worker architecture: the master stores all of the system's metadata and makes the data placement, replication, and backup decisions; it also implements metadata checkpointing and operation-log recording and replay. The worker machines store the data and, following the master's instructions, carry out data storage, data migration, data computation, and so on. Second, both GFS and HDFS achieve high reliability and high performance through data chunking and replication (multiple replicas, typically 3). When one replica becomes unavailable, the system automatically re-replicates the data. Meanwhile, since workloads read far more than they write, read requests can be served by any of the machines holding a replica, improving overall system throughput. Finally, both GFS and HDFS provide a Linux-like tree-structured file system supporting file copy, rename, move, create, and delete operations, as well as simple permission management.
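The single-master design described above can be sketched as a toy model. This is illustrative only: names such as `Master.create_chunk` and the random placement policy are assumptions, not the real GFS or HDFS interfaces.

```python
import random

class Master:
    """Toy single-master metadata server (illustrative, not the real GFS/HDFS API)."""
    def __init__(self, workers, replicas=3):
        self.workers = list(workers)      # worker machine ids
        self.replicas = replicas          # replication factor (typically 3)
        self.chunks = {}                  # chunk id -> list of workers holding a replica
        self.next_id = 0

    def create_chunk(self):
        # Placement decision: pick `replicas` distinct workers for the new chunk.
        chunk_id = self.next_id
        self.next_id += 1
        self.chunks[chunk_id] = random.sample(self.workers, self.replicas)
        return chunk_id, self.chunks[chunk_id]

    def locate(self, chunk_id):
        # Reads may go to any replica, spreading read load across machines.
        return random.choice(self.chunks[chunk_id])

    def handle_worker_failure(self, dead):
        # Re-replication decision: restore the replica count for every chunk
        # that lost a copy on the dead worker.
        self.workers.remove(dead)
        for holders in self.chunks.values():
            if dead in holders:
                holders.remove(dead)
                spare = [w for w in self.workers if w not in holders]
                holders.append(random.choice(spare))

master = Master(workers=["w1", "w2", "w3", "w4"])
cid, holders = master.create_chunk()
master.handle_worker_failure("w1")
assert len(master.chunks[cid]) == 3   # replication factor restored
```

The point of the sketch is that all placement, replication, and failure-recovery decisions live in one process, which is what makes the metadata path simple and the master a single point of failure.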

However, GFS and HDFS differ greatly on several key design points, where HDFS sidesteps GFS's complexity through a number of simplifications. First, the most complex part of GFS is concurrent appends to the same file by multiple clients, the multi-client concurrent append model. GFS allows a file to be opened by multiple clients at the same time, appending data in units of records. GFS append records range from 16KB to 16MB, averaging about 1MB, so contacting the GFS master on every append would clearly be inefficient; instead, GFS delegates write permission for each chunk to a chunk server through a lease mechanism. A write lease means a chunk server holds write permission for a chunk within the lease's validity period (60 seconds in the GFS paper); the chunk server holding the lease is called the primary chunk server. If the primary chunk server goes down, the chunk's write lease can be assigned to another chunk server once the lease expires. Concurrent appends to the same file require the chunk server to serialize the records; a client's write may be retried, producing duplicate records, and since the client API is asynchronous, records can also end up out of order. These append-model problems (duplicate and out-of-order records), together with the lease mechanism, especially the fact that a chunk's lease can migrate between chunk servers, greatly increase the complexity of the system design and the consistency model. HDFS, by contrast, allows a file to be opened for append by only one client at a time: the client writes all data to a local temporary file and, once the data reaches one chunk in size (typically 64MB), asks the HDFS master to assign a worker machine and a chunk number, then writes the chunk of data to the HDFS file in one shot.
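The duplicate-record problem described above can be illustrated with a toy model of retried appends. The class and function names below are illustrative, not GFS's actual RPC interface; the sketch only shows why client retries turn record append into at-least-once semantics.

```python
class FlakyChunk:
    """Toy primary chunk server whose reply to the client can get lost."""
    def __init__(self, fail_replies=0):
        self.records = []
        self.fail_replies = fail_replies

    def append(self, record):
        self.records.append(record)        # the write is applied...
        offset = len(self.records) - 1
        if self.fail_replies > 0:
            self.fail_replies -= 1
            raise TimeoutError("reply to the client was lost")
        return offset                      # ...but the ack may never arrive

def client_append(chunk, record):
    # GFS-style clients retry failed appends, so a record whose ack was
    # lost ends up in the chunk twice: at-least-once record append.
    while True:
        try:
            return chunk.append(record)
        except TimeoutError:
            continue

chunk = FlakyChunk(fail_replies=1)
client_append(chunk, "r1")   # ack lost once, so the client retries
client_append(chunk, "r2")
print(chunk.records)         # ['r1', 'r1', 'r2']: 'r1' was applied twice
```

Applications on GFS therefore have to tolerate or filter duplicates themselves (e.g. with record checksums or sequence numbers), which is part of the consistency-model complexity the paragraph above refers to.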
Because 64MB of data is accumulated before each actual write to HDFS, the HDFS master is under little pressure; there is no need for a GFS-style mechanism delegating write leases to worker machines, and there are no duplicate or out-of-order records, which greatly simplifies the system design. We must be aware, however, that Hypertable and HBase, which are built on HDFS, still have to use HDFS to store the table system's operation logs despite the append model's problems. Because the HDFS client must accumulate up to 64MB of data before writing it out, if a table service node in Hypertable or HBase (corresponding to a tablet server in Bigtable) goes down, data not yet written to HDFS may be lost. The second difference is handling of the master's single point of failure. GFS backs up the system metadata in primary/backup mode: when the primary master fails, a backup master can be chosen through a distributed election to continue providing service. Because replication and primary/backup failover carry their own complexity, the HDFS master instead persists its data only to the local machine (possibly to multiple disks on the master machine, to guard against the loss of one disk) and requires human intervention if a failure occurs. Another point is snapshot support. GFS uses a copy-on-write data structure internally to implement cluster snapshots, while HDFS provides no snapshot functionality. In a large-scale distributed system it is normal for programs to have bugs; although bugs can usually be fixed, it is difficult to restore the system's data to a consistent state through compensating operations alone, so the underlying system often needs to provide snapshots in order to roll the system back to a recent consistent state.
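The HDFS client-side write path described above can be sketched as a buffering writer. `ToyMaster`, `allocate`, and `store` are hypothetical names used for illustration; the demo scales the 64MB chunk size down to 64 bytes so it runs instantly.

```python
class ToyMaster:
    """Stand-in for the HDFS master: hands out chunk ids and records stores."""
    def __init__(self):
        self.calls = 0                      # master round-trips
        self.stored = {}                    # chunk id -> (worker, size)

    def allocate(self):
        self.calls += 1
        return self.calls, "worker-%d" % (self.calls % 3)

    def store(self, worker, chunk_id, data):
        self.stored[chunk_id] = (worker, len(data))

class BufferingWriter:
    """Toy model of the HDFS client write path: data piles up in a local
    buffer and is flushed only when a full chunk is ready."""
    def __init__(self, master, chunk_size=64 * 1024 * 1024):   # 64MB by default
        self.master = master
        self.chunk_size = chunk_size
        self.buffer = bytearray()

    def write(self, data):
        self.buffer.extend(data)
        while len(self.buffer) >= self.chunk_size:
            chunk = bytes(self.buffer[: self.chunk_size])
            del self.buffer[: self.chunk_size]
            # One master round-trip per full chunk, not per record appended:
            chunk_id, worker = self.master.allocate()
            self.master.store(worker, chunk_id, chunk)
        # Whatever remains in self.buffer is lost if the client crashes here,
        # which is exactly the log-loss risk noted above for Hypertable/HBase.

master = ToyMaster()
w = BufferingWriter(master, chunk_size=64)   # scaled down from 64MB for the demo
w.write(b"x" * 130)                          # 130 bytes -> two full "chunks"
assert master.calls == 2 and len(w.buffer) == 2
```

The trailing 2 bytes left in the buffer correspond to the unflushed tail of an operation log: cheap for the master, dangerous for a crashed tablet server.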

In short, HDFS can be considered a simplified version of GFS; for reasons of timing and application scenarios, it pared down GFS's functionality and greatly reduced the complexity.

The problems a BLOB file system must solve are actually different from those of GFS/HDFS.
(1) GFS and HDFS are more general-purpose: general table systems such as Bigtable, Hypertable, and HBase can be built on top of them, whereas a BLOB file system typically stores BLOB data such as photo-album pictures.
(2) Data is appended to GFS bit by bit, while BLOB data is typically a complete BLOB block that is ready before it is written to the BLOB file system, for example a picture a user uploads.
(3) GFS is a large-file system designed for throughput, on which a general KV store or a general table system can be built, while a BLOB file system is a small-file system that is generally used only to store BLOB data.
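Difference (2) can be sketched as follows: the client hands over a complete blob, the master allocates a block number and a machine list, and the blob is written to each machine in one shot, with no append protocol at all. The `BlobFS`/`put` names and the placement policy are illustrative assumptions, not any real TFS or Haystack API.

```python
class BlobFS:
    """Toy whole-blob write path: one allocation, one single-shot write
    per assigned machine (names are illustrative)."""
    def __init__(self, machines, replicas=2):
        self.machines = {m: {} for m in machines}   # machine -> {block id: data}
        self.replicas = replicas
        self.next_block = 0

    def put(self, blob):
        # The blob is complete before the write begins: no record append,
        # no partial states, just a block number plus a machine list.
        block_id = self.next_block
        self.next_block += 1
        targets = list(self.machines)[: self.replicas]
        for m in targets:
            self.machines[m][block_id] = blob
        return block_id

fs = BlobFS(["m1", "m2", "m3"])
photo = b"\x89PNG..."            # e.g. a user-uploaded picture
bid = fs.put(photo)
assert fs.machines["m1"][bid] == photo == fs.machines["m2"][bid]
```

Because every write is an atomic "store this whole block", the consistency questions that dominate GFS's append path simply do not arise here.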

On the surface, GFS and BLOB FS have much in common.
For example, both GFS and TFS currently use a single-master + multiple-worker architecture: the master makes the data placement, replication, and backup decisions, while the workers store the data and carry out storage, migration, and other operations as instructed by the master.

However, the differences between the two are still considerable.
Because the business scenarios differ, the two systems face different problems. In a BLOB FS, the entire BLOB block is ready before writing, so the write model is inherently simpler: each write asks the master to allocate a BLOB block number and a list of machines to write to, then writes to those machines in a single pass. The challenge for a BLOB FS is instead that the metadata is too large. The number of BLOB blocks a BLOB FS stores is typically enormous; for example, Taobao's TFS stores on the order of 10 billion (10G) pictures. Assuming each picture's metadata occupies 20 bytes, the metadata alone occupies 10G × 20 = 200GB, beyond a single machine's memory, and the data volume grows quickly. As a result, TFS, QFS, and Facebook Haystack all adopted almost the same idea: the BLOB FS itself does not hold per-blob metadata; that metadata is stored in an external system. For example, the metadata in Taobao TFS is the picture's ID, which is stored in an external database such as the commodity database, typically an Oracle or MySQL sharding cluster. Internally, a BLOB FS also organizes data into chunk blocks: each BLOB file is a logical file, the chunk blocks are physical files, and multiple logical files share the same physical file, reducing the number of physical files on a single machine. Since the metadata of all physical files fits in memory, reading a BLOB logical file takes only one disk I/O, which is generally considered optimal.
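The "one disk I/O per read" property rests on keeping each physical file's index entirely in memory, Haystack-style. Here is a minimal sketch under that assumption: a flat blob-id → (offset, size) map over one large physical file, with illustrative names (`PhysicalFile`, `append_blob`) rather than any real TFS/Haystack interface.

```python
import os
import tempfile

class PhysicalFile:
    """Many logical blobs packed into one physical file; the in-memory index
    maps blob id -> (offset, size), so each read is a single seek + read."""
    def __init__(self, path):
        self.path = path
        self.index = {}                      # blob id -> (offset, size); ~20 bytes/entry
        open(path, "wb").close()             # create the (empty) physical file

    def append_blob(self, blob_id, data):
        with open(self.path, "ab") as f:
            offset = f.tell()                # blobs are packed back to back
            f.write(data)
        self.index[blob_id] = (offset, len(data))

    def read_blob(self, blob_id):
        offset, size = self.index[blob_id]   # in-memory lookup, no disk I/O
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(size)              # the single disk I/O

path = os.path.join(tempfile.mkdtemp(), "chunk_0001.dat")
pf = PhysicalFile(path)
pf.append_blob(42, b"first picture")
pf.append_blob(43, b"second picture")
assert pf.read_blob(43) == b"second picture"
```

Packing many logical blobs into one physical file is what keeps the per-machine file count, and hence the index, small enough to pin in memory.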

In short, HDFS and GFS can be considered similar, with GFS basically covering HDFS's functionality, whereas BLOB FS and GFS face different problems and start from different design points; the two kinds of system differ fundamentally. If GFS and BLOB FS were to be unified into one system, it would have to support both large and small files, and the master node itself would need a distributed design because of the volume of metadata it holds. The complexity of such a unified system is very high; at present only GFS v2 is likely to achieve it.
