Comparison between independent MDS and no independent MDS in a distributed file system

Source: Internet
Author: User
Tags gluster

Metadata is a key element of a file system. The core of every distributed file system is the design of its MDS (metadata server).


Distributed file systems such as HDFS, Lustre, and FastDFS adopt an independent MDS architecture. Ceph also uses MDS, in a design where the MDS itself is distributed. Gluster, by contrast, stores metadata together with the data files: each node basically stores only the metadata related to its local files. Gluster is evaluated here as the representative of systems without an independent MDS.

This article is a personal comparison between having an independent MDS and not having one. You are welcome to discuss it.


Compared with Gluster, an independent MDS has the following advantages:

First, traversal operations are much faster than in Gluster, because the MDS can store the directory tree flexibly and centrally. A lookup basically only needs to query the MDS to find the content.
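As a rough illustration of this first point, the sketch below contrasts listing a directory through a central MDS with listing it when every storage node only knows about its own local files. The names `CentralMDS`, `StorageNode`, and `list_dir_without_mds` are hypothetical and are not the API of HDFS, Gluster, or any other real system; real implementations are far more elaborate.

```python
class CentralMDS:
    """Hypothetical central metadata server that holds the directory tree."""

    def __init__(self):
        self.children = {}  # directory path -> set of file names inside it

    def add_file(self, path):
        parent, _, name = path.rpartition("/")
        self.children.setdefault(parent or "/", set()).add(name)

    def list_dir(self, directory):
        # A single lookup in one place answers the query.
        return sorted(self.children.get(directory, set()))


class StorageNode:
    """Hypothetical storage node that only knows the metadata of its local files."""

    def __init__(self):
        self.local_files = set()

    def add_file(self, path):
        self.local_files.add(path)

    def list_dir(self, directory):
        prefix = directory.rstrip("/") + "/"
        return {
            p[len(prefix):]
            for p in self.local_files
            if p.startswith(prefix) and "/" not in p[len(prefix):]
        }


def list_dir_without_mds(nodes, directory):
    # Every node has to be asked, and the partial answers merged.
    names = set()
    for node in nodes:
        names |= node.list_dir(directory)
    return sorted(names)


paths = ["/data/a.txt", "/data/b.txt", "/data/c.txt", "/logs/x.log"]
mds = CentralMDS()
nodes = [StorageNode() for _ in range(3)]
for i, path in enumerate(paths):
    mds.add_file(path)
    nodes[i % 3].add_file(path)  # spread the files over the nodes

print(mds.list_dir("/data"))
print(list_dir_without_mds(nodes, "/data"))
```

Both calls return the same listing, but the second one has to contact every node, so its cost grows with the size of the cluster.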

Second, targeted acceleration can be achieved for small files, and for application scenarios that issue a large number of metadata queries (many operations such as rename and stat) but not necessarily many actual file operations. In addition, a preliminary separation of metadata queries from read/write I/O can be implemented.

In addition, it is easy to implement file chunking and deduplication. The MDS can record the location of each data block as well as a feature value (such as a hash) of each data block. For deduplication, it is enough to record the block location in the MDS, so that identical data blocks of different files point to the same data block location.
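The deduplication idea can be sketched as follows. This is a minimal in-memory model under stated assumptions: `DedupMDS`, its fields, the SHA-256 hash, and the tiny block size are all chosen for illustration, and the list `stored_blocks` merely stands in for the data nodes.

```python
import hashlib


class DedupMDS:
    """Hypothetical MDS that records block locations and block hashes."""

    def __init__(self, block_size=4):
        self.block_size = block_size  # tiny on purpose, for the demo
        self.block_location = {}      # block hash -> index into stored_blocks
        self.file_blocks = {}         # file path -> ordered list of block hashes
        self.stored_blocks = []       # stand-in for the data nodes

    def write_file(self, path, data):
        hashes = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.block_location:
                # New content: store the block once and remember where it is.
                self.stored_blocks.append(block)
                self.block_location[digest] = len(self.stored_blocks) - 1
            hashes.append(digest)
        self.file_blocks[path] = hashes

    def read_file(self, path):
        return b"".join(
            self.stored_blocks[self.block_location[h]]
            for h in self.file_blocks[path]
        )


dedup = DedupMDS(block_size=4)
dedup.write_file("/a", b"AAAABBBBCCCC")
dedup.write_file("/b", b"AAAAXXXXCCCC")
print(len(dedup.stored_blocks))  # 4 physical blocks back 6 logical blocks
```

The two files share the blocks `AAAA` and `CCCC`, so only four blocks are physically stored, and both files still read back unchanged.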


It also has the following disadvantages:

First, the MDS may become the capacity and performance bottleneck of the entire distributed file system. As the storage scale grows, the amount of metadata grows accordingly, so the number of MDS instances has to be increased, which introduces new problems.
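The original does not say which new problems it has in mind. One possible example, offered here purely as an assumption, is metadata placement: if paths are assigned to MDS instances by a simple hash-modulo scheme (the hypothetical `mds_for` below), adding an instance changes the owner of many existing entries, which then have to be migrated.

```python
import hashlib


def mds_for(path, mds_count):
    """Pick an MDS index for a path by hashing it (hypothetical placement scheme)."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % mds_count


paths = [f"/data/file{i}" for i in range(1000)]
before = {p: mds_for(p, 2) for p in paths}  # two MDS instances
after = {p: mds_for(p, 3) for p in paths}   # a third instance is added
moved = sum(1 for p in paths if before[p] != after[p])
print(f"{moved} of {len(paths)} entries change their MDS when growing from 2 to 3")
```

Schemes such as consistent hashing or directory-subtree partitioning reduce this movement, at the price of extra design complexity.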

Second, the data safety of the MDS. If the metadata is lost, the storage of the entire cluster is paralyzed and all data becomes unavailable. The MDS itself therefore needs a high-availability design, which in turn introduces arbitration issues and cluster locks for the case where data inconsistency occurs (synchronization between the MDS instances).
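The original does not specify how arbitration should work. A minimal sketch of one common approach, majority voting among MDS replicas, is shown below; the function `arbitrate` and the shape of the replica answers are assumptions made for illustration.

```python
from collections import Counter


def arbitrate(replica_views):
    """Return the answer given by a strict majority of replicas, or None."""
    if not replica_views:
        return None
    value, votes = Counter(replica_views).most_common(1)[0]
    return value if votes > len(replica_views) // 2 else None


# Each replica reports (version, block location) for the same file.
views = [(2, "node3"), (2, "node3"), (1, "node1")]
print(arbitrate(views))                           # the answer two of three agree on
print(arbitrate([(2, "node3"), (1, "node1")]))    # no majority, so None
```

With an even split no answer wins, which is why such setups usually run an odd number of replicas.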

In addition, synchronization between data and metadata needs to be considered.

This article is from the "einstcrazy" blog, please be sure to keep this source http://einst.blog.51cto.com/9493625/1567525
