Comparison of popular open-source Distributed File Systems

Source: Internet
Author: User

The following content is reproduced from http://shen2.cn/tag/moosefs/

I have a massive number of data files (10 million) that need to be stored where other computers can easily access them. The data is priceless, so I also want the file system to provide redundancy.

The first thing I noticed was Eucalyptus, the platform behind Ubuntu Enterprise Cloud. It provides cloud computing interfaces that are almost fully compatible with AWS (Amazon Web Services), so it seemed like a reliable cloud storage solution.

Eucalyptus imitates Amazon's S3 service and provides a storage service component called Walrus.

However, after some exploration, I found that Eucalyptus is not easy to love.

On the one hand, Eucalyptus is difficult to configure and poorly documented, and almost no help can be found on the Internet.

On the other hand, although Eucalyptus is theoretically compatible with AWS's EC2 and S3, in practice it is not: many tools that work with AWS do not work with Eucalyptus.

Most importantly, I had not realized that Walrus is not the redundant cloud storage system I thought it was. It is just single-host software that implements the S3 interface.

In fact, Walrus has nothing to do with Eucalyptus's other storage component, the SC (Storage Controller). Walrus only provides an interface consistent with S3; its implementation offers no redundancy, and it cannot be deployed across multiple servers.

So I began looking for a real distributed file system to solve my storage problem. Once I started looking, I found a seemingly endless variety of distributed file systems on offer. Here are several of the major ones:

MogileFS: a key-value style file system that does not support FUSE. Applications must access it through its API. It is mainly used in the web field to handle massive numbers of small images, where it is much more efficient than MooseFS.

FastDFS: an improved key-value file system modeled on MogileFS. It also does not support FUSE, and it provides better performance than MogileFS.

MooseFS: supports FUSE and is relatively lightweight. It has a single point of dependency on the master server. It is written in C (MogileFS is the one written in Perl), and its performance is relatively poor. It is widely used in China.

GlusterFS: supports FUSE; a larger, heavier-weight system than MooseFS.

Ceph: supports FUSE, and its client has been merged into the Linux 2.6.34 kernel, so you can select Ceph as a file system just like ext3 or ReiserFS. It is completely distributed and has no single point of dependency. It is written in C/C++ and has good performance. It is built on btrfs, which is still immature, and Ceph itself is also very immature.

Lustre: an enterprise-grade product from Oracle that depends heavily on the kernel and ext3.

NFS: a long-established network file system. I do not know much about it. As far as I can tell it has seen little development in recent years, so I ruled it out.

I originally intended to use MogileFS, because it has the most users and my main needs are web-related.

However, after studying its API, I found that a key-value file system has no directory structure, so you cannot list all the files in a subdirectory or work with it as you would a local file system. Everything has to go through the API, which is quite unpleasant.

MogileFS may have been influenced by the listening port plus API model of memcached, another well-known product from the same development team. Or perhaps FUSE had simply not become popular yet when MogileFS was first designed.
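The difference can be illustrated with a small Python sketch. `FlatKeyValueStore` is a toy in-memory class invented for this illustration, not the MogileFS client API, and the key names are made up. It only shows why "listing a folder" is awkward when keys are opaque strings, while a mounted file system answers the same question with an ordinary OS call.

```python
import os
import tempfile


class FlatKeyValueStore:
    """Toy stand-in for a key-value file store: opaque keys, no directories."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]

    def keys_with_prefix(self, prefix):
        # There is no directory index, so this scans every key in the store.
        return sorted(k for k in self._blobs if k.startswith(prefix))


def list_subdirectory(root, subdir):
    # On a mounted file system (for example via FUSE) the normal call is enough.
    return sorted(os.listdir(os.path.join(root, subdir)))


def demo():
    store = FlatKeyValueStore()
    store.put("images/2010/a.jpg", b"a")
    store.put("images/2010/b.jpg", b"b")
    store.put("images/2011/c.jpg", b"c")
    from_store = store.keys_with_prefix("images/2010/")

    # A temporary directory stands in for a mount point here.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "images", "2010"))
        for name in ("a.jpg", "b.jpg"):
            with open(os.path.join(root, "images", "2010", name), "wb") as f:
                f.write(b"x")
        from_mount = list_subdirectory(root, os.path.join("images", "2010"))
    return from_store, from_mount
```

With the key-value store, every tool that touches the data has to be written against the store's API; with a mount, existing programs work unchanged.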

In short, I was determined to find a distributed file system with FUSE support, which narrowed the choice to MooseFS, GlusterFS, and Ceph. From a technical point of view, Ceph should be the best: it is written in C/C++, its client is in the Linux 2.6.34 kernel, and it is built on the btrfs file system, all of which should give it high performance, while its multi-master structure eliminates the single point of dependency and so achieves high availability. However, Ceph is too immature, and the btrfs it is built on is not mature either. Its official website also explicitly states that Ceph should not be used in production.

In addition, Ceph has few users in China. Among Linux distributions, Ubuntu 10.04 ships kernel 2.6.32, so the in-kernel Ceph client cannot be used directly.
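The kernel requirement can be checked mechanically. The following is a minimal sketch: the helper names are invented for this illustration, and the 2.6.34 threshold is taken from the text above. A matching version number does not by itself prove that the Ceph module is built for a given kernel.

```python
import re

# First kernel release that includes the Ceph client, according to the article.
CEPH_CLIENT_MIN_KERNEL = (2, 6, 34)


def parse_kernel_release(release):
    """Extract (major, minor, patch) from a string such as '2.6.32-21-generic'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    # A missing patch component (for example '3.2') is treated as 0.
    return (int(major), int(minor), int(patch or 0))


def kernel_has_ceph_client(release):
    """Compare the parsed release against the minimum version as tuples."""
    return parse_kernel_release(release) >= CEPH_CLIENT_MIN_KERNEL
```

On a live machine the release string could come from `platform.release()` in the standard library. A 2.6.32 release string, as on Ubuntu 10.04, falls below the threshold.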

GlusterFS is better suited to large-scale deployments and has a poor reputation, so I did not consider it.

Finally, I chose MooseFS, whose advantages and disadvantages are equally plain. It has a single point of dependency on the master, and the master consumes a great deal of memory. But MooseFS is enough to meet my storage needs. It has a large user base in China, and many of those users run it in production, which strengthened my confidence in the choice.

We plan to use a high-performance server (dual-socket Xeon 5500, 24 GB of memory) as the master and two HP DL360 G4 servers (six 146 GB SCSI disks) as chunk servers, building a two-tier distributed file system for every server in our web service to use.
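As a rough sanity check on the master's memory, the sketch below estimates RAM use from the file count. The figure of about 300 MiB per million files is an assumption (a rule of thumb I recall from MooseFS documentation, not something stated in this article), and the function names and the 50% headroom are likewise hypothetical. Treat the result as an order-of-magnitude estimate only.

```python
# ASSUMPTION: roughly 300 MiB of master RAM per million files.
# This rule of thumb is not from the article; verify it for your MooseFS version.
ASSUMED_MIB_PER_MILLION_FILES = 300


def estimate_master_ram_gib(file_count, mib_per_million=ASSUMED_MIB_PER_MILLION_FILES):
    """Estimate master metadata RAM in GiB for a given number of files."""
    return file_count / 1_000_000 * mib_per_million / 1024


def fits_in_master(file_count, master_ram_gib, headroom=0.5):
    """True if the estimate stays within a `headroom` fraction of installed RAM."""
    return estimate_master_ram_gib(file_count) <= master_ram_gib * headroom
```

Under that assumption, 10 million files need about 2.9 GiB, comfortably inside a 24 GB master even if the file count grows several times over.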
