Open Source Distributed File System Comparison

Source: Internet
Author: User
Tags glusterfs
from:http://www.lnmpblog.com/archives/323
To reduce costs by using a distributed file system, we looked into open source distributed file systems. After installing, deploying, and testing several of them, I have summarized the problems I ran into in the hope that it helps others. There are also some things I still do not understand, and I hope to discuss them with you so we can make progress together.
First: Ceph
According to information I found online, Ceph performs well, is written in C++, supports FUSE, and has no single point of failure, so I downloaded and installed it. Ceph uses the Btrfs file system, and Btrfs requires Linux kernel 2.6.34 or later; the RHEL5 kernel I was using clearly does not support it. I downloaded the latest kernel to upgrade, but after two days the upgrade still had not succeeded, with each compilation taking about an hour. I eventually found that the latest version of Ubuntu supports Btrfs, so I installed an Ubuntu virtual machine. That solved the Btrfs problem, but the Ceph processes still failed to start, so I was not able to test it.
Ceph uses a fairly advanced algorithm, CRUSH: a scalable pseudo-random data distribution function designed for distributed object-based storage systems, which maps data objects to storage devices efficiently without a central directory. Because large systems are dynamic, CRUSH is designed to make adding or removing storage devices easy while keeping unnecessary data migration to a minimum. The algorithm supports a wide range of data replication and reliability mechanisms, and places data according to user-defined policies that force replicas to be separated across failure domains.
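The idea of locating objects without a central directory can be illustrated with a small sketch. The code below is not CRUSH itself. It uses rendezvous (highest-random-weight) hashing as a much simpler stand-in that shares two of the properties described above: any client can compute an object's location from the object name and the device list alone, and removing a device only moves the objects that were placed on it. It does not model device weights or failure domains. The device names, object names, and the `score` and `place` functions are hypothetical.

```python
import hashlib


def score(obj: str, device: str) -> int:
    """Deterministic pseudo-random score for an (object, device) pair."""
    digest = hashlib.sha256(f"{obj}:{device}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(obj: str, devices: list[str], replicas: int = 2) -> list[str]:
    """Return the `replicas` devices with the highest score for this object."""
    ranked = sorted(devices, key=lambda d: score(obj, d), reverse=True)
    return ranked[:replicas]


if __name__ == "__main__":
    devices = ["osd0", "osd1", "osd2", "osd3"]  # hypothetical device names
    objects = [f"obj{i}" for i in range(1000)]
    before = {o: place(o, devices) for o in objects}
    after = {o: place(o, devices[:-1]) for o in objects}  # "osd3" removed
    moved = sum(1 for o in objects if before[o] != after[o])
    print(f"{moved} of {len(objects)} objects changed placement")
```

Only objects that had a replica on the removed device change placement; every other object keeps exactly the same device list, because the relative ranking of the remaining devices is unchanged.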
In addition, the file system Ceph uses, Btrfs, is regarded as the next-generation Linux file system and has many advanced features.
Btrfs may ultimately pose a serious challenge to ZFS. Its features include online defragmentation (reportedly only for solid-state disks), copy-on-write, data compression, mirroring, data striping, snapshots, and so on.
Btrfs is also more complete than ext for data storage. It incorporates a number of logical volume management and RAID features, checksums both internal metadata and user data, and has built-in snapshots. Ext4 can provide some of these features too, but only through cooperation between the file system and a logical volume manager.
So many advanced features, yet I was unable to enjoy any of them...
Second: GlusterFS
According to what I read online, GlusterFS is fairly good: stable, suitable for large-scale deployments, and, most importantly, free of any single point of failure. It is written in C and supports FUSE, so I downloaded it to investigate. Installation and configuration are simple, and it can be tested as soon as it starts.
It felt really good at first. Later, I used a stress-testing tool to measure its throughput and found that the performance did not meet our production needs; I do not know where the configuration went wrong.
We tested large-file reads and large-file writes, and throughput was around 5 MB/s, which clearly does not meet our requirements. I did not find a specific bottleneck; after all, the program was written by someone else, and tracking down bottlenecks in it is not easy.
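A large-file throughput check of the kind described here can be sketched as follows. This is a minimal illustration, not the stress-testing tool used in the test above; the function names, the 64 MiB file size, and the 1 MiB block size are assumptions. Point `directory` at the mounted file system under test. The read pass may be served from the page cache, so the read figure is only an upper bound unless caches are dropped first.

```python
import os
import tempfile
import time

MIB = 1024 * 1024


def write_throughput(path: str, size_mb: int = 64, block_mb: int = 1) -> float:
    """Write about size_mb MiB sequentially and return MiB/s, including fsync."""
    block = os.urandom(block_mb * MIB)
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(size_mb // block_mb):
            written += f.write(block)
        f.flush()
        os.fsync(f.fileno())  # make sure the data reached the file system
    return written / MIB / (time.perf_counter() - start)


def read_throughput(path: str, block_mb: int = 1) -> float:
    """Read the file sequentially and return MiB/s."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while chunk := f.read(block_mb * MIB):
            total += len(chunk)
    return total / MIB / (time.perf_counter() - start)


if __name__ == "__main__":
    directory = tempfile.gettempdir()  # replace with the mount point under test
    target = os.path.join(directory, "throughput_test.bin")
    try:
        print(f"write: {write_throughput(target):.1f} MiB/s")
        print(f"read:  {read_throughput(target):.1f} MiB/s")
    finally:
        if os.path.exists(target):
            os.remove(target)
```

Running the same script against each candidate file system's mount point gives numbers that are at least comparable with one another, even if they differ from what a dedicated benchmark tool would report.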
For detailed information about GlusterFS, see this fellow's articles; he has studied it in more depth: http://zhoubo.sinaapp.com/?cat=22
Third: MooseFS
Online reports say its performance is good. It does have a single point of failure, is written in C, and supports FUSE, so I downloaded it to try.
Installation and configuration are fairly simple. The environment was set up quickly and I ran the tests. Performance was good, with throughput above 15 MB/s.
Fourth: MogileFS
Online, this is said to have the highest performance, but it is written in Perl and is used through an external API. It is relatively complex to set up, because it requires installing many third-party Perl packages as dependencies, plus a MySQL database.
Once installation was complete and the server was up, I found that clients exist for Java, PHP, Perl, Ruby, and other languages. I need FUSE support, but for this distributed file system FUSE support requires installing a module that bridges Perl and C, and I could not get that module to compile. In the end I could not complete the test, and will have to continue looking into it when I have time.
Fifth: FastDFS
Online it is described as "a key-value file system improved by Chinese developers on the basis of MogileFS, which also does not support FUSE and provides better performance than MogileFS." That is nonsense. MogileFS is written in Perl; if FastDFS were an improvement built on MogileFS, it should also be written in Perl, but the FastDFS code I downloaded is C, so how could it be built on top of MogileFS? Looking at the actual structure of FastDFS, it would be accurate to say that it "borrowed ideas from MogileFS," but not that it was "improved on the basis of MogileFS."
I installed it; installation is also simple. It does not support FUSE; uploading a file generates an HTTP download address, and the file is downloaded over HTTP. This is clearly not suitable for the production environment I have in mind.
Below is a comparison of FastDFS and MogileFS written by another user. It seems fairly objective and accurate, so I am reprinting it here.
The design of FastDFS borrowed some ideas from MogileFS. FastDFS is a complete distributed file storage system in which files are read and written through a client API. FastDFS can be said to have all of the functional features of MogileFS. MogileFS website: http://www.danga.com/mogilefs/.
In addition, compared with MogileFS, FastDFS has the following characteristics and advantages:
1. FastDFS is highly complete and can be used directly without secondary development;
2. Compared with MogileFS, FastDFS drops the tracking database and has only two roles: tracker and storage. This architecture simplifies the system and removes a performance bottleneck;
3. Adding a server of any role to the system is easy. When adding a tracker server, you only need to modify the storage and client configuration files (adding one line of tracker configuration); when adding a storage server, you usually do not need to modify any configuration file, and the system automatically copies the files already on the volume to the new server;
4. FastDFS is more efficient than MogileFS, in the following respects:
(1) As noted in point 2, FastDFS has no file index database, unlike MogileFS, so its overall performance is higher;
(2) In terms of implementation language, FastDFS is lower-level and more efficient than MogileFS. FastDFS is written in C, has fewer than 20,000 lines of code, does not depend on any other open source software or packages, and is particularly simple to install and deploy, whereas MogileFS is written in Perl;
(3) FastDFS communicates directly over sockets, which is more efficient than MogileFS's HTTP-based approach. FastDFS also transfers files with sendfile, a zero-copy mechanism, so system overhead is lower and file transfer is more efficient.
5. FastDFS has detailed design and usage documentation, while MogileFS documentation is relatively scarce.
6. FastDFS logging is very detailed. Any error that occurs while the system is running is written to the log file, which makes it easy for the administrator to locate the cause when there is a problem.
7. FastDFS can also store and retrieve attributes attached to a file (that is, metadata such as file size, image width, and image height), so applications do not need a database to hold this information.
8. Starting with V1.14, FastDFS supports saving only one copy of identical file content, which saves storage space and improves file access performance.
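Point 8 above, storing identical file content only once, can be illustrated with a content-addressed store. This is a minimal in-memory sketch and makes no claim about how FastDFS implements the feature; the `DedupStore` class, its methods, and the use of SHA-256 are assumptions for illustration.

```python
import hashlib


class DedupStore:
    """Toy in-memory store that keeps one copy per distinct content."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}  # content hash -> data
        self._names: dict[str, str] = {}    # file name -> content hash

    def put(self, name: str, data: bytes) -> str:
        """Store data under name and return its content hash."""
        key = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(key, data)   # keep only the first copy
        self._names[name] = key
        return key

    def get(self, name: str) -> bytes:
        """Return the content stored under name."""
        return self._blobs[self._names[name]]

    def stored_copies(self) -> int:
        """Number of distinct contents actually held."""
        return len(self._blobs)


if __name__ == "__main__":
    store = DedupStore()
    store.put("a.jpg", b"same bytes")
    store.put("b.jpg", b"same bytes")
    store.put("c.jpg", b"other bytes")
    print(store.stored_copies())  # prints 2
```

Three names map to two stored blobs, which is the space saving the point describes: the name-to-hash table grows with every upload, but the data itself is kept once per distinct content.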
Sixth: Lustre
I had high hopes for this distributed file system, but after the acquisition by Oracle I could not even find a download address for it. Maddening...
If anyone finds the download address, please let me know. Thank you.
