To cut costs by using a distributed file system, we went looking for an open-source one.
After installing, deploying, and testing several of them, I am summarizing the problems I ran into in the hope that it helps you. There are also some things I still don't understand, so I hope we can exchange ideas and make progress together.
First: Ceph
According to material I found online, Ceph has the highest performance, is written in C++, supports FUSE, and has no single point of failure (SPOF), so I downloaded and installed it. Ceph uses the btrfs file system, and btrfs needs support from a Linux kernel of version 2.6.34 or later. The RHEL 5 kernel I use obviously does not support btrfs, so I downloaded the latest kernel to upgrade. After two days the upgrade still had not succeeded, and each compile took more than an hour. In the end I found that the latest Ubuntu supports btrfs, so I installed an Ubuntu virtual machine. Setting up btrfs worked, but Ceph failed with an error on startup, so I never got as far as testing it.
Ceph uses a fairly advanced algorithm called CRUSH. According to the translated description, CRUSH is a scalable pseudo-random data distribution function designed for distributed object-based storage systems; it maps data objects to storage devices efficiently without going through a central directory. Because large systems are inherently dynamic, CRUSH is designed so that storage devices can be added or removed easily while unnecessary data migration is kept to a minimum. The algorithm accommodates a wide range of data replication and reliability mechanisms and distributes data according to user-defined policies that force replicas to be placed in separate failure domains.
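To make the idea of directory-free placement more concrete, here is a minimal sketch. It is not the CRUSH algorithm itself; it uses rendezvous (highest-random-weight) hashing as a simpler stand-in that shares the properties described above: any client can compute where an object lives from the object name and the device list alone, replicas land in different failure domains, and adding a device moves only the objects that the new device takes over. The names `score`, `place`, the `osdN` devices, and the rack labels are all hypothetical.

```python
import hashlib


def score(object_id, device):
    """Deterministic pseudo-random score for an (object, device) pair."""
    digest = hashlib.sha256(f"{object_id}:{device}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_id, devices, replicas=2):
    """Pick `replicas` devices for an object, at most one per failure domain.

    `devices` maps a device name to its failure domain (for example a rack).
    No central directory is consulted: the result depends only on the inputs.
    """
    ranked = sorted(devices, key=lambda d: score(object_id, d), reverse=True)
    chosen, used_domains = [], set()
    for dev in ranked:
        if devices[dev] not in used_domains:
            chosen.append(dev)
            used_domains.add(devices[dev])
        if len(chosen) == replicas:
            break
    return chosen


if __name__ == "__main__":
    # Hypothetical cluster: five devices spread over three racks.
    cluster = {"osd1": "rackA", "osd2": "rackA",
               "osd3": "rackB", "osd4": "rackB", "osd5": "rackC"}
    objects = [f"obj{i}" for i in range(1000)]
    before = {o: place(o, cluster) for o in objects}

    # Add one device in a new rack and count how many placements change.
    grown = dict(cluster, osd6="rackD")
    moved = sum(before[o] != place(o, grown) for o in objects)
    print(f"{moved} of {len(objects)} objects changed placement")
```

In this sketch an object's placement changes only when the new device becomes one of its replicas; everything else stays where it was. Real CRUSH works on a weighted hierarchy of buckets and rules, which this example does not attempt to model.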
In addition, Ceph uses the btrfs file system, which has many advanced features and is aimed at being the next-generation Linux file system.
Btrfs may eventually become a serious rival to ZFS and similar file systems. It offers online defragmentation (which the material I read associates only with solid-state disks), copy-on-write, data compression, mirroring, data striping, and snapshots.
Btrfs is also better than ext at data storage. It includes some of the functions of logical volume management and hardware RAID, can verify both internal metadata and user data, and has built-in snapshots. Ext4 can achieve the same things, but only through coordination between the file system and the logical volume manager.
It's a pity that I can't enjoy any of these advanced features for now...
Second: GlusterFS
People online say that GlusterFS is good, stable, and suitable for large-scale deployments. The key point is that it has no single point of failure (SPOF). It is written in C and supports FUSE, so I downloaded and installed it to study it. Installation and configuration were fairly simple, and I started testing once it was up.
It looked really nice at first. Then I used a stress-testing tool to measure its throughput and found that the performance could not meet our production needs, and I don't know where the configuration went wrong.
We tested reads and writes on both small and large files. The throughput was about 5 Mb/s, which clearly does not meet our requirements, but I could not find a specific bottleneck. After all, it is hard to track down a bottleneck in a program written by someone else.
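For readers who want to reproduce this kind of measurement, the following is a rough sketch of a sequential write test, not the stress-testing tool I used. It writes a temporary file into a directory and reports MiB per second. The function name `write_throughput_mib_s` and the mount point `/mnt/glusterfs` are assumptions for illustration; point it at wherever your file system is mounted.

```python
import os
import tempfile
import time


def write_throughput_mib_s(directory, size_mib=64, block_kib=1024):
    """Write `size_mib` MiB into a temp file in `directory`; return MiB/s."""
    block = os.urandom(block_kib * 1024)
    blocks = (size_mib * 1024) // block_kib
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            # Force the data out of the page cache so the timing
            # reflects the file system rather than local memory.
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.unlink(path)
    return (blocks * block_kib / 1024) / elapsed


if __name__ == "__main__":
    # Hypothetical FUSE mount point; change it to your own.
    target = "/mnt/glusterfs"
    if os.path.isdir(target):
        print(f"{write_throughput_mib_s(target):.1f} MiB/s")
```

Running it with a small `block_kib` and again with a large one gives a first hint of whether per-request overhead or raw bandwidth is the limiting factor, though it is no substitute for a proper benchmark.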
For more detail on GlusterFS, you can refer to this fellow's articles; he has dug into it deeply.
http://zhoubo.sinaapp.com/?cat=22
Third: MooseFS
People online say it performs well. It does have a single point of failure (SPOF), is written in C, and supports FUSE, so I downloaded it to give it a try.
Installation and configuration were fairly simple, and the environment was up quickly, so I ran the tests. The performance was good: throughput was more than 15 MB/s.
Fourth: MogileFS
People online say this one has the best performance, but it is written in Perl and only exposes APIs for external use. It is relatively complicated to set up because many third-party Perl modules it depends on have to be installed, and a MySQL database must be installed as well to support it.
After installation, I started the server. Client libraries exist for Java, PHP, Perl, Ruby, and so on. What I need is FUSE support, but for this distributed file system that requires installing a Perl module that lets Perl communicate with C. I could not get that module to compile, so the test did not succeed. I will have to come back to it when I have time.
Fifth: FastDFS
People online say that it is "a key-value file system improved by Chinese developers on the basis of MogileFS; it also does not support FUSE and offers better performance than MogileFS". Isn't that a joke? MogileFS is written in Perl, so if FastDFS were an improvement built on MogileFS, it should be written in Perl too. But when I downloaded the FastDFS code, it was all C. How could it have been built on MogileFS? Looking at the actual structure of FastDFS, it is more accurate to say that it borrows ideas from MogileFS rather than that it is "improved on the basis of MogileFS".
Installation is simple, and it does not support FUSE. After a file is uploaded, an HTTP address is generated and the file is downloaded over HTTP. That approach is clearly not suitable for the production environment I have in mind.
Below is a comparison of FastDFS and MogileFS written by another netizen. It seems fairly objective and truthful, so I am posting it here.
FastDFS drew on some ideas from MogileFS in its design. FastDFS is a complete distributed file storage system in which files are read and written through a client API. It is fair to say that FastDFS provides every feature MogileFS has. The MogileFS website is http://www.danga.com/mogilefs/.
In addition, FastDFS has the following features and advantages over MogileFS:
1. FastDFS is fairly complete and can be used directly without secondary development;
2. Compared with MogileFS, FastDFS does away with the database used for tracking. There are only two roles, tracker and storage. This architecture simplifies the system and eliminates a performance bottleneck;
3. It is easy to add a server of either role: to add a tracker server, you only need to modify the storage and client configuration files (adding one tracker line); to add a storage server, no configuration file needs to change, and the system automatically copies the existing files in the volume to the new server;
4. FastDFS is more efficient than MogileFS, in the following respects:
1) See item 2 above. Unlike MogileFS, FastDFS has no file index database, so its overall performance is higher;
2) FastDFS is written in a lower-level, more efficient language than MogileFS. FastDFS is written in C, has fewer than 20 thousand lines of code, and does not depend on any other open-source software or packages, so installation and deployment are particularly simple, whereas MogileFS is written in Perl;
3) FastDFS communicates directly over sockets, which is more efficient than the HTTP approach of MogileFS. FastDFS also uses sendfile to transfer files, a zero-copy mechanism with lower system overhead and higher transfer efficiency.
5. FastDFS has detailed design and usage documentation, while MogileFS documentation is relatively sparse.
6. FastDFS logging is very detailed. Any error that occurs while the system is running is recorded in the log file, so the administrator can locate the problem when something goes wrong.
7. FastDFS also stores and retrieves additional file attributes (such as file size and image width and height), so applications do not need a database to store this information.
8. Starting with v1.14, FastDFS supports keeping only one copy of identical file content, which saves storage space and improves file access performance.
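Item 4.3 above mentions sendfile and zero-copy transfer. The sketch below is not FastDFS code (FastDFS is written in C); it only illustrates the same operating-system facility from Python. `socket.sendfile()` hands the file to the kernel through `os.sendfile()` where that is available, so the data does not have to be copied through a user-space buffer, and it falls back to ordinary `send()` calls elsewhere. The helper names `send_file_zero_copy`, `receive_all`, and `demo` are hypothetical, and a local socket pair stands in for a real client connection.

```python
import os
import socket
import tempfile
import threading


def send_file_zero_copy(sock, path):
    """Send the whole file at `path` over `sock`; return the bytes sent."""
    with open(path, "rb") as f:
        # Uses os.sendfile() when the platform supports it,
        # otherwise falls back to plain send().
        return sock.sendfile(f)


def receive_all(sock, chunks):
    """Read from `sock` until the peer closes, appending to `chunks`."""
    while True:
        data = sock.recv(65536)
        if not data:
            break
        chunks.append(data)


def demo(size=256 * 1024):
    """Transfer a temporary file over a socket pair and verify it."""
    payload = os.urandom(size)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    sender, receiver = socket.socketpair()
    chunks = []
    reader = threading.Thread(target=receive_all, args=(receiver, chunks))
    reader.start()
    try:
        sent = send_file_zero_copy(sender, path)
    finally:
        sender.close()   # signals end-of-stream to the reader
        reader.join()
        receiver.close()
        os.unlink(path)
    return sent, b"".join(chunks) == payload


if __name__ == "__main__":
    sent, intact = demo()
    print(f"sent {sent} bytes, intact={intact}")
```

The receiving side runs in its own thread only so that the sender never blocks on a full socket buffer; in a storage server the receiver would be a remote client.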
Sixth: Lustre
I still had boundless hopes for this distributed file system, but after the Oracle acquisition I can no longer find it!!!
If anyone tracks it down, please let me know. Thank you.
Comparison of Open-Source Distributed File Systems