Selection notes for a distributed file system (DFS)
The requirements are prioritized as follows:
1) Store 3 TB or more of small and medium-sized files. Images dominate, averaging 500 to 700 KB and generally under 1 MB.
2) Cluster-based, with load balancing, high availability, and high performance. Ideally, proven in production at large enterprises.
3) A Java API for uploading files, with Java code that can be debugged on Windows.
4) Open source and actively maintained by its author.
5) O&M monitoring tools that quickly locate problematic servers.
6) (Bonus) Adding a storage server should not require changing the Nginx load-balancer or Java program configuration.
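Requirement 6, together with the high-availability part of requirement 2, comes down to how a client learns which storage server to write to. FastDFS (trackers) and SeaweedFS (masters), both compared below, follow the same basic pattern: the client is configured only with directory-service addresses and asks one of them per upload, while storage servers register with the directory instead of being listed in the client's configuration. The following is a hypothetical Java sketch of that pattern with client-side failover across directory endpoints; `DirectoryClient` and its `lookup` callback are invented names and are not part of any FastDFS or SeaweedFS client library.

```java
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical client: only directory endpoints (trackers/masters) are configured,
 * and each upload asks a directory which storage server to use.
 */
final class DirectoryClient {
    private final List<String> directories;         // directory addresses, tried in order
    private final Function<String, String> lookup;  // asks one directory; returns a storage address or throws

    DirectoryClient(List<String> directories, Function<String, String> lookup) {
        if (directories.isEmpty()) {
            throw new IllegalArgumentException("no directory endpoints configured");
        }
        this.directories = List.copyOf(directories);
        this.lookup = lookup;
    }

    /** Returns the storage address from the first directory that answers. */
    String storageFor() {
        RuntimeException last = null;
        for (String dir : directories) {
            try {
                return lookup.apply(dir);
            } catch (RuntimeException e) {
                last = e; // this directory is unreachable: fail over to the next one
            }
        }
        throw new IllegalStateException("all directory endpoints failed", last);
    }
}
```

Adding a storage server only changes what the directories return from `lookup`, so the configured endpoint list stays the same.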
After reading a lot of material, I found no perfect solution; only three candidates remain:

| Framework | Introduction | File storage method | High availability | Capacity expansion | Gzip support for browsers | Browser cache | Program access |
|---|---|---|---|---|---|---|---|
| FastDFS | Many Chinese startups use it. I used it for a while and found it relatively stable. The community has written plenty of Chinese material, but there is almost no official documentation. Accessing a storage server directly through nginx with fastdfs-nginx-module is recommended over going through a reverse proxy. O&M: when access fails, you have to check logs in several places (nginx > nginx fastdfs module > storage > tracker), and users unfamiliar with it may not find the cause. Home: https://github.com/happyfish100/fastdfs Deployment guide: http://blog.csdn.net/xifeijian/article/details/38567839 Docker: https://hub.docker.com/r/hhland/fastdfs/, https://hub.docker.com/r/season/fastdfs/ A good reference solution: https://github.com/daniellitoc/xultimate-resource | Key-value storage. There is no upload-directory concept, so development and test environments each need their own file server. No file listing. No FUSE support. | A cluster consists of multiple trackers and multiple storage servers. The trackers are not clustered with each other; the client handles failover. Storage servers are organized in groups, and every server in a group holds identical files, which provides load balancing and fault tolerance, similar to RAID 10. Multiple data centers are not supported. | TB-level storage. 1. Add disks by specifying multiple store_path entries in one storage server's configuration. 2. Add capacity group by group. 3. Total capacity is the sum of all groups. | Compressed on the fly by a reverse proxy | If-Modified-Since supported | Language SDK: Java is supported through a dedicated SDK. REST interface: none. HTTP file reading: through the storage server's built-in HTTP service, or through nginx with fastdfs-nginx-module installed (recommended). |
| Baidu BFS (to be studied) | Powerful, but little documentation is available online, and Baidu searches turned up nothing useful. According to its description, it is used across all of Baidu. Home: https://github.com/baidu/bfs Docker: a Dockerfile is provided, but no image is published on Docker Hub. | Directory-based storage. Supports file listing and FUSE. | The cluster consists of NameServer, MetaServer, and ChunkServer. NameServer uses the Raft algorithm, with leader election based on Nexus or ZooKeeper; when the leader fails, a new leader is elected automatically. ChunkServer high availability: to be analyzed. Its multi-data-center support is the strongest. | PB-level storage | To be analyzed | To be analyzed | Language SDK: accessed through a dedicated SDK with no Java support, though Java access could be bridged through FUSE. On Windows, Cygwin is probably required. REST interface: none. HTTP file reading: provided by NameServer. |
| SeaweedFS | Powerful and looks very promising, but there is little Chinese material. The docs list ZTO Express as a user. Since I have never used it, it is hard to say how easy it is to operate. Home: https://github.com/chrislusf/seaweedfs Deployment and usage guide: http://blog.chinaunix.net/uid-25057421-id-5676348.html Official Docker: https://hub.docker.com/r/chrislusf/seaweedfs/ | Key-value storage. Files can be uploaded to a specified directory, so development and test environments can share one file server. The filer supports file listing, but FUSE is not supported. | A cluster consists of multiple masters and multiple volume servers. Replication between volumes is governed by the replication policy. Multiple data centers and multiple replication policies are supported. | PB-level storage. Capacity expansion with volume servers depends on the replication policy. | Supported by pre-compressing files into gzip format | ETag and If-Modified-Since supported | Language SDK: all SDKs actually go through the REST interface; a Java version is available. REST interface: the volume and filer servers expose interfaces at different levels, key-value style on the volume server and directory-like on the filer. HTTP file reading: provided by the filer server. |

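The program-access column notes that every SeaweedFS SDK is a wrapper over its REST interface, so a Java upload can be written with nothing but the JDK. The sketch below assumes the two-step flow described in the SeaweedFS README: a GET on the master's `/dir/assign` returns JSON such as `{"count":1,"fid":"3,01637037d6","url":"127.0.0.1:8080","publicUrl":"localhost:8080"}`, and the file is then POSTed as a multipart form field named `file` to `http://<url>/<fid>`. The class and method names are hypothetical, it needs Java 11+ for `java.net.http`, error handling is minimal, and the regex parsing stands in for a proper JSON library.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of SeaweedFS's two-step REST upload using only the JDK (Java 11+). */
final class SeaweedUploadSketch {
    /** The part of the master's /dir/assign answer we need: a file id and a volume server. */
    static final class Assignment {
        final String fid;
        final String url;

        Assignment(String fid, String url) {
            this.fid = fid;
            this.url = url;
        }
    }

    private static final Pattern FID = Pattern.compile("\"fid\"\\s*:\\s*\"([^\"]+)\"");
    private static final Pattern URL = Pattern.compile("\"url\"\\s*:\\s*\"([^\"]+)\"");

    /** Pulls fid and url out of the assign JSON; a real client should use a JSON library. */
    static Assignment parseAssign(String json) {
        Matcher fidMatch = FID.matcher(json);
        Matcher urlMatch = URL.matcher(json);
        if (!fidMatch.find() || !urlMatch.find()) {
            throw new IllegalArgumentException("unexpected assign response: " + json);
        }
        return new Assignment(fidMatch.group(1), urlMatch.group(1));
    }

    /** The volume id is the part of a fid before the comma, e.g. 3 in "3,01637037d6". */
    static int volumeId(String fid) {
        int comma = fid.indexOf(',');
        if (comma <= 0) {
            throw new IllegalArgumentException("bad fid: " + fid);
        }
        return Integer.parseInt(fid.substring(0, comma));
    }

    /** Uploads one file and returns its fid; masterUrl is e.g. "http://localhost:9333". */
    static String upload(HttpClient http, String masterUrl, String fileName, byte[] data)
            throws IOException, InterruptedException {
        // Step 1: ask the master for a fid and a volume server.
        HttpRequest assign = HttpRequest.newBuilder(URI.create(masterUrl + "/dir/assign")).GET().build();
        Assignment a = parseAssign(http.send(assign, HttpResponse.BodyHandlers.ofString()).body());

        // Step 2: POST the bytes as a multipart form field named "file" to http://<url>/<fid>.
        String boundary = "sketch-boundary-" + System.nanoTime();
        byte[] head = ("--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"file\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: application/octet-stream\r\n\r\n").getBytes(StandardCharsets.UTF_8);
        byte[] tail = ("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8);
        byte[] body = new byte[head.length + data.length + tail.length];
        System.arraycopy(head, 0, body, 0, head.length);
        System.arraycopy(data, 0, body, head.length, data.length);
        System.arraycopy(tail, 0, body, head.length + data.length, tail.length);

        HttpRequest post = HttpRequest.newBuilder(URI.create("http://" + a.url + "/" + a.fid))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
        HttpResponse<String> resp = http.send(post, HttpResponse.BodyHandlers.ofString());
        if (resp.statusCode() / 100 != 2) {
            throw new IOException("upload failed with HTTP " + resp.statusCode() + ": " + resp.body());
        }
        return a.fid;
    }
}
```

A call might look like `SeaweedUploadSketch.upload(HttpClient.newHttpClient(), "http://localhost:9333", "photo.jpg", bytes)`, assuming a master on its default port 9333. Keeping the master address as the only configured endpoint is also what lets new volume servers join without client changes.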
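The capacity-expansion column can be turned into rough arithmetic for sizing requirement 1. In FastDFS, every server in a group holds the same files, so a group can store no more than its smallest member, and the cluster total is the sum of the groups. In SeaweedFS, the replication policy is a three-digit string `xyz` (extra copies in other data centers, on other racks, and on other servers in the same rack), so each file is stored 1 + x + y + z times and usable space is roughly raw space divided by that count. This hypothetical `CapacitySketch` ignores metadata, reserved free space, and uneven volume placement, so treat its results as upper bounds.

```java
import java.util.List;

/** Rough usable-capacity arithmetic (hypothetical helper; ignores metadata and reserved space). */
final class CapacitySketch {
    /** FastDFS: servers in a group mirror each other, so each group adds only its smallest member. */
    static long fastdfsUsable(List<List<Long>> groups) {
        long total = 0;
        for (List<Long> group : groups) {
            if (group.isEmpty()) {
                continue; // an empty group stores nothing
            }
            long smallest = Long.MAX_VALUE;
            for (long server : group) {
                smallest = Math.min(smallest, server);
            }
            total += smallest;
        }
        return total;
    }

    /** SeaweedFS: replication "xyz" stores 1 + x + y + z copies of every file. */
    static int seaweedCopies(String replication) {
        if (!replication.matches("[0-9]{3}")) {
            throw new IllegalArgumentException("expected three digits, got: " + replication);
        }
        int copies = 1;
        for (char c : replication.toCharArray()) {
            copies += c - '0';
        }
        return copies;
    }

    /** Approximate usable space for a given amount of raw volume-server disk. */
    static long seaweedUsable(long rawCapacity, String replication) {
        return rawCapacity / seaweedCopies(replication);
    }
}
```

For example, two FastDFS groups of two 4 TB servers each give 8 TB usable, and 16 TB of raw SeaweedFS disk under policy `001` also gives about 8 TB.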
In summary, SeaweedFS has a fairly complete ecosystem and is actively updated by its author. FastDFS is also a good choice.