Selection notes for a distributed file system (DFS)

Last Update: 2017-02-18 Source: Internet

Author: User

Tags: nameserver, nginx server, docker hub

The requirements, in priority order:

1) Store 3 TB or more of small and medium files, mostly images averaging 500-700 KB and generally under 1 MB.

2) Cluster-based, with load balancing, high availability, and high performance. Production use by large enterprises is the best endorsement.

3) Provide a Java client for uploading files. The Java code must be debuggable on Windows.

4) Open source and still actively updated by the author.

5) O&M monitoring tools that make it possible to quickly locate a problematic server.

6) (Bonus) Adding a storage server should not require changes to the Nginx load balancer or the Java program configuration.
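To get a feel for what requirement 1 implies, here is a rough sizing sketch. It assumes exactly 3 TB of logical data and a 600 KB average file size (the midpoint of the range above). The replica count of 2 is purely an illustrative assumption, not a figure from these notes.

```java
/** Rough sizing sketch; all input figures are illustrative assumptions. */
final class CapacityEstimate {
    static final long KB = 1024L;
    static final long TB = 1024L * 1024L * 1024L * 1024L;

    /** Number of files of the given average size that fit in totalBytes. */
    static long estimateFileCount(long totalBytes, long avgFileBytes) {
        if (avgFileBytes <= 0) {
            throw new IllegalArgumentException("avgFileBytes must be positive");
        }
        return totalBytes / avgFileBytes;
    }

    /** Raw disk needed when every file is stored on `replicas` servers. */
    static long rawBytesNeeded(long logicalBytes, int replicas) {
        if (replicas < 1) {
            throw new IllegalArgumentException("replicas must be at least 1");
        }
        return logicalBytes * replicas;
    }

    public static void main(String[] args) {
        long files = estimateFileCount(3 * TB, 600 * KB);
        long rawTb = rawBytesNeeded(3 * TB, 2) / TB;
        System.out.println("Approximate file count: " + files);
        System.out.println("Raw capacity with 2 replicas (TB): " + rawTb);
    }
}
```

Under these assumptions the store holds on the order of five million files, so per-file metadata overhead and small-file handling matter more than raw throughput.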

I read a lot of material and found no perfect solution. Only three candidates remain:

FastDFS

Introduction: Many Chinese startups are known to use it. I have used it for a while and it is relatively stable. Community users have written a lot of Chinese material, but there is almost no official documentation. Accessing the storage server through fastdfs-nginx-module is recommended over a plain reverse proxy. O&M: when a file cannot be accessed, the logs to check are spread over several places (nginx > nginx fastdfs module > storage > tracker), so someone unfamiliar with the system may not be able to find the cause.
Home: https://github.com/happyfish100/fastdfs
Deployment guide: http://blog.csdn.net/xifeijian/article/details/38567839
Docker: https://hub.docker.com/r/hhland/fastdfs/, https://hub.docker.com/r/season/fastdfs/
A good reference solution: https://github.com/daniellitoc/xultimate-resource
File storage method: Key-value storage. There is no concept of an upload directory, so development and test environments each need their own independently deployed file server. File listing is not supported. FUSE is not supported.
High availability: A cluster consists of multiple trackers and multiple storage servers. The trackers do not form a cluster among themselves; the client handles failover. Storage servers are organized into groups, and all storage servers in a group hold identical files. Groups mainly provide load balancing and fault tolerance, similar to RAID 10 for hard disks. Multiple data centers are not supported.
Capacity expansion: TB-level storage solution. 1. Specify multiple store_path entries in the configuration of a storage server to add hard disks. 2. Add capacity by adding groups. 3. The total capacity is the sum of all groups.
Gzip support for browsers: On-the-fly compression by the reverse proxy.
Browser cache: If-Modified-Since is supported.
Program access: Language SDK: Java is supported through a dedicated SDK. REST interface: none. HTTP file reading: use either the HTTP service of the storage server or nginx with fastdfs-nginx-module installed; the latter is recommended.

Baidu BFS (to be studied)

Introduction: Powerful, but little documentation is available online, and a Baidu search turned up no useful articles. According to its description, it is used across all of Baidu.
Home: https://github.com/baidu/bfs
Docker: A Dockerfile is provided, but no image is published on Docker Hub.
File storage method: Directory storage. Supports file listing and FUSE.
High availability: The cluster consists of NameServer, MetaServer, and ChunkServer. NameServer uses the Raft algorithm and elects a Leader based on Nexus or ZooKeeper. When the Leader fails, a new Leader is elected automatically. High availability of the ChunkServer: to be analyzed. Multiple data centers: supported, and this support is described as the best.
Capacity expansion: PB-level storage solution.
Gzip support for browsers: To be analyzed.
Browser cache: To be analyzed.
Program access: Language SDK: accessed through a dedicated SDK that does not support Java, although Java access can be bridged through FUSE. On Windows, access would probably require Cygwin. REST interface: none. HTTP file reading: provided by the NameServer.

SeaweedFS

Introduction: Powerful and very promising, but with little Chinese material. The documentation says Zhongtong Express uses it. Since I have never used it, it is hard to say how convenient it is for O&M.
Home: https://github.com/chrislusf/seaweedfs
Deployment and usage guide: http://blog.chinaunix.net/uid-25057421-id-5676348.html
Official Docker: https://hub.docker.com/r/chrislusf/seaweedfs/
File storage method: Key-value storage. Files can be uploaded to a specified directory, so R&D and test environments can share one file server. The filer supports file listing, but FUSE is not supported.
High availability: A cluster consists of multiple masters and multiple volume servers. Replication between volumes is determined by the replication policy. Supports multiple data centers and multiple replication policies.
Capacity expansion: PB-level storage solution. How volume server capacity is added depends on the replication policy.
Gzip support for browsers: Supported through pre-compression, which stores files in gzip format.
Browser cache: ETag and If-Modified-Since are supported.
Program access: Language SDK: all SDKs actually go through the REST interface; a Java version is available. REST interface: the volume and filer servers provide interfaces at different levels. The volume server uses a key-value style and the filer uses a directory-like style. HTTP file reading: provided by the filer server.
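The FastDFS notes above say the trackers are not clustered and the client has to handle failover itself. A minimal sketch of that idea follows: try each tracker address in order and use the first one that answers. This is a generic illustration and does not use the FastDFS Java SDK. The class name, the `firstAvailable` helper, and the host names are all hypothetical, and the "connection" is simulated by a lambda.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Generic client-side failover sketch; not the FastDFS SDK's own API. */
final class TrackerFailover {

    /** Applies `call` to each tracker in order and returns the first success. */
    static <T> T firstAvailable(List<String> trackers, Function<String, T> call) {
        RuntimeException last = null;
        for (String tracker : trackers) {
            try {
                return call.apply(tracker);
            } catch (RuntimeException e) {
                last = e; // remember the failure and try the next tracker
            }
        }
        throw new IllegalStateException("no tracker reachable", last);
    }

    public static void main(String[] args) {
        // Hypothetical tracker addresses.
        List<String> trackers = Arrays.asList("tracker1:22122", "tracker2:22122");
        String result = firstAvailable(trackers, addr -> {
            // Simulate the first tracker being down.
            if (addr.startsWith("tracker1")) {
                throw new RuntimeException("connection refused");
            }
            return "connected to " + addr;
        });
        System.out.println(result);
    }
}
```

A real client would also want timeouts and some way to avoid retrying a known-dead tracker on every request. Both are left out here for brevity.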

In summary, the SeaweedFS ecosystem is quite complete and the author keeps updating it. FastDFS is also a good choice.
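Because SeaweedFS is accessed over REST, a Java caller needs little beyond an HTTP client. The sketch below assumes the two-step flow in the SeaweedFS documentation: ask the master for a file id via /dir/assign, then POST the file to the returned volume server under that id. It covers only the step between the two calls, turning the assign response into an upload URL. The sample JSON, the class name, and the regex-based field extraction are illustrative assumptions. Real code would use a JSON library and an HTTP client.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Holds the fid and volume server address returned by the master (sketch). */
final class SeaweedAssign {
    final String fid;
    final String url;

    SeaweedAssign(String fid, String url) {
        this.fid = fid;
        this.url = url;
    }

    /** Extracts a string-valued field from flat JSON; a simplification. */
    static String field(String json, String name) {
        Pattern p = Pattern.compile(
                "\"" + Pattern.quote(name) + "\"\\s*:\\s*\"([^\"]*)\"");
        Matcher m = p.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("missing field: " + name);
        }
        return m.group(1);
    }

    /** Parses an assign response of the assumed shape. */
    static SeaweedAssign parse(String json) {
        return new SeaweedAssign(field(json, "fid"), field(json, "url"));
    }

    /** URL the file would be POSTed to on the volume server. */
    String uploadUrl() {
        return "http://" + url + "/" + fid;
    }

    public static void main(String[] args) {
        // Sample response in the shape of the SeaweedFS README example.
        String json = "{\"count\":1,\"fid\":\"3,01637037d6\","
                + "\"url\":\"127.0.0.1:8080\",\"publicUrl\":\"localhost:8080\"}";
        System.out.println(parse(json).uploadUrl());
    }
}
```

The same URL serves later reads, so the application only needs to persist the fid alongside its own record of the file.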

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
