Open-Source Distributed File System

Source: Internet
Author: User

FastDFS is an open-source, lightweight distributed file system. It manages files, covering file storage, file synchronization, and file access (upload and download), and it solves the problems of large-capacity storage and load balancing. It is especially suitable for online services built around files, such as photo album and video websites.

A FastDFS server plays one of two roles: tracker or storage. The tracker is mainly responsible for scheduling and acts as the load balancer for file access.

A storage node stores files and implements all file-management functions: storage, synchronization, and the access interface. FastDFS also manages file metadata, that is, attributes associated with a file. Metadata is expressed as key-value pairs; for example, in width = 1024 the key is width and the value is 1024. A file's metadata is a property list that can contain multiple key-value pairs.
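To make the key-value model concrete, here is a minimal Python sketch of a file's metadata as a property list, with hypothetical helpers that pack it into one string and parse it back. The separator characters (`\x01` between pairs, `\x02` between a key and its value) are an assumption based on my reading of the FastDFS protocol; verify them against the FastDFS source before relying on them.

```python
from __future__ import annotations

# ASSUMPTION: pairs are joined with RECORD_SEP, and each key is joined to
# its value with FIELD_SEP (verify against the FastDFS source).
RECORD_SEP = "\x01"
FIELD_SEP = "\x02"


def pack_metadata(meta: dict[str, str]) -> str:
    """Serialize a metadata property list (key-value pairs) into one string."""
    return RECORD_SEP.join(f"{key}{FIELD_SEP}{value}" for key, value in meta.items())


def unpack_metadata(packed: str) -> dict[str, str]:
    """Parse a packed string back into a dict of key-value pairs."""
    meta: dict[str, str] = {}
    if not packed:
        return meta
    for record in packed.split(RECORD_SEP):
        key, _, value = record.partition(FIELD_SEP)
        meta[key] = value
    return meta


# A photo's metadata: each entry is one pair, such as width = 1024.
photo_meta = {"width": "1024", "height": "768"}
assert unpack_metadata(pack_metadata(photo_meta)) == photo_meta
```

With this encoding, keys and values must not themselves contain the separator characters.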

Figure: FastDFS system structure

Both the tracker and the storage tier can consist of multiple servers, and servers can be added to or removed from either tier at any time without affecting online services. All tracker servers are peers, and their number can be increased or decreased at any time according to load.

To support large capacity, storage servers are organized into volumes (also called groups). The storage system consists of one or more volumes, and the files in different volumes are independent of each other; the capacity of the entire storage system is the sum of the capacities of all volumes. A volume consists of one or more storage servers, all of which hold the same files, so the servers in a volume provide both redundant backup and load balancing.
When a server is added to a volume, the system automatically synchronizes the existing files to it. Once synchronization is complete, the system automatically brings the new server online to serve requests.

When storage space is insufficient or about to run out, you can add volumes dynamically: add one or more servers and configure them as a new volume to expand the capacity of the storage system.
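The volume model above can be summarized in a small Python sketch: every online server in a volume holds the same files, a new server receives the existing files before it goes online, and total capacity grows by adding volumes. All class and method names are hypothetical. Treating a volume's usable capacity as that of its smallest server is an assumption that follows from every server keeping a full copy.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StorageServer:
    name: str
    capacity_gb: int
    files: set[str] = field(default_factory=set)
    online: bool = False


@dataclass
class Volume:
    name: str
    servers: list[StorageServer] = field(default_factory=list)

    def current_files(self) -> set[str]:
        # Online servers in a volume hold the same files, so read any one.
        online = [s for s in self.servers if s.online]
        return set(online[0].files) if online else set()

    def capacity_gb(self) -> int:
        # ASSUMPTION: every server keeps a full copy, so the smallest server
        # bounds the volume's usable capacity.
        return min((s.capacity_gb for s in self.servers), default=0)

    def add_server(self, server: StorageServer) -> None:
        # Synchronize the existing files first, then switch the server online.
        server.files |= self.current_files()
        server.online = True
        self.servers.append(server)

    def store(self, filename: str) -> None:
        # Simplification: FastDFS writes to one server and syncs the others in
        # the background; here every online server gets the file at once.
        for server in self.servers:
            if server.online:
                server.files.add(filename)


def system_capacity_gb(volumes: list[Volume]) -> int:
    # Volumes are independent, so total capacity is the sum over volumes.
    return sum(v.capacity_gb() for v in volumes)


group1 = Volume("group1")
group1.add_server(StorageServer("s1", capacity_gb=500))
group1.store("a.jpg")
group1.add_server(StorageServer("s2", capacity_gb=400))  # receives a.jpg first

# Expanding the system: add a new volume instead of growing an existing one.
group2 = Volume("group2")
group2.add_server(StorageServer("s3", capacity_gb=1000))
print(system_capacity_gb([group1, group2]))  # 1400
```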
A FastDFS file ID consists of two parts, the volume name and the file name, and both are required.
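In practice a FastDFS file ID is commonly written as the volume (group) name, a slash, and the file name assigned by the storage server, for example `group1/M00/00/00/abc.jpg`. The sample ID is illustrative, and `split_file_id` is a hypothetical helper that splits an ID into its two required parts:

```python
from __future__ import annotations


def split_file_id(file_id: str) -> tuple[str, str]:
    """Split a file ID into (volume name, file name); both parts are required."""
    volume, sep, filename = file_id.partition("/")
    if not sep or not volume or not filename:
        raise ValueError(f"invalid file ID: {file_id!r}")
    return volume, filename


print(split_file_id("group1/M00/00/00/abc.jpg"))
# ('group1', 'M00/00/00/abc.jpg')
```

Splitting only at the first slash matters, because the file name part may itself contain slashes.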


FastDFS File Upload
Interaction process for uploading a file:
1. The client asks the tracker which storage server to upload to; no parameters are needed;
2. The tracker returns an available storage server;
3. The client communicates directly with that storage server to upload the file.
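The three upload steps can be sketched in Python with in-memory stand-ins for the tracker and storage servers. Everything here is hypothetical: the class names, the generated file names, and the selection rule (the volume with the most free space, then round robin within it) are illustrative, not the FastDFS wire protocol, and replication to the volume's other servers is not modelled. What the sketch does show is the key design point: the tracker only schedules, and file data flows directly between client and storage.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field


@dataclass
class Storage:
    volume: str
    free_mb: int
    files: dict[str, bytes] = field(default_factory=dict)

    def upload(self, data: bytes) -> str:
        # Step 3: store the bytes and return "volume/file name" as the file ID.
        name = f"M00/{len(self.files):08d}.bin"  # hypothetical naming scheme
        self.files[name] = data
        return f"{self.volume}/{name}"


class Tracker:
    def __init__(self, servers: list[Storage]) -> None:
        self.servers = servers
        self._next = itertools.count()

    def _volume_free_mb(self, volume: str) -> int:
        # A volume can only hold what its server with the least space can hold.
        return min(s.free_mb for s in self.servers if s.volume == volume)

    def query_upload_server(self) -> Storage:
        # Steps 1-2: no parameters; pick the volume with the most free space,
        # then rotate among that volume's servers.
        volumes = sorted({s.volume for s in self.servers})
        best = max(volumes, key=self._volume_free_mb)
        candidates = [s for s in self.servers if s.volume == best]
        return candidates[next(self._next) % len(candidates)]


def upload(tracker: Tracker, data: bytes) -> str:
    storage = tracker.query_upload_server()  # ask the tracker (steps 1-2)
    return storage.upload(data)              # talk to storage directly (step 3)


tracker = Tracker([Storage("group1", free_mb=100), Storage("group2", free_mb=500)])
print(upload(tracker, b"hello"))  # group2/M00/00000000.bin
```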

FastDFS File Download
Interaction process for downloading a file:
1. The client asks the tracker which storage server holds the file, passing the file ID (volume name and file name) as the parameter;
2. The tracker returns an available storage server;
3. The client communicates directly with that storage server to download the file.
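Download differs from upload in one respect: the client passes the file ID, and the tracker uses its volume name to choose among that volume's servers. Below is a self-contained Python sketch with hypothetical in-memory stand-ins; the round-robin choice is an assumed policy, and the sketch ignores the case where a newly written file has not yet been synchronized to every server in the volume.

```python
from __future__ import annotations

import itertools


class Storage:
    def __init__(self, volume: str, files: dict[str, bytes]) -> None:
        self.volume = volume
        self.files = files

    def download(self, filename: str) -> bytes:
        # Step 3: serve the bytes directly to the client.
        return self.files[filename]


class Tracker:
    def __init__(self, servers: list[Storage]) -> None:
        self.servers = servers
        self._next = itertools.count()

    def query_fetch_server(self, file_id: str) -> tuple[Storage, str]:
        # Steps 1-2: the file ID names the volume; pick one of its servers.
        volume, _, filename = file_id.partition("/")
        candidates = [s for s in self.servers if s.volume == volume]
        if not candidates:
            raise LookupError(f"no storage server for volume {volume!r}")
        return candidates[next(self._next) % len(candidates)], filename


def download(tracker: Tracker, file_id: str) -> bytes:
    storage, filename = tracker.query_fetch_server(file_id)
    return storage.download(filename)


replica = {"M00/00/00/abc.jpg": b"jpeg-bytes"}
tracker = Tracker([Storage("group1", dict(replica)), Storage("group1", dict(replica))])
print(download(tracker, "group1/M00/00/00/abc.jpg"))  # b'jpeg-bytes'
```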

Note that the client is the program that calls the FastDFS service, and it is itself usually a server, so its calls to the tracker and to storage are both server-to-server calls.

Google Code: http://code.google.com/p/fastdfs/
Google Code downloads: http://code.google.com/p/fastdfs/downloads/list

MogileFS: An Open-Source Distributed File System

1. Application level: no special kernel modules are required.
2. No single point of failure (SPOF): all three components of a MogileFS setup (storage nodes, trackers, and the trackers' database) can run on multiple machines, so there is no SPOF. (You can also run trackers and storage nodes on the same machines, so you do not need four machines.) A minimum of two machines is recommended.
3. Automatic file replication: based on its "class", a file is automatically replicated across enough storage nodes to meet the minimum number of copies its class requires. For example, on a photo site you might keep three copies of original JPEG images but only one or two copies of thumbnails; if copies are lost, MogileFS creates the missing ones. In this way, MogileFS (without RAID) saves disk space that would otherwise be spent storing unnecessary copies of the same data.
4. "Better than RAID": in a non-SAN RAID setup, the disks are redundant but the host is not; if the entire machine fails, its files cannot be accessed. MogileFS replicates files across different machines, so files remain available.
5. Transport neutral, no special protocol: MogileFS clients can communicate with storage nodes over NFS or HTTP, but must first consult the tracker.
6. Flat namespace: each file is identified by a key in a flat, global namespace. You can create as many namespaces as you like, so multiple applications whose keys might otherwise conflict can share the same MogileFS installation.
7. Shared nothing: MogileFS does not depend on an expensive SAN with shared disks; every machine maintains only its own local disks.
8. No RAID required: the disks in MogileFS storage nodes can be in a RAID or not. It is cheaper not to, because RAID adds no safety that MogileFS does not already provide.
9. Local file system agnostic: the disks of MogileFS storage nodes can be formatted with any file system (ext3, ReiserFS, etc.). MogileFS hashes its own internal directories, so it does not run into file system limits such as the maximum number of files per directory. Use whichever file system you are comfortable with.
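The class-based replication in item 3 amounts to a repair loop: compare a file's current copies with its class's minimum, and place the missing copies on hosts that do not already hold one (which is also what makes it "better than RAID"). This Python sketch is a hypothetical model of that idea, not MogileFS's replication code; the class names, replica counts, and host names are illustrative.

```python
from __future__ import annotations

# Hypothetical classes: originals need 3 copies, thumbnails only 1.
MIN_COPIES = {"original": 3, "thumbnail": 1}


def plan_replication(file_class: str, holders: set[str], hosts: list[str]) -> list[str]:
    """Return the hosts that should receive a new copy of a file.

    holders: hosts that already store the file.
    hosts:   all available storage hosts, in preference order.
    If too few hosts are available, fewer copies than needed are planned.
    """
    missing = MIN_COPIES[file_class] - len(holders)
    if missing <= 0:
        return []
    # Only choose hosts without a copy, so replicas land on different machines.
    candidates = [h for h in hosts if h not in holders]
    return candidates[:missing]


hosts = ["h1", "h2", "h3", "h4"]
print(plan_replication("original", {"h2"}, hosts))   # ['h1', 'h3']
print(plan_replication("thumbnail", {"h1"}, hosts))  # []
```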

Comparison between FastDFS and MogileFS

FastDFS borrowed some ideas from MogileFS in its design. FastDFS is a complete distributed file storage system in which files are read and written through client APIs, and it provides essentially all of the features of MogileFS. The MogileFS website is http://www.danga.com/mogilefs/.

In addition, FastDFS has the following features and advantages over MogileFS:
1. FastDFS is complete and can be used directly, without further development;
2. Compared with MogileFS, FastDFS eliminates the database used for tracking. There are only two roles, tracker and storage, which simplifies the architecture and removes a performance bottleneck;
3. Servers of either role are easy to add: when adding a tracker server, you only need to add one tracker line to the storage and client configuration files; when adding a storage server, you do not need to modify any configuration files, because the system automatically copies the volume's existing files to the new server;
4. FastDFS is more efficient than MogileFS, in the following respects:
1) As noted in item 2, FastDFS has no file-index database, so its overall performance is higher than that of MogileFS;
2) FastDFS is written in a lower-level, more efficient language. It is written in C, with fewer than 20,000 lines of code and no dependency on other open-source software or packages, so installation and deployment are particularly simple; MogileFS, by contrast, is written in Perl;
3) FastDFS communicates directly over sockets, which is more efficient than the HTTP used by MogileFS. In addition, FastDFS transmits files with sendfile, a zero-copy mechanism, which lowers system overhead and increases file transfer efficiency.
5. FastDFS has detailed design and usage documentation, while MogileFS documentation is comparatively sparse.
6. FastDFS logs are very detailed: any error that occurs while the system is running is recorded in the log file, so administrators can locate the cause when a problem occurs.
7. FastDFS can also store a file's additional attributes (metadata such as file size and image width and height), so applications do not need a database to store this information.
8. Starting with v1.14, FastDFS can store files with identical content only once, which saves storage space and improves file access performance.
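Item 8 does not say how duplicates are detected. One common approach, assumed here, is to index stored data by a hash of its content so that later uploads of identical bytes point to the existing copy. The Python sketch below uses SHA-256 and a reference count; it illustrates content deduplication in general, not FastDFS's internal mechanism, and the `DedupStore` class is hypothetical.

```python
from __future__ import annotations

import hashlib


class DedupStore:
    """Stores each distinct file content once; file IDs map to content hashes."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}   # content hash -> bytes (one copy)
        self.refcount: dict[str, int] = {}  # content hash -> number of file IDs
        self.ids: dict[str, str] = {}       # file ID -> content hash

    def put(self, file_id: str, data: bytes) -> None:
        if file_id in self.ids:
            self.delete(file_id)            # re-upload under the same ID
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # keep only the first copy
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        self.ids[file_id] = digest

    def get(self, file_id: str) -> bytes:
        return self.blobs[self.ids[file_id]]

    def delete(self, file_id: str) -> None:
        digest = self.ids.pop(file_id)
        self.refcount[digest] -= 1
        if self.refcount[digest] == 0:      # last reference gone: free the space
            del self.refcount[digest]
            del self.blobs[digest]


store = DedupStore()
store.put("group1/M00/a.jpg", b"same bytes")
store.put("group1/M00/b.jpg", b"same bytes")
print(len(store.blobs))  # 1
```

The reference count is what allows deleting one file ID without losing content that another ID still uses.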

 


http://www.cnblogs.com/lzjsky/archive/2011/06/28/2092231.html
