FastDFS principle and process


Preface:

(1) Each time a file is uploaded, an address is returned, and the client must save this address in order to access the file later.

(2) To support large capacity, storage nodes (servers) are organized into volumes (also called groups). The storage system consists of one or more volumes; files in different volumes are independent of each other, and the total capacity of all volumes is the capacity of the whole storage system. A volume consists of one or more storage servers, and the files on all storage servers within a volume are identical; the multiple storage servers in a volume provide redundant backup and load balancing.



FastDFS is an open source, lightweight distributed file system consisting of three parts: the tracker server, the storage server, and the client. It mainly solves the problem of storing massive amounts of data and is particularly suitable for online services based on small and medium-sized files (recommended range: 4KB < file_size < 500MB).

Storage Server

The storage server (hereafter referred to as storage) is organized into groups (also called volumes). A group contains multiple storage machines whose data are mutual backups, and the usable storage space of a group is limited by the storage with the smallest capacity in the group. It is therefore recommended that all storage servers within a group use identical configurations to avoid wasting storage space.

Group-based storage facilitates application isolation, load balancing, and per-group replica customization (the number of storage servers in a group is the number of replicas for that group). For example, data from different applications can be stored in different groups to isolate them, and applications can be assigned to different groups according to their access characteristics to balance load. The drawback is that a group's capacity is limited by the capacity of a single machine, and when a machine in a group fails, data recovery can only rely on the other machines within that group, so recovery can take a long time.

Each storage in a group stores files on its local file system. A storage can be configured with multiple data storage directories; for example, if a machine has 10 disks mounted at /data/disk1 through /data/disk10, all 10 directories can be configured as data storage directories.

When storage receives a write request, it selects one of its storage directories according to the configured rules (described later) to store the file. To avoid placing too many files in a single directory, storage creates a two-level subdirectory tree under each data storage directory when it first starts, with 256 subdirectories at each level, 65,536 in total. A new file is routed to one of these subdirectories by hashing, and the file data is then stored there directly as a local file.
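
As an illustration, here is a minimal Python sketch of the two ideas above: creating the 256x256 subdirectory tree on first start and routing a new file to one subdirectory by hashing. The hash used here (MD5 of the file id) is an assumption for illustration, not FastDFS's actual routing function.

    import hashlib
    import os

    def create_subdirs(store_path):
        # On first start, storage creates a 2-level tree of 256 x 256 = 65,536
        # subdirectories (named 00..FF here) under each data store directory.
        for i in range(256):
            for j in range(256):
                os.makedirs(os.path.join(store_path, "%02X" % i, "%02X" % j), exist_ok=True)

    def route_to_subdir(store_path, file_id):
        # Hash the file id and map it to one of the 65,536 subdirectories;
        # the file is then written there as an ordinary local file.
        h = int(hashlib.md5(file_id.encode()).hexdigest(), 16)
        return os.path.join(store_path, "%02X" % ((h >> 8) & 0xFF), "%02X" % (h & 0xFF))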

Tracker Server

The tracker is the coordinator of FastDFS and is responsible for managing all storage servers and groups. After starting, each storage connects to the tracker, reports its group and other information, and maintains a periodic heartbeat. Based on the storage heartbeat information, the tracker builds a mapping table from each group to its list of storage servers (group ==> [storage server list]).

The meta-information the tracker needs to manage is very small and is kept entirely in memory. Moreover, this meta-information is generated from the information reported by the storage servers, so the tracker itself does not need to persist any data. This makes the tracker very easy to scale out: simply adding tracker machines extends it into a tracker cluster. Every tracker in the cluster is fully equivalent; all trackers accept storage heartbeats and generate metadata to serve read and write requests.

Upload file

FastDFS provides users with basic file access interfaces such as upload, download, append, and delete, delivered in the form of client libraries.

Select Tracker Server

When there is more than one tracker server in the cluster, the client can select any tracker when uploading a file, because the trackers are fully equivalent peers.

Select the storage group

When the tracker receives an upload request, it assigns a group that can store the file. The following rules for selecting the group are supported:

1. Round robin: poll across all groups.
2. Specified group: a particular group is designated.
3. Load balance: the group with the most free storage space is chosen first.
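
A minimal sketch of these three selection rules, assuming each group is described by a name and its free space in MB (the data structure is illustrative, not the tracker's real one):

    import itertools

    class GroupSelector:
        def __init__(self, groups):
            # groups: e.g. [{"name": "group1", "free_mb": 10240}, ...]
            self.groups = groups
            self._rr = itertools.cycle(groups)   # cursor for round robin

        def select(self, rule, specified=None):
            if rule == "specified":              # rule 2: a particular, named group
                return next(g for g in self.groups if g["name"] == specified)
            if rule == "load_balance":           # rule 3: most free space first
                return max(self.groups, key=lambda g: g["free_mb"])
            return next(self._rr)                # rule 1: round robin over all groups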

Select Storage Server

Once a group has been selected, the tracker selects a storage server within the group and returns it to the client. The following rules for selecting the storage are supported:

1. Round robin: poll across all storage servers in the group.
2. First server ordered by IP: sort by IP address and take the first.
3. First server ordered by priority: sort by priority and take the first (the priority is configured on the storage).
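
A similar sketch for picking a storage server within the chosen group; the field names, and the assumption that a smaller priority value means higher priority, are illustrative:

    def select_storage(storages, rule, rr_index=0):
        # storages: e.g. [{"ip": "192.168.1.12", "priority": 10}, ...]
        if rule == "first_by_ip":
            # sort numerically by IP address, octet by octet, and take the first
            return min(storages, key=lambda s: tuple(int(p) for p in s["ip"].split(".")))
        if rule == "first_by_priority":
            return min(storages, key=lambda s: s["priority"])   # assumption: lower value wins
        return storages[rr_index % len(storages)]               # round robin; caller advances rr_index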

Select storage Path

Once a storage server has been assigned, the client sends a write request to it, and the storage assigns a data storage directory for the file. The following rules are supported:

1. Round robin: poll across the multiple storage directories.
2. Most free space first: the directory with the most remaining storage space is chosen.
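
The same idea applies to choosing among the storage directories configured on the selected storage server (again with illustrative data structures):

    def select_store_path(paths, rule, rr_index=0):
        # paths: e.g. [{"path": "/data/disk1", "free_mb": 51200}, ...]
        if rule == "max_free_space":
            return max(paths, key=lambda p: p["free_mb"])   # directory with the most free space
        return paths[rr_index % len(paths)]                 # round robin over the directories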

Generate Fileid

After the storage directory is selected, storage generates a fileid for the file. The fileid consists of the storage server's IP address, the file creation time, the file size, the file's CRC32 checksum, and a random number; this binary string is then Base64-encoded into a printable string.
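
A sketch of how such a fileid could be built from those fields; the exact byte layout and Base64 variant used by FastDFS differ, so this only illustrates the composition:

    import base64
    import os
    import socket
    import struct
    import time
    import zlib

    def make_file_id(storage_ip, data):
        # data: the file content as bytes
        packed = struct.pack(
            "!IIQIH",
            struct.unpack("!I", socket.inet_aton(storage_ip))[0],  # storage server IP
            int(time.time()),                                      # file creation time
            len(data),                                             # file size
            zlib.crc32(data),                                      # CRC32 of the file content
            int.from_bytes(os.urandom(2), "big"),                  # random number
        )
        # Base64-encode the binary string into a printable file id
        return base64.urlsafe_b64encode(packed).decode().rstrip("=")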

Select the two-level subdirectory

With the storage directory selected and a fileid assigned, storage chooses one of the 256*256 two-level subdirectories under that storage directory by hashing the fileid (presumably), and routes the file there. The file is then stored in that subdirectory with the fileid as its file name.

Generate file name

Once the file is stored in the subdirectory, the file is considered to have been stored successfully, and a file name is then generated for it. The file name is composed of the group, the storage directory, the two-level subdirectory, the fileid, and the file suffix (specified by the client, mainly used to distinguish file types).
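
For example, the pieces could be combined like this (the "M00" label for the first storage directory follows the usual FastDFS naming; the concrete values below are made up):

    def build_file_name(group, store_path_index, subdir1, subdir2, file_id, suffix):
        # group / storage directory / two-level subdirectory / fileid . suffix
        return "%s/M%02d/%02X/%02X/%s.%s" % (group, store_path_index, subdir1, subdir2, file_id, suffix)

    # prints e.g. "group1/M00/3E/7A/wKgAQ0example.jpg"
    print(build_file_name("group1", 0, 0x3E, 0x7A, "wKgAQ0example", "jpg"))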

File synchronization

When writing a file, the client only needs to write it to one storage server within the group for the write to be considered successful. After that storage server finishes writing the file, a background thread synchronizes the file to the other storage servers in the same group.

Each time a storage writes a file, it also writes a binlog record. The binlog does not contain the file data, only the file name and other meta-information; it is used for background synchronization. Each storage records its synchronization progress toward the other storage servers in the group, so that it can resume from the last position after a restart. The progress is recorded as a timestamp, so it is best to keep the clocks of all servers in the cluster synchronized.
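
A minimal sketch of such a binlog: one line per operation containing a timestamp, an operation type, and the file name, but no file data (the real FastDFS binlog format differs in detail, and the operation codes here are illustrative):

    import time

    def append_binlog(binlog_path, op, file_name):
        # e.g. op = "C" for a locally created file
        with open(binlog_path, "a") as f:
            f.write("%d %s %s\n" % (int(time.time()), op, file_name))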

Each storage's synchronization progress is reported to the tracker as part of the metadata, and the tracker uses this synchronization progress as a reference when choosing a storage to serve reads.

For example, suppose a group has three storage servers A, B, and C. A has synchronized to C up to timestamp T1 (all files written before T1 have been synchronized to C), and B has synchronized to C up to timestamp T2 (T2 > T1). When the tracker receives these progress reports, it takes the smallest value as C's synchronization timestamp, which in this case is T1 (that is, all data written before T1 has been synchronized to C). By the same rule, the tracker also generates synchronization timestamps for A and B.
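
This rule boils down to taking the minimum of the reported progress values; T1 = 100 and T2 = 250 below are placeholder numbers for the example in the text:

    def sync_timestamp_for(target, progress):
        # progress: {(source, destination): last-synced timestamp} reported by each storage
        return min(ts for (src, dst), ts in progress.items() if dst == target)

    # A -> C synced up to T1, B -> C synced up to T2 (T2 > T1):
    progress = {("A", "C"): 100, ("B", "C"): 250}
    assert sync_timestamp_for("C", progress) == 100   # C's synchronization timestamp is T1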

Download file

After a client successfully uploads a file, it obtains the file name generated by the storage, and from then on it can access the file using that file name.

As with uploading, the client can select any tracker server when downloading a file.

When the client sends a download request to a tracker, it must include the file name. The tracker parses the group, file size, creation time, and other information from the file name, and then selects a storage to serve the read request. Because files within a group are synchronized asynchronously in the background, a file may not yet have been synchronized to some storage servers at the time of reading. To avoid reading from such a storage as far as possible, the tracker selects a readable storage within the group according to the following rules.

1. The storage is the source storage the file was uploaded to: as long as the source storage is alive, it is guaranteed to have the file (the source address is encoded in the file name).
2. File creation timestamp == the storage's synchronization timestamp, and (current time - file creation timestamp) > the maximum file synchronization time (for example, 5 minutes): once the maximum synchronization time has passed since the file was created, it is assumed to have been synchronized to the other storage servers.
3. File creation timestamp < the storage's synchronization timestamp: files written before the synchronization timestamp are guaranteed to have been synchronized.
4. (Current time - file creation timestamp) > the synchronization delay threshold (for example, one day): once the synchronization delay threshold has passed, the file is assumed to have been synchronized.
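
A sketch of these four rules as a readability check; the 5-minute and 1-day thresholds come from the examples in the text, and the field names are illustrative:

    def storage_is_readable(storage, file_info, now,
                            max_sync_time=300, sync_delay_threshold=86400):
        # file_info: {"source_ip": ..., "create_ts": ...}; storage: {"ip": ..., "synced_ts": ...}
        create_ts = file_info["create_ts"]
        if storage["ip"] == file_info["source_ip"]:
            return True                                    # rule 1: the source storage always has the file
        if create_ts == storage["synced_ts"] and now - create_ts > max_sync_time:
            return True                                    # rule 2: past the maximum sync time
        if create_ts < storage["synced_ts"]:
            return True                                    # rule 3: written before the sync timestamp
        if now - create_ts > sync_delay_threshold:
            return True                                    # rule 4: past the sync delay threshold
        return False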

Small file merged storage

Merged storage of small files primarily addresses the following issues:

1. The number of inodes in a local file system is limited, so the number of small files that can be stored is limited.
2. Multi-level directories plus many files per directory make file access expensive (possibly requiring multiple disk IOs).
3. Backing up and restoring storage that holds many small files is inefficient.

FastDFS introduced a small-file merged storage mechanism in version V3.0, which can store multiple small files inside one large file (a trunk file). To support this mechanism, the fileid generated by FastDFS needs an extra 16 bytes of information:

1. The trunk file ID.
2. The file's offset inside the trunk file.
3. The amount of storage space occupied by the file (because of byte alignment and reuse of deleted space, the occupied space is >= the file size).
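
For instance, the three fields could be packed into the 16 extra bytes like this; the field widths and order are an assumption, since the text only states the total size:

    import struct

    def pack_trunk_info(trunk_file_id, offset, alloc_size):
        # 4-byte trunk file ID + 8-byte offset + 4-byte allocated size = 16 bytes
        return struct.pack("!IQI", trunk_file_id, offset, alloc_size)

    assert len(pack_trunk_info(7, 1048576, 102400)) == 16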

Each trunk file is uniquely identified by an ID. Trunk files are created by the trunk server within the group (the trunk server is selected by the tracker) and synchronized to the other storage servers in the group. Once a file has been merged into a trunk file, it can be read from the trunk file according to its offset.

Because a file's offset within the trunk file is encoded into its file name, its position inside the trunk file cannot change, so the space freed by files deleted from a trunk file cannot be reclaimed by compaction. However, the space of a deleted file can be reused: for example, if a 100KB file is deleted, a 99KB file stored later can directly reuse that freed space.

HTTP Access Support

FastDFS's tracker and storage have built-in support for the HTTP protocol, allowing clients to download files via HTTP; the tracker uses an HTTP redirect to send the request to the storage where the file resides. In addition to the built-in HTTP support, FastDFS also provides Apache and nginx extension modules for downloading files.

Other features

FastDFS provides interfaces for setting and getting extended file attributes (setmeta/getmeta). The extended attributes are stored on the storage as key-value pairs in a file with the same name as the original file plus a special prefix or suffix. For example, if /group/M00/00/01/some_file is the original file, its extended attributes might be stored in /group/M00/00/01/.some_file.meta (the actual naming is not necessarily this, but the mechanism is similar), so the file holding the extended attributes can be located from the original file name.
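
A minimal sketch of the mechanism: the extended attributes are written as key-value pairs into a companion file whose name is derived from the original file name (the companion file's suffix and line format here are illustrative):

    def write_meta(file_path, meta):
        # store key-value pairs in a file next to the original, found via its name
        with open(file_path + ".meta", "w") as f:
            for key, value in meta.items():
                f.write("%s=%s\n" % (key, value))

    write_meta("/tmp/some_file", {"width": "1024", "height": "768"})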

The author does not recommend these two interfaces: the extra meta files further amplify the problem of storing a large number of small files, and because meta files are tiny, their storage space utilization is poor; for example, a 100-byte meta file still occupies 4KB (one block_size) of storage space.

FastDFS also supports appender files, which are uploaded via the upload_appender_file interface; an appender file can have data appended to it after creation. In fact, appender files are stored in the same way as ordinary files, except that an appender file cannot be merged into a trunk file.

Problem discussion

Looking at the overall design of FastDFS, simplicity is essentially the guiding principle. For example, backing up data at machine granularity simplifies tracker management; storing files as-is with the local file system simplifies storage management; and treating a write as successful once the file reaches one storage, with synchronization done in the background, simplifies the write path. But the problems that simple solutions can solve are usually limited, and FastDFS currently has the following issues (discussion welcome).

Data security

Write-success window: between the time a file is written to the source storage and the time it is synchronized to the other storage servers in the group, a failure of the source storage can cause user data loss, and data loss is usually unacceptable for a storage system.

Lack of an automatic recovery mechanism: when a disk in a storage fails, the only option is to replace the disk and then recover the data manually. Because backup is done at machine granularity, there appears to be no automatic recovery mechanism unless a hot spare disk has been prepared in advance, and the lack of one increases the operational burden.

Low data recovery efficiency: during recovery, data can only be read from the other storage servers in the group, and because reading many small files is inefficient, file-level recovery is very slow; the lower the recovery efficiency, the longer the data stays in an unsafe state.

Lack of multi-datacenter disaster recovery: at present, disaster recovery across datacenters can only be done with additional tools that synchronize data to a backup cluster; there is no built-in automated mechanism.

Storage space utilization

The number of files a single machine can store is limited by the number of inodes, since each stored file corresponds to one file in the local file system; in addition, each file wastes on average block_size/2 of storage space (internal fragmentation).
