File Synchronization Mechanism of the FastDFS Distributed File System

Source: Internet
Author: User

In the previous articles we introduced the FastDFS system overview, the tracker server, the storage server, and operations such as file upload, download, and delete.

This article describes synchronization between the storage servers in the same group, as well as the synchronization that takes place when a new storage server is added.

Figure: FastDFS file system structure

Figure: FastDFS file system principles


From the FastDFS file system structure we can see that whether we are uploading files, deleting files, modifying files, or adding a storage server, file synchronization always happens between the storage servers within the same group. Let's look at how the FastDFS developer describes the synchronization mechanism (from ChinaUnix):


Storage servers do not appear in the tracker server's configuration file, while all tracker servers are listed in the storage server's configuration file. This means the connection between a storage server and a tracker server is always initiated by the storage server: the storage server starts one thread per tracker server to connect and communicate with it.


The tracker server keeps the storage groups, and the storage servers in each group, in memory, and also saves the storage servers that have connected to it and their group information to a file, so that this storage information can be loaded directly from the local disk the next time the service restarts. Each storage server records all the servers in its group in memory and logs this server information to a file as well. The tracker server and the storage servers synchronize the storage server list with each other:


1. When a new storage server joins a group, or the state of a storage server in the group changes, the tracker server synchronizes the storage server list to all storage servers in the group. Take a newly added storage server as an example: because the new storage server actively connects to the tracker server, the tracker server discovers it, returns all storage servers in the group to the newly joined storage server, and pushes the updated server list of the group to the other storage servers in the group;

2. When a storage server connects to a newly added tracker server and finds that the storage server list returned by that tracker server is smaller than its local record, it synchronizes the storage servers missing on that tracker server back to the tracker server.
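As a rough illustration of point 2, the sketch below shows this list reconciliation from the storage server's side. It is a hypothetical sketch; the names (local_servers, tracker_servers, report_to_tracker) are illustrative and not part of the FastDFS source code.

# Hypothetical sketch of the storage-server list reconciliation described above.
def reconcile_server_lists(local_servers, tracker_servers, report_to_tracker):
    """local_servers / tracker_servers: sets of 'ip:port' strings."""
    missing_on_tracker = local_servers - tracker_servers
    if missing_on_tracker:
        # The storage servers the new tracker does not know about are pushed back to it.
        report_to_tracker(missing_on_tracker)
    # The tracker, for its part, pushes its list to every storage server in the
    # group (point 1), so both sides converge on the same list.
    return local_servers | tracker_servers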


Storage servers within the same group are peers, so file uploads, deletions, and other operations can be performed on any of them. File synchronization happens only between storage servers in the same group and uses a push model: the source server pushes changes to the target servers. Take a file upload as an example. Suppose a group has three storage servers A, B, and C, and file F is uploaded to server B; B then synchronizes file F to the other two servers A and C. We call the upload of file F to server B the source operation and the copy of F on server B the source data; synchronizing F to servers A and C are backup operations, and the copies of F on A and C are backup data. The synchronization rules are summarized as follows (see the sketch after this list):


1. Synchronization happens only between storage servers in the same group;

2. Only source data needs to be synchronized; backup data must not be synchronized again, otherwise it would form a loop;

3. The exception to rule 2 is that when a new storage server is added, one of the existing storage servers synchronizes all of its data (both source data and backup data) to the new server.
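A minimal sketch of rules 2 and 3, assuming the binlog operation codes described later in this article (uppercase letters mark source operations). This is for illustration only, not FastDFS code:

# Illustrative sketch of the synchronization rules.
def should_push(record_op, syncing_to_new_server):
    """record_op: binlog operation code, e.g. 'C' (source create) or 'c' (replica create)."""
    if syncing_to_new_server:
        # Rule 3: when appending existing data to a newly added storage server,
        # both source and backup data are pushed by the assigned source server.
        return True
    # Rule 2: in normal operation only source data (uppercase ops) is pushed,
    # otherwise replicas would re-push each other's data and form a loop.
    return record_op.isupper()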


A storage server has 7 states, as follows:

# FDFS_STORAGE_STATUS_INIT: initialization; the source server for synchronizing existing data has not yet been obtained

# FDFS_STORAGE_STATUS_WAIT_SYNC: waiting for synchronization; the source server for synchronizing existing data has been determined

# FDFS_STORAGE_STATUS_SYNCING: existing data is being synchronized

# FDFS_STORAGE_STATUS_DELETED: deleted; the server has been removed from this group (note: the functionality of this state has not yet been implemented)

# FDFS_STORAGE_STATUS_OFFLINE: offline

# FDFS_STORAGE_STATUS_ONLINE: online, but not yet able to provide service

# FDFS_STORAGE_STATUS_ACTIVE: online and able to provide service
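For reference, the seven states can be written as a small enum. The numeric values below are arbitrary and chosen only for illustration; they do not necessarily match the constants in the FastDFS source:

from enum import IntEnum

class FdfsStorageStatus(IntEnum):
    # numeric values are illustrative only, not the FastDFS constants
    INIT      = 0  # source server for syncing existing data not yet assigned
    WAIT_SYNC = 1  # source server assigned, waiting for existing-data sync
    SYNCING   = 2  # existing data is being synchronized
    DELETED   = 3  # removed from the group (not yet implemented in FastDFS)
    OFFLINE   = 4  # offline
    ONLINE    = 5  # online but not yet serving requests
    ACTIVE    = 6  # online and serving requests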


When a storage server is in the FDFS_STORAGE_STATUS_ONLINE state and it sends a heartbeat to the tracker server, the tracker server changes its status to FDFS_STORAGE_STATUS_ACTIVE.



When a new storage server A is added to a group, the system automatically synchronizes the existing data to it. The processing logic is as follows:


1. Storage server A connects to the tracker server, and the tracker server sets storage server A's status to FDFS_STORAGE_STATUS_INIT. Storage server A asks the tracker server for the source server for appending existing data and the point in time up to which that data should be appended. If the group contains only storage server A, or if the number of files successfully uploaded in the group is 0, then no data needs to be synchronized and storage server A can provide service immediately; the tracker server sets its status to FDFS_STORAGE_STATUS_ONLINE. Otherwise the tracker server sets its status to FDFS_STORAGE_STATUS_WAIT_SYNC and proceeds to the second step;


2. Suppose the tracker server assigns storage server B as the source server for synchronizing existing data to storage server A. The other storage servers in the group learn from their communication with the tracker server that storage server A has joined, start their synchronization threads, and ask the tracker server for the source server and the point in time for appending data to storage server A. Storage server B synchronizes all data recorded before that point in time to storage server A, while the remaining storage servers perform normal synchronization from that point in time onward, pushing only their source data to storage server A. Once that point in time has been reached, storage server B's synchronization to storage server A also switches from appending existing data to normal synchronization, i.e. pushing only source data;

3. When storage server B has synchronized all data to storage server A and there is no more data left to synchronize, storage server B asks the tracker server to set storage server A's status to FDFS_STORAGE_STATUS_ONLINE;

4. When storage server A sends a heartbeat to the tracker server, the tracker server changes its status to FDFS_STORAGE_STATUS_ACTIVE.
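The four steps above can be condensed into the following tracker-side sketch. All class, function, and field names here are hypothetical and chosen for illustration; the real logic lives in the FastDFS tracker source:

import time

# Hypothetical, simplified sketch of the tracker-side logic when storage server A joins a group.
class StorageServer:
    def __init__(self, ip):
        self.ip = ip
        self.status = "INIT"
        self.sync_source = None
        self.sync_until = 0

def on_storage_join(group_servers, uploaded_file_count, server_a):
    # Step 1: if A is the only member, or nothing has been uploaded yet,
    # there is no existing data to synchronize.
    if len(group_servers) == 1 or uploaded_file_count == 0:
        server_a.status = "ONLINE"
        return
    # Step 2: pick an existing server B as the source of existing data and
    # record the point in time up to which B must append-sync to A.
    server_a.sync_source = next(s for s in group_servers if s is not server_a)
    server_a.sync_until = time.time()
    server_a.status = "WAIT_SYNC"   # later moves through SYNCING while B pushes data

def on_existing_data_synced(server_a):
    # Step 3: source server B reports that all existing data has reached A.
    server_a.status = "ONLINE"

def on_heartbeat(server_a):
    # Step 4: the first heartbeat after ONLINE promotes A to ACTIVE.
    if server_a.status == "ONLINE":
        server_a.status = "ACTIVE"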


The author's description above is quite clear, so it should be easy for users to understand.


Synchronization Time Management


We have just seen how multiple storage servers within a FastDFS group synchronize with each other, but when does file synchronization actually happen? After a file upload succeeds, the other storage servers start to sync it; how do they become aware of the new file, and does the tracker server notify the storage servers?


Managing Synchronization Time

When a file has just been uploaded successfully and the client immediately issues a request to download (or delete) that file, how does the tracker select a suitable storage server?

In fact, each storage server periodically reports its own status to the tracker, and this report includes the local synchronization time (that is, the timestamp of the most recently synchronized file). Based on the reports from each storage server, the tracker knows whether a file that has just been uploaded has already been synchronized within that storage group. On the storage server, this information exists in the form of binlog files.
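As a rough illustration of how these reported synchronization timestamps could be used for read routing, consider the simplified sketch below. It is not the actual tracker selection code; the function and variable names are assumptions for illustration:

# Simplified sketch: pick storage servers that are safe to serve a freshly
# uploaded file, based on the last-synced timestamps each server reported.
def readable_servers(servers, source_ip, file_upload_ts):
    """servers: dict of ip -> last synced-up-to timestamp reported to the tracker."""
    ok = []
    for ip, synced_until in servers.items():
        # The source server always has the file; replicas only if their
        # reported sync time has passed the file's upload time.
        if ip == source_ip or synced_until >= file_upload_ts:
            ok.append(ip)
    return ok

# Example: file uploaded to 192.168.1.2 at timestamp 1470292943
servers = {"192.168.1.1": 1470292940, "192.168.1.2": 1470292950, "192.168.1.3": 1470292945}
print(readable_servers(servers, "192.168.1.2", 1470292943))
# -> ['192.168.1.2', '192.168.1.3']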


Binlog file

When the storage server starts, it creates a synchronization directory base_path/data/sync. The files in this directory record the synchronization state with the other storage servers in the same group, for example:

192.168.1.2_33450.mark 192.168.1.3_33450.mark binlog.000 binlog.index

binlog.index records the ordinal of the binlog file currently in use; for example, a value of 10 means binlog.010 is in use.

binlog.000 is the actual binlog file.

192.168.1.2_33450.mark is a sync state file that records the synchronization status from this machine to 192.168.1.2_33450.

Contents of the mark file: it consists of two items, binlog_index and binlog_offset. Taking 192.168.1.2_33450.mark as an example, binlog_index is the index of the binlog file used in the last synchronization to the 192.168.1.2 machine, and binlog_offset is the offset of the last record synchronized to 192.168.1.2 within that binlog file. If the program restarts, synchronization simply resumes from this position.
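As a rough illustration, the sketch below reads these two values back from a mark file, assuming a simple key=value line layout. This layout is an assumption for illustration; the real FastDFS mark file also contains additional bookkeeping fields not covered in this article:

# Hypothetical sketch: read binlog_index and binlog_offset from a mark file.
# Assumes the file looks roughly like:
#   binlog_index=100
#   binlog_offset=1000
def read_mark_file(path):
    values = {}
    with open(path) as f:
        for line in f:
            key, sep, val = line.strip().partition("=")
            if sep and val.lstrip("-").isdigit():
                values[key] = int(val)
    return values.get("binlog_index", 0), values.get("binlog_offset", 0)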

Binlog file contents: the file consists of binlog records, for example:

1470292943 C M00/03/61/qkipafdqcl-aqb_4aaiai4iqlzk223.jpg

1470292948 C M00/03/63/qkipafdwpucafiraaag93go_2ew311.png

1470292954 D M00/03/62/qkipafdwoyeao3euaabvalumg64183.jpg

1470292959 C M00/01/23/quipafdvqz2al_o-aaamrbamk3s679.jpg

1470292964 C M00/03/62/qkipafdvoscacxeqaagtdbqsdvs062.jpg

1470292969 C M00/03/62/qkipafdvonkaxu1naabq9pkfsms63.jpeg

1470293326 D M00/03/62/qkipafdvmngazyszaabq9pkfsms33.jpeg


Each record consists of three fields separated by spaces:

The first field is the file upload timestamp, for example 1470292943

The second field is the operation performed on the file, with the following values:

C indicates source creation, c indicates replica creation

A indicates source append, a indicates replica append

D indicates source delete, d indicates replica delete

T indicates source truncate, t indicates replica truncate

Here "source" means the storage server that the client operated on directly; the other storage servers hold replicas

The third field is the file ID, for example M00/03/61/qkipafdqcl-aqb_4aaiai4iqlzk223.jpg
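Putting the three fields together, a minimal parser for such a record might look like this (a sketch for illustration, not FastDFS code):

# Illustrative parser for one binlog record: "<timestamp> <op> <file_id>"
def parse_binlog_line(line):
    timestamp, op, file_id = line.split(" ", 2)
    return int(timestamp), op, file_id

def is_source_op(op):
    # Uppercase C/A/D/T are source operations; lowercase letters mark replica
    # operations, which are not synchronized again.
    return op in ("C", "A", "D", "T")

record = parse_binlog_line("1470292943 C M00/03/61/qkipafdqcl-aqb_4aaiai4iqlzk223.jpg")
print(record, is_source_op(record[1]))
# (1470292943, 'C', 'M00/03/61/qkipafdqcl-aqb_4aaiai4iqlzk223.jpg') True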


The Storage Server's Specific Synchronization Process


From the FastDFS file synchronization principle above we know that synchronization between storage servers is handled by separate threads, and all operations in those threads are performed synchronously. For example, if a group has three machines A, B, and C, each machine runs two threads responsible for synchronization: on machine A, thread 1 synchronizes data to B and thread 2 synchronizes data to C. Each synchronization thread is responsible for a single target storage server and works in a blocking manner.

Take the storage server with IP 192.168.1.1 as an example. Its synchronization directory contains the files 192.168.1.2_33450.mark, 192.168.1.3_33450.mark, and binlog.000. The following describes how this storage server synchronizes data to the storage server with IP 192.168.1.2:


1) Open the mark file for the corresponding storage server; for example, to sync to 192.168.1.2, open the 192.168.1.2_33450.mark file and read the binlog_index and binlog_offset fields from it. If their values are, say, 100 and 1000, then open the binlog.100 file and seek to position 1000.

2) Enter a while loop and try to read one line. If no line can be read, sleep and wait. If a line is read and its operation is a source operation such as C, A, D, or T (uppercase), synchronize the operation specified by that line to the other server (non-source operations do not need to be synchronized). After the synchronization succeeds, update the binlog_offset marker; this value is periodically written back to the 192.168.1.2_33450.mark file.

Because synchronization can be slow, it may happen that a file has already been deleted by the client before it is synchronized. In that case the synchronization thread prints a log message and simply continues processing the subsequent binlog records.
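The two steps above can be summarized in the following simplified loop. It is a sketch under the assumptions of this article: push_to_peer is a hypothetical callback for the actual network transfer, and the real FastDFS sync thread also handles binlog rotation, reconnects, and the periodic mark-file rewrite omitted here:

import time

# Simplified sketch of one synchronization thread (this server -> one peer).
def sync_to_peer(binlog_path, start_offset, push_to_peer):
    with open(binlog_path) as binlog:
        binlog.seek(start_offset)              # step 1: resume where the mark file says
        while True:                            # step 2: tail the binlog
            line = binlog.readline()
            if not line:
                time.sleep(0.1)                # nothing new yet, wait and retry
                continue
            timestamp, op, file_id = line.rstrip("\n").split(" ", 2)
            if op in ("C", "A", "D", "T"):     # only source operations are pushed
                try:
                    push_to_peer(op, file_id)
                except FileNotFoundError:
                    # the file was deleted by a client before it could be synced:
                    # log it and continue with the next binlog record
                    print("skip missing file:", file_id)
            offset = binlog.tell()             # periodically written back to the .mark file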


Note: the diagrams and some of the descriptions in this document are taken from the Internet, the ChinaUnix FastDFS discussion forum, and the UC technology blog. If any copyright issue is involved, please contact me and the material will be removed.

