TFS cluster data comparison

TFS does not adopt the common three-replica policy. Instead, each cluster keeps two replicas, and data is synchronized to a secondary cluster that also keeps two replicas; this arrangement makes disaster tolerance across remote data centers straightforward. Synchronization to the secondary cluster is initiated in the background by the data servers of the primary cluster. Currently the primary cluster serves both reads and writes while the secondary cluster is read-only; a dual-cluster read/write mode has been developed but is not yet used online.
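
A minimal sketch of this deployment, using invented names (ClusterRole, ClusterConfig) purely for illustration; it is not the actual TFS configuration format:

```cpp
#include <cstdint>
#include <string>

// Invented types illustrating the deployment: two clusters, two replicas each,
// asynchronous background synchronization from primary to secondary.
enum class ClusterRole { kPrimary, kSecondary };

struct ClusterConfig {
  ClusterRole role;
  uint32_t replica_count;    // two replicas inside each cluster
  bool writable;             // primary: read/write, secondary: read-only
  std::string sync_target;   // cluster that receives background synchronization
};

const ClusterConfig kPrimaryCluster{ClusterRole::kPrimary, 2, true, "secondary"};
const ClusterConfig kSecondaryCluster{ClusterRole::kSecondary, 2, false, ""};
```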

Because data is synchronized to the secondary cluster asynchronously, a read that lands on the secondary cluster may target a file whose synchronization has not completed, so the secondary cluster has nothing to return. This is currently handled by retrying on the other cluster after the failure (iteration). An alternative is for the secondary cluster, on finding that the file does not exist, to actively pull it from the primary cluster (recursion) and return the data to the user once synchronization finishes. That logic is correct, but it has several problems: (1) synchronizing the file from the primary cluster inside a user request can cause the request to time out if the file is large; (2) the synchronization protocol between the primary and secondary clusters is complicated, and when the secondary cluster asks to synchronize one specific file (rather than receiving operations in the order the primary cluster received them), the synchronization ordering is disrupted and causes problems. For example, suppose a file reached its current state through a write (A) followed by an update (B); once that state has been synchronized to the secondary cluster, when the data server (DS) later replays the log, the write (A) effectively becomes an update. In addition, TFS also lets users delete and hide files; when these operations are interleaved with writes, even subtler problems arise.
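
A minimal sketch of the cross-cluster retry on a failed read described above; ReadFromCluster and ReadWithRetry are invented names, the stub read always fails just to show the fallback path, and none of this is the real TFS client API:

```cpp
#include <optional>
#include <string>
#include <vector>

// Placeholder for the real TFS client read; always "fails" here so that the
// fallback path below is exercised (illustration only).
std::optional<std::vector<char>> ReadFromCluster(const std::string& /*cluster*/,
                                                 const std::string& /*tfs_name*/) {
  return std::nullopt;
}

// Cross-cluster retry ("iteration"): if the read fails on the cluster it
// landed on, retry the same file name on the other cluster.
std::optional<std::vector<char>> ReadWithRetry(const std::string& tfs_name) {
  for (const char* cluster : {"secondary", "primary"}) {
    if (auto data = ReadFromCluster(cluster, tfs_name)) {
      return data;
    }
  }
  return std::nullopt;  // not found on either cluster
}
```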

Data inconsistency can arise among the replicas within a cluster as well as between the primary and secondary clusters. We first discuss inconsistency within a cluster.

  1. Memory or disk errors on a data server may silently change data. This is currently mitigated by verifying the file's CRC before writing it to disk (see the sketch after this list);
  2. A multi-replica write may succeed only on the master replica while all other slave replicas fail. Because writes in a TFS cluster are (in principle) strongly consistent across replicas, such a failure is returned to the user; the user never learns that the file exists, so this inconsistency causes no problems;
  3. The delete and hide operations of TFS only guarantee that the master replica reaches the correct state; the state of the slave replicas is not guaranteed, so a deleted file may still exist on some replicas. Because updates are synchronized to the slave replicas only after the master completes, such a leftover copy on a slave will not change further; a user normally does not access a file after deleting it, and a malicious user cannot access it without knowing the corresponding TFS file name.
  4. If a replica is lost, a background replication task restores it to keep the data safe.
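
A minimal sketch of the CRC verification mentioned in item 1, using zlib's crc32 as a stand-in; the actual TFS checksum routine and where the expected CRC is stored may differ:

```cpp
#include <zlib.h>

#include <cstdint>
#include <vector>

// CRC32 over a file buffer, with zlib standing in for the real checksum code.
uint32_t ComputeCrc(const std::vector<char>& data) {
  uLong crc = crc32(0L, Z_NULL, 0);  // zlib's initial CRC value
  crc = crc32(crc, reinterpret_cast<const Bytef*>(data.data()),
              static_cast<uInt>(data.size()));
  return static_cast<uint32_t>(crc);
}

// Verify a buffer against the CRC recorded for the file before flushing it to
// disk, so memory corruption is caught instead of being written out.
bool VerifyBeforeWrite(const std::vector<char>& data, uint32_t expected_crc) {
  return ComputeCrc(data) == expected_crc;
}
```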

Inconsistency within a cluster is handled as described above, but each cluster maintains its data independently of the other; once the primary and secondary clusters diverge (a low-probability event), nothing corrects it. Cross-cluster data comparison aims to discover and repair such inconsistency between clusters.

The simplest way to compare clusters is block by block (file by file is unrealistic). The largest cluster currently holds roughly 8.5 million blocks, and fetching the information of all files on one block takes about 50 ms (the block must be traversed), so a full comparison takes about 5 days even as a theoretical value with continuous, uninterrupted requests. Since a full comparison is not acceptable, an incremental check scheme must be introduced.
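
A back-of-the-envelope check of that estimate, using the block count and per-block cost quoted above:

```cpp
#include <cstdio>

int main() {
  const double blocks = 8.5e6;            // ~8.5 million blocks in the largest cluster
  const double seconds_per_block = 0.05;  // ~50 ms to traverse one block's file list
  const double total_seconds = blocks * seconds_per_block;
  std::printf("full comparison: %.1f days\n", total_seconds / 86400.0);  // ~4.9 days
  return 0;
}
```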

Data inconsistency mainly arises when data is modified, so an incremental comparison only needs to look at the blocks that have been modified. The main design is as follows (a rough sketch of the flow follows the list):

  1. Each DS records the blocks it modifies.
  2. A new checkserver collects the modified-block information from the DS instances and compares it across clusters.
  3. Corresponding repair actions are taken based on the comparison results.
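
A rough outline of this three-step flow with invented names and stubbed RPCs; the real DS/checkserver code is organized differently:

```cpp
#include <cstdint>
#include <vector>

// Step 1: each DS records the blocks it modified (detailed sketch below).
// Step 2: CS periodically collects those records and compares the clusters.
// Step 3: blocks found inconsistent are handed to a separate synchronization task.
struct BlockDiff { uint64_t block_id; };

std::vector<uint64_t> CollectModifiedBlocks() { return {}; }                    // ask every DS
std::vector<BlockDiff> CompareClusters(const std::vector<uint64_t>&) { return {}; }
void TriggerSync(const std::vector<BlockDiff>&) {}                              // sync tool

void RunCheckRound() {
  TriggerSync(CompareClusters(CollectModifiedBlocks()));
}
```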

DS adds a hook to every interface that modifies block content so that modified blocks are recorded. Currently the hook is added at the completion of write (including update) operations, block replication, and block compaction; it records the ID and modification time of the modified block in a map. The modified-block information is kept only in memory, and every machine holding a replica of the block keeps its own copy; losing these records does not affect the system, so persistent storage is not needed. When the checkserver (CS) sends a check request, DS returns the information of the blocks modified during the last check interval (file count, total file size, etc.) to CS, and CS aggregates and compares the information for the same block across the primary and secondary clusters.
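
A minimal sketch of the DS-side bookkeeping described above, assuming an invented ModifiedBlockTable class; the real data server hooks look different:

```cpp
#include <cstdint>
#include <ctime>
#include <map>
#include <mutex>
#include <vector>

// In-memory table: block id -> last modification time. One copy is kept on
// every machine that holds a replica of the block; losing it is harmless.
class ModifiedBlockTable {
 public:
  // Called from the hooks at the end of write/update, replication and compaction.
  void Record(uint64_t block_id) {
    std::lock_guard<std::mutex> lock(mutex_);
    modified_[block_id] = std::time(nullptr);
  }

  // Blocks modified inside [begin, end); used to answer a checkserver request
  // covering the last check interval.
  std::vector<uint64_t> ModifiedIn(std::time_t begin, std::time_t end) const {
    std::lock_guard<std::mutex> lock(mutex_);
    std::vector<uint64_t> result;
    for (const auto& [id, mtime] : modified_) {
      if (mtime >= begin && mtime < end) result.push_back(id);
    }
    return result;
  }

 private:
  mutable std::mutex mutex_;
  std::map<uint64_t, std::time_t> modified_;
};
```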

Theoretically, after DS returns the block information, it could delete the records of blocks modified during that period. However, when the cluster contains many DS instances, if CS goes down during a check, that check cannot be reproduced (unless CS persists the block information it fetched from DS), and an inconsistency on a block would only be discovered the next time the block is modified. By rough estimate, a 2 TB disk holds on the order of tens of thousands of blocks; since each record takes only 12 B of metadata, even if every block were modified the records would total well under 1 MB, which has negligible impact on memory usage. Therefore, to keep CS simple and stateless, the modification records produced by the DS hook are not deleted.

The CS check is a periodic task: each run checks the blocks modified between the previous check and the current one. To avoid false differences caused by asynchronous replication (a write has landed on the primary cluster but has not yet reached the secondary), each run adds an extra condition: a block is checked only if it was modified during the interval and has not been modified again within a configurable stability window (for example, 5 minutes). The check interval and the block stability window are configured on CS, and the target time range of each check is also computed on CS and passed to the DS instances, so changing these configuration items does not require restarting DS.
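
A sketch of the per-block eligibility test implied by these rules; the function and parameter names are illustrative only:

```cpp
#include <ctime>

// A block qualifies for this check round only if it was modified inside the
// target interval computed by CS and has then stayed unmodified for at least
// `stable_window` seconds (e.g. 300 s), so half-synchronized writes are skipped.
bool ShouldCheck(std::time_t mtime, std::time_t interval_begin,
                 std::time_t interval_end, std::time_t now,
                 std::time_t stable_window) {
  const bool modified_in_interval = mtime >= interval_begin && mtime < interval_end;
  const bool stable = now - mtime >= stable_window;
  return modified_in_interval && stable;
}
```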

After CS collects the modified-block information from the DS instances, it merges entries with the same block id and, within each cluster, keeps the replica with the highest version for comparison. If the block data on the primary and secondary clusters differs, the block needs to be synchronized (that task is separate from the check). The key question is how to decide whether a block differs between the primary and secondary clusters. The simplest way is to compare the number of files and the total file size recorded for the block, and a block that exists on only one cluster must be synchronized from the other. However, this policy has several problems (a sketch of the naive comparison follows the list):

  1. A given check round may see a modified block as stable on the primary cluster but not yet stable on the secondary cluster (where it will soon become stable). CS then concludes the block must be synchronized to the secondary cluster, and the resulting synchronization is clearly redundant. The solution first discussed with @daoan was to add an overlap between consecutive check windows, but that still cannot distinguish a block that genuinely needs synchronization from this transient state.
  2. A check request to a DS instance may fail because of the network or other reasons even though everything is actually fine. CS then sees the block as missing on one cluster and decides it needs to be synchronized; this synchronization is also redundant.
  3. A block may have been compacted on one cluster, or a file written to the primary cluster may be deleted before it is synchronized. In such cases, comparing blocks by the file count and total file size taken from the index is inaccurate.
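
For reference, a minimal sketch of the naive index-based comparison criticized above (BlockStat and NeedsSync are invented names):

```cpp
#include <cstdint>
#include <optional>

// Per-block summary reported by a DS, taken straight from the block index.
struct BlockStat {
  uint64_t block_id = 0;
  uint32_t version = 0;     // highest-version replica is kept per cluster
  uint32_t file_count = 0;
  uint64_t total_size = 0;
};

// Naive rule: a block needs synchronization if it exists on only one cluster,
// or if its index-level file count / total size differ between clusters.
bool NeedsSync(const std::optional<BlockStat>& primary,
               const std::optional<BlockStat>& secondary) {
  if (!primary || !secondary) return true;  // present on only one cluster
  return primary->file_count != secondary->file_count ||
         primary->total_size != secondary->total_size;
}
```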

The three points above show that comparing the hooked blocks using only the information from the index is inaccurate. To decide whether two blocks really differ, the deleted files in the block must be excluded, which requires traversing the whole block (far more expensive than traversing the index). The final implementation is therefore two-phase: CS first collects and compares the index-level block information as a preliminary screening and adds blocks whose data appears inconsistent to a recheck list; it then performs a detailed comparison of every block in the recheck list (excluding the effect of deleted files), and only if the block data still differs is the block considered to need synchronization.
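
A sketch of the second, traversal-based phase under the same assumptions as the earlier snippets; DetailedStat, TraverseBlock and Recheck are invented names and the traversal is stubbed out:

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Traversal-based summary of a block with deleted (and hidden) files excluded.
struct DetailedStat {
  uint32_t live_file_count = 0;
  uint64_t live_total_size = 0;
};

// Placeholder for the expensive whole-block traversal on one cluster; the real
// check would ask the DS holding the block to walk its file list.
std::optional<DetailedStat> TraverseBlock(const std::string& /*cluster*/,
                                          uint64_t /*block_id*/) {
  return DetailedStat{};
}

// Phase 1 (cheap, index-based screening) produced `recheck`; phase 2 traverses
// each listed block on both clusters and keeps only the blocks that still differ.
std::vector<uint64_t> Recheck(const std::vector<uint64_t>& recheck) {
  std::vector<uint64_t> to_sync;
  for (uint64_t block_id : recheck) {
    const auto p = TraverseBlock("primary", block_id);
    const auto s = TraverseBlock("secondary", block_id);
    const bool differ = !p || !s ||
                        p->live_file_count != s->live_file_count ||
                        p->live_total_size != s->live_total_size;
    if (differ) to_sync.push_back(block_id);
  }
  return to_sync;
}
```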

Finally, the cluster synchronization tool synchronizes the blocks flagged by the CS check results, bringing the two clusters back to a consistent state.
