Microsoft Sync Framework basics 3: Metadata and the synchronization process

Metadata

Microsoft Sync Framework provides a comprehensive synchronization platform that enables offline and collaborative scenarios for applications, data stores, and devices, independent of:

  • The type of data being synchronized
  • The type of data store
  • The transport protocol
  • The network topology, such as peer-to-peer or client-server

Instead, the Sync Framework relies on a common metadata model, which allows it to:

  • Make synchronization runtimes interoperable
  • Reduce the amount of data transferred between the two data stores involved in synchronization
  • Keep synchronization independent of network topology, data type, data store, and transport protocol

In this post, we will examine the common metadata model and its components in detail. We will also discuss how the Sync Framework uses metadata to synchronize different data stores and replicas.

What is metadata

Literally, metadata is "data about data." Microsoft Sync Framework uses two types of metadata:

  • Replica metadata
  • Item metadata

In the Sync Framework, a replica is a concrete data store. For example, if we are synchronizing two databases, each database is a replica, and each replica contains items. In a database, an item could be a record in a table.

To synchronize two replicas, the Sync Framework requires synchronization providers to use a common metadata model, which is at the core of the Sync Framework. Providers use metadata to detect changes to replicas, but a provider does not itself need to interpret the synchronization metadata. That is the job of the Sync Framework runtime, which interprets the metadata on the provider's behalf.

At runtime, the synchronization provider queries metadata to find the changes made to a replica since the last synchronization. Synchronization metadata also helps detect and handle conflicts. A conflict occurs when the same item is changed on two replicas between synchronizations. The provider uses metadata to determine whether an item is in conflict and to resolve the conflict. The Sync Framework runtime also uses synchronization metadata to handle common synchronization problems such as network failures, conflicting data, and application errors.

Metadata store

You may wonder where metadata is stored. In fact, metadata can be stored anywhere: in a file, in a separate database, or in the replica being synchronized. The only requirement is that the metadata store be accessible programmatically. The Sync Framework reads and updates metadata by calling methods on the synchronization provider. If we store metadata ourselves, our implementations of those methods must return and modify it when they are called.

In most cases, however, we delegate metadata handling to the Sync Framework runtime. Microsoft Sync Framework provides a complete metadata store implementation based on SQL Server Compact Edition. Using this store is not required, but it means you don't have to worry about how to store synchronization metadata.

Whether to use the built-in metadata store or a custom one is up to the developer who builds the synchronization application.

Components of synchronization metadata

The metadata for a data store consists of three main components:

  • Versions
  • Knowledge
  • Tombstones

Versions

A synchronization version is associated with each item in a replica. It records which replica created the item and which replica made its most recent change, along with the item's ID. For example, in database synchronization, the databases being synchronized could be replica A and replica B. An item might be a table in the database, a row in a table, or even a single column of a row.

When an item changes, the stored change information includes a creation version and an update version. Each version has two components:

  • Tick count: a logical clock value that uniquely identifies a change within the scope of the source replica.
  • Replica ID: uniquely identifies the data store where the change was made.

When an item is first created, its creation version and update version are the same. Subsequent updates change only the update version.
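The versioning rules above can be sketched as a small Python model. This is a hypothetical illustration, not the Sync Framework's .NET API, and the class and field names are assumptions. The sketch shows three things: a version is a (replica ID, tick count) pair, one logical clock is shared by every item in a replica, and an update replaces only the update version.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Version:
    """A (replica ID, tick count) pair identifying one change (hypothetical model)."""
    replica_id: str
    tick: int


@dataclass
class ItemMetadata:
    item_id: str
    created: Version   # creation version
    updated: Version   # update version


class Replica:
    """Toy replica: one logical clock shared by every item it tracks."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.tick = 0
        self.items: Dict[str, ItemMetadata] = {}

    def _next_version(self) -> Version:
        self.tick += 1  # the clock advances once per local change, for any item
        return Version(self.replica_id, self.tick)

    def create(self, item_id: str) -> ItemMetadata:
        v = self._next_version()
        # On creation, the creation version and the update version are identical.
        meta = ItemMetadata(item_id, created=v, updated=v)
        self.items[item_id] = meta
        return meta

    def update(self, item_id: str) -> ItemMetadata:
        meta = self.items[item_id]
        # Later updates replace only the update version.
        meta.updated = self._next_version()
        return meta
```

Suppose replica A creates I1, I2, and I3, then updates I2 and then I1. I1 ends up with creation version A1 and update version A5, because ticks 2 to 4 went to other items. This mirrors the file example later in this post.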

For example, assume that the Customer table in replica A contains the following data:

If we use the table's ID field as the item ID, the version information for the table can be represented as follows:

Recording item versions is also called change tracking. To implement change tracking, the synchronization provider must update an item's synchronization version whenever the item is modified in the replica. There are two ways to implement change tracking:

  • Inline tracking:

With this method, the version is updated at the same time the item in the replica is modified. It is typically used when version information can be embedded in the replica itself. For a database, for example, we can use a trigger to update a change-tracking table immediately after a row is updated.

  • Asynchronous tracking:

With this method, an external process scans for changes, and any changes it finds are added to the version information. The scan may run on a regular schedule or just before synchronization. This method is typically used when there is no built-in mechanism to update version information automatically when an item changes (for example, when version-update logic cannot be added to the update path). A common way to detect changes is to store each item's state and compare the stored state with the item's current state. For example, you can check whether a file's last write time or size has changed since the last synchronization.
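Both tracking styles can be sketched in Python. Python's built-in `sqlite3` module stands in for the database, and all table, trigger, and function names are hypothetical. `make_tracked_db` shows inline tracking: triggers advance a per-replica tick count and write creation and update versions whenever a row changes. `scan_for_changes` shows asynchronous tracking: it diffs a stored snapshot of each file's (last write time, size) against the current state.

```python
import sqlite3
from typing import Dict, List, Tuple


def make_tracked_db(replica_id: str) -> sqlite3.Connection:
    """Inline tracking (hypothetical schema): triggers record versions on every row change."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE replica_info (replica_id TEXT NOT NULL, tick INTEGER NOT NULL);
        CREATE TABLE customer_tracking (
            id INTEGER PRIMARY KEY,
            create_replica TEXT, create_tick INTEGER,
            update_replica TEXT, update_tick INTEGER);
        -- New row: creation version = update version = next tick.
        CREATE TRIGGER customer_ins AFTER INSERT ON customer BEGIN
            UPDATE replica_info SET tick = tick + 1;
            INSERT INTO customer_tracking
                SELECT NEW.id, replica_id, tick, replica_id, tick FROM replica_info;
        END;
        -- Updated row: only the update version changes.
        CREATE TRIGGER customer_upd AFTER UPDATE ON customer BEGIN
            UPDATE replica_info SET tick = tick + 1;
            UPDATE customer_tracking
               SET update_replica = (SELECT replica_id FROM replica_info),
                   update_tick = (SELECT tick FROM replica_info)
             WHERE id = NEW.id;
        END;
    """)
    conn.execute("INSERT INTO replica_info VALUES (?, 0)", (replica_id,))
    return conn


# path -> (last write time, size)
Snapshot = Dict[str, Tuple[float, int]]


def scan_for_changes(previous: Snapshot, current: Snapshot) -> Tuple[List[str], List[str]]:
    """Asynchronous tracking: compare stored state with current state."""
    changed = sorted(p for p, state in current.items() if previous.get(p) != state)
    # Paths that disappeared are deletions; a provider would record tombstones for them.
    deleted = sorted(p for p in previous if p not in current)
    return changed, deleted
```

After two inserts and one update, the tracking table holds (id 1: created at tick 1, updated at tick 3) and (id 2: created and updated at tick 2).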

Knowledge

Knowledge is a compact representation of the changes a replica is aware of. Versions are associated with items, whereas knowledge is associated with a sync scope. Knowledge covers changes the replica has seen directly (made locally) or indirectly (learned through synchronization with other replicas). The synchronization provider generally does not work with knowledge directly. Instead, the Sync Framework runtime handles it when it calls the provider's methods. Knowledge makes synchronization more efficient because it limits the amount of information sent between replicas. Whenever version information is updated, the data store's knowledge is updated as well.

A provider uses replica knowledge to:

  1. Enumerate changes: determine which changes the other replica does not yet know about.
  2. Detect conflicts: determine which changes were made without knowledge of each other.

In the simplest case, a replica's knowledge consists of the replica ID and the highest tick count in that replica. In the database example above, replica A's knowledge for the Customer table is A2.
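A simplified knowledge model can be sketched in Python as a map from replica ID to the highest tick count seen. A version is "contained" in the knowledge when its tick is at or below that replica's entry. Real Sync Framework knowledge can also record exceptions (gaps), which this hypothetical sketch omits. The class and function names are assumptions. The second function covers the enumeration use listed above: it returns the items whose current version the other replica has not seen.

```python
from typing import Dict, List, Optional, Tuple

Version = Tuple[str, int]  # (replica ID, tick count)


class Knowledge:
    """Simplified knowledge: highest tick count seen per replica (no exceptions)."""

    def __init__(self, entries: Optional[Dict[str, int]] = None) -> None:
        self.entries = dict(entries or {})

    def contains(self, version: Version) -> bool:
        replica_id, tick = version
        return tick <= self.entries.get(replica_id, 0)

    def __str__(self) -> str:
        return ", ".join(f"{r}{t}" for r, t in sorted(self.entries.items()))


def changes_unknown_to(knowledge: Knowledge, versions: Dict[str, Version]) -> List[str]:
    """Enumerate items whose current version is not contained in the other replica's knowledge."""
    return sorted(item for item, v in versions.items() if not knowledge.contains(v))
```

With knowledge A2, versions A1 and A2 are known, while A3 and any version from replica B are not.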

Tombstones

Each replica must also maintain a tombstone for every deleted item. If deletions are not tracked, the provider cannot tell other replicas that an item (such as a file) has been deleted, so it cannot send that change to other providers. A tombstone must contain the following information:

  • Global ID: uniquely identifies the deleted item across all replicas.
  • Deletion version: the update version associated with the deleted item, that is, the replica ID and tick count of the deletion.
  • Creation version: the replica ID and tick count from when the item was originally created.

Because the tombstone log grows over time, a process is needed to clean it up periodically. Cleaning up tombstone data saves space and helps improve synchronization performance. Microsoft Sync Framework supports tombstone management.
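A tombstone record and a periodic cleanup pass can be sketched in Python. The fields follow the list above, plus a hypothetical `deleted_at` timestamp. The age-based retention policy is an assumption for illustration, not the Sync Framework's own cleanup mechanism. Any such policy carries a risk: a replica that has not synchronized within the retention window can miss a deletion.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple

Version = Tuple[str, int]  # (replica ID, tick count)


@dataclass(frozen=True)
class Tombstone:
    global_id: str        # ID of the deleted item, unique across all replicas
    deleted: Version      # update version recorded for the deletion
    created: Version      # version from when the item was first created
    deleted_at: datetime  # hypothetical field, used only by the cleanup policy below


def cleanup_tombstones(tombstones: List[Tombstone], now: datetime,
                       retention: timedelta) -> List[Tombstone]:
    """Keep only tombstones younger than the retention period (assumed policy)."""
    return [t for t in tombstones if now - t.deleted_at < retention]
```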

 

Synchronization Process

Now that we understand synchronization metadata, we can look at the synchronization process. The replica that initiates synchronization is called the source, and the replica it connects to is called the target. The steps below describe the process. For bidirectional synchronization, the process runs twice, with the source and target swapped in the second pass.

1. Initiate a synchronization session

In this phase, a synchronization session is established, connecting the source provider to the target provider.

2. Prepare and send the target's knowledge to the source

As mentioned above, each replica stores its own knowledge. The knowledge stored on the target is sent to the source.

3. Use the target's knowledge to determine which changes to send

The source compares the knowledge it just received with its local item versions to determine which items the target does not know about. Note that the source sends versions, not the actual items. A version is a summary of where each item was last changed.

4. Send the change versions and the source's knowledge to the target

Once the source has prepared the list of changed versions, it sends these versions to the target.

5. Retrieve local versions of the changed items and compare them with the source's versions and knowledge

The target uses these versions to prepare the list of items to request from the source. It also uses this information to check for constraint conflicts. A constraint conflict is a violation of a constraint on items, such as a folder relationship or two items with the same name in the same location in a file system.

6. Detect conflicts, then resolve or defer them

Changes made to the same item on two replicas between synchronizations conflict. The Microsoft Sync Framework runtime detects a conflict when neither replica's knowledge contains the other replica's change to the item, that is, when each change was made without knowledge of the other. The following "Conflict examples" section describes the detection process in more detail.

Replicas can implement various policies for resolving conflicts within the synchronization topology. Common conflict resolution policies include:

  • Source wins: when a conflict is detected, the change made on the source replica is always used.
  • Target wins: the change made on the target replica is always used.
  • Merge: the changes made on the source and target replicas are merged. Inventory counts are an example: you might want to merge (sum) the values from the two replicas rather than choose one of them as correct.
  • Log or defer: record the conflict, or defer it to be resolved later.
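The four policies can be sketched as a single Python function over integer values. The names are hypothetical, and this is not the Sync Framework's conflict API. Merge sums both values, following the inventory example above. A real counter merge might instead subtract a shared base value to avoid double counting, which is a design choice this sketch leaves out.

```python
from enum import Enum
from typing import List, Optional, Tuple


class Policy(Enum):
    SOURCE_WINS = "source_wins"
    TARGET_WINS = "target_wins"
    MERGE = "merge"
    DEFER = "defer"


def resolve(policy: Policy, source_value: int, target_value: int,
            conflict_log: List[Tuple[int, int]]) -> Optional[int]:
    """Return the value the target should store, or None if the conflict is deferred."""
    if policy is Policy.SOURCE_WINS:
        return source_value
    if policy is Policy.TARGET_WINS:
        return target_value
    if policy is Policy.MERGE:
        return source_value + target_value  # sum, as in the inventory example
    # DEFER: record the conflict; it resurfaces at later syncs until resolved.
    conflict_log.append((source_value, target_value))
    return None
```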

7. The target requests item data from the source

At this stage, the target has determined which items to retrieve from the source and sends the request to the source.

8. The source prepares and sends item data

The source receives the request for item data and prepares the actual data to transmit to the target. If the tracked item is a row in a database, the row is sent. If the item is a file in a folder, the file is transferred.

9. Apply the items at the target

The target receives and applies the items. If an error such as a network disconnection occurs during this process, the item is marked as an exception and corrected during the next synchronization. The knowledge received from the source is merged into the target's knowledge.
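Steps 2 to 9 can be condensed into an in-memory Python sketch of one direction of a session. Everything here is hypothetical: knowledge is simplified to the highest tick per replica, and constraint conflicts and error handling are not modeled. It is not the Sync Framework API. Real knowledge uses exceptions to exclude deferred conflicts. This sketch approximates that by learning the source's knowledge only when no conflict was left unresolved, so a deferred conflict resurfaces on the next sync.

```python
from typing import Dict, List, Tuple

Version = Tuple[str, int]   # (replica ID, tick count)
Knowledge = Dict[str, int]  # replica ID -> highest tick known (no exceptions)


def knows(knowledge: Knowledge, version: Version) -> bool:
    """True if the knowledge already covers this change version."""
    replica_id, tick = version
    return tick <= knowledge.get(replica_id, 0)


class FileReplica:
    """Hypothetical in-memory replica: item versions, item data, and knowledge."""

    def __init__(self, replica_id: str) -> None:
        self.id = replica_id
        self.versions: Dict[str, Version] = {}  # item ID -> latest update version
        self.data: Dict[str, str] = {}          # item ID -> item contents
        self.knowledge: Knowledge = {}

    def write(self, item: str, value: str) -> None:
        """Local create/update: advance this replica's tick count."""
        tick = self.knowledge.get(self.id, 0) + 1
        self.knowledge[self.id] = tick
        self.versions[item] = (self.id, tick)
        self.data[item] = value


def sync_one_way(source: FileReplica, target: FileReplica) -> List[str]:
    """One direction of a session (steps 2-9); returns IDs of conflicting items."""
    # Steps 2-3: the target sends its knowledge; the source keeps unseen versions.
    target_knowledge = dict(target.knowledge)
    batch = {item: v for item, v in source.versions.items()
             if not knows(target_knowledge, v)}
    # Steps 4-6: the target compares each incoming version with its local state.
    conflicts: List[str] = []
    wanted: List[str] = []
    for item in batch:
        local = target.versions.get(item)
        if local is not None and not knows(source.knowledge, local):
            conflicts.append(item)  # each change was made without knowledge of the other
        else:
            wanted.append(item)
    # Steps 7-9: request and apply item data.
    for item in wanted:
        target.data[item] = source.data[item]
        target.versions[item] = batch[item]
    # Learn the source's knowledge only if nothing was deferred.
    if not conflicts:
        for replica_id, tick in source.knowledge.items():
            target.knowledge[replica_id] = max(tick, target.knowledge.get(replica_id, 0))
    return sorted(conflicts)
```

In the usage below, replica A writes five changes (knowledge A5) and replica B writes four (knowledge B4). Running `sync_one_way(a, b)` and then `sync_one_way(b, a)` leaves both replicas with knowledge A5, B4, matching the example that follows.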

 

Synchronization example

Using the synchronization process described above, we will walk through a file synchronization example. It shows how Microsoft Sync Framework uses metadata to enumerate changes and finally apply item data. There are two replicas, replica A and replica B. Replica A initiates synchronization with replica B, so replica A is the source and replica B is the target. Suppose we want to synchronize files between the two replicas. Each tracked item is a file in a folder, denoted In (for example, I1, I2, I3, ...). When a new file (I1) is created, its metadata should be updated as follows:

If the file is then updated, the version table becomes:

Here the update tick count is 5 because the logical clock that generates tick counts is shared across the whole source replica. Tick counts 2 through 4 were used for changes to other items in the replica.

In the following example, the replica also tracks two more items, I2 and I3. As more items are created, more version information is generated. Note that Microsoft Sync Framework does not need to store earlier versions of an item; it only needs the latest version.

Based on the current state of this replica's items, we can express replica A's knowledge as:

Replica A knowledge = A5

As mentioned above, knowledge is a compact representation of the changes a replica is aware of. Here, A is the unique ID assigned to the replica, and 5 is its current tick count, the highest change it knows about from itself. If the replica had synchronized with other replicas, their entries would also appear in its knowledge.

Replica B may also contain several files, with the following metadata:

Replica B

The current knowledge of replica B is:

Replica B knowledge = B4

Now we start synchronization between the two replicas. Replica A becomes the source (the replica that initiates synchronization), and replica B becomes the target.

During synchronization, the target sends its knowledge to the source. As noted above, the knowledge of the two replicas is:

Replica A knowledge = A5

Replica B knowledge = B4

The source (replica A) receives this knowledge and uses it to determine which versions to send to the target. Replica B's knowledge contains no changes from replica A, so the source sends versions for all of its items. In this example, replica A sends the following versions:

Replica A change batch

The target receives these versions and enumerates them to determine which items it needs to request from the source. It also uses this information to check for conflicts (for example, the same file updated on both replicas).

After this check, the target requests from the source the items it does not yet know about. In this example, replica A sends I1, I2, and I3.

The target receives these files and adds them to its own folder. Replica B's item table now includes the items received from replica A.

Replica B: updated item table

When this synchronization completes, the process runs again with the roles reversed: the source becomes the target and the target becomes the source. This lets replica A receive any files created or changed on replica B (I104 and I105).

After synchronization, both replicas should have the following updated knowledge:

Replica A knowledge = A5, B4

Replica B knowledge = A5, B4
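Under the simplified model of knowledge as the highest tick count per replica, combining two replicas' knowledge is a pointwise maximum. That is how A5 and B4 become "A5, B4" on both sides. The function names in this Python sketch are hypothetical.

```python
from typing import Dict


def merge_knowledge(a: Dict[str, int], b: Dict[str, int]) -> Dict[str, int]:
    """Pointwise maximum of two simplified knowledge vectors."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in sorted(set(a) | set(b))}


def fmt(k: Dict[str, int]) -> str:
    """Render knowledge in the article's notation, e.g. 'A5, B4'."""
    return ", ".join(f"{r}{t}" for r, t in sorted(k.items()))


print(fmt(merge_knowledge({"A": 5}, {"B": 4})))  # A5, B4
```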

Conflict examples

After the previous example, the two replicas are now synchronized, and the version of each item on both replicas is as follows:

Likewise, the knowledge of the two replicas is:

Replica A knowledge = A5, B4

Replica B knowledge = A5, B4

At this point, both replicas update the same file (item I2).

On replica A, the version table is updated as follows:

On replica B, the version table is updated as follows:

The knowledge of both replicas is also updated:

Replica A knowledge = A6, B4

Replica B knowledge = A5, B5

Replica A starts synchronization with replica B. Skipping the steps in which the source sends item versions and knowledge to the target, the following steps are performed for item I2:

1. Replica B receives the new change to item I2 from replica A, which is:

2. Replica B examines the knowledge received from replica A (A6, B4) and determines that replica A does not know about the change replica B made to the same item:

3. The detected conflict is passed to the application or provider for handling.
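The check in steps 1 and 2 can be reproduced in Python with the numbers from this example, again treating knowledge as the highest tick per replica (function names are hypothetical). Replica A's change A6 is new to replica B, whose knowledge is A5, B5. Replica B's change B5 is not covered by replica A's knowledge (A6, B4). Neither side knew of the other's change, so the two changes conflict. The second call uses a hypothetical earlier version, A4, as replica B's version of I2, to show the case with no conflict.

```python
from typing import Dict, Tuple

Version = Tuple[str, int]   # (replica ID, tick count)
Knowledge = Dict[str, int]  # replica ID -> highest tick known


def knows(knowledge: Knowledge, version: Version) -> bool:
    replica_id, tick = version
    return tick <= knowledge.get(replica_id, 0)


def is_conflict(source_version: Version, source_knowledge: Knowledge,
                target_version: Version, target_knowledge: Knowledge) -> bool:
    # The incoming change is new to the target, and the target's own change is
    # unknown to the source: each was made without knowledge of the other.
    return (not knows(target_knowledge, source_version)
            and not knows(source_knowledge, target_version))


print(is_conflict(("A", 6), {"A": 6, "B": 4}, ("B", 5), {"A": 5, "B": 5}))  # True
print(is_conflict(("A", 6), {"A": 6, "B": 4}, ("A", 4), {"A": 5, "B": 4}))  # False
```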

As described above, the application can choose how to handle the conflict or defer it. A deferred conflict reappears at every synchronization until it is resolved. Once it is resolved, the resolved value reaches the other replica during the next synchronization.

So far in this series, we have covered the background and advantages of synchronization and how Microsoft Sync Framework implements it. We also described the core components and architecture of the Sync Framework and the various types of synchronization participants. In this post, we discussed Sync Framework metadata and the synchronization process in detail, with examples showing how metadata is used at runtime. By now we should have a good grasp of the theoretical foundations of the Sync Framework. Next, we will discuss in detail how to use specific synchronization providers to synchronize various kinds of data, such as databases, files, and Web feeds.
