Implementing distributed data Replication with SQL

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

The concept of replication

Replication, as an important and powerful technology, provides strong support for the storage and processing of distributed data. Microsoft SQL Server can generate a number of
According to the copy, and can distribute the copy of these data to different places, automatic data synchronization, keep all the data copy has the same data. Sql
The server replicates between two databases using a replication pattern called "loosely consistent". These two databases can be on the same computer or
To connect to each other over a local area network (LAN), wide area network (WAN), on different computers.

Our company in the development of "Xi ' an National tax bureau Internal Network Information System", it is the use of SQL Server replication function, to achieve the entire distribution of Xi ' an city
Replication synchronization of the tax data. Xi ' An state tax bureau, including municipal authorities and subordinate 18 of the grass-roots collection and Management branch, located in Xi ' an city different geographical position, through
DDN Line or X.25 line connection, forming the XI ' an National tax bureau WAN. Urban Bureau Information Center Building a Windows NT Server4.0 with Windows NT
Primary domain network environment, each subordinate branch has a server running Windows NT Server 4.0 or Windows NT Server 3.51, joined to
Run as a stand-alone server in the context of a single master domain. Each branch has its own local area network, which is a hybrid network ring of Novell and Windows NT.
Border.

All the taxpayers in XI ' an are to the designated collection and Management branch for tax registration and tax declaration. The IRS has developed FoxPro based on Novell network environment
Tax collection and management software, the collection hall in each branch by Novell's diskless workstation responsible for tax data entry, written to Novell server several DBF files
In Because the data of tax collection and management is distributed in different collection and administration bureaus in the city, it is difficult to search and analyze the tax information.

The development of the new system, on the original basis, the preparation of a special data conversion program, stored in Novell server DBF files to the local branch of Windows
SQL Server database on NT. The SQL Server database for each branch office is replicated with the SQL Server database of the city Information Center, in time for the latest
The tax data is transferred to the Urban Bureau Information Center. So in the City Bureau Information Center, there will be all the tax collection and management data. The Information Center builds intranet Web sites,
The user's computer can access the IIS server using IE browser as long as it is connected to the LAN of each branch station, or to the RAS server of the Information center through modem dialing,
The IIS server sends a data request to the SQL Server server and returns the results to the user's browser in HTML format.

The design of the system adopts customer browsing/server multi-layer structure, and the client adopts standard browser access mode. The user installs IE on his or her own computer
The browser (WIN98, WINNT4.0 has built-in IE4 browser), enter the Web site address, you can access the system.

In the middle tier, the Internet Information Server accomplishes most of the user business rules, such as restrictions on user rights, data request
Generation and so on. IIS generates different pages based on the different permissions that different users have. When business rules change in the future, you only need to be on the Internet
The changes are made on the information server, and this change is reflected in the next time the system is accessed by each user of the system.

At the lowest level of the system, it is supported by a large SQL Server relational database. Xi ' an city's tax data is a GB-per-unit large database, NA
Tax data tables in more than millions of data records, in such a large number of data for rapid data retrieval and data analysis, the general database software is difficult to complete
Into the. SQL Server relational database, it is better to complete the users of various data retrieval requirements, but also is the use of SQL Server data recovery
Technology, the data of nearly 20 branches are copied to the information center through WAN, so that users can get the tax information quickly.

An important part of the system is the duplication of the tax collection and management data of each branch. SQL Server replication has the following features:

1. Transaction-based

Transactions with replication flags are read from the transaction log of the source database and sent to the destination database. For example, an INSERT statement was executed to make the data in the source database
Change, this INSERT statement is written to a database, which we call the publishing database (Distrbution). At the right time, this insert
Statement is sent to the replicated destination database and executes the INSERT statement to ensure that the data on both ends is synchronized.

2. Use the server in three different ways

(1) Publishing (publication): As the source data server (in this system is the branch of the SQL Server server), when the flag copied table
(Table) When the data changes, the transaction that produces the change is engraved with a duplicate flag, which is sent by the Log Reader (LogReader) to the distribution database
(Distribution Database). These transactions are hosted in the distribution database until they are delivered to the destination server.

(2) Distribution (distribution): contains a distribution database. The Distributor accepts the publication server's change transaction and saves it to the distribution database. In
When appropriate, send these transactions to the subscriber. By default, the Distributor is the same computer as the publishing server, but it can also be different
Computer.

(3) Subscription (Subscription): Accept publication data from the publisher.

3. Use a dedicated SQL Server database as a reliable queue for replicating data

SQL Server replication uses a "loosely consistent" replication model characterized by the fact that the source and copy data are not identical at all times,
There is a delay between them. Changes in the source database are not immediately reflected in the subscription database, but instead are written to a specialized distribution data for the changed transaction
Couchen (distribution). When these transactions accumulate to a set value, transactions are sent by the transaction distributor (distribution Task) to the Subscriber. Booking
Read the server and then perform these transactions to keep the source data consistent with the copy data. The distribution database completes a buffering effect when a network or other problem makes the complex
When the system cannot be completed, the change of the source data will be kept in the distribution database until the problem is resolved and all saved transactions are automatically sent out, and the number of the two ends is copied
It remains consistent. This approach makes the replication of SQL Server data extremely adaptive and reliable, minimizing user intervention.

The main components of SQL Server replication are log reader, initial sync (synchornization), data distribution
(distribution) and the distribution database.

Initial synchronization is the first step in which replication really starts. Similar to taking a snapshot of the source database before starting the transfer transaction, the moment before the transfer transaction
The engraved data is copied to the subscription database. The essence of that time is the data generated bcp file, through the network upload to the subscription database. When the initial sync is complete, open again
The log reads the transaction after the initial synchronization, and the data distribution process transfers transactions from the distribution database to the Subscriber.

Implementation of replication

Below we combine in XI ' an City National Tax Bureau Internal Network Information System Instance, introduces the SQL Server6.5 configuration as well as the matter which needs to pay attention to.

Pre-Replication Preparation

(1) Use a character set: The configuration of the SQL Server for the municipal and branch offices uses a character set.

(2) Allow the branch server to have sufficient transaction log space.

(3) It is best to set up the MSSQL service and SQLExecutive service Auto Start at Boot time.

Replication Installation

(1) Run Microsoft SQL Enterprise Manage and select the source data server in the Server Manager window.

(2) Select Replication Configuration from the Server menu and select Install Publishing from the submenu that appears. Appear install
Replication Publishing dialog box.

(3) Select the Local-install New local distribution database option (Configure the Server to act as its own distributor).

When establishing a distribution database device for the distribution database, estimate the size of the distribution data device based on the update rate of the source data, and try to build the distribution database as large as possible
Some.

Setting up publishing and subscribing servers and databases

In the previous installation distribution database, if you were asked whether to select Yes in this Configuration Server Options dialog box, the
Replication-publishing dialog box. Select the subscribed server in the Enable Publishin to these Servers (left list) directory, in the actual
example, the SQL Server for the Information center. Under Publishing Database (list on the right), select the databases that are allowed to publish data.

Return to server Manager, select Subscriber, select Replication Configuration from the Server menu, and select from the Drop-down menu that appears
Subscribing, the Replication-subscribing dialog box appears. Select the source data SQL Server in the Publishing server box on the left, and on the right
box to select a copy of the database.

Copy Settings for data

After you install SQL Server to replicate and set up replicated servers and databases, the publication is set up to specify that those tables in the replicated database are to be reused
System, a publication can include many tables.

From the Server Manager window, select the source data server.

From the Manage menu, select Replication, and then select the Publication dialog box.

Select ' NEW ' to appear in the Edit Publications dialog box. Set a name for the replicated data, and select the table to copy to the right of the list.

This is noted here, because we want to copy data from multiple source databases to a central server database, make sure that the source data
will not affect each other. The copied table should have the source of the field definition data.

In the Edit Publications dialog box, select the table you want to copy from the list on the right, press the "Edit" button, and enter the Manage Article dialog box, at the bottom
restriction clause, enter the criteria for identifying unique source data.

In the Manage Article dialog box, select the scripts tag. Press the "Generate" button to display the Mode source Settings screen. You need to set the initial sync here
of the model. When copying data from multiple source databases to a central database, it is important to select "DELETE Data Using restriction Clause".
That is, the data is manipulated with the preceding unique identification condition. Otherwise, replication data from different data sources can affect each other.

Development experience

SQL Server replication is designed to be implemented with minimal user intervention. Once the setup is complete, the user is basically not required to do the processing. Although the initial synchronization is
SQL Server is automated, but in practical applications, WAN connections (DDN, X.25) often do not complete this synchronization, so we need to manually complete the initial
Synchronization, such as the source database bcp data export, and then import into the subscription database, select the initial synchronization mode is no Data synchronization.

SQL Server replication is transaction based. In the case of a wide area network connection, do not update a lot of records, such as a
No conditional UPDATE statement, which may affect 1000 records in the source database, but in transaction routing, it emits 1000 update statements and acts as a
, it is easy to cause replication failure. At this point, it is best to decompose the updated operations very small, and each UPDATE statement is routed as a separate transaction.

Increase the size of the distribution database appropriately to ensure that the distribution database has enough space to hold the transactions of the update operation in a different network.

Clear the distribution database in a timely manner. After replication is set up, SQL Server generates a task to purge the distribution database, which distributes the distribution database to subscriptions
Transaction deletion of the server. However, by default, the purge task is 105 every night, if your SQL Server servers are not 24-hour boot, pay attention to
The task is adjusted to be performed within the boot time range.

For large database data query retrieval of SQL statements, especially for the use of queries with group class totals and multiple table connections, be sure to note
Meaning As far as possible to avoid the group after the redundant two fields of query, the need to group groups by multiple fields query divided into several times, using the middle table to complete the query. Appropriate
Increase the size of the Temdb database, use a separate server as the SQL Server server, and allocate as much memory as possible to SQL Server. For example in 128M
Memory configuration, allocate 100M of memory to SQL Server.

In Microsoft's newly launched SQL Server 7, the function of data query is greatly enhanced, in many aspects, such as multiple table joins, aggregate grouping, etc.
High. Data replication in SQL Server6.5 is one-way, and only the data from the source database is updated to the subscription database. In the new version of SQL Server7, duplicate
System has achieved bidirectional replication, any one end of the data changes, will be reflected at the other end. Replication in SQL Server7, publishing, and subscriber servers can be
Using a different code page, this is less restrictive than the SQL Server6.5.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Implementing distributed data Replication with SQL

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Implementing distributed data Replication with SQL

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support