The SQL Server replication divides large transactions into small transaction distribution methods.
In SQL server replication, when a large transaction is executed in the published database, for example, 100,000 or more data is operated at a time. After the operation data is executed in the published database, the Log Reader Agent scans the transaction logs and transmits them to the distributed database at one time. If the previous transaction is not completed and multiple transactions are executed consecutively, the Log Reader Agent will scan the logs and send multiple transactions to the distribution database at the same time. By default, a maximum of 500 transactions can be scanned. If millions or tens of millions of data records are executed multiple times, it will be blocked for a long time.
The Log Reader Agent can be configured to divide large transactions into multiple small transactions for delivery to the distribution database, and the distribution queue is distributed to the subscription database according to the small transactions, so that data is synchronized quickly!
Before modifying the proxy parameters, I inserted 0.3 million of the data to the publishing table once. After the insertion is complete, the monitoring data is published to the distribution as follows:
As you can see, the commands of this transaction have to be delivered once before they can be distributed, which consumes time. Here, waiting too long affects the real-time nature of the transaction.
If there are other transactions, the default 500 (reference parameter:-ReadBatchSize) will also be passed together, which takes a long time.
Now you can change the parameters and dispatch the scan to around 1000. You need to set the following parameters:
-MaxCmdsInTran number_of_commands
Note: This parameter can only be added to the Log Reader Agent. this parameter is not set in the agent configuration file.
Restart the Log Reader Agent.
Insert 0.3 million of data again !~ View on monitor
We can see that the command is distributed around 1000. Now we can view the subscribed database and synchronize the data, which saves the time for more scanning commands.
For more details, see the number of commands for each transaction as follows:
SELECT top 10 A.xact_seqno,A.entry_time,COUNT(*) AS cmdsFROM distribution.dbo.MSrepl_transactions A(NOLOCK) INNER JOIN distribution.dbo.MSrepl_commands B(NOLOCK) ON A.xact_seqno=B.xact_seqnoGROUP BY A.xact_seqno,A.entry_timeORDER BY cmds DESC
Although this parameter is good, it may also cause data consistency.
For example:
A batch of data is released and updated, but the subscription query is different.
Distribution transactions encounter conflicts or deadlocks, which also results in inconsistent data.
Reference: copy the Log Reader Agent