Big Data Efficient Copy Processing: Case Analysis Summary



A long-time customer wanted to quickly copy the data of a SQL Server table into a SQLite database for regular backup. The table holds roughly 500,000 records and about 100 fields, so efficiency is the top priority; the customer's own attempts always took an hour or two. When he raised the requirement, my first thought was that transactions were not being used properly—otherwise the copy should run much faster—though I was not sure by how much. Taking this requirement as a case study in large-table replication, the full copy-and-update process was eventually brought down to under 20 minutes. This article describes how the requirement was handled to achieve fast, efficient data replication, and provides the implementation ideas and code for reference.

1. Data replication requirements and development approach

Because the customer runs this replication as a regular backup, copy efficiency matters a great deal and the job should finish as quickly as possible. With roughly 500,000 records and about 100 fields, the table carries far more data than a typical one, so careful performance testing is required. Against this background, we built a test case to measure performance.

Manually mapping the fields of a table with this many columns one by one would be very tedious, so we use the code generation tool Database2Sharp for rapid development. This frees us from most of the low-level plumbing and lets us generate separate data access layers for the different databases.

At the bottom layer we mainly use the database access module of Microsoft's Enterprise Library, which abstracts database operations and transactions well enough to adapt to the transaction handling of different databases. With this module we get good support for SQL Server, Oracle, MySQL, Access, and SQLite.
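As a rough illustration (not the framework's actual code), the Enterprise Library Data Access block resolves a concrete `Database` object from a named connection string, so the same calling code can target SQL Server or SQLite just by passing a different configuration name. The names `"sqlserver"` and `"sqlite"` and the `CountRows` helper below are assumptions for the sketch:

```csharp
using System;
using System.Data.Common;
using Microsoft.Practices.EnterpriseLibrary.Data;

public static class DbAccessSketch
{
    // Count the rows of a table through whichever provider the
    // named connection string ("sqlserver", "sqlite", ...) maps to.
    public static int CountRows(string configName, string tableName)
    {
        // CreateDatabase looks the name up in the <connectionStrings>
        // configuration and returns a provider-specific Database object.
        Database db = DatabaseFactory.CreateDatabase(configName);

        using (DbCommand cmd = db.GetSqlStringCommand(
            "SELECT COUNT(*) FROM " + tableName))
        {
            return Convert.ToInt32(db.ExecuteScalar(cmd));
        }
    }
}
```

Because the caller only ever sees the abstract `Database` and `DbCommand` types, swapping the backing database is a configuration change rather than a code change.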

The development framework follows a common layered pattern, divided into the UI layer, BLL layer, DAL layer, IDAL layer, entity layer, and a common class library layer.

The framework's base classes encapsulate a large number of general-purpose functions, including base classes for the data access layer and the business logic layer. Almost all of these base-class functions carry an optional transaction parameter, DbTransaction trans = null, so we can run them inside a transaction or, by default, without one.
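A minimal sketch of this optional-transaction pattern (hypothetical names, not the framework's real base class): when the caller passes a transaction, the command enlists in it; otherwise the operation runs on its own.

```csharp
using System.Data.Common;

public abstract class DataAccessSketch
{
    // trans is optional: null means "run standalone", while a supplied
    // DbTransaction makes this call part of the caller's larger unit of work.
    public virtual bool Insert(object entity, DbTransaction trans = null)
    {
        DbCommand cmd = CreateInsertCommand(entity);
        if (trans != null)
        {
            // Reuse the transaction's connection and enlist the command in it
            cmd.Connection = trans.Connection;
            cmd.Transaction = trans;
        }
        return cmd.ExecuteNonQuery() > 0;
    }

    // Building the provider-specific command is left to subclasses.
    protected abstract DbCommand CreateInsertCommand(object entity);
}
```

The benefit is that a batch caller can open one transaction, pass it to thousands of `Insert` calls, and commit once at the end, while single-record callers need not think about transactions at all.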

The following figure shows the implementation of the data access interface and SQLServer-based Data Access class.

The topmost abstract base class of the data access layer, AbstractBaseDAL, already exposes a large number of interfaces for data operations and their transactional variants. Subclasses that only need the standard add, delete, update, and query operations therefore require almost no extra code, which suits our rapid development goal very well.

Throughout the framework's data access layer we define many common interfaces that take a transaction parameter, so using transactions in routine data processing is very convenient. Batch transaction processing is especially effective for SQLite. My earlier article on batch-inserting SQLite data with transactions shows, with source code, how much batch write speed improves—the efficiency gap between transactional and non-transactional writes is huge.
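To illustrate why that gap is so large, here is a minimal, self-contained sketch using the System.Data.SQLite provider (the `Product` table and `@name` parameter are made up for the example). Wrapping all the INSERTs in one transaction means SQLite journals and syncs once per batch instead of once per row:

```csharp
using System.Data;
using System.Data.SQLite;

public static class SqliteBatchSketch
{
    public static void InsertMany(string connectionString, int count)
    {
        using (var conn = new SQLiteConnection(connectionString))
        {
            conn.Open();

            // One explicit transaction around the whole batch; without it,
            // each ExecuteNonQuery would be its own implicit transaction.
            using (var trans = conn.BeginTransaction())
            using (var cmd = new SQLiteCommand(
                "INSERT INTO Product (Name) VALUES (@name)", conn, trans))
            {
                var p = cmd.Parameters.Add("@name", DbType.String);
                for (int i = 0; i < count; i++)
                {
                    p.Value = "item" + i;
                    cmd.ExecuteNonQuery();   // runs inside the open transaction
                }
                trans.Commit();              // single commit for the batch
            }
        }
    }
}
```

With autocommit, every row pays the cost of a journal write and disk sync; inside one transaction that cost is paid once, which is why batched SQLite inserts are commonly orders of magnitude faster.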

 

2. Use the code generation tool to generate the required code

As mentioned above, for a replication program over a table with this many fields, mapping the data fields one to one by hand would be exhausting, so we use the code generation tool Database2Sharp for rapid development.

With the code generation tool, the required code can be produced quickly. After expanding the database in the tool, right-click [Code Generation] → [Enterprise Library Code Generation] to generate the standard layered code. Because this case is not a standard forms-over-data application, there is no need to use the tool to generate the Winform interface.

Follow the code generation wizard step by step and confirm at the end to generate the relevant underlying code.

Finally we obtain project code for the BLL, DAL, IDAL, and Entity layers. The inheritance relationships across the project are already wired up, and the base classes provide the basic add, delete, update, and query operations.

Since we copy between two different databases, the key is to produce two data access classes, one per database (that is, after generating the standard SQL Server class, copy the code and change the base class it inherits from). These two data access classes, shown below, meet the needs of the project without adding any extra interfaces.

The final project structure is as follows.

 

3. Winform interface code logic for Data Replication

To show the progress of the whole replication process (which is very important), we designed a progress bar and a text label that display the progress and elapsed time of the operation. The final interface design is shown below.

The interface performs the data replication on a background thread, so progress can be reported promptly without blocking the UI thread.

The specific interface code is as follows.

public partial class FrmMain : Form
{
    private TimeSpan executeTime;
    private int currentCount = 0;
    // Use a background thread for the processing
    private BackgroundWorker work = new BackgroundWorker();

    public FrmMain()
    {
        InitializeComponent();

        // Wire up the background thread's work and notification events
        work.DoWork += work_DoWork;
        work.WorkerReportsProgress = true;
        work.ProgressChanged += work_ProgressChanged;
        work.RunWorkerCompleted += work_RunWorkerCompleted;
    }

    // Notify the user once the background thread has finished
    void work_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
    {
        this.toolStripProgressBar1.Value = 100;
        this.toolStripProgressBar1.Visible = false;
        MessageUtil.ShowTips("Operation completed");
        ShowMessage(this.toolStripProgressBar1.Value);
    }

    /// <summary>
    /// Display text information on the form
    /// </summary>
    /// <param name="percent">completion percentage</param>
    private void ShowMessage(int percent)
    {
        this.lblTips.Text = string.Format(
            "[current count: {0}, percent complete: {1}, elapsed: {2} ms | {3} min {4} s]",
            this.currentCount, percent,
            this.executeTime.TotalMilliseconds,
            this.executeTime.Minutes, this.executeTime.Seconds);
    }

    /// <summary>
    /// Show count, elapsed time, etc. when progress is reported
    /// </summary>
    void work_ProgressChanged(object sender, ProgressChangedEventArgs e)
    {
        this.toolStripProgressBar1.Value = e.ProgressPercentage;
        this.statusStrip1.Refresh();
        ShowMessage(e.ProgressPercentage);
    }

    /// <summary>
    /// Logic executed by the background thread
    /// </summary>
    void work_DoWork(object sender, DoWorkEventArgs e)
    {
        CopyDataUtil util = new CopyDataUtil();
        // Use an Action lambda so the helper can notify the UI of progress
        util.Start((percent, ts, current) =>
        {
            work.ReportProgress(percent);
            this.executeTime = ts;
            this.currentCount = current;
        });
    }

    private void btnCopy_Click(object sender, EventArgs e)
    {
        if (!work.IsBusy)
        {
            // If each test run should start from an empty database,
            // delete the old database file and copy the blank backup over it
            string dbfile = Path.Combine(Environment.CurrentDirectory, "localdb.db");
            string bakfile = Path.Combine(Environment.CurrentDirectory, "db.db");
            if (this.chkCopyEmptyDb.Checked && File.Exists(dbfile))
            {
                File.Delete(dbfile);
                File.Copy(bakfile, dbfile, true);
            }

            // Show the progress bar and run the worker asynchronously
            this.toolStripProgressBar1.Visible = true;
            work.RunWorkerAsync();
        }
    }

    private void FrmMain_FormClosing(object sender, FormClosingEventArgs e)
    {
        // Unregister the events to prevent exceptions when exiting mid-run
        if (work != null && work.IsBusy)
        {
            work.ProgressChanged -= work_ProgressChanged;
            work.RunWorkerCompleted -= work_RunWorkerCompleted;
            work.Dispose();
        }
    }
}

In the form code above, the most critical part is the processing logic of the background thread, shown again below.

/// <summary>
/// Logic executed by the background thread
/// </summary>
void work_DoWork(object sender, DoWorkEventArgs e)
{
    CopyDataUtil util = new CopyDataUtil();
    // Use an Action lambda so the helper can notify the UI of progress
    util.Start((percent, ts, current) =>
    {
        work.ReportProgress(percent);
        this.executeTime = ts;
        this.currentCount = current;
    });
}

For convenience, the copy logic itself lives in a helper class, and the UI-notification Action is passed into the helper's Start method, so that CopyDataUtil can report progress at any point during processing.

The Start Method of Data replication is defined as follows.

/// <summary>
/// Start executing the copy
/// </summary>
public void Start(Action<int, TimeSpan, int> doFunc)
{
    StartTime = DateTime.Now;   // record the start time
    InternalCopry(doFunc);      // run the copy logic, invoking the external callback
    EndTime = DateTime.Now;     // record the end time
}

The helper class CopyDataUtil defines two data access objects, one per database type, to make assignment and operation across the two databases easy, and records the start and end times so the total elapsed time can be collected, as shown in the following code.

/// <summary>
/// Helper class that performs the data copy
/// </summary>
public class CopyDataUtil
{
    // Timestamps used to measure the whole operation
    private DateTime StartTime, EndTime;

    // SQL Server data access object
    private ProductSqlServer sqlserver = null;

    // SQLite data access object
    private ProductSqlite sqlite = null;

    public CopyDataUtil()
    {
        // Build the objects and point each at its database configuration entry
        sqlserver = new ProductSqlServer();
        sqlserver.DbConfigName = "sqlserver";

        sqlite = new ProductSqlite();
        sqlite.DbConfigName = "sqlite";
    }

The copy logic relies mainly on transactional processing. Following paging rules, data is fetched from SQL Server in batches of a fixed size and then inserted into the SQLite database; the transaction lets SQLite write the data very efficiently and quickly. The specific code is as follows.

/// <summary>
/// Processing logic of the big data copy
/// </summary>
/// <param name="doFunc">callback invoked to report progress</param>
private void InternalCopry(Action<int, TimeSpan, int> doFunc)
{
    // Set the primary key and page size to improve retrieval efficiency
    string primaryKey = "h_id";
    int pageSize = 1000;
    PagerInfo info = new PagerInfo() { PageSize = pageSize, CurrenetPageIndex = 1 };

    // Work out the total number of pages
    int totalPageCount = 1;
    int totalCount = sqlserver.GetRecordCount();
    if (totalCount % pageSize == 0)
    {
        totalPageCount = totalCount / pageSize;
    }
    else
    {
        totalPageCount = totalCount / pageSize + 1;
    }
    totalPageCount = (totalPageCount < 1) ? 1 : totalPageCount;

    // Write the SQLite data inside a transaction to improve throughput
    DbTransaction trans = sqlite.CreateTransaction();
    if (trans != null)
    {
        // Fetch each page from SQL Server in turn
        for (int i = 1; i <= totalPageCount; i++)
        {
            info.CurrenetPageIndex = i;   // set the current page and fetch it
            int j = 1;
            List<ProductInfo> list = sqlserver.FindWithPager("1=1", info, primaryKey, false);
            foreach (ProductInfo entity in list)
            {
                // Work out the current record number and percentage complete
                int current = (i - 1) * pageSize + j;
                int percent = GetPercent(totalCount, current);

                // Compute the elapsed time and invoke the external callback
                TimeSpan ts = DateTime.Now - StartTime;
                doFunc(percent, ts, current);

                // Insert if no record with this primary key exists yet; otherwise update
                if (!sqlite.IsExistKey(primaryKey, entity.h_id, trans))
                {
                    sqlite.Insert(entity, trans);
                }
                else
                {
                    sqlite.Update(entity, entity.h_id, trans);
                }
                j++;
            }
        }
        trans.Commit();
    }
}

That covers essentially all of the project's code. Tested against a single table of more than 500,000 records and about 100 fields, the full copy completes in under 20 minutes on a development machine, which is a good result. Run in a server environment, the copy should be considerably faster still.

 
