Comparing and testing the efficiency of inserting massive data into a database
Abstract: Inserting massive amounts of data into a database with .NET is a common operation. This article compares ADO.NET and LINQ to SQL, using the SqlBulkCopy() and InsertAllOnSubmit() methods respectively, and concludes that for the same insert workload the two approaches differ in insertion efficiency by a factor of roughly 266!
Test scenario:
Prepare two databases, TestDb and TestDb2, each containing a table T_Users. The table structure (shown as a figure in the original post) consists of an ID column and a UserName column, as used in the code below.
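The original figure is not reproduced here. The following is a minimal, hypothetical sketch of the setup: the column types, connection strings, and the UserInMemory helper class are assumptions inferred from the code below, not details taken from the original post.

// Hypothetical reconstruction of the test setup (column types, connection strings,
// and class names are assumptions based on the code below, not on the original figure).
public class UserInMemory
{
    public int ID { get; set; }
    public string UserName { get; set; }
}

// Connection strings for the two test databases (values are placeholders).
private static readonly string connStr  = "Data Source=.;Initial Catalog=TestDb;Integrated Security=True";
private static readonly string connStr2 = "Data Source=.;Initial Catalog=TestDb2;Integrated Security=True";

private static void CreateUserTable(string connectionString)
{
    const string sql = "CREATE TABLE dbo.T_Users (ID INT NOT NULL PRIMARY KEY, UserName NVARCHAR(50) NULL)";
    using (SqlConnection con = new SqlConnection(connectionString))
    using (SqlCommand cmd = new SqlCommand(sql, con))
    {
        con.Open();
        cmd.ExecuteNonQuery();   // create the T_Users table used by both insert methods
    }
}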
The SqlBulkCopy() version of the insert is implemented as follows:
private static Stopwatch InsertUsingSqlBulkCopy(List<UserInMemory> list)
{
    Stopwatch stopWatch = new Stopwatch();
    stopWatch.Start();

    // Copy the in-memory objects into a DataTable that SqlBulkCopy can consume
    DataTable dt = new DataTable();
    dt.Columns.Add("ID");
    dt.Columns.Add("UserName");
    for (int i = 0; i < list.Count; i++)
    {
        DataRow row = dt.NewRow();
        row["ID"] = list[i].ID;
        row["UserName"] = list[i].UserName;
        dt.Rows.Add(row);
    }

    using (SqlConnection con = new SqlConnection(connStr2))
    {
        con.Open();
        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(con))
        {
            try
            {
                bulkCopy.DestinationTableName = "dbo.T_Users";   // target table described above
                bulkCopy.ColumnMappings.Add("ID", "ID");
                bulkCopy.ColumnMappings.Add("UserName", "UserName");
                bulkCopy.WriteToServer(dt);                      // one bulk write for all rows
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.ToString());
            }
            finally
            {
                con.Close();
            }
        }
    }

    stopWatch.Stop();
    return stopWatch;
}
The LINQ to SQL version, using InsertAllOnSubmit(), is implemented as follows:
private static Stopwatch InsertUsingLINQ(List<T_User> _list)
{
    Stopwatch stopWatch = new Stopwatch();
    stopWatch.Start();

    DataClasses1DataContext dataContext = new DataClasses1DataContext();
    dataContext.T_Users.InsertAllOnSubmit(_list);   // queue all entities for insertion
    dataContext.SubmitChanges();                    // LINQ to SQL issues an INSERT per entity here

    stopWatch.Stop();
    return stopWatch;
}
The preceding code was used to import 10,000 User records into the database.
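A test driver along the following lines can generate the records and time both paths. This is only an illustrative sketch: the record count, the generated user names, and the T_User entity property names are assumptions, not code from the original post.

// Illustrative test driver (not from the original post): builds the test data
// and runs both insert methods defined above.
static void Main(string[] args)
{
    const int count = 10000;

    List<UserInMemory> bulkCopyUsers = new List<UserInMemory>();
    List<T_User> linqUsers = new List<T_User>();
    for (int i = 0; i < count; i++)
    {
        bulkCopyUsers.Add(new UserInMemory { ID = i, UserName = "User" + i });
        linqUsers.Add(new T_User { ID = i, UserName = "User" + i });   // T_User is the LINQ to SQL entity
    }

    Stopwatch bulkCopyWatch = InsertUsingSqlBulkCopy(bulkCopyUsers);
    Stopwatch linqWatch = InsertUsingLINQ(linqUsers);

    Console.WriteLine("SqlBulkCopy:         {0} ms", bulkCopyWatch.ElapsedMilliseconds);
    Console.WriteLine("InsertAllOnSubmit:   {0} ms", linqWatch.ElapsedMilliseconds);
}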
The results (shown as an elapsed-time screenshot in the original post) are summarized below.
Conclusion:
1. In this test, the SqlBulkCopy() method under ADO.NET was about 266 times faster than the InsertAllOnSubmit() method under LINQ to SQL for the same insert workload!
2. LINQ to SQL is convenient to use, needs little code, has a low learning cost, and reads elegantly. However, for operations on massive data volumes where efficiency matters, the ADO.NET SqlBulkCopy approach is recommended.
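If the ADO.NET route is chosen, SqlBulkCopy also exposes options such as BatchSize and BulkCopyTimeout that can be tuned for very large loads. The sketch below shows the idea; the option values are illustrative, not the settings used in this test.

// Optional SqlBulkCopy tuning for larger volumes; values are illustrative only.
using (SqlConnection con = new SqlConnection(connStr2))
{
    con.Open();
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(con, SqlBulkCopyOptions.TableLock, null))
    {
        bulkCopy.DestinationTableName = "dbo.T_Users";
        bulkCopy.BatchSize = 5000;       // rows sent to the server per batch
        bulkCopy.BulkCopyTimeout = 60;   // seconds before the copy operation times out
        bulkCopy.ColumnMappings.Add("ID", "ID");
        bulkCopy.ColumnMappings.Add("UserName", "UserName");
        bulkCopy.WriteToServer(dt);      // dt is a DataTable built as in InsertUsingSqlBulkCopy above
    }
}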
Related Source: http://download.csdn.net/detail/fanrong1985/8130953
How can we improve efficiency when analyzing and summarizing massive data?
I have seen a multinational company use preprocessing for this: after the customer sets the time period, the system computes the desired data ahead of time, generally starting about three hours in advance, which is impressive given the volume of data involved.
In BI, the usual techniques are data compression (pre-aggregation), interval partitioning, table sharding, and re-aggregation.
For example, a raw sales log is compressed into per-product daily sales records.
In this way, 10,000 products yield at most 10,000 rows per day, or roughly 3.66 million rows for a whole year, which takes the database only about five seconds to aggregate.
The principle, therefore, is to compress the data down to the smallest unit of measurement the queries need, and to aggregate it up to the largest granularity the output allows.
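As a concrete illustration of the sales-log example above, the compression step could look roughly like this in C#; the SalesLogEntry and DailyProductSales types and their properties are hypothetical.

// Hypothetical sketch of compressing a raw sales log into per-product daily records,
// as described above. Type and property names are illustrative only.
public class SalesLogEntry
{
    public int ProductId { get; set; }
    public DateTime SoldAt { get; set; }
    public decimal Amount { get; set; }
}

public class DailyProductSales
{
    public int ProductId { get; set; }
    public DateTime Day { get; set; }
    public int Count { get; set; }
    public decimal Total { get; set; }
}

public static List<DailyProductSales> Compress(IEnumerable<SalesLogEntry> log)
{
    // One row per product per day: 10,000 products => at most 10,000 rows per day,
    // instead of one row per individual sale.
    return log
        .GroupBy(e => new { e.ProductId, Day = e.SoldAt.Date })
        .Select(g => new DailyProductSales
        {
            ProductId = g.Key.ProductId,
            Day = g.Key.Day,
            Count = g.Count(),
            Total = g.Sum(e => e.Amount)
        })
        .ToList();
}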
As for caching and data drill-down, they only become available after the data has been preprocessed in this way.
In essence, it comes down to dedicated tables for dedicated analysis.
Database insertion efficiency
Inserting all the data in one batch is faster; the best approach is to generate a file and import it as a file. If records are inserted one at a time, the database performs syntax parsing, execution plan generation, and statement execution for every single record, so efficiency is very low. The overhead is not noticeable with small data volumes, but it becomes very obvious with large ones.
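For comparison, the row-by-row pattern this paragraph warns against looks roughly like the sketch below; each iteration is a separate statement execution and round trip to the server. connStr2, dbo.T_Users, and UserInMemory are the same assumptions used earlier.

// The slow pattern: one statement execution (and round trip) per record.
// Sketch only; for large volumes, prefer SqlBulkCopy or a file-based bulk import instead.
private static void InsertRowByRow(List<UserInMemory> list)
{
    using (SqlConnection con = new SqlConnection(connStr2))
    {
        con.Open();
        foreach (UserInMemory user in list)
        {
            using (SqlCommand cmd = new SqlCommand(
                "INSERT INTO dbo.T_Users (ID, UserName) VALUES (@id, @name)", con))
            {
                cmd.Parameters.AddWithValue("@id", user.ID);
                cmd.Parameters.AddWithValue("@name", user.UserName);
                cmd.ExecuteNonQuery();   // executed once per record
            }
        }
    }
}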