C # Notes (7)-A Preliminary Study on the Efficiency of sqlserver's massive data insertion

Last Update:2018-12-05 Source: Internet

Author: User

Tags try catch sql server express intel pentium

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Note: This test is a non-professional test conducted by non-professionals. It does not guarantee the rationality of the test and the correctness of the results. It is only for entertainment.

I have nothing to do. I want to try's efficiency. I will insert 1 million records, as shown in the following figure:

Hardware configuration:
Lenovo series computers
Intel Pentium (R) Dual E2180 @ 2.00 GHz
2.00 GHz, GB memory

Software Environment:
Windows XP Professional SP2
VS2005
SQL Server 2005 Express (VS2005 learning Edition)

Test code:
The database connection has been processed. Here, you are only responsible for data insertion and deletion.
Private void TestHugeData_Write () // test inserting large amounts of data
{
String test_ SQL;
DateTime begin = DateTime. Now;
Int I = 0;
For (; I <1000000; I ++)
{
Test_ SQL = "insert into employee information values (" + I + ", 'wang dashuai ', 'male', 333, 'International Youth Political College', 'doctorate ', 'Human transformation learn',-4-3, 'confidential, without disclosure ', 'all things, how can be done?', 'I have not been married, maybe not as good as anyone? ')";
// ExeSQL (test_ SQL );
// You do not need to call a function. Instead, you can focus on SQL statement execution to improve efficiency.
// It has been proved by practice that this can save most of the time. When the exesql function is used for execution, the efficiency is extremely low. It is estimated that it may not be completed in several hours. If it is executed directly, it will be completed in less than one minute.
Cmd. commandtext = test_ SQL;
Cmd. Connection = connection;
Cmd. executenonquery ();
}
Datetime end = datetime. now;
// Display data
Viewdata (showallstring );
Datetime show = datetime. now;
MessageBox. show ("insert" + I + "records in total, use" + (end-begin) + ", \ r \ n display time" + (show-end ), "execution completed! ");
}

Private void delalldata () // delete all data
{
Datetime begin = datetime. now;
String strdelall = "delete from employee information where e_id between-2147483648 and 2147483647 ";
// The Int field indicates the range, that is, deleting all records. In terms of efficiency, this may be better than reading a record and then deleting it.
// ExeSQL (strdelAll );
Cmd. CommandText = strdelAll;
Cmd. CommandTimeout = 99999999;
// If a large amount of data needs to be deleted, deletion may fail if the default 30-second timeout is not enough.
Cmd. Connection = connection;
Cmd. ExecuteNonQuery ();
DateTime end = DateTime. Now;
MessageBox. Show ("when all records are deleted," + (end-begin), "after execution! ");
}

The table design in the database is as follows:
E_id int
E_name varchar (12)
E_sex char (2)
E_age smallint
E_school varchar (50)
E_education varchar (10)
E_major varchar (30)
E_workdate datetime
E_studyExp text
E_workExp text

E_skill text

When executing an SQL statement, we used to call another ExeSQL function that specifically executes the SQL statement for execution. However, we found that this is extremely efficient, because most of the time is wasted on executing try catch in ExeSQL, the Code is as follows:

Try
{
// MessageBox. Show (strSql );
Cmd. CommandText = strSql;
Cmd. Connection = connection;
Int count = cmd. ExecuteNonQuery ();
ViewData (showAllString );
Return true;
}
Catch (SqlException sqlExcep)
{
MessageBox. Show (sqlExcep. Message );
Return false;
}
In addition, ViewData must be re-called for display when an SQL statement is executed, which also takes a lot of time. In addition, function calls and value return are required, the final result is that it takes several seconds to insert a record-this is intolerable. Therefore, you can directly execute SQL statements and put the main time overhead on executing SQL statements, which significantly improves the efficiency. The execution result of the program is as follows:

It can be seen that it takes 11 minutes and 33 seconds for the program to insert 1 million records, which is relatively efficient. However, it took nearly 25 minutes to display the data, which is twice as much as the insertion process. This is the result displayed using the datagridview, and there may be no way to improve it.
During the insertion process, you can see in the task manager that the first half of the time is mainly because the servr process consumes resources, while the "Employee management" process basically does not consume any resources, when the insertion is complete for display, the situation is completely reversed.

The code in TestHugeData_Write can also be improved:
For (; I <1000000; I ++)
{
......
Cmd. CommandText = test_ SQL;
Cmd. Connection = connection;
Cmd. ExecuteNonQuery ();
}
Cmd. Connection = connection; one sentence can be moved out of the loop and the efficiency is determined:

It seems that there is no improvement.

The deletion process is as follows (delete the above 1 million records ):

It can be seen that deletion is much faster than insertion.

Finally, the query:
Cmd. Connection = connection;
DateTime begin = DateTime. Now;
Adapter. SelectCommand = cmd;
Table. Locale = System. Globalization. CultureInfo. InvariantCulture;
Table. Clear (); // Clear the original data and prepare to load new data. If this process is not performed, the line data will be added to the original data
Adapter. Fill (table );
DateTime end = DateTime. Now;
MessageBox. Show ("query time" + (end-begin), "execution completed! ");
Bindingsource. DataSource = table;
DataGridView1.DataSource = bindingsource;

Select * from employee info. where e_id = 823154

The SQL Server version may affect the efficiency. The following is the SQL Server product specification.

Maximum memory supported by SQL Server 2005 (32-bit) (64-bit)

Enterprise Edition maximum operating system memory
Developer Edition maximum operating system memory 32 TB
Standard Edition maximum operating system memory 32 TB
Workgroup Edition 3 GB not applicable
SQL Server Express Edition 1 GB not applicable
Evaluation Edition maximum operating system memory

Number of processors supported by SQL Server 2005 (32-bit) (64-bit)
Enterprise Edition maximum operating system memory 1 Maximum operating system memory 1
Developer Edition 32 64
Standard Edition 4 4
Workgroup Edition 2 is not applicable
SQL Server express Edition 1 is not applicable
Evaluation edition maximum operating system memory

We can see that Express Edition does not support multi-processor (of course, dual-core and multi-processor may not be the same). It only supports 1 GB of memory, so that the hardware does not fully play a role.

The above is a test of 1 million record sets, with insufficient heart. Continue the test: Insert 10 million records.

After a long wait, although there is no problem with insertion, the datagridview display fails, as shown below:

All code and environment are the same as above (except insert 1 million to insert 10 million), and the final database file obtained is 3.7 GB,

The log file is 5.3 GB, as shown below:

However, when the dataset is displayed and transmitted to the datagridview, the final failure is caused by memory allocation failure:

Try
{
Cmd. CommandText = strView;
Cmd. Connection = connection;
Cmd. CommandTimeout = 99999999;
DateTime begin = DateTime. Now;
Adapter. SelectCommand = cmd;
Table. Locale = System. Globalization. CultureInfo. InvariantCulture;
Table. Clear (); // clear the original data and prepare to load new data. If this process is not performed, the line data will be added to the original data
Adapter. Fill (table );
Datetime end = datetime. now;
// MessageBox. Show ("query time" + (end-begin), "execution completed! ");
Bindingsource. datasource = table;
Datagridview1.datasource = bindingsource;
}
Specifically, it is executed to Adapter. Fill (table). A memory allocation failure causes outofmemoryexception:

At the same time, the program consumes more than 1.5 GB of memory:

The details are as follows:

System. outofmemoryexception not processed
Message = "exception type:" system. outofmemoryexception. "
Source = "system. Windows. Forms"
Stacktrace:
In system. Windows. Forms. featuresupport. ispresent (Object feature)
In system. Windows. Forms. application. get_visualstylestate ()
In system. Windows. Forms. visualstyles. visualstylerenderer. get_issupported ()
In system. Windows. Forms. visualstyles. visualstylerenderer. get_handle ()
In system. Windows. Forms. visualstyles. visualstylerenderer. drawbackground (idevicecontext DC, rectangle bounds, rectangle cliprectangle)
In system. Windows. Forms. Maid. drawheader (Graphics g, rectangle bounds, int32 headerstate)
In system. windows. forms. datagridviewcolumnheadercell. paintprivate (Graphics g, rectangle clipbounds, rectangle cellbounds, int32 rowindex, datagridviewelementstates datagridviewelementstate, object formattedvalue, datagridviewcellstyle cellstyle, datagridviewadvancedborderstyle advancedborderstyle, datagridviewpaintparts paintparts, Boolean paint)
In System. windows. forms. dataGridViewColumnHeaderCell. paint (Graphics graphics, Rectangle clipBounds, Rectangle cellBounds, Int32 rowIndex, delimiter, Object value, Object formattedValue, String errorText, DataGridViewCellStyle cellStyle, advanadvancedborderstyle, paintParts)
In System. Windows. Forms. DataGridViewCell. PaintWork (Graphics graphics, Rectangle clipBounds, Rectangle cellBounds, Int32 rowIndex, export cellState, DataGridViewCellStyle cellStyle, DataGridViewAdvancedBorderStyle advancedBorderStyle, paintParts)
In System. Windows. Forms. DataGridView. PaintColumnHeaders (Graphics g, Rectangle clipBounds, Boolean singleBorderAdded)
In System. Windows. Forms. DataGridView. PaintGrid (Graphics g, Rectangle gridBounds, Rectangle clipRect, Boolean singleVerticalBorderAdded, Boolean singleHorizontalBorderAdded)
In System. Windows. Forms. DataGridView. OnPaint (PaintEventArgs e)
In System. Windows. Forms. Control. PaintWithErrorHandling (PaintEventArgs e, Int16 layer, Boolean disposeEventArgs)
In System. Windows. Forms. Control. WmPaint (Message & m)
In System. Windows. Forms. Control. WndProc (Message & m)
In System. Windows. Forms. DataGridView. WndProc (Message & m)
In System. Windows. Forms. Control. ControlNativeWindow. OnMessage (Message & m)
In System. Windows. Forms. Control. ControlNativeWindow. WndProc (Message & m)
In System. Windows. Forms. NativeWindow. DebuggableCallback (IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)
In System. Windows. Forms. UnsafeNativeMethods. DispatchMessageW (MSG & msg)
In System. Windows. Forms. Application. ComponentManager. System. Windows. Forms. UnsafeNativeMethods. IMsoComponentManager. FPushMessageLoop (Int32 dwComponentID, Int32 reason, Int32 pvLoopData)
In System. Windows. Forms. Application. ThreadContext. RunMessageLoopInner (Int32 reason, ApplicationContext context)
In System. Windows. Forms. Application. ThreadContext. RunMessageLoop (Int32 reason, ApplicationContext context)
In System. Windows. Forms. Application. Run (Form mainForm)
In employee management. Program. Main () Location D: \ job \ employee management 1.3.1 \ employee management \ Program. cs: row number 17
In System. AppDomain. nExecuteAssembly (Assembly assembly, String [] args)
In System. AppDomain. ExecuteAssembly (String assemblyFile, Evidence assemblySecurity, String [] args)
In Microsoft. VisualStudio. HostingProcess. HostProc. RunUsersAssembly ()
In System. Threading. ThreadHelper. ThreadStart_Context (Object state)
In System. Threading. ExecutionContext. Run (ExecutionContext executionContext, ContextCallback callback, Object state)
In System. Threading. ThreadHelper. ThreadStart ()

The System. data. comman. dbdataapter. fill functions are described as follows:

The Fill method uses the SELECT statement specified by the associated SelectCommand attribute to retrieve rows from the data source. The connection object associated with the SELECT statement must be valid, but you do not need to open it. If the connection is closed before Fill is called, open it to retrieve data, and then close it. If the connection has been opened before Fill is called, it will remain open.

Then, the Fill operation adds rows to the target able object in the DataSet. If the DataTable object does not exist, these objects are created. When creating a able object, the Fill operation usually only creates column name metadata. However, if the MissingSchemaAction attribute is set to AddWithKey, an appropriate primary key and constraint will be created.
If you encounter a duplicate DbDataAdapter column when filling the able, the columns following the column are named in the order of columnname1, columnname2, and columnname3. If the incoming data contains unnamed columns, they are placed in DataSet in the "Column1" and "Column2" modes. Add multiple result sets to DataSet. Each result set is placed in a separate table.
The reload of Fill with DataTable as the parameter obtains only the first result. Multiple results are obtained by using the Fill overload with DataSet as the parameter.
The Fill method supports the following scenarios: DataSet contains multiple able objects whose names are case-insensitive. In this case, Fill performs a case-sensitive comparison to find the corresponding table. If no matching table exists, create a new one.

The estimated performance is now the upper limit.
Although not all can be displayed, You can query and delete them and see how efficient they are:
Select * from employee info. where e_id = 8888888

It seems to be quite fast.
Delete:

It is 10 times the size of 1 million entries.

In addition, you can findWhen reading big data, it is usually necessary to read data by page.It is a pity that you have no time to study it carefully.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More