[.NET Internals] .NET Batch Data Write Performance Analysis, Part 2

In the previous article we covered the necessary background, so starting with this article we formally begin studying batch-insert performance. First, let's look at what we will be testing. Since this series studies the performance of SqlBulkCopy and SSIS, we will compare the two in detail. For SqlBulkCopy, we are mainly interested in the following properties, because they have a large impact on performance:

1. Table locking: During a bulk insert, an exclusive lock is usually taken on the destination table. On the one hand, this lock makes the insert faster; on the other hand, it forces other operations, such as reads of the table, to wait. We will test this property under many different scenarios so that you get a clear picture.

2. BatchSize: This property defines how many rows are sent per batch. We found no documentation describing the impact of this value on performance, so we will study it as well.

3. UseInternalTransaction: Many people misunderstand how SqlBulkCopy handles transactions. Through extensive testing we found that if we do not need a transaction while inserting, SqlBulkCopy can be very fast; of course, that speed comes at the cost of other guarantees.

Beyond these properties, we will also test several issues related to bulk data import.

Now let's prepare the test environment. First, create a simple table with six columns, as shown in the following script:

[Image: http://www.agilesharp.com/Services/BlogAttachment.ashx?AttachmentID=210]

Each row in this table is about 320 bytes.
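The original table-creation script was published as an image, so the exact column definitions are lost. The following is only a hypothetical reconstruction, with made-up column names, sized so that the fixed column data adds up to roughly 320 bytes per row as the article states:

```sql
-- Hypothetical sketch: six columns whose storage sizes sum to ~320 bytes.
-- Names and types are illustrative, not taken from the article.
CREATE TABLE dbo.BulkTestData
(
    Id        INT           NOT NULL,  -- 4 bytes
    CreatedAt DATETIME      NOT NULL,  -- 8 bytes
    Price     DECIMAL(18,4) NOT NULL,  -- 9 bytes
    Name      CHAR(100)     NOT NULL,  -- 100 bytes
    Address   CHAR(100)     NOT NULL,  -- 100 bytes
    Remark    CHAR(99)      NOT NULL   -- 99 bytes
);                                     -- total column data: 320 bytes per row
```

With no clustered index defined, this is a heap table, which matches the heap scenarios tested below; a clustered index can be added separately for the clustered-index scenarios.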
During the tests we will fill the table with different data volumes, from roughly 600,000 up to 6 million rows, and check whether the relevant performance numbers scale linearly with the row count. To test both clustered-index tables and heap tables, we also create an index as needed, as shown in the following script:

[Image: http://www.agilesharp.com/Services/BlogAttachment.ashx?AttachmentID=211]

Whenever we test a scenario, the table is dropped from the database and then rebuilt. The data inserted into the table is randomly generated, and we run two common variants of random data:

1. Every row produced by a given producer (the component that generates the data) has the same values. The field values themselves are random, but they repeat from row to row. For example:

[Image: http://www.agilesharp.com/Services/BlogAttachment.ashx?AttachmentID=212]

2. Every row produced by each producer is completely different, for example:

[Image: http://www.agilesharp.com/Services/BlogAttachment.ashx?AttachmentID=213]

The main purpose of running both variants is to check whether rows containing identical, repeated values are compressed before being sent over the network. Repeated values are a good opportunity for compression, which would greatly reduce network bandwidth usage and make the insert faster. In addition, the database holding our tables uses the simple recovery model, which is also what most databases use. We will not test every combination of SqlBulkCopy properties and the conditions above, because the results of many combinations can be inferred from experience; not every combination affects performance.
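The article's data-producer code was also shown as an image. Below is a minimal sketch of what such a producer might look like, assuming the data is staged in a `DataTable` (one of the inputs SqlBulkCopy accepts); the column names are illustrative placeholders, not the article's actual schema:

```csharp
using System;
using System.Data;

static class DataProducer
{
    // Variant 2 described above: every row gets completely different
    // random values. (Variant 1 would generate one random row and
    // repeat it for every insert.)
    public static DataTable Produce(int rowCount)
    {
        var rng = new Random();
        var table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        table.Columns.Add("Price", typeof(decimal));

        for (int i = 0; i < rowCount; i++)
        {
            table.Rows.Add(
                rng.Next(),                         // random integer key
                Guid.NewGuid().ToString("N"),       // unique text value
                (decimal)rng.NextDouble() * 100m);  // random price
        }
        return table;
    }
}
```

The important point, as the article notes, is that the producer builds the entire `DataTable` up front, before SqlBulkCopy starts writing, so the measured time reflects only the write itself.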
To make the test more accurate, all data producers prepare the full data set before SqlBulkCopy writes anything, and only then hand the data to SqlBulkCopy. What we care about is SqlBulkCopy's write performance and its impact on the database; please keep this in mind.

Now let's enter the first scenario: a single data consumer. Here the consumer is one SqlBulkCopy instance. It accepts 6 million rows and imports them into a heap table. Over multiple runs we found the consumer's execution time to be 56 seconds, with TableLock and BatchSize both left at 0 (off and the default, respectively). We also found that the log file grew by about 6 MB:

[Image: http://www.agilesharp.com/Services/BlogAttachment.ashx?AttachmentID=214]

During this process, the network utilization looked like this:

[Image: http://www.agilesharp.com/Services/BlogAttachment.ashx?AttachmentID=215]

We can see that only about 25% of the network is used (the test ran on 1 Gb of bandwidth); in other words, the bandwidth is not fully utilized. The CPU graph also shows only one thread running, without using all cores. That is, if the client importing the data has multiple cores, we can run several SqlBulkCopy instances in parallel to improve performance. Let's test this idea:

[Image: http://www.agilesharp.com/Services/BlogAttachment.ashx?AttachmentID=216]

As the figure above shows, we use four SqlBulkCopy instances. The network now looks like this:

[Image: http://www.agilesharp.com/Services/BlogAttachment.ashx?AttachmentID=217]

This time the bandwidth is fully utilized and the speed improves by 300%.
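The parallel-load code was shown as an image, so the following is only a sketch of the idea under stated assumptions: the prepared rows have been split into four `DataTable` partitions, and the connection string and destination table name (`dbo.BulkTestData`) are placeholders:

```csharp
using System.Data;
using System.Data.SqlClient;
using System.Threading.Tasks;

static class ParallelBulkLoader
{
    // Runs one SqlBulkCopy per partition in parallel. With four
    // partitions on a multi-core client, this mirrors the four-consumer
    // setup the article measures.
    public static void Load(DataTable[] partitions, string connectionString)
    {
        Parallel.ForEach(partitions, partition =>
        {
            using (var bulk = new SqlBulkCopy(
                connectionString,
                SqlBulkCopyOptions.TableLock)) // the table-lock option discussed above
            {
                bulk.DestinationTableName = "dbo.BulkTestData"; // placeholder name
                bulk.BatchSize = 0;        // 0 = copy all rows as a single batch
                bulk.BulkCopyTimeout = 0;  // no timeout for large loads
                bulk.WriteToServer(partition);
            }
        });
    }
}
```

Note that when loading into a heap with the table-lock option, SQL Server can still accept several bulk loads concurrently, which is what allows the four parallel streams to saturate the bandwidth.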
That's all for today; hopefully it gives you a feel for the topic. Next time, we'll dig into more scenarios.
This article is from the "Yan Yang Tian" blog, please be sure to keep this source http://yanyangtian.blog.51cto.com/2310974/827562