Performance Comparison: Data Access Techniques
Priya Dhawan
Microsoft Developer Network
January 2002
Original link: https://msdn.microsoft.com/en-us/library/ms978388.aspx
Summary: Compares the performance of different data access techniques in typical application scenarios. Applies to Microsoft .NET Framework Beta 2 and Microsoft SQL Server 2000. (23 printed pages)
Introduction
The architectural choice of how data is accessed affects the performance, scalability, maintainability, and usability of an application. The focus of this article is the performance characteristics of these choices. The data access techniques compared are the Microsoft ADO.NET Command, DataReader, DataSet, and XmlReader, used against a Microsoft SQL Server™ 2000 database in some typical application scenarios. In these comparisons, a series of operations is performed on Customer, Order, and OrderDetail data over a range of user loads.
Code examples illustrating these data access techniques are available in a companion article on ADO.NET data access. These examples show how to use ADO.NET to access individual values, single rows, multiple rows, and hierarchical data.
Test scenario
The performance of any data operation depends on the following factors:
- Object construction and population can impose significant overhead. For example, instantiating and populating an ADO.NET DataSet costs more than performing the same operation with a DataReader or XmlReader.
- Different data access techniques place different loads on the database. For example, the DataSet and DataReader use the database connection differently while the application reads data. Techniques that use stored procedures place less load on the database than those that use dynamic SQL statements. Likewise, converting between relational data and XML consumes server resources.
- The number of round trips to the database is also a factor, especially when locks and transactions span multiple round trips.
- The amount of data transmitted over the network is also key; data represented as XML is considerably larger than the same data in other formats.
We compare the ADO.NET data access techniques using operations common in business applications, such as fetching a list of customers, querying a customer's orders, or inserting an order. To make the tests realistic, the database was loaded with more than 100,000 customer rows, 1,000,000 order rows (10 orders per customer), and more than 5,000,000 order detail rows (5 details per order). The data resides in a SQL Server 2000 database, accessed through the SQL Server .NET data provider. Some of the methods compared here use the XML features of SQL Server 2000.
GetOrderStatus
The GetOrderStatus method takes an OrderID and returns an integer representing the order's status.
GetCustomer
The GetCustomer method takes a CustomerID parameter and returns a single row containing that customer's information.
GetCustomers
The GetCustomers method takes a CustomerID and a parameter indicating the number of rows to read. From among all rows whose CustomerID is greater than the CustomerID passed in, the top n rows are read and returned.
We test paging through the set of customer records with different page sizes: 100, 500, and 1,000 rows.
GetOrders
The GetOrders method fetches a hierarchical set of orders and their corresponding details from the database. It takes an OrderID and a parameter indicating how many orders to read. From among all records whose OrderID is greater than the one passed in, the top n rows are read.
We test paging through the orders with different page sizes: 10 orders (50 details), 50 orders (250 details), and 100 orders (500 details).
InsertCustomer
The InsertCustomer method takes customer data, inserts a customer row into the database, and returns the new CustomerID as an integer.
InsertCustomers
The InsertCustomers method takes a collection of customer objects and inserts the corresponding customer rows into the database.
InsertOrder
The InsertOrder method accepts an order with multiple detail records and inserts the corresponding Order and OrderDetails rows into the database. The test inserts one order header with varying numbers of detail records.
Test tools
For our testing purposes, we used Application Center Test (ACT), which is designed to stress-test Web servers and analyze performance and scalability problems of Web applications, including ASP pages and the components they use. For details on how to create and run tests, refer to the ACT documentation. ACT is well suited to testing different data access techniques behind a Web server because it provides several useful features. First, it simulates a large set of concurrent clients by opening multiple connections to the server and rapidly sending HTTP requests. Second, it lets us build a realistic test environment in which the same method is called with a series of random parameters; this matters because real users do not invoke the same method with the same parameters over and over. Finally, ACT records test results that provide the most important information about the performance of a Web application.
Testing the data access techniques directly, rather than through a Web server as we did, would produce better throughput and response times, but a stateless environment behind a Web server is closer to a real application environment. Moreover, because we are comparing the relative performance of these techniques in a stateless environment (that is, behind a Web server), the Web-server overhead is the same in all cases.
All of the data access techniques discussed earlier are implemented in .NET Framework assemblies. To generate client load against an assembly with ACT, we implemented the Wrapper.aspx pages; all client requests are sent to these pages, which then call into the assembly. The methods in these assemblies perform the data operations using ADO.NET. They are simple subroutines and do not return data to the .aspx page. When data rows are fetched from the database, the methods iterate through the rows and assign the column values to local variables. By adding a delay while reading data from an ADO.NET object, we simulate the cost of doing some processing on that data.
Test scripts are written in Microsoft VBScript. Depending on the method being exercised, each test script requests a random customer or order. For example:
Dim URL
Dim UB, LB
' Set the upper bound for the orders list
UB = 1000000
' Set the lower bound for the orders list
LB = 1
' Set the URL
URL = "http://myServer/DataAccessPerf/DataReader.aspx"
' Use the Randomize function to initialize the Rnd function
Randomize
Test.SendRequest(URL & "?orderid=" & Int((UB - LB + 1) * Rnd + LB))
Machine Configuration
The tables below summarize the configuration of the test bed:
Table 1. Client Configuration
| # of Clients | Machine/CPU | # of CPUs | Memory | Disk | Software |
| --- | --- | --- | --- | --- | --- |
| 1 | Dell Precision WorkStation 530 MT, 1694 MHz | 1 | MB | 16.9 GB | Microsoft Windows® XP; Application Center Test |
Table 2. Web server Configuration
| # of Servers | Machine/CPU | # of CPUs | Memory | Disk | Software |
| --- | --- | --- | --- | --- | --- |
| 1 | Compaq Proliant, MHz | 4 | 640 MB | GB | Windows Advanced Server SP 2; .NET Framework Beta 2 |
Table 3. Database Server Configuration

| # of Servers | Machine/CPU | # of CPUs | Memory | Disk | Software |
| --- | --- | --- | --- | --- | --- |
| 1 | American Megatrends Atlantis, 800 MHz | 2 | 1 GB | GB | Windows Advanced Server SP 2; SQL Server Enterprise Edition SP 2 |

Throughput and latency are the key performance indicators. For a given amount of data returned, throughput is the number of client requests processed per unit of time, typically per second. Because peak throughput may occur at a response time that is unacceptable from a usability standpoint, we also tracked latency, measured as the response time reported by Application Center Test for each test run, and stopped testing a given method once its response time exceeded 1 second.
Performance Test Results
GetOrderStatus
Here we compare the performance of fetching a single value from the database using the different data access techniques.
Figure 1. GetOrderStatus: Throughput and latency
Annotations
- All accesses use stored procedures.
- In the ExecuteScalar method, the single value is returned using the ExecuteScalar method of the Command object.
- In the output parameter method, the single value is returned as an output parameter of the Command object.
- In the DataReader method, the DataReader is used to fetch the single value.
- In the XmlReader method, a SQL query with a FOR XML clause fetches the single value as XML, read through an XmlReader.
As shown in Figure 1, the ExecuteScalar, output parameter, and DataReader methods deliver nearly identical performance across the entire user-load range when fetching a single value.
But the ExecuteScalar method requires less code than other methods, so it is the best choice from a code maintainability standpoint.
The XmlReader method yields lower peak throughput than the other methods, and the stored procedure containing the FOR XML query takes longer to execute than the stored procedures used by the other methods.
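The ExecuteScalar approach described above can be sketched as follows in C#. This is a minimal illustration, not the article's test code; the connection string, schema, and the `GetOrderStatus` stored-procedure name are assumptions:

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

class OrderStatusExample
{
    // Fetch a single value (the order status) with ExecuteScalar.
    // Connection string and procedure name are illustrative assumptions.
    static int GetOrderStatus(string connectionString, int orderId)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand("GetOrderStatus", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.Add("@OrderId", SqlDbType.Int).Value = orderId;
            conn.Open();
            // ExecuteScalar returns the first column of the first row;
            // any additional rows or columns are ignored.
            return (int)cmd.ExecuteScalar();
        }
    }
}
```

Because no result set or output parameters have to be set up, this is also the shortest of the variants, which is why the article favors it for maintainability.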
GetCustomer
Here we compare the performance of the data access techniques when fetching a single row of customer data from the database.
Figure 2. GetCustomer: Throughput and latency
Annotations:
- All methods use stored procedures.
- In the output parameters method, the single row is returned through the Command object's output parameters.
- In the DataReader method, the DataReader is used to fetch the single row.
- The XmlReader method uses a SQL query with a FOR XML clause to fetch the row from the database as XML, read through an XmlReader.
- The DataSet method populates the single row into a DataSet.
As shown in Figure 2, the output parameters and DataReader methods hold up consistently across the user loads and deliver better throughput than the other two methods. The XmlReader method does slightly better than the DataSet method in both throughput and response time.
In the XmlReader method, the FOR XML query takes longer to execute than the queries used by the other methods.
In the DataSet method, the overhead of creating the DataSet object is the main cause of its relatively low throughput.
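A sketch of the output parameters approach for a single row. The `GetCustomer` procedure and the `@Name`/`@Email` columns are hypothetical, chosen only to illustrate the pattern:

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

class GetCustomerExample
{
    // Return one customer row through output parameters; no result set
    // is created. Procedure name and parameters are assumptions.
    static void GetCustomer(string connectionString, int customerId)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand("GetCustomer", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.Add("@CustomerId", SqlDbType.Int).Value = customerId;
            SqlParameter name = cmd.Parameters.Add("@Name", SqlDbType.NVarChar, 50);
            name.Direction = ParameterDirection.Output;
            SqlParameter email = cmd.Parameters.Add("@Email", SqlDbType.NVarChar, 50);
            email.Direction = ParameterDirection.Output;

            conn.Open();
            cmd.ExecuteNonQuery();              // no rows come back
            string customerName = (string)name.Value;
            string customerEmail = (string)email.Value;
        }
    }
}
```

Since the values travel back as parameters, the provider never materializes a result set, which is why this method tracks the DataReader so closely in the tests.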
GetCustomers
In this section we compare the performance of the data access techniques when reading multiple rows. We run tests returning result sets of 100, 500, and 1,000 rows to observe the impact of the amount of data returned.
Figure 3. GetCustomers (customers=100): Throughput and latency
Annotations:
- All methods use stored procedures.
- In the DataReader method, the DataReader is used to fetch the rows.
- The XmlReader method uses a SQL query with a FOR XML clause to fetch the rows from the database as XML, read through an XmlReader.
- The DataSet method fills the rows into a DataSet.
As you might expect, reading more rows per request reduces the number of requests per second, because more rows must be processed and transmitted.
Figure 3 shows that the throughput of the DataReader method is almost twice that of the other two methods. The DataSet and XmlReader methods perform almost identically, though the XmlReader method has a slight edge in throughput.
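The DataReader pattern used in these multi-row tests is a forward-only loop that copies column values into locals and closes the reader as soon as possible. The procedure name, parameters, and column ordinals below are assumptions:

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

class GetCustomersExample
{
    static void ReadCustomers(string connectionString, int customerId, int rows)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand("GetCustomers", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.Add("@CustomerId", SqlDbType.Int).Value = customerId;
            cmd.Parameters.Add("@Rows", SqlDbType.Int).Value = rows;
            conn.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Copy column values to locals, as the tests did.
                    int id = reader.GetInt32(0);
                    string name = reader.GetString(1);
                }
            }   // reader closed promptly, then the connection
        }
    }
}
```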
Figure 4. GetCustomers (customers=500): Throughput and latency
When we increase the number of rows returned per request to 500, the requests per second of every method drop further.
The DataReader method extends its throughput advantage, to twice that of the other two methods. The DataSet method is slightly better than the XmlReader method.
Figure 5. GetCustomers (customers=1000): Throughput and latency
When 1,000 rows are returned, the DataReader's throughput is three times that of the other methods.
The user load for the DataSet and XmlReader methods was capped at 50 users because, as the response time vs. user load graph shows, their response time exceeded the 1-second threshold.
GetOrders
In this section we compare the performance of the data access techniques when returning a hierarchical result set. We run tests returning 10 orders (50 details), 50 orders (250 details), 100 orders (500 details), and 200 orders (1,000 details).
Figure 6. GetOrders (orders=10, details=50): Throughput and latency
Annotations:
- All methods use stored procedures.
- In the DataReader method, the DataReader is populated by a stored procedure that returns two result sets: one containing the order records and one containing the detail records. Because a DataReader is forward-only, the detail records cannot be associated with their orders as they are read.
- The XmlReader method uses a SQL query with a FOR XML EXPLICIT clause to read the order records and their corresponding detail records as hierarchical XML. The orders and details are already correlated when they are read from the XmlReader.
- The DataSet method populates the DataSet using the same stored procedure as the DataReader method. A parent-child relation is established between the DataTables holding the order and detail records, so the records are correlated as they are read from the DataSet.
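The XmlReader variant can be sketched with `SqlCommand.ExecuteXmlReader`, which streams the result of a FOR XML query. The `GetOrdersXml` procedure name and parameters are assumptions:

```csharp
using System;
using System.Data;
using System.Data.SqlClient;
using System.Xml;

class GetOrdersXmlExample
{
    // Read hierarchical orders/details as one XML stream; the FOR XML
    // EXPLICIT query nests the detail elements inside their orders.
    static void ReadOrders(string connectionString, int orderId, int rows)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand("GetOrdersXml", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.Add("@OrderId", SqlDbType.Int).Value = orderId;
            cmd.Parameters.Add("@Rows", SqlDbType.Int).Value = rows;
            conn.Open();
            using (XmlReader reader = cmd.ExecuteXmlReader())
            {
                while (reader.Read())
                {
                    // Orders and details arrive already correlated,
                    // so no matching work is needed on the client.
                }
            }
        }
    }
}
```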
We start with a return set of 10 order records and 50 detail records. As shown in Figure 6, the DataReader method reaches peak throughput at 50 concurrent users, slightly ahead of the XmlReader method, whose throughput begins to drop at 20 concurrent users.
The DataSet method has lower peak throughput and noticeably higher latency. In this case it bears the overhead of creating two DataTables to hold the order and detail records, plus the other sub-objects associated with each DataTable.
Figure 7. GetOrders (Orders=50, details=250): Throughput and latency
As Figure 7 clearly shows, the DataReader again provides the best throughput because, as explained in the annotations, it does not correlate the order and detail records.
Although the throughput of the XmlReader and DataSet methods is similar, the FOR XML EXPLICIT query used by the XmlReader method consumes more CPU on the database server, while the DataSet method consumes more CPU on the Web server.
Figure 8. GetOrders (orders=100, details=500): Throughput and latency
As shown in Figure 8, the throughput of the DataReader method is twice that of the other two methods. Remember, however, that the DataReader method leaves it to the application to associate each order record with its detail records.
DataReader vs. DataSet
In all of the tests above, the DataReader method clearly outperformed the DataSet. As mentioned earlier, the DataReader offers better performance because it avoids the performance and memory overhead of constructing a DataSet.
A DataReader is a good choice for read-only, forward-only data access. The sooner you read the data from the DataReader, close it, and close the database connection, the better your performance will be. Because the connection a DataReader holds cannot be used for anything else while the application is reading, scalability suffers if the application keeps the DataReader open for long, causing connection contention. A DataSet needs the connection only while it is being populated; once filled, the connection is closed immediately and returned to the pool. Delaying reads from a DataSet causes no contention, because the connection has already been returned to the pool.
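The connection-lifetime difference described above is visible in code. In a sketch of the DataSet approach (procedure and table names are assumptions), the adapter holds the connection only for the duration of Fill:

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

class FillExample
{
    static DataSet GetCustomers(string connectionString, int customerId)
    {
        DataSet ds = new DataSet();
        using (SqlConnection conn = new SqlConnection(connectionString))
        {
            SqlDataAdapter adapter = new SqlDataAdapter("GetCustomers", conn);
            adapter.SelectCommand.CommandType = CommandType.StoredProcedure;
            adapter.SelectCommand.Parameters.Add("@CustomerId", SqlDbType.Int)
                .Value = customerId;
            adapter.Fill(ds, "Customers");   // connection held only during Fill
        }
        // The connection is already back in the pool; iterating the
        // DataSet later causes no connection contention.
        return ds;
    }
}
```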
In all of the tests above, the maximum connection pool size was 100 (the default); because the number of concurrent users never exceeded 100, there was no connection contention. To observe how the DataReader and DataSet methods behave under contention, we can add a delay after reading the data, forcing connections to be held longer.
In the next test, we add a 20-millisecond delay after reading the data and set the maximum pool size to 10. This creates connection contention.
Figure 9. GetCustomers (with contention): Throughput and latency
Annotations
- In the DataReader method, DataReader is used to obtain hierarchical data that contains the order record and the corresponding detail record.
- The DataSet method populates the order and corresponding detail records into a dataset.
When the number of concurrent users reaches 20, contention for database connections begins and DataReader performance starts to decline, as shown in Figure 9. Beyond 20 concurrent users, the DataSet method outperforms the DataReader, and the DataReader's response time exceeds the DataSet's.
Next, we increase the post-read delay to 50 milliseconds and set the maximum connection pool size to 12.
Figure 10. GetCustomers (with contention): Throughput and latency
Annotations
- In the DataReader method, DataReader is used to obtain hierarchical data that contains the order record and the corresponding detail record.
- The DataSet method populates the order and corresponding detail records into a dataset.
As shown in Figure 10, DataReader performance drops beyond 20 concurrent users as contention for database connections emerges. The DataReader's response time also exceeds the DataSet's.
InsertCustomer
Having compared how the data access techniques read data, we now compare how they write it. We begin by inserting a single customer record into the database.
Figure 11. InsertCustomer: Throughput and latency
Annotations
- The input parameters method and the InsertCommand method use stored procedures.
- In the input parameters method, the Command object's input parameters are populated with the new customer information, and the ExecuteNonQuery method executes the stored procedure.
- In the Autogen method, the INSERT command is generated from the DataAdapter's SELECT command when the Update method is called.
- In the InsertCommand method, the InsertCommand property of the DataAdapter is defined explicitly.
In the Autogen method, when the DataAdapter's Update method is called, the CommandBuilder object associated with the DataAdapter automatically generates the INSERT command based on the SELECT command you supply to the DataAdapter. This explains why it is slower than the other two methods. The advantage of this approach is that it simplifies and reduces code, because you do not need to define the INSERT, UPDATE, and DELETE commands yourself. Contrast it with explicitly defining the DataAdapter's InsertCommand property.
In the input parameters method, the Command object's input parameters are populated with the new customer information, and ExecuteNonQuery is called to execute the stored procedure. As shown in Figure 11, this method performs better than the other two. The InsertCommand method bears the performance and memory overhead of creating a DataSet object; because the input parameters method avoids that cost, it performs better.
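A sketch of the input parameters method, with an output parameter carrying back the new identity value. The stored procedure, connection string, and columns are assumptions:

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

class InsertCustomerExample
{
    static int InsertCustomer(string connectionString, string customerName)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand("InsertCustomer", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.Add("@Name", SqlDbType.NVarChar, 50).Value = customerName;
            SqlParameter id = cmd.Parameters.Add("@CustomerId", SqlDbType.Int);
            id.Direction = ParameterDirection.Output;
            conn.Open();
            cmd.ExecuteNonQuery();   // no DataSet or DataAdapter involved
            return (int)id.Value;    // identity assigned by the procedure
        }
    }
}
```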
InsertCustomers
To see how inserting multiple customer rows affects the performance of the ADO.NET objects, we run the following tests, varying the number of customer rows to increase the amount of data inserted.
Figure 12. InsertCustomers (n=5): Throughput and latency
Annotations
- The InsertCommand method uses stored procedures.
- In the Autogen method, when the Update method is called, the DataAdapter automatically generates an INSERT command based on its SELECT command.
- In the InsertCommand method, you indicate the InsertCommand property of the DataAdapter.
As expected, the InsertCommand method (in which we define the DataAdapter's InsertCommand property) runs faster than the Autogen method. It has shorter response times and better throughput across the entire user-load range.
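The difference between the two methods is how the DataAdapter obtains its INSERT command. A sketch of the explicit InsertCommand setup (procedure and column names are assumptions):

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

class InsertCustomersExample
{
    // Push the new rows in a DataSet to the database with an explicitly
    // defined InsertCommand, avoiding CommandBuilder generation costs.
    static void SaveCustomers(string connectionString, DataSet ds)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        {
            SqlCommand insert = new SqlCommand("InsertCustomer", conn);
            insert.CommandType = CommandType.StoredProcedure;
            // Bind the parameter to the DataSet column it comes from.
            insert.Parameters.Add("@Name", SqlDbType.NVarChar, 50, "Name");

            SqlDataAdapter adapter = new SqlDataAdapter();
            adapter.InsertCommand = insert;
            adapter.Update(ds, "Customers");  // one INSERT per added row
        }
    }
}
```

In the Autogen method, the CommandBuilder would instead inspect the SELECT command's schema at run time to build this INSERT command, which is the overhead measured in Figure 12.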
Figure 13. InsertCustomers (n=10): Throughput and latency
As shown in Figure 13, the InsertCommand method still achieves better throughput than the Autogen method when more data is inserted, but the gap narrows because the fixed cost of automatically generating commands is amortized over more data.
InsertOrder
The final series of tests compares performance when inserting a hierarchical order and its corresponding detail records. We vary the number of detail records to increase the amount of data inserted.
Because the Orders table uses an automatically incrementing identity column (OrderID) as its primary key, it is impossible to insert hierarchical rows into the Orders (parent) and OrderDetails (child) tables using only INSERT commands auto-generated by the DataAdapters: the auto-generated INSERT command does not return the new identity value to the DataSet. The Autogen method here is therefore a hybrid, in which the InsertCommand for the Orders table is a stored procedure (which returns the OrderID), while the INSERT command for the DataAdapter associated with the OrderDetails table is auto-generated.
In the OpenXML method, the InsertCommand property of the DataAdapter associated with the Orders table is set to a stored procedure that receives an order and its corresponding detail records as XML. After the rows are inserted into the Orders and OrderDetails tables of a DataSet object, the XML representation of the data is extracted from the DataSet and passed to the stored procedure, which uses the sp_xml_preparedocument system stored procedure and OPENXML to insert the appropriate rows into the Orders and OrderDetails tables in the database.
Figure 14. InsertOrder (Order=1, details=2): Throughput and latency
- The InsertCommand and OpenXML methods use stored procedures.
- The Autogen method is a hybrid: it uses a stored procedure for the order insert, and for OrderDetails it auto-generates an INSERT command based on the SelectCommand.
- In the InsertCommand method, we define the InsertCommand of both DataAdapter objects.
- In the OpenXML method, XML representing the order and OrderDetails records is passed to the stored procedure as an nvarchar parameter, then parsed, and the appropriate inserts are executed.
As shown in Figure 14, the methods rank by performance as InsertCommand > Autogen > OpenXML.
The run-time overhead of automatically generating the INSERT command, together with the fact that stored procedures execute more efficiently than dynamic SQL, explains why the Autogen method is slower than the InsertCommand method.
Although the OpenXML method requires fewer round trips than the other two, with only three inserts the round-trip savings are not the dominant performance factor in this test.
Figure 15. InsertOrder (Order=1, details=10): Throughput and latency
Next, we test inserting an order with 10 corresponding detail records. As shown in Figure 15, the OpenXML method now beats the other two methods, because the cost of database round trips has become a major factor: the OpenXML method uses a single round trip in place of the 11 required by the other methods.
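On the client side, the single round trip of the OpenXML method can be sketched by serializing the DataSet to XML and sending it as one parameter. The `InsertOrderXml` procedure name and its parameter are assumptions:

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

class InsertOrderXmlExample
{
    // Send an order and all of its detail rows in one round trip.
    static void InsertOrder(string connectionString, DataSet ds)
    {
        string ordersXml = ds.GetXml();   // XML for the order + detail rows
        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand("InsertOrderXml", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.Add("@OrderData", SqlDbType.NText).Value = ordersXml;
            conn.Open();
            // The procedure parses the XML with sp_xml_preparedocument
            // and OPENXML, then inserts into Orders and OrderDetails.
            cmd.ExecuteNonQuery();
        }
    }
}
```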
Figure 16. InsertOrder (Order=1, details=100): Throughput and latency
Annotations
- In the OpenXML method, XML representing the order and OrderDetails records is passed to the stored procedure as an ntext parameter, then parsed, and the appropriate inserts are executed.
Finally, we test inserting an order record with 100 detail records. Here, OpenXML uses a single round trip instead of the 101 required by the other methods. As shown in Figure 16, the throughput of the OpenXML method is far greater than that of the other methods. In addition, the overhead of automatic command generation is now a small share of the work, so the throughput of the Autogen and InsertCommand methods is almost identical in this test.
Conclusion
Performance and scalability are among the most important considerations when choosing a data access technique. As the comparisons above show, one technique often offers much greater throughput than another, but no single technique performs best in every situation. Because overall performance is affected by many factors, there is no substitute for performance testing with realistic scenarios.