Using ADO.NET Objects to Optimize Data Query Code

There is no doubt that ADO.NET provides a powerful, disconnected database object model that can hold data records in memory. In particular, the ADO.NET DataSet class not only acts as an in-memory central repository for database tables, but also supports constraints and logical relationships between those tables. In a nutshell, the DataSet object is an offline data container.

At first glance, by exploiting the capabilities of the DataSet class, you can eliminate complex clauses from SQL query commands, such as layers of nested INNER JOIN clauses or GROUP BY clauses. A complex query can be decomposed into two or more simple ones, with the result of each simple query stored in its own DataTable object; the necessary referential integrity between the original tables can then be reconstructed by setting up constraints and logical relationships between the in-memory tables.

For example, you can load the Customers table and the Orders table into two separate DataTable objects and then link them through a DataRelation object. This relieves SQL Server (or any other DBMS) of the heavy burden of the INNER JOIN clause and, more importantly, reduces the network transport load considerably. Simplifying SQL queries this way is effective, but not always the best choice, especially when your database is large and frequently updated.

This article introduces another technique for simplifying SQL queries, one that leverages the ADO.NET in-memory data objects to ease the burden on both the client and the DBMS.

Decomposing SQL query commands

Many books on ADO.NET, such as David Sceppa's "Programming ADO.NET Core Reference" (Microsoft Press), suggest splitting complex SQL query commands into simple subqueries and saving the result of each subquery to a separate DataTable object within the same DataSet container. Let's look at an example.

Suppose you need to obtain some customer order information: the orders must have been submitted in a specified year, must be grouped by customer, and must contain at least 30 items. You also want the last name of the employee who submitted each order and the company name of the customer. You can implement this with the following SQL query:

DECLARE @TheYear int
SET @TheYear = 1997

SELECT o.CustomerID, od.OrderID, o.OrderDate, o.ShippedDate,
    SUM(od.Quantity * od.UnitPrice) AS Price,
    c.CompanyName, e.LastName
FROM Orders AS o
    INNER JOIN Customers AS c ON c.CustomerID = o.CustomerID
    INNER JOIN Employees AS e ON e.EmployeeID = o.EmployeeID
    INNER JOIN [Order Details] AS od ON o.OrderID = od.OrderID
WHERE YEAR(o.OrderDate) = @TheYear
GROUP BY o.CustomerID, c.CompanyName, od.OrderID,
    o.OrderDate, o.ShippedDate, e.LastName
HAVING SUM(od.Quantity) > 30
ORDER BY o.CustomerID

Whether you use ADO or ADO.NET, execute the above SQL query as a plain command and you will see the result set shown in Figure 1:

Figure 1. The output of the first SQL query, as generated and displayed by SQL Server Query Analyzer.

In this query, one clause is the core while the other two INNER JOIN clauses play an auxiliary role. The core clause retrieves from the database all orders submitted in the specified year that contain at least 30 items:

SELECT o.CustomerID, o.OrderID, o.OrderDate, o.ShippedDate,
    SUM(od.Quantity * od.UnitPrice) AS Price, o.EmployeeID
FROM Orders AS o
    INNER JOIN [Order Details] AS od ON o.OrderID = od.OrderID
WHERE YEAR(o.OrderDate) = @TheYear
GROUP BY o.CustomerID, o.OrderID, o.OrderDate, o.ShippedDate,
    o.EmployeeID
HAVING SUM(od.Quantity) > 30
ORDER BY o.CustomerID

In the result set this returns, both the customer and the employee are represented by their IDs, whereas the example requires the customer's company name (CompanyName) and the employee's last name (LastName). Returning the IDs instead of those longer strings avoids repeating them on every row and keeps the result set compact. The final ORDER BY o.CustomerID clause looks trivial, but it matters: because rows for the same customer end up adjacent, duplicates are easy to skip when the lookup data is processed later.

To sum up, the entire SQL query can be decomposed into three subqueries: one core subquery that retrieves the order records, plus two auxiliary subqueries that build the lookup tables mapping employee ID to last name and customer ID to company name:

SELECT EmployeeID, LastName FROM Employees
SELECT CustomerID, CompanyName FROM Customers

The following ADO.NET code shows how to save the result sets of these three subqueries into a single DataSet object:

Dim conn As SqlConnection = New SqlConnection(connString)
Dim adapter As SqlDataAdapter = New SqlDataAdapter()

conn.Open()
adapter.SelectCommand = New SqlCommand(cmdCore, conn)
adapter.SelectCommand.Parameters.Add("@TheYear", 1997)
adapter.SelectCommand.Parameters.Add("@TheQuantity", 30)
adapter.Fill(ds, "Orders")
adapter.SelectCommand = New SqlCommand(cmdCust, conn)
adapter.Fill(ds, "Customers")
adapter.SelectCommand = New SqlCommand(cmdEmpl, conn)
adapter.Fill(ds, "Employees")
conn.Close()

Note that when you execute several SQL queries in sequence, you should normally manage the database connection yourself to avoid unnecessary open/close operations. The adapter's Fill method opens and closes the connection automatically on each call, unless the connection associated with the SelectCommand is already open, in which case Fill leaves it open. That is why the example opens the connection once before the first Fill and closes it after the last one.
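To make this concrete, here is a minimal sketch of the two styles; connString is a placeholder connection string and the query text is illustrative, not part of the original example:

Sub FillExamples(ByVal connString As String)
    Dim ds As New DataSet()

    ' Implicit style: Fill opens and closes the connection on each call
    Dim implicitAdapter As New SqlDataAdapter( _
        "SELECT EmployeeID, LastName FROM Employees", connString)
    implicitAdapter.Fill(ds, "Employees")   ' open + query + close

    ' Explicit style: open once, run several Fill calls, close once
    Dim conn As New SqlConnection(connString)
    Dim adapter As New SqlDataAdapter( _
        "SELECT CustomerID, CompanyName FROM Customers", conn)
    conn.Open()                      ' Fill finds an open connection...
    adapter.Fill(ds, "Customers")    ' ...and leaves it open
    adapter.SelectCommand.CommandText = "SELECT OrderID FROM Orders"
    adapter.Fill(ds, "Orders")
    conn.Close()
End Sub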

To link the in-memory tables, you create two relations: one associating the EmployeeID in Orders with the LastName in Employees, and one associating the CustomerID in Orders with the CompanyName in Customers. In general, you use a DataRelation object to create a one-to-many relationship between two tables in the same DataSet. Here, however, the relationship runs the other way, from many order rows to one lookup row, which DataRelation does not model directly. In fact, all you have to do is swap the roles: make the would-be parent table (Orders) the child, and make the lookup tables (Employees, Customers) the parents.

Figure 2. The parent and child tables in a relationship swap roles

The ADO.NET DataRelation object is flexible enough to support this. Whenever a DataRelation is created, ADO.NET silently builds a uniqueness constraint behind the scenes to forbid duplicate key values in the parent column; if a duplicate does appear, ADO.NET throws an exception. Consider the following code:

Dim relOrders2Employees As DataRelation
relOrders2Employees = New DataRelation("Orders2Employees", _
    ds.Tables("Orders").Columns("EmployeeID"), _
    ds.Tables("Employees").Columns("EmployeeID"))
ds.Relations.Add(relOrders2Employees)

The DataRelation constructor here takes three parameters: the relation name, followed by two DataColumn objects representing the columns that make up the relation. The first DataColumn is the parent column and the second is the child column. If the implicit constraint cannot be honored (here, because the parent column, Orders.EmployeeID, contains duplicate values), an ArgumentException is raised. The simplest way to eliminate the exception is to pass a Boolean as a fourth parameter to the constructor:

relOrders2Employees = New DataRelation("Orders2Employees", _
    ds.Tables("Orders").Columns("EmployeeID"), _
    ds.Tables("Employees").Columns("EmployeeID"), _
    False)

When the fourth parameter is False, ADO.NET does not create the uniqueness constraint, which was the cause of the ArgumentException.
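As a sketch of the difference (the variable names here are illustrative), the following code first requests the constraint and falls back to the unconstrained relation when ADO.NET rejects it:

Try
    ' With constraints enabled, adding the relation throws an
    ' ArgumentException because Orders.EmployeeID has duplicate values
    Dim relStrict As New DataRelation("Orders2Employees", _
        ds.Tables("Orders").Columns("EmployeeID"), _
        ds.Tables("Employees").Columns("EmployeeID"))
    ds.Relations.Add(relStrict)
Catch ex As ArgumentException
    ' Fall back: the same relation, with constraint creation disabled
    Dim relLoose As New DataRelation("Orders2Employees", _
        ds.Tables("Orders").Columns("EmployeeID"), _
        ds.Tables("Employees").Columns("EmployeeID"), _
        False)
    ds.Relations.Add(relLoose)
End Try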

Once the relations are in place, you can add two computed columns to the Orders table that display the related names by means of column expressions. In theory, this is perfectly logical:

Dim orders As DataTable = ds.Tables("Orders")
orders.Columns.Add("Employee", GetType(String), _
    "Child(Orders2Employees).LastName")
orders.Columns.Add("Customer", GetType(String), _
    "Child(Orders2Customers).CompanyName")

Unfortunately, it does not work. Worse, the code containing the Child expression throws a "syntax error" exception, which can easily mislead the programmer. (For more information about column expressions, see last month's column.)

Why does it fail? Because the Child keyword can be used in a column expression only if a uniqueness constraint exists on the parent column. The documentation does not state this clearly, but it is a fact, and an important one. Puzzlingly, even without the constraint you can still navigate from any row of the Orders table to its child rows and read any column of the Employees or Customers table directly. The following code proves it:

Dim orders As DataTable = ds.Tables("Orders")
Dim childRows As DataRow() = orders.Rows(0).GetChildRows(relOrders2Employees)
MsgBox(childRows(0)("LastName"))

Therefore, the "syntax error" does not mean that the relationship itself is illegal. It simply means that you cannot use the Child keyword in a column expression unless the uniqueness constraint has been established beforehand.

In the initial relation, the Orders table is the parent. To resolve the employee's last name and the customer's company name from their IDs, however, the roles must be reversed: the Orders table acts as the child, and the Employees and Customers tables act as the parents. To achieve this, swap the two DataColumn arguments in the DataRelation constructor, and use the Parent keyword in the column expressions:

Dim orders As DataTable = ds.Tables("Orders")
orders.Columns.Add("Employee", GetType(String), _
    "Parent(Orders2Employees).LastName")
orders.Columns.Add("Customer", GetType(String), _
    "Parent(Orders2Customers).CompanyName")

To summarize: we decomposed one complex SQL query into three simpler subqueries, eliminated two INNER JOIN clauses, reduced the load on the database server and, more importantly, significantly reduced the network transport load from the server to the client. So is this the best solution?

Alternative scenarios

The previous solution is built on lookup tables, and no filtering takes place while those tables are loaded. What happens when the tables grow too large? Are you willing to download 10,000 records from the server just to resolve the names of a few hundred employees? Are you willing to download that pile of redundant data, which is of no use to you at all?

Consider it from another angle, though. Lookup tables are often valuable throughout the lifecycle of an application. In other words, although downloading so many records to build a complete lookup table is too extravagant for a single query, it is not necessarily a bad deal for the application as a whole.

In that case, why not try another technique to reduce the size of the lookup tables? The most obvious solution is to narrow the result set with a WHERE clause. Unfortunately, that approach is either hard to implement or ineffective, especially when the lookup table does not contain the columns you need to filter on. For example, to restrict the list of employee names, you would have to join other tables, such as Orders and Order Details, all over again. I think the best approach is to derive the lookup data from the query you have already run: after the main query completes, send an almost identical command that makes the server re-run the subquery and extract the distinct values. The database then returns exactly the matching data at a low cost; better still, SQL Server's query optimization engine keeps the cost of such repeated queries to a minimum.

SELECT DISTINCT t.CustomerID, t.CompanyName FROM
    (SELECT o.CustomerID, o.OrderID, o.OrderDate, o.ShippedDate,
        SUM(od.Quantity * od.UnitPrice) AS Price,
        c.CompanyName, o.EmployeeID
     FROM Orders AS o
        INNER JOIN Customers AS c ON c.CustomerID = o.CustomerID
        INNER JOIN [Order Details] AS od ON o.OrderID = od.OrderID
     WHERE YEAR(o.OrderDate) = @TheYear
     GROUP BY o.CustomerID, o.OrderID, c.CompanyName,
        o.OrderDate, o.ShippedDate, o.EmployeeID
     HAVING SUM(od.Quantity) > 30) AS t
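Filling the Customers lookup table with this filtered query then looks just like the earlier Fill calls. In the sketch below, cmdDistinctCust is a placeholder for the SELECT DISTINCT text above, and conn and ds are the connection and DataSet from the earlier example:

Dim adapter As New SqlDataAdapter()
adapter.SelectCommand = New SqlCommand(cmdDistinctCust, conn)
adapter.SelectCommand.Parameters.Add("@TheYear", 1997)
adapter.Fill(ds, "Customers")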

All in all, the biggest advantage of basing the data retrieval code on several simple queries is that it shifts the burden of joining data from the server to the client. And because the client now holds the records in several separate, easily linked tables, querying the data becomes extremely flexible.
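For instance, once the lookup tables are in memory, a filtered and sorted view costs the server nothing. A minimal sketch using the tables from this example:

Dim view As New DataView(ds.Tables("Customers"))
view.RowFilter = "CompanyName LIKE 'A%'"   ' evaluated entirely in memory
view.Sort = "CompanyName ASC"

Dim rowView As DataRowView
For Each rowView In view
    Console.WriteLine(rowView("CompanyName"))
Next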

Reading records from the database with a few short, simple SQL queries and storing them in separate DataTable objects is good news for client applications. But what if some of the data returned by these subqueries violates referential integrity?

Applying transactions

Normally each query command, no matter how complex, executes within a single implicit transaction, which is why the overall consistency of the data cannot be compromised by other code running at the same time. But what happens when you split a query that is a logical whole into several subqueries?

Suppose your software runs in a fast-changing environment where database records are updated rapidly: a user may replace data while your subqueries are still executing. In the example program, this kind of accident causes little damage, because the second and third subqueries only build lookup tables. However, if someone deletes an employee record after you have fetched the order data, the data you retrieve may violate referential integrity. The decomposed subqueries and their related processing code must therefore be placed in a single transaction; you have no other choice.

A transaction is a set of operations whose execution strictly adheres to the following four rules:

• Atomicity
• Consistency
• Isolation
• Durability

The first letters of these four rules (or attributes) form the acronym ACID. For this example, the most important attribute is isolation, the database's ability to guarantee that each running transaction is not interfered with by any other transaction running concurrently. When your query runs inside one transaction while other users' operations run in others, the results you get depend on the transaction's isolation level. Normally, the database assigns a reasonable isolation level according to the operations in each transaction. If the application demands absolute data consistency and cannot tolerate phantom rows, it must use the serializable isolation level.

When a "serializable" transaction is running, it locks all related tables, preventing any other users from updating or inserting fields. The table is unlocked only when the transaction is finished running. Under this independence level, "read pollution" (dirty reads, that is, read unauthorized data) and "Unreal Rows" (phantom rows, which are not yet recorded, or rows that have been deleted by other transactions) will not appear, but the overall consistency of the data is still not guaranteed.

Since employee records might be deleted and order records modified while your subqueries run, you should wrap all the subqueries in a serializable transaction:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
    -- Get Orders
    -- Get Customers
    -- Get Employees
COMMIT TRANSACTION
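If you prefer to control the transaction from the client, ADO.NET can express the same thing. Here is a minimal sketch under the assumptions of the earlier example (connString, cmdCore, cmdCust, and cmdEmpl are the same placeholders):

Dim ds As New DataSet()
Dim conn As New SqlConnection(connString)
conn.Open()
Dim tran As SqlTransaction = _
    conn.BeginTransaction(IsolationLevel.Serializable)
Try
    Dim adapter As New SqlDataAdapter()
    adapter.SelectCommand = New SqlCommand(cmdCore, conn, tran)
    adapter.SelectCommand.Parameters.Add("@TheYear", 1997)
    adapter.Fill(ds, "Orders")
    adapter.SelectCommand = New SqlCommand(cmdCust, conn, tran)
    adapter.Fill(ds, "Customers")
    adapter.SelectCommand = New SqlCommand(cmdEmpl, conn, tran)
    adapter.Fill(ds, "Employees")
    tran.Commit()
Finally
    conn.Close()
End Try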

Again, if conditions permit, give your transaction the serializable isolation level. If you are worried about the negative impact of locking all those tables, consider the ADO.NET in-memory data objects instead. We discuss them next.

Undoing the joins

Let's look back at the sample query from the beginning of this article. Its goal is to read from the database all orders submitted in the specified year that contain at least the specified number of items; we also need each order's total, the customer's company name, and the name of the employee who submitted the order.

DECLARE @TheYear int
DECLARE @TheAmount int
SET @TheYear = 1997
SET @TheAmount = 30

SELECT o.CustomerID, od.OrderID, o.OrderDate, o.ShippedDate,
    SUM(od.Quantity * od.UnitPrice) AS Price,
    c.CompanyName, e.LastName
FROM Orders AS o
    INNER JOIN Customers AS c ON c.CustomerID = o.CustomerID
    INNER JOIN Employees AS e ON e.EmployeeID = o.EmployeeID
    INNER JOIN [Order Details] AS od ON o.OrderID = od.OrderID
WHERE YEAR(o.OrderDate) = @TheYear
GROUP BY o.CustomerID, c.CompanyName, od.OrderID,
    o.OrderDate, o.ShippedDate, e.LastName
HAVING SUM(od.Quantity) > @TheAmount
ORDER BY o.CustomerID

The SQL query above does return all the required data at once, and running it within a single transaction guarantees consistent, serializable data. Yet we discarded this scheme earlier. Why?

In fact, it has two problems. First, each row of the result set merges data from three separate tables:

• Orders
• Customers
• Employees

(And that is not counting the Order Details table.)

Second, the INNER JOIN clauses cause some unnecessary data movement. We cannot fix the second problem, but a little ADO.NET code can fix the first, so there is still room to improve the feasibility and effectiveness of the whole solution.

The idea is as follows: first execute the SQL query and save the result set to a DataTable object; then scatter the data into three distinct but related DataTable objects. The final output is no different from querying the tables separately, but you save the cost of defining and running a serializable transaction and avoid downloading redundant lookup records from the database. The drawback is that each row of the joined result set carries a small amount of redundant information.

When is this scheme worth adopting? I have found it a good choice when the client needs to build a complex master/detail view, complete with grouping functions and various filters. Several distinct but related tables are very effective in that situation, and ADO.NET provides plenty of optimized features for working with them.

Let's discuss the specific operation. The following code demonstrates its main process:

Function SplitData(ByVal ds As DataSet) As DataSet
    Dim _dataset As New DataSet()

    ' Make a full working copy of the DataSet
    _dataset = ds.Copy()

    CreateCustomers(_dataset, ds)
    CreateEmployees(_dataset, ds)

    ' Remove columns from Orders (CompanyName [2] and LastName [4])
    _dataset.Tables("Orders").Columns.RemoveAt(1)
    _dataset.Tables("Orders").Columns.RemoveAt(3)

    Return _dataset
End Function

The code first makes a complete copy of the original DataSet (ds) into the new DataSet (_dataset). Next, it dynamically builds the Customers and Employees tables and adds them to the new DataSet. Finally, it removes from the new Orders table the columns that now live in the two lookup tables. The following illustration shows the contents of the Customers table in the new DataSet: it retains only two columns, the customer ID and the company name, with one row for each customer that appears in the Orders table. Since both tables still contain a CustomerID column, a relation can be established between them.

Figure 3. The newly generated Customers table, built from the result of the first query
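Since the lookup tables now hold unique key values, the relations can be re-created with their constraints enabled after the split. A short sketch (the relation names are illustrative):

Dim ds2 As DataSet = SplitData(ds)
ds2.Relations.Add("Customers2Orders", _
    ds2.Tables("Customers").Columns("CustomerID"), _
    ds2.Tables("Orders").Columns("CustomerID"))
ds2.Relations.Add("Employees2Orders", _
    ds2.Tables("Employees").Columns("EmployeeID"), _
    ds2.Tables("Orders").Columns("EmployeeID"))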

Let's briefly go over the code needed to create and populate the Customers and Employees tables.

First, you invoke the Clone method on the original Orders table to create a new DataTable object. Unlike the Copy method, Clone replicates only the metadata. Because the DataTable interface does not let you clone individual columns, cloning the whole table is the easiest way to obtain a table with matching column definitions; the clone just contains some extra columns that must then be deleted.

If you analyze the structure of the source DataSet, you will find that the CustomerID and CompanyName columns are the first and second columns of the result set.

Dim _customers As DataTable = orig.Tables("Orders").Clone()
_customers.TableName = "Customers"

' Remove the unneeded columns
Dim i As Integer
For i = 2 To _customers.Columns.Count - 1
    _customers.Columns.RemoveAt(2)
Next

Once the table structure exists, you load the data. The same customer, however, may appear many times in the Orders table, so you must filter the rows as you import them from the source DataSet. Fortunately, the Orders table is already sorted by CustomerID, so you can simply loop through the rows and pick the first row of each customer.

Dim row As DataRow
Dim customerKey As String = ""
For Each row In _dataset.Tables("Orders").Rows
    ' Rows are already sorted by CustomerID
    If customerKey <> row("CustomerID").ToString() Then
        ' Emulates SELECT DISTINCT
        _customers.ImportRow(row)
        customerKey = row("CustomerID").ToString()
    End If
Next

' Add the new table to the DataSet
_dataset.Tables.Add(_customers)

ImportRow is the quickest way to copy a row from one table to another. It imports only the values of the columns that match the target table's schema.
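Here is a tiny self-contained sketch of that behavior; the table and column names are made up for illustration:

Dim src As New DataTable("Source")
src.Columns.Add("ID", GetType(Integer))
src.Columns.Add("Name", GetType(String))
src.Columns.Add("Notes", GetType(String))
src.Rows.Add(New Object() {1, "Davolio", "not copied"})

Dim dst As DataTable = src.Clone()
dst.Columns.RemoveAt(2)            ' the target schema keeps only ID and Name
dst.ImportRow(src.Rows(0))         ' the Notes value is silently dropped
Console.WriteLine(dst.Rows(0)("Name"))   ' prints: Davolio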

In principle, creating the Employees table works much the same as creating the Customers table; only the columns to delete differ. Given the structure of the Orders table, we must keep the third and fourth columns. The following code first deletes the first two columns and then removes the remaining extras in a loop.

Dim _employees As DataTable = orig.Tables("Orders").Clone()
_employees.TableName = "Employees"

' Remove the unneeded columns
_employees.Columns.RemoveAt(0)
_employees.Columns.RemoveAt(0)
Dim i As Integer
For i = 2 To _employees.Columns.Count - 1
    _employees.Columns.RemoveAt(2)
Next

Finally, you must also eliminate the duplicate rows from the Employees table. This time the sort order of the Orders table does not help, because its rows are sorted by CustomerID rather than EmployeeID. So first create a sorted view over the Orders table, then loop through the view's rows.

Dim employeeKey As Integer = 0
Dim view As DataView = New DataView(_dataset.Tables("Orders"))
view.Sort = "EmployeeID"

Dim rowView As DataRowView
For Each rowView In view
    If employeeKey <> Convert.ToInt32(rowView("EmployeeID")) Then
        ' Emulates SELECT DISTINCT
        _employees.ImportRow(rowView.Row)
        employeeKey = Convert.ToInt32(rowView("EmployeeID"))
    End If
Next

' Add the new table to the DataSet
_dataset.Tables.Add(_employees)

Summary

This article walked through a complex SQL query and discussed three schemes for making it more efficient. Admittedly, classic ADO offers limited help with this kind of problem, but ADO.NET lets you build a powerful offline data object model that improves application performance. Of the schemes presented here, which is the best? There is no single answer. Many factors affect performance: the effective network bandwidth, the required degree of data consistency, how stable the application needs the data to be, and so on. To find the best scheme, try each one and measure its performance.


