Profiling ado.net Batch update (in-depth research data access)

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

ado| Access | The interaction between the data ado.net application and the underlying data source is based on a dual architecture with a two-way channel. You can use individual provider-specific commands or batch update procedures to access the data source to read and write rows. In both cases, data access produces a fully bidirectional binding and involves a variety of different objects and methods. You can use command classes such as SqlCommand and OleDbCommand to execute a single command. You can use a data adapter object to download disconnected data and submit an updated rowset. Although a dataset is a container object used by a data adapter to return and commit a record block, each command returns data through a data reader object.

Updates are done through individual commands, stored procedures, and typically any command text understood by the managed provider is generally referred to as an update. The update command always executes the new data embedded in the body of the statement. The update command always requires an open connection and may require an ongoing transaction or a new transaction. A batch update is a slightly different method branch. From the highest level of abstraction, you don't issue commands, no matter how complicated it may be. Instead, you commit a snapshot of the current row that was modified at the client and wait for the data source to be approved. The key concept behind batch updates is the concept of data disconnection. You download a row table, typically a dataset, modify it on the client as needed, and then submit a new image of those rows to the database server. Instead of executing a command to create changes to the data source, you are committing the changes. This is the essential difference between the update (which I discussed in July column) and the batch update.

The following figure illustrates the dual-update architecture of the ado.net.

Figure 1. Ado.net Two bidirectional interactions between applications and data sources

Before discussing the Ado.net batch update in more detail, I need to clarify one aspect of the batch update model that often leads to some misunderstanding. Although update and batch updates are intrinsically different from actual implementations within Ado.net, they follow the same update model. Updates and batch updates are done through direct and provider-specific statements. Of course, because batch updates usually involve more rows, these statements are grouped into a batch call. A batch update loops through the rows of the destination dataset, issuing the appropriate update command (INSERT, DELETE, or update) whenever an updated row is found. When you communicate with the updated rows, a predefined direct SQL command is run. In essence, this is a batch update.

This process is taken for granted. In fact, if the batch update uses a completely different update model, special support from the data source is required. (This is what happens when you submit an XML updategram to SQL Server 2000.) Batch update is simply a software mechanism that is provided by a client that simplifies the submission of multiple row updates. In any case, each new row commit is always done through a normal channel with the direct command of the data source.

So far, only SQL commands have been mentioned in this article, but these references clearly show an important distinction between the ADO batch update implementation and the Ado.net batch update implementation. In ADO, a batch update can only occur on a sql-based data source. In ado.net, batch updates can occur on any kind of managed provider, including managed providers that should not expose their data through the SQL query language. Now we can begin to discuss the key elements of Ado.net batch update programming.

Prepare data sets for submission
The ado.net batch update is done through the "Update" method of the data adapter object. Data can only be submitted on a per-table basis. If you do not specify a table name when you invoke update, the default table name is used. If a table with that name does not exist, an exception is generated. Update first checks the RowState property of each table row, and then prepares a custom insert, UPDATE, or DELETE statement for each inserted row, updated row, or delete row in the specified table.

The "Update" method has several overloads. It can take a pair, a data table, or even an array of DataRow objects provided by datasets and datasheets. The method returns an integer value, that is, the number of rows that were successfully updated.

To minimize network traffic, "Update" is typically invoked on a subset of the dataset that is being manipulated. There is no doubt that this subset contains only the rows that were modified at that time. You can get such a subset by calling the GetChanges method of the DataSet.

if (ds. HasChanges ())
{
DataSet dschanges = ds. GetChanges ();
Adapter. Update (Dschanges, "MyTable");
}

In addition, you can use the HasChanges method to check whether a dataset has changed. HasChanges returns a Boolean value.

The data set returned by GetChanges contains rows that were inserted, deleted, or modified at that time. But what was the time when it was said? This is one aspect of the complexity of the Ado.net batch update that must be handled with the current state of the table row.

Back to the top of the page
The status of the row
Each row in the datasheet is rendered through the DataRow object. The DataRow object exists primarily as an element of the Rows collection of the parent data Table object. Conceptually, a database row is inherently linked to the structure of a given table. It is for this reason that the DataRow class in ado.net does not provide a public constructor. The only way to create a new DataRow object is to invoke a method named NewRow for a live instance of a datasheet object. The row you just created does not belong to the rows collection of the parent table, but the relationship of that row to this collection determines the state of the row. The following table shows some of the desirable values for the RowState property. These values are grouped in the DataRowState enumeration.

Added
The row has been added to the table.

Deleted
The row has been marked for deletion from the parent table.

Detached
The row has been created but has not been added to the table, or the row has been deleted from the collection of table rows.

Modified
Some of the columns in the row have changed.

Added
The row has been added to the table.

Unchanged
No changes were made to the row after it was created or after the AcceptChanges method was last called.

The RowState property of each row affects the return value of the HasChanges method and the contents of the child dataset returned by GetChanges.

From the range of these desirable values, it can be seen that the value of the RowState depends largely on what has been done to the row. The Ado.net table is based on two methods-AcceptChanges and RejectChanges-to implement a commit model for similar transaction processing. When you download a table from a data source or when you create a new table in memory, all rows are unchanged. Any changes you enter do not immediately change to permanent and can be rolled back and forth at any time by calling RejectChanges. You can call the RejectChanges method at three levels:

• All changes, regardless of changes, can be rejected at the dataset level.

• You can cancel all changes in a table at the datasheet level.

• Restore to the previous state of the row at a specific line level.

Method AcceptChanges can commit all the changes that are in progress. It enables the data set to accept the current value as the new original value. Therefore, all pending changes are cleared. As with RejectChanges, you can also call AcceptChanges for an entire dataset, a table, or a row.

When you start a batch update operation, only the rows that are marked Added, Deleted, and Modified are considered for submission. If you call AcceptChanges just before the batch update, no persistent changes are made to the data source.

On the other hand, once the batch update operation completes successfully, you must call AcceptChanges to clear the pending changes and mark the current dataset value as the original value. Note that if you omit the last call to AcceptChanges, the dataset retains the pending changes, causing the changes to be made again the next time the batch update is made.

Get changes in the DataSet
Dschanges = ds. GetChanges ();
Performs the batch update for the given table
Da. Update (Dschanges, strtable);
Clears any pending change in memory
Ds. AcceptChanges ();

The code above illustrates the three main steps behind the ado.net batch update.

If you delete rows from the dataset table, be aware that the method you are using is delete or remove. The Delete method performs a tombstone by marking the row as "delete". The Remove method physically deletes the row from the Rows collection. Therefore, rows deleted by "Remove" are not marked for deletion and therefore will not be processed during subsequent batch updates. If your final delete goal is to delete rows from the data source, you should use delete.

Back to the top of the page
Advanced content of the update
There are three actions that can change the state of a table:

• Insert a new row

• Delete an existing row

• Update an existing row

For each of these key operations, the data adapter defines a custom command object that is exposed as a property. Such properties include InsertCommand, DeleteCommand, and UpdateCommand. Programmers are responsible for assigning meaningful command objects to these attributes, such as SqlCommand objects.

The InsertCommand, DeleteCommand, and UpdateCommand attributes provided represent a great breakthrough from ADO to Ado.net. With this property, you can have unprecedented control over the way in-memory updates are committed to the database server. If you are not satisfied with the update code generated by Ado.net, you can now modify these update codes without denying the overall nature of the batch update. When you use ADO, you have no control over the SQL commands generated by the library silently. In Ado.net, you can apply updates using custom stored procedures or SQL statements that are more consistent with the user's expectations, taking advantage of publicly displayed command objects. In particular, you can use a batch update system for a cross-reference table and even target non-SQL data providers such as Active directory™ or indexing Services.

The update command should be run for each changed row in the table and must be very generic to accommodate different values. For this task, it is ideal to use command parameters as long as you can bind them to the values of the database columns. The Ado.net Parameter object exposes two properties for this binding, such as SourceColumn and SourceVersion. Especially SourceColumn, which represents an indirect way of indicating the value of a parameter. Instead of using the Value property and setting it with a scalar value, you can set the SourceColumn property by using the column name and make the batch update mechanism extract valid values from time to time.

SourceVersion indicates which value on the column should be read. By default, Ado.net returns the current value of the row. Alternatively, you can select the original value and all the values in the DataRowVersion enumeration.

If you want batch updates for several columns in the Employees table of Northwind, you can use the following custom command. The INSERT command is defined as follows:

StringBuilder sb = new StringBuilder ("");
Sb. Append ("INSERT Employees (FirstName, LastName) VALUES (");
Sb. Append ("@sFirstName, @sLastName)");
Da. InsertCommand = new SqlCommand ();
Da.InsertCommand.CommandText = sb. ToString ();
Da. Insertcommand.connection = conn;

All parameters bind the Parameters set added to the data adapter to a datasheet column.

SqlParameter p1 = new SqlParameter ("@sFirstName", SqlDbType.NVarChar, 10);
P1. SourceVersion = DataRowVersion.Current;
P1. SourceColumn = "FirstName";
Da. InsertCommand.Parameters.Add (p1);
SqlParameter P2 = new SqlParameter ("@sLastName", SqlDbType.NVarChar, 30);
P2. SourceVersion = DataRowVersion.Current;
P2. SourceColumn = "LastName";
Da. INSERTCOMMAND.PARAMETERS.ADD (p2);

Note that automatically incrementing columns should not be listed in the syntax of the INSERT command because their values are generated by the data source.

The UPDATE command needs to determine a specific row to apply its changes. To do this, you can use the WHERE clause to compare parameterized values with key fields in the clause. In this case, the arguments used in the WHERE clause must be bound to the original value of the row, not the current value.

StringBuilder sb = new StringBuilder ("");
Sb. Append ("UPDATE Employees SET");
Sb. Append ("Lastname= @sLastName, Firstname= @sFirstName");
Sb. Append ("WHERE employeeid= @nEmpID");
Da. UpdateCommand = new SqlCommand ();
Da.UpdateCommand.CommandText = sb. ToString ();
Da. Updatecommand.connection = conn;
P1 and P2 set as before
:
P3 = new SqlParameter ("@nEmpID", SqlDbType.Int);
P3. SourceVersion = datarowversion.original;
P3. SourceColumn = "EmployeeID";
Da. UPDATECOMMAND.PARAMETERS.ADD (p3);

Finally, the Delete command needs to use a WHERE clause to determine which row to delete. In this case, you need to use the original version of the row to bind the parameter values.

StringBuilder sb = new StringBuilder ("");
Sb. Append ("DELETE from Employees");
Sb. Append ("WHERE employeeid= @nEmpID");
Da. DeleteCommand = new SqlCommand ();
Da.DeleteCommand.CommandText = sb. ToString ();
Da. Deletecommand.connection = conn;
P1 = new SqlParameter ("@nEmpID", SqlDbType.Int);
P1. SourceVersion = datarowversion.original;
P1. SourceColumn = "EmployeeID";
Da. DeleteCommand.Parameters.Add (p1);

The actual structure of the SQL command depends on you. These commands are not necessarily normal SQL statements, they can be more efficient stored procedures (if you want to take this direction). If there is a very specific risk that other people may update the rows you read and modify, you may want to take some more effective precautions. If this is the case, you can use a more restrictive where clause in the DELETE and UPDATE commands. The WHERE clause can explicitly determine a row, but should also ensure that all columns retain their original values.

UPDATE Employees
SET field1= @new_field1, field2= @new_field2,??? .., fieldn= @new_fieldn
WHERE field1= @old_field1 and
Field2= @old_field2 and
:
fieldn= @old_fieldn

Note that you do not need to populate all command parameters, only those that you plan to use. If the code wants to use a command that has not been specified, an exception is thrown. Setting up a command for a batch update process may require a lot of code, but you do not need to write a lot of code every time you make a batch update. In fact, in quite a few cases, ado.net can automatically generate a valid update command for you.

Back to the top of the page
Command Builder
To take advantage of the default commands, you must meet two requirements. First, you must assign a valid command object to the SelectCommand property. You do not need to populate other command objects, but SelectCommand must point to a valid query statement. A valid query for batch updates is a query that returns a primary key column. In addition, the query must not include INNER joins, computed columns, or references to multiple tables.

The columns and tables listed in the SelectCommand object will actually be used to prepare the body of the update and INSERT statements. If you do not set SelectCommand, you cannot implement the Ado.net command generation automatically. The following code shows how to write code for the SelectCommand property.

SqlCommand cmd = new SqlCommand ();
Cmd.commandtext = "SELECT EmployeeID, FirstName, LastName from Employees";
Cmd. Connection = conn;
Da. SelectCommand = cmd;

Don't worry that SelectCommand may have an impact on performance. The related statement executes only once before the batch update process, but it retrieves only the column metadata. No matter how you write an SQL statement, you will never return any rows to the caller. This occurs because, at execution time, SelectCommand appends the SQL batch statement that begins with the following code to the last

SET fmtonly off
SET no_browsetable on
SET fmtonly on

Therefore, the query does not return rows and returns column metadata information.

The second requirement that your code must meet is related to the command builder. A command builder is a class that is specific to a managed provider, works on top of a data adapter object, and automatically sets its InsertCommand, DeleteCommand, and UpdateCommand properties. The command builder first runs SelectCommand to gather enough information about the tables and columns involved, and then creates an update command. The actual command creation is done in the command builder class constructor.

SqlCommandBuilder cb = new SqlCommandBuilder (DA);

The SqlCommandBuilder class ensures that the specified data adapter can be used successfully for batch updates to a particular data source. SqlCommandBuilder leverages some of the properties defined in the SelectCommand object. These properties are Connection, CommandTimeout, and Transaction. Whenever you change any of these properties, you need to invoke the RefreshSchema method of the command builder to change the structure of the build command for further batch updates.

You can mix command builder with custom commands. If the InsertCommand property points to a valid command object before calling the command builder, the generator generates code only for DeleteCommand and UpdateCommand. Instead of the empty SelectCommand property is the key to the command generator to work correctly.

Typically, you use the command builder because you think it's too complicated to write your own SQL commands. However, if you want to view the source code generated by the generator, you can call methods such as GetInsertCommand, GetUpdateCommand, and GetDeleteCommand.

The command Builder is a provider-specific attribute. Therefore, it is not possible to expect all types of managed providers to support it. SQL Server 7.0 and later providers and OLE DB providers Support Command Builder.

The command builder has a good feature that detects fields that are automatically incremented and optimizes the code accordingly. In particular, as long as it has the means to recognize that some fields are automatically incremented, the auto increment field is extracted from the INSERT statement. This process can be implemented in two ways. For example, you can manually set the AutoIncrement property of the appropriate DataColumn object, or, better yet, make it automatically based on the properties of the column in the data source, such as SQL Server. To automatically inherit such a property, be sure to change the MissingSchemaAction property of the data adapter from the default value Add to AddWithKey.

Back to the top of the page
Conflict detection
The batch update mechanism has a very optimistic view of concurrency. Each record is not locked after reading and is still exposed to other users for reading and writing. In this case, there may be some potentially inconsistent scenarios. For example, after a row is passed from a SELECT statement to your application, it may have been modified or even deleted before the batch update process actually returned the changes to the server.

If you update the data on the server while the data has been modified by another user, a data conflict may occur. To prevent new data from being overwritten, the ado.net command Builder generates statements with a WHERE clause, which takes effect only if the current state of the data source row is consistent with the state that the application read previously. If such a command fails to update the row, the Ado.net runtime throws a DBConcurrencyException type exception.

The following code fragment shows how to perform batch update operations with Ado.net in a more accurate way.

Try
{
Da. Update (Dschanges, "Employees");
}
catch (DBConcurrencyException Dbdcex)
{
Resolve the conflict
}

The Update method of the data adapter that you are using throws an exception for the row where the first update fails. At this point, control returns to the client application, and the batch update process stops. However, all previously committed changes will still be performed. This process represents another transformation from the ADO batch update model to the ado.net.

The Row property of the DBConcurrencyException class allows you to use the DataRow object that is involved in the conflict update. This DataRow object contains the row's commit and original values. It does not contain a value that is currently stored in the database for a given column. This value-that is, the UnderlyingValue property of ADO-can only be retrieved through another query command.

The way in which conflicts are resolved and it is possible to continue batch updates is strictly application-specific. If there is a situation where your application needs to continue to perform updates, you should be aware of a subtle, yet intractable, problem. Once you have resolved the conflicts on the line, you must also come up with a way to accept the changes in the rows of memory that have been successfully completed by the batch process. If you ignore this technical detail, a new conflict is created for the first row that was previously successfully updated! This happens repeatedly, and your application will soon enter a deadlock state.

Back to the top of the page
Summary
Batch-processing updates in Ado.net are more powerful and more accessible than ADO. In ADO, the batch update mechanism is a black box, and it's almost impossible for us to go deep inside it and make a slight change to the task you need to perform. The batch update in Ado.net is more biased towards a low-level solution, and its implementation provides several pointcuts for you to get inside and control events. The trickiest part of the Ado.net batch update is conflict resolution. The authors sincerely suggest that you spend as much time as possible on testing and testing. This investment can be rewarded with all the time saved by the command generator.

Back to the top of the page
Dialog bar: Null values in a datasheet
I extracted the dataset from the database and everything went well. Then I tried to save this dataset to an XML file, and I still went well. However, when the XML file is read back to the dataset, the problem arises. This is because all columns with NULL values cannot be persisted to the XML. Can you use some method to add null values to the resulting XML as empty tags?

This behavior is designed by design and is introduced with the best intention of saving several bytes in the process of XML serialization. If this behavior occurs on a network (for example, within an XML Web service), the advantages it brings are obvious.

In other words, you can solve your problem in a very simple way. The trick is to extract the columns from the ISNULL T-SQL function. We do not use the following code:

SELECT MyColumn from MyTable

Instead, you should use:

SELECT ISNULL (MyColumn, ') from MyTable

In this case, any null value of the column will automatically become an empty string and will not be ignored during serialization of the DataSet into XML. Non-specific values are not necessarily empty strings. Numeric columns can use 0 or any other logical null value that you want to use.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Profiling ado.net Batch update (in-depth research data access)

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Profiling ado.net Batch update (in-depth research data access)

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support