Improve the performance of the data access layer (2)

Source: Internet
Author: User
Tags: rowcount
3. Select performance-optimizing features

3.1. Use parameter markers as stored procedure parameters

When calling a stored procedure, use parameter markers rather than literal character values. A JDBC driver can call a stored procedure either by executing it like any other SQL query or by optimizing the call through an RPC. If the stored procedure is executed as an SQL query, the database server must first parse the statement, validate the argument types, and convert the arguments to the correct data types. Clearly, this calling method is not the most efficient.
SQL statements are always sent to the database server as strings, for example:
"{call getcustname(12345)}"
In this case, even if the programmer assumes that the only parameter of getcustname is an integer, the parameter is actually transmitted to the database inside a string. The database server parses the statement, isolates the single parameter value 12345, and converts the string "12345" to an integer value before executing the procedure as SQL.
Calling the stored procedure through an RPC on the database server avoids the overhead of processing an SQL string.
Case 1
In this example, the stored procedure call cannot be optimized with a server-side RPC. The call requires parsing the statement, validating the argument types, and converting the arguments to the correct types before execution.
CallableStatement cstmt = conn.prepareCall("{call getcustname(12345)}");
ResultSet rs = cstmt.executeQuery();
Case 2
In this example, the stored procedure call can be optimized with a server-side RPC. Because the application avoids the overhead of passing literal text, and the JDBC driver can invoke the stored procedure directly via RPC, execution time is greatly reduced.
CallableStatement cstmt =
    conn.prepareCall("{call getcustname(?)}");
cstmt.setLong(1, 12345);
ResultSet rs = cstmt.executeQuery();
JDBC optimizes performance for different usage patterns, so choose between the PreparedStatement object and the Statement object based on how the statement is used. If an SQL statement is executed only once, use a Statement object; if it is executed two or more times, use a PreparedStatement object.
Statement pools change this trade-off. With a statement pool, use a Statement object for a query that is executed once and will probably never run again; use a PreparedStatement for a query that runs rarely but may run again within the lifetime of the pool. In the same situations without a statement pool, use a Statement object.
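The rule of thumb above can be sketched as follows. This is a minimal illustration; the employees table, its columns, and the ID values are hypothetical.

```java
import java.sql.*;

public class StatementChoice {
    // Executed once: a plain Statement avoids the separate prepare step.
    static void runOnce(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT name FROM employees WHERE id = 42")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }

    // Executed many times: prepare once, then bind and execute repeatedly.
    static void runMany(Connection conn, long[] ids) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                 "SELECT name FROM employees WHERE id = ?")) {
            for (long id : ids) {
                ps.setLong(1, id);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1));
                    }
                }
            }
        }
    }
}
```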

3.2. Use batch updates instead of repeated PreparedStatement executions

Updating a large amount of data usually means preparing an INSERT statement and executing it many times, which causes a large number of network round trips. To reduce the number of JDBC calls and improve performance, use the addBatch() method of the PreparedStatement object to send multiple queries to the database at once. For example, compare Case 1 and Case 2 below.
Case 1: Executing the PreparedStatement multiple times
PreparedStatement ps = conn.prepareStatement(
    "INSERT INTO employees VALUES (?, ?, ?)");
for (n = 0; n < 100; n++) {
    ps.setString(1, name[n]);
    ps.setLong(2, id[n]);
    ps.setInt(3, salary[n]);
    ps.executeUpdate();
}
Case 2: Batch Processing
PreparedStatement ps = conn.prepareStatement(
    "INSERT INTO employees VALUES (?, ?, ?)");
for (n = 0; n < 100; n++) {
    ps.setString(1, name[n]);
    ps.setLong(2, id[n]);
    ps.setInt(3, salary[n]);
    ps.addBatch();
}
ps.executeBatch();
In Case 1, a PreparedStatement is used to execute the INSERT statement many times: 101 network round trips are required for 100 inserts, one to prepare the statement and 100 more to execute each insert. When the addBatch() method is used, as in Case 2, only two network round trips are required: one to prepare the statement and one to execute the batch. Although batch execution costs more database CPU, the reduced network round trips improve overall performance. Remember: for good JDBC driver performance, minimize the network communication between the JDBC driver and the database server.

3.3. Select the appropriate cursor

Selecting the appropriate cursor improves the flexibility and performance of an application. This section summarizes the performance characteristics of three cursor types. A forward-only cursor provides excellent performance for reading all rows of a table sequentially; no retrieval method is faster. However, it cannot be used when an application must process non-consecutive rows.
For applications that require high database concurrency and the ability to scroll forward and backward through a result set, an insensitive cursor, as implemented by the JDBC driver, is usually the best choice. The first request on an insensitive cursor fetches all the rows (or, when the driver uses a "lazy" mode, only some of them) and caches them on the client. The first request can therefore be very slow, especially when long data is retrieved, but subsequent requests need no network traffic (or only limited traffic in lazy mode) and are processed quickly. Because the first request is slow, an insensitive cursor should not be used for a single request that returns one row. Developers should also avoid insensitive cursors when long data is returned, because client memory can be exhausted. Some implementations avoid this problem by caching the data in temporary tables on the database server, but most cache it locally on the client.
A sensitive cursor, sometimes called a keyset-driven cursor, uses identifiers, such as a ROWID, that already exist in your database. When you scroll through the result set, the data matching those identifiers is retrieved. Because each request generates network traffic, performance can be poor; however, returning non-consecutive rows does not degrade it further.
To illustrate, consider an application that returns 1000 rows of data. When the query is executed or the first row is requested, the JDBC driver does not execute the SELECT statement exactly as the application provided it. Instead, it replaces the SELECT list with a key identifier, for example ROWID, executes the modified query, and retrieves and caches all 1000 key values. Each application request for a result row is then handled by the driver: it looks up the key value in its local cache, constructs a query containing an optimized clause such as "WHERE ROWID = ?", executes that modified query, and retrieves the single result row from the server.
Unlike an insensitive cursor, which serves data from its cache, a sensitive cursor always fetches current data, which makes it the preferred cursor mode in dynamic situations.
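In JDBC, the three cursor behaviors above are requested through the result set type passed to createStatement(). A minimal sketch; the concurrency choices shown are typical pairings, not requirements.

```java
import java.sql.*;

public class CursorChoice {
    // Forward-only: fastest for a single sequential pass over the rows.
    static Statement forwardOnly(Connection conn) throws SQLException {
        return conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                    ResultSet.CONCUR_READ_ONLY);
    }

    // Insensitive (scrollable snapshot): rows are cached after the first
    // fetch, so later scrolling needs little or no network traffic.
    static Statement scrollInsensitive(Connection conn) throws SQLException {
        return conn.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE,
                                    ResultSet.CONCUR_READ_ONLY);
    }

    // Sensitive (often keyset-driven): each positioned fetch may go back
    // to the server, but the data reflects concurrent changes.
    static Statement scrollSensitive(Connection conn) throws SQLException {
        return conn.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE,
                                    ResultSet.CONCUR_UPDATABLE);
    }
}
```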

3.4. Use the get methods effectively

JDBC provides many methods, such as getInt(), getString(), and getObject(), for retrieving data from a result set. getObject() is the most generic, but it delivers the worst performance when no non-default mapping is specified, because the JDBC driver must do extra processing to determine the type of the retrieved value and generate the appropriate mapping. Always use the method that matches the specific data type.
To improve performance further, retrieve columns by number, such as getString(1), getLong(2), and getInt(3), rather than by name. Column numbers do not reduce network traffic, but they avoid the cost of conversion and lookup. For example, with getString("foo") the driver may have to convert the column identifier foo to uppercase (if necessary) and compare it with every name in the column list; supplying the column number saves most of that processing.
For example, suppose you have a result set of 15 columns and 100 rows, and you are interested in three columns: employeename (a string), employeenumber (a long integer), and salary (an integer). If you specify getString("employeename"), getLong("employeenumber"), and getInt("salary"), each column name must be converted to the matching case in the database metadata, and the lookups add up accordingly. Using getString(1), getLong(2), and getInt(15) instead improves performance significantly.
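The advice above can be sketched as follows. The employees table and its columns are hypothetical; the indexes refer to positions in the SELECT list.

```java
import java.sql.*;

public class GetByIndex {
    static void printEmployees(Connection conn) throws SQLException {
        String sql =
            "SELECT employeename, employeenumber, salary FROM employees";
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                // Typed getters with column indexes skip both the
                // name-to-index lookup and the generic-type detection
                // that getObject("employeename") would require.
                String name   = rs.getString(1);
                long   number = rs.getLong(2);
                int    salary = rs.getInt(3);
                System.out.printf("%s %d %d%n", name, number, salary);
            }
        }
    }
}
```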

3.5. Retrieve automatically generated keys

Many databases have hidden columns (also called pseudo-columns) that hold a unique key for each row of a table. Because a pseudo-column typically describes the physical disk address of the data, using such a column to access the row is usually the fastest way. Before JDBC 3.0, an application could retrieve a pseudo-column value only by executing a SELECT statement immediately after the insert.
For example:
// Insert row
int rowCount = stmt.executeUpdate(
    "INSERT INTO localgeniuslist (name) VALUES ('karen')");
// Now get the disk address - ROWID - for the newly inserted row
ResultSet rs = stmt.executeQuery(
    "SELECT ROWID FROM localgeniuslist WHERE name = 'karen'");
This approach has two major drawbacks. First, retrieving the pseudo-column requires a separate query to be sent over the network and executed on the server. Second, because the table may not have a primary key, the query condition may not uniquely identify the row. In that case multiple pseudo-column values are returned, and the application cannot be sure which one belongs to the most recently inserted row.
An optional feature of the JDBC 3.0 specification is the ability to retrieve a row's automatically generated key when the row is inserted into a table.
For example:
int rowCount = stmt.executeUpdate(
    "INSERT INTO localgeniuslist (name) VALUES ('karen')",
    Statement.RETURN_GENERATED_KEYS);  // insert row and return key
ResultSet rs = stmt.getGeneratedKeys();
// Key is automatically available
Even if the table has no primary key, this gives the application the fastest way to uniquely identify a row. The ability to retrieve pseudo-column keys gives JDBC developers flexibility and better performance when accessing data.

4. Manage connections and data updates

4.1. Manage connections

Connection management directly affects application performance. Optimize your application by creating multiple Statement objects on a single connection instead of opening multiple connections, and avoid reconnecting to the data source after the initial connection is established.
Connecting and disconnecting around every SQL statement is a poor coding habit. A Connection object can have multiple Statement objects associated with it; because a Statement object is just the in-memory description of an SQL statement, one connection can manage many of them. In addition, a connection pool can significantly improve performance, especially for applications that connect over a network or the web. A connection pool lets you reuse connections: closing a connection does not close the physical connection to the database but returns the used connection to the pool, and when an application requests a connection, an active one is taken from the pool for reuse. This avoids the network I/O of creating a new connection.
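The pattern above, several Statement objects sharing one connection, can be sketched as follows. The departments and employees tables are hypothetical.

```java
import java.sql.*;

public class ConnectionReuse {
    // One connection, two statements: avoids opening a new physical
    // connection (and its network handshake) for each query.
    static void runReports(Connection conn) throws SQLException {
        try (Statement deptStmt = conn.createStatement();
             PreparedStatement empStmt = conn.prepareStatement(
                 "SELECT name FROM employees WHERE dept = ?")) {
            try (ResultSet depts =
                     deptStmt.executeQuery("SELECT id FROM departments")) {
                while (depts.next()) {
                    empStmt.setLong(1, depts.getLong(1));
                    try (ResultSet emps = empStmt.executeQuery()) {
                        while (emps.next()) {
                            System.out.println(emps.getString(1));
                        }
                    }
                }
            }
        }
        // With a pooled DataSource, conn.close() would return the
        // connection to the pool rather than tear down the physical link.
    }
}
```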

4.2. Manage commits in transactions

Because of the disk I/O, and potentially network I/O, involved, committing a transaction is often slow. For this reason, wsConnection.setAutoCommit(false) is often used to disable auto-commit.
What does a commit actually involve? The database server must flush to disk every data page that contains updated or new data. This is usually a sequential write to a log file, but it is disk I/O nonetheless. By default, auto-commit is enabled when you connect to a data source. Because every operation then incurs that disk I/O, auto-commit mode usually degrades performance. Moreover, most databases have no native auto-commit mode; for such servers, the JDBC driver must explicitly send a COMMIT statement and a BEGIN TRANSACTION for every operation.
Although transactions help application performance, do not overuse them. Holding locks on rows for a long time, preventing other users from accessing those rows, reduces throughput. Committing transactions promptly maximizes concurrency.
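A minimal sketch of manual commit handling, grouping two updates into one transaction so the server flushes to disk once per commit rather than once per statement. The accounts table and transfer logic are hypothetical.

```java
import java.sql.*;

public class ManualCommit {
    static void transfer(Connection conn, long from, long to, int amount)
            throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);       // disable per-statement commits
        try (PreparedStatement ps = conn.prepareStatement(
                 "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
            ps.setInt(1, -amount);
            ps.setLong(2, from);
            ps.executeUpdate();
            ps.setInt(1, amount);
            ps.setLong(2, to);
            ps.executeUpdate();
            conn.commit();               // one disk flush for both updates
        } catch (SQLException e) {
            conn.rollback();             // release locks promptly on failure
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```

Committing as soon as the logical unit of work is done keeps the lock window short, which is what preserves concurrency.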

4.3. Select the correct transaction mode

Many systems support distributed transactions, that is, transactions that span multiple connections. Distributed transactions can be as much as four times slower than normal transactions because of the logging and network I/O among all the components involved (the JDBC driver, the transaction monitor, and the database system). Avoid distributed transactions unless they are necessary; use local transactions where possible. Note that many Java application servers provide a default transaction behavior that uses distributed transactions. For the best system performance, design the application to run under a single Connection object unless distributed transactions are truly required.

4.4. Use the updateXXX methods

Although positioned updates are not appropriate for every type of application, developers should try to perform updates and deletes programmatically, that is, update data with the updateXXX() methods of the ResultSet object. This lets developers update data without building complex SQL statements. To write the change to the database, call the updateRow() method before moving the cursor off the row being updated.
In the code snippet below, the value of the age column of the ResultSet rs is retrieved with getInt(), and updateInt() sets the column to the integer value 25. updateRow() then updates the row in the database with the modified value.
int n = rs.getInt("age");
// n contains the value of the age column in the ResultSet rs
rs.updateInt("age", 25);
rs.updateRow();
In addition to making the application easier to maintain, positioned updates usually perform better: because the cursor is already positioned on the row being updated, the overhead of locating the row is avoided.

4.5. Use getBestRowIdentifier()

Use getBestRowIdentifier() (see the DatabaseMetaData interface) to determine the optimal set of columns for the WHERE clause of an UPDATE statement. Pseudo-columns often provide the fastest access to the data, and those columns can only be discovered with getBestRowIdentifier().
Some applications cannot be designed to use positioned updates or deletes. Some instead determine queryable result columns by calling getPrimaryKeys() or getIndexInfo() to locate columns that may be part of a unique index, so that the WHERE clause can be kept selective. These methods usually work but can produce very complex queries. Consider the example below:
ResultSet wsrs = wss.executeQuery(
    "SELECT first_name, last_name, ssn, address, city, state, zip FROM emp");
// Fetch data...
wss.executeUpdate("UPDATE emp SET address = ?" +
    " WHERE first_name = ? AND last_name = ? AND ssn = ?" +
    " AND address = ? AND city = ? AND state = ? AND zip = ?");
// Fairly complex query
Instead, the application should call getBestRowIdentifier() to retrieve the optimal set of columns (possibly pseudo-columns) that uniquely identify the record. Many databases support special columns that are not explicitly defined in the table but are "hidden" in every table (such as ROWID and TID). Because they are pointers to the exact record location, these pseudo-columns usually provide the fastest data access. Since pseudo-columns are not part of the table definition, they are not returned by getColumns(). To determine whether pseudo-columns exist, call getBestRowIdentifier().
Rewriting the example above:
...
ResultSet wsrowid = getBestRowIdentifier(... "emp", ...);
...
wss.executeUpdate("UPDATE emp SET address = ? WHERE rowid = ?");
// Fastest access to the data!
If your data source does not support pseudo-columns, the result set of getBestRowIdentifier() consists of the columns of the smallest unique index on the specified table (if a unique index exists). In that case, the application does not need to call getIndexInfo() to find the smallest unique index itself.
