From DB2 paged, about JDBC ResultSet processing Big Data volume

Source: Internet
Author: User
Tags db2

Recently in the processing DB2, the query, found the following problems. If a query is count (*), there are hundreds of thousands of rows, how pagination is implemented

Select Row_number () over (order by FID desc ) as Row_number, Other_field

From LoadData

If the result of this query returns hundreds of thousands of rows, how to page:

1 ORDER BY FID desc in FID This field must be indexed and, when indexed, be consistent according to the sort method in SQL

2 How to page out

Method 1 Select * FROM (

Select Row_number () over (order by FID desc ) as Row_number, Other_field

From LoadData

) A

Where A. Row_number > 500

Fetch first Rows only

Explain:

500: This value is calculated based on the current number of pages and the quantity per page.

The key is to navigate to the start line

Fetch first is only, assuming 10 data per page, then fetch is a very important way to improve performance.

Method 2 If you use the following SQL

SELECT * FROM (

Select Row_number () over (order by FID desc ) as Row_number, Other_field

From LoadData

) A

Where A. Row_number > Row_number <510

---fetch first rows only

As the number of tables increases, and the number of page turns increases, the speed of this query is significantly reduced.

Method 1 spends more than 100 milliseconds on paging millions of data.

and Method 2 when the total number of pages in more than 200,000 rows, query row_number between 5000 to 5010 records will be up to 1 minutes or more

--------------------------------------------------------------------------------------------------------------- -----------------------------------

If you use JDBC to query this millions data, how fast. Batch queries. This reminds me of the principle of resultset. Different drivers are different.

will client memory be taken for granted?

So I experimented with JDBC. There is no significant increase in the consumption of client memory found.

In fact, with JDBC resultset querying so much data, there is a connection timeout problem

------------------------------------------------------------------find the following information------------------------------------- -------------------------------------

Problem Description: In the usual three-tier framework, the client queries the database through browser requests the Web server, and the query results are thousands or even millions records that require the query results to be delivered to the client browser and paginated.

Factors to consider:

1. Resource consumption of the Web server, including: memory (used to store query results), database related resources (database connection object, ResultSet object, etc.);

2. The consumption of DB server resources, including cursors, sessions, and so on.

3. Network overhead, including establishing a session with the database, transferring the query results, and so on.

Several important classes in JDBC:

A ResultSet object maintains a cursor pointing to it current row of data. Initially the cursor is positioned before the first row. The next method moves the cursor to the next row, and because it returns false when there is no more rows in the Resultse T object, it can be used in a and loop to iterate through the result set.

ResultSet is to create a cursor directly on the database, and then to get a record of the specified row position by locating the interface resultset the row position. When a user obtains the content of a specific record through a Get method, resultset reads the required data from the database to the client.

Oracle's resultset implementations appear to cache user-read data locally, causing memory consumption to increase as the read data increases, so that if you query and read large amounts of data at once, even if you discard them immediately after reading the data (such as writing directly to the file), memory consumption increases as the query results increase.

The RowSet interface extends the standard Java.sql.ResultSet interface. A RowSet object may make a connection with a data source and maintain this connection throughout its life cycle, in which Case it is called a connected rowset. A rowset may also make a connection with a data source, get data from it, and then close the connection. Such a rowset is called a disconnected rowset. A disconnected rowset may do changes to its data while it's disconnected and then send the changes back to the original Source of the data, but it must reestablish a connection.

Rowset is the interface provided in JDBC2.0, and Oracle has a corresponding implementation of that interface, which is useful for oracle.jdbc.rowset.OracleCachedRowSet. Oraclecachedrowset implements all the methods in resultset, but unlike ResultSet, the data in Oraclecachedrowset is still valid after connection is closed.

Solution One: Use resultset directly to process

Reads the query result from resultset into collection, slows the presence of a httpsession or stateful bean, and pulls a page of data from the cache when it is paged. There are two main drawbacks to this approach: one is that the user may see outdated data, and the second is that if the data volume is very large, the first query traverses the result assembly for a long time, and the cached data consumes a lot of memory and the efficiency drops significantly.

An improvement to the above method is that when the user first requests the data query, it executes the SQL statement query, and the obtained ResultSet object and the connection object to be used are saved to its corresponding session object. Subsequent paging queries are positioned to obtain a record of the specified page (using Rs.last (), Rs.getrow () and the total number of records, using Rs.absolute () to navigate to the starting record on this page, through the ResultSet object that was obtained by the first execution of SQL. Finally, database access resources such as database connections and ResultSet objects are freed when the user is no longer paged or when the session is closed. One page of data is removed from the resultset each time it is paged. This approach in some databases (such as Oracle) in the JDBC implementation is almost back to cache all records and consume a lot of memory, but also very slow.

During the entire session of a use-case paging query, a user's paging query consumes a database connection object and a cursor for the result set, which consumes a large amount of access to the database and is not highly utilized.

Advantages: Reduce the multiple allocation of database connection objects, reduce the database SQL query execution.

Disadvantage: Consuming database access resources-database connection objects, and consuming resources on the database-cursors; it consumes a lot of memory;

Solution Two: Locate the rowset SQL query

SQL interface technology that uses the result set of a database product to locate the row range of the query. In the user's paged query request, each time you can get the parameters of the row range of the query request, and then use these parameters to produce a specified row range of SQL query statements, and then each request to obtain a database connection object and execute a SQL query, the results of the query returned to the user, and finally released the database access Resources said.

This method needs to execute the SQL query statement of the database every time the request is executed, and the access resources to the database are released immediately after use, and the database access resources are not consumed in vain. For specific database products (which provide the ability to locate the query result set), such as: Oracle (rowID or RowNum), DB2 (rowID or RowNum ()), PostgreSQL (Limit and OFFSET), MySQL (limit), and so on. (MS SQL Server does not provide this technology.) )

The following is an example of a query statement under Oracle:

SELECT * FROM (select row_.*, RowNum rownum_ from (...) row_ where rownum <= {pagenumber*rowsperpage}) where Rownu M_ > {(pageNumber-1) *rowsperpage}

Advantages: Access to the database resources (database connection objects, database cursors, etc.) are not wasted, and the full duplication of these resources is exploited.

Disadvantage: The Database access resource (database connection object and database cursor) is frequently obtained from the Web container for each paging query request and a connection is established, depending on the support of the specific database product.

From DB2 paged, about JDBC ResultSet processing Big Data volume

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.