How to get the best performance for DB2 enterprise applications

Source: Internet
Author: User
Tags ibm db2 knowledge base

Brief introduction

When it comes to ensuring that enterprise applications built with IBM DB2 (DB2 certified DB2 Training) ®universal Database "(DB2 UDB) and borland® tools (such as Delphi, C++builder, or Kylix) have the best performance, programmers The ability of the DB2 optimizer can be leveraged to handle even "hard-to-handle" SQL statements and give valid access paths. However, poorly coded SQL and application code can still give you performance problems, and it's easy to avoid these problems by learning a few basic guidelines. I'll show you how the DB2 optimizer works and provide guidelines for writing SQL that maximizes the efficiency of the optimizer. But even with DB2 optimization, writing valid SQL statements can still be a complex thing. This is especially tricky if programmers and developers are not familiar with relational database environments. So before we delve into the details of coding SQL to get the best performance, take some time to review the basics of SQL.

Basic knowledge

Because SQL differs from a procedural language, it provides a higher level of abstraction, so it allows programmers to focus on what data they need, rather than how to retrieve it. You do not have to use embedded data navigation directives to encode SQL. DB2 analyzes SQL and "behind the scenes" develops data navigation directives. These data navigation directives are called access paths. Having the DBMS determine the optimal access path to the data relieves the programmer of a heavy burden on the shoulders. In addition, the database can better understand the state of the data it stores, thus generating a more efficient and dynamic access path to the data. The result is that appropriately used SQL can be used for faster application development.

Another SQL feature is that it is more than just a query language. You can also use it to define the data structure, control access to it, and insert, modify, and delete data. By providing a common language, SQL simplifies communication between DBAs, system programmers, application programmers, system analysts, and end users. When all participants in a project use the same language, the collaboration they have built up can reduce overall system development time.

Historically, one of the most important features of ensuring SQL success is its ability to easily retrieve data using English-like syntax. Understanding this language is much easier than understanding the structure and program source code of a data page:

SELECT LASTNAME

From EMP

WHERE EMPNO = "000010";

Think about it: when accessing data in a file, the programmer must encode instructions to open the file, start a loop, read the record, check whether the EMPNO field equals the appropriate value, check the end of the file, go back to the beginning of the loop, and so on.

SQL is inherently flexible. It uses a free-form structure that allows users to develop SQL statements to suit their needs. The DBMS parses each SQL request before it executes to check that the syntax is correct and optimizes the request. SQL statements do not need to start from any given column, you can string them in one line, or split them into rows. For example, the following single-line SQL statement is equivalent to the three-row example I used earlier:

Select LASTNAME from EMP WHERE EMPNO = "000010";

Another flexible feature of SQL is that you can make a request in a number of different forms but functionally equivalent ways. For example, SQL can join tables or nested queries. You can always convert a nested query into an equivalent connection. You can see other examples of this flexibility in a number of functions and predicates. Examples of features that have equivalent functionality include:

Between vs =

In vs a series of predicates with or mates

INNER join vs. comma-delimited table in the FROM clause

OUTER JOIN vs Simple SELECT with UNION and related subqueries

Case expression vs Complex UNION all statement

This flexibility of SQL shows is not always desirable, because different forms but functionally equivalent SQL formulas can provide very different performance. I'll discuss the results of this flexibility later in this article, and provide guidelines for developing effective SQL.

As I said, SQL specifies what data to retrieve or manipulate, but does not specify how the database will complete these tasks. This makes the SQL itself very simple. If you can remember the characteristics of a relational database processing a collection (Set-at-a-time) at a time, you begin to grasp the nature and nature of SQL. An SQL statement can be used for multiple rows. The ability to act on a set of data without having to establish how to retrieve and manipulate the data defines SQL as a non-procedural language

Because SQL is a non-procedural language, a single statement can replace a sequence of procedures. Similarly, this is possible because SQL uses collection-level processing and DB2 optimization queries to determine data navigation logic. Sometimes, if you do not use SQL statements, a task that can be completed by one or two SQL statements requires a complete procedural procedure to complete.

Optimizer

Optimizer is the heart and soul of DB2. It parses the SQL statement and determines the most efficient access path that can satisfy each statement (see Figure 1). DB2 UDB accomplishes this by parsing SQL statements to determine which tables and columns must be accessed. The DB2 optimizer then queries the system information and statistics stored in the DB2 system directory to determine the best way to complete the tasks necessary to fulfill the SQL request.

Figure 1. DB2 Optimization in operation

The optimizer is functionally equivalent to an expert system. An expert system is a set of standard rules that, when combined with contextual data, returns an "expert" opinion. For example, the medical expert system uses a set of rules to determine which drugs can be used for which diseases, to combine the rule set with data that describes the symptoms of the disease, and to apply the knowledge base to the list of input symptoms. The DB2 optimizer generates expert advice on data retrieval methods based on contextual data stored in the DB2 system directory and query input in SQL format.

The concept of optimizing data access in the DBMS is one of the most powerful capabilities of DB2. Keep in mind that when you access DB2 data, you should tell DB2 what to retrieve, not how to retrieve it. Regardless of how the data is actually stored and manipulated, DB2 and SQL can access the data. The separation of access standards from physical storage features is called physical data independence. The DB2 optimizer is the component that completes the physical data independence.

If you do not index, DB2 still has access to the data (albeit less efficient). If you add a column to the table that is being accessed, DB2 can still manipulate the data without changing the program code. This is entirely possible because the physical access path to the DB2 data is not encoded by the programmer in the application but is generated by DB2.

This feature is very different from non-DBMS systems where programmers must know the physical structure of the data. If there is an index, the programmer must write the appropriate code to use the index. If someone deletes the index, the program will not work unless the programmer makes changes. This is not necessary with DB2 and SQL. This flexibility is entirely attributable to the ability of DB2 to automatically optimize data operation requests.

The optimizer performs complex calculations based on a number of information. To visualize how the optimizer works, you can think of the optimizer as a process that performs a four step:

Receives and validates the syntax of the SQL statement.

Analyze the environment and optimize the method that satisfies the SQL statement.

Creates a computer-readable instruction to execute an optimized SQL.

Execute instructions or store them for future execution.

The second step in this process is the most interesting. How does the optimizer decide how to execute a large number of SQL statements that you can send in its way?

The optimizer has many types of optimized SQL policies. How does it choose which of these policies to use in an optimized access path? IBM does not publish real and in-depth details about how the optimizer determines the best access path, but the optimizer is a cost-based optimizer. This means that the optimizer will always try to develop an access path that reduces the overall cost for each query. To achieve this goal, the DB2 optimizer applies a query cost formula that evaluates and balances four factors for each possible access path: CPU cost, I/O cost, statistics in the DB2 system catalog, and actual SQL statements.

Performance guidelines

Therefore, as long as you remember the information about the DB2 (DB2 certified DB2 Training) optimizer, you can implement these guidelines for better SQL performance:

1) Keep DB2 statistics up-to-date: if there are no statistics stored in the DB2 system directory, the optimizer will encounter difficulties in optimizing anything. These statistics provide the optimizer with information about the state of the table to which the SQL statement being optimized will be accessed. The types of statistics stored in the system catalog include:

Information about the table, including the total number of rows, information about compression, and total pages;

Information about the column, including the number of discrete values for the column and the distribution of the values stored in the column;

Information about the tablespace, including the number of active pages;

The current state of the index, including whether there is an index, the organization of the index (number of leaf pages and the number of levels), the number of discrete values for the index key, and whether the cluster index is clustered;

Information about table spaces and index node groups or partitions.

When the RUNSTATS or run STATISTICS utility is executed, the statistics populate the DB2 system directory. You can invoke the utility from Control center, a batch job, or by using a command line processor. Be sure to work with your DBA to ensure that statistics are accumulated at the right time, especially in production environments.

2) Build the appropriate index: perhaps the most important thing you can do to ensure the best DB2 application performance is to create the correct index for your table based on the queries that your application uses. Of course, it's easier said than done. But we can start with some basics. For example, consider the following SQL statement:

SELECT LASTNAME, SALARY

From EMP

WHERE EMPNO = ' 000010 '

and DEPTNO = ' D01 '

What index will be useful for this simple query? First, consider all the possible indexes that you can create. Your first short list might look like this:

The Index1 on the EMPNO

The Index2 on the DEPTNO

The Index3 on EMPNO and DEPTNO.

This is a good start, Index3 may be the best. It lets DB2 use an index to immediately find rows that satisfy the two simple predicates in the WHERE clause. Of course, if you already have many indexes on the EMP table, you might want to examine the effect of creating another index on the table. The factors to consider include:

Modify Impact: DB2 will automatically maintain each index that you create. This means that each insert and delete for the table will be inserted and deleted not only in the table, but also in its index. If you update the values of the columns in the index, you also updated the index. The index therefore speeds up the retrieval process, but slows down the modification.

Columns in an existing index: If you already have an index on EMPNO or DEPTNO, it might not be wise to create another index on that combination. However, changing another index to add missing columns may work. Not necessarily, however, because the order of the columns in the index may vary greatly depending on the query. For example, consider the following query:

SELECT LASTNAME, SALARY

From EMP

WHERE EMPNO = ' 000010 '

and DEPTNO > ' D01 ';

In this case, the EMPNO should be listed first in the index. The DEPTNO is then listed, allowing DB2 to perform a direct index lookup on the first column (EMPNO) and then scan the second column (DEPTNO) for the greater than sign.

Also, if there are already indexes on these two columns (one about EMPNO, one on DEPTNO), DB2 can use them to satisfy the query, so creating another index might not be necessary.

The importance of this particular query: the more important the query, the more likely you should be to tune by creating an index. If you are coding a query that the CIO will run on a daily basis, you should ensure that it provides the best performance. Therefore, it is important to build the index for that particular query. Conversely, staff queries may not necessarily look so heavy, so you might want to use an existing index to execute the query. Of course, the decision depends on the importance of the application to the business-not just the user.

The index design involves much more than what I have discussed so far. For example, you might want to consider index overloading to implement index-only access (Index-only access). If all the data that the SQL query is looking for is contained in the index, then DB2 may only use the index to satisfy the request. Consider our previous SQL statement. Given the information about EMPNO and DEPTNO, we are looking for LASTNAME and SALARY. We also start by creating indexes on EMPNO and DEPTNO columns. If we also include LASTNAME and SALARY in the index, we no longer need to access the EMP table because all the data we need is already in the index. This technology can greatly improve performance because it reduces the number of I/O requests.

Keep in mind that it is not prudent or even impossible to make each query an index-only access. You should use this technique sparingly for particularly tricky or important SQL statements.

SQL Coding Guidelines

When you write an SQL statement that accesses DB2 data, be sure to follow the following three coding SQL guidelines for best performance. Of course, SQL performance is a complex topic, and knowing every nuance of how SQL is performed can take a lifetime. However, these simple rules allow you to get into the right track for developing high-performance DB2 applications.

The first rule is to always provide only those columns that you really need to retrieve in the select list of each SQL SELECT statement. Another saying is "Don't use SELECT *". A short SELECT * indicates that you want to retrieve all the columns in the table being accessed. This applies to "fast but inappropriate" (quick and dirty) queries, but is an application's bad practice because:

The DB2 table may need to be changed in the future to include additional columns. SELECT * Also retrieves those new columns, and if no time-consuming changes are made, your program may not be able to process the additional data.

DB2 will consume additional resources for each column returned by the request. If the program does not need data, it will not look for it. Even if your program requires each column, it's a good idea to look for each column explicitly based on the name in the SQL statement to increase clarity and avoid previous mistakes.

Don't look for what you already know. This may sound obvious, but most programmers have violated this rule. To give a typical example, consider what the following SQL statement is wrong:

SELECT EMPNO, LASTNAME, SALARY

From EMP

WHERE EMPNO = ' 000010 ';

Give It up? The problem is that EMPNO is already included in the SELECT list. You already know that EMPNO will be equal to the value "000010", because that is what the WHERE clause is DB2 to do. However, EMPNO,DB2 is also responsible for retrieving the column as it is listed in the WHERE clause. This results in additional overhead, which reduces performance.

Use the WHERE clause in SQL to filter the data instead of using it everywhere in the program. This is also a mistake that beginners make easily. It is best to filter the data by DB2 before DB2 returns the data to the program. This is because DB2 uses additional I/O and CPU resources to fetch each row of data. The fewer rows passed to the program, the higher the SQL efficiency:

SELECT EMPNO, LASTNAME, SALARY

From EMP

WHERE SALARY > 50000.00;

This SQL is preferable to reading all data without using the WHERE clause, and then checking whether SALARY is greater than 50000.00 in the program.

Use parameterized queries. Parameterized SQL statements contain variables, also known as parameters (or parameter markers). Typical parameterized queries use these parameters instead of literal values, so WHERE clause conditions can be changed at run time. Usually the program is designed so that the end user can provide the value of the parameter before running the query. This allows a query to return different results based on the different values supplied to the parameter.

The main performance benefit of parameterized queries is that the optimizer can develop access paths that can be reused when executing statements repeatedly. This can add a lot of performance gains to the program compared to issuing a completely new SQL statement each time a new value is required in the WHERE clause.

However, these rules are not the ultimate and highest goal of SQL performance tuning-Never. You may need additional, in-depth tuning. But following the previous rules will ensure that you do not make "novice" errors that degrade application performance.

Specific database application Development tips

Whether you're using Delphi, C++builder, or Kylix, some tips and guidelines will help you make sure you get good performance when accessing DB2 data. For example, in some cases, using Dbexpress TM instead of ODBC/JDBC or ADO can improve query performance. Dbexpress is a cross-platform interface for processing dynamic SQL from Delphi (or Borland Kylix ").

Be sure to issue a COMMIT statement frequently in your application. The COMMIT statement controls the unit of work. Issuing a commit will "forever" log all work from the last commit statement to the database. You can use the ROLLBACK statement to roll back work before you issue a COMMIT. When data is modified (using INSERT, UPDATE, and DELETE) but no COMMIT is issued, DB2 will add a lock on the data and hold the lock-this lock will cause other applications to time out while waiting to retrieve the locked data. By issuing a COMMIT statement when the work is completed and ensuring that the data is correct, the data is freed for use by other applications.

Also, consider usage when building your application. For example, when a particular query returns thousands of rows to the end user, it should be handled with caution. For online interaction between the program and the end user, more than hundreds of rows of data are rarely used. You can use the FETCH first NROWS only clause on an SQL statement to limit the amount of data returned to the query. For example, consider the following query:

SELECT EMPNO, LASTNAME, SALARY

From EMP

WHERE SALARY > 10000.00

FETCH first + ROWS only;

The query will return only 200 rows. It doesn't matter if more than 200 lines are eligible; If you try to FETCH more than 200 rows from the query, DB2 will use +100 SQLCODE to indicate the end of the data. This method is useful when you want to limit the amount of data that is returned to the program.

DB2 supports another clause named OPTIMIZE for NROWS, which does not limit the number of rows to be returned to a cursor, but may be helpful from a performance standpoint. Use the OPTIMIZE for NROWS clause to tell DB2 how to handle SQL statements. For example:

SELECT EMPNO, LASTNAME, SALARY

From EMP

WHERE SALARY > 10000.00

OPTIMIZE for the ROWS;

This tells DB2 to try to save the first 20 lines as soon as possible. This is useful if your Delphi application displays 20 rows at a time when data rows retrieved from the database are displayed.

For read-only cursors, use the FOR Read only clause to ensure that the cursor is unambiguous. Delphi cannot perform location updates in a DB2 cursor, so attaching a for-read only to each SELECT statement can make the cursor an unambiguous read-only cursor, which is useful for DB2. For example:

SELECT EMPNO, LASTNAME, SALARY

From EMP

WHERE SALARY > 10000.00

for READ only;

Conclusion

Understanding the fundamentals of SQL coding for optimal performance will increase the performance of your Delphi enterprise applications immediately. But I've only uncovered one tip of the iceberg. You need to learn more about the complex types of SQL, including joins, sub-selection, and syndication. You also need to learn how to best write these SQL statements and how to discover the access paths selected by DB2 to meet your SQL requests. Indeed, there is much to learn. But you've learned some of the basics of how to make the most of DB2 SQL and enjoy it.

How to get the best performance for DB2 enterprise applications

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.