DB2 Executive Plan Introduction

Source: Internet
Author: User
Tags db2 joins sorts cpu usage

During database tuning, SQL statements are often the main cause of performance problems, while execution plans are the languages that explain the execution of SQL statements, and only a full understanding of the execution plan can perform well in database performance optimization.

In a common relational database, although execution plans are represented differently, the principle of execution is similar. In my opinion, the execution of SQL statements contains a total of two key links:

  • How to read data (scan): includes table scan and Index scan

  • How tables are connected (join): includes nest Loop, Merge join, Hash

    Join and semi-join, Inter-table connection order selection

By mastering these two key links, we can quickly identify the execution logic of SQL statements in the current database and discover the problems and pitfalls in the execution plan. Because the execution plan is represented differently between different databases. for the DB2 database, many people do not understand the meaning of the implementation plan, and then we will discuss in detail the various points of the execution plan in the DB2 database through the actual case.

01

How data is read

When it comes to reading data, it is inseparable from the physical objects in the database: tables and Indexes . So when it comes to the way data is read, simply understand the table scan and index Scan, you can basically grasp the source of the data.

02

Full table Scan

In DB2 LUW There are four ways to get the execution plan (DB2EXPLN tool) of the statement, although the tool is different, but the content of the presentation is basically similar, the biggest difference is the level of detail. The verbosity by large to small sorts is:. to facilitate the presentation, we are here to discuss only the execution plan information shown through DB2EXPLN. first, let's look at a complete execution plan for a DB2EXPLN crawl.

The information presented by DB2EXPLN is divided into three parts, namely the current SQL statement, execution plan details and execution plan diagram (to be shown by adding the-G parameter), and the first part is the current collection execution Plan statement (see figure 1-1):

Figure 1-1

This part of the content is basically at a glance, need to pay attention to is inside: HV ...: HI ... Information is the information that DB2 the parameter (which can be ignored).

Next, let's look at the most critical part of the execution plan details (see Figure 1-2):

This section first includes the character set (codepage) for the current environment, and the following is an estimate of the cost and the number of rows returned based on statistical information. In this example, the execution cost is 124470 and the number of rows returned is 1 rows. The position starting at line 6th is more important, and we take the line-wise explanation:




The third part is the execution Plan diagram, we can quickly and directly see the execution of the SQL statement through the plan diagram, the execution plan chart reading order is from bottom to top, from left to right, according to the number from large to small order to read. For example, in this example, first look at the third step, show the table scan operation (TBSCAN), and then the results of the scan group by operation and return the final result, this SQL statement is completed.

03

Index Scan

Next look at an example of an index scan, in order to quickly understand the execution plan, we can see the execution plan diagram, the SQL statement first read the index, get rid to the table to obtain additional data, the group by after the results returned.

The other part is similar to the previous one, not a detailed introduction, mainly look at the relevant details of the index scan. You can see from the information below that there are 4 fields in the index used, but this SQL only uses one field. None of the other three fields are used. If there are other indexes on the table that contain more of the fields used by this SQL, this index is definitely not the best choice.

There are more details about how the data is read, but no matter how the data is read, the core of the content is where the data is read, and simply there is no better index to replace the current scanning strategy, so when we optimize the SQL statement, The first step is to consider whether the current read method is valid enough.  

04

How tables are connected

Next we talk about the connection between tables, the children's shoes that have written SQL know that there are many ways to join when writing sql: Inner join,left join,right join,full join, etc., including some sub-queries, such as exist or in. For star queries, DB2 10 also supports Zzjoin.

05

Nest Loop (NLJOIN)

Nest loop is the simplest way to connect, the database will select the table and the appearance according to the number of records in the table, after the definition of the appearance, the appearance of the first full table scan, and then repeatedly scan the inner table and with each record in the appearance to match, and ultimately return the result set required by the program.

Therefore the total cost of nljoin is approximately the cost of the appearance scan + the number of rows returned by appearances x the cost of the internal table scan. Nljoin as the most use of the connection mode, when the appearance of the number of matching rows or the difference between the table row number is relatively high efficiency, but also because of the nljoin operation mode, also often occur performance hidden trouble.

06

Merge Join (Msjoin)

The merge connection is a way to resolve some of the problems that exist in the nest loop, and Msjoin sorts the two tables that need to be connected, and then matches the sorted result set in a cross-up fashion, eventually returning the concatenated results.

The total cost of msjoin is approximately the cost of a single external scan and the cost + sorting cost of a single internal table scan. Msjoin common scenarios are usually where SQL needs to return a sorted result, or if the main appearance is larger, and msjoin can only be applied to cases where the SQL statement contains a unique join predicate, when the primary outer order of magnitude is large and the index exists on the join predicate. Msjoin is more efficient (avoids sorting costs), usually msjoin is stable, even if the statistical information estimation error, will not lead to greater deviation in execution efficiency.

07

Hash Join (Hsjoin)

Hsjoin is a more advanced way to connect a hash table based on the connection predicate before connecting it to the inner table and returning the result. Similar to Msjoin, Hsjoin also only scans the internal surface separately, and Hsjoin also supports multiple join predicates. The two large tables are highly efficient when connected by a multi-join predicate.

Hsjoin the scanning cost is about the cost of the internal table scan + the external scan cost. However, it should be noted that the resulting hash table will be stored in the sorting heap, and once the sort heap memory overflows, additional physical IO will be generated, which requires special attention.

08

Half Connection (Semi-join)

The semi-connection is a strange connection, in many of the data is not divided into the connection mode, because sometimes, from the execution plan does not see the connection operator at all, such as the following SQL:

This is a typical subquery, we can guess the logic from the SQL statement, first read the table in the subquery, and then match the external table based on what is returned and return the result. However, no information about the connection can be seen from the execution plan diagram.

The execution plan diagram does not show any join information, except that more than one object is fetch, but more detail can be seen from the text description.

Stream 1 will first perform a full table scan of the internal table (Any/allsubquery), and the read result matches the external table, and the results are not resumed immediately after matching to the result (EXISTS subquery).

09

Selection of connection order between multiple tables

No matter how many table joins are included in the same SQL statement, only two tables are connected at the same time, but the order of connections between the multiple tables is the main reason for performance. Database selection of the order of the table, often based on the number of rows between two tables to be sorted, if the statistical information and the actual deviation, there may be due to improper connection sequence caused by the performance problems.  

10

Summarize

Through the analysis of the execution plan, we explained the main points of the SQL execution, but how to maintain the efficient and stable SQL in the production, but also need to have a deeper understanding of the execution plan. Here are some of the answers to some of the students ' common questions:

Q & A

Q1: in the query, there is a driver table, usually the first table after the from, after a heap left connected to the right, how to choose this driver table? Do you have any effect on performance? Does the order of our own do not affect the execution plan?

A1: in the database, according to the situation of the current table for the selection of internal and external tables, the SQL statement can only be used to determine the connection order to a certain extent, but does not make connections inside and outside the table decision.

For example, Select*from A, (Select*from b,c where b.id=c.id) where ..., such as this SQL, in the notation of the need to first connect the B-C table, and then connected to a table, but in the connection time and the way the table outside the choice of appearance , are determined by the database.

Q2: The choice of connection mode is determined by the sorting of the connected two tables and the connected fields?

A2: This is not absolute, but it will be one of the factors of choice.

Q4: access plan for accessing a table has changed, the statistics have not changed, what is the situation? Is this optimizer auto-tuning? But the optimizer generates an access plan based on statistics, so the truth should not change?

A4: The execution plan selection is based on database parameters and statistics as a reference, but the database collects some physical information during the compilation process. The physical distribution of data, for example, can have an impact on how scans are scanned.

Q5: What is this physical information, the tablespace information?

A5: a condition in which a table is stored in physics.

Q6: What means is there to track a SQL complete execution process, including what you say about dynamically collecting physical information?

A6: can catch trace or stack. DB2TRC, and Db2pd–stack.

Q7: teacher, DB2 Optimizer is the more complex SQL support the better? Do you have this statement?

A7:The DB2 Optimizer's support for complex SQL should be the best in relational databases, but for online trading systems, I think the stability of SQL is more important. However, there are too many variables involved in complex SQL, and any change in the statistics of a table can lead to a decrease in SQL performance, so it is not recommended to write complex SQL in online trading systems.

Q8: What should we pay attention to when we write SQL? The NL join is similar to the Cartesian set, with the highest time complexity, followed by the merge. I don't think I can avoid it from SQL because by choosing that column, you basically determine the connection type.

A8: It's hard to decide what connection to use when writing SQL, but there are some areas to be aware of, such as avoiding multiple large table joins, which can be done during development.

Q9:Hash connection, if the probe table is large, the built-in table is very small, the cost of hash is very high, because the probe table is scanned, no index, this how to optimize, can only reduce the return set of the probe table?

A9: You can create an appropriate index on the probe table.

Q10: Do I need to do rbind after updating the table?

A10: This depends on whether your application is static SQL or dynamic SQL. Static SQL words execution plan is kept in the database when bind, the statistic information is recommended rebind after updating, but the dynamic is not necessary.

Q11: usually predicates appear in the first field of the index should be a valid index, but sometimes the index exists, but a composite index, run Db2advis it is recommended to create a new single index on this predicate, why the database does not use the existing composite?

A11: Composite indexes are not necessarily efficient, and this needs to be judged by the distribution of the data, if the clusterration of a single index is very good (that is, the order in which the tables are stored is very high), so that a lot of prefetching can be used, and performance is much better than synchronous reading.

Q12: embedded C, C + +, COBOL package BND (including static SQL), to bind, the user SP also recommended binding it?

A12: The cost of UDI is in fact largely related to the design of the table. For example, when making DML statements, there is a row overflow and page reorganization, which consumes much more than the * * index. Related information can be seen DB2PD-TCB or snapshot for table.

Q13: What advice would you recommend for table compression? For example, to enlarge the table of compression, there are some quantitative indicators for reference, because some of the table opened the compressed batch * * More records when the shadow extended batch 1/3 of the time.

A13: for compression, the bottleneck of the current database needs to be analyzed. Compression reduces disk IO at the expense of the CPU, which is certainly helpful if the bottleneck is on disk IO, but if the bottleneck is only worse on the CPU.

Q14: adjust the Appendon? Are there some indicators for quantification?

A14: This is not very good quantification. For disk IO bottlenecks, you can start with indexes, statements, and even table designs. If all is well adjusted, there is an IO bottleneck, and the CPU usage is lower (below 30-40). You can consider compression.

DB2 Executive Plan Introduction

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.