Power of table expressions

Last Update:2018-12-07 Source: Internet

Author: User

Tags ibm db2

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Some of these table expressions may be familiar to you, but some may be unfamiliar. I will discuss the examples of table expressions for each purpose to demonstrate their power.

I was invited to promote SQL skills globally. I believe that you will learn at least one usage of table expressions, and what you have learned will have a certain impact on your application, so please continue reading.

Pre-filter all external connections

In recent yearsCompositionThe performance of all external connections has been enhanced by removing parts and adding parallel connections. However, its syntax is not as magical as the left and right connections,
For example, automatic push-down and pass-down of predicates. The full outer connection shown in Figure 1 does not produce the expected results. For example, all orders within the specified date range are given,
Whether or not these orders have a description of less than 40 characters and all descriptions of less than 40 characters, whether or not they have the corresponding order:

Figure 1. This statement does not produce the expected results

Select coalesce (O. order_no, 'order number not available ')
, Coalesce (D. DESC, 'description not available ')
From order o
Full join
Descriptions d
On O. order_no = D. order_no
Where o. order_date between '2017-01-01 'and current date
And length (D. DESC <40)

On the contrary, the results of the preceding SQL statements look like an inner join, because the syntax used in Figure 1 forces local filtering to be applied in the second stage -- in other words, it is applied only after connection.
This means that many rows to be involved in the connection operation will be deleted in later stages. To forcibly apply partial filtering before the connection (phase 1), you can use a table expression, as shown in figure 2.

Figure 2. Force filter before connection to obtain correct results

Select coalesce (O. order_no, 'order number not available ')
, Coalesce (D. DESC, 'description not available ')
From (select O. Order
From order o
Where o. order_date between '2017-01-01 'and current date) as O
Full join
(Select D. DESC, D. order_no
From descriptions d
Where length (D. DESC <40) as d
On O. order_no = D. order_no

The 'O' table expression explicitly pre-filters External tables, while the 'D' table expression explicitly pre-filters internal tables. This syntax will give all orders within the specified date range, regardless of whether these orders have a description less than 40 characters,
And all the descriptions with less than 40 characters, whether or not there are corresponding orders. Before join condition ensures that only the final result row is processed in the connection operation.

Pre-filter tables with left join and right join replacement null

For left join and right join, if the table with null replaced is filtered in the WHERE clause, it is a real inner join, which may not be what you want when writing the join.
If you really want a left or right connection, you have two options:

Put the partial filter of the table with null replaced in the on clause.

This will put the filter into a join condition (during-join condition ). However, this does not eliminate the final result line. For the join condition of a table that replaces null,
They only apply to "null out" rows (Reserved but null-filled rows), and the final result rows, including values from protected tables, it is pre-determined by the connection conditions.
See Terry Purcell's previous expert talk column to see examples of conditions in the connection.

Put the partial filter of the table with null replaced into a table expression.
This will only apply to the "null out" row in the table that replaces null. The query in Figure 3 demonstrates the use of table expressions. It is very likely that this is exactly what developers need:

Figure 3. Use a table expression to apply partial filtering before connection

Select O. order_no
, Coalesce (D. DESC, 'short description not available ')
From order o
Left join
(Select D. DESC, D. order_no
From descriptions d
Where length (D. DESC) <40) as d
On O. order_no = D. order_no
Where o. order_date between '2017-01-01 'and current date

The query in Figure 3 returns all orders within the specified date range, regardless of whether these orders have a description less than 40 characters.

Detach group

If you are developing on the DB2 platform, instead of the 0s/390 and Z/0 s platforms, you can skip this section, either the optimizer will automatically rewrite the group by statement to separate the group by operation and eliminate the sorting,
Alternatively, you can construct an automatic summary table (AST), which can eliminate sorting. For all large hosts (mainframe), you need to consider rewriting your own query.

General reports often have to provide detailed information and summary information. Group by is a very strict clause in DB2:

The first rule of group by is to select the rows to be grouped first.
The second rule is that all other rows must be clustered using the built-in column function (aggregate.

The combination of these two rules forces you to connect to the table to combine the detailed information and summary information into a result line, as shown in the following query:

Select C. cust_id, min (C. cust_name) as cust_name,
Min (C. cust_phone) as cust_phone, sum (S. Sales) as total_sales
From customer C, sales s
Where C. cust_id = S. cust_id
And S. sales_date between: Date-Lo and: Date-Hi
Group by C. cust_id

This usually means that before processing the group by operation, the compliant detailed rows from each table need to be connected in advance. In this example,
There are 1000 rows from customer connected to 200,000 rows from sales, and then the group by operation is processed. On the contrary,
You can use a table expression to force the optimizer to perform clustering operations on a table. The query shown in Figure 4 explicitly advances the optimizer's group by processing:

Figure 4. Move group by to table expression to reduce the number of rows to be connected

Select C. cust_name, C. cust_phone, S. total_sales
From customer C,
, (Select S. cust_id, sum (S. Sales) as total_sales
From sales s
Where S. sales_date between: Date-Lo and: Date-Hi
Group by S. cust_id) as s
Where C. cust_id = S. cust_id

In this example, the table expression's explicitly separates the group by operation. In this way, only one or two tables need to be clustered. The detailed rows come from different tables.
This syntax allows the optimizer to search for the index support on the cust_id column of the sales table to execute group. If possible, sorting will be avoided. Whether or not the sorting is performed,
This greatly reduces the number of rows involved in the connection. In the preceding example, 1000 rows from customer are no longer connected to 200,000 rows from sales,
Currently, only 1000 rows from customer are connected to 10,000 sales summary rows. The group by statement compresses the table expression results to some summary rows.

Generate data from sources

At this point, one of the most powerful purposes of table expressions can be explained by a customer: "When we need data not provided in DB2, we use table expressions to create such data.
The query shown in Figure 5 contains a table expression that calculates the sales volume of the group that should be generated in the table expression.

Figure 5. Generate subcategory data in the table expression

Select C. subcategory, sum (S. Sales) as S. total_sales
From sales s,
(Select C. cust_id, substr (C. colx,: hvstart,: hvlngth) as subcategory
From customer C
Where C. cust_region between: reg-Lo and: reg-high) as C
Where C. cust_id = S. cust_id
Group by C. subcategory

This query is very effective for periodic reports, and can be competent even if sorting is required. This query is also great when you need to determine whether there is any implicit meaning in the column value. However,
If the end user needs to frequently request a public start position and length, that is, substr (C. colx, 2, 4), add this part to the table as its own column.
This allows an index for the new column to avoid sorting.

Figure 6 shows an example of generating a correct quarter for a quarterly report using a table expression.

Figure 6. Generate subclass data for the quarterly report

Select C. subcategory, sum (S. Sales) as S. total_sales
From sales s,
(Select C. cust_id,
(Case when month (sale_dt) between 2 and 4 then 'q1'
When month (sale_dt) between 5 and 7 then 'q2'
When month (sale_dt) between 8 and 10 then 'q3'
Else 'q4' end) as subcategory
From customer C
Where C. cust_region between: reg-Lo and: reg-high) as C
Where C. cust_id = S. cust_id
Group by C. subcategory

As long as the data generated by the as identifier is used in the table expression, the generated data can be referenced anywhere outside the table expression. This means that you can also sort by the identifier,
You can put it in the calculation, or place it anywhere the expression can appear. We can extend the example in Figure 6 to use host variable to process more options,
See Figure 7.

Figure 7. Use host variables for scalability

Select C. subcategory, sum (S. Sales) as S. total_sales
From sales s,
(Select C. cust_id,
(Case when month (sale_dt) between: hv1a and: hv1b then: hv1
When month (sale_dt) between: hv2a and: hv2b then: HV2
When month (sale_dt) between: hv3a and: hv3b then: hv3
When month (sale_dt) between: hv4a and: hv4b then: hv4
When month (sale_dt) between: hv5a and: hv5b then: hv5
When month (sale_dt) between: hv6a and: hv6b then: hv6
When month (sale_dt) between: hv7a and: hv7b then: hv7
When month (sale_dt) between: hv8a and: hv8b then: hv8
When month (sale_dt) between: hv9a and: hv9b then: hv9
When month (sale_dt) between: hv10a and: hv10b then: hv10
When month (sale_dt) between: hv11a and: hv11b then: hv11
Else: hv12 end) as subcategory
From customer C
Where C. cust_region between: reg-Lo and: reg-high) as C
Where C. cust_id = S. cust_id
Group by C. subcategory

Now this query can process endless requests for monthly groups-what do you want to group by now? The first four months as a period of time, the next two months are classified as another period,
What should I do if I add 6th months to the third period and the remaining month to the fourth period? Building applicationsProgramAlways remember this.
It takes several hundred views to simulate the same options. That's why most of my clients have lost their views from the production application and are only allowed to appear in the end user environment.

For greater power and flexibility, you can add a user-defined function, as shown in figure 8.

Figure 8. Use UDF to allow access to non-Relational Data

Select C. subcategory, sum (S. Sales) as S. total_sales
From sales s,
(Select C. cust_id, udfunc2 (: hvparm1,: hvparm2) as subcategory
From customer C
Where C. cust_region between: reg-Lo and: reg-high) as C
Where C. cust_id = S. cust_id
Group by C. subcategory

Now, this query can access data from any available sources on the large host platform. For example, udfunc2 can access data stored in flat files in a distributed environment,
Reference it in the table expression 'C' and call it subcategory. You can also use group by for it. What do you really want to use group by now?
Whenever you need to obtain or create data that is not stored locally in the DB2 environment, you need to perform relational operations (join, union, subquery, order by, group by, and so on)
Including this data and this data is only used for one query execution, you should think of using Table expressions. For multiple executions or iterations,
We recommend that you use a global temporary table (this is the topic to be discussed one day later ).

Use table expressions to help optimize queries

Query optimization is a big topic, so here I only discuss the role of table expressions in optimization. First, remove the view from sysibm. sysviews and add it to the from clause,
This helps the query optimizer better understand the target of the query. This is important if you are working early in the morning. The indirect benefit of removing a view is that,
Using host variables provides greater scalability.

To demonstrate the use of table expressions for prospective optimization, I will show you a query, introduce some performance problems one by one, and then give a solution to the problem. The first query is the join of the following five tables:

Select columns ?.
From tabx, taby, tabz, tab1, tab2
Where join conditions

The problem in the preceding query is the table connection sequence. The optimizer analyzes the number of rows participating in the join operation in each table, estimates the cost of different sequence and connection methods, and keeps starting with the combination of tabx, taby, and tabz.
The expected connection sequence starts with tab1 and tab2. There are many methods that affect the connection sequence for static queries. However, dynamic queries are not so lucky. Fortunately, there is a table expression!
By using the distinct keyword, table expressions force the optimizer to connect to the table in advance (also requires a sort) and thus play its role. As follows:

Select columns?
From tabx, taby, tabz
, (Select distinct col1, col2 ?.
From tab1, tab2
Where join conditions) as prejoin
Where remaining join conditions

Now the query:

Connect tab1 and tab2.
Sort to remove possible duplicate rows.
Process other tables.

The only drawback of this technology is that sorting and creating and scanning a logical working file brings additional overhead. But in most cases, this disadvantage can be compensated by the optimized connection sequence.

The following scenarios are more likely to appear on non-OS/390 platforms, but this solution is applicable to all platforms. In the following example, the optimizer selects the entire table expression T2,
Then use the combined scan connection technology to connect to T1:

Select columns from either set of data
From Table1 T1,
(Select t2.student _ year, sum (C8) as sum8, max (C9) as max9
From Table2 T2
Group by t2.student _ year) as T2
Where t1.major _ id = t2.major _ id

This solution requires that the keyword table be inserted before the table expression. This allows table expressions to be associated with external tables. This mutual association will affect the optimizer's tendency to nested loop connections, and greatly reduce materialized.
In the following example, a table expression can be associated with an external table by moving the join conditions into a table expression:

Select columns from either set of data
From Table1 T1,
Table (select t2.student _ year, sum (C8) as sum8, max (C9) as max9
From Table2 T2
Where t1.major _ id = t2.major _ id
Group by t2.student _ year) as T2

This query now processes a t1.major _ id at a time. Only the rows that match the major_id are materialized in the logical work file. The smaller the working file, the optimizer selects nested loop connections.
If you try this solution on DB2 on the OS/390 or Z/ostm platform, be careful because this solution does not always improve performance. Check the explain output and use the evaluation tool,
Or run a benchmark test to verify whether the optimization operation was successful.

Conclusion

If your goal is to reduce programsCodeAnd increase the processing throughput, so powerful and practical table expressions are indeed very helpful. If you can use a table expression to generate new data,
In addition, retaining table expressions in the DB2 engine to execute connections, groups, sorting, and computing can reduce a lot of program code, because no more than one cursor needs to be declared. If you add host variables to table expressions,
It is easier to increase the scalability of a query. Optimizing queries, separating group by operations, and pre-filtering tables for external connections can help increase the processing throughput. Select it with you.
This technology has been around for a while and you have to make good use of it. I hope that you will spread your new findings locally to help me update SQL skills around the world.

To top

About the author

Sheryl Larsen is an internationally renowned scholar, consultant and lecturer specializing in DB2 and is known for his extensive expertise in SQL. Sheryl has more than 16 years of experience in the DB2 field,
Published manyArticleAnd some popular DB2 posts, and a book written with people: DB2 answers, Osborne-McGraw-Hill. At the 1999 & 2001 idug conference,
She was elected best overall speaker and served as executive editor of The idug solutions journal magazine from 1997 to 2000 ). Currently,
She is the chairman of the Midwest database users group (mwdug.org), a member of the IBM DB2 gold consultants program, and also the President of Sheryl M. Larsen (www.smlsql.com,
This company specializes in advanced SQL consulting and education. You can contact her through SherylMLarsen@cs.com.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More