An MQT that helps business intelligence

Source: Internet
Author: User
Tags hash ibm db2

Now, business intelligence and data warehousing products are becoming more and more powerful and diverse in bringing value to the business of the user. The core of these products is of course the data collected in the user's online trading system. One of the main steps is to consolidate, categorize, and summarize the data to extract useful information. A typical scenario is that users will repeatedly query the same static dataset. For example, the user wants to summarize yesterday's or last week's sales or a summary report at the end of a certain stage (such as month-end or year-end), while the next phase of online data is continuously accumulating. It would be great if a database management system could provide a mechanism to summarize and summarize data for users beforehand. By creating and using materialized query tables (materialized query table, MQTS), IBM DB2 for I5/os implemented this functionality for you.

Mqts was introduced in the V5R3 version of I5/os and was enhanced in the V5R4 version. A materialized query table can be viewed as a materialized view or a table that is automatically aggregated, and can be used to significantly improve the performance of complex queries by performing a partial query beforehand and storing the results. Instead of querying one or more specific tables, rather than querying an MQT, the query optimizer can identify mqts and rewrite the query implicitly to achieve better performance without requiring the end user or program to modify the SQL. In some cases, query performance with MQTS can be as fast as several orders of magnitude.

Mqts is especially useful for queries that use complex joins and aggregations because they read and process large amounts of data. Business intelligence and data warehouses load data in stages, which is a particularly applicable scenario for MQTS. The way data is processed by day, week, and Month provides a natural opportunity to refresh or update mqts. In this case, MQTS does not need to keep data synchronized with its corresponding table, and refreshing the MQT as long as it is part of a batch operation that already has a day end, a week end, or a monthly knot.

Find query statement

To illustrate how MQTS can improve the performance of queries such as multiple users querying the same set of tables for almost the same information, let's first look at the SQL statements under which performance might benefit from MQTS. For example, managers in 10 departments will check the sales trend data early in the week. This results in 10 groups of SQL grouped query statements looking for the same information. Running these queries every Monday morning is a waste of time and resources. Using the Mqts and query optimizer's rewrite query (query-rewrite) functionality, you can dramatically shorten response times and reduce resource usage while still completing individual departmental queries. By using a method that creates and imports an MQT that has all the sales data, in years, weeks, and divisions, the optimizer can then process each query in the MQT--10 query to access the MQT's summary data instead of reading and processing the detailed data in the corresponding base table. The end user or application does not need to change their query, and the optimizer overrides the query for it. In addition, as part of the weekend processing process, MQTS can be updated at that time to prepare a summary dataset for the Monday morning query.

To identify a potential set of SQL statements in such an application scenario, we can open the SQL query plan cache Viewer for the system I navigator. The SQL query plan cache contains all of the current query plans and related running state information. You can open this view by right-clicking the database folder in the system I Navigator and choosing SQL plan Cache-> "show statements" (Figure 1 below).

Figure 1

In this case, the user is interested in statements that are the longest running time in the user program. To find them, you can select the "statements that reference the following objects" filter and fill in the schema name and the column name in the table. You can finally find the query statement you want by refreshing the list of SQL statements and sorting them by the number of SQL executions. Users can also sort by the column of run time, so that you can see a list of statements from the longest to the shortest. In the statement list, several queries use the same few tables, and only the columns of the query and the nuances of using different query criteria are used. Based on the above two points, we know which statements are the candidate statements for creating an MQT.

Users are most interested in the most frequently run statements. As in the following example, this is a query statement used by a manager:
SELECT year, WEEK, DEPARTMENT, SUM (SALES), COUNT (DISTINCT order_no)
From orders_table
WHERE DEPARTMENT = ' ABC123 '
GROUP by year, WEEK, DEPARTMENT
Order by year, WEEK, DEPARTMENT;

When designing this scenario, the user already knows some details about the query and potential data: The orders_table contains 1 million data for 3 years, 52 weeks a year, 25 departments, and the data is grouped by Year,week and DEPARTMENT. There will be a maximum of 3,900 different groups.

Select the SELECT statement in the statement list and right-click to select Use SQL statement, which opens a Run SQL Script window and displays the SELECT statement. Select "Visualexpain"-> "description" in the "Run SQL Script" toolbar so that you can see the query optimization plan for this statement (Figure 2).

Figure 2

A table scan is performed and a temporary, separate hash table is made, and the result of the VE operation is to display a sorted list scan of the hash table.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.