DB2 Data Warehouse Optimization

Source: Internet
Author: User

 

Differences between OLAP and OLTP Environments

1 The performance tuning in the DB2 data warehouse environment is different from that in the OLTP environment, mainly manifested in the following aspects:

1.1 transaction length. Transactions in the OLTP environment are usually short transactions, including multipleSQLStatements, which can often be completed within several seconds. In a data warehouse environment, it is often a long transaction, and only a few SQL statements are executed, but it takes several minutes or even hours to run.

1.2 The data update time is different. In the OLTP environment, data update operations occur at any time. For Data Warehouse applications, data is often updated at night, and a large number of extract, transform, and load (ETL) operations are often required.

1.3 The result set size is different. In the OLTP environment, query statements usually only obtain a small amount of data rows. Data Warehouse queries, especially those used to generate reports or perform multidimensional analysis, may return millions of rows of data.

1.4 The query complexity is different. The query statement in the OLTP environment is relatively simple. Generally, only a few tables are accessed. However,DatabaseThe query statements in the warehouse environment are complex and may contain association of more than a dozen tables, nested or common table expressions, recursive SQL statements, and Case expressions.

1.5 SQL statements are generated in different ways. In an OLTP environment, you can write SQL statements by yourself. In a data warehouse environment, OLAP tools often generate SQL statements and rarely have the opportunity to modify them before execution, DBA is responsible for setting up an appropriate DB2 runtime environment to make these queries run efficiently.

2. Set the target

During optimization, we must not only pursue good performance, but also ensure that the direction is correct. If a query runs for a long time but is acceptable to the user, the query may not need to be optimized. On the contrary, a frequently executed query is expected to be completed within one second, even if the running time is only 10 seconds, it will cause user complaints.

To determine where optimization is required, it is best to listen to the user's voice and give priority to the queries most concerned by the user. if the user has no complaints, then, you can take the time to adjust the queries with long running time and high execution frequency. Are full table scans required to evaluate the access paths of these queries? Do you want to use indexes? Which connection method is used to connect tables?

To correctly evaluate these access paths, you need to obtain statistical information from the DB2 catalog. If the statistical information in the cataloguing is inaccurate, use runstats to update it. In my opinion, DB2 has the best query optimizer on the market, but it must have enough statistics to make reasonable access path decisions. The best practice is to run runstats regularly or let DB2 automatically collect statistics.

For more information about slow query, see Chapter 9.

2.1 physical Optimization

Establish a DB2 data warehouse environment that helps improve query performance. Physical adjustments should be made in terms of CPU, storage, and memory:

2.1.1 Parallelism

We learned from section 10.2.1 that DB2 usually runs on SMPServerAnd MPP server cluster environment. For these two environments, the parallel capability of multi-core CPU should be used to improve query performance.

In an SMP server environment, a CPU usually has multiple kernels. DB2 can split the work required to process queries into fragments and execute these fragments in parallel, which significantly reduces the query time. Pass

DatabaseManagementIf the value of the program parameter intra_parallel is set to yes, the query concurrency of the SMP server can be enabled.

In addition to parallel queries on SMP servers, DB2 can also improve query performance through MPP Server clusters. Readers can configure the MPP environment according to section 10.2.

2.1.2 Storage Planning

The storage design ensures that each data BPU has its own set of physical disks and that all data objects in the data warehouse are evenly distributed on these disks. This design pattern ensures that all data objects in each BPU perform I/O operations in a balanced manner.

To avoid read/write conflicts between data and logs, data and logs should be stored on different disks. In addition, the master log file and the Image Log File are stored on different disks, so that when a disk fails, another log file can be used.

2.1.3 reasonably configure the buffer pool

For I/O-intensive data warehouse workloads, the larger the buffer pool, the better the performance is generally. This is because as the buffer pool size increases, the number of disk read I/O activities will decrease.

However, for 32-bit systems, the memory space is limited to 4 GB. To use more memory, we recommend that you use a 64-bit system first. If a 64-bit server has a large amount of memory resources, it should be allocated more

Store resources for the DB2 data warehouse. I have seen that on a server with 32 GB system memory, only MB buffer pool is configured for DB2, which is too small and 10-20 GB is suitable.

2.2 logical Optimization

Logically, we need to use the partition, index, MDC, MQT, and other features provided by DB2 to improve performance. The specific optimization suggestions are as follows:

2.2.1 partition a large table.

There are no absolute rules for which tables should be partitioned. For example, you can partition a large table with 1 million or more rows. Select a partition key for the table, which can be a single column or multiple columns, depending on your needs. For example, you can select a date column as the partition key.

2.2.2 use indexes.

In the OLTP environment, indexes will slow down the insert, update, and delete operations. Therefore, the index creation is often very conservative. Generally, we recommend that you have three to five indexes.

Compared with OLTP databases, the vast majority of operations in data warehouses are queries. Therefore, it is generally meaningful to create more indexes. We recommend five to ten indexes. However, do not use a data warehouse

Too many indexes are created in the database or table. Otherwise, the ETL data update performance will be affected.

2.2.3 use the MDC features.

In a data warehouse environment, a large number of rows are often obtained. For example, how to obtain all the sales records of a customer in a region in the past five years? In this case, you can use the multi-dimensional cluster MDC feature provided by DB2

You can obtain rows with the same dimension value in batches.

2.2.4 use the materialized query table (MQT) function.

MQT is a result set defined by a query statement, so that you do not need to dynamically build this result set during query execution. MQT has three advantages:

DB2 can automatically rewrite the query to use MQT.

The result set in MQT has been created. The query accessing this MQT through DB2 rewriting does not have to spend time dynamically constructing the result set during query execution, which significantly reduces the running time.

You can define indexes on MQT, which further improves query performance.

In a data warehouse environment, you can use MQT to improve performance for some common complex reports or analysis applications.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.