Discussion on ODB2 System Performance Optimization

Last Update:2013-12-16 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

The following article mainly introduces the in-depth study of ODB2 system performance optimization. We all know that DB2 database is a high-performance large-scale relational database management system, which is mainly used in the customer/server architecture. The criteria for evaluating DB2 System Performance Optimization include throughput, response time, and parallel capability.

Design Database

1. Familiar with business systems

The degree of familiarity with the business system has a great impact on the performance of the entire database system. A designer who is unfamiliar with the business, despite having a wealth of database knowledge, it is also difficult to design a database application system with the best performance.

2. Normalization and non-Standardization

After the database is normalized, data redundancy is reduced, data volume becomes smaller, and data rows become narrower. In this way, each page of DB2 can contain more rows, so there is more data in each partition, which accelerates table scanning and improves the query performance of a single table. However, when querying multiple tables, you need to use a lot of Join Operations to combine the information from each table, resulting in higher CPU and I/O costs.

In most cases, we need to maintain a balance between normalization and non-standardization, use appropriate redundant information to reduce system overhead, and use the space price in exchange for the time cost. There is an order information table OrderDetail, which records the courier information, cashier information, item information, price policy, customer information ..... The information is stored in the courier information table, cashier information table, item information table, price policy table, and customer information table.

According to the standardization requirements, OrderDetail must be connected to so many tables or nested queries. If the data volume in the OrderDetail table is millions, the time required for a query may be several hours. In fact, as long as the logic validity of the data is ensured during the design, a lot of information can be directly redundant in the OrderDetail table. These redundant data can greatly improve the query efficiency, this reduces CPU and I/O operations.

3. Data striping

If the number of records in a table exceeds a certain size, the most basic query operation will also be affected. You need to divide the table according to the date level, divides recent and most frequently used data into historical and infrequently used data, or by geographic location or department. There is also a way of partitioning-vertical partitioning, that is, dividing a table with many attribute columns into several small tables, such as placing frequently used attributes in a table, attributes that are not frequently used are placed in another table, which can speed up table scanning and improve efficiency.

4. Select a data type

The data type selected for each attribute depends on the requirements of the table to a large extent. However, selecting an appropriate data type without violating the requirements of the table can improve the system performance. For example, a text column stores the information of a book, BLOB instead of character (1024), BLOB stores pointer or file reference variables, and real text information can be stored outside the database, this reduces the database storage space and increases the running speed. DB2 provides the UDT (User-Defined PES ypes) function. You can define your own data types as needed.

5. Select an index

An index is an important data structure in a database. Its fundamental goal is to improve query efficiency. Currently, most database products adopt the ISAM index structure first proposed by IBM. You can use indexes to quickly, directly, and orderly access data. Although indexing accelerates queries, it also reduces the data update speed, because new data must be added not only to tables, but also to indexes. In addition, indexing requires additional disk space and maintenance costs. Therefore, you must use the index properly:

Indexes are created on Attribute columns that are frequently connected but not specified as foreign keys.

Indexes are created on columns that are frequently sorted or grouped (that is, group by or order by operations. Sorting or grouping by index can improve the efficiency.

Create a search for columns with different values that are frequently used in conditional expressions. Do not create an index for columns with fewer values.

If there are multiple columns to be sorted, you can create a compound index on these columns, that is, the index is composed of multiple fields.

Query Optimization

Currently, database products have better performance in optimizing system query DB2 system performance. However, because SQL statements submitted by users are the basis of system optimization, it is hard to imagine that a bad query plan will become efficient after the system is optimized. Therefore, the merits and demerits of the statements written by users are crucial. The following describes solutions to improve the user query plan.

1. Sort

In many cases, duplicate sorting of large tables should be simplified or avoided. When indexes can be used to automatically generate outputs in an appropriate order, sorting steps can be avoided. When the following situations occur, sorting cannot be omitted:

The index does not contain one or more columns to be sorted;

The order of columns in the group by or order by clause is different from that of the index;

Sort columns from different tables.

In order to avoid unnecessary sorting, We need to correctly add indexes and reasonably merge database tables. Although Table normalization may be affected, it is worthwhile to Improve the efficiency. If sorting is unavoidable, you should try to simplify it, such as narrowing the range of sorting columns.

2. Primary Key

Using an integer type for a primary key greatly improves query efficiency, and the comparative overhead for a primary key type is much larger than that for an integer type. Using primary data as a primary key will reduce the efficiency of data insertion, update, and query. This reduction may not be noticed when the data volume is small. However, when the data volume is large, small improvements can also increase the system response speed.

3. nested Query

In SQL, a query block can be used as an operand OF THE predicates in another query block. Therefore, SQL queries can be nested. For example, in a large distributed database system, the Order table Order and Order information table OrderDetail are available. If two tables need to be associated for query:

 
 
  
  SELECT CreateUser  
  
  FROM Order  
  
  WHERE OrderNo IN  
  
  ( SELECT OrderNo  
  
  FROM OrderDetail  
  
  WHERE Price=0.5)

In this query, find out the list of subscriber whose unit price is 0.5 yuan. The lower-level query returns a group of values to the upper-level query, and then the upper-level query blocks continue to query based on the values provided by the lower-level blocks. In this nested query, each OrderNo value in the upper-level query and all lower-level queries must scan the OrderDetail table, and the execution efficiency is obviously not high. This query contains two layers of nesting. IF 1000 rows are queried at each layer, 1 million rows of data are queried. In the system overhead, 82% of table Order scans and 16% of table OrderDetail searches. If we use a connection instead, that is:

 
 
  
  SELECT CreateUser  
  
  FROM Order,OrderDetail  
  
  WHERE Order.OrderNo=OrderDetail.OrderNo AND Praice=0.5

Therefore, the table Order scan accounts for 74%, and the table OrderDetail search accounts for 14%.

In addition, the label of a column appears in both the primary query and the query in the where clause. It is very likely that after the column value in the primary query changes, the subquery must perform a new query. The more nested query layers, the lower the efficiency. Therefore, avoid subqueries as much as possible. If the subquery is unavoidable, filter as many rows as possible in the subquery.

4. wildcard characters

In SQL statements, the LIKE keyword supports wildcard matching, but this matching is especially time-consuming. For example, SELECT * FROM Order WHERE CreateUser LIKE'm ___'. Even if an index is created on the CreateUser field, the sequence scan is used in this case. If the Order table contains 1000 records, it needs to be compared for 1000 times. If you change the statement to SELECT * FROM Order WHERE CreateUser> 'M' AND CreateUser <'n', the query will be executed using indexes, which will obviously increase the speed.

5. distinct

Distinct is used to ensure that duplicate values are not displayed in the result set, but distinct generates a worksheet and sorts it to delete duplicate records, which greatly increases the number of queries and I/O operations. Therefore, do not use the distinct keyword.

6. Negative Logic

The negative logic is like! =, <>, Not in, and so on will cause DB2 to use table scanning to complete the query. When the table is large, the system performance is seriously affected and can be replaced by other operations.

7. Temporary table

When a temporary table is used, the database will create a corresponding data structure in the disk, because the memory access speed is much higher than the access speed of external memory, when a temporary table is used in complex queries, the intermediate results will be imported into the temporary table, which greatly reduces the query efficiency. In addition, in a distributed system, the use of temporary tables also results in synchronization between multiple query processes. Therefore, it is best not to use temporary tables for complex queries.

8. Stored Procedure

The Stored Procedure Builder in DB2 can generate a Stored Procedure, run and test the Stored Procedure. Stored Procedures can contain massive and complex queries or SQL operations, which are compiled and stored in the DB2 database. When you use the same SQL operation for multiple times, you can first make these SQL operations into a stored procedure and directly reference its name for calling where necessary.

During the first execution of the stored procedure, a query scheme for DB2 system performance optimization was established. DB2 stored the query scheme in the cache, which can be executed directly from the cache when calling the runtime, the DB2 system performance optimization and compilation stages are eliminated, saving the execution time and improving the efficiency and System utilization.

The optimal query scheme is usually not feasible according to certain criteria, and should be selected through comparison based on actual requirements and specific situations. The Query Patroller provided by DB2 can compare the Query costs of different Query schemes. By tracing Query statements, it returns the system overhead of different Query stages to make the best choice. The Performance Monitor provided by DB2 also monitors the Performance of the entire database system, including I/O time, queries, sorting time, and synchronization read/write time.

The concurrency control of the database system can also affect the system performance. Simultaneous operations by multiple users may result in data inconsistency. To prevent data loss and access to uncommitted data due to simultaneous modifications, and to protect read data, DB2, use the Lock mechanism to implement control.

DB2 can lock table spaces, tables, columns, and indexes. The larger the lock granularity, the simpler the lock, the smaller the overhead, and the lower the concurrency; the smaller the granularity, the complicated lock mechanism, the large overhead, and the high concurrency. If the resources to be allocated are locked during concurrent processing, the system suspends the process and waits. If a time-consuming query operator runs on a frequently used table, using the table level-1 lock means that the entire system can continue to run after your query is complete.

Therefore, avoid using table-level locks in complex queries. What should I do if there is a need for this category? View can be used to solve this problem. The view avoids direct operations on the table and ensures the efficient operation of the database.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Discussion on ODB2 System Performance Optimization

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Discussion on ODB2 System Performance Optimization

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support