BKJIA Summary: IBM provides DB2 Express-C free of charge to the community. It offers the same core data features as DB2 Express Edition and lays a solid foundation for building and deploying applications.
DB2 performance optimization is a complex and comprehensive task. It requires fully exploring and thinking through the root cause of a problem, as well as deep database administration experience and tuning knowledge, which can be difficult for beginners. In many cases, however, as the data volume in a DB2 database grows or the number of users increases, the performance of the database system drops significantly, and quickly locating the performance bottleneck becomes critical. The following briefly introduces some factors, tools, and principles of DB2 tuning so that beginners can gain a general understanding of performance optimization.
DB2 performance optimization can be analyzed in three aspects: memory, CPU and I/O.
In terms of memory, the main consideration is the buffer pool (BUFFERPOOL). The buffer pool is a memory area used to cache data and index pages read from disk; data and index information is written back to disk after being processed in the buffer pool. Buffer pools come in four page sizes (4 K, 8 K, 16 K, 32 K), corresponding to tablespaces of the same four page sizes. The size of the buffer pool determines how much data can be cached from disk, but larger is not always better: an oversized buffer pool can make connecting to the database take too long, because memory for the buffer pool must be allocated when a connection is established. Buffer pool efficiency can be evaluated by calculating the hit ratio: buffer pool hit ratio = (1 - (data physical reads + index physical reads) / (data logical reads + index logical reads)) * 100%. The higher the hit ratio, the more effective the buffer pool; a hit ratio that is too low indicates the buffer pool is too small and should be enlarged. The physical and logical read counts for data and indexes can all be obtained from a buffer pool snapshot.
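The hit-ratio formula above can be turned into a small helper. A minimal sketch in Python, assuming the four counters have been read from a buffer pool snapshot (the names in the docstring are the usual monitor element names):

```python
def bufferpool_hit_ratio(data_p_reads, index_p_reads, data_l_reads, index_l_reads):
    """Buffer pool hit ratio = (1 - physical reads / logical reads) * 100%.

    The four counters come from a DB2 buffer pool snapshot, e.g.
    pool_data_p_reads, pool_index_p_reads, pool_data_l_reads,
    pool_index_l_reads.
    """
    logical = data_l_reads + index_l_reads
    if logical == 0:
        return None  # no activity yet, ratio is undefined
    physical = data_p_reads + index_p_reads
    return (1 - physical / logical) * 100.0

# 1,500 physical reads against 100,000 logical reads
print(round(bufferpool_hit_ratio(1000, 500, 80000, 20000), 1))  # 98.5
```

A ratio much below the high nineties on an OLTP system is usually the cue to grow the buffer pool.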
Other important memory factors are SORTHEAP, LOCKLIST, and LOGBUFSZ. The sort heap is used when a query result requires sorting and no suitable index exists. If the sort heap is too small, a sort overflow (overflowed sort) occurs: data that cannot fit in the sort heap spills into a temporary table, which degrades performance. SHEAPTHRES_SHR and SHEAPTHRES are related to SORTHEAP: SHEAPTHRES_SHR limits the maximum memory for shared sorts in a database, while SHEAPTHRES limits the maximum memory for private sorts. LOCKLIST is the memory area used to store locks in a database. If this parameter is set too small, lock escalation occurs once the lock list is exhausted (multiple row locks are converted into a single table lock to free resources), which reduces concurrency, leaves many application connections suspended, and degrades system performance. Therefore, the LOCKLIST parameter should be set as large as practical. Note that LOCKLIST is measured not in a number of locks but in database pages of memory. On a 32-bit system, the first lock on an object requires 96 bytes and each additional lock on the same object requires 48 bytes; on a 64-bit system, the first lock requires 128 bytes and each additional lock requires 64 bytes. The MAXLOCKS parameter works together with LOCKLIST: it defines a percentage specifying the maximum share of the LOCKLIST that a single application's locks may occupy. The log buffer (LOGBUFSZ) is a piece of memory used to buffer log records before they are written to disk, which reduces the excessive I/O that frequent log writes would otherwise cause.
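The relationship between LOCKLIST, MAXLOCKS, and lock escalation can be made concrete with a back-of-envelope calculation. This is an illustrative sketch, not DB2's exact internal accounting: it assumes 4 KB pages and uses only the 64-byte additional-lock figure for 64-bit systems, ignoring the larger first-lock charge per object.

```python
PAGE_SIZE = 4096  # LOCKLIST is configured in 4 KB pages, not in locks

def max_locks_before_escalation(locklist_pages, maxlocks_pct, bytes_per_lock=64):
    """Rough upper bound on how many locks one application may hold
    before lock escalation (row locks converted to a table lock) kicks in.

    Uses the 64-bit figure of 64 bytes per additional lock; the real
    accounting also charges 128 bytes for the first lock on each object.
    """
    total_bytes = locklist_pages * PAGE_SIZE
    app_budget = total_bytes * maxlocks_pct / 100
    return int(app_budget // bytes_per_lock)

# LOCKLIST = 100 pages, MAXLOCKS = 10 %:
# 100 * 4096 * 0.10 / 64 = 640 locks per application before escalation
print(max_locks_before_escalation(100, 10))  # 640
```

Doubling LOCKLIST doubles the budget, which shows why enlarging it is the first remedy when snapshots report lock escalations.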
In version 9, DB2 introduced the Self-Tuning Memory Manager (STMM), a new feature that allows many memory parameters, such as SORTHEAP, LOCKLIST, and LOGBUFSZ, to be adjusted automatically. When the database configuration parameter SELF_TUNING_MEM is set to ON, these parameters can be set to AUTOMATIC, saving a great deal of manual tuning time.
The CPU factor first concerns the DB2 optimizer's analysis and selection of the access plan. Generally, an SQL statement is first parsed, then semantically analyzed, and then rewritten; the optimizer performs cost-based analysis on the rewritten statement and selects the most efficient access plan, and finally executable code (the execution plan) is generated to run the statement. Several tools can display an access plan: Visual Explain, a graphical tool; db2exfmt, a command that formats the data in the explain tables into a readable access plan; and db2expln, another command for examining access plans.
In DB2, the optimization level ranges from 0 to 9, with 5 as the default; the higher the level, the deeper the optimizer's analysis. The default is set by the database configuration parameter DFT_QUERYOPT. A higher level is not necessarily better: for simple SQL statements whose execution time is short, an overly high optimization level makes the time spent optimizing outweigh the time saved, which is not worth the candle. When an access plan is selected, an index scan is often more efficient than a table scan, so indexing deserves attention; creating the right indexes can greatly improve query performance.
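Why an index scan usually beats a table scan can be illustrated outside DB2 with a toy comparison: a table scan examines rows one by one, while an index-like sorted structure needs only about log2(n) probes. A language-neutral sketch, not DB2 code:

```python
def table_scan(rows, target):
    """Examine rows one by one; cost grows linearly with table size."""
    for probes, value in enumerate(rows, start=1):
        if value == target:
            return probes
    return None

def index_scan(sorted_keys, target):
    """Binary search over a sorted key, like a B-tree index lookup:
    cost grows only logarithmically with table size."""
    lo, hi, probes = 0, len(sorted_keys), 0
    while lo < hi:
        probes += 1
        mid = (lo + hi) // 2
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return probes

rows = list(range(1_000_000))  # one million ordered "rows"
# Roughly 765,433 probes for the scan vs about 20 for the index
print(table_scan(rows, 765_432), index_scan(rows, 765_432))
```

The gap widens as the table grows, which is why correct index creation pays off most on large tables.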
In DB2, joins are divided into nested-loop joins, merge joins, and hash joins. In general, the nested-loop join is the least efficient: it traverses the inner table in a loop once for every row of the outer table to produce the result, whereas merge joins and hash joins traverse each input only once and are therefore relatively more efficient. A hash join can use multiple equality predicates as its join condition, while a merge join can use only a single equality predicate. However, a nested-loop join can be more efficient when an index scan is available. When the optimization level is 0, only nested-loop joins can be used; when it is greater than or equal to 1, merge joins may be used; and when it is greater than 5, hash joins may also be used. When hash joins are used, SORTHEAP should be relatively large, because space is needed to build the hash table.
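The strategies differ mainly in how many times each input is traversed. A minimal Python sketch of the nested-loop and hash algorithms (an illustration of the algorithms, not DB2's implementation):

```python
def nested_loop_join(outer, inner, key):
    # The inner table is re-scanned once for every outer row: O(n*m)
    return [(o, i) for o in outer for i in inner if o[key] == i[key]]

def hash_join(build, probe, key):
    # Each input is traversed exactly once: hash the build side,
    # then probe the hash table with the other side.
    table = {}
    for row in build:
        table.setdefault(row[key], []).append(row)
    return [(p, b) for p in probe for b in table.get(p[key], [])]

orders    = [{"cust": 1, "amt": 10}, {"cust": 2, "amt": 20}]
customers = [{"cust": 1, "name": "a"}, {"cust": 2, "name": "b"}]

# Both strategies produce the same join result
assert nested_loop_join(orders, customers, "cust") == \
       hash_join(customers, orders, "cust")
```

The hash table built on the build side is also why SORTHEAP must be large enough when DB2 chooses a hash join.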
When considering the CPU, also look at the CPUSPEED parameter, which indicates the CPU's running speed and helps the optimizer evaluate the best access plan. Normally this parameter is set to -1, in which case the optimizer calculates the CPU speed automatically. In addition, the multi-partition feature can be used to spread a database across multiple machines, so that the CPU resources of all the machines are fully used to process application transactions in parallel, improving database performance.
For I/O, consider the following aspects. The first is disk I/O: to maximize parallel disk I/O, data, indexes, and logs should be stored on different physical disks. Within a transaction, data and indexes may need to be accessed at the same time; when the transaction commits, data and logs must be written to disk together, and indexes may need synchronous maintenance as well. Placing them on different disks lets these reads and writes proceed in parallel without the disk becoming a bottleneck. It is also generally better to choose database-managed tablespaces (DMS) over system-managed tablespaces (SMS), because reads and writes to SMS pass through the operating system cache before reaching the buffer pool, whereas DMS, which can use raw devices, does not. However, DMS is harder to maintain than SMS.
The second consideration is log file size. When the database writes transaction logs and one log file fills up, it switches to another log file, and each such switch incurs operating-system overhead. Therefore, increase the log file size (LOGFILSIZ) to reduce the number of log-file switches. If the log file is too large, however, some space is inevitably wasted.
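The trade-off can be made concrete with a quick estimate, assuming the usual 4 KB log page size (illustrative arithmetic, not a DB2 utility):

```python
def log_switches(total_log_bytes, logfilsiz_pages, page_size=4096):
    """Estimate how many log-file switches a workload causes.

    LOGFILSIZ is configured in 4 KB pages; a larger file means fewer
    switches (less OS overhead) but more space reserved per file.
    """
    file_bytes = logfilsiz_pages * page_size
    return total_log_bytes // file_bytes

# Writing 10 GB of log with LOGFILSIZ = 10000 (about 40 MB per file)
# causes ~262 switches; quadrupling LOGFILSIZ cuts that to ~65.
print(log_switches(10 * 2**30, 10000), log_switches(10 * 2**30, 40000))  # 262 65
```

The estimate shows why LOGFILSIZ is one of the cheapest knobs for write-heavy workloads.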
The isolation level should also be considered. DB2 provides four isolation levels: repeatable read (RR), read stability (RS), cursor stability (CS), and uncommitted read (UR), in decreasing order of strictness. The higher the isolation level, the better data integrity is protected, but the lower the concurrency, so the choice should be a deliberate trade-off. The isolation level can be changed with the following command:
CHANGE ISOLATION TO {CS | RR | RS | UR}
In terms of connections, the relationship between agents and connections should also be taken into account, as it too affects database concurrency. For details, see the Resources section.
The last consideration is the multi-partition feature. In a multi-partition database, a request first arrives at the coordinator partition, which subdivides it and sends the parts to the other partitions, so that data is read and written in parallel on every partition, maximizing I/O throughput.
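The coordinator fan-out can be sketched as a toy model. The hash distribution by `key % NUM_PARTITIONS` and the sequential loop stand in for DB2's distribution-key hashing and genuinely parallel execution; all names here are illustrative, not DB2 APIs:

```python
NUM_PARTITIONS = 4

def partition_of(key):
    # Stand-in for DB2's distribution-key hashing
    return key % NUM_PARTITIONS

def coordinator_query(rows_by_partition, predicate):
    # The coordinator fans the request out to every partition; each
    # partition scans only its own slice (in parallel on real hardware),
    # and the coordinator merges the partial results.
    partial_results = [[row for row in rows if predicate(row)]
                       for rows in rows_by_partition.values()]
    return [row for part in partial_results for row in part]

# Distribute keys 0..99 across the partitions
rows_by_partition = {p: [] for p in range(NUM_PARTITIONS)}
for key in range(100):
    rows_by_partition[partition_of(key)].append(key)

print(sorted(coordinator_query(rows_by_partition, lambda r: r < 10)))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Because each partition holds only a quarter of the rows, each scan touches a quarter of the data, which is where the I/O parallelism comes from.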
Performance optimization tools
There are many tools and commands related to performance optimization in DB2, which are described below:
- SNAPSHOT: DB2's mechanism for taking a snapshot of database information. It can collect information about buffer pools, locks, sorts, and SQL activity in the database; by examining this information, DBAs can evaluate each component of the database and pinpoint the bottleneck.
- DB2PD: this command is used to analyze the current status of the database. It has many parameters. It can be used to analyze information such as applications, proxies, memory blocks, buffer pools, logs, and lock statuses.
- RUNSTATS: this command collects up-to-date statistics on the data in the database and writes them to the system catalog. Refreshed statistics help the optimizer choose a more realistic and efficient access plan.
- REORG: this command reorganizes fragmented data and indexes in the database so that they are physically arranged according to certain rules, which speeds up retrieval.
- DB2DART: a database analysis and reporting tool used to verify the correctness of tablespaces, indexes, and the database structure, and to analyze some causes of performance problems.
- DB2SUPPORT: this command collects comprehensive information about DB2 and the operating system and produces a compressed file that can be sent to IBM support for analysis.
Other DB2 files, such as the diagnostic log and trace files, can also be used to analyze performance problems. Some third-party tools, such as Tivoli Monitor for DB2 and Quest, are also worth a look.
Other performance factors
XML optimization: DB2 V9 introduced the pure XML data type, which stores XML hierarchically, unlike traditional relational data types. Before V9, DB2 stored XML data in CLOB columns, and applications had to parse the XML before using the data. With the pure XML type, elements can be read directly, which greatly improves performance. In addition, XML indexes can be created on pure XML columns, further improving access performance.
Operating system: the database runs on an operating system, and operating-system performance directly affects how efficiently the database runs, so tuning the operating system is also an important part of tuning the database. Memory can be tuned at the operating-system level: shared memory, semaphore, and virtual memory settings can all affect database performance. The layout of the disks likewise affects database I/O efficiency.
Network: the network affects the database's I/O performance. When data travels over the network between client and server, a network bottleneck causes a significant drop in database I/O performance, so choosing good network equipment and a well-configured network environment is very important. Firewalls must also be taken into account, since a firewall may block packets from certain IP addresses.
Li Yue (liyyue@cn.ibm.com), Software Engineer, IBM
Li Yue is a software engineer in the WebSphere Federation Server testing department at the IBM China Software Development Center. He has published an article on developerWorks China about tuning DB2 agents and connections.
Wang Feipeng, Software Engineer, WSO2 Inc
Wang Feipeng is from the Avalanche team at IBM China and currently works on pre-sales consulting and customer support for migrating Oracle/Teradata databases to DB2. He has successful database and data warehouse implementation experience with China Telecom, Metro, and central government ministries and commissions. He is a training instructor for DB2 performance optimization, Oracle-to-DB2 migration, the DB2 9.7 training camp, the IBM Information Server training camp, and the DB2 college training camp. He holds three software patents and wrote the book "DB2 Design and Performance Optimization: Principles, Methods, and Practices".
Di Hao, Software Engineer, WSO2 Inc
Di Hao, a software engineer at the IBM China Software Development Center, works mainly on IBM CM content management products. Recently he has taken an interest in DB2 performance tuning.
Zhang Rongrong (rrzhang@cn.ibm.com), Software Engineer, WSO2 Inc