Use hash joins to obtain optimal performance from DB2 Universal Database

The DB2 UDB optimizer can choose among different join methods when executing a join: by default it chooses between nested loop join and merge join, and when a special environment variable is set it can also choose hash join. Hash joins can significantly improve the performance of some queries, especially in decision support system (DSS) environments, where queries tend to be complex. This article explains how hash joins work and how to tune them correctly for optimal performance.
First we introduce the background needed to follow the article: the different join methods, how they work, and how DB2 UDB selects a particular method for a particular join. Next we explain what is needed to tune and monitor hash joins. We then present several experiments and a number of interesting results, and close with our conclusions.
1. Background Knowledge
A join between two tables puts rows from one table together with rows from another table, and join predicates specify the conditions that matching rows must satisfy. To perform this operation, DB2 can choose among several join methods. This section describes how the different join methods work and how DB2 selects a method for a specific join.
1.1 Join methods
When two tables are joined, regardless of the join method used, one table is always chosen as the outer table and the other as the inner table. The optimizer decides which is which based on cost and on the type of join method selected. The outer table is accessed first and is scanned only once; the inner table may be scanned multiple times, depending on the join type and on the indexes that exist. It is also important to remember that even when a query joins more than two tables, the optimizer joins only two tables at a time, saving intermediate results when necessary. To appreciate the advantages of the hash join method, it helps to understand how the other join methods work.
Nested loop join: as mentioned above, the outer table is scanned only once. In a nested loop join there are two ways to find the rows that match each row of the outer table:
Scan the inner table. That is, read every row of the inner table and decide whether it joins with the outer-table row currently being considered.
Perform an index lookup on the join columns of the inner table. This is possible when an index exists on the columns of the join predicates, and it greatly reduces the number of inner-table rows that are accessed.
In a nested loop join it matters a great deal which table is the outer and which is the inner, because the outer table is scanned only once while the inner table is accessed once for every row of the outer table. As mentioned above, the optimizer uses a cost model to make this decision, taking several factors into account: table size, buffering, predicates, ordering requirements, and the existence of indexes. In addition, the join columns cannot be LONG or LOB fields.
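To make the two lookup strategies concrete, here is a minimal Python sketch of the nested loop join idea (an illustration of the algorithm, not DB2 internals); the table contents and the column name cust_id are invented for the example.

```python
# Illustrative nested loop join: full inner scan vs. lookup on a prebuilt index.
def nested_loop_scan(outer, inner, key):
    """For every outer row, scan the entire inner table."""
    for o in outer:
        for i in inner:                      # inner table re-read per outer row
            if o[key] == i[key]:
                yield {**o, **i}

def nested_loop_index(outer, inner, key):
    """For every outer row, probe an index built on the inner join column."""
    index = {}
    for i in inner:                          # build the lookup structure once
        index.setdefault(i[key], []).append(i)
    for o in outer:
        for i in index.get(o[key], []):      # only matching inner rows are touched
            yield {**o, **i}

orders    = [{"cust_id": 1, "amount": 50}, {"cust_id": 2, "amount": 75}]
customers = [{"cust_id": 1, "name": "A"}, {"cust_id": 2, "name": "B"}]
print(list(nested_loop_scan(orders, customers, "cust_id")))
print(list(nested_loop_index(orders, customers, "cust_id")))
```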
Merge join: a merge join requires an equality join predicate (that is, a predicate of the form table1.column = table2.column). It also requires that the input tables be ordered on the join columns, either by scanning an existing index or by sorting the tables before the join. The join columns cannot be LONG or LOB fields.
Both tables are scanned at the same time, looking for matching rows. Both the outer and the inner table are scanned only once, unless there are duplicate values in the outer table, in which case parts of the inner table may be scanned again. Because each table is usually scanned just once, the choice of outer versus inner table is not as important as it is for other join methods. However, since duplicate values may exist, the optimizer usually chooses the table with fewer duplicates as the outer table; in the end, though, it is the cost model that determines which table is outer and which is inner.
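A minimal Python sketch of the merge join idea follows (again only an illustration, not DB2 internals): both inputs must already be ordered on the join column, and a duplicate value in the outer input re-reads the matching run of inner rows.

```python
# Illustrative merge join over inputs already sorted on the join column.
def merge_join(outer, inner, key):
    results, j = [], 0
    for o in outer:
        # advance the inner cursor past rows that sort before the outer row
        while j < len(inner) and inner[j][key] < o[key]:
            j += 1
        # scan the run of matching inner rows; a duplicate outer value
        # re-reads this run, so the start position j is not moved past it
        k = j
        while k < len(inner) and inner[k][key] == o[key]:
            results.append({**o, **inner[k]})
            k += 1
    return results

outer = sorted([{"id": 1, "x": "a"}, {"id": 2, "x": "b"}, {"id": 2, "x": "c"}], key=lambda r: r["id"])
inner = sorted([{"id": 1, "y": 10}, {"id": 2, "y": 20}], key=lambda r: r["id"])
print(merge_join(outer, inner, "id"))
```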
Hash join: a hash join requires one or more equality join predicates in which the columns of each predicate have the same type. For CHAR columns the lengths must be the same; for DECIMAL columns the precision and scale must be the same. As before, the join columns cannot be LONG or LOB fields. The fact that a hash join can handle multiple equality predicates is a major advantage over a merge join, which can handle only one.
In a hash join, the inner table (called the build table) is scanned first and its rows are copied into memory buffers. The buffers are divided into partitions based on a hash code computed from the columns of the join predicates. If there is not enough memory to hold the entire table, some partitions are written to temporary tables on disk. The outer table (called the probe table) is then scanned, and for each of its rows the same hash algorithm is applied to the join columns. If the resulting hash code matches the hash code of a build row, the actual join columns are compared. If the partition that matches the probe row is in memory, the comparison is done immediately; if the partition was written to a temporary table, the probe row is also written to a temporary table. Finally, each temporary table holding rows of the same partition is processed to find the matches.
Because of the benefit of keeping the build table in memory, the optimizer usually chooses the smaller table as the build table, to avoid having to spill it to disk. As always, however, it is the cost model that determines which table is the inner table and which is the outer table.
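The following Python sketch illustrates the build/probe flow just described, including partitioning by hash code and spilling partitions that do not fit in memory. The partition count and the memory budget are arbitrary, build keys are assumed unique for brevity, and this is a conceptual model rather than DB2's actual implementation.

```python
from collections import defaultdict

N_PARTITIONS = 4
MEMORY_LIMIT = 1000          # arbitrary "sortheap" budget, measured in rows

def hash_join(build, probe, key):
    # --- build phase: partition the build (inner) table by hash code ---
    build_parts = defaultdict(list)
    for row in build:
        build_parts[hash(row[key]) % N_PARTITIONS].append(row)

    # keep partitions in memory while the budget lasts; spill the rest
    in_memory, spilled_build, used = {}, {}, 0
    for p, rows in build_parts.items():
        if used + len(rows) <= MEMORY_LIMIT:
            in_memory[p] = {r[key]: r for r in rows}   # assumes unique build keys
            used += len(rows)
        else:
            spilled_build[p] = rows                    # stands in for a temp table

    # --- probe phase: rows hitting in-memory partitions are joined at once,
    # rows hitting spilled partitions are set aside for later ---
    results, spilled_probe = [], defaultdict(list)
    for row in probe:
        p = hash(row[key]) % N_PARTITIONS
        if p in in_memory:
            match = in_memory[p].get(row[key])
            if match:
                results.append({**match, **row})
        else:
            spilled_probe[p].append(row)

    # --- final phase: process the spilled partitions pair by pair ---
    for p, rows in spilled_build.items():
        lookup = {r[key]: r for r in rows}
        for row in spilled_probe.get(p, []):
            match = lookup.get(row[key])
            if match:
                results.append({**match, **row})
    return results
```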
Let's take a closer look at how hash joins exploit an SMP system. On an SMP system (where intra_parallel = ON and dft_degree > 1), a hash join may be executed in parallel by multiple tasks on the same CPU or on multiple CPUs. During parallel execution the build table is dynamically partitioned into multiple parallel tuple streams, and each stream is processed by a separate task that inserts the build tuples into memory. At the end of processing the build-table streams, the hash join process adjusts the contents of memory and performs any movement of partitions into or out of memory that is needed. Next, multiple parallel tuple streams from the probe table are processed against the partitions resident in memory, and probe tuples belonging to spilled hash join partitions are written to temporary tables. Finally, the spilled partitions are processed in parallel, with each task handling one or more of them.
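As a rough conceptual sketch only (DB2's parallel implementation is far more involved), the last phase of the previous example, processing the spilled partitions, could be parallelised by handing each spilled partition to a separate worker; the degree parameter is just an illustrative stand-in for the query's degree of parallelism, and the spilled_build/spilled_probe structures are the ones built in the previous sketch.

```python
from concurrent.futures import ThreadPoolExecutor

def join_one_partition(build_rows, probe_rows, key):
    """Join one spilled partition: load its build rows, then probe them."""
    lookup = {r[key]: r for r in build_rows}
    return [{**lookup[p[key]], **p} for p in probe_rows if p[key] in lookup]

def join_spilled_in_parallel(spilled_build, spilled_probe, key, degree=4):
    results = []
    with ThreadPoolExecutor(max_workers=degree) as pool:
        futures = [pool.submit(join_one_partition, rows, spilled_probe.get(p, []), key)
                   for p, rows in spilled_build.items()]
        for f in futures:
            results.extend(f.result())
    return results
```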
1.2 Which join method is selected?
So far we have discussed the different join methods available in DB2 UDB. Some methods appear to be better choices than others. For example, compared with a nested loop join that scans the inner table for every row of the outer table, a merge join has the advantage of scanning each table only once, so the merge join seems to be the better choice; if a suitable index exists, however, the nested loop join may be better.
Similarly, a hash join seems better than a merge join because it does not need to sort the input tables before execution. But if the order of the rows from the outer table must be preserved, a merge join or a nested loop join may be the better choice: a hash join cannot preserve order, because rows may spill to disk and the ordering is lost.
So how does the DB2 UDB optimizer decide which join method to use for a specific join? First, it must consider the types of predicates in the query. Once the possible join methods have been identified, the DB2 UDB optimizer decides which one to use based on the cost model and the optimization level selected. The optimization level is a configurable parameter in the database configuration file that tells the optimizer how much optimization effort to apply; the higher the value, the more optimization is performed. The possible optimization levels are 0, 1, 2, 3, 5, 7, and 9. Their effect on the possible join methods is as follows: nested loop joins are possible at every optimization level, merge joins at optimization level 1 and above, and hash joins at optimization level 5 and above.
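The eligibility rules from the paragraph above can be summarised in a small helper; the thresholds come straight from the text, and the comment about the environment variable refers back to the variable mentioned in the introduction.

```python
def eligible_join_methods(opt_level):
    """Join methods the optimizer may consider at a given optimization level,
    per the rules stated above (hash joins also require the special
    environment/registry variable mentioned in the introduction)."""
    methods = ["nested loop join"]           # available at every level
    if opt_level >= 1:
        methods.append("merge join")         # level 1 and above
    if opt_level >= 5:
        methods.append("hash join")          # level 5 and above
    return methods

print(eligible_join_methods(5))   # ['nested loop join', 'merge join', 'hash join']
```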
2. How to tune and monitor hash joins
Hash joins can significantly improve the performance of certain queries, especially in decision support system (DSS) environments with complex queries. One advantage of hash join over merge join is that it does not require any sort in advance, and sorting can be very expensive. The key to hash join performance is to fit all (or as many as possible) of the rows of the build table into memory without spilling to disk. The memory used for these buffers comes from sortheap, so tuning this parameter (and sheapthres) is very important.
Sortheap is a database configuration parameter that defines the maximum amount of memory that can be used for a sort or a hash join. Each sort or hash join has its own sortheap, which is allocated by the database manager as needed; not every sort or hash join needs the full amount, and a smaller sortheap is allocated when all of the memory is not required. A sortheap can be allocated from shared or private memory as appropriate: shared sortheap is used only for intra-parallel (SMP) queries with degree > 1, and private sortheap is used when a single agent performs the sort or hash join and the memory does not need to be shared with other agents. It is also important to understand that a query may contain several hash joins and/or sorts, so several sortheaps may be allocated at the same time, depending on the nature of the plan.
Sheapthres is a database manager configuration parameter. Its effect differs for shared and private sorts:
For private sorts and hash joins, sheapthres acts as a soft limit on the total amount of memory that all concurrent private sorts can use. When the memory in use reaches this limit, the memory allocated for new sortheap requests is significantly reduced.
For shared sorts and hash joins, sheapthres acts as a hard limit on the total memory used by all shared sorts. When the memory in use approaches this limit, the memory allocated for new sortheap requests is significantly reduced, and eventually no further shared-sort memory requests are allowed.
On a uniprocessor system, hash joins use only private sortheap. On an SMP system (where intra_parallel = ON and dft_degree > 1), hash joins use shared sortheap for the first phases of the operation and private sortheap for the final phase. For more details on these parameters, see the Administration Guide: Performance that ships with DB2 UDB.
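To make the soft-limit versus hard-limit distinction concrete, here is a toy model in Python; the throttling factor and the error behaviour are simplifying assumptions made for illustration, not DB2's actual allocation algorithm.

```python
def grant_private_sortheap(requested, sortheap, total_in_use, sheapthres):
    """Soft limit: once total private sort memory passes sheapthres,
    new requests are still granted, but for much less than was asked."""
    request = min(requested, sortheap)
    if total_in_use >= sheapthres:
        return max(request // 10, 16)        # assumed throttling factor
    return request

def grant_shared_sortheap(requested, sortheap, total_in_use, sheapthres):
    """Hard limit: shared sort memory may not exceed sheapthres at all."""
    request = min(requested, sortheap)
    available = sheapthres - total_in_use
    if available <= 0:
        raise MemoryError("shared sort memory exhausted")
    return min(request, available)
```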
How do we tune these parameters? As with most tuning exercises, you need a starting point, a test workload, and a way to measure the effect of each change, and then you iterate: change one parameter, measure again. In most cases an existing system already has sortheap and sheapthres values set, so we recommend starting from the current settings. For a completely new installation, a common rule of thumb for DSS systems is to allocate 50% of available memory to the buffer pools and 50% to sheapthres. Then divide sheapthres by the number of complex queries that will run concurrently, and divide that by the maximum number of concurrent sorts and hash joins in a typical query, to arrive at sortheap. (A value of 5 or 6 is a reasonable starting point for that last number.) In summary:
sortheap = sheapthres / (number of concurrent complex queries * maximum number of concurrent sorts and hash joins in a typical query)
Remember that the "number of concurrent complex queries" is not the same as the "number of concurrent users". The number of users is usually much larger than the number of complex queries executing at any one time. Most DSS TPC-H benchmarks, which try to push the database to its limits, run no more than 8 or 9 concurrent complex queries (also called streams) even for databases of up to 10 TB. So keep this number conservative at first and increase it only as necessary.
Once the initial sortheap and sheapthres values are set, run a typical workload and collect database and database manager snapshots. DB2 UDB provides several monitor elements for monitoring hash joins. You do not need to turn on any monitor switches, because all of the hash join monitor elements are collected under the basic monitor switches. All of the elements are counters and can be reset. For details on how DB2 UDB snapshots work, see the System Monitor Guide and Reference that ships with DB2 UDB.
Each monitor element is described below; a small sketch showing how the counters might be read from a snapshot follows the list.
1. Total Hash Joins: the total number of hash joins executed. This value can be collected with a database or application snapshot.
2. Hash Join Overflows: the total number of times a hash join exceeded the available sortheap while trying to put the rows of the build table into memory. This value can be collected with a database or application snapshot.
3. Hash Join Small Overflows: the total number of times a hash join exceeded the available sortheap by less than 10% while trying to put the rows of the build table into memory. Small overflows indicate that a modest increase in sortheap may improve performance. This value can be collected with a database or application snapshot.
4. Total Hash Loops: the total number of times a single partition of a hash join was larger than the available sortheap.
Remember that if DB2 UDB cannot fit all of the build-table rows into memory, some partitions spill to temporary tables for later processing. When those temporary tables are processed, DB2 tries to load each build partition into a sortheap; it then reads the corresponding probe rows and tries to match them against the build table. If DB2 cannot load a build partition into the sortheap, it must fall back to the hash loop algorithm. Monitoring hash loops is very important, because hash loops mean the hash join is executing inefficiently and can cause serious performance degradation. They may indicate that sortheap is too small for the workload or, more likely, that sheapthres is too small for the sortheap memory requests to be satisfied. This value can be collected with a database or application snapshot.
5. Hash Join Threshold: the total number of times a hash join's sortheap request was reduced because of concurrent use of shared or private memory heap space. In other words, the hash join asked for a certain amount of sortheap but received less than it requested. This may indicate that sheapthres is too small for the workload. This value can be collected with a database manager snapshot.
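Here is a minimal sketch of how these counters might be pulled out of a snapshot from a script: it shells out to the DB2 command line processor and picks out the counter lines. The label strings are typical of DB2 UDB snapshot output but should be verified against your version, and the database name SAMPLE is a placeholder.

```python
import subprocess

# Counter labels as they typically appear in DB2 UDB database snapshot output
# (verify against your version; "SAMPLE" is a placeholder database name).
LABELS = ["Total hash joins", "Hash join overflows",
          "Hash join small overflows", "Total hash loops"]

def hash_join_counters(database="SAMPLE"):
    out = subprocess.run(
        ["db2", "get snapshot for database on " + database],
        capture_output=True, text=True, check=True).stdout
    counters = {}
    for line in out.splitlines():
        for label in LABELS:
            if line.strip().startswith(label):
                counters[label] = int(line.split("=")[1].strip())
    return counters

if __name__ == "__main__":
    print(hash_join_counters())
```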
3. Hash join experiments
3.1 The sortheap experiments
As mentioned earlier, tuning the sortheap and sheapthres parameters is the most important factor in hash join performance. The main purpose of this set of experiments was to understand how different sortheap sizes affect hash join performance. In these tests we used all of the initial tuning described in the previous section, changed only the sortheap size, and kept sheapthres at its original value of 192000. We ran only one stream in these tests. The results are shown in Figure 5.
The times shown are relative to the run with a 256-page sortheap: we treat the time taken to run the workload with a 256-page sortheap as one time unit, and all other results in this experiment are expressed relative to it.
Let's analyze the results in more detail. We can see that the larger the sortheap, the fewer overflows we encounter and the better the performance. We get the best performance at 40000 pages, with only three overflows.
The first question is: why doesn't an even larger sortheap give still better performance? Looking at the monitor information in Figure 6, we can see that at the larger sortheap sizes we begin to hit sheapthres, and the larger the sortheap, the worse the performance, because the more often sheapthres is hit. Performance degrades when sheapthres is hit because a sortheap request that reaches the threshold is satisfied with less memory than was requested. As these cases show, even though we request a larger sortheap than before we get more overflows, because the sortheap we actually receive is smaller than we asked for; this causes overflows and reduces hash join performance.
Looking at the chart we notice another phenomenon: going from 256 pages to 500 pages, from 500 to 1000 pages, and onwards, we get large improvements on the left side of the chart. Why? Looking at the monitor information, we find that the number of hash loops drops significantly as sortheap grows and disappears at 4000 pages; after that point, each increase in sortheap yields a much smaller gain than before.
From these results we can conclude that the worst enemy of the hash join is the hash loop. Hitting sheapthres can also degrade performance, as we saw when sortheap was greater than 40000, but hash loops degrade performance by an order of magnitude more. Keep this in mind when it is impossible to avoid both at the same time.
3.2 The sheapthres experiments
The main purpose of these experiments was to understand how to tune sheapthres, the relationship between sortheap and sheapthres, and the effect of this relationship on hash joins. For this purpose we divided the experiments into five groups of tests. In each group we fixed the sortheap size and varied sheapthres so that it was two, three, four, five, six, and seven times the sortheap. The five sortheap sizes used were 5000, 10000, 15000, 20000, and 30000 pages. The results are shown in Figures 7 through 9.
The times shown are relative to the run with a sortheap of 5000 pages and a sheapthres of 10000 pages (twice the sortheap size): we treat that run as one time unit, and all other results in this experiment are expressed relative to it.
Looking at Figure 7, we can see that for all sortheap sizes the best performance for this particular workload occurs when the sheapthres value is 6 to 7 times the sortheap value. Looking at the monitor data in Figures 8 and 9, we can see that at that level we have eliminated all hash join threshold hits. In every case the biggest improvement occurs when sheapthres goes from two times the sortheap value to three times, because the hash join threshold count drops sharply at that point. In addition, when sortheap is 5000 pages, the hash loop count drops significantly when sheapthres goes from twice to three times the sortheap. The DB2 UDB manuals generally recommend that sheapthres be at least three times the sortheap; we went below that only to see how hash joins behave with a small sheapthres, and we would never recommend setting sheapthres to less than three times sortheap on a real system.
Another important observation is that in some cases the performance gained by increasing sheapthres is very small. In the overall tuning of a database or instance, giving more memory to sheapthres may not be the most cost-effective use of that memory; a larger sortheap or more buffer pool, for example, might pay off better. You need to understand how sheapthres is set and what effect it has on the whole system. The most important conclusion we draw from these experiments is that tuning sheapthres has some effect on hash join performance, but the effect is not as significant as tuning sortheap.
4. Conclusion
As we have seen, hash joins can significantly improve performance with only a little tuning. As the first experiment showed, by following a few basic rules of thumb you can obtain a significant performance improvement simply by enabling hash joins for a DSS workload. We also showed how scalability improves when many streams are run.
We also learned that of the two main parameters, sortheap and sheapthres, sortheap is the more important. By increasing sortheap we can avoid the hash join's worst enemy, the hash loop. We also saw that the larger the sortheap, the fewer the overflows and the better the performance. This does not mean we can ignore sheapthres, which certainly affects performance, but sheapthres is usually dictated by the amount of memory available on the machine. So we can draw the following conclusion: the best way to tune hash joins is to determine how much memory you can afford to give to sheapthres, and then tune the sortheap value accordingly.
The key to tuning hash joins is to make sortheap as large as possible, to avoid overflows and hash loops, without exceeding the sheapthres limit.
