We often see DB2 table join operations. The following describes the principle of DB2 table join in detail. If you are interested in this, take a look.
In DB2, the optimizer can select nested connections or merge connections. If correct, you can also select hash connections. If the system is properly tuned, hash connections can significantly improve the performance of some queries. The DB2 optimizer can select different methods when executing the connection: by default, it is selected between nested loop join) and merge join. When special environment variables are set, you can also select hash join ).
The connection between two tables is as follows: the rows in one table and the rows in another table are put together. In addition, you can specify conditions to define the parallel rows. To perform this operation, DB2 can select different connection methods. When connecting two tables, no matter which connection method is used, one table is always selected as the External table) and the other table is selected as the inner table ). The optimizer determines the External table and internal table based on the cost and type of the selected connection method. First, access the External table and scan only once. You can scan the internal table multiple times based on the connection type and existing indexes. It is also important to remember that even if you try to connect two or more tables, the optimizer will connect only two tables at a time and save intermediate results if necessary. To understand the advantages of the hash connection method, it is also important to understand the working principles of other connection methods.
Nested loop connection
As we mentioned earlier, the appearance is scanned only once. For nested loop connections, there are two methods to find the rows that match each row in the External table:
Scan the internal table. That is, read each row in the internal table and decide whether to connect the row with the row in the External table being considered.
Index search for the connected columns in the internal table. This method is feasible when the table index contains the columns of the join predicates. This greatly reduces the number of rows accessed in the internal table.
In nested loop join, it is very important to determine which is the external table and which is the internal table, because the external table is only scanned once, and each row in the External table must be accessed once. As mentioned above, the optimizer uses a cost model to determine who is the external table and who is the internal table. The optimizer will consider the following factors when making this decision:
Table size
Buffer
Predicate
Sorting requirements
Index?
The connection Column cannot be a LONG or LOB field.
Merge connections
The merge join requires an equality join predicate that has the table1.column = table2.column format ). It also requires sorting the input table based on the connection column. This can be done by scanning existing indexes or sorting tables before they are connected. The connection Column cannot be a LONG or LOB field.
Scan two tables at the same time to find matching rows. Both external and internal tables scan only once, unless there are duplicate values in the External table, then some parts of the internal table may be scanned again. Because a table is usually scanned only once, it is not as important as other join methods to determine which table is the external table and which is the internal table. However, since duplicate values may exist, the optimizer usually selects a table with fewer duplicate values as the External table. However, the optimizer uses the cost model to determine who is the external table and who is the internal table.
Hash join
Hash join requires one or more equality join predicates, with each of which has the same column type. For the CHAR type, the length must be the same. For DECIMAL type, the precision and DECIMAL place must be the same. Similarly, the connection Column cannot be a LONG or LOB field. The fact that hash join can process multiple equality predicates is a major advantage over merge join. The latter can only process one equality predicates.
For hash join, first scan the internal table, also known as build table, bulid table), the rows in the table are copied to the memory buffer. Based on the hash code), These buffers are divided into several partitions. The hash code is calculated based on the columns in the join predicates. If the memory does not have enough space to accommodate the entire table, some partitions are written to the temporary table on the disk. Then scan the External table called probe table ). Apply the same hash algorithm to the connection columns for each row in the test table. If the obtained hash code matches the hash code of the build row, the actual join column is compared. If the partition matching the row in the test table is in memory, the comparison is performed immediately. If a partition is written to a temporary table, the test row is also written to the temporary table. Finally, a temporary table containing rows in the same partition is processed for matching.
Because of the benefits of saving the build table in memory, the optimizer usually selects a small table as the build table to avoid having to overflow the table with spill) to the disk. However, the cost model determines which table is the internal table and which table is the external table.
Which connection method is selected?
So far, we have discussed different connection methods available in DB2. As we know, some methods seem to be a better choice than other methods. For example, compared with nested loop connections of tables scanned in each row of the External table, the combined connections have the advantage of only scanning the table once. Therefore, merging connections seems to be a better choice. However, if an index exists, nested loops will be a better choice.
Similarly, the hash join seems to be a better choice than the merge join because it does not need to sort the input table before execution, but if we need to maintain the order of rows in the External table, merge connections or nested loop connections may be a better choice-Hash Connections cannot maintain the order, because they may overflow to the disk and disrupt the order.
Common DB2 cycle usage
Implementation of creating a database in DB2
DB2 database authorization
DB2 Authentication Overview
Create a table in DB2 -- a table with an auto-incrementing Column