Oracle Table Joins

Source: Internet
Author: User
Tags: joins, sorts

I was asked about Oracle table joins and felt my grasp of the details fell far short, so I gathered material from the Internet to study. Reproduced here.

When viewing SQL execution plans, we find that tables can be joined in several ways. This article introduces how tables are joined, to help in understanding execution plans and the principles of SQL execution.
I. Join methods:
Nested loops (NL)
Hash join (HJ)
Sort merge join (SMJ)

II. How joins work:
1. Oracle can join only two tables at a time. Regardless of how many tables a query references, each join operation works on exactly two row sources.
2. When joining multiple tables, the optimizer starts with one table, joins it to a second table, then joins the intermediate result to the next table, and so on until all tables are processed.
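The two-at-a-time rule can be sketched in Python as a left fold over a list of tables. This is a minimal illustration, assuming every table is a list of dicts joined on one shared key column; `join_pair` and `join_all` are hypothetical names, and the real optimizer picks the join order by cost rather than by list order.

```python
from functools import reduce

def join_pair(left, right, key):
    """One binary join step: combine every left/right row pair with equal keys."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

def join_all(tables, key):
    """Join N tables two at a time: ((t1 JOIN t2) JOIN t3) JOIN ...
    Each step joins the intermediate result to the next table."""
    return reduce(lambda acc, t: join_pair(acc, t, key), tables[1:], tables[0])

# Hypothetical example tables sharing the join column "id".
emp = [{"id": 1, "name": "Smith"}, {"id": 2, "name": "Jones"}]
dept = [{"id": 1, "dept": "Sales"}]
loc = [{"id": 1, "city": "Boston"}]
print(join_all([emp, dept, loc], "id"))
# [{'id': 1, 'name': 'Smith', 'dept': 'Sales', 'city': 'Boston'}]
```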

III. Joins in detail:
1. Nested loops (NL):
How a nested loop join works (pseudocode):
FOR r1 IN (SELECT rows FROM table_1 WHERE colx = {value})
LOOP
    FOR r2 IN (SELECT rows FROM table_2 that match the current row of table_1)
    LOOP
        output values from the current row of table_1 and the current row of table_2;
    END LOOP;
END LOOP;
This code consists of two loops.
The two tables in a nested loop are usually called the outer table and the inner table.
In a nested loop join, the outer table is also called the driving table.
In the pseudocode, table_1 is the driving table and table_2 is the inner table.
The pseudocode shows that the join is a two-level nested loop, so the fewer iterations of the outer loop, the better. This is why a small table, or a table that returns a small result set, is used as the driving table.
NESTED LOOPS join cost = cost of reading the first table + (cardinality of the result from the first table × cost of accessing the second table once)
Therefore nested loops generally suit cases where the driving table's row set is small (fewer than 10,000 rows) and the inner table has an efficient index access path.
Use the hint /*+ USE_NL(table_1 table_2) */ to force the CBO to perform a nested loop join.
Choosing the driving table: the driving table ("select rows from table_1 where colx={value}") is generally the table whose result set is smaller after its WHERE conditions are applied, not necessarily the table with fewer rows overall.
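The cost formula can be made concrete with a small Python sketch that counts inner-table accesses. It is a minimal illustration, assuming in-memory lists of dicts and a dict of lists standing in for an index on table_2's join column; `build_index` and `nested_loops_join` are hypothetical names.

```python
from collections import defaultdict

def build_index(rows, key):
    """Stand-in for an index on the inner table's join column."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index

def nested_loops_join(outer, inner_index, key, predicate=lambda row: True):
    """Filter the driving (outer) table, then probe the inner index once per driving row."""
    result, probes = [], 0
    for r1 in outer:
        if not predicate(r1):          # e.g. WHERE colx = {value} on table_1
            continue
        probes += 1                    # one inner access per surviving driving row
        for r2 in inner_index.get(r1[key], []):
            result.append((r1, r2))
    return result, probes

table_1 = [{"id": i, "colx": i % 2} for i in range(10)]
table_2 = [{"id": i % 5, "val": i} for i in range(20)]
rows, probes = nested_loops_join(table_1, build_index(table_2, "id"), "id",
                                 predicate=lambda r: r["colx"] == 0)
print(len(rows), probes)  # 12 5
```

The number of inner accesses equals the number of driving rows that survive the WHERE filter (5), not the size of table_1 (10), which is why the driving table is chosen by its filtered result size.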
         
2. Hash join (HJ):
Hash joins are typically used to join a small table to a large table. In most cases, a hash join is more efficient than the other join methods.
For a more thorough treatment of hash joins, see this article on the Web: HTTP://WWW.HELLODBA.COM/READER.PHP?ID=144&LANG=CN

3. Sort merge join (SMJ):

Hash joins generally perform better than sort merge joins. However, if the row sources are already sorted and need no sorting when the sort merge join is performed, a sort merge join performs better than a hash join. Use the hint /*+ USE_MERGE(table_1 table_2) */ to force a sort merge join.
Procedure: sort both inputs on the join key, then merge the two sorted inputs.
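The sort-then-merge procedure can be sketched in Python. This is a minimal in-memory illustration, assuming rows are dicts and both inputs fit in memory; `sort_merge_join` is a hypothetical name.

```python
def sort_merge_join(left, right, key):
    """Sort both inputs on the join key, then merge them in a single pass."""
    left = sorted(left, key=lambda r: r[key])    # this sort is what a pre-sorted row source saves
    right = sorted(right, key=lambda r: r[key])
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right, then pair it with
            # every left row that has the same key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            while i < len(left) and left[i][key] == lk:
                result.extend((left[i], r) for r in right[j:j_end])
                i += 1
            j = j_end
    return result

t1 = [{"k": 3}, {"k": 1}, {"k": 2}, {"k": 2}]
t2 = [{"k": 2}, {"k": 3}, {"k": 3}, {"k": 4}]
print([l["k"] for l, r in sort_merge_join(t1, t2, "k")])  # [2, 2, 3, 3]
```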

IV. Summary of join methods:
1) Nested loops:
Nested loops are a good choice when the subset of data being joined is small. In a nested loop, the outer table drives the inner table: for each row returned by the outer table, the inner table is searched for matching rows. The result set returned by the whole query should therefore not be too large (more than 10,000 rows is unsuitable). Use the table that returns the smaller subset as the outer (driving) table, and make sure there is an index on the inner table's join column.
2) Hash join:
Hash joins are the usual way to join large data sets. The optimizer uses the smaller of the two tables to build a hash table in memory on the join key, then scans the larger table and probes the hash table to find matching rows.
This works best when the smaller table fits entirely in memory; the cost is then the sum of the costs of accessing the two tables. When the table is very large and cannot fit in memory, the optimizer divides it into several partitions and writes the partitions that do not fit in memory to a temporary segment on disk.
Hash joins can only be used for equijoins (such as WHERE a.col3 = b.col4), not for non-equijoins (WHERE a.col3 > b.col4). An outer join on an equality condition (WHERE a.col3 = b.col4(+)) is still an equijoin and can be executed as a hash outer join.
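The build/probe idea can be sketched in Python. This is a minimal in-memory illustration, assuming both inputs fit in memory and rows are dicts; `hash_join` is a hypothetical name, and a Python dict stands in for Oracle's hash table.

```python
from collections import defaultdict

def hash_join(a, b, key):
    """Equijoin a and b on key; returns (row_from_a, row_from_b) pairs."""
    # Use the smaller input as the build input, the larger as the probe input.
    swapped = len(a) > len(b)
    build, probe = (b, a) if swapped else (a, b)
    table = defaultdict(list)                  # the in-memory hash table
    for row in build:
        table[row[key]].append(row)
    result = []
    for row in probe:                          # scan the large input once
        for match in table.get(row[key], []):  # equality lookup only
            result.append((row, match) if swapped else (match, row))
    return result

small = [{"k": 1, "s": "x"}, {"k": 2, "s": "y"}]
large = [{"k": 1, "b": i} for i in range(3)] + [{"k": 9, "b": 99}]
print(len(hash_join(large, small, "k")))  # 3
```

The lookup `table.get(row[key])` can only answer "which build rows have exactly this key", which is why a predicate such as a.col3 > b.col4 cannot be served by a hash join.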
3) Sort merge join:
In general, hash joins perform better than sort merge joins. However, if the row sources are already sorted and need no re-sorting when the sort merge join is performed, a sort merge join performs better than a hash join.

V. When to use each join method:
1. Hash joins apply only to equijoins.
2. Nested loops join one row at a time and suit only joins over small amounts of data. Hash joins and sort merge joins are set-based and suit joins over large amounts of data.
3. For equijoins, nested loops suit cases where few rows are returned (fewer than 10,000) and the inner table has an index on the join column. If many rows are returned, a hash join is appropriate.
4. For equijoins where both row sources are large, a hash join is appropriate if the join column has high cardinality; otherwise a sort merge join is appropriate.
5. A nested loop join can return joined rows as soon as they are produced, without waiting for the whole join to finish. The other two methods cannot: a hash join must first build its hash table from the build input, and a sort merge join must first sort its inputs.
6. The two data sets of a sort merge join can be processed (sorted) independently and in parallel, while those of nested loops and hash joins cannot.
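Point 5 can be seen with Python generators, which produce rows only when asked. This is a minimal sketch, assuming in-memory lists of dicts; the hypothetical `trace` lists record how many input rows each method reads before it can return its first joined row.

```python
def nl_join(outer, inner, key, trace):
    """Nested loops as a generator: each match is yielded as soon as it is found."""
    for r1 in outer:
        trace.append(("read outer", r1[key]))
        for r2 in inner:
            if r1[key] == r2[key]:
                yield (r1, r2)

def hash_join_stream(build, probe, key, trace):
    """Hash join as a generator: the whole build input is read before any row is yielded."""
    table = {}
    for r in build:
        trace.append(("read build", r[key]))
        table.setdefault(r[key], []).append(r)
    for r in probe:
        for m in table.get(r[key], []):
            yield (m, r)

rows = [{"k": 1}, {"k": 2}, {"k": 3}]
nl_trace, hj_trace = [], []
next(nl_join(rows, [{"k": 1}], "k", nl_trace))
next(hash_join_stream(rows, [{"k": 1}], "k", hj_trace))
print(len(nl_trace), len(hj_trace))  # 1 3
```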

Note: compiled from material found online.

NLJ:
Using the join key, each row of the small table is compared with the rows of the large table. An index is usually built on the large table's join key.
Cost: read the rows of the small table + (each row of the small table × reading the matching rows of the large table)

SMJ:
Read the rows of the small table and the large table, sort each by the join key, then join the two sorted data sets.
Ideal case: both sorts can be done in memory.
General case: two phases:
1. Sort run phase: data is read into memory, sorted, and written out to the temporary tablespace, until all row sources have been sorted.
2. Merge phase: the sorted data written to the temporary tablespace (the sort runs) is read back into memory and merged.
Cost: read small-table rows + write small-table sort runs to the temporary tablespace +
read large-table rows + write large-table sort runs to the temporary tablespace +
CPU cost of sorting the small and large tables
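The sort run and merge phases can be sketched with Python's `heapq.merge`. This is a minimal illustration, assuming a "memory" that holds `run_size` rows (a hypothetical parameter) and in-memory lists standing in for sort runs in the temporary tablespace; each SMJ input would go through these phases before the merge join itself.

```python
import heapq

def sort_run_phase(rows, run_size, key=None):
    """Sort run phase: sort memory-sized chunks; each sorted chunk stands in
    for one run written to the temporary tablespace."""
    return [sorted(rows[i:i + run_size], key=key) for i in range(0, len(rows), run_size)]

def merge_phase(runs, key=None):
    """Merge phase: read the runs back and merge them into one sorted stream."""
    return list(heapq.merge(*runs, key=key))

keys = [8, 3, 10, 1, 4, 8, 1, 5, 3, 4]
runs = sort_run_phase(keys, run_size=4)   # memory holds 4 rows (assumption)
print(runs)               # [[1, 3, 8, 10], [1, 4, 5, 8], [3, 4]]
print(merge_phase(runs))  # [1, 1, 3, 3, 4, 4, 5, 8, 8, 10]
```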

Parallelism in joins:
Parallel execution can be used with NLJ and SMJ. The execution plan of a parallel query is a tree structure (DFO), and each DFO node in the tree is a SQL operation that can be assigned to a query slave process.

Hash join:
For equality join conditions, a hash join is more efficient than an SMJ, and more efficient than an NLJ when the index's BLEVEL is high; a hash join also does not require an index.
The basic hash join algorithm builds a hash table in memory. The small table is called the build input and, ideally, fits in memory; the large table is called the probe input.
In practice, the build input does not have to fit entirely in memory: like the probe input, the overflow portion of the build input is split by a hash function into small, disjoint partitions on disk.
A hash join runs in two phases:
1. Partitioning phase: if the build input does not fit in memory, both it and the probe input are split on disk into small, disjoint partitions using a hash function.
2. Join phase: the build input and probe input partitions that correspond to the same hash key values are paired and joined one pair at a time.
This hash join algorithm is also known as the Grace join.
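The two phases of the Grace join can be sketched in Python. This is a minimal illustration, assuming rows are plain join-key values, a hypothetical fan-out of 4, and Python's built-in `hash` as the partitioning function; lists stand in for the disk partitions. The input values are those of tables S and B in the worked example below.

```python
from collections import defaultdict

def partition(values, fanout):
    """Partitioning phase: send each join key to one of `fanout` disjoint partitions."""
    parts = [[] for _ in range(fanout)]
    for v in values:
        parts[hash(v) % fanout].append(v)
    return parts

def grace_hash_join(build, probe, fanout=4):
    """Join phase: join matching (build, probe) partition pairs one at a time."""
    result = []
    for b_part, p_part in zip(partition(build, fanout), partition(probe, fanout)):
        if not b_part or not p_part:
            continue                          # an empty side cannot produce matches
        table = defaultdict(list)             # hash table for this build partition only
        for v in b_part:
            table[v].append(v)
        for v in p_part:
            result.extend((m, v) for m in table.get(v, []))
    return result

S = [1, 1, 1, 3, 3, 4, 4, 4, 4, 5, 8, 8, 8, 8, 10]
B = [0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 8, 9, 9, 9, 10, 10, 11]
print(len(grace_hash_join(S, B)))  # 20
```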

Limitation of the hash algorithm: it assumes that the hashed values have low skew, so that each partition holds roughly the same number of rows. In practice, it is impossible to guarantee that every partition has about the same number of rows.

Hybrid hash join is the more efficient hash algorithm that Oracle has used since 7.3. It builds on the Grace join and keeps as much of the build input in memory as possible.
Because equal partition sizes still cannot be guaranteed, further techniques were introduced later, such as bit-vector filtering, role reversal, and histograms. These techniques are discussed later.
The number of partitions is called the fan-out. Too large a fan-out produces many partitions and hurts I/O; too small a fan-out produces a few large partitions that cannot fit in hash memory. Choosing an appropriate fan-out and partition size is therefore the key to tuning hash joins.

If, during partitioning, neither the build input partition nor the probe input partition fits in memory, the overflowing part of the hash table is handled with a nested-loops hash join.
The hash table is built from part of the build input partition (the part that fits in memory) and joined with the entire probe input partition. The rest of the build input that is not in memory is fetched in further iterations until all of the build input has been processed.

Hash join rules:
Suppose there are two tables:
S = {1, 1, 1, 3, 3, 4, 4, 4, 4, 5, 8, 8, 8, 8, 10}
B = {0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 8, 9, 9, 9, 10, 10, 11}
First, hash_area_size determines whether the small table can serve as an in-memory build table. If the build input does not fit entirely in memory, it is partitioned; the number of partitions is called the fan-out.
The fan-out is determined by hash_area_size and the cluster size, where the cluster size is the run of contiguous blocks of a partition accumulated in memory before being written out to the temporary tablespace.
Cluster size = db_block_size × hash_multiblock_io_count; hash_multiblock_io_count is a hidden parameter in Oracle9i.
The hash function divides tables S and B into disjoint buckets, also called partitions, and tries to minimize data skew so that the data is distributed as evenly as possible.
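The fan-out bound given in step 1 of the algorithm below (number of partitions × cluster size ≤ usable share of the hash area) can be evaluated with a small Python helper. The parameter values and the 80% usable fraction below are hypothetical examples, not Oracle defaults, and `max_fanout` is an illustrative name.

```python
def cluster_size(db_block_size, hash_multiblock_io_count):
    """Cluster size in bytes = db_block_size * hash_multiblock_io_count."""
    return db_block_size * hash_multiblock_io_count

def max_fanout(hash_area_size, db_block_size, hash_multiblock_io_count, usable_fraction):
    """Largest fan-out satisfying: fanout * cluster size <= usable_fraction * hash_area_size.
    `usable_fraction` is a hypothetical stand-in for the share of the hash area
    that may hold partition clusters."""
    usable = int(usable_fraction * hash_area_size)
    return usable // cluster_size(db_block_size, hash_multiblock_io_count)

# Hypothetical settings: 1 MB hash area, 8 KB blocks, 8-block clusters, 80% usable.
print(cluster_size(8192, 8))              # 65536
print(max_fanout(1048576, 8192, 8, 0.8))  # 12
```

Halving the cluster to 4 blocks raises the possible fan-out to 25 in this example, which is the fan-out versus partition size trade-off described earlier.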

Using tables S and B above, and assuming for simplicity that the hash function takes the remainder modulo 10:
S's partitions are: {0,1,3,4,5,8}
B's partitions are: {0,1,2,3,8,9}
After partitioning, only corresponding partitions need to be joined (the so-called partition pairs); if either partition of a pair is empty, that pair can be skipped. Here, joins are needed only on partitions 0, 1, 3, and 8.
By comparison, an SMJ or NLJ would spend far more on the join.
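The mod-10 partitioning of S and B can be checked with a few lines of Python. The toy `v % 10` hash is the simplifying assumption from the text, not Oracle's internal hash function.

```python
# Tables S and B from the example above (join key values only).
S = [1, 1, 1, 3, 3, 4, 4, 4, 4, 5, 8, 8, 8, 8, 10]
B = [0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 8, 9, 9, 9, 10, 10, 11]

def partition(values, modulus=10):
    """Toy hash function: partition number = value % modulus."""
    parts = {}
    for v in values:
        parts.setdefault(v % modulus, []).append(v)
    return parts

s_parts, b_parts = partition(S), partition(B)
print(sorted(s_parts))                          # [0, 1, 3, 4, 5, 8]
print(sorted(b_parts))                          # [0, 1, 2, 3, 8, 9]
print(sorted(s_parts.keys() & b_parts.keys()))  # [0, 1, 3, 8]: the partition pairs to join
```

Within each pair a row must still match on the actual key (10 and 0 share partition 0, for example), which the join step checks.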

When the build input is read into hash area memory for partitioning, a bitmap vector is built from the distinct join key values of the build input.
In the example above, the bitmap vector is: {1,3,4,5,8,10}.
When the large table (probe input) is partitioned, the bitmap vector is used to decide which rows are needed and which are not; this is bit-vector filtering.
When table B is partitioned, each join key value is checked against the bitmap vector, and rows whose value is not present are discarded.
In this example, the following values in table B are discarded: {0,0,2,2,2,2,2,2,9,9,9,11}.
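Bit-vector filtering can be sketched with a Python integer used as a bitmap. This is a minimal illustration, assuming a hypothetical 64-bit vector and Python's built-in `hash` in place of Oracle's internal hash. With a hashed bitmap two keys can share a bit, so a row that passes the filter may still fail to join, but a row that would match is never discarded.

```python
S = [1, 1, 1, 3, 3, 4, 4, 4, 4, 5, 8, 8, 8, 8, 10]
B = [0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 8, 9, 9, 9, 10, 10, 11]
NBITS = 64  # hypothetical bitmap size

def build_bit_vector(keys, nbits=NBITS):
    """Set one bit per distinct build-input join key (hashed into nbits positions)."""
    bits = 0
    for k in keys:
        bits |= 1 << (hash(k) % nbits)
    return bits

def may_match(bits, key, nbits=NBITS):
    """False means the key is certainly absent from the build input."""
    return ((bits >> (hash(key) % nbits)) & 1) == 1

bits = build_bit_vector(S)
kept = [b for b in B if may_match(bits, b)]
discarded = [b for b in B if not may_match(bits, b)]
print(discarded)  # [0, 0, 2, 2, 2, 2, 2, 2, 9, 9, 9, 11]
```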

After the first pair of S and B partitions has been joined, the i-th partitions of S and B are read into memory to be joined. At this point the optimizer automatically chooses, based on partition size, which one serves as the build input and which as the probe input. This is the dynamic role reversal technique mentioned earlier.

In general, the hash join algorithm proceeds as follows (assuming the hash area is not large enough and data must be written to disk):
1. Determine the fan-out, that is, the number of partitions: number of partitions × cluster size <= fraction of the hash area usable for partitions × hash_area_size.
2. Read table S and map each join column value to a partition using an internal hash function (call it hash_fun_1). In this step, a second hash function (call it hash_fun_2) generates another value, which is stored with the join key and used later to build the hash table.
3. Build the bitmap vector from the distinct join key values of S.
4. Sort the partitions by size and load as many of them into memory as possible, smallest first, to build the hash table. Partitions that do not fit in memory are written to a temp segment.
5. Build the hash table for S using the hash_fun_2 values computed earlier.
6. Read table B and filter it with the bitmap vector: if a row's join key value is not in the bitmap vector, discard the row.
7. Partition the remaining rows of B using hash_fun_1 on the join key.
8. If a row of B falls into a partition of S that is in memory, use hash_fun_2 to locate the matching hash bucket and perform the join.
9. If the corresponding partition is not in memory, the partition of S, the join keys, and the remaining rows of B are written out to disk.
10. Read the unprocessed partitions of S and B from disk, build the hash table from the hash_fun_2 values, and apply dynamic role reversal while building. In the first pass, the optimizer uses the small table as the build input and the large table as the probe input; role reversal is used only after the first pass.
11. If even the smaller of the probe input and the build input (after role reversal) does not fit in memory, the build input is read into memory in chunks, and each chunk is hash-joined against the probe input in a loop. This is called a nested hash loops join.
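Steps 10 and 11 for a single spilled partition pair can be sketched in Python. This is a minimal illustration, assuming rows are dicts, memory is measured in rows (`memory_rows`, a hypothetical parameter), and lists stand in for partitions read back from disk; Python's dict hashing replaces hash_fun_1 and hash_fun_2, so it shows role reversal and the nested hash loops join, not Oracle's internals.

```python
from collections import defaultdict

def join_spilled_pair(s_part, b_part, key, memory_rows):
    """Join one spilled (S_i, B_i) partition pair; returns (pairs, probe passes).
    Each output pair is (row from S_i, row from B_i)."""
    # Role reversal: the smaller partition becomes the build input.
    swapped = len(b_part) < len(s_part)
    build, probe = (b_part, s_part) if swapped else (s_part, b_part)
    result, passes = [], 0
    # Nested hash loops join: hash one memory-sized chunk of the build input,
    # then scan the whole probe input against it; repeat for the next chunk.
    for start in range(0, len(build), memory_rows):
        table = defaultdict(list)
        for row in build[start:start + memory_rows]:
            table[row[key]].append(row)
        passes += 1
        for row in probe:
            for match in table.get(row[key], []):
                result.append((row, match) if swapped else (match, row))
    return result, passes

s_i = [{"k": n % 3, "src": "S"} for n in range(10)]
b_i = [{"k": n % 3, "src": "B"} for n in range(4)]
pairs, passes = join_spilled_pair(s_i, b_i, "k", memory_rows=2)
print(len(pairs), passes)  # 14 2
```

Here `passes` plays the role of n in the cycle 2 cost: each extra chunk of build input means another full read of the probe partition.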

Cost of a hash join:
1. In the simplest case, the hash area is large enough to hold all of the build input after S is partitioned:
Cost(HJ) = read(S) + build hash table in memory (CPU) + read(B) + perform in-memory join (CPU)
If CPU costs are ignored, Cost(HJ) ≈ read(S) + read(B).

2. When the hash area (of size M) is not large enough to hold the build input S, part of S is written out to disk, and so is part of table B:
Total cost ≈ Cost(HJ cycle 1) + Cost(HJ cycle 2)
where Cost(HJ cycle 1) ≈ read(S) + read(B) + write((S - M) + (B - B × M / S)).
This covers steps 2 to 9 above.

Because HJ cycle 2 uses the nested hash loops join algorithm to process partitions Si and Bi, the probe input is read once for every chunk of build input read into memory.
Cost(HJ cycle 2) is therefore ≈ read((S - M) + n × (B - B × M / S)).
This covers steps 10 to 11 above.
Here n is the number of nested hash loops join passes. n is typically greater than 10, that is, when the partition to be built is more than 10 times the size of the hash area.
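The two cost cases can be evaluated with a small Python helper that transcribes the formulas above. This is a sketch under stated assumptions: reading or writing one unit of data costs 1, CPU cost is ignored, and the sizes in the example are hypothetical.

```python
def hj_cost_in_memory(read_s, read_b):
    """Case 1, CPU ignored: Cost(HJ) ~ read(S) + read(B)."""
    return read_s + read_b

def hj_cost_spilled(S, B, M, n):
    """Case 2: returns (cycle 1, cycle 2) costs.
    S, B: build/probe input sizes; M: hash area size; n: nested hash loops passes.
    Assumes reading or writing one unit of data costs 1 (hypothetical unit cost)."""
    b_spilled = B - B * M / S                 # part of B not joined in cycle 1
    cycle1 = S + B + ((S - M) + b_spilled)    # read(S) + read(B) + write(...)
    cycle2 = (S - M) + n * b_spilled          # read((S - M) + n * (B - B*M/S))
    return cycle1, cycle2

# Hypothetical sizes (in blocks): S = 1000, B = 5000, M = 100, n = 10.
c1, c2 = hj_cost_spilled(1000, 5000, 100, 10)
print(c1, c2, c1 + c2)  # 11400.0 45900.0 57300.0
```

Compared with the 6,000 units of the in-memory case (`hj_cost_in_memory(1000, 5000)`), the spilled join costs almost ten times more, and the n × (B - B × M / S) term dominates.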
