This article discusses the mechanisms behind join technology. We know that common table joins have the following kinds of
Descartes Connection
Internal connection
Left outer connection
Right outer connection
Fully connected
As you can see in these SQL notation, how do these connected data accesses work?
Let's look at the following query
sql> alter session set optimizer_mode=rule; Session altered. sql> select ename,dname from emp,dept where emp.deptno=dept.deptno;14 Rows selected. Execution plan----------------------------------------------------------plan hash value: 3625962092------------------------------------------------| id | operation | name |------------------------------------------------| 0 | SELECT STATEMENT | | | 1 | NESTED LOOPS | | | 2 | NESTED LOOPS | | | 3 | TABLE ACCESS full | emp | | * 4 | index unique scan | pk_dept | | 5 | TABLE ACCESS BY INDEX ROWID| DEPT |------------------------------------------------predicate information (identified by operation id):--------------------------------------------------- 4 - access ("EMP"). DEPTNO "=" DEPT "." DEPTNO ")
According to our previous interpretation of the implementation plan, this query is implemented as follows:
Full table Scan EMP table (non-blocking scan, do not take all the data out before performing the next step).
The data in the EMP is fetched out by index pk_dept, and the result set becomes (ENAME,ROWID) by querying the ROWID in the index.
The result set of 2 is fetched out by ROWID to access the Dept table, and the result set becomes (Ename,dname)
Returns the result set.
This join implementation, which takes the data out of the loop, is called a nested loop.
This plan can be implemented with the following logical pseudo-code
For y in (for x in (SELECT * from emp) loop index Lookup the ROWID for X.deptno output joined record (ename, DEPT.ROWID) end loop) Loop select * FROM dept where rowid=y.rowid Output joined record (ename,dname) end loop
We refer to the EMP table as the driver table (Note that the driver table is independent of the table order of the FROM clause, mainly the execution plan).
This kind of connection method, applies to the driver table to return the data to be few, and the Deptno column has the index on the Driver table dept. If the query returns n rows, then the Dept table will be scanned n times. This connection specializes in quickly extracting the first row from the result set.
Hash join is suitable for processing large result sets, the optimizer chooses two tables or smaller in the source data, using the join key to create a hash table in memory. Then scan the large table and probe the hash table to find the matching records.
A small table is called a driver table, and a large table is called a profiling table
650) this.width=650; "src=" http://s3.51cto.com/wyfs02/M00/71/9B/wKioL1XUrsiwbid2AAHA4Wz1Vxc804.jpg "title=" hash Join.jpg "alt=" Wkiol1xursiwbid2aaha4wz1vxc804.jpg "/>
This is best when hash table can be put all in memory. If the hash table is not placed in memory, the optimizer will partition the hash table and the partition beyond the memory range will be written to the temporary tablespace.
We discuss the implementation of the hash join in two different situations.
Hash table All in memory
Hash table is a hash function used by Oracle to split a small table into buckets based on join key benefit. After the hash table is established, Oracle scans the large table and uses the same hash algorithm to divide the read data into buckets. The join operation between the bucket and the bucket returns the result. Until the big table is finished reading.
2. Hash table cannot be put into memory
Using the algorithm, the small table is partitioned first (partition) so that a partition can be loaded into memory. The large table is also made the same partition. The partition of the table and the partition are taken out of the data in a 1 implementation manner. After you finish a partition, load another partition into memory and continue the same operation until the data is all processed.
sql> insert into big_emp select * from big_emp; sql> insert into big_emp select * from big_emp; #重复执行多次SQL > / 458752 rows created. sql> create table dept_new as select * from dept; Table created. Sql> set autot traceonlysql> select * from big_emp a,dept_new b where a.deptno=b.deptno;917504 rows selected. Execution plan----------------------------------------------------------plan hash value: 1925493178-------------------------------------------------------------------------------| id | operation | name | rows | bytes | Cost (%CPU) | time |--------------------------------------- ----------------------------------------| &Nbsp; 0 | select statement | | 917k| 54m| 1490 (2) | 00:00:18 | | * 1 | hash join | | 917k| 54m| 1490 (2) | 00:00:18 | | 2 | TABLE ACCESS FULL| DEPT_NEW | 4 | 120 | 3 (0) | 00:00:01 | | 3 | TABLE ACCESS FULL| BIG_EMP | 917k| 28m| 1482 (1) | 00:00:18 |------------------------------ -------------------------------------------------predicate information (identified by Operation id):--------------------------------------------------- 1 - access ("A". ") DEPTNO "=" B "." DEPTNO ") Note----- - dynamic sampling used for this statement (level=2) Statistics---------------------------------------------------------- 4 recursive calls 1 db block gets 66338 consistent gets 0 physical reads 0 redo size 62512398 bytes sent via sql*net to client 673349 bytes received via SQL*Net from Client 61168 sql*net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 917504 rows processed
This article is from the "Ding Dong" blog, please be sure to keep this source http://lqding.blog.51cto.com/9123978/1686257
Oracle Learning Performance Optimization (vii) How join IS implemented