Oracle table Join mode analysis __oracle

Source: Internet
Author: User
Tags joins first row

first, the introduction

Data Warehouse technology is a more mature and widely adopted solution, it is used for all the dispersed raw business data in the whole telecom operation Enterprise, and it can support different departments, different needs and different levels of users to obtain the information they need at any time by means of convenient and effective data access. Data Warehouse system needs to be able to track and analyze a large number of historical data in time, and can make analysis and prediction in time, so real-time is a very important indicator. due to the characteristics of reliability, high performance and so on, ORACLE has been in the background of most of the data Warehouse system in the telecom industry . the role of the database . Due to the characteristics of the telecommunications industry, the amount of data processed is very large and the processing time is long. Especially for the association operation between large table, some large table record number reaches hundreds of millions of, processing time is long, this becomes the main factor that affects the database running efficiency. Therefore, it is very important to optimize the performance of the database. Performance optimization is a big issue that needs to be taken into account, starting with servers, disks, networks, Oracle instances, Oracle SQL , and more. This paper focuses on the analysis of the table connection mode, characteristics and application scope of Oracle SQL optimization which has great influence on system performance, and discusses how to use and optimize it.

Second, the table connection

A table's connection is the retrieval of related data from one or more tables in an SQL statement by an association between tables and tables. A connection is achieved through the multiple table names of the FROM clause in the SQL statement, and the join conditions between the tables defined in the WHERE clause. If there are more than two association tables for an SQL statement, then what is the order of the joins? Oracle first connects the two tables, produces a result set, then associates the resulting result set with the next table, continues until all the tables are connected, and finally produces the data that is required. For example, the following are two-table connections:

CREATE TABLE User_info (user_name char (), user_id char (10));

CREATE TABLE Dev_info (Dev_no char (), user_id char (Ten), Dev_type char (10));

Describes and analyzes the various ways in which tables are connected.

ORACLE starts with version 6, the optimizer uses 4 different tables to connect: nested loop joins (NESTED loop join) cluster connections (CLUSTER join) Sort merge joins (sort merge Join) Cartesian joins (Cartesian join In Oracle 7.3, new Hash joins (hash join) are added to Oracle 8, and new index joins are added (indexed join)

These six kinds of connection methods have their own unique technical characteristics, under certain conditions, can give full play to high-performance performance.

But they also have its limitations, if used improperly, not only can not improve efficiency, but will seriously affect the performance of the system. Therefore, it is necessary to deeply discuss the internal operating mechanism of the connection mode for performance optimization.

1. Nested Loops connection

The process of internal processing of nested loops of connections:

1 The Oracle Optimizer selects one of the two tables as the driving table and specifies that it is an external table, based on the principles of rule-based Rbo or cost based CBO.

2 The Oracle optimizer then assigns another table as an internal table.

3 Oracle reads the first row from the external table and then contrasts it with the data in the internal table, and all matching records are placed in the result set.

4 Oracle reads the second row in the external table, compares it to the data in the internal table, and adds all the matching records to the result set.

5 Repeat the above steps until all the records in the external table are processed.

6 finally produce the result set satisfying the requirement.

You can see which table is an external table and which is an internal table by querying the execution plan of the SQL statement.

such as select A.user_name,b.dev_no from User_info A, dev_info b where a.user_id = b.user_id;

The table above is the external table, which is the driver table. The following table is the execution plan for the internal table:

SELECT STATEMENT Optimizer=choose
NESTED LOOPS
TABLE ACCESS (Full) ' User_info '
TABLE ACCESS (Full) ' Dev_info '

using nested loops joins is one of the quickest ways to extract the first batch of records from a result set. Nested loops join effects are ideal when driving a row source table (that is, the record being looked for) is small, or if the columns connected to the internal Row source table have a unique index or a highly optional, not unique index. Nested loops joins have an advantage over other join methods, and they can quickly extract the first batch of records from the result set without waiting for the entire result set to be completely finalized. In this way, in an ideal situation, end users can view the first batch of records through the query screen while simultaneously reading other records. The nested loops join is very flexible, regardless of how the condition or pattern of the connection is defined, and any two rows of record sources can use nested loops.

However, nested loop joins are inefficient if the Inner Row source table (the second table that is read) does not contain an index on the connected column, or if the index is not highly selectable. If the record of the driver table is very large, other connection methods may be more efficient.

You can force the Oracle optimizer to produce an execution plan for nested loops of connections by adding hints to the SQL statement.

2. Cluster Connection (CLUSTER join)

Cluster connections are actually a special case of nested loops joins. If the two source tables that are connected are tables in the cluster, where two tables belong to the same segment (SEGMENT), Oracle can use cluster connections. The process is that Oracle reads the first row from the first Row source table and then uses the cluster index in the second Zhangxing source table to find the records that can be matched; Continue with the steps above to process the second row in the Row source table until all the records have been processed.

Cluster connections are highly efficient because two of the row source tables participating in the connection are actually on the same physical block. However, there are limits to cluster connectivity, and it is not possible for two tables with no cluster to connect to the cluster. Therefore, cluster connections are actually rarely used.

3. Sort merge joins (sort merge Join)

To sort the process of merging internal processing of a merge connection:

1 The optimizer determines whether the first source table has been sorted, if it has been sorted, to step 3rd, otherwise to step 2nd.

2 The first source table sorting

3 The optimizer determines whether the second source table has been sorted, if it has been sorted, to step 5th, otherwise to step 4th.

4 The second source table sorting

5 The two source tables that have been ordered are merged and the final result set is generated.

When the data is not selective or available, or if the two source tables are too large (the selected data exceeds 5% of the table record number), the sorted merge join will be more efficient than the nested loops link.

Arranging merge joins requires large temporary memory blocks for sorting, which will result in more memory and disk I/O in the temporary table space.

Select A.user_name,b.dev_no from User_info A, dev_info b where a.user_id > b.user_id;
Plan
--------------------------------------------------
SELECT STATEMENT optimizer=choose (cost=7 card=336 bytes=16128)
MERGE JOIN (cost=7 card=336 bytes=16128)
SORT (JOIN) (cost=4 card=82 bytes=1968)
TABLE ACCESS (Full) ' User_info ' (cost=2 card=82 bytes=1968)
SORT (JOIN) (cost=4 card=82 bytes=1968)
TABLE ACCESS (Full) ' Dev_info ' (cost=2 card=82 bytes=1968)

You can force the Oracle optimizer to produce an execution plan for a sorted merge connection by adding hints to the SQL statement.

Select/*+ Use_merge (a b) * * A.user_name,b.dev_no from User_info A, dev_info b where a.user_id > b.user_id;

The sort merge join is based on Rbo.

  4. Cartesian connection (Cartesian join)

A Cartesian connection is a condition in which a table connection is not written in an SQL statement, and the optimizer joins each record of the first table with all the records in the second table. If the number of records in the first table is M and the second table has a record of M, the number of M*n records will be generated.

The following query, without naming the join condition, produces a Cartesian connection.

Select A.user_name,b.dev_no from User_info a, dev_info b;

Because the Cartesian connection can result in poorly performing SQL, it is generally rarely used.

5. Hash connection

Hash connections are a common choice for Oracle Optimizer when memory can provide enough space. In a hash connection, the optimizer first selects a small table in two tables based on statistics. A hash table based on the join key is created in memory, the optimizer scans the large table in the table connection, compares the data in the large table with the hash, and adds the data to the result set if there are associated data.

A hash join works best when a small table in a table connection can be completely cache to available memory. The cost of a hash connection is only the cost of two tables being read from the hard disk into memory.

However, if the hash table is too large to cache all available memory, the optimizer will divide the hash table into multiple partitions, and then cache the partitions into memory one at a time. When the partition of a table exceeds the available memory, part of the data in the partition is temporarily written to the temporary tablespace on the disk. As a result, the partition's data is written to disk, and the larger interval (EXTENT) improves I/O performance. The interval for Oracle's recommended temporary table space is 1MB. The interval size of the temporary table space is specified by uniform size.

When the hash table is built, do the following:

1) Second large table for scanning

2 If the large table can not be completely cache to the available memory, the large table will also be divided into many partitions

3 The first partition of large table cache to memory

4 The data of the first partition of the large table is scanned, and compared with the Hashtable, if there is a matching record, added to the result set inside 5, like the first partition, the other partitions are similar.

6 after all the partitions are processed, Oracle merges and aggregates the resulting set of results to produce the final result.

When the hash table is too large or the available memory is limited, the hash table cannot be completely cache memory. As the result set that satisfies the join condition increases, the available memory drops, and the data that is already cache memory may be written back to the hard drive. If this happens, the performance of the system will decrease.

The optimizer may use a hash connection when the two tables connected are connected with an equivalent value and the table has a large amount of data. The hash join is based on the CBO. Oracle uses a hash edge connection only if the database initialization parameter hash_join_enabled is set to true and a sufficiently large value is set for the parameter pga_aggregate_target. Hash_area_size is a backward-compatible parameter, but hash_area_size should be used in previous versions of Oracle9i. When you use the ordered prompt, the first table in the FROM clause is used to create a hash table.

Select A.user_name,b.dev_no from User_info A, dev_info b where a.user_id = b.user_id;
Plan
----------------------------------------------------------
0 SELECT STATEMENT optimizer=choose (cost=5 card=82 bytes=3936)
1 0 HASH JOIN (cost=5 card=82 bytes=3936)
2 1 TABLE ACCESS (full) ' User_info ' (cost=2 card=82 bytes=1968)
3 1 TABLE ACCESS (full) ' Dev_info ' (cost=2 card=82 bytes=1968)

You can force the Oracle optimizer to produce a hash connection execution plan by adding hints to the SQL statement.

Select/*+ Use_hash (a b) * * A.user_name,b.dev_no from User_info A, dev_info b where a.user_id = b.user_id;

A hash connection is more efficient than a nested loop connection when a useful index is missing. Hash joins can also be faster than nested loops, because processing in-memory hash tables is quicker than retrieving b_ tree indexes.

6. Index Connection

If a set of existing indexes contains all the information that the query requires, the optimizer will selectively generate a set of hash tables in the index. You can access each index through a range or fast global scan, and the choice of scan depends on the conditions in the WHERE clause. This is very effective when you have a large number of columns in a table and you want to access only limited columns. The more a WHERE clause is constrained, the faster it executes. Because the optimizer evaluates the optimization path of the execution query, it treats the constraint as an option. You must index the appropriate columns (those that satisfy the entire query), so that you can ensure that the optimizer has an indexed connection as one of the optional options. This task usually involves adding indexes to columns that do not have an index, or that have not previously established a federated index. Compared to fast global scans, the advantage of connecting indexes is that fast global scans have only one single index to satisfy the entire query, and index connections can have multiple indexes that satisfy the entire query.

Suppose there are two dev_info on the table (one in Dev_no, one on Dev_type).

Make the following query

Select Dev_no,dev_type
From User_info
where user_id = ' U101010 '
and Dev_type = ' 1010 ';

comparison of several main table joins

category

Nested loops Join

sort Merge Joins

Hash Connection

Optimizer hints

Use_nl

Use_merge

Use_hash

Conditions of Use

any connection

mainly used for unequal price connection, such as <, <=, >, >=;

but not including <>

for equivalent connections only

Related Resources

CPU, disk I/O

memory, temporary space

memory, temporary space

features

It is more efficient to return to the first search results quickly when there is a high selectivity index or a restrictive search.

sort merge joins are more efficient than nested loops when the index is missing or the index condition is blurred.

a hash join connection is more efficient than a nested loop when the index is missing or the index condition is blurred. Usually faster than the sort merge connection.

in a data warehouse environment, if the table has more records, the efficiency is high.

Disadvantage

When the index is lost or the query condition is not limited, the efficiency is very low, while the number of records in the table is low.

All tables need to be sorted. It is designed for optimized throughput and does not return data until the results are found.

a large amount of memory is required to create a hash table. The results of the first time return slower.

Iv. Concluding remarks

Deep Understanding and mastery of Oracle table joins is critical to optimizing database performance. Due to the different selection of optimizer, the lack of statistic information and inaccurate statistic information, it is not necessarily optimal for Oracle to choose the table join mode automatically. When the execution of SQL statements is inefficient, the execution plan can be tracked and analyzed through auto trace. When multiple table joins occur, you need to carefully analyze whether there are better connection conditions. Depending on the characteristics of the system, you can add hints to SQL if necessary to change the execution plan of SQL to achieve performance optimization.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.