Physical connection of a relational database

Source: Internet
Author: User
Tags joins one table

The left Join,inner join that is often heard in development, right join, and cross join are logical connections to the database, so how does the database implement the associated table at the bottom when the database is executing these connections? This is the physical connection.
Most relational databases currently support physical connections to 3 physical connections (MySQL appears to be 5.6 unknown to a nested connection only)
Database: 3 methods are considered when a database is connected to two tables: nested joins, merge connections, Hash joins. 3 Methods of connection have a table selected as the appearance (only once), a table is selected as the inner table (in-pit scan multiple times). If the query is connected to more than one table, each query can only connect 2 tables and save the intermediate results, and the user joins the other table.
1, nested connections: the appearance is scanned only once, and each row for the appearance needs to be scanned for the inner table. (Note: The connection cannot use a long or LOB field)
2, merge connections: A merge connection requires an equality junction word (A.col1=b.col1), and two connected tables are ordered based on the connection column. Merge connections both the table and the appearance are scanned only once, and sometimes you need to scan some of the values in the inner table again, for example, if there are duplicate values on the outside. Therefore, the optimizer usually chooses a table with a lower repetition value as its appearance.
3, hash join: A hash join requires one or more equality join predicates, where each predicate has the same type. The char type must be the same length. Decimal progress must be the same. A hash join can handle multiple equality predicates. For a hash join, the inner table (also known as the Component table build tables) is scanned first, and the rows in the table are copied to the memory buffer. Depending on the hash key, these buffers are divided into partitions, and some partitions are written to a temporary table on disk if there is not enough space in the memory to accommodate the entire table. Then scan the exterior. Then scan the exterior (probe table). For each row in the appearance, the consent hashing algorithm is applied to the join column. If the obtained hash key matches the hash key of the row, read all rows inside the partition and apply the join verb. If the partition that matches the probe table row is in memory, reading the partition and applying the connection verb will take place immediately. If the partition is written to a temporary table, the probe row is also not written to the staging table. Finally, the matching uniform partition rows and corresponding partitions are processed together.


The optimizer uses 3 connected scenarios:
Nested connections:
The nested loop connection is a good choice for cases where the subset of data being connected is small. Nested loop is to scan a table, each read to a record, according to the index to the other table to find, no index is generally not nested loops.
Generally in the nested loop, the driver table satisfies the condition result set is not big, the driver table's connection field must have the index, thus goes nstedloop. If the driver table returns too many records, it is not suitable for nested loops. If the connection field does not have an index, it is appropriate to walk the hash join because no index is required.
Inner table is driven by the outer table, and each row returned by outer table is retrieved in Inner table to match the row.
Outer table: Small table, Driver table
Inner table: driven, Big table

Hash join: A hash join is a common way to enlarge a dataset connection. The optimizer scans the small table (or data source), uses the connection key (that is, calculates the hash value based on the Connection field), establishes a hash table in memory, and then scans the large table, each time a record is read to probe the hash table, and the rows matching the hash table are found.
When a small table can be put all in memory, its cost is close to the sum of the cost of a full table scan of two tables. If the table is large and cannot be completely put into memory, then the optimizer splits it into several different partitions, which cannot be put into the memory section to write the partition to a temporary segment of the disk, with a large temporary segment to maximize I/O performance. Partitions in a temporary segment need to be swapped into memory for a hash join. At this time the cost is close to the full table Scan small table + partition number * Full table scan large table cost and.
As for the partitioning of two tables, the benefit is that you can use parallel query, which is where multiple processes join and then merge with different partitions. But complex.
Hash join may have advantages under the following conditions:
A connection between two huge tables.
A connection between a huge table and a small table.

Merge connections:
Make table access full for each table that is connected;
The results of table access full are sorted by the connection key;
The merge join merges the sorting results.
The Sort Merge Join performance cost is almost always in the first two steps. The fastest way to merge connections is compared to the first two connections. But also look at the database index and data volume situation.

In general, the effect of a hexi connection is better than a merge connection, but if the row source has been ordered and no reordering is required to perform the merge connection, the performance of the merge connection will be better than the hash connection.
Merging can be better than nesting performance in cases where full table scan peso range scanning is preferable to table access through ROWID.

Each type of relational database has parameters that can be used to force the specified connection to be tested ...

Physical connection of a relational database

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.