International - English

Cart Console

Topic Center

Contact Sales

Home > Developer > SQL Server

I/O cost for the three large SQL Server algorithms

Last Update:2017-02-28 Source: Internet

Author: User

Tags hash join joins

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

1. Nested loop join (nested cyclic join)

Algorithm:

The idea is quite simple and straightforward: for each tuple R of relation R, it is directly compared with each tuple s of relation s in the field of the join condition, and the eligible tuples are filtered out. The pseudocode is written in the following code:

Price:

The order of the joined tables in the inner or outer layers has a very important impact on disk I/O overhead. The CPU overhead is relatively small, mainly because the cost of the tuple read into memory (in-memory) is O (n * m)

For I/O overhead, according to the page-at-a-time prerequisites, I/o cost = m + M * N,

The cost of I/O is the overhead of reading I/O cost of M page + I/o cost reading n pages.

2. Sort-merge Join (sort merge Join)

Nested loop is generally inefficient when two sets are large, and sort-merge is much more efficient in this case, especially if there is a clustered index (clustered index) on the join field of the two collection. Sort-merge performance will be best achieved.

Algorithm:

The basic idea is also very simple (review the data structure of the merge sort bar), there are two main steps:

A. Sort by join Field

B. Merge the two sets of sorted collections and compare them with each other from the source side (you need to do a special "Partition" based on whether there are duplicate values in the Join field)

Cost: (mainly I/O overhead)

There are two factors around the cost of Sort-merge: whether the join field is sorted and how many duplicate values are on the join field.

In the best case (two columns are sorted and at least one column has no duplicates): O (n + m) only needs to scan each of the two sets. (The m,n here is better if you can use the index.)

Worst case (two columns are not sorted and all values on both columns are the same): O (n * log n + M * Log m + n * m) two times sorted and a Cartesian product between all tuples

3. Hash join (Hash joins)

A Hash join is essentially similar to the processing thought of a sort-merge-partition (patitioning), when both columns have duplicate values. But they are also different: Hash joins are partitioned by hash partitions (each bucket is a partition) and sort-merge by sorting (each duplicate value is a partition).

It is noteworthy that the large difference between the hash join and the above two algorithms is also a large limitation that it can only be applied to the equivalent join (equality join), which is mainly due to the deterministic and unordered nature of the hash function and its bucket.

Algorithm:

The basic hash join algorithm consists of the following two steps:

With nested loop, build input is located above the execution plan, probe input is located below.

The hash join operation is done in two phases: the build (construction) phase and the probe (probing) phase.

A.build Input Phase: Based on the join field, the hash function is used to build an in-memory (in-memory) hash table with the same key value as the linked list of a bucket (bucket).

B.probe Input Phase: A hash table is checked on a larger R set to complete the join.

Price:

It is noteworthy that for each tuple r of a large set R, each tuple in the bucket of the corresponding r in the hash bucket needs to be compared with R, which is where the algorithm is most time-consuming.

CPU overhead is O (M + n * b) b is the average number of tuples per bucket.

Summarize:

Three join methods, all with two inputs, are optimized for the basic principles:

1. Avoid large data hash join, (hash join suitable for low concurrency, he occupies memory and Io is very large);

2. Try to convert it into an efficient merge join, nested loop join. Possible tools are table structure design, index tuning design, SQL optimization, and business design optimization.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

Related Keywords:

mcsa sql server certification cost websites for three ways import large sql file large emojis for android litespeed for sql server sql manager for sql server navicat for sql server

SQL Server 2005 32-bit + 64-bit, Enterprise Edition + Standar... 12-24

pymssql calling SQL Server stored procedure with output param... 09-07

Nolock, HOLDLOCK, UPDLOCK, TABLOCK, Tablockx in SQL Server 08-28

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

What's Trending

Top 10 Tags

datastax versions naming convention zookeeper client class definition md5 microsoft sql server 2005 data structures exception handling error handling

Top 10 Keywords

microsoft download center down wordpress address url site address url wordpress address url windows installer 4 0 download 302 not found web address url definition site address url wordpress db2 integer mac os installation step by step pdf abbreviation for return

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

I/O cost for the three large SQL Server algorithms

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support