How relational databases work (iv)

Source: Internet
Author: User
Tags: joins, cpu usage

Query optimization:

Modern databases optimize queries with cost-based optimization (see part one). The idea is to assign a cost to each basic operation and then pick the sequence of operations with the lowest total cost.

To keep things simple, this article measures queries by their time complexity and sets aside CPU consumption, memory footprint, and disk I/O. Keep in mind that in a real database the bottleneck is more often disk I/O than CPU.


B+ trees, bitmap indexes, and so on are common index implementations. Each implementation has different memory, disk I/O, and CPU costs. Some modern databases can also create temporary indexes.

Ways to get the data (access paths):

Before a join can run, the data has to be fetched. The common access paths are listed below. The real cost of fetching data is disk I/O, so disk I/O is what should be measured when comparing access paths.

Full Scan

A full scan (or simply a scan) reads an entire table or an entire index. A full table scan costs significantly more disk I/O than a full index scan.

Range Scan

Suppose the age column has an index. The index is used when the SQL contains a range predicate such as WHERE age > 20 (a BETWEEN ... AND is rewritten into < and > during the query parsing stage described earlier). As shown in part one, the complexity of a range scan is log(N) + M, where N is the number of entries in the index and M is the number of rows inside the range. Because only part of the index is read, a range scan costs less disk I/O than a full index scan.
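The log(N) + M behaviour can be illustrated with a small Python sketch. It assumes a sorted list stands in for the index, and the bounds 20 and 40 are picked only for illustration; `range_scan` and `ages_index` are hypothetical names, not part of any database API.

```python
from bisect import bisect_left, bisect_right

def range_scan(sorted_keys, low, high):
    """Return the keys k with low < k < high from a sorted list standing in for an index."""
    start = bisect_right(sorted_keys, low)  # first position with key > low, O(log N)
    end = bisect_left(sorted_keys, high)    # first position with key >= high, O(log N)
    return sorted_keys[start:end]           # copy the M keys inside the range

# Hypothetical index content: ages in sorted order
ages_index = [18, 20, 21, 25, 28, 28, 33, 40, 41]
result = range_scan(ages_index, 20, 40)
print(result)
```

The two binary searches locate the edges of the range, and only the entries between them are read, which is why the cost depends on M rather than on the size of the whole index.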

Unique Scan

If you want to take only one value from the index, you can use a unique scan.

Access by row ID

When a query needs columns that are not in the index, the row IDs stored in the index are used to fetch the rows from the table. For example, to find the names of the people with age=28 (age is indexed, name is not):

SELECT name FROM ... WHERE age = 28;

The query above runs as follows: scan the index on age to find all entries with age=28, then use the row IDs stored in those entries to look up the name column in the table. In other words, read the index, then read the table. The following query, however, does not need to read the table (name has an index):

Select  from where =;

Access by row ID works well when only a few rows are needed. When a large number of rows are fetched this way, it costs about as much as a full scan.
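The index-then-table access pattern can be sketched in Python. This is a toy model under stated assumptions: a dict plays the table, a second dict plays the index on age, and the names `table`, `age_index`, and `names_with_age` as well as the sample rows are hypothetical.

```python
# Hypothetical table: row ID -> (name, age). Only age is indexed.
table = {0: ("Ann", 28), 1: ("Bob", 35), 2: ("Cid", 28), 3: ("Dee", 41)}

# Secondary index on age: age -> list of row IDs
age_index = {}
for row_id, (_name, age) in table.items():
    age_index.setdefault(age, []).append(row_id)

def names_with_age(age):
    row_ids = age_index.get(age, [])                 # step 1: read the index
    return [table[row_id][0] for row_id in row_ids]  # step 2: one table access per row ID

print(names_with_age(28))
```

Each matching row ID triggers a separate table access, which is why this path gets expensive when many rows match.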

Other access paths

There are other access paths as well; Oracle is one database to look at for examples.


Once the data has been fetched, it has to be joined. This section covers three join operators: merge join, hash join, and nested loop join. It also introduces two concepts, the inner relation and the outer relation. In a relational database, a relation can be a table, an index, or the intermediate result of a previous operation.

When two relations are joined, the join algorithm treats them differently. In this article:

The relation on the left of the join operator is called the outer relation;

The relation on the right of the join operator is called the inner relation.

For example, in A JOIN B, A is the outer relation (often called the outer table) and B is the inner relation (often called the inner table). In most cases the cost of A JOIN B is not the same as the cost of B JOIN A. This section assumes the outer relation has N elements and the inner relation has M elements (a real database knows these numbers from its statistics, as described in the previous part).

Nested loop join

Fig. 11

It works in two steps:

    1. Read the outer relation row by row
    2. For each of those rows, check every row of the inner relation for a match on the join condition

  Pseudo Code:

nested_loop_join(array outer, array inner)
  for each row a in outer
    for each row b in inner
      if (match_join_condition(a, b))
        write_result_in_output(a, b)
      end if
    end for
  end for

The time complexity is clearly O(N*M). In terms of disk I/O, the inner loop reads M rows for each of the N rows of the outer relation, so the algorithm reads N + N*M rows from disk. If the inner relation is small enough to be kept in memory, only N + M reads are needed, which is why the smaller relation is usually chosen as the inner relation.

This improves disk I/O, but the time complexity does not change. To reduce disk I/O further, the inner relation can also be replaced by an index.
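The N + N*M read count can be checked with a short Python sketch. It assumes in-memory lists stand in for relations on disk and counts every row visited as one read; `nested_loop_join`, `match`, and the sample data are illustrative names only.

```python
def nested_loop_join(outer, inner, match):
    """Naive nested loop join that also counts how many rows are read."""
    result = []
    rows_read = 0
    for a in outer:
        rows_read += 1          # one read per outer row: N in total
        for b in inner:
            rows_read += 1      # the inner relation is re-read for every outer row: N*M in total
            if match(a, b):
                result.append((a, b))
    return result, rows_read

outer = [1, 2, 3, 4]   # N = 4
inner = [2, 4, 6]      # M = 3
pairs, rows_read = nested_loop_join(outer, inner, lambda a, b: a == b)
print(pairs, rows_read)
```

With N = 4 and M = 3 the counter ends at 4 + 4*3 = 16, matching the formula in the text.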

  The algorithm can be improved by keeping as much data in memory as possible. The basic idea:

    1. Instead of reading both relations row by row, read them in bunches (groups of rows) and keep two bunches, one from each relation, in memory;
    2. Compare the rows of the two bunches in memory and keep the rows that satisfy the join condition;
    3. Load the next bunches from disk and repeat until every pair of bunches has been compared.

Pseudo code

// improved version to reduce the disk I/O
nested_loop_join_v2(file outer, file inner)
  for each bunch ba in outer
    // ba is now in memory
    for each bunch bb in inner
      // bb is now in memory
      for each row a in ba
        for each row b in bb
          if (match_join_condition(a, b))
            write_result_in_output(a, b)
          end if
        end for
      end for
    end for
  end for

This version has the same time complexity as the previous one, but the number of disk accesses drops to number_of_bunches_for(outer) + number_of_bunches_for(outer) * number_of_bunches_for(inner). Increasing the bunch size, so that each disk access reads more data, reduces the number of disk accesses further.
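A minimal Python sketch of the bunch-based version follows. It assumes list slices stand in for bunches read from disk and counts one disk access per bunch; `bunches`, `block_nested_loop_join`, and `bunch_size` are hypothetical names chosen for this illustration.

```python
def bunches(rows, size):
    """Yield consecutive groups of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

def block_nested_loop_join(outer, inner, match, bunch_size):
    result = []
    bunch_reads = 0
    for ba in bunches(outer, bunch_size):
        bunch_reads += 1                      # ba is now "in memory"
        for bb in bunches(inner, bunch_size):
            bunch_reads += 1                  # bb is now "in memory"
            for a in ba:
                for b in bb:
                    if match(a, b):
                        result.append((a, b))
    return result, bunch_reads

outer = list(range(10))      # 10 rows -> 3 bunches of size 4
inner = list(range(5, 11))   # 6 rows  -> 2 bunches of size 4
pairs, bunch_reads = block_nested_loop_join(outer, inner, lambda a, b: a == b, 4)
print(pairs, bunch_reads)
```

Here the counter reaches 3 + 3*2 = 9 accesses, while the row-by-row version would need 10 + 10*6 = 70 reads for the same inputs.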

Hash join

  The hash join is more complex, but in most cases it costs less than a nested loop join.


Fig. 12


 Basic idea:

    1. Get all the elements of the inner relation
    2. Build an in-memory hash table from them
    3. Get the elements of the outer relation one by one
    4. Compute the hash of each outer element (with the hash function of the hash table) to find the matching bucket of the inner relation
    5. Check whether any element in that bucket matches the outer element

To analyze the time complexity, assume the inner relation is divided into X buckets and that the hash function spreads the elements of both relations evenly, so that all buckets have the same size, M/X. Matching one outer element against its bucket then costs as many comparisons as there are elements in the bucket.

Time complexity: (M/X) * N + cost_to_create_hash_table(M) + cost_of_hash_function * N. If the hash function produces small enough buckets, for example buckets holding a single element, the time complexity becomes O(M+N).
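The five steps can be sketched in Python with an explicit array of buckets. This is an illustration under stated assumptions, not how any particular database implements it: rows are tuples, `key` extracts the join column, and `hash_join` and `num_buckets` (the X of the formula) are hypothetical names.

```python
def hash_join(outer, inner, key, num_buckets=8):
    # Steps 1-2: build an in-memory hash table from the inner relation
    buckets = [[] for _ in range(num_buckets)]
    for b in inner:
        buckets[hash(key(b)) % num_buckets].append(b)

    result = []
    # Steps 3-4: hash each outer element to find its bucket
    for a in outer:
        bucket = buckets[hash(key(a)) % num_buckets]
        # Step 5: compare against the elements of that bucket only (about M/X of them)
        for b in bucket:
            if key(a) == key(b):   # different keys can share a bucket, so re-check
                result.append((a, b))
    return result

outer = [(1, "a"), (2, "b"), (3, "c")]
inner = [(2, "x"), (3, "y"), (3, "z"), (9, "w")]
joined = hash_join(outer, inner, key=lambda row: row[0])
print(joined)
```

In this sample the keys 1 and 9 land in the same bucket (both are 1 modulo 8), which is why the equality check inside the bucket is still needed.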

A version that uses less memory at the price of more disk I/O:

    1. Compute hash tables for both the inner and the outer relation
    2. Write the hash tables to disk
    3. Compare the two relations bucket by bucket (with one bucket loaded in memory and the other read row by row)
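The three steps above can be sketched as follows. To stay self-contained the sketch keeps the partitions in Python lists where a real system would write them to disk; `partition`, `partitioned_hash_join`, and `num_partitions` are hypothetical names.

```python
def partition(rows, key, num_partitions):
    """Split rows into buckets by hash; a real system would write each bucket to disk."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash(key(row)) % num_partitions].append(row)
    return parts

def partitioned_hash_join(outer, inner, key, num_partitions=4):
    outer_parts = partition(outer, key, num_partitions)
    inner_parts = partition(inner, key, num_partitions)
    result = []
    # Rows with equal keys always land in buckets with the same number,
    # so only bucket i of one relation is compared with bucket i of the other.
    for outer_part, inner_part in zip(outer_parts, inner_parts):
        lookup = {}                          # only one inner bucket is in memory at a time
        for b in inner_part:
            lookup.setdefault(key(b), []).append(b)
        for a in outer_part:                 # the outer bucket is read row by row
            for b in lookup.get(key(a), []):
                result.append((a, b))
    return result

joined = partitioned_hash_join([1, 2, 3, 4, 5, 6], [2, 4, 4, 7], key=lambda x: x)
print(sorted(joined))
```

Memory use is bounded by the largest inner bucket rather than by the whole inner relation, which is the point of this variant.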

Merge Join

  The merge join is the only join that produces a sorted result.


The merge sort introduced in part one is a good algorithm for sorting the inputs (although when memory is not a constraint, another algorithm such as the hash join can be the better choice). The merge join is typically selected when the data is already sorted, for example:

    1. The relation (a table) is stored in sorted order.
    2. The relation is an index on the join condition.
    3. The join is applied to an intermediate result that is already sorted.

Fig. 13


The merge step resembles the merge phase of the merge sort described earlier, except that instead of keeping every element of both relations it keeps only the elements that satisfy the join condition. The basic idea:

    1. Compare the current elements of the two relations;
    2. If they are equal, write both to the result and move to the next element in both relations;
    3. If they are not equal, move to the next element in the relation holding the smaller one;
    4. Repeat until the last element of one of the relations has been processed.

This is a simplified model: it assumes that both relations are sorted and that neither contains duplicate values. A real implementation is more complex.

Time complexity: if both relations are already sorted, the complexity is O(N+M); if they have to be sorted first, it is O(N*log(N) + M*log(M)).

  Pseudo code

mergeJoin(relation a, relation b)
  relation output
  integer a_key := 0;
  integer b_key := 0;

  while (a[a_key] != null and b[b_key] != null)
    if (a[a_key] < b[b_key])
      a_key++;
    else if (a[a_key] > b[b_key])
      b_key++;
    else // join predicate satisfied
      write_result_in_output(a[a_key], b[b_key])
      // we need to be careful when we increase the pointers:
      // evaluate both tests before moving either pointer
      boolean next_a_differs := (a[a_key+1] != b[b_key]);
      boolean next_b_differs := (b[b_key+1] != a[a_key]);
      if (next_a_differs)
        b_key++;
      end if
      if (next_b_differs)
        a_key++;
      end if
      if (not next_a_differs and not next_b_differs)
        b_key++;
        a_key++;
      end if
    end if
  end while
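For comparison, here is a runnable Python sketch of the merge step. It departs from the simplified model on purpose: it groups equal values on both sides so that duplicates in both relations produce every matching pair. It assumes both inputs are already sorted lists of comparable keys; `merge_join` is a hypothetical name.

```python
def merge_join(a, b):
    """Join two sorted lists on equality, emitting every matching pair."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1                      # advance the side holding the smaller value
        elif a[i] > b[j]:
            j += 1
        else:
            value = a[i]
            # find the end of the run of equal values on each side
            i_end = i
            while i_end < len(a) and a[i_end] == value:
                i_end += 1
            j_end = j
            while j_end < len(b) and b[j_end] == value:
                j_end += 1
            # every element of one run matches every element of the other
            for x in a[i:i_end]:
                for y in b[j:j_end]:
                    result.append((x, y))
            i, j = i_end, j_end
    return result

joined = merge_join([1, 2, 2, 5], [2, 2, 3, 5])
print(joined)
```

When neither input has duplicates each run has length one and the loop does a single pass over both lists, which is the O(N+M) case from the text. The output also comes out in sorted order.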

Choosing between the algorithms:

  1. Available memory: without enough memory you can say goodbye to the powerful hash join (or at least to the fully in-memory hash join).
  2. Size of the two relations: when one table is very large and the other very small, a nested loop join performs better than a hash join, because the hash join has the extra cost of building its hash table. When both tables are very large, a nested loop join puts a heavy load on the CPU.
  3. Presence of indexes: if both relations have a B+ tree index, the merge join is usually the best choice.
  4. Whether the result needs to be sorted: if you want a sorted result (so that the next join can be a merge join), or if the query itself requires ordering (through an ORDER BY, GROUP BY, or DISTINCT operator), a merge join can be worth choosing even when the two relations are not yet sorted, despite the extra sorting cost, because it delivers a sorted result.
  5. Whether the two relations are already sorted: in this case the merge join is the best candidate.
  6. Type of join: is it an equijoin (for example tableA.col1 = tableB.col2)? Is it an inner join, an outer join, a Cartesian product, or a self-join? Some join algorithms cannot handle every one of these types.
  7. Data distribution: if the join column is skewed (for example, joining the person table on the "last name" column when many people share the same last name), a hash join will perform badly, because the hash function fills the buckets unevenly: some hold only one or two elements while others hold thousands.
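The size-related criteria can be made concrete with a toy cost model built only from the complexity formulas given in this article. It is a deliberately naive sketch: it ignores memory, disk I/O, indexes, and skew, assumes base-2 logarithms and single-element hash buckets, and all function names are hypothetical.

```python
import math

def nested_loop_cost(n, m):
    return n * m                                    # O(N*M)

def hash_join_cost(n, m):
    return n + m                                    # O(M+N), assuming one element per bucket

def merge_join_cost(n, m, already_sorted):
    if already_sorted:
        return n + m                                # O(N+M)
    return n * math.log2(n) + m * math.log2(m)      # sort both inputs first

def cheapest_join(n, m, already_sorted=False):
    """Return the name of the algorithm with the lowest estimated cost (n, m >= 1)."""
    costs = {
        "nested loop join": nested_loop_cost(n, m),
        "hash join": hash_join_cost(n, m),
        "merge join": merge_join_cost(n, m, already_sorted),
    }
    return min(costs, key=costs.get)

print(cheapest_join(1000, 1))      # tiny inner relation
print(cheapest_join(1000, 1000))   # two large, unsorted relations
```

A real optimizer weighs many more factors than these three formulas, including the other criteria in the list above.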

The next article gives a brief example of how this selection is made.
