Analysis of connection of database tables by Cartesian product phenomena

Source: Internet
Author: User
Tags joins

First, let's briefly explain the Cartesian product.

Now, we have two collections A and B.

A = {0,1} B = {2,3,4}

The result set of the collection AXB and BXA can be represented in the following form, respectively:

AXB = {(0,2), (UP), (0,3), (1,3), (0,4), (1,4)};

Bxa = {(2,0), (2,1), (3,0), (3,1), (4,0), (4,1)};

The results of the above AXB and BXA can be called "Cartesian product" multiplied by two sets.

From the above data analysis, we can draw the following two conclusions:

1, two sets multiply, do not meet the exchange rate, both axb≠bxa;

The 2,a collection and the B collection are multiplied to include all the possibilities of combining elements in a collection A with those in set B. The number of elements in the new collection that is multiplied by the two collection is the number of elements in the set of elements of the A set.

The algorithm that the database table joins data row matches is the Cartesian product mentioned above, the connection between table and table can be considered as multiplication operation.

For example, now there are two tables in the database, the student table and the Student_subject table, as follows:

  

We execute the following SQL statements purely for table joins.

SELECT * from student JOIN Student_subject; SELECT * from Student_subject JOIN student;

Take a look at the results:

  

Table 1.0 Table 1.1

From the results of implementation, the results are in line with our two conclusions (the Red Line callout part);

Take the first SQL statement as an example let's take a look at his execution flow,

The 1,from statement loads the student table and the Student_subject table from the database file into memory.

The 2,join statement is equivalent to multiplying two tables, matching each row in the student table sequentially with the records in the Student_subject table.

3, when the match is complete, we get a temporary table with the strip (number of records in student Xstudent_subject table). The temporary tables formed in memory are shown in Table 1.0. We also refer to the table 1.0 in memory as the ' Cartesian product table '.

In view of the above theory, we put forward a question, whether the table joins the time to form a Cartesian accumulation table, if the data of the two tables are relatively large, then it will occupy a large amount of memory space this is obviously unreasonable. Therefore, when we make Table connection query generally use the syntax of join XXX on XXX, the execution of the ON statement is before the join statement, that is to say, two table data rows between matching, will first determine whether the data row conforms to the conditions behind the on statement, and then decide whether to join.

Therefore, there is an obvious solution to SQL optimization is that when the data volume of the two tables is larger, and you need to connect the query, you should use the FROM table1 JOIN table2 on XXX syntax, avoid using the from Table1,table2 WHERE XXX syntax, because The latter will make a larger Cartesian scale in memory, increasing the overhead of memory.

Based on the previous blog (http://www.cnblogs.com/cdf-opensource-007/p/6502556.html), and the analysis of this blog, we can summarize a query SQL statement execution process.

From

On

JOIN

WHERE

GROUP by

SELECT

Having

ORDER by

LIMIT

Finally, for the underlying implementation of the two database table joins, I modeled it with Java code, which is interesting to look at and can help us understand:

Package Com.opensource.util;import Java.util.arrays;public class Decareproduct {public static void main (string[] A RGS) {//using a two-dimensional array, simulating student table string[][] student ={{"0", "Jsonp"}, {"1", "                Alice "}}; Using a two-dimensional array, simulate Student_subject table string[][] student_subject2 ={{"0", "0", "language"}, {"1", "0", "        Math "}};        Analog SELECT * from student JOIN Student_subject;        string[][] ResultTowArray1 = Gettwodimensionarray (STUDENT,STUDENT_SUBJECT2);        Analog SELECT * from Student_subject JOIN student;                string[][] ResultTowArray2 = Gettwodimensionarray (student_subject2,student);        int length1 = Resulttowarray1.length;        for (int i = 0; I <length1; i++) {System.out.println (arrays.tostring (Resulttowarray1[i]));        } System.err.println ("-----------------------------------------------");        int length2 = Resulttowarray2.length;for (int i = 0; I <length2; i++) {System.out.println (arrays.tostring (Resulttowarray2[i]));      }}/** * Simulates the operation of two table connections * @param towArray1 * @param towArray2 * @return */public static string[][] Gettwodimensionarray (string[][] towarray1,string[][] towArray2) {//Get two-dimensional        The height of the array (which has several one-dimensional arrays in the two-dimensional array used to refer to the number of records in the database table) int high1 = Towarray1.length;                int high2 = Towarray2.length;        Gets the width of the two-dimensional array (two-bit array, the length of the one-dimensional array used to refer to the columns in the database table) int wide1 = Towarray1[0].length;                int wide2 = Towarray2[0].length;        Calculates the height and width of the array of result sets obtained by the two-dimensional array after the Cartesian product operation, both the number of rows and the number of columns of the Cartesian product table int resulthigh = HIGH1 * HIGH2;                int resultwide = wide1 + wide2;                Initializes an array of result sets, both Cartesian integrable string[][] resultarray = new String[resulthigh][resultwide];                iteration variable int index = 0; The 22nd dimension array is traversed for the for (int i = 0; i < HIGH2; i++) {//Take out TowArray2 this twoElements of a string[] Temparray = towarray2[i]; Loop nesting, towArray1 This two-dimensional array traversal for (int j = 0; J < High1; J + +) {//Initialize a length of ' Resul                                An array of Twide ', as an element of the result set array, both one row of the Cartesian product table string[] tempextened = new String[resultwide];                                Take out towArray1 element of this two-dimensional array string[] tempArray1 = towarray1[j]; Copy the elements of the tempArray1 and temparray two arrays into the elements of the result set array.                (The array expansion is used here) system.arraycopy (tempArray1, 0, tempextened, 0, temparray1.length);                                System.arraycopy (temparray, 0, tempextened, Temparray1.length, temparray.length);                                Put the tempextened into the result set array resultarray[index] = tempextened;            Iteration plus a index++;                    }} return resultarray; }}

Execution Result:

Analysis of connection of database tables by Cartesian product phenomena

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.