Implementing a simple database in Python (III): multi-table joins and GROUP BY grouping

Source: Internet
Author: User

In the previous article we implemented single-table queries and top-N queries. This article describes how to implement multi-table joins and GROUP BY grouping.

I. Multi-table joins

A multi-table join is an expensive operation for a database: the time complexity of a join is m*n, where m and n are the numbers of records in the two tables being joined. Without optimization, the temporary table produced by the join can become very large and may have to be written to disk for processing.
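To make the m*n cost concrete, here is a minimal sketch of an unoptimized nested-loop join. The function name `nested_loop_join`, the list-of-dicts table layout, and the sample rows are assumptions made for illustration; they are not taken from the database built in this series.

```python
def nested_loop_join(left, right, left_key, right_key):
    """Join two lists of dict records by comparing every pair of rows."""
    comparisons = 0
    joined = []
    for l_row in left:
        for r_row in right:
            comparisons += 1  # one comparison per (left, right) pair
            if l_row[left_key] == r_row[right_key]:
                joined.append({**l_row, **r_row})
    return joined, comparisons


# Tiny made-up tables: 3 suppliers and 6 partsupp rows.
supplier = [{"s_suppkey": k, "s_name": f"Supplier#{k:09d}"} for k in range(1, 4)]
partsupp = [{"ps_suppkey": k % 3 + 1, "ps_availqty": 1000 * k} for k in range(1, 7)]

rows, comparisons = nested_loop_join(partsupp, supplier, "ps_suppkey", "s_suppkey")
print(len(rows), comparisons)
```

The comparison count always equals the product of the two table sizes, regardless of how few rows actually match, which is why filtering before joining pays off.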

1. Two-table equi-join

Consider a join query like this:

select ps_availqty, ps_supplycost, s_name
from supplier, partsupp
where ps_suppkey = s_suppkey and ps_availqty > 2000 and s_nationkey = 1;

This SQL can be read as follows: equi-join the supplier table and the partsupp table on s_suppkey and ps_suppkey, select the records that satisfy ps_availqty > 2000 and s_nationkey = 1, and output the ps_availqty, ps_supplycost and s_name attributes of those records. That reading is clear to us, but the database should not execute the query this way. ps_suppkey is actually a foreign key of partsupp, so if the two full tables are equi-joined directly, the intermediate result is very large. Instead we should start with the single-table queries and perform the equi-join only after that filtering, so that far fewer records need to be joined.

First, use ps_availqty > 2000 to find the set A of row numbers of the partsupp records that satisfy the condition. Then use s_nationkey = 1 to find the set B of row numbers of the matching supplier records. Finally, perform the equi-join on the record sets A and B. The figure makes the idea easy to see:

[Figure: equi-join on the record sets A and B]

The sequential scan has time complexity max(m, n). With the binary search added, the total time complexity is max(m, n) * (log(m1) + log(n1)), where m1 and n1 are the numbers of records selected by the WHERE conditions.
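The filter-then-join idea can be sketched as follows. This is one possible arrangement under stated assumptions, not necessarily the implementation used in this series: tables are lists of dicts, the helper names `filter_rows` and `filtered_equi_join` are hypothetical, and the binary search is done with the standard `bisect` module over the filtered supplier rows sorted by join key.

```python
from bisect import bisect_left, bisect_right


def filter_rows(table, predicate):
    """Single-table query: return the row numbers that satisfy the predicate."""
    return [i for i, row in enumerate(table) if predicate(row)]


def filtered_equi_join(left, left_rows, left_key, right, right_rows, right_key):
    """Equi-join only the pre-filtered row numbers, probing with binary search."""
    # Sort the right-hand row numbers by join key so they can be binary searched.
    ordered = sorted(right_rows, key=lambda i: right[i][right_key])
    keys = [right[i][right_key] for i in ordered]
    pairs = []
    for li in left_rows:
        value = left[li][left_key]
        lo = bisect_left(keys, value)   # first position with key == value
        hi = bisect_right(keys, value)  # one past the last such position
        pairs.extend((li, ri) for ri in ordered[lo:hi])
    return pairs


# Made-up sample data that uses the column names from the query above.
supplier = [
    {"s_suppkey": 1, "s_name": "A", "s_nationkey": 1},
    {"s_suppkey": 2, "s_name": "B", "s_nationkey": 5},
    {"s_suppkey": 3, "s_name": "C", "s_nationkey": 1},
]
partsupp = [
    {"ps_suppkey": 1, "ps_availqty": 3000, "ps_supplycost": 10.0},
    {"ps_suppkey": 2, "ps_availqty": 5000, "ps_supplycost": 20.0},
    {"ps_suppkey": 3, "ps_availqty": 1500, "ps_supplycost": 30.0},
    {"ps_suppkey": 3, "ps_availqty": 2500, "ps_supplycost": 40.0},
]

a = filter_rows(partsupp, lambda r: r["ps_availqty"] > 2000)  # set A
b = filter_rows(supplier, lambda r: r["s_nationkey"] == 1)    # set B
pairs = filtered_equi_join(partsupp, a, "ps_suppkey", supplier, b, "s_suppkey")
print(pairs)
```

Each pair holds a partsupp row number and a supplier row number, so the selected attributes can be read from the original tables afterwards without building a large temporary table.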

Take a look at the results of the execution:

Input SQL:
select ps_availqty,ps_supplycost,s_name from supplier,partsupp where ps_suppkey = s_suppkey and ps_availqty > 2000 and s_nationkey = 1;

{'FROM': ['SUPPLIER', 'PARTSUPP'], 'GROUP': None, 'ORDER': None, 'SELECT': [['PARTSUPP.PS_AVAILQTY', None, None], ['PARTSUPP.PS_SUPPLYCOST', None, None], ['SUPPLIER.S_NAME', None, None]], 'WHERE': [['PARTSUPP.PS_AVAILQTY', '>', '2000'], ['SUPPLIER.S_NATIONKEY', '=', '1'], ['PARTSUPP.PS_SUPPKEY', '=', 'SUPPLIER.S_SUPPKEY']]}
Quering: PARTSUPP.PS_AVAILQTY > 2000
Quering: SUPPLIER.S_NATIONKEY = 1
Quering: PARTSUPP.PS_SUPPKEY = SUPPLIER.S_SUPPKEY

Output:
The result hava 26322 rows, this is the fisrt rows:
-------------------------------------------------
rows    PARTSUPP.PS_AVAILQTY    PARTSUPP.PS_SUPPLYCOST    SUPPLIER.S_NAME
-------------------------------------------------
        8895                    378.49                    Supplier#000000003
        4286                    502.00                    Supplier#000000003
        6996                    739.71                    Supplier#000000003
        4436                    377.80                    Supplier#000000003
        6728                    529.58                    Supplier#000000003
        8646                    722.34                    Supplier#000000003
        9975                    841.19
        5401                    139.06                    Supplier#000000003
        6858                    786.94                    Supplier#000000003
        8268                    444.21                    Supplier#000000003
-------------------------------------------------
Take 26.58 seconds.

The lines that begin with "Quering" show the order in which the WHERE sub-conditions are handled: the single-table queries are processed first, and the multi-table join is processed last.
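One way to obtain this ordering is to sort the parsed WHERE conditions so that conditions comparing a column with a constant come before conditions comparing two columns. The sketch below assumes each condition is a `[left, operator, right]` list like the parsed structure in the output above; the helpers `is_column` and `order_conditions` are hypothetical names, and the rule that a `TABLE.COLUMN` string marks a column reference is a simplifying assumption.

```python
def is_column(operand):
    """Hypothetical check: treat 'TABLE.COLUMN' strings as column references."""
    table, dot, column = operand.partition(".")
    return bool(dot) and table.isidentifier() and column.isidentifier()


def order_conditions(where):
    """Put single-table conditions first and join conditions last.

    sorted() is stable, so conditions of the same kind keep their order.
    """
    return sorted(where, key=lambda cond: is_column(cond[2]))


# The three sub-conditions of the example query, deliberately out of order.
where = [
    ["PARTSUPP.PS_SUPPKEY", "=", "SUPPLIER.S_SUPPKEY"],
    ["PARTSUPP.PS_AVAILQTY", ">", "2000"],
    ["SUPPLIER.S_NATIONKEY", "=", "1"],
]

ordered = order_conditions(where)
for left, op, right in ordered:
    print("Quering:", left, op, right)
```

Numeric constants such as `378.49` are not mistaken for columns here, because neither side of the dot is a valid identifier.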
