Left Join, leftjoin

Source: Internet
Author: User


A developer's statement had been running for more than two hours when the developer asked me why it was taking so long.
Statement format:

select * from tgt1 a left join tgt2 b on a.id=b.id and a.id>=6 order by a.id;

This is a typical misunderstanding of left join. The intention was to filter table a, but the condition was written in the ON clause of the left join. Let's see what a left join really does.

[gpadmin@mdw ~]$ psql bigdatagp
psql (8.2.15)
Type "help" for help.

bigdatagp=# drop table tgt1;
DROP TABLE
bigdatagp=# drop table tgt2;
DROP TABLE
bigdatagp=# explain  select t1.telnumber,t2.ua,t2.url,t1.apply_name,t2.apply_name from gpbase.tb_csv_gn_ip_session t1 ,gpbase.tb_csv_gn_http_session_hw t2 where  t1.
bigdatagp=# \q
bigdatagp=# create table tgt1(id int, name varchar(20));
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
bigdatagp=# create table tgt2(id int, name varchar(20));
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
bigdatagp=# insert into tgt1 select generate_series(1,3),('a','b');
ERROR:  column "name" is of type character varying but expression is of type record
HINT:  You will need to rewrite or cast the expression.
bigdatagp=# insert into tgt1 select generate_series(1,5),generate_series(1,5)||'a';
INSERT 0 5
bigdatagp=# insert into tgt2 select generate_series(1,2),generate_series(1,2)||'a';
INSERT 0 2
bigdatagp=# select * from tgt1;
 id | name
----+------
  2 | 2a
  4 | 4a
  1 | 1a
  3 | 3a
  5 | 5a
(5 rows)

bigdatagp=# select * from tgt1 order by id;
 id | name
----+------
  1 | 1a
  2 | 2a
  3 | 3a
  4 | 4a
  5 | 5a
(5 rows)

bigdatagp=# select * from tgt2 order by id;
 id | name
----+------
  1 | 1a
  2 | 2a
(2 rows)

bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id;
 id | name | id | name
----+------+----+------
  3 | 3a   |    |
  5 | 5a   |    |
  1 | 1a   |  1 | 1a
  2 | 2a   |  2 | 2a
  4 | 4a   |    |
(5 rows)

bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id order by a.id;
 id | name | id | name
----+------+----+------
  1 | 1a   |  1 | 1a
  2 | 2a   |  2 | 2a
  3 | 3a   |    |
  4 | 4a   |    |
  5 | 5a   |    |
(5 rows)

bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id where id>=3 order by a.id;
ERROR:  column reference "id" is ambiguous
LINE 1: ...* from tgt1 a left join tgt2 b on a.id=b.id where id>=3 orde...
                                                             ^
bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id where a.id>=3 order by a.id;
 id | name | id | name
----+------+----+------
  3 | 3a   |    |
  4 | 4a   |    |
  5 | 5a   |    |
(3 rows)

bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id and a.id>=3 order by a.id;
 id | name | id | name
----+------+----+------
  1 | 1a   |    |
  2 | 2a   |    |
  3 | 3a   |    |
  4 | 4a   |    |
  5 | 5a   |    |
(5 rows)

bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id where a.id>=6 order by a.id;
 id | name | id | name
----+------+----+------
(0 rows)

bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id and a.id>=6 order by a.id;
 id | name | id | name
----+------+----+------
  1 | 1a   |    |
  2 | 2a   |    |
  3 | 3a   |    |
  4 | 4a   |    |
  5 | 5a   |    |
(5 rows)

bigdatagp=# explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id where a.id>=3 order by a.id;
                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 64:1  (slice1; segments: 64)  (cost=7.18..7.19 rows=1 width=14)
   Merge Key: "?column5?"
   Rows out:  3 rows at destination with 21 ms to end, start offset by 559 ms.
   ->  Sort  (cost=7.18..7.19 rows=1 width=14)
         Sort Key: a.id
         Rows out:  Avg 1.0 rows x 3 workers.  Max 1 rows (seg52) with 5.452 ms to first row, 5.454 ms to end, start offset by 564 ms.
         Executor memory:  63K bytes avg, 74K bytes max (seg2).
         Work_mem used:  63K bytes avg, 74K bytes max (seg2). Workfile: (0 spilling, 0 reused)
         ->  Hash Left Join  (cost=2.04..7.15 rows=1 width=14)
               Hash Cond: a.id = b.id
               Rows out:  Avg 1.0 rows x 3 workers.  Max 1 rows (seg52) with 4.190 ms to first row, 4.598 ms to end, start offset by 565 ms.
               ->  Seq Scan on tgt1 a  (cost=0.00..5.06 rows=1 width=7)
                     Filter: id >= 3
                     Rows out:  Avg 1.0 rows x 3 workers.  Max 1 rows (seg52) with 0.156 ms to first row, 0.158 ms to end, start offset by 565 ms.
               ->  Hash  (cost=2.02..2.02 rows=1 width=7)
                     Rows in:  (No row requested) 0 rows (seg0) with 0 ms to end.
                     ->  Seq Scan on tgt2 b  (cost=0.00..2.02 rows=1 width=7)
                           Rows out:  (No row requested) 0 rows (seg0) with 0 ms to end.
 Slice statistics:
   (slice0)    Executor memory: 332K bytes.
   (slice1)    Executor memory: 446K bytes avg x 64 workers, 4329K bytes max (seg52).  Work_mem: 74K bytes max.
 Statement statistics:
   Memory used: 128000K bytes
 Total runtime: 580.630 ms
(24 rows)

bigdatagp=# explain analyze  select * from tgt1 a left join tgt2 b on a.id=b.id and a.id>=3 order by a.id;
                                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 64:1  (slice1; segments: 64)  (cost=7.23..7.24 rows=1 width=14)
   Merge Key: "?column5?"
   Rows out:  5 rows at destination with 24 ms to end, start offset by 701 ms.
   ->  Sort  (cost=7.23..7.24 rows=1 width=14)
         Sort Key: a.id
         Rows out:  Avg 1.0 rows x 5 workers.  Max 1 rows (seg42) with 6.292 ms to first row, 6.294 ms to end, start offset by 715 ms.
         Executor memory:  70K bytes avg, 74K bytes max (seg0).
         Work_mem used:  70K bytes avg, 74K bytes max (seg0). Workfile: (0 spilling, 0 reused)
         ->  Hash Left Join  (cost=2.04..7.17 rows=1 width=14)
               Hash Cond: a.id = b.id
               Join Filter: a.id >= 3
               Rows out:  Avg 1.0 rows x 5 workers.  Max 1 rows (seg42) with 4.422 ms to first row, 5.055 ms to end, start offset by 717 ms.
               Executor memory:  1K bytes avg, 1K bytes max (seg42).
               Work_mem used:  1K bytes avg, 1K bytes max (seg42). Workfile: (0 spilling, 0 reused)
               (seg42)  Hash chain length 1.0 avg, 1 max, using 1 of 262151 buckets.
               ->  Seq Scan on tgt1 a  (cost=0.00..5.05 rows=1 width=7)
                     Rows out:  Avg 1.0 rows x 5 workers.  Max 1 rows (seg42) with 0.179 ms to first row, 0.180 ms to end, start offset by 717 ms.
               ->  Hash  (cost=2.02..2.02 rows=1 width=7)
                     Rows in:  Avg 1.0 rows x 2 workers.  Max 1 rows (seg42) with 0.194 ms to end, start offset by 721 ms.
                     ->  Seq Scan on tgt2 b  (cost=0.00..2.02 rows=1 width=7)
                           Rows out:  Avg 1.0 rows x 2 workers.  Max 1 rows (seg42) with 0.143 ms to first row, 0.145 ms to end, start offset by 721 ms.
 Slice statistics:
   (slice0)    Executor memory: 332K bytes.
   (slice1)    Executor memory: 581K bytes avg x 64 workers, 4353K bytes max (seg42).  Work_mem: 74K bytes max.
 Statement statistics:
   Memory used: 128000K bytes
 Total runtime: 725.316 ms
(27 rows)

bigdatagp=# explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id where a.id>=6 order by a.id;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Gather Motion 64:1  (slice1; segments: 64)  (cost=7.17..7.18 rows=1 width=14)
   Merge Key: "?column5?"
   Rows out:  (No row requested) 0 rows at destination with 6.536 ms to end, start offset by 1.097 ms.
   ->  Sort  (cost=7.17..7.18 rows=1 width=14)
         Sort Key: a.id
         Rows out:  (No row requested) 0 rows (seg0) with 0 ms to end.
         Executor memory:  33K bytes avg, 33K bytes max (seg0).
         Work_mem used:  33K bytes avg, 33K bytes max (seg0). Workfile: (0 spilling, 0 reused)
         ->  Hash Left Join  (cost=2.04..7.15 rows=1 width=14)
               Hash Cond: a.id = b.id
               Rows out:  (No row requested) 0 rows (seg0) with 0 ms to end.
               ->  Seq Scan on tgt1 a  (cost=0.00..5.06 rows=1 width=7)
                     Filter: id >= 6
                     Rows out:  (No row requested) 0 rows (seg0) with 0 ms to end.
               ->  Hash  (cost=2.02..2.02 rows=1 width=7)
                     Rows in:  (No row requested) 0 rows (seg0) with 0 ms to end.
                     ->  Seq Scan on tgt2 b  (cost=0.00..2.02 rows=1 width=7)
                           Rows out:  (No row requested) 0 rows (seg0) with 0 ms to end.
 Slice statistics:
   (slice0)    Executor memory: 332K bytes.
   (slice1)    Executor memory: 225K bytes avg x 64 workers, 225K bytes max (seg0).  Work_mem: 33K bytes max.
 Statement statistics:
   Memory used: 128000K bytes
 Total runtime: 8.615 ms
(24 rows)

bigdatagp=# explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id and a.id>=6 order by a.id;
                                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 64:1  (slice1; segments: 64)  (cost=7.23..7.24 rows=1 width=14)
   Merge Key: "?column5?"
   Rows out:  5 rows at destination with 115 ms to end, start offset by 1.195 ms.
   ->  Sort  (cost=7.23..7.24 rows=1 width=14)
         Sort Key: a.id
         Rows out:  Avg 1.0 rows x 5 workers.  Max 1 rows (seg42) with 6.979 ms to first row, 6.980 ms to end, start offset by 12 ms.
         Executor memory:  72K bytes avg, 74K bytes max (seg0).
         Work_mem used:  72K bytes avg, 74K bytes max (seg0). Workfile: (0 spilling, 0 reused)
         ->  Hash Left Join  (cost=2.04..7.17 rows=1 width=14)
               Hash Cond: a.id = b.id
               Join Filter: a.id >= 6
               Rows out:  Avg 1.0 rows x 5 workers.  Max 1 rows (seg42) with 5.570 ms to first row, 6.157 ms to end, start offset by 12 ms.
               Executor memory:  1K bytes avg, 1K bytes max (seg42).
               Work_mem used:  1K bytes avg, 1K bytes max (seg42). Workfile: (0 spilling, 0 reused)
               (seg42)  Hash chain length 1.0 avg, 1 max, using 1 of 262151 buckets.
               ->  Seq Scan on tgt1 a  (cost=0.00..5.05 rows=1 width=7)
                     Rows out:  Avg 1.0 rows x 5 workers.  Max 1 rows (seg42) with 0.050 ms to first row, 0.051 ms to end, start offset by 12 ms.
               ->  Hash  (cost=2.02..2.02 rows=1 width=7)
                     Rows in:  Avg 1.0 rows x 2 workers.  Max 1 rows (seg42) with 0.153 ms to end, start offset by 18 ms.
                     ->  Seq Scan on tgt2 b  (cost=0.00..2.02 rows=1 width=7)
                           Rows out:  Avg 1.0 rows x 2 workers.  Max 1 rows (seg42) with 0.133 ms to first row, 0.135 ms to end, start offset by 18 ms.
 Slice statistics:
   (slice0)    Executor memory: 332K bytes.
   (slice1)    Executor memory: 583K bytes avg x 64 workers, 4353K bytes max (seg42).  Work_mem: 74K bytes max.
 Statement statistics:
   Memory used: 128000K bytes
 Total runtime: 116.997 ms
(27 rows)

bigdatagp=#  explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id where id=6 order by a.id;
ERROR:  column reference "id" is ambiguous
LINE 1: ...* from tgt1 a left join tgt2 b on a.id=b.id where id=6 order...
                                                             ^
bigdatagp=#  explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id where a.id=6 order by a.id;
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)  (cost=7.17..7.18 rows=4 width=14)
   Merge Key: "?column5?"
   Rows out:  (No row requested) 0 rows at destination with 3.212 ms to end, start offset by 339 ms.
   ->  Sort  (cost=7.17..7.18 rows=1 width=14)
         Sort Key: a.id
         Rows out:  (No row requested) 0 rows with 0 ms to end.
         Executor memory:  58K bytes.
         Work_mem used:  58K bytes. Workfile: (0 spilling, 0 reused)
         ->  Hash Left Join  (cost=2.04..7.14 rows=1 width=14)
               Hash Cond: a.id = b.id
               Rows out:  (No row requested) 0 rows with 0 ms to end.
               ->  Seq Scan on tgt1 a  (cost=0.00..5.06 rows=1 width=7)
                     Filter: id = 6
                     Rows out:  (No row requested) 0 rows with 0 ms to end.
               ->  Hash  (cost=2.02..2.02 rows=1 width=7)
                     Rows in:  (No row requested) 0 rows with 0 ms to end.
                     ->  Seq Scan on tgt2 b  (cost=0.00..2.02 rows=1 width=7)
                           Filter: id = 6
                           Rows out:  (No row requested) 0 rows with 0 ms to end.
 Slice statistics:
   (slice0)    Executor memory: 252K bytes.
   (slice1)    Executor memory: 251K bytes (seg3).  Work_mem: 58K bytes max.
 Statement statistics:
   Memory used: 128000K bytes
 Total runtime: 342.067 ms
(25 rows)

bigdatagp=#  explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id and a.id=6 order by a.id;
                                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 64:1  (slice1; segments: 64)  (cost=7.23..7.24 rows=1 width=14)
   Merge Key: "?column5?"
   Rows out:  5 rows at destination with 435 ms to end, start offset by 1.130 ms.
   ->  Sort  (cost=7.23..7.24 rows=1 width=14)
         Sort Key: a.id
         Rows out:  Avg 1.0 rows x 5 workers.  Max 1 rows (seg42) with 5.156 ms to first row, 5.158 ms to end, start offset by 7.597 ms.
         Executor memory:  58K bytes avg, 58K bytes max (seg0).
         Work_mem used:  58K bytes avg, 58K bytes max (seg0). Workfile: (0 spilling, 0 reused)
         ->  Hash Left Join  (cost=2.04..7.17 rows=1 width=14)
               Hash Cond: a.id = b.id
               Join Filter: a.id = 6
               Rows out:  Avg 1.0 rows x 5 workers.  Max 1 rows (seg42) with 4.155 ms to first row, 4.813 ms to end, start offset by 7.930 ms.
               Executor memory:  1K bytes avg, 1K bytes max (seg42).
               Work_mem used:  1K bytes avg, 1K bytes max (seg42). Workfile: (0 spilling, 0 reused)
               (seg42)  Hash chain length 1.0 avg, 1 max, using 1 of 262151 buckets.
               ->  Seq Scan on tgt1 a  (cost=0.00..5.05 rows=1 width=7)
                     Rows out:  Avg 1.0 rows x 5 workers.  Max 1 rows (seg42) with 0.126 ms to first row, 0.127 ms to end, start offset by 7.941 ms.
               ->  Hash  (cost=2.02..2.02 rows=1 width=7)
                     Rows in:  Avg 1.0 rows x 2 workers.  Max 1 rows (seg42) with 0.103 ms to end, start offset by 12 ms.
                     ->  Seq Scan on tgt2 b  (cost=0.00..2.02 rows=1 width=7)
                           Rows out:  Avg 1.0 rows x 2 workers.  Max 1 rows (seg42) with 0.074 ms to first row, 0.076 ms to end, start offset by 12 ms.
 Slice statistics:
   (slice0)    Executor memory: 332K bytes.
   (slice1)    Executor memory: 569K bytes avg x 64 workers, 4337K bytes max (seg42).  Work_mem: 58K bytes max.
 Statement statistics:
   Memory used: 128000K bytes
 Total runtime: 436.384 ms
(27 rows)
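
The plans above differ in where the predicate lands: written in WHERE, it shows up as a Filter on the scan of tgt1; written in ON, it shows up as a Join Filter on the Hash Left Join, so it only decides whether a tgt2 row matches. The following is a minimal Python sketch of that behavior. It is a toy model of left join semantics, not how Greenplum executes the query, and the names left_join, on and where are hypothetical helpers introduced only for this illustration.

```python
# Toy model of LEFT JOIN semantics (an illustration, not Greenplum's executor).
tgt1 = [(i, f"{i}a") for i in range(1, 6)]  # ids 1..5, as in the session
tgt2 = [(i, f"{i}a") for i in range(1, 3)]  # ids 1..2


def left_join(left, right, on, where=lambda a, b: True):
    """`on` decides which right rows match; `where` filters the joined rows."""
    out = []
    for a in left:
        matches = [b for b in right if on(a, b)]
        if not matches:
            # No right row satisfied ON: the left row is kept, NULL-extended.
            matches = [(None, None)]
        for b in matches:
            if where(a, b):
                out.append(a + b)
    return sorted(out, key=lambda row: row[0])  # mimic ORDER BY a.id


# Predicate in ON: no tgt2 row can match, yet every tgt1 row survives.
in_on = left_join(tgt1, tgt2, on=lambda a, b: a[0] == b[0] and a[0] >= 6)

# Predicate in WHERE: the joined rows themselves are filtered.
in_where = left_join(
    tgt1, tgt2,
    on=lambda a, b: a[0] == b[0],
    where=lambda a, b: a[0] >= 6,
)

print(len(in_on), len(in_where))  # prints: 5 0
```

The row counts agree with the session: 5 rows when a.id>=6 is in ON, 0 rows when it is in WHERE.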

Therefore, to filter table a, write the condition in the WHERE clause. To filter table b, write the condition in a subquery on table b. The ON clause only controls which rows of b are matched and displayed next to each row of a; it does not remove any rows of a.
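
The rule above can be checked with a small script. This sketch uses Python's built-in sqlite3 module as a stand-in for Greenplum, on the assumption that both follow standard LEFT JOIN semantics for these queries. The run helper and the id >= 2 filter on tgt2 are made up for the example and are not part of the original session.

```python
import sqlite3

# In-memory SQLite database standing in for the Greenplum tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tgt1 (id INTEGER, name TEXT);
    CREATE TABLE tgt2 (id INTEGER, name TEXT);
    INSERT INTO tgt1 VALUES (1,'1a'),(2,'2a'),(3,'3a'),(4,'4a'),(5,'5a');
    INSERT INTO tgt2 VALUES (1,'1a'),(2,'2a');
""")


def run(sql):
    """Hypothetical helper: execute a query and return all rows."""
    return conn.execute(sql).fetchall()


# Filter table a: put the predicate in WHERE.
filter_a = run("""
    SELECT a.id, b.id FROM tgt1 a
    LEFT JOIN tgt2 b ON a.id = b.id
    WHERE a.id >= 3 ORDER BY a.id
""")

# Filter table b: restrict it in a subquery before joining.
filter_b = run("""
    SELECT a.id, b.id FROM tgt1 a
    LEFT JOIN (SELECT * FROM tgt2 WHERE id >= 2) b ON a.id = b.id
    ORDER BY a.id
""")

# Predicate on a inside ON: it only affects matching, all rows of a remain.
in_on = run("""
    SELECT a.id, b.id FROM tgt1 a
    LEFT JOIN tgt2 b ON a.id = b.id AND a.id >= 3
    ORDER BY a.id
""")

print(filter_a)  # [(3, None), (4, None), (5, None)]
print(filter_b)  # [(1, None), (2, 2), (3, None), (4, None), (5, None)]
print(in_on)     # [(1, None), (2, None), (3, None), (4, None), (5, None)]
```

The first and third results reproduce the a.id>=3 queries from the session; the second shows that filtering tgt2 in a subquery keeps every row of tgt1 while limiting which tgt2 rows can match.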
