Greenplum 5.7 + CREATE TABLE + INSERT INTO

Source: Internet
Author: User
Tags file size md5 modifiers psql

Os:centos 7.4
gp:gpdb-5.7.0

Three machines
Node1 as Master host
Node2, Node3 for segment host

Psql Login Node1 Master

$ psql-d peiybdb
peiybdb=# Select Current_database ();
 Current_database 
------------------
 peiybdb
(1 row)

CREATE TABLE tmp_t0 (
C1 varchar),
C2 varchar (+),
c3 varchar (+)
);
NOTICE:  Table doesn ' t has ' distributed by ' clause--Using column named ' C1 ' as the Greenplum Database data Distribu tion key for this table.
HINT: The  ' distributed by ' clause determines the distribution of data. Make sure column (s) chosen is the optimal data distribution key to minimize skew.
CREATE table

peiybdb=# \d+ tmp_t0
                        table "public.tmp_t0"
 Column |          Type          | Modifiers | Storage  | Description 
--------+------------------------+-----------+----------+-------------
 C1     | Character varying () |           | Extended | 
 C2     | character Varying |           | extended | 
 C3     | character varying |           | extended | 
Have Oids:no
distributed by: (C1)

View file directory sizes for Node1, Node2, Node3

[Gpadmin@node1 gpseg-1]$ pwd
/u01/greenplum-data/gpseg-1
[gpadmin@node1 gpseg-1]$ du-sh
109M    .

[Gpadmin@node2 gpseg0]$ pwd
/u01/greenplum-data/gpseg0
[gpadmin@node2 gpseg0]$ du-sh
109M    .

[Gpadmin@node3 gpseg1]$ pwd
/u01/greenplum-data/gpseg1
[gpadmin@node3 gpseg1]$ du-sh
109M    .

Inserting data

peiybdb=# select Gp_opt_version ();
                Gp_opt_version                 
-----------------------------------------------
 gpopt version:2.55.20, Xerces version: 3.1.2
(1 row)


set optimizer=on;
Set Optimizer_enumerate_plans=on;
Set optimizer_minidump=always;
Set Optimizer_enable_constant_expression_evaluation=off;
Set client_min_messages= ' Debug5 ';


INSERT INTO Tmp_t0
(c1,c2,c3)
Select Generate_series (1, 1000000) as C1,
       MD5 (Random ():: Text) as C2,
       MD5 (Random ():: Text) as C3
;

Node1 Master on Query pg_stat_activity

peiybdb=# \x Expanded display is on.
peiybdb=# select * from Pg_stat_activity; -[RECORD 1]----+-------------------------------------------Datid | 16384 Datname | PEIYBDB Procpid | 10904 sess_id | Usesysid | Ten Usename | Gpadmin Current_query | Insert INTO Tmp_t0 | (C1,C2,C3) |        Select Generate_series (1, 1000000) as C1, |        MD5 (Random ():: Text) as c2, | 
MD5 (Random ():: Text) as C3 | Waiting | F Query_start | 2018-05-02 07:20:33.325637+00 Backend_start | 2018-05-02 07:13:17.573451+00 client_addr | 192.168.56.1 Client_port | 
63217 Application_name | Xact_start | 
2018-05-02 07:20:33.3212+00 Waiting_reason | Rsgid | 
0 Rsgname | 
rsgqueueduration | -[RECORD 2]----+-------------------------------------------Datid | 16384 Datname |        Peiybdb Procpid  | 10906 sess_id | Usesysid | Ten Usename | Gpadmin Current_query | <IDLE> Waiting | F Query_start | 2018-05-02 07:17:17.675179+00 Backend_start | 2018-05-02 07:13:17.580506+00 client_addr | 192.168.56.1 Client_port | 
63218 Application_name | 
Xact_start | 
Waiting_reason | Rsgid | 
0 Rsgname | 
rsgqueueduration | -[RECORD 3]----+-------------------------------------------Datid | 16384 Datname | PEIYBDB Procpid | 11095 sess_id | Usesysid | Ten Usename | Gpadmin Current_query |
SELECT * from Pg_stat_activity; Waiting | F Query_start | 2018-05-02 07:28:07.30111+00 Backend_start | 
2018-05-02 07:27:54.501407+00 client_addr | Client_port | -1 Application_name | Psql Xact_start | 
2018-05-02 07:28:07.30111+00 Waiting_reason | Rsgid | 
0 Rsgname |  rsgqueueduration |

Check the file directory size of Node1, Node2, Node3 again

[Gpadmin@node1 gpseg-1]$ du-sh
109M    .

[Gpadmin@node2 gpseg0]$ du-sh
224M    .

[Gpadmin@node3 gpseg1]$ du-sh
224M    .

It can be observed that the data file size of the master node Node1 has not changed, and segment's Node2, node3 data files have grown a lot.
The main reason is that the master node is used to store the definition, and segment is used to store the data.

View the definition of tmp_t0

peiybdb=# 
peiybdb=# \d
              List of relations
 Schema |  Name  | Type  |  Owner  | Storage 
--------+--------+-------+---------+---------public
 | tmp_t0 | table | gpadmin | heap
(1 row)

peiybdb=# \d+ tmp_t0;
                        Table "public.tmp_t0"
 Column |          Type          | Modifiers | Storage  | Description 
--------+------------------------+-----------+----------+-------------
 C1     | Character varying () |           | Extended | 
 C2     | character Varying |           | extended | 
 C3     | character varying |           | extended | 
Have Oids:no
distributed by: (C1)

Distributed by this is the TMP_T0 table distribution column, the table distribution column must be reasonable, can reduce the data more evenly distributed to each segment node.
Data can be processed concurrently on multiple nodes when retrieving data.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.