Os:centos 7.4
gp:gpdb-5.7.0
Three machines
Node1 as Master host
Node2, Node3 for segment host
Psql Login Node1 Master
$ psql-d peiybdb
peiybdb=# Select Current_database ();
Current_database
------------------
peiybdb
(1 row)
CREATE TABLE tmp_t0 (
C1 varchar),
C2 varchar (+),
c3 varchar (+)
);
NOTICE: Table doesn ' t has ' distributed by ' clause--Using column named ' C1 ' as the Greenplum Database data Distribu tion key for this table.
HINT: The ' distributed by ' clause determines the distribution of data. Make sure column (s) chosen is the optimal data distribution key to minimize skew.
CREATE table
peiybdb=# \d+ tmp_t0
table "public.tmp_t0"
Column | Type | Modifiers | Storage | Description
--------+------------------------+-----------+----------+-------------
C1 | Character varying () | | Extended |
C2 | character Varying | | extended |
C3 | character varying | | extended |
Have Oids:no
distributed by: (C1)
View file directory sizes for Node1, Node2, Node3
[Gpadmin@node1 gpseg-1]$ pwd
/u01/greenplum-data/gpseg-1
[gpadmin@node1 gpseg-1]$ du-sh
109M .
[Gpadmin@node2 gpseg0]$ pwd
/u01/greenplum-data/gpseg0
[gpadmin@node2 gpseg0]$ du-sh
109M .
[Gpadmin@node3 gpseg1]$ pwd
/u01/greenplum-data/gpseg1
[gpadmin@node3 gpseg1]$ du-sh
109M .
Inserting data
peiybdb=# select Gp_opt_version ();
Gp_opt_version
-----------------------------------------------
gpopt version:2.55.20, Xerces version: 3.1.2
(1 row)
set optimizer=on;
Set Optimizer_enumerate_plans=on;
Set optimizer_minidump=always;
Set Optimizer_enable_constant_expression_evaluation=off;
Set client_min_messages= ' Debug5 ';
INSERT INTO Tmp_t0
(c1,c2,c3)
Select Generate_series (1, 1000000) as C1,
MD5 (Random ():: Text) as C2,
MD5 (Random ():: Text) as C3
;
Node1 Master on Query pg_stat_activity
peiybdb=# \x Expanded display is on.
peiybdb=# select * from Pg_stat_activity; -[RECORD 1]----+-------------------------------------------Datid | 16384 Datname | PEIYBDB Procpid | 10904 sess_id | Usesysid | Ten Usename | Gpadmin Current_query | Insert INTO Tmp_t0 | (C1,C2,C3) | Select Generate_series (1, 1000000) as C1, | MD5 (Random ():: Text) as c2, |
MD5 (Random ():: Text) as C3 | Waiting | F Query_start | 2018-05-02 07:20:33.325637+00 Backend_start | 2018-05-02 07:13:17.573451+00 client_addr | 192.168.56.1 Client_port |
63217 Application_name | Xact_start |
2018-05-02 07:20:33.3212+00 Waiting_reason | Rsgid |
0 Rsgname |
rsgqueueduration | -[RECORD 2]----+-------------------------------------------Datid | 16384 Datname | Peiybdb Procpid | 10906 sess_id | Usesysid | Ten Usename | Gpadmin Current_query | <IDLE> Waiting | F Query_start | 2018-05-02 07:17:17.675179+00 Backend_start | 2018-05-02 07:13:17.580506+00 client_addr | 192.168.56.1 Client_port |
63218 Application_name |
Xact_start |
Waiting_reason | Rsgid |
0 Rsgname |
rsgqueueduration | -[RECORD 3]----+-------------------------------------------Datid | 16384 Datname | PEIYBDB Procpid | 11095 sess_id | Usesysid | Ten Usename | Gpadmin Current_query |
SELECT * from Pg_stat_activity; Waiting | F Query_start | 2018-05-02 07:28:07.30111+00 Backend_start |
2018-05-02 07:27:54.501407+00 client_addr | Client_port | -1 Application_name | Psql Xact_start |
2018-05-02 07:28:07.30111+00 Waiting_reason | Rsgid |
0 Rsgname | rsgqueueduration |
Check the file directory size of Node1, Node2, Node3 again
[Gpadmin@node1 gpseg-1]$ du-sh
109M .
[Gpadmin@node2 gpseg0]$ du-sh
224M .
[Gpadmin@node3 gpseg1]$ du-sh
224M .
It can be observed that the data file size of the master node Node1 has not changed, and segment's Node2, node3 data files have grown a lot.
The main reason is that the master node is used to store the definition, and segment is used to store the data.
View the definition of tmp_t0
peiybdb=#
peiybdb=# \d
List of relations
Schema | Name | Type | Owner | Storage
--------+--------+-------+---------+---------public
| tmp_t0 | table | gpadmin | heap
(1 row)
peiybdb=# \d+ tmp_t0;
Table "public.tmp_t0"
Column | Type | Modifiers | Storage | Description
--------+------------------------+-----------+----------+-------------
C1 | Character varying () | | Extended |
C2 | character Varying | | extended |
C3 | character varying | | extended |
Have Oids:no
distributed by: (C1)
Distributed by this is the TMP_T0 table distribution column, the table distribution column must be reasonable, can reduce the data more evenly distributed to each segment node.
Data can be processed concurrently on multiple nodes when retrieving data.