GPDB parallel loading test and gpdb parallel loading
Test File Information
GPFDIST solution 1 Single Server
10G Dec 12 14:10 A111G Dec 12 14:32 A210G Dec 12 14:10 B111G Dec 12 14:35 B2
Solution 2: two servers
drop table if exists host_1;drop EXTERNAL TABLE if exists exttable_ext_1_host;drop table if exists host_1_err;create table host_1 (like sourcetable) distributed randomly;CREATE EXTERNAL TABLE exttable_ext_1_host (like sourcetable) LOCATION ('gpfdist://10.2.22.81:9999/A*') FORMAT 'text' (delimiter as ',' null as '' escape 'OFF') ENCODING 'UTF8' LOG ERRORS INTO host_1_err SEGMENT REJECT LIMIT 100 PERCENT;insert into host_1 select * from exttable_ext_1_host;
GPLOAD solution 3
drop table if exists host_2;drop EXTERNAL TABLE if exists exttable_ext_2_host;drop table if exists host_2_err;create table host_2 (like sourcetable) distributed randomly;CREATE EXTERNAL TABLE exttable_ext_2_host (like sourcetable) LOCATION ('gpfdist://10.2.22.81:9999/B1','gpfdist://10.2.22.82:9999/B2') FORMAT 'text' (delimiter as ',' null as '' escape 'OFF') ENCODING 'UTF8' LOG ERRORS INTO host_2_err SEGMENT REJECT LIMIT 100 PERCENT;insert into host_2 select * from exttable_ext_2_host;
---VERSION: 184.108.40.206DATABASE: gpdbUSER: gpadminHOST: 10.4.2.4PORT: 5432GPLOAD: INPUT: - SOURCE: LOCAL_HOSTNAME: - 10.2.22.81 PORT: 9999 FILE: - /data/ptest/A* - FORMAT: text - DELIMITER: ',' - ESCAPE: 'OFF' - NULL_AS: '' - ENCODING: UTF8 - ERROR_LIMIT: 10000 - ERROR_TABLE: host_1_err OUTPUT: - TABLE: host_1 - MODE: insert
To prevent cache interference testing, the results of Multiple tests are as follows: solution 2 is superior to solution 1.
. File Information File | file size | storage size | Number of inserted records | Number of abnormal records ------ | ------------------------------------------ solution 1 | 21 GB | 25 GB | 49826141 | solution 2 | 21 GB | 25 GB | 52108083 | 1867
. Loading duration
It is unclear why the difference between 1st reads (test sequence 1 solution 1/2) and later may be related to gpfs.
In solution 4, machine B fails to read files, and solution 3/4 does not feel very stable throughout the test (hang). In view of the fact that solution 3 does not have much advantage in comparison, in addition, by observing solution 4, we can find that the loading time of server A has reached 22.24 s. Compared with solution 2, it may not be advantageous. Therefore, the test is not completed.
|Test order||Solution 1||Solution 2||Solution 3||Solution 4|
|Test Order 1||440403.263 MS||204201.096 MS||36.41 seconds||22.24 seconds + B?|
|Test order 2||35854.612 MS||26303.240 MS||Solution 3||Solution 4|
|Test order 3||42007.990 MS||25593.730 MS||Solution 3||Solution 4|
|Test order 4||43795.502 MS||25706.479 MS||Solution 3||Solution 4|
|Test order 5||36576.681 MS||26405.977 MS||Solution 3||Solution 4|