In the previous article, I wrote how to use gp_segment_id to determine whether the Dk value of the table has data. The following describes how to check the allocated unbalanced table by checking the amount of space occupied and re-adjust the Dk value. One of my greenplum machines, one of which has more data volumes than other nodes, shows the imbalance of data distribution, as shown in the following example: sdw16:
$ Du-SH/gpdata {1, 2}/data/GP *
347g/gpdata1/data/gp30
412G/gpdata1/data/gp62
357g/gpdata1/data/gp63
336g/gpdata2/data/gp31
338g/gpdata2/data/gp94
343g/gpdata2/data/gp95
$ CD/gpdata1/data/gp62/base/516096
$ Ll | awk '{if ($5 >=209715200) Print $9}' | awk-f. '{file [$1] ++; count ++;} end {for (a in file) print file [a] "" file [a]/count * 100 "%" A} '| column-C3-s ""-T | sort-Nr | head-N10
Find out the number of times that a single file exceeds MB and count the number of times that a file exceeds 1 GB (because PG sets the size of a single file to not exceed 1 GB, files larger than 1 GB will be divided into xxx.1, xxx.2 format ). The last column in the script is the file name, that is, relfilenode in pg_class. Gp_distribution_policy stores the Rule Distribution of Dk values, while pg_attribute stores the information of specific columns in the table. The following SQL statement is used to find the table corresponding to the file and the definition of DK.
Select A. OID, A. relname, B. nspname, C. attrnums, D. attname From Pg_class Inner Join Pg_namespace B On A. relnamespace= B. OID Inner Join Gp_distribution_policy C On A. OID = C. localoid Inner Join Pg_attribute d On A. OID = D. attrelid Where A. relfilenode = [ Filename ] And D. attnum = Any (C. attrnums)
Next, check the approximate data of the table and re-set the Dk value. Then, check the gp_segment_id distribution. Finally, vacuum full + analyze is used to adjust the table to the optimal state.