Simple and practical gpload

Source: Internet
Author: User

Preparatory work:

1. Because the Gpload is a gpfdist package, the Gpfdist service must be turned on before using gpload, otherwise it cannot be used.

8181-l/tmp/gpfdist.log &

  command to open the service here is an unknown solution, you can refer to the Gpfdist document I explained earlier:

2. The system requires a Python tool, because I use the Red Hat System (Redhat), built-in this tool, so do not need to install.

3. First prepare the data file (here is the subsidy data file, directly print out the file name and the book in the document):

4.Create a table for loading data in the GP library:

CreateTableMember_delta (mever_idvarchar54), Phonenovarchar20), actionChar1), Dw_ins_date date)With(appendonly=true, Compresslevel=5)distributed by (Mever_ ID) partition by list (action) (Partition U values ( ' u "), Partition I values ( " i ), partition D values ( ' d "), default partition Other_action);     

Table structure is probably explained below: This table is attached to the distribution key, data compression, data partitioning and other functions.

To start the Gpload code:

Create the executed script (the script here is the Python language mentioned above), the file name is member.yml:

version: PORT:8181FILE:-/home/admin/Member_delta.txt-COLUMNS:-Mever_id:varchar (54)               -Phoneno:varchar (20)               -Action:char (1)               -dw_ins_date:date-Format:text-DELIMITER:' '-QUOTE:'"'-header:true-Error_limit:25-ERROR_TABLE:public.member_err OUTPUT:-TABLE:public.member_delta-Mode:insert

meaning of the parameters used by the script

VERSION:            --Specifies the version of the control file schemaDATABASE:db_name           --Specifies the name of the connection database, if not specified, by the environment variable $pgdatabase, or by the Gpload parameter-DUSER: Db_username--Specifies the user name to connect to the target database, and if you do not use a super administrator, the service parameter gp_external_grant_privileges must be set to ON. Host:master_hostname--Specify the master host name, or the-H option for gpload, or the environment variable $pghost to specifyPort:master_port--Specify the connection port number for master, which is 5432 by default, or specified by the-p option of the Gpload command or the environment variable $pgport. Gpload:--must be specified, indicating that the Mount settings section below it must define input: and output: two parts. INPUT:--must be specified, this section specifies the format and location of the loading data-SOURCE:--You must specify the location where the source file is defined, each input section can define multiple source parts, and the Windows path is specified more specifically, such as C \ To be written c:/Local_hostname:--Specify the host name and IP address of the Gpload run, and if you have multiple NICs, you can use them at the same time to increase the loading speed. Only the preferred host name and IP are used by default. -Hostname_or_ipport:http_port--specify the port that the gpfdist uses, or select the port range, which is selected by the system, and the port setting has a high priority if specified at the same time. |Port_range:[Start_port_range, End_port_range]FILE:--Specifies the location, directory, or named pipe where the data file is mounted. If the file is compressed using Gpzip or bzip2, it can be decompressed automatically. You can specify multiple files using wildcard * and C-style relationship matching patterns. - /Path/ to/Input_file-COLUMNS:--specifies the data format of the data source, and if this section is not specified, the column order, quantity, and data type of the source table must match the target table. -Field_name:data_type-FORMAT:text |Csv--Specify whether the file format is text or CSV-DELIMITER:'Delimiter_character'  --Specifies the delimiter between Text data fields (columns), which by default is |- ESCAPE:'Escape_character' | 'OFF'  --text definition escape character, text format default is \, in the text format can choose off to switch off the escape character (Web log processing is more useful)-Null_as:'null_string'       --Specifies a string that describes the null value, which by default is a null value in the \N,CSV format that does not use escape symbols. -Force_not_null:true|False--CSV format, forcing all characters to be enclosed by default, so there can be no null value, if there is no value between the two delimiters, it is treated as a 0-length string, and the value is considered lost. -QUOTE:'Csv_quote_character'  --CSV Specifies the escape character, which by default is "-Header:true|False--whether to skip the first row of the data file, as a table header-Encoding:database_encoding--Specify the character set of the data source-Error_limit:integer          --specifies that because the upper limit of the format data record is not met, if the upper limit is exceeded, gpload stops loading, otherwise the correct record can be loaded, and the error record throws the write error table. But it only supports data format errors and does not support violations of constraints-Error_table:Schema. table_name--Specifies the error table that does not conform to the format requirement record. If the specified table does not exist, the system is automatically created. OUTPUT:- TABLE:Schema. table_name--Specify the target table for the Mount-MODE:Insert | Update |Merge--Specifies the operation mode, which is insert by default. The merge operation does not support tables that use a random distribution policy. -Match_columns:--Specify a match condition for the update operation and the merge operation.      -Target_column_name-Update_columns:--specifying updated columns for update operations and merge operations     -Target_column_name-Update_condition:'boolean_condition'  --Specifies the where condition in which only records that meet the criteria in the target table can be changed (in the case of merge, only records that meet the criteria can be insert)-MAPPING:--Specifies the mapping relationship of the source and destination columns. Target_column_name:source_column_name| 'expression'preload:--specify the action before load- TRUNCATE: True|False--if set to True, delete all records in the target table before loading-Reuse_tables:true|False--When set to true, the outer table object is not removed from this intermediate Table object. To improve performance. SQL:-Before: "Sql_command"--SQL to execute before the mount operation, such as writing a log table-After: "Sql_command"--SQL executed after the mount operation. 

Execute script:

When the script is complete, place it in the specified server location, such as the/home/command/directory

Enter the server to find the directory where the script is stored: cd/home/command/executes the script

The execution command is: Gpload-f member.yml when the script finishes executing, the print table will get the following information:


Query Member_delta to see if the data was added successfully:

SELECT * from Member_delta;


Writing This example today, though, is a simple example, but probably lets me know what the process of gpload execution is. Write this script also toss a lot of time, the key no one to teach you, from beginning to end are their own in groping, do not understand Baidu, do not understand the book, a word in order to write this program unscrupulous to carry out various attempts, of which the grasping is helpless ... But compared with the result is now so insignificant, write drifting, into the topic, today encountered a problem of egg pain is not serious enough, the script of the individual key words are wrong, indeed more catching urgent, there is just beginning not to find the formal documents, pinch to an example, do not know the meaning of change for half a day, it is really a waste of time For this handling personal feel try to find official documents, see the meaning of each parameter, those can be written, those can be omitted. Make sure the basic syntax is OK when you start writing, and don't write too many useless arguments.

The above must have insufficient place, also please Daniel advice, also can add me qq together to discuss, qq:2233699673 add QQ please explain Greenplum exchange.

Simple and practical gpload

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.