GPDB Administrator notes (3) Load And Unload data

Source: Internet
Author: User
Tags aliyun
External table definition readable External table (cannot do dml operation) can be written to External table (only insert, not select, update, delete) load to create external table # CREATEEXTERNALWEBTABLEext_expenses (nametext, datedate, amountfloat4, categorytext, descriptiontext) LOCATION (intr

External tables define readable External tables (dml operations are not allowed) and can be written to external tables (insert only, select, update, delete is not allowed) load and create external table = # create external web table ext_expenses (name text, date, amount float4, category text, description text) LOCATION ('HTTP: // intr

External tables define readable External tables (dml operations are not allowed) and can be written to external tables (insert only, select, update, delete is not allowed)
LoadCreate external table = # create external web table ext_expenses (name text,
Date, amount float4, category text, description text)
LOCATION ('HTTP: // intranet.company.com/expenses/sales/file.csv ',
'Http: // intranet.company.com/expenses/exec/file.csv ',
'Http: // intranet.company.com/expenses/finance/file.csv ',
'Http: // intranet.company.com/expenses/ops/file.csv ',
'Http: // intranet.company.com/expenses/marketing/file.csv ',
'Http: // intranet.company.com/expenses/eng/file.csv ')
FORMAT 'csv' (HEADER );
Load External table data = # insert into expenses_travel
SELECT * from ext_expenses where category = 'travel ';
Or you want to quickly load all the data to a new database table:
= # Create table expenses as select * from ext_expenses;
Test:[Root @ mdw ~] # Wget http://mirrors.aliyun.com/repo/Centos-6.repo
-- 13:51:30 -- http://mirrors.aliyun.com/repo/Centos-6.repo
Parsing host mirrors.aliyun.com... 115.28.122.210, 112.124.140.210
Connecting mirrors.aliyun.com | 115.28.122.210 |: 80... already connected.
An HTTP request has been sent and is waiting for response... 200 OK
Length: 2086 (2.0 K) [application/octet-stream]
Saving to: Centos-6.repo"

100% [============================================== ========================================================== ========================================================== =======>] 2,086 --. -K/s in 0 s

13:51:30 (194 MB/s)-saved "Centos-6.repo" [2086/2086])


Libo = # create external web table ext_expenses (name text)
Libo-# location ('HTTP: // mirrors.aliyun.com/repo/Centos-6.repo ')
Libo-# FORMAT 'text' (DELIMITER '| 'null '');
Create external TABLElibo = # create table expenses as select * from ext_expenses;
NOTICE: Table doesn't have 'distributed BY 'clause -- Using column (s) named' colum 'as the Greenplum Database data distribution key for this table.
HINT: The 'stributed BY 'clause determines the distribution of data. Make sure column (s) chosen are the optimal data distribution key to minimize skew.
ERROR: cocould not translate host name "mirrors.aliyun.com", port "80" to address: Temporary failure in name resolution (cdbutil. c: 754) (seg0 slice1 sdw1: 40000 pid = 26261) (cdbdisp. c: 1489)
Libo = #
Libo = #
Libo = # SELECT * from ext_expenses;
ERROR: cocould not translate host name "mirrors.aliyun.com", port "80" to address: Temporary failure in name resolution (cdbutil. c: 754) (seg0 slice1 sdw1: 40000 pid = 26254) (cdbdisp. c: 1489)
Libo = # drop external web table ext_expenses;
DROP EXTERNAL TABLE
Libo = # create external web table ext_expenses (colum text)
Libo-# location ('HTTP: // 115.28.122.210/repo/Centos-6.repo ')
Libo-# FORMAT 'text' (DELIMITER '| 'null '');
CREATE EXTERNAL TABLE
Libo = # select * from ext_expenses;
ERROR: connection with gpfdist failed for http: // 115.28.122.210/repo/Centos-6.repo. Valid tive url: http: // 115.28.122.210/repo/Centos-6.repo. (seg0 slice1 sdw1: 40000 pid = 26296)
Libo = #



[Gpadmin @ mdw data_tst] $ gpfdist-d/home/gpadmin/data_tst-p 8081-l/home/gpadmin/log1 &
[1] 10321
[Gpadmin @ mdw data_tst] $ Serving HTTP on port 8081, directory/home/gpadmin/data_tst

[Root @ mdw ~] # Wget http: // 192.168.100.101: 8081/aaa
-- 14:14:01 -- http: // 192.168.100.101: 8081/aaa
Connecting 192.168.100.101: 8081... connected.
An HTTP request has been sent and is waiting for response... 200 OK
Length: [text/plain] Not Specified
Saving to: "aaa"

[<=>] 17 --.-K/s in 0 s

14:14:01 (1.61 MB/s)-"aaa" saved [17]

Libo = # create external web table ext_expenses (colum text)
Libo-# location ('HTTP: // 192.168.100.101: 8081/aaa ')
Libo-# FORMAT 'text' (DELIMITER '| 'null '');
CREATE EXTERNAL TABLE
Libo = # select * from ext_expenses;
Colum
-------
Aaaa
Aaa

Aa
A


(7 rows)

Create table t as select * from t_ext distributed by (id );
Libo = # create table t as select * from t_ext distributed randomly;
SELECT 10

Libo = # create external table t_ext (id int, name text)
Libo-# location ('gpfdist: // 192.168.100.11: 8081/aaa.csv ')
Libo-# format 'csv ';
CREATE EXTERNAL TABLE
Libo = # select * from t_ext;
ERROR: missing data for column "name" (seg3 slice1 sdw2: 40001 pid = 10243)
DETAIL: External table t_ext, line 4000 of gpfdist: // 192.168.100.11: 8081/aaa.csv :""
Cause: Empty rows exist in csv
Conclusion: The External table only supports the http gpfdist service of gpfdist, which is a simple web Service of GP.

Load error handling: Use the create external table command to define readable EXTERNAL tables
Use the segment reject limit clause in combination.
The distinct deny limit parameter can be used to specify the number of records (default) or use PERCENT to specify the number of records
Percentage.
Future stores error records for future checks. Use the log errors into clause to specify error records.
Log table.

Use gpload to load
Uninstall data
Disable the use of executelibo = # show gp_external_enable_exec in the web table definition
Libo -#;
Gp_external_enable_exec
-------------------------
On
(1 row)
Data format
When using a variety of GP commands to load or unload data, you need to specify how the data is formatted
Line-separated GPDB is expected to contain LF characters (Line Feed/0x0A), CR (Carriage Return/0x0D)
Or CR plus LF (CR + LF/press enter to wrap/0x0A 0x0D) as the segmentation of a row. LF is a standard UNIX or
Standard line feed identifier for UNIX-like operating systems. Other operating systems (such as Windows and Mac OS 9) can
Can be CR or CR + LF. All these line feed identifiers are supported as line delimiters in GPDB.
Column Separation the default column separator for TEXT files is the TAB character (0x09), which is missing for CSV files.
The column separator is comma (0x2C ). However, when using COPY and create external table
You can use the DELIMITER clause to execute other orders when defining the data format using gpload.
Character separator.


Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.