OSS Loading data to PostgreSQL

Source: Internet
Author: User
Tags md5 postgresql

Oss_fdw

On the Ali cloud, supports loading data into PostgreSQL and PPAs by OSS_FDW parallel

OSS_FDW parameters

Like other FDW interfaces, OSS_FDW provides data encapsulation of OSS for external data sources, and users can use OSS_FDW to read stored files on OSS like a table.
Like other FDW, OSS_FDW provides a unique number of parameters for connecting and parsing file data on OSS.

and OSS-related parameters are

1. Ossendpoint parameters, is the intranet access to OSS address, also known as host

2. ID OSS Account ID

3. Key OSS Account Key

4. Bucket Ossbucket, need to create OSS account after allocation

5. filepath OSS file name with path
5.1 filename contains file path, but does not contain bucket
5.2 This parameter matches multiple files on the OSS counterpart path and supports loading them into the database
5.3 Files named filepath and filepath.x support are imported into the database, X requires starting from 1 and is continuous
5.4 Cases of filepath filepath.1 filepath.2 filepath.3 filepath.5 The first 4 files will be matched and imported, but filepath.5 will not.

6. Dir directory of virtual files in Oss
6.1 Dir needs to end
All files in the virtual file directory developed by 6.2 dir (not including files under subfolders and subfolders) are matched and imported into the database.

Need to be aware

1. The first 4 parameters ossendpoint ID key bucket placed in the server object
2. FilePath and dir need to be specified in the OPTIONS parameter parameters of FDW
3. FilePath and Dir must specify one of two parameters and cannot specify both
4. The value of each parameter is caused by using ', cannot include the useless space


Other parameters

1. Format
Specifies the format of the file, which currently supports only CSV

2. Encoding
The encoded format of the data in the file that supports common PG encoding, such as UTF8

3. Parse_errors
Fault tolerant mode parsing, ignoring errors in file analysis according to behavior units

4. Delimiter
Make a column separator

5. Quote
Specify a reference character for a file

6. Escape
Specify the escape character of a file

7. Null
Specifies the column null to match the corresponding string
For example, null ' test ', that is, the string with the column value ' test ' is null

8. Force_not_null
A value that lists multiple columns is not NULL
Example Force_not_null ' id ', that is, if the ID column in the table is NULL, replace it with an empty string


Case

# Create Plug-ins
Create extension OSS_FDW;

# Create Server
CREATE SERVER ossserver FOREIGN DATA wrapper oss_fdw OPTIONS
(Host ' oss-cn-hangzhou-zmf.aliyuncs.com ', id ' xxx ', key ' xxx ', bucket ' mybucket ');

# Create an OSS external table
CREATE FOREIGN TABLE Ossexample
(date text, time text, open float,
High float, low float, volume int)
SERVER Ossserver
OPTIONS (filepath ' osstest/example.csv ', delimiter ', ', ',
Format ' CSV ', encoding ' UTF8 ', parse_errors ' 100 ');

# Create a table, and the data is loaded into this table
CREATE TABLE Example
(date text, time text, open float,
High float, low float, volume int);

# data is loaded in parallel from the ossexample into the example.
INSERT INTO Example SELECT * from Ossexample;

# can see
# OSS_FDW can correctly estimate the size of the files on OSS, and correctly plan the query plan.
Explain insert into the example select * from Ossexample;
                              QUERY plan                               
---------------------------------------------------------------------
 insert on example  (cost=0.00..1.60 rows=6 width=92)
  ->  Foreign Scan on Ossexamp le  (cost=0.00..1.60 rows=6 width=92)
         Foreign ossfile: osstest/example.csv.0
         Foreign ossfile size:728
(4 rows)

OSS_FDW Use attention
OSS_FDW opens the OSS to PostgreSQL and PPAs data channels, where users can put data into low-cost OSS and import them into PostgreSQL or PPAs.

1. OSS_FDW is an external table plug-in developed under the PostgreSQL FOREIGN table framework.
2. The performance of the data import is related to the resources of the PostgreSQL cluster (CPU IO MEM MET) and to OSS.
3. In order to ensure the performance of the data import Ossprotocol Ossendpoint needs to match the PostgreSQL Cloud where Region. For information, please refer to the link below.

ID and key Hide

The ID and key information in the CREATE server, if not processed, will enable the user to see the plaintext information in the SELECT * from Pg_foreign_server, which will expose the user's ID and key.
In order to hide IDs and keys, we implement symmetric encryption of IDs and keys (different instances use different secrets to maximize user information), but you cannot add a data type to the same GP, because it will not be compatible with old instances.

The information after the final encryption is as follows:

postgres=# select * from Pg_foreign_server;
  srvname  | srvowner | srvfdw | srvtype | srvversion | Srvacl |       & nbsp;                                                                         srvoptions

-----------+----------+--------+---------+------------+--------+----------------------------------------------- -------------------------------------------------------------------------------------
----------------------------------
Ossserver |  10 |         16390 |            |        | | {host=oss-cn-hangzhou-zmf.aliyuncs.com,id=md5xxxxxxxx,key=md5xxxxxxxx,bucket=067862}

The encrypted information will begin with MD5 (the total length is len%8==3), so the import will not be encrypted again, but the user cannot create the key and ID at the beginning of the MD5

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.