Environmental information:
greenplum:greenplum Version: ' Postgres (greenplum Database) 4.3.8.2 build 1 '
Oracle:Release 11.2.0.1.0 Production
1. Oracle Client Deployment
Before using the SQLULDR2 tool, you first need to deploy the Oracle client on the Master node of Greenplum ( hereinafter referred to as GP) because SQLULDR2 need to use shared libraries such as libclntsh.so .
If you want to save the hassle, also directly from the Oracle server to copy the installation directory, and then configure the following environment variables can be.
To prevent the configuration of Oracle environment variables from affecting the use of the GP database Client Tools, we create a single user for the Oracle client and then configure the Oracle environment variables separately. Not to participating with GP .
First create the Oracle user groups and users:
Groupadd-g-DBA
Groupadd-g 10001 Oinstall
Useradd-u 1002-d/var/lib/oracle-s/bin/bash-g oinstall-g dba-m Oracle
passwd Oracle
For example, we will directly copy the Oracle server's software Installation Catalog product and put it under /var/lib/oracle :
[Email protected] oracle]# pwd
/var/lib/oracle
[Email protected] oracle]# chown-r oracle:oinstall/var/lib/oracle
[email protected] oracle]# LL
DRWXRWXRWX 3 Oracle Oinstall 4096 Jul 22:14 product
-RWXRWXRWX 2 Oracle Oinstall 185766 Mar sqluldr2_linux64_10204.bin
Finally, we configure the relevant environment variables for Oracle to be added to the . BASHRC file in the Oracle 's home directory:
Export Oracle_base=/var/lib/oracle
Export Oracle_home=/var/lib/oracle/product/11.2.0/dbhome_1
Export Ld_library_path= $ORACLE _home/lib:/usr/lib
Export path= $ORACLE _home/bin:/sbin:/usr/sbin:/usr/ccs/bin:/usr/bin:/sbin: $PATH:/bin
Export tns_names= $ORACLE _home/network/admin
Export tns_admin= $ORACLE _home/network/admin
# CLASSPATH must include the following JRE locations:
Classpath=${oracle_home}/rdbms/jlib:${classpath}
Export classpath= $CLASSPATH:/opt/oracle/product/11/network/jlib
# also configure the GP 's environment information, and then use gpload to access the GP database.
source/usr/local/greenplum-db/greenplum_path.sh
Export Master_data_directory=/hadoopdata1/master/gpseg-1
Export pgport=5432
Export Pguser=gpadmin
Export Pgdatabase=template1
2. SQLULDR2 Installation Deployment
Download the sqluldr2_linux64_10204.bin file, if your environment is Windows or linux32 , download the corresponding file.
We can rename the file and put it in the following path or create a soft link ( which we have configured in path ):
[Email protected] ~]$ pwd
/var/lib/oracle
[email protected] ~]$ ll product/11.2.0/dbhome_1/bin/sqluldr2
-RWXRWXRWX 2 Oracle Oinstall 185766 Mar PRODUCT/11.2.0/DBHOME_1/BIN/SQLULDR2
Let's check it out:
[Email protected] ~]$ SQLULDR2
Sql*unloader:fast Oracle Text unloader (GZIP, Parallel), Release 4.0.1
(@) Copyright Lou fangxin (anysql.net) 2004-2010, All rights reserved.
License:free for non-commercial useage, else USD per server.
USAGE:SQLULDR2 Keyword=value [, Keyword=value,...]
Valid Keywords:
user = Username/[email protected]
sql = SQL file name
query = SELECT statement
field = Separator string between fields
Record = separator string between records
rows = Print progress for every given rows (default, 1000000)
File = Output file name (default:uldrdata.txt)
Log = log file name, prefix with + to append mode
Fast = Auto Tuning the session level parameters (YES)
Text = output Type (MYSQL, CSV, Mysqlins, Oracleins, FORM, SEARCH).
CharSet = Character Set name of the target database.
Ncharset= national Character set name of the target database.
parfile = Read command option from parameter file
For field and record, you can use the ' 0x ' to specify hex character code,
\r=0x0d \n=0x0a |=0x7c, =0x2c, \t=0x09,: =0x3a, #=0x23, "=0x22" =0x27
You can see a very detailed help note.
3. gpload Brief Introduction
Gpload is a tool provided in GP for data loading, which can be seen as a encapsulation of gpfdist , so it relies on gpfdist components.
gpload Specifies the details of the load through its control file. So writing control files is the key to good gpload . gpload 's control file is written in the YAML1.1 document format, so it must be in a valid YAML format.
This piece of content I have introduced in the blog http://blog.csdn.net/jiangshouzhuang/article/details/51817520 , we can look at. This chapter focuses on migrating Oracle data directly into the GP database.
4. The principle of direct parallel loading of Oracle data into GP
Using gpload, we can realize the merge operation that can't be implemented directly in GP , and we can implement parallel fast loading without landing files by combining named pipes. This helps us to increase the load efficiency of massive data, and avoids the excessive storage overhead that is loaded with traditional landing files, as well as the loading performance bottleneck caused by the massive file landing process.
The method used in this article is through the Linux Named pipes,Gpload, and sqluldr2 programs ( fast offload for Oracle data ) enables fast loading from Oracle to GP without landing files.
5. Example
Scenario: Load Oracle 's iphone_user_detail table into the iphone_user_detail table in the GP database.
First, we create the gpload control file, thegpload.ctl content is as follows:
---
version:1.0.0.1
database:zhangyun_db
User:zhangyun
Host:cdha
port:5432
Gpload:
INPUT:
-SOURCE:
Local_hostname:
-Cdha
port:46666
FILE:
-/var/lib/oracle/iphone_user_detail/iphone_user_detail_file
-Format:text
-Delimiter:e ' \001 '
-Null_as: "
-Encoding:utf8
-Error_limit:25
-Error_table:err_iphone_user_detail
OUTPUT:
-TABLE:ann_gbs_safe.iphone_user_detail
-Mode:insert
Then write the script export_iphone_user_detail.sh, which is used to perform data migration work:
#!/bin/bash
Mknod/var/lib/oracle/iphone_user_detail/iphone_user_detail_file P
SQLULDR2 user=zy/[email protected]:1521/salesdb query= "Select/*+ Parallel (8) */Deptno,fmi,location,iphone_number From Pps.iphone_user_detail "field=0x01 Charset=utf8 safe=yes File=/var/lib/oracle/iphone_user_detail/iphone_user_ Detail_file &
Gpload-f gpload.ctl-v-L Gpload.log
Rm-rf/var/lib/oracle/iphone_user_detail/iphone_user_detail_file
6. Execute the export_iphone_user_detail.sh script and view the log
Before executing the script, make sure that the iphone_user_detail table has been created in GP .
[Email protected] ~]$ bash export_iphone_user_detail.sh
2016-07-13 14:47:00|info|gpload Session started 2016-07-13 14:47:00
2016-07-13 14:47:00|info|started gpfdist-p 21211-p 21212-f "/var/lib/oracle/iphone_user_detail/iphone_user_detail_ File "-T 30
0 rows exported at 2016-07-13 14:47:03, size 0 MB.
1000000 rows exported at 2016-07-13 14:47:10, size 719 MB.
2000000 rows exported at 2016-07-13 14:47:18, size 1451 MB.
3000000 rows exported at 2016-07-13 14:47:25, size 2154 MB
.............................
When the execution is complete, you can go to the GP database to view it.
zhangyun_db=# Select COUNT (*) from Iphone_user_detail;
Count
--------
194000000
(1 row)
Use SQLULDR2 to quickly import Oracle data into Greenplum database without landing