Using Sqoop to transfer data between a relational database and Hadoop: import

As business data volumes and computational workloads keep growing, a traditional data warehouse can no longer meet the computing requirements, so the data is generally moved onto a Hadoop platform and the processing logic is implemented there. This raises the question of how to migrate an Oracle data warehouse to Hadoop.

Here a very useful tool comes in: Sqoop, an open-source tool used mainly to transfer data between relational databases and HDFS in Hadoop. Its two most commonly used functions are import and export.
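For orientation, a minimal sketch of the two commands is shown below; the host name, table names, and paths are placeholders, not the environment used in the tests later in this article.

# Sketch: import a whole table from the database into HDFS (placeholder values)
sqoop import \
             --connect jdbc:oracle:thin:@dbhost:1521/orcl \
             --username scott \
             --password tiger \
             --table EMP \
             --target-dir /sqoop/emp \
             -m 1

# Sketch: export an HDFS directory back into a database table
sqoop export \
             --connect jdbc:oracle:thin:@dbhost:1521/orcl \
             --username scott \
             --password tiger \
             --table EMP_BAK \
             --export-dir /sqoop/emp \
             -m 1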

Installing and configuring Sqoop is straightforward and is not covered here; this article focuses on testing data transfer from Oracle to Hive (HDFS) with Sqoop.

For the full parameter reference, see the official Sqoop documentation at http://sqoop.apache.org/docs/. The parameters used in this article are explained below:

-m N: use N map tasks to import the data in parallel

--query: import the result set of a free-form query; if this parameter is used, --target-dir must also be specified and the query's WHERE clause must contain the token $CONDITIONS

--target-dir: the HDFS directory in which the imported data is stored

--hive-table: the name of the target Hive table to import into

--fetch-size: Number of records read from the database at one time

--hive-drop-import-delims: when importing into Hive, strip special characters such as \n, \r, and \001 from string fields

--null-string <null-string>: for string-type fields, replace NULL values with <null-string>

--null-non-string <null-non-string>: for non-string fields, replace NULL values with <null-non-string>

(Usually we use

--null-string '\\N' \

--null-non-string '\\N' \

because NULL is represented in Hive as \N by default; the extra backslash is needed to escape it on the command line.)

--hive-partition-key: the partition column of the Hive table

--hive-partition-value: the value of the partition into which the data is imported

--hive-overwrite: overwrite existing data (note: if --hive-partition-key and --hive-partition-value are not specified, --hive-overwrite overwrites the entire table; if they are, it only overwrites the corresponding partition; see the sketch after this list)

--verbose: Print out detailed information
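The partition-related parameters are not exercised in the experiments below, so here is a rough sketch of how --hive-partition-key, --hive-partition-value, and --hive-overwrite combine. The table emp_part and the partition column load_date are hypothetical names used only for illustration.

# Sketch: import into one partition of a partitioned Hive table.
# Because --hive-partition-key/--hive-partition-value are given,
# --hive-overwrite replaces only the load_date='2016-09-01' partition.
sqoop import \
             --connect jdbc:oracle:thin:@192.168.134.200:1521/orclwin \
             --username scott \
             --password tiger \
             --table EMP_T1 \
             -m 1 \
             --hive-import \
             --hive-table test_db.emp_part \
             --hive-partition-key load_date \
             --hive-partition-value '2016-09-01' \
             --hive-overwrite \
             --target-dir /sqoop/emp_part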

================================================================================================

It is important to note that if the data being imported contains special characters such as \n, \r, or \001, you should use --hive-drop-import-delims to strip them. Otherwise, a newline inside a string field causes everything after it to be treated as a separate row, producing incorrect results.

Another point to watch is NULL handling: if some field values in the imported data are NULL, add the --null-string and --null-non-string parameters; otherwise NULL values are incorrectly replaced with the literal string "null".

In order to better illustrate the importance of these parameters, let's do some experiments below:

Platform Description:

Oracle Oracle Database 11g Enterprise Edition Release 11.2.0.1.0-64bit Production

Hadoop hadoop-2.7.2

Hive hive-2.1.0

Sqoop sqoop-1.4.6

Data under the Scott user in Oracle:

Note: among the records with empno 7000, 8000, 9000, and 9001, some have a NULL value in the VARCHAR2 column JOB and some have a NULL value in the NUMBER column MGR; the record with empno 9002 has a JOB value of "\nCLERK\n" (line breaks before and after).
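The article does not show the DDL or insert statements for EMP_T1 and DEPT_T1; purely as an illustration, rows like the ones described could be created in SQL*Plus along these lines (chr(10) is a newline in Oracle; the specific column values are assumptions):

# Illustration only: creating the problematic test rows described above.
sqlplus -s scott/tiger@192.168.134.200:1521/orclwin <<'EOF'
-- rows whose VARCHAR2 column JOB (and, for 9001, NUMBER column MGR) is NULL
insert into emp_t1 (empno, ename, job, mgr) values (9000, 'TEST1', null, 7839);
insert into emp_t1 (empno, ename, job, mgr) values (9001, 'TEST2', null, null);
-- a row whose JOB value has a line break before and after it
insert into emp_t1 (empno, ename, job, mgr)
values (9002, 'TEST3', chr(10) || 'CLERK' || chr(10), 7839);
commit;
EOF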

(1) Import without the special-handling parameters described above:

sqoop import --query "select a.empno,
                             a.ename,
                             a.job,
                             a.mgr,
                             a.hiredate,
                             a.sal,
                             b.deptno,
                             b.dname
                        from emp_t1 a
                        left join dept_t1 b
                          on a.deptno = b.deptno
                       where \$CONDITIONS" \
             --connect jdbc:oracle:thin:@192.168.134.200:1521/orclwin \
             --username scott \
             --password tiger \
             -m 1 \
             --hive-table test_db.emp_t1 \
             --hive-overwrite \
             --target-dir /sqoop/emp_t1 \
             --hive-import

To view data in hive:

As you can see, the original 19 rows became 21 rows after the import into Hive. The two extra rows come from the record with empno 9002, whose embedded newlines were not handled correctly.

Also, the JOB field of the two rows with empno 9000 and 9001 is not NULL but the literal string 'null':
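Both problems can be confirmed from the command line with queries along these lines (assuming the hive CLI is on the path):

# Row count: 21 instead of the expected 19, because of the embedded newlines
hive -e "select count(*) from test_db.emp_t1;"

# JOB arrives as the literal string 'null' instead of a real NULL
hive -e "select empno, job from test_db.emp_t1 where job = 'null';"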

(2) Import with --hive-drop-import-delims to handle the special characters, and --null-string / --null-non-string to handle NULL field values:

sqoop import --query "select a.empno,
                             a.ename,
                             a.job,
                             a.mgr,
                             a.hiredate,
                             a.sal,
                             b.deptno,
                             b.dname
                        from emp_t1 a
                        left join dept_t1 b
                          on a.deptno = b.deptno
                       where \$CONDITIONS" \
             --connect jdbc:oracle:thin:@192.168.134.200:1521/orclwin \
             --username scott \
             --password tiger \
             -m 1 \
             --hive-drop-import-delims \
             --null-string '\\N' \
             --null-non-string '\\N' \
             --hive-table test_db.emp_t1 \
             --hive-overwrite \
             --target-dir /sqoop/emp_t1 \
             --hive-import

To view data in hive:

Data import is normal.
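The same checks as before, assuming the hive CLI is available, confirm the fix: the row count is back to 19 and the NULL values are real NULLs.

hive -e "select count(*) from test_db.emp_t1;"
hive -e "select empno, job from test_db.emp_t1 where job is null;"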

That covers the import function; the next article will cover export.
