Using Sqoop to transfer data between a relational database and Hadoop: import

As business data volumes and computational workloads keep growing, a traditional data warehouse can no longer meet the computing requirements, so the data is generally moved onto a Hadoop platform and the processing logic is implemented there. This raises the question of how to migrate an Oracle data warehouse to Hadoop.

Here a very useful tool comes in: Sqoop, an open-source tool used mainly to transfer data between relational databases and HDFS in Hadoop. Its two most commonly used functions are import and export.
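For orientation, a minimal sketch of the two commands is shown below; the host name, table names, and paths are placeholders, not the environment used in the tests later in this article.

# Sketch: import a whole table from the database into HDFS (placeholder values)
sqoop import \
             --connect jdbc:oracle:thin:@dbhost:1521/orcl \
             --username scott \
             --password tiger \
             --table EMP \
             --target-dir /sqoop/emp \
             -m 1

# Sketch: export an HDFS directory back into a database table
sqoop export \
             --connect jdbc:oracle:thin:@dbhost:1521/orcl \
             --username scott \
             --password tiger \
             --table EMP_BAK \
             --export-dir /sqoop/emp \
             -m 1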

Installing and configuring Sqoop is straightforward and is not covered here; this article focuses on testing data transfer from Oracle to Hive (HDFS) with Sqoop.

For the full parameter reference, see the official Sqoop documentation at http://sqoop.apache.org/docs/. The parameters used in this article are explained below:

-m N: use N map tasks to import the data in parallel

--query: import the result set of a free-form query; if this parameter is used, --target-dir must also be specified and the query's WHERE clause must contain the token $CONDITIONS

--target-dir: the HDFS directory in which the imported data is stored

--hive-table: the name of the target Hive table to import into

--fetch-size: Number of records read from the database at one time

--hive-drop-import-delims: when importing into Hive, strip special characters such as \n, \r, and \001 from string fields

--null-string <null-string>: for string-type fields, replace NULL values with <null-string>

--null-non-string <null-non-string>: for non-string fields, replace NULL values with <null-non-string>

(Usually we use

--null-string '\\N' \

--null-non-string '\\N' \

because NULL is represented in Hive as \N by default; the extra backslash is needed to escape it on the command line.)

--hive-partition-key: the partition column of the Hive table

--hive-partition-value: the value of the partition into which the data is imported

--hive-overwrite: overwrite existing data (note: if --hive-partition-key and --hive-partition-value are not specified, --hive-overwrite overwrites the entire table; if they are, it only overwrites the corresponding partition; see the sketch after this list)

--verbose: Print out detailed information
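The partition-related parameters are not exercised in the experiments below, so here is a rough sketch of how --hive-partition-key, --hive-partition-value, and --hive-overwrite combine. The table emp_part and the partition column load_date are hypothetical names used only for illustration.

# Sketch: import into one partition of a partitioned Hive table.
# Because --hive-partition-key/--hive-partition-value are given,
# --hive-overwrite replaces only the load_date='2016-09-01' partition.
sqoop import \
             --connect jdbc:oracle:thin:@192.168.134.200:1521/orclwin \
             --username scott \
             --password tiger \
             --table EMP_T1 \
             -m 1 \
             --hive-import \
             --hive-table test_db.emp_part \
             --hive-partition-key load_date \
             --hive-partition-value '2016-09-01' \
             --hive-overwrite \
             --target-dir /sqoop/emp_part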

================================================================================================

It is important to note that if the data being imported contains special characters such as \n, \r, or \001, you should use --hive-drop-import-delims to strip them. Otherwise, a newline inside a string field causes everything after it to be treated as a separate row, producing incorrect results.

Another point to watch is NULL handling: if some field values in the imported data are NULL, add the --null-string and --null-non-string parameters; otherwise NULL values are incorrectly replaced with the literal string "null".

In order to better illustrate the importance of these parameters, let's do some experiments below:

Platform Description:

Oracle Oracle Database 11g Enterprise Edition Release 11.2.0.1.0-64bit Production

Hadoop hadoop-2.7.2

Hive hive-2.1.0

Sqoop sqoop-1.4.6

Data under the Scott user in Oracle:

Note: among the records with empno 7000, 8000, 9000, and 9001, some have a NULL value in the VARCHAR2 column JOB and some have a NULL value in the NUMBER column MGR; the record with empno 9002 has a JOB value of "\nCLERK\n" (line breaks before and after).
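The article does not show the DDL or insert statements for EMP_T1 and DEPT_T1; purely as an illustration, rows like the ones described could be created in SQL*Plus along these lines (chr(10) is a newline in Oracle; the specific column values are assumptions):

# Illustration only: creating the problematic test rows described above.
sqlplus -s scott/tiger@192.168.134.200:1521/orclwin <<'EOF'
-- rows whose VARCHAR2 column JOB (and, for 9001, NUMBER column MGR) is NULL
insert into emp_t1 (empno, ename, job, mgr) values (9000, 'TEST1', null, 7839);
insert into emp_t1 (empno, ename, job, mgr) values (9001, 'TEST2', null, null);
-- a row whose JOB value has a line break before and after it
insert into emp_t1 (empno, ename, job, mgr)
values (9002, 'TEST3', chr(10) || 'CLERK' || chr(10), 7839);
commit;
EOF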

(1) Import without the special-handling parameters described above:

sqoop import --query "select a.empno,
                             a.ename,
                             a.job,
                             a.mgr,
                             a.hiredate,
                             a.sal,
                             b.deptno,
                             b.dname
                        from emp_t1 a
                        left join dept_t1 b
                          on a.deptno = b.deptno
                       where \$CONDITIONS" \
             --connect jdbc:oracle:thin:@192.168.134.200:1521/orclwin \
             --username scott \
             --password tiger \
             -m 1 \
             --hive-table test_db.emp_t1 \
             --hive-overwrite \
             --target-dir /sqoop/emp_t1 \
             --hive-import

To view data in hive:

As you can see, the original 19 rows became 21 rows after the import into Hive. The two extra rows come from the record with empno 9002, whose embedded newlines were not handled correctly.

Also, the JOB field of the two rows with empno 9000 and 9001 is not NULL but the literal string 'null':
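Both problems can be confirmed from the command line with queries along these lines (assuming the hive CLI is on the path):

# Row count: 21 instead of the expected 19, because of the embedded newlines
hive -e "select count(*) from test_db.emp_t1;"

# JOB arrives as the literal string 'null' instead of a real NULL
hive -e "select empno, job from test_db.emp_t1 where job = 'null';"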

(2) Import with --hive-drop-import-delims to handle the special characters, and --null-string / --null-non-string to handle NULL field values:

sqoop import --query "select a.empno,
                             a.ename,
                             a.job,
                             a.mgr,
                             a.hiredate,
                             a.sal,
                             b.deptno,
                             b.dname
                        from emp_t1 a
                        left join dept_t1 b
                          on a.deptno = b.deptno
                       where \$CONDITIONS" \
             --connect jdbc:oracle:thin:@192.168.134.200:1521/orclwin \
             --username scott \
             --password tiger \
             -m 1 \
             --hive-drop-import-delims \
             --null-string '\\N' \
             --null-non-string '\\N' \
             --hive-table test_db.emp_t1 \
             --hive-overwrite \
             --target-dir /sqoop/emp_t1 \
             --hive-import

To view data in hive:

Data import is normal.
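The same checks as before, assuming the hive CLI is available, confirm the fix: the row count is back to 19 and the NULL values are real NULLs.

hive -e "select count(*) from test_db.emp_t1;"
hive -e "select empno, job from test_db.emp_t1 where job is null;"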

That covers the import function; the next article will cover export.
