SQOOP Load Data from Oracle to Hive Table


sqoop import -D oraoop.disabled=true --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=HOSTNAME)(port=PORT))(connect_data=(service_name=SERVICE_NAME)))" --username USERNAME --table TABLE_NAME --null-string '\\N' --null-non-string '\\N' --hive-import --hive-table HIVEDB.HIVETABLENAME --num-mappers 1 --verbose --password PWD --hive-drop-import-delims --hive-overwrite --fetch-size 500

-D is not a Sqoop-specific parameter; it is a generic Hadoop option.

oraoop.disabled=true

If this property is not set, the command may report the error: table or view does not exist.

OraOop is a special plugin for Sqoop that provides faster access to Oracle's RDBMS by using custom protocols that are not publicly available. Quest Software partnered with Oracle to obtain those protocols, implemented them, and created OraOop.

In our test environment the import works fine without this setting. In another environment we hit this issue; shortly before the failure the log showed a message saying the URL could not be recognized as a valid thin URL, so it may be a driver issue.
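As a reminder of the ordering (connection details are abbreviated here; the full command is shown at the top of this post), the generic Hadoop -D option must come right after the tool name and before the Sqoop-specific arguments:

sqoop import -D oraoop.disabled=true --connect "jdbc:oracle:thin:@//HOSTNAME:PORT/SERVICE_NAME" --username USERNAME --password PWD --table TABLE_NAME --hive-import --hive-table HIVEDB.HIVETABLENAME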

   

Another thing to take care of: you had better write the TABLE_NAME (or view name) and the username in UPPER CASE, or else you may encounter the same issue: table or view does not exist.

   

--hive-drop-import-delims

This parameter addresses a known issue that occurs when fields in the RDBMS table contain newlines (\r, \n) or special characters such as \001 in their content.

Such characters break Hive's parsing: Hive uses \001 as the default field separator and \n as the default row terminator.

Even if you specify the field separator or row terminator yourself, Hive will still report an error, because Hive currently supports only \n as the row terminator. So the practical option is to drop or replace the special characters and \r\n inside the fields.
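As a hedged sketch of the two options (only the relevant flags are shown, appended to the import command above), --hive-drop-import-delims removes the offending characters, while --hive-delims-replacement substitutes a string of your choice instead of dropping them:

sqoop import ... --hive-import --hive-drop-import-delims
sqoop import ... --hive-import --hive-delims-replacement " "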

--hive-overwrite

This overwrites any existing data in the Hive table instead of appending to it.

--fetch-size

The default value of this parameter is 1000.

Once, when we loaded a wide view with about 80 columns, the sqoop command reported an out-of-memory error.

The generated Java file had not even been produced at that point. I do not know exactly why, but since the error appeared before any fetch size had been tuned, I changed this setting.

Pinning down the root cause would require more information from the source code.
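If you hit the same out-of-memory error, a reasonable first step (the value 500 follows the command at the top of this post; tune it for your row width) is to lower the number of rows the JDBC driver buffers per round trip:

sqoop import ... --fetch-size 500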

--null-string '\\N' --null-non-string '\\N'

Without these parameters, NULL values from the RDBMS end up in Hive as the literal string 'null'; with them, Sqoop writes \N instead, which Hive reads back as a real NULL.
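A quick way to verify this (assuming the default Hive warehouse location and the placeholder names from the command above; the part file name may differ) is to look at the files Sqoop produced and confirm that NULL columns show up as \N, which matches Hive's default serialization.null.format:

hdfs dfs -cat /user/hive/warehouse/hivedb.db/hivetablename/part-m-00000 | head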

   

The sqoop command generates the Java class and Hadoop jar file in a temporary path, and then executes the MapReduce job.

First it loads the data into a staging directory on HDFS, then it creates the Hive table, and finally it uses a LOAD DATA statement to move the files from HDFS into the Hive warehouse folder.

If the command executes successfully, it cleans up the staging files.

If it fails while creating the Hive table or loading the data into Hive, the staging folder and files remain in HDFS.

If you rerun the same command, it will fail and report that the output directory already exists, so either delete that directory or load the staged data into Hive yourself.
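A minimal cleanup sketch, assuming the leftover staging directory is the default one named after the table under your HDFS home directory:

hdfs dfs -rm -r /user/USERNAME/TABLE_NAME

Alternatively, you can load the staged files into Hive yourself:

hive -e "LOAD DATA INPATH '/user/USERNAME/TABLE_NAME' OVERWRITE INTO TABLE HIVEDB.HIVETABLENAME"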

You can also use --query (-e) to load data with a free-form query.

Demo: --query "select * from TABLE_NAME where \$CONDITIONS". Inside double quotes you must escape the token as \$CONDITIONS; inside single quotes the backslash is not needed.

When you use --query you must also add the parameter --target-dir /hdfspath.
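Putting it together, a hedged end-to-end sketch (host, credentials and names are placeholders; with more than one mapper you would also have to add --split-by):

sqoop import --connect "jdbc:oracle:thin:@//HOSTNAME:PORT/SERVICE_NAME" --username USERNAME --password PWD --query "select * from TABLE_NAME where \$CONDITIONS" --target-dir /user/USERNAME/TABLE_NAME_query --num-mappers 1 --hive-import --hive-table HIVEDB.HIVETABLENAME --null-string '\\N' --null-non-string '\\N'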

When loading data from the RDBMS into Hive, if you let Sqoop create the table for you, you will find that integer columns (Oracle NUMBER) end up as DOUBLE in Hive.

So you need to handle this yourself, as in the sketch below; please take care.
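One way to handle it, as a hedged sketch (ID and CNT are hypothetical column names), is to override the types Sqoop generates with --map-column-hive:

sqoop import ... --hive-import --hive-table HIVEDB.HIVETABLENAME --map-column-hive ID=INT,CNT=BIGINT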
