Requirement
Import incremental data from the basic business table in Oracle to Hive and merge it with the current full table into the latest full table.
Design
Three tables are involved:
The current full table (service_all in Hive), which holds all of the data;
The incremental table (service_tmp in Hive), which holds the data for a given time period;
The updated full table, produced by merging the two and overwriting the current full table.
Steps:
Import the tables from Oracle into Hive through Sqoop to simulate the full table and the incremental table
Use Hive to merge the full table and the incremental table into an updated full table, overwriting the current full table
Step 1: Use Sqoop to import the tables from Oracle into Hive to simulate the full table and the incremental table
To simulate the scenario, a full table and an incremental table are required. Because the data source is limited, both tables are built from the OMP_SERVICE table in Oracle: the full table contains all of its data and is named service_all in Hive; the incremental table contains the data for a given time period and is named service_tmp in Hive.
(1) Full table import: import all rows, but only the required columns, into the specified Hive table.
To use the Hive import function, you must first configure the environment variable for HCatalog (HCatalog is a Hive submodule) by adding the following to /etc/profile:
export HCAT_HOME=/home/fulong/Hive/apache-hive-0.13.1-bin/hcatalog
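After adding this line, reload the profile so the variable takes effect in the current shell; a quick sanity check (using the path from this article) might look like:
source /etc/profile
echo $HCAT_HOME    # should print /home/fulong/Hive/apache-hive-0.13.1-bin/hcatalog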
Run the following command to import data:
fulong@FBI006:~/Sqoop/sqoop-1.4.4/bin$ ./sqoop import \
> --connect jdbc:oracle:thin:@192.168.0.147:1521:ORCLGBK --username SP --password fulong \
> --table OMP_SERVICE \
> --columns "SERVICE_CODE,SERVICE_NAME,SERVICE_PROCESS,CREATE_TIME,ENABLE_ORG,ENABLE_PLATFORM,IF_DEL" \
> --hive-import --hive-table SERVICE_ALL
Note: The username must be in uppercase.
(2) Incremental table import: import only the rows within the required time range, and only the required columns, into the specified Hive table.
Run the following command to import data:
fulong@FBI006:~/Sqoop/sqoop-1.4.4/bin$ ./sqoop import \
> --connect jdbc:oracle:thin:@192.168.0.147:1521:ORCLGBK --username SP --password fulong \
> --table OMP_SERVICE \
> --columns "SERVICE_CODE,SERVICE_NAME,SERVICE_PROCESS,CREATE_TIME,ENABLE_ORG,ENABLE_PLATFORM,IF_DEL" \
> --where "CREATE_TIME > to_date('2017/4 17:00:00','yyyy-mm-dd hh24:mi:ss') and CREATE_TIME < to_date('2017/4 18:00:00','yyyy-mm-dd hh24:mi:ss')" \
> --hive-import --hive-overwrite --hive-table SERVICE_TMP
Note:
Because the --hive-overwrite parameter is used, this statement can be executed repeatedly, overwriting the service_tmp table with the latest incremental data each time;
Sqoop also supports importing the result of an arbitrary SQL query. For more information, see section 7.2.3, "Free-form Query Imports", of the Sqoop User Guide.
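For reference, a minimal sketch of such a free-form query import is shown below, reusing the connection settings from this article. The query text, the --split-by column, the --target-dir path, and the Hive table name SERVICE_QUERY_DEMO are hypothetical examples, not part of the original steps; note that --query requires the literal $CONDITIONS token and a --target-dir:
# Hedged sketch only: the query, split column, target dir and Hive table name are illustrative
./sqoop import \
  --connect jdbc:oracle:thin:@192.168.0.147:1521:ORCLGBK --username SP --password fulong \
  --query 'SELECT SERVICE_CODE, SERVICE_NAME, CREATE_TIME FROM OMP_SERVICE WHERE $CONDITIONS' \
  --split-by SERVICE_CODE \
  --target-dir /tmp/service_query_demo \
  --hive-import --hive-table SERVICE_QUERY_DEMO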
(3) Verify the import result: list all tables, count the number of rows, and view the table structure
hive> show tables;
OK
searchlog
searchlog_tmp
service_all
service_tmp
Time taken: 0.04 seconds, Fetched: 4 row(s)
hive> select count(*) from service_all;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1407233914535_0013, Tracking URL = :8088/proxy/application_1407233914535_0013/
Kill Command = /home/fulong/Hadoop/hadoop-2.2.0/bin/hadoop job -kill job_1407233914535_0013
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1
16:51:47,389 Stage-1 map = 0%, reduce = 0%
16:51:59,816 Stage-1 map = 33%, reduce = 0%, Cumulative CPU 1.36 sec
16:52:01,996 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 2.45 sec
16:52:07,877 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.96 sec
16:52:17,639 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.29 sec
MapReduce Total cumulative CPU time: 5 seconds 290 msec
Ended Job = job_1407233914535_0013
MapReduce Jobs Launched:
Job 0: Map: 3  Reduce: 1  Cumulative CPU: 5.46 sec  HDFS Read: 687141  HDFS Write: 5  SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 460 msec
OK
6803
Time taken: 59.386 seconds, Fetched: 1 row(s)
hive> select count(*) from service_tmp;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1407233914535_0014, Tracking URL = :8088/proxy/application_1407233914535_0014/
Kill Command = /home/fulong/Hadoop/hadoop-2.2.0/bin/hadoop job -kill job_1407233914535_0014
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1
16:53:03,951 Stage-1 map = 0%, reduce = 0%
16:53:15,189 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 2.17 sec
16:53:16,236 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.38 sec
16:53:57,935 Stage-1 map = 100%, reduce = 22%, Cumulative CPU 3.78 sec
16:54:01,811 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.34 sec
MapReduce Total cumulative CPU time: 5 seconds 340 msec
Ended Job = job_1407233914535_0014
MapReduce Jobs Launched:
Job 0: Map: 3  Reduce: 1  Cumulative CPU: 5.66 sec  HDFS Read: 4720  HDFS Write: 3  SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 660 msec
OK
13
Time taken: 75.856 seconds, Fetched: 1 row(s)
hive> describe service_all;
OK
service_code            string
service_name            string
service_process         string
create_time             string
enable_org              string
enable_platform         string
if_del                  string
Time taken: 0.169 seconds, Fetched: 7 row(s)
hive> describe service_tmp;
OK
service_code            string
service_name            string
service_process         string
create_time             string
enable_org              string
enable_platform         string
if_del                  string
Time taken: 0.117 seconds, Fetched: 7 row(s)
The logic for merging into the new table is as follows:
All rows of the tmp table go into the new table;
Rows of the current full table whose service_code does not appear in the tmp table also go into the new table.
Run the following SQL statement to produce the updated full table:
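A minimal sketch of such a statement, assuming the service_all and service_tmp schemas shown above and Hive 0.13 (where UNION ALL must sit inside a subquery), could look like this; the left outer join keeps only the full-table rows whose service_code is absent from the incremental table:
-- Hedged sketch of the merge logic described above, not necessarily the article's exact statement
INSERT OVERWRITE TABLE service_all
SELECT * FROM (
  SELECT * FROM service_tmp
  UNION ALL
  SELECT a.*
  FROM service_all a
  LEFT OUTER JOIN service_tmp b ON a.service_code = b.service_code
  WHERE b.service_code IS NULL
) t;
Hive writes the result to a staging location before replacing the table's data, so reading from service_all in the same statement is generally safe; writing the merged result to a separate staging table first and then overwriting service_all is a more conservative alternative.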