Sqoop 1.4.4: import incremental data from Oracle 10g into Hive 0.13.1 and update the master table in Hive.

Tags: sqoop

Requirement

Import incremental data from the basic business table in Oracle to Hive and merge it with the current full table into the latest full table.

Design

Three tables are involved:

  • the full table, named service_all in Hive
  • the incremental table, named service_tmp in Hive
  • the updated full table produced by merging the two, which overwrites service_all

Steps:

  • Import tables from Oracle into Hive through Sqoop to simulate the full table and the incremental table
  • Use Hive to merge the "full table + incremental table" into the "updated full table", which overwrites the current full table

    Step 1: Use Sqoop to import the tables from Oracle into Hive to simulate the full table and the incremental table

    To simulate the scenario, a full table and an incremental table are needed. Because the data source is limited, both come from the OMP_SERVICE table in Oracle: the full table holds all of its data and is named service_all in Hive, while the incremental table holds the data of a given time window and is named service_tmp in Hive.

    (1) Full table import: export all rows, keeping only the required columns, into the specified Hive table.

    To enable the Hive import feature, first configure the HCatalog environment variable (HCatalog is a Hive submodule) by adding the following line to /etc/profile:

    export HCAT_HOME=/home/fulong/Hive/apache-hive-0.13.1-bin/hcatalog

    Run the following command to import data:

    fulong@FBI006:~/Sqoop/sqoop-1.4.4/bin$ ./sqoop import \
    > --connect jdbc:oracle:thin:@192.168.0.147:1521:ORCLGBK --username SP --password fulong \
    > --table OMP_SERVICE \
    > --columns "SERVICE_CODE,SERVICE_NAME,SERVICE_PROCESS,CREATE_TIME,ENABLE_ORG,ENABLE_PLATFORM,IF_DEL" \
    > --hive-import --hive-table SERVICE_ALL

    Note: The username must be in uppercase.

    (2) Incremental table import: export only the rows in the required time range, keeping only the required columns, into the specified Hive table.

    Run the following command to import data:

    fulong@FBI006:~/Sqoop/sqoop-1.4.4/bin$ ./sqoop import \
    > --connect jdbc:oracle:thin:@192.168.0.147:1521:ORCLGBK --username SP --password fulong \
    > --table OMP_SERVICE \
    > --columns "SERVICE_CODE,SERVICE_NAME,SERVICE_PROCESS,CREATE_TIME,ENABLE_ORG,ENABLE_PLATFORM,IF_DEL" \
    > --where "CREATE_TIME > to_date('2017/4 17:00:00','yyyy-mm-dd hh24:mi:ss') and CREATE_TIME < to_date('2017/4 18:00:00','yyyy-mm-dd hh24:mi:ss')" \
    > --hive-import --hive-overwrite --hive-table SERVICE_TMP

    Note:

  • Because the --hive-overwrite parameter is used, this command can be executed repeatedly to overwrite the service_tmp table with the latest incremental data;
  • Sqoop can also import the result of an arbitrary SQL query instead of a whole table; see section 7.2.3, "Free-form Query Imports", in the Sqoop user guide, and the sketch after this list.
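    For reference, here is a minimal sketch of such a free-form query import against the same Oracle connection. The query itself, the IF_DEL = '0' filter, the target directory, and the Hive table name SERVICE_QUERY are illustrative assumptions, not part of the original setup:

    fulong@FBI006:~/Sqoop/sqoop-1.4.4/bin$ ./sqoop import \
    > --connect jdbc:oracle:thin:@192.168.0.147:1521:ORCLGBK --username SP --password fulong \
    > --query "SELECT SERVICE_CODE, SERVICE_NAME, CREATE_TIME FROM OMP_SERVICE WHERE \$CONDITIONS AND IF_DEL = '0'" \
    > --target-dir /user/fulong/omp_service_query \
    > -m 1 \
    > --hive-import --hive-table SERVICE_QUERY

    With --query there is no --table, so --target-dir is required, and the placeholder $CONDITIONS must appear in the WHERE clause (escaped as \$CONDITIONS inside double quotes); with more than one mapper a --split-by column would also be needed.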
    (3) Verify the import result: list the tables, count the rows, and inspect the table structures.

    hive> show tables;
    OK
    searchlog
    searchlog_tmp
    service_all
    service_tmp
    Time taken: 0.04 seconds, Fetched: 4 row(s)

    hive> select count(*) from service_all;
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1407233914535_0013, Tracking URL = :8088/proxy/application_1407233914535_0013/
    Kill Command = /home/fulong/Hadoop/hadoop-2.2.0/bin/hadoop job -kill job_1407233914535_0013
    Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1
    16:51:47,389 Stage-1 map = 0%, reduce = 0%
    16:51:59,816 Stage-1 map = 33%, reduce = 0%, Cumulative CPU 1.36 sec
    16:52:01,996 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 2.45 sec
    16:52:07,877 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.96 sec
    16:52:17,639 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.29 sec
    MapReduce Total cumulative CPU time: 5 seconds 290 msec
    Ended Job = job_1407233914535_0013
    MapReduce Jobs Launched:
    Job 0: Map: 3  Reduce: 1  Cumulative CPU: 5.46 sec  HDFS Read: 687141  HDFS Write: 5  SUCCESS
    Total MapReduce CPU Time Spent: 5 seconds 460 msec
    OK
    6803
    Time taken: 59.386 seconds, Fetched: 1 row(s)

    hive> select count(*) from service_tmp;
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1407233914535_0014, Tracking URL = :8088/proxy/application_1407233914535_0014/
    Kill Command = /home/fulong/Hadoop/hadoop-2.2.0/bin/hadoop job -kill job_1407233914535_0014
    Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1
    16:53:03,951 Stage-1 map = 0%, reduce = 0%
    16:53:15,189 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 2.17 sec
    16:53:16,236 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.38 sec
    16:53:57,935 Stage-1 map = 100%, reduce = 22%, Cumulative CPU 3.78 sec
    16:54:01,811 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.34 sec
    MapReduce Total cumulative CPU time: 5 seconds 340 msec
    Ended Job = job_1407233914535_0014
    MapReduce Jobs Launched:
    Job 0: Map: 3  Reduce: 1  Cumulative CPU: 5.66 sec  HDFS Read: 4720  HDFS Write: 3  SUCCESS
    Total MapReduce CPU Time Spent: 5 seconds 660 msec
    OK
    13
    Time taken: 75.856 seconds, Fetched: 1 row(s)

    hive> describe service_all;
    OK
    service_code        string
    service_name        string
    service_process     string
    create_time         string
    enable_org          string
    enable_platform     string
    if_del              string
    Time taken: 0.169 seconds, Fetched: 7 row(s)

    hive> describe service_tmp;
    OK
    service_code        string
    service_name        string
    service_process     string
    create_time         string
    enable_org          string
    enable_platform     string
    if_del              string
    Time taken: 0.117 seconds, Fetched: 7 row(s)

    Step 2: Use Hive to merge the "full table + incremental table" into the "updated full table". The merge logic is as follows:

  • Every row of the incremental table (service_tmp) goes into the new table;
  • Every row of the current full table (service_all) whose service_code does not appear in service_tmp goes into the new table.

    A single Hive SQL statement produces the updated full table.
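    A minimal HiveQL sketch of such a statement, assuming service_code is the business key and that the merged result overwrites service_all in place (both are assumptions based on the table layout shown above):

    -- Rebuild the full table from the incremental table plus the rows of the
    -- current full table that the incremental table does not replace.
    INSERT OVERWRITE TABLE service_all
    SELECT * FROM (
        SELECT t.* FROM service_tmp t
        UNION ALL
        SELECT a.*
        FROM service_all a
        LEFT OUTER JOIN service_tmp b ON a.service_code = b.service_code
        WHERE b.service_code IS NULL
    ) merged;

    Hive 0.13 only allows UNION ALL inside a subquery, hence the wrapper; if overwriting service_all while reading from it feels risky, the same SELECT can be written into a separate staging table first and then swapped in.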
