Incremental import of Sqoop (increment import)

Last Update:2015-10-12 Source: Internet

Author: User

Tags sqoop

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

1, import incremental imports of the official description

2, test sqoop increment Import

Incremental import in the enterprise, generally need to execute frequently, such as one weeks to perform an incremental import, so the incremental import method needs to be executed several times, and each execution, and write the corresponding execution command, it is more troublesome. And Sqoop provides a great tool for the save job approach.

The test is done by--incremental to execute the lastmodified mode,--check-column to set the field of the LastModified check, which means that the import is performed when the field is updated or added. --last-value to set the initial value ' 2014/8/27 13:00:00 ', which is used as the lower bound of the first import, and from the second start, Sqoop automatically updates the value to the upper bound of the previous import.

Test start: Sqoop create a job to implement the daily incremental import, first in the relational database Oracle wears a test table oracletablename, add two pieces of data:

SELECT * from Oracletablename;

ID Name lastmodified

1 Three 2015-10-10 17:52:20.0

2 John Doe 2015-10-10 17:52:20.0

(1) Create Sqoop Job

Sqoop Job--create JobName--Import--connect jdbc:oracle:thin:@192.168.27.235:1521/orcl--username DATACENTER--passwo Rd Clear--table oracletablename--hive-import--hive-table hivetablename--incremental lastmodified--check-column LAS Tmodified--last-value ' 2014/8/27 13:00:00 '

Description

1) in the above job, can not specify-m, because of the specified-m, the corresponding import will be poor in HDFs the corresponding intermediate results, when you perform the job again the next time, because the output directory is exist error.

2) The above hivetablename must be existing. At the first import, to make the table exist, you can execute the following command by importing the ORACLETABLENAME table structure into hive:

Sqoop create-hive-table--connect jdbc:oracle:thin:@//192.168.27.235:1521/orcl--username DATACENTER--password Clear --table TableName

After execution, a table with the same name and the same table structure is created in hive.

(2) View and execute job

Once the job has been created above, you can see if the job was created successfully by following the command:

Sqoop Job--list Lists all job

Sqoop Job--show jobname Display jobname Information

Sqoop Job--delete JobName Delete JobName

Sqoop job--exec jobname Execution JobName

(3) After performing the job, see if the tables in hive have data. Of course, there must be data in the accident.

And in the process of execution, we can see the corresponding execution log as follows:

Slf4j:see http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Slf4j:actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/10/12 15:59:37 INFO Manager. Oraclemanager:time zone have been set to GMT
15/10/12 15:59:37 INFO Manager. Sqlmanager:executing SQL statement:select t.* F ROM TEMP2 t WHERE 1=0
15/10/12 15:59:37 INFO tool. Importtool:incremental import based on column lastm odified
15/10/12 15:59:37 INFO tool. Importtool:lower bound Value:to_timestamp (' 2014/8/ 13:00:00 ', ' Yyyy-mm-dd HH24:MI:SS. FF ')
15/10/12 15:59:37 INFO tool. Importtool:upper bound Value:to_timestamp (' 2015-10-12 15:59:35.0 ', ' Yyyy-mm-dd HH24:MI:SS. FF ')
15/10/12 15:59:37 WARN Manager. Oraclemanager:the table TEMP2 contains a multi-c olumn primary key. Sqoop would default to the column ID of the only for this job.
15/10/12 15:59:37 INFO Manager. Oraclemanager:time zone have been set to GMT
15/10/12 15:59:37 WARN Manager. Oraclemanager:the table TEMP2 contains a multi-c olumn primary key. Sqoop would default to the column ID of the only for this job.
15/10/12 15:59:37 INFO MapReduce. Importjobbase:beginning Import of TEMP2
15/10/12 15:59:37 INFO Configuration.deprecation:mapred.jar is deprecated. Inst EAD, use Mapreduce.job.jar
15/10/12 15:59:37 INFO Manager. Oraclemanager:time zone have been set to GMT
15/10/12 15:59:37 INFO Configuration.deprecation:mapred.map.tasks is deprecated. Instead, use Mapreduce.job.maps
15/10/12 15:59:37 INFO Client. Rmproxy:connecting to ResourceManager at hadoop3/192.168.27.233:8032
15/10/12 15:59:42 INFO db. Dbinputformat:using Read commited transaction Isolati on
15/10/12 15:59:42 INFO db. DataDrivenDBInputFormat:BoundingValsQuery:SELECT MIN (ID), MAX (ID) from TEMP2 WHERE (lastmodified >= To_timestamp (' 2014/8/27 13:00:0 0 ', ' yyyy-mm-dd HH24:MI:SS. FF ') and LastModified < To_timestamp (' 2015-10-12 15:59:35.0 ', ' Yyyy-mm-dd HH24:MI:SS. FF '))
15/10/12 15:59:42 INFO mapreduce. Jobsubmitter:number of Splits:4

Description: From the red section above we know very well how sqoop is imported when importing. We can know that the value of the set--last-value is the corresponding lower bound.

(4) Add a field to Oracletablename in the relational database Oracle

ID Name lastmodified

1 Three 2015-10-10 17:52:20.0

2 John Doe 2015-10-10 17:52:20.0

3 John Doe 2015-10-12 16:01:23.0

(5) Incremental import at this time

Once again, execute the job:sqoop job--exec JobName

View the contents of the log again as follows:

Slf4j:see http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Slf4j:actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/10/12 16:02:17 INFO Manager. Oraclemanager:time zone have been set to GMT
15/10/12 16:02:17 INFO Manager. Sqlmanager:executing SQL statement:select t.* F ROM TEMP2 t WHERE 1=0
15/10/12 16:02:17 INFO tool. Importtool:incremental import based on column lastm odified
15/10/12 16:02:17 INFO tool. Importtool:lower bound Value:to_timestamp (' 2015-10-12 15:59:35.0 ', ' Yyyy-mm-dd HH24:MI:SS. FF ')
15/10/12 16:02:17 INFO tool. Importtool:upper bound Value:to_timestamp (' 2015-10-12 16:02:15.0 ', ' Yyyy-mm-dd HH24:MI:SS. FF ')
15/10/12 16:02:17 WARN Manager. Oraclemanager:the table TEMP2 contains a multi-c olumn primary key. Sqoop would default to the column ID of the only for this job.
15/10/12 16:02:17 INFO Manager. Oraclemanager:time zone have been set to GMT
15/10/12 16:02:17 WARN Manager. Oraclemanager:the table TEMP2 contains a multi-c olumn primary key. Sqoop would default to the column ID of the only for this job.
15/10/12 16:02:17 INFO MapReduce. Importjobbase:beginning Import of TEMP2
15/10/12 16:02:17 INFO Configuration.deprecation:mapred.jar is deprecated. Inst EAD, use Mapreduce.job.jar
15/10/12 16:02:17 INFO Manager. Oraclemanager:time zone have been set to GMT
15/10/12 16:02:17 INFO Configuration.deprecation:mapred.map.tasks is deprecated. Instead, use Mapreduce.job.maps
15/10/12 16:02:17 INFO Client. Rmproxy:connecting to ResourceManager at hadoop3/192.168.27.233:8032
15/10/12 16:02:23 INFO db. Dbinputformat:using Read commited transaction Isolati on
15/10/12 16:02:23 INFO db. DataDrivenDBInputFormat:BoundingValsQuery:SELECT MIN (ID), MAX (ID) from TEMP2 WHERE (lastmodified >= To_timestamp (' 2015-10-12 15:59:35.0 ', ' Yyyy-mm-dd HH24:MI:SS. FF ') and LastModified < To_timestamp (' 2015-10-12 1 6:02:15.0 ', ' Yyyy-mm-dd HH24:MI:SS. FF '))

15/10/12 16:02:23 WARN db. Bigdecimalsplitter:set BigDecimal splitsize to min_in crement
15/10/12 16:02:23 INFO MapReduce. Jobsubmitter:number of Splits:1

Note: We can see from the execution log that the value of the--last-value is automatically updated to the previous upper bound value, and note the last upper bound.

Incremental import of Sqoop (increment import)

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More