Sqoop 1.4.4: exploring the native incremental import features
The original idea
To implement incremental import, you do not strictly need Sqoop's native incremental feature. A shell script can compute a fixed time window based on the current time and splice it into a Sqoop command.
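As an illustration, a minimal sketch of that manual approach might look like the following; the state file path, log handling, and the use of a --where clause are my assumptions for illustration, not part of the original setup:

```bash
#!/bin/bash
# Manual incremental import: keep the previous upper bound in a file,
# compute the new one from the current time, and splice both into a
# plain "sqoop import". All paths and names here are illustrative.

STATE_FILE=/home/fulong/incretest.last   # hypothetical state file
LAST_VALUE=$(cat "$STATE_FILE")          # e.g. 2014-08-27 13:00:00
NOW=$(date '+%Y-%m-%d %H:%M:%S')         # new upper bound

sqoop import \
  --connect jdbc:oracle:thin:@192.168.0.138:1521:orcl \
  --username HIVE --password hivefbi \
  --table FBI_SQOOPTEST \
  --where "LASTMODIFIED >= TO_DATE('$LAST_VALUE','YYYY-MM-DD HH24:MI:SS') AND LASTMODIFIED < TO_DATE('$NOW','YYYY-MM-DD HH24:MI:SS')" \
  --hive-import --hive-table INCRETEST \
&& echo "$NOW" > "$STATE_FILE"           # persist the bound only on success
```

The drawback of this approach is that you have to manage the state file and failure handling yourself, which is exactly what Sqoop's native feature automates.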
Native incremental import features

Sqoop provides native support for incremental import through the following three key arguments:

| Argument | Description |
| --- | --- |
| --check-column (col) | Specifies the "flag column" used to determine the row range of an incremental import. This column should not be a character type; a numeric or date/timestamp column works best. |
| --incremental (mode) | Specifies the incremental mode: append mode (append) or last-modified mode (lastmodified, which covers the most common requirement). |
| --last-value (value) | Specifies the upper bound of the check column from the previous import. If the check column is a last-modified timestamp, --last-value is the time the import script last ran. |
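For completeness, here is a sketch of the other mode: an append-mode import keyed on a numeric ID column (the column name, target directory, and starting value are assumptions for illustration):

```bash
# Append mode: only rows with ID > 100 are fetched on this run.
sqoop import \
  --connect jdbc:oracle:thin:@192.168.0.138:1521:orcl \
  --username HIVE --password hivefbi \
  --table FBI_SQOOPTEST \
  --target-dir /user/fulong/fbi_sqooptest \
  --incremental append \
  --check-column ID \
  --last-value 100
```

When the import finishes, Sqoop logs the value to pass as --last-value on the next run; the lastmodified mode used in the experiment below works the same way, but compares a timestamp column.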
Combined with the Saved Jobs mechanism, the last-value field is updated and saved automatically every time the job runs, so scheduling the job repeatedly with cron or Oozie achieves a true incremental update.

Experiment: create and execute an incremental job

Create the incremental update job:

```
fulong@FBI006:~/Sqoop/sqoop-1.4.4/bin$ sqoop job --create incretest -- import --connect jdbc:oracle:thin:@192.168.0.138:1521:orcl --username HIVE --password hivefbi --table FBI_SQOOPTEST --hive-import --hive-table INCRETEST --incremental lastmodified --check-column LASTMODIFIED --last-value '2014-08-27 13:00:00'
14/08/27 17:29:37 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/08/27 17:29:37 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
14/08/27 17:29:37 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
14/08/27 17:29:37 WARN tool.BaseSqoopTool: It seems that you've specified at least one of following:
14/08/27 17:29:37 WARN tool.BaseSqoopTool:      --hive-home
14/08/27 17:29:37 WARN tool.BaseSqoopTool:      --hive-overwrite
14/08/27 17:29:37 WARN tool.BaseSqoopTool:      --create-hive-table
14/08/27 17:29:37 WARN tool.BaseSqoopTool:      --hive-table
14/08/27 17:29:37 WARN tool.BaseSqoopTool:      --hive-partition-key
14/08/27 17:29:37 WARN tool.BaseSqoopTool:      --hive-partition-value
14/08/27 17:29:37 WARN tool.BaseSqoopTool:      --map-column-hive
14/08/27 17:29:37 WARN tool.BaseSqoopTool: Without specifying parameter --hive-import. Please note that
14/08/27 17:29:37 WARN tool.BaseSqoopTool: those arguments will not be used in this session. Either
14/08/27 17:29:37 WARN tool.BaseSqoopTool: specify --hive-import to apply them correctly or remove them
14/08/27 17:29:37 WARN tool.BaseSqoopTool: from command line to remove this warning.
14/08/27 17:29:37 INFO tool.BaseSqoopTool: Please note that --hive-home, --hive-partition-key,
14/08/27 17:29:37 INFO tool.BaseSqoopTool: hive-partition-value and --map-column-hive options are
14/08/27 17:29:37 INFO tool.BaseSqoopTool: are also valid for HCatalog imports and exports
```

Run the job:

```
fulong@FBI006:~/Sqoop/sqoop-1.4.4/bin$ ./sqoop job --exec incretest
```

Note the SQL statement shown in the log:

```
14/08/27 17:36:23 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(ID), MAX(ID) FROM FBI_SQOOPTEST WHERE ( LASTMODIFIED >= TO_DATE('2014-08-27 13:00:00', 'YYYY-MM-DD HH24:MI:SS') AND LASTMODIFIED < TO_DATE('2014-08-27 17:36:23', 'YYYY-MM-DD HH24:MI:SS') )
```

The lower bound of LASTMODIFIED is the value specified when the job was created; the upper bound is the current time, 17:36:23.

Verify in Hive:

```
hive> select * from incretest;
OK
2	lion	2014-08-27
Time taken: 0.085 seconds, Fetched: 1 row(s)
```

Then I inserted a row into the Oracle table and executed the job again:

```
fulong@FBI006:~/Sqoop/sqoop-1.4.4/bin$ ./sqoop job --exec incretest
```

This time the SQL statement in the log is:

```
14/08/27 17:47:19 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(ID), MAX(ID) FROM FBI_SQOOPTEST WHERE ( LASTMODIFIED >= TO_DATE('2014-08-27 17:36:23', 'YYYY-MM-DD HH24:MI:SS') AND LASTMODIFIED < TO_DATE('2014-08-27 17:47:19', 'YYYY-MM-DD HH24:MI:SS') )
```

The lower bound of LASTMODIFIED is now exactly the upper bound of the previous execution. In other words, the Sqoop "Saved Jobs" mechanism really does work for incremental import jobs.
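To see what the job actually stored, you can dump its saved definition; as far as I know, the stored bound shows up as the incremental.last.value property in the output (treat the exact property name as an assumption, since it may vary across Sqoop versions):

```
fulong@FBI006:~/Sqoop/sqoop-1.4.4/bin$ ./sqoop job --show incretest
```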
The last execution time is automatically recorded and assigned to the --last-value parameter of the next execution. That means we only need a crontab entry to run the job on a schedule; --last-value is updated automatically by the Saved Jobs mechanism, achieving true incremental import. The row newly inserted into the Oracle table above was imported into the Hive table successfully. Adding one more record to the Oracle table and executing the job again shows the same pattern in the log: the upper bound of the previous run automatically becomes the lower bound of the current one:

```
14/08/27 17:59:34 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(ID), MAX(ID) FROM FBI_SQOOPTEST WHERE ( LASTMODIFIED >= TO_DATE('2014-08-27 17:47:19', 'YYYY-MM-DD HH24:MI:SS') AND LASTMODIFIED < TO_DATE('2014-08-27 17:59:34', 'YYYY-MM-DD HH24:MI:SS') )
```
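For example, a crontab entry along these lines would run the job every hour (the schedule, binary path, and log file are assumptions, not from the original setup):

```bash
# m h dom mon dow   command
0 * * * * /home/fulong/Sqoop/sqoop-1.4.4/bin/sqoop job --exec incretest >> /home/fulong/logs/incretest.log 2>&1
```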