Sqoop Scheduled Incremental Import

Source: Internet
Author: User
Tags sqoop

Sqoop uses HSQLDB to store job information. Starting the metastore service shares that job information, so Sqoop on any node can run the same job.

One, Sqoop configuration in sqoop-site.xml:

1, sqoop.metastore.server.location

Local storage path of the metastore. The default is under /tmp; change it to another path.

2, sqoop.metastore.server.port

Port number of the metastore service.

3, sqoop.metastore.client.autoconnect.url

Address of the metastore that Sqoop connects to automatically. The default is the local metastore address.

4, sqoop.metastore.client.enable.autoconnect

Enables automatic connection. By default Sqoop connects to the local metastore. Leaving this setting commented out keeps automatic connection turned on.
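
The four settings above could be written down roughly as follows. This is only a sketch: it writes a fragment to a scratch file instead of editing a real conf/sqoop-site.xml, the storage path /data/sqoop-metastore/shared.db is a hypothetical example, and the host and port are the ones used in the job commands later in this article.

```shell
#!/bin/sh
# Write an example <property> fragment for sqoop-site.xml to a scratch file.
# The storage path is a hypothetical example; the metastore address is the
# one used by the job commands in this article.
fragment="${TMPDIR:-/tmp}/sqoop-site-fragment.xml"

cat > "$fragment" <<'EOF'
<property>
  <name>sqoop.metastore.server.location</name>
  <value>/data/sqoop-metastore/shared.db</value>
</property>
<property>
  <name>sqoop.metastore.server.port</name>
  <value>16000</value>
</property>
<property>
  <name>sqoop.metastore.client.autoconnect.url</name>
  <value>jdbc:hsqldb:hsql://10.106.1.234:16000/sqoop</value>
</property>
<property>
  <name>sqoop.metastore.client.enable.autoconnect</name>
  <value>true</value>
</property>
EOF

echo "Wrote $fragment; merge these properties into conf/sqoop-site.xml by hand."
```

The properties belong inside the <configuration> element of sqoop-site.xml. The two server settings matter on the node that runs the metastore, the two client settings on every node that runs jobs.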

Two, start the metastore service

In the Sqoop directory, run: nohup bin/sqoop metastore

Three, create a job

Sqoop supports two incremental import modes.
One is append, which works on a specified incrementing column, for example:
--incremental append --check-column num_iid --last-value 0

A check column of type varchar can also be imported incrementally this way (an ID column of varchar type that holds incrementing numbers):

--incremental append --check-column ID --last-value 8
The other mode is based on a timestamp, for example:
--incremental lastmodified --check-column created --last-value '2012-02-01 11:0:00'
This imports only the rows whose created value is later than '2012-02-01 11:0:00'.
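
To make the append rule concrete, here is a small simulation that does not call Sqoop at all. It mimics "import rows whose check column is greater than the last value" on a two-column CSV file, and then moves the last value forward the way a saved job is expected to do after each run. The file name, sample rows, and helper functions are all made up for illustration.

```shell
#!/bin/sh
# Simulate append-mode selection on a tiny CSV "table" (num_iid,name).
# No Sqoop is involved; this only mimics the "check column > last value" rule.
table="${TMPDIR:-/tmp}/sqoop_append_demo.csv"
printf '%s\n' '1,a' '2,b' '3,c' > "$table"

last_value=0

# Print the rows whose first field is greater than the given last value.
new_rows() {
  awk -F, -v last="$1" '$1 + 0 > last + 0' "$table"
}

# Print the highest value of the check column currently in the table.
max_value() {
  awk -F, 'NR == 1 || $1 + 0 > max { max = $1 + 0 } END { print max }' "$table"
}

first_run=$(new_rows "$last_value")   # selects all three rows
last_value=$(max_value)               # remembered for the next run

printf '%s\n' '4,d' >> "$table"       # a new row arrives in the source table
second_run=$(new_rows "$last_value")  # selects only the new row

echo "first run:";  echo "$first_run"
echo "second run:"; echo "$second_run"
```

The point of the sketch is the bookkeeping: each run only sees rows beyond the value remembered from the previous run, which is why the import is set up as a saved job in the next step rather than as a one-off command.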

bin/sqoop job --meta-connect jdbc:hsqldb:hsql://10.106.1.234:16000/sqoop --create job_zyztest13 -- import --connect jdbc:oracle:thin:@10.106.1.236:1521:ORCL --username sqoop --password sqoop --table LXC_TEST_HBASE_TO_ORACLE --columns name,sex,age,csrq -m 1 --hbase-table sqoop_import_test3 --column-family info --hbase-row-key name --split-by name --incremental lastmodified --check-column csrq --last-value '2012-02-01 11:0:00' --verbose
nohup /opt/hadoopcluster/sqoop-1.4.2.bin__hadoop-2.0.0-alpha/bin/sqoop job --exec job_zyztest13 > job_zyztest13.out 2>&1 &

At this point the job_zyztest13 job has been created in the metastore on 10.106.1.234. All jobs can be listed with: bin/sqoop job --meta-connect jdbc:hsqldb:hsql://10.106.1.234:16000/sqoop --list
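
Because the job-creation command is long, it may help to see it assembled from named pieces. The sketch below is a dry run: it only prints the command and never executes it. The variable names are invented for readability, and every value is copied from the command in this article.

```shell
#!/bin/sh
# Assemble the "sqoop job --create" command in readable pieces and print it.
# Dry run only: the command is echoed, never executed.
METASTORE=jdbc:hsqldb:hsql://10.106.1.234:16000/sqoop
JOB=job_zyztest13

# Options for the job tool itself.
job_opts="--meta-connect $METASTORE --create $JOB"

# Options for the import that the saved job will run.
source_opts="--connect jdbc:oracle:thin:@10.106.1.236:1521:ORCL --username sqoop --password sqoop --table LXC_TEST_HBASE_TO_ORACLE --columns name,sex,age,csrq -m 1 --split-by name"
target_opts="--hbase-table sqoop_import_test3 --column-family info --hbase-row-key name"
incr_opts="--incremental lastmodified --check-column csrq --last-value '2012-02-01 11:0:00'"

# A bare "--" separates the job options from the tool name and its arguments.
create_cmd="bin/sqoop job $job_opts -- import $source_opts $target_opts $incr_opts --verbose"
echo "$create_cmd"
```

The detail that is easiest to get wrong is the standalone "--" followed by a space and "import": everything before it configures the job tool, everything after it is the import definition that gets saved.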

Four, scheduled execution

Use the Linux cron scheduler: crontab -e

Edit the crontab and add:
*/4 * * * * /opt/hadoopcluster/sqoop-1.4.2.bin__hadoop-2.0.0-alpha/bin/sqoop job --meta-connect jdbc:hsqldb:hsql://10.106.1.234:16000/sqoop --exec job_zyztest13 > job_zyztest13.out 2>&1 &
The task is then executed once every four minutes.
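
The crontab entry can also be generated rather than typed by hand, which makes the interval and job name easy to change. The helper below is a hypothetical sketch: cron_line is an invented function, nothing is installed into cron, and the absolute log path /tmp/job_zyztest13.out is an assumption chosen so that the output location does not depend on cron's working directory.

```shell
#!/bin/sh
# Build the crontab line for a saved Sqoop job. Nothing is installed;
# the line is only printed so it can be reviewed and pasted into `crontab -e`.
SQOOP_HOME=/opt/hadoopcluster/sqoop-1.4.2.bin__hadoop-2.0.0-alpha
METASTORE=jdbc:hsqldb:hsql://10.106.1.234:16000/sqoop

# Usage: cron_line MINUTES JOB LOGFILE
cron_line() {
  printf '*/%s * * * * %s/bin/sqoop job --meta-connect %s --exec %s > %s 2>&1\n' \
    "$1" "$SQOOP_HOME" "$METASTORE" "$2" "$3"
}

entry=$(cron_line 4 job_zyztest13 /tmp/job_zyztest13.out)
echo "$entry"
```

The five leading fields are minute, hour, day of month, month, and day of week, so */4 in the first field with * everywhere else means every fourth minute. The trailing & from the entry above is left out in this sketch, since cron starts each entry in its own process anyway.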
