Sqoop uses HSQLDB to store job information. Starting the metastore service shares that job information, so Sqoop on every node can run the same job.
First, the Sqoop configuration in sqoop-site.xml:
1, sqoop.metastore.server.location
Local storage path of the metastore. The default is under /tmp; change it to another path.
2, sqoop.metastore.server.port
Port number of the metastore service.
3, sqoop.metastore.client.autoconnect.url
Address of the metastore that Sqoop connects to automatically. The default is the local metastore address.
4, sqoop.metastore.client.enable.autoconnect
Enables automatic connection. By default Sqoop connects to the local metastore; with this setting enabled, it connects to the autoconnect address above when no metastore is given on the command line.
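As a rough illustration, the four settings above could look like this in sqoop-site.xml. The sketch only writes the fragment to a scratch file and does not touch a real installation. The storage path /data/sqoop-metastore/shared.db and the scratch file name are assumptions; the host 10.106.1.234 and port 16000 are the ones used later in this article.

```shell
# Sketch: write the metastore properties to a scratch file for review.
# FRAGMENT and the server.location value are hypothetical; adjust them
# before copying the <property> elements into conf/sqoop-site.xml.
FRAGMENT="${TMPDIR:-/tmp}/sqoop-metastore-fragment.xml"

cat > "$FRAGMENT" <<'EOF'
<property>
  <name>sqoop.metastore.server.location</name>
  <value>/data/sqoop-metastore/shared.db</value>
</property>
<property>
  <name>sqoop.metastore.server.port</name>
  <value>16000</value>
</property>
<property>
  <name>sqoop.metastore.client.autoconnect.url</name>
  <value>jdbc:hsqldb:hsql://10.106.1.234:16000/sqoop</value>
</property>
<property>
  <name>sqoop.metastore.client.enable.autoconnect</name>
  <value>true</value>
</property>
EOF

echo "wrote $FRAGMENT"
```

The first two properties matter on the node that runs the metastore service; the last two matter on every node that should find it without passing --meta-connect.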
Second, start the metastore service
Start it in the background with: nohup bin/sqoop metastore &
Third, create a job
Sqoop supports two incremental import modes.
One is append, which imports by an incrementing column that you specify, such as:
--incremental append --check-column num_iid --last-value 0
A check column of type varchar can also be imported incrementally in this way, provided its values are incrementing numbers (here ID is a varchar column):
--incremental append --check-column ID --last-value 8
The other is based on a timestamp column, such as:
--incremental lastmodified --check-column created --last-value '2012-02-01 11:0:00'
This imports only the rows whose created value is later than '2012-02-01 11:0:00'.
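The two modes differ only in the value passed to --incremental and in the type of the check column. A minimal sketch of a helper that prints the argument string for either mode follows; incremental_args is a hypothetical function for illustration, not part of Sqoop.

```shell
# Sketch: print the incremental-import arguments for either mode.
# incremental_args is a hypothetical helper, not a Sqoop command.
incremental_args() {
  mode=$1
  column=$2
  last=$3
  case "$mode" in
    append|lastmodified)
      # Quote the last value so timestamps containing a space stay one argument.
      printf '%s\n' "--incremental $mode --check-column $column --last-value '$last'"
      ;;
    *)
      echo "unknown mode: $mode" >&2
      return 1
      ;;
  esac
}

incremental_args append num_iid 0
incremental_args lastmodified created '2012-02-01 11:0:00'
```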
Create the job in the shared metastore:
bin/sqoop job --meta-connect jdbc:hsqldb:hsql://10.106.1.234:16000/sqoop --create job_zyztest13 -- import --connect jdbc:oracle:thin:@10.106.1.236:1521:ORCL --username sqoop --password sqoop --table LXC_TEST_HBASE_TO_ORACLE --columns name,sex,age,csrq -m 1 --hbase-table sqoop_import_test3 --column-family info --hbase-row-key name --split-by name --incremental lastmodified --check-column csrq --last-value '2012-02-01 11:0:00' --verbose
Execute the job in the background and capture its output:
nohup /opt/hadoopcluster/sqoop-1.4.2.bin__hadoop-2.0.0-alpha/bin/sqoop job --exec job_zyztest13 > job_zyztest13.out 2>&1 &
At this point the job job_zyztest13 has been created in the metastore on 10.106.1.234. To view all jobs, run:
bin/sqoop job --meta-connect jdbc:hsqldb:hsql://10.106.1.234:16000/sqoop --list
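Because every node talks to the same metastore, the same few job subcommands can be run from any of them. The following sketch wraps them in a small helper; metastore_job and SQOOP_CMD are hypothetical names, and by default the helper only prints the command it would run.

```shell
# Sketch: run "sqoop job" against the shared metastore from any node.
# metastore_job and SQOOP_CMD are hypothetical. SQOOP_CMD defaults to a
# dry run that only prints the command; set SQOOP_CMD=bin/sqoop to run it.
META_URL="jdbc:hsqldb:hsql://10.106.1.234:16000/sqoop"
SQOOP_CMD="${SQOOP_CMD:-echo sqoop}"

metastore_job() {
  # SQOOP_CMD is intentionally unquoted so "echo sqoop" splits into two words.
  $SQOOP_CMD job --meta-connect "$META_URL" "$@"
}

metastore_job --list                 # list all saved jobs
metastore_job --show job_zyztest13   # show the saved parameters of one job
metastore_job --exec job_zyztest13   # run the job
```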
Fourth, scheduled execution
Use the Linux scheduler: crontab -e
Edit the crontab and add:
*/4 * * * * /opt/hadoopcluster/sqoop-1.4.2.bin__hadoop-2.0.0-alpha/bin/sqoop job --meta-connect jdbc:hsqldb:hsql://10.106.1.234:16000/sqoop --exec job_zyztest13 > job_zyztest13.out 2>&1 &
The task will then be executed once every four minutes.
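cron typically runs commands from the user's home directory, so a relative output file ends up there, and a single > overwrites it on every run. A possible variant that appends to an absolute log path can be sketched as follows; LOG_DIR is a hypothetical directory that must already exist and be writable, and the other values are taken from this article.

```shell
# Sketch: build a crontab entry that appends to an absolute log path.
# LOG_DIR is hypothetical; the remaining values come from the article.
SQOOP_BIN="/opt/hadoopcluster/sqoop-1.4.2.bin__hadoop-2.0.0-alpha/bin/sqoop"
META_URL="jdbc:hsqldb:hsql://10.106.1.234:16000/sqoop"
JOB="job_zyztest13"
LOG_DIR="/var/log/sqoop"

CRON_LINE="*/4 * * * * $SQOOP_BIN job --meta-connect $META_URL --exec $JOB >> $LOG_DIR/$JOB.out 2>&1"

# Print the entry; paste it into the editor opened by "crontab -e".
printf '%s\n' "$CRON_LINE"
```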