Sqoop supports incremental import
View jobs:
sqoop job --meta-connect jdbc:hsqldb:hsql://ip:port/sqoop --list
Copy the table structure from MySQL to a Hive table:
sqoop create-hive-table --connect jdbc:mysql://ip:port/dbName --table tableName --username user --password pass --hive-table qinshiwei
By default, the table qinshiwei is created in Hive's default database.
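To place the table in a different Hive database, the database name can be prefixed in --hive-table. A minimal sketch, assuming the Sqoop version accepts a database-qualified --hive-table and that the database mydb (an illustrative name) already exists:
sqoop create-hive-table --connect jdbc:mysql://ip:port/dbName --table tableName --username user --password pass --hive-table mydb.qinshiwei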
Import MySQL data into Hive:
sqoop import --connect jdbc:mysql://ip:port/dbName --username root --password mysql-password --table t1 --hive-import
Export HDFS data to MySQL:
sqoop export --connect jdbc:mysql://ip:port/dbName --username user --password pass --table qinshiwei --export-dir tablename
--export-dir is the location of the HDFS flat files to be exported. If it is not an absolute path, it is interpreted as /user/username/datadir.
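A minimal sketch with an absolute export directory; the warehouse path and field delimiter below are only illustrative and should be adjusted to where the Hive table's files actually live and how they are delimited:
sqoop export --connect jdbc:mysql://ip:port/dbName --username user --password pass --table qinshiwei --export-dir /user/hive/warehouse/qinshiwei --input-fields-terminated-by '\001'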
Parameter for parsing the HDFS files during export: --input-fields-terminated-by '\100'
The field delimiter written to HDFS when importing from MySQL: --fields-terminated-by '\100'
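For reference, Hive's default field delimiter is '\001' (Ctrl-A). A sketch of an import that sets the delimiters explicitly; the table and credentials are placeholders:
sqoop import --connect jdbc:mysql://ip:port/dbName --username user --password pass --table t1 --fields-terminated-by '\001' --lines-terminated-by '\n' --hive-import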
Import from MySQL into HDFS:
sqoop import --connect jdbc:mysql://Hadoop48/toplists --username root --table test -m 1
Import from MySQL into Hive:
sqoop import --connect jdbc:mysql://ip/tjcommon --verbose -m 1 --username user --password pass --hive-overwrite --direct --table tablename --hive-import --create-hive-table --hive-table mysql_award --fields-terminated-by '\t' --lines-terminated-by '\n' --append
Before executing the command, delete the generated tablename.java file under the Hadoop directory (rm /usr/lib/hadoop-0.20/tablename.java) and make sure the table mysql_award does not already exist in Hive's default database.
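Alternatively, the generated code can be redirected so it does not need to be deleted by hand: --outdir and --bindir control where Sqoop writes the generated .java file and the compiled classes. A sketch, with /tmp/sqoop-gen as a purely illustrative scratch directory:
sqoop import --connect jdbc:mysql://ip/tjcommon --username user --password pass --table tablename --hive-import --hive-table mysql_award --outdir /tmp/sqoop-gen --bindir /tmp/sqoop-gen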
sqoop import --connect jdbc:mysql://ip/dbName --username user --password pass --table qinshiwei -m 1 --hive-import --hive-table err --hive-overwrite --hive-delims-replacement "\t" --incremental append --last-value 0 --check-column id
Hive null value processing
By default Sqoop writes NULL values as the literal string "null", but Hive represents NULL as \N. Since Hive will not recognize "null" as NULL, use --null-string and --null-non-string so that Sqoop writes \N instead.
sqoop import ... --null-string '\\N' --null-non-string '\\N'
If --hive-overwrite is not added when Sqoop imports into Hive, the intermediate HDFS directory is left behind, which interferes with the next execution of the same task.
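If the leftover directory causes an "output directory already exists" error, it can be removed by hand, or (in reasonably recent Sqoop 1.4.x releases) --delete-target-dir can be added to the import. A sketch; the path and table name are placeholders, and the exact fs removal syntax depends on the Hadoop version:
hadoop fs -rmr /user/username/tablename
sqoop import --connect jdbc:mysql://ip:port/dbName --username user --password pass --table tablename --delete-target-dir --hive-import --hive-table tablename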
--incremental append
--check-column id
--last-value 0
With these options, Sqoop imports only the rows of visit_log whose id is greater than 0, implementing incremental import. A non-incremental import would also work, but Sqoop creates a directory named after the source table under the warehouse-dir directory, and if that directory already exists an error is reported, so incremental import is recommended.
When the import runs every day, how can the new last value be picked up dynamically each day? Sqoop can save a command as a job, and after each execution the job automatically updates the stored last-value from the result.
The final sqoop command we get is:
sqoop job --create visit_import -- import --connect jdbc:mysql://localhost:3306/main --username root --password pass --table user --columns "id,url,time" --direct --hive-import --hive-table hive_visit_log --incremental append --check-column id --last-value 0
sqoop job --exec visit_import
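To check what last-value the job has recorded after a run, the saved job definition can be inspected; the exact output layout varies by Sqoop version:
sqoop job --show visit_import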
Create job:
sqoop job --meta-connect jdbc:hsqldb:hsql://ip:port/sqoop --create visit_import -- import --connect jdbc:mysql://ip:port/dbname --username user --password pass --table tablename --direct --hive-import --hive-table mysql_award --incremental append --check-column id --last-value 0
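To run or delete a job stored in the shared metastore, pass the same --meta-connect URL; a sketch, where ip:port must match the metastore used when the job was created:
sqoop job --meta-connect jdbc:hsqldb:hsql://ip:port/sqoop --exec visit_import
sqoop job --meta-connect jdbc:hsqldb:hsql://ip:port/sqoop --delete visit_import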