Sqoop supports incremental import
View jobs:
sqoop job --meta-connect jdbc:hsqldb:hsql://ip:port/sqoop --list
Copy the table structure from MySQL to a Hive table:
sqoop create-hive-table --connect jdbc:mysql://ip:port/dbName --table tableName --username user --password pass --hive-table qinshiwei
By default, the table qinshiwei is created in Hive's default database.
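To place the table in a different Hive database, the database name can be prefixed in --hive-table. A minimal sketch, assuming the Sqoop version accepts a database-qualified --hive-table and that the database mydb (an illustrative name) already exists:
sqoop create-hive-table --connect jdbc:mysql://ip:port/dbName --table tableName --username user --password pass --hive-table mydb.qinshiwei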
Import MySQL data into Hive:
sqoop import --connect jdbc:mysql://ip:port/dbName --username root --password mysql-password --table t1 --hive-import
Export HDFS data to MySQL:
sqoop export --connect jdbc:mysql://ip:port/dbName --username user --password pass --table qinshiwei --export-dir tablename
--export-dir is the location of the HDFS flat files to be exported. If it is not an absolute path, it is interpreted as /user/username/datadir.
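A minimal sketch with an absolute export directory; the warehouse path and field delimiter below are only illustrative and should be adjusted to where the Hive table's files actually live and how they are delimited:
sqoop export --connect jdbc:mysql://ip:port/dbName --username user --password pass --table qinshiwei --export-dir /user/hive/warehouse/qinshiwei --input-fields-terminated-by '\001'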
Parameter for parsing the HDFS files during export: --input-fields-terminated-by '\100'
The field delimiter written to HDFS when importing from MySQL: --fields-terminated-by '\100'
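For reference, Hive's default field delimiter is '\001' (Ctrl-A). A sketch of an import that sets the delimiters explicitly; the table and credentials are placeholders:
sqoop import --connect jdbc:mysql://ip:port/dbName --username user --password pass --table t1 --fields-terminated-by '\001' --lines-terminated-by '\n' --hive-import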
Import from MySQL into HDFS:
sqoop import --connect jdbc:mysql://Hadoop48/toplists --username root --table test -m 1
Import from MySQL into Hive:
sqoop import --connect jdbc:mysql://ip/tjcommon --verbose -m 1 --username user --password pass --hive-overwrite --direct --table tablename --hive-import --create-hive-table --hive-table mysql_award --fields-terminated-by '\t' --lines-terminated-by '\n' --append
Before executing the command, delete the generated tablename.java file under the Hadoop directory (rm /usr/lib/hadoop-0.20/tablename.java) and make sure the table mysql_award does not already exist in Hive's default database.
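Alternatively, the generated code can be redirected so it does not need to be deleted by hand: --outdir and --bindir control where Sqoop writes the generated .java file and the compiled classes. A sketch, with /tmp/sqoop-gen as a purely illustrative scratch directory:
sqoop import --connect jdbc:mysql://ip/tjcommon --username user --password pass --table tablename --hive-import --hive-table mysql_award --outdir /tmp/sqoop-gen --bindir /tmp/sqoop-gen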
sqoop import --connect jdbc:mysql://ip/dbName --username user --password pass --table qinshiwei -m 1 --hive-import --hive-table err --hive-overwrite --hive-delims-replacement "\t" --incremental append --last-value 0 --check-column id
Hive null value processing
By default Sqoop writes NULL values as the literal string "null", but Hive represents NULL as \N. Since Hive will not recognize "null" as NULL, use --null-string and --null-non-string so that Sqoop writes \N instead.
sqoop import ... --null-string '\\N' --null-non-string '\\N'
If --hive-overwrite is not added when Sqoop imports into Hive, the intermediate HDFS directory is left behind, which interferes with the next execution of the same task.
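If the leftover directory causes an "output directory already exists" error, it can be removed by hand, or (in reasonably recent Sqoop 1.4.x releases) --delete-target-dir can be added to the import. A sketch; the path and table name are placeholders, and the exact fs removal syntax depends on the Hadoop version:
hadoop fs -rmr /user/username/tablename
sqoop import --connect jdbc:mysql://ip:port/dbName --username user --password pass --table tablename --delete-target-dir --hive-import --hive-table tablename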
--incremental append
--check-column id
--last-value 0
With these options, Sqoop imports only the rows of visit_log whose id is greater than 0, implementing incremental import. A non-incremental import would also work, but Sqoop creates a directory named after the source table under the warehouse-dir directory, and if that directory already exists an error is reported, so incremental import is recommended.
When the import runs every day, how can the new last value be picked up dynamically each day? Sqoop can save a command as a job, and after each execution the job automatically updates the stored last-value from the result.
The final sqoop command we get is:
sqoop job --create visit_import -- import --connect jdbc:mysql://localhost:3306/main --username root --password pass --table user --columns "id,url,time" --direct --hive-import --hive-table hive_visit_log --incremental append --check-column id --last-value 0
sqoop job --exec visit_import
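To check what last-value the job has recorded after a run, the saved job definition can be inspected; the exact output layout varies by Sqoop version:
sqoop job --show visit_import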
Create job:
sqoop job --meta-connect jdbc:hsqldb:hsql://ip:port/sqoop --create visit_import -- import --connect jdbc:mysql://ip:port/dbname --username user --password pass --table tablename --direct --hive-import --hive-table mysql_award --incremental append --check-column id --last-value 0
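To run or delete a job stored in the shared metastore, pass the same --meta-connect URL; a sketch, where ip:port must match the metastore used when the job was created:
sqoop job --meta-connect jdbc:hsqldb:hsql://ip:port/sqoop --exec visit_import
sqoop job --meta-connect jdbc:hsqldb:hsql://ip:port/sqoop --delete visit_import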