Sqoop data transfer between Hadoop and relational databases

Tags: hadoop, ecosystem, sqoop

Sqoop supports incremental imports.

List saved jobs in the metastore:

sqoop job --meta-connect jdbc:hsqldb:hsql://ip:port/sqoop --list

Copy a MySQL table's structure into a Hive table:

sqoop create-hive-table --connect jdbc:mysql://ip:port/dbName --table tableName --username user --password pass --hive-table qinshiwei

The table qinshiwei is created in Hive's default database unless you specify otherwise.
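
If the table should land in a different Hive database, the --hive-table value can be qualified with a database name; a minimal sketch, where mydb is a hypothetical Hive database:

sqoop create-hive-table --connect jdbc:mysql://ip:port/dbName --table tableName --username user --password pass --hive-table mydb.qinshiwei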

Import MySQL data into Hive:

sqoop import --connect jdbc:mysql://ip:port/dbName --username root --password mysql-password --table t1 --hive-import

Export HDFS data to MySQL:

sqoop export --connect jdbc:mysql://ip:port/dbName --username user --password pass --table qinshiwei --export-dir tablename

--export-dir is the HDFS location of the flat files to be exported. If a relative path such as datadir is given, it resolves to /user/username/datadir.

Parameter for export: --input-fields-terminated-by '\100' (the field delimiter of the HDFS files being exported to MySQL).

The corresponding delimiter option when importing from MySQL into HDFS: --fields-terminated-by '\100'
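
As a minimal sketch of both directions together, assuming the files sit under a hypothetical path /user/hive/warehouse/qinshiwei and use Hive's default field delimiter \001 (Ctrl-A); the delimiter value must match how the files were actually written:

sqoop export --connect jdbc:mysql://ip:port/dbName --username user --password pass --table qinshiwei --export-dir /user/hive/warehouse/qinshiwei --input-fields-terminated-by '\001'

sqoop import --connect jdbc:mysql://ip:port/dbName --username user --password pass --table qinshiwei --fields-terminated-by '\001'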

Import from MySQL into HDFS:

sqoop import --connect jdbc:mysql://Hadoop48/toplists --username root --table test -m 1

Import from MySQL into Hive:

sqoop import --connect jdbc:mysql://ip/tjcommon --verbose -m 1 --username user --password pass --hive-overwrite --direct --table tablename --hive-import --create-hive-table --hive-table mysql_award --fields-terminated-by '\t' --lines-terminated-by '\n' --append

Before executing the command, delete the generated tablename.java file under the Hadoop directory (rm /usr/lib/hadoop-0.20/tablename.java) and make sure the table mysql_award does not exist in Hive's default database.

sqoop import --connect jdbc:mysql://ip/dbName --username user --password pass --table qinshiwei -m 1 --hive-import --hive-table err --hive-overwrite --hive-delims-replacement "\t" --incremental append --last-value 0 --check-column id

Hive NULL value handling

By default Sqoop imports database NULLs as the literal string "null", while Hive represents NULL as \N. Post-processing the data afterwards does not help, so use --null-string and --null-non-string during the import so that Sqoop writes \N instead.

sqoop import ... --null-string '\\N' --null-non-string '\\N'

If --hive-overwrite is not specified when Sqoop imports into Hive, the intermediate directory on HDFS is left behind, which interferes with the next execution of the same task.
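
If a previous run did leave such a directory behind, it can be removed by hand before retrying; a minimal sketch, assuming the leftover directory sits under your HDFS home and is named after the source table (the path here is hypothetical):

hadoop fs -rmr /user/username/qinshiwei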

--incremental append
--check-column id
--last-value 0

With these options, Sqoop imports only the rows from visit_log whose id is greater than 0, giving an incremental import. A non-incremental import would also work, but Sqoop creates a directory named after the source table under warehouse-dir, and if that directory already exists the job fails; incremental import is therefore recommended.

Since the last value changes with every daily import, how do we pick up the new last value dynamically each day? Sqoop can save a command as a job, and the job automatically updates the stored last value from the result of each run.

The final Sqoop command is:

sqoop job --create visit_import -- import --connect jdbc:mysql://localhost:3306/main --username root --password pass --table user --columns "id,url,time" --direct --hive-import --hive-table hive_visit_log --incremental append --check-column id --last-value 0

sqoop job --exec visit_import
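
To see what a saved job has recorded, including the current last value of the check column, print the job definition:

sqoop job --show visit_import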

Create a job in the metastore:

sqoop job --meta-connect jdbc:hsqldb:hsql://ip:port/sqoop --create visit_import -- import --connect jdbc:mysql://ip:port/dbname --username user --password pass --table tablename --direct --hive-import --hive-table mysql_award --incremental append --check-column id --last-value 0
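
Jobs stored in the metastore can be executed or deleted through the same --meta-connect URL; deleting and recreating a job is one way to reset its stored last value. A minimal sketch:

sqoop job --meta-connect jdbc:hsqldb:hsql://ip:port/sqoop --exec visit_import

sqoop job --meta-connect jdbc:hsqldb:hsql://ip:port/sqoop --delete visit_import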
