SQOOP is an open-source tool mainly used for data transmission between Hadoop and traditional databases. The following is an excerpt from the SQOOP user manual.
Sqoopis a tool designed to transfer data between Hadoop and relational databases. you can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data backinto an RDBMS.
SQOOP is an open-source software for Cloudera to transmit data between HDFS and databases. Internally, HADOOP and databases are connected through JDBC. Therefore, SQOOP can be compatible with any database that supports JDBC theoretically. In addition, SQOOP can not only import data into HDFS as files, but also directly import data to HBASE or HIVE.
The following are some performance test data for your reference only:
Table Name: tb_keywords
Number of rows: 11628209
Data File Size: 1.4 GB
|
HDFS-> DB |
DB-> HDFS |
SQOOP |
428 s |
166 s |
HDFS <-> FILE <-> DB |
209 s |
105 s |
From the result, the performance of using FILE as a transit mode is higher than that of SQOOP. The reason is as follows:
1. SQOOP uses JDBC in essence and is not more efficient than MYSQL's built-in import/export tools.
2. Taking importing data to DB as an example, SQOOP is designed to commit data in stages. That is to say, if a table has 1 K rows, it reads 100 rows (default value) first and then inserts the data, submit and read 100 rows ...... So reciprocating
Even so, SQOOP has advantages, such as ease of use and fault tolerance of task execution. You can use it as a tool if needed in some testing environments.
Below are some operation records
[Wanghai01@tc-crm-rd01.tc.baidu.com bin] $ sh export. sh
Fri Sep 23 20:15:47 CST 2011
11/09/23 20:15:48 WARN tool. BaseSqoopTool: Setting your password on the command-line is insecure. Consider using-P instead.
11/09/23 20:15:48 INFO tool. CodeGenTool: Beginning code generation
11/09/23 20:15:48 INFO manager. MySQLManager: Executing SQL statement: SELECT t. * FROM 'tb _ keyword_data_201104 'AS t LIMIT 1
11/09/23 20:15:48 INFO manager. MySQLManager: Executing SQL statement: SELECT t. * FROM 'tb _ keyword_data_201104 'AS t LIMIT 1
11/09/23 20:15:48 INFO orm. CompilationManager: HADOOP_HOME is/home/wanghai01/hadoop/hadoop-0.20.2/bin /..
11/09/23 20:15:48 INFO orm. CompilationManager: Found hadoop core jar at:/home/wanghai01/hadoop/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar
11/09/23 20:15:49 ERROR orm. CompilationManager: cocould not rename/tmp/sqoop-wanghai01/compile/Tables/tb_keyword_data_201104.java to/home/wanghai01/cloudera/sqoop-1.2.0-CDH3B4/bin/./tb_keyword_data_201104.java
11/09/23 20:15:49 INFO orm. CompilationManager: Writing jar file:/tmp/sqoop-wanghai01/compile/eb16aae87a119b93acb3bc6ea74b5e97/tb_keyword_data_201104.jar
11/09/23 20:15:49 INFO mapreduce. ExportJobBase: Beginning export of tb_keyword_data_201104
11/09/23 20:15:49 INFO manager. MySQLManager: Executing SQL statement: SELECT t. * FROM 'tb _ keyword_data_201104 'AS t LIMIT 1
11/09/23 20:15:49 INFO input. FileInputFormat: Total input paths to process: 1
11/09/23 20:15:49 INFO input. FileInputFormat: Total input paths to process: 1
11/09/23 20:15:49 INFO mapred. JobClient: Running job: job_201101_21_0012
11/09/23 20:15:50 INFO mapred. JobClient: map 0% reduce 0%
11/09/23 20:16:04 INFO mapred. JobClient: map 1% reduce 0%
11/09/23 20:16:10 INFO mapred. JobClient: map 2% reduce 0%
11/09/23 20:16:13 INFO mapred. JobClient: map 3% reduce 0%
11/09/23 20:16:19 INFO mapred. JobClient: map 4% reduce 0%
11/09/23 20:16:22 INFO mapred. JobClient: map 5% reduce 0%
11/09/23 20:16:25 INFO mapred. JobClient: map 6% reduce 0%
11/09/23 20:16:31 INFO mapred. JobClient: map 7% reduce 0%
11/09/23 20:16:34 INFO mapred. JobClient: map 8% reduce 0%
11/09/23 20:16:41 INFO mapred. JobClient: map 9% reduce 0%
11/09/23 20:16:44 INFO mapred. JobClient: map 10% reduce 0%
11/09/23 20:16:50 INFO mapred. JobClient: map 11% reduce 0%
11/09/23 20:16:53 INFO mapred. JobClient: map 12% reduce 0%
11/09/23 20:16:56 INFO mapred. JobClient: map 13% reduce 0%
11/09/23 20:17:02 INFO mapred. JobClient: map 14% reduce 0%
11/09/23 20:17:05 INFO mapred. JobClient: map 15% reduce 0%
11/09/23 20:17:11 INFO mapred. JobClient: map 16% reduce 0%
11/09/23 20:17:14 INFO mapred. JobClient: map 17% reduce 0%
11/09/23 20:17:17 INFO mapred. JobClient: map 18% reduce 0%
11/09/23 20:17:23 INFO mapred. JobClient: map 19% reduce 0%
11/09/23 20:17:25 INFO mapred. JobClient: map 20% reduce 0%
11/09/23 20:17:28 INFO mapred. JobClient: map 21% reduce 0%
11/09/23 20:17:34 INFO mapred. JobClient: map 22% reduce 0%
11/09/23 20:17:37 INFO mapred. JobClient: map 23% reduce 0%
11/09/23 20:17:43 INFO mapred. JobClient: map 24% reduce 0%
11/09/23 20:17:46 INFO mapred. JobClient: map 25% reduce 0%
11/09/23 20:17:49 INFO mapred. JobClient: map 26% reduce 0%
11/09/23 20:17:55 INFO mapred. JobClient: map 27% reduce 0%
11/09/23 20:17:58 INFO mapred. JobClient: map 28% reduce 0%
11/09/23 20:18:04 INFO mapred. JobClient: map 29% reduce 0%
11/09/23 20:18:07 INFO mapred. JobClient: map 30% reduce 0%
11/09/23 20:18:10 INFO mapred. JobClient: map 31% reduce 0%
11/09/23 20:18:16 INFO mapred. JobClient: map 32% reduce 0%
11/09/23 20:18:19 INFO mapred. JobClient: map 33% reduce 0%
11/09/23 20:18:25 INFO mapred. JobClient: map 34% reduce 0%
11/09/23 20:18:28 INFO mapred. JobClient: map 35% reduce 0%
11/09/23 20:18:31 INFO mapred. JobClient: map 36% reduce 0%
11/09/23 20:18:37 INFO mapred. JobClient: map 37% reduce 0%
11/09/23 20:18:40 INFO mapred. JobClient: map 38% reduce 0%
11/09/23 20:18:46 INFO mapred. JobClient: map 39% reduce 0%
11/09/23 20:18:49 INFO mapred. JobClient: map 40% reduce 0%
11/09/23 20:18:52 INFO mapred. JobClient: map 41% reduce 0%
11/09/23 20:18:58 INFO mapred. JobClient: map 42% reduce 0%
11/09/23 20:19:01 INFO mapred. JobClient: map 43% reduce 0%
11/09/23 20:19:04 INFO mapred. JobClient: map 44% reduce 0%
11/09/23 20:19:10 INFO mapred. JobClient: map 45% reduce 0%
11/09/23 20:19:13 INFO mapred. JobClient: map 46% reduce 0%
11/09/23 20:19:19 INFO mapred. JobClient: map 47% reduce 0%
11/09/23 20:19:22 INFO mapred. JobClient: map 48% reduce 0%
11/09/23 20:19:25 INFO mapred. JobClient: map 49% reduce 0%
11/09/23 20:19:34 INFO mapred. JobClient: map 50% reduce 0%
11/09/23 20:19:37 INFO mapred. JobClient: map 52% reduce 0%
11/09/23 20:19:40 INFO mapred. JobClient: map 53% reduce 0%
11/09/23 20:19:43 INFO mapred. JobClient: map 54% reduce 0%
11/09/23 20:19:46 INFO mapred. JobClient: map 55% reduce 0%
11/09/23 20:19:49 INFO mapred. JobClient: map 56% reduce 0%
11/09/23 20:19:52 INFO mapred. JobClient: map 57% reduce 0%
11/09/23 20:19:55 INFO mapred. JobClient: map 58% reduce 0%
11/09/23 20:19:58 INFO mapred. JobClient: map 59% reduce 0%
11/09/23 20:20:01 INFO mapred. JobClient: map 60% reduce 0%
11/09/23 20:20:04 INFO mapred. JobClient: map 62% reduce 0%
11/09/23 20:20:07 INFO mapred. JobClient: map 63% reduce 0%
11/09/23 20:20:10 INFO mapred. JobClient: map 64% reduce 0%
11/09/23 20:20:13 INFO mapred. JobClient: map 65% reduce 0%
11/09/23 20:20:16 INFO mapred. JobClient: map 66% reduce 0%
11/09/23 20:20:19 INFO mapred. JobClient: map 67% reduce 0%
11/09/23 20:20:22 INFO mapred. JobClient: map 68% reduce 0%
11/09/23 20:20:25 INFO mapred. JobClient: map 69% reduce 0%
11/09/23 20:20:28 INFO mapred. JobClient: map 70% reduce 0%
11/09/23 20:20:31 INFO mapred. JobClient: map 72% reduce 0%
11/09/23 20:20:34 INFO mapred. JobClient: map 73% reduce 0%
11/09/23 20:20:37 INFO mapred. JobClient: map 74% reduce 0%
11/09/23 20:20:40 INFO mapred. JobClient: map 75% reduce 0%
11/09/23 20:20:43 INFO mapred. JobClient: map 76% reduce 0%
11/09/23 20:20:46 INFO mapred. JobClient: map 77% reduce 0%
11/09/23 20:20:49 INFO mapred. JobClient: map 78% reduce 0%
11/09/23 20:20:52 INFO mapred. JobClient: map 80% reduce 0%
11/09/23 20:20:55 INFO mapred. JobClient: map 81% reduce 0%
11/09/23 20:20:58 INFO mapred. JobClient: map 82% reduce 0%
11/09/23 20:21:01 INFO mapred. JobClient: map 83% reduce 0%
11/09/23 20:21:04 INFO mapred. JobClient: map 84% reduce 0%
11/09/23 20:21:07 INFO mapred. JobClient: map 85% reduce 0%
11/09/23 20:21:10 INFO mapred. JobClient: map 86% reduce 0%
11/09/23 20:21:13 INFO mapred. JobClient: map 87% reduce 0%
11/09/23 20:21:22 INFO mapred. JobClient: map 88% reduce 0%
11/09/23 20:21:28 INFO mapred. JobClient: map 89% reduce 0%
11/09/23 20:21:37 INFO mapred. JobClient: map 90% reduce 0%
11/09/23 20:21:47 INFO mapred. JobClient: map 91% reduce 0%
11/09/23 20:21:53 INFO mapred. JobClient: map 92% reduce 0%
11/09/23 20:22:02 INFO mapred. JobClient: map 93% reduce 0%
11/09/23 20:22:11 INFO mapred. JobClient: map 94% reduce 0%
11/09/23 20:22:17 INFO mapred. JobClient: map 95% reduce 0%
11/09/23 20:22:26 INFO mapred. JobClient: map 96% reduce 0%
11/09/23 20:22:32 INFO mapred. JobClient: map 97% reduce 0%
11/09/23 20:22:41 INFO mapred. JobClient: map 98% reduce 0%
11/09/23 20:22:47 INFO mapred. JobClient: map 99% reduce 0%
11/09/23 20:22:53 INFO mapred. JobClient: map 100% reduce 0%
11/09/23 20:22:55 INFO mapred. JobClient: Job complete: job_201101_21_0012
11/09/23 20:22:55 INFO mapred. JobClient: Counters: 6
11/09/23 20:22:55 INFO mapred. JobClient: Job Counters
11/09/23 20:22:55 INFO mapred. JobClient: Launched map tasks = 4
11/09/23 20:22:55 INFO mapred. JobClient: Data-local map tasks = 4
11/09/23 20:22:55 INFO mapred. JobClient: FileSystemCounters
11/09/23 20:22:55 INFO mapred. JobClient: HDFS_BYTES_READ = 1392402240
11/09/23 20:22:55 INFO mapred. JobClient: Map-Reduce Framework
11/09/23 20:22:55 INFO mapred. JobClient: Map input records = 11628209
11/09/23 20:22:55 INFO mapred. JobClient: Spilled Records = 0
11/09/23 20:22:55 INFO mapred. JobClient: Map output records = 11628209
11/09/23 20:22:55 INFO mapreduce. ExportJobBase: Transferred 1.2968 GB in 425.642 seconds (3.1198 MB/sec)
11/09/23 20:22:55 INFO mapreduce. ExportJobBase: Exported 11628209 records.
Fri Sep 23 20:22:55 CST 2011
###############
[Wanghai01@tc-crm-rd01.tc.baidu.com bin] $ sh import. sh
Fri Sep 23 20:40:33 CST 2011
11/09/23 20:40:33 WARN tool. BaseSqoopTool: Setting your password on the command-line is insecure. Consider using-P instead.
11/09/23 20:40:33 INFO tool. CodeGenTool: Beginning code generation
11/09/23 20:40:33 INFO manager. MySQLManager: Executing SQL statement: SELECT t. * FROM 'tb _ keyword_data_201104 'AS t LIMIT 1
11/09/23 20:40:33 INFO manager. MySQLManager: Executing SQL statement: SELECT t. * FROM 'tb _ keyword_data_201104 'AS t LIMIT 1
11/09/23 20:40:33 INFO orm. CompilationManager: HADOOP_HOME is/home/wanghai01/hadoop/hadoop-0.20.2/bin /..
11/09/23 20:40:33 INFO orm. CompilationManager: Found hadoop core jar at:/home/wanghai01/hadoop/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar
11/09/23 20:40:34 ERROR orm. CompilationManager: cocould not rename/tmp/sqoop-wanghai01/compile/Tables/tb_keyword_data_201104.java to/home/wanghai01/cloudera/sqoop-1.2.0-CDH3B4/bin/./tb_keyword_data_201104.java
11/09/23 20:40:34 INFO orm. CompilationManager: Writing jar file:/tmp/sqoop-wanghai01/compile/a913cede5621df95376a26c1af737ee2/tb_keyword_data_201104.jar
11/09/23 20:40:34 WARN manager. MySQLManager: It looks like you are importing from mysql.
11/09/23 20:40:34 WARN manager. MySQLManager: This transfer can be faster! Use the -- direct
11/09/23 20:40:34 WARN manager. MySQLManager: option to exercise a MySQL-specific fast path.
11/09/23 20:40:34 INFO manager. MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
11/09/23 20:40:34 INFO mapreduce. ImportJobBase: Beginning import of tb_keyword_data_201104
11/09/23 20:40:34 INFO manager. MySQLManager: Executing SQL statement: SELECT t. * FROM 'tb _ keyword_data_201104 'AS t LIMIT 1
11/09/23 20:40:40 INFO mapred. JobClient: Running job: job_201101_21_0014
11/09/23 20:40:41 INFO mapred. JobClient: map 0% reduce 0%
11/09/23 20:40:54 INFO mapred. JobClient: map 25% reduce 0%
11/09/23 20:40:57 INFO mapred. JobClient: map 50% reduce 0%
11/09/23 20:41:36 INFO mapred. JobClient: map 75% reduce 0%
11/09/23 20:42:00 INFO mapred. JobClient: map 100% reduce 0%
11/09/23 20:43:19 INFO mapred. JobClient: Job complete: job_201101_21_0014
11/09/23 20:43:19 INFO mapred. JobClient: Counters: 5
11/09/23 20:43:19 INFO mapred. JobClient: Job Counters
11/09/23 20:43:19 INFO mapred. JobClient: Launched map tasks = 4
11/09/23 20:43:19 INFO mapred. JobClient: FileSystemCounters
11/09/23 20:43:19 INFO mapred. JobClient: HDFS_BYTES_WRITTEN = 1601269219
11/09/23 20:43:19 INFO mapred. JobClient: Map-Reduce Framework
11/09/23 20:43:19 INFO mapred. JobClient: Map input records = 11628209
11/09/23 20:43:19 INFO mapred. JobClient: Spilled Records = 0
11/09/23 20:43:19 INFO mapred. JobClient: Map output records = 11628209
11/09/23 20:43:19 INFO mapreduce. ImportJobBase: Transferred 1.4913 GB in 165.0126 seconds (9.2544 MB/sec)
11/09/23 20:43:19 INFO mapreduce. ImportJobBase: Retrieved 11628209 records.
Fri Sep 23 20:43:19 CST 2011
The main Commands in import. sh and export. sh are as follows:
/Home/wanghai01/cloudera/sqoop-1.2.0-CDH3B4/bin/sqoop import -- connect jdbc: mysql: // XXXX/crm -- username XX -- password XX -- table tb_keyword_data_201104 -- split-by winfo_id -- target-dir/user/wanghai01/data/-- fields-terminated-by '\ t' -- lines-terminated-by '\ n' -- input-null-string ''-- input-null-non-string''
/Home/wanghai01/cloudera/sqoop-1.2.0-CDH3B4/bin/sqoop export -- connect jdbc: mysql: // XXXX/crm -- username XX -- password XX -- table tb_keyword_data_201104 -- export-dir/user/wanghai01/data/-- fields-terminated-by '\ t' -- lines-terminated- by '\ n' -- input-null-string ''-- input-null-non-string''