Install and configure Sqoop for MySQL in a Hadoop cluster environment
Sqoop is a tool for transferring data between Hadoop and relational databases. It can import data from a relational database (such as MySQL or Oracle) into Hadoop HDFS, and it can also export HDFS data back into a relational database.
One of Sqoop's highlights is that the transfer between the relational database and HDFS runs as a Hadoop MapReduce job.
I. Install Sqoop
1. Download the Sqoop tarball and decompress it.
The required packages are sqoop-1.2.0-CDH3B4.tar.gz, hadoop-0.20.2-CDH3B4.tar.gz, and the MySQL JDBC driver mysql-connector-java-5.1.10-bin.jar.
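The decompress step itself is not shown in the original; assuming the tarballs were downloaded to root's home directory, a typical unpack looks like this:
[root@node1 ~]# tar -zxvf sqoop-1.2.0-CDH3B4.tar.gz
[root@node1 ~]# tar -zxvf hadoop-0.20.2-CDH3B4.tar.gz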
[root@node1 ~]# ll
drwxr-xr-x 15 root root   4096 Feb 22  2011 hadoop-0.20.2-CDH3B4
-rw-r--r--  1 root root 724225 Sep 15 06:46 mysql-connector-java-5.1.10-bin.jar
drwxr-xr-x 11 root root   4096 Feb 22  2011 sqoop-1.2.0-CDH3B4
2. Copy the MySQL JDBC driver and hadoop-core-0.20.2-CDH3B4.jar (from the hadoop-0.20.2-CDH3B4 directory) into sqoop-1.2.0-CDH3B4/lib, change the owner to the hadoop user, and move sqoop-1.2.0-CDH3B4 to the /home/hadoop directory.
[root@node1 ~]# cp mysql-connector-java-5.1.10-bin.jar sqoop-1.2.0-CDH3B4/lib
[root@node1 ~]# cp hadoop-0.20.2-CDH3B4/hadoop-core-0.20.2-CDH3B4.jar sqoop-1.2.0-CDH3B4/lib
[root@node1 ~]# chown -R hadoop:hadoop sqoop-1.2.0-CDH3B4
[root@node1 ~]# mv sqoop-1.2.0-CDH3B4 /home/hadoop
[root@node1 ~]# ll /home/hadoop
total 35748
-rw-rw-r--  1 hadoop hadoop      343 Sep 15 05:13 derby.log
drwxr-xr-x 13 hadoop hadoop     4096 Sep 14 16:16 hadoop-0.20.2
drwxr-xr-x  9 hadoop hadoop     4096 Sep 14 20:21 hive-0.10.0
-rw-r--r--  1 hadoop hadoop 36524032 Sep 14 20:20 hive-0.10.0.tar.gz
drwxr-xr-x  8 hadoop hadoop     4096 Sep 25  2012 jdk1.7
drwxr-xr-x 12 hadoop hadoop     4096 Sep 15 00:25 mahout-distribution-0.7
drwxrwxr-x  5 hadoop hadoop     4096 Sep 15 05:13 metastore_db
-rw-rw-r--  1 hadoop hadoop      406 Sep 14 16:02 scp.sh
drwxr-xr-x 11 hadoop hadoop     4096 Feb 22  2011 sqoop-1.2.0-CDH3B4
drwxrwxr-x  3 hadoop hadoop     4096 Sep 14 16:17 temp
drwxrwxr-x  3 hadoop hadoop     4096 Sep 14 15:59 user
3. Edit configure-sqoop and comment out the checks for HBase and ZooKeeper; this setup does not use them, and Sqoop would otherwise refuse to start.
[root@node1 bin]# pwd
/home/hadoop/sqoop-1.2.0-CDH3B4/bin
[root@node1 bin]# vi configure-sqoop
#!/bin/bash
#
# Licensed to Cloudera, Inc. under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
...
# Check: If we can't find our dependencies, give up here.
if [ ! -d "${HADOOP_HOME}" ]; then
  echo "Error: $HADOOP_HOME does not exist!"
  echo 'Please set $HADOOP_HOME to the root of your Hadoop installation.'
  exit 1
fi
#if [ ! -d "${HBASE_HOME}" ]; then
#  echo "Error: $HBASE_HOME does not exist!"
#  echo 'Please set $HBASE_HOME to the root of your HBase installation.'
#  exit 1
#fi
#if [ ! -d "${ZOOKEEPER_HOME}" ]; then
#  echo "Error: $ZOOKEEPER_HOME does not exist!"
#  echo 'Please set $ZOOKEEPER_HOME to the root of your ZooKeeper installation.'
#  exit 1
#fi
4. Modify /etc/profile and .bash_profile: add HADOOP_HOME and adjust PATH.
[hadoop@node1 ~]$ vi .bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs
HADOOP_HOME=/home/hadoop/hadoop-0.20.2
PATH=$HADOOP_HOME/bin:$PATH:$HOME/bin
export HIVE_HOME=/home/hadoop/hive-0.10.0
export MAHOUT_HOME=/home/hadoop/mahout-distribution-0.7
export PATH HADOOP_HOME
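After saving the file, reload it so the new variables take effect in the current shell (a standard step the original omits):
[hadoop@node1 ~]$ source ~/.bash_profile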
II. Test Sqoop
1. List the databases on the MySQL server:
[hadoop@node1 bin]$ ./sqoop list-databases --connect jdbc:mysql://192.168.1.152:3306/ --username sqoop --password sqoop
13/09/15 07:17:16 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/09/15 07:17:17 INFO manager.MySQLManager: Executing SQL statement: SHOW DATABASES
information_schema
mysql
performance_schema
sqoop
test
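As the warning suggests, -P prompts for the password interactively instead of exposing it on the command line; the same listing could be run more safely as (this variant is not in the original):
[hadoop@node1 bin]$ ./sqoop list-databases --connect jdbc:mysql://192.168.1.152:3306/ --username sqoop -P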
2. Import a MySQL table into Hive:
[hadoop@node1 bin]$ ./sqoop import --connect jdbc:mysql://192.168.1.152:3306/sqoop --username sqoop --password sqoop --table test --hive-import -m 1
13/09/15 08:15:01 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/09/15 08:15:01 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
13/09/15 08:15:01 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
13/09/15 08:15:01 INFO tool.CodeGenTool: Beginning code generation
13/09/15 08:15:01 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/09/15 08:15:02 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/09/15 08:15:02 INFO orm.CompilationManager: HADOOP_HOME is /home/hadoop/hadoop-0.20.2/bin/..
13/09/15 08:15:02 INFO orm.CompilationManager: Found hadoop core jar at: /home/hadoop/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar
13/09/15 08:15:03 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/a71936fd2bb45ea6757df22751a320e3/test.jar
13/09/15 08:15:03 WARN manager.MySQLManager: It looks like you are importing from mysql.
13/09/15 08:15:03 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
13/09/15 08:15:03 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
13/09/15 08:15:03 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
13/09/15 08:15:03 INFO mapreduce.ImportJobBase: Beginning import of test
13/09/15 08:15:04 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/09/15 08:15:05 INFO mapred.JobClient: Running job: job_201309150505_0009
13/09/15 08:15:06 INFO mapred.JobClient:  map 0% reduce 0%
13/09/15 08:15:34 INFO mapred.JobClient:  map 100% reduce 0%
13/09/15 08:15:36 INFO mapred.JobClient: Job complete: job_201309150505_0009
13/09/15 08:15:36 INFO mapred.JobClient: Counters: 5
13/09/15 08:15:36 INFO mapred.JobClient:   Job Counters
13/09/15 08:15:36 INFO mapred.JobClient:     Launched map tasks=1
13/09/15 08:15:36 INFO mapred.JobClient:   FileSystemCounters
13/09/15 08:15:36 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=583323
13/09/15 08:15:36 INFO mapred.JobClient:   Map-Reduce Framework
13/09/15 08:15:36 INFO mapred.JobClient:     Map input records=65536
13/09/15 08:15:36 INFO mapred.JobClient:     Spilled Records=0
13/09/15 08:15:36 INFO mapred.JobClient:     Map output records=65536
13/09/15 08:15:36 INFO mapreduce.ImportJobBase: Transferred 569.6514 KB in 32.0312 seconds (17.7842 KB/sec)
13/09/15 08:15:36 INFO mapreduce.ImportJobBase: Retrieved 65536 records.
13/09/15 08:15:36 INFO hive.HiveImport: Removing temporary files from import process: test/_logs
13/09/15 08:15:36 INFO hive.HiveImport: Loading uploaded data into Hive
13/09/15 08:15:36 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/09/15 08:15:36 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/09/15 08:15:41 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/home/hadoop/hive-0.10.0/lib/hive-common-0.10.0.jar!/hive-log4j.properties
13/09/15 08:15:41 INFO hive.HiveImport: Hive history file=/tmp/hadoop/hive_job_log_hadoop_201309150815_1877092059.txt
13/09/15 08:16:10 INFO hive.HiveImport: OK
13/09/15 08:16:10 INFO hive.HiveImport: Time taken: 28.791 seconds
13/09/15 08:16:11 INFO hive.HiveImport: Loading data to table default.test
13/09/15 08:16:12 INFO hive.HiveImport: Table default.test stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 583323, raw_data_size: 0]
13/09/15 08:16:12 INFO hive.HiveImport: OK
13/09/15 08:16:12 INFO hive.HiveImport: Time taken: 1.704 seconds
13/09/15 08:16:12 INFO hive.HiveImport: Hive import complete.
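To verify the import, you can query the new table from the Hive shell; the count should match the 65536 records reported in the log above (this check is not part of the original session):
hive> select count(*) from test;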
III. Sqoop commands
Sqoop provides 13 commands, plus several groups of common arguments that all 13 commands support: Common arguments, Incremental import arguments, Output line formatting arguments, Input parsing arguments, Hive arguments, HBase arguments, and Generic Hadoop command-line arguments. The following walks through several commonly used commands:
1. Common arguments
These are the arguments shared by all commands, mainly the relational database connection options.
1) List all databases on the MySQL server
sqoop list-databases --connect jdbc:mysql://localhost:3306/ --username root --password 123456
2) Connect to MySQL and list the tables in the test database
sqoop list-tables --connect jdbc:mysql://localhost:3306/test --username root --password 123456
In this command, test is the name of the database in MySQL, and --username and --password are the MySQL user name and password.
3) Copy the structure of a relational database table into Hive; only the table structure is copied, not the data.
sqoop create-hive-table --connect jdbc:mysql://localhost:3306/test --table sqoop_test --username root --password 123456 --hive-table test
Here --table sqoop_test is the table in the MySQL database test, and --hive-table test is the name of the table to create in Hive.
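You can confirm the new (empty) table definition from the Hive shell; a quick check along these lines (not shown in the original):
hive> describe test;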
4) Import a table from a relational database into Hive
sqoop import --connect jdbc:mysql://localhost:3306/zxtest --username root --password 123456 --table sqoop_test --hive-import --hive-table s_test -m 1
5) Export table data from Hive to MySQL. The target MySQL table hive_test must already exist before the export.
sqoop export --connect jdbc:mysql://localhost:3306/zxtest --username root --password root --table hive_test --export-dir /user/hive/warehouse/new_test_partition/dt=2012-03-05
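Because Sqoop export does not create the target table, hive_test has to be created in MySQL beforehand; a hypothetical schema (the column names and types here are placeholders, match them to your Hive table):
mysql> CREATE TABLE hive_test (id INT, name VARCHAR(64));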
6) Import table data from the database into HDFS files
./sqoop import --connect jdbc:mysql://10.28.168.109:3306/compression --username=hadoop --password=123456 --table HADOOP_USER_INFO -m 1 --target-dir /user/test
7) Incrementally import table data from the database into HDFS
./sqoop import --connect jdbc:mysql://10.28.168.109:3306/compression --username=hadoop --password=123456 --table HADOOP_USER_INFO -m 1 --target-dir /user/test --check-column id --incremental append --last-value 3
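With --incremental append, only rows whose --check-column value exceeds --last-value are imported and appended to the target directory. On a later run you would raise --last-value to the largest id already imported; for example (the value 100 is illustrative):
./sqoop import --connect jdbc:mysql://10.28.168.109:3306/compression --username=hadoop --password=123456 --table HADOOP_USER_INFO -m 1 --target-dir /user/test --check-column id --incremental append --last-value 100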