Tutorial on configuring Sqoop for MySQL in a Hadoop cluster environment


Sqoop is a tool for transferring data between Hadoop and relational databases (such as MySQL, Oracle, Postgres, etc.): it can import data from a relational database into HDFS, and it can also export HDFS data back into a relational database.

One of Sqoop's highlights is that the import from a relational database to HDFS runs as a Hadoop MapReduce job.


I. Installation of Sqoop
1. Download the Sqoop tarball and extract it

The packages used are sqoop-1.2.0-cdh3b4.tar.gz, hadoop-0.20.2-cdh3b4.tar.gz, and the MySQL JDBC driver mysql-connector-java-5.1.10-bin.jar.

[root@node1 ~]# ll
drwxr-xr-x   root root   4096 Feb        hadoop-0.20.2-cdh3b4
-rw-r--r-- 1 root root 724225 Sep 06:46  mysql-connector-java-5.1.10-bin.jar
drwxr-xr-x   root root   4096 Feb        sqoop-1.2.0-cdh3b4

2. Copy sqoop-1.2.0-cdh3b4 to the /home/hadoop directory, copy the MySQL JDBC driver and hadoop-core-0.20.2-cdh3b4.jar (from hadoop-0.20.2-cdh3b4) into sqoop-1.2.0-cdh3b4/lib, and finally change the owner.

[root@node1 ~]# cp mysql-connector-java-5.1.10-bin.jar sqoop-1.2.0-cdh3b4/lib
[root@node1 ~]# cp hadoop-0.20.2-cdh3b4/hadoop-core-0.20.2-cdh3b4.jar sqoop-1.2.0-cdh3b4/lib
[root@node1 ~]# chown -R hadoop:hadoop sqoop-1.2.0-cdh3b4
[root@node1 ~]# mv sqoop-1.2.0-cdh3b4 /home/hadoop
[root@node1 ~]# ll /home/hadoop
total 35748
-rw-rw-r--  1 hadoop hadoop    343 Sep 05:13    derby.log
drwxr-xr-x    hadoop hadoop   4096 Sep 14 16:16 hadoop-0.20.2
drwxr-xr-x  9 hadoop hadoop   4096 Sep 20:21    hive-0.10.0
-rw-r--r--  1 hadoop hadoop 365240 20:20        hive-0.10.0.tar.gz
drwxr-xr-x  8 hadoop hadoop   4096 Sep          jdk1.7
drwxr-xr-x    hadoop hadoop   4096 Sep 00:25    mahout-distribution-0.7
drwxrwxr-x  5 hadoop hadoop   4096 Sep 05:13    metastore_db
-rw-rw-r--  1 hadoop hadoop    406 Sep 16:02    scp.sh
drwxr-xr-x    hadoop hadoop   4096 Feb          sqoop-1.2.0-cdh3b4
drwxrwxr-x  3 hadoop hadoop   4096 Sep 16:17    temp
drwxrwxr-x  3 hadoop hadoop   4096 Sep 14 15:59 user

3. Edit configure-sqoop and comment out the HBase and ZooKeeper checks (they are not needed here, since HBase and ZooKeeper are not installed)

[root@node1 bin]# pwd
/home/hadoop/sqoop-1.2.0-cdh3b4/bin
[root@node1 bin]# vi configure-sqoop

#!/bin/bash
#
# Licensed to Cloudera, Inc. under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
...
# Check: If we can't find our dependencies, give up here.
if [ ! -d "${HADOOP_HOME}" ]; then
  echo "Error: $HADOOP_HOME does not exist!"
  echo 'Please set $HADOOP_HOME to the root of your Hadoop installation.'
  exit 1
fi
#if [ ! -d "${HBASE_HOME}" ]; then
#  echo "Error: $HBASE_HOME does not exist!"
#  echo 'Please set $HBASE_HOME to the root of your HBase installation.'
#  exit 1
#fi
#if [ ! -d "${ZOOKEEPER_HOME}" ]; then
#  echo "Error: $ZOOKEEPER_HOME does not exist!"
#  echo 'Please set $ZOOKEEPER_HOME to the root of your ZooKeeper installation.'
#  exit 1
#fi

4. Modify /etc/profile and .bash_profile: add HADOOP_HOME and adjust PATH

[hadoop@node1 ~]$ vi .bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi

# User specific environment and startup programs

HADOOP_HOME=/home/hadoop/hadoop-0.20.2
PATH=$HADOOP_HOME/bin:$PATH:$HOME/bin
export HIVE_HOME=/home/hadoop/hive-0.10.0
export MAHOUT_HOME=/home/hadoop/mahout-distribution-0.7
export PATH HADOOP_HOME
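
After saving the file, the new environment can be applied and checked without logging out; a minimal sketch (assuming Sqoop remains under /home/hadoop/sqoop-1.2.0-cdh3b4):

[hadoop@node1 ~]$ source ~/.bash_profile
[hadoop@node1 ~]$ echo $HADOOP_HOME                      # should print /home/hadoop/hadoop-0.20.2
[hadoop@node1 ~]$ ~/sqoop-1.2.0-cdh3b4/bin/sqoop version # prints the Sqoop version if the setup is correct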

II. Testing Sqoop
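
Before the tests, the MySQL server must allow the sqoop user to connect from the cluster nodes. A hypothetical one-time setup on the MySQL host (the user name, password, and host wildcard are illustrative; adjust them to your environment):

mysql -u root -p -e "GRANT ALL PRIVILEGES ON *.* TO 'sqoop'@'%' IDENTIFIED BY 'sqoop'; FLUSH PRIVILEGES;"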

1. View the databases in MySQL:

[hadoop@node1 bin]$ ./sqoop list-databases --connect jdbc:mysql://192.168.1.152:3306/ --username sqoop --password sqoop
13/09/15 07:17:16 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/09/15 07:17:17 INFO manager.MySQLManager: Executing SQL statement: SHOW DATABASES
information_schema
mysql
performance_schema
sqoop
test

2. Import a MySQL table into Hive:

[hadoop@node1 bin]$ ./sqoop import --connect jdbc:mysql://192.168.1.152:3306/sqoop --username sqoop --password sqoop --table test --hive-import -m 1
13/09/15 08:15:01 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/09/15 08:15:01 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
13/09/15 08:15:01 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
13/09/15 08:15:01 INFO tool.CodeGenTool: Beginning code generation
13/09/15 08:15:01 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/09/15 08:15:02 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/09/15 08:15:02 INFO orm.CompilationManager: HADOOP_HOME is /home/hadoop/hadoop-0.20.2/bin/..
13/09/15 08:15:02 INFO orm.CompilationManager: Found hadoop core jar at: /home/hadoop/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar
13/09/15 08:15:03 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/a71936fd2bb45ea6757df22751a320e3/test.jar
13/09/15 08:15:03 WARN manager.MySQLManager: It looks like you are importing from mysql.
13/09/15 08:15:03 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
13/09/15 08:15:03 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
13/09/15 08:15:03 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
13/09/15 08:15:03 INFO mapreduce.ImportJobBase: Beginning import of test
13/09/15 08:15:04 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/09/15 08:15:05 INFO mapred.JobClient: Running job: job_201309150505_0009
13/09/15 08:15:06 INFO mapred.JobClient:  map 0% reduce 0%
13/09/15 08:15:34 INFO mapred.JobClient:  map 100% reduce 0%
13/09/15 08:15:36 INFO mapred.JobClient: Job complete: job_201309150505_0009
13/09/15 08:15:36 INFO mapred.JobClient: Counters: 5
13/09/15 08:15:36 INFO mapred.JobClient:   Job Counters
13/09/15 08:15:36 INFO mapred.JobClient:     Launched map tasks=1
13/09/15 08:15:36 INFO mapred.JobClient:   FileSystemCounters
13/09/15 08:15:36 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=583323
13/09/15 08:15:36 INFO mapred.JobClient:   Map-Reduce Framework
13/09/15 08:15:36 INFO mapred.JobClient:     Map input records=65536
13/09/15 08:15:36 INFO mapred.JobClient:     Spilled Records=0
13/09/15 08:15:36 INFO mapred.JobClient:     Map output records=65536
13/09/15 08:15:36 INFO mapreduce.ImportJobBase: Transferred 569.6514 KB in 32.0312 seconds (17.7842 KB/sec)
13/09/15 08:15:36 INFO mapreduce.ImportJobBase: Retrieved 65536 records.
13/09/15 08:15:36 INFO hive.HiveImport: Removing temporary files from import process: test/_logs
13/09/15 08:15:36 INFO hive.HiveImport: Loading uploaded data into Hive
13/09/15 08:15:36 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/09/15 08:15:36 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/09/15 08:15:41 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/home/hadoop/hive-0.10.0/lib/hive-common-0.10.0.jar!/hive-log4j.properties
13/09/15 08:15:41 INFO hive.HiveImport: Hive history file=/tmp/hadoop/hive_job_log_hadoop_201309150815_1877092059.txt
13/09/15 08:16:10 INFO hive.HiveImport: OK
13/09/15 08:16:10 INFO hive.HiveImport: Time taken: 28.791 seconds
13/09/15 08:16:11 INFO hive.HiveImport: Loading data to table default.test
13/09/15 08:16:12 INFO hive.HiveImport: Table default.test stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 583323, raw_data_size: 0]
13/09/15 08:16:12 INFO hive.HiveImport: OK
13/09/15 08:16:12 INFO hive.HiveImport: Time taken: 1.704 seconds
13/09/15 08:16:12 INFO hive.HiveImport: Hive import complete.
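
The log reports 65536 records retrieved; this can be spot-checked from the Hive command line (a quick sketch, assuming hive is on the PATH):

[hadoop@node1 bin]$ hive -e "SELECT COUNT(*) FROM test;"   # should return 65536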

III. Sqoop Commands

Sqoop has about 13 commands, plus several groups of common parameters that all 13 commands support: common arguments, incremental import arguments, output line formatting arguments, input parsing arguments, Hive arguments, HBase arguments, and generic Hadoop command-line arguments. Each command also has its own specific arguments. Below are a few commonly used commands.
1. Common arguments
Common parameters, used mainly for connecting to relational databases.
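
The full command list, and the arguments each command accepts, can be printed by Sqoop itself:

./sqoop help            # lists all available commands
./sqoop help import     # shows the arguments accepted by a single command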
1) List all databases on the MySQL server

sqoop list-databases --connect jdbc:mysql://localhost:3306/ --username root --password 123456


2) Connect to MySQL and list the tables in the test database

sqoop list-tables --connect jdbc:mysql://localhost:3306/test --username root --password 123456

In this command, test is the name of the database in MySQL; --username and --password are the MySQL user name and password, respectively.


3) Copy a relational table's structure into Hive; only the table structure is copied, not the data.

sqoop create-hive-table --connect jdbc:mysql://localhost:3306/test \
  --table sqoop_test --username root --password 123456 \
  --hive-table test

Here --table sqoop_test is the table in MySQL's test database, and --hive-table test is the name of the new table in Hive.
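
The structure of the new Hive table can then be verified from the Hive CLI (a quick sketch):

hive -e "DESCRIBE test;"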


4) Import a table from a relational database into Hive

sqoop import --connect jdbc:mysql://localhost:3306/zxtest --username root \
  --password 123456 --table sqoop_test --hive-import \
  --hive-table s_test -m 1


5) Export table data from Hive into MySQL. Before the export, the table hive_test must already have been created in MySQL (see the sketch after the command).

sqoop export --connect jdbc:mysql://localhost:3306/zxtest --username root \
  --password root --table hive_test \
  --export-dir /user/hive/warehouse/new_test_partition/dt=2012-03-05
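
As noted above, the target table must exist first. A hypothetical DDL run on the MySQL host (the column names and types are illustrative only and must match the layout of the exported files):

mysql -u root -p zxtest -e "CREATE TABLE hive_test (id INT, name VARCHAR(64));"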


6) Import table data from the database into an HDFS file

./sqoop import --connect jdbc:mysql://10.28.168.109:3306/compression \
  --username=hadoop --password=123456 --table hadoop_user_info \
  -m 1 --target-dir /user/test
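
The imported data can then be inspected with the HDFS shell (a sketch; with -m 1 the output normally lands in a single part-m-00000 file):

hadoop fs -ls /user/test
hadoop fs -cat /user/test/part-m-00000 | head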


7) Incrementally import table data from the database into HDFS

./sqoop import --connect jdbc:mysql://10.28.168.109:3306/compression \
  --username=hadoop --password=123456 --table hadoop_user_info \
  -m 1 --target-dir /user/test --check-column id \
  --incremental append --last-value 3
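
With --incremental append, only rows whose id column is greater than --last-value (3 here) are imported, and Sqoop reports the new maximum value at the end of the run. A sketch of the follow-up run (the value 100 is hypothetical; use the last value reported by the previous run):

./sqoop import --connect jdbc:mysql://10.28.168.109:3306/compression \
  --username=hadoop --password=123456 --table hadoop_user_info \
  -m 1 --target-dir /user/test --check-column id \
  --incremental append --last-value 100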
