Use sqoop to import data from a MySQL database to hbase
Prerequisites: Install sqoop and hbase.
Download jbdc DRIVER: mysql-connector-java-5.1.10.jar
Copy the mysql-connector-java-5.1.10.jar to/usr/lib/sqoop/lib/
Command for importing hbase from MYSQL:
Sqoop import -- connect JDBC: mysql: // 10.10.97.116: 3306/Rsearch -- table researchers -- hbase-Table A -- column-family person -- hbase-row-key id -- hbase-create-table -- username 'root'-P
Note:
-- Connect JDBC: mysql: // 10.10.97.116: 3306/Rsearch indicates the URI of the remote or local MySQL service. 3306 indicates the default listening port of MySQL, and Rsearch indicates the database. If it is another database, such as Oracle, you only need to modify the URI.
-- Table researchers indicates to export the researchers table of The Rsearch database.
-- Hbase-Table A indicates that table A is created in hbase.
-- Column-family person indicates that the column family person is created in table.
-- Hbase-row-key ID indicates that the row-Key of Table A is the ID field of the researchers table.
-- Hbase-create-table indicates creating a table in hbase.
-- Username 'root' indicates that the user root is used to connect to MySQL.
Note:
All hbase nodes must be able to access the MySQL database. Otherwise, the following error occurs:
Java. SQL. sqlexception: NULL, message from server: "host '10. 10.104.3 'is not allowed to connect to this MySQL Server"
Run the following command on the MySQL database server node to allow remote machines to access the local database server: [root @ gc01vm6 htdocs] #/opt/lampp/bin/mysqlmysql> use MySQL; database changedmysql> grant all privileges on Rsearch. * To 'root' @ '10. 10.104.3 'identified by ''with grant option; mysql> grant all privileges on Rsearch. * To 'root' @ '10. 10.104.5 'identified by ''with grant option; mysql> grant all privileges on Rsearch. * To 'root' @ '10. 10.104.2 'identified by'' with grant option;
Here, 10.10.104.2, 10.10.104.3, and 10.10.104.5 are hbase nodes.
Bytes -------------------------------------------------------------------------------------------------
MySQL import hbase logs:
[Root @ gd02 hadoop] #Sqoop import -- connect JDBC: mysql: // 10.10.97.116: 3306/Rsearch -- table researchers -- hbase-Table A -- column-family person -- hbase-row-key id -- hbase-create-table -- username 'root'-P
Enter password:
11/06/29 19:08:00 info tool. codegentool: Beginning code generation
11/06/29 19:08:00 info manager. mysqlmanager: executing SQL statement: Select T. * From 'researchers' as t limit 1
11/06/29 19:08:00 info manager. mysqlmanager: executing SQL statement: Select T. * From 'researchers' as t limit 1
11/06/29 19:08:00 info Orm. compilationmanager: hadoop_home is/usr/lib/hadoop
11/06/29 19:08:00 info Orm. compilationmanager: Found hadoop core jar at:/usr/lib/hadoop/hadoop-core.jar
Note:/tmp/sqoop-root/compile/d4dd4cb4e1e325fce31ca72c00a5589c/researchers. Java uses or overrides a deprecated API.
Note: recompile with-xlint: deprecation for details.
11/06/29 19:08:02 info Orm. compilationmanager: Writing JAR file:/tmp/sqoop-root/compile/d4dd4cb4e1e325fce31ca72c00a5589c/researchers. Jar
11/06/29 19:08:02 warn manager. mysqlmanager: It looks like you are importing from MySQL.
11/06/29 19:08:02 warn manager. mysqlmanager: this transfer can be faster! Use the -- direct
11/06/29 19:08:02 warn manager. mysqlmanager: Option to exercise a MySQL-specific fast path.
11/06/29 19:08:02 info manager. mysqlmanager: Setting zero datetime behavior to converttonull (MySQL)
11/06/29 19:08:02 info mapreduce. importjobbase: Beginning import of researchers
11/06/29 19:08:02 info manager. mysqlmanager: executing SQL statement: Select T. * From 'researchers' as t limit 1
11/06/29 19:08:02 info zookeeper. zookeeper: client environment: zookeeper. Version = 3.3.3-cdh3u0 -- 1, built on 03/26/2011 GMT
11/06/29 19:08:02 info zookeeper. zookeeper: client environment: host. Name = gd02
11/06/29 19:08:02 info zookeeper. zookeeper: client environment: Java. Version = 1.6.0 _ 13
11/06/29 19:08:02 info zookeeper. zookeeper: client environment: Java. Vendor = Sun Microsystems Inc.
11/06/29 19:08:02 info zookeeper. zookeeper: client environment: Java. Home =/usr/Java/jdk1.6.0 _ 13/JRE
11/06/29 19:08:02 info zookeeper. zookeeper: client environment: Java. class. path =/usr/lib/hadoop/conf:/usr/Java/jdk1.6.0 _ 13/lib/tools. jar:/usr/lib/hadoop/hadoop-core-0.20.2-cdh3u0.jar:/usr/lib/hadoop/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop/lib/aspectjrt-1.6.5.jar: /usr/lib/hadoop/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar: /usr/lib/hadoop/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.0.1.jar: /usr/lib/hadoop/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop/lib/commons-net-1.4.1.jar: /usr/lib/hadoop/lib/core-3.1.1.jar:/usr/lib/hadoop/lib/hadoop-fairscheduler-0.20.2-cdh3u0.jar:/usr/lib/hadoop/lib/hsqldb-1.8.0.10.jar: /usr/lib/hadoop/lib/jackson-core-asl-1.5.2.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.5.2.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.12.jar: /usr/lib/hadoop/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/jetty-6.1.26.jar: /usr/lib/hadoop/lib/jetty-servlet-tester-6.1.26.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar: /usr/lib/hadoop/lib/junit-4.5.jar:/usr/lib/hadoop/lib/kfs-0.2.2.jar:/usr/lib/hadoop/lib/log4j-1.2.15.jar: /usr/lib/hadoop/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop/lib/oro-2.0.8.jar:/usr/lib/hadoop/lib/servlet-api-2.5-20081211.jar: /usr/lib/hadoop/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.4.3.jar: /usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop/lib/jsp-2.1/jsp-api-2.1.jar: /usr/lib/sqoop/conf:/usr/lib/hbase/conf:/usr/lib/sqoop/lib/ant-contrib-1.0b3.jar: /usr/lib/sqoop/lib/ant-eclipse-1.0-jvm1.2.jar:/usr/lib/sqoop/lib/commons-io-1.4.jar:/usr/lib/sqoop/lib/hadoop-mrunit-0.20.2-CDH3b2-SNAPSHOT.jar: /usr/lib/sqoop/lib/ivy-2.0.0-rc2.jar:/usr/lib/sqoop/lib/mysql-connector-java-5.1.10.jar:/usr/lib/hbase/hbase-0.90.1-cdh3u0.jar:/usr/lib/hbase/hbase-0.90.1-cdh3u0-tests.jar: /usr/lib/hbase/lib/activation-1.1.jar:/usr/lib/hbase/lib/asm-3.1.jar:/usr/lib/hbase/lib/avro-1.3.3.jar: /usr/lib/hbase/lib/commons-cli-1.2.jar:/usr/lib/hbase/lib/commons-codec-1.4.jar:/usr/lib/hbase/lib/commons-el-1.0.jar: /usr/lib/hbase/lib/commons-httpclient-3.1.jar:/usr/lib/hbase/lib/commons-lang-2.5.jar:/usr/lib/hbase/lib/commons-logging-1.1.1.jar: /usr/lib/hbase/lib/commons-net-1.4.1.jar:/usr/lib/hbase/lib/core-3.1.1.jar:/usr/lib/hbase/lib/guava-r06.jar: /usr/lib/hbase/lib/hadoop-core.jar:/usr/lib/hbase/lib/hbase-0.90.1-cdh3u0.jar:/usr/lib/hbase/lib/jackson-core-asl-1.5.2.jar: /usr/lib/hbase/lib/jackson-jaxrs-1.5.5.jar:/usr/lib/hbase/lib/jackson-mapper-asl-1.5.2.jar:/usr/lib/hbase/lib/jackson-xc-1.5.5.jar: /usr/lib/hbase/lib/jasper-compiler-5.5.23.jar:/usr/lib/hbase/lib/jasper-runtime-5.5.23.jar:/usr/lib/hbase/lib/jaxb-api-2.1.jar: /usr/lib/hbase/lib/jaxb-impl-2.1.12.jar:/usr/lib/hbase/lib/jersey-core-1.4.jar:/usr/lib/hbase/lib/jersey-json-1.4.jar: /usr/lib/hbase/lib/jersey-server-1.4.jar:/usr/lib/hbase/lib/jettison-1.1.jar:/usr/lib/hbase/lib/jetty-6.1.26.jar: /usr/lib/hbase/lib/jetty-util-6.1.26.jar:/usr/lib/hbase/lib/jruby-complete-1.0.3.jar:/usr/lib/hbase/lib/jsp-2.1-6.1.14.jar: /usr/lib/hbase/lib/jsp-api-2.1-6.1.14.jar:/usr/lib/hbase/lib/jsp-api-2.1.jar:/usr/lib/hbase/lib/jsr311-api-1.1.1.jar: /usr/lib/hbase/lib/log4j-1.2.16.jar:/usr/lib/hbase/lib/protobuf-java-2.3.0.jar:/usr/lib/hbase/lib/servlet-api-2.5-6.1.14.jar: /usr/lib/hbase/lib/servlet-api-2.5.jar:/usr/lib/hbase/lib/slf4j-api-1.5.8.jar:/usr/lib/hbase/lib/slf4j-log4j12-1.5.8.jar: /usr/lib/hbase/lib/stax-api-1.0.1.jar:/usr/lib/hbase/lib/thrift-0.2.0.jar:/usr/lib/hbase/lib/xmlenc-0.52.jar: /usr/lib/hbase/lib/zookeeper. jar:/usr/lib/zookeeper/zookeeper-3.3.3-cdh3u0.jar:/usr/lib/zookeeper. jar:/usr/lib/zookeeper/lib/jline-0.9.94.jar:/usr/lib/zookeeper/lib/log4j-1.2.15.jar:/usr/lib/sqoop/sqoop-1.2.0-cdh3u0.jar: /usr/lib/sqoop/sqoop-test-1.2.0-cdh3u0.jar:
11/06/29 19:08:02 info zookeeper. zookeeper: client environment: Java. library. path =/usr/Java/jdk1.6.0 _ 13/JRE/lib/amd64/Server:/usr/Java/jdk1.6.0 _ 13/JRE/lib/amd64: /usr/Java/jdk1.6.0 _ 13/JRE /.. /lib/amd64:/usr/Java/packages/lib/amd64:/lib:/usr/lib
11/06/29 19:08:02 info zookeeper. zookeeper: client environment: Java. Io. tmpdir =/tmp
11/06/29 19:08:02 info zookeeper. zookeeper: client environment: Java. compiler = <Na>
11/06/29 19:08:02 info zookeeper. zookeeper: client environment: OS. Name = Linux
11/06/29 19:08:02 info zookeeper. zookeeper: client environment: OS. Arch = amd64
11/06/29 19:08:02 info zookeeper. zookeeper: client environment: OS. Version = 2.6.18-164. EL5
11/06/29 19:08:02 info zookeeper. zookeeper: client environment: User. Name = root
11/06/29 19:08:02 info zookeeper. zookeeper: client environment: User. Home =/root
11/06/29 19:08:02 info zookeeper. zookeeper: client environment: User. dir =/home/hadoop
11/06/29 19:08:02 info zookeeper. zookeeper: initiating client connection, connectstring = gd05: 2181, gd03: 2181, gd02: 2181 sessiontimeout = 180000 watcher = hconnection
11/06/29 19:08:02 info zookeeper. clientcnxn: Opening socket connection to server gd03/10.10.104.3: 2181
11/06/29 19:08:02 info zookeeper. clientcnxn: Socket Connection established to gd03/10.10.104.3: 2181, initiating session
11/06/29 19:08:02 info zookeeper. clientcnxn: session establishment complete on server gd03/10.10.104.3: 2181, sessionid = 0x130b2e901cd0012, negotiated timeout = 180000
11/06/29 19:08:02 info zookeeper. zookeeper: initiating client connection, connectstring = gd05: 2181, gd03: 2181, gd02: 2181 sessiontimeout = 180000 watcher = hconnection
11/06/29 19:08:02 info zookeeper. clientcnxn: Opening socket connection to server gd03/10.10.104.3: 2181
11/06/29 19:08:02 info zookeeper. clientcnxn: Socket Connection established to gd03/10.10.104.3: 2181, initiating session
11/06/29 19:08:02 info zookeeper. clientcnxn: session establishment complete on server gd03/10.10.104.3: 2181, sessionid = 0x130b2e901cd0013, negotiated timeout = 180000
11/06/29 19:08:02 info client. hconnectionmanager $ hconnectionimplementation: Closed zookeeper sessionid = 0x130b2e901cd0013
11/06/29 19:08:02 info zookeeper. zookeeper: Session: 0x130b2e901cd0013 closed
11/06/29 19:08:02 info zookeeper. clientcnxn: eventthread shut down
11/06/29 19:08:02 info mapreduce. hbaseimportjob: creating missing column family person
11/06/29 19:08:02 info client. hbaseadmin: started disable of
11/06/29 19:08:03 info client. hbaseadmin: disabled
11/06/29 19:08:03 info client. hbaseadmin: started enable of
11/06/29 19:08:06 info client. hbaseadmin: enabled Table
11/06/29 19:08:07 info mapred. jobclient: running job: job_201106212352_0010
11/06/29 19:08:08 info mapred. jobclient: Map 0% reduce 0%
11/06/29 19:08:19 info mapred. jobclient: Map 40% reduce 0%
11/06/29 19:08:20 info mapred. jobclient: Map 80% reduce 0%
11/06/29 19:08:34 info mapred. jobclient: Map 100% reduce 0%
11/06/29 19:08:34 info mapred. jobclient: job complete: job_201106212352_0010
11/06/29 19:08:34 info mapred. jobclient: counters: 11
11/06/29 19:08:34 info mapred. jobclient: Job counters
11/06/29 19:08:34 info mapred. jobclient: slots_millis_maps = 82848
11/06/29 19:08:34 info mapred. jobclient: total time spent by all CES waiting after reserving slots (MS) = 0
11/06/29 19:08:34 info mapred. jobclient: total time spent by all maps waiting after reserving slots (MS) = 0
11/06/29 19:08:34 info mapred. jobclient: Launched map tasks = 5
11/06/29 19:08:34 info mapred. jobclient: slots_millis_reduces = 0
11/06/29 19:08:34 info mapred. jobclient: filesystemcounters
11/06/29 19:08:34 info mapred. jobclient: hdfs_bytes_read = 527
11/06/29 19:08:34 info mapred. jobclient: file_bytes_written = 310685
11/06/29 19:08:34 info mapred. jobclient: Map-Reduce framework
11/06/29 19:08:34 info mapred. jobclient: Map input records = 81868
11/06/29 19:08:34 info mapred. jobclient: spilled records = 0
11/06/29 19:08:34 info mapred. jobclient: map output records = 81868
11/06/29 19:08:34 info mapred. jobclient: split_raw_bytes = 527
11/06/29 19:08:34 info mapreduce. importjobbase: Transferred 0 bytes in 28.108 seconds (0 bytes/sec)
11/06/29 19:08:34 info mapreduce. importjobbase: retrieved 81868 records.
References:
Synchronize Mysql Data to hive using sqoop
Http://www.54chen.com/java-ee/sqoop-mysql-to-hive.html
Use sqoop to import data from the database to HDFS
Http://www.cnblogs.com/gpcuster/archive/2011/03/01/1968027.html
Sqoop
Http://www.duyifan.com/
MySQL migration tool to hive/hbase
Http://www.javabloger.com/article/hadoop-hive-mysql-sqoop.html
Official Manual
Http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html