Alex's Hadoop Rookie Tutorial: Part 8, Sqoop1 Installation, Import, and Export



Damn it! Sqoop2 has too little documentation and doesn't even support HBase, so I gave up on Sqoop2 and switched to Sqoop1. Friends who followed my earlier tutorial, please don't throw bricks at me; I simply didn't know better at the time.

Uninstall sqoop2

This step is optional. If you installed sqoop2 following my previous tutorial, uninstall it first; if you never installed sqoop2, skip this step.

$ sudo su -
$ service sqoop2-server stop
$ yum -y remove sqoop2-server
$ yum -y remove sqoop2-client

Install Sqoop1
 yum install -y sqoop


Use help to test whether the installation is successful
# sqoop help
Warning: /usr/lib/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/11/28 11:33:11 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.0.1
usage: sqoop COMMAND [ARGS]
Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information
See 'sqoop help COMMAND' for information on a specific command.

Copy the JDBC driver to /usr/lib/sqoop/lib

Mysql jdbc driver

Download the driver package, decompress it, find the driver jar inside, upload it to the server, and move it into place:

mv /home/alex/mysql-connector-java-5.1.34-bin.jar /usr/lib/sqoop/lib


Prepare to create a table in mysql
CREATE TABLE `employee` (
  `id` int(11) NOT NULL,
  `name` varchar(20) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Insert several data records
insert into employee (id,name) values (1,'michael');
insert into employee (id,name) values (2,'ted');
insert into employee (id,name) values (3,'jack');
Importing from mysql to hdfs. We won't rush straight into the import; first we run a few preparatory commands to warm up, which also makes troubleshooting easier.
List all databases
# sqoop list-databases --connect jdbc:mysql://localhost:3306/sqoop_test --username root --password root
Warning: /usr/lib/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/12/01 09:20:28 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.0.1
14/12/01 09:20:28 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/12/01 09:20:28 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
cacti
metastore
mysql
sqoop_test
wordpress
zabbix



Connect to the database with sqoop and list all tables
# sqoop list-tables --connect jdbc:mysql://localhost/sqoop_test --username root --password root
Warning: /usr/lib/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/11/28 11:46:11 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.0.1
14/11/28 11:46:11 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/11/28 11:46:11 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
employee
student
workers

This command works without specifying a driver class, because sqoop supports mysql out of the box. Spelling the driver out explicitly is equivalent:
# sqoop list-tables --connect jdbc:mysql://localhost/sqoop_test --username root --password root --driver com.mysql.jdbc.Driver

Import data to hdfs. Here --m 1 runs the import with a single map task, and --target-dir sets the HDFS destination directory.
sqoop import --connect jdbc:mysql://localhost:3306/sqoop_test --username root --password root --table employee --m 1 --target-dir /user/test

# sqoop import --connect jdbc:mysql://localhost:3306/sqoop_test --username root --password root --table employee --m 1 --target-dir /user/test
Warning: /usr/lib/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/12/01 14:15:41 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.0.1
14/12/01 14:15:41 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/12/01 14:15:41 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/12/01 14:15:41 INFO tool.CodeGenTool: Beginning code generation
14/12/01 14:15:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
14/12/01 14:15:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
14/12/01 14:15:42 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/7b8091924ce8deb4f2ccae14c404a5bf/employee.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
………………
14/12/01 14:16:19 INFO mapreduce.Job:  map 100% reduce 0%
14/12/01 14:16:20 INFO mapreduce.Job: Job job_1406097234796_0019 completed successfully
14/12/01 14:16:21 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=99855
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=16
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=8714
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=8714
        Total vcore-seconds taken by all map tasks=8714
        Total megabyte-seconds taken by all map tasks=8923136
    Map-Reduce Framework
        Map input records=2
        Map output records=2
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=58
        CPU time spent (ms)=1560
        Physical memory (bytes) snapshot=183005184
        Virtual memory (bytes) snapshot=704577536
        Total committed heap usage (bytes)=148897792
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=16
14/12/01 14:16:21 INFO mapreduce.ImportJobBase: Transferred 16 bytes in 33.6243 seconds (0.4758 bytes/sec)
14/12/01 14:16:21 INFO mapreduce.ImportJobBase: Retrieved 2 records.

View the result
# hdfs dfs -ls /user/test
Found 2 items
-rw-r--r--   2 root supergroup          0 2014-12-01 14:16 /user/test/_SUCCESS
-rw-r--r--   2 root supergroup         16 2014-12-01 14:16 /user/test/part-m-00000
# hdfs dfs -cat /user/test/part-m-00000
1,michael
2,ted
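As an aside, Sqoop's text import writes one record per line with comma-separated fields by default (the delimiter is configurable with --fields-terminated-by). A quick way to sanity-check such a file with standard tools; the sketch below just recreates the two rows shown above in a temporary file, since it can run anywhere without HDFS:

```shell
# Recreate the imported file locally (same two rows as /user/test/part-m-00000).
f=$(mktemp)
printf '1,michael\n2,ted\n' > "$f"

# Default Sqoop text format: comma between fields, newline between records.
names=$(cut -d, -f2 "$f" | tr '\n' ' ')   # second column: the names
count=$(wc -l < "$f")                     # number of records
echo "records=$count names=$names"
rm -f "$f"
```

The same cut/wc commands work directly on the real file via hdfs dfs -cat.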

MySQL has three rows, yet only two showed up after the import, and I don't know why. If any reader can explain this, please let me know.

If you encounter the following problems
14/12/01 10:12:42 INFO mapreduce.Job: Task Id : attempt_1406097234796_0017_m_000000_0, Status : FAILED
Error: employee : Unsupported major.minor version 51.0
Using ps aux | grep hadoop I could see that hadoop was running on JDK 1.6. My CDH is 5.0.1 and my sqoop is 1.4.4, and that combination hit this problem.
Cause: sqoop is compiled with JDK 1.7, so if ps aux | grep hadoop shows hadoop running on 1.6, sqoop cannot work normally. Note that CDH 4.7 and later are compatible with JDK 1.7, but if you upgraded from 4.5 you will find hadoop still using JDK 1.6; you need to switch the JDK hadoop uses to 1.7, which is also the officially recommended combination.
The official docs give two methods for changing the JDK:
http://www.cloudera.com/content/cloudera/en/documentation/cdh4/latest/CDH4-Requirements-and-Supported-Versions/cdhrsv_topic_3.html
This one has you create a symlink named default under /usr/java pointing at the JDK you want. I did this; it had no effect.
http://www.cloudera.com/content/cloudera/en/documentation/archives/cloudera-manager-4/v4-5-3/Cloudera-Manager-Enterprise-Edition-Installation-Guide/cmeeig_topic_16_2.html
This one has you add an environment variable. I did this too; also no effect.
In the end I used a simple and crude method: stop all the related services, delete the damn JDK 1.6, and restart everything. Now /usr/java/default is the one being used.
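For reference, the /usr/java/default approach from the first link boils down to repointing one symlink. The sketch below simulates it in a temporary directory so it can run without root; the jdk directory names are just examples, and on a real box you would operate on /usr/java itself:

```shell
# Simulate swapping /usr/java/default from JDK 1.6 to JDK 1.7.
root=$(mktemp -d)
mkdir "$root/jdk1.6.0_31" "$root/jdk1.7.0_67"

ln -s "$root/jdk1.6.0_31" "$root/default"     # old state: default -> 1.6
ln -sfn "$root/jdk1.7.0_67" "$root/default"   # repoint: -f replace, -n treat link as link
linked=$(basename "$(readlink "$root/default")")
echo "default now points at $linked"
rm -rf "$root"
```

In my case this alone wasn't enough, which is why I ended up removing JDK 1.6 and restarting everything.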
Stop commands for all hadoop-related services
for x in `cd /etc/init.d ; ls hive-*` ; do sudo service $x stop ; done
for x in `cd /etc/init.d ; ls hbase-*` ; do sudo service $x stop ; done
/etc/init.d/zookeeper-server stop
for x in `cd /etc/init.d ; ls hadoop-*` ; do sudo service $x stop ; done


If zookeeper, hbase, or hive is not installed, just skip it. I recommend running ps aux | grep jre1.6 to see which services are still on the old JDK and shutting them down one by one: other services first, hadoop last.
Start all

for x in `cd /etc/init.d ; ls hadoop-*` ; do sudo service $x start ; done
/etc/init.d/zookeeper-server start
for x in `cd /etc/init.d ; ls hbase-*` ; do sudo service $x start ; done
for x in `cd /etc/init.d ; ls hive-*` ; do sudo service $x start ; done

Export data from hdfs to mysql. For this example, first clear out the employee table:
truncate employee

Export data to mysql
# sqoop export --connect jdbc:mysql://localhost:3306/sqoop_test --username root --password root --table employee --m 1 --export-dir /user/test
Warning: /usr/lib/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/12/01 15:16:50 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.0.1
14/12/01 15:16:50 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/12/01 15:16:51 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/12/01 15:16:51 INFO tool.CodeGenTool: Beginning code generation
14/12/01 15:16:51 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
14/12/01 15:16:52 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
14/12/01 15:16:52 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
……………………
14/12/01 15:17:13 INFO mapreduce.Job:  map 0% reduce 0%
14/12/01 15:17:21 INFO mapreduce.Job: Task Id : attempt_1406097234796_0021_m_000000_0, Status : FAILED
Error: java.io.IOException: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown database 'sqoop_test'
    at org.apache.sqoop.mapreduce.ExportOutputFormat.getRecordWriter(ExportOutputFormat.java:79)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:624)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:744)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown database 'sqoop_test'
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:377)
    at com.mysql.jdbc.Util.getInstance(Util.java:360)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:978)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:870)
    at com.mysql.jdbc.MysqlIO.proceedHandshakeWithPluggableAuthentication(MysqlIO.java:1659)
    at com.mysql.jdbc.MysqlIO.doHandshake(MysqlIO.java:1206)
    at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2234)
    at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2265)
    at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2064)
    at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:790)
    at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:44)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:377)
    at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:395)
    at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:325)
    at java.sql.DriverManager.getConnection(DriverManager.java:571)
    at java.sql.DriverManager.getConnection(DriverManager.java:215)
    at org.apache.sqoop.mapreduce.db.DBConfiguration.getConnection(DBConfiguration.java:302)
    at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.<init>(AsyncSqlRecordWriter.java:76)
    at org.apache.sqoop.mapreduce.ExportOutputFormat$ExportRecordWriter.<init>(ExportOutputFormat.java:95)
    at org.apache.sqoop.mapreduce.ExportOutputFormat.getRecordWriter(ExportOutputFormat.java:77)
    ... 8 more
14/12/01 15:17:29 INFO mapreduce.Job: Task Id : attempt_1406097234796_0021_m_000000_1, Status : FAILED
Error: java.io.IOException: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown database 'sqoop_test'
    (identical stack trace as the first failed attempt)
14/12/01 15:17:40 INFO mapreduce.Job:  map 100% reduce 0%
14/12/01 15:17:41 INFO mapreduce.Job: Job job_1406097234796_0021 completed successfully
14/12/01 15:17:41 INFO mapreduce.Job: Counters: 32
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=99542
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=139
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters
        Failed map tasks=2
        Launched map tasks=3
        Other local map tasks=2
        Rack-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=21200
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=21200
        Total vcore-seconds taken by all map tasks=21200
        Total megabyte-seconds taken by all map tasks=21708800
    Map-Reduce Framework
        Map input records=2
        Map output records=2
        Input split bytes=120
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=86
        CPU time spent (ms)=1330
        Physical memory (bytes) snapshot=177094656
        Virtual memory (bytes) snapshot=686768128
        Total committed heap usage (bytes)=148897792
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=0
14/12/01 15:17:41 INFO mapreduce.ExportJobBase: Transferred 139 bytes in 43.6687 seconds (3.1831 bytes/sec)
14/12/01 15:17:41 INFO mapreduce.ExportJobBase: Exported 2 records.


Finally I checked in mysql: the two records were exported successfully.
mysql> select * from employee;
+----+---------+
| id | name    |
+----+---------+
|  1 | michael |
|  2 | ted     |
+----+---------+
2 rows in set (0.00 sec)

Did you notice the exception reported during the export? Let me point out why:
Hadoop is a distributed system, and map reduce tasks are dispatched to all the nodes. So in a real environment, never write localhost in the mysql JDBC URL; otherwise every task will try to connect to the mysql instance on its own machine, and tasks on any machine other than the one actually hosting your data will fail.
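In other words, build the JDBC URL from a hostname that every node in the cluster can resolve. A tiny sketch (the hostname and values here are made up; substitute your real MySQL host):

```shell
# Hypothetical values -- replace with your actual MySQL host and database.
DB_HOST=mysql01.example.com    # resolvable from every Hadoop node, NOT localhost
DB_PORT=3306
DB_NAME=sqoop_test

CONNECT_URL="jdbc:mysql://${DB_HOST}:${DB_PORT}/${DB_NAME}"
echo "$CONNECT_URL"
# This URL is then what you pass to sqoop as: --connect "$CONNECT_URL"
```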
OK, class dismissed!
