Alex's Hadoop Rookie Tutorial, Part 8: Sqoop1 Installation, Import, and Export
Argh! Sqoop2 has too little documentation and does not support HBase, so I gave up on Sqoop2 and switched back to Sqoop1, which is much simpler anyway. Friends who followed my earlier tutorial, please don't throw bricks at me; I'm just an ordinary, uninformed citizen.
Uninstall sqoop2
This step is optional. If you installed sqoop2 by following my previous tutorial, uninstall it first; if you never installed sqoop2, skip this step.
$ sudo su -
$ service sqoop2-server stop
$ yum -y remove sqoop2-server
$ yum -y remove sqoop2-client
Install Sqoop1
yum install -y sqoop
Use help to test whether the installation succeeded:
# sqoop help
Warning: /usr/lib/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/11/28 11:33:11 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.0.1
usage: sqoop COMMAND [ARGS]
Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information
See 'sqoop help COMMAND' for information on a specific command.
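If you do not use HCatalog or Accumulo, the two warnings above are harmless. If they bother you, one workaround (my own assumption, not an official recommendation) is to point the two variables at empty directories so Sqoop's launcher script stops complaining:

```shell
# Hypothetical workaround: point HCAT_HOME and ACCUMULO_HOME at empty
# directories so the sqoop wrapper script stops printing the warnings.
# Only do this if you really do not use HCatalog/Accumulo.
mkdir -p /tmp/empty-hcatalog /tmp/empty-accumulo
export HCAT_HOME=/tmp/empty-hcatalog
export ACCUMULO_HOME=/tmp/empty-accumulo
```

Add the two exports to your shell profile if you want them to stick.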
Copy the JDBC driver to /usr/lib/sqoop/lib
MySQL JDBC driver
After downloading the package, decompress it, find the driver jar inside, upload it to the server, and move it into place:
mv /home/alex/mysql-connector-java-5.1.34-bin.jar /usr/lib/sqoop/lib
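Before running any Sqoop command against MySQL, it is worth sanity-checking that the connector jar actually landed in Sqoop's lib directory. A small sketch (the `check_mysql_driver` helper is my own; the path is this tutorial's default layout):

```shell
#!/bin/sh
# Check that a MySQL Connector/J jar is present in Sqoop's lib directory.
# Returns 0 if at least one mysql-connector-java-*.jar is found.
check_mysql_driver() {
    lib_dir="$1"
    # ls fails (non-zero) when the glob matches nothing
    ls "$lib_dir"/mysql-connector-java-*.jar >/dev/null 2>&1
}

if check_mysql_driver /usr/lib/sqoop/lib; then
    echo "driver found"
else
    echo "driver missing - sqoop will fail with a ClassNotFoundException"
fi
```

Without the jar, Sqoop dies at runtime with a driver ClassNotFoundException, so failing fast here saves a MapReduce round trip.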
Preparation: create a table in MySQL
CREATE TABLE `employee` (
  `id` int(11) NOT NULL,
  `name` varchar(20) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Insert several data records
insert into employee (id,name) values (1,'michael');
insert into employee (id,name) values (2,'ted');
insert into employee (id,name) values (3,'jack');
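You can double-check the inserts without leaving Sqoop, using the eval command from the help output above (it runs an arbitrary SQL statement and prints the result set). I build the invocation into a variable here so it can be inspected before running; the query itself is my own illustration:

```shell
#!/bin/sh
# sqoop eval executes one SQL statement against the database and prints
# the result; handy for confirming the rows really exist before importing.
# Stored in a variable so you can review it, then run: eval "$SQOOP_EVAL"
SQOOP_EVAL="sqoop eval --connect jdbc:mysql://localhost:3306/sqoop_test \
    --username root -P \
    --query 'SELECT COUNT(*) FROM employee'"
echo "$SQOOP_EVAL"
```

On this tutorial's data the query should report 3 rows.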
Import from MySQL to HDFS. We will not rush into the import itself; first we run a few preparatory commands to warm up, which also makes troubleshooting easier.
List all databases
# sqoop list-databases --connect jdbc:mysql://localhost:3306/sqoop_test --username root --password root
Warning: /usr/lib/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/12/01 09:20:28 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.0.1
14/12/01 09:20:28 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/12/01 09:20:28 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
cacti
metastore
mysql
sqoop_test
wordpress
zabbix
Connect to the database with sqoop and list all tables
# sqoop list-tables --connect jdbc:mysql://localhost/sqoop_test --username root --password root
Warning: /usr/lib/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/11/28 11:46:11 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.0.1
14/11/28 11:46:11 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/11/28 11:46:11 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
employee
student
workers
This command does not need an explicit driver class name, because Sqoop supports MySQL out of the box. The fully spelled-out equivalent is:
# sqoop list-tables --connect jdbc:mysql://localhost/sqoop_test --username root --password root --driver com.mysql.jdbc.Driver
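Notice that every run above logs "Setting your password on the command-line is insecure. Consider using -P instead." Two safer alternatives: -P prompts interactively, and Sqoop 1.4.4 builds also accept --password-file (check `sqoop help import` on your build first; I am assuming your version has it):

```shell
#!/bin/sh
# Option 1: -P makes sqoop prompt for the password, so nothing lands in
# shell history or `ps` output.
# sqoop list-tables --connect jdbc:mysql://localhost/sqoop_test --username root -P

# Option 2: a password file readable only by you, referenced by URI.
# The password "root" is the one used throughout this tutorial.
printf '%s' "root" > "$HOME/.sqoop-pass"
chmod 600 "$HOME/.sqoop-pass"
# sqoop list-tables --connect jdbc:mysql://localhost/sqoop_test \
#     --username root --password-file "file://$HOME/.sqoop-pass"
```

The sqoop invocations are commented out because they need a live cluster; the point is the password handling.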
Import data to hdfs
sqoop import --connect jdbc:mysql://localhost:3306/sqoop_test --username root --password root --table employee --m 1 --target-dir /user/test
# sqoop import --connect jdbc:mysql://localhost:3306/sqoop_test --username root --password root --table employee --m 1 --target-dir /user/test
Warning: /usr/lib/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/12/01 14:15:41 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.0.1
14/12/01 14:15:41 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/12/01 14:15:41 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/12/01 14:15:41 INFO tool.CodeGenTool: Beginning code generation
14/12/01 14:15:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
14/12/01 14:15:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
14/12/01 14:15:42 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/7b8091924ce8deb4f2ccae14c404a5bf/employee.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
………………
14/12/01 14:16:19 INFO mapreduce.Job: map 100% reduce 0%
14/12/01 14:16:20 INFO mapreduce.Job: Job job_1406097234796_0019 completed successfully
14/12/01 14:16:21 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=99855
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=16
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=8714
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=8714
		Total vcore-seconds taken by all map tasks=8714
		Total megabyte-seconds taken by all map tasks=8923136
	Map-Reduce Framework
		Map input records=2
		Map output records=2
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=58
		CPU time spent (ms)=1560
		Physical memory (bytes) snapshot=183005184
		Virtual memory (bytes) snapshot=704577536
		Total committed heap usage (bytes)=148897792
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=16
14/12/01 14:16:21 INFO mapreduce.ImportJobBase: Transferred 16 bytes in 33.6243 seconds (0.4758 bytes/sec)
14/12/01 14:16:21 INFO mapreduce.ImportJobBase: Retrieved 2 records.
View the result:
# hdfs dfs -ls /user/test
Found 2 items
-rw-r--r--   2 root supergroup          0 2014-12-01 14:16 /user/test/_SUCCESS
-rw-r--r--   2 root supergroup         16 2014-12-01 14:16 /user/test/part-m-00000
# hdfs dfs -cat /user/test/part-m-00000
1,michael
2,ted
One puzzle: MySQL has three records, but only two showed up after the import. I don't know why; if any passing expert can explain, please leave a comment.
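A quick way to catch a mismatch like this is to compare the MySQL row count with the number of lines Sqoop wrote (each imported row becomes exactly one line in the part-m-* files). Below is a sketch; the `counts_match` helper is mine, the live mysql/hdfs commands are commented out because they need a running cluster, and the two hard-coded numbers are what this tutorial actually observed:

```shell
#!/bin/sh
# Compare the source row count against the number of imported lines.
counts_match() {
    [ "$1" -eq "$2" ]
}

# On a live cluster you would gather the two numbers like this
# (requires mysql and hdfs on the PATH):
# mysql_count=$(mysql -uroot -proot -N -e "SELECT COUNT(*) FROM sqoop_test.employee")
# hdfs_count=$(hdfs dfs -cat /user/test/part-m-* | wc -l)

mysql_count=3   # what the three INSERTs above put into MySQL
hdfs_count=2    # lines actually found in /user/test/part-m-00000

if counts_match "$mysql_count" "$hdfs_count"; then
    echo "import complete"
else
    echo "mismatch: mysql=$mysql_count hdfs=$hdfs_count"
    # prints: mismatch: mysql=3 hdfs=2
fi
```

One possible cause worth ruling out (an assumption on my part): the import job was launched before the last insert was actually committed.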
If you encounter the following problem:
14/12/01 10:12:42 INFO mapreduce.Job: Task Id : attempt_1406097234796_0017_m_000000_0, Status : FAILED
Error: employee : Unsupported major.minor version 51.0
Running ps aux | grep hadoop showed that Hadoop was using JDK 1.6. My CDH is 5.0.1 and my Sqoop is 1.4.4, and I hit exactly this problem.
Cause: this Sqoop build is compiled with JDK 1.7, so if ps aux | grep hadoop shows Hadoop running on 1.6, Sqoop cannot work properly.
Note: CDH 4.7 and later are compatible with JDK 1.7. If you upgraded from 4.5, you will find Hadoop still running on JDK 1.6; you need to switch Hadoop over to JDK 1.7, which is the officially recommended combination.
The official documentation provides two ways to change the JDK: http://www.cloudera.com/content/cloudera/en/documentation/cdh4/latest/CDH4-Requirements-and-Supported-Versions/cdhrsv_topic_3.html
The first has you create a symlink named default under /usr/java/ pointing at the JDK you want. I did this; it had no effect. http://www.cloudera.com/content/cloudera/en/documentation/archives/cloudera-manager-4/v4-5-3/Cloudera-Manager-Enterprise-Edition-Installation-Guide/cmeeig_topic_16_2.html
The second has you add an environment variable. I did this too, and it was also ineffective. In the end I used a simple, crude method: stop all the related services, delete the damned JDK 1.6, and restart everything. Now /usr/java/default is what actually gets used.
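Whichever route you take, you can verify which JDK /usr/java/default really resolves to before restarting anything. A small sketch (the `resolve_jdk` helper is mine; readlink -f is GNU coreutils, present on the CentOS-style systems this tutorial targets):

```shell
#!/bin/sh
# Print the real JDK directory behind a symlink such as /usr/java/default.
resolve_jdk() {
    readlink -f "$1"
}

# `|| true` keeps the script going when the symlink does not exist.
jdk_dir=$(resolve_jdk /usr/java/default 2>/dev/null || true)
echo "JDK in use: ${jdk_dir:-not found}"
# Then confirm the version string shows 1.7.x, e.g.:
# "$jdk_dir/bin/java" -version 2>&1 | head -1
```

If this still prints a jdk1.6 path after your change, Hadoop will keep launching on 1.6 and the Unsupported major.minor error will persist.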
Stop commands for all hadoop-related services
for x in `cd /etc/init.d ; ls hive-*` ; do sudo service $x stop ; done
for x in `cd /etc/init.d ; ls hbase-*` ; do sudo service $x stop ; done
/etc/init.d/zookeeper-server stop
for x in `cd /etc/init.d ; ls hadoop-*` ; do sudo service $x stop ; done
Skip the Zookeeper, HBase, or Hive commands if those services are not installed. I recommend running ps aux | grep jre1.6 to see exactly which services are still on the old JDK, then shutting them down one by one: the other services first, Hadoop last.
Start all
for x in `cd /etc/init.d ; ls hadoop-*` ; do sudo service $x start ; done
/etc/init.d/zookeeper-server start
for x in `cd /etc/init.d ; ls hbase-*` ; do sudo service $x start ; done
for x in `cd /etc/init.d ; ls hive-*` ; do sudo service $x start ; done
Export data from HDFS to MySQL. For this example, first clear out the employee table:
truncate table employee;
Export the data to MySQL:
# sqoop export --connect jdbc:mysql://localhost:3306/sqoop_test --username root --password root --table employee --m 1 --export-dir /user/test
Warning: /usr/lib/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/12/01 15:16:50 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.0.1
14/12/01 15:16:50 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/12/01 15:16:51 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/12/01 15:16:51 INFO tool.CodeGenTool: Beginning code generation
14/12/01 15:16:51 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
14/12/01 15:16:52 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
14/12/01 15:16:52 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
……………………
14/12/01 15:17:13 INFO mapreduce.Job: map 0% reduce 0%
14/12/01 15:17:21 INFO mapreduce.Job: Task Id : attempt_1406097234796_0021_m_000000_0, Status : FAILED
Error: java.io.IOException: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown database 'sqoop_test'
	at org.apache.sqoop.mapreduce.ExportOutputFormat.getRecordWriter(ExportOutputFormat.java:79)
	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:624)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:744)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown database 'sqoop_test'
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at com.mysql.jdbc.Util.handleNewInstance(Util.java:377)
	at com.mysql.jdbc.Util.getInstance(Util.java:360)
	at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:978)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:870)
	at com.mysql.jdbc.MysqlIO.proceedHandshakeWithPluggableAuthentication(MysqlIO.java:1659)
	at com.mysql.jdbc.MysqlIO.doHandshake(MysqlIO.java:1206)
	at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2234)
	at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2265)
	at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2064)
	at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:790)
	at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:44)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at com.mysql.jdbc.Util.handleNewInstance(Util.java:377)
	at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:395)
	at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:325)
	at java.sql.DriverManager.getConnection(DriverManager.java:571)
	at java.sql.DriverManager.getConnection(DriverManager.java:215)
	at org.apache.sqoop.mapreduce.db.DBConfiguration.getConnection(DBConfiguration.java:302)
	at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.<init>(AsyncSqlRecordWriter.java:76)
	at org.apache.sqoop.mapreduce.ExportOutputFormat$ExportRecordWriter.<init>(ExportOutputFormat.java:95)
	at org.apache.sqoop.mapreduce.ExportOutputFormat.getRecordWriter(ExportOutputFormat.java:77)
	... 8 more
14/12/01 15:17:29 INFO mapreduce.Job: Task Id : attempt_1406097234796_0021_m_000000_1, Status : FAILED
Error: java.io.IOException: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown database 'sqoop_test'
	(same stack trace as above)
14/12/01 15:17:40 INFO mapreduce.Job: map 100% reduce 0%
14/12/01 15:17:41 INFO mapreduce.Job: Job job_1406097234796_0021 completed successfully
14/12/01 15:17:41 INFO mapreduce.Job: Counters: 32
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=99542
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=139
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=0
	Job Counters
		Failed map tasks=2
		Launched map tasks=3
		Other local map tasks=2
		Rack-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=21200
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=21200
		Total vcore-seconds taken by all map tasks=21200
		Total megabyte-seconds taken by all map tasks=21708800
	Map-Reduce Framework
		Map input records=2
		Map output records=2
		Input split bytes=120
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=86
		CPU time spent (ms)=1330
		Physical memory (bytes) snapshot=177094656
		Virtual memory (bytes) snapshot=686768128
		Total committed heap usage (bytes)=148897792
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=0
14/12/01 15:17:41 INFO mapreduce.ExportJobBase: Transferred 139 bytes in 43.6687 seconds (3.1831 bytes/sec)
14/12/01 15:17:41 INFO mapreduce.ExportJobBase: Exported 2 records.
Finally, I checked in MySQL: the two records were exported successfully.
mysql> select * from employee;
+----+---------+
| id | name    |
+----+---------+
|  1 | michael |
|  2 | ted     |
+----+---------+
2 rows in set (0.00 sec)
Did you notice that exceptions were reported during the export? Let me remind you why:
Hadoop is a distributed system, and map tasks are dispatched to every node for execution. So in a real environment, do not write localhost in the MySQL JDBC URL. Each node would then try to connect to a MySQL server on itself: only the task that happens to run on the machine hosting MySQL succeeds, while tasks on the other machines fail, exactly like the Unknown database errors above.
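A simple guard is to rewrite localhost in the JDBC URL to the database's real host name before submitting the job. A sketch (the `fix_jdbc_url` helper and the example host name are mine):

```shell
#!/bin/sh
# Replace localhost/127.0.0.1 in a JDBC URL with the real database host,
# so map tasks on every node connect to the same MySQL server.
fix_jdbc_url() {
    url="$1"
    db_host="$2"
    echo "$url" | sed -e "s/localhost/$db_host/" -e "s/127\.0\.0\.1/$db_host/"
}

fix_jdbc_url "jdbc:mysql://localhost:3306/sqoop_test" "mysql01.example.com"
# prints: jdbc:mysql://mysql01.example.com:3306/sqoop_test
```

Use the rewritten URL as the --connect argument; the host name must of course be resolvable from every worker node.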
Good, class dismissed!