Alex's Hadoop Beginner Tutorial: Part 7, Sqoop2 Import
For the installation and JDBC driver preparation, see Part 6. Now let's walk through an example of how to use Sqoop2.
Data Preparation
There is a MySQL table named workers containing three rows, which we want to import into Hadoop.
Here is the table creation statement:
CREATE TABLE `workers` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(20) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Insert three rows:
insert into workers (name) values ('jack');
insert into workers (name) values ('vicky');
insert into workers (name) values ('martin');
Next, we use the sqoop client to import the data.
Create a database connection
$ sqoop2
Sqoop home directory: /usr/lib/sqoop2
Sqoop Shell: Type 'help' or '\h' for help.
sqoop:000> create connection --cid 1
This command creates a connection using the connector with id 1 (--cid); sqoop then prompts you for the necessary parameters:
Creating connection for connector with id 1
Please fill following values to create new connection object
Name: First connection
Configuration configuration
JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://mysql.server/database
Username: sqoop
Password: *****
JDBC Connection Properties:
There are currently 0 values in the map:
entry#
Security related configuration options
Max connections: 0
New connection was successfully created with validation status FINE and persistent id 1
Remember to replace jdbc:mysql://mysql.server/database with your actual database connection string.
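The connection string follows the standard JDBC URL pattern jdbc:mysql://host[:port]/database. As an illustration (this helper and its names are our own, not part of Sqoop or the MySQL driver), a small sketch that assembles one:

```python
def jdbc_mysql_url(host, database, port=3306):
    """Assemble a MySQL JDBC connection string of the form
    jdbc:mysql://host:port/database.
    Illustrative helper only, not part of Sqoop or the MySQL driver."""
    return f"jdbc:mysql://{host}:{port}/{database}"

print(jdbc_mysql_url("mysql.server", "database"))
# jdbc:mysql://mysql.server:3306/database
```

If you omit the port, MySQL's default 3306 is assumed here; adjust the host and database name to match your own server.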
Create a job. The following creates an import job that uses the connection with id 1 (--xid is the connection id):
sqoop:000> create job --xid 1 --type import
Next, sqoop prompts for the required parameters. Enter a job name and the table name, choose 0 for each of the storage options, and press ENTER through the rest:
Creating job for connection with id 1
Please fill following values to create new job object
Name: First job
Database configuration
Table name: workers
Table SQL statement:
Table column names:
Partition column name:
Boundary query:
Output configuration
Storage type:
  0 : HDFS
Choose: 0
Output format:
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
Choose: 0
Compression format:
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
Choose: 0
Output directory: /user/jarcec/users
New job was successfully created with validation status FINE and persistent id 1
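Behind the scenes, Sqoop splits the table across map tasks by dividing the range of the partition column (here the primary key id, since no partition column was specified) into contiguous intervals, one per mapper. A rough sketch of the idea (a simplification for illustration, not Sqoop's actual splitter code):

```python
def split_ranges(lo, hi, num_splits):
    """Divide the integer interval [lo, hi] into num_splits contiguous
    ranges, one per map task. Simplified sketch of how Sqoop partitions
    a table on an integer column; not the actual Sqoop implementation."""
    step = (hi - lo) / num_splits
    ranges = []
    start = lo
    for i in range(1, num_splits + 1):
        # Last range ends exactly at hi; earlier ones at rounded step points.
        end = hi if i == num_splits else lo + round(i * step)
        ranges.append((start, end))
        start = end
    return ranges

# With ids 1..3 and two map tasks, one mapper covers roughly id 1
# and the other ids 2-3, which is why two part files appear later.
print(split_ranges(1, 3, 2))
# [(1, 2), (2, 3)]
```

Each mapper then issues its own SELECT restricted to its range, so the mappers can run in parallel without overlapping rows.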
Run the start job command to execute the task, passing the job id with --jid:
sqoop:000> start job --jid 1
Submission details
Job ID: 1
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2014-11-26 16:41:30 CST
Lastly updated by: root
External ID: job_1406097234796_0006
N/A
2014-11-26 16:41:30 CST: BOOTING - Progress is not available
Open an SSH terminal and run the hdfs command to view the result:
$ hdfs dfs -ls /user/jarcec/workers/
Found 3 items
-rw-r--r--   2 sqoop2 supergroup          0 2014-11-26 16:42 /user/jarcec/workers/_SUCCESS
-rw-r--r--   2 sqoop2 supergroup          9 2014-11-26 16:41 /user/jarcec/workers/part-m-00000
-rw-r--r--   2 sqoop2 supergroup         21 2014-11-26 16:42 /user/jarcec/workers/part-m-00001
We can see that two part files were generated (one per map task), along with an empty _SUCCESS marker. Now cat the part files to view their contents:
$ hdfs dfs -cat /user/jarcec/workers/part-m-00000
1,'jack'
$ hdfs dfs -cat /user/jarcec/workers/part-m-00001
2,'vicky'
3,'martin'
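Each line of the TEXT_FILE output is one comma-separated record. If you pull the files down locally, a few lines of Python are enough to turn them back into rows. This is an illustrative sketch assuming the two-column layout shown above (integer key, string column quoted with single quotes); the helper name is our own:

```python
def parse_part_file(text):
    """Parse Sqoop TEXT_FILE output lines like "1,'jack'" into
    (id, name) tuples. Illustrative sketch, assuming a two-column
    table with an integer key and a single-quoted string column."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        id_str, name = line.split(",", 1)
        rows.append((int(id_str), name.strip("'")))
    return rows

print(parse_part_file("1,'jack'"))               # [(1, 'jack')]
print(parse_part_file("2,'vicky'\n3,'martin'"))  # [(2, 'vicky'), (3, 'martin')]
```

For real pipelines you would read these files through Hive, Spark, or a proper CSV reader rather than hand-rolled parsing, but this shows the record format Sqoop wrote.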
That's all for today. Next lesson: export.