Alex 的 Hadoop 菜鳥教程: 第7課 Sqoop2 匯入教程,hadoopsqoop2

來源:互聯網
上載者:User

Alex 的 Hadoop 菜鳥教程: 第7課 Sqoop2 匯入教程,hadoopsqoop2

 具體的安裝和jdbc的驅動準備你們看第6課。現在我用一個例子講解sqoop2的具體使用方法

資料準備

有一個mysql的表叫worker,裡面有三條資料,我們要將其匯入hadoop

這是建表語句

CREATE TABLE `workers` (    `id` int(11) NOT NULL AUTO_INCREMENT,    `name` varchar(20) NOT NULL,    PRIMARY KEY (`id`)  ) ENGINE=MyISAM  DEFAULT CHARSET=utf8; 

插入三條資料

insert into workers (name) values ('jack');insert into workers (name) values ('vicky');insert into workers (name) values ('martin');

接下來我們使用sqoop用戶端進行匯入工作

匯入資料建立資料庫連接

$ sqoop2Sqoop home directory: /usr/lib/sqoop2Sqoop Shell: Type 'help' or '\h' for help.sqoop:000> create connection --cid 1

這句話的意思是建立一個id為1的串連,然後sqoop會讓你輸入一些必要參數

Creating connection for connector with id 1Please fill following values to create new connection objectName: First connectionConfiguration configurationJDBC Driver Class: com.mysql.jdbc.DriverJDBC Connection String: jdbc:mysql://mysql.server/databaseUsername: sqoopPassword: *****JDBC Connection Properties:There are currently 0 values in the map:entry#Security related configuration optionsMax connections: 0New connection was successfully created with validation status FINE and persistent id 1

記得把  jdbc:mysql://mysql.server/database  替換成你真實的資料庫連接

建立job建立一個id為1的job,類型是 import
sqoop:000> create job --xid 1 --type import

接下來sqoop會讓你輸入需要的參數,只需要輸入job的名字和Table name就好了,還有幾個儲存選項都選0,其他直接斷行符號
Creating job for connection with id 1Please fill following values to create new job objectName: First jobDatabase configurationTable name: workersTable SQL statement:Table column names:Partition column name:Boundary query:Output configurationStorage type:  0 : HDFSChoose: 0Output format:   0 : TEXT_FILE  1 : SEQUENCE_FILEChoose: 0Compression format:   0 : NONE  1 : DEFAULT  2 : DEFLATE  3 : GZIP  4 : BZIP2  5 : LZO  6 : LZ4  7 : SNAPPYChoose: 0Output directory: /user/jarcec/usersNew job was successfully created with validation status FINE and persistent id 1

執行任務用start job命令去執行這個任務,用--jid來傳入任務id
sqoop:000> start job --jid 1Submission detailsJob ID: 1Server URL: http://localhost:12000/sqoop/Created by: rootCreation date: 2014-11-26 16:41:30 CSTLastly updated by: rootExternal ID: job_1406097234796_0006N/A2014-11-26 16:41:30 CST: BOOTING  - Progress is not available

檢查結果再開啟一個ssh終端,然後用hdfs的命令查看結果
$ hdfs dfs -ls /user/jarcec/workers/Found 3 items-rw-r--r--   2 sqoop2 supergroup          0 2014-11-26 16:42 /user/jarcec/workers/_SUCCESS-rw-r--r--   2 sqoop2 supergroup          9 2014-11-26 16:41 /user/jarcec/workers/part-m-00000-rw-r--r--   2 sqoop2 supergroup         21 2014-11-26 16:42 /user/jarcec/workers/part-m-00001

可以看到有三個結果檔案被產生,然後我們cat看下檔案的內容
$ hdfs dfs -cat /user/jarcec/workers/part-m-000001,'jack'$ hdfs dfs -cat /user/jarcec/workers/part-m-000012,'vicky'3,'martin'

今天寫到這裡,下節課講講匯出



相關文章

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.