Alex's Hadoop Tutorial for Beginners, Lesson 8: Importing into HBase and Hive with Sqoop1


Continuing the series. Honestly, importing and exporting between MySQL and HDFS is of little use in real project work, but it makes a good introduction. Today we cover how Sqoop works with HBase and Hive. I just realized the ordering of this tutorial series is rather messy: I never covered installing Hive first. My apologies for that; I will make up for it in a later lesson.

Data preparation

MySQL

Create the employee table in MySQL and insert a few rows:
CREATE TABLE `employee` (
  `id` int(11) NOT NULL,
  `name` varchar(20) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

insert into employee (id,name) values (1,'michael');
insert into employee (id,name) values (2,'ted');
insert into employee (id,name) values (3,'jack');
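Before running Sqoop, it is worth sanity-checking the table from the mysql client; with the three rows above you should see:

select * from employee;
+----+---------+
| id | name    |
+----+---------+
|  1 | michael |
|  2 | ted     |
|  3 | jack    |
+----+---------+
3 rows in set (0.00 sec)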

HBase

hbase(main):006:0> create 'employee','info'
0 row(s) in 0.4440 seconds
=> Hbase::Table - employee

Hive

Hive needs no data preparation; the --create-hive-table flag we use later will create the table automatically.
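For reference, the table that flag generates for this schema should look roughly like the following. This is a sketch based on Sqoop's default MySQL-to-Hive type mapping and its default ^A (\001) field delimiter; the exact DDL can vary between Sqoop versions:

CREATE TABLE hive_employee (
  id INT,
  name STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;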

Importing from MySQL into HBase
# sqoop import --connect jdbc:mysql://localhost:3306/sqoop_test --username root --password root --table employee --hbase-table employee --column-family info --hbase-row-key id -m 1
Warning: /usr/lib/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/12/01 17:36:25 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.0.1
14/12/01 17:36:25 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/12/01 17:36:25 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/12/01 17:36:25 INFO tool.CodeGenTool: Beginning code generation
14/12/01 17:36:26 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
14/12/01 17:36:26 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
14/12/01 17:36:26 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
... [intermediate log output omitted] ...
14/12/01 17:37:12 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 37.3924 seconds (0 bytes/sec)
14/12/01 17:37:12 INFO mapreduce.ImportJobBase: Retrieved 3 records.
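The command is easier to read split across lines. Note the WARN line above about passing the password on the command line; Sqoop's -P option prompts for it interactively instead:

sqoop import \
    --connect jdbc:mysql://localhost:3306/sqoop_test \
    --username root -P \
    --table employee \
    --hbase-table employee \
    --column-family info \
    --hbase-row-key id \
    -m 1

Here --hbase-table and --column-family name the target table and column family we created above, --hbase-row-key selects the MySQL column to use as the HBase row key, and -m 1 runs the import with a single map task (fine for 3 rows; use more for real tables).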


Now check HBase:
hbase(main):001:0> scan 'employee'
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
ROW    COLUMN+CELL
 1     column=info:name, timestamp=1417426628685, value=michael
 2     column=info:name, timestamp=1417426628685, value=ted
 3     column=info:name, timestamp=1417426628685, value=jack
3 row(s) in 0.1630 seconds

All 3 rows were imported successfully.

Importing from MySQL into Hive
# sqoop import --connect jdbc:mysql://localhost:3306/sqoop_test --username root --password root --table employee --hive-import --hive-table hive_employee --create-hive-table
Warning: /usr/lib/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
... [intermediate log output omitted] ...
14/12/02 15:12:13 INFO hive.HiveImport: Loading data to table default.hive_employee
14/12/02 15:12:14 INFO hive.HiveImport: Table default.hive_employee stats: [num_partitions: 0, num_files: 4, num_rows: 0, total_size: 23, raw_data_size: 0]
14/12/02 15:12:14 INFO hive.HiveImport: OK
14/12/02 15:12:14 INFO hive.HiveImport: Time taken: 0.799 seconds
14/12/02 15:12:14 INFO hive.HiveImport: Hive import complete.
14/12/02 15:12:14 INFO hive.HiveImport: Export directory is empty, removing it.
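Split across lines, the Hive variant differs from the HBase one only in its last three options:

sqoop import \
    --connect jdbc:mysql://localhost:3306/sqoop_test \
    --username root -P \
    --table employee \
    --hive-import \
    --hive-table hive_employee \
    --create-hive-table

--hive-import loads the imported files into the Hive warehouse, --hive-table names the target table, and --create-hive-table tells Sqoop to create that table first (the job fails if it already exists).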

A note for real environments: do not use localhost in the MySQL JDBC URL. The import job is shipped to different Hadoop nodes in the cluster, and every one of those nodes must genuinely be able to reach MySQL over JDBC; with localhost each node would try to connect to itself, and you will lose data. See the example below.
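For example, using a hypothetical database host db-server.example.com that every worker node can resolve and reach on port 3306:

sqoop import \
    --connect jdbc:mysql://db-server.example.com:3306/sqoop_test \
    --username root -P \
    --table employee \
    --hive-import \
    --hive-table hive_employee \
    --create-hive-table

The MySQL account used must also be granted permission to connect from those worker hosts, not just from the database machine itself.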
Now check Hive:
hive> select * from hive_employee;
OK
1	michael
2	ted
3	jack
Time taken: 0.179 seconds, Fetched: 3 row(s)

One more caveat: at the moment Sqoop can only import data from MySQL into native Hive tables (that is, tables stored on HDFS); it cannot import into external tables, such as a Hive table built on top of HBase. A sketch of such a table follows.
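To make the distinction concrete, an HBase-backed Hive table looks roughly like the sketch below (standard Hive/HBase integration syntax via HBaseStorageHandler, mapping the info:name column of our employee table); a table like this cannot be the target of a Sqoop import:

CREATE EXTERNAL TABLE hbase_employee (id INT, name STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,info:name')
TBLPROPERTIES ('hbase.table.name' = 'employee');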
Class dismissed! Next lesson: exporting!
