使用sqoop將mysql資料匯入到hadoop，sqoophadoop

最後更新：2014-12-09 來源：互聯網

上載者：User

創建阿里雲帳戶，並獲得超過 40 款產品的免費試用版；而企業帳戶則可以享有總值 $1200 的免費試用版。立即註冊！

hadoop的安裝配置這裡就不講了。

Sqoop的安裝也很簡單。完成sqoop的安裝後，可以這樣測試是否可以串連到mysql(注意：mysql的jar包要放到 SQOOP_HOME/lib 下)：sqoop list-databases --connect jdbc:mysql://192.168.1.109:3306/ --username root --password 19891231結果如下即說明sqoop已經可以正常使用了。下面，要將mysql中的資料匯入到hadoop中。我準備的是一個300萬條資料的身份證資料表：先啟動hive（使用命令列：hive 即可啟動）然後使用sqoop匯入資料到hive：sqoop import --connect jdbc:mysql://192.168.1.109:3306/hadoop --username root --password 19891231 --table test_sfz --hive-import sqoop 會啟動job來完成匯入工作。完成匯入用了2分20秒，還是不錯的。在hive中可以看到剛剛匯入的資料表：我們來一句sql測試一下資料：select * from test_sfz where id < 10;可以看到，hive完成這個任務用了將近25秒，確實是挺慢的（在mysql中幾乎是不費時間），但是要考慮到hive是建立了job在hadoop中跑，時間當然多。接下來，我們會對這些資料進行複雜查詢的測試：我機子的配置如下：hadoop 是運行在虛擬機器上的偽分布式，虛擬機器OS是ubuntu12.04 64位，配置如下：TEST 1 計算平均年齡測試資料：300.8 W1. 計算廣東的平均年齡mysql：select (sum(year(NOW()) - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz where address like '廣東%';用時： 0.877s hive：select (sum(year('2014-10-01') - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz where address like '廣東%';用時：25.012s 2. 對每個城市的的平均年齡進行從高到低的排序mysql：select address, (sum(year(NOW()) - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz GROUP BY address order by ageAvge desc;用時：2.949s hive：select address, (sum(year('2014-10-01') - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz GROUP BY address order by ageAvge desc;用時：51.29s 可以看到，在耗時上面，hive的增長速度較mysql慢。 TEST 2測試資料：1200Wmysql 引擎： MyISAM（為了加快查詢速度）匯入到hive： 1. 計算廣東的平均年齡mysql：select (sum(year(NOW()) - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 where address like '廣東%';用時： 5.642s hive：select (sum(year('2014-10-01') - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 where address like '廣東%';用時：168.259s 2. 對每個城市的的平均年齡進行從高到低的排序mysql：select address, (sum(year(NOW()) - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 GROUP BY address order by ageAvge desc;用時：11.964s hive：select address, (sum(year('2014-10-01') - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 GROUP BY address order by ageAvge desc;用時：311.714s TEST 3測試資料：2000Wmysql 引擎： MyISAM（為了加快查詢速度）匯入到hive：（這次用的時間很短！可能是因為TEST2中的匯入時，我的主機在做其他耗資源的工作..） 1. 計算廣東的平均年齡mysql：select (sum(year(NOW()) - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 where address like '廣東%';用時： 6.605s hive：select (sum(year('2014-10-01') - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 where address like '廣東%';用時：188.206s 2. 對每個城市的的平均年齡進行從高到低的排序mysql：select address, (sum(year(NOW()) - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 GROUP BY address order by ageAvge desc;用時：19.926s hive：select address, (sum(year('2014-10-01') - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 GROUP BY address order by ageAvge desc;用時：411.816s

來自為知筆記(Wiz)

本文章原先以中文撰寫並發佈於 aliyun.com，亦設英文版本，僅作資訊用途。本網站不對文章的準確性，完整性或可靠性或其任何翻譯作出任何明示或暗示的陳述或保證。如對該文章有任何疑慮或投訴，請傳送電郵至 info-contact@alibabacloud.com 並提供相關疑慮或投訴的詳細說明。職員會於 5 個工作天內與您聯絡，一經驗證之後，即會刪除該侵權內容。

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

使用sqoop將mysql資料匯入到hadoop，sqoophadoop

聯繫我們

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support