使用sqoop將mysql資料匯入到hadoop,sqoophadoop

來源:互聯網
上載者:User

使用sqoop將mysql資料匯入到hadoop,sqoophadoop

hadoop的安裝配置這裡就不講了。

Sqoop的安裝也很簡單。完成sqoop的安裝後,可以這樣測試是否可以串連到mysql(注意:mysql的jar包要放到 SQOOP_HOME/lib 下):sqoop list-databases --connect jdbc:mysql://192.168.1.109:3306/ --username root --password 19891231結果如下即說明sqoop已經可以正常使用了。 下面,要將mysql中的資料匯入到hadoop中。我準備的是一個300萬條資料的身份證資料表: 先啟動hive(使用命令列:hive 即可啟動) 然後使用sqoop匯入資料到hive:sqoop import --connect jdbc:mysql://192.168.1.109:3306/hadoop --username root --password 19891231 --table test_sfz --hive-import sqoop 會啟動job來完成匯入工作。 完成匯入用了2分20秒,還是不錯的。 在hive中可以看到剛剛匯入的資料表: 我們來一句sql測試一下資料:select * from test_sfz where id < 10;可以看到,hive完成這個任務用了將近25秒,確實是挺慢的(在mysql中幾乎是不費時間),但是要考慮到hive是建立了job在hadoop中跑,時間當然多。 接下來,我們會對這些資料進行複雜查詢的測試:我機子的配置如下:hadoop 是運行在虛擬機器上的偽分布式,虛擬機器OS是ubuntu12.04 64位,配置如下:TEST 1 計算平均年齡測試資料:300.8 W1. 計算廣東的平均年齡mysql:select (sum(year(NOW()) - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz where address like '廣東%';用時: 0.877s hive:select (sum(year('2014-10-01') - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz where address like '廣東%';用時:25.012s 2. 對每個城市的的平均年齡進行從高到低的排序mysql:select address, (sum(year(NOW()) - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz GROUP BY address order by ageAvge desc;用時:2.949s hive:select address, (sum(year('2014-10-01') - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz GROUP BY address order by ageAvge desc;用時:51.29s 可以看到,在耗時上面,hive的增長速度較mysql慢。 TEST 2測試資料:1200Wmysql 引擎: MyISAM(為了加快查詢速度)匯入到hive: 1. 計算廣東的平均年齡mysql:select (sum(year(NOW()) - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 where address like '廣東%';用時: 5.642s hive:select (sum(year('2014-10-01') - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 where address like '廣東%';用時:168.259s 2. 對每個城市的的平均年齡進行從高到低的排序mysql:select address, (sum(year(NOW()) - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 GROUP BY address order by ageAvge desc;用時:11.964s hive:select address, (sum(year('2014-10-01') - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 GROUP BY address order by ageAvge desc;用時:311.714s TEST 3測試資料:2000Wmysql 引擎: MyISAM(為了加快查詢速度)匯入到hive:(這次用的時間很短!可能是因為TEST2中的匯入時,我的主機在做其他耗資源的工作..) 1. 計算廣東的平均年齡mysql:select (sum(year(NOW()) - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 where address like '廣東%';用時: 6.605s hive:select (sum(year('2014-10-01') - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 where address like '廣東%';用時:188.206s 2. 對每個城市的的平均年齡進行從高到低的排序mysql:select address, (sum(year(NOW()) - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 GROUP BY address order by ageAvge desc;用時:19.926s hive:select address, (sum(year('2014-10-01') - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 GROUP BY address order by ageAvge desc;用時:411.816s



來自為知筆記(Wiz)



相關文章

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.