Using Sqoop to import MySQL data into Hadoop

Source: Internet
Author: User
Tags sqoop

The installation configuration of Hadoop is not spoken here.

The installation of Sqoop is also very simple. After you complete the installation of SQOOP, you can test if you can connect to MySQL (note: The MySQL Jar pack is to be placed under Sqoop_home/lib): SQOOP list-databases--connect jdbc:mysql:// 192.168.1.109:3306/--username Root--password 19891231 The result is as follows that Sqoop can be used normally. Below, you will import data from MySQL into Hadoop. I'm going to have a 3 million-piece ID data sheet: Start hive (using the command line: Hive to start) and then use Sqoop to import the data to Hive:sqoop import--connect jdbc:mysql:// 192.168.1.109:3306/hadoop--username root--password 19891231--table test_sfz--hive-import Sqoop will start the job to complete the import. It took 2 minutes and 20 seconds to complete the import, and it was good. You can see the data table you just imported in hive: Let's test the data with a SQL: SELECT * from TEST_SFZ where ID < 10; As you can see, Hive completed this task for nearly 25 seconds, It's really slow (almost no time in MySQL), but take into account that hive is creating a job to run in Hadoop, of course much time. Next, we test the data for complex queries: My machine is configured as follows: Hadoop is a pseudo-distributed operation on a virtual machine, and the virtual machine OS is ubuntu12.04 64-bit, configured as follows: Test 1 calculates the average age testing data: 300.8 W1. Calculate the average age of Guangdong MySQL:Select (Sum (now ())-SUBSTRING (borth,1,4))/count (*)) as Ageavge from TEST_SFZ where address like ' Guangdong ';Spents: 0.877s hive:select (SUM (Year (' 2014-10-01 ')-SUBSTRING (borth,1,4))/count (*)) as Ageavge from TEST_SFZ where address Li Ke ' guangdong% '; spents: 25.012s 2. The average age for each city is sorted from high to low for MySQL:Select Address, (SUM (now ())-SUBSTRING (borth,1,4))/count (*)) as Ageavge from TEST_SFZ GROUP by address order by Ageavge desc;spents: 2.949sHive:Select Address, (SUM (Year (' 2014-10-01 ')-SUBSTRING (borth,1,4))/count (*)) as Ageavge from TEST_SFZ GROUP by address order by Ageavge desc; spents: 51.29sAs you can see, hive is growing at a slower rate than MySQL in time-consuming. TEST 2test data: 1200WMySQL Engine: MyISAM (to speed up queries)Import to Hive:1. Calculate the average age of Guangdong MySQL:Select (Sum (now ())-SUBSTRING (borth,1,4))/count (*)) as Ageavge from test_sfz2 where address like ' Guangdong ';Spents: 5.642s hive:select (SUM (Year (' 2014-10-01 ')-SUBSTRING (borth,1,4))/count (*)) as Ageavge from TEST_SFZ2 where address L Ike ' Guangdong% '; spents: 168.259s 2. The average age for each city is sorted from high to low for MySQL:Select Address, (SUM (now ())-SUBSTRING (borth,1,4))/count (*)) as Ageavge from TEST_SFZ2 GROUP by address order by Ageavge desc;spents: 11.964sHive:Select Address, (SUM (Year (' 2014-10-01 ')-SUBSTRING (borth,1,4))/count (*)) as Ageavge from TEST_SFZ2 GROUP by address order by Ageavge desc; spents: 311.714s TEST 3test data: 2000WMySQL Engine: MyISAM (to speed up queries)Import to Hive: (This time is very short!) Maybe because of the import in TEST2, my host is doing other resource-consuming work. ) 1. Calculate the average age of Guangdong MySQL:Select (Sum (now ())-SUBSTRING (borth,1,4))/count (*)) as Ageavge from test_sfz2 where address like ' Guangdong ';Spents: 6.605s hive:select (SUM (Year (' 2014-10-01 ')-SUBSTRING (borth,1,4))/count (*)) as Ageavge from TEST_SFZ2 where address L Ike ' Guangdong% '; spents: 188.206s 2. The average age for each city is sorted from high to low for MySQL:Select Address, (SUM (now ())-SUBSTRING (borth,1,4))/count (*)) as Ageavge from TEST_SFZ2 GROUP by address order by Ageavge desc;spents: 19.926sHive:Select Address, (SUM (Year (' 2014-10-01 ')-SUBSTRING (borth,1,4))/count (*)) as Ageavge from TEST_SFZ2 GROUP by address order by Ageavge desc; spents: 411.816s



From for notes (Wiz)



Using Sqoop to import MySQL data into Hadoop

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.