The installation and configuration of Hadoop itself are not covered here.
Installing Sqoop is also very simple. Once Sqoop is installed, you can test whether it can connect to MySQL (note: the MySQL JDBC driver jar must be placed under SQOOP_HOME/lib):

    sqoop list-databases --connect jdbc:mysql://192.168.1.109:3306/ --username root --password 19891231

The result shows that Sqoop works normally. Next, let's import data from MySQL into Hadoop. I have an ID-card table with about 3 million rows. Start Hive (simply run the hive command), then use Sqoop to import the table into Hive:

    sqoop import --connect jdbc:mysql://192.168.1.109:3306/hadoop --username root --password 19891231 --table test_sfz --hive-import

Sqoop starts a job to carry out the import. The import finished in 2 minutes 20 seconds, which is decent. You can now see the newly imported table in Hive. Let's try a simple query:

    select * from test_sfz where id < 10;

Hive took nearly 25 seconds to answer it, which is really slow (MySQL takes almost no time), but keep in mind that Hive turns the query into a job and runs it on Hadoop, so the extra time is expected.

Next, let's test more complex queries. My setup: Hadoop runs pseudo-distributed inside a virtual machine, and the VM OS is Ubuntu 12.04 64-bit.

TEST 1 — test data: 3.008 million rows

1. Average age in Guangdong

MySQL:

    select (sum(year(now()) - substring(borth, 1, 4)) / count(*)) as ageAvge from test_sfz where address like 'guangdong%';

Elapsed: 0.877s

Hive:

    select (sum(year('2014-10-01') - substring(borth, 1, 4)) / count(*)) as ageAvge from test_sfz where address like 'guangdong%';

Elapsed: 25.012s
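To make the age arithmetic concrete: the query above treats the first four characters of the borth column as the birth year and subtracts it from a reference year. Here is a minimal sketch of the same computation with awk over a few invented rows (the id|borth|address layout and all values are made up for illustration; the fixed year 2014 mirrors the Hive variant's year('2014-10-01')):

```shell
#!/bin/sh
# Invented sample rows in the shape id|borth|address.
cat > /tmp/sample_sfz.txt <<'EOF'
1|19891231|guangdong
2|19700615|guangdong
3|20000101|beijing
EOF

# Average age of rows whose address starts with "guangdong":
# age = 2014 minus the four-digit birth year, summed and divided
# by the number of matching rows, like SUM(...)/COUNT(*).
awk -F'|' '$3 ~ /^guangdong/ { sum += 2014 - substr($2, 1, 4); n++ }
           END { printf "%.1f\n", sum / n }' /tmp/sample_sfz.txt
# prints 34.5 (ages 25 and 44 average to 34.5)
```

The WHERE clause's like 'guangdong%' prefix match corresponds to the /^guangdong/ pattern here.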
2. Average age per city, sorted from high to low

MySQL:

    select address, (sum(year(now()) - substring(borth, 1, 4)) / count(*)) as ageAvge from test_sfz group by address order by ageAvge desc;

Elapsed: 2.949s

Hive:

    select address, (sum(year('2014-10-01') - substring(borth, 1, 4)) / count(*)) as ageAvge from test_sfz group by address order by ageAvge desc;

Elapsed: 51.29s

As you can see, Hive falls further behind MySQL as the workload grows.

TEST 2 — test data: 12 million rows; MySQL engine: MyISAM (to speed up queries); imported into Hive.

1. Average age in Guangdong

MySQL:

    select (sum(year(now()) - substring(borth, 1, 4)) / count(*)) as ageAvge from test_sfz2 where address like 'guangdong%';

Elapsed: 5.642s

Hive:

    select (sum(year('2014-10-01') - substring(borth, 1, 4)) / count(*)) as ageAvge from test_sfz2 where address like 'guangdong%';

Elapsed: 168.259s

2. Average age per city, sorted from high to low

MySQL:

    select address, (sum(year(now()) - substring(borth, 1, 4)) / count(*)) as ageAvge from test_sfz2 group by address order by ageAvge desc;

Elapsed: 11.964s

Hive:

    select address, (sum(year('2014-10-01') - substring(borth, 1, 4)) / count(*)) as ageAvge from test_sfz2 group by address order by ageAvge desc;

Elapsed: 311.714s

TEST 3 — test data: 20 million rows; MySQL engine: MyISAM (to speed up queries); imported into Hive. (The import was very quick this time! Perhaps during the TEST 2 import my host happened to be doing other resource-consuming work.)

1. Average age in Guangdong

MySQL:

    select (sum(year(now()) - substring(borth, 1, 4)) / count(*)) as ageAvge from test_sfz2 where address like 'guangdong%';

Elapsed: 6.605s

Hive:

    select (sum(year('2014-10-01') - substring(borth, 1, 4)) / count(*)) as ageAvge from test_sfz2 where address like 'guangdong%';

Elapsed: 188.206s
2. Average age per city, sorted from high to low

MySQL:

    select address, (sum(year(now()) - substring(borth, 1, 4)) / count(*)) as ageAvge from test_sfz2 group by address order by ageAvge desc;

Elapsed: 19.926s

Hive:

    select address, (sum(year('2014-10-01') - substring(borth, 1, 4)) / count(*)) as ageAvge from test_sfz2 group by address order by ageAvge desc;

Elapsed: 411.816s
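The "average age per city, sorted from high to low" pattern can be sketched the same way: group by the address field, average the ages, then sort descending. Again the rows and file name are invented for illustration; 2014 mirrors the fixed reference date in the Hive queries:

```shell
#!/bin/sh
# Invented sample rows in the shape id|borth|address.
cat > /tmp/sample_sfz2.txt <<'EOF'
1|19891231|guangdong
2|19700615|guangdong
3|20000101|beijing
4|19600101|beijing
EOF

# Per-address average age (age = 2014 - birth year), then sort by
# the average in descending order, mirroring GROUP BY address
# ORDER BY ageAvge DESC.
awk -F'|' '{ sum[$3] += 2014 - substr($2, 1, 4); n[$3]++ }
           END { for (a in sum) printf "%s %.1f\n", a, sum[a] / n[a] }' \
    /tmp/sample_sfz2.txt | sort -rn -k2,2
# prints:
# guangdong 34.5
# beijing 34.0
```

Piping through sort is needed because awk's for-in iteration over array keys is in no particular order.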
Using Sqoop to import MySQL data into Hadoop