The installation and configuration of Hadoop itself are not covered here.
Installing Sqoop is also very simple. Once Sqoop is installed, you can test whether it can connect to MySQL (note: the MySQL JDBC driver jar must be placed under SQOOP_HOME/lib):

    sqoop list-databases --connect jdbc:mysql://192.168.1.109:3306/ --username root --password 19891231

The result shows that Sqoop works normally. Next, let's import data from MySQL into Hadoop. I have an ID-card table with about 3 million rows. Start Hive (simply run the hive command), then use Sqoop to import the table into Hive:

    sqoop import --connect jdbc:mysql://192.168.1.109:3306/hadoop --username root --password 19891231 --table test_sfz --hive-import

Sqoop starts a job to carry out the import. The import finished in 2 minutes 20 seconds, which is decent. You can now see the newly imported table in Hive. Let's try a simple query:

    select * from test_sfz where id < 10;

Hive took nearly 25 seconds to answer it, which is really slow (MySQL takes almost no time), but keep in mind that Hive turns the query into a job and runs it on Hadoop, so the extra time is expected.

Next, let's test more complex queries. My setup: Hadoop runs pseudo-distributed inside a virtual machine, and the VM OS is Ubuntu 12.04 64-bit.

TEST 1 — test data: 3.008 million rows

1. Average age in Guangdong

MySQL:

    select (sum(year(now()) - substring(borth, 1, 4)) / count(*)) as ageAvge from test_sfz where address like 'guangdong%';

Elapsed: 0.877s

Hive:

    select (sum(year('2014-10-01') - substring(borth, 1, 4)) / count(*)) as ageAvge from test_sfz where address like 'guangdong%';

Elapsed: 25.012s
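To make the age arithmetic concrete: the query above treats the first four characters of the borth column as the birth year and subtracts it from a reference year. Here is a minimal sketch of the same computation with awk over a few invented rows (the id|borth|address layout and all values are made up for illustration; the fixed year 2014 mirrors the Hive variant's year('2014-10-01')):

```shell
#!/bin/sh
# Invented sample rows in the shape id|borth|address.
cat > /tmp/sample_sfz.txt <<'EOF'
1|19891231|guangdong
2|19700615|guangdong
3|20000101|beijing
EOF

# Average age of rows whose address starts with "guangdong":
# age = 2014 minus the four-digit birth year, summed and divided
# by the number of matching rows, like SUM(...)/COUNT(*).
awk -F'|' '$3 ~ /^guangdong/ { sum += 2014 - substr($2, 1, 4); n++ }
           END { printf "%.1f\n", sum / n }' /tmp/sample_sfz.txt
# prints 34.5 (ages 25 and 44 average to 34.5)
```

The WHERE clause's like 'guangdong%' prefix match corresponds to the /^guangdong/ pattern here.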
2. Average age per city, sorted from high to low

MySQL:

    select address, (sum(year(now()) - substring(borth, 1, 4)) / count(*)) as ageAvge from test_sfz group by address order by ageAvge desc;

Elapsed: 2.949s

Hive:

    select address, (sum(year('2014-10-01') - substring(borth, 1, 4)) / count(*)) as ageAvge from test_sfz group by address order by ageAvge desc;

Elapsed: 51.29s

As you can see, Hive falls further behind MySQL as the workload grows.

TEST 2 — test data: 12 million rows; MySQL engine: MyISAM (to speed up queries); imported into Hive.

1. Average age in Guangdong

MySQL:

    select (sum(year(now()) - substring(borth, 1, 4)) / count(*)) as ageAvge from test_sfz2 where address like 'guangdong%';

Elapsed: 5.642s

Hive:

    select (sum(year('2014-10-01') - substring(borth, 1, 4)) / count(*)) as ageAvge from test_sfz2 where address like 'guangdong%';

Elapsed: 168.259s

2. Average age per city, sorted from high to low

MySQL:

    select address, (sum(year(now()) - substring(borth, 1, 4)) / count(*)) as ageAvge from test_sfz2 group by address order by ageAvge desc;

Elapsed: 11.964s

Hive:

    select address, (sum(year('2014-10-01') - substring(borth, 1, 4)) / count(*)) as ageAvge from test_sfz2 group by address order by ageAvge desc;

Elapsed: 311.714s

TEST 3 — test data: 20 million rows; MySQL engine: MyISAM (to speed up queries); imported into Hive. (The import was very quick this time! Perhaps during the TEST 2 import my host happened to be doing other resource-consuming work.)

1. Average age in Guangdong

MySQL:

    select (sum(year(now()) - substring(borth, 1, 4)) / count(*)) as ageAvge from test_sfz2 where address like 'guangdong%';

Elapsed: 6.605s

Hive:

    select (sum(year('2014-10-01') - substring(borth, 1, 4)) / count(*)) as ageAvge from test_sfz2 where address like 'guangdong%';

Elapsed: 188.206s
2. Average age per city, sorted from high to low

MySQL:

    select address, (sum(year(now()) - substring(borth, 1, 4)) / count(*)) as ageAvge from test_sfz2 group by address order by ageAvge desc;

Elapsed: 19.926s

Hive:

    select address, (sum(year('2014-10-01') - substring(borth, 1, 4)) / count(*)) as ageAvge from test_sfz2 group by address order by ageAvge desc;

Elapsed: 411.816s
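The "average age per city, sorted from high to low" pattern can be sketched the same way: group by the address field, average the ages, then sort descending. Again the rows and file name are invented for illustration; 2014 mirrors the fixed reference date in the Hive queries:

```shell
#!/bin/sh
# Invented sample rows in the shape id|borth|address.
cat > /tmp/sample_sfz2.txt <<'EOF'
1|19891231|guangdong
2|19700615|guangdong
3|20000101|beijing
4|19600101|beijing
EOF

# Per-address average age (age = 2014 - birth year), then sort by
# the average in descending order, mirroring GROUP BY address
# ORDER BY ageAvge DESC.
awk -F'|' '{ sum[$3] += 2014 - substr($2, 1, 4); n[$3]++ }
           END { for (a in sum) printf "%s %.1f\n", a, sum[a] / n[a] }' \
    /tmp/sample_sfz2.txt | sort -rn -k2,2
# prints:
# guangdong 34.5
# beijing 34.0
```

Piping through sort is needed because awk's for-in iteration over array keys is in no particular order.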
Using Sqoop to import MySQL data into Hadoop