Hadoop installation and configuration are not covered here. Installing Sqoop is also very simple. Once Sqoop is installed, you can test whether it can connect to MySQL (note: the MySQL JDBC driver jar must be placed under SQOOP_HOME/lib):
sqoop list-databases --connect jdbc:mysql://192.168.1.109:3306/ --username root --password 19891231
The results are as follows:
If the databases are listed, Sqoop is ready for normal use.
Next, we will import data from MySQL into Hadoop. I will use an ID-card table with 3 million rows:
First, start Hive (launch it from the command line with the hive command).
Then use Sqoop to import the data into Hive: sqoop import --connect jdbc:mysql://192.168.1.109:3306/hadoop --username root --password 19891231 --table TEST_SFZ --hive-import
Sqoop will launch a MapReduce job to carry out the import.
The import took 2 minutes and 20 seconds, which is quite good.
You can now see the newly imported table in Hive:
Let's test with a simple SQL query: SELECT * FROM TEST_SFZ WHERE id < 10;
As you can see, Hive takes almost 25 seconds to complete this task, which is really slow (MySQL returns almost instantly). But consider that Hive has to create and submit a job to Hadoop, so the fixed job-startup overhead dominates such a small query.
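To make the query shape concrete without a Hadoop cluster, here is a minimal sketch using SQLite as a local stand-in; the column names (id, borth, address) are inferred from the queries in this post, and the sample rows are made up:

```python
import sqlite3

# In-memory SQLite stand-in for the TEST_SFZ table. The schema is an
# assumption based on the queries in this post: id, borth (a date string
# whose first four characters are the birth year), and address.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TEST_SFZ (id INTEGER, borth TEXT, address TEXT)")
rows = [(i, f"{1960 + i % 50}-01-01", "Guangdong" if i % 3 == 0 else "Beijing")
        for i in range(1, 101)]
conn.executemany("INSERT INTO TEST_SFZ VALUES (?, ?, ?)", rows)

# The same simple test query as above: rows with id below 10
first_ten = conn.execute("SELECT * FROM TEST_SFZ WHERE id < 10").fetchall()
print(len(first_ten))
```

On a toy table like this the query is instant; the 25 seconds in Hive is almost entirely job scheduling, not data scanning.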
Next, we will test more complex queries. My host machine is configured as follows:
Hadoop runs in pseudo-distributed mode inside a virtual machine; the VM OS is Ubuntu 12.04 64-bit, configured as follows:
TEST 1 — test data: 3.008 million rows
1. Calculate the average age in Guangdong
MySQL: SELECT (SUM(year(now()) - SUBSTRING(borth,1,4)) / COUNT(*)) AS Ageavge FROM TEST_SFZ WHERE address LIKE 'Guangdong'; Time: 0.877s
Hive: SELECT (SUM(year('2014-10-01') - SUBSTRING(borth,1,4)) / COUNT(*)) AS Ageavge FROM TEST_SFZ WHERE address LIKE 'Guangdong'; Time: 25.012s
2. Average age per city, sorted from high to low
MySQL: SELECT address, (SUM(year(now()) - SUBSTRING(borth,1,4)) / COUNT(*)) AS Ageavge FROM TEST_SFZ GROUP BY address ORDER BY Ageavge DESC; Time: 2.949s
Hive: SELECT address, (SUM(year('2014-10-01') - SUBSTRING(borth,1,4)) / COUNT(*)) AS Ageavge FROM TEST_SFZ GROUP BY address ORDER BY Ageavge DESC; Time: 51.29s
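The age arithmetic behind these queries (reference year minus the 4-digit year prefix of the borth column, averaged over matching rows) can be sketched in plain Python; the records below are made-up illustrations of the TEST_SFZ schema:

```python
# Mirrors SUM(year - SUBSTRING(borth,1,4)) / COUNT(*) from the queries above.
# These records are invented examples of the TEST_SFZ row shape.
records = [
    {"borth": "1989-12-31", "address": "Guangdong"},
    {"borth": "1975-06-01", "address": "Guangdong"},
    {"borth": "1990-01-15", "address": "Beijing"},
]

ref_year = 2014  # the Hive queries pin the reference date to 2014-10-01

# WHERE address LIKE 'Guangdong' with no wildcard is an exact match
guangdong = [r for r in records if r["address"] == "Guangdong"]
age_avg = sum(ref_year - int(r["borth"][:4]) for r in guangdong) / len(guangdong)
print(age_avg)  # (25 + 39) / 2 = 32.0
```

Note that the Hive queries hard-code '2014-10-01' instead of now(), so both engines compute the same quantity even if the benchmarks run on different days.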
As you can see, as the query gets more complex, Hive's elapsed time grows more slowly than MySQL's.
TEST 2 — test data: 12 million rows; MySQL engine: MyISAM (to speed up queries). Import into Hive:
1. Calculate the average age in Guangdong
MySQL: SELECT (SUM(year(now()) - SUBSTRING(borth,1,4)) / COUNT(*)) AS Ageavge FROM test_sfz2 WHERE address LIKE 'Guangdong'; Time: 5.642s
Hive: SELECT (SUM(year('2014-10-01') - SUBSTRING(borth,1,4)) / COUNT(*)) AS Ageavge FROM test_sfz2 WHERE address LIKE 'Guangdong'; Time: 168.259s
2. Average age per city, sorted from high to low
MySQL: SELECT address, (SUM(year(now()) - SUBSTRING(borth,1,4)) / COUNT(*)) AS Ageavge FROM test_sfz2 GROUP BY address ORDER BY Ageavge DESC; Time: 11.964s
Hive: SELECT address, (SUM(year('2014-10-01') - SUBSTRING(borth,1,4)) / COUNT(*)) AS Ageavge FROM test_sfz2 GROUP BY address ORDER BY Ageavge DESC; Time: 311.714s
TEST 3 — test data: 20 million rows; MySQL engine: MyISAM (to speed up queries). Import into Hive:
(A surprisingly short import time! Perhaps because during the TEST 2 import my host was busy with other resource-intensive work.)
1. Calculate the average age in Guangdong
MySQL: SELECT (SUM(year(now()) - SUBSTRING(borth,1,4)) / COUNT(*)) AS Ageavge FROM test_sfz2 WHERE address LIKE 'Guangdong'; Time: 6.605s
Hive: SELECT (SUM(year('2014-10-01') - SUBSTRING(borth,1,4)) / COUNT(*)) AS Ageavge FROM test_sfz2 WHERE address LIKE 'Guangdong'; Time: 188.206s
2. Average age per city, sorted from high to low
MySQL: SELECT address, (SUM(year(now()) - SUBSTRING(borth,1,4)) / COUNT(*)) AS Ageavge FROM test_sfz2 GROUP BY address ORDER BY Ageavge DESC; Time: 19.926s
Hive: SELECT address, (SUM(year('2014-10-01') - SUBSTRING(borth,1,4)) / COUNT(*)) AS Ageavge FROM test_sfz2 GROUP BY address ORDER BY Ageavge DESC; Time: 411.816s
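Putting the reported timings for the first query side by side shows how the Hive-to-MySQL gap evolves as the data grows; the numbers below are copied directly from TEST 1-3 above:

```python
# (rows, mysql_seconds, hive_seconds) for query 1 (average age in Guangdong),
# taken from the measurements reported in TEST 1, 2, and 3.
timings = [
    (3_008_000, 0.877, 25.012),
    (12_000_000, 5.642, 168.259),
    (20_000_000, 6.605, 188.206),
]

for rows, mysql_s, hive_s in timings:
    print(f"{rows:>10} rows: Hive is {hive_s / mysql_s:.1f}x slower than MySQL")
```

For this query the slowdown ratio stays roughly constant (around 28-30x) across all three data sizes, which is consistent with both engines' scan cost growing with the data while Hive additionally pays a fixed per-job overhead.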
Using Sqoop to import MySQL data into Hadoop