Apache Hadoop is a widely used data analysis platform that is reliable, efficient, and scalable. Percona's Alexander Rubin recently published a blog post describing how he exported a table from MySQL to Hadoop, loaded the data into Cloudera Impala, and ran reports on it. In Rubin's test, the cluster contained 6 data nodes. Their specifications are as follows:
Data export
There are many ways to export data from MySQL to Hadoop. In this example, Rubin simply exports the ontime table to a text file:
SELECT * INTO OUTFILE '/tmp/ontime.psv'
FIELDS TERMINATED BY ','
FROM ontime;
You can use '|' or any other symbol as the separator. Alternatively, you can download the data directly from www.transtats.bts.gov with a simple script.
Load the data into HDFS
Rubin first loads the data into HDFS as a set of files. Hive and Impala attach a table to a directory and read every file in that directory as the table's data. In this example, Rubin created the /data/ontime/ directory on HDFS and then copied all local files matching the On_Time_On_Time_Performance_*.csv pattern into it.
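The load step above boils down to two standard Hadoop client commands, hdfs dfs -mkdir and hdfs dfs -copyFromLocal. The sketch below is a dry run: it only writes the commands to a helper file, since executing them assumes a configured Hadoop client and a running cluster; the directory and file pattern are the ones from the article.

```shell
#!/bin/sh
# Dry-run sketch of the HDFS load step. The target directory and the
# CSV pattern come from the article; run the generated helper script
# on a machine with a configured Hadoop client.
HDFS_DIR=/data/ontime/
OUT="load_hdfs.sh"
{
  echo "hdfs dfs -mkdir -p $HDFS_DIR"
  # All matching CSVs land in one directory, which Impala will scan
  echo "hdfs dfs -copyFromLocal On_Time_On_Time_Performance_*.csv $HDFS_DIR"
} > "$OUT"
cat "$OUT"
```

Running sh load_hdfs.sh on a cluster node performs the actual copy.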
Create an external table in Impala
After all the data files are loaded, you need to create an external table:
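The table definition itself does not survive in this copy of the post; the sketch below shows what such an external table definition typically looks like. The column list is abbreviated to a few illustrative fields (the real ontime table has far more), and the table name ontime_csv is an assumption; the ',' field terminator and the /data/ontime/ location match the article. The DDL is written to a file so it can be reviewed before feeding it to impala-shell.

```shell
#!/bin/sh
# Write an abbreviated, assumed version of the DDL to a file, then
# feed it to Impala with impala-shell -f on a cluster node.
cat > create_ontime.sql <<'SQL'
-- External table over the CSV files already sitting in /data/ontime/.
-- Column list abbreviated for illustration; extend it to the full schema.
CREATE EXTERNAL TABLE ontime_csv (
  YearD INT,
  FlightDate STRING,
  Carrier STRING,
  Origin STRING,
  Dest STRING,
  DepDelayMinutes INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/ontime/';
SQL
echo "run: impala-shell -f create_ontime.sql"
```

Because the table is external, dropping it later removes only the metadata; the CSV files in /data/ontime/ stay in place.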