Alex's Hadoop Rookie Tutorial: Part 9, Exporting from HBase or Hive to MySQL with Sqoop1
Today we will talk about how to use Sqoop to export data from HBase or Hive to mysql. One caveat up front:
Currently, Sqoop cannot export data directly from HBase to mysql. You have to go through Hive with two tables: an external table backed by the HBase table, and a native Hive table stored as plain files on HDFS. You copy the data from the external table into the native table (as a temporary staging step), and then use Sqoop to export the staging table to mysql.
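In outline, the whole pipeline looks like this. This is only a sketch of the steps detailed below, using the table names we are about to create:

# Step A (Hive): create an external table h_employee mapped onto the HBase table 'employee'
#   CREATE EXTERNAL TABLE h_employee ... STORED BY '...HBaseStorageHandler' ...
# Step B (Hive): copy the rows into a plain HDFS-backed staging table
#   INSERT OVERWRITE TABLE h_employee_export SELECT * FROM h_employee;
# Step C (shell): export the staging table's HDFS files to mysql
#   sqoop export --connect jdbc:mysql://... --table employee --export-dir /user/hive/warehouse/h_employee_export/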
Data Preparation
Create an empty table in mysql:
CREATE TABLE `employee` (
  `rowkey` int(11) NOT NULL,
  `id` int(11) NOT NULL,
  `name` varchar(20) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Note: The Hive table maps HBase's row key to a field named key, so it is tempting to create a matching key column in the mysql table as well. However, KEY is a reserved word in MySQL, so the INSERT statements issued during the export would fail on it. That is why the column is named rowkey instead.
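Here is a minimal sketch of the reserved-word problem (the table name t is invented just for this example):

-- KEY is a reserved word in MySQL: an unquoted column named key
-- is a syntax error (error 1064), because MySQL parses KEY as the
-- start of an index definition:
CREATE TABLE t (key INT);     -- fails

-- It only works when quoted with backticks:
CREATE TABLE t (`key` INT);   -- OK

This is why the tutorial sidesteps the issue entirely and names the column rowkey.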
Create an employee table in HBase and insert some data:
hbase(main):005:0> create 'employee','info'
0 row(s) in 0.4740 seconds
=> Hbase::Table - employee
hbase(main):006:0> put 'employee',1,'info:id',1
0 row(s) in 0.2080 seconds
hbase(main):008:0> scan 'employee'
ROW                 COLUMN+CELL
 1                  column=info:id, timestamp=1417591291730, value=1
1 row(s) in 0.0610 seconds
hbase(main):009:0> put 'employee',1,'info:name','peter'
0 row(s) in 0.0220 seconds
hbase(main):010:0> scan 'employee'
ROW                 COLUMN+CELL
 1                  column=info:id, timestamp=1417591291730, value=1
 1                  column=info:name, timestamp=1417591321072, value=peter
1 row(s) in 0.0450 seconds
hbase(main):011:0> put 'employee',2,'info:id',2
0 row(s) in 0.0370 seconds
hbase(main):012:0> put 'employee',2,'info:name','paul'
0 row(s) in 0.0180 seconds
hbase(main):013:0> scan 'employee'
ROW                 COLUMN+CELL
 1                  column=info:id, timestamp=1417591291730, value=1
 1                  column=info:name, timestamp=1417591321072, value=peter
 2                  column=info:id, timestamp=1417591500179, value=2
 2                  column=info:name, timestamp=1417591512075, value=paul
2 row(s) in 0.0440 seconds
Create a Hive External table
Hive tables come in two kinds: native tables and external tables. A native table is stored as plain files on HDFS, while an external table is backed by another framework, such as HBase. Now let's create a Hive external table on top of the employee HBase table we just created. In the column mapping below, the special token :key stands for the HBase row key, and the remaining tokens map in order to the Hive columns (info:id to id, info:name to name).
hive> CREATE EXTERNAL TABLE h_employee(key int, id int, name string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:id,info:name")
    > TBLPROPERTIES ("hbase.table.name" = "employee");
OK
Time taken: 0.324 seconds
hive> select * from h_employee;
OK
1	1	peter
2	2	paul
Time taken: 1.129 seconds, Fetched: 2 row(s)
Create a Hive native table
This native table is only used as a temporary staging table for the export, so it is named h_employee_export, and the delimiter between fields is a comma:
CREATE TABLE h_employee_export(key INT, id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054';
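'\054' is simply the octal escape for a comma (ASCII 44), so for reference the definition above is equivalent to the following (h_employee_export2 is a hypothetical name used only for this comparison):

CREATE TABLE h_employee_export2(key INT, id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';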
Import data from the source Hive table to the temporary table
First, import the data from h_employee (the external table backed by HBase) into h_employee_export (the native Hive table):
hive> insert overwrite table h_employee_export select * from h_employee;
hive> select * from h_employee_export;
OK
1	1	peter
2	2	paul
Time taken: 0.359 seconds, Fetched: 2 row(s)
Let's see what the text file actually stored on HDFS looks like:
$ hdfs dfs -cat /user/hive/warehouse/h_employee_export/000000_0
1,1,peter
2,2,paul
Export data from Hive to mysql
$ sqoop export --connect jdbc:mysql://localhost:3306/sqoop_test --username root --password root --table employee --m 1 --export-dir /user/hive/warehouse/h_employee_export/
Warning: /usr/lib/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/12/05 08:49:35 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.0.1
14/12/05 08:49:35 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/12/05 08:49:35 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/12/05 08:49:35 INFO tool.CodeGenTool: Beginning code generation
14/12/05 08:49:36 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
14/12/05 08:49:36 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
14/12/05 08:49:36 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-wlsuser/compile/d16eb4166baf6a1e885d7df0e2638685/employee.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/12/05 08:49:39 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-wlsuser/compile/d16eb4166baf6a1e885d7df0e2638685/employee.jar
14/12/05 08:49:39 INFO mapreduce.ExportJobBase: Beginning export of employee
14/12/05 08:49:41 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/12/05 08:49:43 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/12/05 08:49:43 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
14/12/05 08:49:43 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/12/05 08:49:43 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.111.78.111:8032
14/12/05 08:49:45 INFO input.FileInputFormat: Total input paths to process : 1
14/12/05 08:49:45 INFO input.FileInputFormat: Total input paths to process : 1
14/12/05 08:49:45 INFO mapreduce.JobSubmitter: number of splits:1
14/12/05 08:49:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1406097234796_0037
14/12/05 08:49:46 INFO impl.YarnClientImpl: Submitted application application_1406097234796_0037
14/12/05 08:49:46 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1406097234796_0037/
14/12/05 08:49:46 INFO mapreduce.Job: Running job: job_1406097234796_0037
14/12/05 08:49:59 INFO mapreduce.Job: Job job_1406097234796_0037 running in uber mode : false
14/12/05 08:49:59 INFO mapreduce.Job:  map 0% reduce 0%
14/12/05 08:50:10 INFO mapreduce.Job:  map 100% reduce 0%
14/12/05 08:50:10 INFO mapreduce.Job: Job job_1406097234796_0037 completed successfully
14/12/05 08:50:10 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=99761
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=166
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=0
	Job Counters
		Launched map tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=8805
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=8805
		Total vcore-seconds taken by all map tasks=8805
		Total megabyte-seconds taken by all map tasks=9016320
	Map-Reduce Framework
		Map input records=2
		Map output records=2
		Input split bytes=144
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=97
		CPU time spent (ms)=1360
		Physical memory (bytes) snapshot=167555072
		Virtual memory (bytes) snapshot=684212224
		Total committed heap usage (bytes)=148897792
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=0
14/12/05 08:50:10 INFO mapreduce.ExportJobBase: Transferred 166 bytes in 27.0676 seconds (6.1328 bytes/sec)
14/12/05 08:50:10 INFO mapreduce.ExportJobBase: Exported 2 records.
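As the log itself warns, passing --password on the command line is insecure. The same export with an interactive password prompt would look like this (a variant of the command above; -P is a standard Sqoop flag, everything else is unchanged):

$ sqoop export --connect jdbc:mysql://localhost:3306/sqoop_test \
    --username root -P \
    --table employee -m 1 \
    --export-dir /user/hive/warehouse/h_employee_export/

Also keep in mind that sqoop export inserts rows: re-running it against a non-empty employee table will collide with the primary key. Sqoop's --update-key option exists for update-style exports.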
Note:
The log contains this line:
14/12/05 08:49:46 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1406097234796_0037/
This means you can open this URL in a browser to watch the job's progress. If the job runs for a long time without finishing, or fails, this address is also where you will find the detailed error logs.
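If you prefer the command line to the browser, you can also check the job from the shell using the application id from the log (assuming a standard YARN setup; yarn logs additionally requires log aggregation to be enabled):

$ yarn application -status application_1406097234796_0037
$ yarn logs -applicationId application_1406097234796_0037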
View results
mysql> select * from employee;
+--------+----+-------+
| rowkey | id | name  |
+--------+----+-------+
|      1 |  1 | peter |
|      2 |  2 | paul  |
+--------+----+-------+
2 rows in set (0.00 sec)
The export is complete.