Sqoop1.4.4 importing data from a MySQL database table into an hbase table

Source: Internet
Author: User
Tags relational database table sqoop

Questions Guide:

1 、--hbase-table 、--The role of Hbase-row-key 、--column-family and--hbase-create-table parameters?

2. Sqoop the data in the relational database table into HBase, what is the default Rowkey?

3. What if there are multiple keywords in the relational database table?

Introduction and some important parameters

In addition to being able to import data from a relational database into HDFs and hive, Sqoop can also be imported into an hbase table.
--hbase-table: Specifies thatdata is imported into an hbase table , rather than a directory on HDFs, by specifying the--hbase-table parameter value. Each row in the input table is converted to a row of an hbase put operation's output table.
--hbase-row-key: You can specify the row key manually by using the--hbase-row-key parameter. By default,Sqoop will use the split-by column as the hbase rowkey column . If the split-by value is not specified, it will attempt to identify the keyword of the relational table.

If the source table is a composite keyword, the--hbase-row-key parameter is a comma-delimited list of combinations of keyword attributes, in which case the row key for HBase is generated by merging the values of the Combined keyword property, separating each value with an underscore .

--column-family: The--column-family parameter must be specified, and each output column will be placed in the same family column family .

--hbase-create-table: If the target table and column family in HBase do not exist, if you use this parameter,Sqoop will first create the target table and the column family based on the default configuration of HBase when running the task .

Note: When the source table is a combination of keywords, you must manually specify the--hbase-row-key parameter, Sqoop to import the data into HBase, or not.
Note two: if the target table and the column family in HBase do not exist, if the--hbase-create-table parameter is not added, the Sqoop job will report an error exit run. So before you import data from a source table into hbase, you need to first create the target table and its corresponding column family in HBase.
Sqoop now serializes all the field values, converts the value to a string representation, and then inserts a binary value of the UTF-8 encoded string value into HBase.

Second, the original data are as follows

Mysql> SELECT * from user;+----+-----------+--------------+--------------+| ID | Name      | address      | school       |+----+-----------+--------------+--------------+|  1 | Zhang Sanfeng,    Wutai Mountain,       Anhui University |     |  2 | Huang Yi    | Jiangsu Nanjing     | Nanjing University |     |  3 | Suzhou University, Suzhou, Jiangsu, John Doe |     |  4 | Wang Hai |      Jiangsu nanjing     |              |  | 5 | Xiao Dukja    |              | Tsinghua University     |+----+-----------+--------------+--------------+5 rows in Set (0.00 sec)
The above ID is a keyword

Third, the single keyword as rowkey import

[Email protected] ~]$ sqoop import--connect jdbc:mysql://secondmgt:3306/test--username hive--password Hive--table use R--split-by ID--hbase-table sqooptest--column-family stuinfo

Import with--split-by ID as Rowkey, without this parameter, is also imported by default as an ID rowkey. The executive reported the following error:

15/01/19 19:17:29 INFO MapReduce. Job:running job:job_1421373857783_002915/01/19 19:17:43 INFO mapreduce. Job:job job_1421373857783_0029 running in Uber mode:false15/01/19 19:17:43 INFO MapReduce. Job:map 0% reduce 0%15/01/19 19:17:54 INFO mapreduce. Job:task Id:attempt_1421373857783_0029_m_000001_0, Status:FAILEDError:java.lang.RuntimeException:Could not access H Base table sqooptest at org.apache.sqoop.hbase.HBasePutProcessor.setConf (hbaseputprocessor.java:122) at Org. Apache.hadoop.util.ReflectionUtils.setConf (reflectionutils.java:73) at Org.apache.hadoop.util.ReflectionUtils.newInstance (reflectionutils.java:133) at Org.apache.sqoop.mapreduce.delegatingoutputformat$delegatingrecordwriter.<init> ( delegatingoutputformat.java:107) at Org.apache.sqoop.mapreduce.DelegatingOutputFormat.getRecordWriter ( delegatingoutputformat.java:82) at Org.apache.hadoop.mapred.maptask$newdirectoutputcollector.<init> ( maptask.java:623) at ORg.apache.hadoop.mapred.MapTask.runNewMapper (maptask.java:743) at Org.apache.hadoop.mapred.MapTask.run ( maptask.java:339) at Org.apache.hadoop.mapred.yarnchild$2.run (yarnchild.java:162) at Java.security.AccessCon Troller.doprivileged (Native Method) at Javax.security.auth.Subject.doAs (subject.java:415) at Org.apache.hado Op.security.UserGroupInformation.doAs (usergroupinformation.java:1491) at Org.apache.hadoop.mapred.YarnChild.main (yarnchild.java:157)
By the error log, it is due to the inability to access the Sqooptest table, after troubleshooting the connection problem, the view found that the table was not originally in HBase. The solution is as follows:

1. Create the Sqooptest table in hbase first

HBase (main):006:0> create ' sqooptest ', ' stuinfo ' 0 row (s) in 1.5930 seconds=> hbase::table-sqooptesthbase (main): 0 07:0> listTABLEsqooptestyzjt2 Row (s) in 0.0290 seconds
Execute again to see the results
HBase (main):012:0> scan ' sqooptest ' ROW Column+cell 1 column=stuinfo:address, timestamp=1421665851147 , Value=\xe4\xba\x94\xe5\x8f\xb0\xe5\xb1\xb1 1 column=stuinfo:name, timestamp=1421665851147, value=\xE5\xBC\xA0\ Xe4\xb8\x89\xe4\xb8\xb0 1 Column=stuinfo:school, timestamp=1421665851147, value=\xe5\xae\x89\xe5\xbe\xbd\xe5\xa4 \xa7\xe5\xad\xa6 2 column=stuinfo:address, timestamp=1421665846096, value=\xe6\xb1\x9f\xe8\x8b\x8f\xe5\x8d\x97\x           E4\xba\xac 2 Column=stuinfo:name, timestamp=1421665846096, value=\xe9\xbb\x84\xe6\x98\x93\xe6\x9f\x90 2 Column=stuinfo:school, timestamp=1421665846096, Value=\xe5\x8d\x97\xe4\xba\xac\xe5\xa4\xa7\xe5\xad\xa6 3 colum N=stuinfo:address, timestamp=1421665850852, value=\xe6\xb1\x9f\xe8\x8b\x8f\xe8\x8b\x8f\xe5\xb7\x9e 3 Column=stuI Nfo:name, timestamp=1421665850852, value=\xe6\x9d\x8e\xe5\x9b\x9b 3 Column=stuinfo:school, timestamp=14216658508 Value=\xe8\x8b\x8f\x,E5\xb7\x9e\xe5\xa4\xa7\xe5\xad\xa6 4 column=stuinfo:address, timestamp=1421665858288, Value=\xE6\xB1\x9F\xE8\x8B           \x8f\xe5\x8d\x97\xe4\xba\xac 4 Column=stuinfo:name, timestamp=1421665858288, Value=\xe7\x8e\x8b\xe6\xb5\xb7 4 Column=stuinfo:school, timestamp=1421665858288, value= 5 column=stuinfo:address, timestamp=1421665862608           , value= 5 column=stuinfo:name, timestamp=1421665862608, value=\xe5\xb0\x8f\xe5\xbe\xb7\xe5\xad\x90 5 Column=stuinfo:school, timestamp=1421665862608, Value=\xe6\xb8\x85\xe5\x8d\x8e\xe5\xa4\xa7\xe5\xad\xa65 Row (s) in 0.0650 seconds
2. Add the--hbase-create-table parameter so that Sqoop creates the corresponding table and the column family in HBase before importing.

The following example is shown in the Sqooptest2 table:

[[email protected] ~]$ sqoop import--connect jdbc:mysql://secondmgt:3306/test--username hive--password Hive-- Table user--split-by ID--hbase-table sqooptest2--column-family stuinfo--hbase-create-table hbase (main):020:0> Scan ' Sqooptest2 ' ROW Column+cell 1 column=stuinfo:address, timestamp=1421667190477, value=\xe4\ XBA\X94\XE5\X8F\XB0\XE5\XB1\XB1 1 Column=stuinfo:name, timestamp=1421667190477, value=\xe5\xbc\xa0\xe4\xb8\x 89\xe4\xb8\xb0 1 Column=stuinfo:school, timestamp=1421667190477, value=\xe5\xae\x89\xe5\xbe\xbd\xe5\xa4\xa7\ Xe5\xad\xa6 2 column=stuinfo:address, timestamp=1421667185489, Value=\xe6\xb1\x9f\xe8\x8b\x8f\xe5\x8d\x97\xe               4\xba\xac 2 Column=stuinfo:name, timestamp=1421667185489, value=\xe9\xbb\x84\xe6\x98\x93\xe6\x9f\x90 2               Column=stuinfo:school, timestamp=1421667185489, Value=\xe5\x8d\x97\xe4\xba\xac\xe5\xa4\xa7\xe5\xad\xa6 3 Column=stuinfo:address, Tim.estamp=1421667190279, value=\xe6\xb1\x9f\xe8\x8b\x8f\xe8\x8b\x8f\xe5\xb7\x9e 3 column=stuinfo:name, Timestam p=1421667190279, value=\xe6\x9d\x8e\xe5\x9b\x9b 3 Column=stuinfo:school, timestamp=1421667190279, value=\xE8 \x8b\x8f\xe5\xb7\x9e\xe5\xa4\xa7\xe5\xad\xa6 4 column=stuinfo:address, timestamp=1421667198066, value=\xE6\x B1\x9f\xe8\x8b\x8f\xe5\x8d\x97\xe4\xba\xac 4 Column=stuinfo:name, timestamp=1421667198066, value=\xE7\x8E\x8 B\xe6\xb5\xb7 4 Column=stuinfo:school, timestamp=1421667198066, Value=null 5 Column=stuinfo:ad Dress, timestamp=1421667201685, Value=null 5 column=stuinfo:name, timestamp=1421667201685, value=\xE5\xB0\x8 f\xe5\xbe\xb7\xe5\xad\x90 5 Column=stuinfo:school, timestamp=1421667201685, value=\xe6\xb8\x85\xe5\x8d\x8e\x E5\xa4\xa7\xe5\xad\xa65 row (s) in 0.0360 seconds

iii. Manual designation of Rowkey

Disables, removes the previously created Sqooptest table, demonstrating the case of manually specifying Rowkey. We represent Rowkey in the Id+name combination as follows:

[Email protected] ~]$ sqoop import--connect jdbc:mysql://secondmgt:3306/test--username hive--password Hive--table use R--split-by ID--hbase-table sqooptest--column-family stuinfo--hbase-create-table--hbase-row-key id,name
View results:

HBase (main):026:0> ListTABLEsqooptestyzjt2 row (s) in 0.0430 seconds=> ["Sqooptest", "YZJT"]hbase (main): 027:0       > Scan ' sqooptest ' ROW Column+cell 1_\xe5\xbc\xa0\xe4\xb8\x89\xe4\xb8\xb0 Column=stuinfo:address, timestamp=1421668051376, Value=\xe4\xba\x94\xe5\x8f\xb0\xe5\xb1\xb1 1_\xE5\xBC\xA0\xE4\ Xb8\x89\xe4\xb8\xb0 Column=stuinfo:school, timestamp=1421668051376, value=\xe5\xae\x89\xe5\xbe\xbd\xe5\xa4\xa7\ Xe5\xad\xa6 2_\xe9\xbb\x84\xe6\x98\x93\xe6\x9f\x90 column=stuinfo:address, timestamp=1421668046873, value=\xE6\xB1 \x9f\xe8\x8b\x8f\xe5\x8d\x97\xe4\xba\xac 2_\xe9\xbb\x84\xe6\x98\x93\xe6\x9f\x90 Column=stuInfo:school, timestamp= 1421668046873, Value=\xe5\x8d\x97\xe4\xba\xac\xe5\xa4\xa7\xe5\xad\xa6 3_\xe6\x9d\x8e\xe5\x9b\x9b column= Stuinfo:address, timestamp=1421668051471, value=\xe6\xb1\x9f\xe8\x8b\x8f\xe8\x8b\x8f\xe5\xb7\x9e 3_\xE6\x9D\x8E\ xe5\x9b\x9b Column=stuinfo:school, Timestamp=1421668051471, Value=\xe8\x8b\x8f\xe5\xb7\x9e\xe5\xa4\xa7\xe5\xad\xa6 4_\xe7\x8e\x8b\xe6\xb5\xb7 colum N=stuinfo:address, timestamp=1421668058375, Value=\xe6\xb1\x9f\xe8\x8b\x8f\xe5\x8d\x97\xe4\xba\xac 4_\xE7\x8E\x8B \xe6\xb5\xb7 Column=stuinfo:school, timestamp=1421668058375, Value=null 5_\xe5\xb0\x8f\xe5\xbe\xb7\xe5\x Ad\x90 column=stuinfo:address, timestamp=1421668063445, Value=null 5_\xe5\xb0\x8f\xe5\xbe\xb7\xe5\xad\x90 Colu Mn=stuinfo:school, timestamp=1421668063445, Value=\xe6\xb8\x85\xe5\x8d\x8e\xe5\xa4\xa7\xe5\xad\xa6
The data is also successfully imported into HBase, and Rowkey is a combination of the binary values of the ID number +name column values.The values between the Rowkey columns are separated by an underscore。


Recommended reading:
Previous article:
Sqoop1.4.4 importing data from MySQL into a hive table

Sqoop1.4.4 importing data from a MySQL database table into an hbase table

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.