Sqoop import from MySQL into HBase: a story of blood and tears (half a day lost)
Copyright notice: this is an original article by Yunshuxueyuan.
If you reprint it, please credit the source: https://my.oschina.net/yunshuxueyuan/blog
QQ technology group: 299142667
I. How the problem arose
In class, Mr. Pang only covered moving data between MySQL and HDFS and between MySQL and Hive, so I decided to work out importing MySQL data directly into HBase myself, and ran into a series of problems.
The sad story follows:
II. Solving the problems one by one
Requirement: (import the data of the following MySQL table into HBase)
So I wrote the following Sqoop import command:
sqoop import -D sqoop.hbase.add.row.key=true --connect jdbc:mysql://192.168.1.9:3306/spider --username root --password root --table Test_goods --hbase-create-table --hbase-table t_goods --column-family cf --hbase-row-key Id -m 1
Everything looked normal; then I executed the command and got the following errors:
1. Error: Import for Table *
The error occurred because the MySQL table name was given in the wrong case: the name passed to --table must match the capitalization of the table in MySQL.
2. Error: NULL for Row-key column
The error occurred because no MySQL column names were specified. The column names must be given explicitly, and the Id used in --hbase-row-key must also appear in the --columns list: --columns Id,goods_name,Goods_price
3. Error: Unrecognized argument for Import
The error occurred because I put a space after each comma when listing the MySQL column names. The values after --columns must be separated by commas only, with no spaces.
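The spacing problem is ordinary shell word splitting: a space after a comma makes the shell pass the rest of the column list as a separate argument, which Sqoop then rejects. A minimal sketch (plain shell, no Sqoop needed) shows the difference:

```shell
#!/bin/sh
# Demonstrate shell word splitting: a space after a comma splits one value
# into two separate arguments, which Sqoop then fails to parse.
count_args() { echo $#; }

# Wrong: the space makes "Goods_price" a third, separate argument.
wrong=$(count_args --columns Id,goods_name, Goods_price)

# Right: the whole column list is passed as one argument after --columns.
right=$(count_args --columns Id,goods_name,Goods_price)

echo "wrong form: $wrong arguments, right form: $right arguments"
# Prints: wrong form: 3 arguments, right form: 2 arguments
```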
After fixing the three issues above, my latest import command became the following:
sqoop import -D sqoop.hbase.add.row.key=true --connect jdbc:mysql://192.168.1.9:3306/spider --username root --password root --table Test_goods --columns Id,goods_name,Goods_price --hbase-create-table --hbase-table t_goods --column-family cf --hbase-row-key Id --where "Id >= 5" -m 1
Note one small point here: remember to put the Id >= 5 condition in quotes.
I executed the import command again, and this time it hung for a long time: the map phase completed, but the job stayed stuck there. The MapReduce task kept sitting in the background and died after a while; in the meantime I repeatedly re-ran the import command and killed the stuck MapReduce jobs:
hadoop job -list     shows the list of MapReduce jobs
hadoop job -kill <job_id>     kills the given job
After a long stretch of testing, it suddenly dawned on me that I was running HBase in pseudo-distributed mode.
Cause: because the environment was a pseudo-distributed HBase, the HBase data lived on the local disk and was managed by HBase's bundled ZooKeeper. Importing MySQL data into HBase really means importing the data into HDFS, so for the import to succeed, HBase's data directory has to be on HDFS. You can guess the solution: switch HBase to fully distributed mode.
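The setting that controls this is hbase.rootdir in hbase-site.xml. The snippets below are only an illustration (the namenode host and port are placeholders, not taken from this article): in a local-disk setup the root directory points at the local file system, while the Sqoop import needs it on HDFS.

```xml
<!-- Local-disk setup: HBase data lives on the local file system -->
<property>
  <name>hbase.rootdir</name>
  <value>file:///tmp/hbase</value>
</property>

<!-- Fully distributed setup: HBase data lives on HDFS, which is where the Sqoop import writes -->
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://namenode:9000/hbase</value>
</property>
```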
After a round of fiddling I rolled the virtual machines back to a snapshot with fully distributed HBase, installed Sqoop, and ran the final test!
The final, complete import command is as follows:
sqoop import -D sqoop.hbase.add.row.key=true --connect jdbc:mysql://192.168.1.9:3306/spider --username root --password root --table Test_goods --columns Id,goods_name,Goods_price --hbase-create-table --hbase-table t_goods --column-family cf --hbase-row-key Id --where "Id >= 5" -m 1
At last I saw the long-missed success page.
Checking HBase: the data had been imported successfully.
Finally, I wrote the command into a file (call it XXX) and executed the import with sqoop --options-file XXX.
My first version of the file was written incorrectly:
import -D sqoop.hbase.add.row.key=true --connect jdbc:mysql://192.168.1.9:3306/spider --username root --password root --table Test_goods --columns Id,goods_name,Goods_price --hbase-create-table --hbase-table t_goods --column-family cf --hbase-row-key Id --where "Id >= 5" -m 1
Cause of the error: in an options file, each option name and each option value must go on its own line.
Correct version:
import
-Dsqoop.hbase.add.row.key=true
--connect
jdbc:mysql://192.168.1.9:3306/spider
--username
root
--password
root
--table
Test_goods
--columns
Id,goods_name,Goods_price
--hbase-create-table
--hbase-table
t_goods
--column-family
cf
--hbase-row-key
Id
--where
Id>=5
-m
1
Note: explanation of the parameters
-D sqoop.hbase.add.row.key=true    whether to also write the row-key column into the column family; defaults to false, in which case the row-key column does not appear among the column-family fields. Note that this parameter must come immediately after import.
--connect    the JDBC connection string for the database
--username / --password    the MySQL user name and password
--table Test_goods    the MySQL table name; note the capitalization
--hbase-create-table    create the HBase table if it does not exist
--hbase-table    the target HBase table name
--hbase-row-key    the row key for the HBase table; note the format
--column-family    the column family of the HBase table
--where    the WHERE condition on the MySQL table, written as in SQL
--split-by create_time    by default Sqoop runs 4 tasks in parallel, so you need to specify a split column; if you do not want parallelism, use -m 1
With that, the bug was finally resolved!
III. Going further: scheduled incremental imports
1. Sqoop incremental import
sqoop import -D sqoop.hbase.add.row.key=true --connect jdbc:mysql://192.168.1.9:3306/spider --username root --password root --table Test_goods --hbase-create-table --hbase-table t_goods --column-family cf --hbase-row-key Id --incremental lastmodified --check-column u_date --last-value '2017-06-27' --split-by u_date
--incremental lastmodified    incremental import supports two modes: append, for an ever-increasing column, and lastmodified, for a timestamp column
--check-column    the column examined during the incremental import
--last-value    the lower bound; in this example it means importing rows from 2017-06-27 up to today
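In practice you rarely hard-code --last-value; a small wrapper script can compute it. A sketch under assumptions (the connection details mirror the ones used above, and the date arithmetic uses GNU date):

```shell
#!/bin/sh
# Sketch: compute --last-value dynamically so a daily run imports only rows
# modified since yesterday.
LAST_VALUE=$(date -d yesterday +%Y-%m-%d)

CMD="sqoop import -D sqoop.hbase.add.row.key=true \
--connect jdbc:mysql://192.168.1.9:3306/spider \
--username root --password root \
--table Test_goods \
--hbase-table t_goods --column-family cf --hbase-row-key Id \
--incremental lastmodified --check-column u_date \
--last-value $LAST_VALUE --split-by u_date"

# Print the command that would run; replace echo with eval "$CMD" for real use.
echo "$CMD"
```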
2. Sqoop job:
sqoop job --create TESTJOB01 -- import --connect jdbc:mysql://192.168.1.9:3306/spider --username root --password root --table Test_goods --hbase-create-table --hbase-table t_goods --column-family cf --hbase-row-key Id -m 1
Then schedule the Sqoop job above:
Use the Linux scheduler: crontab -e
For example, to run it once a day:
0 0 * * * /opt/local/sqoop-1.4.6/bin/sqoop job --exec TESTJOB01
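Cron runs jobs with a minimal environment and silently discards output unless mail is configured, so it helps to call the job through a wrapper that logs each run. A minimal sketch (the log path and script name are assumptions, not from this article):

```shell
#!/bin/sh
# Hypothetical cron wrapper: log each run of the Sqoop job with a timestamp
# so failed nightly imports are easy to spot.
SQOOP_BIN=/opt/local/sqoop-1.4.6/bin/sqoop   # install path used in the article
LOG=/tmp/sqoop_logs/testjob01.log            # assumed log location

mkdir -p "$(dirname "$LOG")"
{
  echo "=== $(date '+%Y-%m-%d %H:%M:%S') starting job TESTJOB01 ==="
  "$SQOOP_BIN" job --exec TESTJOB01
  echo "=== finished with exit status $? ==="
} >> "$LOG" 2>&1
```

The crontab entry then points at this wrapper instead of calling sqoop directly, e.g. 0 0 * * * /path/to/run_testjob01.sh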