Sqoop import from MySQL into HBase: a story of blood and tears (half a day lost)
Copyright notice: this is an original article by Yunshuxueyuan.
If you reprint it, please credit the source: https://my.oschina.net/yunshuxueyuan/blog
QQ technology group: 299142667
I. How the problem arose
In class, Mr. Pang only covered moving data between MySQL and HDFS and between MySQL and Hive, so I decided to work out importing MySQL data directly into HBase myself, and ran into a series of problems.
The sad story follows:
II. Solving the problems one by one
Requirement: (import the data of the following MySQL table into HBase)
So I wrote the following Sqoop import command:
sqoop import -D sqoop.hbase.add.row.key=true --connect jdbc:mysql://192.168.1.9:3306/spider --username root --password root --table Test_goods --hbase-create-table --hbase-table t_goods --column-family cf --hbase-row-key Id -m 1
Everything looked normal; then I executed the command and got the following errors:
1. Error: Import for Table *
The error occurred because the MySQL table name was given in the wrong case: the name passed to --table must match the capitalization of the table in MySQL.
2. Error: NULL for Row-key column
The error occurred because no MySQL column names were specified. The column names must be given explicitly, and the Id used in --hbase-row-key must also appear in the --columns list: --columns Id,goods_name,Goods_price
3. Error: Unrecognized argument for Import
The error occurred because I put a space after each comma when listing the MySQL column names. The values after --columns must be separated by commas only, with no spaces.
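The spacing problem is ordinary shell word splitting: a space after a comma makes the shell pass the rest of the column list as a separate argument, which Sqoop then rejects. A minimal sketch (plain shell, no Sqoop needed) shows the difference:

```shell
#!/bin/sh
# Demonstrate shell word splitting: a space after a comma splits one value
# into two separate arguments, which Sqoop then fails to parse.
count_args() { echo $#; }

# Wrong: the space makes "Goods_price" a third, separate argument.
wrong=$(count_args --columns Id,goods_name, Goods_price)

# Right: the whole column list is passed as one argument after --columns.
right=$(count_args --columns Id,goods_name,Goods_price)

echo "wrong form: $wrong arguments, right form: $right arguments"
# Prints: wrong form: 3 arguments, right form: 2 arguments
```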
After fixing the three issues above, my latest import command became the following:
sqoop import -D sqoop.hbase.add.row.key=true --connect jdbc:mysql://192.168.1.9:3306/spider --username root --password root --table Test_goods --columns Id,goods_name,Goods_price --hbase-create-table --hbase-table t_goods --column-family cf --hbase-row-key Id --where "Id >= 5" -m 1
Note one small point here: remember to put the Id >= 5 condition in quotes.
I executed the import command again, and this time it hung for a long time: the map phase completed, but the job stayed stuck there. The MapReduce task kept sitting in the background and died after a while; in the meantime I repeatedly re-ran the import command and killed the stuck MapReduce jobs:
hadoop job -list     shows the list of MapReduce jobs
hadoop job -kill <job_id>     kills the given job
After a long stretch of testing, it suddenly dawned on me that I was running HBase in pseudo-distributed mode.
Cause: because the environment was a pseudo-distributed HBase, the HBase data lived on the local disk and was managed by HBase's bundled ZooKeeper. Importing MySQL data into HBase really means importing the data into HDFS, so for the import to succeed, HBase's data directory has to be on HDFS. You can guess the solution: switch HBase to fully distributed mode.
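The setting that controls this is hbase.rootdir in hbase-site.xml. The snippets below are only an illustration (the namenode host and port are placeholders, not taken from this article): in a local-disk setup the root directory points at the local file system, while the Sqoop import needs it on HDFS.

```xml
<!-- Local-disk setup: HBase data lives on the local file system -->
<property>
  <name>hbase.rootdir</name>
  <value>file:///tmp/hbase</value>
</property>

<!-- Fully distributed setup: HBase data lives on HDFS, which is where the Sqoop import writes -->
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://namenode:9000/hbase</value>
</property>
```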
After a round of fiddling I rolled the virtual machines back to a snapshot with fully distributed HBase, installed Sqoop, and ran the final test!
The final, complete import command is as follows:
sqoop import -D sqoop.hbase.add.row.key=true --connect jdbc:mysql://192.168.1.9:3306/spider --username root --password root --table Test_goods --columns Id,goods_name,Goods_price --hbase-create-table --hbase-table t_goods --column-family cf --hbase-row-key Id --where "Id >= 5" -m 1
At last I saw the long-missed success page.
Checking HBase: the data had been imported successfully.
Finally, I wrote the command into a file (call it XXX) and executed the import with sqoop --options-file XXX.
My first version of the file was written incorrectly:
import -D sqoop.hbase.add.row.key=true --connect jdbc:mysql://192.168.1.9:3306/spider --username root --password root --table Test_goods --columns Id,goods_name,Goods_price --hbase-create-table --hbase-table t_goods --column-family cf --hbase-row-key Id --where "Id >= 5" -m 1
Cause of the error: in an options file, each option name and each option value must go on its own line.
Correct version:
import
-Dsqoop.hbase.add.row.key=true
--connect
jdbc:mysql://192.168.1.9:3306/spider
--username
root
--password
root
--table
Test_goods
--columns
Id,goods_name,Goods_price
--hbase-create-table
--hbase-table
t_goods
--column-family
cf
--hbase-row-key
Id
--where
Id>=5
-m
1
Note: explanation of the parameters
-D sqoop.hbase.add.row.key=true    whether to also write the row-key column into the column family; defaults to false, in which case the row-key column does not appear among the column-family fields. Note that this parameter must come immediately after import.
--connect    the JDBC connection string for the database
--username / --password    the MySQL user name and password
--table Test_goods    the MySQL table name; note the capitalization
--hbase-create-table    create the HBase table if it does not exist
--hbase-table    the target HBase table name
--hbase-row-key    the row key for the HBase table; note the format
--column-family    the column family of the HBase table
--where    the WHERE condition on the MySQL table, written as in SQL
--split-by create_time    by default Sqoop runs 4 tasks in parallel, so you need to specify a split column; if you do not want parallelism, use -m 1
With that, the bug was finally resolved!
III. Going further: scheduled incremental imports
1. Sqoop incremental import
sqoop import -D sqoop.hbase.add.row.key=true --connect jdbc:mysql://192.168.1.9:3306/spider --username root --password root --table Test_goods --hbase-create-table --hbase-table t_goods --column-family cf --hbase-row-key Id --incremental lastmodified --check-column u_date --last-value '2017-06-27' --split-by u_date
--incremental lastmodified    incremental import supports two modes: append, for an ever-increasing column, and lastmodified, for a timestamp column
--check-column    the column examined during the incremental import
--last-value    the lower bound; in this example it means importing rows from 2017-06-27 up to today
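In practice you rarely hard-code --last-value; a small wrapper script can compute it. A sketch under assumptions (the connection details mirror the ones used above, and the date arithmetic uses GNU date):

```shell
#!/bin/sh
# Sketch: compute --last-value dynamically so a daily run imports only rows
# modified since yesterday.
LAST_VALUE=$(date -d yesterday +%Y-%m-%d)

CMD="sqoop import -D sqoop.hbase.add.row.key=true \
--connect jdbc:mysql://192.168.1.9:3306/spider \
--username root --password root \
--table Test_goods \
--hbase-table t_goods --column-family cf --hbase-row-key Id \
--incremental lastmodified --check-column u_date \
--last-value $LAST_VALUE --split-by u_date"

# Print the command that would run; replace echo with eval "$CMD" for real use.
echo "$CMD"
```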
2. Sqoop job:
sqoop job --create TESTJOB01 -- import --connect jdbc:mysql://192.168.1.9:3306/spider --username root --password root --table Test_goods --hbase-create-table --hbase-table t_goods --column-family cf --hbase-row-key Id -m 1
Then schedule the Sqoop job above:
Use the Linux scheduler: crontab -e
For example, to run it once a day:
0 0 * * * /opt/local/sqoop-1.4.6/bin/sqoop job --exec TESTJOB01
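Cron runs jobs with a minimal environment and silently discards output unless mail is configured, so it helps to call the job through a wrapper that logs each run. A minimal sketch (the log path and script name are assumptions, not from this article):

```shell
#!/bin/sh
# Hypothetical cron wrapper: log each run of the Sqoop job with a timestamp
# so failed nightly imports are easy to spot.
SQOOP_BIN=/opt/local/sqoop-1.4.6/bin/sqoop   # install path used in the article
LOG=/tmp/sqoop_logs/testjob01.log            # assumed log location

mkdir -p "$(dirname "$LOG")"
{
  echo "=== $(date '+%Y-%m-%d %H:%M:%S') starting job TESTJOB01 ==="
  "$SQOOP_BIN" job --exec TESTJOB01
  echo "=== finished with exit status $? ==="
} >> "$LOG" 2>&1
```

The crontab entry then points at this wrapper instead of calling sqoop directly, e.g. 0 0 * * * /path/to/run_testjob01.sh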