Objective
This article summarizes the pitfalls I ran into when importing data from MySQL to Hive with Sqoop.
Environment:
- System: CentOS 6.5
- Hadoop: Apache 2.7.3
- MySQL: 5.1.73
- JDK: 1.8
- Sqoop: 1.4.7
Hadoop runs in pseudo-distributed mode.
One, the import command used
For testing I mainly followed an existing article, Sqoop: Import Data from MySQL to Hive.
Following the method described there, I created the table in MySQL, populated it with data, and then ran the following command, adapted to my own setup:
sqoop import --connect jdbc:mysql://localhost:3306/test --username root -P --split-by id --columns id,name --table customer --fields-terminated-by "," --hive-import --create-hive-table --hive-table sqoop_workspace.customers
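For context, the source table in MySQL looked roughly like this. The exact schema is not given in the original, so this is only a hypothetical sketch; note that id is a character column, which is what matters for problem 1 below:

-- Hypothetical source table in the "test" database (schema assumed, not from the original article)
CREATE TABLE customer (
  id   VARCHAR(10) NOT NULL PRIMARY KEY,  -- text id column, relevant to the --split-by issue
  name VARCHAR(50)
);
INSERT INTO customer (id, name) VALUES ('c001', 'Alice'), ('c002', 'Bob');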
Then the trip through the pitfalls began.
Two, problems encountered and solutions
1. Splitting on a text column
Error message: "Generating splits for a textual index column allowed only in case of"-dorg.apache.sqoop.splitter.allow_text_splitte R=true "Property passed as a parameter".
The problem is that the column given to "--split-by id" is a text column, so the option "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" has to be added to the command (as a generic Hadoop -D argument it goes right after "sqoop import", before the tool-specific options):
" -dorg.apache.sqoop.splitter.allow_text_splitter=true " --connect jdbc:mysql://localhost:3306/test--username root-p--split-by ID--columns id,name--table customer "," --hive-import--create-hive-table--hive-table sqoop_workspace.customers
2. Hadoop JobHistory server not started
Error message: "ERROR tool.ImportTool: Import failed: java.io.IOException: java.net.ConnectException: Call From harry.com/192.168.0.210 to 0.0.0.0:10020 failed on connection exception: ...".
The reason is that after Sqoop runs its MapReduce job, it needs to log the job information with the Hadoop JobHistory server, which stores it in a specified HDFS directory. That service is not started by default, so it has to be started manually once the configuration is in place.
Workaround: add the following configuration to mapred-site.xml:
<property>
<name>mapreduce.jobhistory.address</name>
<value>0.0.0.0:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>0.0.0.0:19888</value>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/history/done</value>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/history/done_intermediate</value>
</property>
Restart HDFS and YARN:
stop-dfs.sh
stop-yarn.sh
start-dfs.sh
start-yarn.sh
To start the history server:
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
If you need to stop it later, you can use:
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh stop historyserver
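Before rerunning the import, a quick sanity check (not from the original article) that the JobHistory server is actually up and listening on port 10020:

jps | grep JobHistoryServer
netstat -tlnp | grep 10020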
Then run the Sqoop import command again.
3. Error connecting to the metastore database
Error message: "Caused by: javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = ...".
This happened because I was using the default embedded Derby database for the Hive metastore and had a Hive CLI open in another session. A drawback of embedded Derby is that it only supports a single connection, so an error occurs when multiple sessions access Hive at the same time.
Workaround: exit the other Hive CLI session and rerun the import.
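A longer-term alternative, not part of the original workaround and only sketched here, is to back the metastore with MySQL instead of embedded Derby so that multiple sessions can connect. The property names below are the standard Hive metastore settings; the connection URL, user name, and password are placeholders to adapt:

<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive_metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive_password</value>
</property>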
4. Database not created in Hive
Error message: "Error QL. Driver:FAILED:SemanticException [Error 10072]: Database does not exist:sqoop_workspace ... ", this is obvious, just create one directly in hive.
5. Other warnings and errors
The remaining messages do not actually block the import, for example the following warning:
"WARN hdfs.DFSClient: Caught exception java.lang.InterruptedException ..." is actually a bug in Hadoop itself, specifically HDFS-9794:
When a DFSStripedOutputStream is closed, the streamer threads are not shut down cleanly if flushing the data back to the data/parity blocks fails. The same problem exists in DFSOutputStream#closeImpl, which always forcibly shuts down the streamer thread and can therefore trigger an InterruptedException.
These warnings can generally be ignored.
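Once the issues above are resolved and the import completes, a simple way to check the result from the Hive CLI (just a sanity check, not from the original article):

hive> SELECT * FROM sqoop_workspace.customers LIMIT 5;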
Three, references
1. Sqoop import: "-Dorg.apache.sqoop.splitter.allow_text_splitter=true"
2. Hadoop JobHistory
3. "Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep" workaround
4. WARN hdfs.DFSClient: Caught exception java.lang.InterruptedException
Finish