Import data from MySQL to Hive using Sqoop

Source: Internet
Author: User
Tags: sqoop


Objective


This article is mainly a summary of the pitfalls I ran into when importing data from MySQL to Hive with Sqoop.



Environment:


    • System: CentOS 6.5
    • Hadoop: Apache 2.7.3
    • MySQL: 5.1.73
    • JDK: 1.8
    • Sqoop: 1.4.7


Hadoop runs in pseudo-distributed mode.





One, the import command used


For my test I mainly followed an existing article, Sqoop: Import Data from MySQL to Hive.



Following the method in that article, I created the table in MySQL, populated it with data, and then entered the command below, adjusted to my own setup:


sqoop import --connect jdbc:mysql://localhost:3306/test --username root -P --split-by id --columns id,name --table customer --fields-terminated-by "," --hive-import --create-hive-table --hive-table sqoop_workspace.customers
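For context, the customer table on the MySQL side can be prepared with something like the sketch below. This is only an assumed schema for illustration (the exact table from the referenced article is not reproduced here); note that id is shown as a text column on purpose, since that is what triggers the split-by error described in the next section.

# A minimal sketch, assuming a local MySQL instance and the "test" database
# used in the import command above; the schema and data are illustrative only.
mysql -u root -p test <<'SQL'
CREATE TABLE IF NOT EXISTS customer (
    id   VARCHAR(32) NOT NULL,   -- a text column; splitting on it causes problem 1 below
    name VARCHAR(100),
    PRIMARY KEY (id)
);
INSERT INTO customer (id, name) VALUES
    ('c001', 'Alice'),
    ('c002', 'Bob'),
    ('c003', 'Carol');
SQL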

 


Then the whack-a-mole with errors began.





Two, problems encountered and solutions

1. Splitting on a text column


Error message: "Generating splits for a textual index column allowed only in case of"-dorg.apache.sqoop.splitter.allow_text_splitte R=true "Property passed as a parameter".



The problem is that the column passed to "--split-by" (id) is a text column, so the option "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" has to be added to the command:




" -dorg.apache.sqoop.splitter.allow_text_splitter=true " --connect jdbc:mysql://localhost:3306/test--username root-p--split-by ID--columns id,name--table customer "," --hive-import--create-hive-table--hive-table sqoop_workspace.customers




2. The Hadoop JobHistory server is not running


Error message: "Error tool. Importtool:import Failed:java.io.IOException:java.net.ConnectException:Call from harry.com/192.168.0.210 to 0.0.0.0:10020 failed on connection exception: ... ".



The main reason is that after Sqoop runs the MapReduce job, it needs to record the job information through the Hadoop JobHistory server, which stores it in a specified HDFS directory. The JobHistory server is not started by default, so it has to be configured and then started manually.



Workaround: add the following configuration to mapred-site.xml:






<property>
    <name>mapreduce.jobhistory.address</name>
    <value>0.0.0.0:10020</value>
</property>

<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>0.0.0.0:19888</value>
</property>

<property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/history/done</value>
</property>

<property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/history/done_intermediate</value>
</property>


Restart HDFS and YARN:






stop-dfs.sh
stop-yarn.sh
start-dfs.sh
start-yarn.sh


To start the history server:




$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
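To confirm that the daemon actually came up (assuming a standard pseudo-distributed setup), jps should now list a JobHistoryServer process, and the web UI should answer on port 19888 as configured above:

# Verify the history server is running.
jps
# Expected to include: NameNode, DataNode, ResourceManager, NodeManager, JobHistoryServer
curl -s http://localhost:19888/ > /dev/null && echo "JobHistory web UI is up"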


If you need to stop later, you can use the command:




$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh stop historyserver


Then execute the command again.





3. Error connecting to the metastore database


Error message: "caused By:javax.jdo.JDOFatalDataStoreException:Unable to open a test connection to the given database. JDBC URL ... ".



This is mainly because I left Hive's metastore on the default embedded Derby database while a Hive CLI was open in another session. One drawback of embedded Derby is that it only supports a single connection, so an error occurs when multiple clients access Hive at the same time.



Workaround: exit the other Hive CLI session and re-run the import.
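If it is not obvious which session is holding the metastore, one way to check (assuming the default embedded Derby metastore, which lives in a metastore_db directory under whatever directory the Hive CLI or Sqoop job was launched from) is to look for Derby's lock files:

# Run from the directory where the Hive CLI / Sqoop job was started.
# While another embedded-Derby session is open, lock files such as db.lck are present.
ls metastore_db/*.lck
# Quit the other Hive CLI ("quit;"), confirm the locks are gone, then re-run the import.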





4. Database not created in Hive


Error message: "Error QL. Driver:FAILED:SemanticException [Error 10072]: Database does not exist:sqoop_workspace ... ", this is obvious, just create one directly in hive.





5. Other warnings and errors


Some other warnings and errors do not actually block the import, such as the following warning:



"WARN HDFs. Dfsclient:caught exception java.lang.InterruptedException ... "is actually Hadoop's own bug, specifically the bug in HDFs 9794:



When a DFSStripedOutputStream is closed, its streamer threads are not closed if flushing the data back to the data/parity blocks fails. The same problem exists in DFSOutputStream#closeImpl, which always force-closes its threads and can therefore cause an InterruptedException.



Warnings like these can generally be ignored.





Three, references


1. Sqoop import: "-Dorg.apache.sqoop.splitter.allow_text_splitter=true"



2. Hadoop JobHistory



3. "Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep" workaround



4. WARN hdfs.DFSClient: Caught exception java.lang.InterruptedException


