Use Sqoop to import data to Hive

Source: Internet
Author: User
Tags sqoop

1. Install sqoop

Download sqoop-1.2.0.tar.gz (version 1.20 is compatible with Hadoop0.20)

Put the hadoop-core-0.20.2-cdh3u3.jar, hadoop-tools-0.20.2-cdh3u3.jar into the sqoop/lib directory, the two jar packages are out of cloudera company, you can go to its official website to download.

2. import data from mysql

Go to the sqoop extract directory (add the mysql-connector-java-5.1.17-bin.jar to the sqoop/lib directory)

Bin/sqoop import -- connect jdbc: mysql: // IP: PORT/DATABASE -- username USERNAME -- password PASSWORD -- table TABLENAME -- hive-import-m 1

Note:-m 1 indicates how many threads are started. If the database does not have a primary key, only one thread can be started.

3. import data from postgresql

Bin/sqoop import -- connect jdbc: postgresql: // IP: PORT/DATABASE -- username USERNAME -- password PASSWORD -- table TABLENAME -- hive-import-m 1

4. FAQs:

1. jvm memory overflow occurs when importing large tables from postgresql to hive. This is a bug in sqoop1.2 and earlier versions. It reads the entire data table record into the memory, leading to memory overflow, the solution is to change sqoop1.3 and later versions. This bug was fixed in version 1.3, but sqoop1.3 must depend on the hadoop version of chd3. If it was previously installed with pure hadoop version, it can only be changed to chd3 version.

2. When sqoop1.3 is used and hadoop is replaced with cdh3, a strange question occurs: classNotFoundException: sqoop org. apache. hadoop. security. userGroupInformation. login (xxx), this problem is caused by the hadoop version of CDH3, The UserGroupInformation class in his hadoop-core-0.20.2-cdh3u3.jar package does not have the login method, for this reason, I restored hadoop to the apache hadoop0.20.2 version, and then downloaded a sqoop1.4 version. Actually, sqoop1.4 can support apache hadoop, I knew that CDH's hadoop version was not required to be installed. (This problem has plagued me for a long time)

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.