1. Install Sqoop
Download sqoop-1.2.0.tar.gz (version 1.2.0 is compatible with Hadoop 0.20).
Put hadoop-core-0.20.2-cdh3u3.jar and hadoop-tools-0.20.2-cdh3u3.jar into the sqoop/lib directory. These two jars come from Cloudera and can be downloaded from Cloudera's official website.
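For reference, the whole install looks roughly like this (a minimal sketch: the local paths and the HADOOP_HOME location are illustrative assumptions, not part of the original steps):

# extract the Sqoop archive (assumes sqoop-1.2.0.tar.gz is already downloaded)
tar -xzf sqoop-1.2.0.tar.gz
cd sqoop-1.2.0
# drop the two Cloudera jars (downloaded separately) into sqoop/lib
cp /path/to/hadoop-core-0.20.2-cdh3u3.jar lib/
cp /path/to/hadoop-tools-0.20.2-cdh3u3.jar lib/
# tell the Sqoop launcher scripts where Hadoop lives
export HADOOP_HOME=/usr/local/hadoop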
2. Import data from MySQL
Go to the Sqoop extraction directory (first add mysql-connector-java-5.1.17-bin.jar to the sqoop/lib directory).
bin/sqoop import --connect jdbc:mysql://IP:PORT/DATABASE --username USERNAME --password PASSWORD --table TABLENAME --hive-import -m 1
Note: -m specifies how many map tasks run in parallel. If the table has no primary key, only one map task can be used, so -m 1 is required (see the sketch below).
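For a concrete picture, an import might look like this (a hedged sketch: the host, port, database, credentials, table, and split column are made-up placeholders; --split-by is a standard Sqoop option for parallelizing tables without a primary key):

# table with a primary key: four parallel map tasks, split on the key automatically
bin/sqoop import --connect jdbc:mysql://192.168.1.10:3306/testdb \
  --username sqoop_user --password sqoop_pass \
  --table orders --hive-import -m 4

# table without a primary key: either keep -m 1, or name a split column with --split-by
bin/sqoop import --connect jdbc:mysql://192.168.1.10:3306/testdb \
  --username sqoop_user --password sqoop_pass \
  --table order_items --split-by item_id --hive-import -m 4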
3. Import data from PostgreSQL
bin/sqoop import --connect jdbc:postgresql://IP:PORT/DATABASE --username USERNAME --password PASSWORD --table TABLENAME --hive-import -m 1
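As with MySQL, the PostgreSQL JDBC driver jar must be in sqoop/lib first; a sketch with assumed names (the driver jar version, host, database, and table are all illustrative):

# copy the PostgreSQL JDBC driver into sqoop/lib (the exact jar name depends on the version downloaded)
cp /path/to/postgresql-9.1-901.jdbc4.jar lib/

bin/sqoop import --connect jdbc:postgresql://192.168.1.20:5432/testdb \
  --username sqoop_user --password sqoop_pass \
  --table events --hive-import -m 1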
4. FAQs:
1. A JVM out-of-memory error occurs when importing a large table from PostgreSQL into Hive. This is a bug in Sqoop 1.2 and earlier: the entire table is read into memory, which causes the overflow. The solution is to switch to Sqoop 1.3 or later, where the bug is fixed. However, Sqoop 1.3 depends on the CDH3 build of Hadoop, so if plain Apache Hadoop was installed before, it has to be replaced with the CDH3 version.
2. After switching to Sqoop 1.3 and replacing Hadoop with CDH3, a strange problem appeared: Sqoop failed with a ClassNotFoundException around org.apache.hadoop.security.UserGroupInformation.login(xxx). The cause is the CDH3 build of Hadoop: the UserGroupInformation class in its hadoop-core-0.20.2-cdh3u3.jar does not have the login method. Because of this I reverted Hadoop to Apache Hadoop 0.20.2 and downloaded Sqoop 1.4 instead. It turns out Sqoop 1.4 supports Apache Hadoop directly, so installing CDH's Hadoop was never necessary in the first place. (This problem plagued me for a long time.)
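A quick way to check what a given hadoop-core jar actually provides (my own sanity check, not part of the original troubleshooting) is to list the members of UserGroupInformation with javap:

# prints the public members of UserGroupInformation from that specific jar;
# if no login method appears, the Sqoop build in use will not work with it
javap -classpath hadoop-core-0.20.2-cdh3u3.jar org.apache.hadoop.security.UserGroupInformation | grep -i login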