Import data from HDFs to relational database with Sqoop

Last Update:2018-07-24 Source: Internet

Author: User

Tags sqoop

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Because of the needs of the work, need to transfer the data in HDFs to the relational database to become the corresponding table, on the Internet to find the relevant data for a long, found that different statements, the following is my own test process:

To use Sqoop to achieve this need, first understand what Sqoop is.

Sqoop is a tool used to transfer data from Hadoop and relational databases to each other in a relational database (e.g. MySQL, Oracle, Postgres, etc.) into the HDFs of Hadoop. HDFs data can also be directed into a relational database.

First, you need to prepare the following:

First: The Namenode node of Hadoop has the corresponding database-driven jar package and Sqoop jar package in the Lib folder.

Second: Create a table in the appropriate database beforehand, note: The data format on one of the HDFS directories is the same as the number of fields in the corresponding table.

Because I'm using an Oracle database and I'm using Java to manipulate it. So the following code and screenshots are examples of Java:

First standardize the file format in HDFs, as shown in the following figure:

The Java code is as follows:

Configuration conf = new configuration ();
Conf.set ("Fs.default.name", "hdfs://192.168.115.5:9000");
Conf.set ("Hadoop.job.ugi", "Hadooper,hadoopgroup");
Conf.set ("Mapred.job.tracker", "192.168.115.5:9001");

arraylist<string> list = new arraylist<string> (); Define a list
List.add ("--table");
List.add ("A_baat_client"); A table in Oracle. Future data will be imported into this table.
List.add ("--export-dir");
List.add ("/home/hadoop/traffic/capuse/near7date/activeuser/capuse_near7_activeuser_2013-02-06.log"); The directory on the HDFs. The data in this directory is to be imported into the A_baat_client table.
List.add ("--connect");
List.add ("Jdbc:oracle:thin:@10.18.96.107:1521:life"); Links to Oracle
List.add ("--username");
List.add ("Traffic"); User name for Oracle
List.add ("--password");
List.add ("Traffic"); Oracle's password
List.add ("--input-fields-terminated-by");
List.add ("|"); Data separator Symbol
List.add ("-M");
List.add ("1");//defines the number of MapReduce.

string[] arg = new string[1];
Exporttool exporter = new Exporttool ();
Sqoop Sqoop = new Sqoop (exporter);
sqoop.setconf (conf);
arg = List.toarray (new string[0]);
int result = Sqoop.runsqoop (Sqoop, ARG);
System.out.println ("Res:" + result); Print execution results.

Finally, run in the main method, and the resulting table data is shown in the following figure:

Through the above operation and code can be implemented in Java to the HDFS data generated corresponding table data;

However, in addition to being able to implement it in Java, it is possible to use basic commands, as follows:

In the Hadoop bin directory:

sqoop export--connect jdbc:oracle:thin:@10.18.96.107:1521:life \

--table a_baat_client--username traffic--password traffic \
--input-fields-terminated-by ' | ' \
--export-dir/home/hadoop/traffic/capuse/near7date/activeuser/test.log-m 1

The meaning is the same as the code in Java above.

Note:

1, the database table name, user name, password use uppercase (this may cause problems, because I in the test process, the use of lowercase errors, there is no columns this classic error. So it is recommended to capitalize, of course this is not required);

2, pre-built the corresponding table;

Well, the above code is actually very simple, but if it is never contacted, then in the process of doing will find a lot of problems, and the information on the Internet is very complex, this person in this article is to make a memo for themselves, but also want to give the need of a friend a help. Of course, there may be a lot of problems, look at the master treatise ...

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More