Import data from HDFs to relational database with Sqoop

Source: Internet
Author: User
Tags sqoop

Because of the needs of the work, need to transfer the data in HDFs to the relational database to become the corresponding table, on the Internet to find the relevant data for a long, found that different statements, the following is my own test process:

To use Sqoop to achieve this need, first understand what Sqoop is.

Sqoop is a tool used to transfer data from Hadoop and relational databases to each other in a relational database (e.g. MySQL, Oracle, Postgres, etc.) into the HDFs of Hadoop. HDFs data can also be directed into a relational database.


First, you need to prepare the following:

First: The Namenode node of Hadoop has the corresponding database-driven jar package and Sqoop jar package in the Lib folder.

Second: Create a table in the appropriate database beforehand, note: The data format on one of the HDFS directories is the same as the number of fields in the corresponding table.


Because I'm using an Oracle database and I'm using Java to manipulate it. So the following code and screenshots are examples of Java:

First standardize the file format in HDFs, as shown in the following figure:



The Java code is as follows:

Configuration conf = new configuration ();
Conf.set ("Fs.default.name", "hdfs://192.168.115.5:9000");
Conf.set ("Hadoop.job.ugi", "Hadooper,hadoopgroup");
Conf.set ("Mapred.job.tracker", "192.168.115.5:9001");


arraylist<string> list = new arraylist<string> (); Define a list
List.add ("--table");
List.add ("A_baat_client"); A table in Oracle. Future data will be imported into this table.
List.add ("--export-dir");
List.add ("/home/hadoop/traffic/capuse/near7date/activeuser/capuse_near7_activeuser_2013-02-06.log"); The directory on the HDFs. The data in this directory is to be imported into the A_baat_client table.
List.add ("--connect");
List.add ("Jdbc:oracle:thin:@10.18.96.107:1521:life"); Links to Oracle
List.add ("--username");
List.add ("Traffic"); User name for Oracle
List.add ("--password");
List.add ("Traffic"); Oracle's password
List.add ("--input-fields-terminated-by");
List.add ("|"); Data separator Symbol
List.add ("-M");
List.add ("1");//defines the number of MapReduce.


string[] arg = new string[1];
Exporttool exporter = new Exporttool ();
Sqoop Sqoop = new Sqoop (exporter);
sqoop.setconf (conf);
arg = List.toarray (new string[0]);
int result = Sqoop.runsqoop (Sqoop, ARG);
System.out.println ("Res:" + result); Print execution results.


Finally, run in the main method, and the resulting table data is shown in the following figure:


Through the above operation and code can be implemented in Java to the HDFS data generated corresponding table data;

However, in addition to being able to implement it in Java, it is possible to use basic commands, as follows:

In the Hadoop bin directory:

sqoop export--connect jdbc:oracle:thin:@10.18.96.107:1521:life \

--table a_baat_client--username traffic--password traffic \
--input-fields-terminated-by ' | ' \
--export-dir/home/hadoop/traffic/capuse/near7date/activeuser/test.log-m 1

The meaning is the same as the code in Java above.


Note:

1, the database table name, user name, password use uppercase (this may cause problems, because I in the test process, the use of lowercase errors, there is no columns this classic error. So it is recommended to capitalize, of course this is not required);

2, pre-built the corresponding table;


Well, the above code is actually very simple, but if it is never contacted, then in the process of doing will find a lot of problems, and the information on the Internet is very complex, this person in this article is to make a memo for themselves, but also want to give the need of a friend a help. Of course, there may be a lot of problems, look at the master treatise ...


Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.