Introduction to the Data Migration Tool Sqoop

Source: Internet
Author: User
Tags: sqoop

What is Sqoop?
Sqoop is a tool for transferring data between Hadoop and relational databases (MySQL, Oracle, PostgreSQL): it can import RDBMS data into HDFS, and it can export HDFS data back into an RDBMS.
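As a sketch, a typical import and a matching export might look like the commands below. The hostname, database, table names, and HDFS paths are hypothetical; -P prompts for the password interactively.

```shell
# Import the MySQL table "orders" into HDFS (host, db, table, and paths are made up)
sqoop import \
  --connect jdbc:mysql://db.example.com/shop \
  --username sqoopuser -P \
  --table orders \
  --target-dir /user/hadoop/orders

# Export the imported files back into a (pre-created) MySQL table
sqoop export \
  --connect jdbc:mysql://db.example.com/shop \
  --username sqoopuser -P \
  --table orders_copy \
  --export-dir /user/hadoop/orders
```

Both commands require a running Hadoop cluster and a reachable database, so they are shown here only as an invocation sketch.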
How does Sqoop work?
Sqoop uses MapReduce to import data from an RDBMS into HDFS and to export data from HDFS back into an RDBMS. Its architecture is simple, and it integrates with Hive, HBase, and Oozie.
Because data is transferred by MapReduce tasks, the transfer is concurrent and fault tolerant. Sqoop talks to relational databases primarily through JDBC, so in theory any database that supports JDBC can exchange data with HDFS via Sqoop.
However, only a small set of databases has been officially tested, as follows:

Database     Version    --direct support    Connect string matches
HSQLDB       1.8.0+     No                  jdbc:hsqldb:*//
MySQL        5.0+       Yes                 jdbc:mysql://
Oracle       10.2.0+    No                  jdbc:oracle:*//
PostgreSQL   8.3+       Yes (import only)   jdbc:postgresql://
Older versions may also work, but they are untested. For performance reasons, Sqoop additionally provides a fast-path mechanism that bypasses JDBC; it is enabled with the --direct option.
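For a MySQL source, a direct-mode import might be invoked as below. With --direct, Sqoop shells out to the database's native dump tool (mysqldump for MySQL) instead of reading rows over JDBC; the host, table, and path here are hypothetical.

```shell
sqoop import \
  --connect jdbc:mysql://db.example.com/shop \
  --username sqoopuser -P \
  --table orders \
  --direct \
  --target-dir /user/hadoop/orders_direct
```

Per the table above, direct mode is only available for some databases (for PostgreSQL it is import-only), so the flag should be used only where it is supported.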
Sqoop Workflow
1. Read the structure of the table to be imported and generate a record class (named QueryResult by default), compile it into a jar, and submit it to Hadoop.
2. Set up the job, which mainly means configuring its various parameters.
3. Hadoop then runs the import as a MapReduce job:
1) First the data is split (DataSplit):
DataDrivenDBInputFormat.getSplits(JobContext job)
2) After the ranges are computed, each split writes out the range it will read:
DataDrivenDBInputFormat.write(DataOutput output) — this records the lowerBoundQuery and upperBoundQuery
3) The range written in step 2) is read back:
DataDrivenDBInputFormat.readFields(DataInput input)
4) A RecordReader is created to read data from the database:
DataDrivenDBInputFormat.createRecordReader(InputSplit split, TaskAttemptContext context)
5) The map task is set up:
TextImportMapper.setup(Context context)
6) The RecordReader reads one row at a time from the relational database, sets the map's key and value, and hands them to the mapper:
DBRecordReader.nextKeyValue()
7) The map function runs:
TextImportMapper.map(LongWritable key, SqoopRecord val, Context context)
The key that is finally emitted is the row data, produced by QueryResult, and the value is NullWritable.get().
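The splitting in step 1) can be illustrated with a small sketch. For a numeric split column, DataDrivenDBInputFormat divides the column's [min, max] range into roughly equal intervals, one per mapper, and each interval becomes a WHERE clause. The values below are made up for illustration; in a real import, MIN and MAX come from a bounding query such as SELECT MIN(id), MAX(id) FROM the table.

```shell
# Toy simulation of splitting a numeric key range into per-mapper intervals.
# MIN, MAX, and SPLITS are hypothetical; Sqoop derives them from the split-by
# column and the requested number of mappers.
MIN=1; MAX=100; SPLITS=4
SIZE=$(( (MAX - MIN + 1) / SPLITS ))

lo=$MIN
for i in $(seq 1 "$SPLITS"); do
  if [ "$i" -eq "$SPLITS" ]; then
    hi=$MAX                      # last split absorbs any remainder
  else
    hi=$(( lo + SIZE - 1 ))
  fi
  echo "split $i: WHERE $lo <= id AND id <= $hi"
  lo=$(( hi + 1 ))
done
```

Each emitted range corresponds to the lowerBoundQuery/upperBoundQuery pair recorded in step 2), so the mappers can read disjoint slices of the table in parallel.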

