What is Sqoop?
Sqoop is a tool for migrating data between Hadoop and relational databases (MySQL, Oracle, PostgreSQL): it can import RDBMS data into HDFS, or export HDFS data into an RDBMS.
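As a hedged illustration (the connect string, credentials, table names, and paths below are placeholders, not taken from this article), a typical import and export might look like:

```shell
# Import a table from MySQL into HDFS (host, database, and table are hypothetical)
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl --password-file /user/etl/.pw \
  --table orders \
  --target-dir /data/orders

# Export HDFS data back into a MySQL table
sqoop export \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl --password-file /user/etl/.pw \
  --table orders_copy \
  --export-dir /data/orders
```

These commands require a running Hadoop cluster and a reachable database, so treat them as a sketch of the invocation shape rather than something to copy verbatim.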
How does Sqoop work?
One of Sqoop's highlights is its ability to use MapReduce to import data from an RDBMS into HDFS, and to export data from HDFS back into an RDBMS. The Sqoop architecture is simple, and it integrates with Hive, HBase, and Oozie.
Data is transferred through MapReduce tasks, which provide concurrency and fault tolerance. Sqoop interacts with relational databases primarily through JDBC, so in theory any database that supports JDBC can exchange data with HDFS via Sqoop.
However, only a small number of databases are officially tested, as follows:
Database     Version   --direct support?   Connect string
HSQLDB       1.8.0+    No                  jdbc:hsqldb:*//
MySQL        5.0+      Yes                 jdbc:mysql://
Oracle       10.2.0+   No                  jdbc:oracle:*//
PostgreSQL   8.3+      Yes (import only)   jdbc:postgresql://
Older versions may also work, but they are untested. For performance reasons, Sqoop also provides a fast-path mechanism for some databases that bypasses JDBC; it is enabled with the --direct option.
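For example (host, database, and table names are hypothetical), enabling direct mode for a MySQL import looks like this; for MySQL, Sqoop's direct mode uses the mysqldump tool instead of JDBC:

```shell
# Same import as before, but with the fast path enabled
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --table orders \
  --direct \
  --target-dir /data/orders_fast
```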
Sqoop Workflow
1. Read the structure of the table to be imported, generate a run class (named QueryResult by default), package it into a jar, and submit it to Hadoop.
2. Set up the job; this mainly involves setting the various parameters.
3. Hadoop then executes the MapReduce job that performs the import:
1) First, slice the data into splits:
DataDrivenDBInputFormat.getSplits(JobContext job)
2) After the ranges are computed, each split writes out the range it will read:
DataDrivenDBInputFormat.write(DataOutput output) — this records the lowerBoundQuery and upperBoundQuery
3) Read back the range written in step 2):
DataDrivenDBInputFormat.readFields(DataInput input)
4) Then create a RecordReader to read data from the database:
DataDrivenDBInputFormat.createRecordReader(InputSplit split, TaskAttemptContext context)
5) Create the map task:
TextImportMapper.setup(Context context)
6) The RecordReader reads one row at a time from the relational database and sets the map's key and value:
DBRecordReader.nextKeyValue()
7) Run the map:
TextImportMapper.map(LongWritable key, SqoopRecord val, Context context)
The key finally emitted is the row data, carried by the generated QueryResult class, and the value is NullWritable.get().
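The split step in 1) can be sketched as follows. This is an illustrative approximation, not Sqoop's actual code: DataDrivenDBInputFormat runs a boundary query for the MIN and MAX of the split column (hard-coded here as 1 and 100) and divides that range into one WHERE clause per map task.

```shell
# Approximate sketch of integer split generation (min/max would really
# come from a "SELECT MIN(id), MAX(id) FROM ..." boundary query).
min=1
max=100
maps=4
size=$(( (max - min + 1) / maps ))
i=0
while [ $i -lt $maps ]; do
  lo=$(( min + i * size ))
  hi=$(( lo + size - 1 ))
  # The last split absorbs any remainder.
  [ $i -eq $(( maps - 1 )) ] && hi=$max
  echo "split $i: WHERE id >= $lo AND id <= $hi"
  i=$(( i + 1 ))
done
```

In a real Sqoop run, the split column and the number of map tasks are controlled with --split-by and -m, and the boundary query can be overridden with --boundary-query.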