Background of Sqoop
Most enterprises that use Hadoop to manage big data already have large amounts of data stored in traditional relational databases (RDBMS). Due to a lack of tool support, moving data between Hadoop and traditional database systems used to be very difficult. Sqoop is a project for transferring data between an RDBMS and Hadoop.
Sqoop Overview
Sqoop is an import and export tool between Hive/HDFS/HBase and relational databases.
Sqoop: SQL-to-Hadoop
1) A bridge between traditional relational databases and Hadoop:
imports relational data into Hadoop-related systems (such as HBase and Hive);
extracts and exports data from the Hadoop system into a relational database.
2) Uses MapReduce to speed up data transmission.
3) Transfers data in batch mode.
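As a sketch of points 2) and 3): a Sqoop import runs as a batch of parallel map tasks in a map-only MapReduce job. The host, database name, table, and credential paths below are placeholder assumptions, not values from this document:

```shell
# Hypothetical example: import one table as a batch job with 4 parallel
# map tasks. Sqoop generates a map-only MapReduce job; --split-by names
# the column used to partition the table's rows across the 4 mappers.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/mydb \
  --username scott \
  --password-file /user/scott/.db-password \
  --table EMP \
  --split-by EMPNO \
  --num-mappers 4 \
  --target-dir /data/emp
```

Raising `--num-mappers` increases parallelism but also the concurrent connection load on the source database, so it is usually tuned to what the database can tolerate.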
Why Sqoop?
1) Efficient and controllable resource utilization: parallel tasks.
2) Data type mapping and conversion: automatic, with user-defined overrides.
3) Support for multiple databases: MySQL, Oracle, and PostgreSQL.
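To illustrate point 2), Sqoop maps JDBC column types to Java types automatically, and `--map-column-java` overrides that mapping per column. The connection details and column names below are placeholder assumptions:

```shell
# Hypothetical example: override Sqoop's automatic JDBC-to-Java type
# mapping for individual columns. Here SAL is forced to Double and
# HIREDATE is imported as a plain String.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/mydb \
  --username scott \
  --password-file /user/scott/.db-password \
  --table EMP \
  --map-column-java SAL=Double,HIREDATE=String \
  --target-dir /data/emp
```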
Sqoop data sources commonly come in two forms:
1) text files, such as log files
2) relational databases
sqoop-import: extracts data from a relational database into HDFS/Hive/HBase.
sqoop-export: exports data from HDFS to a relational database.
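A minimal export in the other direction might look like the sketch below; the table name, HDFS path, and delimiter are assumptions for illustration. Note that `sqoop export` writes into an existing table rather than creating one:

```shell
# Hypothetical example: export delimited HDFS files back into an
# existing relational table. --input-fields-terminated-by tells Sqoop
# how the HDFS records are delimited; the target table must already
# exist in the database.
sqoop export \
  --connect jdbc:mysql://dbhost:3306/mydb \
  --username scott \
  --password-file /user/scott/.db-password \
  --table EMP_BACKUP \
  --export-dir /data/emp \
  --input-fields-terminated-by ','
```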
We recommend using uppercase letters for table and field names during import and export.
Note: the terms "import" and "export" are defined from the perspective of HDFS/Hive/HBase, not from the perspective of the relational database.
The subsequent Sqoop examples use Oracle's EMP and DEPT tables as data sources.
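As a preview of those examples, importing Oracle's EMP table might look like the sketch below. The host, service name, and password-file path are placeholder assumptions; only EMP and the SCOTT schema come from the classic Oracle sample data:

```shell
# Hypothetical example: import Oracle's EMP table into HDFS.
# Note the uppercase table name, as recommended above; Oracle stores
# unquoted identifiers in uppercase. -m 1 uses a single mapper, which
# avoids needing a --split-by column for this small table.
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username SCOTT \
  --password-file /user/scott/.oracle-password \
  --table EMP \
  --target-dir /data/oracle/emp \
  -m 1
```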