Overview
Sqoop is a top-level Apache project used to transfer data between Hadoop and relational databases. With Sqoop, we can easily import data from a relational database into HDFS, or export data from HDFS back into a relational database.
Sqoop architecture: the Sqoop architecture is very simple. It integrates with Hive, HBase, and Oozie, and moves data through MapReduce tasks, which provides both concurrency and fault tolerance.
For more information about Sqoop, see "A new generation of data transfer tools for Hadoop: Sqoop 2".
Sqoop talks to relational databases through JDBC, so in theory any database that supports JDBC can exchange data with HDFS through Sqoop.
However, only a small set of databases has been officially tested with Sqoop, as shown below:

Database     Version    --direct support?    Connect string matches
HSQLDB       1.8.0+     No                   jdbc:hsqldb:*//
MySQL        5.0+       Yes                  jdbc:mysql://
Oracle       10.2.0+    No                   jdbc:oracle:*//
PostgreSQL   8.3+       Yes (import only)    jdbc:postgresql://
Older versions may also be supported, but have not been tested.
For performance reasons, Sqoop also provides a fast data access mechanism that bypasses JDBC for some databases; it is enabled with --direct.
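As a rough sketch (reusing the test database and table from the examples later in this article; credentials and names are only placeholders), a MySQL import in direct mode might look like the following; for MySQL, direct mode bypasses JDBC and shells out to mysqldump:
$ ./sqoop import --connect jdbc:mysql://127.0.0.1/test --username root --password 123456 --table a --direct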
This article is based on Sqoop 1.4.3.
Install
For how to install and use Sqoop, refer to http://www.54chen.com/java-ee/sqoop-mysql-to-hive.html; the setup described there was verified to work.
Tools
Sqoop contains a series of tools. You can run sqoop help to list them:
$ ./sqoop help
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.
Use the list-tables tool to view the tables in a database:
$ ./sqoop list-tables --connect jdbc:mysql://127.0.0.1/test --username root --password 123456
a
t1
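Similarly, the list-databases tool shown in the help output can enumerate the databases on a server; a small sketch against the same MySQL instance (note there is no database name in the connect string):
$ ./sqoop list-databases --connect jdbc:mysql://127.0.0.1 --username root --password 123456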
You can use codegen to generate code without running a MapReduce job, as shown below:
$ ./sqoop codegen --connect jdbc:mysql://127.0.0.1/test --username root --password 123456 --table a --class-name zxm_sqoop
......
13/03/21 21:02:01 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-work/compile/29864e3980ab5630b699e8e1e2145369/zxm_sqoop.jar
The generated code and the Java jar can be found under /tmp/sqoop-work/compile/29864e3980ab5630b699e8e1e2145369/.
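The generated class and jar are not just for inspection; an import job can reuse them instead of regenerating code. A sketch, assuming the jar path printed above and a hypothetical target directory:
$ ./sqoop import --connect jdbc:mysql://127.0.0.1/test --username root --password 123456 --table a --class-name zxm_sqoop --jar-file /tmp/sqoop-work/compile/29864e3980ab5630b699e8e1e2145369/zxm_sqoop.jar --target-dir /tmp/a_codegen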
Import
Sqoop data import has the following features:
1. Supports importing as text files (--as-textfile), Avro data files (--as-avrodatafile), and SequenceFiles (--as-sequencefile). RCFile is not currently supported; text is the default.
2. Supports appending to existing data, specified with --append.
3. Supports selecting table columns (--columns) and filtering rows (--where) of the table given with --table.
4. Supports free-form query imports, for example reading the result of a multi-table join such as 'select a.*, b.* from a join b on (a.id = b.id)'; this cannot be used together with --table (a sketch follows the import example below).
5. Supports setting the number of map tasks (-m).
6. Supports compression (--compress).
7. Supports importing data from a relational database directly into Hive (--hive-import) or HBase (--hbase-table).
Importing into Hive takes three steps: 1) import the data to HDFS; 2) create the Hive table; 3) load the data into the table with "load data inpath".
Importing into HBase takes two steps: 1) import the data to HDFS; 2) call HBase Put to write the data into the table row by row.
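For example, a Hive import along the lines described above can be requested in a single command; a sketch reusing this article's test table (the Hive table name and the single map task are assumptions for illustration):
$ ./sqoop import --connect jdbc:mysql://127.0.0.1/test --username root --password 123456 --table a --hive-import --hive-table a -m 1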
* Example:
MySQL table contents:
mysql> select * from a;
+------+--------+
| key1 | value1 |
+------+--------+
|    1 | a1     |
|    2 | a2     |
|    3 | a3     |
|    4 | a4     |
|    5 | a5     |
|    6 | a6     |
|    7 | a7     |
|    8 | a8     |
|    9 | a9     |
+------+--------+
Write a file a.conf with the following content (in a Sqoop options file, each option and each value goes on its own line):
import
--append
-m
3
--connect
jdbc:mysql://127.0.0.1/test
--username
root
--password
123456
--table
a
--target-dir
/tmp/a
--columns
key1
--where
'key1>3'
Run:
$ ./sqoop --options-file a.conf
View output:
$ hadoop fs -ls /tmp/a/
Found 3 items
-rw-r--r--   1 work supergroup          4 2013-03-21 /tmp/a/part-m-00000
-rw-r--r--   1 work supergroup          4 2013-03-21 /tmp/a/part-m-00001
-rw-r--r--   1 work supergroup          4 2013-03-21 /tmp/a/part-m-00002
= "3 files correspond to 3 mapper
$ hadoop fs -cat /tmp/a/*
4
5
6
7
8
9
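As a sketch of the free-form query import mentioned in feature 4 above (table and column names are placeholders taken from that feature's join example): with --query, Sqoop requires the literal token $CONDITIONS in the WHERE clause so it can partition the work, and --split-by must be supplied when more than one map task is used:
$ ./sqoop import --connect jdbc:mysql://127.0.0.1/test --username root --password 123456 --query 'select a.*, b.* from a join b on (a.id = b.id) where $CONDITIONS' --split-by a.id --target-dir /tmp/ab -m 2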
Export
Sqoop export writes files from HDFS back into a relational database. It reads and parses the data according to the user-specified field delimiter (--fields-terminated-by), converts each record into an insert/update statement, and applies it to the relational database.
It has the following features:
1. Supports exporting into a table or calling a stored procedure.
2. Supports both insert and update modes (an update-mode sketch follows the example below).
3. Supports controlling concurrency (-m).
Example:
$ hadoop fs -cat /tmp/b/*
1,a
2,b
3,c
$ ./sqoop export --connect jdbc:mysql://127.0.0.1/test --table b --username root --password 123456 --export-dir /tmp/b
mysql> select * from b;
+------+--------+
| key1 | value1 |
+------+--------+
|    1 | a      |
|    2 | b      |
|    3 | c      |
+------+--------+
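For the update mode mentioned in feature 2 above, a rough sketch (assuming key1 is the update key; by default only rows with a matching key are updated, while --update-mode allowinsert also inserts rows that have no match, where the connector supports it):
$ ./sqoop export --connect jdbc:mysql://127.0.0.1/test --username root --password 123456 --table b --export-dir /tmp/b --update-key key1 --update-mode allowinsert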
Besides the tools covered above, Sqoop provides other interesting tools, such as sqoop job; they are worth exploring on your own.
Other notes:
1. By building on MapReduce, Sqoop provides good concurrency and fault tolerance, and it can serve as a synchronization tool between heterogeneous databases.
2. Although Sqoop supports Hive and HBase, the support is incomplete; in some scenarios, post-processing after the transfer is still unavoidable.
3. For large data transfers, consider using --direct.
References:
Apache Sqoop
Sqoop User Guide
Apache Sqoop: A Data Transfer Tool for Hadoop
A New Generation of Data Transfer Tools for Hadoop: Sqoop 2
Using Sqoop
Comparison of DataX and Sqoop in heterogeneous big data environments