Hadoop Data Transfer Tool Sqoop

Source: Internet
Author: User
Tags join postgresql table definition sqoop
OverviewSqoop is an Apache top-level project that is used primarily to pass data in Hadoop and relational databases. With Sqoop, we can easily import data from a relational database into HDFs, or export data from HDFs to a relational database.
Sqoop Architecture:
The Sqoop architecture is simple enough to integrate hive, HBase, and Oozie to transmit data through map-reduce tasks, providing concurrency and fault tolerance.
Further development of Sqoop can be consulted in the following: A New Generation of Data Transfer Tools for Hadoop:sqoop 2

Sqoop is primarily interactive through JDBC and relational databases. Database that supports JDBC theoretically can use Sqoop and HDFs for data interaction.
However, only a small part of the Sqoop is officially tested, as follows:
Database version--direct support connect string matches
HSQLDB 1.8.0+ No jdbc:hsqldb:*//
MySQL 5.0+ Yes jdbc:mysql://
Oracle 10.2.0+ No jdbc:oracle:*//
PostgreSQL 8.3+ Yes (import only) jdbc:postgresql://
Older versions may also be supported, but not tested.
For performance reasons, Sqoop provides a mechanism for fast access to data that is different from JDBC and can be used by--direct.

The following are based on sqoop-1.4.3
installationSqoop installation can refer to http://www.54chen.com/java-ee/sqoop-mysql-to-hive.html, test work

ToolsSqoop contains a series of tools that run Sqoop help to see the relevant
$./sqoop Help
Usage:sqoop COMMAND [ARGS]
Available commands:
CodeGen Generate code to interact with database records
create-hive-table Import a table definition into hive
Eval Evaluate A SQL statement and display the results
Export export an HDFS directory to a database table
Help List available commands
Import Import a table from a database to HDFS
Import-all-tables Import tables from a database to HDFS
Job work with saved jobs
List-databases List available databases on a server
List-tables List available tables in a database
Merge merge Results of incremental imports
Metastore Run a standalone Sqoop metastore
Version Display version information
See "Sqoop help command" For information on a specific command.

Use the tool list-tables to view the table as follows:
$./sqoop List-tables--connect jdbc:mysql://127.0.0.1/test--username root--password 123456
A
T1

You can use CodeGen to generate code, but do not perform map-reduce, as follows:
$./sqoop CodeGen--connect jdbc:mysql://127.0.0.1/test--username root--password 123456--table A--
Class-name Zxm_sqoop
......
13/03/21 21:02:01 INFO Orm. compilationmanager:writing jar File:/tmp/sqoop-work/compile/29864e3980ab5630b699e8e1e2145369/zxm_sqoop.jar
Here the relevant code and Java package can be found in/tmp/sqoop-work/compile/29864e3980ab5630b699e8e1e2145369/

Import
Sqoop data import has the following characteristics:
1. Support for text files (--as-textfile), Avro (--as-avrodatafile), Sequencefiles (--as-sequencefile). Rcfile not supported, default to Text
2. Support data append, specify by--apend
3. Support table column selection (--column), support data selection (--where), and use with--table
4. Support data selection, such as data read into multi-table join ' SELECT a.*, b.* from a JOIN b on (a.id = = b.id) ', can not be used in conjunction with--table
5. Support Map number customization (-m)
6. Support Compression (--compress)
7. Support for importing data from relational databases into hive (--hive-import), HBase (--hbase-table)
Data import hive in three steps: 1) Import data into HDFs 2) hive build table 3) Load data into table using "Load Data inpaht"
Data import HBase Two part: 1) Import data to HDFs 2) Invoke HBase put operation row Data write table
*

Example:
MySQL File contents:
Mysql> select * from A;
+------+--------+
| Key1 | value1 |
+------+--------+
| 1 | A1 |
| 2 | A2 |
| 3 | A3 |
| 4 | A4 |
| 5 | A5 |
| 6 | A6 |
| 7 | A7 |
| 8 | A8 |
| 9 | A9 |
+------+--------+

Write file a.conf, content:
[HTML] View plain copy import--append-m 3--connect jdbc:mysql://127.0.0.1/test--username Root--password 123456--table

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.