Sqoop usage and introduction

Source: Internet
Author: User
Tags: hadoop, ecosystem, sqoop

Sqoop connects to relational databases from within a Hadoop environment and acts as a bridge to the Hadoop storage systems. It supports importing from multiple relational data sources into Hive, HDFS, and HBase. The relational tables usually live in a backup copy of the production database, and their data needs to be imported every day. If the daily volume of new data is small, Sqoop can simply re-import the entire table each time; for larger volumes, Sqoop also provides an incremental import mechanism.

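As a rough illustration, a minimal full-table import might look like the following sketch; the connection URL, credentials, table name, and target path below are placeholders, not values taken from this article:

    # Import the entire "orders" table from MySQL into HDFS
    # (URL, user, password, table, and target directory are all hypothetical)
    sqoop import \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username sqoop_user \
      --password sqoop_pass \
      --table orders \
      --target-dir /user/hadoop/orders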
The following describes several common Sqoop commands and some of their parameters:

 

1. import (ImportTool): Import data from a relational database (from a table or a query statement) to HDFS
2. export (ExportTool): Export data from HDFS to a relational database
3. codegen (CodeGenTool): Generate a Java class from a database table and package it into a jar
4. create-hive-table (CreateHiveTableTool): Create a Hive table
5. eval (EvalSqlTool): Evaluate a SQL statement and view the results
6. import-all-tables (ImportAllTablesTool): Import all tables in a database to HDFS
7. job (JobTool): Work with saved jobs
8. list-databases (ListDatabasesTool): List all database names
9. list-tables (ListTablesTool): List all tables in a database
10. merge (MergeTool): Merge the results of two datasets, such as successive incremental imports
11. metastore (MetastoreTool): Run a standalone metastore that stores saved job definitions
12. help (HelpTool): View help
13. version (VersionTool): View the version

 
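Before setting up an import, a few of these commands are handy for probing the source database. A rough sketch, reusing the same placeholder MySQL connection details as above:

    # List the databases visible to this account
    sqoop list-databases \
      --connect jdbc:mysql://localhost/ \
      --username sqoop_user --password sqoop_pass

    # List the tables in one database, then preview a few rows with eval
    sqoop list-tables \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username sqoop_user --password sqoop_pass
    sqoop eval \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username sqoop_user --password sqoop_pass \
      --query "SELECT * FROM orders LIMIT 5"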

Next come the common parameters of Sqoop; each of the 13 commands above also has its own parameters. The common parameters fall into the following groups:

Common arguments

Incremental import arguments

Output line formatting arguments

Input parsing arguments, Hive arguments

HBase arguments

Generic Hadoop command-line arguments

 

1. Common arguments: general parameters, mainly for connecting to the relational database

--connect: JDBC URL of the relational database to connect to. Example: jdbc:mysql://localhost/sqoop_datas

--connection-manager: Connection manager class; generally not used

--driver: JDBC driver class

--hadoop-home: Hadoop home directory. Example: /home/hadoop

--help: View help information

--password: Password for connecting to the relational database

--username: User name for connecting to the relational database

--verbose: Print more information while working (effectively lowers the log level); this parameter takes no value

 
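As a sketch, these common arguments are usually combined on a single command line; the host, database, and account below are placeholders:

    # Typical combination of common connection arguments;
    # --verbose enables more detailed logging
    sqoop list-tables \
      --connect jdbc:mysql://db-backup-host/sqoop_datas \
      --username sqoop_user \
      --password sqoop_pass \
      --verbose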

Import control arguments:

--append: Append data to an existing dataset in HDFS

--as-avrodatafile: Imports data to Avro data files

--as-sequencefile: Imports data to SequenceFiles

--as-textfile: Imports data as plain text (default)

--boundary-query <statement>: Boundary query to use for creating splits

--columns <col,col,col...>: Columns to import from the table

--direct: Use the direct import fast path

--direct-split-size <n>: Split the input stream every n bytes when importing in direct mode

--inline-lob-limit <n>: Set the maximum size for an inline LOB

-m, --num-mappers <n>: Use n map tasks to import in parallel

-e, --query <statement>: Import the results of the statement

--split-by <column-name>: Column of the table used to split work units

--table <table-name>: Table to read

--target-dir <dir>: HDFS destination directory

--warehouse-dir <dir>: HDFS parent directory for the table destination

--where <where clause>: WHERE clause to use during import

-z, --compress: Enable compression

--compression-codec <c>: Use a Hadoop codec (default gzip)

--null-string <null-string>: The string to be written for a null value in string columns

--null-non-string <null-string>: The string to be written for a null value in non-string columns

 

 
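Putting several of these options together, a parallel import of a query result might look like the following sketch; the query, columns, and paths are hypothetical:

    # Import the result of a free-form query with 4 parallel map tasks.
    # When --query is used, the WHERE clause must contain the $CONDITIONS token,
    # and --split-by and --target-dir must be given explicitly.
    sqoop import \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username sqoop_user --password sqoop_pass \
      --query 'SELECT id, name, amount FROM orders WHERE $CONDITIONS' \
      --split-by id \
      --num-mappers 4 \
      --target-dir /user/hadoop/orders_query \
      --compress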

Incremental import arguments:

--check-column (col): Specifies the column to be examined when determining which rows to import

--incremental (mode): Specifies how Sqoop determines which rows are new; legal values for mode are append and lastmodified

--last-value (value): Specifies the maximum value of the check column from the previous import

 

 
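A minimal incremental import in append mode, assuming an auto-increment id column (all names and values below are placeholders):

    # Import only rows whose id is greater than the last value seen previously
    sqoop import \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username sqoop_user --password sqoop_pass \
      --table orders \
      --target-dir /user/hadoop/orders \
      --incremental append \
      --check-column id \
      --last-value 10000

At the end of the run, Sqoop reports the value to pass as --last-value for the next run; defining the command as a saved job lets Sqoop track that value itself.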

Output line formatting arguments:

--enclosed-by <char>: Sets a required field-enclosing character

--escaped-by <char>: Sets the escape character

--fields-terminated-by <char>: Sets the field separator character

--lines-terminated-by <char>: Sets the end-of-line character

--mysql-delimiters: Uses MySQL's default delimiter set: fields , ; lines \n ; escaped-by \ ; optionally-enclosed-by '

--optionally-enclosed-by <char>: Sets a field-enclosing character

 

 
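For example, to write tab-separated text output with explicit line endings (the delimiters here are chosen arbitrarily for illustration):

    # Import as text with tab-separated fields and newline-terminated records
    sqoop import \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username sqoop_user --password sqoop_pass \
      --table orders \
      --target-dir /user/hadoop/orders_tsv \
      --fields-terminated-by '\t' \
      --lines-terminated-by '\n'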

Hive arguments:

--hive-home <dir>: Override $HIVE_HOME

--hive-import: Import tables into Hive (uses Hive's default delimiters if none are set)

--hive-overwrite: Overwrite existing data in the Hive table

--create-hive-table: If set, the job fails if the target Hive table already exists; by default this property is false

--hive-table <table-name>: Sets the table name to use when importing into Hive

--hive-drop-import-delims: Drops \n, \r, and \01 from string fields when importing into Hive

--hive-delims-replacement: Replaces \n, \r, and \01 in string fields with a user-defined string when importing into Hive

--hive-partition-key: Name of the Hive field on which the imported data is partitioned

--hive-partition-value <v>: String value that serves as the partition key for the data imported into Hive in this job

--map-column-hive <map>: Override the default mapping from SQL type to Hive type for configured columns

 

 
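A typical Hive import built from these options might look like the following sketch; the database and table names are placeholders:

    # Import a MySQL table straight into a Hive table, replacing existing data
    # and stripping delimiter characters that would break Hive's text format
    sqoop import \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username sqoop_user --password sqoop_pass \
      --table orders \
      --hive-import \
      --hive-table orders \
      --hive-overwrite \
      --hive-drop-import-delims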

HBase arguments:

--column-family <family>: Sets the target column family for the import

--hbase-create-table: If specified, create missing HBase tables

--hbase-row-key <col>: Specifies which input column to use as the row key

--hbase-table <table-name>: Specifies an HBase table to use as the target instead of HDFS

 

 
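An HBase-targeted import using these options might look like this sketch; the table, column family, and row key column are assumptions:

    # Write rows into the HBase table "orders", keyed by the id column,
    # creating the table if it does not already exist
    sqoop import \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username sqoop_user --password sqoop_pass \
      --table orders \
      --hbase-table orders \
      --column-family cf \
      --hbase-row-key id \
      --hbase-create-table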

Code generation arguments:

--bindir <dir>: Output directory for compiled objects

--class-name <name>: Sets the generated class name; this overrides --package-name and, when combined with --jar-file, sets the input class

--jar-file <file>: Disable code generation; use the specified jar

--outdir <dir>: Output directory for generated code

--package-name <name>: Put auto-generated classes in this package

--map-column-java <m>: Override the default mapping from SQL type to Java type for configured columns

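The codegen command uses these same arguments to regenerate the ORM class for a table without running an import; the class name and output paths below are placeholders:

    # Generate the Java class and a compiled jar for the "orders" table
    sqoop codegen \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username sqoop_user --password sqoop_pass \
      --table orders \
      --class-name com.example.Orders \
      --outdir /tmp/sqoop-codegen/src \
      --bindir /tmp/sqoop-codegen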

